1. 02 2月, 2006 12 次提交
  2. 19 1月, 2006 3 次提交
  3. 09 1月, 2006 10 次提交
    • C
      [PATCH] SwapMig: Switch error handling in migrate_pages to use -Exx · d0d96328
      Christoph Lameter 提交于
      Use -Exxx instead of numeric return codes and cleanup the code in
      migrate_pages() using -Exx error codes.
      
      Consolidate successful migration handling
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d0d96328
    • C
      [PATCH] SwapMig: Extend parameters for migrate_pages() · d4984711
      Christoph Lameter 提交于
      Extend the parameters of migrate_pages() to allow the caller control over the
      fate of successfully migrated or impossible to migrate pages.
      
      Swap migration and direct migration will have the same interface after this
      patch so that patches can be independently applied to the policy layer and the
      core migration code.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d4984711
    • C
      [PATCH] SwapMig: Drop unused pages immediately · ee27497d
      Christoph Lameter 提交于
      Drop unused pages immediately
      
      If a page is encountered that is only referenced by the migration code then
      there is no reason to swap or migrate the page.  Release the page by calling
      move_to_lru().
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ee27497d
    • C
      [PATCH] SwapMig: add_to_swap() avoid atomic allocations · 1480a540
      Christoph Lameter 提交于
      Add gfp_mask to add_to_swap
      
      add_to_swap does allocations with GFP_ATOMIC in order not to interfere with
      swapping.  During migration we may have use add_to_swap extensively which may
      lead to out of memory errors.
      
      This patch makes add_to_swap take a parameter that specifies the gfp mask.
      The page migration code can then make add_to_swap use GFP_KERNEL.
      Signed-off-by: NHirokazu Takahashi <taka@valinux.co.jp>
      Signed-off-by: NDave Hansen <haveblue@us.ibm.com>
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1480a540
    • C
      [PATCH] SwapMig: CONFIG_MIGRATION fixes · 8419c318
      Christoph Lameter 提交于
      Move move_to_lru, putback_lru_pages and isolate_lru in section surrounded by
      CONFIG_MIGRATION saving some codesize for single processor kernels.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8419c318
    • C
      [PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support · 7cbe34cf
      Christoph Lameter 提交于
      Include page migration if the system is NUMA or having a memory model that
      allows distinct areas of memory (SPARSEMEM, DISCONTIGMEM).
      
      And:
      - Only include lru_add_drain_per_cpu if building for an SMP system.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7cbe34cf
    • C
      [PATCH] Swap Migration V5: migrate_pages() function · 49d2e9cc
      Christoph Lameter 提交于
      This adds the basic page migration function with a minimal implementation that
      only allows the eviction of pages to swap space.
      
      Page eviction and migration may be useful to migrate pages, to suspend
      programs or for remapping single pages (useful for faulty pages or pages with
      soft ECC failures)
      
      The process is as follows:
      
      The function wanting to migrate pages must first build a list of pages to be
      migrated or evicted and take them off the lru lists via isolate_lru_page().
      isolate_lru_page determines that a page is freeable based on the LRU bit set.
      
      Then the actual migration or swapout can happen by calling migrate_pages().
      
      migrate_pages does its best to migrate or swapout the pages and does multiple
      passes over the list.  Some pages may only be swappable if they are not dirty.
       migrate_pages may start writing out dirty pages in the initial passes over
      the pages.  However, migrate_pages may not be able to migrate or evict all
      pages for a variety of reasons.
      
      The remaining pages may be returned to the LRU lists using putback_lru_pages().
      
      Changelog V4->V5:
      - Use the lru caches to return pages to the LRU
      
      Changelog V3->V4:
      - Restructure code so that applying patches to support full migration does
        require minimal changes. Rename swapout_pages() to migrate_pages().
      
      Changelog V2->V3:
      - Extract common code from shrink_list() and swapout_pages()
      Signed-off-by: NMike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: "Michael Kerrisk" <mtk-manpages@gmx.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      49d2e9cc
    • C
      [PATCH] Swap Migration V5: PF_SWAPWRITE to allow writing to swap · 930d9152
      Christoph Lameter 提交于
      Add PF_SWAPWRITE to control a processes permission to write to swap.
      
      - Use PF_SWAPWRITE in may_write_to_queue() instead of checking for kswapd
        and pdflush
      
      - Set PF_SWAPWRITE flag for kswapd and pdflush
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      930d9152
    • C
      [PATCH] Swap Migration V5: LRU operations · 21eac81f
      Christoph Lameter 提交于
      This is the start of the `swap migration' patch series.
      
      Swap migration allows the moving of the physical location of pages between
      nodes in a numa system while the process is running.  This means that the
      virtual addresses that the process sees do not change.  However, the system
      rearranges the physical location of those pages.
      
      The main intent of page migration patches here is to reduce the latency of
      memory access by moving pages near to the processor where the process
      accessing that memory is running.
      
      The patchset allows a process to manually relocate the node on which its
      pages are located through the MF_MOVE and MF_MOVE_ALL options while
      setting a new memory policy.
      
      The pages of process can also be relocated from another process using the
      sys_migrate_pages() function call.  Requires CAP_SYS_ADMIN.  The migrate_pages
      function call takes two sets of nodes and moves pages of a process that are
      located on the from nodes to the destination nodes.
      
      Manual migration is very useful if for example the scheduler has relocated a
      process to a processor on a distant node.  A batch scheduler or an
      administrator can detect the situation and move the pages of the process
      nearer to the new processor.
      
      sys_migrate_pages() could be used on non-numa machines as well, to force all
      of a particualr process's pages out to swap, if someone thinks that's useful.
      
      Larger installations usually partition the system using cpusets into sections
      of nodes.  Paul has equipped cpusets with the ability to move pages when a
      task is moved to another cpuset.  This allows automatic control over locality
      of a process.  If a task is moved to a new cpuset then also all its pages are
      moved with it so that the performance of the process does not sink
      dramatically (as is the case today).
      
      Swap migration works by simply evicting the page.  The pages must be faulted
      back in.  The pages are then typically reallocated by the system near the node
      where the process is executing.
      
      For swap migration the destination of the move is controlled by the allocation
      policy.  Cpusets set the allocation policy before calling sys_migrate_pages()
      in order to move the pages as intended.
      
      No allocation policy changes are performed for sys_migrate_pages().  This
      means that the pages may not faulted in to the specified nodes if no
      allocation policy was set by other means.  The pages will just end up near the
      node where the fault occurred.
      
      There's another patch series in the pipeline which implements "direct
      migration".
      
      The direct migration patchset extends the migration functionality to avoid
      going through swap.  The destination node of the relation is controllable
      during the actual moving of pages.  The crutch of using the allocation policy
      to relocate is not necessary and the pages are moved directly to the target.
      Its also faster since swap is not used.
      
      And sys_migrate_pages() can then move pages directly to the specified node.
      Implement functions to isolate pages from the LRU and put them back later.
      
      This patch:
      
      An earlier implementation was provided by Hirokazu Takahashi
      <taka@valinux.co.jp> and IWAMOTO Toshihiro <iwamoto@valinux.co.jp> for the
      memory hotplug project.
      
      From: Magnus
      
      This breaks out isolate_lru_page() and putpack_lru_page().  Needed for swap
      migration.
      Signed-off-by: NMagnus Damm <magnus.damm@gmail.com>
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      21eac81f
    • A
      [PATCH] drop-pagecache · 9d0243bc
      Andrew Morton 提交于
      Add /proc/sys/vm/drop_caches.  When written to, this will cause the kernel to
      discard as much pagecache and/or reclaimable slab objects as it can.  THis
      operation requires root permissions.
      
      It won't drop dirty data, so the user should run `sync' first.
      
      Caveats:
      
      a) Holds inode_lock for exorbitant amounts of time.
      
      b) Needs to be taught about NUMA nodes: propagate these all the way through
         so the discarding can be controlled on a per-node basis.
      
      This is a debugging feature: useful for getting consistent results between
      filesystem benchmarks.  We could possibly put it under a config option, but
      it's less than 300 bytes.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9d0243bc
  4. 07 1月, 2006 4 次提交
    • N
      [PATCH] mm: page_state opt · a74609fa
      Nick Piggin 提交于
      Optimise page_state manipulations by introducing interrupt unsafe accessors
      to page_state fields.  Callers must provide their own locking (either
      disable interrupts or not update from interrupt context).
      
      Switch over the hot callsites that can easily be moved under interrupts off
      sections.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a74609fa
    • C
      [PATCH] mm: add populated_zone() helper · f3fe6512
      Con Kolivas 提交于
      There are numerous places we check whether a zone is populated or not.
      
      Provide a helper function to check for populated zones and convert all
      checks for zone->present_pages.
      Signed-off-by: NCon Kolivas <kernel@kolivas.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f3fe6512
    • A
      [PATCH] vmscan: balancing fix · 210fe530
      Andrew Morton 提交于
      Revert a patch which went into 2.6.8-rc1.  The changelog for that patch was:
      
        The shrink_zone() logic can, under some circumstances, cause far too many
        pages to be reclaimed.  Say, we're scanning at high priority and suddenly
        hit a large number of reclaimable pages on the LRU.
      
        Change things so we bale out when SWAP_CLUSTER_MAX pages have been
        reclaimed.
      
      Problem is, this change caused significant imbalance in inter-zone scan
      balancing by truncating scans of larger zones.
      
      Suppose, for example, ZONE_HIGHMEM is 10x the size of ZONE_NORMAL.  The zone
      balancing algorithm would require that if we're scanning 100 pages of
      ZONE_HIGHMEM, we should scan 10 pages of ZONE_NORMAL.  But this logic will
      cause the scanning of ZONE_HIGHMEM to bale out after only 32 pages are
      reclaimed.  Thus effectively causing smaller zones to be scanned relatively
      harder than large ones.
      
      Now I need to remember what the workload was which caused me to write this
      patch originally, then fix it up in a different way...
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      210fe530
    • A
      [PATCH] kill last zone_reclaim() bits · 7756b9e4
      Andrew Morton 提交于
      Remove the last bits of Martin's ill-fated sys_set_zone_reclaim().
      
      Cc: Martin Hicks <mort@wildopensource.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7756b9e4
  5. 04 1月, 2006 1 次提交
    • Z
      [PATCH] add AOP_TRUNCATED_PAGE, prepend AOP_ to WRITEPAGE_ACTIVATE · 994fc28c
      Zach Brown 提交于
      readpage(), prepare_write(), and commit_write() callers are updated to
      understand the special return code AOP_TRUNCATED_PAGE in the style of
      writepage() and WRITEPAGE_ACTIVATE.  AOP_TRUNCATED_PAGE tells the caller that
      the callee has unlocked the page and that the operation should be tried again
      with a new page.  OCFS2 uses this to detect and work around a lock inversion in
      its aop methods.  There should be no change in behaviour for methods that don't
      return AOP_TRUNCATED_PAGE.
      
      WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
      made enums so that kerneldoc can be used to document their semantics.
      Signed-off-by: NZach Brown <zach.brown@oracle.com>
      994fc28c
  6. 29 11月, 2005 2 次提交
    • A
      [PATCH] shrinker->nr = LONG_MAX means deadlock for icache · ea164d73
      Andrea Arcangeli 提交于
      With Andrew Morton <akpm@osdl.org>
      
      The slab scanning code tries to balance the scanning rate of slabs versus the
      scanning rate of LRU pages.  To do this, it retains state concerning how many
      slabs have been scanned - if a particular slab shrinker didn't scan enough
      objects, we remember that for next time, and scan more objects on the next
      pass.
      
      The problem with this is that with (say) a huge number of GFP_NOIO
      direct-reclaim attempts, the number of objects which are to be scanned when we
      finally get a GFP_KERNEL request can be huge.  Because some shrinker handlers
      just bail out if !__GFP_FS.
      
      So the patch clamps the number of objects-to-be-scanned to 2* the total number
      of objects in the slab cache.
      Signed-off-by: NAndrea Arcangeli <andrea@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ea164d73
    • R
      [PATCH] temporarily disable swap token on memory pressure · f7b7fd8f
      Rik van Riel 提交于
      Some users (hi Zwane) have seen a problem when running a workload that
      eats nearly all of physical memory - th system does an OOM kill, even
      when there is still a lot of swap free.
      
      The problem appears to be a very big task that is holding the swap
      token, and the VM has a very hard time finding any other page in the
      system that is swappable.
      
      Instead of ignoring the swap token when sc->priority reaches 0, we could
      simply take the swap token away from the memory hog and make sure we
      don't give it back to the memory hog for a few seconds.
      
      This patch resolves the problem Zwane ran into.
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f7b7fd8f
  7. 14 11月, 2005 1 次提交
  8. 30 10月, 2005 2 次提交
    • H
      [PATCH] mm: split page table lock · 4c21e2f2
      Hugh Dickins 提交于
      Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
      a many-threaded application which concurrently initializes different parts of
      a large anonymous area.
      
      This patch corrects that, by using a separate spinlock per page table page, to
      guard the page table entries in that page, instead of using the mm's single
      page_table_lock.  (But even then, page_table_lock is still used to guard page
      table allocation, and anon_vma allocation.)
      
      In this implementation, the spinlock is tucked inside the struct page of the
      page table page: with a BUILD_BUG_ON in case it overflows - which it would in
      the case of 32-bit PA-RISC with spinlock debugging enabled.
      
      Splitting the lock is not quite for free: another cacheline access.  Ideally,
      I suppose we would use split ptlock only for multi-threaded processes on
      multi-cpu machines; but deciding that dynamically would have its own costs.
      So for now enable it by config, at some number of cpus - since the Kconfig
      language doesn't support inequalities, let preprocessor compare that with
      NR_CPUS.  But I don't think it's worth being user-configurable: for good
      testing of both split and unsplit configs, split now at 4 cpus, and perhaps
      change that to 8 later.
      
      There is a benefit even for singly threaded processes: kswapd can be attacking
      one part of the mm while another part is busy faulting.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4c21e2f2
    • L
      [PATCH] shrink_list(): skip anon pages if not may_swap · c340010e
      Lee Schermerhorn 提交于
      Martin Hicks' page cache reclaim patch added the 'may_swap' flag to the
      scan_control struct; and modified shrink_list() not to add anon pages to
      the swap cache if may_swap is not asserted.
      
      Ref:  http://marc.theaimsgroup.com/?l=linux-mm&m=111461480725322&w=4
      
      However, further down, if the page is mapped, shrink_list() calls
      try_to_unmap() which will call try_to_unmap_one() via try_to_unmap_anon ().
       try_to_unmap_one() will BUG_ON() an anon page that is NOT in the swap
      cache.  Martin says he never encountered this path in his testing, but
      agrees that it might happen.
      
      This patch modifies shrink_list() to skip anon pages that are not already
      in the swap cache when !may_swap, rather than just not adding them to the
      cache.
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c340010e
  9. 28 10月, 2005 1 次提交
  10. 17 10月, 2005 1 次提交
    • L
      Fix memory ordering bug in page reclaim · 3d80636a
      Linus Torvalds 提交于
      As noticed by Nick Piggin, we need to make sure that we check the page
      count before we check for PageDirty, since the dirty check is only valid
      if the count implies that we're the only possible ones holding the page.
      
      We always did do this, but the code needs a read-memory-barrier to make
      sure that the orderign is also honored by the CPU.
      
      (The writer side is ordered due to the atomic decrement and test on the
      page count, see the discussion on linux-kernel)
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3d80636a
  11. 13 9月, 2005 1 次提交
  12. 08 9月, 2005 1 次提交
    • P
      [PATCH] cpusets: formalize intermediate GFP_KERNEL containment · 9bf2229f
      Paul Jackson 提交于
      This patch makes use of the previously underutilized cpuset flag
      'mem_exclusive' to provide what amounts to another layer of memory placement
      resolution.  With this patch, there are now the following four layers of
      memory placement available:
      
       1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
       2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
       3) The current tasks cpuset (GFP_USER allocations constrained to here), and
       4) Specific node placement, using mbind and set_mempolicy.
      
      These nest - each layer is a subset (same or within) of the previous.
      
      Layer (2) above is new, with this patch.  The call used to check whether a
      zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
      extended to take a gfp_mask argument, and its logic is extended, in the case
      that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
      hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
      placement is allowed.  The definition of GFP_USER, which used to be identical
      to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
      cpuset_gfp_hardwall_flag patch.
      
      GFP_ATOMIC and GFP_KERNEL allocations will stay within the current tasks
      cpuset, so long as any node therein is not too tight on memory, but will
      escape to the larger layer, if need be.
      
      The intended use is to allow something like a batch manager to handle several
      jobs, each job in its own cpuset, but using common kernel memory for caches
      and such.  Swapper and oom_kill activity is also constrained to Layer (2).  A
      task in or below one mem_exclusive cpuset should not cause swapping on nodes
      in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
      task in another such cpuset.  Heavy use of kernel memory for i/o caching and
      such by one job should not impact the memory available to jobs in other
      non-overlapping mem_exclusive cpusets.
      
      This patch enables providing hardwall, inescapable cpusets for memory
      allocations of each job, while sharing kernel memory allocations between
      several jobs, in an enclosing mem_exclusive cpuset.
      
      Like Dinakar's patch earlier to enable administering sched domains using the
      cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
      that had previously done nothing much useful other than restrict what cpuset
      configurations were allowed.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9bf2229f
  13. 05 9月, 2005 1 次提交