1. 09 Jan 2006, 3 commits
    • [PATCH] Swap Migration V5: Add CONFIG_MIGRATION for page migration support · 7cbe34cf
      Authored by Christoph Lameter
      Include page migration support if the system is NUMA or has a memory model
      that allows distinct areas of memory (SPARSEMEM, DISCONTIGMEM).
      
      And:
      - Only include lru_add_drain_per_cpu if building for an SMP system.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swap Migration V5: migrate_pages() function · 49d2e9cc
      Authored by Christoph Lameter
      This adds the basic page migration function with a minimal implementation that
      only allows the eviction of pages to swap space.
      
      Page eviction and migration may be useful for migrating pages, for suspending
      programs, or for remapping single pages (useful for faulty pages or pages
      with soft ECC failures).
      
      The process is as follows:
      
      The function wanting to migrate pages must first build a list of pages to be
      migrated or evicted and take them off the lru lists via isolate_lru_page().
      isolate_lru_page() determines whether a page is freeable based on whether
      its LRU bit is set.
      
      Then the actual migration or swapout can happen by calling migrate_pages().
      
      migrate_pages does its best to migrate or swapout the pages and does multiple
      passes over the list.  Some pages may only be swappable if they are not dirty.
       migrate_pages may start writing out dirty pages in the initial passes over
      the pages.  However, migrate_pages may not be able to migrate or evict all
      pages for a variety of reasons.
      
      The remaining pages may be returned to the LRU lists using putback_lru_pages().
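      
      A minimal sketch of that calling pattern; the exact prototypes are
      introduced by this patch series, so the ones below are simplified
      assumptions rather than the real signatures:
      
      	LIST_HEAD(pagelist);
      
      	/* 1. pull each page the caller wants to move off its LRU list
      	 *    and collect it on a private list */
      	if (isolate_lru_page(page))		/* assumed: nonzero when isolated */
      		list_add_tail(&page->lru, &pagelist);
      
      	/* 2. try to migrate or swap out the whole list; several passes
      	 *    may be made and dirty pages may be written out first */
      	migrate_pages(&pagelist);		/* argument list simplified here */
      
      	/* 3. anything that could not be moved goes back onto the LRU */
      	putback_lru_pages(&pagelist);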
      
      Changelog V4->V5:
      - Use the lru caches to return pages to the LRU
      
      Changelog V3->V4:
      - Restructure code so that applying patches to support full migration
        requires only minimal changes. Rename swapout_pages() to migrate_pages().
      
      Changelog V2->V3:
      - Extract common code from shrink_list() and swapout_pages()
      Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: "Michael Kerrisk" <mtk-manpages@gmx.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Swap Migration V5: LRU operations · 21eac81f
      Authored by Christoph Lameter
      This is the start of the `swap migration' patch series.
      
      Swap migration allows the physical location of pages to be moved between
      nodes in a NUMA system while the process is running.  This means that the
      virtual addresses that the process sees do not change; the system only
      rearranges the physical location of those pages.
      
      The main intent of page migration patches here is to reduce the latency of
      memory access by moving pages near to the processor where the process
      accessing that memory is running.
      
      The patchset allows a process to manually relocate the node on which its
      pages are located through the MF_MOVE and MF_MOVE_ALL options while
      setting a new memory policy.
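      
      A hedged userspace sketch of that interface: the MF_MOVE/MF_MOVE_ALL
      options referred to above surface as the MPOL_MF_MOVE/MPOL_MF_MOVE_ALL
      flags in the later <numaif.h> header, so treat the exact names and
      availability here as assumptions:
      
      	#include <stddef.h>
      	#include <numaif.h>	/* MPOL_BIND, MPOL_MF_MOVE (libnuma headers) */
      
      	static long bind_and_move(void *addr, size_t len)
      	{
      		unsigned long nodemask = 1UL << 1;	/* target node 1 */
      
      		/* bind the range to node 1 and ask for pages already
      		 * present in the range to be moved there as well */
      		return mbind(addr, len, MPOL_BIND, &nodemask,
      			     8 * sizeof(nodemask), MPOL_MF_MOVE);
      	}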
      
      The pages of a process can also be relocated from another process using the
      sys_migrate_pages() function call, which requires CAP_SYS_ADMIN.  The
      migrate_pages call takes two sets of nodes and moves pages of a process that
      are located on the "from" nodes to the destination nodes.
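      
      And a sketch of the call itself, assuming the raw syscall as it later
      became available to userspace (numbers and wrappers vary by kernel and
      C library):
      
      	#define _GNU_SOURCE
      	#include <unistd.h>
      	#include <sys/syscall.h>
      
      	int main(void)
      	{
      		unsigned long from = 1UL << 0;	/* pages now on node 0 ... */
      		unsigned long to   = 1UL << 1;	/* ... should move to node 1 */
      
      		/* pid 0 means the calling process; acting on another
      		 * process needs the privilege described above */
      		long ret = syscall(__NR_migrate_pages, 0,
      				   8 * sizeof(unsigned long), &from, &to);
      		return ret < 0;
      	}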
      
      Manual migration is very useful if for example the scheduler has relocated a
      process to a processor on a distant node.  A batch scheduler or an
      administrator can detect the situation and move the pages of the process
      nearer to the new processor.
      
      sys_migrate_pages() could be used on non-NUMA machines as well, to force all
      of a particular process's pages out to swap, if someone thinks that's useful.
      
      Larger installations usually partition the system using cpusets into sections
      of nodes.  Paul has equipped cpusets with the ability to move pages when a
      task is moved to another cpuset.  This allows automatic control over locality
      of a process.  If a task is moved to a new cpuset, all of its pages are
      moved with it so that the performance of the process does not sink
      dramatically (as it does today).
      
      Swap migration works by simply evicting the page.  The pages must be faulted
      back in.  The pages are then typically reallocated by the system near the node
      where the process is executing.
      
      For swap migration the destination of the move is controlled by the allocation
      policy.  Cpusets set the allocation policy before calling sys_migrate_pages()
      in order to move the pages as intended.
      
      No allocation policy changes are performed for sys_migrate_pages().  This
      means that the pages may not be faulted in to the specified nodes if no
      allocation policy was set by other means.  The pages will just end up near
      the node where the fault occurred.
      
      There's another patch series in the pipeline which implements "direct
      migration".
      
      The direct migration patchset extends the migration functionality to avoid
      going through swap.  The destination node of the relocation is controllable
      during the actual moving of pages.  The crutch of using the allocation policy
      to relocate is not necessary and the pages are moved directly to the target
      node.  It's also faster since swap is not used.
      
      And sys_migrate_pages() can then move pages directly to the specified node.
      
      Implement functions to isolate pages from the LRU and put them back later.
      
      This patch:
      
      An earlier implementation was provided by Hirokazu Takahashi
      <taka@valinux.co.jp> and IWAMOTO Toshihiro <iwamoto@valinux.co.jp> for the
      memory hotplug project.
      
      From: Magnus
      
      This breaks out isolate_lru_page() and putback_lru_page().  Needed for swap
      migration.
      Signed-off-by: Magnus Damm <magnus.damm@gmail.com>
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  2. 07 Jan 2006, 2 commits
  3. 29 Nov 2005, 1 commit
    • [PATCH] temporarily disable swap token on memory pressure · f7b7fd8f
      Authored by Rik van Riel
      Some users (hi Zwane) have seen a problem when running a workload that
      eats nearly all of physical memory: the system does an OOM kill, even
      when there is still a lot of swap free.
      
      The problem appears to be a very big task that is holding the swap
      token, and the VM has a very hard time finding any other page in the
      system that is swappable.
      
      Instead of ignoring the swap token when sc->priority reaches 0, we could
      simply take the swap token away from the memory hog and make sure we
      don't give it back to the memory hog for a few seconds.
      
      This patch resolves the problem Zwane ran into.
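      
      The gist in sketch form; the helper name and where the timeout lives are
      illustrative guesses at the shape of the change, not the patch itself:
      
      	/* deep into reclaim, the priority-based scan has bottomed out */
      	if (priority == 0)
      		disable_swap_token();	/* illustrative helper: take the token
      					 * away from the holder and refuse to
      					 * hand it back for a few seconds */
      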
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  4. 28 Oct 2005, 1 commit
  5. 09 Oct 2005, 1 commit
  6. 05 Sep 2005, 5 commits
    • [PATCH] swap: swap_lock replace list+device · 5d337b91
      Authored by Hugh Dickins
      The idea of a swap_device_lock per device, and a swap_list_lock over them all,
      is appealing; but in practice almost every holder of swap_device_lock must
      already hold swap_list_lock, which defeats the purpose of the split.
      
      The only exceptions have been swap_duplicate, valid_swaphandles and an
      untrodden path in try_to_unuse (plus a few places added in this series).
      valid_swaphandles doesn't show up high in profiles, but swap_duplicate does
      demand attention.  However, with the hold time in get_swap_pages so much
      reduced, I've not yet found a load and set of swap device priorities to show
      even swap_duplicate benefitting from the split.  Certainly the split is mere
      overhead in the common case of a single swap device.
      
      So, replace swap_list_lock and swap_device_lock by spinlock_t swap_lock
      (generally we seem to prefer an _ in the name, and not hide in a macro).
      
      If someone can show a regression in swap_duplicate, then probably we should
      add a hashlock for the swap_map entries alone (shorts being anatomic), so as
      to help the case of the single swap device too.
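      
      The shape of the change, as a sketch (names indicate the old and new
      locks, not exact diff hunks):
      
      	/* before: a global list lock plus a per-device lock,
      	 * almost always taken together */
      	spin_lock(&swap_list_lock);
      	spin_lock(&p->sdev_lock);
      	/* ... touch the swap list and p->swap_map ... */
      	spin_unlock(&p->sdev_lock);
      	spin_unlock(&swap_list_lock);
      
      	/* after: one spinlock_t swap_lock covers both */
      	spin_lock(&swap_lock);
      	/* ... */
      	spin_unlock(&swap_lock);
      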
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: scan_swap_map drop swap_device_lock · 52b7efdb
      Authored by Hugh Dickins
      get_swap_page has often shown up on latency traces, doing lengthy scans while
      holding two spinlocks.  swap_list_lock is already dropped; now scan_swap_map
      drops swap_device_lock before scanning the swap_map.
      
      While scanning for an empty cluster, don't worry that racing tasks may
      allocate what was free and free what was allocated; but when allocating an
      entry, check it's still free after retaking the lock.  Avoid dropping the lock
      in the expected common path.  No barriers beyond the locks, just let the
      cookie crumble; highest_bit limit is volatile, but benign.
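      
      The drop/scan/retake/recheck pattern, sketched with abbreviated lock and
      field names (the real scan_swap_map also tracks clusters and the
      lowest/highest_bit hints):
      
      	for (;;) {
      		spin_unlock(&swap_device_lock);		/* scan unlocked */
      		while (offset <= si->highest_bit && si->swap_map[offset])
      			offset++;			/* apparently free slot */
      		spin_lock(&swap_device_lock);		/* retake to commit */
      		if (offset > si->highest_bit)
      			return 0;			/* nothing free */
      		if (!si->swap_map[offset])
      			break;				/* still free: use it */
      		/* raced with another allocator: scan on from here */
      	}
      	si->swap_map[offset] = 1;			/* claim the entry */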
      
      Guard against swapoff: must check SWP_WRITEOK before allocating, must raise
      the SWP_SCANNING reference count while in scan_swap_map, and swapoff waits
      for that to fall: just use schedule_timeout, since we don't want to burden
      scan_swap_map itself, and it's very unlikely that anyone can really still be
      in scan_swap_map once swapoff gets this far.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: swap unsigned int consistency · 6eb396dc
      Authored by Hugh Dickins
      The swap header's unsigned int last_page determines the range of swap pages,
      but swap_info has been using int or unsigned long in some cases: use unsigned
      int throughout (except, in several places a local unsigned long is useful to
      avoid overflows when adding).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: show span of swap extents · 53092a74
      Authored by Hugh Dickins
      The "Adding %dk swap" message shows the number of swap extents, as a guide to
      how fragmented the swapfile may be.  But a useful further guide is what total
      extent they span across (sometimes scarily large).
      
      And there's no need to keep nr_extents in swap_info: it's unused after the
      initial message, so save a little space by keeping it on stack.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swap: swap extent list is ordered · 11d31886
      Authored by Hugh Dickins
      There are several comments that swap's extent_list.prev points to the lowest
      extent: that's not so, it's extent_list.next which points to it, as you'd
      expect.  And a couple of loops in add_swap_extent go all the way through
      the list, when they should just add to the other end.
      
      Fix those up, and let map_swap_page search the list forwards: profiles show
      it to be twice as quick that way - because prefetch works better on how the
      structs are typically kmalloc'ed?  or because usually more is written to than
      read from swap, and swap is allocated ascendingly?
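      
      The forward search, sketched with field names recalled from the swapfile
      code of that era (treat them as assumptions):
      
      	struct swap_extent *se;
      
      	list_for_each_entry(se, &sis->extent_list, list)
      		if (offset < se->start_page + se->nr_pages)
      			return se->start_block + (offset - se->start_page);
      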
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  7. 08 Aug 2005, 1 commit
  8. 13 Jul 2005, 1 commit
  9. 08 Jul 2005, 1 commit
  10. 22 Jun 2005, 2 commits
    • [PATCH] vm: try_to_free_pages unused argument · 1ad539b2
      Authored by Darren Hart
      try_to_free_pages accepts a third argument, order, but hasn't used it since
      before 2.6.0.  The following patch removes the argument and updates all the
      calls to try_to_free_pages.
      Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] VM: early zone reclaim · 753ee728
      Authored by Martin Hicks
      This is the core of the (much simplified) early reclaim.  The goal of this
      patch is to reclaim some easily-freed pages from a zone before falling back
      onto another zone.
      
      One of the major uses of this is NUMA machines.  With the default allocator
      behavior the allocator would look for memory in another zone, which might be
      off-node, before trying to reclaim from the current zone.
      
      This adds a zone tuneable to enable early zone reclaim.  It is selected on a
      per-zone basis and is turned on/off via syscall.
      
      Adding some extra throttling on the reclaim was also required (patch
      4/4).  Without it, the machine would grind to a crawl when doing a "make -j"
      kernel build.  Even with this patch the System Time is higher on
      average, but it seems tolerable.  Here are some numbers for kernbench
      runs on a 2-node, 4-CPU, 8GB RAM Altix in the "make -j" run:
      
      			wall  user   sys   %cpu  ctx sw.  sleeps
      			----  ----   ---   ----   ------  ------
      No patch		1009  1384   847   258   298170   504402
      w/patch, no reclaim     880   1376   667   288   254064   396745
      w/patch & reclaim       1079  1385   926   252   291625   548873
      
      These numbers are the average of 2 runs of 3 "make -j" runs done right
      after system boot.  Run-to-run variability for "make -j" is huge, so
      these numbers aren't terribly useful except to see that with reclaim
      the benchmark still finishes in a reasonable amount of time.
      
      I also looked at the NUMA hit/miss stats for the "make -j" runs and the
      reclaim doesn't make any difference when the machine is thrashing away.
      
      Doing a "make -j8" on a single node that is filled with page cache pages
      takes 700 seconds with reclaim turned on and 735 seconds without reclaim
      (due to remote memory accesses).
      
      The simple zone_reclaim syscall program is at
      http://www.bork.org/~mort/sgi/zone_reclaim.c
      Signed-off-by: Martin Hicks <mort@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  11. 17 Apr 2005, 1 commit
    • Linux-2.6.12-rc2 · 1da177e4
      Authored by Linus Torvalds
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!