1. 22 Jun, 2005 4 commits
    • [PATCH] VM: rate limit early reclaim · 1e7e5a90
      Martin Hicks committed
      When early zone reclaim is turned on the LRU is scanned more frequently when a
      zone is low on memory.  This limits when the zone reclaim can be called by
      skipping the scan if another thread (either via kswapd or sync reclaim) is
      already reclaiming from the zone.
      Signed-off-by: Martin Hicks <mort@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
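
The rate limit described in this commit can be illustrated with a small user-space sketch. This is not the kernel patch itself: `zone_sketch`, `scan_lru`, and `try_early_reclaim` are hypothetical names, and the per-zone counter is an assumption about how "another thread is already reclaiming" might be tracked. The check and the increment are deliberately not one atomic step, so two callers can occasionally both scan; for a rate limit, that is assumed to be acceptable.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-zone reclaim state. */
struct zone_sketch {
    atomic_int reclaim_in_progress; /* threads currently reclaiming from this zone */
    int scans;                      /* number of LRU scans that actually ran */
};

/* Stand-in for the expensive LRU scan. */
static void scan_lru(struct zone_sketch *z)
{
    z->scans++;
}

/*
 * Attempt an early reclaim pass on the zone.
 * Returns true if this caller scanned, false if the scan was skipped
 * because another reclaimer (kswapd or a sync reclaimer) was active.
 */
static bool try_early_reclaim(struct zone_sketch *z)
{
    if (atomic_load(&z->reclaim_in_progress) > 0)
        return false; /* someone else is already working on this zone */

    atomic_fetch_add(&z->reclaim_in_progress, 1);
    scan_lru(z);
    atomic_fetch_sub(&z->reclaim_in_progress, 1);
    return true;
}
```

Any other reclaim path would increment the same counter around its own work, which is what makes the early-reclaim caller back off.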
    • [PATCH] VM: early zone reclaim · 753ee728
      Martin Hicks committed
      This is the core of the (much simplified) early reclaim.  The goal of this
      patch is to reclaim some easily-freed pages from a zone before falling back
      onto another zone.
      
      One of the major uses of this is NUMA machines.  With the default allocator
      behavior the allocator would look for memory in another zone, which might be
      off-node, before trying to reclaim from the current zone.
      
      This adds a zone tuneable to enable early zone reclaim.  It is selected on a
      per-zone basis and is turned on/off via syscall.
      
      Adding some extra throttling on the reclaim was also required (patch
      4/4).  Without it, the machine would grind to a crawl when doing a "make -j"
      kernel build.  Even with this patch the System Time is higher on
      average, but it seems tolerable.  Here are some numbers for kernbench
      runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:
      
                            wall  user   sys  %cpu  ctx sw.  sleeps
                            ----  ----   ---  ----  -------  ------
      No patch              1009  1384   847   258   298170  504402
      w/patch, no reclaim    880  1376   667   288   254064  396745
      w/patch & reclaim     1079  1385   926   252   291625  548873
      
      These numbers are the average of 2 runs of 3 "make -j" runs done right
      after system boot.  Run-to-run variability for "make -j" is huge, so
      these numbers aren't terribly useful except to see that with reclaim
      the benchmark still finishes in a reasonable amount of time.
      
      I also looked at the NUMA hit/miss stats for the "make -j" runs and the
      reclaim doesn't make any difference when the machine is thrashing away.
      
      Doing a "make -j8" on a single node that is filled with page cache pages
      takes 700 seconds with reclaim turned on and 735 seconds without reclaim
      (due to remote memory accesses).
      
      The simple zone_reclaim syscall program is at
      http://www.bork.org/~mort/sgi/zone_reclaim.c
      Signed-off-by: Martin Hicks <mort@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
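
As a rough illustration of the allocation order this commit message describes (reclaim easily-freed pages from the preferred zone before falling back to the next one), here is a simplified user-space model. All names (`zone_sketch`, `early_reclaim`, `alloc_page_sketch`), the batch size of 32, and the single watermark are assumptions made for the example; the real allocator and its watermarks are considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of a memory zone. */
struct zone_sketch {
    long free_pages;      /* pages currently free */
    long min_free;        /* watermark: do not allocate at or below this */
    long easily_freed;    /* e.g. clean page cache pages that can be dropped */
    bool reclaim_enabled; /* per-zone early reclaim tuneable */
};

static bool zone_has_room(const struct zone_sketch *z)
{
    return z->free_pages > z->min_free;
}

/* Free up to nr easily-freed pages from the zone; returns the number freed. */
static long early_reclaim(struct zone_sketch *z, long nr)
{
    long freed = z->easily_freed < nr ? z->easily_freed : nr;

    z->easily_freed -= freed;
    z->free_pages += freed;
    return freed;
}

/*
 * Walk the zones in preference order (index 0 is the local zone).
 * Returns the index of the zone the page came from, or -1 on failure.
 */
static int alloc_page_sketch(struct zone_sketch *zones, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        struct zone_sketch *z = &zones[i];

        /* Try a cheap reclaim here before moving on to the next zone. */
        if (!zone_has_room(z) && z->reclaim_enabled)
            early_reclaim(z, 32);

        if (zone_has_room(z)) {
            z->free_pages--;
            return (int)i;
        }
    }
    return -1;
}
```

With the tuneable off, a full local zone is skipped and the page comes from the next zone, which on a NUMA machine may be off-node.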
    • [PATCH] VM: add may_swap flag to scan_control · bfbb38fb
      Martin Hicks committed
      Here's the next round of these patches.  These are totally different in
      an attempt to meet the "simpler" request after the last patches.  For
      reference the earlier threads are:
      
      http://marc.theaimsgroup.com/?l=linux-kernel&m=110839604924587&w=2
      http://marc.theaimsgroup.com/?l=linux-mm&m=111461480721249&w=2
      
      This set of patches replaces my other vm- patches that are currently in
      -mm.  So they're against 2.6.12-rc5-mm1 about half way through the -mm
      patchset.
      
      As I said already this patch is a lot simpler.  The reclaim is turned on
      or off on a per-zone basis using a syscall.  I haven't tested the x86
      syscall, so it might be wrong.  It uses the existing reclaim/pageout
      code with the small addition of a may_swap flag to scan_control
      (patch 1/4).
      
      I also added __GFP_NORECLAIM (patch 3/4) so that certain allocation
      types can be flagged to never cause reclaim.  This was a deficiency
      that was in all of my earlier patch sets.  Previously, doing a big
      buffered read would fill one zone with page cache and then start to
      reclaim from that same zone, leaving the other zones untouched.
      
      Adding some extra throttling on the reclaim was also required (patch
      4/4).  Without it, the machine would grind to a crawl when doing a "make -j"
      kernel build.  Even with this patch the System Time is higher on
      average, but it seems tolerable.  Here are some numbers for kernbench
      runs on a 2-node, 4cpu, 8Gig RAM Altix in the "make -j" run:
      
                            wall  user   sys  %cpu  ctx sw.  sleeps
                            ----  ----   ---  ----  -------  ------
      No patch              1009  1384   847   258   298170  504402
      w/patch, no reclaim    880  1376   667   288   254064  396745
      w/patch & reclaim     1079  1385   926   252   291625  548873
      
      These numbers are the average of 2 runs of 3 "make -j" runs done right
      after system boot.  Run-to-run variability for "make -j" is huge, so
      these numbers aren't terribly useful except to see that with reclaim
      the benchmark still finishes in a reasonable amount of time.
      
      I also looked at the NUMA hit/miss stats for the "make -j" runs and the
      reclaim doesn't make any difference when the machine is thrashing away.
      
      Doing a "make -j8" on a single node that is filled with page cache pages
      takes 700 seconds with reclaim turned on and 735 seconds without reclaim
      (due to remote memory accesses).
      
      The simple zone_reclaim syscall program is at
      http://www.bork.org/~mort/sgi/zone_reclaim.c
      
      This patch:
      
      This adds an extra switch to the scan_control struct.  It simply lets the
      reclaim code know if it's allowed to swap pages out.
      
      This was required for a simple per-zone reclaimer.  Without this addition
      pages would be swapped out as soon as a zone ran out of memory and the early
      reclaim kicked in.
      Signed-off-by: Martin Hicks <mort@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
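
The effect of a may_swap switch can be sketched in a few lines of user-space C. Only the flag name `may_swap` comes from the commit message; `scan_control_sketch`, `page_kind`, and `reclaim_pages` are hypothetical, and the model assumes the only pages that would need swapping are anonymous ones.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page classification for the example. */
enum page_kind {
    PAGE_CACHE_CLEAN, /* can simply be dropped */
    PAGE_ANON         /* would have to be written to swap first */
};

/* Simplified stand-in for the reclaim control structure. */
struct scan_control_sketch {
    bool may_swap; /* is the reclaimer allowed to swap pages out? */
};

/* Count how many of the given pages this reclaim pass would free. */
static size_t reclaim_pages(const enum page_kind *pages, size_t n,
                            const struct scan_control_sketch *sc)
{
    size_t reclaimed = 0;

    for (size_t i = 0; i < n; i++) {
        /* Without may_swap, leave anonymous pages alone. */
        if (pages[i] == PAGE_ANON && !sc->may_swap)
            continue;
        reclaimed++;
    }
    return reclaimed;
}
```

An early per-zone reclaimer would run with the flag cleared, so it frees only the cheap pages and never triggers swap-out just because one zone is full.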
    • [PATCH] vmscan: notice slab shrinking · b15e0905
      akpm@osdl.org committed
      Fix a problem identified by Andrea Arcangeli <andrea@suse.de>
      
      kswapd will set a zone into all_unreclaimable state if it sees that we're not
      successfully reclaiming LRU pages.  But that fails to notice that we're
      successfully reclaiming slab objects, so we can set all_unreclaimable too soon.
      
      So change shrink_slab() to return a success indication if it actually
      reclaimed some objects, and don't assume that the zone is all_unreclaimable if
      that is true.  This means that we won't enter all_unreclaimable state if we
      are successfully freeing slab objects but we're not yet actually freeing slab
      pages, due to internal fragmentation.
      
      (hm, this has a shortcoming.  We could be successfully freeing ZONE_NORMAL
      slab objects while being really oom on ZONE_DMA.  If that happens then kswapd
      might burn a lot of CPU.  But given that there might be some slab objects in
      ZONE_DMA, perhaps that is appropriate.)
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
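
The decision this commit changes can be reduced to a small predicate. The sketch below is an assumption-laden simplification: `should_mark_all_unreclaimable` is a hypothetical helper, and the real kswapd check also depends on how much of the LRU has been scanned, which is omitted here.

```c
#include <stdbool.h>

/*
 * Decide whether a zone should be treated as all_unreclaimable after a
 * reclaim pass.
 *
 * lru_pages_freed:    LRU pages reclaimed in this pass
 * slab_objects_freed: slab objects freed, as reported by the slab shrinker
 *
 * Before the fix, only the LRU result was considered. With the fix, freeing
 * slab objects counts as progress even if no whole slab page has been
 * released yet (internal fragmentation).
 */
static bool should_mark_all_unreclaimable(long lru_pages_freed,
                                          long slab_objects_freed)
{
    if (lru_pages_freed > 0)
        return false; /* LRU reclaim is making progress */
    if (slab_objects_freed > 0)
        return false; /* slab shrinking is making progress */
    return true;      /* no progress from either source */
}
```

The shortcoming noted in the commit message still applies to this model: the slab objects freed may belong to a different zone than the one that is short of memory.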
  2. 17 Apr, 2005 2 commits