1. 08 Sep 2010 (1 commit)
  2. 29 Aug 2010 (1 commit)
    • mm: fix hang on anon_vma->root->lock · f1819427
      Hugh Dickins authored
      After several hours, kbuild tests hang with anon_vma_prepare() spinning on
      a newly allocated anon_vma's lock - on a box with CONFIG_TREE_PREEMPT_RCU=y
      (which makes this very much more likely, but it could happen without).
      
      The ever-subtle page_lock_anon_vma() now needs a further twist: since
      anon_vma_prepare() and anon_vma_fork() are liable to change the ->root
      of a reused anon_vma structure at any moment, page_lock_anon_vma()
      needs to check page_mapped() again before succeeding, otherwise
      page_unlock_anon_vma() might address a different root->lock.
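
      A minimal sketch of the recheck pattern (helper names such as
      anon_vma_lock()/anon_vma_unlock() are assumed from the 2.6.36-era rmap
      code; this is not the verbatim upstream diff):

          anon_vma = (struct anon_vma *) (anon_mapping - PAGE_MAPPING_ANON);
          anon_vma_lock(anon_vma);        /* takes anon_vma->root->lock */

          if (page_mapped(page))
                  return anon_vma;        /* still mapped: the locked root is still valid */

          /*
           * The page was unmapped while we waited, and the anon_vma may have
           * been reused with a different ->root; drop the lock we actually
           * took and fail the lookup instead of returning a stale anon_vma.
           */
          anon_vma_unlock(anon_vma);
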
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 27 Aug 2010 (2 commits)
  4. 25 Aug 2010 (1 commit)
  5. 24 Aug 2010 (1 commit)
    • writeback: write_cache_pages doesn't terminate at nr_to_write <= 0 · 546a1924
      Dave Chinner authored
      I noticed XFS writeback in 2.6.36-rc1 was much slower than it should have
      been. Enabling writeback tracing showed:
      
          flush-253:16-8516  [007] 1342952.351608: wbc_writepage: bdi 253:16: towrt=1024 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [007] 1342952.351654: wbc_writepage: bdi 253:16: towrt=1023 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369520: wbc_writepage: bdi 253:16: towrt=0 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369542: wbc_writepage: bdi 253:16: towrt=-1 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
          flush-253:16-8516  [000] 1342952.369549: wbc_writepage: bdi 253:16: towrt=-2 skip=0 mode=0 kupd=0 bgrd=1 reclm=0 cyclic=1 more=0 older=0x0 start=0x0 end=0x0
      
      Background writeback is not terminating when ->writepage returns with
      wbc->nr_to_write == 0, resulting in sub-optimal single-page writeback
      on XFS.
      
      Fix the write_cache_pages loop to terminate correctly when this situation
      occurs and so prevent this sub-optimal background writeback pattern. This
      improves sustained sequential buffered write performance from around
      250MB/s to 750MB/s for a 100GB file on an XFS filesystem on my 8p test VM.
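
      A simplified sketch of the termination rule described above (not the full
      write_cache_pages() loop):

          ret = (*writepage)(page, wbc, data);
          /* ... error handling elided ... */

          /*
           * Background (WB_SYNC_NONE) writeback must stop once nr_to_write is
           * used up; integrity sync keeps going until every page tagged before
           * the loop has been written.
           */
          if (--wbc->nr_to_write <= 0 && wbc->sync_mode == WB_SYNC_NONE) {
                  done = 1;
                  break;
          }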
      
      Cc: <stable@kernel.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  6. 23 Aug 2010 (1 commit)
  7. 21 Aug 2010 (7 commits)
  8. 18 Aug 2010 (1 commit)
  9. 16 Aug 2010 (1 commit)
    • mm: fix up some user-visible effects of the stack guard page · d7824370
      Linus Torvalds authored
      This commit makes the stack guard page somewhat less visible to user
      space. It does this by:
      
       - not showing the guard page in /proc/<pid>/maps
      
         It looks like lvm-tools will actually read /proc/self/maps to figure
         out where all its mappings are, and effectively do a specialized
         "mlockall()" in user space.  By not showing the guard page as part of
         the mapping (by just adding PAGE_SIZE to the start for grows-up
         pages), lvm-tools ends up not being aware of it.
      
       - by also teaching the _real_ mlock() functionality not to try to lock
         the guard page.
      
         That would just expand the mapping down to create a new guard page,
         so there really is no point in trying to lock it in place.
      
      It would perhaps be nice to show the guard page specially in
      /proc/<pid>/maps (or at least mark grow-down segments some way), but
      let's not open ourselves up to more breakage by user space from programs
      that depend on the exact details of the 'maps' file.
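
      A minimal sketch of the /proc/<pid>/maps side of this, assuming (per the
      companion commit 320b2b8d below) that the guard page is the lowest page of
      a VM_GROWSDOWN stack mapping; the helper name is made up for illustration:

          /* Don't report the stack guard page as part of the mapping. */
          static unsigned long map_start_without_guard_page(struct vm_area_struct *vma)
          {
                  unsigned long start = vma->vm_start;

                  if (vma->vm_flags & VM_GROWSDOWN)
                          start += PAGE_SIZE;     /* skip the guard page */
                  return start;
          }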
      
      Special thanks to Henrique de Moraes Holschuh for diving into lvm-tools
      source code to see what was going on with the whole new warning.
      
      Reported-and-tested-by: François Valenduc <francois.valenduc@tvcablenet.be>
      Reported-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 15 Aug 2010 (2 commits)
  11. 14 Aug 2010 (2 commits)
  12. 13 Aug 2010 (1 commit)
    • mm: keep a guard page below a grow-down stack segment · 320b2b8d
      Linus Torvalds authored
      This is a rather minimally invasive patch to solve the problem of the
      user stack growing into a memory mapped area below it.  Whenever we fill
      the first page of the stack segment, expand the segment down by one
      page.
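
      A sketch of the idea (not the verbatim upstream patch): when an anonymous
      fault lands on the first page of a VM_GROWSDOWN vma, grow the vma down by
      one page first, and let the caller turn a failure into SIGBUS:

          static inline int check_stack_guard_page(struct vm_area_struct *vma,
                                                   unsigned long address)
          {
                  address &= PAGE_MASK;
                  /* Faulting in the lowest page of a grow-down stack? */
                  if ((vma->vm_flags & VM_GROWSDOWN) && address == vma->vm_start)
                          return expand_stack(vma, address - PAGE_SIZE);
                  return 0;
          }

          /* In the anonymous fault path:
           *      if (check_stack_guard_page(vma, address) < 0)
           *              return VM_FAULT_SIGBUS;
           */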
      
      Now, admittedly some odd application might _want_ the stack to grow down
      into the preceding memory mapping, and so we may at some point need to
      make this a process tunable (some people might also want to have more
      than a single page of guarding), but let's try the minimal approach
      first.
      
      Tested with trivial application that maps a single page just below the
      stack, and then starts recursing.  Without this, we will get a SIGSEGV
      _after_ the stack has smashed the mapping.  With this patch, we'll get a
      nice SIGBUS just as the stack touches the page just above the mapping.
      Requested-by: Keith Packard <keithp@keithp.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 12 Aug 2010 (4 commits)
    • writeback: add comment to the dirty limit functions · 1babe183
      Wu Fengguang authored
      Document global_dirty_limits() and bdi_dirty_limit().
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: avoid unnecessary calculation of bdi dirty thresholds · 16c4042f
      Wu Fengguang authored
      Split get_dirty_limits() into global_dirty_limits() + bdi_dirty_limit(),
      so that the latter can be avoided when under the global dirty background
      threshold (which is the normal state for most systems).
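
      A sketch of how a caller benefits (function names as described above;
      exact signatures are assumed from the 2.6.36-era writeback code):

          static bool over_bground_thresh(void)
          {
                  unsigned long background_thresh, dirty_thresh;

                  /* Only the cheap global calculation is needed here. */
                  global_dirty_limits(&background_thresh, &dirty_thresh);

                  return global_page_state(NR_FILE_DIRTY) +
                         global_page_state(NR_UNSTABLE_NFS) > background_thresh;
          }

          /* balance_dirty_pages() computes the per-bdi share only when it is
           * actually over the background threshold:
           *      bdi_thresh = bdi_dirty_limit(bdi, dirty_thresh);
           */
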
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: balance_dirty_pages(): reduce calls to global_page_state · e50e3720
      Wu Fengguang authored
      Reducing the number of times balance_dirty_pages() calls
      global_page_state() reduces cache references and so improves write
      performance on a variety of workloads.
      
      'perf stat' of simple fio write tests shows the reduction in cache
      accesses.  The test is fio 'write,mmap,600Mb,pre_read' on an AMD Athlon X2
      with 3Gb memory (dirty_threshold approx 600Mb), running each test 10
      times, dropping the fastest and slowest values, then taking the average
      and standard deviation:
      
      		average (s.d.) in millions (10^6)
      2.6.31-rc8	648.6 (14.6)
      +patch		620.1 (16.5)
      
      This reduction is achieved by dropping clip_bdi_dirty_limit(), which
      rereads the counters to apply the dirty_threshold, and by moving that
      check up into balance_dirty_pages(), where the counters have already
      been read.

      Rearranging the for loop to contain only one copy of the limit tests
      also allows the pdflush test after the loop to use the local copies of
      the counters rather than rereading them.
      
      In the common case with no throttling, it now calls global_page_state()
      5 fewer times and bdi_stat() 2 fewer times.
      
      Fengguang:
      
      This patch slightly changes behavior by replacing clip_bdi_dirty_limit()
      with the explicit check (nr_reclaimable + nr_writeback >= dirty_thresh)
      to avoid exceeding the dirty limit.  Since the bdi dirty limit is mostly
      accurate, we don't need to clip routinely; a simple dirty limit check is
      enough.
      
      The check is necessary because, in principle, we should throttle
      everything calling balance_dirty_pages() when we're over the total limit,
      as Peter pointed out.
      
      We now set and clear dirty_exceeded not only based on the bdi dirty
      limits, but also on the global dirty limit.  The global limit check is
      added in place of clip_bdi_dirty_limit() for safety and is not intended
      as a behavior change.  The bdi limits should be tight enough to keep all
      dirty pages under the global limit most of the time; occasionally
      exceeding it slightly should be OK.  The change makes the logic more
      obvious: the global limit is the ultimate goal and is always imposed.
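
      A sketch of that combined decision only (in balance_dirty_pages() these
      counters are locals already read via global_page_state()/bdi_stat(); the
      helper below is made up for illustration):

          static bool dirty_limits_exceeded(unsigned long nr_reclaimable,
                                            unsigned long nr_writeback,
                                            unsigned long bdi_nr_reclaimable,
                                            unsigned long bdi_nr_writeback,
                                            unsigned long dirty_thresh,
                                            unsigned long bdi_thresh)
          {
                  /* Over the per-bdi share of the limit... */
                  if (bdi_nr_reclaimable + bdi_nr_writeback > bdi_thresh)
                          return true;
                  /* ...or over the global limit, which is always imposed. */
                  return nr_reclaimable + nr_writeback > dirty_thresh;
          }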
      
      We may now start background writeback work based on outdated conditions.
      That's safe because the bdi flush thread will (and has to) double-check
      the state.  It reduces overall overhead because a test based on old state
      still has a good chance of being right.
      
      [akpm@linux-foundation.org: fix uninitialized dirty_exceeded]
      Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Jan Kara <jack@suse.cz>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix fatal kernel-doc error · 3c111a07
      Randy Dunlap authored
      Fix a fatal kernel-doc error due to a #define coming between a function's
      kernel-doc notation and the function signature.  (kernel-doc cannot handle
      this)
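
      A hypothetical example of the pattern that breaks kernel-doc (names
      invented for illustration):

          /**
           * foo_frob - frobnicate a foo
           * @arg: value to frobnicate
           *
           * kernel-doc expects the function signature to follow this comment
           * directly; the macro below is what trips the parser.
           */
          #define FOO_DEFAULT 42
          int foo_frob(int arg);
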
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 11 Aug 2010 (15 commits)