1. 30 April 2008, 6 commits
  2. 06 February 2008, 4 commits
    • writeback: speed up writeback of big dirty files · 8bc3be27
      Committed by Fengguang Wu
      After dirtying a 100M file, the normal behavior is to start writeback
      for all of the data after a 30s delay.  But sometimes the following
      happens instead:
      
      	- after 30s:    ~4M
      	- after 5s:     ~4M
      	- after 5s:     all remaining 92M
      
      Some analysis shows that the internal io dispatch queues go like this:
      
      		s_io            s_more_io
      		-------------------------
      	1)	100M,1K         0
      	2)	1K              96M
      	3)	0               96M
      1) initial state with a 100M file and a 1K file
      
      2) 4M written, nr_to_write <= 0, so write more
      
      3) 1K written, nr_to_write > 0, no more writes (BUG)
      
      nr_to_write > 0 in (3) fools the upper layer into thinking that all the
      data has been written out.  The big dirty file is actually still sitting
      in s_more_io.  We cannot simply splice s_more_io back to s_io as soon as
      s_io becomes empty, and let the loop in generic_sync_sb_inodes()
      continue: this may starve newly expired inodes in s_dirty.  It is also
      not an option to draw inodes from both s_more_io and s_dirty, and let
      the loop go on: this might lead to livelocks, and might also starve
      other superblocks in sync time (well, kupdate may still starve some
      superblocks, but that's another bug).
      
      We have to return when a full scan of s_io completes.  So nr_to_write > 0
      does not necessarily mean that "all data are written".  This patch
      introduces a flag, writeback_control.more_io, to indicate that more io
      should be done.  With it, the big dirty file no longer has to wait for
      the next kupdate invocation 5s later.
      
      In sync_sb_inodes() we only set more_io on super_blocks we actually
      visited.  This avoids interaction between two pdflush daemons.
      
      Also, in __sync_single_inode() we no longer blindly keep requeuing the
      io if the filesystem cannot make progress; doing so could lead to 100%
      iowait.  (A sketch of the resulting writeback loop follows the tags
      below.)
      Tested-by: Mike Snitzer <snitzer@gmail.com>
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Michael Rubin <mrubin@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
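      A minimal sketch of the loop contract described above, assuming a
      simplified writeback_control (wbc_sketch) and a stub standing in for
      the real sync_sb_inodes(); the names and loop body are illustrative,
      not the kernel's exact code:

          /* Simplified stand-in for struct writeback_control. */
          struct wbc_sketch {
                  long nr_to_write;       /* write budget for this pass */
                  unsigned more_io:1;     /* set if s_more_io still holds inodes */
          };

          /* Stub for the per-superblock inode walk (the real sync_sb_inodes):
           * it would write pages, decrement nr_to_write, and set more_io when
           * it parks partially written inodes on s_more_io. */
          static void sync_sb_inodes_sketch(struct wbc_sketch *wbc)
          {
          }

          static void background_writeout_sketch(void)
          {
                  struct wbc_sketch wbc;

                  for (;;) {
                          wbc.more_io = 0;
                          wbc.nr_to_write = 1024;         /* per-pass budget */
                          sync_sb_inodes_sketch(&wbc);

                          /*
                           * Before the fix, nr_to_write > 0 was read as "all
                           * clean, stop".  With more_io, leftover budget only
                           * means "this scan of s_io finished": if more_io is
                           * set, the big file on s_more_io gets another pass
                           * now instead of waiting for the next kupdate run.
                           */
                          if (wbc.nr_to_write > 0 && !wbc.more_io)
                                  break;  /* genuinely nothing left to write */
                  }
          }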
    • mm: remove fastcall from mm/ · 920c7a5d
      Committed by Harvey Harrison
      fastcall is always defined to be empty, so remove it (see the sketch
      below).
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
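      For context, a hedged sketch of what the cleanup amounts to; the
      declaration shown is representative of mm/ code of that era, not a
      specific hunk from the patch:

          struct page;

          /* include/linux/linkage.h had reduced fastcall to a no-op: */
          #define fastcall        /* expands to nothing */

          /* So a declaration like this one ... */
          void fastcall lru_cache_add(struct page *page);

          /* ... preprocesses to exactly this, making the keyword dead weight: */
          void lru_cache_add(struct page *page);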
    • mm/page-writeback: highmem_is_dirtyable option · 195cf453
      Committed by Bron Gondwana
      Add a vm.highmem_is_dirtyable toggle.
      
      A 32-bit machine with HIGHMEM64 enabled running DCC has an mmap'ed file
      of approximately 2GB containing a hash format that is written randomly
      by the dbclean process.  On 2.6.16 this process took a few minutes.
      With lowmem-only accounting of dirty ratios, it takes about 12 hours of
      100% disk IO, all random writes.
      
      Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set
      to 1 to add highmem back to the total available memory count (a sketch
      of the effect follows the tags below).
      
      [akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
      Signed-off-by: Bron Gondwana <brong@fastmail.fm>
      Cc: Ethan Solomita <solo@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
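      A hedged sketch of the intended effect on the dirtyable-memory
      calculation; the helper names follow determine_dirtyable_memory() and
      highmem_dirtyable_memory() from mm/page-writeback.c of that era, but
      the body is simplified:

          int vm_highmem_is_dirtyable;    /* the new sysctl, default 0 */

          static unsigned long determine_dirtyable_memory(void)
          {
                  unsigned long x = global_page_state(NR_FREE_PAGES) +
                                    global_page_state(NR_INACTIVE) +
                                    global_page_state(NR_ACTIVE);
          #ifdef CONFIG_HIGHMEM
                  /*
                   * By default only lowmem counts toward the dirty limits;
                   * writing 1 to the sysctl restores highmem to the total.
                   */
                  if (!vm_highmem_is_dirtyable)
                          x -= highmem_dirtyable_memory(x);
          #endif
                  return x + 1;   /* never return 0 */
          }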
    • mm/page-writeback.c: make a function static · f61eaf9f
      Committed by Adrian Bunk
      task_dirty_limit() can become static.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 15 January 2008, 1 commit
  4. 16 November 2007, 1 commit
    • dirty page balancing: Get rid of broken unmapped_ratio logic · 8c086340
      Committed by Linus Torvalds
      This code harks back to the days when we didn't count dirty mapped
      pages, which led us to try to balance the number of dirty unmapped pages
      by how much unmapped memory there was in the system.
      
      That makes no sense any more, since now the dirty counts include the
      mapped pages.  Not to mention that the math doesn't work with HIGHMEM
      machines anyway, and causes the unmapped_ratio to potentially turn
      negative (which we do catch thanks to clamping it at a minimum value,
      but I mention that as an indication of how broken the code is).
      
      The code also was written at a time when the default dirty ratio was
      much larger, and the unmapped_ratio logic effectively capped that large
      dirty ratio a bit.  Again, we've since lowered the dirty ratio rather
      aggressively, further lessening the point of that code.
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
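      Roughly the logic being deleted, reconstructed here as a sketch of the
      old get_dirty_limits() code to show how the ratio could go negative;
      the exact variable names are hedged from that era's source:

          /*
           * NR_FILE_MAPPED + NR_ANON_PAGES counts pages in all zones, while
           * available_memory may exclude highmem, so the percentage below
           * can exceed 100 and unmapped_ratio can go negative before the
           * final clamp catches it.
           */
          unmapped_ratio = 100 - ((global_page_state(NR_FILE_MAPPED) +
                                   global_page_state(NR_ANON_PAGES)) * 100) /
                                          available_memory;

          dirty_ratio = vm_dirty_ratio;
          if (dirty_ratio > unmapped_ratio / 2)
                  dirty_ratio = unmapped_ratio / 2;  /* the cap being removed */

          if (dirty_ratio < 5)
                  dirty_ratio = 5;   /* the clamp that caught the negative case */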
  5. 15 November 2007, 1 commit
  6. 20 October 2007, 1 commit
  7. 17 October 2007, 10 commits
  8. 09 October 2007, 1 commit
  9. 20 July 2007, 3 commits
    • move page writeback accounting out of macros · d688abf5
      Committed by Andrew Morton
      page-writeback accounting is presently performed in the page-flags
      macros.  This is inconsistent and a bit ugly, and it makes it awkward
      to implement per-backing_dev under-writeback page accounting.
      
      So move this accounting down to the callsite(s), as sketched after the
      tags below.
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
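      A sketch of the shape of the change, using the PG_writeback flag as
      the example; the "after" helper mirrors test_clear_page_writeback() in
      mm/page-writeback.c, but both halves are simplified:

          /* Before (sketched): accounting buried inside the page-flags macro. */
          #define TestClearPageWriteback(page)                                  \
          ({                                                                    \
                  int __ret = test_and_clear_bit(PG_writeback, &(page)->flags); \
                  if (__ret)                                                    \
                          dec_zone_page_state(page, NR_WRITEBACK);              \
                  __ret;                                                        \
          })

          /* After (sketched): the macro becomes a plain bit op, and a helper
           * at the callsite does the accounting explicitly. */
          int test_clear_page_writeback(struct page *page)
          {
                  int ret = TestClearPageWriteback(page); /* just the bit now */

                  if (ret)
                          dec_zone_page_state(page, NR_WRITEBACK);
                  return ret;
          }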
    • mm: share PG_readahead and PG_reclaim · fe3cba17
      Committed by Fengguang Wu
      Share the same page flag bit for PG_readahead and PG_reclaim.
      
      One is used only for file reads, the other only for emergency writes.
      One is used mostly for fresh/young pages, the other for old pages.
      
      Combinations of possible interactions are:
      
      a) clear PG_reclaim => implicit clear of PG_readahead
      	it will delay an asynchronous readahead into a synchronous one
      	it actually does _good_ for readahead:
      		the pages will be reclaimed soon, it's readahead thrashing!
      		in this case, synchronous readahead makes more sense.
      
      b) clear PG_readahead => implicit clear of PG_reclaim
      	one (and only one) page will not be reclaimed in time
      	it can be avoided by checking PageWriteback(page) in readahead first
      
      c) set PG_reclaim => implicit set of PG_readahead
      	will confuse readahead and make it restart the size rampup process
      	it's a trivial problem, and can mostly be avoided by checking
      	PageWriteback(page) first in readahead
      
      d) set PG_readahead => implicit set of PG_reclaim
      	PG_readahead will never be set on already cached pages.
      	PG_reclaim will always be cleared on dirtying a page.
      	so not a problem.
      
      In summary,
      	a)   we get better behavior
      	b,d) possible interactions can be avoided
      	c)   a racy condition exists that might affect readahead, but the
      	     chance is _really_ low, and the hurt to readahead is trivial.
      
      Compound pages also use PG_reclaim, but for now they do not interact with
      reclaim/readahead code.
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
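      Cases (b) and (c) above hinge on readahead checking writeback state
      before trusting the shared bit; a hedged sketch of that guard, loosely
      modeled on the async readahead path (the trigger function at the end
      is hypothetical):

          /*
           * PG_readahead and PG_reclaim share one bit, so a page under
           * writeback may have the bit set for reclaim purposes.  Ignore
           * the "readahead" reading of the bit in that case.
           */
          static void async_readahead_check_sketch(struct page *page)
          {
                  if (PageWriteback(page))
                          return; /* bit means PG_reclaim here, not PG_readahead */

                  ClearPageReadahead(page);
                  start_async_readahead();        /* hypothetical trigger */
          }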
    • mm: fix clear_page_dirty_for_io vs fault race · 79352894
      Committed by Nick Piggin
      Fix msync data loss and (less importantly) dirty page accounting
      inaccuracies due to the race remaining in clear_page_dirty_for_io().
      
      The deleted comment explains what the race was, and the added comments
      explain how it is fixed.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
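      A conceptual sketch of the ordering the fix relies on, not the patch's
      literal hunks: the fault path installs the writable PTE already marked
      dirty, so page_mkclean() in clear_page_dirty_for_io() either
      write-protects the PTE (forcing any later store to re-fault and
      redirty) or observes pte_dirty and transfers it to the page.  Helper
      names are era-appropriate but the body is simplified:

          /* Sketch: reusing a page for a write fault (cf. do_wp_page()). */
          static void wp_page_reuse_sketch(struct vm_area_struct *vma,
                                           unsigned long address, pte_t *ptep,
                                           pte_t orig_pte, struct page *page)
          {
                  pte_t entry = pte_mkyoung(orig_pte);

                  /* Dirty the PTE at install time, before anyone can write. */
                  entry = maybe_mkwrite(pte_mkdirty(entry), vma);
                  ptep_set_access_flags(vma, address, ptep, entry, 1);
                  update_mmu_cache(vma, address, entry);

                  /* Only now account the page dirty: writeback can no longer
                   * observe a writable-but-clean PTE for a dirty page. */
                  set_page_dirty(page);
          }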
  10. 18 July 2007, 1 commit
    • fs: introduce some page/buffer invariants · 787d2214
      Committed by Nick Piggin
      It is a bug to set a page dirty if it is not uptodate unless it has
      buffers.  If the page has buffers, then the page may be dirty (some buffers
      dirty) but not uptodate (some buffers not uptodate).  The exception to this
      rule is if the set_page_dirty caller is racing with truncate or invalidate.
      
      A buffer cannot be set dirty if it is not uptodate.
      
      If either of these situations occurs, it indicates there could be a
      data-loss problem.  Some of these warnings may be harmless cases where
      the page or buffer is set uptodate immediately after it is dirtied;
      however, we should fix those up and enforce this ordering.
      
      Bring the order of operations for truncate into line with those of
      invalidate.  This will prevent a page from being able to go !uptodate while
      we're holding the tree_lock, which is probably a good thing anyway.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
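      The invariants, restated as runtime checks in a hedged sketch (the
      actual patch spreads equivalent warnings across fs/buffer.c and the
      set_page_dirty paths; this helper is illustrative only):

          static void check_dirty_invariants(struct page *page,
                                             struct buffer_head *bh)
          {
                  /* A page may be dirty while !uptodate only if its buffers
                   * track the finer-grained state. */
                  WARN_ON_ONCE(PageDirty(page) && !PageUptodate(page) &&
                               !page_has_buffers(page));

                  /* A buffer must never be dirtied before its contents are
                   * valid. */
                  WARN_ON_ONCE(buffer_dirty(bh) && !buffer_uptodate(bh));
          }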
  11. 17 July 2007, 1 commit
  12. 11 May 2007, 1 commit
  13. 09 May 2007, 1 commit
  14. 08 May 2007, 1 commit
    • Use ZVC counters to establish exact size of dirtyable pages · 1b424464
      Committed by Christoph Lameter
      We can use the global ZVC counters to establish the exact size of the LRU
      and the free pages.  This allows a more accurate determination of the dirty
      ratio.
      
      This patch fixes the broken ratio calculations when large amounts of
      memory are allocated to huge pages or to other consumers that do not
      put the pages onto the LRU.
      
      Notes:
      - I did not add NR_SLAB_RECLAIMABLE to the calculation of the
        dirtyable pages.  Those may be reclaimable, but they are at this
        point not dirtyable.  If NR_SLAB_RECLAIMABLE were considered,
        then a huge number of reclaimable pages would stop writeback
        from occurring.
      
      - This patch used to be in mm as the last one in a series of patches.
        It was removed when Linus updated the treatment of highmem because
        there was a conflict.  I updated the patch to follow Linus' approach.
        This patch is needed to fulfill the claims made at the beginning of
        the patchset that is now in Linus' tree.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
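      A sketch of the calculation this enables, assuming the global ZVC
      counters of that era (NR_FREE_PAGES plus the two LRU counters); the
      function name is illustrative:

          /*
           * Exact dirtyable memory from the global ZVC counters.  Pages that
           * never reach the LRU (e.g. huge pages) simply do not appear in
           * the sum, which is the point of the fix.
           */
          static unsigned long dirtyable_memory_sketch(void)
          {
                  return global_page_state(NR_FREE_PAGES) +
                         global_page_state(NR_INACTIVE) +
                         global_page_state(NR_ACTIVE);
                  /* NR_SLAB_RECLAIMABLE deliberately excluded; see note above. */
          }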
  15. 28 April 2007, 1 commit
  16. 02 March 2007, 1 commit
  17. 12 February 2007, 3 commits
  18. 30 January 2007, 1 commit
    • Fix balance_dirty_pages() calculations with CONFIG_HIGHMEM · dc6e29da
      Committed by Linus Torvalds
      This makes balance_dirty_pages() always base its calculations on the
      amount of non-highmem memory in the machine, rather than trying to
      base them on total memory and then falling back on non-highmem memory
      if the mapping being written wasn't highmem-capable.
      
      This not only fixes a situation where two different writers can have
      wildly different notions about what is a "balanced" dirty state, but it
      also means that people with highmem machines don't run into an OOM
      situation when regular memory fills up with dirty pages.
      
      We used to try to handle the latter case by scaling down the
      dirty_ratio in page_writeback_init() if the machine had a lot of
      highmem pages, but that wasn't aggressive enough for some situations,
      and since basing the dirty ratio on total memory including highmem was
      broken in the first place, let's just stop doing so.
      
      (A variation of this theme fixed Justin Piszcz's OOM problem when
      copying an 18GB file on a RAID setup).
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Justin Piszcz <jpiszcz@lucidpixels.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
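      The core of the change, sketched; names follow get_dirty_limits() and
      the global counters of that era, with the body heavily simplified:

          static void get_dirty_limits_sketch(long *pbackground, long *pdirty)
          {
                  unsigned long available_memory = vm_total_pages;

          #ifdef CONFIG_HIGHMEM
                  /*
                   * Unconditionally exclude highmem: every writer now sees
                   * the same "balanced" target, and lowmem can no longer
                   * fill up with dirty pages on behalf of highmem-capable
                   * mappings.
                   */
                  available_memory -= totalhigh_pages;
          #endif
                  *pbackground = (dirty_background_ratio * available_memory) / 100;
                  *pdirty = (vm_dirty_ratio * available_memory) / 100;
          }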
  19. 30 December 2006, 1 commit
    • VM: Fix nasty and subtle race in shared mmap'ed page writeback · 7658cc28
      Committed by Linus Torvalds
      The VM layer (on the face of it, fairly reasonably) expected that when
      it does a ->writepage() call to the filesystem, it would write out the
      full page at that point in time.  Especially since it had earlier marked
      the whole page dirty with "set_page_dirty()".
      
      But that isn't actually the case: ->writepage() does not actually write
      a page; it writes the parts of the page that have been explicitly
      marked dirty before, *and* that have not been written out for other
      reasons since the last time we told it they were dirty.
      
      That last caveat is the important one.
      
      Which _most_ of the time ends up being the whole page (since we had
      called "set_page_dirty()" on the page earlier), but if the filesystem
      had done any dirty flushing of its own (for example, to honor some
      internal write ordering guarantees), it might end up doing only a
      partial page IO (or none at all) when ->writepage() is actually called.
      
      That is the correct thing in general (since we actually often _want_
      only the known-dirty parts of the page to be written out), but the
      shared dirty page handling had implicitly forgotten about these details,
      and had a number of cases where it was doing just the "->writepage()"
      part, without telling the low-level filesystem that the whole page might
      have been re-dirtied as part of being mapped writably into user space.
      
      Since most of the time the FS did actually write out the full page, we
      didn't notice this for a long time, and it took some really odd access
      patterns to trigger.  But it caused occasional corruption with rtorrent
      and with the Debian "apt" database, because both use shared mmaps to
      update the end result.
      
      This fixes it. Finally. After way too much hair-pulling.
      Acked-by: Nick Piggin <nickpiggin@yahoo.com.au>
      Acked-by: Martin J. Bligh <mbligh@google.com>
      Acked-by: Martin Michlmayr <tbm@cyrius.com>
      Acked-by: Martin Johansson <martin@fatbob.nu>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Andrei Popa <andrei.popa@i-neo.ro>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Andrew Morton <akpm@osdl.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Gordon Farquharson <gordonfarquharson@gmail.com>
      Cc: Guillaume Chazarain <guichaz@yahoo.fr>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Kenneth Chen <kenneth.w.chen@intel.com>
      Cc: Tobias Diedrich <ranma@tdiedrich.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
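      The essence of the fix, sketched along the lines of that era's
      clear_page_dirty_for_io() (simplified, comments condensed): before
      writeback clears the page's dirty bit, any dirty state left in
      writable PTEs is folded back in with set_page_dirty(), so the
      filesystem is told the whole page may have been redirtied through a
      shared mmap:

          int clear_page_dirty_for_io_sketch(struct page *page)
          {
                  struct address_space *mapping = page_mapping(page);

                  if (mapping && mapping_cap_account_dirty(mapping)) {
                          /*
                           * Write-protect all mappings and, if any PTE was
                           * dirty, redirty the page through set_page_dirty()
                           * so the FS hears about the mmap writes ...
                           */
                          if (page_mkclean(page))
                                  set_page_dirty(page);
                          /*
                           * ... then clean the page again and return 1 to
                           * cause the writeback.
                           */
                          if (TestClearPageDirty(page)) {
                                  dec_zone_page_state(page, NR_FILE_DIRTY);
                                  return 1;
                          }
                          return 0;
                  }
                  return TestClearPageDirty(page);
          }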