1. 06 2月, 2008 1 次提交
    • B
      mm/page-writeback: highmem_is_dirtyable option · 195cf453
      Bron Gondwana 提交于
      Add vm.highmem_is_dirtyable toggle
      
      A 32 bit machine with HIGHMEM64 enabled running DCC has an MMAPed file of
      approximately 2Gb size which contains a hash format that is written
      randomly by the dbclean process.  On 2.6.16 this process took a few
      minutes.  With lowmem only accounting of dirty ratios, this takes about 12
      hours of 100% disk IO, all random writes.
      
      Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to
      add the highmem back to the total available memory count.
      
      [akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
      Signed-off-by: NBron Gondwana <brong@fastmail.fm>
      Cc: Ethan Solomita <solo@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      195cf453
  2. 15 1月, 2008 1 次提交
  3. 17 10月, 2007 3 次提交
    • J
      introduce I_SYNC · 1c0eeaf5
      Joern Engel 提交于
      I_LOCK was used for several unrelated purposes, which caused deadlock
      situations in certain filesystems as a side effect.  One of the purposes
      now uses the new I_SYNC bit.
      
      Also document the various bits and change their order from historical to
      logical.
      
      [bunk@stusta.de: make fs/inode.c:wake_up_inode() static]
      Signed-off-by: NJoern Engel <joern@wohnheim.fh-wedel.de>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Anton Altaparmakov <aia21@cam.ac.uk>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1c0eeaf5
    • F
      writeback: introduce writeback_control.more_io to indicate more io · 2e6883bd
      Fengguang Wu 提交于
      After making dirty a 100M file, the normal behavior is to start the writeback
      for all data after 30s delays.  But sometimes the following happens instead:
      
      	- after 30s:    ~4M
      	- after 5s:     ~4M
      	- after 5s:     all remaining 92M
      
      Some analyze shows that the internal io dispatch queues goes like this:
      
      		s_io            s_more_io
      		-------------------------
      	1)	100M,1K         0
      	2)	1K              96M
      	3)	0               96M
      
      1) initial state with a 100M file and a 1K file
      2) 4M written, nr_to_write <= 0, so write more
      3) 1K written, nr_to_write > 0, no more writes(BUG)
      
      nr_to_write > 0 in (3) fools the upper layer to think that data have all been
      written out.  The big dirty file is actually still sitting in s_more_io.  We
      cannot simply splice s_more_io back to s_io as soon as s_io becomes empty, and
      let the loop in generic_sync_sb_inodes() continue: this may starve newly
      expired inodes in s_dirty.  It is also not an option to draw inodes from both
      s_more_io and s_dirty, an let the loop go on: this might lead to live locks,
      and might also starve other superblocks in sync time(well kupdate may still
      starve some superblocks, that's another bug).
      
      We have to return when a full scan of s_io completes.  So nr_to_write > 0 does
      not necessarily mean that "all data are written".  This patch introduces a
      flag writeback_control.more_io to indicate this situation.  With it the big
      dirty file no longer has to wait for the next kupdate invocation 5s later.
      
      Cc: David Chinner <dgc@sgi.com>
      Cc: Ken Chen <kenchen@google.com>
      Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2e6883bd
    • P
      mm: per device dirty threshold · 04fbfdc1
      Peter Zijlstra 提交于
      Scale writeback cache per backing device, proportional to its writeout speed.
      
      By decoupling the BDI dirty thresholds a number of problems we currently have
      will go away, namely:
      
       - mutual interference starvation (for any number of BDIs);
       - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts).
      
      It might be that all dirty pages are for a single BDI while other BDIs are
      idling. By giving each BDI a 'fair' share of the dirty limit, each one can have
      dirty pages outstanding and make progress.
      
      A global threshold also creates a deadlock for stacked BDIs; when A writes to
      B, and A generates enough dirty pages to get throttled, B will never start
      writeback until the dirty pages go away. Again, by giving each BDI its own
      'independent' dirty limit, this problem is avoided.
      
      So the problem is to determine how to distribute the total dirty limit across
      the BDIs fairly and efficiently. A DBI that has a large dirty limit but does
      not have any dirty pages outstanding is a waste.
      
      What is done is to keep a floating proportion between the DBIs based on
      writeback completions. This way faster/more active devices get a larger share
      than slower/idle devices.
      
      [akpm@linux-foundation.org: fix warnings]
      [hugh@veritas.com: Fix occasional hang when a task couldn't get out of balance_dirty_pages]
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      04fbfdc1
  4. 10 10月, 2007 2 次提交
  5. 09 10月, 2007 1 次提交
  6. 22 5月, 2007 1 次提交
    • A
      Detach sched.h from mm.h · e8edc6e0
      Alexey Dobriyan 提交于
      First thing mm.h does is including sched.h solely for can_do_mlock() inline
      function which has "current" dereference inside. By dealing with can_do_mlock()
      mm.h can be detached from sched.h which is good. See below, why.
      
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being dependency for significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      
      Cross-compile tested on
      
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	arm
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	m68k
      	mips
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	um-x86_64
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      
      as well as my two usual configs.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8edc6e0
  7. 11 5月, 2007 1 次提交
  8. 01 5月, 2007 1 次提交
    • T
      NFS: Fix a race when doing NFS write coalescing · c63c7b05
      Trond Myklebust 提交于
      Currently we do write coalescing in a very inefficient manner: one pass in
      generic_writepages() in order to lock the pages for writing, then one pass
      in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather
      the locked pages for coalescing into RPC requests of size "wsize".
      
      In fact, it turns out there is actually a deadlock possible here since we
      only start I/O on the second pass. If the user signals the process while
      we're in nfs_sync_mapping_wait(), for instance, then we may exit before
      starting I/O on all the requests that have been queued up.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      c63c7b05
  9. 02 3月, 2007 1 次提交
  10. 21 10月, 2006 1 次提交
    • A
      [PATCH] separate bdi congestion functions from queue congestion functions · 3fcfab16
      Andrew Morton 提交于
      Separate out the concept of "queue congestion" from "backing-dev congestion".
      Congestion is a backing-dev concept, not a queue concept.
      
      The blk_* congestion functions are retained, as wrappers around the core
      backing-dev congestion functions.
      
      This proper layering is needed so that NFS can cleanly use the congestion
      functions, and so that CONFIG_BLOCK=n actually links.
      
      Cc: "Thomas Maier" <balagi@justmail.de>
      Cc: "Jens Axboe" <jens.axboe@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3fcfab16
  11. 04 10月, 2006 1 次提交
  12. 01 10月, 2006 1 次提交
  13. 30 9月, 2006 1 次提交
  14. 26 9月, 2006 1 次提交
  15. 23 9月, 2006 1 次提交
  16. 23 6月, 2006 1 次提交
    • O
      [PATCH] writeback: fix range handling · 111ebb6e
      OGAWA Hirofumi 提交于
      When a writeback_control's `start' and `end' fields are used to
      indicate a one-byte-range starting at file offset zero, the required
      values of .start=0,.end=0 mean that the ->writepages() implementation
      has no way of telling that it is being asked to perform a range
      request.  Because we're currently overloading (start == 0 && end == 0)
      to mean "this is not a write-a-range request".
      
      To make all this sane, the patch changes range of writeback_control.
      
      So caller does: If it is calling ->writepages() to write pages, it
      sets range (range_start/end or range_cyclic) always.
      
      And if range_cyclic is true, ->writepages() thinks the range is
      cyclic, otherwise it just uses range_start and range_end.
      
      This patch does,
      
          - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
            -1 is usually ok for range_end (type is long long). But, if someone did,
      
      		range_end += val;		range_end is "val - 1"
      		u64val = range_end >> bits;	u64val is "~(0ULL)"
      
            or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
            things, and uses LLONG_MAX for range_end.
      
          - All callers of ->writepages() sets range_start/end or range_cyclic.
      
          - Fix updates of ->writeback_index. It seems already bit strange.
            If it starts at 0 and ended by check of nr_to_write, this last
            index may reduce chance to scan end of file.  So, this updates
            ->writeback_index only if range_cyclic is true or whole-file is
            scanned.
      Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: "Vladimir V. Saveliev" <vs@namesys.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      111ebb6e
  17. 24 3月, 2006 2 次提交
  18. 09 1月, 2006 1 次提交
  19. 07 1月, 2006 1 次提交
    • A
      identify multipage ->writepages() calls · 22905f77
      Andrew Morton 提交于
       NFS needs to be able to distinguish between single-page ->writepage() calls and
       multipage ->writepages() calls.
      
       For the single-page writepage calls NFS can kick off the I/O within the
       context of ->writepage().
      
       For multipage ->writepages calls, nfs_writepage() will leave the I/O pending
       and nfs_writepages() will kick off the I/O when it all has been queued up
       within NFS.
      
       Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      22905f77
  20. 04 1月, 2006 1 次提交
    • Z
      [PATCH] add AOP_TRUNCATED_PAGE, prepend AOP_ to WRITEPAGE_ACTIVATE · 994fc28c
      Zach Brown 提交于
      readpage(), prepare_write(), and commit_write() callers are updated to
      understand the special return code AOP_TRUNCATED_PAGE in the style of
      writepage() and WRITEPAGE_ACTIVATE.  AOP_TRUNCATED_PAGE tells the caller that
      the callee has unlocked the page and that the operation should be tried again
      with a new page.  OCFS2 uses this to detect and work around a lock inversion in
      its aop methods.  There should be no change in behaviour for methods that don't
      return AOP_TRUNCATED_PAGE.
      
      WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
      made enums so that kerneldoc can be used to document their semantics.
      Signed-off-by: NZach Brown <zach.brown@oracle.com>
      994fc28c
  21. 11 9月, 2005 1 次提交
  22. 29 6月, 2005 1 次提交
  23. 28 6月, 2005 1 次提交
    • J
      [PATCH] Update cfq io scheduler to time sliced design · 22e2c507
      Jens Axboe 提交于
      This updates the CFQ io scheduler to the new time sliced design (cfq
      v3).  It provides full process fairness, while giving excellent
      aggregate system throughput even for many competing processes.  It
      supports io priorities, either inherited from the cpu nice value or set
      directly with the ioprio_get/set syscalls.  The latter closely mimic
      set/getpriority.
      
      This import is based on my latest from -mm.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      22e2c507
  24. 17 4月, 2005 1 次提交
    • L
      Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds 提交于
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!
      1da177e4