1. 12 June 2009, 1 commit
    • vfs: Make sys_sync() use fsync_super() (version 4) · 5cee5815
      Committed by Jan Kara
      It is unnecessarily fragile to have two places (fsync_super() and do_sync())
      doing a data integrity sync of the filesystem. Alter __fsync_super() to
      accommodate the needs of both callers and use it. After this patch,
      __fsync_super() is the only place where we gather all the calls needed to
      properly send all data on a filesystem to disk.
      
      A nice bonus is that we get complete livelock avoidance, and write_supers()
      is now only used for periodic writeback of superblocks.

      sync_blockdevs(), introduced a couple of patches ago, is now gone.
      
      [build fixes folded]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      5cee5815
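
      For illustration, a minimal sketch of the kind of consolidated helper the
      message describes; the helper names and their ordering here are assumptions
      for the sketch, not the exact contents of the patch:

      	/* push everything belonging to one superblock out to disk */
      	static int example_fsync_super(struct super_block *sb, int wait)
      	{
      		sync_inodes_sb(sb, wait);		/* dirty inodes and their pages */

      		if (sb->s_op->sync_fs)
      			sb->s_op->sync_fs(sb, wait);	/* filesystem metadata / journal */

      		return sync_blockdev(sb->s_bdev);	/* underlying block device */
      	}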
  2. 15 May 2009, 1 commit
    • Revert "mm: add /proc controls for pdflush threads" · cd17cbfd
      Committed by Jens Axboe
      This reverts commit fafd688e.
      
      Work is progressing to switch away from pdflush as the process backing
      for flushing out dirty data. So it seems pointless to add more knobs
      to control pdflush threads. The original author of the patch did not
      have any specific use cases for adding the knobs, so we can easily
      revert this before 2.6.30 to avoid having to maintain this API
      forever.
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      cd17cbfd
  3. 07 April 2009, 1 commit
    • mm: add /proc controls for pdflush threads · fafd688e
      Committed by Peter W Morreale
      Add /proc entries to give the admin the ability to control the minimum and
      maximum number of pdflush threads.  This allows finer control of pdflush
      on both large and small machines.
      
      The rationale is simply that one size does not fit all.  Admins on large and/or
      small systems may want to tune the min/max pdflush thread count to best
      suit their needs.  Right now the min/max is hardcoded to 2/8.  While
      probably a fair estimate for smaller machines, large machines with large
      numbers of CPUs and large numbers of filesystems/block devices may benefit
      from larger numbers of threads working on different block devices.
      
      Even if the background flushing algorithm is radically changed, it is
      still likely that multiple threads will be involved and admins would still
      desire finer control on the min/max other than to have to recompile the
      kernel.
      
      The patch adds '/proc/sys/vm/nr_pdflush_threads_min' and
      '/proc/sys/vm/nr_pdflush_threads_max' with r/w permissions.
      
      The minimum value for nr_pdflush_threads_min is 1 and the maximum value is
      the current value of nr_pdflush_threads_max.  This minimum is required
      since additional thread creation is performed in a pdflush thread itself.
      
      The minimum value for nr_pdflush_threads_max is the current value of
      nr_pdflush_threads_min and the maximum value can be 1000.
      
      Documentation/sysctl/vm.txt is also updated.
      
      [akpm@linux-foundation.org: fix comment, fix whitespace, use __read_mostly]
      Signed-off-by: Peter W Morreale <pmorreale@novell.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fafd688e
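
      For illustration, a minimal userspace sketch of reading and raising the two
      knobs this patch adds (the value 16 is only an example; writing requires root):

      	#include <stdio.h>

      	int main(void)
      	{
      		int lo = 0, hi = 0;
      		FILE *f;

      		f = fopen("/proc/sys/vm/nr_pdflush_threads_min", "r");
      		if (f) { fscanf(f, "%d", &lo); fclose(f); }
      		f = fopen("/proc/sys/vm/nr_pdflush_threads_max", "r");
      		if (f) { fscanf(f, "%d", &hi); fclose(f); }
      		printf("pdflush threads: min=%d max=%d\n", lo, hi);

      		/* raise the maximum to 16 (example value) */
      		f = fopen("/proc/sys/vm/nr_pdflush_threads_max", "w");
      		if (f) { fprintf(f, "16\n"); fclose(f); }
      		return 0;
      	}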
  4. 01 April 2009, 1 commit
  5. 07 January 2009, 3 commits
    • fs: remove WB_SYNC_HOLD · 4f5a99d6
      Committed by Nick Piggin
      Remove WB_SYNC_HOLD.  The primary motivation is the design of my
      anti-starvation code for fsync.  It requires taking an inode lock over the
      sync operation, so we could run into lock ordering problems with multiple
      inodes.  It is possible to take a single global lock to solve the ordering
      problem, but then that would prevent a future nice implementation of "sync
      multiple inodes" based on lock order via inode address.
      
      It seems like a backward step to remove this, but it is actually busted
      anyway: we can't use the inode lists for data integrity wait: an inode can
      be taken off the dirty lists but still be under writeback.  In order to
      satisfy data integrity semantics, we should wait for it to finish
      writeback, but if we only search the dirty lists, we'll miss it.
      
      It would be possible to have a "writeback" list, for sys_sync, I suppose.
      But why complicate things by optimising prematurely?  For unmounting, we
      could avoid the "livelock avoidance" code, which would be easier, but
      again that is premature IMO.
      
      Fixing the existing data integrity problem will come next.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4f5a99d6
    • mm: add dirty_background_bytes and dirty_bytes sysctls · 2da02997
      Committed by David Rientjes
      This change introduces two new sysctls to /proc/sys/vm:
      dirty_background_bytes and dirty_bytes.
      
      dirty_background_bytes is the counterpart to dirty_background_ratio and
      dirty_bytes is the counterpart to dirty_ratio.
      
      With growing memory capacities of individual machines, it's no longer
      sufficient to specify dirty thresholds as a percentage of the amount of
      dirtyable memory over the entire system.
      
      dirty_background_bytes and dirty_bytes specify quantities of memory, in
      bytes, that represent the dirty limits for the entire system.  If either
      of these values is set, its value represents the amount of dirty memory
      that is needed to commence either background or direct writeback.
      
      When a `bytes' or `ratio' file is written, its counterpart becomes a
      function of the written value.  For example, if dirty_bytes is written as
      8192, then 8K of memory is required to commence direct writeback.
      dirty_ratio is then functionally equivalent to 8K / the amount of
      dirtyable memory:
      
      	dirtyable_memory = free pages + mapped pages + file cache
      
      	dirty_background_bytes = dirty_background_ratio * dirtyable_memory
      		-or-
      	dirty_background_ratio = dirty_background_bytes / dirtyable_memory
      
      		AND
      
      	dirty_bytes = dirty_ratio * dirtyable_memory
      		-or-
      	dirty_ratio = dirty_bytes / dirtyable_memory
      
      Only one of dirty_background_bytes and dirty_background_ratio may be
      specified at a time, and only one of dirty_bytes and dirty_ratio may be
      specified.  When one sysctl is written, the other appears as 0 when read.
      
      The `bytes' files operate on a page size granularity since dirty limits
      are compared with ZVC values, which are in page units.
      
      Prior to this change, the minimum dirty_ratio was 5 as implemented by
      get_dirty_limits() although /proc/sys/vm/dirty_ratio would show any user
      written value between 0 and 100.  This restriction is maintained, but
      dirty_bytes has a lower limit of only one page.
      
      Also prior to this change, the dirty_background_ratio could not equal or
      exceed dirty_ratio.  This restriction is maintained in addition to
      restricting dirty_background_bytes.  If either background threshold equals
      or exceeds that of the dirty threshold, it is implicitly set to half the
      dirty threshold.
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Andrea Righi <righi.andrea@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2da02997
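
      The bytes/ratio relationship above is plain arithmetic; the memory figures
      in this sketch are made-up examples, not kernel defaults:

      	#include <stdio.h>

      	int main(void)
      	{
      		unsigned long long dirtyable = 4ULL << 30;	/* assume 4 GiB of dirtyable memory */
      		unsigned long long dirty_bytes = 256ULL << 20;	/* admin writes 256 MiB to dirty_bytes */

      		/* effective ratio once dirty_bytes is set (its counterpart then reads 0) */
      		printf("dirty_bytes=%llu => effective dirty_ratio ~ %.2f%%\n",
      		       dirty_bytes, (double)dirty_bytes / dirtyable * 100.0);

      		/* the other direction: a dirty_ratio of 10 on the same machine */
      		printf("dirty_ratio=10 => effective dirty_bytes ~ %llu\n",
      		       dirtyable * 10 / 100);
      		return 0;
      	}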
    • mm: change dirty limit type specifiers to unsigned long · 364aeb28
      Committed by David Rientjes
      The background dirty and dirty limits are better defined with type
      specifiers of unsigned long since negative writeback thresholds are not
      possible.
      
      These values, as returned by get_dirty_limits(), are normally compared
      with ZVC values to determine whether writeback shall commence or be
      throttled.  Such page counts cannot be negative, so declaring the page
      limits as signed is unnecessary.
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Andrea Righi <righi.andrea@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      364aeb28
  6. 16 October 2008, 1 commit
  7. 14 October 2008, 1 commit
  8. 12 July 2008, 1 commit
    • mm: Add range_cont mode for writeback · 06d6cf69
      Committed by Aneesh Kumar K.V
      Filesystems like ext4 need to start a new transaction in
      ->writepages() for block allocation. This happens with delayed
      allocation, and there is a limit to how many credits we can request
      from the journal layer. So we call write_cache_pages() multiple
      times with wbc->nr_to_write set to the maximum possible value
      limited by the max journal credits available.

      Add a new mode to writeback that enables us to handle this
      behaviour. In the new mode we update wbc->range_start
      to point to the new offset to be written. The next call to
      write_cache_pages() will start writeout from the specified
      range_start offset. In the new mode we also limit writing
      to the specified wbc->range_end.
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Mingming Cao <cmm@us.ibm.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
      06d6cf69
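
      A hedged sketch of the calling pattern this enables; start_journal_txn(),
      stop_journal_txn(), max_pages_per_transaction() and example_writepage() are
      hypothetical stand-ins for the filesystem's own helpers, and the range_cont
      field name follows the title of this patch:

      	static int example_writepages(struct address_space *mapping,
      				      struct writeback_control *wbc)
      	{
      		long to_write = wbc->nr_to_write;
      		int ret = 0;

      		wbc->range_cont = 1;	/* carry range_start forward between calls */
      		while (to_write > 0 && !ret) {
      			long chunk = min_t(long, to_write, max_pages_per_transaction());
      			void *txn = start_journal_txn(chunk);		/* assumed helper */

      			wbc->nr_to_write = chunk;
      			ret = write_cache_pages(mapping, wbc, example_writepage, mapping);
      			stop_journal_txn(txn);				/* assumed helper */

      			if (wbc->nr_to_write == chunk)
      				break;			/* nothing was written, stop looping */
      			to_write -= chunk - wbc->nr_to_write;	/* pages actually written */
      			/* wbc->range_start now points past the last written offset */
      		}
      		return ret;
      	}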
  9. 24 May 2008, 1 commit
    • ftrace: limit trace entries · 3eefae99
      Committed by Steven Rostedt
      Currently there is nothing to protect against the root user using up all of
      memory for trace buffers.  If the root user allocates too many entries,
      the OOM killer might start killing off tasks.

      This patch adds an algorithm to check the following condition:
      
       pages_requested > (freeable_memory + current_trace_buffer_pages) / 4
      
      If the above is met then the allocation fails. The above prevents more
      than 1/4th of freeable memory from being used by trace buffers.
      
      To determine the freeable_memory, I made determine_dirtyable_memory in
      mm/page-writeback.c global.
      
      Special thanks goes to Peter Zijlstra for suggesting the above calculation.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      3eefae99
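
      A minimal sketch of the admission check described above, assuming
      determine_dirtyable_memory() returns the freeable page count as the message
      says (the surrounding ring-buffer code is omitted):

      	static int example_can_expand_buffer(unsigned long pages_requested,
      					     unsigned long current_trace_pages)
      	{
      		unsigned long freeable = determine_dirtyable_memory();

      		/* refuse to let trace buffers claim more than 1/4 of freeable memory */
      		if (pages_requested > (freeable + current_trace_pages) / 4)
      			return -ENOMEM;
      		return 0;
      	}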
  10. 30 April 2008, 1 commit
  11. 06 February 2008, 2 commits
    • writeback: speed up writeback of big dirty files · 8bc3be27
      Committed by Fengguang Wu
      After dirtying a 100M file, the normal behavior is to start the
      writeback for all data after a 30s delay.  But sometimes the following
      happens instead:
      
      	- after 30s:    ~4M
      	- after 5s:     ~4M
      	- after 5s:     all remaining 92M
      
      Some analysis shows that the internal io dispatch queues go like this:
      
      		s_io            s_more_io
      		-------------------------
      	1)	100M,1K         0
      	2)	1K              96M
      	3)	0               96M
      1) initial state with a 100M file and a 1K file
      
      2) 4M written, nr_to_write <= 0, so write more
      
      3) 1K written, nr_to_write > 0, no more writes (BUG)

      nr_to_write > 0 in (3) fools the upper layer into thinking that all data has
      been written out.  The big dirty file is actually still sitting in
      s_more_io.  We cannot simply splice s_more_io back to s_io as soon as s_io
      becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
      may starve newly expired inodes in s_dirty.  It is also not an option to
      draw inodes from both s_more_io and s_dirty, and let the loop go on: this
      might lead to livelocks, and might also starve other superblocks in sync
      time (well, kupdate may still starve some superblocks, but that's another bug).
      
      We have to return when a full scan of s_io completes.  So nr_to_write > 0
      does not necessarily mean that "all data are written".  This patch
      introduces a flag writeback_control.more_io to indicate that more io should
      be done.  With it the big dirty file no longer has to wait for the next
      kupdate invocation 5s later.
      
      In sync_sb_inodes() we only set more_io on super_blocks we actually
      visited.  This avoids the interaction between two pdflush daemons.
      
      Also in __sync_single_inode() we don't blindly keep requeuing the io if the
      filesystem cannot progress.  Failing to do so may lead to 100% iowait.
      Tested-by: Mike Snitzer <snitzer@gmail.com>
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Michael Rubin <mrubin@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8bc3be27
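
      A minimal sketch of the signalling described above; the list and field names
      follow the message, while the surrounding writeback loop is elided:

      	static void example_sync_sb_inodes(struct super_block *sb,
      					   struct writeback_control *wbc)
      	{
      		/* ... write back inodes from sb->s_io until nr_to_write runs out ... */

      		/* a full scan of s_io finished, but big files were parked on
      		 * s_more_io: tell the upper layer more io is pending rather than
      		 * letting nr_to_write > 0 suggest "all done" */
      		if (!list_empty(&sb->s_more_io))
      			wbc->more_io = 1;
      	}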
    • mm/page-writeback: highmem_is_dirtyable option · 195cf453
      Committed by Bron Gondwana
      Add vm.highmem_is_dirtyable toggle
      
      A 32-bit machine with HIGHMEM64 enabled running DCC has an mmapped file of
      approximately 2GB which contains a hash format that is written
      randomly by the dbclean process.  On 2.6.16 this process took a few
      minutes.  With lowmem-only accounting of dirty ratios, it takes about 12
      hours of 100% disk IO, all random writes.
      
      Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to
      add the highmem back to the total available memory count.
      
      [akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
      Signed-off-by: Bron Gondwana <brong@fastmail.fm>
      Cc: Ethan Solomita <solo@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      195cf453
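
      A rough sketch of the toggle's effect on the dirtyable-memory calculation;
      free_pages(), reclaimable_pages() and highmem_pages() are hypothetical
      helpers standing in for the kernel's page counters:

      	int vm_highmem_is_dirtyable;	/* exposed as /proc/sys/vm/highmem_is_dirtyable */

      	static unsigned long example_determine_dirtyable_memory(void)
      	{
      		unsigned long x = free_pages() + reclaimable_pages();

      		if (!vm_highmem_is_dirtyable)
      			x -= highmem_pages();	/* default: dirty limits track lowmem only */

      		return x + 1;			/* never return 0 */
      	}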
  12. 15 January 2008, 1 commit
  13. 17 October 2007, 3 commits
    • introduce I_SYNC · 1c0eeaf5
      Committed by Joern Engel
      I_LOCK was used for several unrelated purposes, which caused deadlock
      situations in certain filesystems as a side effect.  One of the purposes
      now uses the new I_SYNC bit.
      
      Also document the various bits and change their order from historical to
      logical.
      
      [bunk@stusta.de: make fs/inode.c:wake_up_inode() static]
      Signed-off-by: Joern Engel <joern@wohnheim.fh-wedel.de>
      Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Anton Altaparmakov <aia21@cam.ac.uk>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c0eeaf5
    • writeback: introduce writeback_control.more_io to indicate more io · 2e6883bd
      Committed by Fengguang Wu
      After dirtying a 100M file, the normal behavior is to start the writeback
      for all data after a 30s delay.  But sometimes the following happens instead:
      
      	- after 30s:    ~4M
      	- after 5s:     ~4M
      	- after 5s:     all remaining 92M
      
      Some analysis shows that the internal io dispatch queues go like this:
      
      		s_io            s_more_io
      		-------------------------
      	1)	100M,1K         0
      	2)	1K              96M
      	3)	0               96M
      
      1) initial state with a 100M file and a 1K file
      2) 4M written, nr_to_write <= 0, so write more
      3) 1K written, nr_to_write > 0, no more writes (BUG)

      nr_to_write > 0 in (3) fools the upper layer into thinking that all data has
      been written out.  The big dirty file is actually still sitting in s_more_io.
      We cannot simply splice s_more_io back to s_io as soon as s_io becomes empty,
      and let the loop in generic_sync_sb_inodes() continue: this may starve newly
      expired inodes in s_dirty.  It is also not an option to draw inodes from both
      s_more_io and s_dirty, and let the loop go on: this might lead to livelocks,
      and might also starve other superblocks in sync time (well, kupdate may still
      starve some superblocks, but that's another bug).
      
      We have to return when a full scan of s_io completes.  So nr_to_write > 0 does
      not necessarily mean that "all data are written".  This patch introduces a
      flag writeback_control.more_io to indicate this situation.  With it the big
      dirty file no longer has to wait for the next kupdate invocation 5s later.
      
      Cc: David Chinner <dgc@sgi.com>
      Cc: Ken Chen <kenchen@google.com>
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2e6883bd
    • mm: per device dirty threshold · 04fbfdc1
      Committed by Peter Zijlstra
      Scale writeback cache per backing device, proportional to its writeout speed.
      
      By decoupling the BDI dirty thresholds a number of problems we currently have
      will go away, namely:
      
       - mutual interference starvation (for any number of BDIs);
       - deadlocks with stacked BDIs (loop, FUSE and local NFS mounts).
      
      It might be that all dirty pages are for a single BDI while other BDIs are
      idling. By giving each BDI a 'fair' share of the dirty limit, each one can have
      dirty pages outstanding and make progress.
      
      A global threshold also creates a deadlock for stacked BDIs; when A writes to
      B, and A generates enough dirty pages to get throttled, B will never start
      writeback until the dirty pages go away. Again, by giving each BDI its own
      'independent' dirty limit, this problem is avoided.
      
      So the problem is to determine how to distribute the total dirty limit across
      the BDIs fairly and efficiently.  A BDI that has a large dirty limit but does
      not have any dirty pages outstanding is a waste.

      What is done is to keep a floating proportion between the BDIs based on
      writeback completions.  This way faster/more active devices get a larger share
      than slower/idle devices.
      
      [akpm@linux-foundation.org: fix warnings]
      [hugh@veritas.com: Fix occasional hang when a task couldn't get out of balance_dirty_pages]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      04fbfdc1
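
      A hedged sketch of the idea: give each BDI a share of the global dirty
      threshold proportional to its recent writeback completions. The kernel keeps
      a floating (decaying) proportion; this sketch uses plain arithmetic:

      	static unsigned long example_bdi_dirty_limit(unsigned long global_thresh,
      						     unsigned long bdi_completions,
      						     unsigned long total_completions)
      	{
      		if (!total_completions)
      			return global_thresh;	/* nothing written yet: no split */

      		/* faster / more active devices get a larger share */
      		return (unsigned long)((u64)global_thresh * bdi_completions /
      				       total_completions);
      	}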
  14. 10 October 2007, 2 commits
  15. 09 October 2007, 1 commit
  16. 22 May 2007, 1 commit
    • Detach sched.h from mm.h · e8edc6e0
      Committed by Alexey Dobriyan
      The first thing mm.h does is include sched.h, solely for the can_do_mlock()
      inline function, which dereferences "current" inside. By dealing with
      can_do_mlock(), mm.h can be detached from sched.h, which is good. See below for why.
      
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being dependency for significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      
      Cross-compile tested on
      
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	arm
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	m68k
      	mips
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	um-x86_64
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      
      as well as my two usual configs.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e8edc6e0
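
      Steps (b) and (c) amount to moving the inline out of the header and exporting
      it; a sketch, assuming the rlimit layout of that era (the exact body may differ):

      	/* mm/mlock.c */
      	int can_do_mlock(void)
      	{
      		if (capable(CAP_IPC_LOCK))
      			return 1;
      		if (current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur != 0)
      			return 1;
      		return 0;
      	}
      	EXPORT_SYMBOL(can_do_mlock);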
  17. 11 May 2007, 1 commit
  18. 01 May 2007, 1 commit
    • NFS: Fix a race when doing NFS write coalescing · c63c7b05
      Committed by Trond Myklebust
      Currently we do write coalescing in a very inefficient manner: one pass in
      generic_writepages() in order to lock the pages for writing, then one pass
      in nfs_flush_mapping() and/or nfs_sync_mapping_wait() in order to gather
      the locked pages for coalescing into RPC requests of size "wsize".
      
      In fact, it turns out there is actually a deadlock possible here since we
      only start I/O on the second pass. If the user signals the process while
      we're in nfs_sync_mapping_wait(), for instance, then we may exit before
      starting I/O on all the requests that have been queued up.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      c63c7b05
  19. 02 March 2007, 1 commit
  20. 21 October 2006, 1 commit
    • [PATCH] separate bdi congestion functions from queue congestion functions · 3fcfab16
      Committed by Andrew Morton
      Separate out the concept of "queue congestion" from "backing-dev congestion".
      Congestion is a backing-dev concept, not a queue concept.
      
      The blk_* congestion functions are retained, as wrappers around the core
      backing-dev congestion functions.
      
      This proper layering is needed so that NFS can cleanly use the congestion
      functions, and so that CONFIG_BLOCK=n actually links.
      
      Cc: "Thomas Maier" <balagi@justmail.de>
      Cc: "Jens Axboe" <jens.axboe@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Peter Osterlund <petero2@telia.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      3fcfab16
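
      A sketch of the layering described: the blk_* entry points survive as thin
      wrappers over the backing-dev congestion primitives (signatures approximate
      that era's API):

      	static inline void example_blk_clear_queue_congested(struct request_queue *q, int rw)
      	{
      		clear_bdi_congested(&q->backing_dev_info, rw);
      	}

      	static inline void example_blk_set_queue_congested(struct request_queue *q, int rw)
      	{
      		set_bdi_congested(&q->backing_dev_info, rw);
      	}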
  21. 04 October 2006, 1 commit
  22. 01 October 2006, 1 commit
  23. 30 September 2006, 1 commit
  24. 26 September 2006, 1 commit
  25. 23 September 2006, 1 commit
  26. 23 June 2006, 1 commit
    • [PATCH] writeback: fix range handling · 111ebb6e
      Committed by OGAWA Hirofumi
      When a writeback_control's `start' and `end' fields are used to
      indicate a one-byte range starting at file offset zero, the required
      values of .start=0,.end=0 mean that the ->writepages() implementation
      has no way of telling that it is being asked to perform a range
      request, because we are currently overloading (start == 0 && end == 0)
      to mean "this is not a write-a-range request".
      
      To make all this sane, the patch changes the range handling of writeback_control.

      The caller now always sets a range (range_start/range_end or range_cyclic)
      when it calls ->writepages() to write pages.

      If range_cyclic is true, ->writepages() treats the range as cyclic;
      otherwise it just uses range_start and range_end.
      
      This patch does,
      
          - Add LLONG_MAX, LLONG_MIN, ULLONG_MAX to include/linux/kernel.h
            -1 is usually ok for range_end (type is long long). But, if someone did,
      
      		range_end += val;		range_end is "val - 1"
      		u64val = range_end >> bits;	u64val is "~(0ULL)"
      
            or something, they are wrong. So, this adds LLONG_MAX to avoid nasty
            things, and uses LLONG_MAX for range_end.
      
          - All callers of ->writepages() sets range_start/end or range_cyclic.
      
          - Fix updates of ->writeback_index.  It already seems a bit strange:
            if writeback starts at 0 and is ended by the nr_to_write check, the
            last saved index may reduce the chance to scan the end of the file.
            So this updates ->writeback_index only if range_cyclic is true or the
            whole file is scanned.
      Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Nathan Scott <nathans@sgi.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: "Vladimir V. Saveliev" <vs@namesys.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      111ebb6e
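
      A sketch of the calling convention after this patch: every ->writepages()
      caller states its range explicitly, using the LLONG_MAX constant the patch
      adds (helper name and error handling are illustrative only):

      	static int example_sync_whole_file(struct address_space *mapping)
      	{
      		struct writeback_control wbc = {
      			.sync_mode   = WB_SYNC_ALL,
      			.nr_to_write = LONG_MAX,
      			.range_start = 0,		/* whole file ... */
      			.range_end   = LLONG_MAX,	/* ... instead of the old 0/0 overload */
      		};

      		/* cyclic background writeback would set .range_cyclic = 1 instead */
      		return mapping->a_ops->writepages(mapping, &wbc);
      	}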
  27. 24 March 2006, 2 commits
  28. 09 January 2006, 1 commit
  29. 07 January 2006, 1 commit
    • identify multipage ->writepages() calls · 22905f77
      Committed by Andrew Morton
       NFS needs to be able to distinguish between single-page ->writepage() calls and
       multipage ->writepages() calls.
      
       For the single-page writepage calls NFS can kick off the I/O within the
       context of ->writepage().
      
       For multipage ->writepages calls, nfs_writepage() will leave the I/O pending
       and nfs_writepages() will kick off the I/O when it all has been queued up
       within NFS.
      
       Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      22905f77
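
      One way the distinction can be surfaced to the filesystem is a flag in
      writeback_control that is set only on the multipage path, so nfs_writepage()
      can defer I/O when called via ->writepages(); the field name below is an
      assumption for the sketch:

      	int example_do_writepages(struct address_space *mapping,
      				  struct writeback_control *wbc)
      	{
      		int ret;

      		wbc->for_writepages = 1;	/* multipage ->writepages() call */
      		ret = mapping->a_ops->writepages(mapping, wbc);
      		wbc->for_writepages = 0;
      		return ret;
      	}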
  30. 04 January 2006, 1 commit
    • [PATCH] add AOP_TRUNCATED_PAGE, prepend AOP_ to WRITEPAGE_ACTIVATE · 994fc28c
      Committed by Zach Brown
      readpage(), prepare_write(), and commit_write() callers are updated to
      understand the special return code AOP_TRUNCATED_PAGE in the style of
      writepage() and WRITEPAGE_ACTIVATE.  AOP_TRUNCATED_PAGE tells the caller that
      the callee has unlocked the page and that the operation should be tried again
      with a new page.  OCFS2 uses this to detect and work around a lock inversion in
      its aop methods.  There should be no change in behaviour for methods that don't
      return AOP_TRUNCATED_PAGE.
      
      WRITEPAGE_ACTIVATE is also prepended with AOP_ for consistency and they are
      made enums so that kerneldoc can be used to document their semantics.
      Signed-off-by: Zach Brown <zach.brown@oracle.com>
      994fc28c
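
      A sketch of the caller-side contract described above: on AOP_TRUNCATED_PAGE
      the callee has already unlocked the page, so the caller drops its reference
      and retries with a fresh page (simplified; grab_cache_page() returns a
      locked page from the page cache):

      	static int example_read_one_page(struct file *file,
      					 struct address_space *mapping, pgoff_t index)
      	{
      		struct page *page;
      		int err;

      	retry:
      		page = grab_cache_page(mapping, index);
      		if (!page)
      			return -ENOMEM;

      		err = mapping->a_ops->readpage(file, page);
      		if (err == AOP_TRUNCATED_PAGE) {
      			/* page was unlocked by the callee: drop it and try again */
      			page_cache_release(page);
      			goto retry;
      		}

      		page_cache_release(page);
      		return err;
      	}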
  31. 11 September 2005, 1 commit
  32. 29 June 2005, 1 commit
  33. 28 June 2005, 1 commit
    • [PATCH] Update cfq io scheduler to time sliced design · 22e2c507
      Committed by Jens Axboe
      This updates the CFQ io scheduler to the new time sliced design (cfq
      v3).  It provides full process fairness, while giving excellent
      aggregate system throughput even for many competing processes.  It
      supports io priorities, either inherited from the cpu nice value or set
      directly with the ioprio_get/set syscalls.  The latter closely mimic
      set/getpriority.
      
      This import is based on my latest from -mm.
      Signed-off-by: Jens Axboe <axboe@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      22e2c507
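
      A userspace illustration of the ioprio_get/ioprio_set syscalls the message
      compares to set/getpriority. Glibc ships no wrappers, so the usual pattern
      (as in ionice) is a raw syscall(); the macros below mirror the kernel's
      priority encoding:

      	#include <stdio.h>
      	#include <sys/syscall.h>
      	#include <unistd.h>

      	#define IOPRIO_CLASS_BE		2
      	#define IOPRIO_WHO_PROCESS	1
      	#define IOPRIO_CLASS_SHIFT	13
      	#define IOPRIO_PRIO_VALUE(cl, d)	(((cl) << IOPRIO_CLASS_SHIFT) | (d))

      	int main(void)
      	{
      		/* put the current process in the best-effort class, priority 4 */
      		if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0,
      			    IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 4)) == -1)
      			perror("ioprio_set");

      		long prio = syscall(SYS_ioprio_get, IOPRIO_WHO_PROCESS, 0);
      		printf("ioprio now: class=%ld data=%ld\n",
      		       prio >> IOPRIO_CLASS_SHIFT,
      		       prio & ((1L << IOPRIO_CLASS_SHIFT) - 1));
      		return 0;
      	}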