1. 25 10月, 2010 1 次提交
    • R
      Revalidate caches on lock · 6b96724e
      Ricardo Labiaga 提交于
      Instead of blindly zapping the caches, attempt to revalidate them if
      the server has indicated that it uses high resolution timestamps.
      
      NFSv4 should be able to always revalidate the cache since the
      protocol requires the update of the change attribute on modification of
      the data.  In reality, there are servers (the Linux NFS server
      for example) that do not obey this requirement and use ctime as the
      basis for change attribute.  Long term, the server needs to be fixed.
      At this time, and to be on the safe side, continue zapping caches if
      the server indicates that it does not have a high resolution timestamp.
      Signed-off-by: NRicardo Labiaga <Ricardo.Labiaga@netapp.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      6b96724e
  2. 20 10月, 2010 1 次提交
  3. 23 9月, 2010 1 次提交
    • S
      nfs: introduce mount option '-olocal_lock' to make locks local · 5eebde23
      Suresh Jayaraman 提交于
      NFS clients since 2.6.12 support flock locks by emulating fcntl byte-range
      locks. Due to this, some windows applications which seem to use both flock
      (share mode lock mapped as flock by Samba) and fcntl locks sequentially on
      the same file, can't lock as they falsely assume the file is already locked.
      The problem was reported on a setup with windows clients accessing excel files
      on a Samba exported share which is originally a NFS mount from a NetApp filer.
      
      Older NFS clients (< 2.6.12) did not see this problem as flock locks were
      considered local. To support legacy flock behavior, this patch adds a mount
      option "-olocal_lock=" which can take the following values:
      
         'none'  		- Neither flock locks nor POSIX locks are local
         'flock' 		- flock locks are local
         'posix' 		- fcntl/POSIX locks are local
         'all'		- Both flock locks and POSIX locks are local
      
      Testing:
      
         - This patch was tested by using -olocal_lock option with different values
           and the NLM calls were noted from the network packet captured.
      
           'none'  - NLM calls were seen during both flock() and fcntl(), flock lock
         	       was granted, fcntl was denied
           'flock' - no NLM calls for flock(), NLM call was seen for fcntl(),
         	       granted
           'posix' - NLM call was seen for flock() - granted, no NLM call for fcntl()
           'all'   - no NLM calls were seen during both flock() and fcntl()
      
         - No bugs were seen during NFSv4 locking/unlocking in general and NFSv4
           reboot recovery.
      
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: NSuresh Jayaraman <sjayaraman@suse.de>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      5eebde23
  4. 12 8月, 2010 1 次提交
  5. 04 8月, 2010 1 次提交
  6. 31 7月, 2010 1 次提交
  7. 28 5月, 2010 1 次提交
  8. 15 5月, 2010 1 次提交
  9. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  10. 20 3月, 2010 1 次提交
  11. 10 2月, 2010 4 次提交
  12. 27 1月, 2010 1 次提交
  13. 10 12月, 2009 1 次提交
    • C
      vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Christoph Hellwig 提交于
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
      since that day we had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches which are required before this patch it's actually surprisingly
      simple, we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
      numerical value to keep binary compatibility, and adds a new real O_SYNC
      flag.  To guarantee backwards compatiblity it is defined as expanding to
      both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
      sure we are backwards-compatible when compiled against the new headers.
      
      This also means that all places that don't care about the differences can
      just check O_DSYNC and get the right behaviour for O_SYNC, too - only
      places that actuall care need to check __O_SYNC in addition.  Drivers and
      network filesystems have been updated in a fail safe way to always do the
      full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
      lower layers are kept that way for now to stay failsafe.
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up their headers as they already define a
      O_DSYNC that has always been a no-op.  We try to repair it by using it for
      the new O_DSYNC and redefinining O_SYNC to send both the traditional
      O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: NKyle McMartin <kyle@mcmartin.ca>
      Acked-by: NUlrich Drepper <drepper@redhat.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJan Kara <jack@suse.cz>
      6b2f3d1f
  14. 28 9月, 2009 1 次提交
  15. 16 9月, 2009 1 次提交
  16. 10 8月, 2009 2 次提交
    • P
      NFS: read-modify-write page updating · 38c73044
      Peter Staubach 提交于
      Hi.
      
      I have a proposal for possibly resolving this issue.
      
      I believe that this situation occurs due to the way that the
      Linux NFS client handles writes which modify partial pages.
      
      The Linux NFS client handles partial page modifications by
      allocating a page from the page cache, copying the data from
      the user level into the page, and then keeping track of the
      offset and length of the modified portions of the page.  The
      page is not marked as up to date because there are portions
      of the page which do not contain valid file contents.
      
      When a read call comes in for a portion of the page, the
      contents of the page must be read in the from the server.
      However, since the page may already contain some modified
      data, that modified data must be written to the server
      before the file contents can be read back in the from server.
      And, since the writing and reading can not be done atomically,
      the data must be written and committed to stable storage on
      the server for safety purposes.  This means either a
      FILE_SYNC WRITE or a UNSTABLE WRITE followed by a COMMIT.
      This has been discussed at length previously.
      
      This algorithm could be described as modify-write-read.  It
      is most efficient when the application only updates pages
      and does not read them.
      
      My proposed solution is to add a heuristic to decide whether
      to do this modify-write-read algorithm or switch to a read-
      modify-write algorithm when initially allocating the page
      in the write system call path.  The heuristic uses the modes
      that the file was opened with, the offset in the page to
      read from, and the size of the region to read.
      
      If the file was opened for reading in addition to writing
      and the page would not be filled completely with data from
      the user level, then read in the old contents of the page
      and mark it as Uptodate before copying in the new data.  If
      the page would be completely filled with data from the user
      level, then there would be no reason to read in the old
      contents because they would just be copied over.
      
      This would optimize for applications which randomly access
      and update portions of files.  The linkage editor for the
      C compiler is an example of such a thing.
      
      I tested the attached patch by using rpmbuild to build the
      current Fedora rawhide kernel.  The kernel without the
      patch generated about 269,500 WRITE requests.  The modified
      kernel containing the patch generated about 261,000 WRITE
      requests.  Thus, about 8,500 fewer WRITE requests were
      generated.  I suspect that many of these additional
      WRITE requests were probably FILE_SYNC requests to WRITE
      a single page, but I didn't test this theory.
      
      The difference between this patch and the previous one was
      to remove the unneeded PageDirty() test.  I then retested to
      ensure that the resulting system continued to behave as
      desired.
      
      	Thanx...
      
      		ps
      Signed-off-by: NPeter Staubach <staubach@redhat.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      38c73044
    • T
      NFS: Add a ->migratepage() aop for NFS · 074cc1de
      Trond Myklebust 提交于
      Make NFS a bit more friendly to NUMA and memory hot removal...
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      074cc1de
  17. 13 7月, 2009 1 次提交
  18. 18 6月, 2009 2 次提交
  19. 03 5月, 2009 1 次提交
  20. 08 4月, 2009 1 次提交
  21. 03 4月, 2009 2 次提交
  22. 01 4月, 2009 1 次提交
    • N
      mm: page_mkwrite change prototype to match fault · c2ec175c
      Nick Piggin 提交于
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information to
      the VM (and also can provide more information eg.  virtual_address to the
      driver, which might be important in some special cases).
      
      This is required for a subsequent fix.  And will also make it easier to
      merge page_mkwrite() with fault() in future.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c2ec175c
  23. 20 3月, 2009 1 次提交
    • T
      NFS: Optimise NFS close() · 7fe5c398
      Trond Myklebust 提交于
      Close-to-open cache consistency rules really only require us to flush out
      writes on calls to close(), and require us to revalidate attributes on the
      very last close of the file.
      
      Currently we appear to be doing a lot of extra attribute revalidation
      and cache flushes.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      7fe5c398
  24. 12 3月, 2009 2 次提交
    • T
      NFS: Kill the "defined but not used" compile error on nommu machines · e1ebfd33
      Trond Myklebust 提交于
      Bryan Wu reports that when compiling NFS on nommu machines he gets a
      "defined but not used" error on nfs_file_mmap().
      
      The easiest fix is simply to get rid of the special casing in NFS, and
      just always call generic_file_mmap() to set up the file.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      e1ebfd33
    • T
      NFS: Throttle page dirtying while we're flushing to disk · 72cb77f4
      Trond Myklebust 提交于
      The following patch is a combination of a patch by myself and Peter
      Staubach.
      
      Trond: If we allow other processes to dirty pages while a process is doing
      a consistency sync to disk, we can end up never making progress.
      
      Peter: Attached is a patch which addresses a continuing problem with
      the NFS client generating out of order WRITE requests.  While
      this is compliant with all of the current protocol
      specifications, there are servers in the market which can not
      handle out of order WRITE requests very well.  Also, this may
      lead to sub-optimal block allocations in the underlying file
      system on the server.  This may cause the read throughputs to
      be reduced when reading the file from the server.
      
      Peter: There has been a lot of work recently done to address out of
      order issues on a systemic level.  However, the NFS client is
      still susceptible to the problem.  Out of order WRITE
      requests can occur when pdflush is in the middle of writing
      out pages while the process dirtying the pages calls
      generic_file_buffered_write which calls
      generic_perform_write which calls
      balance_dirty_pages_rate_limited which ends up calling
      writeback_inodes which ends up calling back into the NFS
      client to writes out dirty pages for the same file that
      pdflush happens to be working with.
      Signed-off-by: NPeter Staubach <staubach@redhat.com>
      [modification by Trond to merge the two similar patches]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      72cb77f4
  25. 05 1月, 2009 1 次提交
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin 提交于
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in ints write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54566b2c
  26. 08 10月, 2008 1 次提交
  27. 07 10月, 2008 1 次提交
    • T
      NFS: Fix nfs_file_llseek() · d5e66348
      Trond Myklebust 提交于
      After the BKL removal patches were applied to the rest of the NFS code, the
      BKL protection in nfs_file_llseek() is no longer sufficient to ensure that
      inode->i_size is read safely in generic_file_llseek_unlocked().
      
      In order to fix the situation, we either have to replace the naked read of
      inode->i_size in generic_file_llseek_unlocked() with i_size_read(), or the
      whole thing needs to be executed under the inode->i_lock;
      In order to avoid disrupting other filesystems, avoid touching
      generic_file_llseek_unlocked() for now...
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      d5e66348
  28. 16 7月, 2008 2 次提交
  29. 10 7月, 2008 4 次提交