1. 27 Jan, 2010 1 commit
  2. 10 Dec, 2009 1 commit
    • C
      vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Christoph Hellwig committed
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
      since that day we had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches which are required before this patch it's actually surprisingly
      simple, we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping its
      numerical value to keep binary compatibility, and adds a new real O_SYNC
      flag.  To guarantee backwards compatibility it is defined as expanding to
      both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
      sure we are backwards-compatible when compiled against the new headers.
      
      This also means that all places that don't care about the differences can
      just check O_DSYNC and get the right behaviour for O_SYNC, too - only
      places that actually care need to check __O_SYNC in addition.  Drivers and
      network filesystems have been updated in a fail safe way to always do the
      full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
      lower layers are kept that way for now to stay failsafe.
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up their headers as they already define an
      O_DSYNC that has always been a no-op.  We try to repair it by using it for
      the new O_DSYNC and redefining O_SYNC to send both the traditional
      O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      6b2f3d1f
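The flag layout described in this commit message can be sketched in user-space C. This is only an illustration: the `EX_` names, the bit values, and the helper functions are hypothetical stand-ins, not the kernel's definitions (the real constants differ per architecture). It shows the three rules the message states: O_SYNC expands to both bits, the open path forces O_DSYNC whenever __O_SYNC is set, and only callers that care about the difference look at __O_SYNC when choosing the datasync argument.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only (assumption); real values are per-architecture. */
#define EX_O_DSYNC   0x1000u                    /* the old O_SYNC bit, renamed */
#define EX___O_SYNC  0x100000u                  /* the new, additional bit */
#define EX_O_SYNC    (EX___O_SYNC | EX_O_DSYNC) /* new O_SYNC expands to both */

/* Open path: make sure __O_SYNC never appears without O_DSYNC. */
static unsigned int ex_normalize_open_flags(unsigned int flags)
{
    if (flags & EX___O_SYNC)
        flags |= EX_O_DSYNC;
    return flags;
}

/* Callers that do not care about the difference only test O_DSYNC. */
static bool ex_needs_sync(unsigned int flags)
{
    return (flags & EX_O_DSYNC) != 0;
}

/*
 * Callers that do care pick the datasync argument:
 * 1 = data only (O_DSYNC), 0 = data and metadata (O_SYNC).
 */
static int ex_datasync_arg(unsigned int flags)
{
    assert(ex_needs_sync(flags));
    return (flags & EX___O_SYNC) ? 0 : 1;
}
```

Because a binary built against the old headers only ever sets the O_DSYNC bit, it keeps the data-only behaviour it always had, which is the compatibility argument the message makes.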
  3. 28 Sep, 2009 1 commit
  4. 16 Sep, 2009 1 commit
  5. 10 Aug, 2009 2 commits
    • P
      NFS: read-modify-write page updating · 38c73044
      Peter Staubach committed
      Hi.
      
      I have a proposal for possibly resolving this issue.
      
      I believe that this situation occurs due to the way that the
      Linux NFS client handles writes which modify partial pages.
      
      The Linux NFS client handles partial page modifications by
      allocating a page from the page cache, copying the data from
      the user level into the page, and then keeping track of the
      offset and length of the modified portions of the page.  The
      page is not marked as up to date because there are portions
      of the page which do not contain valid file contents.
      
      When a read call comes in for a portion of the page, the
      contents of the page must be read in from the server.
      However, since the page may already contain some modified
      data, that modified data must be written to the server
      before the file contents can be read back in from the server.
      And, since the writing and reading can not be done atomically,
      the data must be written and committed to stable storage on
      the server for safety purposes.  This means either a
      FILE_SYNC WRITE or an UNSTABLE WRITE followed by a COMMIT.
      This has been discussed at length previously.
      
      This algorithm could be described as modify-write-read.  It
      is most efficient when the application only updates pages
      and does not read them.
      
      My proposed solution is to add a heuristic to decide whether
      to do this modify-write-read algorithm or switch to a read-
      modify-write algorithm when initially allocating the page
      in the write system call path.  The heuristic uses the modes
      that the file was opened with, the offset in the page to
      read from, and the size of the region to read.
      
      If the file was opened for reading in addition to writing
      and the page would not be filled completely with data from
      the user level, then read in the old contents of the page
      and mark it as Uptodate before copying in the new data.  If
      the page would be completely filled with data from the user
      level, then there would be no reason to read in the old
      contents because they would just be copied over.
      
      This would optimize for applications which randomly access
      and update portions of files.  The linkage editor for the
      C compiler is an example of such a thing.
      
      I tested the attached patch by using rpmbuild to build the
      current Fedora rawhide kernel.  The kernel without the
      patch generated about 269,500 WRITE requests.  The modified
      kernel containing the patch generated about 261,000 WRITE
      requests.  Thus, about 8,500 fewer WRITE requests were
      generated.  I suspect that many of these additional
      WRITE requests were probably FILE_SYNC requests to WRITE
      a single page, but I didn't test this theory.
      
      The difference between this patch and the previous one was
      to remove the unneeded PageDirty() test.  I then retested to
      ensure that the resulting system continued to behave as
      desired.
      
      	Thanx...
      
      		ps
      Signed-off-by: Peter Staubach <staubach@redhat.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      38c73044
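The heuristic proposed in this commit message can be sketched as a small predicate. This is a minimal user-space illustration, not the patch itself: `struct ex_write`, `EX_PAGE_SIZE`, and both function names are hypothetical, and the real code also has to consider the state of the page, which is omitted here.

```c
#include <stdbool.h>
#include <stddef.h>

#define EX_PAGE_SIZE 4096u  /* assumed page size for the sketch */

/* Hypothetical description of one write landing in one page. */
struct ex_write {
    bool   opened_for_read;  /* file was opened for reading as well as writing */
    size_t offset_in_page;   /* where in the page the user data starts */
    size_t len;              /* how many bytes the user data covers */
};

/* A write that covers the whole page overwrites any old contents anyway. */
static bool ex_full_page_write(const struct ex_write *w)
{
    return w->offset_in_page == 0 && w->len >= EX_PAGE_SIZE;
}

/*
 * Read the old page contents first (read-modify-write) only when the file
 * may be read back and the write leaves part of the page uncovered.
 * Otherwise stay with the existing modify-write-read behaviour.
 */
static bool ex_want_read_modify_write(const struct ex_write *w)
{
    return w->opened_for_read && !ex_full_page_write(w);
}
```

A write-only file never takes the read-first path in this sketch, which matches the message's point that modify-write-read is most efficient for applications that only update pages.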
    • T
      NFS: Add a ->migratepage() aop for NFS · 074cc1de
      Trond Myklebust committed
      Make NFS a bit more friendly to NUMA and memory hot removal...
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      074cc1de
  6. 13 Jul, 2009 1 commit
  7. 18 Jun, 2009 2 commits
  8. 03 May, 2009 1 commit
  9. 08 Apr, 2009 1 commit
  10. 03 Apr, 2009 2 commits
  11. 01 Apr, 2009 1 commit
    • N
      mm: page_mkwrite change prototype to match fault · c2ec175c
      Nick Piggin committed
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information to
      the VM (and can also provide more information, e.g. virtual_address, to the
      driver, which might be important in some special cases).
      
      This is required for a subsequent fix, and will also make it easier to
      merge page_mkwrite() with fault() in future.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2ec175c
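The shape of this prototype change can be mocked in user-space C. Everything below is a hypothetical stand-in: the `ex_` types, the `EX_VM_FAULT_*` values, and in particular the errno-to-flag mapping are assumptions made for illustration, since the message only says the callback now takes a struct vm_fault and returns VM_FAULT_xxx flags.

```c
#include <errno.h>

/* Hypothetical stand-ins for the kernel's fault flags (values are assumptions). */
#define EX_VM_FAULT_OOM    0x1
#define EX_VM_FAULT_SIGBUS 0x2

/* Mock page: carries the errno the mock filesystem would hit, 0 if none. */
struct ex_page {
    int backing_error;
};

/* Mock fault descriptor: bundles the page with extra context for the callee. */
struct ex_vm_fault {
    struct ex_page *page;
    unsigned long   virtual_address;
};

/* Old-style callback: takes the page, returns 0 or a negative errno. */
static int ex_old_page_mkwrite(struct ex_page *page)
{
    return -page->backing_error;
}

/* One possible mapping from errno to fault flags (an assumption). */
static int ex_errno_to_fault(int err)
{
    if (err == 0)
        return 0;
    return (err == -ENOMEM) ? EX_VM_FAULT_OOM : EX_VM_FAULT_SIGBUS;
}

/* New-style callback: takes the fault descriptor, returns fault flags. */
static int ex_new_page_mkwrite(struct ex_vm_fault *vmf)
{
    return ex_errno_to_fault(ex_old_page_mkwrite(vmf->page));
}
```

Returning flags rather than a bare errno is what lets the caller tell an out-of-memory condition apart from other failures, which is the "more detailed error information" the message refers to.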
  12. 20 Mar, 2009 1 commit
    • T
      NFS: Optimise NFS close() · 7fe5c398
      Trond Myklebust committed
      Close-to-open cache consistency rules really only require us to flush out
      writes on calls to close(), and require us to revalidate attributes on the
      very last close of the file.
      
      Currently we appear to be doing a lot of extra attribute revalidation
      and cache flushes.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      7fe5c398
  13. 12 Mar, 2009 2 commits
    • T
      NFS: Kill the "defined but not used" compile error on nommu machines · e1ebfd33
      Trond Myklebust committed
      Bryan Wu reports that when compiling NFS on nommu machines he gets a
      "defined but not used" error on nfs_file_mmap().
      
      The easiest fix is simply to get rid of the special casing in NFS, and
      just always call generic_file_mmap() to set up the file.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      e1ebfd33
    • T
      NFS: Throttle page dirtying while we're flushing to disk · 72cb77f4
      Trond Myklebust committed
      The following patch is a combination of a patch by myself and Peter
      Staubach.
      
      Trond: If we allow other processes to dirty pages while a process is doing
      a consistency sync to disk, we can end up never making progress.
      
      Peter: Attached is a patch which addresses a continuing problem with
      the NFS client generating out of order WRITE requests.  While
      this is compliant with all of the current protocol
      specifications, there are servers in the market which can not
      handle out of order WRITE requests very well.  Also, this may
      lead to sub-optimal block allocations in the underlying file
      system on the server.  This may cause the read throughputs to
      be reduced when reading the file from the server.
      
      Peter: There has been a lot of work recently done to address out of
      order issues on a systemic level.  However, the NFS client is
      still susceptible to the problem.  Out of order WRITE
      requests can occur when pdflush is in the middle of writing
      out pages while the process dirtying the pages calls
      generic_file_buffered_write which calls
      generic_perform_write which calls
      balance_dirty_pages_rate_limited which ends up calling
      writeback_inodes which ends up calling back into the NFS
      client to write out dirty pages for the same file that
      pdflush happens to be working with.
      Signed-off-by: Peter Staubach <staubach@redhat.com>
      [modification by Trond to merge the two similar patches]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      72cb77f4
  14. 05 Jan, 2009 1 commit
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin committed
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in its write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (e.g. ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54566b2c
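The idea of collapsing the gfp_t argument into a single flag can be sketched as follows. This is a user-space illustration under stated assumptions: the `EX_` bit values and `ex_write_begin_gfp()` are hypothetical, and only the relationship "NOFS means the allocation must not have the FS bit" is taken from the commit message.

```c
/* Hypothetical allocation-context bits (values are assumptions). */
#define EX_GFP_WAIT   0x1u
#define EX_GFP_IO     0x2u
#define EX_GFP_FS     0x4u  /* allocation may re-enter the filesystem */
#define EX_GFP_KERNEL (EX_GFP_WAIT | EX_GFP_IO | EX_GFP_FS)
#define EX_GFP_NOFS   (EX_GFP_WAIT | EX_GFP_IO)

/* Hypothetical value for the write_begin flag introduced by the patch. */
#define EX_AOP_FLAG_NOFS 0x4u

/*
 * Derive the mask used for page cache allocations in write_begin:
 * the caller's single flag strips the FS bit and leaves the rest alone.
 */
static unsigned int ex_write_begin_gfp(unsigned int base_gfp,
                                       unsigned int aop_flags)
{
    if (aop_flags & EX_AOP_FLAG_NOFS)
        return base_gfp & ~EX_GFP_FS;
    return base_gfp;
}
```

A caller such as the symlink path would pass the NOFS flag so that reclaim cannot call back into the filesystem, while ordinary writes keep the full mask.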
  15. 08 Oct, 2008 1 commit
  16. 07 Oct, 2008 1 commit
    • T
      NFS: Fix nfs_file_llseek() · d5e66348
      Trond Myklebust committed
      After the BKL removal patches were applied to the rest of the NFS code, the
      BKL protection in nfs_file_llseek() is no longer sufficient to ensure that
      inode->i_size is read safely in generic_file_llseek_unlocked().
      
      In order to fix the situation, we either have to replace the naked read of
      inode->i_size in generic_file_llseek_unlocked() with i_size_read(), or the
      whole thing needs to be executed under the inode->i_lock.
      In order to avoid disrupting other filesystems, avoid touching
      generic_file_llseek_unlocked() for now...
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      d5e66348
  17. 16 Jul, 2008 2 commits
  18. 10 Jul, 2008 9 commits
  19. 03 Jul, 2008 1 commit
    • A
      Remove BKL from remote_llseek v2 · 9465efc9
      Andi Kleen committed
      - Replace remote_llseek with generic_file_llseek_unlocked (to force compilation
      failures in all users)
      - Change all users to either use generic_file_llseek_unlocked directly or
      take the BKL around it. I changed the file systems that don't use the BKL
      for anything (CIFS, GFS) to call it directly. NCPFS and SMBFS and NFS
      take the BKL, but explicitly in their own source now.
      
      I moved them all over in a single patch to avoid unbisectable sections.
      
      Open problem: 32bit kernels can corrupt fpos because its modification
      is not atomic, but they can do that anyway because there are other paths
      that modify it without BKL.
      
      Do we need a special lock for the pos/f_version = 0 checks?
      
      Trond says the NFS BKL is likely not needed, but keep it for now
      until his full audit.
      
      v2: Use generic_file_llseek_unlocked instead of remote_llseek_unlocked
          and factor duplicated code (suggested by hch)
      
      Cc: Trond.Myklebust@netapp.com
      Cc: swhiteho@redhat.com
      Cc: sfrench@samba.org
      Cc: vandrove@vc.cvut.cz
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Jonathan Corbet <corbet@lwn.net>
      9465efc9
  20. 17 May, 2008 1 commit
  21. 20 Apr, 2008 1 commit
  22. 09 Apr, 2008 1 commit
  23. 20 Mar, 2008 1 commit
  24. 30 Jan, 2008 2 commits
  25. 20 Oct, 2007 1 commit
  26. 17 Oct, 2007 1 commit