1. 07 1月, 2011 1 次提交
    • N
      fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin 提交于
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fa0d7e3d
  2. 02 12月, 2010 1 次提交
    • T
      NFS: Fix a memory leak in nfs_readdir · 11de3b11
      Trond Myklebust 提交于
      We need to ensure that the entries in the nfs_cache_array get cleared
      when the page is removed from the page cache. To do so, we use the
      freepage address_space operation.
      
      Change nfs_readdir_clear_array to use kmap_atomic(), so that the
      function can be safely called from all contexts.
      
      Finally, modify the cache_page_release helper to call
      nfs_readdir_clear_array directly, when dealing with an anonymous
      page from 'uncached_readdir'.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      11de3b11
  3. 25 10月, 2010 1 次提交
  4. 24 10月, 2010 1 次提交
  5. 08 10月, 2010 1 次提交
    • B
      NFS: new idmapper · 955a857e
      Bryan Schumaker 提交于
      This patch creates a new idmapper system that uses the request-key function to
      place a call into userspace to map user and group ids to names.  The old
      idmapper was single threaded, which prevented more than one request from running
      at a single time.  This means that a user would have to wait for an upcall to
      finish before accessing a cached result.
      
      The upcall result is stored on a keyring of type id_resolver.  See the file
      Documentation/filesystems/nfs/idmapper.txt for instructions.
      Signed-off-by: NBryan Schumaker <bjschuma@netapp.com>
      [Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      955a857e
  6. 30 9月, 2010 1 次提交
  7. 24 9月, 2010 1 次提交
  8. 22 9月, 2010 1 次提交
  9. 17 9月, 2010 1 次提交
    • T
      NFSv4: Clean up nfs4_atomic_open · cd9a1c0e
      Trond Myklebust 提交于
      Start moving the 'struct nameidata' dependent code out of the lower level
      NFS code in preparation for the removal of open intents.
      
      Instead of the struct nameidata, we pass down a partially initialised
      struct nfs_open_context that will be fully initialised by the atomic open
      upon success.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      cd9a1c0e
  10. 10 8月, 2010 1 次提交
  11. 04 8月, 2010 1 次提交
  12. 31 7月, 2010 1 次提交
  13. 15 5月, 2010 4 次提交
  14. 10 4月, 2010 1 次提交
  15. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  16. 11 3月, 2010 1 次提交
  17. 06 3月, 2010 7 次提交
  18. 04 3月, 2010 1 次提交
  19. 10 2月, 2010 1 次提交
    • C
      NFS: Make close(2) asynchronous when closing NFS O_DIRECT files · f895c53f
      Chuck Lever 提交于
      For NFSv2 and v3:
      
      O_DIRECT writes are always synchronous, and aren't cached, so nothing
      should be flushed when closing an NFS O_DIRECT file descriptor.  Thus
      there are no write errors to report on close(2).
      
      In addition, there's no cached data to verify on the next open(2),
      so we don't need clean GETATTR results at close time to compare with.
      
      Thus, there's no need for the nfs_revalidate_inode() call when closing
      an NFS O_DIRECT file.  This reduces the number of synchronous
      on-the-wire requests for a simple open-write-close of an NFS O_DIRECT
      file by roughly 20%.
      
      For NFSv4:
      
      Call nfs4_do_close() with wait set to zero when closing an NFS
      O_DIRECT file.  The CLOSE will go on the wire, but the application
      won't wait for it to complete.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      f895c53f
  20. 03 2月, 2010 1 次提交
  21. 24 9月, 2009 1 次提交
  22. 20 8月, 2009 1 次提交
  23. 10 8月, 2009 1 次提交
    • T
      NFSv4: Add 'server capability' flags for NFSv4 recommended attributes · 62ab460c
      Trond Myklebust 提交于
      If the NFSv4 server doesn't support a POSIX attribute, the generic NFS code
      needs to know that, so that it don't keep trying to poll for it.
      
      However, by the same count, if the NFSv4 server does support that
      attribute, then we should ensure that the inode metadata is appropriately
      labelled as being untrusted. For instance, if we don't know the correct
      value of the file's uid, we should certainly not be caching ACLs or ACCESS
      results.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      62ab460c
  24. 13 7月, 2009 1 次提交
  25. 03 4月, 2009 2 次提交
  26. 20 3月, 2009 1 次提交
    • T
      NFS: Optimise NFS close() · 7fe5c398
      Trond Myklebust 提交于
      Close-to-open cache consistency rules really only require us to flush out
      writes on calls to close(), and require us to revalidate attributes on the
      very last close of the file.
      
      Currently we appear to be doing a lot of extra attribute revalidation
      and cache flushes.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      7fe5c398
  27. 12 3月, 2009 4 次提交
    • T
      NFS: Throttle page dirtying while we're flushing to disk · 72cb77f4
      Trond Myklebust 提交于
      The following patch is a combination of a patch by myself and Peter
      Staubach.
      
      Trond: If we allow other processes to dirty pages while a process is doing
      a consistency sync to disk, we can end up never making progress.
      
      Peter: Attached is a patch which addresses a continuing problem with
      the NFS client generating out of order WRITE requests.  While
      this is compliant with all of the current protocol
      specifications, there are servers in the market which can not
      handle out of order WRITE requests very well.  Also, this may
      lead to sub-optimal block allocations in the underlying file
      system on the server.  This may cause the read throughputs to
      be reduced when reading the file from the server.
      
      Peter: There has been a lot of work recently done to address out of
      order issues on a systemic level.  However, the NFS client is
      still susceptible to the problem.  Out of order WRITE
      requests can occur when pdflush is in the middle of writing
      out pages while the process dirtying the pages calls
      generic_file_buffered_write which calls
      generic_perform_write which calls
      balance_dirty_pages_rate_limited which ends up calling
      writeback_inodes which ends up calling back into the NFS
      client to writes out dirty pages for the same file that
      pdflush happens to be working with.
      Signed-off-by: NPeter Staubach <staubach@redhat.com>
      [modification by Trond to merge the two similar patches]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      72cb77f4
    • T
      fb8a1f11
    • T
      NFSv4: Support NFSv4 optional attributes in the struct nfs_fattr · 9e6e70f8
      Trond Myklebust 提交于
      Currently, filling struct nfs_fattr is more or less an all or nothing
      operation, since NFSv2 and NFSv3 have only mandatory attributes.
      In NFSv4, some attributes are optional, and so we may simply not be able to
      fill in those fields. Furthermore, NFSv4 allows you to specify which
      attributes you are interested in retrieving, thus permitting you to
      optimise away retrieval of attributes that you know will no change...
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      9e6e70f8
    • N
      NFS: flush cached directory information slightly more readily. · 37d9d76d
      NeilBrown 提交于
      If cached directory contents becomes incorrect, there is no way to
      flush the contents.  This contrasts with files where file locking is
      the recommended way to ensure cache consistency between multiple
      applications (a read-lock always flushes the cache).
      
      Also while changes to files often change the size of the file (thus
      triggering a cache flush), changes to directories often do not change
      the apparent size (as the size is often rounded to a block size).
      
      So it is particularly important with directories to avoid the
      possibility of an incorrect cache wherever possible.
      
      When the link count on a directory changes it implies a change in the
      number of child directories, and so a change in the contents of this
      directory.  So use that as a trigger to flush cached contents.
      
      When the ctime changes but the mtime does not, there are two possible
      reasons.
       1/ The owner/mode information has been changed.
       2/ utimes has been used to set the mtime backwards.
      
      In the first case, a data-cache flush is not required.
      In the second case it is.
      
      So on the basis that correctness trumps performance, flush the
      directory contents cache in this case also.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      37d9d76d