1. 07 Jan 2011 (7 commits)
    • fs: rcu-walk aware d_revalidate method · 34286d66
      Committed by Nick Piggin
      Require filesystems be aware of .d_revalidate being called in rcu-walk
      mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
      -ECHILD from all implementations.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
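      For illustration, the push-down looks like this in a hypothetical
      filesystem ("examplefs" is not from the patch; the nameidata-based
      signature matches the 2.6.38-era API):

      static int examplefs_d_revalidate(struct dentry *dentry,
                                        struct nameidata *nd)
      {
              /* In rcu-walk mode we may not block, take locks, or touch
               * refcounts; punt back so the VFS retries in ref-walk mode. */
              if (nd && (nd->flags & LOOKUP_RCU))
                      return -ECHILD;
              /* ... ordinary, possibly sleeping, revalidation ... */
              return 1;
      }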
    • fs: dcache reduce branches in lookup path · fb045adb
      Committed by Nick Piggin
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
      
      Patched with:
      
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
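      The effect of the conversion, roughly (the DCACHE_OP_* flag names
      come from this series; the bodies are an illustrative sketch, not
      the exact upstream code):

      void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
      {
              dentry->d_op = op;
              if (!op)
                      return;
              /* Cache which methods exist as flag bits... */
              if (op->d_hash)
                      dentry->d_flags |= DCACHE_OP_HASH;
              if (op->d_compare)
                      dentry->d_flags |= DCACHE_OP_COMPARE;
              if (op->d_revalidate)
                      dentry->d_flags |= DCACHE_OP_REVALIDATE;
              if (op->d_delete)
                      dentry->d_flags |= DCACHE_OP_DELETE;
      }

      /* ...so the lookup fast path tests d_flags (already hot in cache)
       * instead of two dependent pointer loads through dentry->d_op. */
      static inline int d_need_revalidate(struct dentry *dentry)
      {
              return dentry->d_flags & DCACHE_OP_REVALIDATE;
      }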
    • fs: icache RCU free inodes · fa0d7e3d
      Committed by Nick Piggin
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (i.e. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
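      The per-filesystem pattern this enables looks roughly as follows
      ("examplefs" names are hypothetical; i_rcu is the rcu_head this
      patch adds to struct inode):

      static void examplefs_i_callback(struct rcu_head *head)
      {
              struct inode *inode = container_of(head, struct inode, i_rcu);
              /* Runs only after a grace period, so rcu-walk lookups that
               * still hold a pointer to the inode remain safe. */
              kmem_cache_free(examplefs_inode_cachep, EXAMPLEFS_I(inode));
      }

      static void examplefs_destroy_inode(struct inode *inode)
      {
              call_rcu(&inode->i_rcu, examplefs_i_callback);
      }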
    • fs: dcache remove dcache_lock · b5c84bf6
      Committed by Nick Piggin
      dcache_lock no longer protects anything. Remove it.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: dcache scale subdirs · 2fd6b7f5
      Committed by Nick Piggin
      Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
      using dcache_lock for these anyway (e.g. using i_mutex).
      
      Note: if we change the locking rule in future so that ->d_child protection is
      provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
      But it would be an exception to an otherwise regular locking scheme, so we'd
      have to see some good results. Probably not worthwhile.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
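      Child traversal under the new rule, roughly (DENTRY_D_LOCK_NESTED
      comes from this series; the walk body is illustrative):

      static void walk_children(struct dentry *parent)
      {
              struct dentry *child;

              spin_lock(&parent->d_lock);
              list_for_each_entry(child, &parent->d_subdirs, d_u.d_child) {
                      /* Nested annotation keeps lockdep happy: the
                       * parent's d_lock first, then each child's. */
                      spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
                      /* ... inspect child ... */
                      spin_unlock(&child->d_lock);
              }
              spin_unlock(&parent->d_lock);
      }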
    • fs: dcache scale d_unhashed · da502956
      Committed by Nick Piggin
      Protect the d_unhashed(dentry) condition with d_lock. This means keeping the
      DCACHE_UNHASHED bit in sync with hash manipulations.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
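      A minimal sketch of the rule (the helper is hypothetical):

      static bool dentry_hashed_stable(struct dentry *dentry)
      {
              bool hashed;

              spin_lock(&dentry->d_lock);
              /* DCACHE_UNHASHED now only changes under d_lock, so the
               * answer cannot go stale while the lock is held. */
              hashed = !d_unhashed(dentry);
              spin_unlock(&dentry->d_lock);
              return hashed;
      }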
    • fs: dcache scale dentry refcount · b7ab39f6
      Committed by Nick Piggin
      Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
      0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
      we start protecting many other dentry members with d_lock.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
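      Taking a reference then becomes a plain increment under d_lock
      (sketch of the pattern; "example_dget" is illustrative):

      static void example_dget(struct dentry *dentry)
      {
              spin_lock(&dentry->d_lock);
              dentry->d_count++;      /* plain int, serialized by d_lock */
              spin_unlock(&dentry->d_lock);
      }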
  2. 18 Dec 2010 (2 commits)
  3. 16 Dec 2010 (1 commit)
  4. 07 Dec 2010 (1 commit)
  5. 02 Dec 2010 (4 commits)
  6. 19 Nov 2010 (1 commit)
  7. 18 Nov 2010 (1 commit)
  8. 12 Nov 2010 (2 commits)
  9. 10 Nov 2010 (2 commits)
  10. 09 Nov 2010 (2 commits)
    • ceph: fix update of ctime from MDS · d8672d64
      Committed by Sage Weil
      The client can have a newer ctime than the MDS due to AUTH_EXCL and
      XATTR_EXCL caps as well; update the check in ceph_fill_file_time
      appropriately.
      
      This fixes cases where ctime/mtime goes backward under the right sequence
      of local updates (e.g. chmod) and mds replies (e.g. subsequent stat that
      goes to the MDS).
      Signed-off-by: Sage Weil <sage@newdream.net>
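      The updated branch is shaped roughly like this (the cap masks are
      real ceph constants; the helper is a paraphrase of the ctime
      handling in ceph_fill_file_time, not the exact upstream code):

      static void fill_ctime(struct inode *inode, struct timespec *ctime,
                             int issued)
      {
              /* With an *_EXCL cap we may hold local updates the MDS has
               * not seen yet, so only let ctime move forward. */
              if (issued & (CEPH_CAP_FILE_EXCL | CEPH_CAP_AUTH_EXCL |
                            CEPH_CAP_XATTR_EXCL)) {
                      if (timespec_compare(ctime, &inode->i_ctime) > 0)
                              inode->i_ctime = *ctime;
              } else {
                      inode->i_ctime = *ctime;  /* MDS is authoritative */
              }
      }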
    • ceph: fix version check on racing inode updates · 8bd59e01
      Committed by Sage Weil
      We may get updates on the same inode from multiple MDSs; generally we only
      pay attention if the update is newer than what we already have.  The
      exception is when an MDS sends unstable information, in which case we
      always update.
      
      The old > check got this wrong when our version was odd (e.g. 3) and the
      reply version was even (e.g. 2): the older stale (v2) info would be
      applied.  Fixed and clarified the comment.
      Signed-off-by: Sage Weil <sage@newdream.net>
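      Paraphrasing the corrected guard (the helper name is illustrative;
      the wire struct and field are real):

      /* Return true if the reply carries nothing newer than what we
       * have.  The low bit of ci->i_version marks locally unstable
       * info, so mask it off; the old test used '>' here, which let a
       * stale even reply (v2) be applied over an odd cached version
       * (v3). */
      static bool reply_is_stale(struct ceph_inode_info *ci,
                                 struct ceph_mds_reply_inode *info)
      {
              return le64_to_cpu(info->version) > 0 &&
                     (ci->i_version & ~1) >= le64_to_cpu(info->version);
      }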
  11. 08 Nov 2010 (6 commits)
    • ceph: fix uid/gid on resent mds requests · cb4276cc
      Committed by Sage Weil
      MDS requests can be rebuilt and resent in non-process context, but were
      filling in uid/gid from current_fsuid/gid.  Put that information in the
      request struct on request setup.
      
      This fixes incorrect (and root) uid/gid getting set for requests that
      are forwarded between MDSs, usually due to metadata migrations.
      Signed-off-by: Sage Weil <sage@newdream.net>
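      The gist of the fix (r_uid/r_gid are the fields this commit adds;
      the surrounding code is paraphrased):

      /* At request creation, still in the caller's process context: */
      req->r_uid = current_fsuid();
      req->r_gid = current_fsgid();

      /* Later, when (re)building the wire message, possibly from a
       * workqueue where current_fsuid() is meaningless: */
      head->caller_uid = cpu_to_le32(req->r_uid);
      head->caller_gid = cpu_to_le32(req->r_gid);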
    • ceph: fix rdcache_gen usage and invalidate · cd045cb4
      Committed by Sage Weil
      We used to use rdcache_gen to indicate whether we "might" have cached
      pages.  Now we just look at the mapping to determine that.  However, some
      old behavior remains from that transition.
      
      First, rdcache_gen == 0 no longer means we have no pages.  That can happen
      at any time (presumably when we carry FILE_CACHE).  We should not reset it
      to zero, and we should not check that it is zero.
      
      That means that the only purpose for rdcache_revoking is to resolve races
      between new issues of FILE_CACHE and an async invalidate.  If they are
      equal, we should invalidate.  On success, we decrement rdcache_revoking,
      so that it is no longer equal to rdcache_gen.  Similarly, if we succeed
      in doing a sync invalidate, set revoking = gen - 1.  (This is a small
      optimization to avoid doing unnecessary invalidate work and does not
      affect correctness.)
      Signed-off-by: Sage Weil <sage@newdream.net>
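      The async-invalidate race check then reduces to an equality test
      (simplified sketch, locking elided; the fields are real
      ceph_inode_info members):

      /* Only proceed if no new FILE_CACHE issue has bumped
       * i_rdcache_gen since this invalidate was queued. */
      if (ci->i_rdcache_revoking == ci->i_rdcache_gen) {
              invalidate_mapping_pages(&inode->i_data, 0, -1);
              ci->i_rdcache_revoking--;   /* no longer equal to gen */
      }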
    • ceph: re-request max_size if cap auth changes · feb4cc9b
      Committed by Sage Weil
      If the auth cap migrates to another MDS, clear requested_max_size so that
      we resend any pending max_size increase requests.  This fixes potential
      hangs on writes that extend a file and race with a cap migration between
      MDSs.
      Signed-off-by: Sage Weil <sage@newdream.net>
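      The fix amounts to forgetting the old request on migration
      (sketch; the field is a real ceph_inode_info member):

      /* Auth cap migrated: the new auth MDS knows nothing of the
       * max_size we asked the old one for, so clear the record and let
       * any pending increase be re-requested from the new auth MDS. */
      ci->i_requested_max_size = 0;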
    • ceph: only let auth caps update max_size · 912a9b03
      Committed by Sage Weil
      Only the auth MDS has a meaningful max_size value for us, so only update it
      in fill_inode if we're being issued an auth cap.  Otherwise, a random
      stat result from a non-auth MDS can clobber a meaningful max_size, get
      the client<->mds cap state out of sync, and make writes hang.
      
      Specifically, even if the client re-requests a larger max_size (which it
      will), the MDS won't respond because as far as it knows we already have a
      sufficiently large value.
      Signed-off-by: Sage Weil <sage@newdream.net>
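      In fill_inode the guard looks roughly like this (CEPH_CAP_FLAG_AUTH
      is the real wire flag; "cap_flags" is an illustrative local):

      /* Only the auth MDS tracks max_size for us; a stat reply served
       * by a non-auth MDS must not clobber it. */
      if (cap_flags & CEPH_CAP_FLAG_AUTH)
              ci->i_max_size = le64_to_cpu(info->max_size);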
    • ceph: fix open for write on clustered mds · 7421ab80
      Committed by Sage Weil
      Normally when we open a file we already have a cap, and simply update the
      wanted set.  However, if we open a file for write, but don't have an auth
      cap, that doesn't work; we need to open a new cap with the auth MDS.  Only
      reuse existing caps if we are opening for read or the existing cap is auth.
      Signed-off-by: Sage Weil <sage@newdream.net>
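      The reuse condition becomes, in spirit (paraphrased; the helper
      names and locals are illustrative, CEPH_FILE_MODE_WR is real):

      /* Reuse a cap we already hold only if this is not a write open,
       * or the cap actually came from the auth MDS; otherwise ask the
       * auth MDS for a proper write cap. */
      if (have_cap && (!(fmode & CEPH_FILE_MODE_WR) || cap_is_auth))
              return reuse_existing_cap(inode, file, fmode);
      return open_via_auth_mds(inode, file, fmode);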
    • ceph: fix bad pointer dereference in ceph_fill_trace · d8b16b3d
      Committed by Sage Weil
      We dereference *in a few lines down, but only set it on rename.  It is
      apparently pretty rare for this to trigger, but I have been hitting it
      with clustered MDSs.
      Signed-off-by: Sage Weil <sage@newdream.net>
  12. 29 Oct 2010 (1 commit)
  13. 28 Oct 2010 (1 commit)
    • Revert "ceph: update issue_seq on cap grant" · 2f56f56a
      Committed by Sage Weil
      This reverts commit d91f2438.
      
      The intent of issue_seq is to distinguish between mds->client messages that
      (re)create the cap and those that do not, which means we should _only_ be
      updating that value in the create paths.  By updating it in handle_cap_grant,
      we reset it to zero, which then breaks release.
      
      The larger question is what workload/problem made me think it should be
      updated here...
      Signed-off-by: Sage Weil <sage@newdream.net>
  14. 27 Oct 2010 (1 commit)
    • writeback: remove nonblocking/encountered_congestion references · 1b430bee
      Committed by Wu Fengguang
      This removes more dead code that was somehow missed by commit 0d99519e
      (writeback: remove unused nonblocking and congestion checks).  There is
      no behavior change except for the removal of two entries from one of the
      ext4 tracing interfaces.
      
      The nonblocking checks in ->writepages are no longer used because the
      flusher now prefers to block on get_request_wait() rather than skip inodes
      on IO congestion.  The latter would lead to more seeky IO.
      
      The nonblocking checks in ->writepage are no longer used because they are
      redundant with the WB_SYNC_NONE check.
      
      We no longer set ->nonblocking in VM page-out and page migration, because:
      a) it is effectively redundant with WB_SYNC_NONE in the current code;
      b) its old semantic of "don't get stuck on request queues" is a misbehavior:
         it would skip some dirty inodes on congestion and page out others, which
         is unfair in terms of LRU age.
      
      Inspired by Christoph Hellwig. Thanks!
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Sage Weil <sage@newdream.net>
      Cc: Steve French <sfrench@samba.org>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
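      For reference, this is the now-dead pattern being deleted from the
      various ->writepages implementations (illustrative reconstruction):

      /* Pre-removal pattern: skip the inode entirely when the backing
       * device is congested and the caller asked not to block. */
      if (wbc->nonblocking && bdi_write_congested(bdi)) {
              wbc->encountered_congestion = 1;
              return 0;       /* unfairly skips this inode's dirty pages */
      }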
  15. 21 Oct 2010 (8 commits)