1. 31 Mar 2011 (1 commit)
  2. 24 Mar 2011 (1 commit)
  3. 17 Mar 2011 (1 commit)
  4. 15 Mar 2011 (1 commit)
  5. 14 Mar 2011 (3 commits)
  6. 11 Mar 2011 (3 commits)
  7. 10 Mar 2011 (3 commits)
  8. 09 Mar 2011 (3 commits)
    • GFS2: Remove potential race in flock code · 0a33443b
      Steven Whitehouse committed
      This patch ensures that we always wait for glock demotion when
      dropping flocks on a file in order to prevent any race
      conditions associated with further flock calls or closing
      the file.
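
      A hedged sketch of the unlock path after this change, built around the
      gfs2_glock_dq_wait() helper (simplified, not the literal diff):

          static void do_unflock(struct file *file, struct file_lock *fl)
          {
                  struct gfs2_file *fp = file->private_data;
                  struct gfs2_holder *fl_gh = &fp->f_fl_gh;

                  mutex_lock(&fp->f_fl_mutex);
                  flock_lock_file_wait(file, fl);
                  if (fl_gh->gh_gl) {
                          /* wait for the glock demotion to finish, so a
                             later flock call or close cannot race with it */
                          gfs2_glock_dq_wait(fl_gh);
                          gfs2_holder_reinit(LM_ST_UNLOCKED, 0, fl_gh);
                  }
                  mutex_unlock(&fp->f_fl_mutex);
          }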
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix glock deallocation race · fc0e38da
      Steven Whitehouse committed
      This patch fixes a race in deallocating glocks which was introduced
      in the RCU glock patch. We need to ensure that the glock count is
      kept correct even in the case that there is a race to add a new
      glock into the hash table. Also, to avoid having to wait for an
      RCU grace period, the glock counter can be decremented before
      call_rcu() is called.
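
      A hedged sketch of the ordering described above (field names such as
      sd_glock_disposal follow the GFS2 code of this era; illustration only):

          static void gfs2_glock_dealloc(struct rcu_head *rcu)
          {
                  struct gfs2_glock *gl = container_of(rcu, struct gfs2_glock, gl_rcu);

                  kmem_cache_free(gfs2_glock_cachep, gl);
          }

          void gfs2_glock_free(struct gfs2_glock *gl)
          {
                  struct gfs2_sbd *sdp = gl->gl_sbd;

                  /* decrement before call_rcu(): nobody has to sit out a
                     full RCU grace period just to see the count drop */
                  if (atomic_dec_and_test(&sdp->sd_glock_disposal))
                          wake_up(&sdp->sd_glock_wait);
                  call_rcu(&gl->gl_rcu, gfs2_glock_dealloc);
          }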
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: quota allows exceeding hard limit · 662e3a55
      Abhijith Das committed
      Immediately after being synced to disk, cached quotas are zeroed out and a
      subsequent access of the cached quotas results in incorrect zero values. This
      meant that gfs2 assumed the actual usage to be the zero (or near-zero) usage
      values it found in the cached quotas and comparison against warn/limits never
      triggered a quota violation.
      
      This patch adds a new flag QDF_REFRESH that is set after a sync so that the
      cached quotas are forcefully refreshed from disk on a subsequent access on
      seeing this flag set.
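
      In sketch form, the handshake is two fragments (QDF_REFRESH is from the
      patch; qd_refresh_from_disk() is a hypothetical stand-in for the re-read):

          /* after a sync has zeroed out the cached values: */
          set_bit(QDF_REFRESH, &qd->qd_flags);

          /* on the next access of the cached quota: */
          if (test_and_clear_bit(QDF_REFRESH, &qd->qd_flags))
                  error = qd_refresh_from_disk(qd);   /* hypothetical helper */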
      
      Resolves: rhbz#675944
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  9. 24 Feb 2011 (2 commits)
    • GFS2: deallocation performance patch · 4c16c36a
      Bob Peterson committed
      This patch is a performance improvement to GFS2's dealloc code.
      Rather than updating the quota file and statfs file for every
      single block that's stripped off in the unlink function do_strip,
      this patch keeps track of the changes and updates them once for every
      layer that's stripped.  This is done entirely inside the existing
      transaction, so there should be no risk of corruption.
      The other functions that deallocate blocks will be unaffected
      because they are using wrapper functions that do the same
      thing that they do today.
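
      A hedged sketch of the batching idea (free_one_block() is a hypothetical
      stand-in that frees a block without touching the counters; the two real
      counter updates then happen once per layer):

          static void strip_one_layer(struct gfs2_inode *ip, const u64 *block,
                                      unsigned int nblocks)
          {
                  struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
                  u64 btotal = 0;
                  unsigned int i;

                  for (i = 0; i < nblocks; i++) {
                          free_one_block(ip, block[i]);   /* hypothetical */
                          btotal++;
                  }
                  if (btotal) {
                          /* one update per layer, same transaction */
                          gfs2_statfs_change(sdp, 0, (s64)btotal, 0);
                          gfs2_quota_change(ip, -(s64)btotal,
                                            ip->i_inode.i_uid, ip->i_inode.i_gid);
                  }
          }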
      
      I tested this code on my roth cluster by creating 200
      files in a directory, each of which is 100MB, then on
      four nodes, I simultaneously deleted the files, thus competing
      for GFS2 resources (but different files).  The commands
      I used were:
      
      [root@roth-01]# time for i in `seq 1 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-02]# time for i in `seq 2 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-03]# time for i in `seq 3 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      [root@roth-05]# time for i in `seq 4 4 200` ; do rm /mnt/gfs2/bigdir/gfs2.$i; done
      
      The performance increase was significant:
      
                   roth-01     roth-02     roth-03     roth-05
                   ---------   ---------   ---------   ---------
      old: real    0m34.027s   0m25.021s   0m23.906s   0m35.646s
      new: real    0m22.379s   0m24.362s   0m24.133s   0m18.562s
      
      Total time spent deleting:
      old: 118.6s
      new:  89.4s
      
      For this particular case, this showed a 25% performance increase for
      GFS2 unlinks.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • mm: prevent concurrent unmap_mapping_range() on the same inode · 2aa15890
      Miklos Szeredi committed
      Michael Leun reported that running parallel opens on a fuse filesystem
      can trigger a "kernel BUG at mm/truncate.c:475"
      
      Gurudas Pai reported the same bug on NFS.
      
      The reason is, unmap_mapping_range() is not prepared for more than
      one concurrent invocation per inode.  For example:
      
        thread1: going through a big range, stops in the middle of a vma and
           stores the restart address in vm_truncate_count.
      
        thread2: comes in with a small (e.g. single page) unmap request on
           the same vma, somewhere before restart_address, finds that the
           vma was already unmapped up to the restart address and happily
           returns without doing anything.
      
      Another scenario would be two big unmap requests, both having to
      restart the unmapping and each one setting vm_truncate_count to its
      own value.  This could go on forever without any of them being able to
      finish.
      
      Truncate and hole punching already serialize with i_mutex.  Other
      callers of unmap_mapping_range() do not, and it's difficult to get
      i_mutex protection for all callers.  In particular ->d_revalidate(),
      which calls invalidate_inode_pages2_range() in fuse, may be called
      with or without i_mutex.
      
      This patch adds a new mutex to 'struct address_space' to prevent
      running multiple concurrent unmap_mapping_range() on the same mapping.
      
      [ We'll hopefully get rid of all this with the upcoming mm
        preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
        lockbreak" patch in particular.  But that is for 2.6.39 ]
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Reported-by: Michael Leun <lkml20101129@newton.leun.net>
      Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
      Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 17 Feb 2011 (1 commit)
  11. 08 Feb 2011 (1 commit)
  12. 02 Feb 2011 (2 commits)
    • GFS2: Improve cluster mmap scalability · b9c93bb7
      Steven Whitehouse committed
      The mmap system call grabs a glock when an update to atime may be
      required. It does this in order to ensure that the flags on the
      inode are uptodate, but since it will only mark atime for a future
      update, an exclusive lock is not required here (one will be taken
      later when the actual update is performed).
      
      Also, the lock can now be skipped when the mount is marked noatime,
      in addition to the original check, which only looked at the noatime
      flag for the inode itself.
      
      This should increase the scalability of the mmap call when multiple
      nodes are all mmapping the same file.
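
      A hedged sketch of the resulting gfs2_mmap() (close to, but not
      literally, the merged code):

          static int gfs2_mmap(struct file *file, struct vm_area_struct *vma)
          {
                  struct gfs2_inode *ip = GFS2_I(file->f_mapping->host);

                  if (!(file->f_flags & O_NOATIME) && !IS_NOATIME(&ip->i_inode)) {
                          struct gfs2_holder i_gh;
                          int error;

                          /* shared, not exclusive: atime is only marked
                             for a future update here */
                          error = gfs2_glock_nq_init(ip->i_gl, LM_ST_SHARED,
                                                     LM_FLAG_ANY, &i_gh);
                          if (error)
                                  return error;
                          gfs2_glock_dq_uninit(&i_gh);
                  }
                  vma->vm_ops = &gfs2_vm_ops;
                  return 0;
          }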
      Reported-by: Scooter Morris <scooter@cgl.ucsf.edu>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • fs/vfs/security: pass last path component to LSM on inode creation · 2a7dba39
      Eric Paris committed
      SELinux would like to implement a new labeling behavior of newly created
      inodes.  We currently label new inodes based on the parent and the creating
      process.  This new behavior would also take into account the name of the
      new object when deciding the new label.  This is not the (supposed) full path,
      just the last component of the path.
      
      This is very useful because creating /etc/shadow is different than creating
      /etc/passwd but the kernel hooks are unable to differentiate these
      operations.  We currently require that userspace realize it is doing some
      difficult operation like that and then jump through SELinux hoops
      to get things set up correctly.  This patch does not implement the new
      behavior (that is contained in a separate SELinux patch), but it
      does pass the needed name down to the correct LSM hook.  If no such name
      exists it is fine to pass NULL.
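
      The visible interface change, roughly (the qstr parameter carries the
      last path component; local variable names in the caller sketch are
      illustrative):

          int security_inode_init_security(struct inode *inode, struct inode *dir,
                                           const struct qstr *qstr,
                                           char **name, void **value, size_t *len);

          /* a typical caller passes the dentry's name: */
          error = security_inode_init_security(inode, dir, &dentry->d_name,
                                               &suffix, &value, &len);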
      Signed-off-by: Eric Paris <eparis@redhat.com>
  13. 31 Jan 2011 (1 commit)
  14. 21 Jan 2011 (2 commits)
    • GFS2: Post-VFS scale update for RCU path walk · 75d5cfbe
      Steven Whitehouse committed
      We can allow a few more cases to use RCU path walking than
      were originally allowed. It should be possible to also enable
      RCU path walking when the glock is already cached. That's
      a bit more complicated though, so it is left for a future patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Nick Piggin <npiggin@gmail.com>
    • GFS2: Use RCU for glock hash table · bc015cb8
      Steven Whitehouse committed
      This has a number of advantages:
      
       - Reduces contention on the hash table lock
       - Makes the code smaller and simpler
       - Should speed up glock dumps when under load
       - Removes ref count changing in examine_bucket
       - No longer need hash chain lock in glock_put() in common case
      
      There are some further changes which this enables and which
      we may do in the future. One is to look at using SLAB_RCU,
      and another is to look at using a per-cpu counter for the
      per-sb glock counter, since that is touched twice in the
      lifetime of each glock (but only used at umount time).
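
      A hedged sketch of an RCU-protected bucket search after this change
      (helper and field names follow the GFS2 code of this era):

          /* caller holds rcu_read_lock() */
          static struct gfs2_glock *search_bucket(struct hlist_bl_head *head,
                                                  const struct lm_lockname *name)
          {
                  struct gfs2_glock *gl;
                  struct hlist_bl_node *pos;

                  hlist_bl_for_each_entry_rcu(gl, pos, head, gl_list) {
                          if (!lm_name_equal(&gl->gl_name, name))
                                  continue;
                          /* grab a reference only if the glock is still live */
                          if (atomic_inc_not_zero(&gl->gl_ref))
                                  return gl;
                  }
                  return NULL;
          }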
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
  15. 18 Jan 2011 (2 commits)
    • GFS2: Fix error path in gfs2_lookup_by_inum() · 24d9765f
      Steven Whitehouse committed
      In the error path in gfs2_lookup_by_inum() (impossible to hit unless
      there is fs corruption), if the call to gfs2_inode_refresh()
      failed, the function returned via iput() rather
      than via iget_failed(). This would cause future lookups of the same
      inode to block forever.
      
      This patch fixes the problem by moving the call to gfs2_inode_refresh()
      into gfs2_inode_lookup() where iget_failed() is part of the error path
      already. Also this cleans up some unreachable code and makes
      gfs2_set_iop() static.
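
      In sketch form, the error path becomes (fragment, simplified):

          error = gfs2_inode_refresh(GFS2_I(inode));
          if (error)
                  goto fail_refresh;
          /* ... */
      fail_refresh:
          /* iget_failed() marks the inode bad, unlocks it, and wakes
             anyone waiting on I_NEW, unlike a bare iput() */
          iget_failed(inode);
          return ERR_PTR(error);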
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: remove iopen glocks from cache on failed deletes · 23c30108
      Benjamin Marzinski committed
      When a file gets deleted on GFS2, if a node can't get an exclusive lock on the
      file's iopen glock, it punts on actually freeing up the space, because another
      node is using the file.  When it does this, it needs to drop the iopen glock
      from its cache so that the other node can get an exclusive lock on it. Now,
      gfs2_delete_inode() sets GL_NOCACHE before dropping the shared lock on the
      iopen glock in preparation for grabbing it in the exclusive state.  Since the
      node needs the glock in the exclusive state, dropping the shared lock from the
      cache doesn't slow down the case where no other nodes are using the file.
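
      The core of the change is a two-line sketch in the delete path, roughly:

          /* drop the shared iopen holder out of this node's cache so
             another node can immediately acquire the glock exclusively */
          ip->i_iopen_gh.gh_flags |= GL_NOCACHE;
          gfs2_glock_dq(&ip->i_iopen_gh);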
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  16. 17 Jan 2011 (2 commits)
    • fallocate should be a file operation · 2fe17c10
      Christoph Hellwig committed
      Currently all filesystems except XFS implement fallocate asynchronously,
      while XFS forced a commit.  Both of these are suboptimal - in case of O_SYNC
      I/O we really want our allocation on disk, especially for the !KEEP_SIZE
      case where we actually grow the file with user-visible zeroes.  On the
      other hand always committing the transaction is a bad idea for fast-path
      uses of fallocate like for example in recent Samba versions.   Given
      that block allocation is a data plane operation anyway change it from
      an inode operation to a file operation so that we have the file structure
      available that lets us check for O_SYNC.
      
      This also includes moving the code around for a few of the filesystems,
      and removing the already unneeded S_ISDIR checks, given that we only wire
      up fallocate for regular files.
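
      After this change the method lives in struct file_operations with this
      signature:

          struct file_operations {
                  /* ... */
                  long (*fallocate)(struct file *file, int mode,
                                    loff_t offset, loff_t len);
          };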
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • make the feature checks in ->fallocate future proof · 64c23e86
      Christoph Hellwig committed
      Instead of various home grown checks that might need updates for new
      flags just check for any bit outside the mask of the features supported
      by the filesystem.  This makes the check future proof for any newly
      added flag.
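
      A hedged sketch of the pattern in a filesystem's ->fallocate (the
      myfs_* names are hypothetical):

          #define MYFS_FALLOC_SUPPORTED   (FALLOC_FL_KEEP_SIZE)

          static long myfs_fallocate(struct file *file, int mode,
                                     loff_t offset, loff_t len)
          {
                  /* any bit outside the supported mask, including flags
                     added in the future, is rejected in one place */
                  if (mode & ~MYFS_FALLOC_SUPPORTED)
                          return -EOPNOTSUPP;
                  /* ... perform the allocation ... */
                  return 0;
          }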
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  17. 13 Jan 2011 (2 commits)
  18. 11 Jan 2011 (1 commit)
  19. 07 Jan 2011 (6 commits)
    • fs: provide rcu-walk aware permission i_ops · b74c79e9
      Nick Piggin committed
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: rcu-walk aware d_revalidate method · 34286d66
      Nick Piggin committed
      Require filesystems be aware of .d_revalidate being called in rcu-walk
      mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
      -ECHILD from all implementations.
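
      The push down looks like this in each filesystem (the myfs_ name is
      hypothetical; signature as of this era):

          static int myfs_d_revalidate(struct dentry *dentry, struct nameidata *nd)
          {
                  if (nd && (nd->flags & LOOKUP_RCU))
                          return -ECHILD; /* punt back to ref-walk mode */
                  /* ... normal, possibly blocking, revalidation ... */
                  return 1;
          }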
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: dcache reduce branches in lookup path · fb045adb
      Nick Piggin committed
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
      
      Patched with:
      
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
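
      The helper that the sed run installs caches which ops exist as dentry
      flags, roughly:

          void d_set_d_op(struct dentry *dentry, const struct dentry_operations *op)
          {
                  dentry->d_op = op;
                  if (!op)
                          return;
                  /* one flag test in the fast path instead of two
                     dependent pointer loads */
                  if (op->d_hash)
                          dentry->d_flags |= DCACHE_OP_HASH;
                  if (op->d_compare)
                          dentry->d_flags |= DCACHE_OP_COMPARE;
                  if (op->d_revalidate)
                          dentry->d_flags |= DCACHE_OP_REVALIDATE;
                  if (op->d_delete)
                          dentry->d_flags |= DCACHE_OP_DELETE;
          }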
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: icache RCU free inodes · fa0d7e3d
      Nick Piggin committed
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downsides of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
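
      The mechanism itself is small; in sketch form:

          static void i_callback(struct rcu_head *head)
          {
                  struct inode *inode = container_of(head, struct inode, i_rcu);

                  kmem_cache_free(inode_cachep, inode);
          }

          static void destroy_inode(struct inode *inode)
          {
                  /* ... */
                  /* defer the actual free until after a grace period, so
                     an rcu-walk can still read the inode it holds */
                  call_rcu(&inode->i_rcu, i_callback);
          }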
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: change d_hash for rcu-walk · b1e6a015
      Nick Piggin committed
      Change d_hash so it may be called from lock-free RCU lookups. See similar
      patch for d_compare for details.
      
      For in-tree filesystems, this is just a mechanical change.
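
      The reworked op makes both objects const, since the hook may now run
      with no locks held:

          int (*d_hash)(const struct dentry *dentry, const struct inode *inode,
                        struct qstr *name);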
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: change d_delete semantics · fe15ce44
      Nick Piggin committed
      Change d_delete from a dentry deletion notification to a dentry caching
      advice, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
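
      After the change a d_delete implementation reduces to a pure predicate
      (the myfs_ name is hypothetical):

          /* constant, idempotent, called without d_lock; returning 1 asks
             the VFS not to cache this dentry */
          static int myfs_d_delete(const struct dentry *dentry)
          {
                  return 1;
          }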
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  20. 16 Dec 2010 (1 commit)
    • GFS2: Don't flush delete workqueue when releasing the transaction lock · 846f4045
      Steven Whitehouse committed
      There is no requirement to flush the delete workqueue before a
      gfs2 filesystem is suspended. The workqueue's work will just
      be suspended along with the rest of the tasks on the filesystem.
      
      This resolves a deadlock situation where the transaction lock's
      demotion code was trying to flush the delete workqueue while at
      the same time, the workqueue was waiting for the transaction
      lock.
      
      The delete workqueue is flushed by gfs2_make_fs_ro() already, so
      that umount/remount are correctly protected anyway.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  21. 08 Dec 2010 (1 commit)
    • GFS2: fsck.gfs2 reported statfs error after gfs2_grow · bcd7278d
      Bob Peterson committed
      When you ran gfs2_grow, it failed to take the very last
      rgrp into account when adding up the new free space, due
      to an off-by-one error.  It was not reading the last
      rgrp from the rindex because of a check for "<=" that
      should have been "<".  Therefore, fsck.gfs2 was finding
      (and fixing) an error with the system statfs file.
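
      A hedged sketch of the loop bound in question (simplified from the
      rindex reader):

          static unsigned int count_rgrps(loff_t i_size)
          {
                  unsigned int rgrps;

                  for (rgrps = 0;; rgrps++) {
                          loff_t pos = rgrps * sizeof(struct gfs2_rindex);

                          /* the old code used ">=", which stops one entry
                             early and never counts the final rgrp */
                          if (pos + sizeof(struct gfs2_rindex) > i_size)
                                  break;
                          /* ... read and tally rindex entry "rgrps" ... */
                  }
                  return rgrps;
          }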
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>