1. 27 June 2016, 3 commits
    • gfs2: Lock holder cleanup · 6df9f9a2
      Committed by Andreas Gruenbacher
      Make the code more readable by cleaning up the different ways of
      initializing lock holders and checking for initialized lock holders:
      mark lock holders as uninitialized by setting the holder's glock to NULL
      (gfs2_holder_mark_uninitialized) instead of zeroing out the entire
      object or using a separate flag.  Recognize initialized holders by their
      non-NULL glock (gfs2_holder_initialized).  Don't zero out holder objects
      which are immediately initialized via gfs2_holder_init or
      gfs2_glock_nq_init.
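      A minimal sketch of the two helpers this describes, assuming the holder's
      glock pointer is the gh_gl field of struct gfs2_holder:

          /* Sketch only: a holder is "uninitialized" while its glock pointer is NULL */
          static inline void gfs2_holder_mark_uninitialized(struct gfs2_holder *gh)
          {
                  gh->gh_gl = NULL;
          }

          /* Sketch only: any holder that points at a glock counts as initialized */
          static inline bool gfs2_holder_initialized(struct gfs2_holder *gh)
          {
                  return gh->gh_gl;
          }

      Callers that previously zeroed the holder or kept a separate flag can then
      simply test gfs2_holder_initialized(&gh).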
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Get rid of gfs2_ilookup · ec5ec66b
      Committed by Andreas Gruenbacher
      Now that gfs2_lookup_by_inum only takes the inode glock for new inodes
      (and not for cached inodes anymore), there no longer is a need to
      optimize the cached-inode case in gfs2_get_dentry or delete_work_func,
      and gfs2_ilookup can be removed.
      
      In addition, gfs2_get_dentry wasn't checking the GFS2_DIF_SYSTEM flag in
      i_diskflags in the gfs2_ilookup case (see gfs2_lookup_by_inum); this
      inconsistency goes away as well.
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: Fix gfs2_lookup_by_inum lock inversion · 3ce37b2c
      Committed by Andreas Gruenbacher
      The current gfs2_lookup_by_inum takes the glock of a presumed inode
      identified by block number, verifies that the block is indeed an inode,
      and then instantiates and reads the new inode via gfs2_inode_lookup.
      
      However, instantiating a new inode may block on freeing a previous
      instance of that inode (__wait_on_freeing_inode), and freeing an inode
      requires taking the glock that is already held, leading to lock inversion and
      deadlock.
      
      Fix this by first instantiating the new inode, then verifying that the
      block is an inode (if required), and then reading in the new inode, all
      in gfs2_inode_lookup.
      
      If the block we are looking for is not an inode, we discard the new
      inode via iget_failed, which marks inodes as bad and unhashes them.
      Other tasks waiting on that inode will get a bad inode back from
      ilookup or iget_locked; in that case, retry the lookup.
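      A hedged sketch of the retry pattern described in the last paragraph
      (lookup_with_retry is an illustrative helper, not a function from the patch;
      I_NEW handling is omitted for brevity):

          static struct inode *lookup_with_retry(struct super_block *sb,
                                                 unsigned long ino)
          {
                  struct inode *inode;

          retry:
                  inode = iget_locked(sb, ino);
                  if (!inode)
                          return ERR_PTR(-ENOMEM);
                  if (is_bad_inode(inode)) {
                          /* got the instance discarded via iget_failed: drop it ... */
                          iput(inode);
                          /* ... and repeat the lookup */
                          goto retry;
                  }
                  return inode;
          }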
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  2. 13 April 2016, 1 commit
  3. 05 April 2016, 1 commit
  4. 24 March 2016, 1 commit
  5. 15 March 2016, 2 commits
  6. 14 January 2016, 1 commit
  7. 15 December 2015, 2 commits
  8. 17 November 2015, 1 commit
    • GFS2: Use rht_for_each_entry_rcu in glock_hash_walk · 3dd1dd8c
      Committed by Andrew Price
      This lockdep splat was being triggered on umount:
      
      [55715.973122] ===============================
      [55715.980169] [ INFO: suspicious RCU usage. ]
      [55715.981021] 4.3.0-11553-g8d3de01c-dirty #15 Tainted: G        W
      [55715.982353] -------------------------------
      [55715.983301] fs/gfs2/glock.c:1427 suspicious rcu_dereference_protected() usage!
      
      The code it refers to is the rht_for_each_entry_safe usage in
      glock_hash_walk. The condition that triggers the warning is
      lockdep_rht_bucket_is_held(tbl, hash) which is checked in the
      __rcu_dereference_protected macro.
      
      The rhashtable buckets are not changed in glock_hash_walk so it's safe
      to rely on the rcu protection. Replace the rht_for_each_entry_safe()
      usage with rht_for_each_entry_rcu(), which doesn't care whether the
      bucket lock is held if the rcu read lock is held.
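      A minimal sketch of the resulting walk (glock_examiner, gl_hash_table,
      gl_node and gl_name.ln_sbd follow the names used in fs/gfs2/glock.c at the
      time; reference counting on each glock is elided):

          static void glock_hash_walk(glock_examiner examiner, const struct gfs2_sbd *sdp)
          {
                  const struct bucket_table *tbl;
                  struct rhash_head *pos;
                  struct gfs2_glock *gl;
                  unsigned int i;

                  rcu_read_lock();
                  tbl = rht_dereference_rcu(gl_hash_table.tbl, &gl_hash_table);
                  for (i = 0; i < tbl->size; i++) {
                          /* needs only rcu_read_lock(), not the per-bucket lock */
                          rht_for_each_entry_rcu(gl, pos, tbl, i, gl_node) {
                                  if (gl->gl_name.ln_sbd == sdp)
                                          examiner(gl);
                          }
                  }
                  rcu_read_unlock();
          }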
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
  9. 30 October 2015, 1 commit
  10. 04 September 2015, 5 commits
  11. 19 June 2015, 1 commit
    • GFS2: Don't add all glocks to the lru · e7ccaf5f
      Committed by Bob Peterson
      The glocks used for resource groups often come and go hundreds of
      thousands of times per second. Adding them to the lru list just
      adds unnecessary contention for the lru_lock spin_lock, especially
      considering we're almost certainly going to re-use the glock and
      take it back off the lru microseconds later. We never want the
      glock shrinker to cull them anyway. This patch adds a new bit in
      the glops that determines which glock types get put onto the lru
      list and which ones don't.
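      In code terms the check amounts to something like the following sketch (the
      flag name GLOF_LRU and the demote_ok() test are assumptions about the
      implementation; resource group glops simply leave the flag unset):

          /* only glock types that opted in via their glops ever reach the LRU */
          if ((gl->gl_ops->go_flags & GLOF_LRU) && demote_ok(gl))
                  gfs2_glock_add_to_lru(gl);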
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
  12. 30 March 2015, 1 commit
  13. 21 January 2015, 1 commit
  14. 09 January 2015, 1 commit
  15. 18 November 2014, 1 commit
  16. 08 October 2014, 1 commit
  17. 18 July 2014, 4 commits
    • GFS2: Allow flocks to use normal glock dq rather than dq_wait · 5bef3e7c
      Committed by Bob Peterson
      This patch allows flock glocks to use a non-blocking dequeue rather
      than dq_wait. It also reverts the previous patch I had posted regarding
      dq_wait. The reverted patch isn't necessarily a bad idea, but I decided
      this might avoid unforeseen side effects, and was therefore safer.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Use GFP_NOFS when allocating glocks · fe0bbd29
      Committed by Steven Whitehouse
      Normally GFP_KERNEL is ok here, but there is now a rarely used code path
      relating to deallocation of unlinked inodes (in certain corner cases)
      which, if hit at times of memory shortage, can cause recursion while
      trying to free memory.
      
      One solution would be to try and move the gfs2_glock_get() call so
      that it is no longer called while another glock is held, but that
      doesn't look at all easy, so GFP_NOFS is the best solution for the
      time being.
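      The change itself is essentially a flag swap, along these lines (sketch; the
      slab cache name is assumed):

          /* GFP_NOFS keeps reclaim from re-entering the filesystem while the
           * caller is already holding a glock */
          gl = kmem_cache_alloc(gfs2_glock_cachep, GFP_NOFS);   /* was GFP_KERNEL */
          if (!gl)
                  return -ENOMEM;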
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix race in glock lru glock disposal · 94a09a39
      Committed by Steven Whitehouse
      We must not leave items on the LRU list with GLF_LOCK set, since
      they can be removed if the glock is brought back into use, which
      may then potentially result in a hang, waiting for GLF_LOCK to
      clear.
      
      It doesn't happen very often, since it requires a glock that has
      not been used for a long time to be brought back into use at the
      same moment that the shrinker is part way through disposing of
      glocks.
      
      The fix is to set GLF_LOCK at a later time, when we already know
      that the other locks can be obtained. Also, we now only release
      the lru_lock in case a resched is needed, rather than on every
      iteration.
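      A rough sketch of the new ordering inside the loop that disposes of glocks
      taken off the LRU (lock, flag and list names are assumed):

          list_del_init(&gl->gl_lru);
          if (!spin_trylock(&gl->gl_spin)) {
          add_back_to_lru:
                  /* could not get the locks: put the glock back, never leaving
                   * it on the LRU with GLF_LOCK set */
                  list_add(&gl->gl_lru, &lru_list);
                  continue;
          }
          if (test_and_set_bit(GLF_LOCK, &gl->gl_flags)) {
                  spin_unlock(&gl->gl_spin);
                  goto add_back_to_lru;
          }
          /* ... dispose of the glock; lru_lock is dropped only when a
           * reschedule is actually needed ... */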
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Only wait for demote when last holder is dequeued · 79272b35
      Committed by Bob Peterson
      Function gfs2_glock_dq_wait is supposed to dequeue a glock and then
      wait for the lock to be demoted. The problem is, if this is a shared
      lock, its demote will depend on the other holders, which means you
      might end up waiting forever because the other process is blocked.
      This problem is especially apparent when dealing with nested flocks.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  18. 16 July 2014, 1 commit
    • sched: Remove proliferation of wait_on_bit() action functions · 74316201
      Committed by NeilBrown
      The current "wait_on_bit" interface requires an 'action'
      function to be provided which does the actual waiting.
      There are over 20 such functions, many of them identical.
      Most cases can be satisfied by one of just two functions, one
      which uses io_schedule() and one which just uses schedule().
      
      So:
       Rename wait_on_bit and        wait_on_bit_lock to
              wait_on_bit_action and wait_on_bit_lock_action
       to make it explicit that they need an action function.
      
       Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
       which are *not* given an action function but implicitly use
       a standard one.
       The decision to error-out if a signal is pending is now made
       based on the 'mode' argument rather than being encoded in the action
       function.
      
       All instances of the old wait_on_bit and wait_on_bit_lock which
       can use the new version have been changed accordingly and their
       action functions have been discarded.
       wait_on_bit{_lock} does not return any specific error code in the
       event of a signal so the caller must check for non-zero and
       interpolate their own error code as appropriate.
      
      The wait_on_bit() call in __fscache_wait_on_invalidate() was
      ambiguous as it specified TASK_UNINTERRUPTIBLE but used
      fscache_wait_bit_interruptible as an action function.
      David Howells confirms this should be uniformly
      "uninterruptible"
      
      The main remaining user of wait_on_bit{,_lock}_action is NFS
      which needs to use a freezer-aware schedule() call.
      
      A comment in fs/gfs2/glock.c notes that having multiple 'action'
      functions is useful as they display differently in the 'wchan'
      field of 'ps' (and in /proc/$PID/wchan).
      As the new bit_wait{,_io} functions are tagged "__sched", they
      will not show up there; something higher in the stack will appear
      instead.  So the distinction will still be visible, only with different
      function names (gfs2_glock_wait versus gfs2_glock_dq_wait in the
      gfs2/glock.c case).
      
      Since the first version of this patch (against 3.15), two new action
      functions have appeared, one in NFS and one in CIFS.  CIFS also now
      uses an action function that makes the same freezer aware
      schedule call as NFS.
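      A before/after sketch of the interface change (word, bit and my_wait_action
      are placeholders, not names from the patch):

          /* old: the caller supplied an action function that did the waiting */
          wait_on_bit(word, bit, my_wait_action, TASK_UNINTERRUPTIBLE);

          /* new: the default schedule()-based wait is implied; the mode argument
           * alone decides whether a pending signal ends the wait early */
          if (wait_on_bit(word, bit, TASK_INTERRUPTIBLE))
                  return -EINTR;        /* caller picks its own error code */

          /* waits that used io_schedule() move to the _io variant instead */
          wait_on_bit_io(word, bit, TASK_UNINTERRUPTIBLE);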
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
      Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steve French <sfrench@samba.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  19. 18 April 2014, 1 commit
  20. 12 March 2014, 1 commit
  21. 07 March 2014, 2 commits
  22. 16 January 2014, 1 commit
    • GFS2: Don't use ENOBUFS when ENOMEM is the correct error code · ac3beb6a
      Committed by Steven Whitehouse
      Al Viro has tactfully pointed out that we are using the incorrect
      error code in some cases. This patch fixes that, and also removes
      the (unused) return value for glock dumping.
      
      >        * gfs2_iget() - ENOBUFS instead of ENOMEM.  ENOBUFS is
      > "No buffer space available (POSIX.1 (XSI STREAMS option))" and since
      > we don't support STREAMS it's probably fair game, but... what the hell?
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
  23. 02 January 2014, 1 commit
  24. 21 November 2013, 1 commit
  25. 15 October 2013, 1 commit
    • GFS2: Use lockref for glocks · e66cf161
      Committed by Steven Whitehouse
      Currently glocks have an atomic reference count and also a spinlock
      which covers various internal fields, such as the state. The intent of
      this patch is to replace the spinlock and the atomic reference count
      with a lockref structure. This contains a spinlock which we can continue
      to use as before, and a reference counter which is used in conjunction
      with the spinlock to replace the previous atomic counter.
      
      As a result of this there are some new rules for reference counting on
      glocks. We need to distinguish between reference count changes under
      gl_spin (which are now just increment or decrement of the new counter,
      provided the count cannot hit zero) and those which are outside of
      gl_spin, but which now take gl_spin internally.
      
      The conversion is relatively straightforward. There is probably some
      further clean up which can be done, but the priority at this stage is to
      make the change in as simple a manner as possible.
      
      A consequence of this change is that the reference count is being
      decoupled from the lru list processing. This should allow future
      adoption of the lru_list code with glocks in due course.
      
      The reason for using the "dead" state and not just relying on 0 being
      the "invalid state" is so that in due course 0 ref counts can be
      allowable. The intent is to eventually be able to remove the ref count
      changes which are currently hidden away in state_change().
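      The two reference-counting situations described above look roughly like this
      (sketch; the gl_lockref field name follows the conversion this describes):

          /* under gl_lockref.lock (the old gl_spin), when the count cannot hit
           * zero, a plain increment or decrement is enough */
          gl->gl_lockref.count++;

          /* without the lock held, the lockref helpers take it internally and
           * refuse glocks that are already dead */
          if (!lockref_get_not_dead(&gl->gl_lockref))
                  return;                 /* glock is being freed; treat as gone */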
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  26. 11 September 2013, 2 commits
    • fs: convert fs shrinkers to new scan/count API · 1ab6c499
      Committed by Dave Chinner
      Convert the filesystem shrinkers to use the new API, and standardise some
      of the behaviours of the shrinkers at the same time.  For example,
      nr_to_scan means the number of objects to scan, not the number of objects
      to free.
      
      I refactored the CIFS idmap shrinker a little - it really needs to be
      broken up into a shrinker per tree and keep an item count with the tree
      root so that we don't need to walk the tree every time the shrinker needs
      to count the number of objects in the tree (i.e.  all the time under
      memory pressure).
      
      [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
      [assorted fixes folded in]
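      For the glock shrinker, the converted shape is roughly the following
      (sketch; lru_count and gfs2_scan_glock_lru are assumed helper names):

          static unsigned long gfs2_glock_shrink_count(struct shrinker *shrink,
                                                       struct shrink_control *sc)
          {
                  /* report how many objects could be reclaimed */
                  return vfs_pressure_ratio(atomic_read(&lru_count));
          }

          static unsigned long gfs2_glock_shrink_scan(struct shrinker *shrink,
                                                      struct shrink_control *sc)
          {
                  if (!(sc->gfp_mask & __GFP_FS))
                          return SHRINK_STOP;
                  /* scan (not necessarily free) up to nr_to_scan objects and
                   * report how many were actually reclaimed */
                  return gfs2_scan_glock_lru(sc->nr_to_scan);
          }

          static struct shrinker glock_shrinker = {
                  .seeks         = DEFAULT_SEEKS,
                  .count_objects = gfs2_glock_shrink_count,
                  .scan_objects  = gfs2_glock_shrink_scan,
          };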
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • super: fix calculation of shrinkable objects for small numbers · 55f841ce
      Committed by Glauber Costa
      The sysctl knob sysctl_vfs_cache_pressure is used to determine which
      percentage of the shrinkable objects in our cache we should actively try
      to shrink.
      
      It works great in situations in which we have many objects (at least more
      than 100), because the approximation errors will be negligible.  But if
      this is not the case, especially when total_objects < 100, we may end up
      concluding that we have no objects at all (total / 100 = 0, if total <
      100).
      
      This is certainly not the biggest killer in the world, but may matter in
      very low kernel memory situations.
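      The fix amounts to multiplying before dividing, e.g. along these lines
      (sketch of the vfs_pressure_ratio() helper this series introduces):

          static inline unsigned long vfs_pressure_ratio(unsigned long val)
          {
                  /* mult_frac() computes val * num / den without first truncating
                   * val / den to zero: 50 objects at the default cache pressure of
                   * 100 now yield 50, where (50 / 100) * 100 used to yield 0 */
                  return mult_frac(val, sysctl_vfs_cache_pressure, 100);
          }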
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  27. 04 September 2013, 1 commit