1. 11 Sep, 2013 · 25 commits
    • super: fix for destroy lrus · f5e1dd34
      Glauber Costa committed
      This patch adds the missing call to list_lru_destroy (spotted by Li Zhong)
      and moves the deletion to after the shrinker is unregistered, as correctly
      spotted by Dave.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • list_lru: dynamically adjust node arrays · 5ca302c8
      Glauber Costa committed
      We currently use a compile-time constant to size the node array for the
      list_lru structure.  Due to this, we don't need to allocate any memory at
      initialization time.  But as a consequence, the structures that contain
      embedded list_lru lists can become way too big (the superblock for
      instance contains two of them).
      
      This patch aims at ameliorating this situation by dynamically allocating
      the node arrays with the firmware-provided nr_node_ids.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
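A minimal user-space sketch of the idea behind this change, not the kernel's `struct list_lru` code: the hypothetical `struct toy_lru` embeds only a pointer, and `toy_lru_init()` sizes the per-node array at run time from a node count (here a plain variable standing in for `nr_node_ids`).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's nr_node_ids (number of possible
 * NUMA nodes, known only at boot). Here it is just a variable. */
static int nr_node_ids = 2;

/* Per-node part of the LRU; a real one would also hold a lock and a list. */
struct toy_lru_node {
	long nr_items;
};

/* Only a pointer is embedded, so structures that contain a toy_lru
 * (the commit cites the superblock, which holds two) stay small. */
struct toy_lru {
	struct toy_lru_node *node;
};

/* Size the node array at init time instead of at compile time. */
static int toy_lru_init(struct toy_lru *lru)
{
	lru->node = calloc((size_t)nr_node_ids, sizeof(*lru->node));
	return lru->node ? 0 : -1;
}

static void toy_lru_destroy(struct toy_lru *lru)
{
	free(lru->node);
	lru->node = NULL;
}

/* Total item count across all nodes. */
static long toy_lru_count(const struct toy_lru *lru)
{
	long total = 0;

	assert(lru->node != NULL);
	for (int nid = 0; nid < nr_node_ids; nid++)
		total += lru->node[nid].nr_items;
	return total;
}
```

With two nodes holding 3 and 4 items, `toy_lru_count()` reports 7. The trade-off is one allocation at init time (which can fail, hence the return code) in exchange for a much smaller containing structure.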
    • fs: convert fs shrinkers to new scan/count API · 1ab6c499
      Dave Chinner committed
      Convert the filesystem shrinkers to use the new API, and standardise some
      of the behaviours of the shrinkers at the same time.  For example,
      nr_to_scan means the number of objects to scan, not the number of objects
      to free.
      
      I refactored the CIFS idmap shrinker a little - it really needs to be
      broken up into a shrinker per tree and keep an item count with the tree
      root so that we don't need to walk the tree every time the shrinker needs
      to count the number of objects in the tree (i.e.  all the time under
      memory pressure).
      
      [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
      [assorted fixes folded in]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
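To illustrate the count/scan split and the rule that `nr_to_scan` counts objects examined rather than objects freed, here is a user-space sketch. `struct toy_shrink_control`, `toy_count_objects()` and `toy_scan_objects()` are hypothetical names; they only mimic the roles of the `count_objects`/`scan_objects` callbacks and are not the kernel API.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CACHE_MAX 16

/* Hypothetical mirror of the scan request: how many objects to LOOK AT. */
struct toy_shrink_control {
	unsigned long nr_to_scan;
};

struct toy_cache {
	int nr;                         /* number of cached objects */
	bool in_use[TOY_CACHE_MAX];     /* busy objects are scanned but kept */
};

/* count: cheap report of how many objects the cache currently holds. */
static unsigned long toy_count_objects(struct toy_cache *c,
				       struct toy_shrink_control *sc)
{
	(void)sc;
	assert(c->nr >= 0 && c->nr <= TOY_CACHE_MAX);
	return (unsigned long)c->nr;
}

/* scan: examine up to sc->nr_to_scan objects, return how many were freed. */
static unsigned long toy_scan_objects(struct toy_cache *c,
				      struct toy_shrink_control *sc)
{
	unsigned long scanned = 0, freed = 0;
	int i = 0;

	while (i < c->nr && scanned < sc->nr_to_scan) {
		scanned++;
		if (c->in_use[i]) {
			i++;            /* busy: scanned, but not freed */
			continue;
		}
		/* "Free" object i by shifting the rest of the array down. */
		for (int j = i; j < c->nr - 1; j++)
			c->in_use[j] = c->in_use[j + 1];
		c->nr--;
		freed++;
	}
	return freed;
}
```

With five objects where the first and third are busy, a request to scan three frees only one: the scan stops after looking at three objects, and the caller learns separately how many were actually released.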
    • xfs: fix dquot isolation hang · 35163417
      Dave Chinner committed
      The new LRU list isolation code in xfs_qm_dquot_isolate() isn't
      completely up to date.  Firstly, it needs conversion to return enum
      lru_status values, not raw numbers. Secondly - most importantly - it
      fails to unlock the dquot and relock the LRU in the LRU_RETRY path.
      This leads to deadlocks in xfstests generic/232. Fix them.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
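The locking rule behind this fix can be sketched in single-threaded C. Locks are modeled as hypothetical `struct toy_lock` flags that assert on double-lock or double-unlock, and `toy_dquot_isolate()` illustrates the rule rather than reproducing `xfs_qm_dquot_isolate()`: the callback is entered with the LRU lock held and, on every path including the retry path, must return with that lock held and the per-object lock released.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock that only tracks ownership and asserts on misuse. */
struct toy_lock {
	bool held;
};

static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_release(struct toy_lock *l) { assert(l->held); l->held = false; }

static bool toy_tryacquire(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Return codes modeled on the kernel's enum lru_status. */
enum toy_lru_status { TOY_LRU_REMOVED, TOY_LRU_ROTATE, TOY_LRU_SKIP, TOY_LRU_RETRY };

struct toy_dquot {
	struct toy_lock lock;
	bool dirty;     /* must be written back before it can be reclaimed */
	bool on_lru;
};

/* Entered with lru_lock held; returns with lru_lock held again and the
 * dquot's own lock released, on every path. */
static enum toy_lru_status toy_dquot_isolate(struct toy_dquot *dq,
					     struct toy_lock *lru_lock)
{
	if (!toy_tryacquire(&dq->lock))
		return TOY_LRU_SKIP;

	if (dq->dirty) {
		toy_release(lru_lock);   /* writeback may block: drop the LRU lock */
		dq->dirty = false;       /* stand-in for flushing the dquot */
		/* The part the fix adds: unlock the dquot, relock the LRU. */
		toy_release(&dq->lock);
		toy_acquire(lru_lock);
		return TOY_LRU_RETRY;
	}

	dq->on_lru = false;
	toy_release(&dq->lock);
	return TOY_LRU_REMOVED;
}
```

The first call flushes the dirty dquot and returns `TOY_LRU_RETRY` with the LRU lock re-taken; the retry then removes it. Dropping the two lines after the "fix" comment would leave the dquot locked and the LRU lock released, the kind of imbalance the commit message ties to the hang.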
    • xfs-convert-dquot-cache-lru-to-list_lru-fix · 2f5b56f8
      Andrew Morton committed
      fix warnings
      
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Glauber Costa <glommer@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs: convert dquot cache lru to list_lru · cd56a39a
      Dave Chinner committed
      Convert the XFS dquot lru to use the list_lru construct and convert the
      shrinker to being node aware.
      
      [glommer@openvz.org: edited for conflicts + warning fixes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs: rework buffer dispose list tracking · a4082357
      Dave Chinner committed
      In converting the buffer lru lists to use the generic code, the locking
      for marking the buffers as on the dispose list was lost.  This results in
      confusion in LRU buffer tracking and accounting, resulting in reference
      counts being mucked up and the filesystem being unmountable.
      
      To fix this, introduce an internal buffer spinlock to protect the state
      field that holds the dispose list information.  Because there is now
      locking needed around xfs_buf_lru_add/del, and they are used in exactly
      one place each two lines apart, get rid of the wrappers and code the logic
      directly in place.
      
      Further, the LRU emptying code used on unmount is less than optimal.
      Convert it to use a dispose list as per a normal shrinker walk, and repeat
      the walk that fills the dispose list until the LRU is empty.  This avoids
      needing to drop and regain the LRU lock for every item being freed, and
      allows the same logic as the shrinker isolate call to be used.  Simpler,
      easier to understand.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
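A hypothetical sketch of why the dispose-list state needs its own lock: a per-buffer `b_lock` guards a `b_state` flag, so the release path can tell that a buffer has already been taken off the LRU and must not be subtracted from the LRU count again. The names `toy_buf`, `TOY_BSTATE_DISPOSE` and the two helpers are assumptions for illustration, and the toy assumes every buffer starts out on the LRU.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BSTATE_DISPOSE (1u << 0)   /* hypothetical: buffer is on a dispose list */

/* Toy lock that only tracks ownership and asserts on misuse. */
struct toy_lock { bool held; };
static void toy_acquire(struct toy_lock *l) { assert(!l->held); l->held = true; }
static void toy_release(struct toy_lock *l) { assert(l->held); l->held = false; }

struct toy_buf {
	struct toy_lock b_lock;   /* protects b_state */
	unsigned int b_state;
};

struct toy_lru { long nr_items; };

/* Shrinker/unmount side: take the buffer off the LRU onto a dispose list. */
static void toy_buf_isolate(struct toy_lru *lru, struct toy_buf *bp)
{
	toy_acquire(&bp->b_lock);
	assert(!(bp->b_state & TOY_BSTATE_DISPOSE));
	bp->b_state |= TOY_BSTATE_DISPOSE;
	lru->nr_items--;
	toy_release(&bp->b_lock);
}

/* Release side: only adjust LRU accounting if the buffer is still on it. */
static void toy_buf_lru_del(struct toy_lru *lru, struct toy_buf *bp)
{
	toy_acquire(&bp->b_lock);
	if (!(bp->b_state & TOY_BSTATE_DISPOSE))
		lru->nr_items--;
	toy_release(&bp->b_lock);
}
```

After isolation the LRU count is 0; the later `toy_buf_lru_del()` sees the dispose flag under the buffer lock and leaves the count at 0 instead of driving it to -1.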
    • xfs-convert-buftarg-lru-to-generic-code-fix · addbda40
      Andrew Morton committed
      fix warnings
      
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Glauber Costa <glommer@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xfs: convert buftarg LRU to generic code · e80dfa19
      Dave Chinner committed
      Convert the buftarg LRU to use the new generic LRU list and take advantage
      of the functionality it supplies to make the buffer cache shrinker node
      aware.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: convert inode and dentry shrinking to be node aware · 9b17c623
      Dave Chinner committed
      Now that the shrinker is passing a node in the scan control structure, we
      can pass this to the generic LRU list code to isolate reclaim to the
      lists on matching nodes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@parallels.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • shrinker: add node awareness · 0ce3d744
      Dave Chinner committed
      Pass the node of the current zone being reclaimed to shrink_slab(),
      allowing the shrinker control nodemask to be set appropriately for node
      aware shrinkers.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
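A user-space sketch of what node awareness means for a shrinker: the caller walks a node mask and passes each node id in the control structure, so the shrinker counts and scans only that node's objects. Everything here (`toy_shrink_control`, `toy_node_objects`, `toy_shrink_nodes()`) is hypothetical and only stands in for `shrink_slab()` and a node-aware cache.

```c
#include <assert.h>

#define TOY_MAX_NODES 4

/* Hypothetical mirror of the scan request, now carrying a node id. */
struct toy_shrink_control {
	unsigned long nr_to_scan;
	int nid;
};

/* Per-node object counts of a hypothetical node-aware cache. */
static unsigned long toy_node_objects[TOY_MAX_NODES] = { 10, 0, 6, 8 };

static unsigned long toy_count(struct toy_shrink_control *sc)
{
	assert(sc->nid >= 0 && sc->nid < TOY_MAX_NODES);
	return toy_node_objects[sc->nid];
}

/* Every scanned object is freeable in this toy, so freed == scanned. */
static unsigned long toy_scan(struct toy_shrink_control *sc)
{
	unsigned long avail = toy_count(sc);
	unsigned long freed = sc->nr_to_scan < avail ? sc->nr_to_scan : avail;

	toy_node_objects[sc->nid] -= freed;
	return freed;
}

/* Call the shrinker once per node set in nodemask, telling it which node. */
static unsigned long toy_shrink_nodes(unsigned long nodemask, unsigned long batch)
{
	unsigned long freed = 0;

	for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
		struct toy_shrink_control sc = { .nr_to_scan = batch, .nid = nid };

		if (!(nodemask & (1UL << nid)) || toy_count(&sc) == 0)
			continue;
		freed += toy_scan(&sc);
	}
	return freed;
}
```

With mask `0x5` (nodes 0 and 2) and a batch of 4, nodes 0 and 2 each lose 4 objects and node 3 is left untouched, so reclaim pressure on one node no longer drains caches on the others.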
    • list_lru: remove special case function list_lru_dispose_all. · 4e717f5c
      Glauber Costa committed
      The list_lru implementation has one function, list_lru_dispose_all, with
      only one user (the dentry code).  At first, such function appears to make
      sense because we are really not interested in the result of isolating each
      dentry separately - all of them are going away anyway.  However, its
      implementation is buggy in the following way:
      
      When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries
      marking them with DCACHE_SHRINK_LIST.  However, this is done without the
      nlru->lock taken.  The immediate result of that is that someone else may
      add or remove the dentry from the LRU at the same time.  When list_lru_del
      happens in that scenario we will see an element that is not yet marked
      with DCACHE_SHRINK_LIST (even though it will be in the future) and
      obviously remove it from an lru where the element no longer is.  Since
      list_lru_dispose_all will in effect count down nlru's nr_items and
      list_lru_del will do the same, this will lead to an imbalance.
      
      The solution for this would not be so simple: we can obviously just keep
      the lru_lock taken, but then we have no guarantees that we will be able to
      acquire the dentry lock (dentry->d_lock).  To properly solve this, we need
      a communication mechanism between the lru and dentry code, so they can
      coordinate this with each other.
      
      Such mechanism already exists in the form of the list_lru_walk_cb
      callback.  So it is possible to construct a dcache-side prune function
      that does the right thing only by calling list_lru_walk in a loop until no
      more dentries are available.
      
      With only one user, plus the fact that a sane solution for the problem
      would involve bouncing between dcache and list_lru anyway, I see little
      justification to keep the special case list_lru_dispose_all in tree.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
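The replacement pattern the commit describes, calling a walk function with a callback in a loop until nothing more is isolated, can be sketched as follows. `toy_lru_walk()`, `toy_isolate_shrink()` and `toy_prune_all()` are hypothetical stand-ins for `list_lru_walk()`, a dcache-side callback and its caller; locking is left out of the toy.

```c
#include <assert.h>

#define TOY_NR 8

/* Callback verdicts, loosely modeled on the kernel's enum lru_status. */
enum toy_lru_status { TOY_LRU_REMOVED, TOY_LRU_SKIP };

struct toy_lru {
	int items[TOY_NR];
	int nr;
};

struct toy_dispose {
	int items[TOY_NR];
	int nr;
};

typedef enum toy_lru_status (*toy_walk_cb)(int item, void *arg);

/* Visit at most nr_to_walk items; drop those the callback isolates.
 * Returns how many were removed. */
static unsigned long toy_lru_walk(struct toy_lru *lru, toy_walk_cb cb,
				  void *arg, unsigned long nr_to_walk)
{
	unsigned long removed = 0;
	int i = 0;

	while (i < lru->nr && nr_to_walk > 0) {
		nr_to_walk--;
		if (cb(lru->items[i], arg) == TOY_LRU_REMOVED) {
			lru->items[i] = lru->items[--lru->nr]; /* order not kept */
			removed++;
		} else {
			i++;
		}
	}
	return removed;
}

/* dcache-side callback: move every item onto the caller's dispose list. */
static enum toy_lru_status toy_isolate_shrink(int item, void *arg)
{
	struct toy_dispose *dispose = arg;

	assert(dispose->nr < TOY_NR);
	dispose->items[dispose->nr++] = item;
	return TOY_LRU_REMOVED;
}

/* Walk in batches and dispose, until a walk removes nothing. */
static int toy_prune_all(struct toy_lru *lru, unsigned long batch)
{
	int disposed = 0;
	unsigned long freed;

	do {
		struct toy_dispose dispose = { .nr = 0 };

		freed = toy_lru_walk(lru, toy_isolate_shrink, &dispose, batch);
		disposed += dispose.nr;   /* a real caller would free these here */
	} while (freed > 0);
	return disposed;
}
```

With five items and a batch of two, the walks remove 2, 2, 1 and then 0 items, at which point the loop stops. Because the callback decides per item, the dcache side keeps control over each dentry, which is the kind of coordination the message says `list_lru_dispose_all` lacked.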
    • dcache: convert to use new lru list infrastructure · f6041567
      Dave Chinner committed
      [glommer@openvz.org: don't reintroduce double decrement of nr_unused_dentries, adapted for new LRU return codes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • inode: move inode to a different list inside lock · d38fa698
      Glauber Costa committed
      When an element is removed from the LRU, this is currently done after the
      lock is released. This is a clear mistake, although we are not sure whether
      the bugs we are seeing are related to this. All list manipulations are done
      inside the lock, and so should this one.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Tested-by: Michal Hocko <mhocko@suse.cz>
      Cc: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • inode: convert inode lru list to generic lru list code. · bc3b14cb
      Dave Chinner committed
      [glommer@openvz.org: adapted for new LRU return codes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • shrinker: convert superblock shrinkers to new API · 0a234c6d
      Dave Chinner committed
      Convert superblock shrinker to use the new count/scan API, and propagate
      the API changes through to the filesystem callouts.  The filesystem
      callouts already use a count/scan API, so it's just changing counters to
      longs to match the VM API.
      
      This requires the dentry and inode shrinker callouts to be converted to
      the count/scan API.  This is mainly a mechanical change.
      
      [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
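This commit's note mentions `mult_frac` for fractional proportions. A minimal sketch of that arithmetic, assuming a superblock shrinker that shares one scan request among its dentry, inode and filesystem-private caches in proportion to their sizes: `toy_mult_frac()` is a function version of what the kernel's `mult_frac()` macro computes, while `toy_split_scan()` and its `+ 1` guard are illustrative assumptions.

```c
#include <assert.h>

/* Function form of the mult_frac() idea: x * numer / denom, computed as
 * quotient and remainder so the intermediate product stays small. */
static long toy_mult_frac(long x, long numer, long denom)
{
	long quot = x / denom;
	long rem = x % denom;

	return quot * numer + (rem * numer) / denom;
}

struct toy_split {
	long dentries;
	long inodes;
	long fs_objects;
};

/* Share nr_to_scan between the caches in proportion to their sizes. */
static struct toy_split toy_split_scan(long nr_to_scan, long dentries,
				       long inodes, long fs_objects)
{
	/* +1 keeps the denominator non-zero when every cache is empty. */
	long total = dentries + inodes + fs_objects + 1;
	struct toy_split s = {
		.dentries   = toy_mult_frac(nr_to_scan, dentries, total),
		.inodes     = toy_mult_frac(nr_to_scan, inodes, total),
		.fs_objects = toy_mult_frac(nr_to_scan, fs_objects, total),
	};

	assert(s.dentries + s.inodes + s.fs_objects <= nr_to_scan);
	return s;
}
```

For a request of 128 with 300 dentries and 100 inodes, the split is 95 and 31; truncation leaves a couple of objects unassigned rather than overshooting the request.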
    • dcache: remove dentries from LRU before putting on dispose list · dd1f6b2e
      Dave Chinner committed
      One of the big problems with modifying the way the dcache shrinker and LRU
      implementation works is that the LRU is abused in several ways.  One of
      these is shrink_dentry_list().
      
      Basically, we can move a dentry off the LRU onto a different list without
      doing any accounting changes, and then use dentry_lru_prune() to remove it
      from whatever list it is now on to do the LRU accounting at that point.
      
      This makes it -really hard- to change the LRU implementation.  The use of
      the per-sb LRU lock serialises movement of the dentries between the
      different lists and the removal of them, and this is the only reason that
      it works.  If we want to break up the dentry LRU lock and lists into, say,
      per-node lists, we remove the only serialisation that allows this lru
      list/dispose list abuse to work.
      
      To make this work effectively, the dispose list has to be isolated from
      the LRU list - dentries have to be removed from the LRU *before* being
      placed on the dispose list.  This means that the LRU accounting and
      isolation is completed before disposal is started, and that means we can
      change the LRU implementation freely in future.
      
      This means that dentries *must* be marked with DCACHE_SHRINK_LIST when
      they are placed on the dispose list so that we don't think that parent
      dentries found in try_prune_one_dentry() are on the LRU when they are
      actually on the dispose list.  This would result in accounting the dentry
      to the LRU a second time.  Hence dentry_lru_del() has to handle the
      DCACHE_SHRINK_LIST case.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
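A minimal sketch of the ordering rule, using hypothetical flags `TOY_DCACHE_LRU_LIST` and `TOY_DCACHE_SHRINK_LIST` (the latter named after `DCACHE_SHRINK_LIST`): the dentry leaves the LRU, with its accounting, before it is marked and placed on the dispose list, and the delete helper skips LRU accounting for dentries that carry the shrink-list mark. The structures and helpers are illustrative, not fs/dcache.c.

```c
#include <assert.h>

#define TOY_DCACHE_LRU_LIST    (1u << 0)  /* hypothetical: on the LRU */
#define TOY_DCACHE_SHRINK_LIST (1u << 1)  /* hypothetical: on a dispose list */

struct toy_dentry { unsigned int d_flags; };
struct toy_lru { long nr_items; };

/* Take the dentry off the LRU (with accounting) BEFORE it goes on the
 * dispose list, then mark it so other paths know where it is. */
static void toy_dentry_to_dispose(struct toy_lru *lru, struct toy_dentry *d)
{
	assert(d->d_flags & TOY_DCACHE_LRU_LIST);
	d->d_flags &= ~TOY_DCACHE_LRU_LIST;
	lru->nr_items--;                       /* LRU accounting is done here */
	d->d_flags |= TOY_DCACHE_SHRINK_LIST;
}

/* Later removal: a dentry on a dispose list is not accounted to the LRU again. */
static void toy_dentry_lru_del(struct toy_lru *lru, struct toy_dentry *d)
{
	if (d->d_flags & TOY_DCACHE_SHRINK_LIST) {
		d->d_flags &= ~TOY_DCACHE_SHRINK_LIST;  /* just leaves the dispose list */
		return;
	}
	if (d->d_flags & TOY_DCACHE_LRU_LIST) {
		d->d_flags &= ~TOY_DCACHE_LRU_LIST;
		lru->nr_items--;
	}
}
```

The LRU count drops once, in `toy_dentry_to_dispose()`; the later `toy_dentry_lru_del()` only clears the mark, so the dentry is never accounted a second time regardless of how the LRU itself is implemented.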
    • dentry: move to per-sb LRU locks · 19156840
      Dave Chinner committed
      With the dentry LRUs being per-sb structures, there is no real need for
      a global dentry_lru_lock. The locking can be made more fine-grained by
      moving to a per-sb LRU lock, isolating the LRU operations of different
      filesystems completely from each other. The need for this is independent
      of any performance consideration that may arise: in the interest of
      abstracting the lru operations away, it is mandatory that each lru works
      around its own lock instead of a global lock for all of them.
      
      [glommer@openvz.org: updated changelog ]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • dcache: convert dentry_stat.nr_unused to per-cpu counters · 62d36c77
      Dave Chinner committed
      Before we split up the dcache_lru_lock, the unused dentry counter needs to
      be made independent of the global dcache_lru_lock.  Convert it to per-cpu
      counters to do this.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
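A single-threaded sketch of the per-cpu counter pattern: each CPU updates only its own slot, so updates need no shared lock, and readers sum all slots and clamp at zero, since an item can be counted on one CPU and uncounted on another. Real per-cpu variables use kernel primitives; the array `toy_nr_dentry_unused`, the explicit `cpu` argument and both helpers are assumptions made so the example runs in user space.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* One slot per CPU; a slot may go negative on its own. */
static long toy_nr_dentry_unused[TOY_NR_CPUS];

/* Update only the calling CPU's slot (the CPU is passed explicitly here). */
static void toy_this_cpu_add(int cpu, long delta)
{
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	toy_nr_dentry_unused[cpu] += delta;
}

/* Readers sum every slot; a transiently negative total is reported as 0. */
static long toy_get_nr_dentry_unused(void)
{
	long sum = 0;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += toy_nr_dentry_unused[cpu];
	return sum < 0 ? 0 : sum;
}
```

Slots of 5, 3 and -2 sum to 6; after another CPU records -10 the raw sum is -4 and the reader reports 0. Reads become more expensive than a single global counter, which suits a statistic that is updated far more often than it is read.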
    • super: fix calculation of shrinkable objects for small numbers · 55f841ce
      Glauber Costa committed
      The sysctl knob sysctl_vfs_cache_pressure is used to determine which
      percentage of the shrinkable objects in our cache we should actively try
      to shrink.
      
      It works great in situations in which we have many objects (at least more
      than 100), because the approximation errors will be negligible.  But if
      this is not the case, especially when total_objects < 100, we may end up
      concluding that we have no objects at all (total / 100 = 0, if total <
      100).
      
      This is certainly not the biggest killer in the world, but may matter in
      very low kernel memory situations.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: bump inode and dentry counters to long · 3942c07c
      Committed by Glauber Costa
      This series reworks our current object cache shrinking infrastructure in
      two main ways:
      
       * Noticing that a lot of users copy and paste their own version of LRU
         lists for objects, we put some effort into providing a generic version.
         It is modeled after the filesystem users: dentries, inodes, and xfs
         (for various tasks), but we expect that other users could benefit from
         it in the near future with little or no modification.  Let us know if
         you have any issues.
      
       * The underlying list_lru being proposed automatically and
         transparently keeps the elements in per-node lists, and is able to
         manipulate the node lists individually.  Given this infrastructure, we
         are able to modify shrink_slab, up to now a blunt hammer, to perform
         per-node reclaim instead of always scanning memory across all nodes as
         it has been doing.
      
      Per-node lru lists are also expected to lead to less contention in the lru
      locks on multi-node scans, since we are now no longer fighting for a
      global lock.  The locks usually disappear from the profilers with this
      change.
      
      Although we have no official benchmarks for this version (you are welcome
      to evaluate it independently), earlier versions of this series were
      performance tested (details at
      http://permalink.gmane.org/gmane.linux.kernel.mm/100537), showing no
      visible performance regressions and better qualitative behavior on NUMA
      machines.
      
      With this infrastructure in place, we can use the list_lru entry point to
      provide memcg isolation and per-memcg targeted reclaim.  Historically,
      those two pieces of work have been posted together.  This version presents
      only the infrastructure work, deferring the memcg work for a later time,
      so we can focus on getting this part tested.  You can see more about the
      history of such work at http://lwn.net/Articles/552769/
      
      Dave Chinner (18):
        dcache: convert dentry_stat.nr_unused to per-cpu counters
        dentry: move to per-sb LRU locks
        dcache: remove dentries from LRU before putting on dispose list
        mm: new shrinker API
        shrinker: convert superblock shrinkers to new API
        list: add a new LRU list type
        inode: convert inode lru list to generic lru list code.
        dcache: convert to use new lru list infrastructure
        list_lru: per-node list infrastructure
        shrinker: add node awareness
        fs: convert inode and dentry shrinking to be node aware
        xfs: convert buftarg LRU to generic code
        xfs: rework buffer dispose list tracking
        xfs: convert dquot cache lru to list_lru
        fs: convert fs shrinkers to new scan/count API
        drivers: convert shrinkers to new count/scan API
        shrinker: convert remaining shrinkers to count/scan API
        shrinker: Kill old ->shrink API.
      
      Glauber Costa (7):
        fs: bump inode and dentry counters to long
        super: fix calculation of shrinkable objects for small numbers
        list_lru: per-node API
        vmscan: per-node deferred work
        i915: bail out earlier when shrinker cannot acquire mutex
        hugepage: convert huge zero page shrinker to new shrinker API
        list_lru: dynamically adjust node arrays
      
      This patch:
      
      There are situations on very large machines in which we can have a large
      quantity of dirty inodes, unused dentries, etc.  This is particularly true
      when umounting a filesystem, since every live object will eventually be
      discarded.
      
      Dave Chinner reported a problem with this while experimenting with the
      shrinker revamp patchset.  So we believe it is time for a change.  This
      patch just moves the counters from int to long.  Machines where it
      matters should have a big (64-bit) long anyway.
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • da5338c7
    • bcce56d5
    • fs: remove vfs_follow_link · aac34df1
      Committed by Christoph Hellwig
      For a long time no filesystem has been using vfs_follow_link, and as seen
      by recent filesystem submissions any new use is accidental as well.
      
      Remove vfs_follow_link, document the replacement in
      Documentation/filesystems/porting and also rename __vfs_follow_link
      to match its only caller better.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: make sure we don't have a stale root path if unlazy_walk() fails · d0d27277
      Committed by Linus Torvalds
      When I moved the RCU walk termination into unlazy_walk(), I didn't copy
      quite all of it: for the successful RCU termination we properly add the
      necessary reference counts to our temporary copy of the root path, but
      for the failure case we need to make sure that any temporary root path
      information is cleared out (since it does _not_ have the proper
      reference counts from the RCU lookup).
      
      We could clean up this mess by just always dropping the temporary root
      information, but Al points out that that would mean that a single lookup
      through symlinks could see multiple different root entries if it races
      with another thread doing chroot.  Not that I think we should really
      care (we had that before too, back before we had a copy of the root path
      in the nameidata).
      
      Al says he has a cunning plan.  In the meantime, this is the minimal fix
      for the problem, even if it's not all that pretty.
      Reported-by: Mace Moneta <moneta.mace@gmail.com>
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 10 Sep 2013, 3 commits
    • split read_seqretry_or_unlock(), convert d_walk() to resulting primitives · 48f5ec21
      Committed by Al Viro
      Separate "check if we need to retry" from "unlock if we are done and
      had seq_writelock"; that allows these helpers to be used in d_walk(),
      where we need to recheck every time we ascend back to the parent, but do
      *not* want to unlock until the very end.  Lift
      rcu_read_lock/rcu_read_unlock out into the callers.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • direct-io: Use return from cmpxchg to decide of assignment happened · 45150c43
      Committed by Olof Johansson
      Not using the return value can be racy in the generic case, so it is
      good practice to check the return value instead.
      
      This also resolves the warning seen on ARM and other architectures:
      
        fs/direct-io.c: In function 'sb_init_dio_done_wq':
        fs/direct-io.c:557:2: warning: value computed is not used [-Wunused-value]
      Signed-off-by: Olof Johansson <olof@lixom.net>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: H Peter Anvin <hpa@zytor.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dcache: Translating dentry into pathname without taking rename_lock · 232d2d60
      Committed by Waiman Long
      When running AIM7's short workload, Linus' lockref patch eliminated
      most of the spinlock contention. However, some contention remained:
      
           8.46%     reaim  [kernel.kallsyms]     [k] _raw_spin_lock
                       |--42.21%-- d_path
                       |          proc_pid_readlink
                       |          SyS_readlinkat
                       |          SyS_readlink
                       |          system_call
                       |          __GI___readlink
                       |
                       |--40.97%-- sys_getcwd
                       |          system_call
                       |          __getcwd
      
      The big one here is the rename_lock (seqlock) contention in d_path()
      and the getcwd system call. This patch will eliminate the need to take
      the rename_lock while translating dentries into the full pathnames.
      
      The rename_lock is taken to make sure that no rename operation can be
      in progress while the translation is being done. However, only one
      thread can hold the rename_lock at a time, which blocks all the other
      threads that need it even though the translation process makes no
      change to the dentries.
      
      This patch replaces the writer-side write_seqlock/write_sequnlock
      sequence on rename_lock in the callers of prepend_path() and
      __dentry_path() with a reader-side read_seqbegin/read_seqretry
      sequence within these two functions. As a result, the code has to
      retry if one or more rename operations were performed in the
      meantime. In addition, the RCU read lock is taken during the
      translation to make sure that no dentries go away. To prevent
      livelock, the code switches back to taking the rename_lock if
      read_seqretry() fails three times.
      
      To further reduce spinlock contention, this patch does not take the
      dentry's d_lock when copying the filename from the dentries. Instead,
      it treats the name pointer and length as unreliable and just copies
      the string byte by byte until it hits a null byte or the end of the
      string as specified by the length. This should avoid stepping into an
      invalid memory address. The error cases are left to be handled by
      the sequence number check.
      
      The following code refactorings are also made:
      1. Move prepend('/') into prepend_name() to remove one conditional
         check.
      2. Move the global root check in prepend_path() back to the top of
         the while loop.
      
      With this patch, _raw_spin_lock now accounts for only 1.2% of the
      total CPU cycles for the short workload. This patch also reduces the
      distortion that running perf introduces into its own profile, since
      the perf command itself can be a heavy user of the d_path() function
      depending on the complexity of the workload.
      
      When taking the perf profile of the high-systime workload, the amount
      of spinlock contention contributed by running perf without this patch
      was about 16%. With this patch, the spinlock contention caused by
      running perf goes away and we get a more accurate perf profile.
      Signed-off-by: Waiman Long <Waiman.Long@hp.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 09 Sep 2013, 7 commits
    • autofs4 - fix device ioctl mount lookup · ac838719
      Committed by Ian Kent
      When reconnecting to automounts at startup an autofs ioctl is used
      to find the device and inode of existing mounts so they can be used
      to open a file descriptor of possibly covered mounts.
      
      At this time the caller might not yet "own" the mount, so it can
      trigger calling ->d_automount(). This causes automount to hang when
      trying to reconnect to direct or offset mount types.
      
      Consequently kern_path() can't be used but kern_path_mountpoint() can be.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Cc: Jeff Layton <jlayton@redhat.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: fix dentry RCU to refcounting possibly sleeping dput() · e5c832d5
      Committed by Linus Torvalds
      This is the fix that the last two commits indirectly led up to - making
      sure that we don't call dput() in a bad context on the dentries we've
      looked up in RCU mode after the sequence count validation fails.
      
      This basically expands d_rcu_to_refcount() into the callers, and then
      fixes the callers to delay the dput() in the failure case until _after_
      we've dropped all locks and are no longer in an RCU-locked region.
      
      The case of 'complete_walk()' was trivial, since its failure case did
      the unlock_rcu_walk() directly after the call to d_rcu_to_refcount(),
      and as such that is just a pure expansion of the function with a trivial
      movement of the resulting dput() to after 'unlock_rcu_walk()'.
      
      In contrast, the unlazy_walk() case was much more complicated, because
      not only does it convert two different dentries from RCU to be reference
      counted, but it used to not call unlock_rcu_walk() at all, and instead
      just returned an error and let the caller clean everything up in
      "terminate_walk()".
      
      Happily, one of the dentries in question (called "parent" inside
      unlazy_walk()) is the dentry of "nd->path", which terminate_walk() wants
      a refcount to anyway for the non-RCU case.
      
      So what the new and improved unlazy_walk() does is to first turn that
      dentry into a refcounted one, and once that is set up, the error cases
      can continue to use the terminate_walk() helper for cleanup, but now
      for the non-RCU case, which makes it possible to drop out of RCU mode
      if we actually hit the sequence number failure case.
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • introduce kern_path_mountpoint() · 2d864651
      Committed by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • rename user_path_umountat() to user_path_mountpoint_at() · 197df04c
      Committed by Al Viro
      ... and move the extern from linux/namei.h to fs/internal.h,
      along with that of vfs_path_lookup().
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • take unlazy_walk() into umount_lookup_last() · 35759521
      Committed by Al Viro
      ... and massage it a bit to reduce nesting
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • vfs: use lockred "dead" flag to mark unrecoverably dead dentries · 0d98439e
      Committed by Linus Torvalds
      This simplifies the RCU to refcounting code in particular.
      
      I was originally intending to leave this for later, but walking through
      all the dput() logic (see previous commit), I realized that the dput()
      "might_sleep()" check was misleadingly weak.  And I removed it as
      misleading, both for performance profiling and for debugging.
      
      However, the might_sleep() debugging case is actually true: the final
      dput() can indeed sleep, if the inode of the dentry that you are
      releasing ends up sleeping at iput time (see dentry_iput()).  So the
      problem with the might_sleep() in dput() wasn't that it wasn't true, it
      was that it wasn't actually testing and triggering on the interesting
      case.
      
      In particular, just about *any* dput() can indeed sleep, if you happen
      to race with another thread deleting the file in question, and you then
      lose the race to be the last dput() for that file.  But because it's
      a very rare race, the debugging code would never trigger it in practice.
      
      Why is this problematic? The new d_rcu_to_refcount() (see commit
      15570086: "vfs: reimplement d_rcu_to_refcount() using
      lockref_get_or_lock()") does a dput() for the failure case, and it does
      it under the RCU lock.  So potentially sleeping really is a bug.
      
      But there's no way I'm going to fix this with the previous complicated
      "lockref_get_or_lock()" interface.  And rather than revert to the old
      and crufty nested dentry locking code (which did get this right by
      delaying the reference count updates until they were verified to be
      safe), let's make forward progress.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: reorganize dput() memory accesses · 8aab6a27
      Committed by Linus Torvalds
      This is me being a bit OCD after all the dentry optimization work this
      merge window: profiles end up showing 'dput()' as a rather expensive
      operation, and there were two unrelated bad reasons for that.
      
      The first reason was reading d_lockref.count for debugging purposes,
      which touches the lockref cacheline (for reads) before we really need to.
      More importantly, the debugging test in question is _wrong_, and has
      hidden bugs.  It's true that we can only sleep when the count goes down
      to zero, but the test as-is hides the much more subtle bug that happens
      if we race with somebody else deleting the file.
      
      Anyway we _will_ touch that cacheline, but let's do it for a write and
      in the right routine (ie in "lockref_put_or_lock()") which annotates the
      costs better.  So remove the misleading debug code.
      
      The other was an unnecessary access to the cacheline that contains the
      d_lru list, just to check whether we already were on the LRU list or
      not.  This is exactly what we have d_flags for, so that we can avoid
      touching extra cache lines for the common case.  So just add another bit
      for "is this dentry on the LRU".
      
      Finally, mark the tests properly likely/unlikely, so that the common
      fast-paths are dense in the instruction stream.
      
      This makes the profiles look much saner.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 08 Sep 2013, 5 commits