1. 11 9月, 2013 5 次提交
    • G
      list_lru: remove special case function list_lru_dispose_all. · 4e717f5c
      Glauber Costa 提交于
      The list_lru implementation has one function, list_lru_dispose_all, with
      only one user (the dentry code).  At first, such function appears to make
      sense because we are really not interested in the result of isolating each
      dentry separately - all of them are going away anyway.  However, it's
      implementation is buggy in the following way:
      
      When we call list_lru_dispose_all in fs/dcache.c, we scan all dentries
      marking them with DCACHE_SHRINK_LIST.  However, this is done without the
      nlru->lock taken.  The imediate result of that is that someone else may
      add or remove the dentry from the LRU at the same time.  When list_lru_del
      happens in that scenario we will see an element that is not yet marked
      with DCACHE_SHRINK_LIST (even though it will be in the future) and
      obviously remove it from an lru where the element no longer is.  Since
      list_lru_dispose_all will in effect count down nlru's nr_items and
      list_lru_del will do the same, this will lead to an imbalance.
      
      The solution for this would not be so simple: we can obviously just keep
      the lru_lock taken, but then we have no guarantees that we will be able to
      acquire the dentry lock (dentry->d_lock).  To properly solve this, we need
      a communication mechanism between the lru and dentry code, so they can
      coordinate this with each other.
      
      Such mechanism already exists in the form of the list_lru_walk_cb
      callback.  So it is possible to construct a dcache-side prune function
      that does the right thing only by calling list_lru_walk in a loop until no
      more dentries are available.
      
      With only one user, plus the fact that a sane solution for the problem
      would involve boucing between dcache and list_lru anyway, I see little
      justification to keep the special case list_lru_dispose_all in tree.
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Acked-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4e717f5c
    • G
      list_lru: per-node API · 6a4f496f
      Glauber Costa 提交于
      This patch adapts the list_lru API to accept an optional node argument, to
      be used by NUMA aware shrinking functions.  Code that does not care about
      the NUMA placement of objects can still call into the very same functions
      as before.  They will simply iterate over all nodes.
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6a4f496f
    • D
      list_lru: fix broken LRU_RETRY behaviour · 5cedf721
      Dave Chinner 提交于
      The LRU_RETRY code assumes that the list traversal status after we have
      dropped and regained the list lock.  Unfortunately, this is not a valid
      assumption, and that can lead to racing traversals isolating objects that
      the other traversal expects to be the next item on the list.
      
      This is causing problems with the inode cache shrinker isolation, with
      races resulting in an inode on a dispose list being "isolated" because a
      racing traversal still thinks it is on the LRU.  The inode is then never
      reclaimed and that causes hangs if a subsequent lookup on that inode
      occurs.
      
      Fix it by always restarting the list walk on a LRU_RETRY return from the
      isolate callback.  Avoid the possibility of livelocks the current code was
      trying to avoid by always decrementing the nr_to_walk counter on retries
      so that even if we keep hitting the same item on the list we'll eventually
      stop trying to walk and exit out of the situation causing the problem.
      Reported-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Cc: Glauber Costa <glommer@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5cedf721
    • D
      list_lru: per-node list infrastructure · 3b1d58a4
      Dave Chinner 提交于
      Now that we have an LRU list API, we can start to enhance the
      implementation.  This splits the single LRU list into per-node lists and
      locks to enhance scalability.  Items are placed on lists according to the
      node the memory belongs to.  To make scanning the lists efficient, also
      track whether the per-node lists have entries in them in a active
      nodemask.
      
      Note: We use a fixed-size array for the node LRU, this struct can be very
      big if MAX_NUMNODES is big.  If this becomes a problem this is fixable by
      turning this into a pointer and dynamically allocating this to
      nr_node_ids.  This quantity is firwmare-provided, and still would provide
      room for all nodes at the cost of a pointer lookup and an extra
      allocation.  Because that allocation will most likely come from a may very
      well fail.
      
      [glommer@openvz.org: fix warnings, added note about node lru]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Reviewed-by: NGreg Thelen <gthelen@google.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3b1d58a4
    • D
      list: add a new LRU list type · a38e4082
      Dave Chinner 提交于
      Several subsystems use the same construct for LRU lists - a list head, a
      spin lock and and item count.  They also use exactly the same code for
      adding and removing items from the LRU.  Create a generic type for these
      LRU lists.
      
      This is the beginning of generic, node aware LRUs for shrinkers to work
      with.
      
      [glommer@openvz.org: enum defined constants for lru. Suggested by gthelen, don't relock over retry]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Reviewed-by: NGreg Thelen <gthelen@google.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      a38e4082