1. 04 Jan, 2012 29 commits
  2. 07 Jan, 2011 1 commit
    • fs: scale mntget/mntput · b3e19d92
      Committed by Nick Piggin
      The problem that this patch aims to fix is vfsmount refcounting scalability.
      We need to take a reference on the vfsmount for every successful path lookup,
      and these lookups often go to the same mount point.
      
      The fundamental difficulty is that a "simple" reference count can never be made
      scalable, because any time a reference is dropped, we must check whether that
      was the last reference. To do that requires communication with all other CPUs
      that may have taken a reference count.
      
      We can make refcounts more scalable in a couple of ways, involving keeping
      distributed counters, and checking for the global-zero condition less
      frequently.
      
      - check the global sum once every interval (this will delay zero detection
        for some interval, so it's probably a showstopper for vfsmounts).
      
      - keep a local count and only take the global sum when local reaches 0 (this
        is difficult for vfsmounts, because we can't hold preempt off for the life of
        a reference, so a counter would need to be per-thread or tied strongly to a
        particular CPU which requires more locking).
      
      - keep a local difference of increments and decrements, which allows us to sum
        the total difference and hence find the refcount when summing all CPUs. Then,
        keep a single integer "long" refcount for slow and long lasting references,
        and only take the global sum of local counters when the long refcount is 0.
      
      This last scheme is what I implemented here. Attached mounts and process root
      and working directory references are "long" references, and everything else is
      a short reference.
      
      This allows scalable vfsmount references during path walking over mounted
      subtrees and unattached (lazy umounted) mounts with processes still running
      in them.
      
      This results in one fewer atomic op in the fastpath: mntget is now just a
      per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
      and non-atomic decrement in the common case. However code is otherwise bigger
      and heavier, so single threaded performance is basically a wash.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
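The third scheme in the message above can be sketched in plain C. This is a minimal single-threaded illustration, not the kernel code: `struct mnt_ref`, `NR_CPUS`, and every function name here are hypothetical, and the locking and per-CPU primitives the real patch relies on are omitted entirely.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4  /* hypothetical fixed CPU count for this sketch */

/* Hypothetical stand-in for the reference state of one mount. */
struct mnt_ref {
    long longrefs;         /* "long" references: attached mount, root, cwd */
    long percpu[NR_CPUS];  /* per-CPU difference of increments and decrements */
};

/* Short reference, fast path: touch only the local CPU's counter. */
static void ref_get(struct mnt_ref *m, int cpu) { m->percpu[cpu]++; }

static void ref_get_long(struct mnt_ref *m) { m->longrefs++; }
static void ref_put_long(struct mnt_ref *m) { m->longrefs--; }

/* Slow path: the true short count is the sum over all CPUs. */
static long ref_sum(const struct mnt_ref *m)
{
    long sum = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += m->percpu[cpu];
    return sum;
}

/*
 * Drop a short reference. A reference may be dropped on a different CPU
 * than the one that took it, so a single slot can go negative. The global
 * sum is only taken when no long reference is held. Returns true if this
 * was the last reference.
 */
static bool ref_put(struct mnt_ref *m, int cpu)
{
    m->percpu[cpu]--;
    if (m->longrefs > 0)
        return false;      /* common case: no cross-CPU summation */
    return ref_sum(m) == 0;
}

void refcount_demo(void)
{
    struct mnt_ref m = {0};

    ref_get_long(&m);          /* mount is attached */
    ref_get(&m, 0);            /* path walk takes a reference on CPU 0 */
    assert(!ref_put(&m, 1));   /* dropped on CPU 1, long reference still held */

    ref_get(&m, 2);            /* a process is still using the mount */
    ref_put_long(&m);          /* mount detached (lazy umount) */
    assert(ref_put(&m, 3));    /* per-CPU sum reaches zero: last reference */
}
```

A fuller version would also check the sum when the long count itself drops to zero; the sketch leaves that out to keep the fast path visible.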
  3. 18 Aug, 2010 1 commit
    • fs: brlock vfsmount_lock · 99b7db7b
      Committed by Nick Piggin
      
      Use a brlock for the vfsmount lock. It must be taken for write whenever
      modifying the mount hash or associated fields, and may be taken for read when
      performing mount hash lookups.
      
      A new lock is added for the mnt-id allocator, so it doesn't need to take
      the heavy vfsmount write-lock.
      
      The number of atomics should remain the same for fastpath rlock cases, though
      code would be slightly slower due to per-cpu access. Scalability is not
      much improved in common cases yet, due to other locks (i.e. dcache_lock) getting
      in the way. However, path lookups crossing mountpoints should be one case where
      scalability is improved (currently requiring the global lock).
      
      The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
      Altix system (high latency to remote nodes), a simple umount microbenchmark
      (mount --bind mnt mnt2 ; umount mnt2, looped 1000 times) took 6.8s before this
      patch and 7.1s afterwards, about 5% slower.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
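The read/write asymmetry described above can be illustrated with a small userspace sketch of a big-reader lock: a reader takes only the lock belonging to its own CPU, while a writer must take every per-CPU lock. This assumes C11 atomics in place of kernel spinlocks; `struct brlock`, `NR_CPUS`, and the function names are hypothetical and do not reproduce the kernel's brlock API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 4  /* hypothetical fixed CPU count for this sketch */

/* One spin flag per CPU; true means "held". */
struct brlock {
    atomic_bool cpu_lock[NR_CPUS];
};

static void br_init(struct brlock *l)
{
    for (int i = 0; i < NR_CPUS; i++)
        atomic_init(&l->cpu_lock[i], false);
}

/* Try to take one CPU's lock without spinning; true on success. */
static bool br_read_trylock(struct brlock *l, int cpu)
{
    return !atomic_exchange_explicit(&l->cpu_lock[cpu], true,
                                     memory_order_acquire);
}

/* Reader: one atomic operation on the local CPU's lock only. */
static void br_read_lock(struct brlock *l, int cpu)
{
    while (!br_read_trylock(l, cpu))
        ;  /* spin */
}

static void br_read_unlock(struct brlock *l, int cpu)
{
    atomic_store_explicit(&l->cpu_lock[cpu], false, memory_order_release);
}

/* Writer: take every per-CPU lock, in a fixed order to avoid deadlock. */
static void br_write_lock(struct brlock *l)
{
    for (int i = 0; i < NR_CPUS; i++)
        br_read_lock(l, i);
}

static void br_write_unlock(struct brlock *l)
{
    for (int i = NR_CPUS - 1; i >= 0; i--)
        br_read_unlock(l, i);
}

void brlock_demo(void)
{
    struct brlock l;
    br_init(&l);

    br_read_lock(&l, 0);
    assert(br_read_trylock(&l, 1));   /* readers on other CPUs do not contend */
    br_read_unlock(&l, 1);
    br_read_unlock(&l, 0);

    br_write_lock(&l);
    assert(!br_read_trylock(&l, 2));  /* a writer excludes readers everywhere */
    br_write_unlock(&l);
}
```

The writer's cost grows with the number of CPUs, which is consistent with the slower umount slowpath reported in the message.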
  4. 04 Mar, 2010 1 commit
    • Kill CL_PROPAGATION, sanitize fs/pnode.c:get_source() · 796a6b52
      Committed by Al Viro
      First of all, get_source() never results in CL_PROPAGATION
      alone.  We either get CL_MAKE_SHARED (for the continuation
      of peer group) or CL_SLAVE (slave that is not shared) or both
      (beginning of peer group among slaves).  Massage the code to
      make that explicit, kill CL_PROPAGATION test in clone_mnt()
      (nothing sets CL_MAKE_SHARED without CL_PROPAGATION and in
      clone_mnt() we are checking CL_PROPAGATION after we'd found
      that there's no CL_SLAVE, so the check for CL_MAKE_SHARED
      would do just as well).
      
      Fix comments, while we are at it...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  5. 23 Apr, 2008 2 commits
  6. 22 Apr, 2008 2 commits
  7. 28 Mar, 2008 1 commit
  8. 07 Feb, 2008 1 commit
    • MNT_UNBINDABLE fix · 0b03cfb2
      Committed by Andries E. Brouwer
      Some time ago ( http://lkml.org/lkml/2007/6/19/128 ) I wrote about
      MNT_UNBINDABLE that it felt like a bug that it is not reset by "mount
      --make-private".
      
      Today I happened to see mount(8) and Documentation/sharedsubtree.txt and
      both document the version obtained by applying the little patch given in
      the above (and again below).
      
      So, the present kernel code is not according to specs and must be regarded
      as buggy.
      
      Specification in Documentation/sharedsubtree.txt:
      See state diagram: unbindable should become private upon make-private.
      
      Specification in mount(8):
          ...  It's
          also possible to  set  up  uni-directional  propagation  (with  --make-
          slave),  to  make  a  mount  point unavailable for --bind/--rbind (with
          --make-unbindable), and to undo any  of  these  (with  --make-private).
      
      Repeat of old fix-shared-subtrees-make-private.patch
      (due to Dirk Gerrits, René Gabriëls, Peter Kooijmans):
      Acked-by: Ram Pai <linuxram@us.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
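The specified behavior, that make-private undoes make-unbindable, can be shown as a tiny state transition. This is only an illustrative sketch: the enum, the flag value, and the function names are hypothetical and are not the kernel's definitions, and the shared and slave transitions are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical propagation requests, mirroring the mount(8) options. */
enum prop_type { PROP_PRIVATE, PROP_UNBINDABLE };

#define FLAG_UNBINDABLE 0x1u  /* hypothetical flag bit for this sketch */

struct mount_state {
    unsigned flags;
};

/*
 * Apply a propagation change. Per the documented state diagram,
 * make-unbindable sets the flag and make-private must clear it again.
 */
static void set_propagation(struct mount_state *m, enum prop_type type)
{
    if (type == PROP_UNBINDABLE)
        m->flags |= FLAG_UNBINDABLE;
    else
        m->flags &= ~FLAG_UNBINDABLE;
}

/* A mount can be the source of --bind/--rbind only if it is not unbindable. */
static bool can_bind(const struct mount_state *m)
{
    return !(m->flags & FLAG_UNBINDABLE);
}

void unbindable_demo(void)
{
    struct mount_state m = { 0 };

    set_propagation(&m, PROP_UNBINDABLE);
    assert(!can_bind(&m));

    set_propagation(&m, PROP_PRIVATE);  /* undoes make-unbindable */
    assert(can_bind(&m));
}
```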
  9. 09 May, 2007 1 commit
    • Introduce a handy list_first_entry macro · b5e61818
      Committed by Pavel Emelianov
      There are many places in the kernel where a construction like
      
         foo = list_entry(head->next, struct foo_struct, list);
      
      is used.
      The code might look more descriptive and neater using the macro
      
         list_first_entry(head, type, member) \
                   list_entry((head)->next, type, member)
      
      Here is the macro itself and examples of its usage in the generic code.
      If it turns out to be useful, I can prepare a set of patches to
      inject it into arch-specific code, drivers, networking, etc.
      Signed-off-by: Pavel Emelianov <xemul@openvz.org>
      Signed-off-by: Kirill Korotaev <dev@openvz.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Zach Brown <zach.brown@oracle.com>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: John McCutchan <ttb@tentacle.dhs.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: john stultz <johnstul@us.ibm.com>
      Cc: Ram Pai <linuxram@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
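To show the macro in use outside the kernel tree, here is a minimal userspace sketch. The `list_head` structure and the `list_entry` and `list_add_tail` helpers are simplified reimplementations written for this example (the kernel's own versions live in its list header), and `list_init` and `struct foo_struct` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copy of the kernel's circular doubly linked list. */
struct list_head {
    struct list_head *next, *prev;
};

/* Recover the containing structure from a pointer to its list member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* The macro from the commit: the entry that follows the list head. */
#define list_first_entry(head, type, member) \
    list_entry((head)->next, type, member)

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

/* Insert a new node just before the head, i.e. at the tail of the list. */
static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

struct foo_struct {
    int value;
    struct list_head list;
};

void list_demo(void)
{
    struct list_head head;
    struct foo_struct a = { .value = 1 };
    struct foo_struct b = { .value = 2 };

    list_init(&head);
    list_add_tail(&a.list, &head);
    list_add_tail(&b.list, &head);

    /* Equivalent to list_entry(head.next, struct foo_struct, list). */
    struct foo_struct *first = list_first_entry(&head, struct foo_struct, list);
    assert(first == &a);
    assert(first->value == 1);
}
```

Note that the macro assumes a non-empty list: on an empty list `head->next` is the head itself, so the result does not point at a real entry.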
  10. 09 Dec, 2006 1 commit