1. 07 1月, 2011 23 次提交
    • N
      fs: dcache reduce d_parent locking · a734eb45
      Nick Piggin 提交于
      Use RCU to simplify locking in dget_parent.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      a734eb45
    • N
      fs: dcache rationalise dget variants · dc0474be
      Nick Piggin 提交于
      dget_locked was a shortcut to avoid the lazy lru manipulation when we already
      held dcache_lock (lru manipulation was relatively cheap at that point).
      However, how that the lru lock is an innermost one, we never hold it at any
      caller, so the lock cost can now be avoided. We already have well working lazy
      dcache LRU, so it should be fine to defer LRU manipulations to scan time.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      dc0474be
    • N
      fs: dcache reduce dcache_inode_lock · 357f8e65
      Nick Piggin 提交于
      dcache_inode_lock can be avoided in d_delete() and d_materialise_unique()
      in cases where it is not required.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      357f8e65
    • N
      fs: dcache reduce locking in d_alloc · 89ad485f
      Nick Piggin 提交于
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      89ad485f
    • N
      fs: dcache reduce dput locking · 61f3dee4
      Nick Piggin 提交于
      It is possible to run dput without taking data structure locks up-front. In
      many cases where we don't kill the dentry anyway, these locks are not required.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      61f3dee4
    • N
      fs: dcache avoid starvation in dcache multi-step operations · 58db63d0
      Nick Piggin 提交于
      Long lived dcache "multi-step" operations which retry on rename seq can
      be starved with a lot of rename activity. If they fail after the 1st pass,
      take the rename_lock for writing to avoid further starvation.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      58db63d0
    • N
      fs: dcache remove dcache_lock · b5c84bf6
      Nick Piggin 提交于
      dcache_lock no longer protects anything. remove it.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b5c84bf6
    • N
      fs: Use rename lock and RCU for multi-step operations · 949854d0
      Nick Piggin 提交于
      The remaining usages for dcache_lock is to allow atomic, multi-step read-side
      operations over the directory tree by excluding modifications to the tree.
      Also, to walk in the leaf->root direction in the tree where we don't have
      a natural d_lock ordering.
      
      This could be accomplished by taking every d_lock, but this would mean a
      huge number of locks and actually gets very tricky.
      
      Solve this instead by using the rename seqlock for multi-step read-side
      operations, retry in case of a rename so we don't walk up the wrong parent.
      Concurrent dentry insertions are not serialised against.  Concurrent deletes
      are tricky when walking up the directory: our parent might have been deleted
      when dropping locks so also need to check and retry for that.
      
      We can also use the rename lock in cases where livelock is a worry (and it
      is introduced in subsequent patch).
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      949854d0
    • N
      fs: increase d_name lock coverage · 9abca360
      Nick Piggin 提交于
      Cover d_name with d_lock in more cases, where there may be concurrent
      modification to it.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      9abca360
    • N
      fs: scale inode alias list · b23fb0a6
      Nick Piggin 提交于
      Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
      from concurrent modification. d_alias is also protected by d_lock.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b23fb0a6
    • N
      fs: dcache scale subdirs · 2fd6b7f5
      Nick Piggin 提交于
      Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
      using dcache_lock for these anyway (eg. using i_mutex).
      
      Note: if we change the locking rule in future so that ->d_child protection is
      provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
      But it would be an exception to an otherwise regular locking scheme, so we'd
      have to see some good results. Probably not worthwhile.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      2fd6b7f5
    • N
      fs: dcache scale d_unhashed · da502956
      Nick Piggin 提交于
      Protect d_unhashed(dentry) condition with d_lock. This means keeping
      DCACHE_UNHASHED bit in synch with hash manipulations.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      da502956
    • N
      fs: dcache scale dentry refcount · b7ab39f6
      Nick Piggin 提交于
      Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
      0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
      we start protecting many other dentry members with d_lock.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b7ab39f6
    • N
      fs: dcache scale lru · 23044507
      Nick Piggin 提交于
      Add a new lock, dcache_lru_lock, to protect the dcache LRU list from concurrent
      modification. d_lru is also protected by d_lock, which allows LRU lists to be
      accessed without the lru lock, using RCU in future patches.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      23044507
    • N
      fs: dcache scale hash · 789680d1
      Nick Piggin 提交于
      Add a new lock, dcache_hash_lock, to protect the dcache hash table from
      concurrent modification. d_hash is also protected by d_lock.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      789680d1
    • N
      hostfs: simplify locking · ec2447c2
      Nick Piggin 提交于
      Remove dcache_lock locking from hostfs filesystem, and move it into dcache
      helpers. All that is required is a coherent path name. Protection from
      concurrent modification of the namespace after path name generation is not
      provided in current code, because dcache_lock is dropped before the path is
      used.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      ec2447c2
    • N
      fs: change d_hash for rcu-walk · b1e6a015
      Nick Piggin 提交于
      Change d_hash so it may be called from lock-free RCU lookups. See similar
      patch for d_compare for details.
      
      For in-tree filesystems, this is just a mechanical change.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      b1e6a015
    • N
      fs: change d_compare for rcu-walk · 621e155a
      Nick Piggin 提交于
      Change d_compare so it may be called from lock-free RCU lookups. This
      does put significant restrictions on what may be done from the callback,
      however there don't seem to have been any problems with in-tree fses.
      If some strange use case pops up that _really_ cannot cope with the
      rcu-walk rules, we can just add new rcu-unaware callbacks, which would
      cause name lookup to drop out of rcu-walk mode.
      
      For in-tree filesystems, this is just a mechanical change.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      621e155a
    • N
      fs: name case update method · fb2d5b86
      Nick Piggin 提交于
      smpfs and ncpfs want to update a live dentry name in-place. Rather than
      have them open code the locking, provide a documented dcache API.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fb2d5b86
    • N
      fs: change d_delete semantics · fe15ce44
      Nick Piggin 提交于
      Change d_delete from a dentry deletion notification to a dentry caching
      advise, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fe15ce44
    • N
      fs: use fast counters for vfs caches · 3e880fb5
      Nick Piggin 提交于
      percpu_counter library generates quite nasty code, so unless you need
      to dynamically allocate counters or take fast approximate value, a
      simple per cpu set of counters is much better.
      
      The percpu_counter can never be made to work as well, because it has an
      indirection from pointer to percpu memory, and it can't use direct
      this_cpu_inc interfaces because it doesn't use static PER_CPU data, so
      code will always be worse.
      
      In the fastpath, it is the difference between this:
      
              incl %gs:nr_dentry      # nr_dentry
      
      and this:
      
              movl    percpu_counter_batch(%rip), %edx        # percpu_counter_batch,
              movl    $1, %esi        #,
              movq    $nr_dentry, %rdi        #,
              call    __percpu_counter_add    # (plus I clobber registers)
      
      __percpu_counter_add:
              pushq   %rbp    #
              movq    %rsp, %rbp      #,
              subq    $32, %rsp       #,
              movq    %rbx, -24(%rbp) #,
              movq    %r12, -16(%rbp) #,
              movq    %r13, -8(%rbp)  #,
              movq    %rdi, %rbx      # fbc, fbc
      #APP
      # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
              movq %gs:kernel_stack,%rax      #, pfo_ret__
      # 0 "" 2
      #NO_APP
              incl    -8124(%rax)     # <variable>.preempt_count
              movq    32(%rdi), %r12  # <variable>.counters, tcp_ptr__
      #APP
      # 78 "lib/percpu_counter.c" 1
              add %gs:this_cpu_off, %r12      # this_cpu_off, tcp_ptr__
      # 0 "" 2
      #NO_APP
              movslq  (%r12),%r13     #* tcp_ptr__, tmp73
              movslq  %edx,%rax       # batch, batch
              addq    %rsi, %r13      # amount, count
              cmpq    %rax, %r13      # batch, count
              jge     .L27    #,
              negl    %edx    # tmp76
              movslq  %edx,%rdx       # tmp76, tmp77
              cmpq    %rdx, %r13      # tmp77, count
              jg      .L28    #,
      .L27:
              movq    %rbx, %rdi      # fbc,
              call    _raw_spin_lock  #
              addq    %r13, 8(%rbx)   # count, <variable>.count
              movq    %rbx, %rdi      # fbc,
              movl    $0, (%r12)      #,* tcp_ptr__
              call    _raw_spin_unlock        #
      .L29:
      #APP
      # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
              movq %gs:kernel_stack,%rax      #, pfo_ret__
      # 0 "" 2
      #NO_APP
              decl    -8124(%rax)     # <variable>.preempt_count
              movq    -8136(%rax), %rax       #, D.14625
              testb   $8, %al #, D.14625
              jne     .L32    #,
      .L31:
              movq    -24(%rbp), %rbx #,
              movq    -16(%rbp), %r12 #,
              movq    -8(%rbp), %r13  #,
              leave
              ret
              .p2align 4,,10
              .p2align 3
      .L28:
              movl    %r13d, (%r12)   # count,*
              jmp     .L29    #
      .L32:
              call    preempt_schedule        #
              .p2align 4,,6
              jmp     .L31    #
              .size   __percpu_counter_add, .-__percpu_counter_add
              .p2align 4,,15
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      3e880fb5
    • N
      vfs: revert per-cpu nr_unused counters for dentry and inodes · 86c8749e
      Nick Piggin 提交于
      The nr_unused counters count the number of objects on an LRU, and as such they
      are synchronized with LRU object insertion and removal and scanning, and
      protected under the LRU lock.
      
      Making it per-cpu does not actually get any concurrency improvements because of
      this lock, and summing the counter is much slower, and
      incrementing/decrementing it costs more code size and is slower too.
      
      These counters should stay per-LRU, which currently means global.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      86c8749e
    • N
      fs: d_validate fixes · 786a5e15
      Nick Piggin 提交于
      d_validate has been broken for a long time.
      
      kmem_ptr_validate does not guarantee that a pointer can be dereferenced
      if it can go away at any time. Even rcu_read_lock doesn't help, because
      the pointer might be queued in RCU callbacks but not executed yet.
      
      So the parent cannot be checked, nor the name hashed. The dentry pointer
      can not be touched until it can be verified under lock. Hashing simply
      cannot be used.
      
      Instead, verify the parent/child relationship by traversing parent's
      d_child list. It's slow, but only ncpfs and the destaged smbfs care
      about it, at this point.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      786a5e15
  2. 05 1月, 2011 1 次提交
  3. 26 10月, 2010 7 次提交
  4. 18 8月, 2010 2 次提交
    • N
      fs: brlock vfsmount_lock · 99b7db7b
      Nick Piggin 提交于
      fs: brlock vfsmount_lock
      
      Use a brlock for the vfsmount lock. It must be taken for write whenever
      modifying the mount hash or associated fields, and may be taken for read when
      performing mount hash lookups.
      
      A new lock is added for the mnt-id allocator, so it doesn't need to take
      the heavy vfsmount write-lock.
      
      The number of atomics should remain the same for fastpath rlock cases, though
      code would be slightly slower due to per-cpu access. Scalability is not not be
      much improved in common cases yet, due to other locks (ie. dcache_lock) getting
      in the way. However path lookups crossing mountpoints should be one case where
      scalability is improved (currently requiring the global lock).
      
      The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
      Altix system (high latency to remote nodes), a simple umount microbenchmark
      (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it
      took 6.8s, afterwards took 7.1s, about 5% slower.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      99b7db7b
    • N
      fs: remove extra lookup in __lookup_hash · b04f784e
      Nick Piggin 提交于
      fs: remove extra lookup in __lookup_hash
      
      Optimize lookup for create operations, where no dentry should often be
      common-case. In cases where it is not, such as unlink, the added overhead
      is much smaller than the removed.
      
      Also, move comments about __d_lookup racyness to the __d_lookup call site.
      d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
      vein, add kerneldoc comments to __d_lookup and clean up some of the comments:
      
      - We are interested in how the RCU lookup works here, particularly with
        renames. Make that explicit, and point to the document where it is explained
        in more detail.
      - RCU is pretty standard now, and macros make implementations pretty mindless.
        If we want to know about RCU barrier details, we look in RCU code.
      - Delete some boring legacy comments because we don't care much about how the
        code used to work, more about the interesting parts of how it works now. So
        comments about lazy LRU may be interesting, but would better be done in the
        LRU or refcount management code.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b04f784e
  5. 15 8月, 2010 1 次提交
  6. 11 8月, 2010 5 次提交
    • M
      vfs: show unreachable paths in getcwd and proc · 8df9d1a4
      Miklos Szeredi 提交于
      Prepend "(unreachable)" to path strings if the path is not reachable
      from the current root.
      
      Two places updated are
       - the return string from getcwd()
       - and symlinks under /proc/$PID.
      
      Other uses of d_path() are left unchanged (we know that some old
      software crashes if /proc/mounts is changed).
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8df9d1a4
    • M
      vfs: only add " (deleted)" where necessary · ffd1f4ed
      Miklos Szeredi 提交于
      __d_path() has 4 callers:
      
        d_path()
        sys_getcwd()
        seq_path_root()
        tomoyo_realpath_from_path2()
      
      Of these the only one which needs the " (deleted)" ending is d_path().
      
      sys_getcwd() checks for existence before calling __d_path().
      
      seq_path_root() is used to show the mountpoint path in
      /proc/PID/mountinfo, which is always a positive.
      
      And tomoyo doesn't want the deleted ending.
      
      Create a helper "path_with_deleted()" as subsequent patches will need
      this in multiple places.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ffd1f4ed
    • M
      vfs: add prepend_path() helper · f2eb6575
      Miklos Szeredi 提交于
      Split off prepend_path() from __d_path().  This new helper takes an
      end-of-buffer pointer and buffer-length pointer just like the other
      prepend_* functions.  Move the " (deleted)" postfix out to __d_path().
      
      This patch doesn't change any functionality but paves the way for the
      following patches.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f2eb6575
    • M
      vfs: __d_path: dont prepend the name of the root dentry · 98dc568b
      Miklos Szeredi 提交于
      In the old times pseudo-filesystems set the name of theroot dentry to
      some prefix like "pipe:" and the name of the child dentry to "[123]"
      and relied on a hack in __d_path() to replace the preceding slash with
      the root's name to get "pipe:[123]".
      
      Then the d_dname() dentry operation was introduced which solved the
      same problem without having to pre-fill the name in each dentry.
      
      Currently the following pseudo filesystems exist in the kernel:
      
      perfmon
      mtd
      anon_inode
      bdev
      pipe
      socket
      
      Of these only perfmon, anon_inode, pipe and socket create
      sub-dentries, all of which have now been switched to using d_dname().
      
      bdev and mtd only create inodes.
      
      This means that now the hack to overwrite the slash can be removed, so
      for unreachable paths (e.g. within a detached mount) the path string
      won't be polluted with garbage.  For these cases a subsequent patch
      will add a prefix, indicating that the path is unreachable.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      98dc568b
    • M
      vfs: add helpers to get root and pwd · f7ad3c6b
      Miklos Szeredi 提交于
      Add three helpers that retrieve a refcounted copy of the root and cwd
      from the supplied fs_struct.
      
       get_fs_root()
       get_fs_pwd()
       get_fs_root_and_pwd()
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f7ad3c6b
  7. 10 8月, 2010 1 次提交