1. 07 1月, 2011 5 次提交
    • N
      fs: name case update method · fb2d5b86
      Nick Piggin 提交于
      smpfs and ncpfs want to update a live dentry name in-place. Rather than
      have them open code the locking, provide a documented dcache API.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fb2d5b86
    • N
      fs: change d_delete semantics · fe15ce44
      Nick Piggin 提交于
      Change d_delete from a dentry deletion notification to a dentry caching
      advise, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      fe15ce44
    • N
      fs: use fast counters for vfs caches · 3e880fb5
      Nick Piggin 提交于
      percpu_counter library generates quite nasty code, so unless you need
      to dynamically allocate counters or take fast approximate value, a
      simple per cpu set of counters is much better.
      
      The percpu_counter can never be made to work as well, because it has an
      indirection from pointer to percpu memory, and it can't use direct
      this_cpu_inc interfaces because it doesn't use static PER_CPU data, so
      code will always be worse.
      
      In the fastpath, it is the difference between this:
      
              incl %gs:nr_dentry      # nr_dentry
      
      and this:
      
              movl    percpu_counter_batch(%rip), %edx        # percpu_counter_batch,
              movl    $1, %esi        #,
              movq    $nr_dentry, %rdi        #,
              call    __percpu_counter_add    # (plus I clobber registers)
      
      __percpu_counter_add:
              pushq   %rbp    #
              movq    %rsp, %rbp      #,
              subq    $32, %rsp       #,
              movq    %rbx, -24(%rbp) #,
              movq    %r12, -16(%rbp) #,
              movq    %r13, -8(%rbp)  #,
              movq    %rdi, %rbx      # fbc, fbc
      #APP
      # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
              movq %gs:kernel_stack,%rax      #, pfo_ret__
      # 0 "" 2
      #NO_APP
              incl    -8124(%rax)     # <variable>.preempt_count
              movq    32(%rdi), %r12  # <variable>.counters, tcp_ptr__
      #APP
      # 78 "lib/percpu_counter.c" 1
              add %gs:this_cpu_off, %r12      # this_cpu_off, tcp_ptr__
      # 0 "" 2
      #NO_APP
              movslq  (%r12),%r13     #* tcp_ptr__, tmp73
              movslq  %edx,%rax       # batch, batch
              addq    %rsi, %r13      # amount, count
              cmpq    %rax, %r13      # batch, count
              jge     .L27    #,
              negl    %edx    # tmp76
              movslq  %edx,%rdx       # tmp76, tmp77
              cmpq    %rdx, %r13      # tmp77, count
              jg      .L28    #,
      .L27:
              movq    %rbx, %rdi      # fbc,
              call    _raw_spin_lock  #
              addq    %r13, 8(%rbx)   # count, <variable>.count
              movq    %rbx, %rdi      # fbc,
              movl    $0, (%r12)      #,* tcp_ptr__
              call    _raw_spin_unlock        #
      .L29:
      #APP
      # 216 "/home/npiggin/usr/src/linux-2.6/arch/x86/include/asm/thread_info.h" 1
              movq %gs:kernel_stack,%rax      #, pfo_ret__
      # 0 "" 2
      #NO_APP
              decl    -8124(%rax)     # <variable>.preempt_count
              movq    -8136(%rax), %rax       #, D.14625
              testb   $8, %al #, D.14625
              jne     .L32    #,
      .L31:
              movq    -24(%rbp), %rbx #,
              movq    -16(%rbp), %r12 #,
              movq    -8(%rbp), %r13  #,
              leave
              ret
              .p2align 4,,10
              .p2align 3
      .L28:
              movl    %r13d, (%r12)   # count,*
              jmp     .L29    #
      .L32:
              call    preempt_schedule        #
              .p2align 4,,6
              jmp     .L31    #
              .size   __percpu_counter_add, .-__percpu_counter_add
              .p2align 4,,15
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      3e880fb5
    • N
      vfs: revert per-cpu nr_unused counters for dentry and inodes · 86c8749e
      Nick Piggin 提交于
      The nr_unused counters count the number of objects on an LRU, and as such they
      are synchronized with LRU object insertion and removal and scanning, and
      protected under the LRU lock.
      
      Making it per-cpu does not actually get any concurrency improvements because of
      this lock, and summing the counter is much slower, and
      incrementing/decrementing it costs more code size and is slower too.
      
      These counters should stay per-LRU, which currently means global.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      86c8749e
    • N
      fs: d_validate fixes · 786a5e15
      Nick Piggin 提交于
      d_validate has been broken for a long time.
      
      kmem_ptr_validate does not guarantee that a pointer can be dereferenced
      if it can go away at any time. Even rcu_read_lock doesn't help, because
      the pointer might be queued in RCU callbacks but not executed yet.
      
      So the parent cannot be checked, nor the name hashed. The dentry pointer
      can not be touched until it can be verified under lock. Hashing simply
      cannot be used.
      
      Instead, verify the parent/child relationship by traversing parent's
      d_child list. It's slow, but only ncpfs and the destaged smbfs care
      about it, at this point.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      786a5e15
  2. 05 1月, 2011 1 次提交
  3. 26 10月, 2010 7 次提交
  4. 18 8月, 2010 2 次提交
    • N
      fs: brlock vfsmount_lock · 99b7db7b
      Nick Piggin 提交于
      fs: brlock vfsmount_lock
      
      Use a brlock for the vfsmount lock. It must be taken for write whenever
      modifying the mount hash or associated fields, and may be taken for read when
      performing mount hash lookups.
      
      A new lock is added for the mnt-id allocator, so it doesn't need to take
      the heavy vfsmount write-lock.
      
      The number of atomics should remain the same for fastpath rlock cases, though
      code would be slightly slower due to per-cpu access. Scalability is not not be
      much improved in common cases yet, due to other locks (ie. dcache_lock) getting
      in the way. However path lookups crossing mountpoints should be one case where
      scalability is improved (currently requiring the global lock).
      
      The slowpath is slower due to use of brlock. On a 64 core, 64 socket, 32 node
      Altix system (high latency to remote nodes), a simple umount microbenchmark
      (mount --bind mnt mnt2 ; umount mnt2 loop 1000 times), before this patch it
      took 6.8s, afterwards took 7.1s, about 5% slower.
      
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      99b7db7b
    • N
      fs: remove extra lookup in __lookup_hash · b04f784e
      Nick Piggin 提交于
      fs: remove extra lookup in __lookup_hash
      
      Optimize lookup for create operations, where no dentry should often be
      common-case. In cases where it is not, such as unlink, the added overhead
      is much smaller than the removed.
      
      Also, move comments about __d_lookup racyness to the __d_lookup call site.
      d_lookup is intuitive; __d_lookup is what needs commenting. So in that same
      vein, add kerneldoc comments to __d_lookup and clean up some of the comments:
      
      - We are interested in how the RCU lookup works here, particularly with
        renames. Make that explicit, and point to the document where it is explained
        in more detail.
      - RCU is pretty standard now, and macros make implementations pretty mindless.
        If we want to know about RCU barrier details, we look in RCU code.
      - Delete some boring legacy comments because we don't care much about how the
        code used to work, more about the interesting parts of how it works now. So
        comments about lazy LRU may be interesting, but would better be done in the
        LRU or refcount management code.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b04f784e
  5. 15 8月, 2010 1 次提交
  6. 11 8月, 2010 5 次提交
    • M
      vfs: show unreachable paths in getcwd and proc · 8df9d1a4
      Miklos Szeredi 提交于
      Prepend "(unreachable)" to path strings if the path is not reachable
      from the current root.
      
      Two places updated are
       - the return string from getcwd()
       - and symlinks under /proc/$PID.
      
      Other uses of d_path() are left unchanged (we know that some old
      software crashes if /proc/mounts is changed).
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8df9d1a4
    • M
      vfs: only add " (deleted)" where necessary · ffd1f4ed
      Miklos Szeredi 提交于
      __d_path() has 4 callers:
      
        d_path()
        sys_getcwd()
        seq_path_root()
        tomoyo_realpath_from_path2()
      
      Of these the only one which needs the " (deleted)" ending is d_path().
      
      sys_getcwd() checks for existence before calling __d_path().
      
      seq_path_root() is used to show the mountpoint path in
      /proc/PID/mountinfo, which is always a positive.
      
      And tomoyo doesn't want the deleted ending.
      
      Create a helper "path_with_deleted()" as subsequent patches will need
      this in multiple places.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ffd1f4ed
    • M
      vfs: add prepend_path() helper · f2eb6575
      Miklos Szeredi 提交于
      Split off prepend_path() from __d_path().  This new helper takes an
      end-of-buffer pointer and buffer-length pointer just like the other
      prepend_* functions.  Move the " (deleted)" postfix out to __d_path().
      
      This patch doesn't change any functionality but paves the way for the
      following patches.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f2eb6575
    • M
      vfs: __d_path: dont prepend the name of the root dentry · 98dc568b
      Miklos Szeredi 提交于
      In the old times pseudo-filesystems set the name of theroot dentry to
      some prefix like "pipe:" and the name of the child dentry to "[123]"
      and relied on a hack in __d_path() to replace the preceding slash with
      the root's name to get "pipe:[123]".
      
      Then the d_dname() dentry operation was introduced which solved the
      same problem without having to pre-fill the name in each dentry.
      
      Currently the following pseudo filesystems exist in the kernel:
      
      perfmon
      mtd
      anon_inode
      bdev
      pipe
      socket
      
      Of these only perfmon, anon_inode, pipe and socket create
      sub-dentries, all of which have now been switched to using d_dname().
      
      bdev and mtd only create inodes.
      
      This means that now the hack to overwrite the slash can be removed, so
      for unreachable paths (e.g. within a detached mount) the path string
      won't be polluted with garbage.  For these cases a subsequent patch
      will add a prefix, indicating that the path is unreachable.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      98dc568b
    • M
      vfs: add helpers to get root and pwd · f7ad3c6b
      Miklos Szeredi 提交于
      Add three helpers that retrieve a refcounted copy of the root and cwd
      from the supplied fs_struct.
      
       get_fs_root()
       get_fs_pwd()
       get_fs_root_and_pwd()
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f7ad3c6b
  7. 10 8月, 2010 2 次提交
  8. 19 7月, 2010 1 次提交
    • D
      mm: add context argument to shrinker callback · 7f8275d0
      Dave Chinner 提交于
      The current shrinker implementation requires the registered callback
      to have global state to work from. This makes it difficult to shrink
      caches that are not global (e.g. per-filesystem caches). Pass the shrinker
      structure to the callback so that users can embed the shrinker structure
      in the context the shrinker needs to operate on and get back to it in the
      callback via container_of().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      7f8275d0
  9. 30 6月, 2010 1 次提交
    • N
      fs: fix superblock iteration race · 57439f87
      npiggin@suse.de 提交于
      list_for_each_entry_safe is not suitable to protect against concurrent
      modification of the list. 6754af64 introduced a race in sb walking.
      
      list_for_each_entry can use the trick of pinning the current entry in
      the list before we drop and retake the lock because it subsequently
      follows cur->next. However list_for_each_entry_safe saves n=cur->next
      for following before entering the loop body, so when the lock is
      dropped, n may be deleted.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Frank Mayhar <fmayhar@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      57439f87
  10. 22 5月, 2010 3 次提交
  11. 04 3月, 2010 3 次提交
  12. 17 12月, 2009 1 次提交
  13. 18 7月, 2009 1 次提交
    • F
      sched: Pull up the might_sleep() check into cond_resched() · 613afbf8
      Frederic Weisbecker 提交于
      might_sleep() is called late-ish in cond_resched(), after the
      need_resched()/preempt enabled/system running tests are
      checked.
      
      It's better to check the sleeps while atomic earlier and not
      depend on some environment datas that reduce the chances to
      detect a problem.
      
      Also define cond_resched_*() helpers as macros, so that the
      FILE/LINE reported in the sleeping while atomic warning
      displays the real origin and not sched.h
      
      Changes in v2:
      
       - Call __might_sleep() directly instead of might_sleep() which
         may call cond_resched()
      
       - Turn cond_resched() into a macro so that the file:line
         couple reported refers to the caller of cond_resched() and
         not __cond_resched() itself.
      
      Changes in v3:
      
       - Also propagate this __might_sleep() pull up to
         cond_resched_lock() and cond_resched_softirq()
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1247725694-6082-6-git-send-email-fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      613afbf8
  14. 12 6月, 2009 1 次提交
  15. 09 5月, 2009 1 次提交
  16. 21 4月, 2009 1 次提交
  17. 01 4月, 2009 2 次提交
  18. 28 3月, 2009 1 次提交
  19. 28 2月, 2009 1 次提交