1. 16 11月, 2019 1 次提交
    • A
      fix dget_parent() fastpath race · e8400933
      Al Viro 提交于
      We are overoptimistic about taking the fast path there; seeing
      the same value in ->d_parent after having grabbed a reference
      to that parent does *not* mean that it has remained our parent
      all along.
      
      That wouldn't be a big deal (in the end it is our parent and
      we have grabbed the reference we are about to return), but...
      the situation with barriers is messed up.
      
      We might have hit the following sequence:
      
      d is a dentry of /tmp/a/b
      CPU1:					CPU2:
      parent = d->d_parent (i.e. dentry of /tmp/a)
      					rename /tmp/a/b to /tmp/b
      					rmdir /tmp/a, making its dentry negative
      grab reference to parent,
      end up with cached parent->d_inode (NULL)
      					mkdir /tmp/a, rename /tmp/b to /tmp/a/b
      recheck d->d_parent, which is back to original
      decide that everything's fine and return the reference we'd got.
      
      The trouble is, caller (on CPU1) will observe dget_parent()
      returning an apparently negative dentry.  It actually is positive,
      but CPU1 has stale ->d_inode cached.
      
      Use d->d_seq to see if it has been moved instead of rechecking ->d_parent.
      NOTE: we are *NOT* going to retry on any kind of ->d_seq mismatch;
      we just go into the slow path in such case.  We don't wait for ->d_seq
      to become even either - again, if we are racing with renames, we
      can bloody well go to slow path anyway.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e8400933
  2. 10 7月, 2019 1 次提交
    • A
      Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists · 9bdebc2b
      Al Viro 提交于
      Currently, running into a shrink list that contains dentries from different
      filesystems can cause several unpleasant things for shrink_dcache_parent()
      and for umount(2).
      
      The first problem is that there's a window during shrink_dentry_list() between
      __dentry_kill() takes a victim out and dropping reference to its parent.  During
      that window the parent looks like a genuine busy dentry.  shrink_dcache_parent()
      (or, worse yet, shrink_dcache_for_umount()) coming at that time will see no
      eviction candidates and no indication that it needs to wait for some
      shrink_dentry_list() to proceed further.
      
      That applies for any shrink list that might intersect with the subtree we are
      trying to shrink; the only reason it does not blow on umount(2) in the mainline
      is that we unregister the memory shrinker before hitting shrink_dcache_for_umount().
      
      Another problem happens if something in a mixed-filesystem shrink list gets
      be stuck in e.g. iput(), getting umount of unrelated fs to spin waiting for
      the stuck shrinker to get around to our dentries.
      
      Solution:
              1) have shrink_dentry_list() decrement the parent's refcount and
      make sure it's on a shrink list (ours unless it already had been on some
      other) before calling __dentry_kill().  That eliminates the window when
      shrink_dcache_parent() would've blown past the entire subtree without
      noticing anything with zero refcount not on shrink lists.
      	2) when shrink_dcache_parent() has found no eviction candidates,
      but some dentries are still sitting on shrink lists, rather than
      repeating the scan in hope that shrinkers have progressed, scan looking
      for something on shrink lists with zero refcount.  If such a thing is
      found, grab rcu_read_lock() and stop the scan, with caller locking
      it for eviction, dropping out of RCU and doing __dentry_kill(), with
      the same treatment for parent as shrink_dentry_list() would do.
      
      Note that right now mixed-filesystem shrink lists do not occur, so this
      is not a mainline bug.  Howevere, there's a bunch of uses for such
      beasts (e.g. the "try and evict everything we can out of given page"
      patches; there are potential uses in mount-related code, considerably
      simplifying the life in fs/namespace.c, etc.)
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9bdebc2b
  3. 20 6月, 2019 1 次提交
    • A
      fsnotify: move fsnotify_nameremove() hook out of d_delete() · 49246466
      Amir Goldstein 提交于
      d_delete() was piggy backed for the fsnotify_nameremove() hook when
      in fact not all callers of d_delete() care about fsnotify events.
      
      For all callers of d_delete() that may be interested in fsnotify events,
      we made sure to call one of fsnotify_{unlink,rmdir}() hooks before
      calling d_delete().
      
      Now we can move the fsnotify_nameremove() call from d_delete() to the
      fsnotify_{unlink,rmdir}() hooks.
      
      Two explicit calls to fsnotify_nameremove() from nfs/afs sillyrename
      are also removed. This will cause a change of behavior - nfs/afs will
      NOT generate an fsnotify delete event when renaming over a positive
      dentry.  This change is desirable, because it is consistent with the
      behavior of all other filesystems.
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      49246466
  4. 21 5月, 2019 1 次提交
  5. 27 4月, 2019 1 次提交
  6. 17 4月, 2019 1 次提交
  7. 10 4月, 2019 2 次提交
    • A
      unexport d_alloc_pseudo() · ab1152dd
      Al Viro 提交于
      No modular uses since introducion of alloc_file_pseudo(),
      and the only non-modular user not in alloc_file_pseudo()
      had actually been wrong - should've been d_alloc_anon().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ab1152dd
    • A
      dcache: sort the freeing-without-RCU-delay mess for good. · 5467a68c
      Al Viro 提交于
      For lockless accesses to dentries we don't have pinned we rely
      (among other things) upon having an RCU delay between dropping
      the last reference and actually freeing the memory.
      
      On the other hand, for things like pipes and sockets we neither
      do that kind of lockless access, nor want to deal with the
      overhead of an RCU delay every time a socket gets closed.
      
      So delay was made optional - setting DCACHE_RCUACCESS in ->d_flags
      made sure it would happen.  We tried to avoid setting it unless
      we knew we need it.  Unfortunately, that had led to recurring
      class of bugs, in which we missed the need to set it.
      
      We only really need it for dentries that are created by
      d_alloc_pseudo(), so let's not bother with trying to be smart -
      just make having an RCU delay the default.  The ones that do
      *not* get it set the replacement flag (DCACHE_NORCU) and we'd
      better use that sparingly.  d_alloc_pseudo() is the only
      such user right now.
      
      FWIW, the race that finally prompted that switch had been
      between __lock_parent() of immediate subdirectory of what's
      currently the root of a disconnected tree (e.g. from
      open-by-handle in progress) racing with d_splice_alias()
      elsewhere picking another alias for the same inode, either
      on outright corrupted fs image, or (in case of open-by-handle
      on NFS) that subdirectory having been just moved on server.
      It's not easy to hit, so the sky is not falling, but that's
      not the first race on similar missed cases and the logics
      for settinf DCACHE_RCUACCESS has gotten ridiculously
      convoluted.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5467a68c
  8. 31 1月, 2019 2 次提交
    • W
      fs/dcache: Track & report number of negative dentries · af0c9af1
      Waiman Long 提交于
      The current dentry number tracking code doesn't distinguish between
      positive & negative dentries.  It just reports the total number of
      dentries in the LRU lists.
      
      As excessive number of negative dentries can have an impact on system
      performance, it will be wise to track the number of positive and
      negative dentries separately.
      
      This patch adds tracking for the total number of negative dentries in
      the system LRU lists and reports it in the 5th field in the
      /proc/sys/fs/dentry-state file.  The number, however, does not include
      negative dentries that are in flight but not in the LRU yet as well as
      those in the shrinker lists which are on the way out anyway.
      
      The number of positive dentries in the LRU lists can be roughly found by
      subtracting the number of negative dentries from the unused count.
      
      Matthew Wilcox had confirmed that since the introduction of the
      dentry_stat structure in 2.1.60, the dummy array was there, probably for
      future extension.  They were not replacements of pre-existing fields.
      So no sane applications that read the value of /proc/sys/fs/dentry-state
      will do dummy thing if the last 2 fields of the sysctl parameter are not
      zero.  IOW, it will be safe to use one of the dummy array entry for
      negative dentry count.
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af0c9af1
    • W
      fs/dcache: Fix incorrect nr_dentry_unused accounting in shrink_dcache_sb() · 1dbd449c
      Waiman Long 提交于
      The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
      lists and the shrink lists where the DCACHE_LRU_LIST bit is set.
      
      The shrink_dcache_sb() function moves dentries from the LRU list to a
      shrink list and subtracts the dentry count from nr_dentry_unused.  This
      is incorrect as the nr_dentry_unused count will also be decremented in
      shrink_dentry_list() via d_shrink_del().
      
      To fix this double decrement, the decrement in the shrink_dcache_sb()
      function is taken out.
      
      Fixes: 4e717f5c ("list_lru: remove special case function list_lru_dispose_all."
      Cc: stable@kernel.org
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1dbd449c
  9. 31 10月, 2018 1 次提交
  10. 27 10月, 2018 1 次提交
  11. 18 8月, 2018 1 次提交
    • T
      fs/dcache.c: fix kmemcheck splat at take_dentry_name_snapshot() · 6cd00a01
      Tetsuo Handa 提交于
      Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes
      are initialized at __d_alloc(), we can't copy the whole size
      unconditionally.
      
       WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50)
       636f6e66696766732e746d70000000000010000000000000020000000188ffff
        i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u
                                        ^
       RIP: 0010:take_dentry_name_snapshot+0x28/0x50
       RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246
       RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002
       RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60
       RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001
       R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00
       R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000
       FS:  00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0
        take_dentry_name_snapshot+0x28/0x50
        vfs_rename+0x128/0x870
        SyS_rename+0x3b2/0x3d0
        entry_SYSCALL_64_fastpath+0x1a/0xa4
        0xffffffffffffffff
      
      Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.ne.jpSigned-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6cd00a01
  12. 10 8月, 2018 1 次提交
    • A
      make sure that __dentry_kill() always invalidates d_seq, unhashed or not · 4c0d7cd5
      Al Viro 提交于
      RCU pathwalk relies upon the assumption that anything that changes
      ->d_inode of a dentry will invalidate its ->d_seq.  That's almost
      true - the one exception is that the final dput() of already unhashed
      dentry does *not* touch ->d_seq at all.  Unhashing does, though,
      so for anything we'd found by RCU dcache lookup we are fine.
      Unfortunately, we can *start* with an unhashed dentry or jump into
      it.
      
      We could try and be careful in the (few) places where that could
      happen.  Or we could just make the final dput() invalidate the damn
      thing, unhashed or not.  The latter is much simpler and easier to
      backport, so let's do it that way.
      Reported-by: N"Dae R. Jeong" <threeearcat@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4c0d7cd5
  13. 06 8月, 2018 2 次提交
    • A
      root dentries need RCU-delayed freeing · 90bad5e0
      Al Viro 提交于
      Since mountpoint crossing can happen without leaving lazy mode,
      root dentries do need the same protection against having their
      memory freed without RCU delay as everything else in the tree.
      
      It's partially hidden by RCU delay between detaching from the
      mount tree and dropping the vfsmount reference, but the starting
      point of pathwalk can be on an already detached mount, in which
      case umount-caused RCU delay has already passed by the time the
      lazy pathwalk grabs rcu_read_lock().  If the starting point
      happens to be at the root of that vfsmount *and* that vfsmount
      covers the entire filesystem, we get trouble.
      
      Fixes: 48a066e7 ("RCU'd vsfmounts")
      Cc: stable@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      90bad5e0
    • G
      fs: dcache: Use true and false for boolean values · 7964410f
      Gustavo A. R. Silva 提交于
      Return statements in functions returning bool should use true or false
      instead of an integer value.
      
      This issue was detected with the help of Coccinelle.
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7964410f
  14. 04 8月, 2018 1 次提交
    • A
      new primitive: discard_new_inode() · c2b6d621
      Al Viro 提交于
      	We don't want open-by-handle picking half-set-up in-core
      struct inode from e.g. mkdir() having failed halfway through.
      In other words, we don't want such inodes returned by iget_locked()
      on their way to extinction.  However, we can't just have them
      unhashed - otherwise open-by-handle immediately *after* that would've
      ended up creating a new in-core inode over the on-disk one that
      is in process of being freed right under us.
      
      	Solution: new flag (I_CREATING) set by insert_inode_locked() and
      removed by unlock_new_inode() and a new primitive (discard_new_inode())
      to be used by such halfway-through-setup failure exits instead of
      unlock_new_inode() / iput() combinations.  That primitive unlocks new
      inode, but leaves I_CREATING in place.
      
      	iget_locked() treats finding an I_CREATING inode as failure
      (-ESTALE, once we sort out the error propagation).
      	insert_inode_locked() treats the same as instant -EBUSY.
      	ilookup() treats those as icache miss.
      
      [Fix by Dan Carpenter <dan.carpenter@oracle.com> folded in]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c2b6d621
  15. 02 8月, 2018 1 次提交
    • A
      kill d_instantiate_no_diralias() · c971e6a0
      Al Viro 提交于
      The only user is fuse_create_new_entry(), and there it's used to
      mitigate the same mkdir/open-by-handle race as in nfs_mkdir().
      The same solution applies - unhash the mkdir argument, then
      call d_splice_alias() and if that returns a reference to preexisting
      alias, dput() and report success.  ->mkdir() argument left unhashed
      negative with the preexisting alias moved in the right place is just
      fine from the ->mkdir() callers point of view.
      
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c971e6a0
  16. 24 6月, 2018 1 次提交
  17. 14 5月, 2018 1 次提交
    • A
      get rid of dead code in d_find_alias() · 61fec493
      Al Viro 提交于
      All "try disconnected alias if nothing else fits" logics in d_find_alias()
      got accidentally disabled by Neil a while ago; for most of the callers it
      was the right thing to do, so fixes belong in few callers that *do* want
      disconnected aliases.  This just takes the now-dead code in d_find_alias()
      out.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      61fec493
  18. 12 5月, 2018 1 次提交
    • A
      do d_instantiate/unlock_new_inode combinations safely · 1e2e547a
      Al Viro 提交于
      For anything NFS-exported we do _not_ want to unlock new inode
      before it has grown an alias; original set of fixes got the
      ordering right, but missed the nasty complication in case of
      lockdep being enabled - unlock_new_inode() does
      	lockdep_annotate_inode_mutex_key(inode)
      which can only be done before anyone gets a chance to touch
      ->i_mutex.  Unfortunately, flipping the order and doing
      unlock_new_inode() before d_instantiate() opens a window when
      mkdir can race with open-by-fhandle on a guessed fhandle, leading
      to multiple aliases for a directory inode and all the breakage
      that follows from that.
      
      	Correct solution: a new primitive (d_instantiate_new())
      combining these two in the right order - lockdep annotate, then
      d_instantiate(), then the rest of unlock_new_inode().  All
      combinations of d_instantiate() with unlock_new_inode() should
      be converted to that.
      
      Cc: stable@kernel.org	# 2.6.29 and later
      Tested-by: NMike Marshall <hubcap@omnibond.com>
      Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      1e2e547a
  19. 20 4月, 2018 1 次提交
  20. 16 4月, 2018 4 次提交
  21. 12 4月, 2018 2 次提交
    • N
      fs/dcache.c: add cond_resched() in shrink_dentry_list() · 32785c05
      Nikolay Borisov 提交于
      As previously reported (https://patchwork.kernel.org/patch/8642031/)
      it's possible to call shrink_dentry_list with a large number of dentries
      (> 10000).  This, in turn, could trigger the softlockup detector and
      possibly trigger a panic.  In addition to the unmount path being
      vulnerable to this scenario, at SuSE we've observed similar situation
      happening during process exit on processes that touch a lot of dentries.
      Here is an excerpt from a crash dump.  The number after the colon are
      the number of dentries on the list passed to shrink_dentry_list:
      
      PID 99760: 10722
      PID 107530: 215
      PID 108809: 24134
      PID 108877: 21331
      PID 141708: 16487
      
      So we want to kill between 15k-25k dentries without yielding.
      
      And one possible call stack looks like:
      
      4 [ffff8839ece41db0] _raw_spin_lock at ffffffff8152a5f8
      5 [ffff8839ece41db0] evict at ffffffff811c3026
      6 [ffff8839ece41dd0] __dentry_kill at ffffffff811bf258
      7 [ffff8839ece41df0] shrink_dentry_list at ffffffff811bf593
      8 [ffff8839ece41e18] shrink_dcache_parent at ffffffff811bf830
      9 [ffff8839ece41e50] proc_flush_task at ffffffff8120dd61
      10 [ffff8839ece41ec0] release_task at ffffffff81059ebd
      11 [ffff8839ece41f08] do_exit at ffffffff8105b8ce
      12 [ffff8839ece41f78] sys_exit at ffffffff8105bd53
      13 [ffff8839ece41f80] system_call_fastpath at ffffffff81532909
      
      While some of the callers of shrink_dentry_list do use cond_resched,
      this is not sufficient to prevent softlockups.  So just move
      cond_resched into shrink_dentry_list from its callers.
      
      David said: I've found hundreds of occurrences of warnings that we emit
      when need_resched stays set for a prolonged period of time with the
      stack trace that is included in the change log.
      
      Link: http://lkml.kernel.org/r/1521718946-31521-1-git-send-email-nborisov@suse.comSigned-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      32785c05
    • R
      dcache: account external names as indirectly reclaimable memory · f1782c9b
      Roman Gushchin 提交于
      I received a report about suspicious growth of unreclaimable slabs on
      some machines.  I've found that it happens on machines with low memory
      pressure, and these unreclaimable slabs are external names attached to
      dentries.
      
      External names are allocated using generic kmalloc() function, so they
      are accounted as unreclaimable.  But they are held by dentries, which
      are reclaimable, and they will be reclaimed under the memory pressure.
      
      In particular, this breaks MemAvailable calculation, as it doesn't take
      unreclaimable slabs into account.  This leads to a silly situation, when
      a machine is almost idle, has no memory pressure and therefore has a big
      dentry cache.  And the resulting MemAvailable is too low to start a new
      workload.
      
      To address the issue, the NR_INDIRECTLY_RECLAIMABLE_BYTES counter is
      used to track the amount of memory, consumed by external names.  The
      counter is increased in the dentry allocation path, if an external name
      structure is allocated; and it's decreased in the dentry freeing path.
      
      To reproduce the problem I've used the following Python script:
      
        import os
      
        for iter in range (0, 10000000):
            try:
                name = ("/some_long_name_%d" % iter) + "_" * 220
                os.stat(name)
            except Exception:
                pass
      
      Without this patch:
        $ cat /proc/meminfo | grep MemAvailable
        MemAvailable:    7811688 kB
        $ python indirect.py
        $ cat /proc/meminfo | grep MemAvailable
        MemAvailable:    2753052 kB
      
      With the patch:
        $ cat /proc/meminfo | grep MemAvailable
        MemAvailable:    7809516 kB
        $ python indirect.py
        $ cat /proc/meminfo | grep MemAvailable
        MemAvailable:    7749144 kB
      
      [guro@fb.com: fix indirectly reclaimable memory accounting for CONFIG_SLOB]
        Link: http://lkml.kernel.org/r/20180312194140.19517-1-guro@fb.com
      [guro@fb.com: fix indirectly reclaimable memory accounting]
        Link: http://lkml.kernel.org/r/20180313125701.7955-1-guro@fb.com
      Link: http://lkml.kernel.org/r/20180305133743.12746-5-guro@fb.comSigned-off-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f1782c9b
  22. 30 3月, 2018 12 次提交
    • A
      d_genocide: move export to definition · cbd4a5bc
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      cbd4a5bc
    • A
      42177007
    • A
      make non-exchanging __d_move() copy ->d_parent rather than swap them · 076515fc
      Al Viro 提交于
      Currently d_move(from, to) does the following:
      	* name/parent of from <- old name/parent of to, from hashed there
      	* to is unhashed
      	* name of to is preserved
      	* if from used to be detached, to gets detached
      	* if from used to be attached, parent of to <- old parent of from.
      
      That's both user-visibly bogus and complicates reasoning a lot.
      Much saner semantics would be
      	* name/parent of from <- name/parent of to, from hashed there.
      	* to is unhashed
      	* name/parent of to is unchanged.
      
      The price, of course, is that old parent of from might lose a reference.
      However,
      	* all potentially cross-directory callers of d_move() have both
      parents pinned directly; typically, dentries themselves are grabbed
      only after we have grabbed and locked both parents.  IOW, the decrement
      of old parent's refcount in case of d_move() won't reach zero.
      	* __d_move() from d_splice_alias() is done to detached alias.
      No refcount decrements in that case
      	* __d_move() from __d_unalias() *can* get the refcount to zero.
      So let's grab a reference to alias' old parent before calling __d_unalias()
      and dput() it after we'd dropped rename_lock.
      
      That does make d_splice_alias() potentially blocking.  However, it has
      no callers in non-sleepable contexts (and the case where we'd grown
      that dget/dput pair is _very_ rare, so performance is not an issue).
      
      Another thing that needs adjustment is unlocking in the end of __d_move();
      folded it in.  And cleaned the remnants of bogus ordering from the
      "lock them in the beginning" counterpart - it's never been right and
      now (well, for 7 years now) we have that thing always serialized on
      rename_lock anyway.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      076515fc
    • A
      split d_path() and friends into a separate file · 7a5cf791
      Al Viro 提交于
      Those parts of fs/dcache.c are pretty much self-contained.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      7a5cf791
    • A
      dcache.c: trim includes · 43986d63
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      43986d63
    • J
      fs/dcache: Avoid a try_lock loop in shrink_dentry_list() · 8f04da2a
      John Ogness 提交于
      shrink_dentry_list() holds dentry->d_lock and needs to acquire
      dentry->d_inode->i_lock. This cannot be done with a spin_lock()
      operation because it's the reverse of the regular lock order.
      To avoid ABBA deadlocks it is done with a trylock loop.
      
      Trylock loops are problematic in two scenarios:
      
        1) PREEMPT_RT converts spinlocks to 'sleeping' spinlocks, which are
           preemptible. As a consequence the i_lock holder can be preempted
           by a higher priority task. If that task executes the trylock loop
           it will do so forever and live lock.
      
        2) In virtual machines trylock loops are problematic as well. The
           VCPU on which the i_lock holder runs can be scheduled out and a
           task on a different VCPU can loop for a whole time slice. In the
           worst case this can lead to starvation. Commits 47be6184
           ("fs/dcache.c: avoid soft-lockup in dput()") and 046b961b
           ("shrink_dentry_list(): take parent's d_lock earlier") are
           addressing exactly those symptoms.
      
      Avoid the trylock loop by using dentry_kill(). When pruning ancestors,
      the same code applies that is used to kill a dentry in dput(). This
      also has the benefit that the locking order is now the same. First
      the inode is locked, then the parent.
      Signed-off-by: NJohn Ogness <john.ogness@linutronix.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8f04da2a
    • A
      get rid of trylock loop around dentry_kill() · f657a666
      Al Viro 提交于
      In case when trylock in there fails, deal with it directly in
      dentry_kill().  Note that in cases when we drop and retake
      ->d_lock, we need to recheck whether to retain the dentry.
      Another thing is that dropping/retaking ->d_lock might have
      ended up with negative dentry turning into positive; that,
      of course, can happen only once...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      f657a666
    • A
      handle move to LRU in retain_dentry() · 62d9956c
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      62d9956c
    • A
    • A
      split the slow part of lock_parent() off · 8b987a46
      Al Viro 提交于
      Turn the "trylock failed" part into uninlined __lock_parent().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      8b987a46
    • A
      now lock_parent() can't run into killed dentry · 65d8eb5a
      Al Viro 提交于
      all remaining callers hold either a reference or ->i_lock
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      65d8eb5a
    • A
      get rid of trylock loop in locking dentries on shrink list · 3b3f09f4
      Al Viro 提交于
      In case of trylock failure don't re-add to the list - drop the locks
      and carefully get them in the right order.  For shrink_dentry_list(),
      somebody having grabbed a reference to dentry means that we can
      kick it off-list, so if we find dentry being modified under us we
      don't need to play silly buggers with retries anyway - off the list
      it is.
      
      The locking logics taken out into a helper of its own; lock_parent()
      is no longer used for dentries that can be killed under us.
      
      [fix from Eric Biggers folded]
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      3b3f09f4