提交 · 8d80d7dabe9668965574669afbd31733f7b0fe9b · openanolis / cloud-kernel

08 8月, 2014 7 次提交

dcache: d_find_alias needn't recheck IS_ROOT && DCACHE_DISCONNECTED · 8d80d7da

由 J. Bruce Fields 提交于 1月 16, 2014

If we get to this point and discover the dentry is not a root dentry, or
not DCACHE_DISCONNECTED--great, we always prefer that anyway.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

8d80d7da

dcache: remove unused d_find_alias parameter · 52ed46f0

由 J. Bruce Fields 提交于 1月 16, 2014

Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

52ed46f0

dcache: d_obtain_alias callers don't all want DISCONNECTED · 1a0a397e

由 J. Bruce Fields 提交于 2月 14, 2014

There are a few d_obtain_alias callers that are using it to get the
root of a filesystem which may already have an alias somewhere else.

This is not the same as the filehandle-lookup case, and none of them
actually need DCACHE_DISCONNECTED set.

It isn't really a serious problem, but it would really be clearer if we
reserved DCACHE_DISCONNECTED for those cases where it's actually needed.

In the btrfs case this was causing a spurious printk from
nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
dentry.  Josef worked around this by unsetting DCACHE_DISCONNECTED
manually in 3a0dfa6a "Btrfs: unset DCACHE_DISCONNECTED when mounting
default subvol", and this replaces that workaround.

Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1a0a397e

dcache: d_splice_alias should ignore DCACHE_DISCONNECTED · da093a9b

由 J. Bruce Fields 提交于 2月 17, 2014

Any IS_ROOT() alias should be safe to use; there's nothing special about
DCACHE_DISCONNECTED dentries.

Note that this is in fact useful for filesystems such as btrfs which can
legimately encounter a directory with a preexisting IS_ROOT alias on a
lookup that crosses into a subvolume.  (Those aliases are currently
marked DCACHE_DISCONNECTED--but not really for any good reason, and
we'll change that soon.)
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

da093a9b

dcache: d_splice_alias mustn't create directory aliases · 908790fa

由 J. Bruce Fields 提交于 2月 17, 2014

Currently if d_splice_alias finds a directory with an alias that is not
IS_ROOT or not DCACHE_DISCONNECTED, it creates a duplicate directory.

Duplicate directory dentries are unacceptable; it is better just to
error out.

(In the case of a local filesystem the most likely case is filesystem
corruption: for example, perhaps two directories point to the same child
directory, and the other parent has already been found and cached.)

Note that distributed filesystems may encounter this case in normal
operation if a remote host moves a directory to a location different
from the one we last cached in the dcache.  For that reason, such
filesystems should instead use d_materialise_unique, which tries to move
the old directory alias to the right place instead of erroring out.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

908790fa

dcache: close d_move race in d_splice_alias · 75a2352d

由 J. Bruce Fields 提交于 2月 17, 2014

d_splice_alias will d_move an IS_ROOT() directory dentry into place if
one exists.  This should be safe as long as the dentry remains IS_ROOT,
but I can't see what guarantees that: once we drop the i_lock all we
hold here is the i_mutex on an unrelated parent directory.

Instead copy the logic of d_materialise_unique.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

75a2352d

dcache: move d_splice_alias · 3f70bd51

由 J. Bruce Fields 提交于 2月 18, 2014

Just a trivial move to locate it near (similar) d_materialise_unique
code and save some forward references in a following patch.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

3f70bd51

12 6月, 2014 1 次提交

lock_parent: don't step on stale ->d_parent of all-but-freed one · c2338f2d

由 Al Viro 提交于 6月 12, 2014

Dentry that had been through (or into) __dentry_kill() might be seen
by shrink_dentry_list(); that's normal, it'll be taken off the shrink
list and freed if __dentry_kill() has already finished.  The problem
is, its ->d_parent might be pointing to already freed dentry, so
lock_parent() needs to be careful.

We need to check that dentry hasn't already gone into __dentry_kill()
*and* grab rcu_read_lock() before dropping ->d_lock - the latter makes
sure that whatever we see in ->d_parent after dropping ->d_lock it
won't be freed until we drop rcu_read_lock().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c2338f2d

07 6月, 2014 1 次提交

fs: convert use of typedef ctl_table to struct ctl_table · 1f7e0616

由 Joe Perches 提交于 6月 06, 2014

This typedef is unnecessary and should just be removed.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1f7e0616

01 6月, 2014 1 次提交

dcache: add missing lockdep annotation · 9f12600f

由 Linus Torvalds 提交于 5月 31, 2014

lock_parent() very much on purpose does nested locking of dentries, and
is careful to maintain the right order (lock parent first).  But because
it didn't annotate the nested locking order, lockdep thought it might be
a deadlock on d_lock, and complained.

Add the proper annotation for the inner locking of the child dentry to
make lockdep happy.

Introduced by commit 046b961b ("shrink_dentry_list(): take parent's
->d_lock earlier").
Reported-and-tested-by: NJosh Boyer <jwboyer@fedoraproject.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9f12600f

30 5月, 2014 3 次提交

A
dentry_kill() doesn't need the second argument now · 8cbf74da
由 Al Viro 提交于 5月 29, 2014
```
it's 1 in the only remaining caller.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
8cbf74da

dealing with the rest of shrink_dentry_list() livelock · b2b80195

由 Al Viro 提交于 5月 29, 2014

We have the same problem with ->d_lock order in the inner loop, where
we are dropping references to ancestors.  Same solution, basically -
instead of using dentry_kill() we use lock_parent() (introduced in the
previous commit) to get that lock in a safe way, recheck ->d_count
(in case if lock_parent() has ended up dropping and retaking ->d_lock
and somebody managed to grab a reference during that window), trylock
the inode->i_lock and use __dentry_kill() to do the rest.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b2b80195

shrink_dentry_list(): take parent's ->d_lock earlier · 046b961b

由 Al Viro 提交于 5月 29, 2014

The cause of livelocks there is that we are taking ->d_lock on
dentry and its parent in the wrong order, forcing us to use
trylock on the parent's one.  d_walk() takes them in the right
order, and unfortunately it's not hard to create a situation
when shrink_dentry_list() can't make progress since trylock
keeps failing, and shrink_dcache_parent() or check_submounts_and_drop()
keeps calling d_walk() disrupting the very shrink_dentry_list() it's
waiting for.

Solution is straightforward - if that trylock fails, let's unlock
the dentry itself and take locks in the right order.  We need to
stabilize ->d_parent without holding ->d_lock, but that's doable
using RCU.  And we'd better do that in the very beginning of the
loop in shrink_dentry_list(), since the checks on refcount, etc.
would need to be redone anyway.

That deals with a half of the problem - killing dentries on the
shrink list itself.  Another one (dropping their parents) is
in the next commit.

locking parent is interesting - it would be easy to do rcu_read_lock(),
lock whatever we think is a parent, lock dentry itself and check
if the parent is still the right one.  Except that we need to check
that *before* locking the dentry, or we are risking taking ->d_lock
out of order.  Fortunately, once the D1 is locked, we can check if
D2->d_parent is equal to D1 without the need to lock D2; D2->d_parent
can start or stop pointing to D1 only under D1->d_lock, so taking
D1->d_lock is enough.  In other words, the right solution is
rcu_read_lock/lock what looks like parent right now/check if it's
still our parent/rcu_read_unlock/lock the child.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

046b961b

29 5月, 2014 2 次提交

expand dentry_kill(dentry, 0) in shrink_dentry_list() · ff2fde99

由 Al Viro 提交于 5月 28, 2014

Result will be massaged to saner shape in the next commits.  It is
ugly, no questions - the point of that one is to be a provably
equivalent transformation (and it might be worth splitting a bit
more).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ff2fde99

split dentry_kill() · e55fd011

由 Al Viro 提交于 5月 28, 2014

... into trylocks and everything else.  The latter (actual killing)
is __dentry_kill().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e55fd011

28 5月, 2014 1 次提交

lift the "already marked killed" case into shrink_dentry_list() · 64fd72e0

由 Al Viro 提交于 5月 28, 2014

It can happen only when dentry_kill() is called with unlock_on_failure
equal to 0 - other callers had dentry pinned until the moment they've
got ->d_lock and DCACHE_DENTRY_KILLED is set only after lockref_mark_dead().

IOW, only one of three call sites of dentry_kill() might end up reaching
that code. Just move it there.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

64fd72e0

04 5月, 2014 3 次提交

dcache: don't need rcu in shrink_dentry_list() · 60942f2f

由 Miklos Szeredi 提交于 5月 02, 2014

Since now the shrink list is private and nobody can free the dentry while
it is on the shrink list, we can remove RCU protection from this.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

60942f2f

more graceful recovery in umount_collect() · 9c8c10e2

由 Al Viro 提交于 5月 02, 2014

Start with shrink_dcache_parent(), then scan what remains.

First of all, BUG() is very much an overkill here; we are holding
->s_umount, and hitting BUG() means that a lot of interesting stuff
will be hanging after that point (sync(2), for example).  Moreover,
in cases when there had been more than one leak, we'll be better
off reporting all of them.  And more than just the last component
of pathname - %pd is there for just such uses...

That was the last user of dentry_lru_del(), so kill it off...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9c8c10e2

don't remove from shrink list in select_collect() · fe91522a

由 Al Viro 提交于 5月 03, 2014

	If we find something already on a shrink list, just increment
data->found and do nothing else.  Loops in shrink_dcache_parent() and
check_submounts_and_drop() will do the right thing - everything we
did put into our list will be evicted and if there had been nothing,
but data->found got non-zero, well, we have somebody else shrinking
those guys; just try again.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

fe91522a

01 5月, 2014 5 次提交

dentry_kill(): don't try to remove from shrink list · 41edf278

由 Al Viro 提交于 5月 01, 2014

If the victim in on the shrink list, don't remove it from there.
If shrink_dentry_list() manages to remove it from the list before
we are done - fine, we'll just free it as usual.  If not - mark
it with new flag (DCACHE_MAY_FREE) and leave it there.

Eventually, shrink_dentry_list() will get to it, remove the sucker
from shrink list and call dentry_kill(dentry, 0).  Which is where
we'll deal with freeing.

Since now dentry_kill(dentry, 0) may happen after or during
dentry_kill(dentry, 1), we need to recognize that (by seeing
DCACHE_DENTRY_KILLED already set), unlock everything
and either free the sucker (in case DCACHE_MAY_FREE has been
set) or leave it for ongoing dentry_kill(dentry, 1) to deal with.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

41edf278

A
expand the call of dentry_lru_del() in dentry_kill() · 01b60351
由 Al Viro 提交于 4月 29, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
01b60351

new helper: dentry_free() · b4f0354e

由 Al Viro 提交于 4月 29, 2014

The part of old d_free() that dealt with actual freeing of dentry.
Taken out of dentry_kill() into a separate function.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b4f0354e

A
fold try_prune_one_dentry() · 5c47e6d0
由 Al Viro 提交于 4月 29, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
5c47e6d0
A
fold d_kill() and d_free() · 03b3b889
由 Al Viro 提交于 4月 29, 2014
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
03b3b889

20 4月, 2014 1 次提交

fix races between __d_instantiate() and checks of dentry flags · 22213318

由 Al Viro 提交于 4月 19, 2014

in non-lazy walk we need to be careful about dentry switching from
negative to positive - both ->d_flags and ->d_inode are updated,
and in some places we might see only one store.  The cases where
dentry has been obtained by dcache lookup with ->i_mutex held on
parent are safe - ->d_lock and ->i_mutex provide all the barriers
we need.  However, there are several places where we run into
trouble:
	* do_last() fetches ->d_inode, then checks ->d_flags and
assumes that inode won't be NULL unless d_is_negative() is true.
Race with e.g. creat() - we might have fetched the old value of
->d_inode (still NULL) and new value of ->d_flags (already not
DCACHE_MISS_TYPE).  Lin Ming has observed and reported the resulting
oops.
	* a bunch of places checks ->d_inode for being non-NULL,
then checks ->d_flags for "is it a symlink".  Race with symlink(2)
in case if our CPU sees ->d_inode update first - we see non-NULL
there, but ->d_flags still contains DCACHE_MISS_TYPE instead of
DCACHE_SYMLINK_TYPE.  Result: false negative on "should we follow
link here?", with subsequent unpleasantness.

Cc: stable@vger.kernel.org # 3.13 and 3.14 need that one
Reported-and-tested-by: NLin Ming <minggr@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

22213318

01 4月, 2014 1 次提交

vfs: add cross-rename · da1ce067

由 Miklos Szeredi 提交于 4月 01, 2014

If flags contain RENAME_EXCHANGE then exchange source and destination files.
There's no restriction on the type of the files; e.g. a directory can be
exchanged with a symlink.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>

da1ce067

23 3月, 2014 1 次提交

make prepend_name() work correctly when called with negative *buflen · e825196d

由 Al Viro 提交于 3月 23, 2014

In all callchains leading to prepend_name(), the value left in *buflen
is eventually discarded unused if prepend_name() has returned a negative.
So we are free to do what prepend() does, and subtract from *buflen
*before* checking for underflow (which turns into checking the sign
of subtraction result, of course).

Cc: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e825196d

16 3月, 2014 1 次提交

drm: add pseudo filesystem for shared inodes · 31bbe16f

由 David Herrmann 提交于 1月 03, 2014

Our current DRM design uses a single address_space for all users of the
same DRM device. However, there is no way to create an anonymous
address_space without an underlying inode. Therefore, we wait for the
first ->open() callback on a registered char-dev and take-over the inode
of the char-dev. This worked well so far, but has several drawbacks:
 - We screw with FS internals and rely on some non-obvious invariants like
   inode->i_mapping being the same as inode->i_data for char-devs.
 - We don't have any address_space prior to the first ->open() from
   user-space. This leads to ugly fallback code and we cannot allocate
   global objects early.

As pointed out by Al-Viro, fs/anon_inode.c is *not* supposed to be used by
drivers for anonymous inode-allocation. Therefore, this patch follows the
proposed alternative solution and adds a pseudo filesystem mount-point to
DRM. We can then allocate private inodes including a private address_space
for each DRM device at initialization time.

Note that we could use:
  sysfs_get_inode(sysfs_mnt->mnt_sb, drm_device->dev->kobj.sd);
to get access to the underlying sysfs-inode of a "struct device" object.
However, most of this information is currently hidden and it's not clear
whether this address_space is suitable for driver access. Thus, unless
linux allows anonymous address_space objects or driver-core provides a
public inode per device, we're left with our own private internal mount
point.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com>

31bbe16f

27 1月, 2014 1 次提交

__dentry_path() fixes · f6500801

由 Al Viro 提交于 1月 26, 2014

* we need to save the starting point for restarts
* reject pathologically short buffers outright
Spotted-by: NDenys Vlasenko <dvlasenk@redhat.com>
Spotted-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f6500801

26 1月, 2014 1 次提交

vfs: Remove second variable named error in __dentry_path · a8323da0

由 Eric W. Biederman 提交于 1月 20, 2014

In commit  232d2d60
Author: Waiman Long <Waiman.Long@hp.com>
Date:   Mon Sep 9 12:18:13 2013 -0400

    dcache: Translating dentry into pathname without taking rename_lock

The __dentry_path locking was changed and the variable error was
intended to be moved outside of the loop.  Unfortunately the inner
declaration of error was not removed. Resulting in a version of
__dentry_path that will never return an error.

Remove the problematic inner declaration of error and allow
__dentry_path to return errors once again.

Cc: stable@vger.kernel.org
Cc: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a8323da0

13 12月, 2013 1 次提交

dcache: allow word-at-a-time name hashing with big-endian CPUs · a5c21dce

由 Will Deacon 提交于 12月 12, 2013

When explicitly hashing the end of a string with the word-at-a-time
interface, we have to be careful which end of the word we pick up.

On big-endian CPUs, the upper-bits will contain the data we're after, so
ensure we generate our masks accordingly (and avoid hashing whatever
random junk may have been sitting after the string).

This patch adds a new dcache helper, bytemask_from_count, which creates
a mask appropriate for the CPU endianness.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a5c21dce

27 11月, 2013 1 次提交

vfs: In d_path don't call d_dname on a mount point · f48cfddc

由 Eric W. Biederman 提交于 11月 08, 2013

Aditya Kali (adityakali@google.com) wrote:
> Commit bf056bfa:
> "proc: Fix the namespace inode permission checks." converted
> the namespace files into symlinks. The same commit changed
> the way namespace bind mounts appear in /proc/mounts:
>   $ mount --bind /proc/self/ns/ipc /mnt/ipc
> Originally:
>   $ cat /proc/mounts | grep ipc
>   proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
>
> After commit bf056bfa:
>   $ cat /proc/mounts | grep ipc
>   proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
>
> This breaks userspace which expects the 2nd field in
> /proc/mounts to be a valid path.

The symlink /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to
dentries allocated with d_alloc_pseudo that we can mount, and
that have interesting names printed out with d_dname.

When these files are bind mounted /proc/mounts is not currently
displaying the mount point correctly because d_dname is called instead
of just displaying the path where the file is mounted.

Solve this by adding an explicit check to distinguish mounted pseudo
inodes and unmounted pseudo inodes.  Unmounted pseudo inodes always
use mount of their filesstem as the mnt_root  in their path making
these two cases easy to distinguish.

CC: stable@vger.kernel.org
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Reported-by: NAditya Kali <adityakali@google.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

f48cfddc

16 11月, 2013 3 次提交

fold try_to_ascend() into the sole remaining caller · 31dec132

由 Al Viro 提交于 10月 25, 2013

There used to be a bunch of tree-walkers in dcache.c, all alike.
try_to_ascend() had been introduced to abstract a piece of logics
duplicated in all of them. These days all these tree-walkers are
implemented via the same iterator (d_walk()), which is the only
remaining caller of try_to_ascend(), so let's fold it back...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

31dec132

dcache.c: get rid of pointless macros · 482db906

由 Al Viro 提交于 10月 25, 2013

D_HASH{MASK,BITS} are used once each, both in the same function (d_hash()).
At this point they are actively misguiding - they imply that values are
compiler constants, which is no longer true.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

482db906

A
take read_seqbegin_or_lock() and friends to seqlock.h · 2bc74feb
由 Al Viro 提交于 10月 25, 2013
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
2bc74feb

13 11月, 2013 2 次提交

A
prepend_path() needs to reinitialize dentry/vfsmount/mnt on restarts · ede4cebc
由 Al Viro 提交于 11月 13, 2013
```
... and equivalent is needed in 3.12; it's broken there as well
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
ede4cebc

fix unpaired rcu lock in prepend_path() · 4ec6c2ae

由 Li Zhong 提交于 11月 13, 2013

Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4ec6c2ae

09 11月, 2013 3 次提交

dcache: don't clear DCACHE_DISCONNECTED too early · f80de2cd

由 J. Bruce Fields 提交于 7月 18, 2012

DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
connected all the way up to the root of the filesystem.  It *shouldn't*
be cleared as soon as the dentry is connected to a parent.  That will
cause bugs at least on exportable filesystems.
Acked-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f80de2cd

dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries · e1a24bb0

由 J. Bruce Fields 提交于 6月 29, 2012

I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885 "fs:
improve scalability of pseudo filesystems", which probably just made the
false assumption the DCACHE_DISCONNECTED was meant to be set on anything
not connected to a parent somehow.

So this is just confusing.  Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e1a24bb0

dcache: use IS_ROOT to decide where dentry is hashed · 7632e465

由 J. Bruce Fields 提交于 6月 28, 2012

Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time.  It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
*does* work:

Dentries are hashed only by:

	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

	- __d_rehash, called by _d_rehash: hashes to the dentry's
	  parent, and all callers of _d_rehash appear to have d_parent
	  set to a "real" parent.
	- __d_rehash, called by __d_move: rehashes the moved dentry to
	  hash chain determined by target, and assigns target's d_parent
	  to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: NChristoph Hellwig <hch@infradead.org>
Reviewed-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

7632e465

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功