Commits · 40fcf5a931af901198fcfb23a50354e54e1fa7a6 · openeuler / Kernel

14 Mar, 2020 11 commits

merging pick_link() with get_link(), part 3 · 40fcf5a9

Authored by Al Viro on Jan 14, 2020

After a pure jump ("/" or procfs-style symlink) we don't need to
hold the link anymore.  link_path_walk() dropped it if such case
had been detected, lookup_last/do_last() (i.e. old trailing_symlink())
left it on the stack - it ended up calling terminate_walk() shortly
anyway, which would've purged the entire stack.

Do it in get_link() itself instead.  Simpler logic that way...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

40fcf5a9

merging pick_link() with get_link(), part 2 · 1ccac622

Authored by Al Viro on Jan 14, 2020

Fold trailing_symlink() into lookup_last() and do_last(), change
the calling conventions of those two.  Rules change:
	success, we are done => NULL instead of 0
	error	=> ERR_PTR(-E...) instead of -E...
	got a symlink to follow => return the path to be followed instead of 1

The loops calling those (in path_lookupat() and path_openat()) adjusted.

A subtle change of control flow here: originally a pure-jump trailing
symlink ("/" or procfs one) would've passed through the upper level
loop once more, with "" for path to traverse.  That would've brought
us back to the lookup_last/do_last entry and we would've hit LAST_BIND
case (LAST_BIND left from get_link() called by trailing_symlink())
and pretty much skip to the point right after where we'd left the
sucker back when we picked that trailing symlink.

Now we don't bother with that extra pass through the upper level
loop - if get_link() says "I've just done a pure jump, nothing
else to do", we just treat that as non-symlink case.

Boilerplate added on that step will go away shortly - it'll migrate
into walk_component() and then to step_into(), collapsing into the
change of calling conventions for those.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1ccac622
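
The return convention described in 1ccac622 (NULL when done, ERR_PTR(-E...) on error, a string to walk when a symlink must be followed) can be sketched in userspace. This is only an illustration under stated assumptions: `ERR_PTR()`, `PTR_ERR()` and `IS_ERR()` here are simplified stand-ins for the kernel helpers, and `demo_last()`, `demo_resolve()` and `enum demo_outcome` are hypothetical names that do not exist in fs/namei.c.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical scripted outcomes for resolving the last component. */
enum demo_outcome { DEMO_DONE, DEMO_FAILED, DEMO_SYMLINK };

/*
 * New-style convention: NULL = success, ERR_PTR(-E...) = error,
 * anything else = the path that still has to be followed.
 * (The old convention returned 0, -E... and 1 respectively.)
 */
static const char *demo_last(enum demo_outcome o, const char *link_body)
{
	switch (o) {
	case DEMO_DONE:
		return NULL;
	case DEMO_FAILED:
		return ERR_PTR(-ENOENT);
	default:
		return link_body;
	}
}

/* Toy caller loop: keep going while there is a symlink body to walk. */
static int demo_resolve(const enum demo_outcome *steps, size_t nsteps,
			int *links_followed)
{
	size_t i;

	*links_followed = 0;
	for (i = 0; i < nsteps; i++) {
		const char *s = demo_last(steps[i], "target");

		if (!s)
			return 0;		/* done */
		if (IS_ERR(s))
			return (int)PTR_ERR(s);	/* propagate -E... */
		(*links_followed)++;		/* got a symlink, go around again */
	}
	return -EINVAL;	/* script ended without DEMO_DONE or DEMO_FAILED */
}
```

A single pointer carries all three outcomes, so the caller needs no separate out-parameter for the path to follow.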

merging pick_link() with get_link(), part 1 · 43679723

Authored by Al Viro on Jan 14, 2020

Move restoring LOOKUP_PARENT and zeroing nd->stack.name[0] past
the call of get_link() (nothing _currently_ uses them in there).
That allows moving the call of may_follow_link() into get_link()
as well, since now the presence of LOOKUP_PARENT distinguishes
the callers from each other (link_path_walk() has it, trailing_symlink()
doesn't).

Preparations for folding trailing_symlink() into callers (lookup_last()
and do_last()) and changing the calling conventions of those.  Next
stage after that will have get_link() call migrate into walk_component(),
then - into step_into().  It's tricky enough to warrant doing that
in stages, unfortunately...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

43679723

expand the only remaining call of path_lookup_conditional() · a9dc1494

Authored by Al Viro on Jan 12, 2020

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

a9dc1494

LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat() · 161aff1d

Authored by Al Viro on Jan 11, 2020

New LOOKUP flag, telling path_lookupat() to act as path_mountpointat().
IOW, traverse mounts at the final point and skip revalidation of the
location where it ends up.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

161aff1d

fold handle_mounts() into step_into() · cbae4d12

Authored by Al Viro on Jan 12, 2020

The following is true:
	* calls of handle_mounts() and step_into() are always
paired in sequences like
	err = handle_mounts(nd, dentry, &path, &inode, &seq);
	if (unlikely(err < 0))
		return err;
	err = step_into(nd, &path, flags, inode, seq);
	* in all such sequences path is uninitialized before and
unused after this pair of calls
	* in all such sequences inode and seq are unused afterwards.

So the call of handle_mounts() can be shifted inside step_into(),
turning 'path' into a local variable in the combined function.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

cbae4d12

new step_into() flag: WALK_NOFOLLOW · aca2903e

Authored by Al Viro on Jan 09, 2020

Tells step_into() not to follow symlinks, regardless of LOOKUP_FOLLOW.
Allows switching handle_lookup_down() to step_into(), getting
all follow_managed() and step_into() calls paired.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aca2903e

step_into() callers: dismiss the symlink earlier · 56676ec3

Authored by Al Viro on Mar 10, 2020

We need to dismiss a symlink when we are done traversing it;
currently that's done when we call step_into() for its last
component.  For the cases when we do not call step_into()
for that component (i.e. when it's . or ..) we do the same
symlink dismissal after the call of handle_dots().

What we need to guarantee is that the symlink won't be dismissed
while we are still using nd->last.name - it's pointing into the
body of said symlink.  step_into() is sufficiently late - by
the time it's called we'd already obtained the dentry, so the
name we'd been looking up is no longer needed.  However, it
turns out to be cleaner to have that ("we are done with that
component now, can dismiss the link") done explicitly - in the
callers of step_into().

In handle_dots() case we won't be using the component string
at all, so for . and .. the corresponding point is actually
_before_ the call of handle_dots(), not after it.

Fix a minor irregularity in do_last(), while we are at it -
if trailing symlink ended with . or .. we forgot to dismiss
it.  Not a problem, since nameidata is about to be done with
(neither . nor .. can be a trailing symlink, so this is the
last iteration through the loop) and terminate_walk() will
clean the stack anyway, but let's keep it more regular.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

56676ec3

lookup_fast(): take mount traversal into callers · 20e34357

Authored by Al Viro on Jan 09, 2020

Current calling conventions: -E... on error, 0 on cache miss,
result of handle_mounts(nd, dentry, path, inode, seqp) on
success.  Turn that into returning ERR_PTR(-E...), NULL and dentry
resp.; deal with handle_mounts() in the callers.  The thing
is, they already do that in cache miss handling case, so we
just need to supply dentry to them and unify the mount traversal
in those cases.  Fewer arguments that way, and we get closer
to merging handle_mounts() and step_into().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

20e34357

teach handle_mounts() to handle RCU mode · c153007b

Authored by Al Viro on Jan 09, 2020

... and make the callers of __follow_mount_rcu() use handle_mounts().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c153007b

lookup_fast(): consolidate the RCU success case · b023e172

Authored by Al Viro on Jan 17, 2020

1) in case of __follow_mount_rcu() failure, lookup_fast() proceeds
to call unlazy_child() and, should it succeed, handle_mounts().
Note that we have status > 0 (or we wouldn't be calling
__follow_mount_rcu() at all), so all stuff conditional upon
non-positive status won't be even touched.

Consolidate just that sequence after the call of __follow_mount_rcu().

2) calling d_is_negative() and keeping its result is pointless -
we either don't get past checking ->d_seq (and don't use the results of
d_is_negative() at all), or we are guaranteed that ->d_inode and
type bits of ->d_flags had been consistent at the time of d_is_negative()
call.  IOW, we could only get to the use of its result if it's
equal to !inode.  The same ->d_seq check guarantees that after that point
this CPU won't observe ->d_flags values older than ->d_inode update.
So 'negative' variable is completely pointless these days.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

b023e172

13 Mar, 2020 3 commits

handle_mounts(): pass dentry in, turn path into a pure out argument · db3c9ade

Authored by Al Viro on Jan 09, 2020

All callers are equivalent to
	path->dentry = dentry;
	path->mnt = nd->path.mnt;
	err = handle_mounts(path, ...)
Pass dentry as an explicit argument, fill *path in handle_mounts()
itself.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

db3c9ade
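
The signature change in db3c9ade can be illustrated with a toy model. This is a rough sketch, not the kernel code: every `demo_*` type and function is hypothetical, the actual mount traversal is left out, and the trailing arguments of the real handle_mounts() are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for the kernel structures; names are hypothetical. */
struct demo_mount     { int id; };
struct demo_dentry    { int id; };
struct demo_path      { struct demo_mount *mnt; struct demo_dentry *dentry; };
struct demo_nameidata { struct demo_path path; };

/* Before: every caller has to fill *path before the call. */
static int demo_handle_mounts_old(struct demo_path *path,
				  struct demo_nameidata *nd)
{
	(void)nd;
	/* mount traversal elided in this sketch */
	return (path->mnt && path->dentry) ? 0 : -1;
}

/* After: dentry is an explicit argument, *path is a pure out argument. */
static int demo_handle_mounts_new(struct demo_nameidata *nd,
				  struct demo_dentry *dentry,
				  struct demo_path *path)
{
	path->mnt = nd->path.mnt;
	path->dentry = dentry;
	/* mount traversal elided in this sketch */
	return (path->mnt && path->dentry) ? 0 : -1;
}
```

With the second form the caller no longer needs to initialize `path`, which is the property later commits rely on when they turn it into a local variable.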

do_last(): collapse the call of path_to_nameidata() · e73cabff

Authored by Al Viro on Jan 09, 2020

... and shift filling struct path to just before the call of
handle_mounts().  All callers of handle_mounts() are
immediately preceded by path->mnt = nd->path.mnt now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

e73cabff

lookup_open(): saner calling conventions (return dentry on success) · da5ebf5a

Authored by Al Viro on Jan 09, 2020

same story as for atomic_open() in the previous commit.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

da5ebf5a

28 Feb, 2020 6 commits

atomic_open(): saner calling conventions (return dentry on success) · 239eb983

Authored by Al Viro on Jan 09, 2020

Currently it either returns -E... or puts (nd->path.mnt,dentry)
into *path and returns 0.  Make it return ERR_PTR(-E...) or
dentry; adjust the caller.  Fewer arguments and it's easier
to keep track of *path contents that way.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

239eb983

handle_mounts(): start building a sane wrapper for follow_managed() · bd7c4b50

Authored by Al Viro on Jan 08, 2020

All callers of follow_managed() follow it on success with the same steps -
d_backing_inode(path->dentry) is calculated and stored into some struct inode *
variable and, in all but one case, an unsigned variable (nd->seq to be) is
zeroed.  The single exception is lookup_fast() and there zeroing is the correct
thing to do - not doing it is a pointless microoptimization.

	Add a wrapper for follow_managed() that would do that combination.
It's mostly a vehicle for code massage - it will be changing quite a bit,
and the current calling conventions are by no means final.  Right now it
takes path, nameidata and (as out params) inode and seq, similar to
__follow_mount_rcu().  Which will soon get folded into it...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

bd7c4b50

make build_open_flags() treat O_CREAT | O_EXCL as implying O_NOFOLLOW · 31d1726d

Authored by Al Viro on Jan 08, 2020

O_CREAT | O_EXCL means "-EEXIST if we run into a trailing symlink".
As it is, we might or might not have LOOKUP_FOLLOW in op->intent
in that case - that depends upon having O_NOFOLLOW in open flags.
It doesn't matter, since we won't be checking it in that case -
do_last() bails out earlier.

However, making sure it's not set (i.e. acting as if we had an explicit
O_NOFOLLOW) makes the behaviour more explicit and allows reordering the
check for O_CREAT | O_EXCL in do_last() with the call of step_into()
immediately following it.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

31d1726d
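
The rule from 31d1726d can be sketched as a small flag computation. The constants below are toy values rather than the real O_* and LOOKUP_* definitions, and `demo_lookup_flags()` is a hypothetical helper, not the body of build_open_flags().

```c
#include <assert.h>

/* Toy flag values - NOT the real O_* / LOOKUP_* constants. */
#define DEMO_O_CREAT		0x1u
#define DEMO_O_EXCL		0x2u
#define DEMO_O_NOFOLLOW		0x4u
#define DEMO_LOOKUP_FOLLOW	0x1u

/* Derive the "follow trailing symlink" lookup flag from open flags. */
static unsigned int demo_lookup_flags(unsigned int open_flags)
{
	unsigned int lookup_flags = 0;

	/* O_CREAT | O_EXCL is treated as if O_NOFOLLOW had been passed. */
	if ((open_flags & DEMO_O_CREAT) && (open_flags & DEMO_O_EXCL))
		open_flags |= DEMO_O_NOFOLLOW;

	if (!(open_flags & DEMO_O_NOFOLLOW))
		lookup_flags |= DEMO_LOOKUP_FOLLOW;

	return lookup_flags;
}
```

In this model the follow flag is never set for O_CREAT | O_EXCL, whether or not the caller passed O_NOFOLLOW explicitly.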

follow_automount() doesn't need the entire nameidata · 1c9f5e06

Authored by Al Viro on Jan 16, 2020

Only the address of ->total_link_count and the flags.
And fix an off-by-one in ELOOP detection - make it
consistent with symlink following, where we check if
the pre-increment value has reached 40, rather than
check the post-increment one.

[kudos to Christian Brauner for spotting the braino]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

1c9f5e06
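
The off-by-one mentioned in 1c9f5e06 comes down to whether the counter is compared before or after it is bumped. The sketch below is a userspace illustration with hypothetical function names; only the limit of 40 is taken from the commit message.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAXSYMLINKS 40	/* the limit named in the commit message */

/* Checks the post-increment value: bump first, then compare. */
static int demo_check_post_increment(int *total_link_count)
{
	(*total_link_count)++;
	if (*total_link_count >= DEMO_MAXSYMLINKS)
		return -ELOOP;
	return 0;
}

/* Checks the pre-increment value: compare the old value, then bump. */
static int demo_check_pre_increment(int *total_link_count)
{
	if ((*total_link_count)++ >= DEMO_MAXSYMLINKS)
		return -ELOOP;
	return 0;
}

/* Count how many consecutive follows succeed before -ELOOP is hit. */
static int demo_follows_allowed(int (*follow)(int *))
{
	int count = 0;
	int ok = 0;

	while (follow(&count) == 0)
		ok++;
	return ok;
}
```

In this model the post-increment variant permits one follow fewer than the pre-increment one, so mixing the two styles would give inconsistent limits.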

follow_automount(): get rid of dead^Wstillborn code · 25e195aa

Authored by Al Viro on Jan 11, 2020

1) no instances of ->d_automount() have ever made use of the "return
ERR_PTR(-EISDIR) if you don't feel like mounting anything" - that's
a rudiment of plans that got superseded before the thing went into
the tree.  Despite the comment in follow_automount(), autofs has
never done that.

2) if there's no ->d_automount() in dentry_operations, filesystems
should not set DCACHE_NEED_AUTOMOUNT in the first place.  None have
ever done so...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

25e195aa

fix automount/automount race properly · 26df6034

Authored by Al Viro on Jan 11, 2020

Protection against automount/automount races (two threads hitting the same
referral point at the same time) is based upon do_add_mount() prevention of
identical overmounts - trying to overmount the root of mounted tree with
the same tree fails with -EBUSY.  It's unreliable (the other thread might've
mounted something on top of the automount it has triggered) *and* causes
no end of headache for follow_automount() and its caller, since
finish_automount() behaves like do_new_mount() - if the mountpoint to be is
overmounted, it mounts on top of what's overmounting it.  It's not only wrong
(we want to go into what's overmounting the automount point and quietly
discard what we planned to mount there), it introduces the possibility of
original parent mount getting dropped.  That's what 8aef1884 (VFS: Fix
vfsmount overput on simultaneous automount) deals with, but it can't do
anything about the reliability of conflict detection - if something had
been overmounted on the other thread's automount (e.g. that other thread
having stepped into automount in mount(2)), we don't get that -EBUSY and
the result is
	 referral point under automounted NFS under explicit overmount
under another copy of automounted NFS

What we need is finish_automount() *NOT* digging into overmounts - if it
finds one, it should just quietly discard the thing it was asked to mount.
And don't bother with actually crossing into the results of finish_automount() -
the same loop that calls follow_automount() will do that just fine on the
next iteration.

IOW, instead of calling lock_mount() have finish_automount() do it manually,
_without_ the "move into overmount and retry" part.  And leave crossing into
the results to the caller of follow_automount(), which simplifies it a lot.

Moral: if you end up with a lot of glue working around the calling conventions
of something, perhaps these calling conventions are simply wrong...

Fixes: 8aef1884 (VFS: Fix vfsmount overput on simultaneous automount)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

26df6034

02 Feb, 2020 1 commit

vfs: fix do_last() regression · 6404674a

Authored by Al Viro on Feb 01, 2020

Brown paperbag time: fetching ->i_uid/->i_mode really should've been
done from nd->inode.  I even suggested that, but the reason for that has
slipped through the cracks and I went for dir->d_inode instead - made
for more "obvious" patch.

Analysis:

 - at the entry into do_last() and all the way to step_into(): dir (aka
   nd->path.dentry) is known not to have been freed; so's nd->inode and
   it's equal to dir->d_inode unless we are already doomed to -ECHILD.
   inode of the file to get opened is not known.

 - after step_into(): inode of the file to get opened is known; dir
   might be pointing to freed memory/be negative/etc.

 - at the call of may_create_in_sticky(): guaranteed to be out of RCU
   mode; inode of the file to get opened is known and pinned; dir might
   be garbage.

The last was the reason for the original patch.  Except that at the
do_last() entry we can be in RCU mode and it is possible that
nd->path.dentry->d_inode has already changed under us.

In that case we are going to fail with -ECHILD, but we need to be
careful; nd->inode is pointing to valid struct inode and it's the same
as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
should use that.
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Fixes: d0cb5018 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

6404674a

26 Jan, 2020 1 commit

do_last(): fetch directory ->i_mode and ->i_uid before it's too late · d0cb5018

Authored by Al Viro on Jan 26, 2020

may_create_in_sticky() call is done when we already have dropped the
reference to dir.

Fixes: 30aba665 (namei: allow restricted O_CREAT of FIFOs and regular files)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

d0cb5018

15 Jan, 2020 2 commits

fix autofs regression caused by follow_managed() changes · 508c8772

Authored by Al Viro on Jan 14, 2020

we need to reload ->d_flags after the call of ->d_manage() - the thing
might've been called with dentry still negative and have the damn thing
turned positive while we'd waited.

Fixes: d41efb52 "fs/namei.c: pull positivity check into follow_managed()"
Reported-by: Ian Kent <raven@themaw.net>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

508c8772

reimplement path_mountpoint() with less magic · c64cd6e3

Authored by Al Viro on Jan 10, 2020

... and get rid of a bunch of bugs in it.  Background:
the reason for path_mountpoint() is that umount() really doesn't
want attempts to revalidate the root of what it's trying to umount.
The thing we want to avoid actually happens in complete_walk();
solution was to do something parallel to normal path_lookupat()
and it both went overboard and got the boilerplate subtly
(and not so subtly) wrong.

A better solution is to do pretty much what the normal path_lookupat()
does, but instead of complete_walk() do unlazy_walk().  All it takes
to avoid that ->d_weak_revalidate() call...  mountpoint_last() goes
away, along with everything it got wrong, and so does the magic around
LOOKUP_NO_REVAL.

Another source of bugs is that when we traverse mounts at the final
location (and we need to do that - umount . expects to get whatever's
overmounting ., if any, out of the lookup) we really ought to take
care of ->d_manage() - as it is, manual umount of autofs automount
in progress can lead to unpleasant surprises for the daemon.  Easily
solved by using handle_lookup_down() instead of follow_mount().
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

c64cd6e3

09 Dec, 2019 9 commits

namei: LOOKUP_{IN_ROOT,BENEATH}: permit limited ".." resolution · ab87f9a5