提交 · 13aab428a73d3200b9283b61b7fdf5713181ac66 · openeuler / raspberrypi-kernel

14 3月, 2011 12 次提交

separate -ESTALE/-ECHILD retries in do_filp_open() from real work · 13aab428

由 Al Viro 提交于 2月 23, 2011

new helper: path_openat().  Does what do_filp_open() does, except
that it tries only the walk mode (RCU/normal/force revalidation)
it had been told to.

Both create and non-create branches are using path_lookupat() now.
Fixed the double audit_inode() in non-create branch.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

13aab428

switch do_filp_open() to struct open_flags · 47c805dc

由 Al Viro 提交于 2月 23, 2011

take calculation of open_flags by open(2) arguments into new helper
in fs/open.c, move filp_open() over there, have it and do_sys_open()
use that helper, switch exec.c callers of do_filp_open() to explicit
(and constant) struct open_flags.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

47c805dc

Collect "operation mode" arguments of do_last() into a structure · c3e380b0

由 Al Viro 提交于 2月 23, 2011

No point messing with passing shitloads of "operation mode" arguments
to do_open() one by one, especially since they are not going to change
during do_filp_open().  Collect them into a struct, fill it and pass
to do_last() by reference.

Make sure that lookup intent flags are correctly set and removed - we
want them for do_last(), but they make no sense for __do_follow_link().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c3e380b0

A
clean up the failure exits after __do_follow_link() in do_filp_open() · f1afe9ef
由 Al Viro 提交于 2月 22, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
f1afe9ef
A
pull security_inode_follow_link() into __do_follow_link() · 36f3b4f6
由 Al Viro 提交于 2月 22, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
36f3b4f6
A
pull dropping RCU on success of link_path_walk() into path_lookupat() · 086e183a
由 Al Viro 提交于 2月 22, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
086e183a

untangle the "need_reval_dot" mess · 16c2cd71

由 Al Viro 提交于 2月 22, 2011

instead of ad-hackery around need_reval_dot(), do the following:
set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute
symlink traversal, on ".." and on procfs-style symlinks.  Clear on
normal components, leave unchanged on ".".  Non-nested callers of
link_path_walk() call handle_reval_path(), which checks that flag
is set and that fs does want the final revalidate thing, then does
->d_revalidate().  In link_path_walk() all the return_reval stuff
is gone.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

16c2cd71

merge component type recognition · fe479a58

由 Al Viro 提交于 2月 22, 2011

no need to do it in three places...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

fe479a58

merge path_init and path_init_rcu · e41f7d4e

由 Al Viro 提交于 2月 22, 2011

Actual dependency on whether we want RCU or not is in 3 small areas
(as it ought to be) and everything around those is the same in both
versions.  Since each function has only one caller and those callers
are on two sides of if (flags & LOOKUP_RCU), it's easier and cleaner
to merge them and pull the checks inside.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e41f7d4e

sanitize path_walk() mess · ee0827cd

由 Al Viro 提交于 2月 21, 2011

New helper: path_lookupat().  Basically, what do_path_lookup() boils to
modulo -ECHILD/-ESTALE handler.  path_walk* family is gone; vfs_path_lookup()
is using link_path_walk() directly, do_path_lookup() and do_filp_open()
are using path_lookupat().
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ee0827cd

A
take RCU-dependent stuff around exec_permission() into a new helper · 52094c8a
由 Al Viro 提交于 2月 21, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
52094c8a

kill path_lookup() · c9c6cac0

由 Al Viro 提交于 2月 16, 2011

all remaining callers pass LOOKUP_PARENT to it, so
flags argument can die; renamed to kern_path_parent()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

c9c6cac0

09 3月, 2011 1 次提交

nd->inode is not set on the second attempt in path_walk() · b306419a

由 Al Viro 提交于 3月 08, 2011

We leave it at whatever it had been pointing to after the
first link_path_walk() had failed with -ESTALE.  Things
do not work well after that...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b306419a

05 3月, 2011 1 次提交

minimal fix for do_filp_open() race · 1858efd4

由 Al Viro 提交于 3月 04, 2011

failure exits on the no-O_CREAT side of do_filp_open() merge with
those of O_CREAT one; unfortunately, if do_path_lookup() returns
-ESTALE, we'll get out_filp:, notice that we are about to return
-ESTALE without having trying to create the sucker with LOOKUP_REVAL
and jump right into the O_CREAT side of code.  And proceed to try
and create a file.  Usually that'll fail with -ESTALE again, but
we can race and get that attempt of pathname resolution to succeed.

open() without O_CREAT really shouldn't end up creating files, races
or not.  The real fix is to rearchitect the whole do_filp_open(),
but for now splitting the failure exits will do.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1858efd4

17 2月, 2011 1 次提交

vfs: fix BUG_ON() in fs/namei.c:1461 · 3abb17e8

由 Linus Torvalds 提交于 2月 16, 2011

When Al moved the nameidata_dentry_drop_rcu_maybe() call into the
do_follow_link function in commit 844a3917 ("nothing in
do_follow_link() is going to see RCU"), he mistakenly left the

	BUG_ON(inode != path->dentry->d_inode);

behind.  Which would otherwise be ok, but that BUG_ON() really needs to
be _after_ dropping RCU, since the dentry isn't necessarily stable
otherwise.

So complete the code movement in that commit, and move the BUG_ON() into
do_follow_link() too.  This means that we need to pass in 'inode' as an
argument (just for this one use), but that's a small thing.  And
eventually we may be confident enough in our path lookup that we can
just remove the BUG_ON() and the unnecessary inode argument.
Reported-and-tested-by: NEric Dumazet <eric.dumazet@gmail.com>
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3abb17e8

15 2月, 2011 5 次提交
- A
  get rid of nameidata_dentry_drop_rcu() calling nameidata_drop_rcu() · 4e924a4f
  由 Al Viro 提交于 2月 15, 2011
```
can't happen anymore and didn't work right anyway
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  4e924a4f
- A
  drop out of RCU in return_reval · f60aef7e
  由 Al Viro 提交于 2月 15, 2011
```
... thus killing the need to handle drop-from-RCU in d_revalidate()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  f60aef7e
- A
  split do_revalidate() into RCU and non-RCU cases · f5e1c1c1
  由 Al Viro 提交于 2月 15, 2011
```
fixing oopsen in lookup_one_len()
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  f5e1c1c1
- A
  in do_lookup() split RCU and non-RCU cases of need_revalidate · 24643087
  由 Al Viro 提交于 2月 15, 2011
```
and use unlikely() instead of gotos, for fsck sake...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  24643087
- A
  nothing in do_follow_link() is going to see RCU · 844a3917
  由 Al Viro 提交于 2月 15, 2011
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  844a3917
12 2月, 2011 1 次提交

Fix possible filp_cachep memory corruption · 2dab5974

由 Linus Torvalds 提交于 2月 11, 2011

In commit 31e6b01f ("fs: rcu-walk for path lookup") we started doing
path lookup using RCU, which then falls back to a careful non-RCU lookup
in case of problems (LOOKUP_REVAL).  So do_filp_open() has this "re-do
the lookup carefully" looping case.

However, that means that we must not release the open-intent file data
if we are going to loop around and use it once more!

Fix this by moving the release of the open-intent data to the function
that allocates it (do_filp_open() itself) rather than the helper
functions that can get called multiple times (finish_open() and
do_last()).  This makes the logic for the lifetime of that field much
more obvious, and avoids the possible double free.
Reported-by: NJ. R. Okajima <hooanon05@yahoo.co.jp>
Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2dab5974

18 1月, 2011 1 次提交

vfs - fix dentry ref count in do_lookup() · 89312214

由 Ian Kent 提交于 1月 18, 2011

There is a ref count problem in fs/namei.c:do_lookup().

When walking in ref-walk mode, if follow_managed() returns a fail we
need to drop dentry and possibly vfsmount.  Clean up properly,
as we do in the other caller of follow_managed().
Signed-off-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

89312214

17 1月, 2011 2 次提交

A
Take the completion of automount into new helper · 19a167af
由 Al Viro 提交于 1月 17, 2011
```
... and shift it from namei.c to namespace.c
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
19a167af

sanitize vfsmount refcounting changes · f03c6599

由 Al Viro 提交于 1月 14, 2011

Instead of splitting refcount between (per-cpu) mnt_count
and (SMP-only) mnt_longrefs, make all references contribute
to mnt_count again and keep track of how many are longterm
ones.

Accounting rules for longterm count:
	* 1 for each fs_struct.root.mnt
	* 1 for each fs_struct.pwd.mnt
	* 1 for having non-NULL ->mnt_ns
	* decrement to 0 happens only under vfsmount lock exclusive

That allows nice common case for mntput() - since we can't drop the
final reference until after mnt_longterm has reached 0 due to the rules
above, mntput() can grab vfsmount lock shared and check mnt_longterm.
If it turns out to be non-zero (which is the common case), we know
that this is not the final mntput() and can just blindly decrement
percpu mnt_count.  Otherwise we grab vfsmount lock exclusive and
do usual decrement-and-check of percpu mnt_count.

For fs_struct.c we have mnt_make_longterm() and mnt_make_shortterm();
namespace.c uses the latter in places where we don't already hold
vfsmount lock exclusive and opencodes a few remaining spots where
we need to manipulate mnt_longterm.

Note that we mostly revert the code outside of fs/namespace.c back
to what we used to have; in particular, normal code doesn't need
to care about two kinds of references, etc.  And we get to keep
the optimization Nick's variant had bought us...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

f03c6599

16 1月, 2011 8 次提交

Unexport do_add_mount() and add in follow_automount(), not ->d_automount() · ea5b778a

由 David Howells 提交于 1月 14, 2011

Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself. follow_automount() will then
do the addition.

This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer. The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2. One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used. The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ea5b778a

Allow d_manage() to be used in RCU-walk mode · ab90911f

由 David Howells 提交于 1月 14, 2011

Allow d_manage() to be called from pathwalk when it is in RCU-walk mode as well
as when it is in Ref-walk mode.  This permits __follow_mount_rcu() to call
d_manage() directly.  d_manage() needs a parameter to indicate that it is in
RCU-walk mode as it isn't allowed to sleep if in that mode (but should return
-ECHILD instead).

autofs4_d_manage() can then be set to retain RCU-walk mode if the daemon
accesses it and otherwise request dropping back to ref-walk mode.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ab90911f

Remove a further kludge from __do_follow_link() · 87556ef1

由 David Howells 提交于 1月 14, 2011

Remove a further kludge from __do_follow_link() as it's no longer required with
the automount code.

This reverts the non-helper-function parts of
051d3812, which breaks union mounts.

Reported-by: vaurora@redhat.com
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

87556ef1

Remove the automount through follow_link() kludge code from pathwalk · db372915

由 David Howells 提交于 1月 14, 2011

Remove the automount through follow_link() kludge code from pathwalk in favour
of using d_automount().
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

db372915

Add an AT_NO_AUTOMOUNT flag to suppress terminal automount · 6f45b656

由 David Howells 提交于 1月 14, 2011

Add an AT_NO_AUTOMOUNT flag to suppress terminal automounting of automount
point directories.  This can be used by fstatat() users to permit the
gathering of attributes on an automount point and also prevent
mass-automounting of a directory of automount points by ls.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6f45b656

Add a dentry op to allow processes to be held during pathwalk transit · cc53ce53

由 David Howells 提交于 1月 14, 2011

Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every tranist from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automouter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Was-Acked-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

cc53ce53

Add a dentry op to handle automounting rather than abusing follow_link() · 9875cf80

由 David Howells 提交于 1月 14, 2011

Add a dentry op (d_automount) to handle automounting directories rather than
abusing the follow_link() inode operation.  The operation is keyed off a new
dentry flag (DCACHE_NEED_AUTOMOUNT).

This also makes it easier to add an AT_ flag to suppress terminal segment
automount during pathwalk and removes the need for the kludge code in the
pathwalk algorithm to handle directories with follow_link() semantics.

The ->d_automount() dentry operation:

	struct vfsmount *(*d_automount)(struct path *mountpoint);

takes a pointer to the directory to be mounted upon, which is expected to
provide sufficient data to determine what should be mounted.  If successful, it
should return the vfsmount struct it creates (which it should also have added
to the namespace using do_add_mount() or similar).  If there's a collision with
another automount attempt, NULL should be returned.  If the directory specified
by the parameter should be used directly rather than being mounted upon,
-EISDIR should be returned.  In any other case, an error code should be
returned.

The ->d_automount() operation is called with no locks held and may sleep.  At
this point the pathwalk algorithm will be in ref-walk mode.

Within fs/namei.c itself, a new pathwalk subroutine (follow_automount()) is
added to handle mountpoints.  It will return -EREMOTE if the automount flag was
set, but no d_automount() op was supplied, -ELOOP if we've encountered too many
symlinks or mountpoints, -EISDIR if the walk point should be used without
mounting and 0 if successful.  The path will be updated to point to the mounted
filesystem if a successful automount took place.

__follow_mount() is replaced by follow_managed() which is more generic
(especially with the patch that adds ->d_manage()).  This handles transits from
directories during pathwalk, including automounting and skipping over
mountpoints (and holding processes with the next patch).

__follow_mount_rcu() will jump out of RCU-walk mode if it encounters an
automount point with nothing mounted on it.

follow_dotdot*() does not handle automounts as you don't want to trigger them
whilst following "..".

I've also extracted the mount/don't-mount logic from autofs4 and included it
here.  It makes the mount go ahead anyway if someone calls open() or creat(),
tries to traverse the directory, tries to chdir/chroot/etc. into the directory,
or sticks a '/' on the end of the pathname.  If they do a stat(), however,
they'll only trigger the automount if they didn't also say O_NOFOLLOW.

I've also added an inode flag (S_AUTOMOUNT) so that filesystems can mark their
inodes as automount points.  This flag is automatically propagated to the
dentry as DCACHE_NEED_AUTOMOUNT by __d_instantiate().  This saves NFS and could
save AFS a private flag bit apiece, but is not strictly necessary.  It would be
preferable to do the propagation in d_set_d_op(), but that doesn't normally
have access to the inode.

[AV: fixed breakage in case if __follow_mount_rcu() fails and nameidata_drop_rcu()
succeeds in RCU case of do_lookup(); we need to fall through to non-RCU case after
that, rather than just returning with ungrabbed *path]
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Was-Acked-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9875cf80

do_lookup() fix · 1a8edf40

由 Al Viro 提交于 1月 15, 2011

do_lookup() has a path leading from LOOKUP_RCU case to non-RCU
crossing of mountpoints, which breaks things badly.  If we
hit need_revalidate: and do nothing in there, we need to come
back into LOOKUP_RCU half of things, not to done: in non-RCU
one.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1a8edf40

14 1月, 2011 4 次提交

fs: namei fix ->put_link on wrong inode in do_filp_open · 7b9337aa

由 Nick Piggin 提交于 1月 14, 2011

J. R. Okajima noticed that ->put_link is being attempted on the
wrong inode, and suggested the way to fix it. I changed it a bit
according to Al's suggestion to keep an explicit link path around.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

7b9337aa

fs: fix do_last error case when need_reval_dot · f20877d9

由 J. R. Okajima 提交于 1月 14, 2011

When open(2) without O_DIRECTORY opens an existing dir, it should return
EISDIR. In do_last(), the variable 'error' is initialized EISDIR, but it
is changed by d_revalidate() which returns any positive to represent
'the target dir is valid.'

Should we keep and return the initialized 'error' in this case.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

f20877d9

fs: fix dropping of rcu-walk from force_reval_path · 90dbb77b

由 Nick Piggin 提交于 1月 14, 2011

As J. R. Okajima noted, force_reval_path passes in the same dentry to
d_revalidate as the one in the nameidata structure (other callers pass in a
child), so the locking breaks. This can oops with a chrooted nfs mount, for
example. Similarly there can be other problems with revalidating a dentry
which is already in nameidata of the path walk.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

90dbb77b

fs: force_reval_path drop rcu-walk before d_invalidate · bb20c18d

由 Nick Piggin 提交于 1月 14, 2011

d_revalidate can return in rcu-walk mode even when it returns 0.  We can't just
call any old dcache function on rcu-walk dentry (the dentry is unstable, so
even through d_lock can safely be taken, the result may no longer be what we
expect -- careful re-checks would be required). So just drop rcu in this case.

(I missed this conversion when switching to the rcu-walk convention that Linus
suggested)
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

bb20c18d

13 1月, 2011 1 次提交

vfs: pass struct file to do_truncate on O_TRUNC opens (try #2) · e1181ee6

由 Jeff Layton 提交于 12月 07, 2010

When a file is opened with O_TRUNC, the truncate processing is handled
by handle_truncate(). This function however doesn't receive any info
about the newly instantiated filp, and therefore can't pass that info
along so that the setattr can use it.

This makes NFSv4 misbehave. The client does an open and gets a valid
stateid, and then doesn't use that stateid on the subsequent truncate.
It uses the zero-stateid instead. Most servers ignore this fact and
just do the truncate anyway, but some don't like it (notably, RHEL4).

It seems more correct that since we have a fully instantiated file at
the time that handle_truncate is called, that we pass that along so
that the truncate operation can properly use it.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

e1181ee6

10 1月, 2011 1 次提交

fs: fix namei.c kernel-doc notation · 39191628

由 Randy Dunlap 提交于 1月 08, 2011

Fix new kernel-doc notation warnings in fs/namei.c and spell
ECHILD correctly.

Warning(fs/namei.c:218): No description found for parameter 'flags'
Warning(fs/namei.c:425): Excess function parameter 'Returns' description in 'nameidata_drop_rcu'
Warning(fs/namei.c:478): Excess function parameter 'Returns' description in 'nameidata_dentry_drop_rcu'
Warning(fs/namei.c:540): Excess function parameter 'Returns' description in 'nameidata_drop_rcu_last'
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

39191628

07 1月, 2011 2 次提交

fs: scale mntget/mntput · b3e19d92

由 Nick Piggin 提交于 1月 07, 2011

The problem that this patch aims to fix is vfsmount refcounting scalability.
We need to take a reference on the vfsmount for every successful path lookup,
which often go to the same mount point.

The fundamental difficulty is that a "simple" reference count can never be made
scalable, because any time a reference is dropped, we must check whether that
was the last reference. To do that requires communication with all other CPUs
that may have taken a reference count.

We can make refcounts more scalable in a couple of ways, involving keeping
distributed counters, and checking for the global-zero condition less
frequently.

- check the global sum once every interval (this will delay zero detection
  for some interval, so it's probably a showstopper for vfsmounts).

- keep a local count and only taking the global sum when local reaches 0 (this
  is difficult for vfsmounts, because we can't hold preempt off for the life of
  a reference, so a counter would need to be per-thread or tied strongly to a
  particular CPU which requires more locking).

- keep a local difference of increments and decrements, which allows us to sum
  the total difference and hence find the refcount when summing all CPUs. Then,
  keep a single integer "long" refcount for slow and long lasting references,
  and only take the global sum of local counters when the long refcount is 0.

This last scheme is what I implemented here. Attached mounts and process root
and working directory references are "long" references, and everything else is
a short reference.

This allows scalable vfsmount references during path walking over mounted
subtrees and unattached (lazy umounted) mounts with processes still running
in them.

This results in one fewer atomic op in the fastpath: mntget is now just a
per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
and non-atomic decrement in the common case. However code is otherwise bigger
and heavier, so single threaded performance is basically a wash.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b3e19d92

N
fs: provide rcu-walk aware permission i_ops · b74c79e9
由 Nick Piggin 提交于 1月 07, 2011
```
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
```
b74c79e9