Commits · 9c2c703929e4c41210cfa6e3f599514421bab8dc · openanolis / cloud-kernel

20 Jul, 2011 10 commits

->permission() sanitizing: pass MAY_NOT_BLOCK to ->check_acl() · 9c2c7039

Authored by Al Viro on Jun 20, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

->permission() sanitizing: MAY_NOT_BLOCK · 1fc0f78c

Authored by Al Viro on Jun 20, 2011

Duplicate the flags argument into mask bitmap.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill check_acl callback of generic_permission() · 178ea735

Authored by Al Viro on Jun 20, 2011

its value depends only on inode and does not change; we might as
well store it in ->i_op->check_acl and be done with that.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

lockless get_write_access/deny_write_access · 07b8ce1e

Authored by Al Viro on Jun 20, 2011

new helpers: atomic_inc_unless_negative()/atomic_dec_unless_positive()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
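
The semantics of the two helpers named in this commit can be illustrated in user space. What follows is a minimal sketch using C11 atomics rather than the kernel's atomic_t; the names inc_unless_negative(), dec_unless_positive(), get_write_access_sketch() and deny_write_access_sketch() are hypothetical, and the counter convention (positive means writers exist, negative means writes are denied) is an assumption made for the illustration, not a copy of the kernel code.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Increment *v unless it is negative. Returns true if the increment happened. */
bool inc_unless_negative(atomic_int *v)
{
	assert(v != NULL);
	int old = atomic_load(v);

	while (old >= 0) {
		/* On failure 'old' is refreshed with the current value; retry. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

/* Decrement *v unless it is positive. Returns true if the decrement happened. */
bool dec_unless_positive(atomic_int *v)
{
	assert(v != NULL);
	int old = atomic_load(v);

	while (old <= 0) {
		if (atomic_compare_exchange_weak(v, &old, old - 1))
			return true;
	}
	return false;
}

/* Hypothetical wrappers: a writer may enter only while nobody denies writes. */
int get_write_access_sketch(atomic_int *writecount)
{
	return inc_unless_negative(writecount) ? 0 : -ETXTBSY;
}

/* Writes may be denied only while there are no writers. */
int deny_write_access_sketch(atomic_int *writecount)
{
	return dec_unless_positive(writecount) ? 0 : -ETXTBSY;
}
```

Because each helper is a single compare-and-exchange loop, the "check the sign, then adjust" step needs no lock, which is presumably what makes the lockless variant possible.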

move exec_permission() up to the rest of permission-related functions · f4d6ff89

Authored by Al Viro on Jun 19, 2011

... and convert the comment before it into linuxdoc form.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill file_permission() completely · 3bfa784a

Authored by Al Viro on Jun 19, 2011

convert the last remaining caller to inode_permission()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch path_init() to exec_permission() · 78f32a9b

Authored by Al Viro on Jun 19, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make exec_permission(dir) really equivalent to inode_permission(dir, MAY_EXEC) · 4cf27141

Authored by Al Viro on Jun 19, 2011

capability overrides apply only to the default case; if fs has ->permission()
that does _not_ call generic_permission(), we have no business doing them.
Moreover, if it has ->permission() that does call generic_permission(), we
have no need to recheck capabilities.

Besides, the capability overrides should apply only if we got EACCES from
acl_permission_check(); any other value (-EIO, etc.) should be returned
to caller, capabilities or not capabilities.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
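
The second rule in this message (overrides apply only when the check returned -EACCES) can be sketched as a small decision function. This is an illustrative user-space sketch, not the kernel code: apply_capability_override() and both of its parameters are hypothetical names standing in for the result of acl_permission_check() and for "the caller holds a capability that would override the denial".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * acl_result:   hypothetical result of the mode/ACL check (0 or a negative errno).
 * has_override: hypothetical flag, true if the caller holds an overriding capability.
 */
int apply_capability_override(int acl_result, bool has_override)
{
	assert(acl_result <= 0);

	/* Success and unrelated errors (-EIO, etc.) are returned unchanged. */
	if (acl_result != -EACCES)
		return acl_result;

	/* Only a plain permission denial may be overridden. */
	return has_override ? 0 : -EACCES;
}
```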

fs: add a DCACHE_NEED_LOOKUP flag for d_flags · 44396f4b

Authored by Josef Bacik on May 31, 2011

Btrfs (and I'd venture most other fs's) stores its indexes in nice disk order
for readdir, but unfortunately in the case of anything that stats the files in
order that readdir spits back (like oh say ls) that means we still have to do
the normal lookup of the file, which means looking up our other index and then
looking up the inode. What I want is a way to create dummy dentries when we
find them in readdir so that when ls or anything else subsequently does a
stat(), we already have the location information in the dentry and can go
straight to the inode itself. The lookup stuff just assumes that if it finds a
dentry it is done, it doesn't perform a lookup. So add a DCACHE_NEED_LOOKUP
flag so that the lookup code knows it still needs to run i_op->lookup() on the
parent to get the inode for the dentry. I have tested this with btrfs and I
went from something that looks like this

http://people.redhat.com/jwhiter/ls-noreada.png

To this

http://people.redhat.com/jwhiter/ls-good.png

That's a savings of 1300 seconds, or 22 minutes. That is a significant savings.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: fix race in rcu lookup of pruned dentry · 59430262

Authored by Linus Torvalds on Jul 18, 2011

Don't update *inode in __follow_mount_rcu() until we'd verified that
there is mountpoint there.  Kudos to Hugh Dickins for catching that
one in the first place and eventually figuring out the solution (and
catching a braino in the earlier version of patch).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

13 Jul, 2011 1 commit

Fix ->d_lock locking order in unlazy_walk() · 94c0d4ec

Authored by Al Viro on Jul 12, 2011

Make sure that child is still a child of parent before nested locking
of child->d_lock in unlazy_walk(); otherwise we are risking a violation
of locking order and deadlocks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

20 Jun, 2011 2 commits

fix comment in generic_permission() · 8e833fd2

Authored by Al Viro on Jun 19, 2011

CAP_DAC_OVERRIDE is enough for MAY_EXEC on directory, even if
no exec bits are set.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

kill obsolete comment for follow_down() · 6291176b

Authored by Al Viro on Jun 17, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

16 Jun, 2011 2 commits

VFS: Fix vfsmount overput on simultaneous automount · 8aef1884

Authored by Al Viro on Jun 16, 2011

[Kudos to dhowells for tracking that crap down]

If two processes attempt to cause automounting on the same mountpoint at the
same time, the vfsmount holding the mountpoint will be left with one too few
references on it, causing a BUG when the kernel tries to clean up.

The problem is that lock_mount() drops the caller's reference to the
mountpoint's vfsmount in the case where it finds something already mounted on
the mountpoint as it transits to the mounted filesystem and replaces path->mnt
with the new mountpoint vfsmount.

During a pathwalk, however, we don't take a reference on the vfsmount if it is
the same as the one in the nameidata struct, but do_add_mount() doesn't know
this.

The fix is to make sure we have a ref on the vfsmount of the mountpoint before
calling do_add_mount().  However, if lock_mount() doesn't transit, we're then
left with an extra ref on the mountpoint vfsmount which needs releasing.
We can handle that in follow_managed() by not making assumptions about what
we can and what we cannot get from lookup_mnt() as the current code does.

The callers of follow_managed() expect that reference to path->mnt will be
grabbed iff path->mnt has been changed.  follow_managed() and follow_automount()
keep track of whether such reference has been grabbed and assume that it'll
happen in those and only those cases that'll have us return with changed
path->mnt.  That assumption is almost correct - it breaks in case of
racing automounts and in even harder to hit race between following a mountpoint
and a couple of mount --move.  The thing is, we don't need to make that
assumption at all - after the end of loop in follow_managed() we can check
if path->mnt has ended up unchanged and do mntput() if needed.

The BUG can be reproduced with the following test program:

	#include <stdio.h>
	#include <sys/types.h>
	#include <sys/stat.h>
	#include <unistd.h>
	#include <sys/wait.h>
	int main(int argc, char **argv)
	{
		int pid, ws;
		struct stat buf;
		pid = fork();
		stat(argv[1], &buf);
		if (pid > 0) wait(&ws);
		return 0;
	}

and the following procedure:

 (1) Mount an NFS volume that on the server has something else mounted on a
     subdirectory.  For instance, I can mount / from my server:

	mount warthog:/ /mnt -t nfs4 -r

     On the server /data has another filesystem mounted on it, so NFS will see
     a change in FSID as it walks down the path, and will mark /mnt/data as
     being a mountpoint.  This will cause the automount code to be triggered.

     !!! Do not look inside the mounted fs at this point !!!

 (2) Run the above program on a file within the submount to generate two
     simultaneous automount requests:

	/tmp/forkstat /mnt/data/testfile

 (3) Unmount the automounted submount:

	umount /mnt/data

 (4) Unmount the original mount:

	umount /mnt

     At this point the kernel should throw a BUG with something like the
     following:

	BUG: Dentry ffff880032e3c5c0{i=2,n=} still in use (1) [unmount of nfs4 0:12]

Note that the bug appears on the root dentry of the original mount, not the
mountpoint and not the submount because sys_umount() hasn't got to its final
mntput_no_expire() yet, but this isn't so obvious from the call trace:

 [<ffffffff8117cd82>] shrink_dcache_for_umount+0x69/0x82
 [<ffffffff8116160e>] generic_shutdown_super+0x37/0x15b
 [<ffffffffa00fae56>] ? nfs_super_return_all_delegations+0x2e/0x1b1 [nfs]
 [<ffffffff811617f3>] kill_anon_super+0x1d/0x7e
 [<ffffffffa00d0be1>] nfs4_kill_super+0x60/0xb6 [nfs]
 [<ffffffff81161c17>] deactivate_locked_super+0x34/0x83
 [<ffffffff811629ff>] deactivate_super+0x6f/0x7b
 [<ffffffff81186261>] mntput_no_expire+0x18d/0x199
 [<ffffffff811862a8>] mntput+0x3b/0x44
 [<ffffffff81186d87>] release_mounts+0xa2/0xbf
 [<ffffffff811876af>] sys_umount+0x47a/0x4ba
 [<ffffffff8109e1ca>] ? trace_hardirqs_on_caller+0x1fd/0x22f
 [<ffffffff816ea86b>] system_call_fastpath+0x16/0x1b

as do_umount() is inlined.  However, you can see release_mounts() in there.

Note also that it may be necessary to have multiple CPU cores to be able to
trigger this bug.
Tested-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Ian Kent <raven@themaw.net>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fix wrong iput on d_inode introduced by · 50338b88

Authored by Török Edwin on Jun 16, 2011

Git bisection shows that commit e6bc45d6 causes
BUG_ONs under high I/O load:

kernel BUG at fs/inode.c:1368!
[ 2862.501007] Call Trace:
[ 2862.501007]  [<ffffffff811691d8>] d_kill+0xf8/0x140
[ 2862.501007]  [<ffffffff81169c19>] dput+0xc9/0x190
[ 2862.501007]  [<ffffffff8115577f>] fput+0x15f/0x210
[ 2862.501007]  [<ffffffff81152171>] filp_close+0x61/0x90
[ 2862.501007]  [<ffffffff81152251>] sys_close+0xb1/0x110
[ 2862.501007]  [<ffffffff814c14fb>] system_call_fastpath+0x16/0x1b

A reliable way to reproduce this bug is:
Login to KDE, run 'rsnapshot sync', and apt-get install openjdk-6-jdk,
and apt-get remove openjdk-6-jdk.

The buggy part of the patch is this:
	struct inode *inode = NULL;
.....
-               if (nd.last.name[nd.last.len])
-                       goto slashes;
                inode = dentry->d_inode;
-               if (inode)
-                       ihold(inode);
+               if (nd.last.name[nd.last.len] || !inode)
+                       goto slashes;
+               ihold(inode)
...
	if (inode)
		iput(inode);	/* truncate the inode here */

If nd.last.name[nd.last.len] is nonzero (and thus goto slashes branch is taken),
and dentry->d_inode is non-NULL, then this code now does an additional iput on
the inode, which is wrong.

Fix this by only setting the inode variable if nd.last.name[nd.last.len] is 0.

Reference: https://lkml.org/lkml/2011/6/15/50
Reported-by: Norbert Preining <preining@logic.at>
Reported-by: Török Edwin <edwintorok@gmail.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

07 Jun, 2011 1 commit

vfs: make unlink() and rmdir() return ENOENT in preference to EROFS · e6bc45d6

Authored by Theodore Ts'o on Jun 06, 2011

If user space attempts to remove a non-existent file or directory, and
the file system is mounted read-only, return ENOENT instead of EROFS.
Either error code is arguably valid/correct, but ENOENT is a more
specific error message.
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
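
From user space, the error code described here can be observed by inspecting errno after a failed unlink(2). The helper below is a minimal sketch; unlink_errno() is a hypothetical name, and any path passed to it is the caller's choice. On a kernel with this change, removing a name that does not exist is expected to report ENOENT whether or not the filesystem is mounted read-only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Returns 0 if unlink(2) succeeded, otherwise the errno value it set. */
int unlink_errno(const char *path)
{
	assert(path != NULL);

	errno = 0;
	if (unlink(path) == 0)
		return 0;
	return errno;
}
```

A caller can compare the result against ENOENT and EROFS to tell "nothing to remove" apart from "cannot remove because the filesystem is read-only".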

30 May, 2011 1 commit

vfs: shrink_dcache_parent before rmdir, dir rename · 3cebde24

Authored by Sage Weil on May 29, 2011

The dentry_unhash push-down series missed that shrink_dcache_parent needs to
be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and
allow efficient dentry reclaim.
Reported-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

27 May, 2011 3 commits

Lift the check for automount points into do_lookup() · d6e9bd25

Authored by Al Viro on May 27, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Trim excessive arguments of follow_mount_rcu() · dea39376

Authored by Al Viro on May 27, 2011

... and kill a useless local variable in follow_dotdot_rcu(), while
we are at it - follow_mount_rcu(nd, path, inode) *always* assigned
value to *inode, and always it had been path->dentry->d_inode (aka
nd->path.dentry->d_inode, since it always got &nd->path as the second
argument).
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

split __follow_mount_rcu() into normal and .. cases · 287548e4

Authored by Al Viro on May 27, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

26 May, 2011 11 commits

vfs: clean up vfs_rename_other · 51892bbb

Authored by Sage Weil on May 24, 2011

Simplify control flow to match vfs_rename_dir.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: clean up vfs_rename_dir · 9055cba7

Authored by Sage Weil on May 24, 2011

Simplify control flow through vfs_rename_dir.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: clean up vfs_rmdir · 912dbc15

Authored by Sage Weil on May 24, 2011

Simplify the control flow with an out label.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: fix vfs_rename_dir for FS_RENAME_DOES_D_MOVE filesystems · b5afd2c4

Authored by Miklos Szeredi on May 24, 2011

vfs_rename_dir() doesn't properly account for filesystems with
FS_RENAME_DOES_D_MOVE.  If new_dentry has a target inode attached, it
unhashes the new_dentry prior to the rename() iop and rehashes it after,
but doesn't account for the possibility that rename() may have swapped
{old,new}_dentry.  For FS_RENAME_DOES_D_MOVE filesystems, it rehashes
new_dentry (now the old renamed-from name, which d_move() expected to go
away), such that a subsequent lookup will find it.  Currently all
FS_RENAME_DOES_D_MOVE filesystems compensate for this by failing in
d_revalidate.

The bug was introduced by: commit 349457cc
"[PATCH] Allow file systems to manually d_move() inside of ->rename()"

Fix by not rehashing the new dentry.  Rehashing used to be needed by
d_move() but isn't anymore.
Reported-by: Sage Weil <sage@newdream.net>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: update dentry_unhash() comment · a71905f0

Authored by Sage Weil on May 24, 2011

The helper is now only called by file systems, not the VFS.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: push dentry_unhash on rename_dir into file systems · e4eaac06

Authored by Sage Weil on May 24, 2011

Only a few file systems need this.  Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: push dentry_unhash on rmdir into file systems · 79bf7c73

Authored by Sage Weil on May 24, 2011

Only a few file systems need this.  Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.

This does not change behavior for any in-tree file systems.
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: remove dget() from dentry_unhash() · 64252c75

Authored by Sage Weil on May 24, 2011

This serves no useful purpose that I can discern.  All callers (rename,
rmdir) hold their own reference to the dentry.

A quick audit of all file systems showed no relevant checks on the value
of d_count in vfs_rmdir/vfs_rename_dir paths.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: dentry_unhash immediately prior to rmdir · 48293699

Authored by Sage Weil on May 24, 2011

This presumes that there is no reason to unhash a dentry if we fail because
it is a mountpoint or the LSM check fails, and that the LSM checks do not
depend on the dentry being unhashed.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

merge handle_reval_dot and nameidata_drop_rcu_last · 9f1fafee

Authored by Al Viro on Mar 25, 2011

new helper: complete_walk().  Done on successful completion
of walk, drops out of RCU mode, does d_revalidate of final
result if that hadn't been done already.

handle_reval_dot() and nameidata_drop_rcu_last() subsumed into
that one; callers converted to use of complete_walk().
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

consolidate nameidata_..._drop_rcu() · 19660af7

Authored by Al Viro on Mar 25, 2011

Merge these into a single function (unlazy_walk(nd, dentry)),
kill ..._maybe variants
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

21 May, 2011 1 commit

VFS: move BUG_ON test for symlink nd->depth after current->link_count test · 1a4022f8

Authored by Erez Zadok on May 21, 2011

This solves a serious VFS-level bug in nested_symlink (which was
rewritten from do_follow_link), and follows the order of depth tests
that existed before.

The bug triggers a BUG_ON in fs/namei.c:1381, when running racer with
symlink and rename ops.
Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
Acked-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

14 May, 2011 1 commit

vfs: micro-optimize acl_permission_check() · 26cf46be

Authored by Linus Torvalds on May 13, 2011

It's a hot function, and we're better off not mixing types in the mask
calculations.  The compiler just ends up mixing 16-bit and 32-bit
operations, for no good reason.

So do everything in 'unsigned int' rather than mixing 'unsigned int'
masking with a 'umode_t' (16-bit) mode variable.

This, together with the parent commit (47a150ed: "Cache user_ns in
struct cred") makes acl_permission_check() much nicer.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
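
The idea of keeping the whole mask calculation in 'unsigned int' can be sketched with a simplified mode-bit check. This is an illustration only: mode_bits_check() and its is_owner/in_group parameters are hypothetical, the MAY_* values are defined locally for the example, and ACL handling is left out entirely.

```c
#include <assert.h>
#include <errno.h>

/* Local definitions for the sketch. */
#define MAY_EXEC  0x1u
#define MAY_WRITE 0x2u
#define MAY_READ  0x4u

/*
 * Check 'mask' against classic rwx mode bits, doing every step in
 * unsigned int so that no narrower mode type is mixed into the math.
 */
int mode_bits_check(unsigned int mode, int is_owner, int in_group,
		    unsigned int mask)
{
	assert(mode <= 07777u);

	mask &= MAY_READ | MAY_WRITE | MAY_EXEC;

	if (is_owner)
		mode >>= 6;	/* select the owner rwx bits */
	else if (in_group)
		mode >>= 3;	/* select the group rwx bits */

	/* Every requested bit must be present in the selected bits. */
	if ((mask & ~mode) == 0)
		return 0;
	return -EACCES;
}
```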

16 Apr, 2011 1 commit

vfs: Fix absolute RCU path walk failures due to uninitialized seq number · c1530019

Authored by Tim Chen on Apr 15, 2011

During RCU walk in path_lookupat and path_openat, the rcu lookup
frequently failed if looking up an absolute path, because when root
directory was looked up, seq number was not properly set in nameidata.

We dropped out of RCU walk in nameidata_drop_rcu due to mismatch in
directory entry's seq number.  We reverted to slow path walk that needs
to take references.

With the following patch, I saw a 50% increase in an exim mail server
benchmark throughput on a 4-socket Nehalem-EX system.
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: stable@kernel.org (v2.6.38)
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31 Mar, 2011 1 commit

Fix common misspellings · 25985edc

Authored by Lucas De Marchi on Mar 30, 2011

Fixes generated by 'codespell' and manually reviewed.
Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>

25 Mar, 2011 1 commit

vfs - check non-mountpoint dentry might block in __follow_mount_rcu() · 62a7375e

Authored by Ian Kent on Mar 25, 2011

When following a mount in rcu-walk mode we must check if the incoming dentry
is telling us it may need to block, even if it isn't actually a mountpoint.
Signed-off-by: Ian Kent <raven@themaw.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

24 Mar, 2011 2 commits

userns: rename is_owner_or_cap to inode_owner_or_capable · 2e149670

Authored by Serge E. Hallyn on Mar 23, 2011

And give it a kernel-doc comment.

[akpm@linux-foundation.org: btrfs changed in linux-next]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

userns: userns: check user namespace for task->file uid equivalence checks · e795b717

Authored by Serge E. Hallyn on Mar 23, 2011

Cheat for now and say all files belong to init_user_ns.  Next step will be
to let superblocks belong to a user_ns, and derive inode_userns(inode)
from inode->i_sb->s_user_ns.  Finally we'll introduce more flexible
arrangements.

Changelog:
	Feb 15: make is_owner_or_cap take const struct inode
	Feb 23: make is_owner_or_cap bool

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Acked-by: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

23 Mar, 2011 1 commit

fix leaks in path_lookupat() · bd23a539

Authored by Al Viro on Mar 23, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

18 Mar, 2011 1 commit

lose 'mounting_here' argument in ->d_manage() · 1aed3e42

Authored by Al Viro on Mar 18, 2011

it's always false...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

openanolis / cloud-kernel · synced successfully 10 months ago