Commits · b21996e36c8e3b92a84e972378bde80b43acd890 · openeuler / raspberrypi-kernel

Nov 09, 2013 · 40 commits

locks: break delegations on unlink · b21996e3

Committed by J. Bruce Fields on Sep 20, 2011

We need to break delegations on any operation that changes the set of
links pointing to an inode.  Start with unlink.

Such operations also hold the i_mutex on a parent directory.  Breaking a
delegation may require waiting for a timeout (by default 90 seconds) in
the case of an unresponsive NFS client.  To avoid blocking all directory
operations, we therefore drop locks before waiting for the delegation.
The logic then looks like:

	acquire locks
	...
	test for delegation; if found:
		take reference on inode
		release locks
		wait for delegation break
		drop reference on inode
		retry

It is possible this could never terminate.  (Even if we take precautions
to prevent another delegation being acquired on the same inode, we could
get a different inode on each retry.)  But this seems very unlikely.

The initial test for a delegation happens after the lock on the target
inode is acquired, but the lock on the directory inode may have been
acquired further up the call stack.  We therefore add a "struct inode **"
argument to any intervening functions, which we use to pass the inode
back up to the caller in case it needs a delegation broken
synchronously.
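
A userspace model of the retry loop above may help. Everything in it is hypothetical: `struct model_inode`, `model_vfs_unlink()` and `model_wait_for_break()` are invented names, and a plain `locked` flag stands in for the parent directory's i_mutex. It only mirrors the shape described here (the callee takes a reference and passes the inode back through a `struct inode **`-style argument; the caller waits with no locks held, drops the reference and retries) and is not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model types; not the kernel's struct inode. */
struct model_inode {
	int has_delegation;	/* nonzero while a client holds a delegation */
	int refcount;
	int nlink;
};

struct model_dir {
	int locked;		/* stands in for the parent's i_mutex */
};

static void lock_dir(struct model_dir *d)   { assert(!d->locked); d->locked = 1; }
static void unlock_dir(struct model_dir *d) { assert(d->locked); d->locked = 0; }

/*
 * Called with the directory locked.  If a delegation has to be broken,
 * take a reference and pass the inode back up instead of waiting here.
 */
static int model_vfs_unlink(struct model_dir *dir, struct model_inode *inode,
			    struct model_inode **delegated_inode)
{
	assert(dir->locked);
	if (inode->has_delegation) {
		inode->refcount++;
		*delegated_inode = inode;
		return -EWOULDBLOCK;
	}
	inode->nlink--;
	return 0;
}

/* Stands in for waiting (possibly for the whole timeout) on the break. */
static void model_wait_for_break(struct model_inode *inode)
{
	inode->has_delegation = 0;
}

static int model_unlink(struct model_dir *dir, struct model_inode *inode,
			int *retries)
{
	struct model_inode *delegated_inode;
	int err;

	*retries = 0;
	for (;;) {
		delegated_inode = NULL;
		lock_dir(dir);				/* acquire locks */
		err = model_vfs_unlink(dir, inode, &delegated_inode);
		unlock_dir(dir);			/* release locks */
		if (!delegated_inode)
			return err;
		model_wait_for_break(delegated_inode);	/* no locks held */
		delegated_inode->refcount--;		/* drop reference */
		(*retries)++;				/* retry */
	}
}
```

In this model the loop always terminates, because the delegation is gone after a single wait; as noted above, the real loop carries no such guarantee.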

Cc: David Howells <dhowells@redhat.com>
Cc: Tyler Hicks <tyhicks@canonical.com>
Cc: Dustin Kirkland <dustin.kirkland@gazzang.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

namei: minor vfs_unlink cleanup · 9accbb97

Committed by J. Bruce Fields on Aug 28, 2012

We'll be using dentry->d_inode in one more place.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: implement delegations · df4e8d2c

Committed by J. Bruce Fields on Mar 05, 2012

Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
type.

Note nfsd is the only delegation user and is only using read
delegations.  Warn on any attempt to set a write delegation for now.
We'll come back to that case later.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

locks: introduce new FL_DELEG lock flag · 617588d5

Committed by J. Bruce Fields on Jul 01, 2011

For now FL_DELEG is just a synonym for FL_LEASE.  So this patch doesn't
change behavior.

Next we'll modify break_lease to treat FL_DELEG leases differently, to
account for the fact that NFSv4 delegations should be broken in more
situations than Windows oplocks.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: take i_mutex on renamed file · 6cedba89

Committed by J. Bruce Fields on Mar 05, 2012

A read delegation is used by NFSv4 as a guarantee that a client can
perform local read opens without informing the server.

The open operation takes the last component of the pathname as an
argument, and is thus also a lookup operation.  Giving the client the
above guarantee means informing the client before we allow anything that
would change the set of names pointing to the inode.

Therefore, we need to break delegations on rename, link, and unlink.

We also need to prevent new delegations from being acquired while one of
these operations is in progress.

We could add some completely new locking for that purpose, but it's
simpler to use the i_mutex, since that's already taken by all the
operations we care about.

The single exception is rename.  So, modify rename to take the i_mutex
on the file that is being renamed.

Also fix up lockdep and Documentation/filesystems/directory-locking to
reflect the change.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: rename I_MUTEX_QUOTA now that it's not used for quotas · 40bd22c9

Committed by J. Bruce Fields on Apr 18, 2012

I_MUTEX_QUOTA is now just being used whenever we want to lock two
non-directories.  So the name isn't right.  I_MUTEX_NONDIR2 isn't
especially elegant but it's the best I could think of.

Also fix some outdated documentation.
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: don't use PARENT/CHILD lock classes for non-directories · 27555516

Committed by J. Bruce Fields on Apr 25, 2012

Reserve I_MUTEX_PARENT and I_MUTEX_CHILD for locking of actual
directories.

(Also I_MUTEX_QUOTA isn't really a meaningful name for this locking
class any more; fixed in a later patch.)
Acked-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: pull ext4's double-i_mutex-locking into common code · 375e289e

Committed by J. Bruce Fields on Apr 18, 2012

We want to do this elsewhere as well.

Also catch any attempts to use it for directories (where this ordering
would conflict with ancestor-first directory ordering in lock_rename).
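
The idea of a shared double-locking helper for non-directories can be modelled as below. Ordering the two inodes by address is an assumption in this sketch (it is believed to match the common helper, `lock_two_nondirectories()` in fs/inode.c, but check the source); `struct model_inode`, the sequence counter and the `model_` functions are hypothetical and only record the order in which locks are taken.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: "locking" only records the order of acquisition. */
struct model_inode {
	int is_dir;
	int lock_seq;	/* 0 = unlocked, otherwise order in which it was locked */
};

static int model_lock_counter;

static void model_lock(struct model_inode *inode)
{
	assert(inode->lock_seq == 0);
	inode->lock_seq = ++model_lock_counter;
}

/*
 * Lock two non-directories in a fixed (address) order so that two tasks
 * locking the same pair cannot deadlock.  Directories are refused, since
 * they must follow the ancestor-first ordering used by lock_rename().
 */
static int model_lock_two_nondirectories(struct model_inode *a,
					 struct model_inode *b)
{
	if ((a && a->is_dir) || (b && b->is_dir))
		return -1;		/* model: refuse instead of locking */
	if (a && b && (uintptr_t)a > (uintptr_t)b) {
		struct model_inode *tmp = a;
		a = b;
		b = tmp;
	}
	if (a)
		model_lock(a);
	if (b && b != a)
		model_lock(b);		/* the "second non-directory" lock */
	return 0;
}
```

Whatever the argument order, the lower-addressed inode is always locked first, and passing the same inode twice locks it only once.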

Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: Dave Chinner <david@fromorbit.com>
Acked-by: Jeff Layton <jlayton@redhat.com>
Acked-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: fix quadratic behavior in filehandle lookup · f27c9298

Committed by J. Bruce Fields on Oct 17, 2013

Suppose we're given the filehandle for a directory whose closest
ancestor in the dcache is its Nth ancestor.

The main loop in reconnect_path searches for an IS_ROOT ancestor of
target_dir, reconnects that ancestor to its parent, then recommences the
search for an IS_ROOT ancestor from target_dir.

This behavior is quadratic in N.  And there's really no need to restart
the search from target_dir each time: once a directory has been looked
up, it won't become IS_ROOT again.  So instead of starting from
target_dir each time, we can continue where we left off.

This simplifies the code and improves performance on very deep directory
hierarchies.  (I can't think of any reason anyone should need hierarchies
a hundred or more deep, but the performance improvement may be valuable
if only to limit damage in case of abuse.)
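
To make the quadratic cost concrete, the following C model counts parent-pointer steps for both strategies on a chain of N disconnected ancestors. It is a simplified, hypothetical model rather than the exportfs code: arrays replace dentries, ancestor 0 is target_dir, ancestor N is the closest one already connected in the dcache, and `linked[i]` means ancestor i has been reconnected to ancestor i + 1.

```c
#include <assert.h>
#include <string.h>

#define MODEL_MAX_DEPTH 1024

/* linked[i] != 0: ancestor i has been reconnected to ancestor i + 1. */
static unsigned char linked[MODEL_MAX_DEPTH];

/* Old behaviour: after every reconnect, search again from target_dir. */
static unsigned long restart_walk_steps(int n)
{
	unsigned long steps = 0;

	assert(n >= 0 && n <= MODEL_MAX_DEPTH);
	memset(linked, 0, sizeof(linked));
	for (;;) {
		int i = 0;

		while (i < n && linked[i]) {	/* climb to the IS_ROOT ancestor */
			i++;
			steps++;
		}
		if (i == n)			/* reached the connected ancestor */
			return steps;
		linked[i] = 1;			/* "reconnect" it to its parent */
	}
}

/* New behaviour: continue from where the previous reconnect left off. */
static unsigned long resume_walk_steps(int n)
{
	unsigned long steps = 0;
	int i = 0;

	assert(n >= 0 && n <= MODEL_MAX_DEPTH);
	memset(linked, 0, sizeof(linked));
	while (i < n) {
		linked[i] = 1;			/* reconnect, then move up */
		i++;
		steps++;
	}
	return steps;
}
```

For N = 100, restarting costs 5050 parent steps against 100 for continuing; in general the counts are N(N+1)/2 and N.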
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: better variable name · efbf201f

Committed by J. Bruce Fields on Oct 17, 2013

Replace another unhelpful acronym.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: move most of reconnect_path to helper function · bbf7a8a3

Committed by J. Bruce Fields on Oct 17, 2013

Also replace 3 easily-confused three-letter acronyms by more helpful
variable names.

Just cleanup, no change in functionality, with one exception: the
dentry_connected() check in the "out_reconnected" case will now only
check the ancestors of the current dentry instead of checking all the
way from target_dir.  Since we've already verified connectivity up to
this dentry, that should be sufficient.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: eliminate unused "noprogress" counter · e4b70ebe

Committed by J. Bruce Fields on Oct 16, 2013

Note this counter is now being set to 0 on every pass through the loop,
so it no longer serves any useful purpose.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: stop retrying once we race with rename/remove · a056cc89

Committed by J. Bruce Fields on Oct 16, 2013

There are two places here where we could race with a rename or remove:

	- We could find the parent, but then be removed or renamed away
	  from that parent directory before finding our name in that
	  directory.
	- We could find the parent, and find our name in that parent,
	  but then be renamed or removed before we look ourselves up by
	  that name in that parent.

In both cases the concurrent rename or remove will take care of
reconnecting the directory that we're currently examining.  Our target
directory should then also be connected.  Check this and clear
DISCONNECTED in these cases instead of looping around again.

Note: we *do* need to check that this actually happened if we want to be
robust in the face of corrupted filesystems: a corrupted filesystem
could just return a completely wrong parent, and we want to fail with an
error in that case before starting to clear DISCONNECTED on
non-DISCONNECTED filesystems.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: clear DISCONNECTED on all parents sooner · 0dbc018a

Committed by J. Bruce Fields on Sep 09, 2013

Once we've found any connected parent, we know all our parents are
connected--that's true even if there's a concurrent rename.  May as well
clear them all at once and be done with it.
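
A minimal model of clearing the flag on every parent in one pass. The types and the flag value are hypothetical, and locking and reference counting (which real dentries need, especially with a concurrent rename) are omitted; the loop stops at the first ancestor that is already marked connected, since everything above it is connected too.

```c
#include <assert.h>

#define MODEL_DISCONNECTED 0x1u		/* hypothetical flag bit */

struct model_dentry {
	struct model_dentry *parent;	/* points to itself at the root */
	unsigned int flags;
};

/* Clear DISCONNECTED on dentry and all its ancestors that still carry it. */
static void model_clear_disconnected(struct model_dentry *dentry)
{
	while (dentry->flags & MODEL_DISCONNECTED) {
		dentry->flags &= ~MODEL_DISCONNECTED;
		dentry = dentry->parent;
	}
}
```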
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: more detailed comment for path_reconnect · 78cee9a8

Committed by J. Bruce Fields on Oct 22, 2013

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: BUG_ON in crazy corner case · 854ff5ca

Committed by Christoph Hellwig on Oct 16, 2013

This would indicate a nasty bug in the dcache and has never triggered in
the past 10 years as far as I know.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dcache: fix outdated DCACHE_NEED_LOOKUP comment · 13a2c3be

Committed by J. Bruce Fields on Oct 23, 2013

The DCACHE_NEED_LOOKUP case referred to here was removed with
39e3c955 "vfs: remove DCACHE_NEED_LOOKUP".

There are only four real_lookup() callers and all of them pass in an
unhashed dentry just returned from d_alloc.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dcache: don't clear DCACHE_DISCONNECTED too early · f80de2cd

Committed by J. Bruce Fields on Jul 18, 2012

DCACHE_DISCONNECTED should not be cleared until we're sure the dentry is
connected all the way up to the root of the filesystem.  It *shouldn't*
be cleared as soon as the dentry is connected to a parent.  That will
cause bugs at least on exportable filesystems.
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dcache: Don't set DISCONNECTED on "pseudo filesystem" dentries · e1a24bb0

Committed by J. Bruce Fields on Jun 29, 2012

I can't for the life of me see any reason why anyone should care whether
a dentry that is never hooked into the dentry cache would need
DCACHE_DISCONNECTED set.

This originates from 4b936885 "fs: improve scalability of pseudo
filesystems", which probably just made the false assumption that
DCACHE_DISCONNECTED was meant to be set on anything not connected to a
parent somehow.

So this is just confusing.  Ideally the only uses of DCACHE_DISCONNECTED
would be in the filehandle-lookup code, which needs it to ensure
dentries are connected into the dentry tree before use.

I left d_alloc_pseudo there even though it's now equivalent to
__d_alloc(), just on the theory the name is better documentation of its
intended use outside dcache.c.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dcache: use IS_ROOT to decide where dentry is hashed · 7632e465

Committed by J. Bruce Fields on Jun 28, 2012

Every hashed dentry is either hashed in the dentry_hashtable, or a
superblock's s_anon list.

__d_drop() assumes it can determine which is the case by checking
DCACHE_DISCONNECTED; this is not true.

It is true that when DCACHE_DISCONNECTED is cleared, the dentry is not
only hashed on dentry_hashtable, but is fully connected to its parents
back to the root.

But the converse is *not* true: fs/exportfs/expfs.c:reconnect_path()
attempts to connect a directory (found by filehandle lookup) back to
root by ascending to parents and performing lookups one at a time.  It
does not clear DCACHE_DISCONNECTED until it's done, and that is not at
all an atomic process.

In particular, it is possible for DCACHE_DISCONNECTED to be set on a
dentry which is hashed on the dentry_hashtable.

Instead, use IS_ROOT() to check which hash chain a dentry is on.  This
*does* work:

Dentries are hashed only by:

	- d_obtain_alias, which adds an IS_ROOT() dentry to sb_anon.

	- __d_rehash, called by _d_rehash: hashes to the dentry's
	  parent, and all callers of _d_rehash appear to have d_parent
	  set to a "real" parent.
	- __d_rehash, called by __d_move: rehashes the moved dentry to
	  hash chain determined by target, and assigns target's d_parent
	  to its d_parent, before dropping the dentry's d_lock.

Therefore I believe it's safe for a holder of a dentry's d_lock to
assume that it is hashed on sb_anon if and only if IS_ROOT(dentry) is
true.
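
The argument above reduces to a one-line test. The model below uses hypothetical types (the real code works on struct dentry under d_lock) and shows a dentry that is partway through reconnect_path(): it already has a real parent, so it is hashed in the dentry_hashtable, yet it still carries DISCONNECTED, which is exactly where a flag-based test goes wrong.

```c
#include <assert.h>

#define MODEL_DISCONNECTED 0x1u		/* hypothetical flag bit */

struct model_dentry {
	struct model_dentry *parent;	/* parent == self means IS_ROOT() */
	unsigned int flags;
};

enum model_chain { MODEL_HASHTABLE, MODEL_S_ANON };

static int model_is_root(const struct model_dentry *d)
{
	return d->parent == d;
}

/* Old test: infer the hash chain from the DISCONNECTED flag (unreliable). */
static enum model_chain chain_by_flag(const struct model_dentry *d)
{
	return (d->flags & MODEL_DISCONNECTED) ? MODEL_S_ANON : MODEL_HASHTABLE;
}

/* New test: IS_ROOT() dentries are on s_anon, all others on the table. */
static enum model_chain chain_by_is_root(const struct model_dentry *d)
{
	return model_is_root(d) ? MODEL_S_ANON : MODEL_HASHTABLE;
}
```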

I believe the incorrect assumption about DCACHE_DISCONNECTED was
originally introduced by ceb5bdc2 "fs: dcache per-bucket dcache hash
locking".

Also add a comment while we're here.

Cc: Nick Piggin <npiggin@kernel.dk>
Acked-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ocfs2: get rid of impossible checks · b19f1336

Committed by Al Viro on Nov 03, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

qnx4: i_sb is never NULL · fbad2bd1

Committed by Al Viro on Nov 03, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

exportfs: fix 32-bit nfsd handling of 64-bit inode numbers · 950ee956

Committed by J. Bruce Fields on Sep 10, 2013

Symptoms were spurious -ENOENTs on stat of an NFS filesystem from a
32-bit NFS server exporting a very large XFS filesystem, when the
server's cache is cold (so the inodes in question are not in cache).
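
The message does not spell out the mechanism. Assuming the failure came from comparing a 64-bit inode number (as reported for a directory entry) against one that had been stored in a 32-bit `unsigned long`, the mismatch looks like this; `match_full()` models comparing against a full 64-bit value such as a getattr result. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* On a 32-bit host, unsigned long (and so i_ino) is 32 bits wide. */
typedef uint32_t model_ulong32;

/* Buggy: the stored inode number lost its high 32 bits. */
static int match_truncated(uint64_t dirent_ino, model_ulong32 i_ino)
{
	return dirent_ino == i_ino;
}

/* Fixed: compare against the full 64-bit inode number. */
static int match_full(uint64_t dirent_ino, uint64_t stat_ino)
{
	return dirent_ino == stat_ino;
}
```

Any inode number above 2^32 - 1 fails the truncated comparison, so the lookup never finds its own name, which would surface as a spurious -ENOENT.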
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reported-by: Trevor Cordes <trevor@tecnopolis.ca>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

vfs: split out vfs_getattr_nosec · b7a6ec52

Committed by J. Bruce Fields on Oct 02, 2013

The filehandle lookup code wants this version of getattr.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

iget/iget5: don't bother with ->i_lock until we find a match · 5a3cd992

Committed by Al Viro on Nov 06, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

VFS: Put a small type field into struct dentry::d_flags · b18825a7

Committed by David Howells on Sep 12, 2013

Put a type field into struct dentry::d_flags to indicate if the dentry is one
of the following types that relate particularly to pathwalk:

	Miss (negative dentry)
	Directory
	"Automount" directory (defective - no i_op->lookup())
	Symlink
	Other (regular, socket, fifo, device)

The type field is set to one of the first five types on a dentry by calls to
__d_instantiate() and d_obtain_alias() from information in the inode (if one is
given).

The type is cleared by dentry_unlink_inode() when it reconstitutes an existing
dentry as a negative dentry.

Accessors provided are:

	d_set_type(dentry, type)
	d_is_directory(dentry)
	d_is_autodir(dentry)
	d_is_symlink(dentry)
	d_is_file(dentry)
	d_is_negative(dentry)
	d_is_positive(dentry)

A bunch of checks in pathname resolution switched to those.
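
A sketch of how such a type field and its accessors can be packed into a flags word. The bit positions and the `model_` names are assumptions made for illustration, not necessarily the values in include/linux/dcache.h; the point is that setting the type only touches the type bits.

```c
#include <assert.h>

/* Hypothetical layout: a 3-bit type field inside d_flags. */
#define MODEL_ENTRY_TYPE	0x00700000u
#define MODEL_MISS_TYPE		0x00000000u	/* negative dentry */
#define MODEL_DIRECTORY_TYPE	0x00100000u
#define MODEL_AUTODIR_TYPE	0x00200000u	/* dir without ->lookup() */
#define MODEL_SYMLINK_TYPE	0x00300000u
#define MODEL_FILE_TYPE		0x00400000u	/* regular, socket, fifo, device */

struct model_dentry {
	unsigned int d_flags;
};

/* Replace the type bits, leaving every other flag untouched. */
static void model_d_set_type(struct model_dentry *d, unsigned int type)
{
	d->d_flags = (d->d_flags & ~MODEL_ENTRY_TYPE) | type;
}

static unsigned int model_d_type(const struct model_dentry *d)
{
	return d->d_flags & MODEL_ENTRY_TYPE;
}

static int model_d_is_directory(const struct model_dentry *d)
{
	return model_d_type(d) == MODEL_DIRECTORY_TYPE;
}

static int model_d_is_autodir(const struct model_dentry *d)
{
	return model_d_type(d) == MODEL_AUTODIR_TYPE;
}

static int model_d_is_symlink(const struct model_dentry *d)
{
	return model_d_type(d) == MODEL_SYMLINK_TYPE;
}

static int model_d_is_file(const struct model_dentry *d)
{
	return model_d_type(d) == MODEL_FILE_TYPE;
}

static int model_d_is_negative(const struct model_dentry *d)
{
	return model_d_type(d) == MODEL_MISS_TYPE;
}

static int model_d_is_positive(const struct model_dentry *d)
{
	return !model_d_is_negative(d);
}
```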
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

elf{,_fdpic} coredump: get rid of pointless if (siginfo->si_signo) · afabada9

Committed by Al Viro on Oct 14, 2013

we can't get to do_coredump() if that condition isn't satisfied...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

constify do_coredump() argument · ec57941e

Committed by Al Viro on Oct 13, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

constify copy_siginfo_to_user{,32}() · ce395960

Committed by Al Viro on Oct 13, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

... and kill anon_inode_getfile_private() · 078d8e62

Committed by Al Viro on Oct 09, 2013

it's a seriously misguided API, now fortunately without users.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

rework aio migrate pages to use aio fs · 71ad7490

Committed by Benjamin LaHaise on Sep 17, 2013

Don't abuse anon_inodes.c to host private files needed by aio;
we can bloody well declare a mini-fs of our own instead of
patching up what anon_inodes can create for us.
Tested-by: Benjamin LaHaise <bcrl@kvack.org>
Acked-by: Benjamin LaHaise <bcrl@kvack.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

take anon inode allocation to libfs.c · 6987843f

Committed by Al Viro on Oct 02, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

new helper: dump_align() · 22a8cb82

Committed by Al Viro on Oct 08, 2013

dump_skip to given alignment...
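
A minimal sketch of "dump_skip to given alignment", assuming the alignment is a power of two and modelling the coredump parameters with just a byte count and a size limit. `struct model_cprm`, `model_dump_skip()` and `model_dump_align()` are hypothetical; the model skip only advances the counter, whereas the real dump_skip() has to write or seek.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct coredump_params. */
struct model_cprm {
	size_t written;
	size_t limit;
};

/* Pretend to emit nr zero bytes, respecting the size limit. */
static int model_dump_skip(struct model_cprm *cprm, size_t nr)
{
	if (cprm->written + nr > cprm->limit)
		return 0;
	cprm->written += nr;
	return 1;
}

/* Pad so that cprm->written becomes a multiple of align (a power of two). */
static int model_dump_align(struct model_cprm *cprm, size_t align)
{
	size_t mod;

	if (align == 0 || (align & (align - 1)))
		return 0;
	mod = cprm->written & (align - 1);
	return mod ? model_dump_skip(cprm, align - mod) : 1;
}
```

For example, with 13 bytes written, aligning to 4 skips 3 bytes and leaves the count at 16; aligning again is a no-op.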
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

spufs: get rid of dump_emit() wrappers · 7b1f4020

Committed by Al Viro on Oct 08, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

dump_skip(): dump_seek() replacement taking coredump_params · 9b56d543

Committed by Al Viro on Oct 08, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

make dump_emit() use vfs_write() instead of banging at ->f_op->write directly · 2507a4fb

Committed by Al Viro on Oct 08, 2013

... and deal with short writes properly - the output might be to pipe, after
all; as it is, e.g. no-MMU case of elf_fdpic coredump can write a whole lot
more than a page worth of data at one call.
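
The short-write handling can be sketched in userspace with write(2) standing in for vfs_write(). `struct model_cprm` and `model_dump_emit()` are hypothetical; the limit check reflects the coredump size limit mentioned elsewhere in this series, and the exact kernel behaviour (for example on signals) may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for struct coredump_params. */
struct model_cprm {
	int fd;
	size_t written;
	size_t limit;
};

/* Returns 1 if all nr bytes were emitted, 0 otherwise. */
static int model_dump_emit(struct model_cprm *cprm, const void *addr, size_t nr)
{
	const char *p = addr;

	if (cprm->written + nr > cprm->limit)
		return 0;
	while (nr) {
		ssize_t n = write(cprm->fd, p, nr);

		if (n <= 0)		/* error or no progress: give up */
			return 0;
		p += n;			/* short write: continue with the rest */
		nr -= (size_t)n;
		cprm->written += (size_t)n;
	}
	return 1;
}
```

Looping until `nr` reaches zero is what makes a pipe target safe: each write may accept only part of the buffer.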
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

binfmt_elf: count notes towards coredump limit · 1ad67015

Committed by Al Viro on Oct 07, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

aout: switch to dump_emit · 43a5d548

Committed by Al Viro on Oct 07, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

switch elf_coredump_extra_notes_write() to dump_emit() · cdc3d562

Committed by Al Viro on Oct 05, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

convert the rest of binfmt_elf_fdpic to dump_emit() · e6c1baa9

Committed by Al Viro on Oct 05, 2013

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>