提交 · 80c30e8de4f81851b1f712bcc596e11d53bc76f1 · openanolis / cloud-kernel

26 1月, 2011 4 次提交

NFS: Prevent memory allocation failure in nfsacl_encode() · f61f6da0

由 Chuck Lever 提交于 1月 21, 2011

nfsacl_encode() allocates memory in certain cases.  This of course
is not guaranteed to work.

Since commit 9f06c719 "SUNRPC: New xdr_streams XDR encoder API", the
kernel's XDR encoders can't return a result indicating possibly a
failure, so a memory allocation failure in nfsacl_encode() has become
fatal (ie, the XDR code Oopses) in some cases.

However, the allocated memory is a tiny fixed amount, on the order
of 40-50 bytes.  We can easily use a stack-allocated buffer for
this, with only a wee bit of nose-holding.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

f61f6da0

NFS: Fix "kernel BUG at fs/nfs/nfs3xdr.c:1338!" · ee5dc773

由 Chuck Lever 提交于 1月 21, 2011

Milan Broz <mbroz@redhat.com> reports:

> on today Linus' tree I get OOps if using nfs.
>
> server (2.6.36) exports dir:
> /dir   172.16.1.0/24(rw,async,all_squash,no_subtree_check,anonuid=500,anongid=500)
>
> on client it is mounted  in fstab
> server:/dir  /mnt/tst  nfs  rw,soft 0 0
>
> and these commands OOpses it (simplified from a configure script):
>
> cd /dir
> touch x
> install x y
>
> [  105.327701] ------------[ cut here ]------------
> [  105.327979] kernel BUG at fs/nfs/nfs3xdr.c:1338!
> [  105.328075] invalid opcode: 0000 [#1] PREEMPT SMP
> [  105.328223] last sysfs file: /sys/devices/virtual/bdi/0:16/uevent
> [  105.328349] Modules linked in: usbcore dm_mod
> [  105.328553]
> [  105.328678] Pid: 3710, comm: install Not tainted 2.6.37+ #423 440BX Desktop Reference Platform/VMware Virtual Platform
> [  105.328853] EIP: 0060:[<c116c06c>] EFLAGS: 00010282 CPU: 0
> [  105.329152] EIP is at nfs3_xdr_enc_setacl3args+0x61/0x98
> [  105.329249] EAX: ffffffea EBX: ce941d98 ECX: 00000000 EDX: 00000004
> [  105.329340] ESI: ce941cd0 EDI: 000000a4 EBP: ce941cc0 ESP: ce941cb4
> [  105.329431]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
> [  105.329525] Process install (pid: 3710, ti=ce940000 task=ced36f20 task.ti=ce940000)
> [  105.336600] Stack:
> [  105.336693]  ce941cd0 ce9dc000 00000000 ce941cf8 c12ecd02 c12f43e0 c116c00b cf754158
> [  105.336982]  ce9dc004 cf754284 ce9dc004 cf7ffee8 ceff9978 ce9dc000 cf7ffee8 ce9dc000
> [  105.337182]  ce9dc000 ce941d14 c12e698d cf75412c ce941d98 cf7ffee8 cf7fff20 00000000
> [  105.337405] Call Trace:
> [  105.337695]  [<c12ecd02>] rpcauth_wrap_req+0x75/0x7f
> [  105.337806]  [<c12f43e0>] ? xdr_encode_opaque+0x12/0x15
> [  105.337898]  [<c116c00b>] ? nfs3_xdr_enc_setacl3args+0x0/0x98
> [  105.337988]  [<c12e698d>] call_transmit+0x17e/0x1e8
> [  105.338072]  [<c12ec307>] __rpc_execute+0x6d/0x1a6
> [  105.338155]  [<c12ec474>] rpc_execute+0x34/0x37
> [  105.338235]  [<c12e738d>] rpc_run_task+0xb5/0xbd
> [  105.338316]  [<c12e7474>] rpc_call_sync+0x3d/0x58
> [  105.338402]  [<c116d0c6>] nfs3_proc_setacls+0x18e/0x24f
> [  105.338493]  [<c10b3f76>] ? __kmalloc+0x148/0x1c4
> [  105.338579]  [<c10ecd01>] ? posix_acl_alloc+0x12/0x22
> [  105.338665]  [<c116d5c8>] nfs3_proc_setacl+0xa0/0xca
> [  105.338748]  [<c116d69c>] nfs3_setxattr+0x62/0x88
> [  105.338834]  [<c1317042>] ? sub_preempt_count+0x7c/0x89
> [  105.338926]  [<c116d63a>] ? nfs3_setxattr+0x0/0x88
> [  105.339026]  [<c10cfa79>] __vfs_setxattr_noperm+0x26/0x95
> [  105.339114]  [<c10cfb43>] vfs_setxattr+0x5b/0x76
> [  105.339211]  [<c10cfbfb>] setxattr+0x9d/0xc3
> [  105.339298]  [<c10a2ea8>] ? handle_pte_fault+0x258/0x5cb
> [  105.339428]  [<c1091ff6>] ? __free_pages+0x1a/0x23
> [  105.339517]  [<c10498ea>] ? up_read+0x16/0x2c
> [  105.339599]  [<c10b8365>] ? fget+0x0/0xa3
> [  105.339677]  [<c10b8365>] ? fget+0x0/0xa3
> [  105.339760]  [<c1025d23>] ? get_parent_ip+0xb/0x31
> [  105.339843]  [<c1317042>] ? sub_preempt_count+0x7c/0x89
> [  105.339931]  [<c10cfc72>] sys_fsetxattr+0x51/0x79
> [  105.340014]  [<c1002853>] sysenter_do_call+0x12/0x32
> [  105.340133] Code: 2e 76 18 00 58 31 d2 8b 7f 28 f6 43 04 01 74 03 8b 53 08 6a 00 8b 46 04 6a 01 8b 0b 52 89 fa e8 85 10 f8 ff 83 c4 0c 85 c0 79 04 <0f> 0b eb fe 31 c9 f6 43 04 04 74 03 8b 4b 0c 68 00 10 00 00 8d
> [  105.350321] EIP: [<c116c06c>] nfs3_xdr_enc_setacl3args+0x61/0x98 SS:ESP 0068:ce941cb4
> [  105.364385] ---[ end trace 01fcfe7f0f7f6e4a ]---

nfs3_xdr_enc_setacl3args() is not properly setting up the target
buffer before nfsacl_encode() attempts to encode the ACL.

Introduced by commit d9c407b1 "NFS: Introduce new-style XDR encoding
functions for NFSv3."
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

ee5dc773

NFS: Fix "kernel BUG at fs/aio.c:554!" · 839f7ad6

由 Chuck Lever 提交于 1月 21, 2011

Nick Piggin reports:

> I'm getting use after frees in aio code in NFS
>
> [ 2703.396766] Call Trace:
> [ 2703.396858]  [<ffffffff8100b057>] ? native_sched_clock+0x27/0x80
> [ 2703.396959]  [<ffffffff8108509e>] ? put_lock_stats+0xe/0x40
> [ 2703.397058]  [<ffffffff81088348>] ? lock_release_holdtime+0xa8/0x140
> [ 2703.397159]  [<ffffffff8108a2a5>] lock_acquire+0x95/0x1b0
> [ 2703.397260]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397361]  [<ffffffff81039701>] ? get_parent_ip+0x11/0x50
> [ 2703.397464]  [<ffffffff81612a31>] _raw_spin_lock_irq+0x41/0x80
> [ 2703.397564]  [<ffffffff811627db>] ? aio_put_req+0x2b/0x60
> [ 2703.397662]  [<ffffffff811627db>] aio_put_req+0x2b/0x60
> [ 2703.397761]  [<ffffffff811647fe>] do_io_submit+0x2be/0x7c0
> [ 2703.397895]  [<ffffffff81164d0b>] sys_io_submit+0xb/0x10
> [ 2703.397995]  [<ffffffff8100307b>] system_call_fastpath+0x16/0x1b
>
> Adding some tracing, it is due to nfs completing the request then
> returning something other than -EIOCBQUEUED, so aio.c
> also completes the request.

To address this, prevent the NFS direct I/O engine from completing
async iocbs when the forward path returns an error without starting
any I/O.

This fix appears to survive ^C during both "xfstest no. 208" and "fsx
-Z."

It's likely this bug has existed for a very long while, as we are seeing
very similar symptoms in OEL 5.  Copying stable.

Cc: Stable <stable@kernel.org>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

839f7ad6

NFS4: Avoid potential NULL pointer dereference in decode_and_add_ds(). · ad3d2eed

由 Jesper Juhl 提交于 1月 17, 2011

On Mon, 17 Jan 2011, Mi Jinlong wrote:

>
>
> Jesper Juhl:
> > strrchr() can return NULL if nothing is found. If this happens we'll
> > dereference a NULL pointer in
> > fs/nfs/nfs4filelayoutdev.c::decode_and_add_ds().
> >
> > I tried to find some other code that guarantees that this can never
> > happen but I was unsuccessful. So, unless someone else can point to some
> > code that ensures this can never be a problem, I believe this patch is
> > needed.
> >
> > While I was changing this code I also noticed that all the dprintk()
> > statements, except one, start with "%s:". The one missing the ":" I added
> > it to.
>
>   Maybe another one also should be changed at decode_and_add_ds() at line 243:
>
>    243  printk("%s Decoded address and port %s\n", __func__, buf);
>
Missed that one. Thanks.
Signed-off-by: NJesper Juhl <jj@chaosbits.net>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

ad3d2eed

20 1月, 2011 1 次提交

NFS: fix handling of malloc failure during nfs_flush_multi() · 0da2a4ac

由 Fred Isaman 提交于 1月 19, 2011

Cleanup of the allocated list entries should not call
put_nfs_open_context() on each entry, as the context will
always be NULL, causing an oops.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

0da2a4ac

16 1月, 2011 3 次提交

Unexport do_add_mount() and add in follow_automount(), not ->d_automount() · ea5b778a

由 David Howells 提交于 1月 14, 2011

Unexport do_add_mount() and make ->d_automount() return the vfsmount to be
added rather than calling do_add_mount() itself. follow_automount() will then
do the addition.

This slightly complicates things as ->d_automount() normally wants to add the
new vfsmount to an expiration list and start an expiration timer. The problem
with that is that the vfsmount will be deleted if it has a refcount of 1 and
the timer will not repeat if the expiration list is empty.

To this end, we require the vfsmount to be returned from d_automount() with a
refcount of (at least) 2. One of these refs will be dropped unconditionally.
In addition, follow_automount() must get a 3rd ref around the call to
do_add_mount() lest it eat a ref and return an error, leaving the mount we
have open to being expired as we would otherwise have only 1 ref on it.

d_automount() should also add the the vfsmount to the expiration list (by
calling mnt_set_expiry()) and start the expiration timer before returning, if
this mechanism is to be used. The vfsmount will be unlinked from the
expiration list by follow_automount() if do_add_mount() fails.

This patch also fixes the call to do_add_mount() for AFS to propagate the mount
flags from the parent vfsmount.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ea5b778a

NFS: Use d_automount() rather than abusing follow_link() · 36d43a43

由 David Howells 提交于 1月 14, 2011

Make NFS use the new d_automount() dentry operation rather than abusing
follow_link() on directories.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

36d43a43

Add a dentry op to allow processes to be held during pathwalk transit · cc53ce53

由 David Howells 提交于 1月 14, 2011

Add a dentry op (d_manage) to permit a filesystem to hold a process and make it
sleep when it tries to transit away from one of that filesystem's directories
during a pathwalk.  The operation is keyed off a new dentry flag
(DCACHE_MANAGE_TRANSIT).

The filesystem is allowed to be selective about which processes it holds and
which it permits to continue on or prohibits from transiting from each flagged
directory.  This will allow autofs to hold up client processes whilst letting
its userspace daemon through to maintain the directory or the stuff behind it
or mounted upon it.

The ->d_manage() dentry operation:

	int (*d_manage)(struct path *path, bool mounting_here);

takes a pointer to the directory about to be transited away from and a flag
indicating whether the transit is undertaken by do_add_mount() or
do_move_mount() skipping through a pile of filesystems mounted on a mountpoint.

It should return 0 if successful and to let the process continue on its way;
-EISDIR to prohibit the caller from skipping to overmounted filesystems or
automounting, and to use this directory; or some other error code to return to
the user.

->d_manage() is called with namespace_sem writelocked if mounting_here is true
and no other locks held, so it may sleep.  However, if mounting_here is true,
it may not initiate or wait for a mount or unmount upon the parameter
directory, even if the act is actually performed by userspace.

Within fs/namei.c, follow_managed() is extended to check with d_manage() first
on each managed directory, before transiting away from it or attempting to
automount upon it.

follow_down() is renamed follow_down_one() and should only be used where the
filesystem deliberately intends to avoid management steps (e.g. autofs).

A new follow_down() is added that incorporates the loop done by all other
callers of follow_down() (do_add/move_mount(), autofs and NFSD; whilst AFS, NFS
and CIFS do use it, their use is removed by converting them to use
d_automount()).  The new follow_down() calls d_manage() as appropriate.  It
also takes an extra parameter to indicate if it is being called from mount code
(with namespace_sem writelocked) which it passes to d_manage().  follow_down()
ignores automount points so that it can be used to mount on them.

__follow_mount_rcu() is made to abort rcu-walk mode if it hits a directory with
DCACHE_MANAGE_TRANSIT set on the basis that we're probably going to have to
sleep.  It would be possible to enter d_manage() in rcu-walk mode too, and have
that determine whether to abort or not itself.  That would allow the autofs
daemon to continue on in rcu-walk mode.

Note that DCACHE_MANAGE_TRANSIT on a directory should be cleared when it isn't
required as every tranist from that directory will cause d_manage() to be
invoked.  It can always be set again when necessary.

==========================
WHAT THIS MEANS FOR AUTOFS
==========================

Autofs currently uses the lookup() inode op and the d_revalidate() dentry op to
trigger the automounting of indirect mounts, and both of these can be called
with i_mutex held.

autofs knows that the i_mutex will be held by the caller in lookup(), and so
can drop it before invoking the daemon - but this isn't so for d_revalidate(),
since the lock is only held on _some_ of the code paths that call it.  This
means that autofs can't risk dropping i_mutex from its d_revalidate() function
before it calls the daemon.

The bug could manifest itself as, for example, a process that's trying to
validate an automount dentry that gets made to wait because that dentry is
expired and needs cleaning up:

	mkdir         S ffffffff8014e05a     0 32580  24956
	Call Trace:
	 [<ffffffff885371fd>] :autofs4:autofs4_wait+0x674/0x897
	 [<ffffffff80127f7d>] avc_has_perm+0x46/0x58
	 [<ffffffff8009fdcf>] autoremove_wake_function+0x0/0x2e
	 [<ffffffff88537be6>] :autofs4:autofs4_expire_wait+0x41/0x6b
	 [<ffffffff88535cfc>] :autofs4:autofs4_revalidate+0x91/0x149
	 [<ffffffff80036d96>] __lookup_hash+0xa0/0x12f
	 [<ffffffff80057a2f>] lookup_create+0x46/0x80
	 [<ffffffff800e6e31>] sys_mkdirat+0x56/0xe4

versus the automount daemon which wants to remove that dentry, but can't
because the normal process is holding the i_mutex lock:

	automount     D ffffffff8014e05a     0 32581      1              32561
	Call Trace:
	 [<ffffffff80063c3f>] __mutex_lock_slowpath+0x60/0x9b
	 [<ffffffff8000ccf1>] do_path_lookup+0x2ca/0x2f1
	 [<ffffffff80063c89>] .text.lock.mutex+0xf/0x14
	 [<ffffffff800e6d55>] do_rmdir+0x77/0xde
	 [<ffffffff8005d229>] tracesys+0x71/0xe0
	 [<ffffffff8005d28d>] tracesys+0xd5/0xe0

which means that the system is deadlocked.

This patch allows autofs to hold up normal processes whilst the daemon goes
ahead and does things to the dentry tree behind the automouter point without
risking a deadlock as almost no locks are held in d_manage() and none in
d_automount().
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Was-Acked-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

cc53ce53

14 1月, 2011 2 次提交

N
nfs: add missing rcu-walk check · 657e94b6
由 Nick Piggin 提交于 1月 14, 2011
```
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
```
657e94b6

NFS: Fix NFSv3 exclusive open semantics · 8a0eebf6

由 Trond Myklebust 提交于 1月 13, 2011

Commit c0204fd2 (NFS: Clean up
nfs4_proc_create()) broke NFSv3 exclusive open by removing the code
that passes the O_EXCL flag down to nfs3_proc_create(). This patch
reverts that offending hunk from the original commit.
Reported-by: NNick Bowler <nbowler@elliptictech.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@kernel.org    [2.6.37]
Tested-by: NNick Bowler <nbowler@elliptictech.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8a0eebf6

13 1月, 2011 1 次提交
- A
  switch nfs to ->s_d_op · 8b244ff2
  由 Al Viro 提交于 12月 18, 2010
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
  8b244ff2
12 1月, 2011 1 次提交

NFS fix the setting of exchange id flag · 357f54d6

由 Andy Adamson 提交于 12月 14, 2010

Indicate support for referrals. Do not set any PNFS roles. Check the flags
returned by the server for validity. Do not use exchange flags from an old
client ID instance when recovering a client ID.

Update the EXCHID4_FLAG_XXX set to RFC 5661.
Signed-off-by: NAndy Adamson <andros@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

357f54d6

11 1月, 2011 1 次提交

NFS: Don't use vm_map_ram() in readdir · 6650239a

由 Trond Myklebust 提交于 1月 08, 2011

vm_map_ram() is not available on NOMMU platforms, and causes trouble
on incoherrent architectures such as ARM when we access the page data
through both the direct and the virtual mapping.

The alternative is to use the direct mapping to access page data
for the case when we are not crossing a page boundary, but to copy
the data into a linear scratch buffer when we are accessing data
that spans page boundaries.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Tested-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Cc: stable@kernel.org  [2.6.37]

6650239a

07 1月, 2011 27 次提交

fs: dcache per-inode inode alias locking · 873feea0

由 Nick Piggin 提交于 1月 07, 2011

dcache_inode_lock can be replaced with per-inode locking. Use existing
inode->i_lock for this. This is slightly non-trivial because we sometimes
need to find the inode from the dentry, which requires d_inode to be
stabilised (either with refcount or d_lock).
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

873feea0

N
fs: provide rcu-walk aware permission i_ops · b74c79e9
由 Nick Piggin 提交于 1月 07, 2011
```
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
```
b74c79e9

fs: rcu-walk aware d_revalidate method · 34286d66

由 Nick Piggin 提交于 1月 07, 2011

Require filesystems be aware of .d_revalidate being called in rcu-walk
mode (nd->flags & LOOKUP_RCU). For now do a simple push down, returning
-ECHILD from all implementations.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

34286d66

fs: dcache reduce branches in lookup path · fb045adb

由 Nick Piggin 提交于 1月 07, 2011

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fb045adb

fs: icache RCU free inodes · fa0d7e3d

由 Nick Piggin 提交于 1月 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fa0d7e3d

fs: dcache remove dcache_lock · b5c84bf6

由 Nick Piggin 提交于 1月 07, 2011

dcache_lock no longer protects anything. remove it.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b5c84bf6

fs: Use rename lock and RCU for multi-step operations · 949854d0

由 Nick Piggin 提交于 1月 07, 2011

The remaining usages for dcache_lock is to allow atomic, multi-step read-side
operations over the directory tree by excluding modifications to the tree.
Also, to walk in the leaf->root direction in the tree where we don't have
a natural d_lock ordering.

This could be accomplished by taking every d_lock, but this would mean a
huge number of locks and actually gets very tricky.

Solve this instead by using the rename seqlock for multi-step read-side
operations, retry in case of a rename so we don't walk up the wrong parent.
Concurrent dentry insertions are not serialised against. Concurrent deletes
are tricky when walking up the directory: our parent might have been deleted
when dropping locks so also need to check and retry for that.

We can also use the rename lock in cases where livelock is a worry (and it
is introduced in subsequent patch).
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

949854d0

fs: scale inode alias list · b23fb0a6

由 Nick Piggin 提交于 1月 07, 2011

Add a new lock, dcache_inode_lock, to protect the inode's i_dentry list
from concurrent modification. d_alias is also protected by d_lock.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b23fb0a6

fs: dcache scale dentry refcount · b7ab39f6

由 Nick Piggin 提交于 1月 07, 2011

Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

b7ab39f6

fs: change d_delete semantics · fe15ce44

由 Nick Piggin 提交于 1月 07, 2011

Change d_delete from a dentry deletion notification to a dentry caching
advise, more like ->drop_inode. Require it to be constant and idempotent,
and not take d_lock. This is how all existing filesystems use the callback
anyway.

This makes fine grained dentry locking of dput and dentry lru scanning
much simpler.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>

fe15ce44

NFSv4: Ensure continued open and lockowner name uniqueness · d035c36c

由 Trond Myklebust 提交于 12月 21, 2010

In order to enable migration support, we will want to move some of the
structures that are subject to migration into the struct nfs_server.
In particular, if we are to move the state_owner and state_owner_id to
being a per-filesystem structure, then we should label the resulting
open/lock owners with a per-filesytem label to ensure global uniqueness.

This patch does so by adding the super block s_dev to the open/lock owner
name.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

d035c36c

NFS: Move cl_delegations to the nfs_server struct · d3978bb3

由 Chuck Lever 提交于 12月 24, 2010

Delegations are per-inode, not per-nfs_client.  When a server file
system is migrated, delegations on the client must be moved from the
source to the destination nfs_server.  Make it easier to manage a
mount point's delegation list across a migration event by moving the
list to the nfs_server struct.

Clean up: I added documenting comments to public functions I changed
in this patch.  For consistency I added comments to all the other
public functions in fs/nfs/delegation.c.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

d3978bb3

NFS: Introduce nfs_detach_delegations() · dda4b225

由 Chuck Lever 提交于 12月 24, 2010

Clean up:  Refactor code that takes clp->cl_lock and calls
nfs_detach_delegations_locked() into its own function.

While we're changing the call sites, get rid of the second parameter
and the logic in nfs_detach_delegations_locked() that uses it, since
callers always set that parameter of nfs_detach_delegations_locked()
to NULL.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

dda4b225

NFS: Move cl_state_owners and related fields to the nfs_server struct · 24d292b8

由 Chuck Lever 提交于 12月 24, 2010

NFSv4 migration needs to reassociate state owners from the source to
the destination nfs_server data structures.  To make that easier, move
the cl_state_owners field to the nfs_server struct.  cl_openowner_id
and cl_lockowner_id accompany this move, as they are used in
conjunction with cl_state_owners.

The cl_lock field in the parent nfs_client continues to protect all
three of these fields.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

24d292b8

NFS: Allow walking nfs_client.cl_superblocks list outside client.c · fca5238e

由 Chuck Lever 提交于 12月 24, 2010

We're about to move some fields from struct nfs_client to struct
nfs_server.  There is a many-to-one relationship between nfs_servers
and nfs_clients.  After these fields are moved to the nfs_server
struct, to visit all of the data in these fields that is owned by one
nfs_client, code will need to visit each nfs_server on the
cl_superblocks list for that nfs_client.

To serialize changes to the cl_superblocks list during these little
expeditions, protect the list with RCU.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

fca5238e

pnfs: layout roc code · f7e8917a

由 Fred Isaman 提交于 1月 06, 2011

A layout can request return-on-close. How this interacts with the
forgetful model of never sending LAYOUTRETURNS is a bit ambiguous.
We forget any layouts marked roc, and wait for them to be completely
forgotten before continuing with the close. In addition, to compensate
for races with any inflight LAYOUTGETs, and the fact that we do not get
any layout stateid back from the server, we set the barrier to the worst
case scenario of current_seqid + number of outstanding LAYOUTGETS.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

f7e8917a

pnfs: update nfs4_callback_recallany to handle layouts · 36840370

由 Alexandros Batsakis 提交于 1月 06, 2011

While here, update the code a bit.
Signed-off-by: NAlexandros Batsakis <batsakis@netapp.com>
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

36840370

pnfs: add CB_LAYOUTRECALL handling · 43f1b3da

由 Fred Isaman 提交于 1月 06, 2011

This is the heart of the wave 2 submission.  Add the code to trigger
drain and forget of any afected layouts.  In addition, we set a
"barrier", below which any LAYOUTGET reply is ignored.  This is to
compensate for the fact that we do not wait for outstanding LAYOUTGETs
to complete as per section 12.5.5.2.1 of RFC 5661.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

43f1b3da

pnfs: CB_LAYOUTRECALL xdr code · f2a62561

由 Fred Isaman 提交于 1月 06, 2011

This is the xdr decoding for CB_LAYOUTRECALL.
Signed-off-by: NAlexandros Batsakis <batsakis@netapp.com>
Signed-off-by: NDean Hildebrand <dhildeb@us.ibm.com>
Signed-off-by: NMarc Eshel <eshel@almaden.ibm.com>
Signed-off-by: NAndy Adamson <andros@netapp.com>
Signed-off-by: NBenny Halevy <bhalevy@panasas.com>
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

f2a62561

pnfs: change lo refcounting to atomic_t · cc6e5340

由 Fred Isaman 提交于 1月 06, 2011

This will be required to allow us to grab reference outside of i_lock.
While we are at it, make put_layout_hdr take the same argument as all the
related functions.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

cc6e5340

pnfs: check that partial LAYOUTGET return is ignored · fc1794c5

由 Fred Isaman 提交于 1月 06, 2011

Either a bad server reply, or our ignoring of multiple array segments in
a reply, can cause a reply to not meet our requirements.  Ensure
that we ignore such replies.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

fc1794c5

pnfs: add layout to client list before sending rpc · 2130ff66

由 Fred Isaman 提交于 1月 06, 2011

Since this list will be used to search for layouts to recall,
this is necessary to avoid a race where the recall comes in,
sees there is nothing in the client list, and prepares to return
NOMATCHING, while the LAYOUTGET gets processed before the recall
updates the stateid.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

2130ff66

pnfs: serialize LAYOUTGET(openstateid) · cf7d63f1

由 Fred Isaman 提交于 1月 06, 2011

We shouldn't send a LAYOUTGET(openstateid) unless all outstanding RPCs
using the previous stateid are completed. This requires choosing the
stateid to encode earlier, so we can abort if one is not available (we
want to use the open stateid, but a LAYOUTGET is already out using
it), and adding a count of the number of outstanding rpc calls using
layout state (which for now consist solely of LAYOUTGETs).
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

cf7d63f1

pnfs: layoutget rpc code cleanup · c31663d4

由 Fred Isaman 提交于 1月 06, 2011

No functional changes, just some code minor code rearrangement and
comments.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

c31663d4

pnfs: change how lsegs are removed from layout list · 4541d16c

由 Fred Isaman 提交于 1月 06, 2011

This is to prepare the way for sensible io draining.  Instead of just
removing the lseg from the list, we instead clear the VALID flag
(preventing new io from grabbing references to the lseg) and remove
the reference holding it in the list.  Thus the lseg will be removed
once any io in progress completes and any references still held are
dropped.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

4541d16c

pnfs: change layout state seqlock to a spinlock · fd6002e9

由 Fred Isaman 提交于 1月 06, 2011

This prepares for future changes, where the layout state needs
to change atomically with several other variables. In particular,
it will need to know if lo->segs is empty, as we test that instead
of manipulating the NFS_LAYOUT_STATEID_SET bit. Moreover, the
layoutstateid is not really a read-mostly structure, as it is
written almost as often as it is read.

The behavior of pnfs_get_layout_stateid is also slightly changed, so that
it no longer changes the stateid. Its name is changed to +pnfs_choose_layoutget_stateid.
Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NBenny Halevy <bhalevy@panasas.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

fd6002e9

pnfs: add prefix to struct pnfs_layout_hdr fields · b7edfaa1

由 Fred Isaman 提交于 1月 06, 2011

Signed-off-by: NFred Isaman <iisaman@netapp.com>
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>

b7edfaa1

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功