提交 · 0c0e0d3c091ed5b823fb89e1d46dbb9abd5830a7 · openeuler / raspberrypi-kernel

09 9月, 2014 2 次提交

nfs: revert "nfs4: queue free_lock_state job submission to nfsiod" · 0c0e0d3c

由 Jeff Layton 提交于 9月 08, 2014

This reverts commit 49a4bda2.

Christoph reported an oops due to the above commit:

generic/089 242s ...[ 2187.041239] general protection fault: 0000 [#1]
SMP
[ 2187.042899] Modules linked in:
[ 2187.044000] CPU: 0 PID: 11913 Comm: kworker/0:1 Not tainted 3.16.0-rc6+ #1151
[ 2187.044287] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[ 2187.044287] Workqueue: nfsiod free_lock_state_work
[ 2187.044287] task: ffff880072b50cd0 ti: ffff88007a4ec000 task.ti: ffff88007a4ec000
[ 2187.044287] RIP: 0010:[<ffffffff81361ca6>]  [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
[ 2187.044287] RSP: 0018:ffff88007a4efd58  EFLAGS: 00010296
[ 2187.044287] RAX: 6b6b6b6b6b6b6b6b RBX: ffff88007a947ac0 RCX: 8000000000000000
[ 2187.044287] RDX: ffffffff826af9e0 RSI: ffff88007b093c00 RDI: ffff88007b093db8
[ 2187.044287] RBP: ffff88007a4efd58 R08: ffffffff832d3e10 R09: 000001c40efc0000
[ 2187.044287] R10: 0000000000000000 R11: 0000000000059e30 R12: ffff88007fc13240
[ 2187.044287] R13: ffff88007fc18b00 R14: ffff88007b093db8 R15: 0000000000000000
[ 2187.044287] FS:  0000000000000000(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
[ 2187.044287] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 2187.044287] CR2: 00007f93ec33fb80 CR3: 0000000079dc2000 CR4: 00000000000006f0
[ 2187.044287] Stack:
[ 2187.044287]  ffff88007a4efdd8 ffffffff810cc877 ffffffff810cc80d ffff88007fc13258
[ 2187.044287]  000000007a947af0 0000000000000000 ffffffff8353ccc8 ffffffff82b6f3d0
[ 2187.044287]  0000000000000000 ffffffff82267679 ffff88007a4efdd8 ffff88007fc13240
[ 2187.044287] Call Trace:
[ 2187.044287]  [<ffffffff810cc877>] process_one_work+0x1c7/0x490
[ 2187.044287]  [<ffffffff810cc80d>] ? process_one_work+0x15d/0x490
[ 2187.044287]  [<ffffffff810cd569>] worker_thread+0x119/0x4f0
[ 2187.044287]  [<ffffffff810fbbad>] ? trace_hardirqs_on+0xd/0x10
[ 2187.044287]  [<ffffffff810cd450>] ? init_pwq+0x190/0x190
[ 2187.044287]  [<ffffffff810d3c6f>] kthread+0xdf/0x100
[ 2187.044287]  [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
[ 2187.044287]  [<ffffffff81d9873c>] ret_from_fork+0x7c/0xb0
[ 2187.044287]  [<ffffffff810d3b90>] ? __init_kthread_worker+0x70/0x70
[ 2187.044287] Code: 0f 1f 44 00 00 31 c0 5d c3 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 8d b7 48 fe ff ff 48 8b 87 58 fe ff ff 48 89 e5 48 8b 40 30 <48> 8b 00 48 8b 10 48 89 c7 48 8b 92 90 03 00 00 ff 52 28 5d c3
[ 2187.044287] RIP  [<ffffffff81361ca6>] free_lock_state_work+0x16/0x30
[ 2187.044287]  RSP <ffff88007a4efd58>
[ 2187.103626] ---[ end trace 0f11326d28e5d8fa ]---

The original reason for this patch was because the fl_release_private
operation couldn't sleep. With commit ed9814d8 (locks: defer freeing
locks in locks_delete_lock until after i_lock has been dropped), this is
no longer a problem so we can revert this patch.
Reported-by: NChristoph Hellwig <hch@infradead.org>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Tested-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

0c0e0d3c

nfs: fix kernel warning when removing proc entry · 21e81002

由 Cong Wang 提交于 9月 08, 2014

I saw the following kernel warning:

[ 1852.321222] ------------[ cut here ]------------
[ 1852.326527] WARNING: CPU: 0 PID: 118 at fs/proc/generic.c:521 remove_proc_entry+0x154/0x16b()
[ 1852.335630] remove_proc_entry: removing non-empty directory 'fs/nfsfs', leaking at least 'volumes'
[ 1852.344084] CPU: 0 PID: 118 Comm: kworker/u8:2 Not tainted 3.16.0+ #540
[ 1852.350036] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 1852.354992] Workqueue: netns cleanup_net
[ 1852.358701]  0000000000000000 ffff880116f2fbd0 ffffffff819c03e9 ffff880116f2fc18
[ 1852.366474]  ffff880116f2fc08 ffffffff810744ee ffffffff811e0e6e ffff8800d4e96238
[ 1852.373507]  ffffffff81dbe665 ffff8800d46a5948 0000000000000005 ffff880116f2fc68
[ 1852.380224] Call Trace:
[ 1852.381976]  [<ffffffff819c03e9>] dump_stack+0x4d/0x66
[ 1852.385495]  [<ffffffff810744ee>] warn_slowpath_common+0x7a/0x93
[ 1852.389869]  [<ffffffff811e0e6e>] ? remove_proc_entry+0x154/0x16b
[ 1852.393987]  [<ffffffff8107457b>] warn_slowpath_fmt+0x4c/0x4e
[ 1852.397999]  [<ffffffff811e0e6e>] remove_proc_entry+0x154/0x16b
[ 1852.402034]  [<ffffffff8129c73d>] nfs_fs_proc_net_exit+0x53/0x56
[ 1852.406136]  [<ffffffff812a103b>] nfs_net_exit+0x12/0x1d
[ 1852.409774]  [<ffffffff81785bc9>] ops_exit_list+0x44/0x55
[ 1852.413529]  [<ffffffff81786389>] cleanup_net+0xee/0x182
[ 1852.417198]  [<ffffffff81088c9e>] process_one_work+0x209/0x40d
[ 1852.502320]  [<ffffffff81088bf7>] ? process_one_work+0x162/0x40d
[ 1852.587629]  [<ffffffff810890c1>] worker_thread+0x1f0/0x2c7
[ 1852.673291]  [<ffffffff81088ed1>] ? process_scheduled_works+0x2f/0x2f
[ 1852.759470]  [<ffffffff8108e079>] kthread+0xc9/0xd1
[ 1852.843099]  [<ffffffff8109427f>] ? finish_task_switch+0x3a/0xce
[ 1852.926518]  [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61
[ 1853.008565]  [<ffffffff819cbeac>] ret_from_fork+0x7c/0xb0
[ 1853.076477]  [<ffffffff8108dfb0>] ? __kthread_parkme+0x61/0x61
[ 1853.140653] ---[ end trace 69c4c6617f78e32d ]---

It looks wrong that we add "/proc/net/nfsfs" in nfs_fs_proc_net_init()
while remove "/proc/fs/nfsfs" in nfs_fs_proc_net_exit().

Fixes: commit 65b38851 (NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes)
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Dan Aloni <dan@kernelim.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
[Trond: replace uses of remove_proc_entry() with remove_proc_subtree()
as suggested by Al Viro]
Cc: stable@vger.kernel.org # 3.4.x : 65b38851: NFS: Fix /proc/fs/nfsfs/servers
Cc: stable@vger.kernel.org # 3.4.x
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

21e81002

27 8月, 2014 3 次提交

NFSv3: Fix another acl regression · f87d928f

由 Trond Myklebust 提交于 8月 24, 2014

When creating a new object on the NFS server, we should not be sending
posix setacl requests unless the preceding posix_acl_create returned a
non-trivial acl. Doing so, causes Solaris servers in particular to
return an EINVAL.

Fixes: 013cdf10 (nfs: use generic posix ACL infrastructure,,,)
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1132786
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

f87d928f

NFSv4: Don't clear the open state when we just did an OPEN_DOWNGRADE · 412f6c4c

由 Trond Myklebust 提交于 8月 25, 2014

If we did an OPEN_DOWNGRADE, then the right thing to do on success, is
to apply the new open mode to the struct nfs4_state. Instead, we were
unconditionally clearing the state, making it appear to our state
machinery as if we had just performed a CLOSE.

Fixes: 226056c5 (NFSv4: Use correct locking when updating nfs4_state...)
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

412f6c4c

NFSv4: Fix problems with close in the presence of a delegation · aee7af35

由 Trond Myklebust 提交于 8月 25, 2014

In the presence of delegations, we can no longer assume that the
state->n_rdwr, state->n_rdonly, state->n_wronly reflect the open
stateid share mode, and so we need to calculate the initial value
for calldata->arg.fmode using the state->flags.
Reported-by: NJames Drews <drews@engr.wisc.edu>
Fixes: 88069f77 (NFSv41: Fix a potential state leakage when...)
Cc: stable@vger.kernel.org # 2.6.33+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

aee7af35

23 8月, 2014 8 次提交

nfs: Don't busy-wait on SIGKILL in __nfs_iocounter_wait · 92a56555

由 David Jeffery 提交于 8月 05, 2014

If a SIGKILL is sent to a task waiting in __nfs_iocounter_wait,
it will busy-wait or soft lockup in its while loop.
nfs_wait_bit_killable won't sleep, and the loop won't exit on
the error return.

Stop the busy-wait by breaking out of the loop when
nfs_wait_bit_killable returns an error.
Signed-off-by: NDavid Jeffery <djeffery@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

92a56555

nfs: can_coalesce_requests must enforce contiguity · 78270e8f

由 Weston Andros Adamson 提交于 8月 14, 2014

Commit 6094f838
"nfs: allow coalescing of subpage requests" got rid of the requirement
that requests cover whole pages, but it made some incorrect assumptions.

It turns out that callers of this interface can map adjacent requests
(by file position as seen by req_offset + req->wb_bytes) to different pages,
even when they could share a page. An example is the direct I/O interface -
iov_iter_get_pages_alloc may return one segment with a partial page filled
and the next segment (which is adjacent in the file position) starts with a
new page.
Reported-by: NToralf Förster <toralf.foerster@gmx.de>
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

78270e8f

nfs: disallow duplicate pages in pgio page vectors · bba5c188

由 Weston Andros Adamson 提交于 8月 14, 2014

Adjacent requests that share the same page are allowed, but should only
use one entry in the page vector. This avoids overruning the page
vector - it is sized based on how many bytes there are, not by
request count.

This fixes issues that manifest as "Redzone overwritten" bugs (the
vector overrun) and hangs waiting on page read / write, as it waits on
the same page more than once.

This also adds bounds checking to the page vector with a graceful failure
(WARN_ON_ONCE and pgio error returned to application).
Reported-by: NToralf Förster <toralf.foerster@gmx.de>
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

bba5c188

nfs: don't sleep with inode lock in lock_and_join_requests · 7c3af975

由 Weston Andros Adamson 提交于 8月 08, 2014

This handles the 'nonblock=false' case in nfs_lock_and_join_requests.
If the group is already locked and blocking is allowed, drop the inode lock
and wait for the group lock to be cleared before trying it all again.
This should fix warnings found in peterz's tree (sched/wait branch), where
might_sleep() checks are added to wait.[ch].
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Reviewed-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

7c3af975

nfs: fix error handling in lock_and_join_requests · 94970014

由 Weston Andros Adamson 提交于 8月 08, 2014

This fixes handling of errors from nfs_page_group_lock in
nfs_lock_and_join_requests.  It now releases the inode lock and the
reference to the head request.
Reported-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Reviewed-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

94970014

nfs: use blocking page_group_lock in add_request · bfd484a5

由 Weston Andros Adamson 提交于 8月 08, 2014

__nfs_pageio_add_request was calling nfs_page_group_lock nonblocking, but
this can return -EAGAIN which would end up passing -EIO to the application.

There is no reason not to block in this path, so change the two calls to
do so. Also, there is no need to check the return value of
nfs_page_group_lock when nonblock=false, so remove the error handling code.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Reviewed-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

bfd484a5

nfs: fix nonblocking calls to nfs_page_group_lock · bc8a309e

由 Weston Andros Adamson 提交于 8月 08, 2014

nfs_page_group_lock was calling wait_on_bit_lock even when told not to
block. Fix by first trying test_and_set_bit, followed by wait_on_bit_lock
if and only if blocking is allowed. Return -EAGAIN if nonblocking and the
test_and_set of the bit was already locked.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Reviewed-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

bc8a309e

nfs: change nfs_page_group_lock argument · fd2f3a06

由 Weston Andros Adamson 提交于 8月 08, 2014

Flip the meaning of the second argument from 'wait' to 'nonblock' to
match related functions. Update all five calls to reflect this change.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Reviewed-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

fd2f3a06

08 8月, 2014 1 次提交

dcache: d_obtain_alias callers don't all want DISCONNECTED · 1a0a397e

由 J. Bruce Fields 提交于 2月 14, 2014

There are a few d_obtain_alias callers that are using it to get the
root of a filesystem which may already have an alias somewhere else.

This is not the same as the filehandle-lookup case, and none of them
actually need DCACHE_DISCONNECTED set.

It isn't really a serious problem, but it would really be clearer if we
reserved DCACHE_DISCONNECTED for those cases where it's actually needed.

In the btrfs case this was causing a spurious printk from
nfsd/nfsfh.c:fh_verify when it found an unexpected DCACHE_DISCONNECTED
dentry.  Josef worked around this by unsetting DCACHE_DISCONNECTED
manually in 3a0dfa6a "Btrfs: unset DCACHE_DISCONNECTED when mounting
default subvol", and this replaces that workaround.

Cc: Josef Bacik <jbacik@fb.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

1a0a397e

05 8月, 2014 3 次提交

nfs: reject changes to resvport and sharecache during remount · 71a6ec8a

由 Scott Mayhew 提交于 8月 04, 2014

Commit c8e47028 made it possible to change resvport/noresvport and
sharecache/nosharecache via a remount operation, neither of which should be
allowed.
Signed-off-by: NScott Mayhew <smayhew@redhat.com>
Fixes: c8e47028 (nfs: Apply NFS_MOUNT_CMP_FLAGMASK to nfs_compare_remount_data)
Cc: stable@vger.kernel.org # 3.16+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

71a6ec8a

NFS: Avoid infinite loop when RELEASE_LOCKOWNER getting expired error · 5b53dc88

由 Kinglong Mee 提交于 8月 04, 2014

Fix Commit 60ea6812 (NFS: Migration support for RELEASE_LOCKOWNER)
If getting expired error, client will enter a infinite loop as,

client                            server
   RELEASE_LOCKOWNER(old clid) ----->
                <--- expired error
   RENEW(old clid)             ----->
                <--- expired error
   SETCLIENTID                 ----->
                <--- a new clid
   SETCLIENTID_CONFIRM (new clid) -->
                <--- ok
   RELEASE_LOCKOWNER(old clid) ----->
                <--- expired error
   RENEW(new clid)             ----->
                <-- ok
   RELEASE_LOCKOWNER(old clid) ----->
                <--- expired error
   RENEW(new clid)             ----->
                <-- ok
                ... ...
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
[Trond: replace call to nfs4_async_handle_error() with
 nfs4_schedule_lease_recovery()]
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

5b53dc88

NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes · 65b38851

由 Eric W. Biederman 提交于 7月 31, 2014

The usage of pid_ns->child_reaper->nsproxy->net_ns in
nfs_server_list_open and nfs_client_list_open is not safe.

/proc for a pid namespace can remain mounted after the all of the
process in that pid namespace have exited.  There are also times
before the initial process in a pid namespace has started or after the
initial process in a pid namespace has exited where
pid_ns->child_reaper can be NULL or stale.  Making the idiom
pid_ns->child_reaper->nsproxy a double whammy of problems.

Luckily all that needs to happen is to move /proc/fs/nfsfs/servers and
/proc/fs/nfsfs/volumes under /proc/net to /proc/net/nfsfs/servers and
/proc/net/nfsfs/volumes and add a symlink from the original location,
and to use seq_open_net as it has been designed.

Cc: stable@vger.kernel.org
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

65b38851

04 8月, 2014 16 次提交

NFS: fix two problems in lookup_revalidate in RCU-walk · 50d77739

由 NeilBrown 提交于 8月 04, 2014

1/ rcu_dereference isn't correct: that field isn't
   RCU protected.   It could potentially change at any time
   so ACCESS_ONCE might be justified.

   changes to ->d_parent are protected by ->d_seq.  However
   that isn't always checked after ->d_revalidate is called,
   so it is safest to keep the double-check that ->d_parent
   hasn't changed at the end of these functions.

2/ in nfs4_lookup_revalidate, "->d_parent" was forgotten.
   So 'parent' was not the parent of 'dentry'.
   This fails safe is the context is that dentry->d_inode is
   NULL, and the result of parent->d_inode being NULL is
   that ECHILD is returned, which is always safe.
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

50d77739

NFS: allow lockless access to access_cache · f682a398

由 NeilBrown 提交于 7月 14, 2014

The access cache is used during RCU-walk path lookups, so it is best
to avoid locking if possible as taking a lock kills concurrency.

The rbtree is not rcu-safe and cannot easily be made so.
Instead we simply check the last (i.e. most recent) entry on the LRU
list.  If this doesn't match, then we return -ECHILD and retry in
lock/refcount mode.

This requires freeing the nfs_access_entry struct with rcu, and
requires using rcu access primatives when adding entries to the lru, and
when examining the last entry.

Calling put_rpccred before kfree_rcu looks a bit odd, but as
put_rpccred already provides rcu protection, we know that the cred will
not actually be freed until the next grace period, so any concurrent
access will be safe.

This patch provides about 5% performance improvement on a stat-heavy
synthetic work load with 4 threads on a 2-core CPU.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

f682a398

NFS: teach nfs_lookup_verify_inode to handle LOOKUP_RCU · 1fa1e384

由 NeilBrown 提交于 7月 14, 2014

It fails with -ECHILD rather than make an RPC call.

This allows nfs_lookup_revalidate to call it in RCU-walk mode.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

1fa1e384

NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU · 912a108d

由 NeilBrown 提交于 7月 14, 2014

This requires nfs_check_verifier to take an rcu_walk flag, and requires
an rcu version of nfs_revalidate_inode which returns -ECHILD rather
than making an RPC call.

With this, nfs_lookup_revalidate can call nfs_neg_need_reval in
RCU-walk mode.

We can also move the LOOKUP_RCU check past the nfs_check_verifier()
call in nfs_lookup_revalidate.

If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from
doing a full check, they return a status indicating that a revalidation
is required.  As this revalidation will not be possible in RCU_WALK
mode, -ECHILD will ultimately be returned, which is the desired result.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

912a108d

NFS: support RCU_WALK in nfs_permission() · f3324a2a

由 NeilBrown 提交于 7月 14, 2014

nfs_permission makes two calls which are not always safe in RCU_WALK,
rpc_lookup_cred and nfs_do_access.

The second can easily be made rcu-safe by aborting with -ECHILD before
making the RPC call.

The former can be made rcu-safe by calling rpc_lookup_cred_nonblock()
instead.
As this will almost always succeed, we use it even when RCU_WALK
isn't being used as it still saves some spinlocks in a common case.
We only fall back to rpc_lookup_cred() if rpc_lookup_cred_nonblock()
fails and MAY_NOT_BLOCK isn't set.

This optimisation (always trying rpc_lookup_cred_nonblock()) is
particularly important when a security module is active.
In that case inode_permission() may return -ECHILD from
security_inode_permission() even though ->permission() succeeded in
RCU_WALK mode.
This leads to may_lookup() retrying inode_permission after performing
unlazy_walk().  The spinlock that rpc_lookup_cred() takes is often
more expensive than anything security_inode_permission() does, so that
spinlock becomes the main bottleneck.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

f3324a2a

NFS: prepare for RCU-walk support but pushing tests later in code. · d51ac1a8

由 NeilBrown 提交于 7月 14, 2014

nfs_lookup_revalidate, nfs4_lookup_revalidate, and nfs_permission
all need to understand and handle RCU-walk for NFS to gain the
benefits of RCU-walk for cached information.

Currently these functions all immediately return -ECHILD
if the relevant flag (LOOKUP_RCU or MAY_NOT_BLOCK) is set.

This patch pushes those tests later in the code so that we only abort
immediately before we enter rcu-unsafe code.  As subsequent patches
make that rcu-unsafe code rcu-safe, several of these new tests will
disappear.

With this patch there are several paths through the code which will no
longer return -ECHILD during an RCU-walk.  However these are mostly
error paths or other uninteresting cases.

A noteworthy change in nfs_lookup_revalidate is that we don't take
(or put) the reference to ->d_parent when LOOKUP_RCU is set.
Rather we rcu_dereference ->d_parent, and check that ->d_inode
is not NULL.  We also check that ->d_parent hasn't changed after
all the tests.

In nfs4_lookup_revalidate we simply avoid testing LOOKUP_RCU on the
path that only calls nfs_lookup_revalidate() as that function
already performs the required test.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d51ac1a8

NFS: nfs4_lookup_revalidate: only evaluate parent if it will be used. · 49317a7f

由 NeilBrown 提交于 7月 14, 2014

nfs4_lookup_revalidate only uses 'parent' to get 'dir', and only
uses 'dir' if 'inode == NULL'.

So we don't need to find out what 'parent' or 'dir' is until we
know that 'inode' is NULL.

By moving 'dget_parent' inside the 'if', we can reduce the number of
call sites for 'dput(parent)'.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

49317a7f

NFS: add checks for returned value of try_module_get() · 1f70ef96

由 Alexey Khoroshilov 提交于 7月 18, 2014

There is a couple of places in client code where returned value
of try_module_get() is ignored. As a result there is a small chance
to premature unload module because of unbalanced refcounting.

The patch adds error handling in that places.

Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

1f70ef96

nfs: clear_request_commit while holding i_lock · 411a99ad

由 Weston Andros Adamson 提交于 7月 17, 2014

Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

411a99ad

pnfs: add pnfs_put_lseg_async · e6cf82d1

由 Weston Andros Adamson 提交于 7月 17, 2014

This is useful when lsegs need to be released while holding locks.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e6cf82d1

pnfs: find swapped pages on pnfs commit lists too · 02d1426c

由 Weston Andros Adamson 提交于 7月 17, 2014

nfs_page_find_head_request_locked looks through the regular nfs commit lists
when the page is swapped out, but doesn't look through the pnfs commit lists.

I'm not sure if anyone has hit any issues caused by this.
Suggested-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

02d1426c

nfs: fix comment and add warn_on for PG_INODE_REF · b412ddf0

由 Weston Andros Adamson 提交于 7月 17, 2014

Fix the comment in nfs_page.h for PG_INODE_REF to reflect that it's no longer
set only on head requests. Also add a WARN_ON_ONCE in nfs_inode_remove_request
as PG_INODE_REF should always be set.
Suggested-by: NPeng Tao <tao.peng@primarydata.com>
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

b412ddf0

nfs: check wait_on_bit_lock err in page_group_lock · e7029206

由 Weston Andros Adamson 提交于 7月 17, 2014

Return errors from wait_on_bit_lock from nfs_page_group_lock.

Add a bool argument @wait to nfs_page_group_lock. If true, loop over
wait_on_bit_lock until it returns cleanly. If false, return the error
from wait_on_bit_lock.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e7029206

NFS: nfs4_do_open should add negative results to the dcache. · 4fa2c54b

由 NeilBrown 提交于 7月 21, 2014

If you have an NFSv4 mounted directory which does not container 'foo'
and:

  ls -l foo
  ssh $server touch foo
  cat foo

then the 'cat' will fail (usually, depending a bit on the various
cache ages).  This is correct as negative looks are cached by default.
However with the same initial conditions:

  cat foo
  ssh $server touch foo
  cat foo

will usually succeed.  This is because an "open" does not add a
negative dentry to the dcache, while a "lookup" does.

This can have negative performance effects.  When "gcc" searches for
an include file, it will try to "open" the file in every director in
the search path.  Without caching of negative "open" results, this
generates much more traffic to the server than it should (or than
NFSv3 does).

The root of the problem is that _nfs4_open_and_get_state() will call
d_add_unique() on a positive result, but not on a negative result.
Compare with nfs_lookup() which calls d_materialise_unique on both
a positive result and on ENOENT.

This patch adds a call d_add() in the ENOENT case for
_nfs4_open_and_get_state() and also calls nfs_set_verifier().

With it, many fewer "open" requests for known-non-existent files are
sent to the server.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

4fa2c54b

nfs3_list_one_acl(): check get_acl() result with IS_ERR_OR_NULL · 7a9e75a1

由 Andrey Utkin 提交于 7月 26, 2014

There was a check for result being not NULL. But get_acl() may return
NULL, or ERR_PTR, or actual pointer.
The purpose of the function where current change is done is to "list
ACLs only when they are available", so any error condition of get_acl()
mustn't be elevated, and returning 0 there is still valid.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=81111Signed-off-by: NAndrey Utkin <andrey.krieger.utkin@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Fixes: 74adf83f (nfs: only show Posix ACLs in listxattr if actually...)
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

7a9e75a1

NFS: Enforce an upper limit on the number of cached access call · 3a505845

由 Trond Myklebust 提交于 7月 21, 2014

This may be used to limit the number of cached credentials building up
inside the access cache.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3a505845

23 7月, 2014 1 次提交

KEYS: user: Use key preparsing · f9167789

由 David Howells 提交于 7月 18, 2014

Make use of key preparsing in user-defined and logon keys so that quota size
determination can take place prior to keyring locking when a key is being
added.

Also the idmapper key types need to change to match as they use the
user-defined key type routines.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NSteve Dickson <steved@redhat.com>
Acked-by: NJeff Layton <jlayton@primarydata.com>

f9167789

18 7月, 2014 1 次提交

KEYS: Allow special keys (eg. DNS results) to be invalidated by CAP_SYS_ADMIN · 0c7774ab

由 David Howells 提交于 7月 17, 2014

Special kernel keys, such as those used to hold DNS results for AFS, CIFS and
NFS and those used to hold idmapper results for NFS, used to be
'invalidateable' with key_revoke().  However, since the default permissions for
keys were reduced:

	Commit: 96b5c8fe
	KEYS: Reduce initial permissions on keys

it has become impossible to do this.

Add a key flag (KEY_FLAG_ROOT_CAN_INVAL) that will permit a key to be
invalidated by root.  This should not be used for system keyrings as the
garbage collector will try and remove any invalidate key.  For system keyrings,
KEY_FLAG_ROOT_CAN_CLEAR can be used instead.

After this, from userspace, keyctl_invalidate() and "keyctl invalidate" can be
used by any possessor of CAP_SYS_ADMIN (typically root) to invalidate DNS and
idmapper keys.  Invalidated keys are immediately garbage collected and will be
immediately rerequested if needed again.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Tested-by: NSteve Dickson <steved@redhat.com>

0c7774ab

16 7月, 2014 2 次提交

sched: Allow wait_on_bit_action() functions to support a timeout · c1221321

由 NeilBrown 提交于 7月 07, 2014

It is currently not possible for various wait_on_bit functions
to implement a timeout.

While the "action" function that is called to do the waiting
could certainly use schedule_timeout(), there is no way to carry
forward the remaining timeout after a false wake-up.
As false-wakeups a clearly possible at least due to possible
hash collisions in bit_waitqueue(), this is a real problem.

The 'action' function is currently passed a pointer to the word
containing the bit being waited on.  No current action functions
use this pointer.  So changing it to something else will be a
little noisy but will have no immediate effect.

This patch changes the 'action' function to take a pointer to
the "struct wait_bit_key", which contains a pointer to the word
containing the bit so nothing is really lost.

It also adds a 'private' field to "struct wait_bit_key", which
is initialized to zero.

An action function can now implement a timeout with something
like

static int timed_out_waiter(struct wait_bit_key *key)
{
	unsigned long waited;
	if (key->private == 0) {
		key->private = jiffies;
		if (key->private == 0)
			key->private -= 1;
	}
	waited = jiffies - key->private;
	if (waited > 10 * HZ)
		return -EAGAIN;
	schedule_timeout(waited - 10 * HZ);
	return 0;
}

If any other need for context in a waiter were found it would be
easy to use ->private for some other purpose, or even extend
"struct wait_bit_key".

My particular need is to support timeouts in nfs_release_page()
to avoid deadlocks with loopback mounted NFS.

While wait_on_bit_timeout() would be a cleaner interface, it
will not meet my need.  I need the timeout to be sensitive to
the state of the connection with the server, which could change.
 So I need to use an 'action' interface.
Signed-off-by: NNeilBrown <neilb@suse.de>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>

c1221321

sched: Remove proliferation of wait_on_bit() action functions · 74316201

由 NeilBrown 提交于 7月 07, 2014

The current "wait_on_bit" interface requires an 'action'
function to be provided which does the actual waiting.
There are over 20 such functions, many of them identical.
Most cases can be satisfied by one of just two functions, one
which uses io_schedule() and one which just uses schedule().

So:
 Rename wait_on_bit and        wait_on_bit_lock to
        wait_on_bit_action and wait_on_bit_lock_action
 to make it explicit that they need an action function.

 Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io
 which are *not* given an action function but implicitly use
 a standard one.
 The decision to error-out if a signal is pending is now made
 based on the 'mode' argument rather than being encoded in the action
 function.

 All instances of the old wait_on_bit and wait_on_bit_lock which
 can use the new version have been changed accordingly and their
 action functions have been discarded.
 wait_on_bit{_lock} does not return any specific error code in the
 event of a signal so the caller must check for non-zero and
 interpolate their own error code as appropriate.

The wait_on_bit() call in __fscache_wait_on_invalidate() was
ambiguous as it specified TASK_UNINTERRUPTIBLE but used
fscache_wait_bit_interruptible as an action function.
David Howells confirms this should be uniformly
"uninterruptible"

The main remaining user of wait_on_bit{,_lock}_action is NFS
which needs to use a freezer-aware schedule() call.

A comment in fs/gfs2/glock.c notes that having multiple 'action'
functions is useful as they display differently in the 'wchan'
field of 'ps'. (and /proc/$PID/wchan).
As the new bit_wait{,_io} functions are tagged "__sched", they
will not show up at all, but something higher in the stack.  So
the distinction will still be visible, only with different
function names (gds2_glock_wait versus gfs2_glock_dq_wait in the
gfs2/glock.c case).

Since first version of this patch (against 3.15) two new action
functions appeared, on in NFS and one in CIFS.  CIFS also now
uses an action function that makes the same freezer aware
schedule call as NFS.
Signed-off-by: NNeilBrown <neilb@suse.de>
Acked-by: David Howells <dhowells@redhat.com> (fscache, keys)
Acked-by: Steven Whitehouse <swhiteho@redhat.com> (gfs2)
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Steve French <sfrench@samba.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brownSigned-off-by: NIngo Molnar <mingo@kernel.org>

74316201

14 7月, 2014 1 次提交

NFS: Don't reset pg_moreio in __nfs_pageio_add_request · f563b89b

由 Trond Myklebust 提交于 7月 13, 2014

Once we've started sending unstable NFS writes, we do not want to
clear pg_moreio, or we may end up sending the very last request as
a stable write if the commit lists are still empty.

Do, however, reset pg_moreio in the case where we end up having to
recoalesce the write if an attempt to use pNFS failed.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

f563b89b

13 7月, 2014 2 次提交

NFS: use ARRAY_SIZE instead of sizeof/sizeof[0] · 00216026

由 Fabian Frederick 提交于 6月 30, 2014

Use macro definition

Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: linux-nfs@vger.kernel.org
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

00216026

NFSv4: Drop cast · 8ee2b78a

由 Himangi Saraogi 提交于 6月 27, 2014

This patch does away with the cast on void * as it is unnecessary.

The following Coccinelle semantic patch was used for making the change:

@r@
expression x;
void* e;
type T;
identifier f;
@@

(
  *((T *)e)
|
  ((T *)x)[...]
|
  ((T *)x)->f
|
- (T *)
  e
)
Signed-off-by: NHimangi Saraogi <himangi774@gmail.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

8ee2b78a