Commits · b6808145ad2aa625b962fc55f30484091d5e8fe7 · openanolis / cloud-kernel

20 Dec, 2016 13 commits

NFSv4: Add encode/decode of the layoutreturn op in OPEN_DOWNGRADE · b6808145

Committed by Trond Myklebust on Nov 20, 2016

While we do not need to return the RW layout when downgrading from a
read/write open state to read-only, we might want to do so in order
to reduce the burden on the metadata server so that it does not need
to check for changed data when responding to GETATTR requests.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

b6808145

NFS: Don't disconnect open-owner on NFS4ERR_BAD_SEQID · 86cfb041

Committed by NeilBrown on Dec 19, 2016

When an NFS4ERR_BAD_SEQID is received the open-owner is removed from
the ->state_owners rbtree so that it will no longer be used.

If any stateids attached to this open-owner are still in use, and if a
request using one gets an NFS4ERR_BAD_STATEID reply, this can go bad.

The state is marked as needing recovery and the nfs4_state_manager()
is scheduled to clean up. nfs4_state_manager() finds states to be
recovered by walking the state_owners rbtree. As the open-owner is
not in the rbtree, the bad state is not found so nfs4_state_manager()
completes having done nothing. The request is then retried, with a
predictable result (indefinite retries).

If the stateid is for a delegation, this open_owner will be used
to open files when the delegation is returned. For that to work,
a new open-owner needs to be presented to the server.

This patch changes NFS4ERR_BAD_SEQID handling to leave the open-owner
in the rbtree but updates the 'create_time' so it looks like a new
open-owner. With this the indefinite retries no longer happen.
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

86cfb041

NFSv4: ensure __nfs4_find_lock_state returns consistent result. · 3f8f2548

Committed by NeilBrown on Dec 19, 2016

If a file has both flock locks and OFD locks, then it is possible that
two different nfs4 lock states could apply to file accesses from a
single process.

It is not possible to know, efficiently, which one is "correct".
Presumably the state which represents a lock that covers the region
undergoing IO would be the "correct" one to use, but finding that has
a non-trivial cost and would provide minuscule value.

Currently we just return whichever is first in the list, which could
result in inconsistent behaviour if an application ever put itself in
this position.  As consistent behaviour is preferable (when perfectly
correct behaviour is not available), change the search to return a
consistent result in this circumstance.
Specifically: if there is both a flock and OFD lock state, always return
the flock one.
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3f8f2548
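
The selection rule described in 3f8f2548 can be sketched roughly as follows. This is an illustrative model only, not the kernel code: `struct lock_state`, `enum owner_kind`, and `find_lock_state()` are hypothetical names, and the real `__nfs4_find_lock_state()` matches on owner pointers while walking a kernel list.

```c
#include <stddef.h>

/* Hypothetical model: each lock state belongs either to a flock
 * owner or to an OFD owner. */
enum owner_kind { OWNER_FLOCK, OWNER_OFD };

struct lock_state {
    enum owner_kind kind;
    int id;                     /* illustrative identifier */
    struct lock_state *next;
};

/* Return the flock state if one exists; otherwise the first OFD state.
 * The result does not depend on the order of the list. */
struct lock_state *find_lock_state(struct lock_state *head)
{
    struct lock_state *fallback = NULL;

    for (struct lock_state *s = head; s != NULL; s = s->next) {
        if (s->kind == OWNER_FLOCK)
            return s;           /* flock always wins */
        if (fallback == NULL)
            fallback = s;       /* remember the first OFD state */
    }
    return fallback;
}
```

With this shape, a list ordered OFD then flock and a list ordered flock then OFD both yield the flock state, which is the consistency the commit message asks for.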

NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. · cfd278c2

Committed by NeilBrown on Dec 19, 2016

Various places assume that if nfs4_fl_prepare_ds() returns a non-NULL 'ds',
then ds->ds_clp will also be non-NULL.

This is not necessarily true in the case when the process received a fatal signal
while nfs4_pnfs_ds_connect is waiting in nfs4_wait_ds_connect().
In that case ->ds_clp may not be set, and the devid may not recently have been marked
unavailable.

So add a test for ds_clp == NULL and return NULL in that case.

Fixes: c23266d5 ("NFS4.1 Fix data server connection race")
Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: Olga Kornievskaia <aglo@umich.edu>
Acked-by: Adamson, Andy <William.Adamson@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cfd278c2

pNFS/flexfiles: delete deviceid, don't mark inactive · 1c48cee8

Committed by Weston Andros Adamson on Dec 14, 2016

Instead of marking a device inactive, remove it from the cache entirely.

Flexfiles has a way to report errors back to the server, so we don't want
to stop devices from being tried again for 120 seconds.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1c48cee8

NFS: Clean up nfs_attribute_timeout() · 187e593d

Committed by Trond Myklebust on Dec 16, 2016

It can be made static.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

187e593d

NFS: Remove unused function nfs_revalidate_inode_rcu() · 3f642a13

Committed by Trond Myklebust on Dec 16, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

3f642a13

NFS: Fix and clean up the access cache validity checking · 21c3ba7e

Committed by Trond Myklebust on Dec 16, 2016

The access cache needs to check whether or not the mode bits, ownership,
or ACL has changed or the cache has timed out.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

21c3ba7e

NFS: Only look at the change attribute cache state in nfs_weak_revalidate() · 9cdd1d3f

Committed by Trond Myklebust on Dec 16, 2016

Just like in nfs_check_verifier(), we want to use
nfs_mapping_need_revalidate_inode() to check that our knowledge of the
change attribute is up to date.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9cdd1d3f

NFS: Clean up cache validity checking · 61540bf6

Committed by Trond Myklebust on Dec 08, 2016

Consolidate the open-coded checking of NFS_I(inode)->cache_validity
into a couple of helper functions.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

61540bf6

NFS: Don't revalidate the file on close if we hold a delegation · 58ff4184

Committed by Trond Myklebust on Dec 16, 2016

If we're holding a delegation, we can skip sending the close-to-open
GETATTR until we're returning that delegation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

58ff4184

NFSv4: Don't discard the attributes returned by asynchronous DELEGRETURN · 0bc2c9b4

Committed by Trond Myklebust on Dec 16, 2016

DELEGRETURN will always carry a reference to the inode except when
the latter is being freed, so let's ensure that we always use that
inode information to ensure close-to-open cache consistency, even
when the DELEGRETURN call is asynchronous.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

0bc2c9b4

NFSv4: Update the attribute cache info in update_changeattr · e603a4c1

Committed by Trond Myklebust on Dec 16, 2016

If we successfully updated the change attribute, we should timestamp the
cache. While we do know that the other attributes are not completely up
to date, we have the NFS_INO_INVALID_ATTR flag that lets us know that,
so it is valid to say that the cache has not timed out.
We can also clear NFS_INO_REVAL_PAGECACHE, since our change attribute
is now known to be valid.

Conversely, if the change attribute did not match, we should make sure to
also revalidate the access and ACL caches.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

e603a4c1

10 Dec, 2016 5 commits

Merge tag 'nfs-rdma-4.10-1' of git://git.linux-nfs.org/projects/anna/nfs-rdma · 2549f307

Committed by Trond Myklebust on Dec 10, 2016

NFS: NFSoRDMA Client Side Changes

New Features:
- Support for SG_GAP devices

Bugfixes and cleanups:
- Cap size of callback buffer resources
- Improve send queue and RPC metric accounting
- Fix coverity warning
- Avoid calls to ro_unmap_safe()
- Refactor FRMR invalidation
- Error message improvements

2549f307

SUNRPC: fix refcounting problems with auth_gss messages. · 1cded9d2

Committed by NeilBrown on Dec 05, 2016

There are two problems with refcounting of auth_gss messages.

First, the reference on the pipe->pipe list (taken by a call
to rpc_queue_upcall()) is not counted.  It seems to be
assumed that a message in pipe->pipe will always also be in
pipe->in_downcall, where it is correctly reference counted.

However there is no guarantee of this.  I have a report of a
NULL dereference in rpc_pipe_read() which suggests a msg
that has been freed is still on the pipe->pipe list.

One way I imagine this might happen is:
- message is queued for uid=U and auth->service=S1
- rpc.gssd reads this message and starts processing.
  This removes the message from pipe->pipe
- message is queued for uid=U and auth->service=S2
- rpc.gssd replies to the first message. gss_pipe_downcall()
  calls __gss_find_upcall(pipe, U, NULL) and it finds the
  *second* message, as new messages are placed at the head
  of ->in_downcall, and the service type is not checked.
- This second message is removed from ->in_downcall and freed
  by gss_release_msg() (even though it is still on pipe->pipe)
- rpc.gssd tries to read another message, and dereferences a pointer
  to this message that has just been freed.

I fix this by incrementing the reference count before calling
rpc_queue_upcall(), and decrementing it if that fails, or normally in
gss_pipe_destroy_msg().

It seems strange that the reply doesn't target the message more
precisely, but I don't know all the details.  In any case, I think the
reference counting irregularity became a measurable bug when the
extra arg was added to __gss_find_upcall(), hence the Fixes: line
below.

The second problem is that if rpc_queue_upcall() fails, the new
message is not freed. gss_alloc_msg() set the ->count to 1,
gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
then the pointer is discarded so the memory never gets freed.

Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
Cc: stable@vger.kernel.org
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1cded9d2
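
The reference counts traced in 1cded9d2 for the failed-queue path can be followed with a toy counter. This is a sketch under stated assumptions, not the SUNRPC code: `struct msg`, `msg_get()`, `msg_put()`, and `failed_upcall()` are hypothetical stand-ins, and the "with fix" branch assumes the caller's own reference is also released when queueing fails.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a gss upcall message. */
struct msg {
    int count;      /* reference count */
    bool freed;     /* set when the count drops to zero */
};

void msg_get(struct msg *m) { m->count++; }

void msg_put(struct msg *m)
{
    if (--m->count == 0)
        m->freed = true;
}

/* Model of the path where queueing the upcall fails.
 * With with_fix == false the caller's reference is simply forgotten
 * (the pointer is discarded), so the message is never freed. */
struct msg failed_upcall(bool with_fix)
{
    struct msg m = { .count = 1, .freed = false }; /* alloc: 1 */

    msg_get(&m);            /* added to the in_downcall list: 2 */
    if (with_fix)
        msg_get(&m);        /* reference for the pipe->pipe list: 3 */

    /* ... queueing fails here ... */
    if (with_fix)
        msg_put(&m);        /* drop the pipe->pipe reference: 2 */
    msg_put(&m);            /* unhash from in_downcall: 1 (or 1 -> 1 step) */
    if (with_fix)
        msg_put(&m);        /* release the caller's reference: 0 */

    return m;
}
```

Without the fix the counter ends at 1 with nobody holding the pointer, which matches the leak described above; with the fix it reaches 0 and the message is released.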

nfs: add support for the umask attribute · dff25ddb

Committed by Andreas Gruenbacher on Dec 02, 2016

Clients can set the umask attribute when creating files to cause the
server to apply it always except when inheriting permissions from the
parent directory. That way, the new files will end up with the same
permissions as files created locally.

See https://tools.ietf.org/html/draft-ietf-nfsv4-umask-02 for more details.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

dff25ddb

pNFS/flexfiles: Ensure we have enough buffer for layoutreturn · d9152114

Committed by Trond Myklebust on Dec 09, 2016

The flexfiles client can piggyback both layout errors and layoutstats
as part of the layoutreturn. Both these payloads can get large, with
20 layout error entries taking up about 1.2K, and 4 layoutstats entries
taking up another 1K.
This patch allows a maximum payload of 4k by allocating a full page.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d9152114
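
The sizing argument in d9152114 can be checked with some rough arithmetic. The per-entry sizes below are approximations derived from the figures in the commit message (about 1.2K for 20 errors, about 1K for 4 layoutstats), and the 4096-byte limit assumes a 4K page; the macro and function names are hypothetical, and the real XDR-encoded sizes may differ.

```c
#include <stddef.h>

/* Approximate per-entry sizes taken from the commit message figures. */
#define APPROX_IOERR_BYTES      60u   /* roughly 1200 / 20 */
#define APPROX_LAYOUTSTAT_BYTES 256u  /* 1024 / 4 */
#define PAYLOAD_LIMIT           4096u /* assuming one 4K page */

/* Estimated size of a layoutreturn payload carrying both lists. */
size_t approx_payload(size_t nr_errors, size_t nr_stats)
{
    return nr_errors * APPROX_IOERR_BYTES +
           nr_stats * APPROX_LAYOUTSTAT_BYTES;
}

/* Non-zero if the estimated payload fits in the page-sized buffer. */
int payload_fits(size_t nr_errors, size_t nr_stats)
{
    return approx_payload(nr_errors, nr_stats) <= PAYLOAD_LIMIT;
}
```

Under these assumptions, 20 errors plus 4 layoutstats entries come to roughly 2.2K, which fits within a full page with some headroom.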

pNFS/flexfiles: Remove a redundant parameter in ff_layout_encode_ioerr() · 5ba6a09e

Committed by Trond Myklebust on Dec 09, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5ba6a09e

09 Dec, 2016 1 commit

pNFS/flexfiles: Fix a deadlock on LAYOUTGET · 65990d1a

Committed by Fred Isaman on Sep 30, 2016

  We encountered a deadlock where the SEQUENCE that accompanied the
LAYOUTGET triggered a session drain, while ff_layout_alloc_lseg
triggered a GETDEVICEINFO.  The GETDEVICEINFO hung waiting for the
session drain, while the LAYOUTGET held the slot waiting for
alloc_lseg to finish.
  Avoid this by moving the call to nfs4_find_get_deviceid out of
ff_layout_alloc_lseg and into nfs4_ff_layout_prepare_ds.
Signed-off-by: Fred Isaman <fred.isaman@gmail.com>
[dros@primarydata.com: pNFS/flexfiles: fix races in ff_layout_mirror_valid]
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

65990d1a

08 Dec, 2016 3 commits

pNFS: Layoutreturn must free the layout after the layout-private data · 2f065ddb

Committed by Trond Myklebust on Dec 07, 2016

The layout-private data may depend on the layout and/or the inode
still existing when it does post-processing and frees its data, so we
need to free them after calling lrp->ld_private.ops->free().

This fixes a mirror list corruption issue in the flexfiles driver.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2f065ddb

pNFS/flexfiles: Fix ff_layout_add_ds_error_locked() · cb067935

Committed by Trond Myklebust on Dec 06, 2016

When we're merging an old entry into our new entry, we want to ensure that
we add the list entry in the correct place.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

cb067935

NFSv4: Add missing nfs_put_lock_context() · 7a0566b3

Committed by NeilBrown on Dec 06, 2016

Otherwise the lock context won't be freed when we're done with it.

From: NeilBrown <neilb@suse.com>
Fixes: 5bd3f817 ("NFSv4: change nfs4_select_rw_stateid to take a lock_context inplace of lock_owner")
Signed-off-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

7a0566b3

06 Dec, 2016 1 commit

pNFS: Release NFS_LAYOUT_RETURN when invalidating the layout stateid · 362fb578

Committed by Trond Myklebust on Dec 05, 2016

Ensure we release the NFS_LAYOUT_RETURN lock when we invalidate the
layout stateid, so that processes and RPC tasks that are waiting on
the layout return can continue.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

362fb578

05 Dec, 2016 5 commits

NFSv4.1: Don't schedule lease recovery in nfs4_schedule_session_recovery() · d94cbf6c

Committed by Trond Myklebust on Dec 04, 2016

If the session has an error, then we want to start by recovering the
session, as any SEQUENCE we send is going to fail with a session
error.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

d94cbf6c

NFSv4.1: Handle NFS4ERR_BADSESSION/NFS4ERR_DEADSESSION replies to OP_SEQUENCE · 2cf10cdd

Committed by Trond Myklebust on Dec 04, 2016

In the case where SEQUENCE receives an NFS4ERR_BADSESSION or
NFS4ERR_DEADSESSION error, we just want to report the session as needing
recovery, and then we want to retry the operation.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2cf10cdd

NFS: Only look at the change attribute cache state in nfs_check_verifier · 1cd9cb05

Committed by Trond Myklebust on Dec 04, 2016

When looking at whether or not our dcache is valid, we really don't care
about the general state of the directory attribute cache. Instead, we
only care about the state of the change attribute.

This fixes a performance issue when the client is responsible for
changing the directory contents; a number of NFSv4 operations will
atomically update the directory change attribute, but may not return
all the other attributes.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1cd9cb05

NFS: Fix incorrect size revalidation when holding a delegation · 9310b224

Committed by Trond Myklebust on Dec 04, 2016

We should only care about checking the attributes if the page cache
is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the
NFS_INO_REVAL_FORCED flag is set.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

9310b224

NFS: Fix incorrect mapping revalidation when holding a delegation · 10727772

Committed by Trond Myklebust on Dec 04, 2016

We should only care about checking the attributes if the page cache
is marked as dubious (using NFS_INO_REVAL_PAGECACHE) and the
NFS_INO_REVAL_FORCED flag is set.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

10727772

04 Dec, 2016 6 commits

pNFS/flexfiles: Support sending layoutstats in layoutreturn · 230bc962

Committed by Trond Myklebust on Oct 19, 2016

Add the ability to send an array of layoutstats entries as part of
layoutreturn.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

230bc962

pNFS/flexfiles: Minor refactoring before adding iostats to layoutreturn · 422c93c8

Committed by Trond Myklebust on Oct 06, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

422c93c8

NFS: Fix up read of mirror stats · 2f8220c1

Committed by Trond Myklebust on Oct 03, 2016

Need to lock while reading in order to ensure 64-bit reads are correct.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

2f8220c1

pNFS/flexfiles: Clean up layoutstats · 08e2e5bc

Committed by Trond Myklebust on Sep 29, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

08e2e5bc

pNFS/flexfiles: Refactor encoding of the layoutreturn payload · 5b9b3c85

Committed by Trond Myklebust on Dec 02, 2016

Add the layout error payload to the flexfiles layoutreturn private
data, and set up the encoding mechanisms. This is a refactoring in
preparation for adding the layout iostats payload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

5b9b3c85

pNFS: Add a layoutreturn callback to perform layout-private setup · 287bd3e9

Committed by Trond Myklebust on Dec 02, 2016

Add a callback to allow the flexfiles layout driver to initialise the
layout private payload.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

287bd3e9

03 Dec, 2016 6 commits

pNFS: Allow layout drivers to manage private data in struct nfs4_layoutreturn · 4d796d75

Committed by Trond Myklebust on Sep 23, 2016

Cleanup to allow layout drivers to attach private data to layoutreturn,
and manage the data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

4d796d75

NFSv4: Add a generic structure for managing layout-private information · f8c3cf9d

Committed by Trond Myklebust on Oct 20, 2016

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

f8c3cf9d

pNFS/flexfiles: Only send layoutstats updates for mirrors that were updated · 06946c6a

Committed by Trond Myklebust on Nov 25, 2016

If there have been no reads or writes to a given mirror since the last
layoutstats update, then don't resend the same data.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

06946c6a
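
The filtering idea in 06946c6a can be sketched as below. This is a simplified model rather than the flexfiles driver code: `struct mirror` and `collect_updated()` are hypothetical, and the kernel tracks pending stats differently from the plain counters used here.

```c
#include <stddef.h>

/* Hypothetical per-mirror bookkeeping. */
struct mirror {
    unsigned long ops;        /* reads + writes seen so far */
    unsigned long reported;   /* value of ops at the last report */
};

/* Store the indices of mirrors with new I/O in out[] and mark them
 * as reported. Returns the number of entries that should be sent. */
size_t collect_updated(struct mirror *m, size_t n, size_t *out)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (m[i].ops == m[i].reported)
            continue;               /* nothing new: do not resend */
        m[i].reported = m[i].ops;
        out[count++] = i;
    }
    return count;
}
```

A caller could treat a return value of zero as a signal that there is nothing worth sending at all.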

pNFS/flexfiles: Don't attempt to send layoutstats if there are no entries · 46c98c6d

Committed by Trond Myklebust on Nov 25, 2016

If the list of mirrors is empty, then don't send an RPC call.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

46c98c6d

NFS: Allow getattr to also report readdirplus cache hits · 1bcf4c5c

Committed by Trond Myklebust on Dec 02, 2016

If the user called stat() on an 'ls -l' workload, and the attribute
cache was successfully revalidated by READDIRPLUS, then we want to
report that back so that the readdir code continues to use
readdirplus.
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

1bcf4c5c

NFS: Be more targeted about readdirplus use when doing lookup/revalidation · 63519fbc

Committed by Trond Myklebust on Nov 19, 2016

There is little point in setting NFS_INO_ADVISE_RDPLUS in nfs_lookup and
nfs_lookup_revalidate() unless a process is actually doing readdir on the
parent directory.
Furthermore, there is little point in using readdirplus if we're trying
to revalidate a negative dentry.
Reviewed-by: Benjamin Coddington <bcodding@redhat.com>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

63519fbc

openanolis / cloud-kernel synced successfully more than 1 year ago