Commits · ba69cd122ece618eba47589764c7f9c1f57aed95 · openeuler / Kernel

03 Jan, 2019 · 9 commits

xprtrdma: Remove support for FMR memory registration · ba69cd12

Authored by Chuck Lever on Dec 19, 2018

FMR is not supported on most recent RDMA devices. It is also less
secure than FRWR because an FMR memory registration can expose
adjacent bytes to remote reading or writing. As discussed during the
RDMA BoF at LPC 2018, it is time to remove support for FMR in the
NFS/RDMA client stack.

Note that NFS/RDMA server-side uses either local memory registration
or FRWR. FMR is not used.

There are a few InfiniBand/RoCE devices in the kernel tree that do
not appear to support MEM_MGT_EXTENSIONS (FRWR), and therefore will
not support client-side NFS/RDMA after this patch. These are:

 - mthca
 - qib
 - hns (RoCE)

Users of these devices can use NFS/TCP on IPoIB instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Reduce max_frwr_depth · a7886849

Authored by Chuck Lever on Dec 19, 2018

Some devices advertise a large max_fast_reg_page_list_len
capability, but perform optimally when MRs are significantly smaller
than that depth -- probably when the MR itself is no larger than a
page.

By default, the RDMA R/W core API uses max_sge_rd as the maximum
page depth for MRs. For some devices, the value of max_sge_rd is
1, which is also not optimal. Thus, when max_sge_rd is larger than
1, use that value. Otherwise use the value of the
max_fast_reg_page_list_len attribute.
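
The selection rule in the paragraph above can be sketched in plain C. This is an illustration only, not the patch itself: the struct `dev_attrs_sketch` and the function `pick_frwr_depth` are hypothetical names, and only the two attribute names are taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two device attributes named in the
 * commit message; a real driver reports them through the RDMA core. */
struct dev_attrs_sketch {
	unsigned int max_sge_rd;
	unsigned int max_fast_reg_page_list_len;
};

/* Pick the MR page depth: prefer max_sge_rd when it is larger than 1,
 * otherwise fall back to max_fast_reg_page_list_len. */
unsigned int pick_frwr_depth(const struct dev_attrs_sketch *attrs)
{
	assert(attrs != NULL);
	if (attrs->max_sge_rd > 1)
		return attrs->max_sge_rd;
	return attrs->max_fast_reg_page_list_len;
}
```

A device that reports `max_sge_rd = 1` would therefore get its full `max_fast_reg_page_list_len`, while a device that reports a larger `max_sge_rd` would get that smaller, presumably better-performing depth.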

I've tested this with CX-3 Pro, FastLinq, and CX-5 devices. It
reproducibly improves the throughput of large I/Os by several
percent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Fix ri_max_segs and the result of ro_maxpages · 6946f823

Authored by Chuck Lever on Dec 19, 2018

With certain combinations of krb5i/p, MR size, and r/wsize, I/O can
fail with EMSGSIZE. This is because the calculated value of
ri_max_segs (the max number of MRs per RPC) exceeded
RPCRDMA_MAX_HDR_SEGS, which caused Read or Write list encoding to
walk off the end of the transport header.

Once that was addressed, the ro_maxpages result has to be corrected
to account for the number of MRs needed for Reply chunks, which is
2 MRs smaller than a normal Read or Write chunk.
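
The arithmetic behind the two fixes can be sketched as follows. Every constant and function name here is hypothetical (the real values of `RPCRDMA_MAX_HDR_SEGS` and the maximum payload are not given in the message), and adding the two extra MRs before clamping is an assumption made for illustration.

```c
#include <assert.h>

enum {
	SKETCH_MAX_DATA_PAGES = 256, /* assumed largest payload, in pages */
	SKETCH_MAX_HDR_SEGS   = 8,   /* assumed segment slots in the header */
	SKETCH_RESERVED_MRS   = 2    /* the "2 MRs" from the commit message */
};

/* MRs per RPC: enough to cover the payload plus the reserved MRs,
 * but never more than the transport header can describe. */
unsigned int sketch_max_segs(unsigned int mr_depth)
{
	unsigned int segs;

	assert(mr_depth > 0);
	segs = (SKETCH_MAX_DATA_PAGES + mr_depth - 1) / mr_depth;
	segs += SKETCH_RESERVED_MRS;
	if (segs > SKETCH_MAX_HDR_SEGS)
		segs = SKETCH_MAX_HDR_SEGS;
	return segs;
}

/* Largest payload, in pages, once the reserved MRs are set aside. */
unsigned int sketch_max_pages(unsigned int mr_depth)
{
	return (sketch_max_segs(mr_depth) - SKETCH_RESERVED_MRS) * mr_depth;
}
```

With these assumed constants, a small MR depth hits the header limit and shrinks the usable payload, whereas a large depth leaves the payload untouched.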

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Don't wake pending tasks until disconnect is done · 0c0829bc

Authored by Chuck Lever on Dec 19, 2018

Transport disconnect processing does a "wake pending tasks" at
various points.

Suppose an RPC Reply is being processed. The RPC task that Reply
goes with is waiting on the pending queue. If a disconnect wake-up
happens before reply processing is done, that reply, even if it is
good, is thrown away, and the RPC has to be sent again.

This window apparently does not exist for socket transports because
there is a lock held while a reply is being received which prevents
the wake-up call until after reply processing is done.

To resolve this, all RPC replies being processed on an RPC-over-RDMA
transport have to complete before pending tasks are awoken due to a
transport disconnect.

Callers that already hold the transport write lock may invoke
->ops->close directly. Others use a generic helper that schedules
a close when the write lock can be taken safely.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: No qp_event disconnect · 3d433ad8

Authored by Chuck Lever on Dec 19, 2018

After thinking about this more, and auditing other kernel ULP
implementations, I believe that a DISCONNECT cm_event will occur after a
fatal QP event. If that's the case, there's no need for an explicit
disconnect in the QP event handler.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueue · 6d2d0ee2

Authored by Chuck Lever on Dec 19, 2018

To address a connection-close ordering problem, we need the ability
to drain the RPC completions running on rpcrdma_receive_wq for just
one transport. Give each transport its own RPC completion workqueue,
and drain that workqueue when disconnecting the transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Refactor Receive accounting · 6ceea368

Authored by Chuck Lever on Dec 19, 2018

Clean up: Divide the work cleanly:

- rpcrdma_wc_receive is responsible only for RDMA Receives
- rpcrdma_reply_handler is responsible only for RPC Replies
- the posted send and receive counts both belong in rpcrdma_ep

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Ensure MRs are DMA-unmapped when posting LOCAL_INV fails · b674c4b4

Authored by Chuck Lever on Dec 19, 2018

The recovery case in frwr_op_unmap_sync needs to DMA unmap each MR.
frwr_release_mr does not DMA-unmap, but the recycle worker does.

Fixes: 61da886b ("xprtrdma: Explicitly resetting MRs is ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Yet another double DMA-unmap · e2f34e26

Authored by Chuck Lever on Dec 19, 2018

While chasing yet another set of DMAR fault reports, I noticed that
the frwr recycler conflates whether or not an MR has been DMA
unmapped with frwr->fr_state. Actually the two have only an indirect
relationship. It's in fact impossible to guess reliably whether the
MR has been DMA unmapped based on its fr_state field, especially as
the surrounding code and its assumptions have changed over time.

A better approach is to track the DMA mapping status explicitly so
that the recycler is less brittle to unexpected situations, and
attempts to DMA-unmap a second time are prevented.
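
A minimal sketch of the idea, assuming a dedicated flag is used to record the mapping status. The struct and function names are hypothetical, and the patch may well encode the status differently; the point is only that the unmap path consults an explicit record rather than inferring it from a state field.

```c
#include <assert.h>
#include <stdbool.h>

struct mr_sketch {
	bool dma_mapped;  /* explicit mapping status */
	int  state;       /* stand-in for fr_state; never consulted for unmap */
	int  unmap_calls; /* counts real unmaps, for demonstration only */
};

void mr_sketch_map(struct mr_sketch *mr)
{
	mr->dma_mapped = true; /* a real driver would DMA-map here */
}

/* Safe to call repeatedly: only the first call after a map unmaps. */
void mr_sketch_unmap(struct mr_sketch *mr)
{
	if (!mr->dma_mapped)
		return;
	mr->unmap_calls++;     /* a real driver would DMA-unmap here */
	mr->dma_mapped = false;
}
```

Because the second call returns early, a recycler that reaches the unmap path twice for the same MR does no harm, whatever value the state field holds.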

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.20
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

20 Dec, 2018 · 15 commits

SUNRPC discard cr_uid from struct rpc_cred. · 04d1532b

Authored by NeilBrown on Dec 03, 2018

Just use ->cr_cred->fsuid directly.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: simplify auth_unix. · 2edd8d74

Authored by NeilBrown on Dec 03, 2018

1/ discard 'struct unx_cred'.  We don't need any data that
   is not already in 'struct rpc_cred'.
2/ Don't keep these creds in a hash table.  When a credential
   is needed, simply allocate it.  When not needed, discard it.
   This can easily be faster than performing a lookup on
   a shared hash table.
   As the lookup can happen during write-out, use a mempool
   to ensure forward progress.
   This means that we cannot compare two credentials for
   equality by comparing the pointers, but we never do that anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove crbind rpc_cred operation · d6efccd9

Authored by NeilBrown on Dec 03, 2018

This now always just does get_rpccred(), so we
don't need an operation pointer to know to do that.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove generic cred code. · 89a4f758

Authored by NeilBrown on Dec 03, 2018

This is no longer used.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'. · a52458b4

Authored by NeilBrown on Dec 03, 2018

SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wires.

This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.

For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning.  A look-up of a low-level cred will
map this to a machine credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove RPCAUTH_AUTH_NO_CRKEY_TIMEOUT · 354698b7

Authored by NeilBrown on Dec 03, 2018

This is no longer used.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: move credential expiry tracking out of SUNRPC into NFS. · ddf529ee

Authored by NeilBrown on Dec 03, 2018

NFS needs to know when a credential is about to expire so that
it can modify write-back behaviour to finish the write inside the
expiry time.
It currently uses functions in SUNRPC code which make use of a
fairly complex callback scheme and flags in the generic credentials.

As I am working to discard the generic credentials, this has to change.

This patch moves the logic into NFS, in part by finding and caching
the low-level credential in the open_context.  We then make direct
cred-api calls on that.

This makes the code much simpler and removes a dependency on generic
rpc credentials.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: add side channel to use non-generic cred for rpc call. · 1de7eea9

Authored by NeilBrown on Dec 03, 2018

The credential passed in rpc_message.rpc_cred is always a
generic credential except in one instance.
When gss_destroying_context() calls rpc_call_null(), it passes
a specific credential that it needs to destroy.
In this case the RPC acts *on* the credential rather than
being authorized by it.

This special case deserves explicit support and providing that will
mean that rpc_message.rpc_cred is *always* generic, allowing
some optimizations.

So add "tk_op_cred" to rpc_task and "rpc_op_cred" to the setup data.
Use this to pass the cred down from rpc_call_null(), and have
rpcauth_bindcred() notice it and bind it in place.

Credit to kernel test robot <fengguang.wu@intel.com> for finding
a bug in an earlier version of this patch.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: introduce RPC_TASK_NULLCREDS to request auth_none · a68a72e1

Authored by NeilBrown on Dec 03, 2018

In almost all cases the credential stored in rpc_message.rpc_cred
is a "generic" credential.  One of the two exceptions is when an
AUTH_NULL credential is used, such as for RPC ping requests.

To improve consistency, don't pass an explicit credential in
these cases, but instead pass NULL and set a task flag,
similar to RPC_TASK_ROOTCREDS, which requests that NULL credentials
be used by default.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS/SUNRPC: don't lookup machine credential until rpcauth_bindcred(). · 5e16923b

Authored by NeilBrown on Dec 03, 2018

When NFS creates a machine credential, it is a "generic" credential,
not tied to any auth protocol, and is really just a container for
the principal name.
This doesn't get linked to a genuine credential until rpcauth_bindcred()
is called.
The lookup always succeeds, so the various places that test whether the
machine credential is NULL are pointless.

As a step towards getting rid of generic credentials, this patch gets
rid of generic machine credentials.  The nfs_client and rpc_client
just hold a pointer to a constant principal name.
When a machine credential is wanted, a special static 'struct rpc_cred'
pointer is used. rpcauth_bindcred() recognizes this, finds the
principal from the client, and binds the correct credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove machine_cred field from struct auth_cred · 1a80810f

Authored by NeilBrown on Dec 03, 2018

The cred is a machine_cred iff ->principal is set, so there is no
need for the extra flag.

There is one case which deserves some
explanation. nfs4_root_machine_cred() calls rpc_lookup_machine_cred()
with a NULL principal name which results in not getting a machine
credential, but getting a root credential instead.
This appears to be what is expected of the caller, and is
clearly the result provided by both auth_unix and auth_gss
which already ignore the flag.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove uid and gid from struct auth_cred · 8276c902

Authored by NeilBrown on Dec 03, 2018

Use cred->fsuid and cred->fsgid instead.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove groupinfo from struct auth_cred. · fc0664fd

Authored by NeilBrown on Dec 03, 2018

We can use cred->group_info (from the 'struct cred') instead.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: add 'struct cred *' to auth_cred and rpc_cred · 97f68c6b

Authored by NeilBrown on Dec 03, 2018

The SUNRPC credential framework was put together before
Linux had 'struct cred'.  Now that we have it, it makes sense to
use it.
This first step just includes a suitable 'struct cred *' pointer
in every 'struct auth_cred' and almost every 'struct rpc_cred'.

The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing
else really makes sense.

For rpc_cred, the pointer is reference counted.
For auth_cred it isn't.  struct auth_cred are either allocated on
the stack, in which case the thread owns a reference to the auth,
or are part of 'struct generic_cred' in which case gc_base owns the
reference, and "acred" shares it.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: allow /proc entries without CONFIG_SUNRPC_DEBUG · 8e2e5b7c

Authored by Ben Dooks on Nov 28, 2018

If we want /proc/sys/sunrpc the current kernel also drags in other debug
features which we don't really want. Instead, we should always show the
following entries:

/proc/sys/sunrpc/udp_slot_table_entries
/proc/sys/sunrpc/tcp_slot_table_entries
/proc/sys/sunrpc/tcp_max_slot_table_entries
/proc/sys/sunrpc/min_resvport
/proc/sys/sunrpc/max_resvport
/proc/sys/sunrpc/tcp_fin_timeout

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Thomas Preston <thomas.preston@codethink.co.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

19 Dec, 2018 · 3 commits

SUNRPC: Remove xprt_connect_status() · abc13275

Authored by Trond Myklebust on Dec 17, 2018

Over the years, xprt_connect_status() has been superseded by
call_connect_status(), which now handles all the errors that
xprt_connect_status() does and more. Since xprt_connect_status()
converts all errors that it doesn't recognise to EIO, it is time
for it to be retired.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Fix a race with XPRT_CONNECTING · cf76785d

Authored by Trond Myklebust on Dec 17, 2018

Ensure that we clear XPRT_CONNECTING before releasing the XPRT_LOCK so that
we don't have races between the (asynchronous) socket setup code and
tasks in xprt_connect().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Fix disconnection races · 0445f92c

Authored by Trond Myklebust on Dec 17, 2018

When the socket is closed, we need to call xprt_disconnect_done() in order
to clean up the XPRT_WRITE_SPACE flag, and wake up the sleeping tasks.

However, we also want to ensure that we don't wake them up before the socket
is closed, since that would cause thundering herd issues with everyone
piling up to retransmit before the TCP shutdown dance has completed.
Only the task that holds XPRT_LOCKED needs to wake up early in order to
allow the close to complete.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Reported-by: Scott Mayhew <smayhew@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

05 Dec, 2018 · 6 commits

SUNRPC: Don't force a redundant disconnection in xs_read_stream() · 79462857

Authored by Trond Myklebust on Dec 03, 2018

If the connection is broken, then xs_tcp_state_change() will take care
of scheduling the socket close as soon as appropriate. xs_read_stream()
just needs to report the error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix up socket polling · dfcf0380

Authored by Trond Myklebust on Dec 04, 2018

Ensure that we do not exit the socket read callback without clearing
XPRT_SOCK_DATA_READY.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Use the discard iterator rather than MSG_TRUNC · b76a5afd

Authored by Trond Myklebust on Dec 03, 2018

When discarding message data from the stream, we're better off using
the discard iterator, since that will work with non-TCP streams.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Treat EFAULT as a truncated message in xs_read_stream_request() · 26781eab

Authored by Trond Myklebust on Dec 03, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix up handling of the XDRBUF_SPARSE_PAGES flag · 16e5e90f

Authored by Trond Myklebust on Dec 02, 2018

If the allocator fails before it has reached the target number of pages,
then we need to recheck that we're not seeking past the page buffer.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix RPC receive hangs · c4433055

Authored by Trond Myklebust on Dec 04, 2018

The RPC code is occasionally hanging when the receive code fails to
empty the socket buffer due to a partial read of the data. When we
convert that to an EAGAIN, it appears we occasionally leave data in the
socket. The fix is to just keep reading until the socket returns
EAGAIN/EWOULDBLOCK.
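
The "keep reading until EAGAIN" pattern can be shown with an ordinary user-space descriptor. This is a sketch of the general technique using POSIX `read()` and `fcntl()`, not the kernel receive path; the function names `set_nonblocking` and `drain_nonblocking_fd` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode. Returns 0 on success. */
int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Read until the descriptor reports EAGAIN or EWOULDBLOCK. Returns the
 * number of bytes consumed, or -1 on any other error. End-of-file also
 * ends the loop. */
long drain_nonblocking_fd(int fd)
{
	char buf[512];
	long total = 0;

	for (;;) {
		ssize_t n = read(fd, buf, sizeof(buf));

		if (n > 0) {
			total += n;   /* partial read: keep going */
			continue;
		}
		if (n == 0)
			return total; /* peer closed the stream */
		if (errno == EINTR)
			continue;     /* interrupted: retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return total; /* buffer is now empty */
		return -1;
	}
}
```

Stopping after a single short read would leave data queued in the buffer, which is the kind of leftover the commit message describes; looping until the descriptor itself reports that nothing remains avoids that.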

Reported-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Cristian Marussi <cristian.marussi@arm.com>
Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Cristian Marussi <cristian.marussi@arm.com>

02 Dec, 2018 · 4 commits

SUNRPC: Fix a potential race in xprt_connect() · 0a9a4304

Authored by Trond Myklebust on Dec 01, 2018

If an asynchronous connection attempt completes while another task is
in xprt_connect(), then the call to rpc_sleep_on() could end up
racing with the call to xprt_wake_pending_tasks().
So add a second test of the connection state after we've put the
task to sleep and set the XPRT_CONNECTING flag, when we know that there
can be no asynchronous connection attempts still in progress.

Fixes: 0b9e7943 ("SUNRPC: Move the test for XPRT_CONNECTING into...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix a memory leak in call_encode() · 71700bb9

Authored by Trond Myklebust on Nov 30, 2018

If we retransmit an RPC request, we currently end up clobbering the
value of req->rq_rcv_buf.bvec that was allocated by the initial call to
xprt_request_prepare(req).

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix leak of krb5p encode pages · 8dae5398

Authored by Chuck Lever on Nov 30, 2018

call_encode can be invoked more than once per RPC call. Ensure that
each call to gss_wrap_req_priv does not overwrite pointers to
previously allocated memory.
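
One general way to make a repeatable encode step leak-free is to release whatever the previous pass allocated before storing the new pointer. The sketch below shows that pattern in user-space C with hypothetical names; the commit message does not say whether the patch releases and reallocates or reuses the earlier pages, so treat this as an illustration of the hazard rather than of the fix.

```c
#include <assert.h>
#include <stdlib.h>

struct req_sketch {
	void  *enc_pages; /* buffer owned by the request, or NULL */
	size_t enc_len;
};

/* Free the encode buffer, if any, and reset the bookkeeping. */
void req_sketch_release(struct req_sketch *req)
{
	free(req->enc_pages);
	req->enc_pages = NULL;
	req->enc_len = 0;
}

/* May run more than once per request, for example on retransmit.
 * len must be non-zero. Returns 0 on success, -1 on allocation failure. */
int req_sketch_encode(struct req_sketch *req, size_t len)
{
	void *buf = malloc(len);

	if (buf == NULL)
		return -1;
	req_sketch_release(req); /* drop the previous pass's buffer first */
	req->enc_pages = buf;
	req->enc_len = len;
	return 0;
}
```

Without the release call, the second pass would overwrite `enc_pages` and the first buffer would become unreachable.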

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: call_connect_status() must handle tasks that got transmitted · 9bd11523

Authored by Trond Myklebust on Nov 30, 2018

If a task failed to get the write lock in the call to xprt_connect(), then
it will be queued on xprt->sending. In that case, it is possible for it
to get transmitted before the call to call_connect_status(), in which
case it needs to be handled by call_transmit_status() instead.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

13 Nov, 2018 · 2 commits

SUNRPC: Fix a bogus get/put in generic_key_to_expire() · e3d5e573

Authored by Trond Myklebust on Nov 12, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix a Oops when destroying the RPCSEC_GSS credential cache · a652a4bc

Authored by Trond Myklebust on Nov 12, 2018

Commit 07d02a67 causes a use-after-free in the RPCSEC_GSS credential
destroy code, because the call to get_rpccred() in gss_destroying_context()
will now always fail to increment the refcount.

While we could just replace the get_rpccred() with a refcount_set(), that
would have the unfortunate consequence of resurrecting a credential in
the credential cache for which we are in the process of destroying the
RPCSEC_GSS context. Rather than do this, we choose to make a copy that
is never added to the cache and use that to destroy the context.

Fixes: 07d02a67 ("SUNRPC: Simplify lookup code")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

09 Nov, 2018 · 1 commit

SUNRPC: drop pointless static qualifier in xdr_get_next_encode_buffer() · 025911a5

Authored by YueHaibing on Nov 08, 2018

There is no need for the '__be32 *p' variable to be static, since a new
value is always assigned before it is used.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
