Commits · aba11831794356ff58da69de46a125e6335eb9ca · openeuler / Kernel

Jan 03, 2019: 18 commits

xprtrdma: Clean up of xprtrdma chunk trace points · aba11831

Committed by Chuck Lever on Dec 19, 2018

The chunk-related trace points capture nearly the same information
as the MR-related trace points.

Also, rename them so globbing can be used to enable or disable
these trace points more easily.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove unused fields from rpcrdma_ia · 9bef848f

Committed by Chuck Lever on Dec 19, 2018

Clean up. The last use of these fields was in commit 173b8f49
("xprtrdma: Demote "connect" log messages").

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Cull dprintk() call sites · ddbb347f

Committed by Chuck Lever on Dec 19, 2018

Clean up: Remove dprintk() call sites that report rare or impossible
errors. Leave a few that display high-value, low-noise status
information.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Simplify locking that protects the rl_allreqs list · 92f4433e

Committed by Chuck Lever on Dec 19, 2018

Clean up: There's little chance of contention between the use of
rb_lock and rb_reqslock, so merge the two. This avoids having to
take both in some (possibly future) cases.

Transport tear-down is already serialized, thus there is no need for
locking at all when destroying rpcrdma_reqs.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Expose transport header errors · 236b0943

Committed by Chuck Lever on Dec 19, 2018

For better observability of parsing errors, return the error code
generated in the decoders to the upper layer consumer.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove request_module from backchannel · 889ee07f

Committed by Chuck Lever on Dec 19, 2018

Since commit ffe1f0df ("rpcrdma: Merge svcrdma and xprtrdma
modules into one"), the forward and backchannel components are part
of the same kernel module. A separate request_module() call in the
backchannel code is no longer necessary.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Recognize XDRBUF_SPARSE_PAGES · 15303d9e

Committed by Chuck Lever on Dec 19, 2018

Commit 431f6eb3 ("SUNRPC: Add a label for RPC calls that require
allocation on receive") didn't update similar logic in rpc_rdma.c.
I don't think this is a bug, per se; the commit just adds more
careful checking for broken upper layer behavior.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Plant XID in on-the-wire RDMA offset (FRWR) · 0a93fbcb

Committed by Chuck Lever on Dec 19, 2018

Place the associated RPC transaction's XID in the upper 32 bits of
each RDMA segment's rdma_offset field. There are two reasons to do
this:

- The R_key only has 8 bits that are different from registration to
  registration. The XID adds more uniqueness to each RDMA segment to
  reduce the likelihood of a software bug on the server reading from
  or writing into memory it's not supposed to.

- On-the-wire RDMA Read and Write requests do not otherwise carry
  any identifier that matches them up to an RPC. The XID in the
  upper 32 bits will act as an eye-catcher in network captures.

Suggested-by: Tom Talpey <ttalpey@microsoft.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

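The bit layout described in 0a93fbcb can be sketched in plain C. This is a minimal illustration under stated assumptions, not the xprtrdma code. The helper names `plant_xid()` and `offset_to_xid()` are hypothetical, the sketch ignores byte-order conversion of the XID, and it assumes only the low 32 bits of the segment's own offset need to be preserved.

```c
#include <stdint.h>

/* Hypothetical helper: place a 32-bit RPC XID in the upper 32 bits of a
 * 64-bit RDMA offset, keeping the low 32 bits of the segment's original
 * offset. Byte-order conversion is deliberately ignored here. */
static uint64_t plant_xid(uint32_t xid, uint64_t seg_offset)
{
	return ((uint64_t)xid << 32) | (seg_offset & 0xffffffffULL);
}

/* Hypothetical helper: recover the XID from an offset seen in a capture. */
static uint32_t offset_to_xid(uint64_t rdma_offset)
{
	return (uint32_t)(rdma_offset >> 32);
}
```

A packet analyzer could apply the same shift to the offset of a captured RDMA Read or Write to recover the XID. That is the "eye-catcher" effect the commit describes.
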
xprtrdma: Remove rpcrdma_memreg_ops · 5f62412b

Committed by Chuck Lever on Dec 19, 2018

Clean up: Now that there is only FRWR, there is no need for a memory
registration switch. The indirect calls to the memreg operations can
be replaced with faster direct calls.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove support for FMR memory registration · ba69cd12

Committed by Chuck Lever on Dec 19, 2018

FMR is not supported on most recent RDMA devices. It is also less
secure than FRWR because an FMR memory registration can expose
adjacent bytes to remote reading or writing. As discussed during the
RDMA BoF at LPC 2018, it is time to remove support for FMR in the
NFS/RDMA client stack.

Note that NFS/RDMA server-side uses either local memory registration
or FRWR. FMR is not used.

There are a few InfiniBand/RoCE devices in the kernel tree that do
not appear to support MEM_MGT_EXTENSIONS (FRWR), and therefore will
not support client-side NFS/RDMA after this patch. These are:

 - mthca
 - qib
 - hns (RoCE)

Users of these devices can use NFS/TCP on IPoIB instead.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Reduce max_frwr_depth · a7886849

Committed by Chuck Lever on Dec 19, 2018

Some devices advertise a large max_fast_reg_page_list_len
capability, but perform optimally when MRs are significantly smaller
than that depth -- probably when the MR itself is no larger than a
page.

By default, the RDMA R/W core API uses max_sge_rd as the maximum
page depth for MRs. For some devices, the value of max_sge_rd is
1, which is also not optimal. Thus, when max_sge_rd is larger than
1, use that value. Otherwise use the value of the
max_fast_reg_page_list_len attribute.

I've tested this with CX-3 Pro, FastLinq, and CX-5 devices. It
reproducibly improves the throughput of large I/Os by several
percent.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

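The depth-selection rule in a7886849 reduces to a small decision. The sketch below is hypothetical: `struct dev_attrs` and `choose_frwr_depth()` are illustrative names. A real driver would likely also clamp the result against its own segment limits; that step is omitted here.

```c
/* Hypothetical stand-in for the two device attributes named in the commit. */
struct dev_attrs {
	unsigned int max_sge_rd;
	unsigned int max_fast_reg_page_list_len;
};

/* Page depth used for each FRWR MR: prefer max_sge_rd when it is larger
 * than 1; otherwise fall back to max_fast_reg_page_list_len. */
static unsigned int choose_frwr_depth(const struct dev_attrs *a)
{
	if (a->max_sge_rd > 1)
		return a->max_sge_rd;
	return a->max_fast_reg_page_list_len;
}
```

Preferring the smaller, read-oriented limit reflects the observation that some devices perform better with MRs well below their advertised maximum depth.
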
xprtrdma: Fix ri_max_segs and the result of ro_maxpages · 6946f823

Committed by Chuck Lever on Dec 19, 2018

With certain combinations of krb5i/p, MR size, and r/wsize, I/O can
fail with EMSGSIZE. This is because the calculated value of
ri_max_segs (the max number of MRs per RPC) exceeded
RPCRDMA_MAX_HDR_SEGS, which caused Read or Write list encoding to
walk off the end of the transport header.

Once that was addressed, the ro_maxpages result has to be corrected
to account for the number of MRs needed for Reply chunks, which is
2 MRs smaller than a normal Read or Write chunk.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

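The two corrections in 6946f823 can be illustrated with simple arithmetic. This is a sketch under assumptions. The limits `MAX_HDR_SEGS` and `MAX_DATA_SEGS` and their values are made up for illustration. The two extra MRs per RPC are assumed here to account for the Reply chunk being 2 MRs smaller than a normal Read or Write chunk.

```c
#include <assert.h>

/* Illustrative limits only; the real values may differ. */
#define MAX_HDR_SEGS	16	/* chunk segments that fit in a transport header */
#define MAX_DATA_SEGS	256	/* pages one RPC payload may span */

/* MRs allowed per RPC: roughly MAX_DATA_SEGS / frwr_depth (at least 1),
 * plus 2 extra, clamped so chunk encoding cannot run past the header. */
static unsigned int max_segs(unsigned int frwr_depth)
{
	unsigned int segs;

	assert(frwr_depth > 0);
	segs = MAX_DATA_SEGS / frwr_depth;
	if (segs == 0)
		segs = 1;
	segs += 2;
	if (segs > MAX_HDR_SEGS)
		segs = MAX_HDR_SEGS;
	return segs;
}

/* Pages per RPC, sized for a Reply chunk (2 fewer MRs than segs)
 * and capped at MAX_DATA_SEGS. */
static unsigned int max_pages(unsigned int segs, unsigned int frwr_depth)
{
	unsigned int pages = (segs - 2) * frwr_depth;

	return pages < MAX_DATA_SEGS ? pages : MAX_DATA_SEGS;
}
```

With a small MR depth of 4, the clamp caps the MR count at 16. The page limit then drops to (16 - 2) * 4 = 56 instead of overflowing the header.
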
xprtrdma: Don't wake pending tasks until disconnect is done · 0c0829bc

Committed by Chuck Lever on Dec 19, 2018

Transport disconnect processing does a "wake pending tasks" at
various points.

Suppose an RPC Reply is being processed. The RPC task that Reply
goes with is waiting on the pending queue. If a disconnect wake-up
happens before reply processing is done, that reply, even if it is
good, is thrown away, and the RPC has to be sent again.

This window apparently does not exist for socket transports because
there is a lock held while a reply is being received which prevents
the wake-up call until after reply processing is done.

To resolve this, all RPC replies being processed on an RPC-over-RDMA
transport have to complete before pending tasks are awoken due to a
transport disconnect.

Callers that already hold the transport write lock may invoke
->ops->close directly. Others use a generic helper that schedules
a close when the write lock can be taken safely.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: No qp_event disconnect · 3d433ad8

Committed by Chuck Lever on Dec 19, 2018

After thinking about this more, and auditing other kernel ULP
implementations, I believe that a DISCONNECT cm_event will occur after a
fatal QP event. If that's the case, there's no need for an explicit
disconnect in the QP event handler.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueue · 6d2d0ee2

Committed by Chuck Lever on Dec 19, 2018

To address a connection-close ordering problem, we need the ability
to drain the RPC completions running on rpcrdma_receive_wq for just
one transport. Give each transport its own RPC completion workqueue,
and drain that workqueue when disconnecting the transport.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Refactor Receive accounting · 6ceea368

Committed by Chuck Lever on Dec 19, 2018

Clean up: Divide the work cleanly:

- rpcrdma_wc_receive is responsible only for RDMA Receives
- rpcrdma_reply_handler is responsible only for RPC Replies
- the posted send and receive counts both belong in rpcrdma_ep

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Ensure MRs are DMA-unmapped when posting LOCAL_INV fails · b674c4b4

Committed by Chuck Lever on Dec 19, 2018

The recovery case in frwr_op_unmap_sync needs to DMA unmap each MR.
frwr_release_mr does not DMA-unmap, but the recycle worker does.

Fixes: 61da886b ("xprtrdma: Explicitly resetting MRs is ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Yet another double DMA-unmap · e2f34e26

Committed by Chuck Lever on Dec 19, 2018

While chasing yet another set of DMAR fault reports, I noticed that
the frwr recycler conflates whether or not an MR has been DMA
unmapped with frwr->fr_state. Actually the two have only an indirect
relationship. It's in fact impossible to guess reliably whether the
MR has been DMA unmapped based on its fr_state field, especially as
the surrounding code and its assumptions have changed over time.

A better approach is to track the DMA mapping status explicitly so
that the recycler is less brittle to unexpected situations, and
attempts to DMA-unmap a second time are prevented.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.20
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

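Tracking DMA mapping status explicitly, as e2f34e26 proposes, makes the unmap step idempotent. This is a minimal sketch assuming a simplified MR model. `struct mr`, its `mapped` flag, and the `unmap_calls` counter are hypothetical; they stand in for whatever state the real frwr code keeps.

```c
#include <stdbool.h>

/* Hypothetical, simplified MR: 'mapped' records DMA state explicitly
 * instead of inferring it from the registration state. */
struct mr {
	bool mapped;
	unsigned int unmap_calls;	/* counts real unmaps, for illustration */
};

static void mr_dma_map(struct mr *mr)
{
	mr->mapped = true;
}

/* Safe to call from any path: an already-unmapped MR is left alone,
 * so a second unmap cannot happen. */
static void mr_dma_unmap(struct mr *mr)
{
	if (!mr->mapped)
		return;
	mr->unmap_calls++;	/* a real driver would release the DMA mapping here */
	mr->mapped = false;
}
```

Because the check depends only on `mapped`, both the recycler and any error path can call `mr_dma_unmap()` without knowing which of them ran first.
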
Dec 20, 2018: 15 commits

SUNRPC discard cr_uid from struct rpc_cred. · 04d1532b

Committed by NeilBrown on Dec 03, 2018

Just use ->cr_cred->fsuid directly.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: simplify auth_unix. · 2edd8d74

Committed by NeilBrown on Dec 03, 2018

1/ discard 'struct unx_cred'.  We don't need any data that
   is not already in 'struct rpc_cred'.
2/ Don't keep these creds in a hash table.  When a credential
   is needed, simply allocate it.  When not needed, discard it.
   This can easily be faster than performing a lookup on
   a shared hash table.
   As the lookup can happen during write-out, use a mempool
   to ensure forward progress.
   This means that we cannot compare two credentials for
   equality by comparing the pointers, but we never do that anyway.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove crbind rpc_cred operation · d6efccd9

Committed by NeilBrown on Dec 03, 2018

This now always just does get_rpccred(), so we
don't need an operation pointer to know to do that.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove generic cred code. · 89a4f758

Committed by NeilBrown on Dec 03, 2018

This is no longer used.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS/NFSD/SUNRPC: replace generic creds with 'struct cred'. · a52458b4

Committed by NeilBrown on Dec 03, 2018

SUNRPC has two sorts of credentials, both of which appear as
"struct rpc_cred".
There are "generic credentials" which are supplied by clients
such as NFS and passed in 'struct rpc_message' to indicate
which user should be used to authorize the request, and there
are low-level credentials such as AUTH_NULL, AUTH_UNIX, AUTH_GSS
which describe the credential to be sent over the wire.

This patch replaces all the generic credentials by 'struct cred'
pointers - the credential structure used throughout Linux.

For machine credentials, there is a special 'struct cred *' pointer
which is statically allocated and recognized where needed as
having a special meaning.  A look-up of a low-level cred will
map this to a machine credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Acked-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove RPCAUTH_AUTH_NO_CRKEY_TIMEOUT · 354698b7

Committed by NeilBrown on Dec 03, 2018

This is no longer used.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS: move credential expiry tracking out of SUNRPC into NFS. · ddf529ee

Committed by NeilBrown on Dec 03, 2018

NFS needs to know when a credential is about to expire so that
it can modify write-back behaviour to finish the write inside the
expiry time.
It currently uses functions in SUNRPC code which make use of a
fairly complex callback scheme and flags in the generic credentials.

As I am working to discard the generic credentials, this has to change.

This patch moves the logic into NFS, in part by finding and caching
the low-level credential in the open_context.  We then make direct
cred-api calls on that.

This makes the code much simpler and removes a dependency on generic
rpc credentials.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: add side channel to use non-generic cred for rpc call. · 1de7eea9

Committed by NeilBrown on Dec 03, 2018

The credential passed in rpc_message.rpc_cred is always a
generic credential except in one instance.
When gss_destroying_context() calls rpc_call_null(), it passes
a specific credential that it needs to destroy.
In this case the RPC acts *on* the credential rather than
being authorized by it.

This special case deserves explicit support and providing that will
mean that rpc_message.rpc_cred is *always* generic, allowing
some optimizations.

So add "tk_op_cred" to rpc_task and "rpc_op_cred" to the setup data.
Use this to pass the cred down from rpc_call_null(), and have
rpcauth_bindcred() notice it and bind it in place.

Credit to kernel test robot <fengguang.wu@intel.com> for finding
a bug in an earlier version of this patch.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: introduce RPC_TASK_NULLCREDS to request auth_none · a68a72e1

Committed by NeilBrown on Dec 03, 2018

In almost all cases the credential stored in rpc_message.rpc_cred
is a "generic" credential.  One of the two exceptions is when an
AUTH_NULL credential is used such as for RPC ping requests.

To improve consistency, don't pass an explicit credential in
these cases, but instead pass NULL and set a task flag,
similar to RPC_TASK_ROOTCREDS, which requests that NULL credentials
be used by default.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

NFS/SUNRPC: don't lookup machine credential until rpcauth_bindcred(). · 5e16923b

Committed by NeilBrown on Dec 03, 2018

When NFS creates a machine credential, it is a "generic" credential,
not tied to any auth protocol, and is really just a container for
the principal name.
This doesn't get linked to a genuine credential until rpcauth_bindcred()
is called.
The lookup always succeeds, so various places that test if the machine
credential is NULL, are pointless.

As a step towards getting rid of generic credentials, this patch gets
rid of generic machine credentials.  The nfs_client and rpc_client
just hold a pointer to a constant principal name.
When a machine credential is wanted, a special static 'struct rpc_cred'
pointer is used. rpcauth_bindcred() recognizes this, finds the
principal from the client, and binds the correct credential.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

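The "special static pointer" technique described in 5e16923b (and in a52458b4) is a sentinel compared by address. The sketch below is hypothetical: the types are simplified, and `MACHINE_CRED` and `principal_for()` are illustrative names, not SUNRPC's API.

```c
#include <stddef.h>

/* Hypothetical, simplified types for illustration only. */
struct cred {
	unsigned int fsuid;
};

struct client {
	const char *principal;	/* constant machine principal name */
};

/* Statically allocated marker: only its address matters; its contents
 * are never read. */
static const struct cred machine_cred_marker;
#define MACHINE_CRED (&machine_cred_marker)

/* Return the principal to bind when the caller asked for the machine
 * credential, or NULL when an ordinary user credential was passed. */
static const char *principal_for(const struct client *clnt,
				 const struct cred *cred)
{
	if (cred == MACHINE_CRED)
		return clnt->principal;
	return NULL;
}
```

Because the marker is recognized by pointer identity, no lookup can fail at the point where the caller requests a machine credential. The real binding is deferred to the step that recognizes the marker.
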
SUNRPC: remove machine_cred field from struct auth_cred · 1a80810f

Committed by NeilBrown on Dec 03, 2018

The cred is a machine_cred iff ->principal is set, so there is no
need for the extra flag.

There is one case which deserves some
explanation. nfs4_root_machine_cred() calls rpc_lookup_machine_cred()
with a NULL principal name which results in not getting a machine
credential, but getting a root credential instead.
This appears to be what is expected of the caller, and is
clearly the result provided by both auth_unix and auth_gss
which already ignore the flag.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove uid and gid from struct auth_cred · 8276c902

Committed by NeilBrown on Dec 03, 2018

Use cred->fsuid and cred->fsgid instead.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: remove groupinfo from struct auth_cred. · fc0664fd

Committed by NeilBrown on Dec 03, 2018

We can use cred->groupinfo (from the 'struct cred') instead.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: add 'struct cred *' to auth_cred and rpc_cred · 97f68c6b

Committed by NeilBrown on Dec 03, 2018

The SUNRPC credential framework was put together before
Linux had 'struct cred'.  Now that we have it, it makes sense to
use it.
This first step just includes a suitable 'struct cred *' pointer
in every 'struct auth_cred' and almost every 'struct rpc_cred'.

The rpc_cred used for auth_null has a NULL 'struct cred *' as nothing
else really makes sense.

For rpc_cred, the pointer is reference counted.
For auth_cred it isn't.  struct auth_cred are either allocated on
the stack, in which case the thread owns a reference to the auth,
or are part of 'struct generic_cred' in which case gc_base owns the
reference, and "acred" shares it.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: allow /proc entries without CONFIG_SUNRPC_DEBUG · 8e2e5b7c

Committed by Ben Dooks on Nov 28, 2018

If we want /proc/sys/sunrpc, the current kernel also drags in other debug
features which we don't really want. Instead, we should always show the
following entries:

/proc/sys/sunrpc/udp_slot_table_entries
/proc/sys/sunrpc/tcp_slot_table_entries
/proc/sys/sunrpc/tcp_max_slot_table_entries
/proc/sys/sunrpc/min_resvport
/proc/sys/sunrpc/max_resvport
/proc/sys/sunrpc/tcp_fin_timeout

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: Thomas Preston <thomas.preston@codethink.co.uk>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Dec 19, 2018: 3 commits

SUNRPC: Remove xprt_connect_status() · abc13275

Committed by Trond Myklebust on Dec 17, 2018

Over the years, xprt_connect_status() has been superseded by
call_connect_status(), which now handles all the errors that
xprt_connect_status() does and more. Since xprt_connect_status()
converts all errors that it doesn't recognise to EIO, it is time
for it to be retired.

Reported-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Fix a race with XPRT_CONNECTING · cf76785d

Committed by Trond Myklebust on Dec 17, 2018

Ensure that we clear XPRT_CONNECTING before releasing the XPRT_LOCK so that
we don't have races between the (asynchronous) socket setup code and
tasks in xprt_connect().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

SUNRPC: Fix disconnection races · 0445f92c

Committed by Trond Myklebust on Dec 17, 2018

When the socket is closed, we need to call xprt_disconnect_done() in order
to clean up the XPRT_WRITE_SPACE flag, and wake up the sleeping tasks.

However, we also want to ensure that we don't wake them up before the socket
is closed, since that would cause thundering herd issues with everyone
piling up to retransmit before the TCP shutdown dance has completed.
Only the task that holds XPRT_LOCKED needs to wake up early in order to
allow the close to complete.

Reported-by: Dave Wysochanski <dwysocha@redhat.com>
Reported-by: Scott Mayhew <smayhew@redhat.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>

Dec 05, 2018: 4 commits

SUNRPC: Don't force a redundant disconnection in xs_read_stream() · 79462857

Committed by Trond Myklebust on Dec 03, 2018

If the connection is broken, then xs_tcp_state_change() will take care
of scheduling the socket close as soon as appropriate. xs_read_stream()
just needs to report the error.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Fix up socket polling · dfcf0380

Committed by Trond Myklebust on Dec 04, 2018

Ensure that we do not exit the socket read callback without clearing
XPRT_SOCK_DATA_READY.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Use the discard iterator rather than MSG_TRUNC · b76a5afd

Committed by Trond Myklebust on Dec 03, 2018

When discarding message data from the stream, we're better off using
the discard iterator, since that will work with non-TCP streams.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

SUNRPC: Treat EFAULT as a truncated message in xs_read_stream_request() · 26781eab

Committed by Trond Myklebust on Dec 03, 2018

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
