提交 · 4112be70becb82bc9a53cf2d11ab51c35602b063 · openeuler / Kernel

18 11月, 2017 40 次提交

sunrpc: exit_net cleanup check added · 4112be70

由 Vasily Averin 提交于 11月 12, 2017

Be sure that all_clients list initialized in net_init hook was return
to initial state.
Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4112be70

nfs client: exit_net cleanup check added · b0b5352d

由 Vasily Averin 提交于 11月 12, 2017

Be sure that nfs_client_list and nfs_volume_list lists initialized
in net_init hook were return to initial state in net_exit hook.
Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b0b5352d

nfs/write: Use common error handling code in nfs_lock_and_join_requests() · 0671d8f1

由 Markus Elfring 提交于 11月 07, 2017

Add a jump target so that a bit of exception handling can be better reused
at the end of this function.

This issue was detected by using the Coccinelle software.
Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0671d8f1

NFSv4: Replace closed stateids with the "invalid special stateid" · fcd8843c

由 Trond Myklebust 提交于 11月 07, 2017

When decoding a CLOSE, replace the stateid returned by the server
with the "invalid special stateid" described in RFC5661, Section 8.2.3.

In nfs_set_open_stateid_locked, ignore stateids from closed state.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

fcd8843c

NFSv4: nfs_set_open_stateid must not trigger state recovery for closed state · e1fff5df

由 Trond Myklebust 提交于 11月 07, 2017

In nfs_set_open_stateid_locked, we must ignore stateids from closed state.
Reported-by: NAndrew W Elble <aweits@rit.edu>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e1fff5df

NFSv4: Check the open stateid when searching for expired state · 46280d9d

由 Trond Myklebust 提交于 11月 06, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

46280d9d

NFSv4: Clean up nfs4_delegreturn_done · 140087fd

由 Trond Myklebust 提交于 11月 06, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

140087fd

NFSv4: cleanup nfs4_close_done · 91b30d2e

由 Trond Myklebust 提交于 11月 06, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

91b30d2e

NFSv4: Retry NFS4ERR_OLD_STATEID errors in layoutreturn · ff90514e

由 Trond Myklebust 提交于 11月 06, 2017

If our layoutreturn returns an NFS4ERR_OLD_STATEID, then try to
update the stateid and retry.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ff90514e

pNFS: Retry NFS4ERR_OLD_STATEID errors in layoutreturn-on-close · 7380020e

由 Trond Myklebust 提交于 11月 06, 2017

If our layoutreturn on close operation returns an NFS4ERR_OLD_STATEID,
then try to update the stateid and retry. We know that there should
be no further LAYOUTGET requests being launched.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7380020e

NFSv4: Don't try to CLOSE if the stateid 'other' field has changed · c82bac6f

由 Trond Myklebust 提交于 11月 06, 2017

If the stateid is no longer recognised on the server, either due to a
restart, or due to a competing CLOSE call, then we do not have to
retry. Any open contexts that triggered a reopen of the file, will
also act as triggers for any CLOSE for the updated stateids.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c82bac6f

NFSv4: Retry CLOSE and DELEGRETURN on NFS4ERR_OLD_STATEID. · 12f275cd

由 Trond Myklebust 提交于 11月 06, 2017

If we're racing with an OPEN, then retry the operation instead of
declaring it a success.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
[Andrew W Elble: Fix a typo in nfs4_refresh_open_stateid]
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

12f275cd

NFS: Fix a typo in nfs_rename() · d803224c

由 Trond Myklebust 提交于 11月 06, 2017

On successful rename, the "old_dentry" is retained and is attached to
the "new_dir", so we need to call nfs_set_verifier() accordingly.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d803224c

NFSv4: Fix open create exclusive when the server reboots · 8fd1ab74

由 Trond Myklebust 提交于 11月 06, 2017

If the server that does not implement NFSv4.1 persistent session
semantics reboots while we are performing an exclusive create,
then the return value of NFS4ERR_DELAY when we replay the open
during the grace period causes us to lose the verifier.
When the grace period expires, and we present a new verifier,
the server will then correctly reply NFS4ERR_EXIST.

This commit ensures that we always present the same verifier when
replaying the OPEN.
Reported-by: NTigran Mkrtchyan <tigran.mkrtchyan@desy.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8fd1ab74

NFSv4: Add a tracepoint to document open stateid updates · ad9e02dc

由 Trond Myklebust 提交于 11月 06, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ad9e02dc

NFSv4: Fix OPEN / CLOSE race · c9399f21

由 Trond Myklebust 提交于 11月 06, 2017

Ben Coddington has noted the following race between OPEN and CLOSE
on a single client.

Process 1		Process 2		Server
=========		=========		======

1)  OPEN file
2)			OPEN file
3)						Process OPEN (1) seqid=1
4)						Process OPEN (2) seqid=2
5)						Reply OPEN (2)
6)			Receive reply (2)
7)			new stateid, seqid=2

8)			CLOSE file, using
			stateid w/ seqid=2
9)						Reply OPEN (1)
10(						Process CLOSE (8)
11)						Reply CLOSE (8)
12)						Forget stateid
						file closed

13)			Receive reply (7)
14)			Forget stateid
			file closed.

15) Receive reply (1).
16) New stateid seqid=1
    is really the same
    stateid that was
    closed.

IOW: the reply to the first OPEN is delayed. Since "Process 2" does
not wait before closing the file, and it does not cache the closed
stateid, then when the delayed reply is finally received, it is treated
as setting up a new stateid by the client.

The fix is to ensure that the client processes the OPEN and CLOSE calls
in the same order in which the server processed them.

This commit ensures that we examine the seqid of the stateid
returned by OPEN. If it is a new stateid, we assume the seqid
must be equal to the value 1, and that each state transition
increments the seqid value by 1 (See RFC7530, Section 9.1.4.2,
and RFC5661, Section 8.2.2).

If the tracker sees that an OPEN returns with a seqid that is greater
than the cached seqid + 1, then it bumps a flag to ensure that the
caller waits for the RPCs carrying the missing seqids to complete.

Note that there can still be pathologies where the server crashes before
it can even send us the missing seqids. Since the OPEN call is still
holding a slot when it waits here, that could cause the recovery to
stall forever. To avoid that, we time out after a 5 second wait.
Reported-by: NBenjamin Coddington <bcodding@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c9399f21

sunrpc: Add rpc_request static trace point · c435da68

由 Chuck Lever 提交于 11月 03, 2017

Display information about the RPC procedure being requested in the
trace log. This sometimes critical information cannot always be
derived from other RPC trace entries.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c435da68

sunrpc: Fix rpc_task_begin trace point · b2bfe591

由 Chuck Lever 提交于 11月 03, 2017

The rpc_task_begin trace point always display a task ID of zero.
Move the trace point call site so that it picks up the new task ID.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b2bfe591

SUNRPC: Fix parsing failure in trace points with XIDs · a30ccf1a

由 Chuck Lever 提交于 10月 20, 2017

mount.nf-11159 8.... 905.248380: xprt_transmit: [FAILED TO PARSE] xid=351291440 status=0 addr=192.168.2.5 port=20049
mount.nf-11159 8.... 905.248381: rpc_task_sleep: task:6210@1 flags=0e80 state=0005 status=0 timeout=60000 queue=xprt_pending
kworker/-1591 1.... 905.248419: xprt_lookup_rqst: [FAILED TO PARSE] xid=351291440 status=0 addr=192.168.2.5 port=20049
kworker/-1591 1.... 905.248423: xprt_complete_rqst: [FAILED TO PARSE] xid=351291440 status=24 addr=192.168.2.5 port=20049

Byte swapping is not available during trace-cmd report.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a30ccf1a

net: sunrpc: mark expected switch fall-throughs · e9d47639

由 Gustavo A. R. Silva 提交于 10月 20, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e9d47639

NFS: Fix bool initialization/comparison · 6089dd0d

由 Thomas Meyer 提交于 10月 07, 2017

Bool initializations should use true and false. Bool tests don't need
comparisons.
Signed-off-by: NThomas Meyer <thomas@m3y3r.de>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6089dd0d

NFS: Avoid RCU usage in tracepoints · 3944369d

由 Anna Schumaker 提交于 11月 01, 2017

There isn't an obvious way to acquire and release the RCU lock during a
tracepoint, so we can't use the rpc_peeraddr2str() function here.
Instead, rely on the client's cl_hostname, which should have similar
enough information without needing an rcu_dereference().
Reported-by: NDave Jones <davej@codemonkey.org.uk>
Cc: stable@vger.kernel.org # v3.12
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3944369d

xprtrdma: Update copyright notices · 62b56a67

由 Chuck Lever 提交于 10月 30, 2017

Credit work contributed by Oracle engineers since 2014.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

62b56a67

xprtrdma: Remove include for linux/prefetch.h · 1b746c1e

由 Chuck Lever 提交于 10月 30, 2017

Clean up. This include should have been removed by
commit 23826c7a ("xprtrdma: Serialize credit accounting again").
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1b746c1e

rpcrdma: Remove C structure definitions of XDR data items · 2232df5e

由 Chuck Lever 提交于 10月 30, 2017

Clean up: C-structure style XDR encoding and decoding logic has
been replaced over the past several merge windows on both the
client and server. These data structures are no longer used.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2232df5e

xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode · a4699f56

由 Chuck Lever 提交于 10月 30, 2017

Lift the Send and LocalInv completion handlers out of soft IRQ mode
to make room for other work. Also, move the Send CQ to a different
CPU than the CPU where the Receive CQ is running, for improved
scalability.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a4699f56

fs, nfs: convert nfs_client.cl_count from atomic_t to refcount_t · 212bf41d

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs_client.cl_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

212bf41d

fs, nfs: convert nfs_lock_context.count from atomic_t to refcount_t · 2f62b5aa

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs_lock_context.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2f62b5aa

fs, nfs: convert nfs4_lock_state.ls_count from atomic_t to refcount_t · 194bc1f4

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_lock_state.ls_count  is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

194bc1f4

fs, nfs: convert nfs_cache_defer_req.count from atomic_t to refcount_t · 0896cade

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs_cache_defer_req.count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0896cade

fs, nfs: convert nfs4_ff_layout_mirror.ref from atomic_t to refcount_t · 81a090b9

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_ff_layout_mirror.ref is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

81a090b9

fs, nfs: convert pnfs_layout_hdr.plh_refcount from atomic_t to refcount_t · 2b28a7be

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable pnfs_layout_hdr.plh_refcount is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2b28a7be

fs, nfs: convert pnfs_layout_segment.pls_refcount from atomic_t to refcount_t · eba6dd69

由 Elena Reshetova 提交于 10月 20, 2017

refcount_t type and corresponding API should be
used instead of atomic_t when the variable is used as
a reference counter. This allows to avoid accidental
refcounter overflows that might lead to use-after-free
situations.
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NDavid Windsor <dwindsor@gmail.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eba6dd69

fs, nfs: convert nfs4_pnfs_ds.ds_count from atomic_t to refcount_t · a2a5dea7

由 Elena Reshetova 提交于 10月 20, 2017

atomic_t variables are currently used to implement reference
counters with the following properties:
 - counter is initialized to 1 using atomic_set()
 - a resource is freed upon counter reaching zero
 - once counter reaches zero, its further
   increments aren't allowed
 - counter schema uses basic atomic operations
   (set, inc, inc_not_zero, dec_and_test, etc.)

Such atomic variables should be converted to a newly provided
refcount_t type and API that prevents accidental counter overflows
and underflows. This is important since overflows and underflows
can lead to use-after-free situation and be exploitable.

The variable nfs4_pnfs_ds.ds_count is used as pure reference counter.
Convert it to refcount_t and fix up the operations.
Suggested-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NDavid Windsor <dwindsor@gmail.com>
Reviewed-by: NHans Liljestrand <ishkamiel@gmail.com>
Signed-off-by: NElena Reshetova <elena.reshetova@intel.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a2a5dea7

NFSv4.1: Fix up replays of interrupted requests · 3be0f80b

由 Trond Myklebust 提交于 10月 19, 2017

If the previous request on a slot was interrupted before it was
processed by the server, then our slot sequence number may be out of whack,
and so we try the next operation using the old sequence number.

The problem with this, is that not all servers check to see that the
client is replaying the same operations as previously when they decide
to go to the replay cache, and so instead of the expected error of
NFS4ERR_SEQ_FALSE_RETRY, we get a replay of the old reply, which could
(if the operations match up) be mistaken by the client for a new reply.

To fix this, we attempt to send a COMPOUND containing only the SEQUENCE op
in order to resync our slot sequence number.

Cc: Olga Kornievskaia <olga.kornievskaia@gmail.com>
[olga.kornievskaia@gmail.com: fix an Oops]
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3be0f80b

xprtrdma: Remove atomic send completion counting · 6f0afc28

由 Chuck Lever 提交于 10月 20, 2017

The sendctx circular queue now guarantees that xprtrdma cannot
overflow the Send Queue, so remove the remaining bits of the
original Send WQE counting mechanism.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6f0afc28

xprtrdma: RPC completion should wait for Send completion · 01bb35c8

由 Chuck Lever 提交于 10月 20, 2017

When an RPC Call includes a file data payload, that payload can come
from pages in the page cache, or a user buffer (for direct I/O).

If the payload can fit inline, xprtrdma includes it in the Send
using a scatter-gather technique. xprtrdma mustn't allow the RPC
consumer to re-use the memory where that payload resides before the
Send completes. Otherwise, the new contents of that memory would be
exposed by an HCA retransmit of the Send operation.

So, block RPC completion on Send completion, but only in the case
where a separate file data payload is part of the Send. This
prevents the reuse of that memory while it is still part of a Send
operation without an undue cost to other cases.

Waiting is avoided in the common case because typically the Send
will have completed long before the RPC Reply arrives.

These days, an RPC timeout will trigger a disconnect, which tears
down the QP. The disconnect flushes all waiting Sends. This bounds
the amount of time the reply handler has to wait for a Send
completion.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

01bb35c8

xprtrdma: Refactor rpcrdma_deferred_completion · 0ba6f370

由 Chuck Lever 提交于 10月 20, 2017

Invoke a common routine for releasing hardware resources (for
example, invalidating MRs). This needs to be done whether an
RPC Reply has arrived or the RPC was terminated early.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0ba6f370

xprtrdma: Add a field of bit flags to struct rpcrdma_req · 531cca0c

由 Chuck Lever 提交于 10月 20, 2017

We have one boolean flag in rpcrdma_req today. I'd like to add more
flags, so convert that boolean to a bit flag.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

531cca0c

xprtrdma: Add data structure to manage RDMA Send arguments · ae72950a

由 Chuck Lever 提交于 10月 20, 2017

Problem statement:

Recently Sagi Grimberg <sagi@grimberg.me> observed that kernel RDMA-
enabled storage initiators don't handle delayed Send completion
correctly. If Send completion is delayed beyond the end of a ULP
transaction, the ULP may release resources that are still being used
by the HCA to complete a long-running Send operation.

This is a common design trait amongst our initiators. Most Send
operations are faster than the ULP transaction they are part of.
Waiting for a completion for these is typically unnecessary.

Infrequently, a network partition or some other problem crops up
where an ordering problem can occur. In NFS parlance, the RPC Reply
arrives and completes the RPC, but the HCA is still retrying the
Send WR that conveyed the RPC Call. In this case, the HCA can try
to use memory that has been invalidated or DMA unmapped, and the
connection is lost. If that memory has been re-used for something
else (possibly not related to NFS), and the Send retransmission
exposes that data on the wire.

Thus we cannot assume that it is safe to release Send-related
resources just because a ULP reply has arrived.

After some analysis, we have determined that the completion
housekeeping will not be difficult for xprtrdma:

 - Inline Send buffers are registered via the local DMA key, and
   are already left DMA mapped for the lifetime of a transport
   connection, thus no additional handling is necessary for those
 - Gathered Sends involving page cache pages _will_ need to
   DMA unmap those pages after the Send completes. But like
   inline send buffers, they are registered via the local DMA key,
   and thus will not need to be invalidated

In addition, RPC completion will need to wait for Send completion
in the latter case. However, nearly always, the Send that conveys
the RPC Call will have completed long before the RPC Reply
arrives, and thus no additional latency will be accrued.

Design notes:

In this patch, the rpcrdma_sendctx object is introduced, and a
lock-free circular queue is added to manage a set of them per
transport.

The RPC client's send path already prevents sending more than one
RPC Call at the same time. This allows us to treat the consumer
side of the queue (rpcrdma_sendctx_get_locked) as if there is a
single consumer thread.

The producer side of the queue (rpcrdma_sendctx_put_locked) is
invoked only from the Send completion handler, which is a single
thread of execution (soft IRQ).

The only care that needs to be taken is with the tail index, which
is shared between the producer and consumer. Only the producer
updates the tail index. The consumer compares the head with the
tail to ensure that the a sendctx that is in use is never handed
out again (or, expressed more conventionally, the queue is empty).

When the sendctx queue empties completely, there are enough Sends
outstanding that posting more Send operations can result in a Send
Queue overflow. In this case, the ULP is told to wait and try again.
This introduces strong Send Queue accounting to xprtrdma.

As a final touch, Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
suggested a mechanism that does not require signaling every Send.
We signal once every N Sends, and perform SGE unmapping of N Send
operations during that one completion.
Reported-by: NSagi Grimberg <sagi@grimberg.me>
Suggested-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ae72950a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功