Commits · b5cde6aa882dfb40a2b29c1c7371fdc3655c51ce · openeuler / Kernel

24 Oct, 2019 11 commits

xprtrdma: Remove rpcrdma_sendctx::sc_device · b5cde6aa

Committed by Chuck Lever on Oct 17, 2019

Micro-optimization: Save eight bytes in a frequently allocated
structure.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove rpcrdma_sendctx::sc_xprt · f995879e

Committed by Chuck Lever on Oct 17, 2019

Micro-optimization: Save eight bytes in a frequently allocated
structure.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Ensure ri_id is stable during MR recycling · 15d9b015

Committed by Chuck Lever on Oct 17, 2019

ia->ri_id is replaced during a reconnect. The connect_worker runs
with the transport send lock held to prevent ri_id from being
dereferenced by the send_request path during this process.

Currently, however, there is no guarantee that ia->ri_id is stable
in the MR recycling worker, which operates in the background and is
not serialized with the connect_worker in any way.

But now that Local_Inv completions are being done in process
context, we can handle the recycling operation there instead of
deferring the recycling work to another process. Because the
disconnect path drains all work before allowing tear down to
proceed, it is guaranteed that Local Invalidations complete only
while the ri_id pointer is stable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Manage MRs in context of a single connection · 9d2da4ff

Committed by Chuck Lever on Oct 09, 2019

MRs are now allocated on demand so we can safely throw them away on
disconnect. This way an idle transport can disconnect and it won't
pin hardware MR resources.

Two additional changes:

- Now that all MRs are destroyed on disconnect, there's no need to
  check during header marshaling if a req has MRs to recycle. Each
  req is sent only once per connection, and now rl_registered is
  guaranteed to be empty when rpcrdma_marshal_req is invoked.

- Because MRs are now destroyed in a WQ_MEM_RECLAIM context, they
  also must be allocated in a WQ_MEM_RECLAIM context. This reduces
  the likelihood that device driver memory allocation will trigger
  memory reclaim during NFS writeback.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Fix MR list handling · c3700780

Committed by Chuck Lever on Oct 09, 2019

Close some holes introduced by commit 6dc6ec9e ("xprtrdma: Cache
free MRs in each rpcrdma_req") that could result in list corruption.

In addition, the result that is tabulated in @count is no longer
used, so @count is removed.

Fixes: 6dc6ec9e ("xprtrdma: Cache free MRs in each rpcrdma_req")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Close window between waking RPC senders and posting Receives · 2ae50ad6

Committed by Chuck Lever on Oct 09, 2019

A recent clean up attempted to separate Receive handling and RPC
Reply processing, in the name of clean layering.

Unfortunately, we can't do this because the Receive Queue has to be
refilled _after_ the most recent credit update from the responder
is parsed from the transport header, but _before_ we wake up the
next RPC sender. That is right in the middle of
rpcrdma_reply_handler().

Usually this isn't a problem because current responder
implementations don't vary their credit grant. The one exception is
when a connection is established: the grant goes from one to a much
larger number on the first Receive. The requester MUST post enough
Receives right then so that any outstanding requests can be sent
without risking RNR and connection loss.

Fixes: 6ceea368 ("xprtrdma: Refactor Receive accounting")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Initialize rb_credits in one place · eea63ca7

Committed by Chuck Lever on Oct 09, 2019

Clean up/code de-duplication.

Nit: RPC_CWNDSHIFT is incorrect as the initial value for xprt->cwnd.
This mistake does not appear to have operational consequences, since
the cwnd value is replaced with a valid value upon the first Receive
completion.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Connection becomes unstable after a reconnect · a31b2f93

Committed by Chuck Lever on Oct 09, 2019

This is because xprt_request_get_cong() is allowing more than one
RPC Call to be transmitted before the first Receive on the new
connection. The first Receive fills the Receive Queue based on the
server's credit grant. Before that Receive, there is only a single
Receive WR posted because the client doesn't know the server's
credit grant.

The solution is to clear rq_cong on all outstanding rpc_rqsts when
the cwnd is reset. This is because an RPC/RDMA credit is good for
one connection instance only.

Fixes: 75891f50 ("SUNRPC: Support for congestion control ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
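
The fix described in this message can be illustrated with a minimal userspace sketch. The types and names below (struct sketch_rqst, struct sketch_xprt, sketch_reset_cwnd) are hypothetical stand-ins, not the kernel's data structures. Only the idea comes from the message: when the window is reset for a new connection, every outstanding request gives up the credit it held.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an rpc_rqst: rq_cong is nonzero while the
 * request holds a congestion credit. */
struct sketch_rqst {
	int rq_cong;
};

/* Hypothetical stand-in for the transport's congestion state. */
struct sketch_xprt {
	unsigned long cwnd;	/* current window */
	unsigned long cong;	/* credits currently in use */
};

/* Reset the window for a new connection instance. A credit is valid for
 * one connection only, so every outstanding request drops its credit. */
static void sketch_reset_cwnd(struct sketch_xprt *xprt,
			      struct sketch_rqst *reqs, size_t nreqs,
			      unsigned long initial_cwnd)
{
	size_t i;

	assert(xprt != NULL);
	for (i = 0; i < nreqs; i++)
		reqs[i].rq_cong = 0;
	xprt->cong = 0;
	xprt->cwnd = initial_cwnd;
}
```

After the reset, no request is counted against the new window, so only as many calls as the initial window allows can be sent before the first Receive arrives.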

xprtrdma: Add unique trace points for posting Local Invalidate WRs · 4b93dab3

Committed by Chuck Lever on Oct 09, 2019

When adding frwr_unmap_async way back when, I re-used the existing
trace_xprtrdma_post_send() trace point to record the return code
of ib_post_send.

Unfortunately there are some cases where re-using that trace point
causes a crash. Instead, construct a trace point specific to posting
Local Invalidate WRs that will always be safe to use in that context,
and will act as a trace log eye-catcher for Local Invalidation.

Fixes: 84756894 ("xprtrdma: Remove fr_state")
Fixes: d8099fed ("xprtrdma: Reduce context switching due ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Bill Baker <bill.baker@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Add trace points to observe transport congestion control · bf7ca707

Committed by Chuck Lever on Oct 09, 2019

To help debug problems with RPC/RDMA credit management, replace
dprintk() call sites in the transport send lock paths with trace
events.

Similar trace points are defined for the non-congestion paths.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Eliminate log noise in call_reserveresult · 5cd8b0d4

Committed by Chuck Lever on Oct 09, 2019

Sep 11 16:35:20 manet kernel:
		call_reserveresult: unrecognized error -512, exiting

Diagnostic error messages such as this likely have no value for NFS
client administrators.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

11 Oct, 2019 1 commit

SUNRPC: fix race to sk_err after xs_error_report · af84537d

Committed by Benjamin Coddington on Oct 02, 2019

Since commit 4f8943f8 ("SUNRPC: Replace direct task wakeups from
softirq context") there has been a race to the value of the sk_err if both
XPRT_SOCK_WAKE_ERROR and XPRT_SOCK_WAKE_DISCONNECT are set.  In that case,
we may end up losing the sk_err value that existed when xs_error_report was
called.

Fix this by reverting to the previous behavior: instead of using SO_ERROR
to retrieve the value at a later time (which might also return sk_err_soft),
copy the sk_err value onto struct sock_xprt, and use that value to wake
pending tasks.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Fixes: 4f8943f8 ("SUNRPC: Replace direct task wakeups from softirq context")
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
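
The approach in this message, capturing the error when it is reported instead of re-reading it later, can be sketched in userspace as follows. The structures and function names are simplified, hypothetical stand-ins for struct sock and struct sock_xprt, and locking is left out entirely.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for struct sock. */
struct sketch_sock {
	int sk_err;		/* may be overwritten or cleared at any time */
};

/* Hypothetical, simplified stand-in for struct sock_xprt. */
struct sketch_sock_xprt {
	int saved_err;		/* value captured at error-report time */
};

/* Error-report callback: record the error now, while it is known to be
 * the value that triggered the report. */
static void sketch_error_report(struct sketch_sock_xprt *transport,
				const struct sketch_sock *sk)
{
	if (sk->sk_err == 0)
		return;
	transport->saved_err = sk->sk_err;
}

/* Later, in process context: consume the saved value exactly once. */
static int sketch_take_error(struct sketch_sock_xprt *transport)
{
	int err = transport->saved_err;

	transport->saved_err = 0;
	return err;
}
```

Because the waker reads the saved copy, a later change to sk_err on the socket cannot alter the value that pending tasks are woken with.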

25 Sep, 2019 1 commit

sunrpc: clean up indentation issue · e41f9efb

Committed by Colin Ian King on Sep 25, 2019

There are statements that are indented incorrectly, remove the
extraneous spacing.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

21 Sep, 2019 4 commits

SUNRPC: Fix congestion window race with disconnect · 8593e010

Committed by Chuck Lever on Sep 13, 2019

If the congestion window closes just as the transport disconnects,
a reconnect is never driven because:

1. The XPRT_CONG_WAIT flag prevents tasks from taking the write lock
2. There's no wake-up of the first task on the xprt->sending queue

To address this, clear the congestion wait flag as part of
completing a disconnect.

Fixes: 75891f50 ("SUNRPC: Support for congestion control ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Don't try to parse incomplete RPC messages · 9ba82886

Committed by Trond Myklebust on Sep 16, 2019

If the copy of the RPC reply into our buffers did not complete, we
could end up with a truncated message. In that case, just resend
the call.

Fixes: a0584ee9 ("SUNRPC: Use struct xdr_stream when decoding...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic · f925ab92

Committed by Benjamin Coddington on Sep 16, 2019

Let the name reflect the single use.  The function now assumes the GSS MIC
is the last object in the buffer.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Fix buffer handling of GSS MIC without slack · 5f1bc399

Committed by Benjamin Coddington on Sep 16, 2019

The GSS Message Integrity Check data for krb5i may lie partially in the XDR
reply buffer's pages and tail.  If so, we try to copy the entire MIC into
free space in the tail.  But as the estimations of the slack space required
for authentication and verification have improved there may be less free
space in the tail to complete this copy -- see commit 2c94b8ec
("SUNRPC: Use au_rslack when computing reply buffer size").  In fact, there
may only be room in the tail for a single copy of the MIC, and not part of
the MIC and then another complete copy.

The real world failure reported is that `ls` of a directory on NFS may
sometimes return -EIO, which can be traced back to xdr_buf_read_netobj()
failing to find available free space in the tail to copy the MIC.

Fix this by checking for the case of the MIC crossing the boundaries of
head, pages, and tail. If so, shift the buffer until the MIC is contained
completely within the pages or tail.  This allows the remainder of the
function to create a sub buffer that directly addresses the complete MIC.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # v5.1
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

18 Sep, 2019 3 commits

SUNRPC: RPC level errors should always set task->tk_rpc_status · 714fbc73

Committed by Trond Myklebust on Sep 12, 2019

Ensure that we set task->tk_rpc_status for all RPC level errors so that
the caller can distinguish between those and server reply status errors.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Don't receive TCP data into a request buffer that has been reset · 45835a63

Committed by Trond Myklebust on Sep 12, 2019

If we've removed the request from the receive list, and have added
it back after resetting the request receive buffer, then we should
only receive message data if it is a new reply (i.e. if
transport->recv.copied is zero).

Fixes: 277e4ab7 ("SUNRPC: Simplify TCP receive code by switching...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Dequeue the request from the receive queue while we're re-encoding · cc204d01

Committed by Trond Myklebust on Sep 10, 2019

Ensure that we dequeue the request from the transport receive queue
while we're re-encoding to prevent issues like use-after-free when
we release the bvec.

Fixes: 75369089 ("SUNRPC: Ensure the bvecs are reset when we re-encode...")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v4.20+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

06 Sep, 2019 1 commit

new helper: get_tree_keyed() · 533770cc

Committed by Al Viro on Sep 03, 2019

For vfs_get_keyed_super users.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

05 Sep, 2019 1 commit

sunrpc: Use kzfree rather than its implementation. · 60b3990c

Committed by zhong jiang on Sep 04, 2019

Use kzfree instead of memset() + kfree().
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
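
For readers unfamiliar with the pattern, here is a rough userspace analogue of "zero, then free". The helper names sketch_secure_zero and sketch_zfree are hypothetical. Unlike kzfree(), which determines the allocation size itself, this sketch takes the length as an argument.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Overwrite a buffer through a volatile pointer so the stores are not
 * optimized away, even when the buffer is released right afterwards. */
static void sketch_secure_zero(void *p, size_t len)
{
	volatile unsigned char *b = p;

	while (len--)
		*b++ = 0;
}

/* Hypothetical analogue of "memset() + free()" folded into one helper.
 * A NULL pointer is accepted and ignored, as with free(). */
static void sketch_zfree(void *p, size_t len)
{
	if (!p)
		return;
	sketch_secure_zero(p, len);
	free(p);
}
```

Folding both steps into one helper means a caller cannot free sensitive data and forget to clear it first.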

27 Aug, 2019 7 commits

xprtrdma: Send Queue size grows after a reconnect · 98ef77d1

Committed by Chuck Lever on Aug 26, 2019

Eli Dorfman reports that after a series of idle disconnects, an
RPC/RDMA transport becomes unusable (rdma_create_qp returns
-ENOMEM). Problem was tracked down to increasing Send Queue size
after each reconnect.

The rdma_create_qp() API does not promise to leave its @qp_init_attr
parameter unaltered. In fact, some drivers do modify one or more of
its fields. Thus our calls to rdma_create_qp must use a fresh copy
of ib_qp_init_attr each time.

This fix is appropriate for kernels dating back to late 2007, though
it will have to be adapted, as the connect code has changed over the
years.
Reported-by: Eli Dorfman <eli@vastdata.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
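
The failure mode and the fix can be sketched in userspace. Everything below is hypothetical: struct sketch_qp_init_attr stands in for ib_qp_init_attr, and sketch_create_qp imitates a driver that modifies the attributes it is handed (the doubling is invented for illustration). The point is only that each connect works on a private copy.

```c
#include <assert.h>

/* Hypothetical stand-in for struct ib_qp_init_attr. */
struct sketch_qp_init_attr {
	unsigned int max_send_wr;
};

/* Hypothetical stand-in for rdma_create_qp(): like some real drivers, it
 * changes the caller's structure in place (here, by inflating the size). */
static int sketch_create_qp(struct sketch_qp_init_attr *attr)
{
	attr->max_send_wr *= 2;
	return 0;
}

/* Connect using a fresh copy, so the saved template is never altered and
 * the requested Send Queue size does not grow across reconnects. */
static int sketch_connect(const struct sketch_qp_init_attr *tmpl,
			  unsigned int *granted)
{
	struct sketch_qp_init_attr attr = *tmpl;
	int rc = sketch_create_qp(&attr);

	if (rc == 0)
		*granted = attr.max_send_wr;
	return rc;
}
```

Had sketch_connect passed the template itself, five reconnects would have multiplied the requested size by 32, which mirrors the growth that eventually produced -ENOMEM in the report.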

xprtrdma: Clear xprt->reestablish_timeout on close · f9e1afe0

Committed by Chuck Lever on Aug 26, 2019

Ensure that the re-establishment delay does not grow exponentially
on each good reconnect. This probably should have been part of
commit 675dd90a ("xprtrdma: Modernize ops->connect").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
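
A small sketch of the behavior this message is after, under assumed values: the delay doubles on each attempt up to a cap, and closing the transport clears it. The constants and the function names are hypothetical and do not reflect the kernel's actual timeouts.

```c
#include <assert.h>

#define SKETCH_INIT_REEST_TO	5UL	/* hypothetical initial delay, seconds */
#define SKETCH_MAX_REEST_TO	30UL	/* hypothetical cap, seconds */

/* Hypothetical stand-in for the relevant part of struct rpc_xprt. */
struct sketch_xprt {
	unsigned long reestablish_timeout;
};

/* Return the delay to use for this connect attempt, then back off so the
 * following attempt waits longer. */
static unsigned long sketch_next_delay(struct sketch_xprt *xprt)
{
	unsigned long delay = xprt->reestablish_timeout;

	if (xprt->reestablish_timeout == 0) {
		xprt->reestablish_timeout = SKETCH_INIT_REEST_TO;
	} else {
		xprt->reestablish_timeout *= 2;
		if (xprt->reestablish_timeout > SKETCH_MAX_REEST_TO)
			xprt->reestablish_timeout = SKETCH_MAX_REEST_TO;
	}
	return delay;
}

/* Close: forget the accumulated back-off, so a later reconnect starts
 * again from no delay instead of from the previous, larger value. */
static void sketch_close(struct sketch_xprt *xprt)
{
	xprt->reestablish_timeout = 0;
}
```

Without the reset in sketch_close, every reconnect after an idle disconnect would inherit the back-off built up by earlier ones.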

SUNRPC: Handle connection breakages correctly in call_status() · c82e5472

Committed by Trond Myklebust on Aug 16, 2019

If the connection breaks while we're waiting for a reply from the
server, then we want to immediately try to reconnect.

Fixes: ec6017d9 ("SUNRPC fix regression in umount of a secure mount")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Revert "NFSv4/flexfiles: Abort I/O early if the layout segment was invalidated" · d5711920

Committed by Trond Myklebust on Aug 16, 2019

This reverts commit a79f194a.
The mechanism for aborting I/O is racy, since we are not guaranteed that
the request is asleep while we're changing both task->tk_status and
task->tk_action.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1

SUNRPC: Handle EADDRINUSE and ENOBUFS correctly · 80f455da

Committed by Trond Myklebust on Aug 15, 2019

If a connect or bind attempt returns EADDRINUSE, that means we want to
retry with a different port. It is not a fatal connection error.
Similarly, ENOBUFS is not fatal, but just indicates a memory allocation
issue. Retry after a short delay.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
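
The policy stated in this message can be sketched as a small classifier. The enum and function names are hypothetical, and treating every other error as fatal is an assumption made only to keep the sketch short; the kernel's status handlers distinguish many more cases.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical set of outcomes for a bind or connect attempt. */
enum sketch_action {
	SKETCH_DONE,		/* no error */
	SKETCH_RETRY_NEW_PORT,	/* retry with a different source port */
	SKETCH_RETRY_DELAYED,	/* wait briefly, then retry */
	SKETCH_FATAL		/* give up and report the error */
};

/* Map a negative errno-style status to the action described above. */
static enum sketch_action sketch_classify(int status)
{
	switch (status) {
	case 0:
		return SKETCH_DONE;
	case -EADDRINUSE:
		return SKETCH_RETRY_NEW_PORT;
	case -ENOBUFS:
		return SKETCH_RETRY_DELAYED;
	default:
		return SKETCH_FATAL;	/* assumption, see the note above */
	}
}
```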

SUNRPC: Don't handle errors if the bind/connect succeeded · bd736ed3

Committed by Trond Myklebust on Aug 15, 2019

Don't handle errors in call_bind_status()/call_connect_status()
if it turns out that a previous call caused it to succeed.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: stable@vger.kernel.org # v5.1+

xprtrdma: Recycle MRs after disconnect · ee2f412e

Committed by Chuck Lever on Aug 26, 2019

The optimization done in "xprtrdma: Simplify rpcrdma_mr_pop" was a
bit too optimistic. MRs left over after a reconnect still need to
be recycled, not added back to the free list, since they could be
in flight or actually fully registered.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

22 Aug, 2019 3 commits

xprtrdma: Optimize rpcrdma_post_recvs() · 435eba4a

Committed by Chuck Lever on Aug 19, 2019

Micro-optimization: In rpcrdma_post_recvs, since commit e340c2d6
("xprtrdma: Reduce the doorbell rate (Receive)"), the common case is
to return without doing anything. Found with perf.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Inline XDR chunk encoder functions · 1738de33

Committed by Chuck Lever on Aug 19, 2019

Micro-optimization: Save the cost of three function calls during
transport header encoding.

These were "noinline" before to generate more meaningful call stacks
during debugging, but this code is now pretty stable.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Fix bc_max_slots return value · 17d47f93

Committed by Chuck Lever on Aug 19, 2019

For the moment the returned value just happens to be correct because
the current backchannel server implementation does not vary the
number of credits it offers. The spec does permit this value to
change during the lifetime of a connection, however.

The actual maximum is fixed for all RPC/RDMA transports, because
each transport instance has to pre-allocate the resources for
processing BC requests. That's the value that should be returned.

Fixes: 7402a4fe ("SUNRPC: Fix up backchannel slot table ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

21 Aug, 2019 8 commits

xprtrdma: Clean up xprt_rdma_set_connect_timeout() · 2a7f77c7

Committed by Chuck Lever on Aug 19, 2019

Clean up: The function name should match the documenting comment.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Use an llist to manage free rpcrdma_reps · b0b227f0

Committed by Chuck Lever on Aug 19, 2019

rpcrdma_rep objects are removed from their free list by only a
single thread: the Receive completion handler. Thus that free list
can be converted to an llist, where a single-threaded consumer and
a multi-threaded producer (rpcrdma_buffer_put) can both access the
llist without the need for any serialization.

This eliminates spin lock contention between the Receive completion
handler and rpcrdma_buffer_get, and makes the rep consumer wait-
free.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
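
The single-consumer, multi-producer arrangement described here can be sketched with C11 atomics. This is not the kernel's llist implementation; the sketch_ names are hypothetical, and the code only illustrates why producers and one consumer can share the list head without a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node; id is only there to tell nodes apart. */
struct sketch_node {
	struct sketch_node *next;
	int id;
};

struct sketch_llist {
	_Atomic(struct sketch_node *) first;
};

static void sketch_llist_init(struct sketch_llist *head)
{
	atomic_init(&head->first, NULL);
}

/* Producer side: may run concurrently in any number of threads. The new
 * node is linked in front of the current head with a compare-and-swap. */
static void sketch_llist_add(struct sketch_llist *head, struct sketch_node *n)
{
	struct sketch_node *old =
		atomic_load_explicit(&head->first, memory_order_relaxed);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak_explicit(&head->first, &old, n,
							memory_order_release,
							memory_order_relaxed));
}

/* Consumer side: correct only when a single thread removes entries, since
 * then no other remover can take and recycle the node being examined. */
static struct sketch_node *sketch_llist_del_first(struct sketch_llist *head)
{
	struct sketch_node *old =
		atomic_load_explicit(&head->first, memory_order_acquire);

	while (old) {
		if (atomic_compare_exchange_weak_explicit(&head->first, &old,
							  old->next,
							  memory_order_acquire,
							  memory_order_acquire))
			return old;
	}
	return NULL;
}
```

Entries come back in last-in, first-out order, which is fine for a free list where any available object will do.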

xprtrdma: Remove rpcrdma_buffer::rb_mrlock · 4d6b8890

Committed by Chuck Lever on Aug 19, 2019

Clean up: Now that the free list is used sparingly, get rid of the
separate spin lock protecting it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Cache free MRs in each rpcrdma_req · 6dc6ec9e

Committed by Chuck Lever on Aug 19, 2019

Instead of a globally-contended MR free list, cache MRs in each
rpcrdma_req as they are released. This means acquiring and releasing
an MR will be lock-free in the common case, even outside the
transport send lock.

The original idea of per-rpcrdma_req MR free lists was suggested by
Shirley Ma <shirley.ma@oracle.com> several years ago. I just now
figured out how to make that idea work with on-demand MR allocation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
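
The per-request caching idea can be sketched as follows, assuming simple singly linked lists. All names are hypothetical stand-ins for rpcrdma_mr, rpcrdma_req, and rpcrdma_buffer, and the lock that would guard the shared pool in real code is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an rpcrdma_mr. */
struct sketch_mr {
	struct sketch_mr *next;
};

/* Shared pool; in real code access to this list would need a lock. */
struct sketch_buffer {
	struct sketch_mr *shared_free;
};

/* Per-request cache; only the owner of the request touches it. */
struct sketch_req {
	struct sketch_mr *cached_free;
};

static struct sketch_mr *sketch_pop(struct sketch_mr **list)
{
	struct sketch_mr *mr = *list;

	if (mr) {
		*list = mr->next;
		mr->next = NULL;
	}
	return mr;
}

static void sketch_push(struct sketch_mr **list, struct sketch_mr *mr)
{
	assert(mr != NULL);
	mr->next = *list;
	*list = mr;
}

/* Fast path: reuse an MR cached on this request, touching no shared
 * state. Slow path: fall back to the shared pool. */
static struct sketch_mr *sketch_mr_get(struct sketch_req *req,
				       struct sketch_buffer *buf)
{
	struct sketch_mr *mr = sketch_pop(&req->cached_free);

	if (!mr)
		mr = sketch_pop(&buf->shared_free);
	return mr;
}

/* Release: park the MR on the request for its next use. */
static void sketch_mr_put(struct sketch_req *req, struct sketch_mr *mr)
{
	sketch_push(&req->cached_free, mr);
}
```

Once a request has cached an MR, getting and putting it again never reaches the shared pool, which is the lock-free common case the message refers to.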

xprtrdma: Ensure creating an MR does not trigger FS writeback · 805a1f62

Committed by Chuck Lever on Aug 19, 2019

Probably would be good to also pass GFP flags to ib_alloc_mr.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Move rpcrdma_mr_get out of frwr_map · 3b39f52a

Committed by Chuck Lever on Aug 19, 2019

Refactor: Retrieve an MR and handle error recovery entirely in
rpc_rdma.c, as this is not a device-specific function.

Note that since commit 89f90fe1 ("SUNRPC: Allow calls to
xprt_transmit() to drain the entire transmit queue"), the
xprt_transmit function handles the cond_resched. The transport no
longer has to do this itself.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Combine rpcrdma_mr_put and rpcrdma_mr_unmap_and_put · 1ca3f4c0

Committed by Chuck Lever on Aug 19, 2019

Clean up. There is only one remaining rpcrdma_mr_put call site, and
it can be directly replaced with unmap_and_put because mr->mr_dir is
set to DMA_NONE just before the call.

Now all the call sites do a DMA unmap, and we can just rename
mr_unmap_and_put to mr_put, which nicely matches mr_get.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Simplify rpcrdma_mr_pop · 265a38d4

Committed by Chuck Lever on Aug 19, 2019

Clean up: rpcrdma_mr_pop call sites check if the list is empty
first. Let's replace the list_empty with less costly logic.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

openeuler / Kernel: last synced successfully nearly 2 years ago
