- 24 October 2019, 4 commits
-
-
Committed by Chuck Lever
Clean up: This field is not needed in the Send completion handler, so it can be moved to struct rpcrdma_req to reduce the size of struct rpcrdma_sendctx, and to reduce the amount of memory that is sloshed between the sending process and the Send completion process. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
ia->ri_id is replaced during a reconnect. The connect_worker runs with the transport send lock held to prevent ri_id from being dereferenced by the send_request path during this process. Currently, however, there is no guarantee that ia->ri_id is stable in the MR recycling worker, which operates in the background and is not serialized with the connect_worker in any way. But now that Local_Inv completions are being done in process context, we can handle the recycling operation there instead of deferring the recycling work to another process. Because the disconnect path drains all work before allowing tear down to proceed, it is guaranteed that Local Invalidations complete only while the ri_id pointer is stable. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
MRs are now allocated on demand so we can safely throw them away on disconnect. This way an idle transport can disconnect and it won't pin hardware MR resources. Two additional changes:
- Now that all MRs are destroyed on disconnect, there's no need to check during header marshaling if a req has MRs to recycle. Each req is sent only once per connection, and now rl_registered is guaranteed to be empty when rpcrdma_marshal_req is invoked.
- Because MRs are now destroyed in a WQ_MEM_RECLAIM context, they also must be allocated in a WQ_MEM_RECLAIM context. This reduces the likelihood that device driver memory allocation will trigger memory reclaim during NFS writeback.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
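A minimal sketch of the two supporting pieces mentioned above, under assumed names (the workqueue name, helper names, and the GFP choice are illustrative and not taken from the patch itself): a WQ_MEM_RECLAIM workqueue for MR creation and teardown, plus an allocation that avoids recursing into filesystem writeback.

```c
#include <linux/workqueue.h>
#include <linux/slab.h>

/* A workqueue whose work items may run while the system is reclaiming
 * memory must be created with WQ_MEM_RECLAIM so it has a rescuer thread.
 * The name and variable below are illustrative only.
 */
static struct workqueue_struct *xprtrdma_mr_wq;

static int mr_wq_create(void)
{
	xprtrdma_mr_wq = alloc_workqueue("xprtrdma_mr", WQ_MEM_RECLAIM, 0);
	return xprtrdma_mr_wq ? 0 : -ENOMEM;
}

/* MR metadata allocated from that context should use a GFP mode that
 * cannot recurse into filesystem (NFS) writeback, e.g. GFP_NOFS.
 */
static void *mr_metadata_alloc(size_t size)
{
	return kzalloc(size, GFP_NOFS);
}
```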
-
Committed by Chuck Lever
When adding frwr_unmap_async way back when, I re-used the existing trace_xprtrdma_post_send() trace point to record the return code of ib_post_send. Unfortunately there are some cases where re-using that trace point causes a crash. Instead, construct a trace point specific to posting Local Invalidate WRs that will always be safe to use in that context, and will act as a trace log eye-catcher for Local Invalidation. Fixes: 84756894 ("xprtrdma: Remove fr_state") Fixes: d8099fed ("xprtrdma: Reduce context switching due ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Bill Baker <bill.baker@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 27 August 2019, 1 commit
-
-
Committed by Chuck Lever
The optimization done in "xprtrdma: Simplify rpcrdma_mr_pop" was a bit too optimistic. MRs left over after a reconnect still need to be recycled, not added back to the free list, since they could be in flight or actually fully registered. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 21 August 2019, 6 commits
-
-
Committed by Chuck Lever
Clean up: Now that the free list is used sparingly, get rid of the separate spin lock protecting it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Instead of a globally-contended MR free list, cache MRs in each rpcrdma_req as they are released. This means acquiring and releasing an MR will be lock-free in the common case, even outside the transport send lock. The original idea of per-rpcrdma_req MR free lists was suggested by Shirley Ma <shirley.ma@oracle.com> several years ago. I just now figured out how to make that idea work with on-demand MR allocation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
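A sketch of the per-request MR cache idea, with hypothetical structure and field names (rl_free_mrs here is only an illustration): because only the sender touches the request's own list, no lock is needed in the common case.

```c
#include <linux/list.h>

/* Illustrative structures: each request owns a small cache of MRs. */
struct mr_entry {
	struct list_head	mr_list;
	/* ... device MR state elided ... */
};

struct req_sketch {
	struct list_head	rl_free_mrs;	/* touched only by the sender */
};

/* Acquire: request-local cache first; a slower shared pool (not shown)
 * is consulted only when this list is empty.
 */
static struct mr_entry *mr_cache_get(struct req_sketch *req)
{
	struct mr_entry *mr;

	mr = list_first_entry_or_null(&req->rl_free_mrs,
				      struct mr_entry, mr_list);
	if (mr)
		list_del_init(&mr->mr_list);
	return mr;
}

/* Release: hand the MR back to the request that used it; no lock taken. */
static void mr_cache_put(struct req_sketch *req, struct mr_entry *mr)
{
	list_add(&mr->mr_list, &req->rl_free_mrs);
}
```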
-
Committed by Chuck Lever
Probably would be good to also pass GFP flags to ib_alloc_mr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Refactor: Retrieve an MR and handle error recovery entirely in rpc_rdma.c, as this is not a device-specific function. Note that since commit 89f90fe1 ("SUNRPC: Allow calls to xprt_transmit() to drain the entire transmit queue"), the xprt_transmit function handles the cond_resched. The transport no longer has to do this itself. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Clean up. There is only one remaining rpcrdma_mr_put call site, and it can be directly replaced with unmap_and_put because mr->mr_dir is set to DMA_NONE just before the call. Now all the call sites do a DMA unmap, and we can just rename mr_unmap_and_put to mr_put, which nicely matches mr_get. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
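A hedged sketch of what the unified put path looks like once every caller DMA-unmaps; the structure and field names below are illustrative, not the upstream ones. MRs that were never mapped carry DMA_NONE, so the unmap is naturally skipped for them.

```c
#include <rdma/ib_verbs.h>
#include <linux/scatterlist.h>
#include <linux/list.h>

/* Hypothetical MR bookkeeping; field names are illustrative only. */
struct mr_put_sketch {
	struct list_head	mr_list;
	struct ib_device	*mr_device;
	struct scatterlist	*mr_sg;
	int			mr_nents;
	enum dma_data_direction	mr_dir;
};

/* All remaining call sites DMA-unmap, so the unmap lives inside the
 * single put routine; DMA_NONE marks an MR with no active mapping.
 */
static void mr_put(struct mr_put_sketch *mr, struct list_head *free_list)
{
	if (mr->mr_dir != DMA_NONE) {
		ib_dma_unmap_sg(mr->mr_device, mr->mr_sg,
				mr->mr_nents, mr->mr_dir);
		mr->mr_dir = DMA_NONE;
	}
	list_add(&mr->mr_list, free_list);
}
```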
-
Committed by Chuck Lever
Clean up: rpcrdma_mr_pop call sites check if the list is empty first. Let's replace the list_empty with less costly logic. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 20 August 2019, 2 commits
-
-
Committed by Chuck Lever
Commit 302d3deb ("xprtrdma: Prevent inline overflow") added this calculation back in 2016, but got it wrong. I tested only the lower bound, which is why there is a max_t there. The upper bound should be rounded up too. Now, when using DIV_ROUND_UP, that takes care of the lower bound as well. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
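For illustration only (the payload and segment sizes are made up, not the values from the patch), this is the difference between plain integer division and DIV_ROUND_UP when sizing a segment count:

```c
#include <linux/kernel.h>

/* Illustrative sizing: how many RDMA segments cover "nbytes" of payload
 * when one segment conveys at most "seg_bytes".
 *
 *   nbytes / seg_bytes              rounds down: 1000 / 4096 == 0, which
 *                                   is why the old code needed a
 *                                   max_t(, 1, ...) workaround;
 *   DIV_ROUND_UP(nbytes, seg_bytes) rounds up: 1000 -> 1, 4097 -> 2,
 *                                   covering both the lower and the
 *                                   upper bound at once.
 */
static unsigned int segments_needed(unsigned int nbytes, unsigned int seg_bytes)
{
	return DIV_ROUND_UP(nbytes, seg_bytes);
}
```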
-
Committed by Chuck Lever
Things have changed since this comment was written. In particular, the reworking of connection closing, on-demand creation of MRs, and the removal of fr_state all mean that deferring MR recovery to frwr_map is no longer needed. The description is obsolete. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 09 July 2019, 4 commits
-
-
Committed by Chuck Lever
Since commit ba69cd12 ("xprtrdma: Remove support for FMR memory registration"), FRWR is the only supported memory registration mode. We can take advantage of the asynchronous nature of FRWR's LOCAL_INV Work Requests to get rid of the completion wait by having the LOCAL_INV completion handler take care of DMA unmapping MRs and waking the upper layer RPC waiter. This eliminates two context switches when local invalidation is necessary. As a side benefit, we will no longer need the per-xprt deferred completion work queue. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
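A rough sketch, under assumed names, of how a LOCAL_INV completion handler running in process context can do the unmap and the wake itself; the wake callback below stands in for whatever the transport actually uses to resume the waiting RPC, and the structure is hypothetical.

```c
#include <rdma/ib_verbs.h>
#include <linux/scatterlist.h>

/* Hypothetical async invalidation context: the final LOCAL_INV WR
 * carries an ib_cqe, and its ->done callback (running in process
 * context on an IB_POLL_WORKQUEUE CQ) both DMA-unmaps the MRs and
 * wakes the waiting RPC, removing the synchronous completion wait.
 */
struct frwr_sketch {
	struct ib_cqe		fr_cqe;
	struct ib_device	*fr_device;
	struct scatterlist	*fr_sg;
	int			fr_nents;
	enum dma_data_direction	fr_dir;
	void			(*fr_wake)(void *data);	/* wakes the RPC */
	void			*fr_wake_data;
};

static void frwr_localinv_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct frwr_sketch *frwr =
		container_of(wc->wr_cqe, struct frwr_sketch, fr_cqe);

	if (wc->status != IB_WC_SUCCESS)
		pr_warn("LOCAL_INV flushed, status %d\n", wc->status);

	ib_dma_unmap_sg(frwr->fr_device, frwr->fr_sg,
			frwr->fr_nents, frwr->fr_dir);
	frwr->fr_wake(frwr->fr_wake_data);
}

/* Posting side (sketch): set frwr->fr_cqe.done = frwr_localinv_done and
 * point the LOCAL_INV wr.wr_cqe at &frwr->fr_cqe before ib_post_send().
 */
```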
-
Committed by Chuck Lever
When a marshal operation fails, any MRs that were already set up for that request are recycled. Recycling releases MRs and creates new ones, which is expensive. Since commit f2877623 ("xprtrdma: Chain Send to FastReg WRs") was merged, recycling FRWRs is unnecessary. This is because before that commit, frwr_map had already posted FAST_REG Work Requests, so ownership of the MRs had already been passed to the NIC and thus dealing with them had to be delayed until they completed. Since that commit, however, FAST_REG WRs are posted at the same time as the Send WR. This means that if marshaling fails, we are certain the MRs are safe to simply unmap and place back on the free list because neither the Send nor the FAST_REG WRs have been posted yet. The kernel still has ownership of the MRs at this point. This reduces the total number of MRs that the xprt has to create under heavy workloads and makes the marshaling logic less brittle. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Now that both the Send and Receive completions are handled in process context, it is safe to DMA unmap and return MRs to the free or recycle lists directly in the completion handlers. Doing this means rpcrdma_frwr no longer needs to track the state of each MR, meaning that a VALID or FLUSHED MR can no longer appear on an xprt's MR free list. Thus there is no longer a need to track the MR's registration state in rpcrdma_frwr. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Under high I/O workloads, I've noticed that an RPC/RDMA transport occasionally deadlocks (IOPS goes to zero, and doesn't recover). Diagnosis shows that the sendctx queue is empty, but when sendctxs are returned to the queue, the xprt_write_space wake-up never occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy. I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented via an atomic bit. Just one of those is sufficient. Removing EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock un-reproducible. Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and is therefore removed. Unfortunately this patch does not apply cleanly to stable. If needed, someone will have to port it and test it. Fixes: 2fad6592 ("xprtrdma: Wait on empty sendctx queue") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
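A sketch of the single-bit scheme, assuming the generic XPRT_WRITE_SPACE flag in xprt->state; the helper names are illustrative and the exact upstream call sites differ.

```c
#include <linux/sunrpc/xprt.h>

/* Sender path: the sendctx queue is empty, so mark the transport as
 * waiting for write space before giving up this transmit attempt.
 */
static int sendctx_queue_empty(struct rpc_xprt *xprt)
{
	set_bit(XPRT_WRITE_SPACE, &xprt->state);
	return -ENOBUFS;
}

/* Completion path: a sendctx has been returned to the queue; wake the
 * write-space waiters exactly once, since the atomic test-and-clear
 * means only one returner performs the wake-up.
 */
static void sendctx_returned(struct rpc_xprt *xprt)
{
	if (test_and_clear_bit(XPRT_WRITE_SPACE, &xprt->state))
		xprt_write_space(xprt);
}
```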
-
- 26 April 2019, 4 commits
-
-
Committed by Chuck Lever
Clean up: rely on the trace points instead. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Clean up. Move the remaining field in rpcrdma_create_data_internal so the structure can be removed. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Clean up. Since commit 54cbd6b0 ("xprtrdma: Delay DMA mapping Send and Receive buffers"), a pointer to the device is now saved in each regbuf when it is DMA mapped. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
After a DMA map failure in frwr_map, mark the MR so that recycling won't attempt to DMA unmap it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Fixes: e2f34e26 ("xprtrdma: Yet another double DMA-unmap") Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
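An illustrative failure-path sketch (names are hypothetical): record DMA_NONE when ib_dma_map_sg() fails, so that a later recycle pass does not try to unmap an MR that was never mapped.

```c
#include <rdma/ib_verbs.h>
#include <linux/scatterlist.h>

/* Failure-path sketch: on ib_dma_map_sg() failure the MR holds no DMA
 * mapping, so its direction field is set to DMA_NONE; the recycler
 * checks that field and skips ib_dma_unmap_sg() for such MRs.
 */
static int map_mr_segments(struct ib_device *device,
			   struct scatterlist *sg, int sgcount,
			   enum dma_data_direction dir,
			   enum dma_data_direction *mr_dir, int *mr_nents)
{
	*mr_nents = ib_dma_map_sg(device, sg, sgcount, dir);
	if (!*mr_nents) {
		*mr_dir = DMA_NONE;	/* tell the recycler: nothing to unmap */
		return -EIO;
	}
	*mr_dir = dir;
	return 0;
}
```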
-
- 13 February 2019, 1 commit
-
-
Committed by Chuck Lever
linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: warning: incorrect type in argument 5 (different base types)
linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63:    expected unsigned int [usertype] xid
linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63:    got restricted __be32 [usertype] rq_xid
linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: warning: incorrect type in argument 5 (different base types)
linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62:    expected unsigned int [usertype] xid
linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62:    got restricted __be32 [usertype] rq_xid
linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: warning: incorrect type in argument 5 (different base types)
linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62:    expected unsigned int [usertype] xid
linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62:    got restricted __be32 [usertype] rq_xid
Fixes: 0a93fbcb ("xprtrdma: Plant XID in on-the-wire RDMA ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
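These sparse warnings arise when a wire-order __be32 is passed where the trace point expects a host-order u32; the usual fix is an explicit be32_to_cpu() at the call site, as in this illustrative (non-upstream) snippet where the trace helper is only a stand-in.

```c
#include <linux/types.h>
#include <linux/sunrpc/xprt.h>

/* Stand-in for a trace event that declares its xid argument as u32. */
static void trace_chunk_stub(u32 xid, unsigned int nsegs)
{
	pr_debug("xid=0x%08x nsegs=%u\n", xid, nsegs);
}

static void record_chunk(const struct rpc_rqst *rqst, unsigned int nsegs)
{
	/* was: trace_chunk_stub(rqst->rq_xid, nsegs);  <-- sparse warning,
	 * because rq_xid is __be32 (wire order), not u32 (host order).
	 */
	trace_chunk_stub(be32_to_cpu(rqst->rq_xid), nsegs);
}
```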
-
- 03 January 2019, 10 commits
-
-
Committed by Chuck Lever
Defensive clean up. Don't set frwr->fr_mr until we know that the scatterlist allocation has succeeded. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
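A sketch of the defensive ordering, with hypothetical function and output-parameter names: the MR pointer is published only after both the MR and its scatterlist exist, so an allocation failure cannot leave a half-initialized frwr behind.

```c
#include <rdma/ib_verbs.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <linux/err.h>

static int frwr_mr_init_sketch(struct ib_pd *pd, int depth,
			       struct ib_mr **fr_mr,
			       struct scatterlist **fr_sg)
{
	struct ib_mr *mr;
	struct scatterlist *sg;

	mr = ib_alloc_mr(pd, IB_MR_TYPE_MEM_REG, depth);
	if (IS_ERR(mr))
		return PTR_ERR(mr);

	sg = kcalloc(depth, sizeof(*sg), GFP_NOFS);
	if (!sg) {
		ib_dereg_mr(mr);	/* nothing else has seen it yet */
		return -ENOMEM;
	}
	sg_init_table(sg, depth);

	*fr_sg = sg;
	*fr_mr = mr;		/* publish only after both allocations succeed */
	return 0;
}
```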
-
Committed by Chuck Lever
Commit f2877623 ("xprtrdma: Chain Send to FastReg WRs") was written before commit ce5b3717 ("xprtrdma: Replace all usage of "frmr" with "frwr""), but was merged afterwards. Thus it still refers to FRMR and MWs. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
These are rare, but can be helpful at tracking down DMAR and other problems. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
The mr_map trace points were capturing information about the previous use of the MR rather than about the segment that was just mapped. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Place the associated RPC transaction's XID in the upper 32 bits of each RDMA segment's rdma_offset field. There are two reasons to do this:
- The R_key only has 8 bits that are different from registration to registration. The XID adds more uniqueness to each RDMA segment to reduce the likelihood of a software bug on the server reading from or writing into memory it's not supposed to.
- On-the-wire RDMA Read and Write requests do not otherwise carry any identifier that matches them up to an RPC. The XID in the upper 32 bits will act as an eye-catcher in network captures.
Suggested-by: Tom Talpey <ttalpey@microsoft.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
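A small illustration (not the upstream code) of how the XID can be folded into the high half of a 64-bit segment offset: only the low 32 bits of the original offset are kept, and the host-order XID occupies the high 32 bits.

```c
#include <linux/types.h>

/* Sketch: plant the RPC XID in the upper 32 bits of the segment offset
 * (the registered MR's iova), keeping the low 32 bits of the original
 * offset. The XID then shows up as an eye-catcher in wire captures.
 */
static u64 plant_xid(__be32 rq_xid, u64 mr_iova)
{
	u64 offset = mr_iova & 0x00000000ffffffffULL;

	return offset | ((u64)be32_to_cpu(rq_xid) << 32);
}
```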
-
Committed by Chuck Lever
Clean up: Now that there is only FRWR, there is no need for a memory registration switch. The indirect calls to the memreg operations can be replaced with faster direct calls. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Some devices advertise a large max_fast_reg_page_list_len capability, but perform optimally when MRs are significantly smaller than that depth -- probably when the MR itself is no larger than a page. By default, the RDMA R/W core API uses max_sge_rd as the maximum page depth for MRs. For some devices, the value of max_sge_rd is 1, which is also not optimal. Thus, when max_sge_rd is larger than 1, use that value. Otherwise use the value of the max_fast_reg_page_list_len attribute. I've tested this with CX-3 Pro, FastLinq, and CX-5 devices. It reproducibly improves the throughput of large I/Os by several percent. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
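The selection logic reduces to a simple preference, sketched here against the standard ib_device_attr fields (the helper name is illustrative):

```c
#include <rdma/ib_verbs.h>

/* Prefer the device's max_sge_rd as the per-MR page depth (as the RDMA
 * R/W core does), but fall back to max_fast_reg_page_list_len when the
 * device reports only 1 there.
 */
static unsigned int frwr_page_depth(const struct ib_device_attr *attrs)
{
	if (attrs->max_sge_rd > 1)
		return attrs->max_sge_rd;
	return attrs->max_fast_reg_page_list_len;
}
```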
-
Committed by Chuck Lever
With certain combinations of krb5i/p, MR size, and r/wsize, I/O can fail with EMSGSIZE. This is because the calculated value of ri_max_segs (the max number of MRs per RPC) exceeded RPCRDMA_MAX_HDR_SEGS, which caused Read or Write list encoding to walk off the end of the transport header. Once that was addressed, the ro_maxpages result has to be corrected to account for the number of MRs needed for Reply chunks, which is 2 MRs smaller than a normal Read or Write chunk. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
The recovery case in frwr_op_unmap_sync needs to DMA unmap each MR. frwr_release_mr does not DMA-unmap, but the recycle worker does. Fixes: 61da886b ("xprtrdma: Explicitly resetting MRs is ... ") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
While chasing yet another set of DMAR fault reports, I noticed that the frwr recycler conflates whether or not an MR has been DMA unmapped with frwr->fr_state. Actually the two have only an indirect relationship. It's in fact impossible to guess reliably whether the MR has been DMA unmapped based on its fr_state field, especially as the surrounding code and its assumptions have changed over time. A better approach is to track the DMA mapping status explicitly so that the recycler is less brittle to unexpected situations, and attempts to DMA-unmap a second time are prevented. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org # v4.20 Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 03 October 2018, 3 commits
-
-
Committed by Chuck Lever
Clean up the names of trace events related to MRs so that it's easy to enable these with a glob. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
When a memory operation fails, the MR's driver state might not match its hardware state. The only reliable recourse is to dereg the MR. This is done in ->ro_recover_mr, which then attempts to allocate a fresh MR to replace the released MR. Since commit e2ac236c ("xprtrdma: Allocate MRs on demand"), xprtrdma dynamically allocates MRs. It can add more MRs whenever they are needed. That makes it possible to simply release an MR when a memory operation fails, instead of "recovering" it. It will automatically be replaced by the on-demand MR allocator. This commit is a little larger than I wanted, but it replaces ->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the rb_stale_mrs list with a generic work queue. Since MRs are no longer orphaned, the mrs_orphaned metric is no longer used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Committed by Chuck Lever
Some devices require more than 3 MRs to build a single 1MB I/O. Ensure that rpcrdma_mrs_create() will add enough MRs to build that I/O. In a subsequent patch I'm changing the MR recovery logic to just toss out the MRs. In that case it's possible for ->send_request to loop acquiring some MRs, not getting enough, getting called again, recycling the previous MRs, then not getting enough, lather rinse repeat. Thus first we need to ensure enough MRs are created to prevent that loop. I'm "reusing" ia->ri_max_segs. All of its accessors seem to want the maximum number of data segments plus two, so I'm going to bake that into the initial calculation. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 31 July 2018, 1 commit
-
-
Committed by Bart Van Assche
Since neither ib_post_send() nor ib_post_recv() modifies the data structure their second argument points at, declare that argument const. This change makes it necessary to declare the 'bad_wr' argument const too and also to modify all ULPs that call ib_post_send(), ib_post_recv() or ib_post_srq_recv(). This patch does not change any functionality but makes it possible for the compiler to verify whether the ib_post_(send|recv|srq_recv) really do not modify the posted work request. To make this possible, only one cast had to be introduced that casts away constness, namely in rpcrdma_post_recvs(). The only way I can think of to avoid that cast is to introduce an additional loop in that function or to change the data type of bad_wr from struct ib_recv_wr ** into int (an index that refers to an element in the work request list). However, both approaches would require even more extensive changes than this patch. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
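A ULP call-site sketch under the const-ified signatures described above; the wrapper function is hypothetical, and the point is simply that the local bad_wr must now also be a pointer to const.

```c
#include <rdma/ib_verbs.h>

/* After the change, ib_post_send() takes const struct ib_send_wr * and
 * const struct ib_send_wr **, so ULP-side locals follow suit.
 */
static int post_one_send(struct ib_qp *qp, const struct ib_send_wr *wr)
{
	const struct ib_send_wr *bad_wr;

	return ib_post_send(qp, wr, &bad_wr);
}
```

This is what lets the compiler catch any accidental modification of a posted work request anywhere along the ib_post_(send|recv|srq_recv) paths.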
-
- 25 July 2018, 1 commit
-
-
Committed by Bart Van Assche
Instead of declaring and passing a dummy 'bad_wr' pointer, pass NULL as third argument to ib_post_(send|recv|srq_recv)(). Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Anna Schumaker <Anna.Schumaker@netapp.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
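A hypothetical call-site sketch showing the dummy pointer being dropped once NULL is accepted for the bad_wr argument:

```c
#include <rdma/ib_verbs.h>

static int post_one_recv(struct ib_qp *qp, const struct ib_recv_wr *wr)
{
	/* before: const struct ib_recv_wr *bad_wr;
	 *         return ib_post_recv(qp, wr, &bad_wr);
	 */
	return ib_post_recv(qp, wr, NULL);
}
```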
-
- 02 June 2018, 1 commit
-
-
Committed by Chuck Lever
Matches trace_xprtrdma_dma_unmap(mr). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 12 May 2018, 2 commits
-
-
Committed by Chuck Lever
This includes:
* Posting on the Send and Receive queues
* Send, Receive, Read, and Write completion
* Connect upcalls
* QP errors
Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
Clean up: Move #include <trace/events/rpcrdma.h> into source files, similar to how it is done with trace/events/sunrpc.h. Server-side trace points will be part of the rpcrdma subsystem, just like the client-side trace points. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-