提交 · 1738de336ebc9f8d4bb1b3126c5417a5923a660a · openeuler / Kernel

21 8月, 2019 9 次提交

xprtrdma: Use an llist to manage free rpcrdma_reps · b0b227f0

由 Chuck Lever 提交于 8月 19, 2019

rpcrdma_rep objects are removed from their free list by only a
single thread: the Receive completion handler. Thus that free list
can be converted to an llist, where a single-threaded consumer and
a multi-threaded producer (rpcrdma_buffer_put) can both access the
llist without the need for any serialization.

This eliminates spin lock contention between the Receive completion
handler and rpcrdma_buffer_get, and makes the rep consumer wait-
free.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b0b227f0

xprtrdma: Remove rpcrdma_buffer::rb_mrlock · 4d6b8890

由 Chuck Lever 提交于 8月 19, 2019

Clean up: Now that the free list is used sparingly, get rid of the
separate spin lock protecting it.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4d6b8890

xprtrdma: Cache free MRs in each rpcrdma_req · 6dc6ec9e

由 Chuck Lever 提交于 8月 19, 2019

Instead of a globally-contended MR free list, cache MRs in each
rpcrdma_req as they are released. This means acquiring and releasing
an MR will be lock-free in the common case, even outside the
transport send lock.

The original idea of per-rpcrdma_req MR free lists was suggested by
Shirley Ma <shirley.ma@oracle.com> several years ago. I just now
figured out how to make that idea work with on-demand MR allocation.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6dc6ec9e

xprtrdma: Ensure creating an MR does not trigger FS writeback · 805a1f62

由 Chuck Lever 提交于 8月 19, 2019

Probably would be good to also pass GFP flags to ib_alloc_mr.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

805a1f62

xprtrdma: Move rpcrdma_mr_get out of frwr_map · 3b39f52a

由 Chuck Lever 提交于 8月 19, 2019

Refactor: Retrieve an MR and handle error recovery entirely in
rpc_rdma.c, as this is not a device-specific function.

Note that since commit 89f90fe1 ("SUNRPC: Allow calls to
xprt_transmit() to drain the entire transmit queue"), the
xprt_transmit function handles the cond_resched. The transport no
longer has to do this itself.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3b39f52a

xprtrdma: Combine rpcrdma_mr_put and rpcrdma_mr_unmap_and_put · 1ca3f4c0

由 Chuck Lever 提交于 8月 19, 2019

Clean up. There is only one remaining rpcrdma_mr_put call site, and
it can be directly replaced with unmap_and_put because mr->mr_dir is
set to DMA_NONE just before the call.

Now all the call sites do a DMA unmap, and we can just rename
mr_unmap_and_put to mr_put, which nicely matches mr_get.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1ca3f4c0

xprtrdma: Simplify rpcrdma_mr_pop · 265a38d4

由 Chuck Lever 提交于 8月 19, 2019

Clean up: rpcrdma_mr_pop call sites check if the list is empty
first. Let's replace the list_empty with less costly logic.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

265a38d4

xprtrdma: Rename rpcrdma_buffer::rb_all · eed48a9c

由 Chuck Lever 提交于 8月 19, 2019

Clean up: There are other "all" list heads. For code clarity
distinguish this one as for use only for MRs by renaming it.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eed48a9c

xprtrdma: Rename CQE field in Receive trace points · 2dfdcd88

由 Chuck Lever 提交于 8月 19, 2019

Make the field name the same for all trace points that handle
pointers to struct rpcrdma_rep. That makes it easy to grep for
matching rep points in trace output.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2dfdcd88

20 8月, 2019 1 次提交

xprtrdma: Boost maximum transport header size · f3c66a2f

由 Chuck Lever 提交于 8月 19, 2019

Although I haven't seen any performance results that justify it,
I've received several complaints that NFS/RDMA no longer supports
a maximum rsize and wsize of 1MB. These days it is somewhat smaller.

To simplify the logic that determines whether a chunk list is
necessary, the implementation uses a fixed maximum size of the
transport header. Currently that maximum size is 256 bytes, one
quarter of the default inline threshold size for RPC/RDMA v1.

Since commit a7886849 ("xprtrdma: Reduce max_frwr_depth"), the
size of chunks is also smaller to take advantage of inline page
lists in device internal MR data structures.

The combination of these two design choices has reduced the maximum
NFS rsize and wsize that can be used for most RNIC/HCAs. Increasing
the maximum transport header size and the maximum number of RDMA
segments it can contain increases the negotiated maximum rsize/wsize
on common RNIC/HCAs.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

f3c66a2f

09 7月, 2019 6 次提交

xprtrdma: Remove rpcrdma_req::rl_buffer · 5828ceba

由 Chuck Lever 提交于 6月 19, 2019

Clean up.

There is only one remaining function, rpcrdma_buffer_put(), that
uses this field. Its caller can supply a pointer to the correct
rpcrdma_buffer, enabling the removal of an 8-byte pointer field
from a frequently-allocated shared data structure.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5828ceba

xprtrdma: Streamline rpcrdma_post_recvs · 9ef33ef5

由 Chuck Lever 提交于 6月 19, 2019

rb_lock is contended between rpcrdma_buffer_create,
rpcrdma_buffer_put, and rpcrdma_post_recvs.

Commit e340c2d6 ("xprtrdma: Reduce the doorbell rate (Receive)")
causes rpcrdma_post_recvs to take the rb_lock repeatedly when it
determines more Receives are needed. Streamline this code path so
it takes the lock just once in most cases to build the Receive
chain that is about to be posted.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9ef33ef5

xprtrdma: Simplify rpcrdma_rep_create · 379d1bc5

由 Chuck Lever 提交于 6月 19, 2019

Clean up.

Commit 7c8d9e7c ("xprtrdma: Move Receive posting to Receive
handler") reduced the number of rpcrdma_rep_create call sites to
one. After that commit, the backchannel code no longer invokes it.

Therefore the free list logic added by commit d698c4a0
("xprtrdma: Fix backchannel allocation of extra rpcrdma_reps") is
no longer necessary, and in fact adds some extra overhead that we
can do without.

Simply post any newly created reps. They will get added back to
the rb_recv_bufs list when they subsequently complete.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

379d1bc5

xprtrdma: Wake RPCs directly in rpcrdma_wc_send path · 0ab11523

由 Chuck Lever 提交于 6月 19, 2019

Eliminate a context switch in the path that handles RPC wake-ups
when a Receive completion has to wait for a Send completion.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0ab11523

xprtrdma: Reduce context switching due to Local Invalidation · d8099fed

由 Chuck Lever 提交于 6月 19, 2019

Since commit ba69cd12 ("xprtrdma: Remove support for FMR memory
registration"), FRWR is the only supported memory registration mode.

We can take advantage of the asynchronous nature of FRWR's LOCAL_INV
Work Requests to get rid of the completion wait by having the
LOCAL_INV completion handler take care of DMA unmapping MRs and
waking the upper layer RPC waiter.

This eliminates two context switches when local invalidation is
necessary. As a side benefit, we will no longer need the per-xprt
deferred completion work queue.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d8099fed

xprtrdma: Fix occasional transport deadlock · 05eb06d8

由 Chuck Lever 提交于 6月 19, 2019

Under high I/O workloads, I've noticed that an RPC/RDMA transport
occasionally deadlocks (IOPS goes to zero, and doesn't recover).
Diagnosis shows that the sendctx queue is empty, but when sendctxs
are returned to the queue, the xprt_write_space wake-up never
occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy.

I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented
via an atomic bit. Just one of those is sufficient. Removing
EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock
un-reproducible.

Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and
is therefore removed.

Unfortunately this patch does not apply cleanly to stable. If
needed, someone will have to port it and test it.

Fixes: 2fad6592 ("xprtrdma: Wait on empty sendctx queue")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

05eb06d8

03 7月, 2019 1 次提交

xprtrdma: Fix use-after-free in rpcrdma_post_recvs · 2d0abe36

由 Chuck Lever 提交于 6月 19, 2019

Dereference wr->next /before/ the memory backing wr has been
released. This issue was found by code inspection. It is not
expected to be a significant problem because it is in an error
path that is almost never executed.

Fixes: 7c8d9e7c ("xprtrdma: Move Receive posting to ... ")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2d0abe36

28 5月, 2019 1 次提交

xprtrdma: Use struct_size() in kzalloc() · 66d4218f

由 Gustavo A. R. Silva 提交于 1月 30, 2019

One of the more common cases of allocation size calculations is finding
the size of a structure that has a zero-sized array at the end, along
with memory for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can
now use the new struct_size() helper:

instance = kzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.
Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

66d4218f

26 4月, 2019 13 次提交

xprtrdma: Update comments that reference ib_drain_qp · b8fe677f

由 Chuck Lever 提交于 4月 24, 2019

Commit e1ede312 ("xprtrdma: Fix helper that drains the
transport") replaced the ib_drain_qp() call, so update documenting
comments to reflect current operation.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b8fe677f

xprtrdma: Remove pr_err() call sites from completion handlers · 5f2311f5

由 Chuck Lever 提交于 4月 24, 2019

Clean up: rely on the trace points instead.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5f2311f5

xprtrdma: Eliminate struct rpcrdma_create_data_internal · 86c4ccd9

由 Chuck Lever 提交于 4月 24, 2019

Clean up.

Move the remaining field in rpcrdma_create_data_internal so the
structure can be removed.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

86c4ccd9

xprtrdma: Aggregate the inline settings in struct rpcrdma_ep · 94087e97

由 Chuck Lever 提交于 4月 24, 2019

Clean up.

The inline settings are actually a characteristic of the endpoint,
and not related to the device. They are also modified after the
transport instance is created, so they do not belong in the cdata
structure either.

Lastly, let's use names that are more natural to RDMA than to NFS:
inline_write -> inline_send and inline_read -> inline_recv. The
/proc files retain their names to avoid breaking user space.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

94087e97

xprtrdma: Eliminate rpcrdma_ia::ri_device · f19bd0bb

由 Chuck Lever 提交于 4月 24, 2019

Clean up.

Since commit 54cbd6b0 ("xprtrdma: Delay DMA mapping Send and
Receive buffers"), a pointer to the device is now saved in each
regbuf when it is DMA mapped.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

f19bd0bb

xprtrdma: More Send completion batching · c209e49c

由 Chuck Lever 提交于 4月 24, 2019

Instead of using a fixed number, allow the amount of Send completion
batching to vary based on the client's maximum credit limit.

- A larger default gives a small boost to IOPS throughput

- Reducing it based on max_requests gives a safe result when the
  max credit limit is cranked down (eg. when the device has a small
  max_qp_wr).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c209e49c

xprtrdma: Clean up sendctx functions · dbcc53a5

由 Chuck Lever 提交于 4月 24, 2019

Minor clean-ups I've stumbled on since sendctx was merged last year.
In particular, making Send completion processing more efficient
appears to have a measurable impact on IOPS throughput.

Note: test_and_clear_bit() returns a value, thus an explicit memory
barrier is not necessary.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

dbcc53a5

xprtrdma: Clean up regbuf helpers · d2832af3

由 Chuck Lever 提交于 4月 24, 2019

For code legibility, clean up the function names to be consistent
with the pattern: "rpcrdma" _ object-type _ action

Also rpcrdma_regbuf_alloc and rpcrdma_regbuf_free no longer have any
callers outside of verbs.c, and can thus be made static.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d2832af3

xprtrdma: De-duplicate "allocate new, free old regbuf" · 0f665ceb

由 Chuck Lever 提交于 4月 24, 2019

Clean up by providing an API to do this common task.

At this point, the difference between rpcrdma_get_sendbuf and
rpcrdma_get_recvbuf has become tiny. These can be collapsed into a
single helper.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0f665ceb

xprtrdma: Allocate req's regbufs at xprt create time · bb93a1ae

由 Chuck Lever 提交于 4月 24, 2019

Allocating an rpcrdma_req's regbufs at xprt create time enables
a pair of micro-optimizations:

First, if these regbufs are always there, we can eliminate two
conditional branches from the hot xprt_rdma_allocate path.

Second, by allocating a 1KB buffer, it places a lower bound on the
size of these buffers, without adding yet another conditional
branch. The lower bound reduces the number of hardway re-
allocations. In fact, for some workloads it completely eliminates
hardway allocations.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

bb93a1ae

xprtrdma: rpcrdma_regbuf alignment · 8cec3dba

由 Chuck Lever 提交于 4月 24, 2019

Allocate the struct rpcrdma_regbuf separately from the I/O buffer
to better guarantee the alignment of the I/O buffer and eliminate
the wasted space between the rpcrdma_regbuf metadata and the buffer
itself.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8cec3dba

xprtrdma: Clean up rpcrdma_create_rep() and rpcrdma_destroy_rep() · 23146500

由 Chuck Lever 提交于 4月 24, 2019

For code legibility, clean up the function names to be consistent
with the pattern: "rpcrdma" _ object-type _ action
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

23146500

xprtrdma: Clean up rpcrdma_create_req() · 1769e6a8

由 Chuck Lever 提交于 4月 24, 2019

Eventually, I'd like to invoke rpcrdma_create_req() during the
call_reserve step. Memory allocation there probably needs to use
GFP_NOIO. Therefore a set of GFP flags needs to be passed in.

As an additional clean up, just return a pointer or NULL, because
the only error return code here is -ENOMEM.

Lastly, clean up the function names to be consistent with the
pattern: "rpcrdma" _ object-type _ action
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1769e6a8

12 4月, 2019 1 次提交

xprtrdma: Fix helper that drains the transport · e1ede312

由 Chuck Lever 提交于 4月 09, 2019

We want to drain only the RQ first. Otherwise the transport can
deadlock on ->close if there are outstanding Send completions.

Fixes: 6d2d0ee2 ("xprtrdma: Replace rpcrdma_receive_wq ... ")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v5.0+
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

e1ede312

13 2月, 2019 2 次提交

xprtrdma: Reduce the doorbell rate (Receive) · e340c2d6

由 Chuck Lever 提交于 2月 11, 2019

Post RECV WRs in batches to reduce the hardware doorbell rate per
transport. This helps the RPC-over-RDMA client scale better in
number of transports.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e340c2d6

xprtrdma: Make sure Send CQ is allocated on an existing compvec · a4cb5bdb

由 Nicolas Morey-Chaisemartin 提交于 2月 05, 2019

Make sure the device has at least 2 completion vectors
before allocating to compvec#1

Fixes: a4699f56 (xprtrdma: Put Send CQ in IB_POLL_WORKQUEUE mode)
Signed-off-by: NNicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a4cb5bdb

09 1月, 2019 2 次提交

xprtrdma: Double free in rpcrdma_sendctxs_create() · 6e17f58c

由 Dan Carpenter 提交于 1月 05, 2019

The clean up is handled by the caller, rpcrdma_buffer_create(), so this
call to rpcrdma_sendctxs_destroy() leads to a double free.

Fixes: ae72950a ("xprtrdma: Add data structure to manage RDMA Send arguments")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6e17f58c

xprtrdma: Fix error code in rpcrdma_buffer_create() · 4429b668

由 Dan Carpenter 提交于 1月 05, 2019

This should return -ENOMEM if __alloc_workqueue_key() fails, but it
returns success.

Fixes: 6d2d0ee2 ("xprtrdma: Replace rpcrdma_receive_wq with a per-xprt workqueue")
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4429b668

03 1月, 2019 4 次提交

xprtrdma: Add documenting comment for rpcrdma_buffer_destroy · af65ed40

由 Chuck Lever 提交于 12月 19, 2018

Make a note of the function's dependency on an earlier ib_drain_qp.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

af65ed40

xprtrdma: Replace outdated comment for rpcrdma_ep_post · 995d312a

由 Chuck Lever 提交于 12月 19, 2018

Since commit 7c8d9e7c ("xprtrdma: Move Receive posting to
Receive handler"), rpcrdma_ep_post is no longer responsible for
posting Receive buffers. Update the documenting comment to reflect
this change.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

995d312a

xprtrdma: Trace mapping, alloc, and dereg failures · 53b2c1cb

由 Chuck Lever 提交于 12月 19, 2018

These are rare, but can be helpful at tracking down DMAR and other
problems.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

53b2c1cb

xprtrdma: Cull dprintk() call sites · ddbb347f

由 Chuck Lever 提交于 12月 19, 2018

Clean up: Remove dprintk() call sites that report rare or impossible
errors. Leave a few that display high-value low noise status
information.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ddbb347f

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功