1. 26 Apr 2017 (8 commits)
  2. 09 Feb 2017 (2 commits)
  3. 01 Dec 2016 (3 commits)
    • svcrdma: Further clean-up of svc_rdma_get_inv_rkey() · fafedf81
      Committed by Chuck Lever
      No longer any need for the dprintk().
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Remove BH-disabled spin locking in svc_rdma_send() · e4eb42ce
      Committed by Chuck Lever
      svcrdma's current SQ accounting algorithm takes sc_lock and disables
      bottom-halves while posting all RDMA Read, Write, and Send WRs.
      
      This is relatively heavyweight serialization. And note that Write and
      Send are already fully serialized by the xpt_mutex.
      
      Using a single atomic_t should be all that is necessary to guarantee
      that ib_post_send() is called only when there is enough space on the
      send queue. This is what the other RDMA-enabled storage targets do.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
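
      As a rough sketch of the accounting model described above (the
      structure and field names sq_ctx, sq_avail and sq_wait are
      illustrative, not the actual svcxprt_rdma members, and the
      ib_post_send() prototype is the non-const form of this era):

      #include <linux/atomic.h>
      #include <linux/wait.h>
      #include <rdma/ib_verbs.h>

      struct sq_ctx {
          struct ib_qp *qp;
          atomic_t sq_avail;            /* free Send Queue slots */
          wait_queue_head_t sq_wait;    /* posters sleep here when the SQ is full */
      };

      static int post_send_accounted(struct sq_ctx *sq, struct ib_send_wr *wr)
      {
          struct ib_send_wr *bad_wr;
          int ret;

          /* Reserve one SQ slot; back off and wait if none is free. */
          while (atomic_sub_return(1, &sq->sq_avail) < 0) {
              atomic_inc(&sq->sq_avail);
              wait_event(sq->sq_wait, atomic_read(&sq->sq_avail) > 0);
          }

          ret = ib_post_send(sq->qp, wr, &bad_wr);
          if (ret)
              atomic_inc(&sq->sq_avail);    /* post failed, slot not consumed */
          return ret;
      }

      /* The Send completion handler returns the slot and wakes any waiter. */
      static void send_completed(struct sq_ctx *sq)
      {
          atomic_inc(&sq->sq_avail);
          wake_up(&sq->sq_wait);
      }

      Compared with taking sc_lock with bottom-halves disabled around every
      post, the only serialization left is the atomic itself; ordering of
      Write and Send still comes from the xpt_mutex, as noted above.
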
    • svcrdma: Renovate sendto chunk list parsing · 5fdca653
      Committed by Chuck Lever
      The current sendto code appears to support clients that provide only
      one of a Read list, a Write list, or a Reply chunk. My reading of
      that code is that it doesn't support the following cases:
      
       - Read list + Write list
       - Read list + Reply chunk
       - Write list + Reply chunk
       - Read list + Write list + Reply chunk
      
      The protocol allows more than one Read or Write chunk in those
      lists. Some clients do send a Read list and Reply chunk
      simultaneously. NFSv4 WRITE uses a Read list for the data payload,
      and a Reply chunk because the GETATTR result in the reply can
      contain a large object like an ACL.
      
      Generalize one of the sendto code paths needed to support all of
      the above cases, and attempt to ensure that only one pass is done
      through the RPC Call's transport header to gather chunk list
      information for building the reply.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
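
      A sketch of the single pass over the transport header that this aims
      for, following the RPC-over-RDMA version 1 layout (Read list, then
      Write list, then Reply chunk). It only illustrates the parsing order;
      the svcrdma implementation differs:

      #include <asm/byteorder.h>
      #include <linux/sunrpc/xdr.h>       /* xdr_zero */

      /* One RDMA segment = handle, length, 64-bit offset = 4 XDR words */
      #define SEGMENT_WORDS 4

      static __be32 *skip_read_list(__be32 *p)
      {
          while (*p++ != xdr_zero) {      /* "item present" discriminator */
              p++;                        /* read segment position */
              p += SEGMENT_WORDS;
          }
          return p;
      }

      static __be32 *skip_write_chunk(__be32 *p)
      {
          u32 segcount = be32_to_cpup(p++);

          return p + segcount * SEGMENT_WORDS;
      }

      static __be32 *skip_write_list(__be32 *p)
      {
          while (*p++ != xdr_zero)        /* one counted-array chunk per item */
              p = skip_write_chunk(p);
          return p;
      }

      /* p points just past the four fixed header words of the RPC Call. */
      static __be32 *walk_chunk_lists(__be32 *p)
      {
          p = skip_read_list(p);
          p = skip_write_list(p);
          if (*p++ != xdr_zero)           /* optional Reply chunk */
              p = skip_write_chunk(p);
          return p;                       /* start of the RPC message proper */
      }

      A real single-pass parser would record chunk positions and rkeys as it
      walks rather than merely skipping them, but the order of the walk is
      the point: all three lists can be gathered in one traversal.
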
  4. 23 Sep 2016 (3 commits)
    • svcrdma: support Remote Invalidation · 25d55296
      Committed by Chuck Lever
      Support Remote Invalidation. A private message is exchanged with
      the client upon RDMA transport connect that indicates whether
      Send With Invalidation may be used by the server to send RPC
      replies. The invalidate_rkey is arbitrarily chosen from among
      rkeys present in the RPC-over-RDMA header's chunk lists.
      
      Send With Invalidate improves performance only when clients can
      recognize, while processing an RPC reply, that an rkey has already
      been invalidated. That has been submitted as a separate change.
      
      In the future, the RPC-over-RDMA protocol might support Remote
      Invalidation properly. The protocol needs to enable signaling
      between peers to indicate when Remote Invalidation can be used
      for each individual RPC.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
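
      A sketch of the opcode selection this enables; the flag and rkey
      arguments stand in for whatever connection state and chunk-list scan
      the server code actually uses:

      #include <rdma/ib_verbs.h>

      static void set_reply_send_opcode(struct ib_send_wr *send_wr,
                                        bool client_supports_remote_inv,
                                        u32 inv_rkey)
      {
          if (client_supports_remote_inv && inv_rkey) {
              /* The adapter invalidates inv_rkey at the client on delivery. */
              send_wr->opcode = IB_WR_SEND_WITH_INV;
              send_wr->ex.invalidate_rkey = inv_rkey;
          } else {
              send_wr->opcode = IB_WR_SEND;
          }
      }
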
    • svcrdma: Skip put_page() when send_reply() fails · 9995237b
      Committed by Chuck Lever
      Message from syslogd@klimt at Aug 18 17:00:37 ...
       kernel:page:ffffea0020639b00 count:0 mapcount:0 mapping:          (null) index:0x0
      Aug 18 17:00:37 klimt kernel: flags: 0x2fffff80000000()
      Aug 18 17:00:37 klimt kernel: page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      
      Aug 18 17:00:37 klimt kernel: kernel BUG at /home/cel/src/linux/linux-2.6/include/linux/mm.h:445!
      Aug 18 17:00:37 klimt kernel: RIP: 0010:[<ffffffffa05c21c1>] svc_rdma_sendto+0x641/0x820 [rpcrdma]
      
      send_reply() assigns its page argument as the first page of ctxt. On
      error, send_reply() already invokes svc_rdma_put_context(ctxt, 1),
      which does a put_page() on that very page. There is no need to do
      that again as svc_rdma_sendto() exits.
      
      Fixes: 3e1eeb98 ("svcrdma: Close connection when a send error occurs")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Tail iovec leaves an orphaned DMA mapping · cace564f
      Committed by Chuck Lever
      The ctxt's count field is overloaded to mean the number of pages in
      the ctxt->page array and the number of SGEs in the ctxt->sge array.
      Typically these two numbers are the same.
      
      However, when an inline RPC reply is constructed from an xdr_buf
      with a tail iovec, the head and tail often occupy the same page,
      but each are DMA mapped independently. In that case, ->count equals
      the number of pages, but it does not equal the number of SGEs.
      There's one more SGE, for the tail iovec. Hence there is one more
      DMA mapping than there are pages in the ctxt->page array.
      
      This isn't a real problem until the server's IOMMU is enabled. Then
      each RPC reply that has content in that iovec orphans a DMA mapping
      that consists of real resources.
      
      krb5i and krb5p always populate that tail iovec. After a couple
      million sent krb5i/p RPC replies, the NFS server starts behaving
      erratically. Reboot is needed to clear the problem.
      
      Fixes: 9d11b51c ("svcrdma: Fix send_reply() scatter/gather set-up")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
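
      A sketch of the bookkeeping implied by the fix: teardown walks the
      number of DMA-mapped SGEs rather than the number of pages, so the
      extra mapping for the tail is released too. The context layout and
      array sizes are illustrative:

      #include <linux/dma-direction.h>
      #include <rdma/ib_verbs.h>

      struct reply_ctxt {
          struct ib_device *device;
          struct ib_sge sge[8];
          struct page *pages[8];
          unsigned int page_count;    /* pages to release with put_page() */
          unsigned int mapped_sges;   /* DMA mappings to tear down */
      };

      /* The tail adds an SGE but no page, so mapped_sges can exceed page_count. */
      static void unmap_reply_ctxt(struct reply_ctxt *ctxt)
      {
          unsigned int i;

          for (i = 0; i < ctxt->mapped_sges; i++)
              ib_dma_unmap_page(ctxt->device, ctxt->sge[i].addr,
                                ctxt->sge[i].length, DMA_TO_DEVICE);
          ctxt->mapped_sges = 0;
      }
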
  5. 14 May 2016 (1 commit)
  6. 02 Mar 2016 (7 commits)
  7. 20 Jan 2016 (5 commits)
  8. 07 Jan 2016 (1 commit)
    • Revert "svcrdma: Do not send XDR roundup bytes for a write chunk" · 3daa020f
      Committed by J. Bruce Fields
      This reverts commit 6f18dc89.
      
      Just as one example, it appears this code could do the wrong thing in
      the case of a two-byte NFS READ that crosses a page boundary.
      
      Chuck says: "In that case, nfsd would pass down an xdr_buf that has one
      byte in a page, one byte in another page, and a two-byte XDR pad. The
      logic introduced by this optimization would be fooled, and neither the
      second byte nor the XDR pad would be written to the client."
      
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  9. 24 Nov 2015 (1 commit)
    • svcrdma: Do not send XDR roundup bytes for a write chunk · 6f18dc89
      Committed by Chuck Lever
      Minor optimization: when dealing with write chunk XDR roundup, do
      not post a Write WR for the zero bytes in the pad. Simply update
      the write segment in the RPC-over-RDMA header to reflect the extra
      pad bytes.
      
      The Reply chunk is also a write chunk, but the server does not use
      send_write_chunks() to send the Reply chunk. That's OK in this case:
      the server Upper Layer typically marshals the Reply chunk contents
      in a single contiguous buffer, without a separate tail for the XDR
      pad.
      
      The comments and the variable naming refer to "chunks" but what is
      really meant is "segments." The existing code sends only one
      xdr_write_chunk per RPC reply.
      
      The fix assumes this as well. When the XDR pad in the first write
      chunk is reached, the assumption is the Write list is complete and
      send_write_chunks() returns.
      
      That will remain a valid assumption until the server Upper Layer can
      support multiple bulk payload results per RPC.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
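
      The arithmetic behind the optimization, as a sketch: the zero pad that
      rounds the payload up to an XDR quad boundary is reflected only in the
      write segment length placed in the reply header, never posted as an
      RDMA Write (names are illustrative):

      #include <linux/types.h>

      static inline u32 xdr_roundup_pad(u32 payload_len)
      {
          return (4 - (payload_len & 3)) & 3;      /* 0..3 zero bytes */
      }

      static void account_write_segment(u32 payload_len,
                                        u32 *bytes_to_rdma_write,
                                        u32 *segment_len_in_header)
      {
          *bytes_to_rdma_write = payload_len;      /* the pad is never written */
          *segment_len_in_header = payload_len + xdr_roundup_pad(payload_len);
      }
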
  10. 08 Oct 2015 (1 commit)
    • IB: split struct ib_send_wr · e622f2f4
      Committed by Christoph Hellwig
      This patch splits up struct ib_send_wr so that all non-trivial verbs
      use their own structure, which embeds struct ib_send_wr. This
      dramatically shrinks the size of a WR for the most common operations:
      
      sizeof(struct ib_send_wr) (old):	96
      
      sizeof(struct ib_send_wr):		48
      sizeof(struct ib_rdma_wr):		64
      sizeof(struct ib_atomic_wr):		96
      sizeof(struct ib_ud_wr):		88
      sizeof(struct ib_fast_reg_wr):		88
      sizeof(struct ib_bind_mw_wr):		96
      sizeof(struct ib_sig_handover_wr):	80
      
      And with Sagi's pending MR rework the fast registration WR will also be
      down to a reasonable size:
      
      sizeof(struct ib_fastreg_wr):		64
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
      Tested-by: Haggai Eran <haggaie@mellanox.com>
      Tested-by: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
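
      The embedding pattern this change introduces, sketched with the
      in-tree struct ib_rdma_wr and the rdma_wr() container_of() helper; the
      particular field values are illustrative:

      #include <rdma/ib_verbs.h>

      static void build_rdma_write(struct ib_rdma_wr *wr, struct ib_sge *sge,
                                   u64 remote_addr, u32 rkey)
      {
          wr->wr.opcode     = IB_WR_RDMA_WRITE;    /* common fields live in wr.wr */
          wr->wr.sg_list    = sge;
          wr->wr.num_sge    = 1;
          wr->wr.send_flags = IB_SEND_SIGNALED;

          wr->remote_addr   = remote_addr;         /* verb-specific fields */
          wr->rkey          = rkey;
      }

      /* A provider handed only the embedded ib_send_wr recovers the outer WR: */
      static u32 wr_rkey(struct ib_send_wr *send_wr)
      {
          return rdma_wr(send_wr)->rkey;
      }

      Each verb now carries only the fields it needs instead of a union of
      every verb's fields, which is where the size reductions listed above
      come from.
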
  11. 21 Jul 2015 (2 commits)
    • svcrdma: Clean up svc_rdma_get_reply_array() · 10dc4512
      Committed by Chuck Lever
      Kernel coding conventions frown upon having large nontrivial
      functions in header files, and the preference these days is to
      allow the compiler to make inlining decisions if possible.
      
      As these functions are re-homed into a .c file, be sure that
      comparisons with fields in struct rpcrdma_msg are with be32
      constants.
      
      This is a refactoring change; no behavior change is intended.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
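
      A sketch of the be32 comparison style called for here: the on-the-wire
      field is compared against a pre-swapped constant so the hot path does
      no byte swapping. The header layout and constant below are
      illustrative, not the actual struct rpcrdma_msg definitions:

      #include <linux/types.h>
      #include <asm/byteorder.h>

      struct rdma_msg_hdr {
          __be32 rdma_xid;
          __be32 rdma_vers;
          __be32 rdma_credit;
          __be32 rdma_type;                  /* RDMA_MSG, RDMA_NOMSG, ... */
      };

      #define rdma_nomsg cpu_to_be32(1)      /* RDMA_NOMSG, swapped at compile time */

      static bool proc_is_nomsg(const struct rdma_msg_hdr *hdr)
      {
          return hdr->rdma_type == rdma_nomsg;   /* be32 vs be32, no conversion */
      }
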
    • svcrdma: Fix send_reply() scatter/gather set-up · 9d11b51c
      Committed by Chuck Lever
      The Linux NFS server returns garbage in the data payload of inline
      NFS/RDMA READ replies. These are READs of under 1000 bytes or so
      where the client has not provided either a reply chunk or a write
      list.
      
      The NFS server delivers the data payload for an NFS READ reply to
      the transport in an xdr_buf page list. If the NFS client did not
      provide a reply chunk or a write list, send_reply() is supposed to
      set up a separate sge for the page containing the READ data, and
      another sge for XDR padding if needed, then post all of the sges via
      a single SEND Work Request.
      
      The problem is that send_reply() does not advance through the
      xdr_buf when setting up scatter/gather entries for the SEND WR. It
      always calls dma_map_xdr() with xdr_off set to zero. When there is
      more than one sge, dma_map_xdr() sets up the SEND sges so they all
      point to the xdr_buf's head.
      
      The current Linux NFS/RDMA client always provides a reply chunk or
      a write list when performing an NFS READ over RDMA. Therefore, it
      does not exercise this particular case. The Linux server has never
      had to use more than one extra sge for building RPC/RDMA replies
      with a Linux client.
      
      However, an NFS/RDMA client _is_ allowed to send small NFS READs
      without setting up a write list or reply chunk. The NFS READ reply
      fits entirely within the inline reply buffer in this case. This is
      perhaps a more efficient way of performing NFS READs that the Linux
      NFS/RDMA client may some day adopt.
      
      Fixes: b432e6b3 ('svcrdma: Change DMA mapping logic to . . .')
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=285
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
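
      A sketch of the offset handling the fix calls for: a running xdr_off
      advances past each region already mapped so every SGE covers a
      distinct piece of the reply. dma_map_xdr_region() is a hypothetical
      stand-in for the driver's mapping helper, and the region walk is
      simplified (page_base handling omitted):

      #include <linux/errno.h>
      #include <linux/kernel.h>
      #include <linux/mm.h>
      #include <linux/sunrpc/xdr.h>
      #include <rdma/ib_verbs.h>

      /* Hypothetical helper: DMA-map [xdr_off, xdr_off + len) of the xdr_buf. */
      int dma_map_xdr_region(struct xdr_buf *xdr, u32 xdr_off, u32 len,
                             struct ib_sge *sge);

      static int build_send_sges(struct xdr_buf *xdr, struct ib_sge *sges,
                                 int max_sge)
      {
          u32 xdr_off = 0;                /* the bug left this at zero throughout */
          u32 page_bytes = xdr->page_len;
          int nsge = 0;

          /* Region 1: the head kvec (assumes max_sge >= 1) */
          if (dma_map_xdr_region(xdr, xdr_off, xdr->head[0].iov_len,
                                 &sges[nsge++]))
              return -EIO;
          xdr_off += xdr->head[0].iov_len;

          /* Region 2: the page list, mapped in page-sized pieces */
          while (page_bytes && nsge < max_sge) {
              u32 len = min_t(u32, page_bytes, PAGE_SIZE);

              if (dma_map_xdr_region(xdr, xdr_off, len, &sges[nsge++]))
                  return -EIO;
              xdr_off += len;
              page_bytes -= len;
          }

          /* Region 3: the tail kvec (XDR pad), if present */
          if (xdr->tail[0].iov_len && nsge < max_sge) {
              if (dma_map_xdr_region(xdr, xdr_off, xdr->tail[0].iov_len,
                                     &sges[nsge]))
                  return -EIO;
              nsge++;
          }
          return nsge;
      }
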
  12. 05 Jun 2015 (2 commits)
  13. 16 Jan 2015 (2 commits)
  14. 12 Jul 2014 (1 commit)
  15. 07 Jun 2014 (1 commit)