提交 · ac5bb5b3b0502b90b489e6b19c46b8c3dbe2e640 · openanolis / cloud-kernel

10 8月, 2018 4 次提交

rpc: remove unneeded variable 'ret' in rdma_listen_handler · ac5bb5b3

由 zhong jiang 提交于 8月 07, 2018

The ret is not modified after initalization, So just remove the variable
and return 0.
Signed-off-by: Nzhong jiang <zhongjiang@huawei.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

ac5bb5b3

svcrdma: Clean up Read chunk path · 07d0ff3b

由 Chuck Lever 提交于 7月 27, 2018

Simplify the error handling at the tail of recv_read_chunk() by
re-arranging rq_pages[] housekeeping and documenting it properly.

NB: In this path, svc_rdma_recvfrom returns zero. Therefore no
subsequent reply processing is done on the svc_rqstp, and thus the
rq_respages field does not need to be updated.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

07d0ff3b

svcrdma: Avoid releasing a page in svc_xprt_release() · a53d5cb0

由 Chuck Lever 提交于 7月 27, 2018

svc_xprt_release() invokes svc_free_res_pages(), which releases
pages between rq_respages and rq_next_page.

Historically, the RPC/RDMA transport has set these two pointers to
be different by one, which means:

- one page gets released when svc_recv returns 0. This normally
happens whenever one or more RDMA Reads need to be dispatched to
complete construction of an RPC Call.

- one page gets released after every call to svc_send.

In both cases, this released page is immediately refilled by
svc_alloc_arg. There does not seem to be a reason for releasing this
page.

To avoid this unnecessary memory allocator traffic, set rq_next_page
more carefully.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

a53d5cb0

sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data' · 9cc3b98d

由 YueHaibing 提交于 8月 01, 2018

Variables 'checksumlen','blocksize' and 'data' are being assigned,
but are never used, hence they are redundant and can be removed.

Fix the following warning:

net/sunrpc/auth_gss/gss_krb5_wrap.c:443:7: warning: variable ‘blocksize’ set but not used [-Wunused-but-set-variable]
net/sunrpc/auth_gss/gss_krb5_crypto.c:376:15: warning: variable ‘checksumlen’ set but not used [-Wunused-but-set-variable]
net/sunrpc/xprtrdma/svc_rdma.c:97:9: warning: variable ‘data’ set but not used [-Wunused-but-set-variable]
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

9cc3b98d

09 6月, 2018 1 次提交

svcrdma: Fix incorrect return value/type in svc_rdma_post_recvs · af7fd74e

由 Chuck Lever 提交于 5月 14, 2018

This crept in during the development process and wasn't caught
before I posted the "final" version.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Fixes: 0b2613c5883f ('svcrdma: Allocate recv_ctxt's on CPU ... ')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

af7fd74e

02 6月, 2018 5 次提交

xprtrdma: Remove transfertypes array · 11d0ac16

由 Chuck Lever 提交于 5月 04, 2018

Clean up: This array was used in a dprintk that was replaced by a
trace point in commit ab03eff5 ("xprtrdma: Add trace points in
RPC Call transmit paths").
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

11d0ac16

xprtrdma: Add trace_xprtrdma_dma_map(mr) · 8335640c

由 Chuck Lever 提交于 5月 04, 2018

Matches trace_xprtrdma_dma_unmap(mr).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8335640c

xprtrdma: Wait on empty sendctx queue · 2fad6592

由 Chuck Lever 提交于 5月 04, 2018

Currently, when the sendctx queue is exhausted during marshaling, the
RPC/RDMA transport places the RPC task on the delayq, which forces a
wait for HZ >> 2 before the marshal and send is retried.

With this change, the transport now places such an RPC task on the
pending queue, and wakes it just as soon as more sendctxs become
available. This typically takes less than a millisecond, and the
write_space waking mechanism is less deadlock-prone.

Moreover, the waiting RPC task is holding the transport's write
lock, which blocks the transport from sending RPCs. Therefore faster
recovery from sendctx queue exhaustion is desirable.

Cf. commit 5804891455d5 ("xprtrdma: ->send_request returns -EAGAIN
when there are no free MRs").
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2fad6592

xprtrdma: Move common wait_for_buffer_space call to parent function · ed3aa742

由 Chuck Lever 提交于 5月 04, 2018

Clean up: The logic to wait for write space is common to a bunch of
the encoding helper functions. Lift it out and put it in the tail
of rpcrdma_marshal_req().
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ed3aa742

xprtrdma: Return -ENOBUFS when no pages are available · a8f688ec

由 Chuck Lever 提交于 5月 04, 2018

The use of -EAGAIN in rpcrdma_convert_iovs() is a latent bug: the
transport never calls xprt_write_space() when more pages become
available. -ENOBUFS will trigger the correct "delay briefly and call
again" logic.

Fixes: 7a89f9c6 ("xprtrdma: Honor ->send_request API contract")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # 4.8+
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a8f688ec

12 5月, 2018 18 次提交

svcrdma: Persistently allocate and DMA-map Send buffers · 99722fe4

由 Chuck Lever 提交于 5月 07, 2018

While sending each RPC Reply, svc_rdma_sendto allocates and DMA-
maps a separate buffer where the RPC/RDMA transport header is
constructed. The buffer is unmapped and released in the Send
completion handler. This is significant per-RPC overhead,
especially for small RPCs.

Instead, allocate and DMA-map a buffer, and cache it in each
svc_rdma_send_ctxt. This buffer and its mapping can be re-used
for each RPC, saving the cost of memory allocation and DMA
mapping.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

99722fe4

svcrdma: Simplify svc_rdma_send() · 3abb03fa

由 Chuck Lever 提交于 5月 07, 2018

Clean up: No current caller of svc_rdma_send's passes in a chained
WR. The logic that counts the chain length can be replaced with a
constant (1).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

3abb03fa

svcrdma: Remove post_send_wr · 986b7889

由 Chuck Lever 提交于 5月 07, 2018

Clean up: Now that the send_wr is part of the svc_rdma_send_ctxt,
svc_rdma_post_send_wr is nearly empty.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

986b7889

svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt · 25fd86ec

由 Chuck Lever 提交于 5月 07, 2018

Receive buffers are always the same size, but each Send WR has a
variable number of SGEs, based on the contents of the xdr_buf being
sent.

While assembling a Send WR, keep track of the number of SGEs so that
we don't exceed the device's maximum, or walk off the end of the
Send SGE array.

For now the Send path just fails if it exceeds the maximum.

The current logic in svc_rdma_accept bases the maximum number of
Send SGEs on the largest NFS request that can be sent or received.
In the transport layer, the limit is actually based on the
capabilities of the underlying device, not on properties of the
Upper Layer Protocol.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

25fd86ec

svcrdma: Introduce svc_rdma_send_ctxt · 4201c746

由 Chuck Lever 提交于 5月 07, 2018

svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
free list. This eliminates the overhead of calling kmalloc / kfree,
both of which grab a globally shared lock that disables interrupts.
Introduce a replacement to svc_rdma_op_ctxt's that is built
especially for the svcrdma Send path.

Subsequent patches will take advantage of this new structure by
allocating real resources which are then cached in these objects.
The allocations are freed when the transport is torn down.

I've renamed the structure so that static type checking can be used
to ensure that uses of op_ctxt and send_ctxt are not confused. As an
additional clean up, structure fields are renamed to conform with
kernel coding conventions.

Additional clean ups:
- Handle svc_rdma_send_ctxt_get allocation failure at each call
  site, rather than pre-allocating and hoping we guessed correctly
- All send_ctxt_put call-sites request page freeing, so remove
  the @free_pages argument
- All send_ctxt_put call-sites unmap SGEs, so fold that into
  svc_rdma_send_ctxt_put
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

4201c746

svcrdma: Clean up Send SGE accounting · 23262790

由 Chuck Lever 提交于 5月 07, 2018

Clean up: Since there's already a svc_rdma_op_ctxt being passed
around with the running count of mapped SGEs, drop unneeded
parameters to svc_rdma_post_send_wr().
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

23262790

svcrdma: Refactor svc_rdma_dma_map_buf · f016f305

由 Chuck Lever 提交于 5月 07, 2018

Clean up: svc_rdma_dma_map_buf does mostly the same thing as
svc_rdma_dma_map_page, so let's fold these together.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

f016f305

svcrdma: Allocate recv_ctxt's on CPU handling Receives · eb5d7a62

由 Chuck Lever 提交于 5月 07, 2018

There is a significant latency penalty when processing an ingress
Receive if the Receive buffer resides in memory that is not on the
same NUMA node as the the CPU handling completions for a CQ.

The system administrator and the device driver determine which CPU
handles completions. This CPU does not change during life of the CQ.
Further the Upper Layer does not have any visibility of which CPU it
is.

Allocating Receive buffers in the Receive completion handler
guarantees that Receive buffers are allocated on the preferred NUMA
node for that CQ.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

eb5d7a62

svcrdma: Persistently allocate and DMA-map Receive buffers · 3316f063

由 Chuck Lever 提交于 5月 07, 2018

The current Receive path uses an array of pages which are allocated
and DMA mapped when each Receive WR is posted, and then handed off
to the upper layer in rqstp::rq_arg. The page flip releases unused
pages in the rq_pages pagelist. This mechanism introduces a
significant amount of overhead.

So instead, kmalloc the Receive buffer, and leave it DMA-mapped
while the transport remains connected. This confers a number of
benefits:

* Each Receive WR requires only one receive SGE, no matter how large
  the inline threshold is. This helps the server-side NFS/RDMA
  transport operate on less capable RDMA devices.

* The Receive buffer is left allocated and mapped all the time. This
  relieves svc_rdma_post_recv from the overhead of allocating and
  DMA-mapping a fresh buffer.

* svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer.
  It has to DMA sync only the number of bytes that were received.

* svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
  for each page in the Receive buffer, making it a constant-time
  function.

* The Receive buffer is now plugged directly into the rq_arg's
  head[0].iov_vec, and can be larger than a page without spilling
  over into rq_arg's page list. This enables simplification of
  the RDMA Read path in subsequent patches.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

3316f063

svcrdma: Preserve Receive buffer until svc_rdma_sendto · 3a88092e

由 Chuck Lever 提交于 5月 07, 2018

Rather than releasing the incoming svc_rdma_recv_ctxt at the end of
svc_rdma_recvfrom, hold onto it until svc_rdma_sendto.

This permits the contents of the Receive buffer to be preserved
through svc_process and then referenced directly in sendto as it
constructs Write and Reply chunks to return to the client.

The real changes will come in subsequent patches.

Note: I cannot use ->xpo_release_rqst for this purpose because that
is called _before_ ->xpo_sendto. svc_rdma_sendto uses information in
the received Call transport header to construct the Reply transport
header, which is preserved in the RPC's Receive buffer.

The historical comment in svc_send() isn't helpful: it is already
obvious that ->xpo_release_rqst is being called before ->xpo_sendto,
but there is no explanation for this ordering going back to the
beginning of the git era.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

3a88092e

svcrdma: Simplify svc_rdma_recv_ctxt_put · 1e5f4160

由 Chuck Lever 提交于 5月 07, 2018

Currently svc_rdma_recv_ctxt_put's callers have to know whether they
want to free the ctxt's pages or not. This means the human
developers have to know when and why to set that free_pages
argument.

Instead, the ctxt should carry that information with it so that
svc_rdma_recv_ctxt_put does the right thing no matter who is
calling.

We want to keep track of the number of pages in the Receive buffer
separately from the number of pages pulled over by RDMA Read. This
is so that the correct number of pages can be freed properly and
that number is well-documented.

So now, rc_hdr_count is the number of pages consumed by head[0]
(ie., the page index where the Read chunk should start); and
rc_page_count is always the number of pages that need to be released
when the ctxt is put.

The @free_pages argument is no longer needed.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

1e5f4160

svcrdma: Remove sc_rq_depth · 2c577bfe

由 Chuck Lever 提交于 5月 07, 2018

Clean up: No need to retain rq_depth in struct svcrdma_xprt, it is
used only in svc_rdma_accept().
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

2c577bfe

svcrdma: Introduce svc_rdma_recv_ctxt · ecf85b23

由 Chuck Lever 提交于 5月 07, 2018

svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
free list. This eliminates the overhead of calling kmalloc / kfree,
both of which grab a globally shared lock that disables interrupts.
To reduce contention further, separate the use of these objects in
the Receive and Send paths in svcrdma.

Subsequent patches will take advantage of this separation by
allocating real resources which are then cached in these objects.
The allocations are freed when the transport is torn down.

I've renamed the structure so that static type checking can be used
to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
additional clean up, structure fields are renamed to conform with
kernel coding conventions.

As a final clean up, helpers related to recv_ctxt are moved closer
to the functions that use them.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

ecf85b23

svcrdma: Trace key RDMA API events · bd2abef3

由 Chuck Lever 提交于 5月 07, 2018

This includes:
  * Posting on the Send and Receive queues
  * Send, Receive, Read, and Write completion
  * Connect upcalls
  * QP errors
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

bd2abef3

svcrdma: Trace key RPC/RDMA protocol events · 98895edb

由 Chuck Lever 提交于 5月 07, 2018

This includes:
  * Transport accept and tear-down
  * Decisions about using Write and Reply chunks
  * Each RDMA segment that is handled
  * Whenever an RDMA_ERR is sent

As a clean-up, I've standardized the order of the includes, and
removed some now redundant dprintk call sites.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

98895edb

xprtrdma: Prepare RPC/RDMA includes for server-side trace points · b6e717cb

由 Chuck Lever 提交于 5月 07, 2018

Clean up: Move #include <trace/events/rpcrdma.h> into source files,
similar to how it is done with trace/events/sunrpc.h.

Server-side trace points will be part of the rpcrdma subsystem,
just like the client-side trace points.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

b6e717cb

svcrdma: Use passed-in net namespace when creating RDMA listener · 8dafcbee

由 Chuck Lever 提交于 5月 07, 2018

Ensure each RDMA listener and its children transports are created in
the same net namespace as the user that started the NFS service.
This is similar to how listener sockets are created in
svc_create_socket, required for enabling support for containers.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

8dafcbee

C
svcrdma: Add proper SPDX tags for NetApp-contributed source · bcf3ffd4
由 Chuck Lever 提交于 5月 07, 2018
```
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
```
bcf3ffd4

07 5月, 2018 12 次提交

xprtrdma: Make rpcrdma_sendctx_put_locked() a static function · efd81e90

由 Chuck Lever 提交于 5月 04, 2018

Clean up: The only call site is in the same file as the function's
definition.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

efd81e90

xprtrdma: Remove rpcrdma_buffer_get_rep_locked() · 9d95cd53

由 Chuck Lever 提交于 5月 04, 2018

Clean up: There is only one remaining call site for this helper.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9d95cd53

xprtrdma: Remove rpcrdma_buffer_get_req_locked() · e68699cc

由 Chuck Lever 提交于 5月 04, 2018

Clean up. There is only one call-site for this helper, and it can be
simplified by using list_first_entry_or_null().
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e68699cc

xprtrdma: Remove rpcrdma_ep_{post_recv, post_extra_recv} · a7986f09

由 Chuck Lever 提交于 5月 04, 2018

Clean up: These functions are no longer used.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a7986f09

xprtrdma: Move Receive posting to Receive handler · 7c8d9e7c

由 Chuck Lever 提交于 5月 04, 2018

Receive completion and Reply handling are done by a BOUND
workqueue, meaning they run on only one CPU.

Posting receives is currently done in the send_request path, which
on large systems is typically done on a different CPU than the one
handling Receive completions. This results in movement of
Receive-related cachelines between the sending and receiving CPUs.

More importantly, it means that currently Receives are posted while
the transport's write lock is held, which is unnecessary and costly.

Finally, allocation of Receive buffers is performed on-demand in
the Receive completion handler. This helps guarantee that they are
allocated on the same NUMA node as the CPU that handles Receive
completions.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7c8d9e7c

xprtrdma: Clean up Receive trace points · 0e0b854c

由 Chuck Lever 提交于 5月 04, 2018

For clarity, report the posting and completion of Receive CQEs.

Also, the wc->byte_len field contains garbage if wc->status is
non-zero, and the vendor error field contains garbage if wc->status
is zero. For readability, don't save those fields in those cases.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0e0b854c

xprtrdma: Make rpc_rqst part of rpcrdma_req · edb41e61

由 Chuck Lever 提交于 5月 04, 2018

This simplifies allocation of the generic RPC slot and xprtrdma
specific per-RPC resources.

It also makes xprtrdma more like the socket-based transports:
->buf_alloc and ->buf_free are now responsible only for send and
receive buffers.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

edb41e61

xprtrdma: Introduce ->alloc_slot call-out for xprtrdma · 48be539d

由 Chuck Lever 提交于 5月 04, 2018

rpcrdma_buffer_get acquires an rpcrdma_req and rep for each RPC.
Currently this is done in the call_allocate action, and sometimes it
can fail if there are many outstanding RPCs.

When call_allocate fails, the RPC task is put on the delayq. It is
awoken a few milliseconds later, but there's no guarantee it will
get a buffer at that time. The RPC task can be repeatedly put back
to sleep or even starved.

The call_allocate action should rarely fail. The delayq mechanism is
not meant to deal with transport congestion.

In the current sunrpc stack, there is a friendlier way to deal with
this situation. These objects are actually tantamount to an RPC
slot (rpc_rqst) and there is a separate FSM action, distinct from
call_allocate, for allocating slot resources. This is the
call_reserve action.

When allocation fails during this action, the RPC is placed on the
transport's backlog queue. The backlog mechanism provides a stronger
guarantee that when the RPC is awoken, a buffer will be available
for it; and backlogged RPCs are awoken one-at-a-time.

To make slot resource allocation occur in the call_reserve action,
create special ->alloc_slot and ->free_slot call-outs for xprtrdma.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

48be539d

SUNRPC: Add a ->free_slot transport callout · a9cde23a

由 Chuck Lever 提交于 5月 04, 2018

Refactor: xprtrdma needs to have better control over when RPCs are
awoken from the backlog queue, so replace xprt_free_slot with a
transport op callout.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a9cde23a

xprtrdma: Fix max_send_wr computation · 914fcad9

由 Chuck Lever 提交于 5月 04, 2018

For FRWR, the computation of max_send_wr is split between
frwr_op_open and rpcrdma_ep_create, which makes it difficult to tell
that the max_send_wr result is currently incorrect if frwr_op_open
has to reduce the credit limit to accommodate a small max_qp_wr.
This is a problem now that extra WRs are needed for backchannel
operations and a drain CQE.

So, refactor the computation so that it is all done in ->ro_open,
and fix the FRWR version of this computation so that it
accommodates HCAs with small max_qp_wr correctly.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

914fcad9

xprtrdma: Create transport's CM ID in the correct network namespace · 107c4beb

由 Chuck Lever 提交于 5月 04, 2018

Set up RPC/RDMA transport in mount.nfs's network namespace. This
passes the correct namespace information to the RDMA core, similar
to how RPC sockets are created (see xs_create_sock).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

107c4beb

xprtrdma: Try to fail quickly if proto=rdma · 52d28fe4

由 Chuck Lever 提交于 5月 04, 2018

rdma_resolve_addr(3) says:

> This call is used to map a given destination IP address to a
> usable RDMA address. The IP to RDMA address mapping is done
> using the local routing tables, or via ARP.

If this can't be done, there's no local device that can be used
to establish an RDMA-capable network path to the remote. In this
case, the RDMA CM very quickly posts an RDMA_CM_EVENT_ADDR_ERROR
upcall.

Currently rpcrdma_conn_upcall() converts RDMA_CM_EVENT_ADDR_ERROR
to EHOSTUNREACH. mount.nfs seems to want to retry EHOSTUNREACH
forever, thinking that this is a temporary situation. This makes
mount.nfs appear to hang if I try to mount with proto=rdma through,
say, a conventional Ethernet device.

If the admin has specified proto=rdma along with a server IP address
that requires a network path that does not support RDMA, instead
let's fail with a permanent error. -EPROTONOSUPPORT is returned when
NFSv4 or one of its minor versions is not supported.

-EPROTO is not (currently) retried by mount.nfs.

There are potentially other similar cases where -EPROTO is an
appropriate return code.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NOlga Kornievskaia <kolga@netapp.com>
Tested-by: NAnna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

52d28fe4

openanolis / cloud-kernel 大约 1 年 前同步成功

openanolis / cloud-kernel
大约 1 年前同步成功