提交 · 6afafa7799cf6fa3c0efb6887704506d21965ad6 · openanolis / cloud-kernel

14 7月, 2017 11 次提交

xprtrdma: Fix documenting comments in frwr_ops.c · 6afafa77

由 Chuck Lever 提交于 6月 08, 2017

Clean up.

FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6afafa77

xprtrdma: Replace PAGE_MASK with offset_in_page() · d933cc32

由 Chuck Lever 提交于 6月 08, 2017

Clean up.

Reported by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d933cc32

xprtrdma: FMR does not need list_del_init() · e2f6ef09

由 Chuck Lever 提交于 6月 08, 2017

Clean up.

Commit 38f1932e ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e2f6ef09

xprtrdma: Demote "connect" log messages · 173b8f49

由 Chuck Lever 提交于 6月 08, 2017

Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle these can appear every few minutes as the client idles
out the connection and reconnects.

Connection and disconnection is a normal part of operation, and not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

173b8f49

xprtrdma: Don't defer MR recovery if ro_map fails · 1f541895

由 Chuck Lever 提交于 6月 08, 2017

Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
invokes rpcrdma_defer_mr_recovery in some error cases where the MW
has not even been DMA-mapped yet.

Avoid a DMA-unmapping error replacing rpcrdma_defer_mr_recovery.

Also note that if ib_dma_map_sg is asked to map 0 nents, it will
return 0. So the extra "if (i == 0)" check is no longer needed.

Fixes: 42fe28f6 ("xprtrdma: Do not leak an MW during a DMA ...")
Fixes: 505bbe64 ("xprtrdma: Refactor MR recovery work queues")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1f541895

xprtrdma: Fix FRWR invalidation error recovery · 8d75483a

由 Chuck Lever 提交于 6月 08, 2017

When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
examined, and the MRs reset by hand.

I'm not sure how the existing code can work by comparing R_keys.
Restructure the logic so that instead it walks the chain of WRs,
starting from the first bad one.

Make sure to wait for completion if at least one WR was actually
posted. Otherwise, if the ib_post_send fails, we can end up
DMA-unmapping the MR while LOCAL_INV operations are in flight.

Commit 7a89f9c6 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.

Fixes: d7a21c1b ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8d75483a

xprtrdma: Fix client lock-up after application signal fires · 431af645

由 Chuck Lever 提交于 6月 08, 2017

After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.

The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.

With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.

This sets up a race between two processes:

1.  After the signal, xprt_rdma_free calls ro_unmap_safe.
2.  While ro_unmap_safe is still running, the server replies and
    rpcrdma_reply_handler runs, calling ro_unmap_sync.

Both processes invoke ib_unmap_fmr on the same FMR.

The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.

If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.

But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.

Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

431af645

xprtrdma: Rename rpcrdma_req::rl_free · a80d66c9

由 Chuck Lever 提交于 6月 08, 2017

Clean up: I'm about to use the rl_free field for purposes other than
a free list. So use a more generic name.

This is a refactoring change only.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a80d66c9

xprtrdma: Pass only the list of registered MRs to ro_unmap_sync · 451d26e1

由 Chuck Lever 提交于 6月 08, 2017

There are rare cases where an rpcrdma_req can be re-used (via
rpcrdma_buffer_put) while the RPC reply handler is still running.
This is due to a signal firing at just the wrong instant.

Since commit 9d6b0409 ("xprtrdma: Place registered MWs on a
per-req list"), rpcrdma_mws are self-contained; ie., they fully
describe an MR and scatterlist, and no part of that information is
stored in struct rpcrdma_req.

As part of closing the above race window, pass only the req's list
of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
itself.

Some extra transport header sanity checking is removed. Since the
client depends on its own recollection of what memory had been
registered, there doesn't seem to be a way to abuse this change.

And, the check was not terribly effective. If the client had sent
Read chunks, the "list_empty" test is negative in both of the
removed cases, which are actually looking for Write or Reply
chunks.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

451d26e1

xprtrdma: Pre-mark remotely invalidated MRs · 4b196dc6

由 Chuck Lever 提交于 6月 08, 2017

There are rare cases where an rpcrdma_req and its matched
rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
reply handler is still using that req. This is typically due to a
signal firing at just the wrong instant.

As part of closing this race window, avoid using the wrong
rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
invalidated while we are sure the rep is still OK to use.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4b196dc6

xprtrdma: On invalidation failure, remove MWs from rl_registered · 04d25b7d

由 Chuck Lever 提交于 6月 08, 2017

Callers assume the ro_unmap_sync and ro_unmap_safe methods empty
the list of registered MRs. Ensure that all paths through
fmr_op_unmap_sync() remove MWs from that list.

Fixes: 9d6b0409 ("xprtrdma: Place registered MWs on a ... ")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

04d25b7d

24 5月, 2017 1 次提交

xprtrdma: Delete an error message for a failed memory allocation in xprt_rdma_bc_setup() · d2c23c00

由 Markus Elfring 提交于 5月 22, 2017

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Link: http://events.linuxfoundation.org/sites/events/files/slides/LCJ16-Refactor_Strings-WSang_0.pdfSigned-off-by: NMarkus Elfring <elfring@users.sourceforge.net>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d2c23c00

26 4月, 2017 25 次提交

svcrdma: Clean out old XDR encoders · dadf3e43

由 Chuck Lever 提交于 4月 09, 2017

Clean up: These have been replaced and are no longer used.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

dadf3e43

svcrdma: Remove the req_map cache · 2cf32924

由 Chuck Lever 提交于 4月 09, 2017

req_maps are no longer used by the send path and can thus be removed.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

2cf32924

svcrdma: Remove unused RDMA Write completion handler · 68cc4636

由 Chuck Lever 提交于 4月 09, 2017

Clean up. All RDMA Write completions are now handled by
svc_rdma_wc_write_ctx.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

68cc4636

svcrdma: Reduce size of sge array in struct svc_rdma_op_ctxt · ded8d196

由 Chuck Lever 提交于 4月 09, 2017

The sge array in struct svc_rdma_op_ctxt is no longer used for
sending RDMA Write WRs. It need only accommodate the construction of
Send and Receive WRs. The maximum inline size is the largest payload
it needs to handle now.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

ded8d196

svcrdma: Clean up RPC-over-RDMA backchannel reply processing · f5821c76

由 Chuck Lever 提交于 4月 09, 2017

Replace C structure-based XDR decoding with pointer arithmetic.
Pointer arithmetic is considered more portable.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

f5821c76

svcrdma: Report Write/Reply chunk overruns · 4757d90b

由 Chuck Lever 提交于 4月 09, 2017

Observed at Connectathon 2017.

If a client has underestimated the size of a Write or Reply chunk,
the Linux server writes as much payload data as it can, then it
recognizes there was a problem and closes the connection without
sending the transport header.

This creates a couple of problems:

<> The client never receives indication of the server-side failure,
   so it continues to retransmit the bad RPC. Forward progress on
   the transport is blocked.

<> The reply payload pages are not moved out of the svc_rqst, thus
   they can be released by the RPC server before the RDMA Writes
   have completed.

The new rdma_rw-ized helpers return a distinct error code when a
Write/Reply chunk overrun occurs, so it's now easy for the caller
(svc_rdma_sendto) to recognize this case.

Instead of dropping the connection, post an RDMA_ERROR message. The
client now sees an RDMA_ERROR and can properly terminate the RPC
transaction.

As part of the new logic, set up the same delayed release for these
payload pages as would have occurred in the normal case.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

4757d90b

svcrdma: Clean up RDMA_ERROR path · 6b19cc5c

由 Chuck Lever 提交于 4月 09, 2017

Now that svc_rdma_sendto has been renovated, svc_rdma_send_error can
be refactored to reduce code duplication and remove C structure-
based XDR encoding. It is also relocated to the source file that
contains its only caller.

This is a refactoring change only.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

6b19cc5c

svcrdma: Use rdma_rw API in RPC reply path · 9a6a180b

由 Chuck Lever 提交于 4月 09, 2017

The current svcrdma sendto code path posts one RDMA Write WR at a
time. Each of these Writes typically carries a small number of pages
(for instance, up to 30 pages for mlx4 devices). That means a 1MB
NFS READ reply requires 9 ib_post_send() calls for the Write WRs,
and one for the Send WR carrying the actual RPC Reply message.

Instead, use the new rdma_rw API. The details of Write WR chain
construction and memory registration are taken care of in the RDMA
core. svcrdma can focus on the details of the RPC-over-RDMA
protocol. This gives three main benefits:

1. All Write WRs for one RDMA segment are posted in a single chain.
As few as one ib_post_send() for each Write chunk.

2. The Write path can now use FRWR to register the Write buffers.
If the device's maximum page list depth is large, this means a
single Write WR is needed for each RPC's Write chunk data.

3. The new code introduces support for RPCs that carry both a Write
list and a Reply chunk. This combination can be used for an NFSv4
READ where the data payload is large, and thus is removed from the
Payload Stream, but the Payload Stream is still larger than the
inline threshold.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

9a6a180b

svcrdma: Introduce local rdma_rw API helpers · f13193f5

由 Chuck Lever 提交于 4月 09, 2017

The plan is to replace the local bespoke code that constructs and
posts RDMA Read and Write Work Requests with calls to the rdma_rw
API. This shares code with other RDMA-enabled ULPs that manages the
gory details of buffer registration and posting Work Requests.

Some design notes:

 o The structure of RPC-over-RDMA transport headers is flexible,
   allowing multiple segments per Reply with arbitrary alignment,
   each with a unique R_key. Write and Send WRs continue to be
   built and posted in separate code paths. However, one whole
   chunk (with one or more RDMA segments apiece) gets exactly
   one ib_post_send and one work completion.

 o svc_xprt reference counting is modified, since a chain of
   rdma_rw_ctx structs generates one completion, no matter how
   many Write WRs are posted.

 o The current code builds the transport header as it is construct-
   ing Write WRs. I've replaced that with marshaling of transport
   header data items in a separate step. This is because the exact
   structure of client-provided segments may not align with the
   components of the server's reply xdr_buf, or the pages in the
   page list. Thus parts of each client-provided segment may be
   written at different points in the send path.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

f13193f5

svcrdma: Clean up svc_rdma_get_inv_rkey() · c238c4c0

由 Chuck Lever 提交于 4月 09, 2017

Replace C structure-based XDR decoding with more portable code that
instead uses pointer arithmetic.

This is a refactoring change only.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c238c4c0

svcrdma: Add helper to save pages under I/O · c55ab070

由 Chuck Lever 提交于 4月 09, 2017

Clean up: extract the logic to save pages under I/O into a helper to
add a big documenting comment without adding clutter in the send
path.

This is a refactoring change only.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c55ab070

svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULT · b623589d

由 Chuck Lever 提交于 4月 09, 2017

The Send Queue depth is temporarily reduced to 1 SQE per credit. The
new rdma_rw API does an internal computation, during QP creation, to
increase the depth of the Send Queue to handle RDMA Read and Write
operations.

This change has to come before the NFSD code paths are updated to
use the rdma_rw API. Without this patch, rdma_rw_init_qp() increases
the size of the SQ too much, resulting in memory allocation failures
during QP creation.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

b623589d

svcrdma: Add svc_rdma_map_reply_hdr() · 6e6092ca

由 Chuck Lever 提交于 4月 09, 2017

Introduce a helper to DMA-map a reply's transport header before
sending it. This will in part replace the map vector cache.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

6e6092ca

svcrdma: Move send_wr to svc_rdma_op_ctxt · 17f5f7f5

由 Chuck Lever 提交于 4月 09, 2017

Clean up: Move the ib_send_wr off the stack, and move common code
to post a Send Work Request into a helper.

This is a refactoring change only.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

17f5f7f5

xprtrdma: Remove rpcrdma_buffer::rb_pool · 2be1fce9

由 Chuck Lever 提交于 4月 11, 2017

Since commit 1e465fd4 ("xprtrdma: Replace send and receive
arrays"), this field is no longer used.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2be1fce9

xprtrdma: Squelch ENOBUFS warnings · 0031e47c

由 Chuck Lever 提交于 4月 11, 2017

When ro_map is out of buffers, that's not a permanent error, so
don't report a problem.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0031e47c

xprtrdma: Annotate receive workqueue · 7d7fa9b5

由 Chuck Lever 提交于 4月 11, 2017

Micro-optimize the receive workqueue by marking it's anchor "read-
mostly."
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7d7fa9b5

xprtrdma: Revert commit · 56a6bd15

由 Chuck Lever 提交于 4月 11, 2017

Device removal is now adequately supported. Pinning the underlying
device driver to prevent removal while an NFS mount is active is no
longer necessary.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

56a6bd15

xprtrdma: Restore transport after device removal · a9b0e381

由 Chuck Lever 提交于 4月 11, 2017

After a device removal, enable the transport connect worker to
restore normal operation if there is another device with
connectivity to the server.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a9b0e381

xprtrdma: Refactor rpcrdma_ep_connect · 1890896b

由 Chuck Lever 提交于 4月 11, 2017

I'm about to add another arm to

    if (ep->rep_connected != 0)

It will be cleaner to use a switch statement here. We'll be looking
for a couple of specific errnos, or "anything else," basically to
sort out the difference between a normal reconnect and recovery from
device removal.

This is a refactoring change only.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1890896b

xprtrdma: Support unplugging an HCA from under an NFS mount · bebd0318

由 Chuck Lever 提交于 4月 11, 2017

The device driver for the underlying physical device associated
with an RPC-over-RDMA transport can be removed while RPC-over-RDMA
transports are still in use (ie, while NFS filesystems are still
mounted and active). The IB core performs a connection event upcall
to request that consumers free all RDMA resources associated with
a transport.

There may be pending RPCs when this occurs. Care must be taken to
release associated resources without leaving references that can
trigger a subsequent crash if a signal or soft timeout occurs. We
rely on the caller of the transport's ->close method to ensure that
the previous RPC task has invoked xprt_release but the transport
remains write-locked.

A DEVICE_REMOVE upcall forces a disconnect then sleeps. When ->close
is invoked, it destroys the transport's H/W resources, then wakes
the upcall, which completes and allows the core driver unload to
continue.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=266Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

bebd0318

xprtrdma: Use same device when mapping or syncing DMA buffers · 91a10c52

由 Chuck Lever 提交于 4月 11, 2017

When the underlying device driver is reloaded, ia->ri_device will be
replaced. All cached copies of that device pointer have to be
updated as well.

Commit 54cbd6b0 ("xprtrdma: Delay DMA mapping Send and Receive
buffers") added the rg_device field to each regbuf. As part of
handling a device removal, rpcrdma_dma_unmap_regbuf is invoked on
all regbufs for a transport.

Simply calling rpcrdma_dma_map_regbuf for each Receive buffer after
the driver has been reloaded should reinitialize rg_device correctly
for every case except rpcrdma_wc_receive, which still uses
rpcrdma_rep::rr_device.

Ensure the same device that was used to map a Receive buffer is also
used to sync it in rpcrdma_wc_receive by using rg_device there
instead of rr_device.

This is the only use of rr_device, so it can be removed.

The use of regbufs in the send path is also updated, for
completeness.

Fixes: 54cbd6b0 ("xprtrdma: Delay DMA mapping Send and ... ")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

91a10c52

xprtrdma: Refactor rpcrdma_ia_open() · fff09594

由 Chuck Lever 提交于 4月 11, 2017

In order to unload a device driver and reload it, xprtrdma will need
to close a transport's interface adapter, and then call
rpcrdma_ia_open again, possibly finding a different interface
adapter.

Make rpcrdma_ia_open safe to call on the same transport multiple
times.

This is a refactoring change only.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

fff09594

xprtrdma: Detect unreachable NFS/RDMA servers more reliably · 33849792

由 Chuck Lever 提交于 4月 11, 2017

Current NFS clients rely on connection loss to determine when to
retransmit. In particular, for protocols like NFSv4, clients no
longer rely on RPC timeouts to drive retransmission: NFSv4 servers
are required to terminate a connection when they need a client to
retransmit pending RPCs.

When a server is no longer reachable, either because it has crashed
or because the network path has broken, the server cannot actively
terminate a connection. Thus NFS clients depend on transport-level
keepalive to determine when a connection must be replaced and
pending RPCs retransmitted.

However, RDMA RC connections do not have a native keepalive
mechanism. If an NFS/RDMA server crashes after a client has sent
RPCs successfully (an RC ACK has been received for all OTW RDMA
requests), there is no way for the client to know the connection is
moribund.

In addition, new RDMA requests are subject to the RPC-over-RDMA
credit limit. If the client has consumed all granted credits with
NFS traffic, it is not allowed to send another RDMA request until
the server replies. Thus it has no way to send a true keepalive when
the workload has already consumed all credits with pending RPCs.

To address this, forcibly disconnect a transport when an RPC times
out. This prevents moribund connections from stopping the
detection of failover or other configuration changes on the server.

Note that even if the connection is still good, retransmitting
any RPC will trigger a disconnect thanks to this logic in
xprt_rdma_send_request:

	/* Must suppress retransmit to maintain credits */
	if (req->rl_connect_cookie == xprt->connect_cookie)
		goto drop_connection;
	req->rl_connect_cookie = xprt->connect_cookie;
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

33849792

xprtrdma: Cancel refresh worker during buffer shutdown · 9378b274

由 Chuck Lever 提交于 4月 11, 2017

Trying to create MRs while the transport is being torn down can
cause a crash.

Fixes: e2ac236c ("xprtrdma: Allocate MRs on demand")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9378b274

29 3月, 2017 1 次提交

svcrdma: set XPT_CONG_CTRL flag for bc xprt · 23abec20

由 Chuck Lever 提交于 3月 26, 2017

Same change as Kinglong Mee's fix for the TCP backchannel service.

Fixes: 5283b03e ("nfs/nfsd/sunrpc: enforce transport...")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

23abec20

18 3月, 2017 1 次提交

xprtrdma: Squelch kbuild sparse complaint · eed50879

由 Chuck Lever 提交于 3月 11, 2017

New complaint from kbuild for 4.9.y:

net/sunrpc/xprtrdma/verbs.c:489:19: sparse: incompatible types in
    comparison expression (different type sizes)

verbs.c:
489	max_sge = min(ia->ri_device->attrs.max_sge, RPCRDMA_MAX_SEND_SGES);

I can't reproduce this running sparse here. Likewise, "make W=1
net/sunrpc/xprtrdma/verbs.o" never indicated any issue.

A little poking suggests that because the range of its values is
small, gcc can make the actual width of RPCRDMA_MAX_SEND_SGES
smaller than the width of an unsigned integer.

Fixes: 16f906d6 ("xprtrdma: Reduce required number of send SGEs")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: stable@kernel.org
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eed50879

25 2月, 2017 1 次提交

sunrpc: flag transports as having congestion control · 362142b2

由 Jeff Layton 提交于 2月 24, 2017

NFSv4 requires a transport protocol with congestion control in most
cases.

On an IP network, that means that NFSv4 over UDP should be forbidden.

The situation with RDMA is a bit more nuanced, but most RDMA transports
are suitable for this. For now, we assume that all RDMA transports are
suitable, but we may need to revise that at some point.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

362142b2

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功