- 18 5月, 2016 3 次提交
-
-
由 Chuck Lever 提交于
When deciding whether to send a Call inline, rpcrdma_marshal_req doesn't take into account header bytes consumed by chunk lists. This results in Call messages on the wire that are sometimes larger than the inline threshold. Likewise, when a Write list or Reply chunk is in play, the server's reply has to emit an RDMA Send that includes a larger-than-minimal RPC-over-RDMA header. The actual size of a Call message cannot be estimated until after the chunk lists have been registered. Thus the size of each RPC-over-RDMA header can be estimated only after chunks are registered; but the decision to register chunks is based on the size of that header. Chicken, meet egg. The best a client can do is estimate header size based on the largest header that might occur, and then ensure that inline content is always smaller than that. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Send buffer space is shared between the RPC-over-RDMA header and an RPC message. A large RPC-over-RDMA header means less space is available for the associated RPC message, which then has to be moved via an RDMA Read or Write. As more segments are added to the chunk lists, the header increases in size. Typical modern hardware needs only a few segments to convey the maximum payload size, but some devices and registration modes may need a lot of segments to convey data payload. Sometimes so many are needed that the remaining space in the Send buffer is not enough for the RPC message. Sending such a message usually fails. To ensure a transport can always make forward progress, cap the number of RDMA segments that are allowed in chunk lists. This prevents less-capable devices and memory registrations from consuming a large portion of the Send buffer by reducing the maximum data payload that can be conveyed with such devices. For now I choose an arbitrary maximum of 8 RDMA segments. This allows a maximum size RPC-over-RDMA header to fit nicely in the current 1024 byte inline threshold with over 700 bytes remaining for an inline RPC message. The current maximum data payload of NFS READ or WRITE requests is one megabyte. To convey that payload on a client with 4KB pages, each chunk segment would need to handle 32 or more data pages. This is well within the capabilities of FMR. For physical registration, the maximum payload size on platforms with 4KB pages is reduced to 32KB. For FRWR, a device's maximum page list depth would need to be at least 34 to support the maximum 1MB payload. A device with a smaller maximum page list depth means the maximum data payload is reduced when using that device. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
RPC-over-RDMA transports have a limit on how large a backward direction (backchannel) RPC message can be. Ensure that the NFSv4.x CREATE_SESSION operation advertises this limit to servers. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 15 3月, 2016 4 次提交
-
-
由 Chuck Lever 提交于
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, xprtrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post its own operations on a consumer's QP, and handle the completions itself, without changes to the consumer. Send completions were previously handled entirely in the completion upcall handler (ie, deferring to a process context is unneeded). Thus IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma send code path. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: Make code more readable. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, xprtrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post its own operations on a consumer's QP, and handle the completions itself, without changes to the consumer. xprtrdma's reply processing is already handled in a work queue, but there is some initial order-dependent processing that is done in the soft IRQ context before a work item is scheduled. IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma receive code path. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Commit fe97b47c ("xprtrdma: Use workqueue to process RPC/RDMA replies") replaced the reply tasklet with a workqueue that allows RPC replies to be processed in parallel. Thus the credit values in RPC-over-RDMA replies can be applied in a different order than in which the server sent them. To fix this, revert commit eba8ff66 ("xprtrdma: Move credit update to RPC reply handler"). Reverting is done by hand to accommodate code changes that have occurred since then. Fixes: fe97b47c ("xprtrdma: Use workqueue to process . . .") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 20 1月, 2016 2 次提交
-
-
由 Chuck Lever 提交于
To support the server-side of an NFSv4.1 backchannel on RDMA connections, add a transport class that enables backward direction messages on an existing forward channel connection. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Acked-by: NBruce Fields <bfields@fieldses.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Chuck Lever 提交于
Clean up. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Acked-by: NBruce Fields <bfields@fieldses.org> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 23 12月, 2015 1 次提交
-
-
由 Or Gerlitz 提交于
Instead, use the cached copy of the attributes present on the device. Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 19 12月, 2015 4 次提交
-
-
由 Chuck Lever 提交于
The root of the problem was that sends (especially unsignalled FASTREG and LOCAL_INV Work Requests) were not properly flow- controlled, which allowed a send queue overrun. Now that the RPC/RDMA reply handler waits for invalidation to complete, the send queue is properly flow-controlled. Thus this limit is no longer necessary. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
FRWR's ro_unmap is asynchronous. The new ro_unmap_sync posts LOCAL_INV Work Requests and waits for them to complete before returning. Note also, DMA unmapping is now done _after_ invalidation. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
In the current xprtrdma implementation, some memreg strategies implement ro_unmap synchronously (the MR is knocked down before the method returns) and some asynchonously (the MR will be knocked down and returned to the pool in the background). To guarantee the MR is truly invalid before the RPC consumer is allowed to resume execution, we need an unmap method that is always synchronous, invoked from the RPC/RDMA reply handler. The new method unmaps all MRs for an RPC. The existing ro_unmap method unmaps only one MR at a time. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
For FRWR FASTREG and LOCAL_INV, move the ib_*_wr structure off the stack. This allows frwr_op_map and frwr_op_unmap to chain WRs together without limit to register or invalidate a set of MRs with a single ib_post_send(). (This will be for chaining LOCAL_INV requests). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 03 11月, 2015 9 次提交
-
-
由 Chuck Lever 提交于
Forechannel transports get their own "bc_up" method to create an endpoint for the backchannel service. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> [Anna Schumaker: Add forward declaration of struct net to xprt.h] Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Introduce a code path in the rpcrdma_reply_handler() to catch incoming backward direction RPC calls and route them to the ULP's backchannel server. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Backward direction RPC replies are sent via the client transport's send_request method, the same way forward direction RPC calls are sent. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Pre-allocate extra send and receive Work Requests needed to handle backchannel receives and sends. The transport doesn't know how many extra WRs to pre-allocate until the xprt_setup_backchannel() call, but that's long after the WRs are allocated during forechannel setup. So, use a fixed value for now. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
xprtrdma's backward direction send and receive buffers are the same size as the forechannel's inline threshold, and must be pre- registered. The consumer has no control over which receive buffer the adapter chooses to catch an incoming backwards-direction call. Any receive buffer can be used for either a forward reply or a backward call. Thus both types of RPC message must all be the same size. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The reply tasklet is fast, but it's single threaded. After reply traffic saturates a single CPU, there's no more reply processing capacity. Replace the tasklet with a workqueue to spread reply handling across all CPUs. This also moves RPC/RDMA reply handling out of the soft IRQ context and into a context that allows sleeps. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The rb_send_bufs and rb_recv_bufs arrays are used to implement a pair of stacks for keeping track of free rpcrdma_req and rpcrdma_rep structs. Replace those arrays with free lists. To allow more than 512 RPCs in-flight at once, each of these arrays would be larger than a page (assuming 8-byte addresses and 4KB pages). Allowing up to 64K in-flight RPCs (as TCP now does), each buffer array would have to be 128 pages. That's an order-6 allocation. (Not that we're going there.) A list is easier to expand dynamically. Instead of allocating a larger array of pointers and copying the existing pointers to the new array, simply append more buffers to each list. This also makes it simpler to manage receive buffers that might catch backwards-direction calls, or to post receive buffers in bulk to amortize the overhead of ib_post_recv. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: The error cases in rpcrdma_reply_handler() almost never execute. Ensure the compiler places them out of the hot path. No behavior change expected. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Commit 8301a2c0 ("xprtrdma: Limit work done by completion handler") was supposed to prevent xprtrdma's upcall handlers from starving other softIRQ work by letting them return to the provider before all CQEs have been polled. The logic assumes the provider will call the upcall handler again immediately if the CQ is re-armed while there are still queued CQEs. This assumption is invalid. The IBTA spec says that after a CQ is armed, the hardware must interrupt only when a new CQE is inserted. xprtrdma can't rely on the provider calling again, even though some providers do. Therefore, leaving CQEs on queue makes sense only when there is another mechanism that ensures all remaining CQEs are consumed in a timely fashion. xprtrdma does not have such a mechanism. If a CQE remains queued, the transport can wait forever to send the next RPC. Finally, move the wcs array back onto the stack to ensure that the poll array is always local to the CPU where the completion upcall is running. Fixes: 8301a2c0 ("xprtrdma: Limit work done by completion ...") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 29 10月, 2015 1 次提交
-
-
由 Sagi Grimberg 提交于
Instead of maintaining a fastreg page list, keep an sg table and convert an array of pages to a sg list. Then call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Acked-by: NChristoph Hellwig <hch@lst.de> Tested-by: NSteve Wise <swise@opengridcomputing.com> Tested-by: NSelvin Xavier <selvin.xavier@avagotech.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 25 9月, 2015 1 次提交
-
-
由 Chuck Lever 提交于
The core API has changed so that devices that do not have a global DMA lkey automatically create an mr, per-PD, and make that lkey available. The global DMA lkey interface is going away in favor of the per-PD DMA lkey. The per-PD DMA lkey is always available. Convert xprtrdma to use the device's per-PD DMA lkey for regbufs, no matter which memory registration scheme is in use. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Cc: linux-nfs <linux-nfs@vger.kernel.org> Acked-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 11 8月, 2015 1 次提交
-
-
由 Chuck Lever 提交于
Both commit 0380a3f3 ("svcrdma: Add a separate "max data segs" macro for svcrdma") and commit 7e5be288 ("svcrdma: advertise the correct max payload") are incorrect. This commit reverts both changes, restoring the server's maximum payload size to 1MB. Commit 7e5be288 based the server's maximum payload on the _client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong. Commit 0380a3f3 tried to fix this so that the client maximum payload size could be raised without affecting the server, but managed to confuse matters more on the server side. More importantly, limiting the advertised maximum payload size was meant to be a workaround, not the actual fix. We need to revisit https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 A Linux client on a platform with 64KB pages can overrun and crash an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86/64 Linux client seems to work fine using 1MB reads and writes when the Linux server's maximum payload size is restored to 1MB. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 Fixes: 0380a3f3 ("svcrdma: Add a separate "max data segs" macro") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 06 8月, 2015 5 次提交
-
-
由 Chuck Lever 提交于
RDMA_NOMSG type calls are less efficient than RDMA_MSG. Count NOMSG calls so administrators can tell if they happen to be used more than expected. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
RDMA_MSGP type calls insert a zero pad in the middle of the RPC message to align the RPC request's data payload to the server's alignment preferences. A server can then "page flip" the payload into place to avoid a data copy in certain circumstances. However: 1. The client has to have a priori knowledge of the server's preferred alignment 2. Requests eligible for RDMA_MSGP are requests that are small enough to have been sent inline, and convey a data payload at the _end_ of the RPC message Today 1. is done with a sysctl, and is a global setting that is copied during mount. Linux does not support CCP to query the server's preferences (RFC 5666, Section 6). A small-ish NFSv3 WRITE might use RDMA_MSGP, but no NFSv4 compound fits bullet 2. Thus the Linux client currently leaves RDMA_MSGP disabled. The Linux server handles RDMA_MSGP, but does not use any special page flipping, so it confers no benefit. Clean up the marshaling code by removing the logic that constructs RDMA_MSGP type calls. This also reduces the maximum send iovec size from four to just two elements. /proc/sys/sunrpc/rdma_inline_write_padding is a kernel API, and thus is left in place. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Untangle the end of rpcrdma_ia_open() by moving DMA MR set-up, which is different for each registration method, to the .ro_open functions. This is refactoring only. No behavior change is expected. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
All HCA providers have an ib_get_dma_mr() verb. Thus rpcrdma_ia_open() will either grab the device's local_dma_key if one is available, or it will call ib_get_dma_mr(). If ib_get_dma_mr() fails, rpcrdma_ia_open() fails and no transport is created. Therefore execution never reaches the ib_reg_phys_mr() call site in rpcrdma_register_internal(), so it can be removed. The remaining logic in rpcrdma_{de}register_internal() is folded into rpcrdma_{alloc,free}_regbuf(). This is clean up only. No behavior change is expected. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-By: NSagi Grimberg <sagig@mellanox.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The point of larger rsize and wsize is to reduce the per-byte cost of memory registration and deregistration. Modern HCAs can typically handle a megabyte or more with a single registration operation. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-By: NSagi Grimberg <sagig@mellanox.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 13 6月, 2015 9 次提交
-
-
由 Chuck Lever 提交于
fmr_op_map() declares a 64 element array of u64 in automatic storage. This is 512 bytes (8 * 64) on the stack. Instead, when FMR memory registration is in use, pre-allocate a physaddr array for each rpcrdma_mw. This is a pre-requisite for increasing the r/wsize maximum for FMR on platforms with 4KB pages. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
/proc/lock_stat showed contention between rpcrdma_buffer_get/put and the MR allocation functions during I/O intensive workloads. Now that MRs are no longer allocated in rpcrdma_buffer_get(), there's no reason the rb_mws list has to be managed using the same lock as the send/receive buffers. Split that lock. The new lock does not need to disable interrupts because buffer get/put is never called in an interrupt context. struct rpcrdma_buffer is re-arranged to ensure rb_mwlock and rb_mws are always in a different cacheline than rb_lock and the buffer pointers. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: This field is no longer used. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
An RPC can exit at any time. When it does so, xprt_rdma_free() is called, and it calls ->op_unmap(). If ->ro_reset() is running due to a transport disconnect, the two methods can race while processing the same rpcrdma_mw. The results are unpredictable. Because of this, in previous patches I've altered ->ro_map() to handle MR reset. ->ro_reset() is no longer needed and can be removed. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
After a transport disconnect, FRMRs can be left in an undetermined state. In particular, the MR's rkey is no good. Currently, FRMRs are fixed up by the transport connect worker, but that can race with ->ro_unmap if an RPC happens to exit while the transport connect worker is running. A better way of dealing with broken FRMRs is to detect them before they are re-used by ->ro_map. Such FRMRs are either already invalid or are owned by the sending RPC, and thus no race with ->ro_unmap is possible. Introduce a mechanism for handing broken FRMRs to a workqueue to be reset in a context that is appropriate for allocating resources (ie. an ib_alloc_fast_reg_mr() API call). This mechanism is not yet used, but will be in subsequent patches. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-By: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
We eventually want to handle allocating MWs one at a time, as needed, instead of grabbing 64 and throwing them at each RPC in the pipeline. Add a helper for grabbing an MW off rb_mws, and a helper for returning an MW to rb_mws. These will be used in a subsequent patch. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The connect worker can replace ri_id, but prevents ri_id->device from changing during the lifetime of a transport instance. The old ID is kept around until a new ID is created and the ->device is confirmed to be the same. Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep. The cached copy can be used safely in code that does not serialize with the connect worker. Other code can use it to save an extra address generation (one pointer dereference instead of two). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
A posted rpcrdma_rep never has rr_func set to anything but rpcrdma_reply_handler. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: Instead of carrying a pointer to the buffer pool and the rpc_xprt, carry a pointer to the controlling rpcrdma_xprt. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-by: NDoug Ledford <dledford@redhat.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-