- 18 5月, 2016 18 次提交
-
-
由 Chuck Lever 提交于
In a cluster failover scenario, it is desirable for the client to attempt to reconnect quickly, as an alternate NFS server is already waiting to take over for the down server. The client can't see that a server IP address has moved to a new server until the existing connection is gone. For fabrics and devices where it is meaningful, set a definite upper bound on the amount of time before it is determined that a connection is no longer valid. This allows the RPC client to detect connection loss in a timely matter, then perform a fresh resolution of the server GUID in case it has changed (cluster failover). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: The ro_unmap method is no longer used. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
There needs to be a safe method of releasing registered memory resources when an RPC terminates. Safe can mean a number of things: + Doesn't have to sleep + Doesn't rely on having a QP in RTS ro_unmap_safe will be that safe method. It can be used in cases where synchronous memory invalidation can deadlock, or needs to have an active QP. The important case is fencing an RPC's memory regions after it is signaled (^C) and before it exits. If this is not done, there is a window where the server can write an RPC reply into memory that the client has released and re-used for some other purpose. Note that this is a full solution for FRWR, but FMR and physical still have some gaps where a particularly bad server can wreak some havoc on the client. These gaps are not made worse by this patch and are expected to be exceptionally rare and timing-based. They are noted in documenting comments. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Separate the DMA unmap operation from freeing the MW. In a subsequent patch they will not always be done at the same time, and they are not related operations (except by order; freeing the MW must be the last step during invalidation). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
In a subsequent patch, the fr_xprt and fr_worker fields will be needed by another memory registration mode. Move them into the generic rpcrdma_mw structure that wraps struct rpcrdma_frmr. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Maintain the order of invalidation and DMA unmapping when doing a background MR reset. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
frwr_op_unmap_sync() is now invoked in a workqueue context, the same as __frwr_queue_recovery(). There's no need to defer MR reset if posting LOCAL_INV MRs fails. This means that even when ib_post_send() fails (which should occur very rarely) the invalidation and DMA unmapping steps are still done in the correct order. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Move the the I/O direction field from rpcrdma_mr_seg into the rpcrdma_frmr. This makes it possible to DMA-unmap the frwr long after an RPC has exited and its rpcrdma_mr_seg array has been released and re-used. This might occur if an RPC times out while waiting for a new connection to be established. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: Follow same naming convention as other fields in struct rpcrdma_frwr. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: Replace rpcrdma_flush_cqs() and rpcrdma_clean_cqs() with the new ib_drain_qp() API. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-By: NLeon Romanovsky <leonro@mellanox.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
rpcrdma_create_chunks() has been replaced, and can be removed. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
rpcrdma_marshal_req() makes a simplifying assumption: that NFS operations with large Call messages have small Reply messages, and vice versa. Therefore with RPC-over-RDMA, only one chunk type is ever needed for each Call/Reply pair, because one direction needs chunks, the other direction will always fit inline. In fact, this assumption is asserted in the code: if (rtype != rpcrdma_noch && wtype != rpcrdma_noch) { dprintk("RPC: %s: cannot marshal multiple chunk lists\n", __func__); return -EIO; } But RPCGSS_SEC breaks this assumption. Because krb5i and krb5p perform data transformation on RPC messages before they are transmitted, direct data placement techniques cannot be used, thus RPC messages must be sent via a Long call in both directions. All such calls are sent with a Position Zero Read chunk, and all such replies are handled with a Reply chunk. Thus the client must provide every Call/Reply pair with both a Read list and a Reply chunk. Without any special security in effect, NFSv4 WRITEs may now also use the Read list and provide a Reply chunk. The marshal_req logic was preventing that, meaning an NFSv4 WRITE with a large payload that included a GETATTR result larger than the inline threshold would fail. The code that encodes each chunk list is now completely contained in its own function. There is some code duplication, but the trade-off is that the overall logic should be more clear. Note that all three chunk lists now share the rl_segments array. Some additional per-req accounting is necessary to track this usage. For the same reasons that the above simplifying assumption has held true for so long, I don't expect more array elements are needed at this time. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Update documenting comments to reflect code changes over the past year. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Avoid the latency and interrupt overhead of registering a Write chunk when handling NFS READ requests of a few hundred bytes or less. This change does not interoperate with Linux NFS/RDMA servers that do not have commit 9d11b51c ('svcrdma: Fix send_reply() scatter/gather set-up'). Commit 9d11b51c was introduced in v4.3, and is included in 4.2.y, 4.1.y, and 3.18.y. Oracle bug 22925946 has been filed to request that the above fix be included in the Oracle Linux UEK4 NFS/RDMA server. Red Hat bugzillas 1327280 and 1327554 have been filed to request that RHEL NFS/RDMA server backports include the above fix. Workaround: Replace the "proto=rdma,port=20049" mount options with "proto=tcp" until commit 9d11b51c is applied to your NFS server. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
When deciding whether to send a Call inline, rpcrdma_marshal_req doesn't take into account header bytes consumed by chunk lists. This results in Call messages on the wire that are sometimes larger than the inline threshold. Likewise, when a Write list or Reply chunk is in play, the server's reply has to emit an RDMA Send that includes a larger-than-minimal RPC-over-RDMA header. The actual size of a Call message cannot be estimated until after the chunk lists have been registered. Thus the size of each RPC-over-RDMA header can be estimated only after chunks are registered; but the decision to register chunks is based on the size of that header. Chicken, meet egg. The best a client can do is estimate header size based on the largest header that might occur, and then ensure that inline content is always smaller than that. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Send buffer space is shared between the RPC-over-RDMA header and an RPC message. A large RPC-over-RDMA header means less space is available for the associated RPC message, which then has to be moved via an RDMA Read or Write. As more segments are added to the chunk lists, the header increases in size. Typical modern hardware needs only a few segments to convey the maximum payload size, but some devices and registration modes may need a lot of segments to convey data payload. Sometimes so many are needed that the remaining space in the Send buffer is not enough for the RPC message. Sending such a message usually fails. To ensure a transport can always make forward progress, cap the number of RDMA segments that are allowed in chunk lists. This prevents less-capable devices and memory registrations from consuming a large portion of the Send buffer by reducing the maximum data payload that can be conveyed with such devices. For now I choose an arbitrary maximum of 8 RDMA segments. This allows a maximum size RPC-over-RDMA header to fit nicely in the current 1024 byte inline threshold with over 700 bytes remaining for an inline RPC message. The current maximum data payload of NFS READ or WRITE requests is one megabyte. To convey that payload on a client with 4KB pages, each chunk segment would need to handle 32 or more data pages. This is well within the capabilities of FMR. For physical registration, the maximum payload size on platforms with 4KB pages is reduced to 32KB. For FRWR, a device's maximum page list depth would need to be at least 34 to support the maximum 1MB payload. A device with a smaller maximum page list depth means the maximum data payload is reduced when using that device. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NSagi Grimberg <sagi@grimberg.me> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Currently the sysctls that allow setting the inline threshold allow any value to be set. Small values only make the transport run slower. The default 1KB setting is as low as is reasonable. And the logic that decides how to divide a Send buffer between RPC-over-RDMA header and RPC message assumes (but does not check) that the lower bound is not crazy (say, 57 bytes). Send and receive buffers share a page with some control information. Values larger than about 3KB can't be supported, currently. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
RPC-over-RDMA transports have a limit on how large a backward direction (backchannel) RPC message can be. Ensure that the NFSv4.x CREATE_SESSION operation advertises this limit to servers. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 15 3月, 2016 9 次提交
-
-
由 Chuck Lever 提交于
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, xprtrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post its own operations on a consumer's QP, and handle the completions itself, without changes to the consumer. Send completions were previously handled entirely in the completion upcall handler (ie, deferring to a process context is unneeded). Thus IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma send code path. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: Make code more readable. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, xprtrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post its own operations on a consumer's QP, and handle the completions itself, without changes to the consumer. xprtrdma's reply processing is already handled in a work queue, but there is some initial order-dependent processing that is done in the soft IRQ context before a work item is scheduled. IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma receive code path. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Commit fe97b47c ("xprtrdma: Use workqueue to process RPC/RDMA replies") replaced the reply tasklet with a workqueue that allows RPC replies to be processed in parallel. Thus the credit values in RPC-over-RDMA replies can be applied in a different order than in which the server sent them. To fix this, revert commit eba8ff66 ("xprtrdma: Move credit update to RPC reply handler"). Reverting is done by hand to accommodate code changes that have occurred since then. Fixes: fe97b47c ("xprtrdma: Use workqueue to process . . .") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
These are shorter than RPCRDMA_HDRLEN_MIN, and they need to complete the waiting RPC. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
If ib_post_send() in ro_unmap_sync() fails, the WRs have not been posted, no completions will fire, and wait_for_completion() will wait forever. Skip the wait in that case. To ensure the MRs are invalid, disconnect. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
A single memory allocation is used for the pair of buffers wherein the RPC client builds an RPC call message and decodes its matching reply. These buffers are sized based on the maximum possible size of the RPC call and reply messages for the operation in progress. This means that as the call buffer increases in size, the start of the reply buffer is pushed farther into the memory allocation. RPC requests are growing in size. It used to be that both the call and reply buffers fit inside a single page. But these days, thanks to NFSv4 (and especially security labels in NFSv4.2) the maximum call and reply sizes are large. NFSv4.0 OPEN, for example, now requires a 6KB allocation for a pair of call and reply buffers, and NFSv4 LOOKUP is not far behind. As the maximum size of a call increases, the reply buffer is pushed far enough into the buffer's memory allocation that a page boundary can appear in the middle of it. When the maximum possible reply size is larger than the client's RDMA receive buffers (currently 1KB), the client has to register a Reply chunk for the server to RDMA Write the reply into. The logic in rpcrdma_convert_iovs() assumes that xdr_buf head and tail buffers would always be contained on a single page. It supplies just one segment for the head and one for the tail. FMR, for example, registers up to a page boundary (only a portion of the reply buffer in the OPEN case above). But without additional segments, it doesn't register the rest of the buffer. When the server tries to write the OPEN reply, the RDMA Write fails with a remote access error since the client registered only part of the Reply chunk. rpcrdma_convert_iovs() must split the XDR buffer into multiple segments, each of which are guaranteed not to contain a page boundary. That way fmr_op_map is given the proper number of segments to register the whole reply buffer. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
physical_op_unmap{_sync} don't use mr_nsegs, so don't bother to set it in physical_op_map. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 02 3月, 2016 11 次提交
-
-
由 Chuck Lever 提交于
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context is replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking is removed, as they are no longer referenced or used. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
Clean up: close_out is reached only when ctxt == NULL and XPT_CLOSE is already set. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Tested-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
RFC 5666 Section 4.2 states: > When the peer detects an RPC-over-RDMA header version that it does > not support (currently this document defines only version 1), it > replies with an error code of ERR_VERS, and provides the low and > high inclusive version numbers it does, in fact, support. And: > When other decoding errors are detected in the header or chunks, > either an RPC decode error MAY be returned or the RPC/RDMA error > code ERR_CHUNK MUST be returned. The Linux NFS server does throw ERR_VERS when a client sends it a request whose rdma_version is not "one." But it does not return ERR_CHUNK when a header decoding error occurs. It just drops the request. To improve protocol extensibility, it should reject invalid values in the rdma_proc field instead of treating them all like RDMA_MSG. Otherwise clients can't detect when the server doesn't support new rdma_proc values. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Tested-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Tested-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Tested-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com> Tested-by: NDevesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
The NFS server's XDR encoders adds an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
When the Linux NFS server writes an odd-length data item into a Write chunk, it finishes with XDR pad bytes. If the data item is smaller than the Write chunk, the pad bytes are written at the end of the data item, but still inside the chunk (ie, in the application's buffer). Since this is direct data placement, that exposes the pad bytes. XDR pad bytes are inserted in order to preserve the XDR alignment of the next XDR data item in an XDR stream. But Write chunks do not appear in the payload XDR stream, and only one data item is allowed in each chunk. Thus XDR padding is not needed in a Write chunk. With NFSv4, the Linux NFS server places the results of any operations that follow an NFSv4 READ or READLINK in the xdr_buf's tail. Those results also should never be sent as a part of a Write chunk. The current logic in send_write_chunks() appears to assume that the xdr_buf's tail contains only pad bytes (ie, NFSv3). The server should write only the contents of the xdr_buf's page list in a Write chunk. If there's more than an XDR pad in the tail, that needs to go inline or in the Reply chunk. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
The client provides the location of Write chunks into which the server writes bulk payload. The client provides these when the Upper Layer Protocol wants direct data placement and the Binding allows it. (For NFS, this is READ and READLINK operations). The client also provides the location of a Reply chunk into which the server writes the non-bulk part of an RPC reply. The client provides this chunk whenever it believes the reply can be larger than its receive buffers. The server then uses the presence of these chunks to determine how it will form its reply message. svc_rdma_sendto() was looking for Write and Reply chunks multiple times for every reply message. It would be more efficient to do it just once. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 17 2月, 2016 1 次提交
-
-
由 Chuck Lever 提交于
Some NFSv4.1 OPEN requests were hanging waiting for the NFS server to finish recalling delegations. Turns out that each NFSv4.1 CB request on RDMA gets a GARBAGE_ARGS reply from the Linux client. Commit 756b9b37 added a line in bc_svc_process that overwrites the incoming rq_rcv_buf's length with the value in rq_private_buf.len. But rpcrdma_bc_receive_call() does not invoke xprt_complete_bc_request(), thus rq_private_buf.len is not initialized. svc_process_common() is invoked with a zero-length RPC message, and fails. Fixes: 756b9b37 ('SUNRPC: Fix callback channel') Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 20 1月, 2016 1 次提交
-
-
由 Christoph Hellwig 提交于
We now alwasy have a per-PD local_dma_lkey available. Make use of that fact in svc_rdma and stop registering our own MR. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Acked-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-