- 14 May, 2016 1 commit
-
-
Committed by Chuck Lever
Clean up: Pass in just the piece of the svc_rqst that is needed here. While we're in the area, add an informative documenting comment. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 02 March, 2016 6 commits
-
-
Committed by Chuck Lever
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. This new API also aims each completion at a function that is specific to the WR's opcode. Thus the ctxt->wr_op field and the switch in process_context are replaced by a set of methods that handle each completion type. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. The server's rdma_stat_sq_poll and rdma_stat_sq_prod metrics are no longer updated. As a clean up, the cq_event_handler, the dto_tasklet, and all associated locking are removed, as they are no longer referenced or used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
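The per-completion method dispatch described in this commit can be illustrated with a small userspace sketch. Everything named `sketch_*` below is a hypothetical stand-in, not the kernel's actual types or signatures: the idea shown is only that each work request embeds a small object carrying a function pointer, and the core calls that pointer instead of the consumer switching on an opcode.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a work completion and a
 * completion-queue entry; not the kernel's real definitions. */
struct sketch_wc;

struct sketch_cqe {
	/* Completion method chosen by whoever posted the work request. */
	void (*done)(struct sketch_cqe *cqe, const struct sketch_wc *wc);
};

struct sketch_wc {
	struct sketch_cqe *cqe;	/* identifies the completed work request */
	int status;		/* 0 means success in this sketch */
};

/* Consumer-side context that embeds the cqe, so no opcode field is needed. */
struct sketch_ctxt {
	int sends_completed;
	int send_errors;
	struct sketch_cqe cqe;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Completion method for Send work requests only. */
static void sketch_send_done(struct sketch_cqe *cqe, const struct sketch_wc *wc)
{
	struct sketch_ctxt *ctxt =
		sketch_container_of(cqe, struct sketch_ctxt, cqe);

	if (wc->status == 0)
		ctxt->sends_completed++;
	else
		ctxt->send_errors++;
}

/* "Core" side: deliver one completion without knowing the consumer's types. */
static void sketch_core_deliver(const struct sketch_wc *wc)
{
	wc->cqe->done(wc->cqe, wc);
}
```

A consumer would set `ctxt.cqe.done = sketch_send_done` before posting, and the core would call `sketch_core_deliver()` for each polled completion. A Receive or Read completion would simply use a different `done` function.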
-
Committed by Chuck Lever
Calling ib_poll_cq() to sort through WCs during a completion is a common pattern amongst RDMA consumers. Since commit 14d3a3b2 ("IB: add a proper completion queue abstraction"), WC sorting can be handled by the IB core. By converting to this new API, svcrdma is made a better neighbor to other RDMA consumers, as it allows the core to schedule the delivery of completions more fairly amongst all active consumers. Because each ib_cqe carries a pointer to a completion method, the core can now post operations on a consumer's QP, and handle the completions itself. svcrdma receive completions no longer use the dto_tasklet. Each polled Receive WC is now handled individually in soft IRQ context. The server transport's rdma_stat_rq_poll and rdma_stat_rq_prod metrics are no longer updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
When constructing an error reply, svc_rdma_xdr_encode_error() needs to view the client's request message so it can get the failing request's XID. svc_rdma_xdr_decode_req() is supposed to return a pointer to the client's request header. But if it fails to decode the client's message (and thus an error reply is needed) it does not return the pointer. The server then sends a bogus XID in the error reply. Instead, unconditionally generate the pointer to the client's header in svc_rdma_recvfrom(), and pass that pointer to both functions. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
Fix several issues with svc_rdma_send_error(): - Post a receive buffer to replace the one that was consumed by the incoming request - Posting a send should use DMA_TO_DEVICE, not DMA_FROM_DEVICE - No need to put_page _and_ free pages in svc_rdma_put_context - Make sure the sge is set up completely in case the error path goes through svc_rdma_unmap_dma() - Replace the use of ENOSYS, which has a reserved meaning Related fixes in svc_rdma_recvfrom(): - Don't leak the ctxt associated with the incoming request - Don't close the connection after sending an error reply - Let svc_rdma_send_error() figure out the right header error code As a last clean up, move svc_rdma_send_error() to svc_rdma_sendto.c with other similar functions. There is some common logic in these functions that could someday be combined to reduce code duplication. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
Clean up: Most svc_rdma_post_recv() call sites close the transport connection when a receive cannot be posted. Wrap that in a common helper. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com> Tested-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
The NFS server's XDR encoders add an XDR pad for content in the xdr_buf page list at the beginning of the xdr_buf's tail buffer. On RDMA transports, Write chunks are sent separately and without an XDR pad. If a Write chunk is being sent, strip off the pad in the tail buffer so that inline content following the Write chunk remains XDR-aligned when it is sent to the client. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=294 Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
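As a rough illustration of the pad arithmetic involved: XDR rounds every item up to a multiple of four bytes, so the pad that follows page-list content of a given length is zero to three bytes. The helper names below are hypothetical and this is a userspace sketch, not the code from the commit.

```c
#include <stddef.h>

/* Number of XDR pad bytes (0..3) needed to round len up to a multiple of 4. */
static size_t sketch_xdr_pad_size(size_t len)
{
	return (4 - (len & 3)) & 3;
}

/* How many leading bytes of the tail buffer to skip when sending inline
 * content. Assumption of this sketch: the pad is skipped only when the
 * page-list content went out separately as a Write chunk. */
static size_t sketch_tail_skip(size_t page_len, int write_chunk_present)
{
	return write_chunk_present ? sketch_xdr_pad_size(page_len) : 0;
}
```

For example, 1025 bytes of page-list data would be followed by a 3-byte pad, which this sketch skips only in the Write chunk case.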
-
- 20 January, 2016 8 commits
-
-
Committed by Christoph Hellwig
We now always have a per-PD local_dma_lkey available. Make use of that fact in svc_rdma and stop registering our own MR. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Reviewed-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Chuck Lever
To support the server-side of an NFSv4.1 backchannel on RDMA connections, add a transport class that enables backward direction messages on an existing forward channel connection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Chuck Lever
Extra resources for handling backchannel requests have to be pre-allocated when a transport instance is created. Set up additional fields in svcxprt_rdma to track these resources. The max_requests fields are elements of the RPC-over-RDMA protocol, so they should be u32. To ensure that unsigned arithmetic is used everywhere, some other fields in the svcxprt_rdma struct are updated. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Chuck Lever
Pre-requisite to use map_xdr in the backchannel code. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Chuck Lever
svc_rdma_post_recv() allocates pages for receive buffers on-demand. It uses GFP_KERNEL so the allocator tries hard, and may sleep. But I'm about to add a call to svc_rdma_post_recv() from a function that may not sleep. Since all svc_rdma_post_recv() call sites can tolerate its failure, allow it to fail if the page allocator returns nothing. Longer term, receive buffers, being a finite resource per-connection, should be pre-allocated and re-used. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Chuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Chuck Lever
To ensure this allocation cannot fail and will not sleep, pre-allocate the req_map structures per-connection. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
Committed by Chuck Lever
When the maximum payload size of NFS READ and WRITE was increased by commit cc9a903d ("svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOAD"), the size of struct svc_rdma_op_ctxt increased to over 6KB (on x86_64). That makes allocating one of these from a kmem_cache more likely to fail in situations when system memory is exhausted. Since I'm about to add a caller where this allocation must always work _and_ it cannot sleep, pre-allocate ctxts for each connection. Another motivation for this change is that NFSv4.x servers are required by specification not to drop NFS requests. Pre-allocating memory resources reduces the likelihood of a drop. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: Bruce Fields <bfields@fieldses.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
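The pre-allocation idea can be sketched in userspace as a fixed pool with a free list: all memory is set aside when the connection is created, so taking a context later never calls an allocator and never sleeps. The names and the pool size below are hypothetical, and locking is omitted; a real transport would need to protect the free list against concurrent callers.

```c
#include <stddef.h>

#define SKETCH_POOL_SIZE 4	/* hypothetical per-connection limit */

struct sketch_op_ctxt {
	struct sketch_op_ctxt *next;	/* free-list link */
	int in_use;
};

struct sketch_pool {
	struct sketch_op_ctxt slots[SKETCH_POOL_SIZE];	/* allocated up front */
	struct sketch_op_ctxt *free_list;
};

/* Called once at connection setup: thread every slot onto the free list. */
static void sketch_pool_init(struct sketch_pool *pool)
{
	pool->free_list = NULL;
	for (size_t i = 0; i < SKETCH_POOL_SIZE; i++) {
		pool->slots[i].in_use = 0;
		pool->slots[i].next = pool->free_list;
		pool->free_list = &pool->slots[i];
	}
}

/* Take a context without allocating; returns NULL when the pool is empty. */
static struct sketch_op_ctxt *sketch_pool_get(struct sketch_pool *pool)
{
	struct sketch_op_ctxt *ctxt = pool->free_list;

	if (ctxt) {
		pool->free_list = ctxt->next;
		ctxt->next = NULL;
		ctxt->in_use = 1;
	}
	return ctxt;
}

/* Return a context to the pool for re-use. */
static void sketch_pool_put(struct sketch_pool *pool,
			    struct sketch_op_ctxt *ctxt)
{
	ctxt->in_use = 0;
	ctxt->next = pool->free_list;
	pool->free_list = ctxt;
}
```

Sizing the pool to the connection's maximum number of outstanding operations is what would make an empty pool an exceptional case rather than a routine one; that sizing policy is an assumption here, not something the commit message spells out.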
-
- 03 November, 2015 1 commit
-
-
Committed by Chuck Lever
On NFSv4.1 mount points, the Linux NFS client uses this transport endpoint to receive backward direction calls and route replies back to the NFSv4.1 server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Acked-by: "J. Bruce Fields" <bfields@fieldses.org> Reviewed-by: Sagi Grimberg <sagig@mellanox.com> Tested-by: Devesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 29 October, 2015 1 commit
-
-
Committed by Sagi Grimberg
Instead of maintaining a fastreg page list, keep an sg table and convert an array of pages to a sg list. Then call ib_map_mr_sg and construct ib_reg_wr. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Acked-by: Christoph Hellwig <hch@lst.de> Tested-by: Steve Wise <swise@opengridcomputing.com> Tested-by: Selvin Xavier <selvin.xavier@avagotech.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 29 August, 2015 1 commit
-
-
Committed by Steve Wise
Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
-
- 11 August, 2015 1 commit
-
-
Committed by Chuck Lever
Both commit 0380a3f3 ("svcrdma: Add a separate "max data segs" macro for svcrdma") and commit 7e5be288 ("svcrdma: advertise the correct max payload") are incorrect. This commit reverts both changes, restoring the server's maximum payload size to 1MB. Commit 7e5be288 based the server's maximum payload on the _client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong. Commit 0380a3f3 tried to fix this so that the client maximum payload size could be raised without affecting the server, but managed to confuse matters more on the server side. More importantly, limiting the advertised maximum payload size was meant to be a workaround, not the actual fix. We need to revisit https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 A Linux client on a platform with 64KB pages can overrun and crash an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86/64 Linux client seems to work fine using 1MB reads and writes when the Linux server's maximum payload size is restored to 1MB. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 Fixes: 0380a3f3 ("svcrdma: Add a separate "max data segs" macro") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 21 July, 2015 2 commits
-
-
Committed by Chuck Lever
Commit 0bf48289 ("svcrdma: refactor marshalling logic") removed the last call site for svc_rdma_fastreg(). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
Kernel coding conventions frown upon having large nontrivial functions in header files, and the preference these days is to allow the compiler to make inlining decisions if possible. As these functions are re-homed into a .c file, be sure that comparisons with fields in struct rpcrdma_msg are with be32 constants. This is a refactoring change; no behavior change is intended. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 05 June, 2015 3 commits
-
-
Committed by Chuck Lever
The server and client maximum are architecturally independent. Allow changing one without affecting the other. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
At the 2015 LSF/MM, it was requested that memory allocation call sites that request GFP_KERNEL allocations in a loop should be annotated with __GFP_NOFAIL. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
Fields in struct rpcrdma_msg are __be32. Don't byte-swap these fields when decoding RPC calls and then swap them back for the reply. For the most part, they can be left alone. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
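A userspace sketch of the approach this commit describes: keep header fields in wire (big-endian) order, compare them against constants that have been converted to big-endian, and copy values such as the XID straight into the reply. The struct layout and `SKETCH_*` names are hypothetical simplifications, and `htonl()` stands in for the kernel's be32 helpers.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Hypothetical message-type values used only for this sketch. */
#define SKETCH_RDMA_MSG   0u
#define SKETCH_RDMA_NOMSG 1u

/* Simplified header; every field stays in big-endian (wire) order. */
struct sketch_hdr {
	uint32_t xid;
	uint32_t vers;
	uint32_t credits;
	uint32_t proc;
};

/* Compare a wire-order field with a constant converted to wire order,
 * instead of byte-swapping the field itself. */
static int sketch_is_nomsg(const struct sketch_hdr *hdr)
{
	return hdr->proc == htonl(SKETCH_RDMA_NOMSG);
}

/* Build a reply: fields echoed from the call are copied untouched, and
 * only host-order values produced locally are converted. */
static void sketch_build_reply(const struct sketch_hdr *call,
			       struct sketch_hdr *reply, uint32_t credits)
{
	reply->xid = call->xid;		/* no swap, no swap back */
	reply->vers = call->vers;
	reply->credits = htonl(credits);
	reply->proc = htonl(SKETCH_RDMA_MSG);
}
```

Since the constant side of each comparison is known at build time, the conversion can typically be folded by the compiler, so the wire data itself is never modified.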
-
- 04 June, 2015 1 commit
-
-
Committed by Chuck Lever
svc_rdma_xdr_decode_deferred_req() indexes an array with an un-byte-swapped value off the wire. Fortunately this function isn't used anywhere, so simply remove it. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 30 January, 2015 1 commit
-
-
Committed by Chuck Lever
Clean up: Replace htonl and ntohl with the be32 equivalents. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 16 January, 2015 3 commits
-
-
Committed by Chuck Lever
Currently the Linux server cannot decode RDMA_NOMSG type requests. Operations whose length exceeds the fixed size of RDMA SEND buffers, like large NFSv4 CREATE(NF4LNK) operations, must be conveyed via RDMA_NOMSG. For an RDMA_MSG type request, the client sends the RPC/RDMA and RPC headers, and some or all of the NFS arguments, via RDMA SEND. For an RDMA_NOMSG type request, the client sends just the RPC/RDMA header via RDMA SEND. The request's read list contains elements for the entire RPC message, including the RPC header. NFSD expects the RPC/RDMA header and RPC header to be contiguous in page zero of the XDR buffer. Add logic in the RDMA READ path to make the read list contents land where the server prefers, when the incoming message is a type RDMA_NOMSG message. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
The RDMA reader function doesn't change once an svcxprt_rdma is instantiated. Instead of checking sc_devcap during every incoming RPC, set the reader function once when the connection is accepted. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Chuck Lever
The byte_count argument is not used, and the function is called only from one place. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 24 July, 2014 1 commit
-
-
Committed by Chuck Lever
The RDMA credit limit controls how many concurrent RPCs are allowed per connection. An NFS/RDMA client and server exchange their credit limits in the RPC/RDMA headers. The Linux client and the Solaris client and server allow 32 credits. The Linux server allows only 16, which limits its performance. Set the server's default credit limit to 32, like the other well-known implementations, so the out-of-the-shrinkwrap performance of the Linux server is better. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 07 June, 2014 1 commit
-
-
Committed by Steve Wise
This patch refactors the NFSRDMA server marshalling logic to remove the intermediary map structures. It also fixes an existing bug where the NFSRDMA server was not minding the device fast register page list length limitations. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com>
-
- 07 March, 2012 1 commit
-
-
Committed by Dan Carpenter
Sparse complains that the function declaration and the implementation aren't annotated the same way. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 18 February, 2012 1 commit
-
-
Committed by Tom Tucker
The svcrdma transport was un-marshalling requests in-place. This resulted in sparse warnings due to __beXX data containing both NBO and HBO data. The code has been restructured to do byte-swapping as the header is parsed instead of when the header is validated immediately after receipt. Also moved extern declarations for the workqueue and memory pools to the private header file. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 07 October, 2008 3 commits
-
-
Committed by Tom Tucker
RPCRDMA requests that specify a read-list are fetched with RDMA_READ. Using an FRMR to map the data sink improves NFSRDMA security on transports that place the RDMA_READ data sink LKEY on the wire because the valid lifetime of the MR is only the duration of the RDMA_READ. The LKEY is invalidated when the last RDMA_READ WR completes. Mapping the data sink also allows for very large amounts of data to be fetched with a single WR, so if the client is also using FRMR, the entire RPC read-list can be fetched with a single WR. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
Committed by Tom Tucker
Fast Reg MR introduces a new WR type. Add a service to register the region with the adapter and update the completion handling to support completions with a NULL WR context. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
Committed by Tom Tucker
Add services for allocating, freeing, and unmapping Fast Reg MRs. These services will be used by the transport connection setup, send and receive routines. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
- 04 October, 2008 1 commit
-
-
Committed by Tom Tucker
Add data types to track Fast Reg Memory Regions. The core data type is svc_rdma_fastreg_mr that associates a device MR with a host kva and page list. A field is added to the WR context to keep track of the FRMR used to map the local memory for an RPC. An FRMR list and spin lock are added to the transport instance to keep track of all FRMR allocated for the transport. Also added are device capability flags to indicate what the memory registration capabilities are for the underlying device and whether or not fast memory registration is supported. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
- 14 August, 2008 1 commit
-
-
Committed by Tom Tucker
RDMA_READ completions are kept on a separate queue from the general I/O request queue. Since a separate lock is used to protect the RDMA_READ completion queue, a race exists between the dto_tasklet and the svc_rdma_recvfrom thread where the dto_tasklet sets the XPT_DATA bit and adds I/O to the read-completion queue. Concurrently, the recvfrom thread checks the generic queue, finds it empty and resets the XPT_DATA bit. A subsequent svc_xprt_enqueue will fail to enqueue the transport for I/O and cause the transport to "stall". The fix is to protect both lists with the same lock and set the XPT_DATA bit with this lock held. Signed-off-by: Tom Tucker <tom@opengridcomputing.com> Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
-
- 03 July, 2008 2 commits
-
-
Committed by Tom Tucker
Change the WR context pool to be shared across mount points. This reduces the RDMA transport memory footprint significantly since idle mounts don't consume WR context memory. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-
Committed by Tom Tucker
The sc_read_wait queue head is no longer used. Remove it. Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
-