- 03 7月, 2008 2 次提交
-
-
由 Tom Tucker 提交于
Use the new svc_rdma_req_map data type for mapping the client side memory to the server side memory. Move the DMA mapping to the context pointed to by each WR individually so that it is unmapped after the WR completes. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
Create a new data structure to hold the remote client address space to local server address space mapping. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
- 19 5月, 2008 21 次提交
-
-
由 Tom Tucker 提交于
A RDMA read-list cannot contain more elements than RPCSVC_MAXPAGES or it will overflow the DTO context. Verify this when processing the protocol header. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
The svc_rdma_send_error function is called when an RPCRDMA protocol error is detected. This function attempts to post an error reply message. Since an error posting to a transport in error is ignored, change the return type to void. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
This race was found by inspection. Messages can be received from the peer immediately following the rdma_accept call, however, the CQ have not yet been armed and the transport address has not yet been set. Set the transport address in the connect request handler and arm the CQ prior to calling rdma_accept. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
The rdma_read_complete function needs to copy the rqstp transport address from the transport. Failure to do so can result in using the wrong authentication method for the RPC or bug checking if the rqstp address is not valid. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
Use the ib_verbs version of the dma_unmap service in the svc_rdma_put_context function. This should support providers using software rdma. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
When the transport is closing, the DTO tasklet may queue data that never gets processed. Clean up resources associated with this I/O. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
Move the destruction of the QP and CM_ID to the free path so that the QP cleanup code doesn't race with the dto_tasklet handling flushed WR. The QP reference is not needed because we now have a reference for every WR. Also add a guard in the SQ and RQ completion handlers to ignore calls generated by some providers when the QP is destroyed. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
Add a reference on the transport for every outstanding WR. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
Some providers may wait while destroying adapter resources. Since it is possible that the last reference is put on the dto_tasklet, the actual destroy must be scheduled as a work item. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
The rq_cq_reap function is only called from the dto_tasklet. The only resource shared with other threads is the sc_rq_dto_q. Move the spin lock to protect only this list. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
Replace the one-off linked list implementation used to implement the context cache with the standard Linux list_head lists. Add a context counter to catch resource leaks. A WARN_ON will be added later to ensure that we've freed all contexts. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
An NFS_WRITE requires a set of RDMA_READ requests to fetch the write data from the client. There are two principal pieces of data that need to be tracked: the list of pages that comprise the completed RPC and the SGE of dma mapped pages to refer to this list of pages. Previously this whole bit was managed as a linked list of contexts with the context containing the page list buried in this list. This patch simplifies this processing by not keeping a linked list, but rather only a pionter from the last submitted RDMA_READ's context to the context that maps the set of pages that describe the RPC. This significantly simplifies this code path. SGE contexts are cleaned up inline in the DTO path instead of at read completion time. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
The RDMACTXT_F_READ_DONE bit is not longer used. Remove it. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
The rdma_read_xdr function did not discriminate between no read-list and an error posting the read-list. This results in a leak of a page if there is an error posting the read-list. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
A listening endpoint isn't known to the generic transport switch until the svc_create_xprt function returns without error. Calling svc_xprt_put within the xpo_create function causes the module reference count to be erroneously decremented. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
If an error is encountered trying to post a recv buffer in send_reply, free the passed in context. Return an error to the caller so it is aware that the request was not posted. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
If there is an error posting the recv WR to the RQ, free the context associated with the WR. This would leak a context when asynchronous errors occurred on the transport while conccurent threads were processing their RPC. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
The svcrdma transport takes a reference when it gets the ESTABLISHED event from the provider. This reference is supposed to be removed when the DISCONNECT event is received, however, the call to svc_xprt_put was missing in the switch statement. This results in the memory associated with the transport never being freed. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
Fix the return value on close to -ENOTCONN so caller knows to free context. Also if a thread is waiting for free SQ space, check for close when waking to avoid posting WR to a closing transport. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
The svc_rdma_send function will attempt to reap SQ WR to make room for a new request if it finds the SQ full. This function races with the dto_tasklet that also reaps SQ WR. To avoid polling and arming the CQ unnecessarily move the test_and_clear_bit of the RDMAXPRT_SQ_PENDING flag and arming of the CQ to the sq_cq_reap function. Refactor the rq_cq_reap function to match sq_cq_reap so that the code is easier to follow. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
由 Tom Tucker 提交于
The svcrdma transport provider currently allocates receive buffers to the RQ through the xpo_release_rqst method. This approach is overly complicated since it means that the rqstp rq_xprt_ctxt has to be selectively set based on whether the RPC is going to be processed immediately or deferred. Instead, just post the receive buffer when we are certain that we are replying in the send_reply function. Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
-
- 24 4月, 2008 1 次提交
-
-
由 Tom Tucker 提交于
SVCRDMA: Add check for XPT_CLOSE in svc_rdma_send The svcrdma transport can crash if a send is waiting for an empty SQ slot and the connection is closed due to an asynchronous error. The crash is caused when svc_rdma_send attempts to send on a deleted QP. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 17 4月, 2008 1 次提交
-
-
由 Roland Dreier 提交于
Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a "send with invalidate" work request as defined in the iWARP verbs and the InfiniBand base memory management extensions. Also put "imm_data" and a new "invalidate_rkey" member in a new "ex" union in struct ib_send_wr. The invalidate_rkey member can be used to pass in an R_Key/STag to be invalidated. Add this new union to struct ib_uverbs_send_wr. Add code to copy the invalidate_rkey field in ib_uverbs_post_send(). Fix up low-level drivers to deal with the change to struct ib_send_wr, and just remove the imm_data initialization from net/sunrpc/xprtrdma/, since that code never does any send with immediate operations. Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since the iWARP drivers currently in the tree set the bit. The amso1100 driver at least will silently fail to honor the IB_SEND_INVALIDATE bit if passed in as part of userspace send requests (since it does not implement kernel bypass work request queueing). Remove the flag from all existing drivers that set it until we know which ones are OK. The values chosen for the new flag is not consecutive to avoid clashing with flags defined in the XRC patches, which are not merged yet but which are already in use and are likely to be merged soon. This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>. Signed-off-by: NRoland Dreier <rolandd@cisco.com>
-
- 27 3月, 2008 1 次提交
-
-
由 Tom Tucker 提交于
The RDMACTXT_F_LAST_CTXT bit was getting set incorrectly when the last chunk in the read-list spanned multiple pages. This resulted in a kernel panic when the wrong context was used to build the RPC iovec page list. RDMA_READ is used to fetch RPC data from the client for NFS_WRITE requests. A scatter-gather is used to map the advertised client side buffer to the server-side iovec and associated page list. WR contexts are used to convey which scatter-gather entries are handled by each WR. When the write data is large, a single RPC may require multiple RDMA_READ requests so the contexts for a single RPC are chained together in a linked list. The last context in this list is marked with a bit RDMACTXT_F_LAST_CTXT so that when this WR completes, the CQ handler code can enqueue the RPC for processing. The code in rdma_read_xdr was setting this bit on the last two contexts on this list when the last read-list chunk spanned multiple pages. This caused the svc_rdma_recvfrom logic to incorrectly build the RPC and caused the kernel to crash because the second-to-last context doesn't contain the iovec page list. Modified the condition that sets this bit so that it correctly detects the last context for the RPC. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Tested-by: NRoland Dreier <rolandd@cisco.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 3月, 2008 1 次提交
-
-
由 Roland Dreier 提交于
The iWARP protocol limits RDMA read requests to a single scatter entry. NFS/RDMA has code in rdma_read_max_sge() that is supposed to limit the sge_count for RDMA read requests to 1, but the code to do that is inside an #ifdef RDMA_TRANSPORT_IWARP block. In the mainline kernel at least, RDMA_TRANSPORT_IWARP is an enum and not a preprocessor #define, so the #ifdef'ed code is never compiled. In my test of a kernel build with -j8 on an NFS/RDMA mount, this problem eventually leads to trouble starting with: svcrdma: Error posting send = -22 svcrdma : RDMA_READ error = -22 and things go downhill from there. The trivial fix is to delete the #ifdef guard. The check seems to be a remnant of when the NFS/RDMA code was not merged and needed to compile against multiple kernel versions, although I don't think it ever worked as intended. In any case now that the code is upstream there's no need to test whether the RDMA_TRANSPORT_IWARP constant is defined or not. Without this patch, my kernel build on an NFS/RDMA mount using NetEffect adapters quickly and 100% reproducibly failed with an error like: ld: final link failed: Software caused connection abort With the patch applied I was able to complete a kernel build on the same setup. (Tom Tucker says this is "actually an _ancient_ remnant when it had to compile against iWARP vs. non-iWARP enabled OFA trees.") Signed-off-by: NRoland Dreier <rolandd@cisco.com> Acked-by: NTom Tucker <tom@opengridcomputing.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 3月, 2008 2 次提交
-
-
由 Tom Tucker 提交于
The assertion that checks for sge context overflow is incorrectly hard-coded to 32. This causes a kernel bug check when using big-data mounts. Changed the BUG_ON to use the computed value RPCSVC_MAXPAGES. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tom Tucker 提交于
RDMA connection shutdown on an SMP machine can cause a kernel crash due to the transport close path racing with the I/O tasklet. Additional transport references were added as follows: - A reference when on the DTO Q to avoid having the transport deleted while queued for I/O. - A reference while there is a QP able to generate events. - A reference until the DISCONNECTED event is received on the CM ID Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 08 3月, 2008 1 次提交
-
-
由 Tom Talpey 提交于
Prevent an RPC oops when freeing a dynamically allocated RDMA buffer, used in certain special-case large metadata operations. Signed-off-by: NTom Talpey <tmt@netapp.com> Signed-off-by: NJames Lentini <jlentini@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 11 2月, 2008 1 次提交
-
-
由 Roland Dreier 提交于
net/sunrpc/xprtrdma/svc_rdma_sendto.c:160: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'u64' Signed-off-by: NRoland Dreier <rolandd@cisco.com> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 02 2月, 2008 6 次提交
-
-
由 Tom Tucker 提交于
Add the svcrdma module to the xprtrdma makefile. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Acked-by: NNeil Brown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Tom Tucker 提交于
This logic parses the ONCRDMA protocol headers that precede the actual RPC header. It is placed in a separate file to keep all protocol aware code in a single place. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Acked-by: NNeil Brown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Tom Tucker 提交于
This file implements the RDMA transport sendto function. A RPC reply on an RDMA transport consists of some number of RDMA_WRITE requests followed by an RDMA_SEND request. The sendto function parses the ONCRPC RDMA reply header to determine how to send the reply back to the client. The send queue is sized so as to be able to send complete replies for requests in most cases. In the event that there are not enough SQ WR slots to reply, e.g. big data, the send will block the NFSD thread. The I/O callback functions in svc_rdma_transport.c that reap WR completions wake any waiters blocked on the SQ. In general, the goal is not to block NFSD threads and the has_wspace method stall requests when the SQ is nearly full. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Acked-by: NNeil Brown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Tom Tucker 提交于
This file implements the RDMA transport recvfrom function. The function dequeues work reqeust completion contexts from an I/O list that it shares with the I/O tasklet in svc_rdma_transport.c. For ONCRPC RDMA, an RPC may not be complete when it is received. Instead, the RDMA header that precedes the RPC message informs the transport where to get the RPC data from on the client and where to place it in the RPC message before it is delivered to the server. The svc_rdma_recvfrom function therefore, parses this RDMA header and issues any necessary RDMA operations to fetch the remainder of the RPC from the client. Special handling is required when the request involves an RDMA_READ. In this case, recvfrom submits the RDMA_READ requests to the underlying transport driver and then returns 0. When the transport completes the last RDMA_READ for the request, it enqueues it on a read completion queue and enqueues the transport. The recvfrom code favors this queue over the regular DTO queue when satisfying reads. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Acked-by: NNeil Brown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Tom Tucker 提交于
This file implements the core transport data management and I/O path. The I/O path for RDMA involves receiving callbacks on interrupt context. Since all the svc transport locks are _bh locks we enqueue the transport on a list, schedule a tasklet to dequeue data indications from the RDMA completion queue. The tasklet in turn takes _bh locks to enqueue receive data indications on a list for the transport. The svc_rdma_recvfrom transport function dequeues data from this list in an NFSD thread context. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Acked-by: NNeil Brown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
由 Tom Tucker 提交于
This file implements the RDMA transport module initialization and termination logic and registers the transport sysctl variables. Signed-off-by: NTom Tucker <tom@opengridcomputing.com> Acked-by: NNeil Brown <neilb@suse.de> Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
-
- 30 1月, 2008 3 次提交
-
-
由 Chuck Lever 提交于
Clean up: document the rule (kfree) and the exceptions (RPC_DISPLAY_PROTO and RPC_DISPLAY_NETID) when freeing the objects in a transport's address_strings array. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
In order to be able to support setting the timeo and retrans parameters on a per-mountpoint basis, we move the rpc_timeout structure into the rpc_clnt. Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-