- 03 11月, 2015 10 次提交
-
-
由 Chuck Lever 提交于
xprt_{setup,destroy}_backchannel() won't be adequate for RPC/RMDA bi-direction. In particular, receive buffers have to be pre- registered and posted in order to receive incoming backchannel requests. Add a virtual function call to allow the insertion of appropriate backchannel setup and destruction methods for each transport. In addition, freeing a backchannel request is a little different for RPC/RDMA. Introduce an rpc_xprt_op to handle the difference. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Now that RPC replies are processed in a workqueue, there's no need to disable IRQs when managing send and receive buffers. This saves noticeable overhead per RPC. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: The reply tasklet is no longer used. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The reply tasklet is fast, but it's single threaded. After reply traffic saturates a single CPU, there's no more reply processing capacity. Replace the tasklet with a workqueue to spread reply handling across all CPUs. This also moves RPC/RDMA reply handling out of the soft IRQ context and into a context that allows sleeps. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The rb_send_bufs and rb_recv_bufs arrays are used to implement a pair of stacks for keeping track of free rpcrdma_req and rpcrdma_rep structs. Replace those arrays with free lists. To allow more than 512 RPCs in-flight at once, each of these arrays would be larger than a page (assuming 8-byte addresses and 4KB pages). Allowing up to 64K in-flight RPCs (as TCP now does), each buffer array would have to be 128 pages. That's an order-6 allocation. (Not that we're going there.) A list is easier to expand dynamically. Instead of allocating a larger array of pointers and copying the existing pointers to the new array, simply append more buffers to each list. This also makes it simpler to manage receive buffers that might catch backwards-direction calls, or to post receive buffers in bulk to amortize the overhead of ib_post_recv. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Clean up: The error cases in rpcrdma_reply_handler() almost never execute. Ensure the compiler places them out of the hot path. No behavior change expected. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Commit 8301a2c0 ("xprtrdma: Limit work done by completion handler") was supposed to prevent xprtrdma's upcall handlers from starving other softIRQ work by letting them return to the provider before all CQEs have been polled. The logic assumes the provider will call the upcall handler again immediately if the CQ is re-armed while there are still queued CQEs. This assumption is invalid. The IBTA spec says that after a CQ is armed, the hardware must interrupt only when a new CQE is inserted. xprtrdma can't rely on the provider calling again, even though some providers do. Therefore, leaving CQEs on queue makes sense only when there is another mechanism that ensures all remaining CQEs are consumed in a timely fashion. xprtrdma does not have such a mechanism. If a CQE remains queued, the transport can wait forever to send the next RPC. Finally, move the wcs array back onto the stack to ensure that the poll array is always local to the CPU where the completion upcall is running. Fixes: 8301a2c0 ("xprtrdma: Limit work done by completion ...") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
ib_req_notify_cq(IB_CQ_REPORT_MISSED_EVENTS) returns a positive value if WCs were added to a CQ after the last completion upcall but before the CQ has been re-armed. Commit 7f23f6f6 ("xprtrmda: Reduce lock contention in completion handlers") assumed that when ib_req_notify_cq() returned a positive RC, the CQ had also been successfully re-armed, making it safe to return control to the provider without losing any completion signals. That is an invalid assumption. Change both completion handlers to continue polling while ib_req_notify_cq() returns a positive value. Fixes: 7f23f6f6 ("xprtrmda: Reduce lock contention in ...") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
After adding a swapfile on an NFS/RDMA mount and removing the normal swap partition, I was able to push the NFS client well into swap without any issue. I forgot to swapoff the NFS file before rebooting. This pinned the NFS mount and the IB core and provider, causing shutdown to hang. I think this is expected and safe behavior. Probably shutdown scripts should "swapoff -a" before unmounting any filesystems. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-By: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Steve Wise 提交于
Unsignaled send WRs can get flushed as part of normal unmount, so don't log them as warnings. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 12 10月, 2015 1 次提交
-
-
由 Chuck Lever 提交于
Now that the NFS server advertises a maximum payload size of 1MB for RPC/RDMA again, it crashes in svc_process_common() when NFS client sends a 1MB NFS WRITE on an NFS/RDMA mount. The server has set up a 259 element array of struct page pointers in rq_pages[] for each incoming request. The last element of the array is NULL. When an incoming request has been completely received, rdma_read_complete() attempts to set the starting page of the incoming page vector: rqstp->rq_arg.pages = &rqstp->rq_pages[head->hdr_count]; and the page to use for the reply: rqstp->rq_respages = &rqstp->rq_arg.pages[page_no]; But the value of page_no has already accounted for head->hdr_count. Thus rq_respages now points past the end of the incoming pages. For NFS WRITE operations smaller than the maximum, this is harmless. But when the NFS WRITE operation is as large as the server's max payload size, rq_respages now points at the last entry in rq_pages, which is NULL. Fixes: cc9a903d ('svcrdma: Change maximum server payload . . .') BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@dev.mellanox.co.il> Reviewed-by: NSteve Wise <swise@opengridcomputing.com> Reviewed-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 07 10月, 2015 1 次提交
-
-
由 Sagi Grimberg 提交于
There is no need to require LOCAL_DMA_LKEY support as the PD allocation makes sure that there is a local_dma_lkey. Also correctly set a return value in error path. This caused a NULL pointer dereference in mlx5 which removed the support for LOCAL_DMA_LKEY. Fixes: bb6c96d7 ("xprtrdma: Replace global lkey with lkey local to PD") Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Acked-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 30 9月, 2015 1 次提交
-
-
由 Steve Wise 提交于
The server rdma_read_chunk_lcl() and rdma_read_chunk_frmr() functions were not taking into account the initial page_offset when determining the rdma read length. This resulted in a read who's starting address and length exceeded the base/bounds of the frmr. The server gets an async error from the rdma device and kills the connection, and the client then reconnects and resends. This repeats indefinitely, and the application hangs. Most work loads don't tickle this bug apparently, but one test hit it every time: building the linux kernel on a 16 core node with 'make -j 16 O=/mnt/0' where /mnt/0 is a ramdisk mounted via NFSRDMA. This bug seems to only be tripped with devices having small fastreg page list depths. I didn't see it with mlx4, for instance. Fixes: 0bf48289 ('svcrdma: refactor marshalling logic') Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Tested-by: NChuck Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 28 9月, 2015 1 次提交
-
-
由 Steve Wise 提交于
Otherwise a FRMR completion can cause a touch-after-free crash. In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling rpcrdma_ep_destroy(). In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the qp, then drain the cqs, then destroy the qp, and finally destroy the cqs. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Tested-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 25 9月, 2015 1 次提交
-
-
由 Chuck Lever 提交于
The core API has changed so that devices that do not have a global DMA lkey automatically create an mr, per-PD, and make that lkey available. The global DMA lkey interface is going away in favor of the per-PD DMA lkey. The per-PD DMA lkey is always available. Convert xprtrdma to use the device's per-PD DMA lkey for regbufs, no matter which memory registration scheme is in use. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Cc: linux-nfs <linux-nfs@vger.kernel.org> Acked-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 23 9月, 2015 1 次提交
-
-
由 Andrea Arcangeli 提交于
This reverts commit 51360155 and adapts fs/userfaultfd.c to use the old version of that function. It didn't look robust to call __wake_up_common with "nr == 1" when we absolutely require wakeall semantics, but we've full control of what we insert in the two waitqueue heads of the blocked userfaults. No exclusive waitqueue risks to be inserted into those two waitqueue heads so we can as well stick to "nr == 1" of the old code and we can rely purely on the fact no waitqueue inserted in one of the two waitqueue heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 9月, 2015 2 次提交
-
-
由 Trond Myklebust 提交于
Under all conditions, it should be quite sufficient just to mark the socket as disconnected. It will then be closed by the transport shutdown or reconnect code. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Avoid all races with the connect/disconnect handlers by taking the transport lock. Reported-by: N"Suzuki K. Poulose" <suzuki.poulose@arm.com> Acked-by: NJeff Layton <jlayton@poochiereds.net> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 18 9月, 2015 3 次提交
-
-
由 Trond Myklebust 提交于
Commit 718ba5b8, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: 718ba5b8 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: NRussell King <linux@arm.linux.org.uk> Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk> Tested-by: NRussell King <rmk+kernel@arm.linux.org.uk> Tested-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Julia Lawall 提交于
Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
When we're destroying the socket transport, we need to ensure that we cancel any existing delayed connection attempts, and order them w.r.t. the call to xs_close(). Reported-by: N"Suzuki K. Poulose" <suzuki.poulose@arm.com> Acked-by: NJeff Layton <jlayton@poochiereds.net> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 05 9月, 2015 1 次提交
-
-
由 Andrea Arcangeli 提交于
userfaultfd needs to wake all waitqueues (pass 0 as nr parameter), instead of the current hardcoded 1 (that would wake just the first waitqueue in the head list). Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 8月, 2015 3 次提交
-
-
由 Jason Gunthorpe 提交于
The majority of callers never check the return value, and even if they did, they can't do anything about a failure. All possible failure cases represent a bug in the caller, so just WARN_ON inside the function instead. This fixes a few random errors: net/rd/iw.c infinite loops while it fails. (racing with EBUSY?) This also lays the ground work to get rid of error return from the drivers. Most drivers do not error, the few that do are broken since it cannot be handled. Since uverbs can legitimately make use of EBUSY, open code the check. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Steve Wise 提交于
Svcrdma was incorrectly allocating fastreg MRs and page lists using RPCSVC_MAXPAGES, which can exceed the device capabilities. So limit the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 30 8月, 2015 2 次提交
-
-
由 Trond Myklebust 提交于
Add a shutdown() call before we release the socket in order to ensure the reset is sent before we try to reconnect. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
In case the reconnection attempt fails. Cc: stable@vger.kernel.org Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 29 8月, 2015 1 次提交
-
-
由 Steve Wise 提交于
Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 20 8月, 2015 1 次提交
-
-
由 Trond Myklebust 提交于
Follow up to commit c4a7ca77 ("SUNRPC: Allow waiting on memory allocation"). Allows the RPC socket code to do non-IO blocking. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 18 8月, 2015 1 次提交
-
-
由 Trond Myklebust 提交于
It is rather pointless to test the value of transport->inet after calling xs_reset_transport(), since it will always be zero, and so we will never see any exponential back off behaviour. Also don't force early connections for SOFTCONN tasks. If the server disconnects us, we should respect the exponential backoff. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 13 8月, 2015 4 次提交
-
-
由 Kinglong Mee 提交于
Switch using list_head for cache_head in cache_detail, it is useful of remove an cache_head entry directly from cache_detail. v8, using hash list, not head list Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Nfsd has implement a site of seq_operations functions as sunrpc's cache. Just exports sunrpc's codes, and remove nfsd's redundant codes. v8, same as v6 Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Cleanup. Just store cache_detail in seq_file's private, an allocated handle is redundant. v8, same as v6. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
The current limit of 32 bytes artificially limits the name string that we end up stuffing into NFSv4.x client ID blobs. If you have multiple hosts with long hostnames that only differ near the end, then this can cause NFSv4 client ID collisions. Linux nodenames are actually limited to __NEW_UTS_LEN bytes (64), so use that as the limit instead. Also, use XDR_QUADLEN to specify the slack length, just for clarity and in case someone in the future changes this to something not evenly divisible by 4. Reported-by: NMichael Skralivetsky <michael.skralivetsky@primarydata.com> Signed-off-by: NJeff Layton <jeff.layton@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 11 8月, 2015 6 次提交
-
-
由 Jeff Layton 提交于
In later patches, we'll want to be able to allocate and free svc_rqst structures without monkeying with the serv->sv_nrthreads refcount. Factor those pieces out of their respective functions. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
In later patches, we're going to need to allow code external to svc.c to figure out what pool_mode is in use. Move these definitions into svc.h to prepare for that. Also, make the svc_pool_map object available and exported so that other modules can peek in there to get insight into what pool mode is in use. Likewise, export svc_pool_map_get/put function to make it safe to do so. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
For now, all services use svc_xprt_do_enqueue, but once we add workqueue-based service support, we'll need to do something different. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
...not technically an operation, but it's more convenient and cleaner to pass the module pointer in this struct. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
Since we now have a container for holding svc_serv operations, move the sv_function into it as well. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
In later patches we'll need to abstract out more operations on a per-service level, besides sv_shutdown and sv_function. Declare a new svc_serv_ops struct to hold these operations, and move sv_shutdown into this struct. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-