- 08 10月, 2015 5 次提交
-
-
由 Trond Myklebust 提交于
If we're sending more than one page via kernel_sendpage(), then set MSG_SENDPAGE_NOTLAST between the pages so that we don't send suboptimal frames (see commit 2f533844 and commit 35f9c09f). Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Now that we've done it for TCP and UDP, let's convert AF_LOCAL as well. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Now that we've done it for TCP, let's convert UDP as well. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Stream protocols such as TCP can often build up a backlog of data to be read due to ordering. Combine this with the fact that some workloads such as NFS read()-intensive workloads need to receive a lot of data per RPC call, and it turns out that receiving the data from inside a softirq context can cause starvation. The following patch moves the TCP data receive into a workqueue context. We still end up calling tcp_read_sock(), but we do so from a process context, meaning that softirqs are enabled for most of the time. With this patch, I see a doubling of read bandwidth when running a multi-threaded iozone workload between a virtual client and server setup. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Move the TCP data receive loop out of xs_tcp_data_ready(). Doing so will allow us to move the data receive out of the softirq context in a set of followup patches. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 28 9月, 2015 1 次提交
-
-
由 Steve Wise 提交于
Otherwise a FRMR completion can cause a touch-after-free crash. In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling rpcrdma_ep_destroy(). In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the qp, then drain the cqs, then destroy the qp, and finally destroy the cqs. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Tested-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
- 25 9月, 2015 1 次提交
-
-
由 Chuck Lever 提交于
The core API has changed so that devices that do not have a global DMA lkey automatically create an mr, per-PD, and make that lkey available. The global DMA lkey interface is going away in favor of the per-PD DMA lkey. The per-PD DMA lkey is always available. Convert xprtrdma to use the device's per-PD DMA lkey for regbufs, no matter which memory registration scheme is in use. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Cc: linux-nfs <linux-nfs@vger.kernel.org> Acked-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 23 9月, 2015 1 次提交
-
-
由 Andrea Arcangeli 提交于
This reverts commit 51360155 and adapts fs/userfaultfd.c to use the old version of that function. It didn't look robust to call __wake_up_common with "nr == 1" when we absolutely require wakeall semantics, but we've full control of what we insert in the two waitqueue heads of the blocked userfaults. No exclusive waitqueue risks to be inserted into those two waitqueue heads so we can as well stick to "nr == 1" of the old code and we can rely purely on the fact no waitqueue inserted in one of the two waitqueue heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set. Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Shuah Khan <shuahkh@osg.samsung.com> Cc: Thierry Reding <treding@nvidia.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 9月, 2015 2 次提交
-
-
由 Trond Myklebust 提交于
Under all conditions, it should be quite sufficient just to mark the socket as disconnected. It will then be closed by the transport shutdown or reconnect code. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
Avoid all races with the connect/disconnect handlers by taking the transport lock. Reported-by: N"Suzuki K. Poulose" <suzuki.poulose@arm.com> Acked-by: NJeff Layton <jlayton@poochiereds.net> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 18 9月, 2015 3 次提交
-
-
由 Trond Myklebust 提交于
Commit 718ba5b8, moved the responsibility for unlocking the socket to xs_tcp_setup_socket, meaning that the socket will be unlocked before we know that it has finished trying to connect. The following patch is based on an initial patch by Russell King to ensure that we delay clearing the XPRT_CONNECTING flag until we either know that we failed to initiate a connection attempt, or the connection attempt itself failed. Fixes: 718ba5b8 ("SUNRPC: Add helpers to prevent socket create from racing") Reported-by: NRussell King <linux@arm.linux.org.uk> Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk> Tested-by: NRussell King <rmk+kernel@arm.linux.org.uk> Tested-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Julia Lawall 提交于
Remove unneeded NULL test. The semantic patch that makes this change is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ expression x; @@ -if (x != NULL) \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x); // </smpl> Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
When we're destroying the socket transport, we need to ensure that we cancel any existing delayed connection attempts, and order them w.r.t. the call to xs_close(). Reported-by: N"Suzuki K. Poulose" <suzuki.poulose@arm.com> Acked-by: NJeff Layton <jlayton@poochiereds.net> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 05 9月, 2015 1 次提交
-
-
由 Andrea Arcangeli 提交于
userfaultfd needs to wake all waitqueues (pass 0 as nr parameter), instead of the current hardcoded 1 (that would wake just the first waitqueue in the head list). Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Acked-by: NPavel Emelyanov <xemul@parallels.com> Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com> Cc: zhang.zhanghailiang@huawei.com Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Cc: Andres Lagar-Cavilla <andreslc@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Hugh Dickins <hughd@google.com> Cc: Peter Feiner <pfeiner@google.com> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 31 8月, 2015 3 次提交
-
-
由 Jason Gunthorpe 提交于
The majority of callers never check the return value, and even if they did, they can't do anything about a failure. All possible failure cases represent a bug in the caller, so just WARN_ON inside the function instead. This fixes a few random errors: net/rd/iw.c infinite loops while it fails. (racing with EBUSY?) This also lays the ground work to get rid of error return from the drivers. Most drivers do not error, the few that do are broken since it cannot be handled. Since uverbs can legitimately make use of EBUSY, open code the check. Signed-off-by: NJason Gunthorpe <jgunthorpe@obsidianresearch.com> Reviewed-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Steve Wise 提交于
Svcrdma was incorrectly allocating fastreg MRs and page lists using RPCSVC_MAXPAGES, which can exceed the device capabilities. So limit the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len. Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Sagi Grimberg 提交于
Signed-off-by: NSagi Grimberg <sagig@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 30 8月, 2015 2 次提交
-
-
由 Trond Myklebust 提交于
Add a shutdown() call before we release the socket in order to ensure the reset is sent before we try to reconnect. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Trond Myklebust 提交于
In case the reconnection attempt fails. Cc: stable@vger.kernel.org Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 29 8月, 2015 1 次提交
-
-
由 Steve Wise 提交于
Signed-off-by: NSteve Wise <swise@opengridcomputing.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
- 20 8月, 2015 1 次提交
-
-
由 Trond Myklebust 提交于
Follow up to commit c4a7ca77 ("SUNRPC: Allow waiting on memory allocation"). Allows the RPC socket code to do non-IO blocking. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 18 8月, 2015 1 次提交
-
-
由 Trond Myklebust 提交于
It is rather pointless to test the value of transport->inet after calling xs_reset_transport(), since it will always be zero, and so we will never see any exponential back off behaviour. Also don't force early connections for SOFTCONN tasks. If the server disconnects us, we should respect the exponential backoff. Cc: stable@vger.kernel.org # 4.0+ Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 13 8月, 2015 4 次提交
-
-
由 Kinglong Mee 提交于
Switch using list_head for cache_head in cache_detail, it is useful of remove an cache_head entry directly from cache_detail. v8, using hash list, not head list Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Nfsd has implement a site of seq_operations functions as sunrpc's cache. Just exports sunrpc's codes, and remove nfsd's redundant codes. v8, same as v6 Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Kinglong Mee 提交于
Cleanup. Just store cache_detail in seq_file's private, an allocated handle is redundant. v8, same as v6. Signed-off-by: NKinglong Mee <kinglongmee@gmail.com> Reviewed-by: NNeilBrown <neilb@suse.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
The current limit of 32 bytes artificially limits the name string that we end up stuffing into NFSv4.x client ID blobs. If you have multiple hosts with long hostnames that only differ near the end, then this can cause NFSv4 client ID collisions. Linux nodenames are actually limited to __NEW_UTS_LEN bytes (64), so use that as the limit instead. Also, use XDR_QUADLEN to specify the slack length, just for clarity and in case someone in the future changes this to something not evenly divisible by 4. Reported-by: NMichael Skralivetsky <michael.skralivetsky@primarydata.com> Signed-off-by: NJeff Layton <jeff.layton@primarydata.com> Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
- 11 8月, 2015 7 次提交
-
-
由 Jeff Layton 提交于
In later patches, we'll want to be able to allocate and free svc_rqst structures without monkeying with the serv->sv_nrthreads refcount. Factor those pieces out of their respective functions. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
In later patches, we're going to need to allow code external to svc.c to figure out what pool_mode is in use. Move these definitions into svc.h to prepare for that. Also, make the svc_pool_map object available and exported so that other modules can peek in there to get insight into what pool mode is in use. Likewise, export svc_pool_map_get/put function to make it safe to do so. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
For now, all services use svc_xprt_do_enqueue, but once we add workqueue-based service support, we'll need to do something different. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
...not technically an operation, but it's more convenient and cleaner to pass the module pointer in this struct. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
Since we now have a container for holding svc_serv operations, move the sv_function into it as well. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Jeff Layton 提交于
In later patches we'll need to abstract out more operations on a per-service level, besides sv_shutdown and sv_function. Declare a new svc_serv_ops struct to hold these operations, and move sv_shutdown into this struct. Signed-off-by: NShirley Ma <shirley.ma@oracle.com> Acked-by: NJeff Layton <jlayton@primarydata.com> Tested-by: NShirley Ma <shirley.ma@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
由 Chuck Lever 提交于
Both commit 0380a3f3 ("svcrdma: Add a separate "max data segs" macro for svcrdma") and commit 7e5be288 ("svcrdma: advertise the correct max payload") are incorrect. This commit reverts both changes, restoring the server's maximum payload size to 1MB. Commit 7e5be288 based the server's maximum payload on the _client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong. Commit 0380a3f3 tried to fix this so that the client maximum payload size could be raised without affecting the server, but managed to confuse matters more on the server side. More importantly, limiting the advertised maximum payload size was meant to be a workaround, not the actual fix. We need to revisit https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 A Linux client on a platform with 64KB pages can overrun and crash an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86/64 Linux client seems to work fine using 1MB reads and writes when the Linux server's maximum payload size is restored to 1MB. BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270 Fixes: 0380a3f3 ("svcrdma: Add a separate "max data segs" macro") Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
-
- 06 8月, 2015 7 次提交
-
-
由 Devesh Sharma 提交于
This is a rework of the following patch sent almost a year back: http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html In presence of active mount if someone tries to rmmod vendor-driver, the command remains stuck forever waiting for destruction of all rdma-cm-id. in worst case client can crash during shutdown with active mounts. The existing code assumes that ia->ri_id->device cannot change during the lifetime of a transport. xprtrdma do not have support for DEVICE_REMOVAL event either. Lifting that assumption and adding support for DEVICE_REMOVAL event is a long chain of work, and is in plan. The community decided that preventing the hang right now is more important than waiting for architectural changes. Thus, this patch introduces a temporary workaround to acquire HCA driver module reference count during the mount of a nfs-rdma mount point. Signed-off-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@dev.mellanox.co.il> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
RDMA_NOMSG type calls are less efficient than RDMA_MSG. Count NOMSG calls so administrators can tell if they happen to be used more than expected. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
checkpatch.pl complained about the seq_printf() format string split across lines and the use of %Lu. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Repair how rpcrdma_marshal_req() chooses which RDMA message type to use for large non-WRITE operations so that it picks RDMA_NOMSG in the correct situations, and sets up the marshaling logic to SEND only the RPC/RDMA header. Large NFSv2 SYMLINK requests now use RDMA_NOMSG calls. The Linux NFS server XDR decoder for NFSv2 SYMLINK does not handle having the pathname argument arrive in a separate buffer. The decoder could be fixed, but this is simpler and RDMA_NOMSG can be used in a variety of other situations. Ensure that the Linux client continues to use "RDMA_MSG + read list" when sending large NFSv3 SYMLINK requests, which is more efficient than using RDMA_NOMSG. Large NFSv4 CREATE(NF4LNK) requests are changed to use "RDMA_MSG + read list" just like NFSv3 (see Section 5 of RFC 5667). Before, these did not work at all. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Currently xprtrdma appends an extra chunk element to the RPC/RDMA read chunk list of each NFSv4 WRITE compound. The extra element contains the final GETATTR operation in the compound. The result is an extra RDMA READ operation to transfer a very short piece of each NFS WRITE compound (typically 16 bytes). This is inefficient. It is also incorrect. The client is sending the trailing GETATTR at the same Position as the preceding WRITE data payload. Whether or not RFC 5667 allows the GETATTR to appear in a read chunk, RFC 5666 requires that these two separate RPC arguments appear at two distinct Positions. It can also be argued that the GETATTR operation is not bulk data, and therefore RFC 5667 forbids its appearance in a read chunk at all. Although RFC 5667 is not precise about when using a read list with NFSv4 COMPOUND is allowed, the intent is that only data arguments not touched by NFS (ie, read and write payloads) are to be sent using RDMA READ or WRITE. The NFS client constructs GETATTR arguments itself, and therefore is required to send the trailing GETATTR operation as additional inline content, not as a data payload. NB: This change is not backwards compatible. Some older servers do not accept inline content following the read list. The Linux NFS server should handle this content correctly as of commit a97c331f ("svcrdma: Handle additional inline content"). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
Currently Linux always offers a reply chunk, even when the reply can be sent inline (ie. is smaller than 1KB). On the client, registering a memory region can be expensive. A server may choose not to use the reply chunk, wasting the cost of the registration. This is a change only for RPC replies smaller than 1KB which the server constructs in the RPC reply send buffer. Because the elements of the reply must be XDR encoded, a copy-free data transfer has no benefit in this case. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NSagi Grimberg <sagig@mellanox.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-
由 Chuck Lever 提交于
The client has been setting up a reply chunk for NFS READs that are smaller than the inline threshold. This is not efficient: both the server and client CPUs have to copy the reply's data payload into and out of the memory region that is then transferred via RDMA. Using the write list, the data payload is moved by the device and no extra data copying is necessary. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NDevesh Sharma <devesh.sharma@avagotech.com> Reviewed-By: NSagi Grimberg <sagig@mellanox.com> Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
-