- 09 March 2021, 1 commit
-
-
Submitted by Benjamin Coddington
We could recurse into NFS doing memory reclaim while sending a sync task, which might result in a deadlock. Set memalloc_nofs_save for sync task execution.

Fixes: a1231fda ("SUNRPC: Set memalloc_nofs_save() on all rpciod/xprtiod jobs")
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
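
A minimal sketch of the scoped-NOFS pattern this fix relies on; the function name and body below are illustrative, not the actual rpc_execute() change. Allocations made between save and restore implicitly drop __GFP_FS, so they cannot recurse back into NFS reclaim.

```c
#include <linux/sched/mm.h>
#include <linux/sunrpc/sched.h>

/* Illustrative only: wrap synchronous task execution in a NOFS scope so
 * any allocation it performs cannot recurse into filesystem reclaim.
 */
static void example_execute_sync_task(struct rpc_task *task)
{
	unsigned int pflags = memalloc_nofs_save();

	/* ... run the task to completion; allocations here behave as GFP_NOFS ... */

	memalloc_nofs_restore(pflags);
}
```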
-
- 17 February 2021, 3 commits
-
-
Submitted by Chuck Lever
Clean up: The msghdr is no longer needed in the caller.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Submitted by Trond Myklebust
Now that the caller controls the TCP_CORK socket option, it is redundant to set MSG_MORE and MSG_SENDPAGE_NOTLAST in the calls to kernel_sendpage().

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Submitted by Trond Myklebust
Use a counter to keep track of how many requests are queued behind the xprt->xpt_mutex, and keep TCP_CORK set until the queue is empty.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Link: https://lore.kernel.org/linux-nfs/20210213202532.23146-1-trondmy@kernel.org/T/#u
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
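
A rough sketch of the corking idea, assuming a hypothetical per-transport counter (example_queued is not the actual field name, and the function itself is invented): the socket stays corked while other requests wait behind the transport mutex, and the last sender uncorks to push the accumulated data.

```c
#include <linux/tcp.h>
#include <linux/sunrpc/svc_xprt.h>

/* Hypothetical sketch: xpt_mutex is real, the example_queued counter and
 * the function are invented for illustration.
 */
static void example_cork_and_send(struct svc_xprt *xprt, struct sock *sk,
				  atomic_t *example_queued)
{
	atomic_inc(example_queued);		/* this request joins the queue */

	mutex_lock(&xprt->xpt_mutex);
	tcp_sock_set_cork(sk, true);

	/* ... kernel_sendpage()/sendmsg() calls for this request ... */

	if (atomic_dec_and_test(example_queued))
		tcp_sock_set_cork(sk, false);	/* queue drained: flush */
	mutex_unlock(&xprt->xpt_mutex);
}
```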
-
- 15 February 2021, 1 commit
-
-
Submitted by Chuck Lever
RDMA core mutex locking was restructured by commit d114c6fe ("RDMA/cma: Add missing locking to rdma_accept()") [Aug 2020]. When lock debugging is enabled, the RPC/RDMA server trips over the new lockdep assertion in rdma_accept() because it doesn't call rdma_accept() from its CM event handler. As a temporary fix, have svc_rdma_accept() take the handler_mutex explicitly. In the meantime, let's consider how to restructure the RPC/RDMA transport to invoke rdma_accept() from the proper context. Calls to svc_rdma_accept() are serialized with calls to svc_rdma_free() by the generic RPC server layer.

Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/linux-rdma/20210209154014.GO4247@nvidia.com/
Fixes: d114c6fe ("RDMA/cma: Add missing locking to rdma_accept()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
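
A hedged sketch of the workaround described above, assuming the RDMA core's rdma_lock_handler()/rdma_unlock_handler() helpers are available to the caller; the wrapper function is invented for illustration and is not the actual svc_rdma_accept() change.

```c
#include <rdma/rdma_cm.h>

/* Hypothetical wrapper: hold the CM handler lock around rdma_accept()
 * when accepting from outside the CM event handler, satisfying the
 * lockdep assertion inside rdma_accept().
 */
static int example_svc_rdma_do_accept(struct rdma_cm_id *cm_id,
				      struct rdma_conn_param *param)
{
	int ret;

	rdma_lock_handler(cm_id);
	ret = rdma_accept(cm_id, param);
	rdma_unlock_handler(cm_id);
	return ret;
}
```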
-
- 06 February 2021, 6 commits
-
-
Submitted by Chuck Lever
Since commit 9ed5af26 ("SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS client passes payload data to the transport with the padding in xdr->pages instead of in the send buffer's tail kvec. There's no need for the extra logic to advance the base of the tail kvec because the upper layer no longer places XDR padding there.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Submitted by Chuck Lever
The NetApp Linux team discovered that with NFS/RDMA servers that do not support RFC 8797, the Linux client is forming NFSv4.x WRITE requests incorrectly. In this case, the Linux NFS client disables implicit chunk round-up for odd-length Read and Write chunks. The goal was to support old servers that needed that padding to be sent explicitly by clients. In that case the Linux NFS client included the tail kvec in the Read chunk, since the tail contains any needed padding. That meant a separate memory registration was needed for the tail kvec, adding to the cost of forming such requests. To avoid that cost for a mere 3 bytes of zeroes that are always ignored by receivers, we try to use implicit round-up when possible. For NFSv4.x, the tail kvec also sometimes contains a trailing GETATTR operation. The Linux NFS client unintentionally includes that GETATTR operation in the Read chunk as well as inline. The fix is simply to /never/ include the tail kvec when forming a data payload Read chunk. The padding is thus now always present.

Note that since commit 9ed5af26 ("SUNRPC: Clean up the handling of page padding in rpc_prepare_reply_pages()") [Dec 2020] the NFS client passes payload data to the transport with the padding in xdr->pages instead of in the send buffer's tail kvec. So now the Linux NFS client appends XDR padding to all odd-sized Read chunks. This shouldn't be a problem because:

- RFC 8166-compliant servers are supposed to work with or without that XDR padding in Read chunks.
- Since the padding is now in the same memory region as the data payload, a separate memory registration is not needed. In addition, the link layer extends data in RDMA Read responses to 4-byte boundaries anyway. Thus there is now no savings when the padding is not included.

Because older kernels include the payload's XDR padding in the tail kvec, a fix there will be more complicated. Thus backporting this patch is not recommended.

Reported-by: Olga Kornievskaia <Olga.Kornievskaia@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Submitted by Chuck Lever
During the final stages of publication of RFC 8167, reviewers requested that we use the term "reverse direction" rather than "backwards direction". Update comments to reflect this preference.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Submitted by Chuck Lever
Clean up so that offset_in_page() is invoked less often in the most common case, which is mapping xdr->pages.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Submitted by Chuck Lever
Clean up. Remove a conditional branch from the SGL set-up loop in frwr_map(): Instead of using either sg_set_page() or sg_set_buf(), initialize the mr_page field properly when rpcrdma_convert_kvec() converts the kvec to an SGL entry. frwr_map() can then invoke sg_set_page() unconditionally.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
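
A hedged sketch of the idea; the helper functions below are invented, while sg_set_page(), virt_to_page(), and offset_in_page() are real kernel APIs. By resolving the kvec to a page and offset when the kvec is converted, the mapping loop no longer needs to branch between sg_set_page() and sg_set_buf().

```c
#include <linux/scatterlist.h>
#include <linux/mm.h>
#include <linux/uio.h>

/* Invented helpers illustrating the pattern: resolve the kvec to a
 * page + offset once, then treat every SGL segment alike.
 */
static void example_convert_kvec(const struct kvec *vec,
				 struct page **seg_page, u32 *seg_offset)
{
	*seg_page = virt_to_page(vec->iov_base);
	*seg_offset = offset_in_page(vec->iov_base);
}

static void example_map_segment(struct scatterlist *sg, struct page *page,
				unsigned int len, unsigned int offset)
{
	sg_set_page(sg, page, len, offset);	/* no sg_set_buf() branch needed */
}
```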
-
Submitted by Chuck Lever
Support for FMR was removed by commit ba69cd12 ("xprtrdma: Remove support for FMR memory registration") [Dec 2018]. That means the buffer-splitting behavior of rpcrdma_convert_kvec(), added by commit 821c791a ("xprtrdma: Segment head and tail XDR buffers on page boundaries") [Mar 2016], is no longer necessary. FRWR memory registration handles this case with aplomb.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
- 02 February 2021, 1 commit
-
-
Submitted by Gustavo A. R. Silva
In preparation to enable -Wimplicit-fallthrough for Clang, fix multiple warnings by explicitly adding multiple break statements instead of letting the code fall through to the next case.

Link: https://github.com/KSPP/linux/issues/115
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
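
The shape of this kind of change, shown on an invented switch statement rather than any case from the actual patch: each case now ends with an explicit break, so Clang's -Wimplicit-fallthrough has nothing to warn about.

```c
#include <linux/errno.h>
#include <linux/printk.h>

/* Illustrative only: not code from the patch. */
static void example_handle_status(int status)
{
	switch (status) {
	case 0:
		pr_info("completed\n");
		break;			/* previously fell through implicitly */
	case -EAGAIN:
		pr_info("requeueing\n");
		break;
	default:
		pr_info("error %d\n", status);
		break;
	}
}
```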
-
- 01 February 2021, 3 commits
-
-
Submitted by Bhaskar Chowdhury
A few trivial and rudimentary spelling corrections.

Signed-off-by: Bhaskar Chowdhury <unixbhaskar@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Submitted by Calum Mackay
This comment was introduced by commit 6ea44adc ("SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()"). I believe EIO was a typo at the time: it should have been EAGAIN. Subsequently, commit 0445f92c ("SUNRPC: Fix disconnection races") changed that to ENOTCONN. Rather than trying to keep the comment here in sync with the code in xprt_force_disconnect(), make the point in a non-specific way.

Fixes: 6ea44adc ("SUNRPC: ensure correct error is reported by xs_tcp_setup_socket()")
Signed-off-by: Calum Mackay <calum.mackay@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
-
Submitted by Chuck Lever
Anj Duvnjak reports that the Kodi.tv NFS client is not able to read video files from a v5.10.11 Linux NFS server. The new sendpage-based TCP sendto logic was not attentive to non-zero page_base values. nfsd_splice_read() sets that field when a READ payload starts in the middle of a page. The Linux NFS client rarely emits an NFS READ that is not page-aligned. All of my testing so far has been with Linux clients, so I missed this one.

Reported-by: A. Duvnjak <avian@extremenerds.net>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=211471
Fixes: 4a85a6a3 ("SUNRPC: Handle TCP socket sends with kernel_sendpage() again")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: A. Duvnjak <avian@extremenerds.net>
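
A rough sketch of a sendpage-based page walk that respects a non-zero page_base; this is not the actual svc_tcp_sendmsg(), and the function and its error handling are simplified. The key point is that the first page is sent starting at page_base rather than at offset zero.

```c
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/net.h>
#include <linux/socket.h>
#include <linux/sunrpc/xdr.h>

/* Simplified illustration, not the upstream helper. */
static int example_send_xdr_pages(struct socket *sock, struct xdr_buf *xdr,
				  unsigned int *sentp)
{
	unsigned int offset = xdr->page_base;	/* may start mid-page */
	unsigned int remaining = xdr->page_len;
	struct page **pages = xdr->pages;
	int ret;

	while (remaining) {
		unsigned int len = min_t(unsigned int, remaining,
					 PAGE_SIZE - offset);

		ret = kernel_sendpage(sock, *pages, offset, len, MSG_MORE);
		if (ret < 0)
			return ret;
		*sentp += ret;
		if ((unsigned int)ret != len)
			return -EAGAIN;		/* partial send */
		remaining -= len;
		offset = 0;			/* subsequent pages start at 0 */
		pages++;
	}
	return 0;
}
```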
-
- 26 January 2021, 2 commits
-
-
Submitted by Dave Wysochanski
When handling an auth_gss downcall, it's possible to get a 0-length opaque object for the acceptor. In the case of a 0-length XDR object, make sure simple_get_netobj() fills in dest->data = NULL, and does not continue to kmemdup() which will set dest->data = ZERO_SIZE_PTR for the acceptor. The trace event code can handle NULL but not ZERO_SIZE_PTR for a string, and so without this patch the rpcgss_context trace event will crash the kernel as follows:

[ 162.887992] BUG: kernel NULL pointer dereference, address: 0000000000000010
[ 162.898693] #PF: supervisor read access in kernel mode
[ 162.900830] #PF: error_code(0x0000) - not-present page
[ 162.902940] PGD 0 P4D 0
[ 162.904027] Oops: 0000 [#1] SMP PTI
[ 162.905493] CPU: 4 PID: 4321 Comm: rpc.gssd Kdump: loaded Not tainted 5.10.0 #133
[ 162.908548] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[ 162.910978] RIP: 0010:strlen+0x0/0x20
[ 162.912505] Code: 48 89 f9 74 09 48 83 c1 01 80 39 00 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 0f 1f 80 00 00 00 00 <80> 3f 00 74 10 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 c3 31
[ 162.920101] RSP: 0018:ffffaec900c77d90 EFLAGS: 00010202
[ 162.922263] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000000fffde697
[ 162.925158] RDX: 000000000000002f RSI: 0000000000000080 RDI: 0000000000000010
[ 162.928073] RBP: 0000000000000010 R08: 0000000000000e10 R09: 0000000000000000
[ 162.930976] R10: ffff8e698a590cb8 R11: 0000000000000001 R12: 0000000000000e10
[ 162.933883] R13: 00000000fffde697 R14: 000000010034d517 R15: 0000000000070028
[ 162.936777] FS: 00007f1e1eb93700(0000) GS:ffff8e6ab7d00000(0000) knlGS:0000000000000000
[ 162.940067] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 162.942417] CR2: 0000000000000010 CR3: 0000000104eba000 CR4: 00000000000406e0
[ 162.945300] Call Trace:
[ 162.946428] trace_event_raw_event_rpcgss_context+0x84/0x140 [auth_rpcgss]
[ 162.949308] ? __kmalloc_track_caller+0x35/0x5a0
[ 162.951224] ? gss_pipe_downcall+0x3a3/0x6a0 [auth_rpcgss]
[ 162.953484] gss_pipe_downcall+0x585/0x6a0 [auth_rpcgss]
[ 162.955953] rpc_pipe_write+0x58/0x70 [sunrpc]
[ 162.957849] vfs_write+0xcb/0x2c0
[ 162.959264] ksys_write+0x68/0xe0
[ 162.960706] do_syscall_64+0x33/0x40
[ 162.962238] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 162.964346] RIP: 0033:0x7f1e1f1e57df

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
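
A hedged sketch of the decoder fix; the function below is a simplified, invented stand-in for simple_get_netobj(), with the object length assumed to be already decoded. The zero-length case leaves dest->data NULL instead of letting kmemdup() hand back ZERO_SIZE_PTR.

```c
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/sunrpc/xdr.h>

/* Simplified stand-in, not the upstream helper. */
static const void *example_get_netobj(const void *p, unsigned int len,
				      struct xdr_netobj *dest)
{
	dest->len = len;
	if (len) {
		dest->data = kmemdup(p, len, GFP_KERNEL);
		if (unlikely(!dest->data))
			return ERR_PTR(-ENOMEM);
	} else {
		dest->data = NULL;	/* not ZERO_SIZE_PTR */
	}
	return (const char *)p + len;
}
```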
-
Submitted by Dave Wysochanski
Remove duplicated helper functions to parse opaque XDR objects and place them in the new file net/sunrpc/auth_gss/auth_gss_internal.h. In the new file carry the license and copyright from the source file net/sunrpc/auth_gss/auth_gss.c. Finally, update the comment inside include/linux/sunrpc/xdr.h since lockd is not the only user of struct xdr_netobj.

Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
- 25 January 2021, 8 commits
-
-
Submitted by Chuck Lever
Clean up: The rq_argpages field was removed from struct svc_rqst in the pre-git era.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Submitted by Chuck Lever
The Receive completion handler doesn't look at the contents of the Receive buffer. The DMA sync isn't terribly expensive, but it's one less thing that needs to be done by the Receive completion handler, which is single-threaded (per svc_xprt). This helps scalability.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
Submitted by Chuck Lever
This is similar to commit e340c2d6 ("xprtrdma: Reduce the doorbell rate (Receive)"), which added Receive batching to the client.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
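
A hedged sketch of what Receive batching means at the verbs level; the function is invented and WR/SGE setup is elided. Several Receive work requests are chained and posted with one ib_post_recv() call, so the provider doorbell is rung once per batch rather than once per buffer.

```c
#include <rdma/ib_verbs.h>

/* Illustrative only: WR and SGE setup is omitted. */
static int example_post_recv_batch(struct ib_qp *qp, struct ib_recv_wr *wrs,
				   unsigned int count)
{
	const struct ib_recv_wr *bad_wr;
	unsigned int i;

	if (!count)
		return 0;

	for (i = 0; i + 1 < count; i++)
		wrs[i].next = &wrs[i + 1];	/* link the chain */
	wrs[count - 1].next = NULL;

	return ib_post_recv(qp, wrs, &bad_wr);	/* one doorbell per batch */
}
```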
-
Submitted by Chuck Lever
Clean up. We are not permitted to remove old proc files. Instead, convert these variables to stubs that are only ever allowed to display a value of zero.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Submitted by Chuck Lever
Now that we have an efficient mechanism to update these two stats, let's start maintaining them again.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Submitted by Chuck Lever
Avoid the overhead of a memory bus lock cycle for counting a value that is hardly ever used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
Submitted by Chuck Lever
Receives are frequent events. Avoid the overhead of a memory bus lock cycle for counting a value that is hardly ever used.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
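
A sketch of the general technique behind these two commits, assuming per-CPU counters; the variable and function names are invented. The hot path increments a per-CPU counter with no locked bus cycle, and only the rare reader pays the cost of summing.

```c
#include <linux/gfp.h>
#include <linux/percpu_counter.h>

static struct percpu_counter example_recv_count;

static int example_stats_init(void)
{
	return percpu_counter_init(&example_recv_count, 0, GFP_KERNEL);
}

static void example_on_receive(void)
{
	percpu_counter_inc(&example_recv_count);	/* cheap, per-CPU */
}

static s64 example_stats_read(void)
{
	/* Only the infrequent reader (e.g. a proc file) sums the counters. */
	return percpu_counter_sum_positive(&example_recv_count);
}
```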
-
Submitted by Chuck Lever
Setting up the proc variables is about to get more complicated.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 13 January 2021, 1 commit
-
-
Submitted by Chuck Lever
Commit 156708ad ("SUNRPC: Move the svc_xdr_recvfrom() tracepoint") tried to capture the correct XID in the trace record, but this line in svc_recv:

	rqstp->rq_xid = svc_getu32(&rqstp->rq_arg.head[0]);

alters the size of rq_arg.head[0].iov_len. The tracepoint records the correct XID but an incorrect value for the length of the xdr_buf's head. To keep the trace callsites simple, I've created two trace classes. One assumes the xdr_buf contains a full RPC message, and the XID can be extracted from it. The other assumes the contents of the xdr_buf are arbitrary, and the xid will be provided by the caller. Currently there is only one user of each class, but I expect we will need a few more tracepoints using each class as time goes on.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 11 January 2021, 1 commit
-
-
Submitted by j.nixdorf@avm.de
A return value of 0 means success. This is documented in lib/kstrtox.c. This was found by trying to mount an NFS share from a link-local IPv6 address with the interface specified by its index:

	mount("[fe80::1%1]:/srv/nfs", "/mnt", "nfs", 0, "nolock,addr=fe80::1%1")

Before this commit this failed with EINVAL and also caused the following message in dmesg:

	[...] NFS: bad IP address specified: addr=fe80::1%1

The syscall using the same address based on the interface name instead of its index succeeds. Credits for this patch go to my colleague Christian Speich, who traced the origin of this bug to this line of code.

Fixes: 00cfaa94 ("replace strict_strto calls")
Signed-off-by: Johannes Nixdorf <j.nixdorf@avm.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
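
A minimal sketch of the convention being fixed here, using an invented helper: the kstrto* family returns 0 on success and a negative errno on failure, so only a non-zero return value indicates a parse error.

```c
#include <linux/errno.h>
#include <linux/kernel.h>

/* Invented helper: parse the numeric scope id from an address string. */
static int example_parse_scope_id(const char *p, u32 *scope_id)
{
	/* kstrtou32() returns 0 on success; non-zero means failure. */
	if (kstrtou32(p, 10, scope_id) != 0)
		return -EINVAL;
	return 0;
}
```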
-
- 19 December 2020, 1 commit
-
-
Submitted by Chuck Lever
Daire Byrne reports a ~50% aggregate throughput regression on his Linux NFS server after commit da1661b9 ("SUNRPC: Teach server to use xprt_sock_sendmsg for socket sends"), which replaced kernel_sendpage() calls in NFSD's socket send path with calls to sock_sendmsg() using iov_iter. Investigation showed that tcp_sendmsg() was not using zero-copy to send the xdr_buf's bvec pages, but instead was relying on memcpy. This means copying every byte of a large NFS READ payload. It looks like TLS sockets do indeed support a ->sendpage method, so it's really not necessary to use xprt_sock_sendmsg() to support TLS fully on the server. A mechanical reversion of da1661b9 is not possible at this point, but we can re-implement the server's TCP socket sendmsg path using kernel_sendpage().

Reported-by: Daire Byrne <daire@dneg.com>
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=209439
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 14 December 2020, 10 commits
-
-
Submitted by Trond Myklebust
If we're shifting the page data to the right, and this happens to be a sparse page array, then we may need to allocate new pages in order to receive the data.

Reported-by: "Mkrtchyan, Tigran" <tigran.mkrtchyan@desy.de>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Trond Myklebust
There are a number of xdr helpers for struct xdr_buf that do not change the structure itself. Mark those as taking const pointers for documentation purposes.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Trond Myklebust
Move the setting of the xdr_stream 'nwords' field into the helpers that reset the xdr_stream cursor.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Trond Myklebust
Clean up callers of _copy_to/from_pages() that still check for a zero length.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Trond Myklebust
Clean up xdr_shrink_bufhead() to use the new helpers instead of doing its own thing.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Trond Myklebust
We do want to try to grow the buffer if possible, but if that attempt fails, we still want to move the data and truncate the XDR message.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Trond Myklebust
The main use case right now for xdr_align_data() is to shift the page data to the left, and in practice shrink the total XDR data buffer. This patch ensures that we fix up the accounting for the buffer length as we shift that data around.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Trond Myklebust
Exit early if the shift is zero.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Chuck Lever
Olga K. observed that rpcrdma_marshal_req() allocates sparse pages only when it has determined that a Reply chunk is necessary. There are plenty of cases where no Reply chunk is needed, but the XDRBUF_SPARSE_PAGES flag is set. The result would be a crash in rpcrdma_inline_fixup() when it tries to copy parts of the received Reply into a missing page. To avoid crashing, handle sparse page allocation up front. Until XATTR support was added, this issue did not appear often because the only SPARSE_PAGES consumer always expected a reply large enough to always require a Reply chunk.

Reported-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
-
Submitted by Dan Aloni
When receiving pages data, return value 'ret' when positive includes `buf->page_base`, so we should subtract that before it is used for changing `offset` and comparing against `want`. This was discovered in the very rare cases where the server returned a chunk of bytes that, when added to the already received amount of bytes for the pages, happened to match the current `recv.len`, for example in this case:

	buf->page_base : 258356
	actually received from socket: 1740
	ret : 260096
	want : 260096

In this case neither of the two 'if ... goto out' checks triggers, and we continue to tail parsing. Worth mentioning that the ensuing EMSGSIZE from the continued execution of `xs_read_xdr_buf` may be observed by an application due to 4 superfluous bytes being added to the pages data.

Fixes: 277e4ab7 ("SUNRPC: Simplify TCP receive code by switching to using iterators")
Signed-off-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
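
A small sketch of the accounting point; the function is invented, and the real logic lives in xs_read_xdr_buf(). The positive return value includes buf->page_base, so strip that off before advancing the offset and comparing with the amount we asked for.

```c
#include <linux/sunrpc/xdr.h>

/* Invented illustration of the fix, not the upstream code. */
static bool example_account_page_read(const struct xdr_buf *buf, ssize_t ret,
				      size_t want, size_t *offset)
{
	if (ret <= 0)
		return false;

	ret -= buf->page_base;	/* e.g. 260096 - 258356 = 1740 bytes of data */
	*offset += ret;

	/* Only proceed to tail parsing when we really received everything. */
	return (size_t)ret >= want;
}
```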
-
- 09 December 2020, 2 commits
-
-
Submitted by Chuck Lever
There's no need to defer allocation of pages for the receive buffer.

- This upcall is quite infrequent
- gssp_alloc_receive_pages() can allocate the pages with GFP_KERNEL, unlike the transport
- gssp_alloc_receive_pages() knows exactly how many pages are needed

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Olga Kornievskaia <kolga@netapp.com>
-
Submitted by Roberto Bergantinos Corpas

We can simplify code around cache_downcall, unifying memory allocations using kvmalloc. This has the benefit of getting rid of cache_slow_downcall (and queue_io_mutex), and also matches userland allocation size and limits.

Signed-off-by: Roberto Bergantinos Corpas <rbergant@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
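
A hedged sketch of what the kvmalloc() unification buys; the helpers below are invented, not the upstream code. One allocation call covers both small, slab-backed buffers and large, vmalloc-backed ones, so a separate slow path and its serializing mutex are unnecessary, and the buffer is released with kvfree().

```c
#include <linux/err.h>
#include <linux/mm.h>
#include <linux/slab.h>

/* Invented helpers illustrating the pattern, not the upstream code. */
static char *example_alloc_downcall_buf(size_t count)
{
	char *buf = kvmalloc(count, GFP_KERNEL);	/* slab or vmalloc as needed */

	if (!buf)
		return ERR_PTR(-ENOMEM);
	return buf;
}

static void example_free_downcall_buf(char *buf)
{
	kvfree(buf);	/* handles both backing allocators */
}
```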
-