- 07 10月, 2021 1 次提交
-
-
由 Benjamin Coddington 提交于
If nfsd has existing listening sockets without any processes, then an error returned from svc_create_xprt() for an additional transport will remove those existing listeners. We're seeing this in practice when userspace attempts to create rpcrdma transports without having the rpcrdma modules present before creating nfsd kernel processes. Fix this by checking for existing sockets before calling nfsd_destroy(). Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 01 10月, 2021 3 次提交
-
-
由 J. Bruce Fields 提交于
If sd_max is unsigned, then sd_max - GSS_SEQ_WIN is a very large number whenever sd_max is less than GSS_SEQ_WIN, and the comparison: seq_num <= sd->sd_max - GSS_SEQ_WIN in gss_check_seq_num is pretty much always true, even when that's clearly not what was intended. This was causing pynfs to hang when using krb5, because pynfs uses zero as the initial gss sequence number. That's perfectly legal, but this logic error causes knfsd to drop the rpc in that case. Out-of-order sequence IDs in the first GSS_SEQ_WIN (128) calls will also cause this. Fixes: 10b9d99a ("SUNRPC: Augment server-side rpcgss tracepoints") Cc: stable@vger.kernel.org Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Trond Myklebust 提交于
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Trond Myklebust 提交于
RFC3530 notes that the 'dircount' field may be zero, in which case the recommendation is to ignore it, and only enforce the 'maxcount' field. In RFC5661, this recommendation to ignore a zero valued field becomes a requirement. Fixes: aee37764 ("nfsd4: fix rd_dircount enforcement") Cc: <stable@vger.kernel.org> Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 30 9月, 2021 1 次提交
-
-
由 Patrick Ho 提交于
init_nfsd() should not unregister pernet subsys if the register fails but should instead unwind from the last successful operation which is register_filesystem(). Unregistering a failed register_pernet_subsys() call can result in a kernel GPF as revealed by programmatically injecting an error in register_pernet_subsys(). Verified the fix handled failure gracefully with no lingering nfsd entry in /proc/filesystems. This change was introduced by the commit bd5ae928 ("nfsd: register pernet ops last, unregister first"), the original error handling logic was correct. Fixes: bd5ae928 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: NPatrick Ho <Patrick.Ho@netapp.com> Acked-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 17 9月, 2021 2 次提交
-
-
由 Dai Ngo 提交于
When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client recovers by sending BIND_CONN_TO_SESSION but the server fails to recover the back channel and leaves it as NFSD4_CB_DOWN. Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel by calling nfsd4_probe_callback. Signed-off-by: NDai Ngo <dai.ngo@oracle.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Dai Ngo reports that, since the XDR overhaul, the NLM server crashes when the TEST procedure wants to return NLM_DENIED. There is a bug in svcxdr_encode_owner() that none of our standard test cases found. Replace the open-coded function with a call to an appropriate pre-fabricated XDR helper. Reported-by: NDai Ngo <Dai.Ngo@oracle.com> Fixes: a6a63ca5 ("lockd: Common NLM XDR helpers") Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 04 9月, 2021 1 次提交
-
-
由 NeilBrown 提交于
When the NFS server receives a large gss (kerberos) credential and tries to pass it up to rpc.svcgssd (which is deprecated), it triggers an infinite loop in cache_read(). cache_request() always returns -EAGAIN, and this causes a "goto again". This patch: - changes the error to -E2BIG to avoid the infinite loop, and - generates a WARN_ONCE when rsi_request first sees an over-sized credential. The warning suggests switching to gssproxy. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196583Signed-off-by: NNeilBrown <neilb@suse.de> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 01 9月, 2021 1 次提交
-
-
由 NeilBrown 提交于
alloc_pages_bulk_array() attempts to allocate at least one page based on the provided pages, and then opportunistically allocates more if that can be done without dropping the spinlock. So if it returns fewer than requested, that could just mean that it needed to drop the lock. In that case, try again immediately. Only pause for a time if no progress could be made. Reported-and-tested-by: NMike Javorski <mike.javorski@gmail.com> Reported-and-tested-by: NLothar Paltins <lopa@mailbox.org> Fixes: f6e70aab ("SUNRPC: refresh rq_pages using a bulk page allocator") Signed-off-by: NNeilBrown <neilb@suse.de> Acked-by: NMel Gorman <mgorman@suse.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 27 8月, 2021 4 次提交
-
-
由 J. Bruce Fields 提交于
Unlike other filesystems, NFSv3 tries to use fl_file in the GETLK case. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 J. Bruce Fields 提交于
In the reexport case, nfsd is currently passing along locks with the reclaim bit set. The client sends a new lock request, which is granted if there's currently no conflict--even if it's possible a conflicting lock could have been briefly held in the interim. We don't currently have any way to safely grant reclaim, so for now let's just deny them all. I'm doing this by passing the reclaim bit to nfs and letting it fail the call, with the idea that eventually the client might be able to do something more forgiving here. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Acked-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 J. Bruce Fields 提交于
As in the v4 case, it doesn't work well to block waiting for a lock on an nfs filesystem. As in the v4 case, that means we're depending on the client to poll. It's probably incorrect to depend on that, but I *think* clients do poll in practice. In any case, it's an improvement over hanging the lockd thread indefinitely as we currently are. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Acked-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 J. Bruce Fields 提交于
NFS implements blocking locks by blocking inside its lock method. In the reexport case, this blocks the nfs server thread, which could lead to deadlocks since an nfs server thread might be required to unlock the conflicting lock. It also causes a crash, since the nfs server thread assumes it can free the lock when its lm_notify lock callback is called. Ideal would be to make the nfs lock method return without blocking in this case, but for now it works just not to attempt blocking locks. The difference is just that the original client will have to poll (as it does in the v4.0 case) instead of getting a callback when the lock's available. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Acked-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 24 8月, 2021 4 次提交
-
-
由 J. Bruce Fields 提交于
We shouldn't really be using a read-only file descriptor to take a write lock. Most filesystems will put up with it. But NFS, for example, won't. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 J. Bruce Fields 提交于
Update comment to reflect that we *do* allow reexport, whether it's a good idea or not.... Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 J. Bruce Fields 提交于
Make this lookup slightly more concise, and prepare for changing how we look this up in a following patch. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 J. Bruce Fields 提交于
It'll come in handy to get the whole nlm_lock. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 21 8月, 2021 5 次提交
-
-
由 J. Bruce Fields 提交于
Locks have two sets of op arrays, fl_lmops for the lock manager (lockd or nfsd), fl_ops for the filesystem. The server-side lockd code has been setting its own fl_ops, which leads to confusion (and crashes) in the reexport case, where the filesystem expects to be the only one setting fl_ops. And there's no reason for it that I can see-the lm_get/put_owner ops do the same job. Reported-by: NDaire Byrne <daire@dneg.com> Tested-by: NDaire Byrne <daire@dneg.com> Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. A file called /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect enables administrators to turn off server-side disconnect injection while allowing other types of sunrpc errors to be injected. The default setting is that server-side disconnect injection is enabled (ignore=false). Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Disconnect injection stress-tests the ability for both client and server implementations to behave resiliently in the face of network instability. Convert the existing client-side disconnect injection infrastructure to use the kernel's generic error injection facility. The generic facility has a richer set of injection criteria. Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
This directory will contain a set of administrative controls for enabling error injection for kernel RPC consumers. Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 19 8月, 2021 1 次提交
-
-
由 Chuck Lever 提交于
svc_xprt_free() already "puts" the bc_xprt before calling the transport's "free" method. No need to do it twice. Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
- 17 8月, 2021 17 次提交
-
-
由 J. Bruce Fields 提交于
This should use the network-namespace-wide client_lock, not the per-client cl_lock. You shouldn't see any bugs unless you're actually using the forced-expiry interface introduced by 89c905be. Fixes: 89c905be "nfsd: allow forced expiration of NFSv4 clients" Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 J. Bruce Fields 提交于
The failure case here should be rare, but it's obviously wrong. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Shared by client and server. See: https://www.iana.org/assignments/rpc-authentication-numbers/rpc-authentication-numbers.xhtmlSigned-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Jia He 提交于
nsm_use_hostnames is a module parameter and it will be exported to sysctl procfs. This is to let user sometimes change it from userspace. But the minimal unit for sysctl procfs read/write it sizeof(int). In big endian system, the converting from/to bool to/from int will cause error for proc items. This patch use a new proc_handler proc_dobool to fix it. Signed-off-by: NJia He <hejianet@gmail.com> Reviewed-by: NPan Xinhui <xinhui.pan@linux.vnet.ibm.com> [thuth: Fix typo in commit message] Signed-off-by: NThomas Huth <thuth@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Jia He 提交于
This is to let bool variable could be correctly displayed in big/little endian sysctl procfs. sizeof(bool) is arch dependent, proc_dobool should work in all arches. Suggested-by: NPan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: NJia He <hejianet@gmail.com> [thuth: rebased the patch to the current kernel version] Signed-off-by: NThomas Huth <thuth@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Some paths through svc_process() leave rqst->rq_procinfo set to NULL, which triggers a crash if tracing happens to be enabled. Fixes: 89ff8749 ("SUNRPC: Display RPC procedure names instead of proc numbers") Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 NeilBrown 提交于
Including one's name in copyright claims is appropriate. Including it in random comments is just vanity. After 2 decades, it is time for these to be gone. Signed-off-by: NNeilBrown <neilb@suse.de> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts to an llist. The goal is to reduce the average overhead of Send completions, because a transport's completion handlers are single-threaded on one CPU core. This change reduces CPU utilization of each Send completion by 2-3% on my server. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-By: NTom Talpey <tom@talpey.com>
-
由 Chuck Lever 提交于
/proc/lock_stat indicates the the sc_send_lock is heavily contended when the server is under load from a single client. To address this, convert the send_ctxt free list to an llist. Returning an item to the send_ctxt cache is now waitless, which reduces the instruction path length in the single-threaded Send handler (svc_rdma_wc_send). The goal is to enable the ib_comp_wq worker to handle a higher RPC/RDMA Send completion rate given the same CPU resources. This change reduces CPU utilization of Send completion by 2-3% on my server. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-By: NTom Talpey <tom@talpey.com>
-
由 Chuck Lever 提交于
Because wake_up() takes an IRQ-safe lock, it can be expensive, especially to call inside of a single-threaded completion handler. What's more, the Send wait queue almost never has waiters, so most of the time, this is an expensive no-op. As always, the goal is to reduce the average overhead of each completion, because a transport's completion handlers are single- threaded on one CPU core. This change reduces CPU utilization of the Send completion thread by 2-3% on my server. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-By: NTom Talpey <tom@talpey.com>
-
由 Benjamin Coddington 提交于
After calling vfs_test_lock() the pointer to a conflicting lock can be returned, and that lock is not guarunteed to be owned by nlm. In that case, we cannot cast it to struct nlm_lockowner. Instead return the pid of that conflicting lock. Fixes: 646d73e9 ("lockd: Show pid of lockd for remote locks") Signed-off-by: NBenjamin Coddington <bcodding@redhat.com> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Clean up. Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Clean up. Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Steven Rostedt (VMware) 提交于
There's a few cases that a string that is to be recorded in a trace event, does not have a terminating 'nul' character, and instead, the tracepoint passes in the length of the string to record. Add two helper macros to the trace event code that lets this work easier, than tricks with "%.*s" logic. __string_len() which is similar to __string() for declaration, but takes a length argument. __assign_str_len() which is similar to __assign_str() for assiging the string, but it too takes a length argument. Note, the TRACE_EVENT() macro will allocate the location on the ring buffer to 'len + 1', that will be used to store the string into. It is a requirement that the 'len' used for this is a most the length of the string being recorded. This string can still use __get_str() just like strings created with __string() can use to retrieve the string. Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/Tested-by: NChuck Lever <chuck.lever@oracle.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
Large splice reads call put_page() repeatedly. put_page() is relatively expensive to call, so replace it with the new svc_rqst_replace_page() helper to help amortize that cost. Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NNeilBrown <neilb@suse.de>
-
由 Chuck Lever 提交于
Replacing a page in rq_pages[] requires a get_page(), which is a bus-locked operation, and a put_page(), which can be even more costly. To reduce the cost of replacing a page in rq_pages[], batch the put_page() operations by collecting "freed" pages in a pagevec, and then release those pages when the pagevec is full. This pagevec is also emptied when each RPC completes. Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
-
由 Chuck Lever 提交于
A few useful observations: - The value in @size is never modified. - splice_desc.len is an unsigned int, and so is xdr_buf.page_len. An implicit cast to size_t is unnecessary. - The computation of .page_len is the same in all three arms of the "if" statement, so hoist it out to make it clear that the operation is an unconditional invariant. The resulting function is 18 bytes shorter on my system (-Os). Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Reviewed-by: NNeilBrown <neilb@suse.de>
-