- 01 October 2021, 3 commits
-
-
By J. Bruce Fields
If sd_max is unsigned, then sd_max - GSS_SEQ_WIN is a very large number whenever sd_max is less than GSS_SEQ_WIN, and the comparison seq_num <= sd->sd_max - GSS_SEQ_WIN in gss_check_seq_num is pretty much always true, even when that's clearly not what was intended. This was causing pynfs to hang when using krb5, because pynfs uses zero as the initial gss sequence number. That's perfectly legal, but this logic error causes knfsd to drop the rpc in that case. Out-of-order sequence IDs in the first GSS_SEQ_WIN (128) calls will also cause this. Fixes: 10b9d99a ("SUNRPC: Augment server-side rpcgss tracepoints") Cc: stable@vger.kernel.org Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
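A minimal sketch of the unsigned-wraparound problem and one safe rearrangement, using simplified standalone names rather than the actual gss_check_seq_num() code:

```c
#include <stdbool.h>

#define GSS_SEQ_WIN 128

static bool seq_is_below_window(unsigned int seq_num, unsigned int sd_max)
{
	/*
	 * Buggy form: when sd_max < GSS_SEQ_WIN, the unsigned subtraction
	 * wraps to a huge value, so nearly every seq_num tests as "too
	 * old" and the RPC is dropped:
	 *
	 *	return seq_num <= sd_max - GSS_SEQ_WIN;
	 */

	/*
	 * Safe form: keep the expression additive. GSS sequence numbers
	 * are capped far below UINT_MAX, so the addition cannot wrap.
	 */
	return seq_num + GSS_SEQ_WIN <= sd_max;
}
```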
-
By Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Trond Myklebust
RFC3530 notes that the 'dircount' field may be zero, in which case the recommendation is to ignore it and only enforce the 'maxcount' field. In RFC5661, this recommendation to ignore a zero-valued field becomes a requirement. Fixes: aee37764 ("nfsd4: fix rd_dircount enforcement") Cc: <stable@vger.kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
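A minimal sketch of the rule the patch enforces, with illustrative names rather than the real nfsd4 READDIR encoder:

```c
#include <stdbool.h>

/*
 * RFC 5661: a zero rd_dircount is not a limit and must be ignored;
 * only a non-zero value may cause directory entries to be dropped.
 * (maxcount is still enforced separately.)
 */
static bool dircount_exceeded(unsigned int rd_dircount,
			      unsigned int name_bytes_so_far)
{
	if (rd_dircount == 0)
		return false;
	return name_bytes_so_far > rd_dircount;
}
```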
-
- 30 September 2021, 1 commit
-
-
By Patrick Ho
init_nfsd() should not unregister the pernet subsys if that registration fails, but should instead unwind from the last successful operation, which is register_filesystem(). Unregistering a failed register_pernet_subsys() call can result in a kernel GPF, as revealed by programmatically injecting an error in register_pernet_subsys(). Verified that the fix handles the failure gracefully, with no lingering nfsd entry in /proc/filesystems. This regression was introduced by commit bd5ae928 ("nfsd: register pernet ops last, unregister first"); the original error handling logic was correct. Fixes: bd5ae928 ("nfsd: register pernet ops last, unregister first") Cc: stable@vger.kernel.org Signed-off-by: Patrick Ho <Patrick.Ho@netapp.com> Acked-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
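A minimal sketch of the unwind order the patch restores, using hypothetical example_fs_type/example_net_ops rather than the real nfsd symbols: on failure, unwind only the steps that already succeeded, never the step that just failed.

```c
#include <linux/fs.h>
#include <linux/module.h>
#include <net/net_namespace.h>

static struct file_system_type example_fs_type;		/* hypothetical */
static struct pernet_operations example_net_ops;	/* hypothetical */

static int __init example_init(void)
{
	int ret;

	ret = register_filesystem(&example_fs_type);
	if (ret)
		return ret;

	ret = register_pernet_subsys(&example_net_ops);
	if (ret)
		goto out_unregister_fs;	/* do NOT unregister the pernet subsys here */

	return 0;

out_unregister_fs:
	unregister_filesystem(&example_fs_type);
	return ret;
}
```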
-
- 17 September 2021, 2 commits
-
-
By Dai Ngo
When the back channel enters the SEQ4_STATUS_CB_PATH_DOWN state, the client recovers by sending BIND_CONN_TO_SESSION, but the server fails to recover the back channel and leaves it as NFSD4_CB_DOWN. Fix this by enhancing nfsd4_bind_conn_to_session to probe the back channel by calling nfsd4_probe_callback. Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
Dai Ngo reports that, since the XDR overhaul, the NLM server crashes when the TEST procedure wants to return NLM_DENIED. There is a bug in svcxdr_encode_owner() that none of our standard test cases found. Replace the open-coded function with a call to an appropriate pre-fabricated XDR helper. Reported-by: Dai Ngo <Dai.Ngo@oracle.com> Fixes: a6a63ca5 ("lockd: Common NLM XDR helpers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 04 September 2021, 1 commit
-
-
By NeilBrown
When the NFS server receives a large gss (kerberos) credential and tries to pass it up to rpc.svcgssd (which is deprecated), it triggers an infinite loop in cache_read(): cache_request() always returns -EAGAIN, and this causes a "goto again". This patch: - changes the error to -E2BIG to avoid the infinite loop, and - generates a WARN_ONCE when rsi_request first sees an over-sized credential. The warning suggests switching to gssproxy. Link: https://bugzilla.kernel.org/show_bug.cgi?id=196583 Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
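A sketch of the shape of the fix, with simplified parameters (the real change is in rsi_request(), which operates on a sunrpc cache item rather than raw lengths):

```c
#include <linux/bug.h>
#include <linux/errno.h>

static int example_encode_upcall(size_t cred_len, size_t buf_space)
{
	if (cred_len > buf_space) {
		/*
		 * Warn once and fail cleanly instead of returning
		 * -EAGAIN, which made cache_read() loop forever.
		 */
		WARN_ONCE(1, "GSS credential too large for rpc.svcgssd; consider switching to gssproxy\n");
		return -E2BIG;
	}
	return 0;
}
```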
-
- 01 September 2021, 1 commit
-
-
By NeilBrown
alloc_pages_bulk_array() attempts to allocate at least one page based on the provided pages, and then opportunistically allocates more if that can be done without dropping the spinlock. So if it returns fewer than requested, that could just mean that it needed to drop the lock. In that case, try again immediately. Only pause for a time if no progress could be made. Reported-and-tested-by: Mike Javorski <mike.javorski@gmail.com> Reported-and-tested-by: Lothar Paltins <lopa@mailbox.org> Fixes: f6e70aab ("SUNRPC: refresh rq_pages using a bulk page allocator") Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Mel Gorman <mgorman@suse.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
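A minimal sketch of the retry policy described here, not the exact svc_alloc_arg() code; the backoff interval is illustrative:

```c
#include <linux/delay.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * @pages must be NULL in every slot that still needs a page;
 * alloc_pages_bulk_array() only fills the NULL entries and returns the
 * total number of populated slots.
 */
static void example_fill_page_array(struct page **pages, unsigned long want)
{
	unsigned long filled = 0;

	while (filled < want) {
		unsigned long got;

		got = alloc_pages_bulk_array(GFP_KERNEL, want, pages);
		if (got == filled)
			msleep(20);	/* no progress at all: back off briefly */
		/* otherwise some pages were allocated: retry immediately */
		filled = got;
	}
}
```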
-
- 27 August 2021, 4 commits
-
-
By J. Bruce Fields
Unlike other filesystems, NFSv3 tries to use fl_file in the GETLK case. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By J. Bruce Fields
In the reexport case, nfsd is currently passing along locks with the reclaim bit set. The client sends a new lock request, which is granted if there's currently no conflict, even if it's possible a conflicting lock could have been briefly held in the interim. We don't currently have any way to safely grant reclaim, so for now let's just deny them all. I'm doing this by passing the reclaim bit to nfs and letting it fail the call, with the idea that eventually the client might be able to do something more forgiving here. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By J. Bruce Fields
As in the v4 case, it doesn't work well to block waiting for a lock on an nfs filesystem. As in the v4 case, that means we're depending on the client to poll. It's probably incorrect to depend on that, but I *think* clients do poll in practice. In any case, it's an improvement over hanging the lockd thread indefinitely as we currently do. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By J. Bruce Fields
NFS implements blocking locks by blocking inside its lock method. In the reexport case, this blocks the nfs server thread, which could lead to deadlocks, since an nfs server thread might be required to unlock the conflicting lock. It also causes a crash, since the nfs server thread assumes it can free the lock when its lm_notify lock callback is called. Ideally the nfs lock method would return without blocking in this case, but for now it is enough simply not to attempt blocking locks. The difference is just that the original client will have to poll (as it does in the v4.0 case) instead of getting a callback when the lock becomes available. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 24 August 2021, 4 commits
-
-
By J. Bruce Fields
We shouldn't really be using a read-only file descriptor to take a write lock. Most filesystems will put up with it, but NFS, for example, won't. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By J. Bruce Fields
Update the comment to reflect that we *do* allow reexport, whether it's a good idea or not... Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By J. Bruce Fields
Make this lookup slightly more concise, and prepare for changing how we look this up in a following patch. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By J. Bruce Fields
It'll come in handy to get the whole nlm_lock. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 21 August 2021, 5 commits
-
-
By J. Bruce Fields
Locks have two sets of op arrays: fl_lmops for the lock manager (lockd or nfsd), and fl_ops for the filesystem. The server-side lockd code has been setting its own fl_ops, which leads to confusion (and crashes) in the reexport case, where the filesystem expects to be the only one setting fl_ops. And there's no reason for it that I can see; the lm_get/put_owner ops do the same job. Reported-by: Daire Byrne <daire@dneg.com> Tested-by: Daire Byrne <daire@dneg.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
Disconnect injection stress-tests the ability of both client and server implementations to behave resiliently in the face of network instability. A file called /sys/kernel/debug/fail_sunrpc/ignore-server-disconnect enables administrators to turn off server-side disconnect injection while allowing other types of sunrpc errors to be injected. The default setting is that server-side disconnect injection is enabled (ignore=false). Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
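A sketch of how such a knob can be wired up with the kernel's generic fault-injection and debugfs helpers; the names here (fail_example, should_force_disconnect) are illustrative, not the actual sunrpc symbols:

```c
#include <linux/debugfs.h>
#include <linux/fault-inject.h>

static DECLARE_FAULT_ATTR(fail_example);
static bool fail_example_ignore_server;	/* the "ignore-server-disconnect" switch */

static void example_fault_injection_init(void)
{
	struct dentry *dir;

	/* Creates /sys/kernel/debug/fail_example/ with the standard knobs. */
	dir = fault_create_debugfs_attr("fail_example", NULL, &fail_example);
	debugfs_create_bool("ignore-server-disconnect", 0600, dir,
			    &fail_example_ignore_server);
}

/* Consulted on the server-side hot path to decide whether to inject a disconnect. */
static bool should_force_disconnect(void)
{
	if (fail_example_ignore_server)
		return false;
	return should_fail(&fail_example, 1);
}
```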
-
By Chuck Lever
Disconnect injection stress-tests the ability of both client and server implementations to behave resiliently in the face of network instability. Convert the existing client-side disconnect injection infrastructure to use the kernel's generic error injection facility. The generic facility has a richer set of injection criteria. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
This directory will contain a set of administrative controls for enabling error injection for kernel RPC consumers. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 19 August 2021, 1 commit
-
-
By Chuck Lever
svc_xprt_free() already "puts" the bc_xprt before calling the transport's "free" method. No need to do it twice. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
- 17 August 2021, 17 commits
-
-
By J. Bruce Fields
This should use the network-namespace-wide client_lock, not the per-client cl_lock. You shouldn't see any bugs unless you're actually using the forced-expiry interface introduced by 89c905be. Fixes: 89c905be ("nfsd: allow forced expiration of NFSv4 clients") Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By J. Bruce Fields
The failure case here should be rare, but it's obviously wrong. Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
Shared by client and server. See: https://www.iana.org/assignments/rpc-authentication-numbers/rpc-authentication-numbers.xhtml Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Jia He
nsm_use_hostnames is a module parameter and is exported to sysctl procfs so that users can change it from userspace. But the minimal unit for a sysctl procfs read/write is sizeof(int). On big-endian systems, converting between bool and int causes errors for these proc items. This patch uses a new proc_handler, proc_dobool, to fix it. Signed-off-by: Jia He <hejianet@gmail.com> Reviewed-by: Pan Xinhui <xinhui.pan@linux.vnet.ibm.com> [thuth: Fix typo in commit message] Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
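A sketch of the resulting sysctl table entry, following the nsm_use_hostnames example from the commit text (the table itself is illustrative, not the exact lockd code):

```c
#include <linux/sysctl.h>
#include <linux/types.h>

static bool nsm_use_hostnames;

static struct ctl_table example_nlm_sysctls[] = {
	{
		.procname	= "nsm_use_hostnames",
		.data		= &nsm_use_hostnames,
		.maxlen		= sizeof(bool),	/* proc_dobool reads/writes a bool, not an int */
		.mode		= 0644,
		.proc_handler	= proc_dobool,
	},
	{ }
};
```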
-
By Jia He
This lets bool variables be displayed correctly via sysctl procfs on both big- and little-endian systems. sizeof(bool) is arch-dependent; proc_dobool should work on all arches. Suggested-by: Pan Xinhui <xinhui@linux.vnet.ibm.com> Signed-off-by: Jia He <hejianet@gmail.com> [thuth: rebased the patch to the current kernel version] Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
Some paths through svc_process() leave rqst->rq_procinfo set to NULL, which triggers a crash if tracing happens to be enabled. Fixes: 89ff8749 ("SUNRPC: Display RPC procedure names instead of proc numbers") Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By NeilBrown
Including one's name in copyright claims is appropriate. Including it in random comments is just vanity. After two decades, it is time for these to be gone. Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
Relieve contention on sc_rw_ctxt_lock by converting rdma->sc_rw_ctxts to an llist. The goal is to reduce the average overhead of Send completions, because a transport's completion handlers are single-threaded on one CPU core. This change reduces CPU utilization of each Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com>
-
By Chuck Lever
/proc/lock_stat indicates that the sc_send_lock is heavily contended when the server is under load from a single client. To address this, convert the send_ctxt free list to an llist. Returning an item to the send_ctxt cache is now waitless, which reduces the instruction path length in the single-threaded Send handler (svc_rdma_wc_send). The goal is to enable the ib_comp_wq worker to handle a higher RPC/RDMA Send completion rate given the same CPU resources. This change reduces CPU utilization of Send completion by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com>
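A minimal sketch of the waitless free-list pattern behind the two llist conversions above, with simplified names rather than the svc_rdma context structures:

```c
#include <linux/llist.h>
#include <linux/slab.h>

struct example_ctxt {
	struct llist_node node;
	/* ... transport-specific fields ... */
};

static LLIST_HEAD(example_free_ctxts);

/* Returning a context is a single lock-free llist_add(). */
static void example_put_ctxt(struct example_ctxt *ctxt)
{
	llist_add(&ctxt->node, &example_free_ctxts);
}

/*
 * llist_del_first() must be serialized among consumers; here the
 * single-threaded completion path is assumed to be the only consumer.
 */
static struct example_ctxt *example_get_ctxt(void)
{
	struct llist_node *node = llist_del_first(&example_free_ctxts);

	if (node)
		return llist_entry(node, struct example_ctxt, node);
	return kzalloc(sizeof(struct example_ctxt), GFP_KERNEL);
}
```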
-
By Chuck Lever
Because wake_up() takes an IRQ-safe lock, it can be expensive, especially to call inside of a single-threaded completion handler. What's more, the Send wait queue almost never has waiters, so most of the time this is an expensive no-op. As always, the goal is to reduce the average overhead of each completion, because a transport's completion handlers are single-threaded on one CPU core. This change reduces CPU utilization of the Send completion thread by 2-3% on my server. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: Tom Talpey <tom@talpey.com>
-
By Benjamin Coddington
After calling vfs_test_lock(), a pointer to a conflicting lock can be returned, and that lock is not guaranteed to be owned by nlm. In that case, we cannot cast it to struct nlm_lockowner. Instead, return the pid of that conflicting lock. Fixes: 646d73e9 ("lockd: Show pid of lockd for remote locks") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Chuck Lever
Clean up. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
-
By Steven Rostedt (VMware)
There are a few cases where a string to be recorded in a trace event does not have a terminating 'nul' character, and instead the tracepoint passes in the length of the string to record. Add two helper macros to the trace event code that make this easier to handle than tricks with "%.*s" logic: __string_len(), which is similar to __string() for declaration but takes a length argument, and __assign_str_len(), which is similar to __assign_str() for assigning the string but also takes a length argument. Note: the TRACE_EVENT() macro will allocate 'len + 1' bytes on the ring buffer to store the string. It is a requirement that the 'len' used for this is at most the length of the string being recorded. The string can still be retrieved with __get_str(), just like strings created with __string(). Link: https://lore.kernel.org/linux-nfs/20210513105018.7539996a@gandalf.local.home/ Tested-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
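An abbreviated, illustrative trace-event header using the new helpers (the event and field names are made up; the TRACE_SYSTEM/define_trace.h boilerplate is shown only in skeleton form):

```c
#undef TRACE_SYSTEM
#define TRACE_SYSTEM example

#if !defined(_TRACE_EXAMPLE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _TRACE_EXAMPLE_H

#include <linux/tracepoint.h>

TRACE_EVENT(example_name_event,
	TP_PROTO(const char *name, size_t namelen),
	TP_ARGS(name, namelen),
	TP_STRUCT__entry(
		__string_len(name, name, namelen)	/* like __string(), plus a length */
	),
	TP_fast_assign(
		__assign_str_len(name, name, namelen);	/* copies namelen bytes and adds the NUL */
	),
	TP_printk("name=%s", __get_str(name))
);

#endif /* _TRACE_EXAMPLE_H */

#include <trace/define_trace.h>
```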
-
By Chuck Lever
Large splice reads call put_page() repeatedly. put_page() is relatively expensive to call, so replace it with the new svc_rqst_replace_page() helper to help amortize that cost. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de>
-
By Chuck Lever
Replacing a page in rq_pages[] requires a get_page(), which is a bus-locked operation, and a put_page(), which can be even more costly. To reduce the cost of replacing a page in rq_pages[], batch the put_page() operations by collecting "freed" pages in a pagevec, and then release those pages when the pagevec is full. This pagevec is also emptied when each RPC completes. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
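A sketch of the batching idea behind svc_rqst_replace_page(); the function names below are simplified stand-ins, and the pagevec is assumed to have been set up earlier with pagevec_init():

```c
#include <linux/mm.h>
#include <linux/pagevec.h>

/* Swap @newpage into @slot; park the old page in @pvec for batched release. */
static void example_replace_page(struct page **slot, struct page *newpage,
				 struct pagevec *pvec)
{
	if (*slot) {
		if (!pagevec_space(pvec))
			pagevec_release(pvec);	/* flush a full batch */
		pagevec_add(pvec, *slot);
	}

	get_page(newpage);
	*slot = newpage;
}

/* At RPC completion, release whatever is still parked in the batch. */
static void example_release_batch(struct pagevec *pvec)
{
	pagevec_release(pvec);
}
```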
-
By Chuck Lever
A few useful observations: - The value in @size is never modified. - splice_desc.len is an unsigned int, and so is xdr_buf.page_len, so an implicit cast to size_t is unnecessary. - The computation of .page_len is the same in all three arms of the "if" statement, so hoist it out to make it clear that the operation is an unconditional invariant. The resulting function is 18 bytes shorter on my system (-Os). Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Reviewed-by: NeilBrown <neilb@suse.de>
-
- 16 August 2021, 1 commit
-
-
By Linus Torvalds
-