Commits · 9e895cd9649abe4392c59d14e31b0f5667d082d2 · openeuler / Kernel

02 May, 2021 · 2 commits

xprtrdma: Fix a NULL dereference in frwr_unmap_sync() · 9e895cd9

Committed by Chuck Lever on May 01, 2021

The normal mechanism that invalidates and unmaps MRs is
frwr_unmap_async(). frwr_unmap_sync() is used only when an RPC
Reply bearing Write or Reply chunks has been lost (i.e., almost
never).

Coverity found that after commit 9a301caf ("xprtrdma: Move
fr_linv_done field to struct rpcrdma_mr"), the while() loop in
frwr_unmap_sync() exits only once @mr is NULL, unconditionally
causing subsequent dereferences of @mr to Oops.

I've tested this fix by creating a client that skips invoking
frwr_unmap_async() when RPC Replies complete. That forces all
invalidation tasks to fall upon frwr_unmap_sync(). Simple workloads
with this fix applied to the adulterated client work as designed.
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1504556 ("Null pointer dereferences")
Fixes: 9a301caf ("xprtrdma: Move fr_linv_done field to struct rpcrdma_mr")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
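
The failure mode described in this commit message can be sketched in user-space C. The names below (`struct mr`, `pop_mr`, `invalidate_all`) are hypothetical stand-ins, not the xprtrdma code. The sketch only illustrates that a loop of the form `while ((mr = pop(...)))` always leaves `mr` NULL on exit, so whatever is needed afterwards has to be saved in a separate variable inside the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a registered memory region. */
struct mr {
	struct mr *next;
	int done;	/* stands in for a per-MR completion */
};

/* Pop the first element from a singly linked list, or return NULL. */
static struct mr *pop_mr(struct mr **head)
{
	struct mr *mr = *head;

	if (mr)
		*head = mr->next;
	return mr;
}

/*
 * Walk the list and return the last MR that was processed.
 * The loop variable is NULL once the loop ends, so the last
 * element is remembered in "last" and only "last" is used later.
 */
static struct mr *invalidate_all(struct mr **head)
{
	struct mr *mr, *last = NULL;

	assert(head);
	while ((mr = pop_mr(head))) {
		mr->done = 0;
		last = mr;
	}

	/* Dereferencing "mr" here would be a NULL dereference. */
	if (last)
		last->done = 1;
	return last;
}
```

Calling `invalidate_all()` on an empty list returns NULL without touching any element, which is the case a caller still has to handle.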

9e895cd9

sunrpc: Fix misplaced barrier in call_decode · f8f7e0fb

Committed by Baptiste Lepers on May 01, 2021

Fix a misplaced barrier in call_decode. The struct rpc_rqst is modified
as follows by xprt_complete_rqst:

    req->rq_private_buf.len = copied;
    /* Ensure all writes are done before we update */
    /* req->rq_reply_bytes_recvd */
    smp_wmb();
    req->rq_reply_bytes_recvd = copied;

And currently read as follows by call_decode:

    smp_rmb(); // misplaced
    if (!req->rq_reply_bytes_recvd)
        goto out;
    req->rq_rcv_buf.len = req->rq_private_buf.len;

This patch places the smp_rmb after the if to ensure that
rq_reply_bytes_recvd and rq_private_buf.len are read in order.

Fixes: 9ba82886 ("SUNRPC: Don't try to parse incomplete RPC messages")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
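
The pairing this commit message describes can be approximated in portable C11. This is a sketch under stated assumptions, not the SUNRPC code: `struct reply`, `complete_reply` and `decode_reply` are hypothetical names, and C11 release/acquire fences are used as a rough user-space analogue of `smp_wmb()` and `smp_rmb()`. The reader checks the flag first and places its fence between the flag load and the payload load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the two fields of struct rpc_rqst. */
struct reply {
	size_t len;			/* payload, plays rq_private_buf.len */
	atomic_size_t bytes_recvd;	/* flag, plays rq_reply_bytes_recvd */
};

/* Writer: publish the payload, then the flag. */
static void complete_reply(struct reply *r, size_t copied)
{
	assert(r);
	r->len = copied;
	/* Roughly like smp_wmb(): payload store ordered before flag store. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->bytes_recvd, copied, memory_order_relaxed);
}

/* Reader: returns 1 and fills *len if a reply is ready, else 0. */
static int decode_reply(struct reply *r, size_t *len)
{
	assert(r && len);
	if (!atomic_load_explicit(&r->bytes_recvd, memory_order_relaxed))
		return 0;
	/*
	 * Roughly like smp_rmb(): flag load ordered before payload load.
	 * A fence placed before the flag load would not order these two.
	 */
	atomic_thread_fence(memory_order_acquire);
	*len = r->len;
	return 1;
}
```

A single-threaded call sequence only exercises the control flow. The ordering guarantee matters when `complete_reply()` and `decode_reply()` run on different CPUs.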

f8f7e0fb

26 April, 2021 · 24 commits

NFSv4.2: Remove ifdef CONFIG_NFSD from NFSv4.2 client SSC code. · d9092b4b

Committed by Dai Ngo on April 22, 2021

The client SSC code should not depend on any CONFIG_NFSD config option.
This patch removes all CONFIG_NFSD references from the NFSv4.2 client SSC
code and simplifies the config of CONFIG_NFS_V4_2_SSC_HELPER and
NFSD_V4_2_INTER_SSC.
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

d9092b4b

xprtrdma: Move fr_mr field to struct rpcrdma_mr · 13bcf7e3

Committed by Chuck Lever on April 19, 2021

Clean up: The last remaining field in struct rpcrdma_frwr has been
removed, so the struct can be eliminated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

13bcf7e3

xprtrdma: Move the Work Request union to struct rpcrdma_mr · dcff9ed2

Committed by Chuck Lever on April 19, 2021

Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

dcff9ed2

xprtrdma: Move fr_linv_done field to struct rpcrdma_mr · 9a301caf

Committed by Chuck Lever on April 19, 2021

Clean up: Move more of struct rpcrdma_frwr into its parent.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

9a301caf

xprtrdma: Move cqe to struct rpcrdma_mr · e10fa96d

Committed by Chuck Lever on April 19, 2021

Clean up.

- Simplify variable initialization in the completion handlers.

- Move another field out of struct rpcrdma_frwr.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e10fa96d

xprtrdma: Move fr_cid to struct rpcrdma_mr · 0a26d10e

Committed by Chuck Lever on April 19, 2021

Clean up (for several purposes):

- The MR's cid is initialized sooner so that tracepoints can show
  something reasonable even if the MR is never posted.
- The MR's res.id doesn't change so the cid won't change either.
  Initializing the cid once is sufficient.
- struct rpcrdma_frwr is going away soon.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

0a26d10e

xprtrdma: Remove the RPC/RDMA QP event handler · e1648eb2

Committed by Chuck Lever on April 19, 2021

Clean up: The handler only recorded a trace event. If indeed no
action is needed by the RPC/RDMA consumer, then the event can be
ignored.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e1648eb2

xprtrdma: Don't display r_xprt memory addresses in tracepoints · 83189d15

Committed by Chuck Lever on April 19, 2021

The remote peer's IP address is sufficient, and does not expose
details of the kernel's memory layout.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

83189d15

xprtrdma: Add an rpcrdma_mr_completion_class · 6b147ea7

Committed by Chuck Lever on April 19, 2021

I found it confusing that the MR_EVENT class displays the mr.id but
the associated COMPLETION_EVENT class displays a cid (that happens
to contain the mr.id!). To make it a little easier on humans who
have to read and interpret these events, create an MR_COMPLETION
class that displays the mr.id in the same way as the MR_EVENT class.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

6b147ea7

xprtrdma: Add tracepoints showing FastReg WRs and remote invalidation · 4ddd0fc3

Committed by Chuck Lever on April 19, 2021

The Send signaling logic is a little subtle, so add some
observability around it. For every xprtrdma_mr_fastreg event, there
should be an xprtrdma_mr_localinv or xprtrdma_mr_reminv event.

When these tracepoints are enabled, we can see exactly when an MR is
DMA-mapped, registered, invalidated (either locally or remotely) and
then DMA-unmapped.

kworker/u25:2-190 [000] 787.979512: xprtrdma_mr_map: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE)
kworker/u25:2-190 [000] 787.979515: xprtrdma_chunk_read: task:351@5 pos=148 5608@0x8679e0c8f6f56000:0x00000503 (last)
kworker/u25:2-190 [000] 787.979519: xprtrdma_marshal: task:351@5 xid=0x8679e0c8: hdr=52 xdr=148/5608/0 read list/inline
kworker/u25:2-190 [000] 787.979525: xprtrdma_mr_fastreg: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE)
kworker/u25:2-190 [000] 787.979526: xprtrdma_post_send: task:351@5 cq.id=0 cid=73 (2 SGEs)

...

kworker/5:1H-219 [005] 787.980567: xprtrdma_wc_receive: cq.id=1 cid=161 status=SUCCESS (0/0x0) received=164
kworker/5:1H-219 [005] 787.980571: xprtrdma_post_recvs: peer=[192.168.100.55]:20049 r_xprt=0xffff8884974d4000: 0 new recvs, 70 active (rc 0)
kworker/5:1H-219 [005] 787.980573: xprtrdma_reply: task:351@5 xid=0x8679e0c8 credits=64
kworker/5:1H-219 [005] 787.980576: xprtrdma_mr_reminv: task:351@5 mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE)
kworker/5:1H-219 [005] 787.980577: xprtrdma_mr_unmap: mr.id=4 nents=2 5608@0x8679e0c8f6f56000:0x00000503 (TO_DEVICE)

Note that I've moved the xprtrdma_post_send tracepoint so that event
always appears after the xprtrdma_mr_fastreg tracepoint. Otherwise
the event log looks counterintuitive (FastReg is always supposed to
happen before Send).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

4ddd0fc3

xprtrdma: Avoid Send Queue wrapping · b3ce7a25

Committed by Chuck Lever on April 19, 2021

Send WRs can be signalled or unsignalled. A signalled Send WR
always has a matching Send completion, while an unsignalled Send
has a completion only if the Send WR fails.

xprtrdma has a Send accounting mechanism that is designed to reduce
the number of signalled Send WRs. This in turn mitigates the
interrupt rate of the underlying device.

RDMA consumers can't leave all Sends unsignalled, however, because
providers rely on Send completions to maintain their Send Queue head
and tail pointers. xprtrdma counts the number of unsignalled Send WRs
that have been posted to ensure that Sends are signalled often
enough to prevent the Send Queue from wrapping.

This mechanism neglected to account for FastReg WRs, which are
posted on the Send Queue but never signalled. As a result, the
Send Queue wrapped on occasion, resulting in duplicate completions
of FastReg and LocalInv WRs.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
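
The accounting idea in this commit message can be illustrated with a small user-space sketch. It is an assumption-laden model, not the xprtrdma implementation: `struct sq_account`, `post_send` and the budget-reset policy are all hypothetical. The only point it carries over from the text is that every WR placed on the Send Queue, including FastReg WRs, has to be charged against the unsignalled budget.

```c
#include <assert.h>

/* Hypothetical Send Queue accounting state; not the xprtrdma structures. */
struct sq_account {
	unsigned int batch;	/* max WRs allowed between signalled Sends */
	unsigned int budget;	/* unsignalled WRs still allowed right now */
};

/*
 * Account for one Send WR plus the "nr_fastreg" FastReg WRs chained
 * in front of it. Returns 1 if the Send must be signalled, 0 otherwise.
 * Charging only the Send (and not the FastReg WRs) would let the
 * number of outstanding unsignalled WRs exceed the batch limit.
 */
static int post_send(struct sq_account *sq, unsigned int nr_fastreg)
{
	unsigned int num_wrs = 1 + nr_fastreg;

	assert(sq && sq->batch > 0);
	if (num_wrs > sq->budget) {
		sq->budget = sq->batch;	/* start a new batch */
		return 1;
	}
	sq->budget -= num_wrs;
	return 0;
}
```

With a batch of 4, a Send carrying two FastReg WRs consumes three budget slots at once, so the following Sends are signalled sooner than they would be if only Sends were counted.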

b3ce7a25

xprtrdma: Do not wake RPC consumer on a failed LocalInv · 8a053433

Committed by Chuck Lever on April 19, 2021

Throw away any reply where the LocalInv flushes or could not be
posted. The registered memory region is in an unknown state until
the disconnect completes.

rpcrdma_xprt_disconnect() will find and release the MR. No need to
put it back on the MR free list in this case.

The client retransmits pending RPC requests once it reestablishes a
fresh connection, so a replacement reply should be forthcoming on
the next connection instance.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

8a053433

xprtrdma: Do not recycle MR after FastReg/LocalInv flushes · e4b52ca0

Committed by Chuck Lever on April 19, 2021

Better not to touch MRs involved in a flush or post error until the
Send and Receive Queues are drained and the transport is fully
quiescent. Simply don't insert such MRs back onto the free list.
They remain on mr_all and will be released when the connection is
torn down.

I had thought that recycling would prevent hardware resources from
being tied up for a long time. However, since v5.7, a transport
disconnect destroys the QP and other hardware-owned resources. The
MRs get cleaned up nicely at that point.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e4b52ca0

xprtrdma: Clarify use of barrier in frwr_wc_localinv_done() · 44438ad9

Committed by Chuck Lever on April 19, 2021

Clean up: The comment and the placement of the memory barrier are
confusing. Humans want to read the function statements from head
to tail.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

44438ad9

xprtrdma: Rename frwr_release_mr() · f912af77

Committed by Chuck Lever on April 19, 2021

Clean up: To be consistent with other functions in this source file,
follow the naming convention of putting the object being acted upon
before the action itself.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

f912af77

xprtrdma: rpcrdma_mr_pop() already does list_del_init() · 1363e638

Committed by Chuck Lever on April 19, 2021

The rpcrdma_mr_pop() earlier in the function has already cleared
out mr_list, so it must not be done again in the error path.

Fixes: 84756894 ("xprtrdma: Remove fr_state")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

1363e638

xprtrdma: Delete rpcrdma_recv_buffer_put() · c35ca60d

Committed by Chuck Lever on April 19, 2021

Clean up: The name recv_buffer_put() is a vestige of older code,
and the function is just a wrapper for the newer rpcrdma_rep_put().
In most of the existing call sites, a pointer to the owning
rpcrdma_buffer is already available.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

c35ca60d

xprtrdma: Fix cwnd update ordering · 35d8b10a

Committed by Chuck Lever on April 19, 2021

After a reconnect, the reply handler is opening the cwnd (and thus
enabling more RPC Calls to be sent) /before/ rpcrdma_post_recvs()
can post enough Receive WRs to receive their replies. This causes an
RNR and the new connection is lost immediately.

The race is most clearly exposed when KASAN and disconnect injection
are enabled. This slows down rpcrdma_rep_create() enough to allow
the send side to post a bunch of RPC Calls before the Receive
completion handler can invoke ib_post_recv().

Fixes: 2ae50ad6 ("xprtrdma: Close window between waking RPC senders and posting Receives")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

35d8b10a

xprtrdma: Improve locking around rpcrdma_rep creation · 9e3ca33b

Committed by Chuck Lever on April 19, 2021

Defensive clean up: Protect the rb_all_reps list during rep
creation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

9e3ca33b

xprtrdma: Improve commentary around rpcrdma_reps_unmap() · 8b5292be

Committed by Chuck Lever on April 19, 2021

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

8b5292be

xprtrdma: Improve locking around rpcrdma_rep destruction · eaf86e8c

Committed by Chuck Lever on April 24, 2021

Currently rpcrdma_reps_destroy() assumes that, at transport
tear-down, the content of the rb_free_reps list is the same as the
content of the rb_all_reps list. Although that is usually true,
using the rb_all_reps list should be more reliable because of
the way it's managed. And, rpcrdma_reps_unmap() uses rb_all_reps;
these two functions should both traverse the "all" list.

Ensure that all rpcrdma_reps are always destroyed whether they are
on the rep free list or not.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

eaf86e8c

xprtrdma: Put flushed Receives on free list instead of destroying them · 5030c9a9

Committed by Chuck Lever on April 19, 2021

Defer destruction of an rpcrdma_rep until transport tear-down to
preserve the rb_all_reps list while Receives flush.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

5030c9a9

xprtrdma: Do not refresh Receive Queue while it is draining · 15788d1d

Committed by Chuck Lever on April 19, 2021

Currently the Receive completion handler refreshes the Receive Queue
whenever a successful Receive completion occurs.

On disconnect, xprtrdma drains the Receive Queue. The first few
Receive completions after a disconnect are typically successful,
until the first flushed Receive.

This means the Receive completion handler continues to post more
Receive WRs after the drain sentinel has been posted. The late-
posted Receives flush after the drain sentinel has completed,
leading to a crash later in rpcrdma_xprt_disconnect().

To prevent this crash, xprtrdma has to ensure that the Receive
handler stops posting Receives before ib_drain_rq() posts its
drain sentinel.
Suggested-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
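
The requirement stated in this commit message (stop posting Receives before the drain sentinel is posted) can be modelled with a simple guard. This is a single-threaded sketch with hypothetical names (`struct rq_state`, `refill_receives`, `begin_drain`). The kernel code must also deal with concurrency between the completion handler and the disconnect path, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical receive-side state; not the xprtrdma structures. */
struct rq_state {
	bool draining;		/* set before the drain sentinel is posted */
	unsigned int posted;	/* Receive WRs currently posted */
	unsigned int wanted;	/* target number of posted Receive WRs */
};

/* Refill the Receive Queue; returns how many Receives were posted. */
static unsigned int refill_receives(struct rq_state *rq)
{
	unsigned int count = 0;

	assert(rq);
	if (rq->draining)
		return 0;	/* never post behind the drain sentinel */
	while (rq->posted < rq->wanted) {
		rq->posted++;
		count++;
	}
	return count;
}

/* Called on disconnect, before the queue is drained. */
static void begin_drain(struct rq_state *rq)
{
	assert(rq);
	rq->draining = true;
}
```

Once `begin_drain()` has run, a successful Receive completion that arrives late can still call `refill_receives()`, but no new Receive is posted.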

15788d1d

xprtrdma: Avoid Receive Queue wrapping · 32e6b681

Committed by Chuck Lever on April 19, 2021

Commit e340c2d6 ("xprtrdma: Reduce the doorbell rate (Receive)")
increased the number of Receive WRs that are posted by the client,
but did not increase the size of the Receive Queue allocated during
transport set-up.

This is usually not an issue because RPCRDMA_BACKWARD_WRS is defined
as (32) when SUNRPC_BACKCHANNEL is defined. In cases where it isn't,
there is a real risk of Receive Queue wrapping.

Fixes: e340c2d6 ("xprtrdma: Reduce the doorbell rate (Receive)")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Tom Talpey <tom@talpey.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

32e6b681

21 April, 2021 · 1 commit

NFS: The 'fattr_valid' field in struct nfs_server should be unsigned int · d99f2487

Committed by Trond Myklebust on April 21, 2021

Fix up a static compiler warning:
"fs/nfs/nfs4proc.c:3882 _nfs4_server_capabilities() warn: was expecting
a 64 bit value instead of '(1 << 11)'"

The fix is to convert the fattr_valid field to match the type of the
'valid' field in struct nfs_fattr.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

d99f2487

19 April, 2021 · 2 commits

NFSv4.1: Simplify layout return in pnfs_layout_process() · fb700ef0

Committed by Trond Myklebust on April 15, 2021

If the server hands us a layout that does not match the one we currently
hold, then have pnfs_mark_matching_lsegs_return() just ditch the old
layout if NFS_LSEG_LAYOUTRETURN is not set.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

fb700ef0

NFSv4: Don't discard segments marked for return in _pnfs_return_layout() · de144ff4

Committed by Trond Myklebust on April 18, 2021

If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: 6d597e17 ("pnfs: only tear down lsegs that precede seqid in LAYOUTRETURN args")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

de144ff4

16 April, 2021 · 2 commits

NFS: Don't discard pNFS layout segments that are marked for return · 39fd0186

Committed by Trond Myklebust on April 15, 2021

If the pNFS layout segment is marked with the NFS_LSEG_LAYOUTRETURN
flag, then the assumption is that it has some reporting requirement
to perform through a layoutreturn (e.g. flexfiles layout stats or error
information).

Fixes: e0b7d420 ("pNFS: Don't discard layout segments that are marked for return")
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

39fd0186

NFSv4.x: Don't return NFS4ERR_NOMATCHING_LAYOUT if we're unmounting · 8926cc83

Committed by Trond Myklebust on April 15, 2021

If the NFS super block is being unmounted, then we currently may end up
telling the server that we've forgotten the layout while it is actually
still in use by the client.
In that case, just assume that the client will soon return the layout
anyway, and so return NFS4ERR_DELAY in response to the layout recall.

Fixes: 58ac3e59 ("NFSv4/pnfs: Clean up nfs_layout_find_inode()")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

8926cc83

14 April, 2021 · 9 commits

NFSv42: Don't force attribute revalidation of the copy offload source · febfeaae

Committed by Trond Myklebust on April 14, 2021

When a copy offload is performed, we do not expect the source file to
change other than perhaps to see the atime be updated.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

febfeaae

NFSv42: Copy offload should update the file size when appropriate · 94d202d5

Committed by Trond Myklebust on April 14, 2021

If the result of a copy offload or clone operation is to grow the
destination file size, then we should update it. The reason is that when
a client holds a delegation, it is authoritative for the file size.

Fixes: 16abd2a0 ("NFSv4.2: fix client's attribute cache management for copy_file_range")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

94d202d5

SUNRPC: Handle major timeout in xprt_adjust_timeout() · 09252177

Committed by Chris Dion on April 04, 2021

Currently if a major timeout value is reached, but the minor value has
not been reached, an ETIMEDOUT will not be sent back to the caller.
This can occur if the v4 server is not responding to requests and
retrans is configured larger than the default of two.

For example, a TCP mount with a configured timeout value of 50 and a
retransmission count of 3 to a v4 server which is not responding:

1. Initial value and increment set to 5s, maxval set to 20s, retries at 3
2. Major timeout is set to 20s, minor timeout set to 5s initially
3. xprt_adjust_timeout() is called after 5s, retry with 10s timeout,
   minor timeout is bumped to 10s
4. And again after another 10s, 15s total time with minor timeout set
   to 15s
5. After 20s total time xprt_adjust_timeout() is called as major timeout is
   reached, but skipped because the minor timeout is not reached
   - After this time the CPU spins continually calling
     xprt_adjust_timeout() and returning 0 for 10 seconds.
     As seen on perf sched:
     39243.913182 [0005]  mount.nfs[3794] 4607.938      0.017   9746.863
6. This continues until the 15s minor timeout condition is reached (in
   this case for 10 seconds). After which the ETIMEDOUT is processed
   back to the caller, the CPU spinning stops, and normal operations
   continue

Fixes: 7de62bc0 ("SUNRPC dont update timeout value on connection reset")
Signed-off-by: Chris Dion <Christopher.Dion@dell.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
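
The ordering problem walked through in this commit message can be reproduced with plain integers standing in for jiffies. This is a minimal sketch under assumptions: `struct req_timeo` and both functions are hypothetical, times are whole seconds, and the backoff logic is left out. It contrasts checking the minor deadline first with checking the major deadline first.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified request state: times are plain seconds. */
struct req_timeo {
	unsigned long minor_deadline;	/* next retransmit deadline */
	unsigned long major_deadline;	/* overall deadline */
};

/* Problematic ordering: a pending minor deadline hides a major timeout. */
static int adjust_timeout_minor_first(const struct req_timeo *req,
				      unsigned long now)
{
	assert(req);
	if (now < req->minor_deadline)
		return 0;
	if (now < req->major_deadline)
		return 0;
	return -ETIMEDOUT;
}

/* Major deadline tested first: the caller sees the timeout right away. */
static int adjust_timeout_major_first(const struct req_timeo *req,
				      unsigned long now)
{
	assert(req);
	if (now >= req->major_deadline)
		return -ETIMEDOUT;
	/* Still waiting, or due for a retransmit; neither is an error. */
	return 0;
}
```

Using the numbers from the example (major deadline at 20s, minor deadline pushed out to 30s), the first variant keeps returning 0 between 20s and 30s, while the second returns `-ETIMEDOUT` at 20s.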

09252177

SUNRPC: Remove trace_xprt_transmit_queued · 6cf23783

Committed by Chuck Lever on March 31, 2021

This tracepoint can crash when dereferencing snd_task because
when some transports connect, they put a cookie in that field
instead of a pointer to an rpc_task.

BUG: KASAN: use-after-free in trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
Read of size 2 at addr ffff8881a83bd3a0 by task git/331872

CPU: 11 PID: 331872 Comm: git Tainted: G S                5.12.0-rc2-00007-g3ab6e585a7f9 #1453
Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
Call Trace:
 dump_stack+0x9c/0xcf
 print_address_description.constprop.0+0x18/0x239
 kasan_report+0x174/0x1b0
 trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
 xprt_prepare_transmit+0x8e/0xc1 [sunrpc]
 call_transmit+0x4d/0xc6 [sunrpc]

Fixes: 9ce07ae5 ("SUNRPC: Replace dprintk() call site in xprt_prepare_transmit")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

6cf23783

SUNRPC: Add tracepoint that fires when an RPC is retransmitted · e936a597

Committed by Chuck Lever on March 31, 2021

A separate tracepoint can be left enabled all the time to capture
rare but important retransmission events. So for example:

kworker/u26:3-568 [009] 156.967933: xprt_retransmit: task:44093@5 xid=0xa25dbc79 nfsv3 WRITE ntrans=2

Or, for example, enable all nfs and nfs4 tracepoints, and set up a
trigger to disable tracing when xprt_retransmit fires to capture
everything that leads up to it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

e936a597

SUNRPC: Move fault injection call sites · 7638e0bf

Committed by Chuck Lever on March 31, 2021

I've hit some crashes that occur in the xprt_rdma_inject_disconnect
path. It appears that, for some providers, rdma_disconnect() can
take so long that the transport can disconnect and release its
hardware resources while rdma_disconnect() is still running,
resulting in a UAF in the provider.

The transport's fault injection method may depend on the stability
of transport data structures. That means it needs to be invoked
only from contexts that hold the transport write lock.

Fixes: 4a068258 ("SUNRPC: Transport fault injection")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

7638e0bf

NFSv4.2 fix handling of sr_eof in SEEK's reply · 73f5c88f

Committed by Olga Kornievskaia on March 31, 2021

Currently the client ignores the value of sr_eof in the reply to the
SEEK operation. According to the spec, if the server didn't find the
requested extent and reached the end of the file, the server
returns sr_eof=true. If the request was for DATA and no data was
found (i.e., in the middle of a hole), then lseek is expected to
return ENXIO.

Fixes: 1c6dcbe5 ("NFS: Implement SEEK")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

73f5c88f

pNFS/flexfiles: fix incorrect size check in decode_nfs_fh() · ed34695e

Committed by Nikola Livic on March 29, 2021

We (Adam Zabrocki, Alexander Matrosov, Alexander Tereshkin, Maksym
Bazalii) observed that the check:

	if (fh->size > sizeof(struct nfs_fh))

should not use the size of the nfs_fh struct, which includes an extra two
bytes from the size field:

struct nfs_fh {
	unsigned short         size;
	unsigned char          data[NFS_MAXFHSIZE];
};

Instead, it should determine the size from data[NFS_MAXFHSIZE] so the memcpy
will not write 2 bytes beyond the destination. The proposed fix is to
compare against NFS_MAXFHSIZE directly, as is done elsewhere in the fs
code base.

Fixes: d67ae825 ("pnfs/flexfiles: Add the FlexFile Layout Driver")
Signed-off-by: Nikola Livic <nlivic@gmail.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
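
The bound discussed in this commit message can be checked in a standalone program. This is a user-space sketch, not the flexfiles decoder: `copy_fh` is a hypothetical helper, and the value 128 for `NFS_MAXFHSIZE` is an assumption made so the example compiles on its own. The length is compared against the capacity of `data[]` and not against `sizeof(struct nfs_fh)`, which also counts the `size` member.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFS_MAXFHSIZE 128	/* assumed value for this sketch */

struct nfs_fh {
	unsigned short size;
	unsigned char data[NFS_MAXFHSIZE];
};

/*
 * Copy a filehandle of "len" bytes into "fh".
 * Returns 0 on success, -1 if "len" does not fit in fh->data.
 */
static int copy_fh(struct nfs_fh *fh, const unsigned char *src, size_t len)
{
	assert(fh && src);
	if (len > sizeof(fh->data))	/* capacity of data[], not the struct */
		return -1;
	fh->size = (unsigned short)len;
	memcpy(fh->data, src, len);
	return 0;
}
```

A length of `NFS_MAXFHSIZE + 1` passes a check against `sizeof(struct nfs_fh)` but is rejected here, which is the off-by-two window the commit closes.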

ed34695e

NFSv4: Catch and trace server filehandle encoding errors · eb3d58c6

Committed by Trond Myklebust on April 01, 2021

If the server returns a filehandle with an invalid length, then trace
that, and return an EREMOTEIO error.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

eb3d58c6
