Commits · 28d9d56f4c7759e1f12e5b1bff60210082812edc · openeuler / Kernel

16 Aug, 2017 1 commit

xprtrdma: Remove imul instructions from rpcrdma_convert_iovs() · 28d9d56f

Committed by Chuck Lever on Aug 14, 2017

Re-arrange the pointer arithmetic in rpcrdma_convert_iovs() to
eliminate several integer multiplication instructions during
Transport Header encoding.

Also, array overflow does not occur outside development
environments, so replace overflow checking with one spot check
at the end. This reduces the number of conditional branches in
the common case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

28d9d56f

12 Aug, 2017 5 commits

xprtrdma: Clean up rpcrdma_bc_marshal_reply() · 7ec910e7

Committed by Chuck Lever on Aug 10, 2017

Same changes as in rpcrdma_marshal_req(). This removes
C-structure style encoding from the backchannel.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

7ec910e7

xprtrdma: Harden chunk list encoding against send buffer overflow · 39f4cd9e

Committed by Chuck Lever on Aug 10, 2017

While marshaling chunk lists which are variable-length XDR objects,
check for XDR buffer overflow at every step. Measurements show no
significant changes in CPU utilization.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

39f4cd9e
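
The per-step overflow check described in 39f4cd9e can be illustrated with a small user-space sketch. This is not the kernel's xdr_stream code: `enc_stream`, `enc_reserve`, `enc_u32`, and `enc_u32_list` are hypothetical names, and the counted list is only a stand-in for the real chunk list layout. The idea shown is that every write first reserves space and fails cleanly when the buffer is too small.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bounded encoder, loosely inspired by the idea of an xdr_stream. */
struct enc_stream {
	uint8_t *p;	/* next free byte */
	uint8_t *end;	/* one past the last usable byte */
};

static void enc_init(struct enc_stream *s, void *buf, size_t len)
{
	s->p = buf;
	s->end = (uint8_t *)buf + len;
}

/* Reserve nbytes of space, or return NULL if that would overflow. */
static uint8_t *enc_reserve(struct enc_stream *s, size_t nbytes)
{
	uint8_t *q = s->p;

	if ((size_t)(s->end - s->p) < nbytes)
		return NULL;
	s->p += nbytes;
	return q;
}

/* Encode one 32-bit value in big-endian (XDR) byte order. */
static int enc_u32(struct enc_stream *s, uint32_t v)
{
	uint8_t *q = enc_reserve(s, 4);

	if (!q)
		return -1;
	q[0] = (uint8_t)(v >> 24);
	q[1] = (uint8_t)(v >> 16);
	q[2] = (uint8_t)(v >> 8);
	q[3] = (uint8_t)v;
	return 0;
}

/* Encode a counted list; each element is checked as it is written. */
static int enc_u32_list(struct enc_stream *s, const uint32_t *items, uint32_t n)
{
	uint32_t i;

	if (enc_u32(s, n) < 0)
		return -1;
	for (i = 0; i < n; i++)
		if (enc_u32(s, items[i]) < 0)
			return -1;
	return 0;
}
```

In this sketch a caller that gets -1 back knows the buffer was never written past `end`, at the cost of one comparison per encoded item.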

xprtrdma: Set up an xdr_stream in rpcrdma_marshal_req() · 7a80f3f0

Committed by Chuck Lever on Aug 10, 2017

Initialize an xdr_stream at the top of rpcrdma_marshal_req(), and
use it to encode the fixed transport header fields. This xdr_stream
will be used to encode the chunk lists in a subsequent patch.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

7a80f3f0

xprtrdma: Remove rpclen from rpcrdma_marshal_req · f4a2805e

Committed by Chuck Lever on Aug 10, 2017

Clean up: Remove a variable whose result is no longer used.
Commit 655fec69 ("xprtrdma: Use gathered Send for large inline
messages") should have removed it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

f4a2805e

xprtrdma: Clean up rpcrdma_marshal_req() synopsis · 09e60641

Committed by Chuck Lever on Aug 10, 2017

Clean up: The caller already has rpcrdma_xprt, so pass that directly
instead. And provide a documenting comment for this critical
function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

09e60641

08 Aug, 2017 7 commits

xprtrdma: Clean up XDR decoding in rpcrdma_update_granted_credits() · c1bcb68e

Committed by Chuck Lever on Aug 03, 2017

Clean up: Replace C-structure based XDR decoding for consistency
with other areas.

struct rpcrdma_rep is rearranged slightly so that the relevant fields
are in cache when the Receive completion handler is invoked.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

c1bcb68e

xprtrdma: Remove rpcrdma_rep::rr_len · e2a67190

Committed by Chuck Lever on Aug 03, 2017

This field is no longer used outside the Receive completion handler.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

e2a67190

xprtrdma: Remove opcode check in Receive completion handler · fdf503e3

Committed by Chuck Lever on Aug 03, 2017

Clean up: The opcode check is no longer necessary. Since
commit 2fa8f88d ("xprtrdma: Use new CQ API for RPC-over-RDMA
client send CQs"), this completion handler is invoked only for
RECV work requests.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

fdf503e3

xprtrdma: Replace rpcrdma_count_chunks() · 264b0cdb

Committed by Chuck Lever on Aug 03, 2017

Clean up chunk list decoding by using the xdr_stream set up in
rpcrdma_reply_handler. This hardens decoding by checking for buffer
overflow at every step while unmarshaling variable-length XDR
objects.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

264b0cdb

xprtrdma: Refactor rpcrdma_reply_handler() · 07ff2dd5

Committed by Chuck Lever on Aug 03, 2017

Refactor the reply handler's transport header decoding logic to make
it easier to understand and update.

Convert some of the handler to use xdr_streams, which will enable
stricter validation of input data and enable the eventual addition
of support for new combinations of chunks, such as "Write + Reply"
or "PZRC + normal Read".
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

07ff2dd5

xprtrdma: Harden backchannel call decoding · 41c8f70f

Committed by Chuck Lever on Aug 03, 2017

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

41c8f70f

xprtrdma: Add xdr_init_decode to rpcrdma_reply_handler() · 96f8778f

Committed by Chuck Lever on Aug 03, 2017

Transport header decoding deals with untrusted input data, therefore
decoding this header needs to be hardened.

Adopt the same infrastructure that is used when XDR decoding NFS
replies. This is slightly more CPU-intensive than the replaced code,
but we're not adding new atomics, locking, or context switches. The
cost is manageable.

Start by initializing an xdr_stream in rpcrdma_reply_handler().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

96f8778f

02 Aug, 2017 1 commit

sunrpc: Const-ify all instances of struct rpc_xprt_ops · d31ae254

Committed by Chuck Lever on Aug 01, 2017

After transport instance creation, these function pointers never
change. Mark them as constant to prevent their use as an attack
vector for code injections.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

d31ae254
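
As a rough illustration of what const-ifying an ops table means, here is a minimal user-space sketch. The `demo_xprt` and `demo_xprt_ops` types and their two methods are hypothetical and far smaller than the real struct rpc_xprt_ops; only the pattern (a const table reached through a pointer-to-const) is taken from the commit description.

```c
struct demo_xprt;

/* Hypothetical, much-reduced ops table. */
struct demo_xprt_ops {
	int  (*connect)(struct demo_xprt *xprt);
	void (*close)(struct demo_xprt *xprt);
};

struct demo_xprt {
	const struct demo_xprt_ops *ops;	/* points at a read-only table */
	int connected;
};

static int demo_connect(struct demo_xprt *xprt)
{
	xprt->connected = 1;
	return 0;
}

static void demo_close(struct demo_xprt *xprt)
{
	xprt->connected = 0;
}

/*
 * Because the table is const, a store such as
 * "demo_ops.connect = other;" is rejected at compile time, and
 * toolchains typically place the object in read-only data.
 */
static const struct demo_xprt_ops demo_ops = {
	.connect = demo_connect,
	.close   = demo_close,
};
```

A transport instance would then be set up with `.ops = &demo_ops` and call through `xprt->ops->connect(xprt)` as before; only writes to the table are affected.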

21 Jul, 2017 1 commit

net/sunrpc/xprt_sock: fix regression in connection error reporting. · 3ffbc1d6

Committed by NeilBrown on Jul 19, 2017

Commit 3d476263 ("tcp: remove poll() flakes when receiving
RST") in v4.12 changed the order in which ->sk_state_change()
and ->sk_error_report() are called when a socket is shut
down - sk_state_change() is now called first.

This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
xprt_disconnect_done() to wake all pending tasks with -EAGAIN.
When the ->sk_error_report() callback arrives, it is too late to
pass the error on, and it is lost.

An easy way to demonstrate the problem is to try to start
rpc.nfsd while rpcbind isn't running.
nfsd will attempt a tcp connection to rpcbind.  An ECONNREFUSED
error is returned, but sunrpc code loses the error and keeps
retrying.  If it saw the ECONNREFUSED, it would abort.

To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
xs_tcp_state_change().

Fixes: 3d476263 ("tcp: remove poll() flakes when receiving RST")
Cc: stable@vger.kernel.org (v4.12)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

3ffbc1d6

14 Jul, 2017 25 commits

sunrpc: use constant time memory comparison for mac · 15a8b93f

Committed by Jason A. Donenfeld on Jun 10, 2017

Otherwise, we enable a MAC forgery via timing attack.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Jeff Layton <jlayton@poochiereds.net>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Anna Schumaker <anna.schumaker@netapp.com>
Cc: linux-nfs@vger.kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

15a8b93f
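
For background on 15a8b93f: an ordinary memcmp() may return as soon as it finds a differing byte, so its run time can reveal how many leading bytes of a forged MAC were correct. The kernel offers crypto_memneq() for comparisons that must not leak this. The function below is a user-space stand-in written for illustration only; `ct_memneq` is a hypothetical name, and the sketch makes no claim about how the kernel helper is implemented.

```c
#include <stddef.h>

/*
 * Compare two buffers without an early exit.
 * Returns 0 if they are equal and 1 otherwise. Every byte is always
 * examined, so the loop count depends only on len, not on where the
 * first mismatch is.
 */
static int ct_memneq(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	for (i = 0; i < len; i++)
		diff |= (unsigned char)(pa[i] ^ pb[i]);	/* accumulate differences */
	return diff != 0;
}
```

A MAC check built on this would treat any nonzero return as a verification failure, without distinguishing where the mismatch occurred.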

xprtrdma: Fix documenting comments in frwr_ops.c · 6afafa77

Committed by Chuck Lever on Jun 08, 2017

Clean up.

FASTREG and LOCAL_INV WRs are typically not signaled. localinv_wake
is used for the last LOCAL_INV WR in a chain, which is always
signaled. The documenting comments should reflect that.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

6afafa77

xprtrdma: Replace PAGE_MASK with offset_in_page() · d933cc32

Committed by Chuck Lever on Jun 08, 2017

Clean up.

Reported by: Geliang Tang <geliangtang@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

d933cc32
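
The d933cc32 message does not show the code it touched, so the following is only a generic sketch of the two equivalent spellings. It assumes 4 KiB pages, and the `DEMO_` macros and both function names are hypothetical stand-ins for the kernel's PAGE_SIZE, PAGE_MASK, and offset_in_page().

```c
/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_PAGE_SIZE	4096UL
#define DEMO_PAGE_MASK	(~(DEMO_PAGE_SIZE - 1))

/* Open-coded form: mask off the page-number bits by hand. */
static unsigned long demo_offset_open_coded(unsigned long addr)
{
	return addr & ~DEMO_PAGE_MASK;
}

/* Helper form: the same arithmetic behind a self-describing name. */
static unsigned long demo_offset_in_page(unsigned long addr)
{
	return addr & (DEMO_PAGE_SIZE - 1);
}
```

Both functions return the byte offset of `addr` within its page; the helper form simply states the intent at the call site.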

xprtrdma: FMR does not need list_del_init() · e2f6ef09

Committed by Chuck Lever on Jun 08, 2017

Clean up.

Commit 38f1932e ("xprtrdma: Remove FMRs from the unmap list
after unmapping") utilized list_del_init() to try to prevent some
list corruption. The corruption was actually caused by the reply
handler racing with a signal. Now that MR invalidation is properly
serialized, list_del_init() can safely be replaced.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

e2f6ef09

xprtrdma: Demote "connect" log messages · 173b8f49

Committed by Chuck Lever on Jun 08, 2017

Some have complained about the log messages generated when xprtrdma
opens or closes a connection to a server. When an NFS mount is
mostly idle these can appear every few minutes as the client idles
out the connection and reconnects.

Connection and disconnection is a normal part of operation, and not
exceptional, so change these to dprintk's for now. At some point
all of these will be converted to tracepoints, but that's for
another day.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

173b8f49

xprtrdma: Don't defer MR recovery if ro_map fails · 1f541895

Committed by Chuck Lever on Jun 08, 2017

Deferred MR recovery does a DMA-unmapping of the MW. However, ro_map
invokes rpcrdma_defer_mr_recovery in some error cases where the MW
has not even been DMA-mapped yet.

Avoid a DMA-unmapping error by replacing the call to rpcrdma_defer_mr_recovery.

Also note that if ib_dma_map_sg is asked to map 0 nents, it will
return 0. So the extra "if (i == 0)" check is no longer needed.

Fixes: 42fe28f6 ("xprtrdma: Do not leak an MW during a DMA ...")
Fixes: 505bbe64 ("xprtrdma: Refactor MR recovery work queues")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

1f541895

xprtrdma: Fix FRWR invalidation error recovery · 8d75483a

Committed by Chuck Lever on Jun 08, 2017

When ib_post_send() fails, all LOCAL_INV WRs past @bad_wr have to be
examined, and the MRs reset by hand.

I'm not sure how the existing code can work by comparing R_keys.
Restructure the logic so that instead it walks the chain of WRs,
starting from the first bad one.

Make sure to wait for completion if at least one WR was actually
posted. Otherwise, if the ib_post_send fails, we can end up
DMA-unmapping the MR while LOCAL_INV operations are in flight.

Commit 7a89f9c6 ("xprtrdma: Honor ->send_request API contract")
added the rdma_disconnect() call site. The disconnect actually
causes more problems than it solves, and SQ overruns happen only as
a result of software bugs. So remove it.

Fixes: d7a21c1b ("xprtrdma: Reset MRs in frwr_op_unmap_sync()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

8d75483a

xprtrdma: Fix client lock-up after application signal fires · 431af645

Committed by Chuck Lever on Jun 08, 2017

After a signal, the RPC client aborts synchronous RPCs running on
behalf of the signaled application.

The server is still executing those RPCs, and will write the results
back into the client's memory when it's done. By the time the server
writes the results, that memory is likely being used for other
purposes. Therefore xprtrdma has to immediately invalidate all
memory regions used by those aborted RPCs to prevent the server's
writes from clobbering that re-used memory.

With FMR memory registration, invalidation takes a relatively long
time. In fact, the invalidation is often still running when the
server tries to write the results into the memory regions that are
being invalidated.

This sets up a race between two processes:

1.  After the signal, xprt_rdma_free calls ro_unmap_safe.
2.  While ro_unmap_safe is still running, the server replies and
    rpcrdma_reply_handler runs, calling ro_unmap_sync.

Both processes invoke ib_unmap_fmr on the same FMR.

The mlx4 driver allows two ib_unmap_fmr calls on the same FMR at
the same time, but HCAs generally don't tolerate this. Sometimes
this can result in a system crash.

If the HCA happens to survive, rpcrdma_reply_handler continues. It
removes the rpc_rqst from rq_list and releases the transport_lock.
This enables xprt_rdma_free to run in another process, and the
rpc_rqst is released while rpcrdma_reply_handler is still waiting
for the ib_unmap_fmr call to finish.

But further down in rpcrdma_reply_handler, the transport_lock is
taken again, and "rqst" is dereferenced. If "rqst" has already been
released, this triggers a general protection fault. Since bottom-
halves are disabled, the system locks up.

Address both issues by reversing the order of the xprt_lookup_rqst
call and the ro_unmap_sync call. Introduce a separate lookup
mechanism for rpcrdma_req's to enable calling ro_unmap_sync before
xprt_lookup_rqst. Now the handler takes the transport_lock once
and holds it for the XID lookup and RPC completion.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

431af645

xprtrdma: Rename rpcrdma_req::rl_free · a80d66c9

Committed by Chuck Lever on Jun 08, 2017

Clean up: I'm about to use the rl_free field for purposes other than
a free list. So use a more generic name.

This is a refactoring change only.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

a80d66c9

xprtrdma: Pass only the list of registered MRs to ro_unmap_sync · 451d26e1

Committed by Chuck Lever on Jun 08, 2017

There are rare cases where an rpcrdma_req can be re-used (via
rpcrdma_buffer_put) while the RPC reply handler is still running.
This is due to a signal firing at just the wrong instant.

Since commit 9d6b0409 ("xprtrdma: Place registered MWs on a
per-req list"), rpcrdma_mws are self-contained; i.e., they fully
describe an MR and scatterlist, and no part of that information is
stored in struct rpcrdma_req.

As part of closing the above race window, pass only the req's list
of registered MRs to ro_unmap_sync, rather than the rpcrdma_req
itself.

Some extra transport header sanity checking is removed. Since the
client depends on its own recollection of what memory had been
registered, there doesn't seem to be a way to abuse this change.

And, the check was not terribly effective. If the client had sent
Read chunks, the "list_empty" test is negative in both of the
removed cases, which are actually looking for Write or Reply
chunks.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

451d26e1

xprtrdma: Pre-mark remotely invalidated MRs · 4b196dc6

Committed by Chuck Lever on Jun 08, 2017

There are rare cases where an rpcrdma_req and its matched
rpcrdma_rep can be re-used, via rpcrdma_buffer_put, while the RPC
reply handler is still using that req. This is typically due to a
signal firing at just the wrong instant.

As part of closing this race window, avoid using the wrong
rpcrdma_rep to detect remotely invalidated MRs. Mark MRs as
invalidated while we are sure the rep is still OK to use.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=305
Fixes: 68791649 ('xprtrdma: Invalidate in the RPC reply ... ')
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

4b196dc6

xprtrdma: On invalidation failure, remove MWs from rl_registered · 04d25b7d

Committed by Chuck Lever on Jun 08, 2017

Callers assume the ro_unmap_sync and ro_unmap_safe methods empty
the list of registered MRs. Ensure that all paths through
fmr_op_unmap_sync() remove MWs from that list.

Fixes: 9d6b0409 ("xprtrdma: Place registered MWs on a ... ")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

04d25b7d

SUNRPC: Make slot allocation more reliable · 92ea011f

Committed by Trond Myklebust on Jun 20, 2017

In xprt_alloc_slot(), the spin lock is only needed to provide atomicity
between the atomic_add_unless() failure and the call to xprt_add_backlog().
We do not actually need to hold it across the memory allocation itself.

By dropping the lock, we can use a more resilient GFP_NOFS allocation,
just as we now do in the rest of the RPC client code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

92ea011f

sunrpc: mark all struct svc_version instances as const · aa8217d5

Committed by Christoph Hellwig on May 12, 2017

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>

aa8217d5

sunrpc: mark all struct svc_procinfo instances as const · b9c744c1

Committed by Christoph Hellwig on May 12, 2017

struct svc_procinfo contains function pointers, and marking it as
constant prevents it from being used as an attack vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>

b9c744c1

sunrpc: move pc_count out of struct svc_procinfo · 0becc118

Committed by Christoph Hellwig on May 08, 2017

pc_count is the only writable member of struct svc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.

This patch moves it out of struct svc_procinfo, and into a
separate writable array that is pointed to by struct svc_version.
Signed-off-by: Christoph Hellwig <hch@lst.de>

0becc118

sunrpc: properly type pc_encode callbacks · d16d1867

Committed by Christoph Hellwig on May 08, 2017

Drop the resp argument as it can trivially be derived from the rqstp
argument.  With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>

d16d1867
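
The pattern in d16d1867 and the related pc_decode, pc_release, and pc_func commits can be sketched roughly as follows. All names here (`demo_rqst`, `demo_encode_fn`, `demo_status_res`, `demo_procinfo`) are hypothetical and are not the kernel's definitions; the sketch only shows how one shared prototype lets a callback be stored without a function pointer cast while still reaching its typed result through the request.

```c
/* Hypothetical request object carrying both argument and result buffers. */
struct demo_rqst {
	void *argp;	/* decoded arguments */
	void *resp;	/* result to encode */
};

/* One prototype shared by every encoder, so no function pointer casts. */
typedef int (*demo_encode_fn)(struct demo_rqst *rqstp, unsigned char *out);

struct demo_status_res {
	unsigned char status;
};

/* The encoder recovers its typed result from rqstp itself. */
static int demo_encode_status(struct demo_rqst *rqstp, unsigned char *out)
{
	const struct demo_status_res *res = rqstp->resp;

	out[0] = res->status;
	return 1;	/* number of bytes written */
}

struct demo_procinfo {
	demo_encode_fn encode;
};

static const struct demo_procinfo demo_proc = {
	.encode = demo_encode_status,	/* assigned without a cast */
};
```

With a single prototype, the compiler can flag an encoder whose signature drifts, which a cast to a generic function pointer type would have hidden.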

sunrpc: properly type pc_decode callbacks · cc6acc20

Committed by Christoph Hellwig on May 08, 2017

Drop the argp argument as it can trivially be derived from the rqstp
argument.  With that all functions now have the same prototype, and we
can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>

cc6acc20

sunrpc: properly type pc_release callbacks · 1150ded8

Committed by Christoph Hellwig on May 08, 2017

Drop the p and resp arguments as they are always NULL or can trivially
be derived from the rqstp argument. With that all functions now have the
same prototype, and we can remove the unsafe casting to kxdrproc_t.
Signed-off-by: Christoph Hellwig <hch@lst.de>

1150ded8

sunrpc: properly type pc_func callbacks · 1c8a5409

Committed by Christoph Hellwig on May 08, 2017

Drop the argp and resp arguments as they can trivially be derived from
the rqstp argument.  With that all functions now have the same prototype,
and we can remove the unsafe casting to svc_procfunc as well as the
svc_procfunc typedef itself.
Signed-off-by: Christoph Hellwig <hch@lst.de>

1c8a5409

sunrpc: mark all struct rpc_procinfo instances as const · 511e936b

Committed by Christoph Hellwig on May 12, 2017

struct rpc_procinfo contains function pointers, and marking it as
constant prevents it from being used as an attack vector for
code injections.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>

511e936b

sunrpc: move p_count out of struct rpc_procinfo · c551858a

Committed by Christoph Hellwig on May 08, 2017

p_count is the only writable member of struct rpc_procinfo, which is
a good candidate to be const-ified as it contains function pointers.

This patch moves it out of struct rpc_procinfo, and into a
separate writable array that is pointed to by struct rpc_version and
indexed by p_statidx.
Signed-off-by: Christoph Hellwig <hch@lst.de>

c551858a
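
The split described in c551858a can be pictured with a small sketch: the per-procedure table stays const, and the only mutable state (the call counters) lives in a separate array reached through the version structure. The `demo_` types, fields, and handlers below are hypothetical simplifications, not the kernel's struct rpc_procinfo or struct rpc_version.

```c
/* Hypothetical read-only procedure descriptor. */
struct demo_procinfo {
	int (*handler)(int arg);
	unsigned int statidx;	/* index into the writable counts array */
	const char *name;
};

/* Hypothetical version descriptor tying the two arrays together. */
struct demo_version {
	unsigned int nrprocs;
	const struct demo_procinfo *procs;	/* const table */
	unsigned int *counts;			/* writable counters */
};

static int demo_null(int arg)   { return arg; }
static int demo_double(int arg) { return 2 * arg; }

static const struct demo_procinfo demo_procs[] = {
	{ .handler = demo_null,   .statidx = 0, .name = "NULL" },
	{ .handler = demo_double, .statidx = 1, .name = "DOUBLE" },
};

/* The only mutable state lives outside the const table. */
static unsigned int demo_counts[2];

static const struct demo_version demo_vers = {
	.nrprocs = sizeof(demo_procs) / sizeof(demo_procs[0]),
	.procs   = demo_procs,
	.counts  = demo_counts,
};

/* Dispatch a call and bump its counter via statidx. */
static int demo_call(const struct demo_version *v, unsigned int idx, int arg)
{
	const struct demo_procinfo *p;

	if (idx >= v->nrprocs)
		return -1;
	p = &v->procs[idx];
	v->counts[p->statidx]++;
	return p->handler(arg);
}
```

Keeping the counters separate is what allows the table holding function pointers to be declared const in this sketch.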

sunrpc/auth_gss: fix decoder callback prototypes · c56c620b

Committed by Christoph Hellwig on May 08, 2017

Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>

c56c620b

sunrpc: fix decoder callback prototypes · 555966be

Committed by Christoph Hellwig on May 08, 2017

Declare the p_decode callbacks with the proper prototype instead of
casting to kxdrdproc_t and losing all type safety.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>

555966be

sunrpc: properly type argument to kxdrdproc_t · 993328e2

Committed by Christoph Hellwig on May 08, 2017

Pass struct rpc_request as the first argument instead of an untyped blob.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Trond Myklebust <trond.myklebust@primarydata.com>

993328e2
