提交 · 316a616e7886583c03d00f8320458b713c1dd277 · openeuler / Kernel

03 10月, 2018 7 次提交

xprtrdma: Re-organize the switch() in rpcrdma_conn_upcall · 316a616e

由 Chuck Lever 提交于 10月 01, 2018

Clean up: Eliminate the FALLTHROUGH into the default arm to make the
switch easier to understand.

Also, as long as I'm here, do not display the memory address of the
target rpcrdma_ep. A hashed memory address is of marginal use here.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

316a616e

xprtrdma: Eliminate "connstate" variable from rpcrdma_conn_upcall() · aadc5a94

由 Chuck Lever 提交于 10月 01, 2018

Clean up.

Since commit 173b8f49 ("xprtrdma: Demote "connect" log messages")
there has been no need to initialize connstat to zero. In fact, in
this code path there's now no reason not to set rep_connected
directly.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

aadc5a94

xprtrdma: Conventional variable names in rpcrdma_conn_upcall · ed97f1f7

由 Chuck Lever 提交于 10月 01, 2018

Clean up: The convention throughout other parts of xprtrdma is to
name variables of type struct rpcrdma_xprt "r_xprt", not "xprt".
This convention enables the use of the name "xprt" for a "struct
rpc_xprt" type variable, as in other parts of the RPC client.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ed97f1f7

xprtrdma: Rename rpcrdma_conn_upcall · ae38288e

由 Chuck Lever 提交于 10月 01, 2018

Clean up: Use a function name that is consistent with the RDMA core
API and with other consumers. Because this is a function that is
invoked from outside the rpcrdma.ko module, add an appropriate
documenting comment.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ae38288e

xprtrdma: Name MR trace events consistently · d379eaa8

由 Chuck Lever 提交于 10月 01, 2018

Clean up the names of trace events related to MRs so that it's
easy to enable these with a glob.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d379eaa8

xprtrdma: Explicitly resetting MRs is no longer necessary · 61da886b

由 Chuck Lever 提交于 10月 01, 2018

When a memory operation fails, the MR's driver state might not match
its hardware state. The only reliable recourse is to dereg the MR.
This is done in ->ro_recover_mr, which then attempts to allocate a
fresh MR to replace the released MR.

Since commit e2ac236c ("xprtrdma: Allocate MRs on demand"),
xprtrdma dynamically allocates MRs. It can add more MRs whenever
they are needed.

That makes it possible to simply release an MR when a memory
operation fails, instead of "recovering" it. It will automatically
be replaced by the on-demand MR allocator.

This commit is a little larger than I wanted, but it replaces
->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the
rb_stale_mrs list with a generic work queue.

Since MRs are no longer orphaned, the mrs_orphaned metric is no
longer used.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

61da886b

xprtrdma: Create more MRs at a time · c421ece6

由 Chuck Lever 提交于 10月 01, 2018

Some devices require more than 3 MRs to build a single 1MB I/O.
Ensure that rpcrdma_mrs_create() will add enough MRs to build that
I/O.

In a subsequent patch I'm changing the MR recovery logic to just
toss out the MRs. In that case it's possible for ->send_request to
loop acquiring some MRs, not getting enough, getting called again,
recycling the previous MRs, then not getting enough, lather rinse
repeat. Thus first we need to ensure enough MRs are created to
prevent that loop.

I'm "reusing" ia->ri_max_segs. All of its accessors seem to want the
maximum number of data segments plus two, so I'm going to bake that
into the initial calculation.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c421ece6

09 8月, 2018 1 次提交

xprtrdma: Fix disconnect regression · 8d4fb8ff

由 Chuck Lever 提交于 7月 28, 2018

I found that injecting disconnects with v4.18-rc resulted in
random failures of the multi-threaded git regression test.

The root cause appears to be that, after a reconnect, the
RPC/RDMA transport is waking pending RPCs before the transport has
posted enough Receive buffers to receive the Replies. If a Reply
arrives before enough Receive buffers are posted, the connection
is dropped. A few connection drops happen in quick succession as
the client and server struggle to regain credit synchronization.

This regression was introduced with commit 7c8d9e7c ("xprtrdma:
Move Receive posting to Receive handler"). The client is supposed to
post a single Receive when a connection is established because
it's not supposed to send more than one RPC Call before it gets
a fresh credit grant in the first RPC Reply [RFC 8166, Section
3.3.3].

Unfortunately there appears to be a longstanding bug in the Linux
client's credit accounting mechanism. On connect, it simply dumps
all pending RPC Calls onto the new connection. It's possible it has
done this ever since the RPC/RDMA transport was added to the kernel
ten years ago.

Servers have so far been tolerant of this bad behavior. Currently no
server implementation ever changes its credit grant over reconnects,
and servers always repost enough Receives before connections are
fully established.

The Linux client implementation used to post a Receive before each
of these Calls. This has covered up the flooding send behavior.

I could try to correct this old bug so that the client sends exactly
one RPC Call and waits for a Reply. Since we are so close to the
next merge window, I'm going to instead provide a simple patch to
post enough Receives before a reconnect completes (based on the
number of credits granted to the previous connection).

The spurious disconnects will be gone, but the client will still
send multiple RPC Calls immediately after a reconnect.

Addressing the latter problem will wait for a merge window because
a) I expect it to be a large change requiring lots of testing, and
b) obviously the Linux client has interoperated successfully since
day zero while still being broken.

Fixes: 7c8d9e7c ("xprtrdma: Move Receive posting to ... ")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8d4fb8ff

31 7月, 2018 1 次提交

RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const · d34ac5cd

由 Bart Van Assche 提交于 7月 18, 2018

Since neither ib_post_send() nor ib_post_recv() modify the data structure
their second argument points at, declare that argument const. This change
makes it necessary to declare the 'bad_wr' argument const too and also to
modify all ULPs that call ib_post_send(), ib_post_recv() or
ib_post_srq_recv(). This patch does not change any functionality but makes
it possible for the compiler to verify whether the
ib_post_(send|recv|srq_recv) really do not modify the posted work request.

To make this possible, only one cast had to be introduce that casts away
constness, namely in rpcrdma_post_recvs(). The only way I can think of to
avoid that cast is to introduce an additional loop in that function or to
change the data type of bad_wr from struct ib_recv_wr ** into int
(an index that refers to an element in the work request list). However,
both approaches would require even more extensive changes than this
patch.
Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

d34ac5cd

19 6月, 2018 1 次提交

IB/core: add max_send_sge and max_recv_sge attributes · 33023fb8

由 Steve Wise 提交于 6月 18, 2018

This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths.  For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16.  Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.
Acked-by: NSagi Grimberg <sagi@grimberg.me>
Acked-by: NChristoph Hellwig <hch@lst.de>
Acked-by: NSelvin Xavier <selvin.xavier@broadcom.com>
Acked-by: NShiraz Saleem <shiraz.saleem@intel.com>
Acked-by: NDennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>

33023fb8

02 6月, 2018 1 次提交

xprtrdma: Wait on empty sendctx queue · 2fad6592

由 Chuck Lever 提交于 5月 04, 2018

Currently, when the sendctx queue is exhausted during marshaling, the
RPC/RDMA transport places the RPC task on the delayq, which forces a
wait for HZ >> 2 before the marshal and send is retried.

With this change, the transport now places such an RPC task on the
pending queue, and wakes it just as soon as more sendctxs become
available. This typically takes less than a millisecond, and the
write_space waking mechanism is less deadlock-prone.

Moreover, the waiting RPC task is holding the transport's write
lock, which blocks the transport from sending RPCs. Therefore faster
recovery from sendctx queue exhaustion is desirable.

Cf. commit 5804891455d5 ("xprtrdma: ->send_request returns -EAGAIN
when there are no free MRs").
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2fad6592

12 5月, 2018 1 次提交

xprtrdma: Prepare RPC/RDMA includes for server-side trace points · b6e717cb

由 Chuck Lever 提交于 5月 07, 2018

Clean up: Move #include <trace/events/rpcrdma.h> into source files,
similar to how it is done with trace/events/sunrpc.h.

Server-side trace points will be part of the rpcrdma subsystem,
just like the client-side trace points.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

b6e717cb

07 5月, 2018 10 次提交

xprtrdma: Make rpcrdma_sendctx_put_locked() a static function · efd81e90

由 Chuck Lever 提交于 5月 04, 2018

Clean up: The only call site is in the same file as the function's
definition.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

efd81e90

xprtrdma: Remove rpcrdma_buffer_get_rep_locked() · 9d95cd53

由 Chuck Lever 提交于 5月 04, 2018

Clean up: There is only one remaining call site for this helper.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9d95cd53

xprtrdma: Remove rpcrdma_buffer_get_req_locked() · e68699cc

由 Chuck Lever 提交于 5月 04, 2018

Clean up. There is only one call-site for this helper, and it can be
simplified by using list_first_entry_or_null().
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e68699cc

xprtrdma: Remove rpcrdma_ep_{post_recv, post_extra_recv} · a7986f09

由 Chuck Lever 提交于 5月 04, 2018

Clean up: These functions are no longer used.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a7986f09

xprtrdma: Move Receive posting to Receive handler · 7c8d9e7c

由 Chuck Lever 提交于 5月 04, 2018

Receive completion and Reply handling are done by a BOUND
workqueue, meaning they run on only one CPU.

Posting receives is currently done in the send_request path, which
on large systems is typically done on a different CPU than the one
handling Receive completions. This results in movement of
Receive-related cachelines between the sending and receiving CPUs.

More importantly, it means that currently Receives are posted while
the transport's write lock is held, which is unnecessary and costly.

Finally, allocation of Receive buffers is performed on-demand in
the Receive completion handler. This helps guarantee that they are
allocated on the same NUMA node as the CPU that handles Receive
completions.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7c8d9e7c

xprtrdma: Clean up Receive trace points · 0e0b854c

由 Chuck Lever 提交于 5月 04, 2018

For clarity, report the posting and completion of Receive CQEs.

Also, the wc->byte_len field contains garbage if wc->status is
non-zero, and the vendor error field contains garbage if wc->status
is zero. For readability, don't save those fields in those cases.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0e0b854c

xprtrdma: Fix max_send_wr computation · 914fcad9

由 Chuck Lever 提交于 5月 04, 2018

For FRWR, the computation of max_send_wr is split between
frwr_op_open and rpcrdma_ep_create, which makes it difficult to tell
that the max_send_wr result is currently incorrect if frwr_op_open
has to reduce the credit limit to accommodate a small max_qp_wr.
This is a problem now that extra WRs are needed for backchannel
operations and a drain CQE.

So, refactor the computation so that it is all done in ->ro_open,
and fix the FRWR version of this computation so that it
accommodates HCAs with small max_qp_wr correctly.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

914fcad9

xprtrdma: Create transport's CM ID in the correct network namespace · 107c4beb

由 Chuck Lever 提交于 5月 04, 2018

Set up RPC/RDMA transport in mount.nfs's network namespace. This
passes the correct namespace information to the RDMA core, similar
to how RPC sockets are created (see xs_create_sock).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

107c4beb

xprtrdma: Try to fail quickly if proto=rdma · 52d28fe4

由 Chuck Lever 提交于 5月 04, 2018

rdma_resolve_addr(3) says:

> This call is used to map a given destination IP address to a
> usable RDMA address. The IP to RDMA address mapping is done
> using the local routing tables, or via ARP.

If this can't be done, there's no local device that can be used
to establish an RDMA-capable network path to the remote. In this
case, the RDMA CM very quickly posts an RDMA_CM_EVENT_ADDR_ERROR
upcall.

Currently rpcrdma_conn_upcall() converts RDMA_CM_EVENT_ADDR_ERROR
to EHOSTUNREACH. mount.nfs seems to want to retry EHOSTUNREACH
forever, thinking that this is a temporary situation. This makes
mount.nfs appear to hang if I try to mount with proto=rdma through,
say, a conventional Ethernet device.

If the admin has specified proto=rdma along with a server IP address
that requires a network path that does not support RDMA, instead
let's fail with a permanent error. -EPROTONOSUPPORT is returned when
NFSv4 or one of its minor versions is not supported.

-EPROTO is not (currently) retried by mount.nfs.

There are potentially other similar cases where -EPROTO is an
appropriate return code.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NOlga Kornievskaia <kolga@netapp.com>
Tested-by: NAnna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

52d28fe4

xprtrdma: Add proper SPDX tags for NetApp-contributed source · a2268cfb

由 Chuck Lever 提交于 5月 04, 2018

Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a2268cfb

02 5月, 2018 1 次提交

xprtrdma: Fix list corruption / DMAR errors during MR recovery · 054f1557

由 Chuck Lever 提交于 5月 01, 2018

The ro_release_mr methods check whether mr->mr_list is empty.
Therefore, be sure to always use list_del_init when removing an MR
linked into a list using that field. Otherwise, when recovering from
transport failures or device removal, list corruption can result, or
MRs can get mapped or unmapped an odd number of times, resulting in
IOMMU-related failures.

In general this fix is appropriate back to v4.8. However, code
changes since then make it impossible to apply this patch directly
to stable kernels. The fix would have to be applied by hand or
reworked for kernels earlier than v4.16.

Backport guidance -- there are several cases:
- When creating an MR, initialize mr_list so that using list_empty
  on an as-yet-unused MR is safe.
- When an MR is being handled by the remote invalidation path,
  ensure that mr_list is reinitialized when it is removed from
  rl_registered.
- When an MR is being handled by rpcrdma_destroy_mrs, it is removed
  from mr_all, but it may still be on an rl_registered list. In
  that case, the MR needs to be removed from that list before being
  released.
- Other cases are covered by using list_del_init in rpcrdma_mr_pop.

Fixes: 9d6b0409 ('xprtrdma: Place registered MWs on a ... ')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

054f1557

11 4月, 2018 7 次提交

xprtrdma: Fix corner cases when handling device removal · 25524288

由 Chuck Lever 提交于 3月 19, 2018

Michal Kalderon has found some corner cases around device unload
with active NFS mounts that I didn't have the imagination to test
when xprtrdma device removal was added last year.

- The ULP device removal handler is responsible for deallocating
  the PD. That wasn't clear to me initially, and my own testing
  suggested it was not necessary, but that is incorrect.

- The transport destruction path can no longer assume that there
  is a valid ID.

- When destroying a transport, ensure that ib_free_cq() is not
  invoked on a CQ that was already released.
Reported-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA from ...")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

25524288

xprtrdma: Move creation of rl_rdmabuf to rpcrdma_create_req · 2dd4a012

由 Chuck Lever 提交于 2月 28, 2018

Refactor: Both rpcrdma_create_req call sites have to allocate the
buffer where the transport header is built, so just move that
allocation into rpcrdma_create_req.

This buffer is a fixed size. There's no needed information available
in call_allocate that is not also available when the transport is
created.

The original purpose for allocating these buffers on demand was to
reduce the possibility that an allocation failure during transport
creation will hork the mount operation during low memory scenarios.
Some relief for this rare possibility is coming up in the next few
patches.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2dd4a012

xprtrdma: Chain Send to FastReg WRs · f2877623

由 Chuck Lever 提交于 2月 28, 2018

With FRWR, the client transport can perform memory registration and
post a Send with just a single ib_post_send.

This reduces contention between the send_request path and the Send
Completion handlers, and reduces the overhead of registering a chunk
that has multiple segments.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

f2877623

xprtrdma: Reduce number of MRs created by rpcrdma_mrs_create · ae741a85

由 Chuck Lever 提交于 2月 28, 2018

Create fewer MRs on average. Many workloads don't need as many as
32 MRs, and the transport can now quickly restock the MR free list.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ae741a85

xprtrdma: ->send_request returns -EAGAIN when there are no free MRs · 9e679d5e

由 Chuck Lever 提交于 2月 28, 2018

Currently, when the MR free list is exhausted during marshaling, the
RPC/RDMA transport places the RPC task on the delayq, which forces a
wait for HZ >> 2 before the marshal and send is retried.

With this change, the transport now places such an RPC task on the
pending queue, and wakes it just as soon as more MRs have been
created. Creating more MRs typically takes less than a millisecond,
and this waking mechanism is less deadlock-prone.

Moreover, the waiting RPC task is holding the transport's write
lock, which blocks the transport from sending RPCs. Therefore faster
recovery from MR exhaustion is desirable.

This is the same mechanism that the TCP transport utilizes when
handling write buffer space exhaustion.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9e679d5e

xprtrdma: Remove xprt-specific connect cookie · 8a14793e

由 Chuck Lever 提交于 2月 28, 2018

Clean up: The generic rq_connect_cookie is sufficient to detect RPC
Call retransmission.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8a14793e

xprtrdma: Remove arbitrary limit on initiator depth · b7e85fff

由 Chuck Lever 提交于 2月 28, 2018

Clean up: We need to check only that the value does not exceed the
range of the u8 field it's going into.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b7e85fff

03 2月, 2018 2 次提交

xprtrdma: Fix BUG after a device removal · e89e8d8f

由 Chuck Lever 提交于 1月 31, 2018

Michal Kalderon reports a BUG that occurs just after device removal:

[  169.112490] rpcrdma: removing device qedr0 for 192.168.110.146:20049
[  169.143909] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
[  169.181837] IP: rpcrdma_dma_unmap_regbuf+0xa/0x60 [rpcrdma]

The RPC/RDMA client transport attempts to allocate some resources
on demand. Registered buffers are one such resource. These are
allocated (or re-allocated) by xprt_rdma_allocate to hold RPC Call
and Reply messages. A hardware resource is associated with each of
these buffers, as they can be used for a Send or Receive Work
Request.

If a device is removed from under an NFS/RDMA mount, the transport
layer is responsible for releasing all hardware resources before
the device can be finally unplugged. A BUG results when the NFS
mount hasn't yet seen much activity: the transport tries to release
resources that haven't yet been allocated.

rpcrdma_free_regbuf() already checks for this case, so just move
that check to cover the DEVICE_REMOVAL case as well.
Reported-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Fixes: bebd0318 ("xprtrdma: Support unplugging an HCA ...")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e89e8d8f

xprtrdma: Fix calculation of ri_max_send_sges · 1179e2c2

由 Chuck Lever 提交于 1月 31, 2018

Commit 16f906d6 ("xprtrdma: Reduce required number of send
SGEs") introduced the rpcrdma_ia::ri_max_send_sges field. This fixes
a problem where xprtrdma would not work if the device's max_sge
capability was small (low single digits).

At least RPCRDMA_MIN_SEND_SGES are needed for the inline parts of
each RPC. ri_max_send_sges is set to this value:

  ia->ri_max_send_sges = max_sge - RPCRDMA_MIN_SEND_SGES;

Then when marshaling each RPC, rpcrdma_args_inline uses that value
to determine whether the device has enough Send SGEs to convey an
NFS WRITE payload inline, or whether instead a Read chunk is
required.

More recently, commit ae72950a ("xprtrdma: Add data structure to
manage RDMA Send arguments") used the ri_max_send_sges value to
calculate the size of an array, but that commit erroneously assumed
ri_max_send_sges contains a value similar to the device's max_sge,
and not one that was reduced by the minimum SGE count.

This assumption results in the calculated size of the sendctx's
Send SGE array to be too small. When the array is used to marshal
an RPC, the code can write Send SGEs into the following sendctx
element in that array, corrupting it. When the device's max_sge is
large, this issue is entirely harmless; but it results in an oops
in the provider's post_send method, if dev.attrs.max_sge is small.

So let's straighten this out: ri_max_send_sges will now contain a
value with the same meaning as dev.attrs.max_sge, which makes
the code easier to understand, and enables rpcrdma_sendctx_create
to calculate the size of the SGE array correctly.
Reported-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Fixes: 16f906d6 ("xprtrdma: Reduce required number of send SGEs")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1179e2c2

23 1月, 2018 8 次提交

xprtrdma: Correct some documenting comments · 9ab6d89e

由 Chuck Lever 提交于 1月 03, 2018

Fix kernel-doc warnings in net/sunrpc/xprtrdma/ .

net/sunrpc/xprtrdma/verbs.c:1575: warning: No description found for parameter 'count'
net/sunrpc/xprtrdma/verbs.c:1575: warning: Excess function parameter 'min_reqs' description in 'rpcrdma_ep_post_extra_recv'

net/sunrpc/xprtrdma/backchannel.c:288: warning: No description found for parameter 'r_xprt'
net/sunrpc/xprtrdma/backchannel.c:288: warning: Excess function parameter 'xprt' description in 'rpcrdma_bc_receive_call'
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9ab6d89e

C
xprtrdma: Instrument allocation/release of rpcrdma_req/rep objects · ae724676
由 Chuck Lever 提交于 12月 20, 2017
```
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
```
ae724676
C
xprtrdma: Add trace points to instrument QP and CQ access upcalls · 643cf323
由 Chuck Lever 提交于 12月 20, 2017
```
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
```
643cf323

xprtrdma: Add trace points for connect events · b4744e00

由 Chuck Lever 提交于 12月 20, 2017

Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b4744e00

C
xprtrdma: Add trace points to instrument MR allocation and recovery · 1c443eff
由 Chuck Lever 提交于 12月 20, 2017
```
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
```
1c443eff
C
xprtrdma: Add trace points to instrument memory invalidation · 2937fede
由 Chuck Lever 提交于 12月 20, 2017
```
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
```
2937fede

xprtrdma: Add trace points in the RPC Reply handler paths · b4a7f91c

由 Chuck Lever 提交于 12月 20, 2017

Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b4a7f91c

xprtrdma: Add trace points in RPC Call transmit paths · ab03eff5

由 Chuck Lever 提交于 12月 20, 2017

Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ab03eff5

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功