Commits · cc9a903d915c21626b6b2fbf8ed0ff16a7f82210 · openeuler / Kernel

11 Aug 2015 · 1 commit

svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOAD · cc9a903d

Committed by Chuck Lever on Aug 07, 2015

Both commit 0380a3f3 ("svcrdma: Add a separate "max data segs"
macro for svcrdma") and commit 7e5be288 ("svcrdma: advertise
the correct max payload") are incorrect. This commit reverts both
changes, restoring the server's maximum payload size to 1MB.

Commit 7e5be288 based the server's maximum payload on the
_client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong.

Commit 0380a3f3 tried to fix this so that the client maximum
payload size could be raised without affecting the server, but
managed to confuse matters more on the server side.

More importantly, limiting the advertised maximum payload size was
meant to be a workaround, not the actual fix. We need to revisit

  https://bugzilla.linux-nfs.org/show_bug.cgi?id=270

A Linux client on a platform with 64KB pages can overrun and crash
an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86_64 Linux
client seems to work fine using 1MB reads and writes when the Linux
server's maximum payload size is restored to 1MB.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270
Fixes: 0380a3f3 ("svcrdma: Add a separate "max data segs" macro")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

13 Jun 2015 · 9 commits

xprtrdma: Stack relief in fmr_op_map() · acb9da7a

Committed by Chuck Lever on May 26, 2015

fmr_op_map() declares a 64-element array of u64 in automatic
storage. This is 512 bytes (8 * 64) on the stack.

Instead, when FMR memory registration is in use, pre-allocate a
physaddr array for each rpcrdma_mw.
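A minimal user-space sketch of the idea, assuming a hypothetical `struct mw` in place of `rpcrdma_mw` and a hypothetical `MAX_PHYS` of 64 that matches the array size above. The 512-byte array moves out of the map function's stack frame into per-MW storage that is allocated once at setup time and reused on every map call.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_PHYS 64 /* hypothetical: matches the 64-element array in the text */

/* Hypothetical stand-in for struct rpcrdma_mw. */
struct mw {
	uint64_t *physaddrs; /* allocated once, reused for every map call */
};

/* Allocate the per-MW array at setup time instead of on each map call. */
static int mw_init(struct mw *mw)
{
	mw->physaddrs = calloc(MAX_PHYS, sizeof(*mw->physaddrs));
	return mw->physaddrs ? 0 : -1;
}

static void mw_release(struct mw *mw)
{
	free(mw->physaddrs);
	mw->physaddrs = NULL;
}

/* Fill the MW's own array; no 512-byte local array on the stack. */
static int mw_map(struct mw *mw, const uint64_t *pages, int npages)
{
	if (npages > MAX_PHYS)
		return -1;
	for (int i = 0; i < npages; i++)
		mw->physaddrs[i] = pages[i];
	return npages;
}
```

The trade-off is a little more memory per MW in exchange for a bounded, small stack frame on the registration path.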

This is a prerequisite for increasing the r/wsize maximum for
FMR on platforms with 4KB pages.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Split rb_lock · 58d1dcf5

Committed by Chuck Lever on May 26, 2015

/proc/lock_stat showed contention between rpcrdma_buffer_get/put
and the MR allocation functions during I/O intensive workloads.

Now that MRs are no longer allocated in rpcrdma_buffer_get(),
there's no reason the rb_mws list has to be managed using the
same lock as the send/receive buffers. Split that lock. The
new lock does not need to disable interrupts because buffer
get/put is never called in an interrupt context.

struct rpcrdma_buffer is re-arranged to ensure rb_mwlock and rb_mws
are always in a different cacheline than rb_lock and the buffer
pointers.
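The split and the cacheline separation described above can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel structure: `pthread_mutex_t` stands in for the kernel spinlocks (so the interrupt-disabling distinction is not modeled), the field names other than `rb_lock`, `rb_mwlock` and `rb_mws` are hypothetical, and a 64-byte cacheline is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define CACHELINE 64 /* assumption: 64-byte cache lines */

struct mw_node { struct mw_node *next; };

/* Hypothetical layout modeled on the description of rpcrdma_buffer. */
struct buffer {
	/* Send/receive buffer pool, protected by rb_lock. */
	pthread_mutex_t rb_lock;
	void **rb_send_bufs;
	void **rb_recv_bufs;

	/* MW free list, protected by its own lock that starts a new cacheline,
	 * so MW get/put does not bounce the line holding rb_lock. */
	_Alignas(CACHELINE) pthread_mutex_t rb_mwlock;
	struct mw_node *rb_mws;
};
```

Note that the alignment only holds if the structure itself is cacheline-aligned: static and automatic objects honor `_Alignas`, but heap allocations would need something like `aligned_alloc()`.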
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove rpcrdma_ia::ri_memreg_strategy · 7e53df11

Committed by Chuck Lever on May 26, 2015

Clean up: This field is no longer used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove ->ro_reset · 3269a94b

Committed by Chuck Lever on May 26, 2015

An RPC can exit at any time. When it does so, xprt_rdma_free() is
called, and it calls ->ro_unmap().

If ->ro_reset() is running due to a transport disconnect, the two
methods can race while processing the same rpcrdma_mw. The results
are unpredictable.

Because of this, in previous patches I've altered ->ro_map() to
handle MR reset. ->ro_reset() is no longer needed and can be
removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Introduce an FRMR recovery workqueue · 951e721c

Committed by Chuck Lever on May 26, 2015

After a transport disconnect, FRMRs can be left in an undetermined
state. In particular, the MR's rkey is no longer valid.

Currently, FRMRs are fixed up by the transport connect worker, but
that can race with ->ro_unmap if an RPC happens to exit while the
transport connect worker is running.

A better way of dealing with broken FRMRs is to detect them before
they are re-used by ->ro_map. Such FRMRs are either already invalid
or are owned by the sending RPC, and thus no race with ->ro_unmap
is possible.

Introduce a mechanism for handing broken FRMRs to a workqueue to be
reset in a context that is appropriate for allocating resources
(i.e., an ib_alloc_fast_reg_mr() API call).
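A rough user-space sketch of the hand-off pattern, assuming a hypothetical `struct mw` and a mutex-protected list in place of a real kernel workqueue. The map path only queues a broken MW. A separate worker drains the list and resets each entry in a context where allocation would be allowed; incrementing `rkey` merely stands in for deregistering the old MR and allocating a fresh one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

enum mw_state { MW_VALID, MW_STALE };

/* Hypothetical stand-in for an FRMR-backed rpcrdma_mw. */
struct mw {
	enum mw_state state;
	unsigned int rkey;
	struct mw *next_broken; /* link on the recovery list */
};

/* Hypothetical recovery queue; kernel code would use a workqueue instead. */
static struct mw *recovery_list;
static pthread_mutex_t recovery_lock = PTHREAD_MUTEX_INITIALIZER;

/* Called when the map path finds a broken MR: defer the reset. */
static void queue_recovery(struct mw *mw)
{
	pthread_mutex_lock(&recovery_lock);
	mw->next_broken = recovery_list;
	recovery_list = mw;
	pthread_mutex_unlock(&recovery_lock);
}

/* Worker body: detach the whole list, then reset each MW outside the lock. */
static int run_recovery(void)
{
	int fixed = 0;

	pthread_mutex_lock(&recovery_lock);
	struct mw *list = recovery_list;
	recovery_list = NULL;
	pthread_mutex_unlock(&recovery_lock);

	while (list) {
		struct mw *mw = list;

		list = mw->next_broken;
		mw->rkey++;           /* stands in for re-allocating the MR */
		mw->state = MW_VALID;
		mw->next_broken = NULL;
		fixed++;
	}
	return fixed;
}
```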

This mechanism is not yet used, but will be in subsequent patches.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-By: Devesh Sharma <devesh.sharma@avagotech.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Introduce helpers for allocating MWs · 346aa66b

Committed by Chuck Lever on May 26, 2015

We eventually want to handle allocating MWs one at a time, as
needed, instead of grabbing 64 and throwing them at each RPC in the
pipeline.

Add a helper for grabbing an MW off rb_mws, and a helper for
returning an MW to rb_mws. These will be used in a subsequent patch.
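A minimal sketch of what such a pair of helpers might look like, assuming a hypothetical `struct mw_pool` holding a singly linked free list and a user-space mutex in place of the kernel's list and spinlock. The names `mw_get` and `mw_put` are illustrative, not the actual kernel helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical MW with a free-list link. */
struct mw {
	struct mw *next;
	int id;
};

/* Hypothetical pool: the free list plus the lock that protects it. */
struct mw_pool {
	pthread_mutex_t lock;
	struct mw *free;
};

/* Take one MW off the free list, or return NULL if the pool is empty. */
static struct mw *mw_get(struct mw_pool *pool)
{
	pthread_mutex_lock(&pool->lock);
	struct mw *mw = pool->free;
	if (mw)
		pool->free = mw->next;
	pthread_mutex_unlock(&pool->lock);

	if (mw)
		mw->next = NULL;
	return mw;
}

/* Return an MW to the front of the free list. */
static void mw_put(struct mw_pool *pool, struct mw *mw)
{
	pthread_mutex_lock(&pool->lock);
	mw->next = pool->free;
	pool->free = mw;
	pthread_mutex_unlock(&pool->lock);
}
```

Taking MWs one at a time this way lets an RPC acquire only as many as it actually needs, rather than a fixed batch.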
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Use ib_device pointer safely · 89e0d112

Committed by Chuck Lever on May 26, 2015

The connect worker can replace ri_id, but prevents ri_id->device
from changing during the lifetime of a transport instance. The old
ID is kept around until a new ID is created and the ->device is
confirmed to be the same.

Cache a copy of ri_id->device in rpcrdma_ia and in rpcrdma_rep.
The cached copy can be used safely in code that does not serialize
with the connect worker.

Other code can use it to save an extra address generation (one
pointer dereference instead of two).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove rr_func · 494ae30d

Committed by Chuck Lever on May 26, 2015

A posted rpcrdma_rep never has rr_func set to anything but
rpcrdma_reply_handler.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Replace rpcrdma_rep::rr_buffer with rr_rxprt · fed171b3

Committed by Chuck Lever on May 26, 2015

Clean up: Instead of carrying a pointer to the buffer pool and
the rpc_xprt, carry a pointer to the controlling rpcrdma_xprt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-By: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

05 Jun 2015 · 2 commits

rpcrdma: Merge svcrdma and xprtrdma modules into one · ffe1f0df

Committed by Chuck Lever on Jun 04, 2015

Bi-directional RPC support means code in svcrdma.ko invokes a bit of
code in xprtrdma.ko, and vice versa. To avoid loader/linker loops,
merge the server and client side modules together into a single
module.

When backchannel capabilities are added, the combined module will
register all needed transport capabilities so that Upper Layer
consumers automatically have everything needed to create a
bi-directional transport connection.

Module aliases are added for backwards compatibility with user
space, which still may expect svcrdma.ko or xprtrdma.ko to be
present.

This commit reverts commit 2e8c12e1 ("xprtrdma: add separate
Kconfig options for NFSoRDMA client and server support") and
provides a single CONFIG option for enabling the new module.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

svcrdma: Add a separate "max data segs" macro for svcrdma · 0380a3f3

Committed by Chuck Lever on Jun 04, 2015

The server and client maximums are architecturally independent.
Allow changing one without affecting the other.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

31 Mar 2015 · 11 commits

xprtrdma: Make rpcrdma_{un}map_one() into inline functions · d654788e

Committed by Chuck Lever on Mar 30, 2015

These functions are called in a loop for each page transferred via
RDMA READ or WRITE. Extract loop invariants and inline them to
reduce CPU overhead.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Handle non-SEND completions via a callout · e46ac34c

Committed by Chuck Lever on Mar 30, 2015

Allow each memory registration mode to plug in a callout that handles
the completion of a memory registration operation.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add "open" memreg op · 3968cb58

Committed by Chuck Lever on Mar 30, 2015

The open op determines the size of various transport data structures
based on device capabilities and memory registration mode.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add "destroy MRs" memreg op · 4561f347

Committed by Chuck Lever on Mar 30, 2015

Memory Region objects associated with a transport instance are
destroyed before the instance is shut down and destroyed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add "reset MRs" memreg op · 31a701a9

Committed by Chuck Lever on Mar 30, 2015

This method is invoked when a transport instance is about to be
reconnected. Each Memory Region object is reset to its initial
state.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add "init MRs" memreg op · 91e70e70

Committed by Chuck Lever on Mar 30, 2015

This method is used when setting up a new transport instance to
create a pool of Memory Region objects that will be used to register
memory during operation.

Memory Regions are not needed for "physical" registration, since
->prepare and ->release are no-ops for that mode.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add a "deregister_external" op for each memreg mode · 6814baea

Committed by Chuck Lever on Mar 30, 2015

There is very little common processing among the different external
memory deregistration functions.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add a "register_external" op for each memreg mode · 9c1b4d77

Committed by Chuck Lever on Mar 30, 2015

There is very little common processing among the different external
memory registration functions. Have rpcrdma_create_chunks() call
the registration method directly. This removes a stack frame and a
switch statement from the external registration path.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add a "max_payload" op for each memreg mode · 1c9351ee

Committed by Chuck Lever on Mar 30, 2015

The max_payload computation is generalized to ensure that the
payload maximum is the lesser of RPCRDMA_MAX_DATA_SEGS and the number of
data segments that can be transmitted in an inline buffer.
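The "lesser of" rule can be written as a small computation. This is a sketch under assumptions, not the kernel's code. `MAX_DATA_SEGS` (64) stands in for RPCRDMA_MAX_DATA_SEGS, and 4KB pages are assumed. `SEG_WIRE_SIZE` (16 bytes) is an assumed on-the-wire cost of one segment descriptor, and header overhead in the inline buffer is ignored. One page per segment is also assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only. */
#define MAX_DATA_SEGS 64u   /* stands in for RPCRDMA_MAX_DATA_SEGS */
#define PAGE_SZ       4096u /* assumed 4KB pages */
#define SEG_WIRE_SIZE 16u   /* assumed bytes needed to describe one segment */

static size_t min_size(size_t a, size_t b)
{
	return a < b ? a : b;
}

/* Max payload: one page per segment, limited by both the segment cap
 * and the number of segment descriptors that fit in the inline buffer. */
static size_t max_payload(size_t inline_size)
{
	size_t inline_segs = inline_size / SEG_WIRE_SIZE;

	return min_size(MAX_DATA_SEGS, inline_segs) * PAGE_SZ;
}
```

With these assumed numbers, a 1024-byte inline buffer holds exactly 64 descriptors, so both limits agree at 256KB. A 512-byte buffer becomes the binding limit at 32 segments (128KB).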
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add vector of ops for each memory registration strategy · a0ce85f5

Committed by Chuck Lever on Mar 30, 2015

Instead of employing switch() statements, let's use the typical
Linux kernel idiom for handling behavioral variation: virtual
functions.

Start by defining a vector of operations for each supported memory
registration mode, and by adding a source file for each mode.
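The virtual-function idiom looks roughly like this. It is a simplified sketch, not the kernel's definition. The `ro_*` member names follow the naming used in these commit messages, but the signatures are hypothetical and simplified. The per-mode return values are placeholders, and `struct xprt` is an opaque stand-in. The transport picks one ops vector at setup time, and callers then dispatch through it instead of using a `switch()` on the registration mode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct xprt; /* opaque, hypothetical transport */

/* Hypothetical, simplified ops vector: one instance per registration mode. */
struct memreg_ops {
	int    (*ro_map)(struct xprt *x, int nsegs);
	void   (*ro_unmap)(struct xprt *x);
	size_t (*ro_maxpages)(struct xprt *x);
	const char *ro_displayname;
};

/* Placeholder implementations; the return values are illustrative only. */
static int    frwr_map(struct xprt *x, int nsegs) { (void)x; return nsegs; }
static void   frwr_unmap(struct xprt *x) { (void)x; }
static size_t frwr_maxpages(struct xprt *x) { (void)x; return 64; }

static int    physical_map(struct xprt *x, int nsegs) { (void)x; (void)nsegs; return 1; }
static void   physical_unmap(struct xprt *x) { (void)x; }
static size_t physical_maxpages(struct xprt *x) { (void)x; return 8; }

static const struct memreg_ops frwr_ops = {
	.ro_map = frwr_map, .ro_unmap = frwr_unmap,
	.ro_maxpages = frwr_maxpages, .ro_displayname = "frwr",
};

static const struct memreg_ops physical_ops = {
	.ro_map = physical_map, .ro_unmap = physical_unmap,
	.ro_maxpages = physical_maxpages, .ro_displayname = "physical",
};

/* Chosen once at setup; every later call is an indirect call, no switch(). */
static const struct memreg_ops *select_ops(int use_frwr)
{
	return use_frwr ? &frwr_ops : &physical_ops;
}
```

Adding a mode then means adding one ops instance and one source file, which is the structure the commit describes.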
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Perform a full marshal on retransmit · e2377945

Committed by Chuck Lever on Mar 30, 2015

Commit 6ab59945 ("xprtrdma: Update rkeys after transport
reconnect") added logic in the ->send_request path to update the
chunk list when an RPC/RDMA request is retransmitted.

Note that rpc_xdr_encode() resets and re-encodes the entire RPC
send buffer for each retransmit of an RPC. The RPC send buffer
is not preserved from the previous transmission of an RPC.

Revert 6ab59945, and instead, just force each request to be
fully marshaled every time through ->send_request. This should
preserve the fix from 6ab59945, while also performing pullup
during retransmits.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: Meghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: Veeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

24 Feb 2015 · 1 commit

xprtrdma: Store RDMA credits in unsigned variables · 9b1dcbc8

Committed by Chuck Lever on Feb 12, 2015

Dan Carpenter's static checker pointed out:

   net/sunrpc/xprtrdma/rpc_rdma.c:879 rpcrdma_reply_handler()
   warn: can 'credits' be negative?

"credits" is defined as an int. The credits value comes from the
server as a 32-bit unsigned integer.

A malicious or broken server can plant a large unsigned integer in
that field which would result in an underflow in the following
logic, potentially triggering a deadlock of the mount point by
blocking the client from issuing more RPC requests.

net/sunrpc/xprtrdma/rpc_rdma.c:

  876          credits = be32_to_cpu(headerp->rm_credit);
  877          if (credits == 0)
  878                  credits = 1;    /* don't deadlock */
  879          else if (credits > r_xprt->rx_buf.rb_max_requests)
  880                  credits = r_xprt->rx_buf.rb_max_requests;
  881
  882          cwnd = xprt->cwnd;
  883          xprt->cwnd = credits << RPC_CWNDSHIFT;
  884          if (xprt->cwnd > cwnd)
  885                  xprt_release_rqst_cong(rqst->rq_task);
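The fix amounts to keeping the credit value unsigned throughout the clamp. The sketch below is a user-space illustration, not the patched kernel function. `ntohl()` stands in for `be32_to_cpu()`, `clamp_credits()` is a hypothetical name, and the congestion-window update that follows in the original listing is not modeled.

```c
#include <arpa/inet.h> /* ntohl() stands in for the kernel's be32_to_cpu() */
#include <assert.h>
#include <stdint.h>

/* Clamp a server-supplied credit value into [1, max_requests].
 * Because credits stays unsigned, a huge wire value can only be
 * clamped down to max_requests, never read back as a negative int. */
static uint32_t clamp_credits(uint32_t wire_credits_be, uint32_t max_requests)
{
	uint32_t credits = ntohl(wire_credits_be);

	if (credits == 0)
		credits = 1;            /* don't deadlock */
	else if (credits > max_requests)
		credits = max_requests;
	return credits;
}
```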
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: eba8ff66 ("xprtrdma: Move credit update to RPC . . .")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

06 Feb 2015 · 1 commit

xprtrdma: Address sparse complaint in rpcr_to_rdmar() · b625a616

Committed by Chuck Lever on Feb 04, 2015

With "make ARCH=x86_64 allmodconfig; make C=1 CF=-D__CHECK_ENDIAN__":

linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: warning: incorrect
  type in initializer (different base types)
linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30: expected restricted
  __be32 [usertype] *buffer
linux-2.6/net/sunrpc/xprtrdma/xprt_rdma.h:273:30:    got unsigned int
  [usertype] *rq_buffer

As far as I can tell this is a false positive.

Reported-by: kbuild-all@01.org
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

30 Jan 2015 · 14 commits

xprtrdma: Clean up after adding regbuf management · df515ca7

Committed by Chuck Lever on Jan 21, 2015

rpcrdma_{de}register_internal() are used only in verbs.c now.

MAX_RPCRDMAHDR is no longer used and can be removed.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Allocate zero pad separately from rpcrdma_buffer · c05fbb5a

Committed by Chuck Lever on Jan 21, 2015

Use the new rpcrdma_alloc_regbuf() API to shrink the amount of
contiguous memory needed for a buffer pool by moving the zero
pad buffer into a regbuf.

This is for consistency with the other uses of internally
registered memory.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Allocate RPC/RDMA receive buffer separately from struct rpcrdma_rep · 6b1184cd

Committed by Chuck Lever on Jan 21, 2015

The rr_base field is currently the buffer where RPC replies land.

An RPC/RDMA reply header lands in this buffer. In some cases an RPC
reply header also lands in this buffer, just after the RPC/RDMA
header.

The inline threshold is an agreed-on size limit for RDMA SEND
operations that pass between server and client. The sum of the
RPC/RDMA reply header size and the RPC reply header size must be
less than this threshold.

The largest RDMA RECV that the client should have to handle is the
size of the inline threshold. The receive buffer should thus be the
size of the inline threshold, and not related to RPCRDMA_MAX_SEGS.

RPC replies received via RDMA WRITE (long replies) are caught in
rq_rcv_buf, which is the second half of the RPC send buffer. I.e.,
such replies are not involved in any way with rr_base.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Allocate RPC/RDMA send buffer separately from struct rpcrdma_req · 85275c87

Committed by Chuck Lever on Jan 21, 2015

The rl_base field is currently the buffer where each RPC/RDMA call
header is built.

The inline threshold is an agreed-on size limit for RDMA SEND
operations that pass between client and server. The sum of the
RPC/RDMA header size and the RPC header size must be less than or
equal to this threshold.

Increasing the r/wsize maximum will require MAX_SEGS to grow
significantly, but the inline threshold size won't change (both
sides agree on it). The server's inline threshold doesn't change.

Since an RPC/RDMA header can never be larger than the inline
threshold, make all RPC/RDMA header buffers the size of the
inline threshold.
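The sizing rule can be sketched as two small helpers, assuming a hypothetical `INLINE_THRESHOLD` of 1024 bytes chosen only for illustration. A call fits inline when the RPC/RDMA header plus the RPC header is at most the threshold. Every header buffer is allocated at exactly the threshold size, so it does not grow when the segment array (MAX_SEGS) does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed value for illustration; real thresholds are agreed on by both peers. */
#define INLINE_THRESHOLD 1024u

/* A call can be sent inline only if both headers together fit. */
static bool fits_inline(size_t rpcrdma_hdr_len, size_t rpc_hdr_len)
{
	return rpcrdma_hdr_len + rpc_hdr_len <= INLINE_THRESHOLD;
}

/* Every RPC/RDMA header buffer is sized to the threshold, independent
 * of how large the segment array is. */
static void *alloc_hdr_buf(void)
{
	return malloc(INLINE_THRESHOLD);
}
```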
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Allocate RPC send buffer separately from struct rpcrdma_req · 0ca77dc3

Committed by Chuck Lever on Jan 21, 2015

Because internal memory registration is an expensive and synchronous
operation, xprtrdma pre-registers send and receive buffers at mount
time, and then re-uses them for each RPC.

A "hardway" allocation is a memory allocation and registration that
replaces a send buffer during the processing of an RPC. Hardway must
be done if the RPC send buffer is too small to accommodate an RPC's
call and reply headers.

For xprtrdma, each RPC send buffer is currently part of struct
rpcrdma_req so that xprt_rdma_free(), which is passed nothing but
the address of an RPC send buffer, can find its matching struct
rpcrdma_req and rpcrdma_rep quickly via container_of / offsetof.
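The container_of / offsetof lookup works as in the sketch below, assuming a hypothetical, trimmed-down `struct req` with an embedded `send_buf` array. Given only the buffer's address, subtracting the member's offset recovers the owning structure. This trick is what forces the send buffer to live inside the request, and that coupling is what this commit removes.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's container_of(), expressed with offsetof(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical request with the RPC send buffer embedded inside it. */
struct req {
	int  rl_nchunks;
	char send_buf[256]; /* the buffer address handed to the RPC layer */
};

/* Given only the send buffer's address, find the owning request. */
static struct req *buf_to_req(void *buffer)
{
	return container_of(buffer, struct req, send_buf);
}
```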

That means that hardway currently has to replace a whole rpcrdma_req
when it replaces an RPC send buffer. This is often a fairly hefty
chunk of contiguous memory due to the size of the rl_segments array
and the fact that both the send and receive buffers are part of
struct rpcrdma_req.

Some obscure re-use of fields in rpcrdma_req is done so that
xprt_rdma_free() can detect replaced rpcrdma_req structs, and
restore the original.

This commit breaks apart the RPC send buffer and struct rpcrdma_req
so that increasing the size of the rl_segments array does not change
the alignment of each RPC send buffer. (Increasing rl_segments is
needed to bump up the maximum r/wsize for NFS/RDMA).

This change opens up some interesting possibilities for improving
the design of xprt_rdma_allocate().

xprt_rdma_allocate() is now the one place where RPC send buffers
are allocated or re-allocated, and they are now always left in place
by xprt_rdma_free().

A large re-allocation that includes both the rl_segments array and
the RPC send buffer is no longer needed. Send buffer re-allocation
becomes quite rare. Good send buffer alignment is guaranteed no
matter what the size of the rl_segments array is.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Add struct rpcrdma_regbuf and helpers · 9128c3e7

Committed by Chuck Lever on Jan 21, 2015

There are several spots that allocate a buffer via kmalloc (usually
contiguously with another data structure) and then register that
buffer internally. I'd like to split the buffers out of these data
structures to allow the data structures to scale.

Start by adding functions that can kmalloc and register a buffer,
and can manage/preserve the buffer's associated ib_sge and ib_mr
fields.
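A user-space sketch of such a helper, under assumptions. `struct sge` and `struct regbuf` are hypothetical stand-ins. No device registration happens here, so `lkey` is simply left at 0 where real code would record the value returned by registration. The buffer is allocated as a flexible array member right after its metadata, which lets the buffer live outside the larger data structures that previously embedded it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the scatter/gather entry a regbuf keeps. */
struct sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/* Hypothetical regbuf: metadata followed by the buffer itself. */
struct regbuf {
	size_t     rg_size;
	struct sge rg_iov;
	uint8_t    rg_base[]; /* flexible array member holds the data */
};

/* Allocate a zeroed buffer and describe it in its SGE. Real code would
 * also register the memory and store the resulting lkey; here it is 0. */
static struct regbuf *alloc_regbuf(size_t size)
{
	struct regbuf *rb = malloc(sizeof(*rb) + size);

	if (!rb)
		return NULL;
	rb->rg_size = size;
	rb->rg_iov.addr = (uint64_t)(uintptr_t)rb->rg_base;
	rb->rg_iov.length = (uint32_t)size;
	rb->rg_iov.lkey = 0;
	memset(rb->rg_base, 0, size);
	return rb;
}

static void free_regbuf(struct regbuf *rb)
{
	free(rb); /* real code would deregister before freeing */
}
```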
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Simplify synopsis of rpcrdma_buffer_create() · ac920d04

Committed by Chuck Lever on Jan 21, 2015

Clean up: There is one call site for rpcrdma_buffer_create(). All of
the arguments there are fields of an rpcrdma_xprt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Take struct ib_qp_attr and ib_qp_init_attr off the stack · ce1ab9ab

Committed by Chuck Lever on Jan 21, 2015

Reduce stack footprint of the connection upcall handler function.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Take struct ib_device_attr off the stack · 7bc7972c

Committed by Chuck Lever on Jan 21, 2015

Device attributes are large, and are used in more than one place.
Stash a copy in dynamically allocated memory.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove rpcrdma_ep::rep_func and ::rep_xprt · afadc468

Committed by Chuck Lever on Jan 21, 2015

Clean up: The rep_func field always refers to rpcrdma_conn_func().
rep_func should have been removed by commit b45ccfd2 ("xprtrdma:
Remove MEMWINDOWS registration modes").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Move credit update to RPC reply handler · eba8ff66

Committed by Chuck Lever on Jan 21, 2015

Reduce work in the receive CQ handler, which can be run at hardware
interrupt level, by moving the RPC/RDMA credit update logic to the
RPC reply handler.

This has some additional benefits: More header sanity checking is
done before trusting the incoming credit value, and the receive CQ
handler no longer touches the RPC/RDMA header (the CPU stalls while
waiting for the header contents to be brought into the cache).

This further extends work begun by commit e7ce710a ("xprtrdma:
Avoid deadlock when credit window is reset").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove rl_mr field, and the mr_chunk union · 3eb35810

Committed by Chuck Lever on Jan 21, 2015

Clean up: Since commit 0ac531c1 ("xprtrdma: Remove REGISTER
memory registration mode"), the rl_mr pointer is no longer used
anywhere.

After removal, there's only a single member of the mr_chunk union,
so mr_chunk can be removed as well, in favor of a single pointer
field.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Remove rpcrdma_ep::rep_ia · 5d410ba0

Committed by Chuck Lever on Jan 21, 2015

Clean up: This field is not used.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Rename "xprt" and "rdma_connect" fields in struct rpcrdma_xprt · 5abefb86

Committed by Chuck Lever on Jan 21, 2015

Clean up: Use consistent field names in struct rpcrdma_xprt.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

26 Nov 2014 · 1 commit

xprtrdma: Cap req_cqinit · e7104a2a

Committed by Chuck Lever on Nov 08, 2014

Recent work made FRMR registration and invalidation completions
unsignaled. This greatly reduces the adapter interrupt rate.

Every so often, however, a posted send Work Request is allowed to
signal. Otherwise, the provider's Work Queue will wrap and the
workload will hang.

The number of Work Requests that are allowed to remain unsignaled is
determined by the value of req_cqinit. Currently, this is set to the
size of the send Work Queue divided by two, minus 1.

For FRMR, the send Work Queue is the maximum number of concurrent
RPCs (currently 32) times the maximum number of Work Requests an
RPC might use (currently 7, though some adapters may need more).

For mlx4, this is 224 entries. This leaves completion signaling
disabled for 111 send Work Requests.

Some providers hold back dispatching Work Requests until a CQE is
generated. If completions are disabled, then no CQEs are generated
for quite some time, and that can stall the Work Queue.

I've seen this occur running xfstests generic/113 over NFSv4, where
eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
because the Work Queue has overflowed. The connection is dropped
and re-established.

Cap the rep_cqinit setting so completions are not left turned off
for too long.
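The arithmetic above and the cap can be checked with a short sketch. The cap value `MAX_UNSIGNALED_SENDS` of 32 is an assumption for illustration, since the text does not state the number. The uncapped formula (send queue size divided by two, minus 1) is taken from the description above. Clamping negative results to 0 is a hypothetical guard for tiny queues.

```c
#include <assert.h>

/* Assumed cap for illustration; the commit text does not give the value. */
#define MAX_UNSIGNALED_SENDS 32

/* Number of send WRs allowed to go unsignaled before one must signal. */
static int compute_cqinit(int max_send_wr)
{
	int cqinit = max_send_wr / 2 - 1; /* the original, uncapped rule */

	if (cqinit > MAX_UNSIGNALED_SENDS)
		cqinit = MAX_UNSIGNALED_SENDS;
	if (cqinit < 0)
		cqinit = 0;             /* hypothetical guard for tiny queues */
	return cqinit;
}
```

For the mlx4 case described above, 32 RPCs times 7 WRs gives 224 entries, and 224 / 2 - 1 gives 111 unsignaled sends. With the assumed cap, that drops to 32.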

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

openeuler / Kernel: synced successfully more than a year ago
