提交 · 31a701a94751509bb72e13d851f18ddcf22ff722 · openanolis / cloud-kernel

31 3月, 2015 9 次提交

xprtrdma: Add "reset MRs" memreg op · 31a701a9

由 Chuck Lever 提交于 3月 30, 2015

This method is invoked when a transport instance is about to be
reconnected. Each Memory Region object is reset to its initial
state.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

31a701a9

xprtrdma: Add "init MRs" memreg op · 91e70e70

由 Chuck Lever 提交于 3月 30, 2015

This method is used when setting up a new transport instance to
create a pool of Memory Region objects that will be used to register
memory during operation.

Memory Regions are not needed for "physical" registration, since
->prepare and ->release are no-ops for that mode.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

91e70e70

xprtrdma: Add a "deregister_external" op for each memreg mode · 6814baea

由 Chuck Lever 提交于 3月 30, 2015

There is very little common processing among the different external
memory deregistration functions.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6814baea

xprtrdma: Add a "register_external" op for each memreg mode · 9c1b4d77

由 Chuck Lever 提交于 3月 30, 2015

There is very little common processing among the different external
memory registration functions. Have rpcrdma_create_chunks() call
the registration method directly. This removes a stack frame and a
switch statement from the external registration path.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9c1b4d77

xprtrdma: Add a "max_payload" op for each memreg mode · 1c9351ee

由 Chuck Lever 提交于 3月 30, 2015

The max_payload computation is generalized to ensure that the
payload maximum is the lesser of RPC_MAX_DATA_SEGS and the number of
data segments that can be transmitted in an inline buffer.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1c9351ee

xprtrdma: Add vector of ops for each memory registration strategy · a0ce85f5

由 Chuck Lever 提交于 3月 30, 2015

Instead of employing switch() statements, let's use the typical
Linux kernel idiom for handling behavioral variation: virtual
functions.

Start by defining a vector of operations for each supported memory
registration mode, and by adding a source file for each mode.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a0ce85f5

xprtrdma: Prevent infinite loop in rpcrdma_ep_create() · 41f97028

由 Chuck Lever 提交于 3月 30, 2015

If a provider advertizes a zero max_fast_reg_page_list_len, FRWR
depth detection loops forever. Instead of just failing the mount,
try other memory registration modes.

Fixes: 0fc6c4e7 ("xprtrdma: mind the device's max fast . . .")
Reported-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

41f97028

xprtrdma: Byte-align FRWR registration · 80527240

由 Chuck Lever 提交于 3月 30, 2015

The RPC/RDMA transport's FRWR registration logic registers whole
pages. This means areas in the first and last pages that are not
involved in the RDMA I/O are needlessly exposed to the server.

Buffered I/O is typically page-aligned, so not a problem there. But
for direct I/O, which can be byte-aligned, and for reply chunks,
which are nearly always smaller than a page, the transport could
expose memory outside the I/O buffer.

FRWR allows byte-aligned memory registration, so let's use it as
it was intended.
Reported-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

80527240

xprtrdma: Display IPv6 addresses and port numbers correctly · 0dd39cae

由 Chuck Lever 提交于 3月 30, 2015

Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Tested-by: NDevesh Sharma <Devesh.Sharma@Emulex.Com>
Tested-by: NMeghana Cheripady <Meghana.Cheripady@Emulex.Com>
Tested-by: NVeeresh U. Kokatnur <veereshuk@chelsio.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0dd39cae

30 1月, 2015 16 次提交

xprtrdma: Clean up after adding regbuf management · df515ca7

由 Chuck Lever 提交于 1月 21, 2015

rpcrdma_{de}register_internal() are used only in verbs.c now.

MAX_RPCRDMAHDR is no longer used and can be removed.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

df515ca7

xprtrdma: Allocate zero pad separately from rpcrdma_buffer · c05fbb5a

由 Chuck Lever 提交于 1月 21, 2015

Use the new rpcrdma_alloc_regbuf() API to shrink the amount of
contiguous memory needed for a buffer pool by moving the zero
pad buffer into a regbuf.

This is for consistency with the other uses of internally
registered memory.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c05fbb5a

xprtrdma: Allocate RPC/RDMA receive buffer separately from struct rpcrdma_rep · 6b1184cd

由 Chuck Lever 提交于 1月 21, 2015

The rr_base field is currently the buffer where RPC replies land.

An RPC/RDMA reply header lands in this buffer. In some cases an RPC
reply header also lands in this buffer, just after the RPC/RDMA
header.

The inline threshold is an agreed-on size limit for RDMA SEND
operations that pass from server and client. The sum of the
RPC/RDMA reply header size and the RPC reply header size must be
less than this threshold.

The largest RDMA RECV that the client should have to handle is the
size of the inline threshold. The receive buffer should thus be the
size of the inline threshold, and not related to RPCRDMA_MAX_SEGS.

RPC replies received via RDMA WRITE (long replies) are caught in
rq_rcv_buf, which is the second half of the RPC send buffer. Ie,
such replies are not involved in any way with rr_base.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6b1184cd

xprtrdma: Allocate RPC/RDMA send buffer separately from struct rpcrdma_req · 85275c87

由 Chuck Lever 提交于 1月 21, 2015

The rl_base field is currently the buffer where each RPC/RDMA call
header is built.

The inline threshold is an agreed-on size limit to for RDMA SEND
operations that pass between client and server. The sum of the
RPC/RDMA header size and the RPC header size must be less than or
equal to this threshold.

Increasing the r/wsize maximum will require MAX_SEGS to grow
significantly, but the inline threshold size won't change (both
sides agree on it). The server's inline threshold doesn't change.

Since an RPC/RDMA header can never be larger than the inline
threshold, make all RPC/RDMA header buffers the size of the
inline threshold.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

85275c87

xprtrdma: Allocate RPC send buffer separately from struct rpcrdma_req · 0ca77dc3

由 Chuck Lever 提交于 1月 21, 2015

Because internal memory registration is an expensive and synchronous
operation, xprtrdma pre-registers send and receive buffers at mount
time, and then re-uses them for each RPC.

A "hardway" allocation is a memory allocation and registration that
replaces a send buffer during the processing of an RPC. Hardway must
be done if the RPC send buffer is too small to accommodate an RPC's
call and reply headers.

For xprtrdma, each RPC send buffer is currently part of struct
rpcrdma_req so that xprt_rdma_free(), which is passed nothing but
the address of an RPC send buffer, can find its matching struct
rpcrdma_req and rpcrdma_rep quickly via container_of / offsetof.

That means that hardway currently has to replace a whole rpcrmda_req
when it replaces an RPC send buffer. This is often a fairly hefty
chunk of contiguous memory due to the size of the rl_segments array
and the fact that both the send and receive buffers are part of
struct rpcrdma_req.

Some obscure re-use of fields in rpcrdma_req is done so that
xprt_rdma_free() can detect replaced rpcrdma_req structs, and
restore the original.

This commit breaks apart the RPC send buffer and struct rpcrdma_req
so that increasing the size of the rl_segments array does not change
the alignment of each RPC send buffer. (Increasing rl_segments is
needed to bump up the maximum r/wsize for NFS/RDMA).

This change opens up some interesting possibilities for improving
the design of xprt_rdma_allocate().

xprt_rdma_allocate() is now the one place where RPC send buffers
are allocated or re-allocated, and they are now always left in place
by xprt_rdma_free().

A large re-allocation that includes both the rl_segments array and
the RPC send buffer is no longer needed. Send buffer re-allocation
becomes quite rare. Good send buffer alignment is guaranteed no
matter what the size of the rl_segments array is.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0ca77dc3

xprtrdma: Add struct rpcrdma_regbuf and helpers · 9128c3e7

由 Chuck Lever 提交于 1月 21, 2015

There are several spots that allocate a buffer via kmalloc (usually
contiguously with another data structure) and then register that
buffer internally. I'd like to split the buffers out of these data
structures to allow the data structures to scale.

Start by adding functions that can kmalloc and register a buffer,
and can manage/preserve the buffer's associated ib_sge and ib_mr
fields.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9128c3e7

xprtrdma: Refactor rpcrdma_buffer_create() and rpcrdma_buffer_destroy() · 1392402c

由 Chuck Lever 提交于 1月 21, 2015

Move the details of how to create and destroy rpcrdma_req and
rpcrdma_rep structures into helper functions.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1392402c

xprtrdma: Simplify synopsis of rpcrdma_buffer_create() · ac920d04

由 Chuck Lever 提交于 1月 21, 2015

Clean up: There is one call site for rpcrdma_buffer_create(). All of
the arguments there are fields of an rpcrdma_xprt.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ac920d04

xprtrdma: Take struct ib_qp_attr and ib_qp_init_attr off the stack · ce1ab9ab

由 Chuck Lever 提交于 1月 21, 2015

Reduce stack footprint of the connection upcall handler function.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ce1ab9ab

xprtrdma: Take struct ib_device_attr off the stack · 7bc7972c

由 Chuck Lever 提交于 1月 21, 2015

Device attributes are large, and are used in more than one place.
Stash a copy in dynamically allocated memory.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7bc7972c

xprtrdma: Free the pd if ib_query_qp() fails · 5ae711a2

由 Chuck Lever 提交于 1月 21, 2015

If ib_query_qp() fails or the memory registration mode isn't
supported, don't leak the PD. An orphaned IB/core resource will
cause IB module removal to hang.

Fixes: bd7ed1d1 ("RPC/RDMA: check selected memory registration ...")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5ae711a2

xprtrdma: Remove rpcrdma_ep::rep_func and ::rep_xprt · afadc468

由 Chuck Lever 提交于 1月 21, 2015

Clean up: The rep_func field always refers to rpcrdma_conn_func().
rep_func should have been removed by commit b45ccfd2 ("xprtrdma:
Remove MEMWINDOWS registration modes").
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

afadc468

xprtrdma: Move credit update to RPC reply handler · eba8ff66

由 Chuck Lever 提交于 1月 21, 2015

Reduce work in the receive CQ handler, which can be run at hardware
interrupt level, by moving the RPC/RDMA credit update logic to the
RPC reply handler.

This has some additional benefits: More header sanity checking is
done before trusting the incoming credit value, and the receive CQ
handler no longer touches the RPC/RDMA header (the CPU stalls while
waiting for the header contents to be brought into the cache).

This further extends work begun by commit e7ce710a ("xprtrdma:
Avoid deadlock when credit window is reset").
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eba8ff66

xprtrdma: Remove rl_mr field, and the mr_chunk union · 3eb35810

由 Chuck Lever 提交于 1月 21, 2015

Clean up: Since commit 0ac531c1 ("xprtrdma: Remove REGISTER
memory registration mode"), the rl_mr pointer is no longer used
anywhere.

After removal, there's only a single member of the mr_chunk union,
so mr_chunk can be removed as well, in favor of a single pointer
field.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3eb35810

xprtrdma: Remove rpcrdma_ep::rep_ia · 5d410ba0

由 Chuck Lever 提交于 1月 21, 2015

Clean up: This field is not used.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5d410ba0

xprtrdma: human-readable completion status · 8502427c

由 Chuck Lever 提交于 1月 21, 2015

Make it easier to grep the system log for specific error conditions.

The wc.opcode field is not included because opcode numbers are
sparse, and because wc.opcode is not necessarily valid when
completion reports an error.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8502427c

26 11月, 2014 6 次提交

xprtrdma: Display async errors · 7ff11de1