1. 30 Nov 2016, 2 commits
  2. 20 Sep 2016, 12 commits
    • xprtrdma: Support larger inline thresholds · 44829d02
      Authored by Chuck Lever
      The Version One default inline threshold is still 1KB. But allow
      testing with thresholds up to 64KB.
      
      This maximum is somewhat arbitrary. There's no fundamental
      architectural limit I'm aware of, but it's good to keep the size of
      Receive buffers reasonable. Now that Send can use a s/g list, a
      Send buffer is only as large as each RPC requires. Receive buffers
      are always the size of the inline threshold, however.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
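
      A minimal sketch of the kind of bounds check this enables; the constant and
      identifier names below are illustrative, not the upstream ones:

      #include <linux/kernel.h>

      /* RPC-over-RDMA Version One default and an assumed 64KB testing ceiling */
      #define SKETCH_DEF_INLINE_THRESH        1024
      #define SKETCH_MAX_INLINE_THRESH        65536

      /* Clamp an administrator-supplied inline threshold into the allowed range */
      static unsigned int sketch_clamp_inline(unsigned int requested)
      {
              return clamp_t(unsigned int, requested,
                             SKETCH_DEF_INLINE_THRESH, SKETCH_MAX_INLINE_THRESH);
      }
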
    • xprtrdma: Use gathered Send for large inline messages · 655fec69
      Authored by Chuck Lever
      An RPC Call message that is sent inline but that has a data payload
      (ie, one or more items in rq_snd_buf's page list) must be "pulled
      up:"
      
      - call_allocate has to reserve enough RPC Call buffer space to
      accommodate the data payload
      
      - call_transmit has to memcpy the rq_snd_buf's page list and tail
      into its head iovec before it is sent
      
      As the inline threshold is increased beyond its current 1KB default,
      however, this means data payloads of more than a few KB are copied
      by the host CPU. For example, if the inline threshold is increased
      just to 4KB, then NFS WRITE requests up to 4KB would involve a
      memcpy of the NFS WRITE's payload data into the RPC Call buffer.
      This is an undesirable amount of participation by the host CPU.
      
      The inline threshold may be much larger than 4KB in the future,
      after negotiation with a peer server.
      
      Instead of copying the components of rq_snd_buf into its head iovec,
      construct a gather list of these components, and send them all in
      place. The same approach is already used in the Linux server's
      RPC-over-RDMA reply path.
      
      This mechanism also eliminates the need for rpcrdma_tail_pullup,
      which is used to manage the XDR pad and trailing inline content when
      a Read list is present.
      
      This requires that the pages in rq_snd_buf's page list be DMA-mapped
      during marshaling, and unmapped when a data-bearing RPC is
      completed. This is slightly less efficient for very small I/O
      payloads, but significantly more efficient as data payload size and
      inline threshold increase past a kilobyte.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
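
      A condensed sketch of the gathered Send idea. The verbs calls (ib_dma_map_*,
      ib_post_send) are the real API; the SGE cap, the omission of the tail iovec,
      and the lack of error handling and completion-time unmapping are
      simplifications:

      #include <rdma/ib_verbs.h>
      #include <linux/sunrpc/xdr.h>
      #include <linux/string.h>
      #include <linux/mm.h>

      enum { SKETCH_MAX_SEND_SGES = 16 };     /* assumed cap on Send SGEs */

      static int sketch_post_gathered_send(struct ib_qp *qp, struct ib_device *dev,
                                           struct xdr_buf *xdr, u32 lkey)
      {
              struct ib_sge sge[SKETCH_MAX_SEND_SGES];
              struct ib_send_wr wr, *bad_wr;
              struct page **ppages = xdr->pages + (xdr->page_base >> PAGE_SHIFT);
              unsigned int page_base = offset_in_page(xdr->page_base);
              unsigned int remaining = xdr->page_len;
              int i, n = 0;

              /* Segment 0: the head iovec (transport header + RPC call header) */
              sge[n].addr   = ib_dma_map_single(dev, xdr->head[0].iov_base,
                                                xdr->head[0].iov_len, DMA_TO_DEVICE);
              sge[n].length = xdr->head[0].iov_len;
              sge[n].lkey   = lkey;
              n++;

              /* Segments 1..N: the data payload pages, sent in place (no pull-up) */
              for (i = 0; remaining && n < SKETCH_MAX_SEND_SGES; i++, n++) {
                      unsigned int len = min_t(unsigned int, remaining,
                                               PAGE_SIZE - page_base);

                      sge[n].addr   = ib_dma_map_page(dev, ppages[i], page_base,
                                                      len, DMA_TO_DEVICE);
                      sge[n].length = len;
                      sge[n].lkey   = lkey;
                      remaining -= len;
                      page_base = 0;
              }

              memset(&wr, 0, sizeof(wr));
              wr.opcode     = IB_WR_SEND;
              wr.send_flags = IB_SEND_SIGNALED;
              wr.sg_list    = sge;
              wr.num_sge    = n;
              return ib_post_send(qp, &wr, &bad_wr);
      }

      Each mapped segment has to be unmapped again when the Send completes, which
      is why data-bearing requests now track their DMA mappings until the RPC is
      done.
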
    • xprtrdma: Basic support for Remote Invalidation · c8b920bb
      Authored by Chuck Lever
      Have frwr's ro_unmap_sync recognize an invalidated rkey that appears
      as part of a Receive completion. Local invalidation can be skipped
      for that rkey.
      
      Use an out-of-band signaling mechanism to indicate to the server
      that the client is prepared to receive RDMA Send With Invalidate.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
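
      A sketch of the receive-side check. IB_WC_WITH_INVALIDATE and
      wc->ex.invalidate_rkey are the real verbs fields; the surrounding MR
      bookkeeping is only indicated in comments:

      #include <rdma/ib_verbs.h>

      static void sketch_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
      {
              u32 inv_rkey = 0;

              if (wc->status != IB_WC_SUCCESS)
                      return;

              /* The peer used Send With Invalidate: this rkey is already invalid */
              if (wc->wc_flags & IB_WC_WITH_INVALIDATE)
                      inv_rkey = wc->ex.invalidate_rkey;

              /* Later, in ro_unmap_sync: for each MR registered for this RPC,
               * skip posting IB_WR_LOCAL_INV when its rkey equals inv_rkey. */
              (void)inv_rkey;
      }
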
    • xprtrdma: Eliminate "ia" argument in rpcrdma_{alloc, free}_regbuf · 13650c23
      Authored by Chuck Lever
      Clean up. The "ia" argument is no longer used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Replace DMA_BIDIRECTIONAL · 99ef4db3
      Authored by Chuck Lever
      The use of DMA_BIDIRECTIONAL is discouraged by DMA-API.txt.
      Fortunately, xprtrdma now knows which direction I/O is going as
      soon as it allocates each regbuf.
      
      The RPC Call and Reply buffers are no longer the same regbuf. They
      can each be labeled correctly now. The RPC Reply buffer is never
      part of either a Send or Receive WR, but it can be part of a Reply
      chunk, which is mapped and registered via ->ro_map. So it is not
      DMA mapped when it is allocated (DMA_NONE), to avoid a double-
      mapping.
      
      Since Receive buffers are no longer DMA_BIDIRECTIONAL and their
      contents are never modified by the host CPU, DMA-API-HOWTO.txt
      suggests that a DMA sync before posting each buffer should be
      unnecessary. (See my_card_interrupt_handler).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
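
      A simplified illustration of per-regbuf DMA directions; the structure and
      helper are stand-ins, while ib_dma_map_single, ib_dma_mapping_error and the
      dma_data_direction values are the real APIs:

      #include <rdma/ib_verbs.h>
      #include <linux/errno.h>

      struct sketch_regbuf {                  /* simplified stand-in for a regbuf */
              void                    *base;
              size_t                  len;
              enum dma_data_direction dir;    /* fixed at allocation time */
              u64                     dma_addr;
      };

      static int sketch_regbuf_map(struct ib_device *dev, struct sketch_regbuf *rb)
      {
              /* Reply buffers are DMA_NONE here: they are mapped later by
               * ->ro_map when they become part of a Reply chunk. */
              if (rb->dir == DMA_NONE)
                      return 0;

              rb->dma_addr = ib_dma_map_single(dev, rb->base, rb->len, rb->dir);
              return ib_dma_mapping_error(dev, rb->dma_addr) ? -EIO : 0;
      }

      /* Directions chosen at allocation:
       *   Send buffers     DMA_TO_DEVICE
       *   Receive buffers  DMA_FROM_DEVICE
       *   Reply buffers    DMA_NONE (registered via ->ro_map instead)
       */
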
    • xprtrdma: Use smaller buffers for RPC-over-RDMA headers · 08cf2efd
      Authored by Chuck Lever
      Commit 94931746 ("xprtrdma: Limit number of RDMA segments in
      RPC-over-RDMA headers") capped the number of chunks that may appear
      in RPC-over-RDMA headers. The maximum header size can be estimated
      and fixed to avoid allocating buffer space that is never used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
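
      As an illustration of the estimate, a worst-case header size can be computed
      from the segment cap. The per-field sizes below follow the RPC-over-RDMA wire
      format in general terms, but the cap and the exact worst-case layout are
      assumptions, not the values used upstream:

      #include <linux/types.h>

      enum {
              /* fixed part of the transport header: xid, vers, credits, proc */
              SKETCH_HDR_FIXED  = 4 * sizeof(__be32),
              /* one Read list entry: item discriminator + position + handle +
               * length + 64-bit offset */
              SKETCH_READ_SEG   = 4 * sizeof(__be32) + sizeof(__be64),
              /* one plain segment in a Reply chunk: handle + length + offset */
              SKETCH_PLAIN_SEG  = 2 * sizeof(__be32) + sizeof(__be64),
      };

      static size_t sketch_max_hdr_size(unsigned int max_segs)
      {
              return SKETCH_HDR_FIXED +
                     max_segs * SKETCH_READ_SEG + sizeof(__be32) + /* Read list */
                     sizeof(__be32) +                         /* empty Write list */
                     2 * sizeof(__be32) +                     /* Reply chunk header */
                     max_segs * SKETCH_PLAIN_SEG;
      }

      With a single-digit segment cap this comes to a few hundred bytes, so the
      header regbuf no longer needs to be speculatively large.
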
    • xprtrdma: Initialize separate RPC call and reply buffers · 9c40c49f
      Authored by Chuck Lever
      RPC-over-RDMA needs to separate its RPC call and reply buffers.
      
       o When an RPC Call is sent, rq_snd_buf is DMA mapped for an RDMA
         Send operation using DMA_TO_DEVICE
      
       o If the client expects a large RPC reply, it DMA maps rq_rcv_buf
         as part of a Reply chunk using DMA_FROM_DEVICE
      
      The two mappings are for data movement in opposite directions.
      
      DMA-API.txt suggests that if these mappings share a DMA cacheline,
      bad things can happen. This could occur in the final bytes of
      rq_snd_buf and the first bytes of rq_rcv_buf if the two buffers
      happen to share a DMA cacheline.
      
      On x86_64 the cacheline size is typically 8 bytes, and RPC call
      messages are usually much smaller than the send buffer, so this
      hasn't been a noticeable problem. But the DMA cacheline size can be
      larger on other platforms.
      
      Also, often rq_rcv_buf starts most of the way into a page, thus
      an additional RDMA segment is needed to map and register the end of
      that buffer. Try to avoid that scenario to reduce the cost of
      registering and invalidating Reply chunks.
      
      Instead of carrying a single regbuf that covers both rq_snd_buf and
      rq_rcv_buf, each struct rpcrdma_req now carries one regbuf for
      rq_snd_buf and one regbuf for rq_rcv_buf.
      
      Some incidental changes worth noting:
      
      - To clear out some spaghetti, refactor xprt_rdma_allocate.
      - The value stored in rg_size is the same as the value stored in
        the iov.length field, so eliminate rg_size
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
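
      The shape of the per-request buffers after this change, sketched with names
      that approximate but are not guaranteed to match the upstream structure:

      struct rpcrdma_regbuf;                  /* opaque here */

      struct sketch_rpcrdma_req {
              struct rpcrdma_regbuf   *rl_rdmabuf;   /* RPC-over-RDMA header,
                                                      * DMA_TO_DEVICE */
              struct rpcrdma_regbuf   *rl_sendbuf;   /* backs rq_snd_buf,
                                                      * DMA_TO_DEVICE */
              struct rpcrdma_regbuf   *rl_recvbuf;   /* backs rq_rcv_buf, DMA_NONE
                                                      * until registered as a
                                                      * Reply chunk */
      };

      Each buffer can now be sized, aligned, and DMA-mapped independently, which is
      what the SUNRPC changes below make possible.
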
    • SUNRPC: Add a transport-specific private field in rpc_rqst · 5a6d1db4
      Authored by Chuck Lever
      Currently there's a hidden and indirect mechanism for finding the
      rpcrdma_req that goes with an rpc_rqst. It depends on getting from
      the rq_buffer pointer in struct rpc_rqst to the struct
      rpcrdma_regbuf that controls that buffer, and then to the struct
      rpcrdma_req it goes with.
      
      This was done back in the day to avoid the need to add a per-rqst
      pointer or to alter the buf_free API when support for RPC-over-RDMA
      was introduced.
      
      I'm about to change the way regbufs work to support larger inline
      thresholds. Now is a good time to replace this indirect mechanism
      with something that is more straightforward. I guess this should be
      considered a clean up.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
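
      A sketch of the direct mapping this enables; the name of the new private
      field is an assumption here, and the helpers are illustrative:

      #include <linux/sunrpc/xprt.h>

      struct rpcrdma_req;                     /* opaque here */

      /* Record the rpcrdma_req when buffers are allocated for this rqst */
      static void sketch_set_rdma_req(struct rpc_rqst *rqst,
                                      struct rpcrdma_req *req)
      {
              rqst->rq_xprtdata = req;        /* assumed field name */
      }

      /* Replaces the old rq_buffer -> regbuf -> rpcrdma_req indirection */
      static struct rpcrdma_req *sketch_rpcr_to_rdmar(struct rpc_rqst *rqst)
      {
              return rqst->rq_xprtdata;
      }
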
    • SUNRPC: Separate buffer pointers for RPC Call and Reply messages · 68778945
      Authored by Chuck Lever
      For xprtrdma, the RPC Call and Reply buffers are involved in real
      I/O operations.
      
      To start with, the DMA direction of the I/O for a Call is opposite
      that of a Reply.
      
      In the current arrangement, the Reply buffer address is on a
      four-byte alignment just past the Call buffer. It would be
      friendlier on some platforms if it were at a DMA cache alignment
      instead.
      
      Because the current arrangement allocates a single memory region
      which contains both buffers, the RPC Reply buffer often contains a
      page boundary when the Call buffer is large enough (which is
      frequent).
      
      It would be a little nicer for setting up DMA operations (and
      possible registration of the Reply buffer) if the two buffers were
      separated, well-aligned, and contained as few page boundaries as
      possible.
      
      Now, I could just pad out the single memory region used for the pair
      of buffers. But frequently that would mean a lot of unused space to
      ensure the Reply buffer did not have a page boundary.
      
      Add a separate pointer to rpc_rqst that points right to the RPC
      Reply buffer. This makes no difference to xprtsock, but it will help
      xprtrdma in subsequent patches.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
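
      A sketch of how a socket-style transport can still satisfy both pointers
      from one allocation, while xprtrdma is free to point the Reply pointer at a
      separate, well-aligned buffer; rq_buffer, rq_callsize and rq_rcvsize are
      existing rpc_rqst fields, and the new rq_rbuffer name is taken from the
      description above but should be treated as an assumption:

      #include <linux/slab.h>
      #include <linux/sunrpc/xprt.h>

      static int sketch_buf_alloc_single_region(struct rpc_rqst *rqst, gfp_t gfp)
      {
              void *buf = kmalloc(rqst->rq_callsize + rqst->rq_rcvsize, gfp);

              if (!buf)
                      return -ENOMEM;
              rqst->rq_buffer  = buf;                                 /* RPC Call */
              rqst->rq_rbuffer = (char *)buf + rqst->rq_callsize;     /* RPC Reply */
              return 0;
      }
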
    • SUNRPC: Generalize the RPC buffer release API · 3435c74a
      Authored by Chuck Lever
      xprtrdma needs to allocate the Call and Reply buffers separately.
      TBH, the reliance on using a single buffer for the pair of XDR
      buffers is transport implementation-specific.
      
      Instead of passing just the rq_buffer into the buf_free method, pass
      the task structure and let buf_free take care of freeing both
      XDR buffers at once.
      
      There's a micro-optimization here. In the common case, both
      xprt_release and the transport's buf_free method were checking if
      rq_buffer was NULL. Now the check is done only once per RPC.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
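
      A sketch of the reshaped release path, with the single NULL check in the
      generic code and the whole task handed to the transport; the surrounding
      xprt_release logic is simplified:

      #include <linux/sunrpc/sched.h>
      #include <linux/sunrpc/xprt.h>

      static void sketch_release_buffers(struct rpc_task *task)
      {
              struct rpc_rqst *req = task->tk_rqstp;

              if (req->rq_buffer)             /* checked once per RPC */
                      req->rq_xprt->ops->buf_free(task);
      }
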
    • SUNRPC: Generalize the RPC buffer allocation API · 5fe6eaa1
      Authored by Chuck Lever
      xprtrdma needs to allocate the Call and Reply buffers separately.
      TBH, the reliance on using a single buffer for the pair of XDR
      buffers is transport implementation-specific.
      
      Transports that want to allocate separate Call and Reply buffers
      will ignore the "size" argument anyway.  Don't bother passing it.
      
      The buf_alloc method can't return two pointers. Instead, make the
      method's return value an error code, and set the rq_buffer pointer
      in the method itself.
      
      This gives call_allocate an opportunity to terminate an RPC instead
      of looping forever when a permanent problem occurs. If a request is
      just bogus, or the transport is in a state where it can't allocate
      resources for any request, there needs to be a way to kill the RPC
      right there and not loop.
      
      This immediately fixes a rare problem in the backchannel send path,
      which loops if the server happens to send a CB request whose
      call+reply size is larger than a page (which it shouldn't do yet).
      
      One more issue: looks like xprt_inject_disconnect was incorrectly
      placed in the failure path in call_allocate. It needs to be in the
      success path, as it is for other call-sites.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
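
      A sketch of the new contract from the caller's side: buf_alloc sets
      rq_buffer itself and returns an errno, so a permanent failure can end the
      RPC instead of looping. rpc_delay and rpc_exit are the real SUNRPC calls;
      the rest is a simplified stand-in for call_allocate:

      #include <linux/sunrpc/sched.h>
      #include <linux/sunrpc/xprt.h>

      static void sketch_call_allocate(struct rpc_task *task)
      {
              struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt;
              int status = xprt->ops->buf_alloc(task);

              if (status == 0)
                      return;                 /* rq_buffer has been set */
              if (status == -ENOMEM) {
                      /* transient shortage: back off and retry the allocation */
                      rpc_delay(task, HZ >> 4);
                      return;
              }
              /* anything else is permanent: terminate this RPC now */
              rpc_exit(task, status);
      }
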
    • xprtrdma: Eliminate INLINE_THRESHOLD macros · eb342e9a
      Authored by Chuck Lever
      Clean up: r_xprt is already available everywhere these macros are
      invoked, so just dereference that directly.
      
      RPCRDMA_INLINE_PAD_VALUE is no longer used, so it can simply be
      removed.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  3. 12 Jul 2016, 4 commits
    • xprtrdma: Place registered MWs on a per-req list · 9d6b0409
      Authored by Chuck Lever
      Instead of placing registered MWs sparsely into the rl_segments
      array, place these MWs on a per-req list.
      
      ro_unmap_{sync,safe} can then simply pull those MWs off the list
      instead of walking through the array.
      
      This change significantly reduces the size of struct rpcrdma_req
      by removing nsegs and rl_mw from every array element.
      
      As an additional clean-up, chunk co-ordinates are returned in the
      "*mw" output argument so they are no longer needed in every
      array element.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
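
      A reduced sketch of the list-based bookkeeping; the list.h calls are the
      real kernel API, while the structures are cut down to the relevant fields
      and the names are approximations:

      #include <linux/list.h>

      struct sketch_mw {
              struct list_head        mw_list;
              u32                     mw_handle;      /* rkey for this registration */
      };

      struct sketch_req {
              struct list_head        rl_registered;  /* MWs backing this RPC */
      };

      static void sketch_remember_mw(struct sketch_req *req, struct sketch_mw *mw)
      {
              list_add_tail(&mw->mw_list, &req->rl_registered);
      }

      static void sketch_unmap_all(struct sketch_req *req)
      {
              struct sketch_mw *mw;

              /* No more walking a sparse rl_segments array: just drain the list */
              while ((mw = list_first_entry_or_null(&req->rl_registered,
                                                    struct sketch_mw, mw_list))) {
                      list_del(&mw->mw_list);
                      /* post IB_WR_LOCAL_INV (FRWR) or ib_unmap_fmr (FMR) for
                       * mw->mw_handle here, then return the MW to the free pool */
              }
      }
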
    • xprtrdma: Allocate MRs on demand · e2ac236c
      Authored by Chuck Lever
      Frequent MR list exhaustion can impact I/O throughput, so enough MRs
      are always created during transport set-up to prevent running out.
      This means more MRs are created than most workloads need.
      
      Commit 94f58c58 ("xprtrdma: Allow Read list and Reply chunk
      simultaneously") introduced support for sending two chunk lists per
      RPC, which consumes more MRs per RPC.
      
      Instead of trying to provision more MRs, introduce a mechanism for
      allocating MRs on demand. A few MRs are allocated during transport
      set-up to kick things off.
      
      This significantly reduces the average number of MRs per transport
      while allowing the MR count to grow for workloads or devices that
      need more MRs.
      
      FRWR with mlx4 allocated almost 400 MRs per transport before this
      patch. Now it starts with 32.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
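
      A sketch of on-demand growth: consumers take MRs from a small free list, and
      an empty list kicks a worker that allocates another batch. The workqueue,
      list and spinlock calls are real kernel APIs; all other names, the batch
      size, and the placeholder allocation are illustrative:

      #include <linux/list.h>
      #include <linux/spinlock.h>
      #include <linux/workqueue.h>
      #include <linux/slab.h>

      struct sketch_mw {
              struct list_head mw_list;
      };

      static LIST_HEAD(sketch_mw_free);
      static DEFINE_SPINLOCK(sketch_mw_lock);

      static void sketch_mw_refresh(struct work_struct *work)
      {
              int i;

              /* Allocate a modest batch (plus the underlying ib_alloc_mr() calls
               * in the real code) and add the new MWs to the free list. */
              for (i = 0; i < 8; i++) {
                      struct sketch_mw *mw = kzalloc(sizeof(*mw), GFP_KERNEL);

                      if (!mw)
                              break;
                      spin_lock(&sketch_mw_lock);
                      list_add(&mw->mw_list, &sketch_mw_free);
                      spin_unlock(&sketch_mw_lock);
              }
      }
      static DECLARE_WORK(sketch_mw_refresh_work, sketch_mw_refresh);

      static struct sketch_mw *sketch_get_mw(void)
      {
              struct sketch_mw *mw;

              spin_lock(&sketch_mw_lock);
              mw = list_first_entry_or_null(&sketch_mw_free,
                                            struct sketch_mw, mw_list);
              if (mw)
                      list_del(&mw->mw_list);
              spin_unlock(&sketch_mw_lock);

              if (!mw)        /* exhausted: grow the pool, caller retries later */
                      schedule_work(&sketch_mw_refresh_work);
              return mw;
      }
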
    • xprtrdma: Honor ->send_request API contract · 7a89f9c6
      Authored by Chuck Lever
      Commit c93c6223 ("xprtrdma: Disconnect on registration failure")
      added a disconnect for some RPC marshaling failures. This is needed
      only in a handful of cases, but it was triggering for simple stuff
      like temporary resource shortages. Try to straighten this out.
      
      Fix up the lower layers so they don't return -ENOMEM or other error
      codes that the RPC client's FSM doesn't explicitly recognize.
      
      Also fix up the places in the send_request path that do want a
      disconnect. For example, when ib_post_send or ib_post_recv fail,
      this is a sign that there is a send or receive queue resource
      miscalculation. That should be rare, and is a sign of a software
      bug. But xprtrdma can recover: disconnect to reset the transport and
      start over.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
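
      A sketch of the distinction: marshaling problems return errnos the RPC state
      machine already understands, while a failed post (a queue accounting bug)
      resets the connection. xprt_disconnect_done is the real call; the helper is
      a placeholder:

      #include <linux/sunrpc/xprt.h>
      #include <linux/errno.h>

      /* Called when ib_post_send() or ib_post_recv() fails for this transport */
      static int sketch_post_failed(struct rpc_xprt *xprt)
      {
              /* A send or receive queue miscalculation: rare, and a software bug,
               * but recoverable. Reset the transport and start over. */
              xprt_disconnect_done(xprt);
              return -ENOTCONN;
      }

      /* Marshaling failures, by contrast, return codes the RPC client's FSM
       * recognizes (for example a temporary resource shortage) and never force
       * a disconnect. */
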
    • xprtrdma: Refactor MR recovery work queues · 505bbe64
      Authored by Chuck Lever
      I found that commit ead3f26e ("xprtrdma: Add ro_unmap_safe
      memreg method"), which introduces ro_unmap_safe, never wired up the
      FMR recovery worker.
      
      The FMR and FRWR recovery work queues both do the same thing.
      Instead of setting up separate work queues for this, schedule a
      single delayed worker to deal with both, since recovering MRs is
      not performance-critical.
      
      Fixes: ead3f26e ("xprtrdma: Add ro_unmap_safe memreg method")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
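
      A sketch of the shared deferred recovery path; the workqueue, list and
      spinlock calls are real kernel APIs and everything else is illustrative:

      #include <linux/list.h>
      #include <linux/spinlock.h>
      #include <linux/workqueue.h>

      static LIST_HEAD(sketch_recovery_list);
      static DEFINE_SPINLOCK(sketch_recovery_lock);

      static void sketch_mr_recovery_worker(struct work_struct *work)
      {
              /* Drain sketch_recovery_list; for each broken MR either deregister
               * and re-allocate it (FRWR) or unmap it (FMR), then return it to
               * the free pool. */
      }
      static DECLARE_DELAYED_WORK(sketch_recovery_work, sketch_mr_recovery_worker);

      /* One entry point serves both memory registration modes */
      static void sketch_defer_mr_recovery(struct list_head *mw)
      {
              spin_lock(&sketch_recovery_lock);
              list_add_tail(mw, &sketch_recovery_list);
              spin_unlock(&sketch_recovery_lock);
              schedule_delayed_work(&sketch_recovery_work, 0);
      }
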
  4. 18 May 2016, 3 commits
  5. 20 Jan 2016, 1 commit
  6. 19 Dec 2015, 1 commit
  7. 03 Nov 2015, 4 commits
  8. 28 Sep 2015, 1 commit
  9. 06 Aug 2015, 3 commits
  10. 13 Jun 2015, 3 commits
  11. 11 Jun 2015, 2 commits
  12. 05 Jun 2015, 1 commit
    • rpcrdma: Merge svcrdma and xprtrdma modules into one · ffe1f0df
      Authored by Chuck Lever
      Bi-directional RPC support means code in svcrdma.ko invokes a bit of
      code in xprtrdma.ko, and vice versa. To avoid loader/linker loops,
      merge the server and client side modules together into a single
      module.
      
      When backchannel capabilities are added, the combined module will
      register all needed transport capabilities so that Upper Layer
      consumers automatically have everything needed to create a
      bi-directional transport connection.
      
      Module aliases are added for backwards compatibility with user
      space, which still may expect svcrdma.ko or xprtrdma.ko to be
      present.
      
      This commit reverts commit 2e8c12e1 ("xprtrdma: add separate
      Kconfig options for NFSoRDMA client and server support") and
      provides a single CONFIG option for enabling the new module.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
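
      A sketch of the combined module's boilerplate; the MODULE_ALIAS lines
      reflect what the commit describes, while the init/exit bodies are only
      indicated:

      #include <linux/module.h>
      #include <linux/init.h>

      MODULE_ALIAS("svcrdma");        /* old server-side module name */
      MODULE_ALIAS("xprtrdma");       /* old client-side module name */
      MODULE_LICENSE("Dual BSD/GPL");

      static int __init sketch_rpcrdma_init(void)
      {
              /* register the client transport and the server-side svc_xprt class
               * from the same module, so a bi-directional connection has
               * everything it needs once this module loads */
              return 0;
      }
      module_init(sketch_rpcrdma_init);

      static void __exit sketch_rpcrdma_exit(void)
      {
              /* unregister both in reverse order */
      }
      module_exit(sketch_rpcrdma_exit);
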
  13. 31 Mar 2015, 3 commits