1. 06 Feb 2015 (1 commit)
  2. 30 Jan 2015 (14 commits)
  3. 26 Nov 2014 (1 commit)
    • xprtrdma: Cap req_cqinit · e7104a2a
      Committed by Chuck Lever
      Recent work made FRMR registration and invalidation completions
      unsignaled. This greatly reduces the adapter interrupt rate.
      
      Every so often, however, a posted send Work Request is allowed to
      signal. Otherwise, the provider's Work Queue will wrap and the
      workload will hang.
      
      The number of Work Requests that are allowed to remain unsignaled is
      determined by the value of req_cqinit. Currently, this is set to the
      size of the send Work Queue divided by two, minus 1.
      
      For FRMR, the send Work Queue is the maximum number of concurrent
      RPCs (currently 32) times the maximum number of Work Requests an
      RPC might use (currently 7, though some adapters may need more).
      
      For mlx4, this is 224 entries. This leaves completion signaling
      disabled for 111 send Work Requests.
      
      Some providers hold back dispatching Work Requests until a CQE is
      generated.  If completions are disabled, then no CQEs are generated
      for quite some time, and that can stall the Work Queue.
      
      I've seen this occur running xfstests generic/113 over NFSv4, where
      eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
      because the Work Queue has overflowed. The connection is dropped
      and re-established.
      
      Cap the rep_cqinit setting so completions are not left turned off
      for too long.
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      e7104a2a
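The clamping described above can be sketched as a small helper. The cap constant's name and value (32) are assumptions chosen for illustration; only the "send queue size divided by two, minus 1" rule comes from the commit text:

```c
/* Sketch of the cqinit capping logic described in the commit above.
 * RPCRDMA_MAX_UNSIGNALED_SENDS is an assumed name/value, not
 * necessarily what the actual patch uses.
 */
#include <assert.h>

#define RPCRDMA_MAX_UNSIGNALED_SENDS 32u  /* assumed cap */

/* Number of sends allowed to remain unsignaled: half the send Work
 * Queue, minus one, now clamped so a CQE is forced often enough that
 * providers which hold back dispatch do not stall the queue. */
static unsigned int cap_cqinit(unsigned int send_wq_size)
{
	unsigned int cqinit = send_wq_size / 2 - 1;

	if (cqinit > RPCRDMA_MAX_UNSIGNALED_SENDS)
		cqinit = RPCRDMA_MAX_UNSIGNALED_SENDS;
	return cqinit;
}
```

With the mlx4 numbers from the text, a 224-entry queue would previously leave 111 sends unsignaled; the cap reduces that window.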
  4. 30 Sep 2014 (1 commit)
    • svcrdma: advertise the correct max payload · 7e5be288
      Committed by Steve Wise
      Svcrdma currently advertises 1MB, which is too large.  The correct value
      is the minimum of RPCSVC_MAXPAYLOAD and the maximum scatter-gather
      allowed in an NFSRDMA IO chunk times the host page size. This bug is
      usually benign because the Linux x64 NFSRDMA client limits the payload
      size to the correct value (64 * 4096 = 256KB).  But a Linux client on
      PPC64 with a 64KB page size will use a payload size that overflows the
      server.
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      7e5be288
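The minimum computation the commit describes can be illustrated directly. The 1MB RPCSVC_MAXPAYLOAD default and the 64-SGE / 4KB-page figures come from the text above; the helper function name is hypothetical:

```c
/* Illustration of the advertised-payload fix: the server must
 * advertise min(RPCSVC_MAXPAYLOAD, max_sge * page_size), not a
 * flat 1MB. Function name is illustrative only.
 */
#include <assert.h>

#define RPCSVC_MAXPAYLOAD (1u << 20)  /* 1MB */

static unsigned int svcrdma_max_payload(unsigned int max_sge,
					unsigned int page_size)
{
	unsigned int chunk_max = max_sge * page_size;

	return chunk_max < RPCSVC_MAXPAYLOAD ? chunk_max
					     : RPCSVC_MAXPAYLOAD;
}
```

With 64 SGEs and x64's 4KB pages this yields 256KB; with PPC64's 64KB pages the product exceeds 1MB and is clamped to RPCSVC_MAXPAYLOAD.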
  5. 01 Aug 2014 (7 commits)
  6. 04 Jun 2014 (9 commits)
  7. 01 Feb 2013 (1 commit)
  8. 18 Feb 2012 (1 commit)
  9. 27 Jul 2011 (1 commit)
  10. 26 Jul 2011 (1 commit)
  11. 12 Mar 2011 (1 commit)
    • RPCRDMA: Fix FRMR registration/invalidate handling. · 5c635e09
      Committed by Tom Tucker
      When the rpc_memreg_strategy is 5, FRMRs are used to map RPC data.
      This mode uses an FRMR to map the RPC data, then invalidates
      (i.e. unregisters) the data in xprt_rdma_free. These FRMRs are reused
      across connections on the same mount: if the connection goes
      away on an idle timeout and reconnects later, the FRMRs are not
      destroyed and recreated.
      
      This creates a problem on transport errors, because the WR that
      invalidates an FRMR may be flushed (i.e. fail), leaving the
      FRMR valid. When the FRMR is later used to map an RPC it will fail,
      tearing down the transport and starting over. Over time, more and
      more of the FRMR pool ends up in the wrong state, resulting in
      seemingly random disconnects.
      
      This fix tracks the FRMR state explicitly, setting it based on the
      successful completion of a reg/inv WR. If the FRMR is found to be in
      the wrong state when it is about to be used, an invalidate WR is
      prepended, re-syncing the FRMR state and avoiding the connection loss.
      Signed-off-by: Tom Tucker <tom@ogc.us>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      5c635e09
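The state tracking the fix introduces can be sketched as a tiny state machine. The enum and function names below are illustrative, not the patch's actual identifiers; only the idea (update state on successful completion, prepend an invalidate when the state is stale) comes from the commit text:

```c
/* Sketch of explicit FRMR state tracking, per the fix above.
 * All names here are hypothetical illustrations.
 */
#include <assert.h>
#include <stdbool.h>

enum frmr_state { FRMR_IS_INVALID, FRMR_IS_VALID };

struct frmr {
	enum frmr_state state;
};

/* Called from the completion handler only on *successful* completion
 * of a registration or invalidation WR; a flushed WR never gets here,
 * so the recorded state stays truthful across transport errors. */
static void frmr_complete(struct frmr *f, bool was_fastreg)
{
	f->state = was_fastreg ? FRMR_IS_VALID : FRMR_IS_INVALID;
}

/* Before reusing an FRMR to map an RPC: if a flushed invalidate left
 * it registered, an invalidate WR must be prepended first. */
static bool frmr_needs_invalidate(const struct frmr *f)
{
	return f->state == FRMR_IS_VALID;
}
```

Because the state is only ever advanced on successful completions, an FRMR left valid by a flushed invalidate is detected at next use instead of causing a registration failure and a full transport teardown.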
  12. 11 Oct 2008 (2 commits)