提交 · a473018cfe0ef1e46c0ff9df3fa02afc23c9f1d2 · openeuler / Kernel

12 7月, 2016 3 次提交

xprtrdma: Remove rpcrdma_map_one() and friends · a473018c

由 Chuck Lever 提交于 6月 29, 2016

Clean up: ALLPHYSICAL is gone and FMR has been converted to use
scatterlists. There are no more users of these functions.

This patch shrinks the size of struct rpcrdma_req by about 3500
bytes on x86_64. There is one of these structs for each RPC credit
(128 credits per transport connection).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a473018c

xprtrdma: Remove ALLPHYSICAL memory registration mode · 2dc3a69d

由 Chuck Lever 提交于 6月 29, 2016

No HCA or RNIC in the kernel tree requires the use of ALLPHYSICAL.

ALLPHYSICAL advertises in the clear on the network fabric an R_key
that is good for all of the client's memory. No known exploit
exists, but theoretically any user on the server can use that R_key
on the client's QP to read or update any part of the client's memory.

ALLPHYSICAL exposes the client to server bugs, including:
 o base/bounds errors causing data outside the i/o buffer to be
   accessed
 o RDMA access after reply causing data corruption and/or integrity
   fail

ALLPHYSICAL can't protect application memory regions from server
update after a local signal or soft timeout has terminated an RPC.

ALLPHYSICAL chunks are no larger than a page. Special cases to
handle small chunks and long chunk lists have been a source of
implementation complexity and bugs.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2dc3a69d

xprtrdma: Refactor MR recovery work queues · 505bbe64

由 Chuck Lever 提交于 6月 29, 2016

I found that commit ead3f26e ("xprtrdma: Add ro_unmap_safe
memreg method"), which introduces ro_unmap_safe, never wired up the
FMR recovery worker.

The FMR and FRWR recovery work queues both do the same thing.
Instead of setting up separate individual work queues for this,
schedule a delayed worker to deal with them, since recovering MRs is
not performance-critical.

Fixes: ead3f26e ("xprtrdma: Add ro_unmap_safe memreg method")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

505bbe64

18 5月, 2016 4 次提交

xprtrdma: Remove qplock · 6e14a92c

由 Chuck Lever 提交于 5月 04, 2016

Clean up.

After "xprtrdma: Remove ro_unmap() from all registration modes",
there are no longer any sites that take rpcrdma_ia::qplock for read.
The one site that takes it for write is always single-threaded. It
is safe to remove it.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6e14a92c

xprtrdma: Faster server reboot recovery · b2dde94b

由 Chuck Lever 提交于 5月 02, 2016

In a cluster failover scenario, it is desirable for the client to
attempt to reconnect quickly, as an alternate NFS server is already
waiting to take over for the down server. The client can't see that
a server IP address has moved to a new server until the existing
connection is gone.

For fabrics and devices where it is meaningful, set a definite upper
bound on the amount of time before it is determined that a
connection is no longer valid. This allows the RPC client to detect
connection loss in a timely matter, then perform a fresh resolution
of the server GUID in case it has changed (cluster failover).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b2dde94b

xprtrdma: Use core ib_drain_qp() API · 550d7502

由 Chuck Lever 提交于 5月 02, 2016

Clean up: Replace rpcrdma_flush_cqs() and rpcrdma_clean_cqs() with
the new ib_drain_qp() API.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-By: NLeon Romanovsky <leonro@mellanox.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

550d7502

xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers · 94931746

由 Chuck Lever 提交于 5月 02, 2016

Send buffer space is shared between the RPC-over-RDMA header and
an RPC message. A large RPC-over-RDMA header means less space is
available for the associated RPC message, which then has to be
moved via an RDMA Read or Write.

As more segments are added to the chunk lists, the header increases
in size.  Typical modern hardware needs only a few segments to
convey the maximum payload size, but some devices and registration
modes may need a lot of segments to convey data payload. Sometimes
so many are needed that the remaining space in the Send buffer is
not enough for the RPC message. Sending such a message usually
fails.

To ensure a transport can always make forward progress, cap the
number of RDMA segments that are allowed in chunk lists. This
prevents less-capable devices and memory registrations from
consuming a large portion of the Send buffer by reducing the
maximum data payload that can be conveyed with such devices.

For now I choose an arbitrary maximum of 8 RDMA segments. This
allows a maximum size RPC-over-RDMA header to fit nicely in the
current 1024 byte inline threshold with over 700 bytes remaining
for an inline RPC message.

The current maximum data payload of NFS READ or WRITE requests is
one megabyte. To convey that payload on a client with 4KB pages,
each chunk segment would need to handle 32 or more data pages. This
is well within the capabilities of FMR. For physical registration,
the maximum payload size on platforms with 4KB pages is reduced to
32KB.

For FRWR, a device's maximum page list depth would need to be at
least 34 to support the maximum 1MB payload. A device with a smaller
maximum page list depth means the maximum data payload is reduced
when using that device.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

94931746

15 3月, 2016 3 次提交

xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs · 2fa8f88d

由 Chuck Lever 提交于 3月 04, 2016

Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b2
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.

By converting to this new API, xprtrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.

Because each ib_cqe carries a pointer to a completion method, the
core can now post its own operations on a consumer's QP, and handle
the completions itself, without changes to the consumer.

Send completions were previously handled entirely in the completion
upcall handler (ie, deferring to a process context is unneeded).
Thus IB_POLL_SOFTIRQ is a direct replacement for the current
xprtrdma send code path.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2fa8f88d

xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs · 552bf225

由 Chuck Lever 提交于 3月 04, 2016

Calling ib_poll_cq() to sort through WCs during a completion is a
common pattern amongst RDMA consumers. Since commit 14d3a3b2
("IB: add a proper completion queue abstraction"), WC sorting can
be handled by the IB core.

By converting to this new API, xprtrdma is made a better neighbor to
other RDMA consumers, as it allows the core to schedule the delivery
of completions more fairly amongst all active consumers.

Because each ib_cqe carries a pointer to a completion method, the
core can now post its own operations on a consumer's QP, and handle
the completions itself, without changes to the consumer.

xprtrdma's reply processing is already handled in a work queue, but
there is some initial order-dependent processing that is done in the
soft IRQ context before a work item is scheduled.

IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma
receive code path.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NDevesh Sharma <devesh.sharma@broadcom.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

552bf225

xprtrdma: Serialize credit accounting again · 23826c7a

由 Chuck Lever 提交于 3月 04, 2016

Commit fe97b47c ("xprtrdma: Use workqueue to process RPC/RDMA
replies") replaced the reply tasklet with a workqueue that allows
RPC replies to be processed in parallel. Thus the credit values in
RPC-over-RDMA replies can be applied in a different order than in
which the server sent them.

To fix this, revert commit eba8ff66 ("xprtrdma: Move credit
update to RPC reply handler"). Reverting is done by hand to
accommodate code changes that have occurred since then.

Fixes: fe97b47c ("xprtrdma: Use workqueue to process . . .")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Reviewed-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

23826c7a

23 12月, 2015 1 次提交

xprtrdma: Avoid calling ib_query_device · e3e45b1b

由 Or Gerlitz 提交于 12月 18, 2015

Instead, use the cached copy of the attributes present on the device.
Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

e3e45b1b

19 12月, 2015 3 次提交

xprtrdma: Revert commit ('xprtrdma: Cap req_cqinit'). · 26ae9d1c

由 Chuck Lever 提交于 12月 16, 2015

The root of the problem was that sends (especially unsignalled
FASTREG and LOCAL_INV Work Requests) were not properly flow-
controlled, which allowed a send queue overrun.

Now that the RPC/RDMA reply handler waits for invalidation to
complete, the send queue is properly flow-controlled. Thus this
limit is no longer necessary.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

26ae9d1c

xprtrdma: Fix additional uses of spin_lock_irqsave(rb_lock) · 9b06688b

由 Chuck Lever 提交于 12月 16, 2015

Clean up.

rb_lock critical sections added in rpcrdma_ep_post_extra_recv()
should have first been converted to use normal spin_lock now that
the reply handler is a work queue.

The backchannel set up code should use the appropriate helper
instead of open-coding a rb_recv_bufs list add.

Problem introduced by glib patch re-ordering on my part.

Fixes: f531a5db ('xprtrdma: Pre-allocate backward rpc_rqst')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9b06688b

xprtrdma: clean up some curly braces · 38b95bcf

由 Dan Carpenter 提交于 11月 05, 2015

It doesn't matter either way, but the curly braces were clearly intended
here.  It causes a Smatch warning.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

38b95bcf

03 11月, 2015 9 次提交

xprtrdma: Pre-allocate Work Requests for backchannel · 124fa17d