提交 · c93e986a295d537589efd0504f36ca952bd1a5be · openanolis / cloud-kernel

01 8月, 2014 8 次提交

xprtrdma: Back off rkey when FAST_REG_MR fails · c93e986a

由 Chuck Lever 提交于 7月 29, 2014

If posting a FAST_REG_MR Work Reqeust fails, revert the rkey update
to avoid subsequent IB_WC_MW_BIND_ERR completions.
Suggested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c93e986a

xprtrdma: Unclutter struct rpcrdma_mr_seg · 0dbb4108

由 Chuck Lever 提交于 7月 29, 2014

Clean ups:
 - make it obvious that the rl_mw field is a pointer -- allocated
   separately, not as part of struct rpcrdma_mr_seg
 - promote "struct {} frmr;" to a named type
 - promote the state enum to a named type
 - name the MW state field the same way other fields in
   rpcrdma_mw are named
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0dbb4108

xprtrdma: Don't invalidate FRMRs if registration fails · 539431a4

由 Chuck Lever 提交于 7月 29, 2014

If FRMR registration fails, it's likely to transition the QP to the
error state. Or, registration may have failed because the QP is
_already_ in ERROR.

Thus calling rpcrdma_deregister_external() in
rpcrdma_create_chunks() is useless in FRMR mode: the LOCAL_INVs just
get flushed.

It is safe to leave existing registrations: when FRMR registration
is tried again, rpcrdma_register_frmr_external() checks if each FRMR
is already/still VALID, and knocks it down first if it is.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

539431a4

xprtrdma: On disconnect, don't ignore pending CQEs · a7bc211a

由 Chuck Lever 提交于 7月 29, 2014

xprtrdma is currently throwing away queued completions during
a reconnect. RPC replies posted just before connection loss, or
successful completions that change the state of an FRMR, can be
missed.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a7bc211a

xprtrdma: Update rkeys after transport reconnect · 6ab59945

由 Chuck Lever 提交于 7月 29, 2014

Various reports of:

  rpcrdma_qp_async_error_upcall: QP error 3 on device mlx4_0
		ep ffff8800bfd3e848

Ensure that rkeys in already-marshalled RPC/RDMA headers are
refreshed after the QP has been replaced by a reconnect.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=249Suggested-by: NSelvin Xavier <Selvin.Xavier@Emulex.Com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

6ab59945

xprtrdma: Limit data payload size for ALLPHYSICAL · 43e95988

由 Chuck Lever 提交于 7月 29, 2014

When the client uses physical memory registration, each page in the
payload gets its own array entry in the RPC/RDMA header's chunk list.

Therefore, don't advertise a maximum payload size that would require
more array entries than can fit in the RPC buffer where RPC/RDMA
headers are built.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

43e95988

xprtrdma: Protect ia->ri_id when unmapping/invalidating MRs · 73806c88

由 Chuck Lever 提交于 7月 29, 2014

Ensure ia->ri_id remains valid while invoking dma_unmap_page() or
posting LOCAL_INV during a transport reconnect. Otherwise,
ia->ri_id->device or ia->ri_id->qp is NULL, which triggers a panic.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=259
Fixes: ec62f40d 'xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting'
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

73806c88

xprtrdma: Fix panic in rpcrdma_register_frmr_external() · 5fc83f47

由 Chuck Lever 提交于 7月 29, 2014

seg1->mr_nsegs is not yet initialized when it is used to unmap
segments during an error exit. Use the same unmapping logic for
all error exits.

"if (frmr_wr.wr.fast_reg.length < len) {" used to be a BUG_ON check.
The broken code will never be executed under normal operation.

Fixes: c977dea2 (xprtrdma: Remove BUG_ON() call sites)
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5fc83f47

23 7月, 2014 1 次提交

xprtrdma: Fix DMA-API-DEBUG warning by checking dma_map result · bf858ab0

由 Yan Burman 提交于 6月 19, 2014

Fix the following warning when DMA-API debug is enabled by checking ib_dma_map_single result:
[ 1455.345548] ------------[ cut here ]------------
[ 1455.346863] WARNING: CPU: 3 PID: 3929 at /home/yanb/kernel/net-next/lib/dma-debug.c:1140 check_unmap+0x4e5/0x990()
[ 1455.349350] mlx4_core 0000:00:07.0: DMA-API: device driver failed to check map error[device address=0x000000007c9f2090] [size=2656 bytes] [mapped as single]
[ 1455.349350] Modules linked in: xprtrdma netconsole configfs nfsv3 nfs_acl ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm autofs4 auth_rpcgss oid_registry nfsv4 nfs fscache lockd sunrpc dm_mirror dm_region_hash dm_log microcode pcspkr mlx4_ib ib_sa ib_mad ib_core ib_addr mlx4_en ipv6 ptp pps_core vxlan mlx4_core virtio_balloon cirrus ttm drm_kms_helper drm sysimgblt sysfillrect syscopyarea i2c_piix4 i2c_core button ext3 jbd virtio_blk virtio_net virtio_pci virtio_ring virtio uhci_hcd ata_generic ata_piix libata
[ 1455.349350] CPU: 3 PID: 3929 Comm: mount.nfs Not tainted 3.15.0-rc1-dbg+ #13
[ 1455.349350] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
[ 1455.349350]  0000000000000474 ffff880069dcf628 ffffffff8151c341 ffffffff817b69d8
[ 1455.349350]  ffff880069dcf678 ffff880069dcf668 ffffffff8105b5fc 0000000069dcf658
[ 1455.349350]  ffff880069dcf778 ffff88007b0c9f00 ffffffff8255ec40 0000000000000a60
[ 1455.349350] Call Trace:
[ 1455.349350]  [<ffffffff8151c341>] dump_stack+0x52/0x81
[ 1455.349350]  [<ffffffff8105b5fc>] warn_slowpath_common+0x8c/0xc0
[ 1455.349350]  [<ffffffff8105b6e6>] warn_slowpath_fmt+0x46/0x50
[ 1455.349350]  [<ffffffff812e6305>] check_unmap+0x4e5/0x990
[ 1455.349350]  [<ffffffff81521fb0>] ? _raw_spin_unlock_irq+0x30/0x60
[ 1455.349350]  [<ffffffff812e6a0a>] debug_dma_unmap_page+0x5a/0x60
[ 1455.349350]  [<ffffffffa0389583>] rpcrdma_deregister_internal+0xb3/0xd0 [xprtrdma]
[ 1455.349350]  [<ffffffffa038a639>] rpcrdma_buffer_destroy+0x69/0x170 [xprtrdma]
[ 1455.349350]  [<ffffffffa03872ff>] xprt_rdma_destroy+0x3f/0xb0 [xprtrdma]
[ 1455.349350]  [<ffffffffa04a95ff>] xprt_destroy+0x6f/0x80 [sunrpc]
[ 1455.349350]  [<ffffffffa04a9625>] xprt_put+0x15/0x20 [sunrpc]
[ 1455.349350]  [<ffffffffa04a899a>] rpc_free_client+0x8a/0xe0 [sunrpc]
[ 1455.349350]  [<ffffffffa04a8a58>] rpc_release_client+0x68/0xa0 [sunrpc]
[ 1455.349350]  [<ffffffffa04a9060>] rpc_shutdown_client+0xb0/0xc0 [sunrpc]
[ 1455.349350]  [<ffffffffa04a8f5d>] ? rpc_ping+0x5d/0x70 [sunrpc]
[ 1455.349350]  [<ffffffffa04a91ab>] rpc_create_xprt+0xbb/0xd0 [sunrpc]
[ 1455.349350]  [<ffffffffa04a9273>] rpc_create+0xb3/0x160 [sunrpc]
[ 1455.349350]  [<ffffffff81129749>] ? __probe_kernel_read+0x69/0xb0
[ 1455.349350]  [<ffffffffa053851c>] nfs_create_rpc_client+0xdc/0x100 [nfs]
[ 1455.349350]  [<ffffffffa0538cfa>] nfs_init_client+0x3a/0x90 [nfs]
[ 1455.349350]  [<ffffffffa05391c8>] nfs_get_client+0x478/0x5b0 [nfs]
[ 1455.349350]  [<ffffffffa0538e50>] ? nfs_get_client+0x100/0x5b0 [nfs]
[ 1455.349350]  [<ffffffff81172c6d>] ? kmem_cache_alloc_trace+0x24d/0x260
[ 1455.349350]  [<ffffffffa05393f3>] nfs_create_server+0xf3/0x4c0 [nfs]
[ 1455.349350]  [<ffffffffa0545ff0>] ? nfs_request_mount+0xf0/0x1a0 [nfs]
[ 1455.349350]  [<ffffffffa031c0c3>] nfs3_create_server+0x13/0x30 [nfsv3]
[ 1455.349350]  [<ffffffffa0546293>] nfs_try_mount+0x1f3/0x230 [nfs]
[ 1455.349350]  [<ffffffff8108ea21>] ? get_parent_ip+0x11/0x50
[ 1455.349350]  [<ffffffff812d6343>] ? __this_cpu_preempt_check+0x13/0x20
[ 1455.349350]  [<ffffffff810d632b>] ? try_module_get+0x6b/0x190
[ 1455.349350]  [<ffffffffa05449f7>] nfs_fs_mount+0x187/0x9d0 [nfs]
[ 1455.349350]  [<ffffffffa0545940>] ? nfs_clone_super+0x140/0x140 [nfs]
[ 1455.349350]  [<ffffffffa0543b20>] ? nfs_auth_info_match+0x40/0x40 [nfs]
[ 1455.349350]  [<ffffffff8117e360>] mount_fs+0x20/0xe0
[ 1455.349350]  [<ffffffff811a1c16>] vfs_kern_mount+0x76/0x160
[ 1455.349350]  [<ffffffff811a29a8>] do_mount+0x428/0xae0
[ 1455.349350]  [<ffffffff811a30f0>] SyS_mount+0x90/0xe0
[ 1455.349350]  [<ffffffff8152af52>] system_call_fastpath+0x16/0x1b
[ 1455.349350] ---[ end trace f1f31572972e211d ]---
Signed-off-by: NYan Burman <yanb@mellanox.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

bf858ab0

07 6月, 2014 2 次提交

svcrdma: Fence LOCAL_INV work requests · 83710fc7

由 Steve Wise 提交于 6月 05, 2014

Fencing forces the invalidate to only happen after all prior send
work requests have been completed.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Reported by : Devesh Sharma <Devesh.Sharma@Emulex.Com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

83710fc7

svcrdma: refactor marshalling logic · 0bf48289

由 Steve Wise 提交于 5月 28, 2014

This patch refactors the NFSRDMA server marshalling logic to
remove the intermediary map structures.  It also fixes an existing bug
where the NFSRDMA server was not minding the device fast register page
list length limitations.
Signed-off-by: NTom Tucker <tom@opengridcomputing.com>
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>

0bf48289

04 6月, 2014 23 次提交

xprtrdma: Disconnect on registration failure · c93c6223

由 Chuck Lever 提交于 5月 28, 2014

If rpcrdma_register_external() fails during request marshaling, the
current RPC request is killed. Instead, this RPC should be retried
after reconnecting the transport instance.

The most likely reason for registration failure with FRMR is a
failed post_send, which would be due to a remote transport
disconnect or memory exhaustion. These issues can be recovered
by a retry.

Problems encountered in the marshaling logic itself will not be
corrected by trying again, so these should still kill a request.

Now that we've added a clean exit for marshaling errors, take the
opportunity to defang some BUG_ON's.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c93c6223

xprtrdma: Remove BUG_ON() call sites · c977dea2

由 Chuck Lever 提交于 5月 28, 2014

If an error occurs in the marshaling logic, fail the RPC request
being processed, but leave the client running.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c977dea2

xprtrdma: Avoid deadlock when credit window is reset · e7ce710a

由 Chuck Lever 提交于 5月 28, 2014

Update the cwnd while processing the server's reply.  Otherwise the
next task on the xprt_sending queue is still subject to the old
credit window. Currently, no task is awoken if the old congestion
window is still exceeded, even if the new window is larger, and a
deadlock results.

This is an issue during a transport reconnect. Servers don't
normally shrink the credit window, but the client does reset it to
1 when reconnecting so the server can safely grow it again.

As a minor optimization, remove the hack of grabbing the initial
cwnd size (which happens to be RPC_CWNDSCALE) and using that value
as the congestion scaling factor. The scaling value is invariant,
and we are better off without the multiplication operation.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e7ce710a

xprtrdma: Reset connection timeout after successful reconnect · 18906972

由 Chuck Lever 提交于 5月 28, 2014

If the new connection is able to make forward progress, reset the
re-establish timeout. Otherwise it keeps growing even if disconnect
events are rare.

The same behavior as TCP is adopted: reconnect immediately if the
transport instance has been able to make some forward progress.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

18906972

xprtrdma: Use macros for reconnection timeout constants · bfaee096

由 Chuck Lever 提交于 5月 28, 2014

Clean up: Ensure the same max and min constant values are used
everywhere when setting reconnect timeouts.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

bfaee096

xprtrdma: Allocate missing pagelist · 196c6998

由 Shirley Ma 提交于 5月 28, 2014

GETACL relies on transport layer to alloc memory for reply buffer.
However xprtrdma assumes that the reply buffer (pagelist) has been
pre-allocated in upper layer. This problem was reported by IOL OFA lab
test on PPC.
Signed-off-by: NShirley Ma <shirley.ma@oracle.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NEdward Mossman <emossman@iol.unh.edu>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

196c6998

xprtrdma: Remove Tavor MTU setting · 5bc4bc72

由 Chuck Lever 提交于 5月 28, 2014

Clean up.  Remove HCA-specific clutter in xprtrdma, which is
supposed to be device-independent.

Hal Rosenstock <hal@dev.mellanox.co.il> observes:
> Note that there is OpenSM option (enable_quirks) to return 1K MTU
> in SA PathRecord responses for Tavor so that can be used for this.
> The default setting for enable_quirks is FALSE so that would need
> changing.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5bc4bc72

xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting · ec62f40d

由 Chuck Lever 提交于 5月 28, 2014

Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a
disconnect, his HCA is failing to create a fresh QP, leaving
ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to
wake up and post LOCAL_INV as they exit, causing an oops.

rpcrdma_ep_connect() is allowing the wake-up by leaking the QP
creation error code (-EPERM in this case) to the RPC client's
generic layer. xprt_connect_status() does not recognize -EPERM, so
it kills pending RPC tasks immediately rather than retrying the
connect.

Re-arrange the QP creation logic so that when it fails on reconnect,
it leaves ->qp with the old QP rather than NULL.  If pending RPC
tasks wake and exit, LOCAL_INV work requests will flush rather than
oops.

On initial connect, leaving ->qp == NULL is OK, since there are no
pending RPCs that might use ->qp. But be sure not to try to destroy
a NULL QP when rpcrdma_ep_connect() is retried.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ec62f40d

xprtrdma: Reduce the number of hardway buffer allocations · 65866f82

由 Chuck Lever 提交于 5月 28, 2014

While marshaling an RPC/RDMA request, the inline_{rsize,wsize}
settings determine whether an inline request is used, or whether
read or write chunks lists are built. The current default value of
these settings is 1024. Any RPC request smaller than 1024 bytes is
sent to the NFS server completely inline.

rpcrdma_buffer_create() allocates and pre-registers a set of RPC
buffers for each transport instance, also based on the inline rsize
and wsize settings.

RPC/RDMA requests and replies are built in these buffers. However,
if an RPC/RDMA request is expected to be larger than 1024, a buffer
has to be allocated and registered for that RPC, and deregistered
and released when the RPC is complete. This is known has a
"hardway allocation."

Since the introduction of NFSv4, the size of RPC requests has become
larger, and hardway allocations are thus more frequent. Hardway
allocations are significant overhead, and they waste the existing
RPC buffers pre-allocated by rpcrdma_buffer_create().

We'd like fewer hardway allocations.

Increasing the size of the pre-registered buffers is the most direct
way to do this. However, a blanket increase of the inline thresholds
has interoperability consequences.

On my 64-bit system, rpcrdma_buffer_create() requests roughly 7000
bytes for each RPC request buffer, using kmalloc(). Due to internal
fragmentation, this wastes nearly 1200 bytes because kmalloc()
already returns an 8192-byte piece of memory for a 7000-byte
allocation request, though the extra space remains unused.

So let's round up the size of the pre-allocated buffers, and make
use of the unused space in the kmalloc'd memory.

This change reduces the amount of hardway allocated memory for an
NFSv4 general connectathon run from 1322092 to 9472 bytes (99%).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

65866f82

xprtrdma: Limit work done by completion handler · 8301a2c0

由 Chuck Lever 提交于 5月 28, 2014

Sagi Grimberg <sagig@dev.mellanox.co.il> points out that a steady
stream of CQ events could starve other work because of the boundless
loop pooling in rpcrdma_{send,recv}_poll().

Instead of a (potentially infinite) while loop, return after
collecting a budgeted number of completions.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Acked-by: NSagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8301a2c0

xprtrmda: Reduce calls to ib_poll_cq() in completion handlers · 1c00dd07

由 Chuck Lever 提交于 5月 28, 2014

Change the completion handlers to grab up to 16 items per
ib_poll_cq() call. No extra ib_poll_cq() is needed if fewer than 16
items are returned.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1c00dd07

xprtrmda: Reduce lock contention in completion handlers · 7f23f6f6

由 Chuck Lever 提交于 5月 28, 2014

Skip the ib_poll_cq() after re-arming, if the provider knows there
are no additional items waiting. (Have a look at commit ed23a727 for
more details).
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7f23f6f6

xprtrdma: Split the completion queue · fc664485

由 Chuck Lever 提交于 5月 28, 2014

The current CQ handler uses the ib_wc.opcode field to distinguish
between event types. However, the contents of that field are not
reliable if the completion status is not IB_WC_SUCCESS.

When an error completion occurs on a send event, the CQ handler
schedules a tasklet with something that is not a struct rpcrdma_rep.
This is never correct behavior, and sometimes it results in a panic.

To resolve this issue, split the completion queue into a send CQ and
a receive CQ. The send CQ handler now handles only struct rpcrdma_mw
wr_id's, and the receive CQ handler now handles only struct
rpcrdma_rep wr_id's.

Fix suggested by Shirley Ma <shirley.ma@oracle.com>
Reported-by: NRafael Reiter <rafael.reiter@ims.co.at>
Fixes: 5c635e09
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=73211Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NKlemens Senn <klemens.senn@ims.co.at>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

fc664485

xprtrdma: Make rpcrdma_ep_destroy() return void · 7f1d5419

由 Chuck Lever 提交于 5月 28, 2014

Clean up: rpcrdma_ep_destroy() returns a value that is used
only to print a debugging message. rpcrdma_ep_destroy() already
prints debugging messages in all error cases.

Make rpcrdma_ep_destroy() return void instead.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7f1d5419

xprtrdma: Simplify rpcrdma_deregister_external() synopsis · 13c9ff8f

由 Chuck Lever 提交于 5月 28, 2014

Clean up: All remaining callers of rpcrdma_deregister_external()
pass NULL as the last argument, so remove that argument.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

13c9ff8f

xprtrdma: mount reports "Invalid mount option" if memreg mode not supported · cdd9ade7

由 Chuck Lever 提交于 5月 28, 2014

If the selected memory registration mode is not supported by the
underlying provider/HCA, the NFS mount command reports that there was
an invalid mount option, and fails. This is misleading.

Reporting a problem allocating memory is a lot closer to the truth.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

cdd9ade7

xprtrdma: Fall back to MTHCAFMR when FRMR is not supported · f10eafd3

由 Chuck Lever 提交于 5月 28, 2014

An audit of in-kernel RDMA providers that do not support the FRMR
memory registration shows that several of them support MTHCAFMR.
Prefer MTHCAFMR when FRMR is not supported.

If MTHCAFMR is not supported, only then choose ALLPHYSICAL.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

f10eafd3

xprtrdma: Remove REGISTER memory registration mode · 0ac531c1

由 Chuck Lever 提交于 5月 28, 2014

All kernel RDMA providers except amso1100 support either MTHCAFMR
or FRMR, both of which are faster than REGISTER.  amso1100 can
continue to use ALLPHYSICAL.

The only other ULP consumer in the kernel that uses the reg_phys_mr
verb is Lustre.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0ac531c1

xprtrdma: Remove MEMWINDOWS registration modes · b45ccfd2

由 Chuck Lever 提交于 5月 28, 2014

The MEMWINDOWS and MEMWINDOWS_ASYNC memory registration modes were
intended as stop-gap modes before the introduction of FRMR. They
are now considered obsolete.

MEMWINDOWS_ASYNC is also considered unsafe because it can leave
client memory registered and exposed for an indeterminant time after
each I/O.

At this point, the MEMWINDOWS modes add needless complexity, so
remove them.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b45ccfd2

xprtrdma: Remove BOUNCEBUFFERS memory registration mode · 03ff8821

由 Chuck Lever 提交于 5月 28, 2014

Clean up: This memory registration mode is slow and was never
meant for use in production environments. Remove it to reduce
implementation complexity.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

03ff8821

xprtrdma: RPC/RDMA must invoke xprt_wake_pending_tasks() in process context · 254f91e2

由 Chuck Lever 提交于 5月 28, 2014

An IB provider can invoke rpcrdma_conn_func() in an IRQ context,
thus rpcrdma_conn_func() cannot be allowed to directly invoke
generic RPC functions like xprt_wake_pending_tasks().
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

254f91e2

nfs-rdma: Fix for FMR leaks · 4034ba04

由 Allen Andrews 提交于 5月 28, 2014

Two memory region leaks were found during testing:

1. rpcrdma_buffer_create: While allocating RPCRDMA_FRMR's
ib_alloc_fast_reg_mr is called and then ib_alloc_fast_reg_page_list is
called. If ib_alloc_fast_reg_page_list returns an error it bails out of
the routine dropping the last ib_alloc_fast_reg_mr frmr region creating a
memory leak. Added code to dereg the last frmr if
ib_alloc_fast_reg_page_list fails.

2. rpcrdma_buffer_destroy: While cleaning up, the routine will only free
the MR's on the rb_mws list if there are rb_send_bufs present. However, in
rpcrdma_buffer_create while the rb_mws list is being built if one of the MR
allocation requests fail after some MR's have been allocated on the rb_mws
list the routine never gets to create any rb_send_bufs but instead jumps to
the rpcrdma_buffer_destroy routine which will never free the MR's on rb_mws
list because the rb_send_bufs were never created. This leaks all the MR's
on the rb_mws list that were created prior to one of the MR allocations
failing.

Issue(2) was seen during testing. Our adapter had a finite number of MR's
available and we created enough connections to where we saw an MR
allocation failure on our Nth NFS connection request. After the kernel
cleaned up the resources it had allocated for the Nth connection we noticed
that FMR's had been leaked due to the coding error described above.

Issue(1) was seen during a code review while debugging issue(2).
Signed-off-by: NAllen Andrews <allen.andrews@emulex.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4034ba04

xprtrdma: mind the device's max fast register page list depth · 0fc6c4e7

由 Steve Wise 提交于 5月 28, 2014

Some rdma devices don't support a fast register page list depth of
at least RPCRDMA_MAX_DATA_SEGS.  So xprtrdma needs to chunk its fast
register regions according to the minimum of the device max supported
depth or RPCRDMA_MAX_DATA_SEGS.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0fc6c4e7

23 5月, 2014 1 次提交

NFSD: Ignore client's source port on RDMA transports · 16e4d93f

由 Chuck Lever 提交于 5月 19, 2014

An NFS/RDMA client's source port is meaningless for RDMA transports.
The transport layer typically sets the source port value on the
connection to a random ephemeral port.

Currently, NFS server administrators must specify the "insecure"
export option to enable clients to access exports via RDMA.

But this means NFS clients can access such an export via IP using an
ephemeral port, which may not be desirable.

This patch eliminates the need to specify the "insecure" export
option to allow NFS/RDMA clients access to an export.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=250Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

16e4d93f

29 3月, 2014 3 次提交

svcrdma: fix offset calculation for non-page aligned sge entries · 3cbe01a9

由 Jeff Layton 提交于 3月 17, 2014

The xdr_off value in dma_map_xdr gets passed to ib_dma_map_page as the
offset into the page to be mapped. This calculation does not correctly
take into account the case where the data starts at some offset into
the page. Increment the xdr_off by the page_base to ensure that it is
respected.

Cc: Tom Tucker <tom@opengridcomputing.com>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

3cbe01a9

xprtrdma: add separate Kconfig options for NFSoRDMA client and server support · 2e8c12e1

由 Jeff Layton 提交于 3月 18, 2014

There are two entirely separate modules under xprtrdma/ and there's no
reason that enabling one should automatically enable the other. Add
config options for each one so they can be enabled/disabled separately.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

2e8c12e1

Fix regression in NFSRDMA server · 7e4359e2

由 Tom Tucker 提交于 3月 25, 2014

The server regression was caused by the addition of rq_next_page
(afc59400). There were a few places that
were missed with the update of the rq_respages array.
Signed-off-by: NTom Tucker <tom@ogc.us>
Tested-by: NSteve Wise <swise@ogc.us>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

7e4359e2

28 3月, 2014 1 次提交

svcrdma: fix printk when memory allocation fails · c42a01ee

由 Jeff Layton 提交于 3月 10, 2014

It retries in 1s, not 1000 jiffies.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c42a01ee

18 3月, 2014 1 次提交

SUNRPC: remove KERN_INFO from dprintk() call sites · 3a0799a9

由 Chuck Lever 提交于 3月 12, 2014

The use of KERN_INFO causes garbage characters to appear when
debugging is enabled.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3a0799a9

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功