提交 · edef1297f33a4546559d905457b435a5ea160bab · openeuler / raspberrypi-kernel

26 11月, 2014 1 次提交

SUNRPC: serialize iostats updates · edef1297

由 Chuck Lever 提交于 11月 08, 2014

Occasionally mountstats reports a negative retransmission rate.
Ensure that two RPCs completing concurrently don't confuse the sums
in the transport's op_metrics array.

Since pNFS filelayout can invoke rpc_count_iostats() on another
transport from xprt_release(), we can't rely on simply holding the
transport_lock in xprt_release(). There's nothing for it but hard
serialization. One spin lock per RPC operation should make this as
painless as it can be.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

edef1297

14 11月, 2014 1 次提交

sunrpc: fix sleeping under rcu_read_lock in gss_stringify_acceptor · b3ecba09

由 Jeff Layton 提交于 11月 13, 2014

Bruce reported that he was seeing the following BUG pop:

    BUG: sleeping function called from invalid context at mm/slab.c:2846
    in_atomic(): 0, irqs_disabled(): 0, pid: 4539, name: mount.nfs
    2 locks held by mount.nfs/4539:
    #0:  (nfs_clid_init_mutex){+.+.+.}, at: [<ffffffffa01c0a9a>] nfs4_discover_server_trunking+0x4a/0x2f0 [nfsv4]
    #1:  (rcu_read_lock){......}, at: [<ffffffffa00e3185>] gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    Preemption disabled at:[<ffffffff81a4f082>] printk+0x4d/0x4f

    CPU: 3 PID: 4539 Comm: mount.nfs Not tainted 3.18.0-rc1-00013-g5b095e99 #3393
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    ffff880021499390 ffff8800381476a8 ffffffff81a534cf 0000000000000001
    0000000000000000 ffff8800381476c8 ffffffff81097854 00000000000000d0
    0000000000000018 ffff880038147718 ffffffff8118e4f3 0000000020479f00
    Call Trace:
    [<ffffffff81a534cf>] dump_stack+0x4f/0x7c
    [<ffffffff81097854>] __might_sleep+0x114/0x180
    [<ffffffff8118e4f3>] __kmalloc+0x1a3/0x280
    [<ffffffffa00e31d8>] gss_stringify_acceptor+0x58/0xb0 [auth_rpcgss]
    [<ffffffffa00e3185>] ? gss_stringify_acceptor+0x5/0xb0 [auth_rpcgss]
    [<ffffffffa006b438>] rpcauth_stringify_acceptor+0x18/0x30 [sunrpc]
    [<ffffffffa01b0469>] nfs4_proc_setclientid+0x199/0x380 [nfsv4]
    [<ffffffffa01b04d0>] ? nfs4_proc_setclientid+0x200/0x380 [nfsv4]
    [<ffffffffa01bdf1a>] nfs40_discover_server_trunking+0xda/0x150 [nfsv4]
    [<ffffffffa01bde45>] ? nfs40_discover_server_trunking+0x5/0x150 [nfsv4]
    [<ffffffffa01c0acf>] nfs4_discover_server_trunking+0x7f/0x2f0 [nfsv4]
    [<ffffffffa01c8e24>] nfs4_init_client+0x104/0x2f0 [nfsv4]
    [<ffffffffa01539b4>] nfs_get_client+0x314/0x3f0 [nfs]
    [<ffffffffa0153780>] ? nfs_get_client+0xe0/0x3f0 [nfs]
    [<ffffffffa01c83aa>] nfs4_set_client+0x8a/0x110 [nfsv4]
    [<ffffffffa0069708>] ? __rpc_init_priority_wait_queue+0xa8/0xf0 [sunrpc]
    [<ffffffffa01c9b2f>] nfs4_create_server+0x12f/0x390 [nfsv4]
    [<ffffffffa01c1472>] nfs4_remote_mount+0x32/0x60 [nfsv4]
    [<ffffffff81196489>] mount_fs+0x39/0x1b0
    [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
    [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
    [<ffffffffa01c1396>] nfs_do_root_mount+0x86/0xc0 [nfsv4]
    [<ffffffffa01c1784>] nfs4_try_mount+0x44/0xc0 [nfsv4]
    [<ffffffffa01549b7>] ? get_nfs_version+0x27/0x90 [nfs]
    [<ffffffffa0161a2d>] nfs_fs_mount+0x47d/0xd60 [nfs]
    [<ffffffff81a59c5e>] ? mutex_unlock+0xe/0x10
    [<ffffffffa01606a0>] ? nfs_remount+0x430/0x430 [nfs]
    [<ffffffffa01609c0>] ? nfs_clone_super+0x140/0x140 [nfs]
    [<ffffffff81196489>] mount_fs+0x39/0x1b0
    [<ffffffff81166145>] ? __alloc_percpu+0x15/0x20
    [<ffffffff811b276b>] vfs_kern_mount+0x6b/0x150
    [<ffffffff811b5830>] do_mount+0x210/0xbe0
    [<ffffffff811b54ca>] ? copy_mount_options+0x3a/0x160
    [<ffffffff811b651f>] SyS_mount+0x6f/0xb0
    [<ffffffff81a5c852>] system_call_fastpath+0x12/0x17

Sleeping under the rcu_read_lock is bad. This patch fixes it by dropping
the rcu_read_lock before doing the allocation and then reacquiring it
and redoing the dereference before doing the copy. If we find that the
string has somehow grown in the meantime, we'll reallocate and try again.

Cc: <stable@vger.kernel.org> # v3.17+
Reported-by: N"J. Bruce Fields" <bfields@fieldses.org>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

b3ecba09

30 9月, 2014 1 次提交

svcrdma: advertise the correct max payload · 7e5be288

由 Steve Wise 提交于 9月 23, 2014

Svcrdma currently advertises 1MB, which is too large. The correct value
is the minimum of RPCSVC_MAXPAYLOAD and the max scatter-gather allowed
in an NFSRDMA IO chunk * the host page size. This bug is usually benign
because the Linux X64 NFSRDMA client correctly limits the payload size to
the correct value (64*4096 = 256KB). But if the Linux client is PPC64
with a 64KB page size, then the client will indeed use a payload size
that will overflow the server.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

7e5be288

26 9月, 2014 1 次提交

SUNRPC: Add missing support for RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT · 2aca5b86

由 Trond Myklebust 提交于 9月 24, 2014

The flag RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT was intended introduced in
order to allow NFSv4 clients to disable resend timeouts. Since those
cause the RPC layer to break the connection, they mess up the duplicate
reply caches that remain indexed on the port number in NFSv4..

This patch includes the code that was missing in the original to
set the appropriate flag in struct rpc_clnt, when the caller of
rpc_create() sets RPC_CLNT_CREATE_NO_RETRANS_TIMEOUT.

Fixes: 8a19a0b6 (SUNRPC: Add RPC task and client level options to...)
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

2aca5b86

25 9月, 2014 4 次提交

NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() · 1aff5256

由 NeilBrown 提交于 9月 24, 2014

Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
 - it doesn't hurt for kswapd to block occasionally.  If it doesn't
   want to block it would clear __GFP_WAIT.  The current_is_kswapd()
   was only added to avoid deadlocks and we have a new approach for
   that.
 - memory allocation in the SUNRPC layer can very rarely try to
   ->releasepage() a page it is trying to handle.  The deadlock
   is removed as nfs_release_page() doesn't block indefinitely.

So we don't need to set PF_FSTRANS for sunrpc network operations any
more.
Signed-off-by: NNeilBrown <neilb@suse.de>
Acked-by: NJeff Layton <jlayton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

1aff5256

rpc: Add -EPERM processing for xs_udp_send_request() · 3dedbb5c

由 Jason Baron 提交于 9月 24, 2014

If an iptables drop rule is added for an nfs server, the client can end up in
a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
is ignored since the prior bits of the packet may have been successfully queued
and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
thinks that because some bits were queued it should return -EAGAIN. We then try
the request again and again, resulting in cpu spinning. Reproducer:

1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d <nfs server ip> -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d <nfs server ip> -j DROP

The softlockup occurs in step 4 above.

The previous patch, allows xs_sendpages() to return both a sent count and
any error values that may have occurred. Thus, if we get an -EPERM, return
that to the higher level code.

With this patch in place we can successfully abort the above sequence and
avoid the softlockup.

I also tried the above test case on an nfs mount on tcp and although the system
does not softlockup, I still ended up with the 'hung_task' firing after 120
seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
since -EPERM appears to get ignored much lower down in the stack and does not
propogate up to xs_sendpages(). This case is not quite as insidious as the
softlockup and it is not addressed here.
Reported-by: NYigong Lou <ylou@akamai.com>
Signed-off-by: NJason Baron <jbaron@akamai.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3dedbb5c

rpc: return sent and err from xs_sendpages() · f279cd00

由 Jason Baron 提交于 9月 24, 2014

If an error is returned after the first bits of a packet have already been
successfully queued, xs_sendpages() will return a positive 'int' value
indicating success. Callers seem to treat this as -EAGAIN.

However, there are cases where its not a question of waiting for the write
queue to drain. For example, when there is an iptables rule dropping packets
to the destination, the lower level code can return -EPERM only after parts
of the packet have been successfully queued. In this case, we can end up
continuously retrying resulting in a kernel softlockup.

This patch is intended to make no changes in behavior but is in preparation for
subsequent patches that can make decisions based on both on the number of bytes
sent by xs_sendpages() and any errors that may have be returned.
Signed-off-by: NJason Baron <jbaron@akamai.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

f279cd00

SUNRPC: Don't wake tasks during connection abort · a743419f

由 Benjamin Coddington 提交于 9月 23, 2014

When aborting a connection to preserve source ports, don't wake the task in
xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().

This may also avoid a potential conflict on the socket's lock.
Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

a743419f

11 9月, 2014 1 次提交

rpc: xs_bind - do not bind when requesting a random ephemeral port · 0f7a622c

由 Chris Perl 提交于 9月 05, 2014

When attempting to establish a local ephemeral endpoint for a TCP or UDP
socket, do not explicitly call bind, instead let it happen implicilty when the
socket is first used.

The main motivating factor for this change is when TCP runs out of unique
ephemeral ports (i.e.  cannot find any ephemeral ports which are not a part of
*any* TCP connection).  In this situation if you explicitly call bind, then the
call will fail with EADDRINUSE.  However, if you allow the allocation of an
ephemeral port to happen implicitly as part of connect (or other functions),
then ephemeral ports can be reused, so long as the combination of (local_ip,
local_port, remote_ip, remote_port) is unique for TCP sockets on the system.

This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP
sockets the same.

This can allow mount.nfs(8) to continue to function successfully, even in the
face of misbehaving applications which are creating a large number of TCP
connections.
Signed-off-by: NChris Perl <chris.perl@gmail.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

0f7a622c

29 8月, 2014 2 次提交

sunrpc: fix byte-swapping of displayed XID · 71efecb3

由 Chuck Lever 提交于 8月 22, 2014

xprt_lookup_rqst() and bc_send_request() display a byte-swapped XID,
but receive_cb_reply() does not.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

71efecb3

SUNRPC: Fix compile on non-x86 · ae89254d

由 J. Bruce Fields 提交于 8月 20, 2014

current_task appears to be x86-only, oops.

Let's just delete this check entirely:

Any developer that adds a new user without setting rq_task will get a
crash the first time they test it.  I also don't think there are
normally any important locks held here, and I can't see any other reason
why killing a server thread would bring the whole box down.

So the effort to fail gracefully here looks like overkill.
Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
Fixes: 983c6844 "SUNRPC: get rid of the request wait queue"
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

ae89254d

18 8月, 2014 6 次提交

SUNRPC: Optimise away svc_recv_available · f8d1ff47

由 Trond Myklebust 提交于 8月 03, 2014

We really do not want to do ioctls in the server's fast path. Instead, let's
use the fact that we managed to read a full record as the indicator that
we should try to read the socket again.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

f8d1ff47

SUNRPC: More optimisations of svc_xprt_enqueue() · 0c0746d0

由 Trond Myklebust 提交于 8月 03, 2014

Just move the transport locking out of the spin lock protected area
altogether.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

0c0746d0

SUNRPC: Fix broken kthread_should_stop test in svc_get_next_xprt · a4aa8054

由 Trond Myklebust 提交于 8月 03, 2014

We should definitely not be exiting svc_get_next_xprt() with the
thread enqueued. Fix this by ensuring that we fall through to
the dequeue.
Also move the test itself outside the spin lock protected section.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

a4aa8054

SUNRPC: get rid of the request wait queue · 983c6844

由 Trond Myklebust 提交于 8月 03, 2014

We're always _only_ waking up tasks from within the sp_threads list, so
we know that they are enqueued and alive. The rq_wait waitqueue is just
a distraction with extra atomic semantics.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

983c6844

T
SUNRPC: Do not grab pool->sp_lock unnecessarily in svc_get_next_xprt · 106f359c
由 Trond Myklebust 提交于 8月 03, 2014
```
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
```
106f359c

SUNRPC: Do not override wspace tests in svc_handle_xprt · 9e5b208d

由 Trond Myklebust 提交于 8月 03, 2014

We already determined that there was enough wspace when we
called svc_xprt_enqueue.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

9e5b208d

06 8月, 2014 1 次提交

svcrdma: remove rdma_create_qp() failure recovery logic · d1e458fe

由 Steve Wise 提交于 7月 31, 2014

In svc_rdma_accept(), if rdma_create_qp() fails, there is useless
logic to try and call rdma_create_qp() again with reduced sge depths.
The assumption, I guess, was that perhaps the initial sge depths
chosen were too big.  However they initial depths are selected based
on the rdma device attribute max_sge returned from ib_query_device().
If rdma_create_qp() fails, it would not be because the max_send_sge and
max_recv_sge values passed in exceed the device's max.  So just remove
this code.
Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

d1e458fe

04 8月, 2014 7 次提交

SUNRPC: remove all refcounting of groupinfo from rpcauth_lookupcred · 122a8cda

由 NeilBrown 提交于 8月 04, 2014

current_cred() can only be changed by 'current', and
cred->group_info is never changed.  If a new group_info is
needed, a new 'cred' is created.

Consequently it is always safe to access
   current_cred()->group_info

without taking any further references.
So drop the refcounting and the incorrect rcu_dereference().
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

122a8cda

sunrpc/auth: allow lockless (rcu) lookup of credential cache. · bd956080

由 NeilBrown 提交于 7月 14, 2014

The new flag RPCAUTH_LOOKUP_RCU to credential lookup avoids locking,
does not take a reference on the returned credential, and returns
-ECHILD if a simple lookup was not possible.

The returned value can only be used within an rcu_read_lock protected
region.

The main user of this is the new rpc_lookup_cred_nonblock() which
returns a pointer to the current credential which is only rcu-safe (no
ref-count held), and might return -ECHILD if allocation was required.
Signed-off-by: NNeilBrown <neilb@suse.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

bd956080

sunrpc: remove "ec" argument from encrypt_v2 operation · ec25422c

由 Jeff Layton 提交于 7月 16, 2014

It's always 0.
Signed-off-by: NJeff Layton <jlayton@primarydata.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ec25422c

sunrpc: clean up sparse endianness warnings in gss_krb5_wrap.c · b36e9c44

由 Jeff Layton 提交于 7月 16, 2014

Fix the endianness handling in gss_wrap_kerberos_v1 and drop the memset
call there in favor of setting the filler bytes directly.

In gss_wrap_kerberos_v2, get rid of the "ec" variable which is always
zero, and drop the endianness conversion of 0. Sparse handles 0 as a
special case, so it's not necessary.
Signed-off-by: NJeff Layton <jlayton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

b36e9c44

sunrpc: clean up sparse endianness warnings in gss_krb5_seal.c · 6ac0fbbf

由 Jeff Layton 提交于 7月 16, 2014

Use u16 pointer in setup_token and setup_token_v2. None of the fields
are actually handled as __be16, so this simplifies the code a bit. Also
get rid of some unneeded pointer increments.
Signed-off-by: NJeff Layton <jlayton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

6ac0fbbf

sunrpc: fix RCU handling of gc_ctx field · c5e6aecd

由 Jeff Layton 提交于 7月 16, 2014

The handling of the gc_ctx pointer only seems to be partially RCU-safe.
The assignment and freeing are done using RCU, but many places in the
code seem to dereference that pointer without proper RCU safeguards.

Fix them to use rcu_dereference and to rcu_read_lock/unlock, and to
properly handle the case where the pointer is NULL.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: NJeff Layton <jlayton@primarydata.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

c5e6aecd

SUNRPC: Enforce an upper limit on the number of cached credentials · bae6746f

由 Trond Myklebust 提交于 7月 21, 2014

In some cases where the credentials are not often reused, we may want
to limit their total number just in order to make the negative lookups
in the hash table more manageable.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

bae6746f

01 8月, 2014 15 次提交

xprtrdma: Handle additional connection events · 8079fb78

由 Chuck Lever 提交于 7月 29, 2014

Commit 38ca83a5 added RDMA_CM_EVENT_TIMEWAIT_EXIT. But that status
is relevant only for consumers that re-use their QPs on new
connections. xprtrdma creates a fresh QP on reconnection, so that
event should be explicitly ignored.

Squelch the alarming "unexpected CM event" message.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8079fb78

xprtrdma: Remove RPCRDMA_PERSISTENT_REGISTRATION macro · a779ca5f

由 Chuck Lever 提交于 7月 29, 2014

Clean up.

RPCRDMA_PERSISTENT_REGISTRATION was a compile-time switch between
RPCRDMA_REGISTER mode and RPCRDMA_ALLPHYSICAL mode.  Since
RPCRDMA_REGISTER has been removed, there's no need for the extra
conditional compilation.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a779ca5f

xprtrdma: Make rpcrdma_ep_disconnect() return void · 282191cb

由 Chuck Lever 提交于 7月 29, 2014

Clean up: The return code is used only for dprintk's that are
already redundant.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

282191cb

xprtrdma: Schedule reply tasklet once per upcall · bb96193d

由 Chuck Lever 提交于 7月 29, 2014

Minor optimization: grab rpcrdma_tk_lock_g and disable hard IRQs
just once after clearing the receive completion queue.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

bb96193d

xprtrdma: Allocate each struct rpcrdma_mw separately · 2e84522c

由 Chuck Lever 提交于 7月 29, 2014

Currently rpcrdma_buffer_create() allocates struct rpcrdma_mw's as
a single contiguous area of memory. It amounts to quite a bit of
memory, and there's no requirement for these to be carved from a
single piece of contiguous memory.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2e84522c

xprtrdma: Rename frmr_wr · f590e878

由 Chuck Lever 提交于 7月 29, 2014

Clean up: Name frmr_wr after the opcode of the Work Request,
consistent with the send and local invalidation paths.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

f590e878

xprtrdma: Disable completions for LOCAL_INV Work Requests · dab7e3b8

由 Chuck Lever 提交于 7月 29, 2014

Instead of relying on a completion to change the state of an FRMR
to FRMR_IS_INVALID, set it in advance. If an error occurs, a completion
will fire anyway and mark the FRMR FRMR_IS_STALE.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

dab7e3b8

xprtrdma: Disable completions for FAST_REG_MR Work Requests · 05055722

由 Chuck Lever 提交于 7月 29, 2014

Instead of relying on a completion to change the state of an FRMR
to FRMR_IS_VALID, set it in advance. If an error occurs, a completion
will fire anyway and mark the FRMR FRMR_IS_STALE.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

05055722

xprtrdma: Don't post a LOCAL_INV in rpcrdma_register_frmr_external() · 440ddad5

由 Chuck Lever 提交于 7月 29, 2014

Any FRMR arriving in rpcrdma_register_frmr_external() is now
guaranteed to be either invalid, or to be targeted by a queued
LOCAL_INV that will invalidate it before the adapter processes
the FAST_REG_MR being built here.

The problem with current arrangement of chaining a LOCAL_INV to the
FAST_REG_MR is that if the transport is not connected, the LOCAL_INV
is flushed and the FAST_REG_MR is flushed. This leaves the FRMR
valid with the old rkey. But rpcrdma_register_frmr_external() has
already bumped the in-memory rkey.

Next time through rpcrdma_register_frmr_external(), a LOCAL_INV and
FAST_REG_MR is attempted again because the FRMR is still valid. But
the rkey no longer matches the hardware's rkey, and a memory
management operation error occurs.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

440ddad5

xprtrdma: Reset FRMRs after a flushed LOCAL_INV Work Request · ddb6bebc

由 Chuck Lever 提交于 7月 29, 2014

When a LOCAL_INV Work Request is flushed, it leaves an FRMR in the
VALID state. This FRMR can be returned by rpcrdma_buffer_get(), and
must be knocked down in rpcrdma_register_frmr_external() before it
can be re-used.

Instead, capture these in rpcrdma_buffer_get(), and reset them.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ddb6bebc

xprtrdma: Reset FRMRs when FAST_REG_MR is flushed by a disconnect · 9f9d802a

由 Chuck Lever 提交于 7月 29, 2014

FAST_REG_MR Work Requests update a Memory Region's rkey. Rkey's are
used to block unwanted access to the memory controlled by an MR. The
rkey is passed to the receiver (the NFS server, in our case), and is
also used by xprtrdma to invalidate the MR when the RPC is complete.

When a FAST_REG_MR Work Request is flushed after a transport
disconnect, xprtrdma cannot tell whether the WR actually hit the
adapter or not. So it is indeterminant at that point whether the
existing rkey is still valid.

After the transport connection is re-established, the next
FAST_REG_MR or LOCAL_INV Work Request against that MR can sometimes
fail because the rkey value does not match what xprtrdma expects.

The only reliable way to recover in this case is to deregister and
register the MR before it is used again. These operations can be
done only in a process context, so handle it in the transport
connect worker.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9f9d802a

xprtrdma: Properly handle exhaustion of the rb_mws list · c2922c02

由 Chuck Lever 提交于 7月 29, 2014

If the rb_mws list is exhausted, clean up and return NULL so that
call_allocate() will delay and try again.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c2922c02

xprtrdma: Chain together all MWs in same buffer pool · 3111d72c

由 Chuck Lever 提交于 7月 29, 2014

During connection loss recovery, need to visit every MW in a
buffer pool. Any MW that is in use by an RPC will not be on the
rb_mws list.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3111d72c

xprtrdma: Back off rkey when FAST_REG_MR fails · c93e986a

由 Chuck Lever 提交于 7月 29, 2014

If posting a FAST_REG_MR Work Reqeust fails, revert the rkey update
to avoid subsequent IB_WC_MW_BIND_ERR completions.
Suggested-by: NSteve Wise <swise@opengridcomputing.com>
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c93e986a

xprtrdma: Unclutter struct rpcrdma_mr_seg · 0dbb4108

由 Chuck Lever 提交于 7月 29, 2014

Clean ups:
 - make it obvious that the rl_mw field is a pointer -- allocated
   separately, not as part of struct rpcrdma_mr_seg
 - promote "struct {} frmr;" to a named type
 - promote the state enum to a named type
 - name the MW state field the same way other fields in
   rpcrdma_mw are named
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NSteve Wise <swise@opengridcomputing.com>
Tested-by: NShirley Ma <shirley.ma@oracle.com>
Tested-by: NDevesh Sharma <devesh.sharma@emulex.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0dbb4108