提交 · 26d36301bd653df6481fd38f3e1435a1f15e56d1 · openanolis / cloud-kernel

28 9月, 2016 15 次提交

NFSv4.1: Ensure we call FREE_STATEID if needed on close/delegreturn/locku · 26d36301

由 Trond Myklebust 提交于 9月 22, 2016

If a server returns NFS4ERR_ADMIN_REVOKED, NFS4ERR_DELEG_REVOKED
or NFS4ERR_EXPIRED on a call to close, open_downgrade, delegreturn, or
locku, we should call FREE_STATEID before attempting to recover.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

26d36301

NFSv4.1: FREE_STATEID can be asynchronous · f0b0bf88

由 Trond Myklebust 提交于 9月 22, 2016

Nothing should need to be serialised with FREE_STATEID on the client,
so let's make the RPC call always asynchronous. Also constify the
stateid argument.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

f0b0bf88

NFSv4.1: Ensure we always run TEST/FREE_STATEID on locks · c5896fc8

由 Trond Myklebust 提交于 9月 22, 2016

Right now, we're only running TEST/FREE_STATEID on the locks if
the open stateid recovery succeeds. The protocol requires us to
always do so.
The fix would be to move the call to TEST/FREE_STATEID and do it
before we attempt open recovery.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c5896fc8

NFSv4.1: Allow revoked stateids to skip the call to TEST_STATEID · f7a62ada

由 Trond Myklebust 提交于 9月 22, 2016

In some cases (e.g. when the SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED sequence
flag is set) we may already know that the stateid was revoked and that the
only valid operation we can call is FREE_STATEID. In those cases, allow
the stateid to carry the information in the type field, so that we skip
the redundant call to TEST_STATEID.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

f7a62ada

NFSv4.1: Don't recheck delegations that have already been checked · 63d63cbf

由 Trond Myklebust 提交于 9月 22, 2016

Ensure we don't spam the server with test_stateid() calls for
delegations that have already been checked.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

63d63cbf

NFSv4.1: Deal with server reboots during delegation expiration recovery · bb3d1a3b

由 Trond Myklebust 提交于 9月 22, 2016

Ensure that if the server reboots while we're testing and recovering
from revoked delegations, we exit to allow the state manager to
handle matters.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

bb3d1a3b

NFSv4.1: Test delegation stateids when server declares "some state revoked" · 45870d69

由 Trond Myklebust 提交于 9月 22, 2016

According to RFC5661, if any of the SEQUENCE status bits
SEQ4_STATUS_EXPIRED_ALL_STATE_REVOKED,
SEQ4_STATUS_EXPIRED_SOME_STATE_REVOKED, SEQ4_STATUS_ADMIN_STATE_REVOKED,
or SEQ4_STATUS_RECALLABLE_STATE_REVOKED are set, then we need to use
TEST_STATEID to figure out which stateids have been revoked, so we
can acknowledge the loss of state using FREE_STATEID.

While we already do this for open and lock state, we have not been doing
so for all the delegations.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

45870d69

NFSv4.x: Allow callers of nfs_remove_bad_delegation() to specify a stateid · 41020b67

由 Trond Myklebust 提交于 9月 22, 2016

Allow the callers of nfs_remove_bad_delegation() to specify the stateid
that needs to be marked as bad.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

41020b67

NFSv4.1: Add a helper function to deal with expired stateids · 4586f6e2

由 Trond Myklebust 提交于 9月 22, 2016

In NFSv4.1 and newer, if the server decides to revoke some or all of
the protocol state, the client is required to iterate through all the
stateids that it holds and call TEST_STATEID to determine which stateids
still correspond to valid state, and then call FREE_STATEID on the
others.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4586f6e2

NFSv4.1: Allow test_stateid to handle session errors without waiting · 43912bbb

由 Trond Myklebust 提交于 9月 22, 2016

If the server crashes while we're testing stateids for validity, then
we want to initiate session recovery. Usually, we will be calling from
a state manager thread, though, so we don't really want to wait.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

43912bbb

NFSv4.1: Don't check delegations that are already marked as revoked · 4c8e5447

由 Trond Myklebust 提交于 9月 22, 2016

If the delegation has been marked as revoked, we don't have to test
it, because we should already have called FREE_STATEID on it.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOlek Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4c8e5447

NFSv4: nfs4_copy_delegation_stateid() must fail if the delegation is invalid · aa05c87f

由 Trond Myklebust 提交于 9月 22, 2016

We must not allow the use of delegations that have been revoked or are
being returned.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Fixes: 869f9dfa ("NFSv4: Fix races between nfs_remove_bad_delegation()...")
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.19+
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

aa05c87f

NFSv4: Don't report revoked delegations as valid in nfs_have_delegation() · b3f9e723

由 Trond Myklebust 提交于 9月 22, 2016

If the delegation is revoked, then it can't be used for caching.

Fixes: 869f9dfa ("NFSv4: Fix races between nfs_remove_bad_delegation()...")
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v3.19+
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b3f9e723

NFS: Fix inode corruption in nfs_prime_dcache() · 7dc72d5f

由 Trond Myklebust 提交于 9月 22, 2016

Due to inode number reuse in filesystems, we can end up corrupting the
inode on our client if we apply the file attributes without ensuring that
the filehandle matches.
Typical symptoms include spurious "mode changed" reports in the syslog.

We still do want to ensure that we don't invalidate the dentry if the
inode number matches, but we don't have a filehandle.

Fixes: fa923369 ("NFS: Don't require a filehandle to refresh...")
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org # v4.0+
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7dc72d5f

NFSv4.1: Don't deadlock the state manager on the SEQUENCE status flags · 0a014a44

由 Trond Myklebust 提交于 9月 22, 2016

As described in RFC5661, section 18.46, some of the status flags exist
in order to tell the client when it needs to acknowledge the existence of
revoked state on the server and/or to recover state.
Those flags will then remain set until the recovery procedure is done.

In order to avoid looping, the client therefore needs to ignore
those particular flags while recovering.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NOleg Drokin <green@linuxhacker.ru>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

0a014a44

23 9月, 2016 15 次提交

xprtrdma: use complete() instead complete_all() · 5690a22d

由 Daniel Wagner 提交于 9月 23, 2016

There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().

The usage pattern of the completion is:

waiter context                          waker context

frwr_op_unmap_sync()
  reinit_completion()
  ib_post_send()
  wait_for_completion()

					frwr_wc_localinv_wake()
					  complete()
Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
Cc: Anna Schumaker <Anna.Schumaker@Netapp.com>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: linux-nfs@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5690a22d

NFS: cache_lib: use complete() instead of complete_all() · 2a446a5d

由 Daniel Wagner 提交于 9月 22, 2016

There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().

The generic caching code from sunrpc is calling revisit() only once.

The usage pattern of the completion is:

waiter context                          waker context

do_cache_lookup_wait()
  nfs_cache_defer_req_alloc()
    init_completion()
  do_cache_lookup()
  nfs_cache_wait_for_upcall()
    wait_for_completion_timeout()

					nfs_dns_cache_revisit()
					  complete()

  nfs_cache_defer_req_put()
Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

2a446a5d

NFS: direct: use complete() instead of complete_all() · 024de8f1

由 Daniel Wagner 提交于 9月 22, 2016

There is only one waiter for the completion, therefore there
is no need to use complete_all(). Let's make that clear by
using complete() instead of complete_all().

nfs_file_direct_write() or nfs_file_direct_read() allocated a request
object via nfs_direct_req_alloc(), which initializes the
completion. The request object then is freed later in the exit path.
Between the initialization and the release either
nfs_direct_write_schedule_iovec() resp
nfs_direct_read_schedule_iovec() are called which will asynchronously
process the request. The calling function waits via nfs_direct_wait()
till the async work has been done. Thus there is only one waiter on
the completion.

nfs_direct_pgio_init() and nfs_direct_read_completion() are passed via
function pointers to nfs pageio. The first function does a ref
counting (get_dreq() and put_dreq()) which ensures that
nfs_direct_read_completion() and nfs_direct_read_schedule_iovec() only
call the completion path once.

The usage pattern of the completion is:

waiter context                          waker context

nfs_file_direct_write()
  dreq = nfs_direct_req_alloc()
    init_completion()
  nfs_direct_write_schedule_iovec()
  nfs_direct_wait()
    wait_for_completion_killable()

                                        nfs_direct_write_schedule_work()
                                          nfs_direct_complete()
                                            complete()

nfs_file_direct_read()
  dreq = nfs_direct_req_all()
    init_completion()
  nfs_direct_read_schedule_iovec()
  nfs_direct_wait()
    wait_for_completion_killable()
                                        nfs_direct_read_schedule_iovec()
                                          nfs_direct_complete()
                                            complete()

                                        nfs_direct_read_completion()
                                          nfs_direct_complete()
                                            complete()
Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

024de8f1

SUNRPC: Fix setting of buffer length in xdr_set_next_buffer() · a6cebd41

由 Trond Myklebust 提交于 9月 20, 2016

Use xdr->nwords to tell us how much buffer remains.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a6cebd41

SUNRPC: Fix corruption of xdr->nwords in xdr_copy_to_scratch · ace0e14f

由 Trond Myklebust 提交于 9月 20, 2016

When we copy the first part of the data, we need to ensure that value
of xdr->nwords is updated as well. Do so by calling __xdr_inline_decode()
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ace0e14f

NFS: nfs_prime_dcache must validate the filename · 78d04af4

由 Trond Myklebust 提交于 9月 20, 2016

Before we try to stash it in the dcache, we need to at least check
that the filename passed to us by the server is non-empty and doesn't
contain any illegal '\0' or '/' characters.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

78d04af4

nfs: allow blocking locks to be awoken by lock callbacks · a1d617d8

由 Jeff Layton 提交于 9月 17, 2016

Add a waitqueue head to the client structure. Have clients set a wait
on that queue prior to requesting a lock from the server. If the lock
is blocked, then we can use that to wait for wakeups.

Note that we do need to do this "manually" since we need to set the
wait on the waitqueue prior to requesting the lock, but requesting a
lock can involve activities that can block.

However, only do that for NFSv4.1 locks, either by compiling out
all of the waitqueue handling when CONFIG_NFS_V4_1 is disabled, or
skipping all of it at runtime if we're dealing with v4.0, or v4.1
servers that don't send lock callbacks.

Note too that even when we expect to get a lock callback, RFC5661
section 20.11.4 is pretty clear that we still need to poll for them,
so we do still sleep on a timeout. We do however always poll at the
longest interval in that case.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
[Anna: nfs4_retry_setlk() "status" should default to -ERESTARTSYS]
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a1d617d8

nfs: move nfs4 lock retry attempt loop to a separate function · d2f3a7f9

由 Jeff Layton 提交于 9月 17, 2016

This also consolidates the waiting logic into a single function,
instead of having it spread across two like it is now.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d2f3a7f9

nfs: move nfs4_set_lock_state call into caller · 1ea67dbd

由 Jeff Layton 提交于 9月 17, 2016

We need to have this info set up before adding the waiter to the
waitqueue, so move this out of the _nfs4_proc_setlk and into the
caller. That's more efficient anyway since we don't need to do
this more than once if we end up waiting on the lock.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1ea67dbd

nfs: add handling for CB_NOTIFY_LOCK in client · db783688

由 Jeff Layton 提交于 9月 17, 2016

For now, the callback doesn't do anything. Support for that will be
added in later patches.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

db783688

nfs: track whether server sets MAY_NOTIFY_LOCK flag · a8ce377a

由 Jeff Layton 提交于 9月 17, 2016

We want to handle the two cases differently, such that we poll more
aggressively when we don't expect a callback.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a8ce377a

nfs: add a new NFS4_OPEN_RESULT_MAY_NOTIFY_LOCK constant · eed7c414

由 Jeff Layton 提交于 9月 17, 2016

As defined in RFC 5661, section 18.16.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eed7c414

nfs: use safe, interruptible sleeps when waiting to retry LOCK · 66f570ab

由 Jeff Layton 提交于 9月 17, 2016

We actually want to use TASK_INTERRUPTIBLE sleeps when we're in the
process of polling for a NFSv4 lock. If there is a signal pending when
the task wakes up, then we'll be returning an error anyway. So, we might
as well wake up immediately for non-fatal signals as well. That allows
us to return to userland more quickly in that case, but won't change the
error that userland sees.

Also, there is no need to use the *_unsafe sleep variants here, as no
vfs-layer locks should be held at this point.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

66f570ab

J
nfs: eliminate pointless and confusing do_vfs_lock wrappers · 75575ddf
由 Jeff Layton 提交于 9月 17, 2016
```
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
```
75575ddf

nfs: the length argument to read_buf should be unsigned · b60475c9

由 Jeff Layton 提交于 9月 17, 2016

Since it gets passed through to xdr_inline_decode, we might as well
have read_buf expect what it expects -- a size_t.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b60475c9

20 9月, 2016 10 次提交

nfs: cover ->migratepage with CONFIG_MIGRATION · f844cd0d

由 Chao Yu 提交于 9月 20, 2016

It will be more clean to use CONFIG_MIGRATION to cover nfs' private
.migratepage in nfs_file_aops like we do in other part of nfs
operations.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

f844cd0d

sunrpc: fix write space race causing stalls · d48f9ce7

由 David Vrabel 提交于 9月 19, 2016

Write space becoming available may race with putting the task to sleep
in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
race does not work.

This (edited) partial trace illustrates the problem:

   [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
   [2] xs_write_space <-xs_tcp_write_space
   [3] xprt_write_space <-xs_write_space
   [4] rpc_task_sleep: task:43546@5 ...
   [5] xs_write_space <-xs_tcp_write_space

[1] Task 43546 runs but is out of write space.

[2] Space becomes available, xs_write_space() clears the
    SOCKWQ_ASYNC_NOSPACE bit.

[3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
    this has not yet been queued and the wake up is lost.

[4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
    which queues task 43546.

[5] The call to sk->sk_write_space() at the end of xs_nospace() (which
    is supposed to handle the above race) does not call
    xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
    thus the task is not woken.

Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace()
so the second call to sk->sk_write_space() calls xprt_write_space().
Suggested-by: NTrond Myklebust <trondmy@primarydata.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
cc: stable@vger.kernel.org # 4.4
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d48f9ce7

pnfs: add a new mechanism to select a layout driver according to an ordered list · ca440c38

由 Jeff Layton 提交于 9月 15, 2016

Currently, the layout driver selection code always chooses the first one
from the list. That's not really ideal however, as the server can send
the list of layout types in any order that it likes. It's up to the
client to select the best one for its needs.

This patch adds an ordered list of preferred driver types and has the
selection code sort the list of available layout drivers according to it.
Any unrecognized layout type is sorted to the end of the list.

For now, the order of preference is hardcoded, but it should be possible
to make this configurable in the future.
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

ca440c38

xprtrdma: Eliminate rpcrdma_receive_worker() · 496b77a5

由 Chuck Lever 提交于 9月 15, 2016

Clean up: the extra layer of indirection doesn't add value.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

496b77a5

xprtrdma: Rename rpcrdma_receive_wc() · 1519e969

由 Chuck Lever 提交于 9月 15, 2016

Clean up: When converting xprtrdma to use the new CQ API, I missed a
spot. The naming convention elsewhere is:

  {svc_rdma,rpcrdma}_wc_{operation}
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

1519e969

xprtrmda: Report address of frmr, not mw · eeb30613

由 Chuck Lever 提交于 9月 15, 2016

Tie frwr debugging messages together by always reporting the address
of the frwr.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eeb30613

xprtrdma: Support larger inline thresholds · 44829d02

由 Chuck Lever 提交于 9月 15, 2016

The Version One default inline threshold is still 1KB. But allow
testing with thresholds up to 64KB.

This maximum is somewhat arbitrary. There's no fundamental
architectural limit I'm aware of, but it's good to keep the size of
Receive buffers reasonable. Now that Send can use a s/g list, a
Send buffer is only as large as each RPC requires. Receive buffers
are always the size of the inline threshold, however.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

44829d02

xprtrdma: Use gathered Send for large inline messages · 655fec69

由 Chuck Lever 提交于 9月 15, 2016

An RPC Call message that is sent inline but that has a data payload
(ie, one or more items in rq_snd_buf's page list) must be "pulled
up:"

- call_allocate has to reserve enough RPC Call buffer space to
accommodate the data payload

- call_transmit has to memcopy the rq_snd_buf's page list and tail
into its head iovec before it is sent

As the inline threshold is increased beyond its current 1KB default,
however, this means data payloads of more than a few KB are copied
by the host CPU. For example, if the inline threshold is increased
just to 4KB, then NFS WRITE requests up to 4KB would involve a
memcpy of the NFS WRITE's payload data into the RPC Call buffer.
This is an undesirable amount of participation by the host CPU.

The inline threshold may be much larger than 4KB in the future,
after negotiation with a peer server.

Instead of copying the components of rq_snd_buf into its head iovec,
construct a gather list of these components, and send them all in
place. The same approach is already used in the Linux server's
RPC-over-RDMA reply path.

This mechanism also eliminates the need for rpcrdma_tail_pullup,
which is used to manage the XDR pad and trailing inline content when
a Read list is present.

This requires that the pages in rq_snd_buf's page list be DMA-mapped
during marshaling, and unmapped when a data-bearing RPC is
completed. This is slightly less efficient for very small I/O
payloads, but significantly more efficient as data payload size and
inline threshold increase past a kilobyte.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

655fec69

xprtrdma: Basic support for Remote Invalidation · c8b920bb

由 Chuck Lever 提交于 9月 15, 2016

Have frwr's ro_unmap_sync recognize an invalidated rkey that appears
as part of a Receive completion. Local invalidation can be skipped
for that rkey.

Use an out-of-band signaling mechanism to indicate to the server
that the client is prepared to receive RDMA Send With Invalidate.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

c8b920bb

xprtrdma: Client-side support for rpcrdma_connect_private · 87cfb9a0

由 Chuck Lever 提交于 9月 15, 2016

Send an RDMA-CM private message on connect, and look for one during
a connection-established event.

Both sides can communicate their various implementation limits.
Implementations that don't support this sideband protocol ignore it.

Once the client knows the server's inline threshold maxima, it can
adjust the use of Reply chunks, and eliminate most use of Position
Zero Read chunks. Moderately-sized I/O can be done using a pure
inline RDMA Send instead of RDMA operations that require memory
registration.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

87cfb9a0

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功