Commits · 31303d6cbb24ba94e8b82170213bd2fde6365d9a · openeuler / Kernel

08 Oct, 2015 5 commits

SUNRPC: Use MSG_SENDPAGE_NOTLAST in xs_send_pagedata() · 31303d6c

Committed by Trond Myklebust on Oct 06, 2015

If we're sending more than one page via kernel_sendpage(), then set
MSG_SENDPAGE_NOTLAST between the pages so that we don't send suboptimal
frames (see commit 2f533844 and commit 35f9c09f).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
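
A minimal sketch of the per-page flag choice described in this commit. The helper name and the flag value are illustrative assumptions (the real flag is defined in the kernel's socket headers); only the rule itself, marking every page except the final one as "not last", comes from the commit message.

```c
#include <stddef.h>

/* Placeholder bit for illustration only; not the kernel's actual value. */
#define SKETCH_MSG_SENDPAGE_NOTLAST 0x1

/*
 * Hypothetical helper: return the extra send flags for page `index`
 * out of `npages` pages. Every page except the final one is marked
 * "not last" so the stack can keep coalescing data into full frames.
 */
static int sketch_sendpage_flags(size_t index, size_t npages)
{
    if (index + 1 < npages)
        return SKETCH_MSG_SENDPAGE_NOTLAST;
    return 0;
}
```

A caller looping over the pages would OR this value into the flags it passes for each page.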

SUNRPC: Move AF_LOCAL receive data path into a workqueue context · a2648094

Committed by Trond Myklebust on Oct 06, 2015

Now that we've done it for TCP and UDP, let's convert AF_LOCAL as well.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Move UDP receive data path into a workqueue context · f9b2ee71

Committed by Trond Myklebust on Oct 06, 2015

Now that we've done it for TCP, let's convert UDP as well.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Move TCP receive data path into a workqueue context · edc1b01c

Committed by Trond Myklebust on Oct 05, 2015

Stream protocols such as TCP can often build up a backlog of data to be
read due to ordering. Combine this with the fact that some workloads such
as NFS read()-intensive workloads need to receive a lot of data per RPC
call, and it turns out that receiving the data from inside a softirq
context can cause starvation.

The following patch moves the TCP data receive into a workqueue context.
We still end up calling tcp_read_sock(), but we do so from a process
context, meaning that softirqs are enabled for most of the time.

With this patch, I see a doubling of read bandwidth when running a
multi-threaded iozone workload between a virtual client and server setup.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Refactor TCP receive · 66d7a56a

Committed by Trond Myklebust on Oct 05, 2015

Move the TCP data receive loop out of xs_tcp_data_ready(). Doing so
will allow us to move the data receive out of the softirq context in
a set of followup patches.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

28 Sep, 2015 1 commit

xprtrdma: disconnect and flush cqs before freeing buffers · 72c02173

Committed by Steve Wise on Sep 21, 2015

Otherwise a FRMR completion can cause a touch-after-free crash.

In xprt_rdma_destroy(), call rpcrdma_buffer_destroy() only after calling
rpcrdma_ep_destroy().

In rpcrdma_ep_destroy(), disconnect the cm_id first which should flush the
qp, then drain the cqs, then destroy the qp, and finally destroy the cqs.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Tested-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

25 Sep, 2015 1 commit

xprtrdma: Replace global lkey with lkey local to PD · bb6c96d7

Committed by Chuck Lever on Sep 24, 2015

The core API has changed so that devices that do not have a global
DMA lkey automatically create an mr, per-PD, and make that lkey
available. The global DMA lkey interface is going away in favor of
the per-PD DMA lkey.

The per-PD DMA lkey is always available. Convert xprtrdma to use the
device's per-PD DMA lkey for regbufs, no matter which memory
registration scheme is in use.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: linux-nfs <linux-nfs@vger.kernel.org>
Acked-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

23 Sep, 2015 1 commit

userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" · ac5be6b4

Committed by Andrea Arcangeli on Sep 22, 2015

This reverts commit 51360155 and adapts
fs/userfaultfd.c to use the old version of that function.

It didn't look robust to call __wake_up_common with "nr == 1" when we
absolutely require wakeall semantics, but we've full control of what we
insert in the two waitqueue heads of the blocked userfaults.  No
exclusive waitqueue risks to be inserted into those two waitqueue heads
so we can as well stick to "nr == 1" of the old code and we can rely
purely on the fact no waitqueue inserted in one of the two waitqueue
heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set.
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
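
The reasoning in this commit message can be sketched as a small user-space model of a wake-up loop. The struct, function, and constant names are hypothetical stand-ins, not the kernel's definitions. The model assumes that only waiters flagged as exclusive consume the `nr` budget, which is why `nr == 1` still wakes every waiter when none of them is exclusive.

```c
#include <stddef.h>

/* Illustrative stand-in for the kernel's WQ_FLAG_EXCLUSIVE bit. */
#define SKETCH_WQ_FLAG_EXCLUSIVE 0x01

struct sketch_waiter {
    unsigned int flags;
    int woken;
};

/*
 * Wake waiters in queue order and return how many were woken.
 * Only exclusive waiters count against nr_exclusive; the walk stops
 * once that budget reaches zero. A starting budget of 0 never hits
 * zero after a decrement, so it wakes everyone.
 */
static size_t sketch_wake_up(struct sketch_waiter *w, size_t n,
                             int nr_exclusive)
{
    size_t woken = 0;

    for (size_t i = 0; i < n; i++) {
        w[i].woken = 1;
        woken++;
        if ((w[i].flags & SKETCH_WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
            break;
    }
    return woken;
}
```

With three non-exclusive waiters and a budget of 1, all three are woken; placing one exclusive waiter at the head of the queue stops the walk after the first entry.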

20 Sep, 2015 2 commits

SUNRPC: xs_sock_mark_closed() does not need to trigger socket autoclose · 4b0ab51d

Committed by Trond Myklebust on Sep 18, 2015

Under all conditions, it should be quite sufficient just to mark
the socket as disconnected. It will then be closed by the
transport shutdown or reconnect code.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Lock the transport layer on shutdown · 79234c3d

Committed by Trond Myklebust on Sep 18, 2015

Avoid all races with the connect/disconnect handlers by taking the
transport lock.
Reported-by: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

18 Sep, 2015 3 commits

SUNRPC: Ensure that we wait for connections to complete before retrying · 0fdea1e8

Committed by Trond Myklebust on Sep 16, 2015

Commit 718ba5b8 moved the responsibility for unlocking the socket to
xs_tcp_setup_socket, meaning that the socket will be unlocked before we
know that it has finished trying to connect. The following patch is based on
an initial patch by Russell King to ensure that we delay clearing the
XPRT_CONNECTING flag until we either know that we failed to initiate
a connection attempt, or the connection attempt itself failed.

Fixes: 718ba5b8 ("SUNRPC: Add helpers to prevent socket create from racing")
Reported-by: Russell King <linux@arm.linux.org.uk>
Reported-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tested-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: drop null test before destroy functions · 17a9618e

Committed by Julia Lawall on Sep 13, 2015

Remove unneeded NULL test.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@ expression x; @@
-if (x != NULL)
  \(kmem_cache_destroy\|mempool_destroy\|dma_pool_destroy\)(x);
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
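
The semantic patch only makes sense when the destroy function itself tolerates NULL. A user-space sketch of that convention follows; `sketch_pool`, `sketch_pool_create()`, and `sketch_pool_destroy()` are hypothetical names for illustration, not kernel APIs.

```c
#include <stdlib.h>

/* Hypothetical pool type used only for this illustration. */
struct sketch_pool {
    void *slots;
};

/* Allocate a pool with `bytes` of backing storage, or return NULL. */
static struct sketch_pool *sketch_pool_create(size_t bytes)
{
    struct sketch_pool *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    p->slots = malloc(bytes);
    if (!p->slots) {
        free(p);
        return NULL;
    }
    return p;
}

/* The destructor treats NULL as a no-op, so callers need no test. */
static void sketch_pool_destroy(struct sketch_pool *p)
{
    if (!p)
        return;
    free(p->slots);
    free(p);
}
```

With the check inside the destructor, a caller's `if (x != NULL)` guard is redundant, which is the pattern the semantic patch removes.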

SUNRPC: Fix races between socket connection and destroy code · 03c78827

Committed by Trond Myklebust on Sep 17, 2015

When we're destroying the socket transport, we need to ensure that
we cancel any existing delayed connection attempts, and order them
w.r.t. the call to xs_close().
Reported-by: "Suzuki K. Poulose" <suzuki.poulose@arm.com>
Acked-by: Jeff Layton <jlayton@poochiereds.net>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

05 Sep, 2015 1 commit

userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key · 51360155

Committed by Andrea Arcangeli on Sep 04, 2015

userfaultfd needs to wake all waitqueues (pass 0 as nr parameter), instead
of the current hardcoded 1 (that would wake just the first waitqueue in
the head list).
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
Cc: zhang.zhanghailiang@huawei.com
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andres Lagar-Cavilla <andreslc@google.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Peter Feiner <pfeiner@google.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31 Aug, 2015 3 commits

IB/core: Make ib_dealloc_pd return void · 7dd78647

Committed by Jason Gunthorpe on Aug 05, 2015

The majority of callers never check the return value, and even if they
did, they can't do anything about a failure.

All possible failure cases represent a bug in the caller, so just
WARN_ON inside the function instead.

This fixes a few random errors:
 net/rd/iw.c infinite loops while it fails. (racing with EBUSY?)

This also lays the ground work to get rid of error return from the
drivers. Most drivers do not error, the few that do are broken since
it cannot be handled.

Since uverbs can legitimately make use of EBUSY, open code the
check.
Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

svcrdma: limit FRMR page list lengths to device max · 9ac07501

Committed by Steve Wise on Aug 07, 2015

Svcrdma was incorrectly allocating fastreg MRs and page lists using
RPCSVC_MAXPAGES, which can exceed the device capabilities. So limit
the depth to the minimum of RPCSVC_MAXPAGES and xprt->sc_frmr_pg_list_len.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
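
The clamp described above amounts to taking a minimum. In this sketch the function and parameter names are hypothetical, and the two limits are passed in as plain numbers rather than read from kernel structures.

```c
/*
 * Hypothetical helper: choose the FRMR page-list depth as the smaller
 * of the RPC server's page limit and the limit reported by the device.
 */
static unsigned int sketch_frmr_depth(unsigned int rpcsvc_maxpages,
                                      unsigned int dev_pg_list_len)
{
    return rpcsvc_maxpages < dev_pg_list_len ? rpcsvc_maxpages
                                             : dev_pg_list_len;
}
```

For a device that supports fewer pages than the server limit, the device's value wins, so the allocation stays within what the hardware can handle.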

xprtrdma, svcrdma: Convert to ib_alloc_mr · 0410e38e

Committed by Sagi Grimberg on Jul 30, 2015

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

30 Aug, 2015 2 commits

SUNRPC: Prevent SYN+SYNACK+RST storms · 09939204

Committed by Trond Myklebust on Aug 29, 2015

Add a shutdown() call before we release the socket in order to ensure the
reset is sent before we try to reconnect.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: xs_reset_transport must mark the connection as disconnected · 0c78789e

Committed by Trond Myklebust on Aug 29, 2015

In case the reconnection attempt fails.

Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

29 Aug, 2015 1 commit

svcrdma: Use max_sge_rd for destination read depths · bc3fe2e3

Committed by Steve Wise on Jul 27, 2015

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

20 Aug, 2015 1 commit

SUNRPC: Allow sockets to do GFP_NOIO allocations · c2126157

Committed by Trond Myklebust on Aug 19, 2015

Follow up to commit c4a7ca77 ("SUNRPC: Allow waiting on memory
allocation"). Allows the RPC socket code to do non-IO blocking.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

18 Aug, 2015 1 commit

SUNRPC: Fix a thinko in xs_connect() · 99b1a4c3

Committed by Trond Myklebust on Aug 13, 2015

It is rather pointless to test the value of transport->inet after
calling xs_reset_transport(), since it will always be zero, and
so we will never see any exponential back off behaviour.
Also don't force early connections for SOFTCONN tasks. If the server
disconnects us, we should respect the exponential backoff.

Cc: stable@vger.kernel.org # 4.0+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
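
For readers unfamiliar with the term, exponential backoff means each reconnect attempt waits twice as long as the previous one, up to a cap. The sketch below is generic; the function name, the parameters, and the assumption that `init` is no larger than `max` are all illustrative and not taken from the SUNRPC code.

```c
/*
 * Hypothetical helper: compute the next reconnect delay.
 * A current delay of 0 means "no delay yet" and yields `init`.
 * Otherwise the delay doubles, bounded below by `init` and above
 * by `max`. The early return avoids overflow when doubling.
 */
static unsigned long sketch_next_delay(unsigned long cur,
                                       unsigned long init,
                                       unsigned long max)
{
    unsigned long next;

    if (cur == 0)
        return init;
    if (cur > max / 2)
        return max;
    next = cur * 2;
    if (next < init)
        next = init;
    return next;
}
```

If the state that drives this calculation is reset before it is consulted, every attempt starts again from the initial delay, which is the kind of behaviour the commit message describes as never seeing any backoff.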

13 Aug, 2015 4 commits

sunrpc: Switch to using hash list instead single list · 129e5824

Committed by Kinglong Mee on Jul 27, 2015

Switch to using list_head for cache_head in cache_detail;
this makes it possible to remove a cache_head entry directly from cache_detail.

v8, using hash list, not head list
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc/nfsd: Remove redundant code by exports seq_operations functions · c8c081b7

Committed by Kinglong Mee on Jul 27, 2015

Nfsd has implemented the same set of seq_operations functions as sunrpc's cache.
Just export sunrpc's code, and remove nfsd's redundant code.

v8, same as v6
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: Store cache_detail in seq_file's private directly · 9936f2ae

Committed by Kinglong Mee on Jul 27, 2015

Cleanup.

Just store cache_detail in seq_file's private,
an allocated handle is redundant.

v8, same as v6.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: increase UNX_MAXNODENAME from 32 to __NEW_UTS_LEN bytes · 24a9a961

Committed by Jeff Layton on Aug 03, 2015

The current limit of 32 bytes artificially limits the name string that
we end up stuffing into NFSv4.x client ID blobs. If you have multiple
hosts with long hostnames that only differ near the end, then this can
cause NFSv4 client ID collisions.

Linux nodenames are actually limited to __NEW_UTS_LEN bytes (64), so use
that as the limit instead. Also, use XDR_QUADLEN to specify the slack
length, just for clarity and in case someone in the future changes this
to something not evenly divisible by 4.
Reported-by: Michael Skralivetsky <michael.skralivetsky@primarydata.com>
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
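
Two points from this commit message can be checked with a small user-space sketch: how a 32-byte limit makes long hostnames indistinguishable, and how a byte length rounds up to 4-byte XDR words. The macro mirrors the usual round-up-to-quad arithmetic; the helper and the hostnames are hypothetical, and comparing prefixes with `strncmp()` is only a model of truncation, not the kernel's code path.

```c
#include <stddef.h>
#include <string.h>

/* Number of 4-byte XDR words needed to hold `l` bytes (rounds up). */
#define SKETCH_XDR_QUADLEN(l) (((l) + 3) >> 2)

/*
 * Hypothetical helper: return 1 if two node names become identical
 * once only their first `limit` bytes are kept, 0 otherwise.
 */
static int sketch_names_collide(const char *a, const char *b, size_t limit)
{
    return strncmp(a, b, limit) == 0;
}
```

Two 52-byte hostnames that differ only at byte 40 collide under a 32-byte limit but stay distinct under a 64-byte limit. A 64-byte name needs 16 XDR words, and the round-up keeps the slack calculation correct even for a length such as 65, which needs 17.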

11 Aug, 2015 7 commits

nfsd/sunrpc: factor svc_rqst allocation and freeing from sv_nrthreads refcounting · 1b6dc1df

Committed by Jeff Layton on Jun 08, 2015

In later patches, we'll want to be able to allocate and free svc_rqst
structures without monkeying with the serv->sv_nrthreads refcount.

Factor those pieces out of their respective functions.
Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd/sunrpc: move pool_mode definitions into svc.h · d70bc0c6

Committed by Jeff Layton on Jun 08, 2015

In later patches, we're going to need to allow code external to svc.c
to figure out what pool_mode is in use. Move these definitions into
svc.h to prepare for that.

Also, make the svc_pool_map object available and exported so that other
modules can peek in there to get insight into what pool mode is in use.
Likewise, export svc_pool_map_get/put function to make it safe to do so.
Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd/sunrpc: turn enqueueing a svc_xprt into a svc_serv operation · b9e13cdf

Committed by Jeff Layton on Jun 08, 2015

For now, all services use svc_xprt_do_enqueue, but once we add
workqueue-based service support, we'll need to do something different.
Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd/sunrpc: move sv_module parm into sv_ops · 758f62ff

Committed by Jeff Layton on Jun 08, 2015

...not technically an operation, but it's more convenient and cleaner
to pass the module pointer in this struct.
Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd/sunrpc: move sv_function into sv_ops · c369014f

Committed by Jeff Layton on Jun 08, 2015

Since we now have a container for holding svc_serv operations, move the
sv_function into it as well.
Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd/sunrpc: add a new svc_serv_ops struct and move sv_shutdown into it · ea126e74

Committed by Jeff Layton on Jun 08, 2015

In later patches we'll need to abstract out more operations on a
per-service level, besides sv_shutdown and sv_function.

Declare a new svc_serv_ops struct to hold these operations, and move
sv_shutdown into this struct.
Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Tested-by: Shirley Ma <shirley.ma@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

svcrdma: Change maximum server payload back to RPCSVC_MAXPAYLOAD · cc9a903d

Committed by Chuck Lever on Aug 07, 2015

Both commit 0380a3f3 ("svcrdma: Add a separate "max data segs"
macro for svcrdma") and commit 7e5be288 ("svcrdma: advertise
the correct max payload") are incorrect. This commit reverts both
changes, restoring the server's maximum payload size to 1MB.

Commit 7e5be288 based the server's maximum payload on the
_client's_ RPCRDMA_MAX_DATA_SEGS value. That was wrong.

Commit 0380a3f3 tried to fix this so that the client maximum
payload size could be raised without affecting the server, but
managed to confuse matters more on the server side.

More importantly, limiting the advertised maximum payload size was
meant to be a workaround, not the actual fix. We need to revisit

  https://bugzilla.linux-nfs.org/show_bug.cgi?id=270

A Linux client on a platform with 64KB pages can overrun and crash
an x86_64 NFS/RDMA server when the r/wsize is 1MB. An x86/64 Linux
client seems to work fine using 1MB reads and writes when the Linux
server's maximum payload size is restored to 1MB.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=270
Fixes: 0380a3f3 ("svcrdma: Add a separate "max data segs" macro")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

06 Aug, 2015 7 commits

xprtrdma: take HCA driver refcount at client · d0f36c46

Committed by Devesh Sharma on Aug 03, 2015

This is a rework of the following patch sent almost a year back:
http://www.mail-archive.com/linux-rdma%40vger.kernel.org/msg20730.html

In the presence of an active mount, if someone tries to rmmod the vendor
driver, the command remains stuck forever waiting for destruction of all
rdma-cm-ids. In the worst case, the client can crash during shutdown with
active mounts.

The existing code assumes that ia->ri_id->device cannot change during
the lifetime of a transport. xprtrdma does not have support for the
DEVICE_REMOVAL event either. Lifting that assumption and adding support
for the DEVICE_REMOVAL event is a long chain of work, and is planned.

The community decided that preventing the hang right now is more
important than waiting for architectural changes.

Thus, this patch introduces a temporary workaround that acquires the HCA
driver module reference count during the mount of an nfs-rdma mount point.
Signed-off-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Count RDMA_NOMSG type calls · 860477d1

Committed by Chuck Lever on Aug 03, 2015

RDMA_NOMSG type calls are less efficient than RDMA_MSG. Count NOMSG
calls so administrators can tell if they happen to be used more than
expected.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Clean up xprt_rdma_print_stats() · 763f7e4e

Committed by Chuck Lever on Aug 03, 2015

checkpatch.pl complained about the seq_printf() format string split
across lines and the use of %Lu.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Fix large NFS SYMLINK calls · 2fcc213a

Committed by Chuck Lever on Aug 03, 2015

Repair how rpcrdma_marshal_req() chooses which RDMA message type
to use for large non-WRITE operations so that it picks RDMA_NOMSG
in the correct situations, and sets up the marshaling logic to
SEND only the RPC/RDMA header.

Large NFSv2 SYMLINK requests now use RDMA_NOMSG calls. The Linux NFS
server XDR decoder for NFSv2 SYMLINK does not handle having the
pathname argument arrive in a separate buffer. The decoder could be
fixed, but this is simpler and RDMA_NOMSG can be used in a variety
of other situations.

Ensure that the Linux client continues to use "RDMA_MSG + read
list" when sending large NFSv3 SYMLINK requests, which is more
efficient than using RDMA_NOMSG.

Large NFSv4 CREATE(NF4LNK) requests are changed to use "RDMA_MSG +
read list" just like NFSv3 (see Section 5 of RFC 5667). Before,
these did not work at all.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Fix XDR tail buffer marshalling · 677eb17e

Committed by Chuck Lever on Aug 03, 2015

Currently xprtrdma appends an extra chunk element to the RPC/RDMA
read chunk list of each NFSv4 WRITE compound. The extra element
contains the final GETATTR operation in the compound.

The result is an extra RDMA READ operation to transfer a very short
piece of each NFS WRITE compound (typically 16 bytes). This is
inefficient.

It is also incorrect.

The client is sending the trailing GETATTR at the same Position as
the preceding WRITE data payload. Whether or not RFC 5667 allows
the GETATTR to appear in a read chunk, RFC 5666 requires that these
two separate RPC arguments appear at two distinct Positions.

It can also be argued that the GETATTR operation is not bulk data,
and therefore RFC 5667 forbids its appearance in a read chunk at
all.

Although RFC 5667 is not precise about when using a read list with
NFSv4 COMPOUND is allowed, the intent is that only data arguments
not touched by NFS (ie, read and write payloads) are to be sent
using RDMA READ or WRITE.

The NFS client constructs GETATTR arguments itself, and therefore is
required to send the trailing GETATTR operation as additional inline
content, not as a data payload.

NB: This change is not backwards compatible. Some older servers do
not accept inline content following the read list. The Linux NFS
server should handle this content correctly as of commit
a97c331f ("svcrdma: Handle additional inline content").
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Don't provide a reply chunk when expecting a short reply · 33943b29

Committed by Chuck Lever on Aug 03, 2015

Currently Linux always offers a reply chunk, even when the reply
can be sent inline (ie. is smaller than 1KB).

On the client, registering a memory region can be expensive. A
server may choose not to use the reply chunk, wasting the cost of
the registration.

This is a change only for RPC replies smaller than 1KB which the
server constructs in the RPC reply send buffer. Because the elements
of the reply must be XDR encoded, a copy-free data transfer has no
benefit in this case.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

xprtrdma: Always provide a write list when sending NFS READ · 02eb57d8

Committed by Chuck Lever on Aug 03, 2015

The client has been setting up a reply chunk for NFS READs that are
smaller than the inline threshold. This is not efficient: both the
server and client CPUs have to copy the reply's data payload into
and out of the memory region that is then transferred via RDMA.

Using the write list, the data payload is moved by the device and no
extra data copying is necessary.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Tested-by: Devesh Sharma <devesh.sharma@avagotech.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
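
Taken together, this commit and the short-reply commit listed just above describe a three-way choice for how the client expects the reply to arrive. The sketch below models that choice in user space. The enum, the function, and the parameters are hypothetical, and treating a reply exactly at the threshold as inline is an assumption, since the commit messages only say "smaller than".

```c
#include <stddef.h>

/* Hypothetical labels for the three reply strategies. */
enum sketch_reply_type {
    SKETCH_REPLY_INLINE,     /* no memory registered for the reply */
    SKETCH_REPLY_WRITE_LIST, /* device places the READ payload directly */
    SKETCH_REPLY_CHUNK,      /* whole reply returned via a reply chunk */
};

/*
 * NFS READ always gets a write list, so the payload is moved by the
 * device without extra copies. Any other call whose expected reply
 * fits in the inline threshold registers nothing. Everything else
 * falls back to a reply chunk.
 */
static enum sketch_reply_type
sketch_choose_reply(int is_nfs_read, size_t expected_reply_len,
                    size_t inline_threshold)
{
    if (is_nfs_read)
        return SKETCH_REPLY_WRITE_LIST;
    if (expected_reply_len <= inline_threshold)
        return SKETCH_REPLY_INLINE;
    return SKETCH_REPLY_CHUNK;
}
```

Checking for READ first matters: a small READ would otherwise be classified as inline, while the commit asks for a write list in every READ case.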

openeuler / Kernel synced successfully nearly 2 years ago
