Commits · b5f0afbea4f2ea52c613ac2b06cb6de2ea18cb6d · openeuler / Kernel

11 Feb, 2017 2 commits

xprtrdma: Per-connection pad optimization · b5f0afbe

Committed by Chuck Lever on Feb 08, 2017

Pad optimization is changed by echoing into
/proc/sys/sunrpc/rdma_pad_optimize. This is a global setting,
affecting all RPC-over-RDMA connections to all servers.

The marshaling code picks up that value and uses it for decisions
about how to construct each RPC-over-RDMA frame. Having it change
suddenly in mid-operation can result in unexpected failures. And
some servers a client mounts might need chunk round-up, while
others don't.

So instead, copy the pad_optimize setting into each connection's
rpcrdma_ia when the transport is created, and use the copy, which
can't change during the life of the connection.

This also removes a hack: rpcrdma_convert_iovs was using
the remote-invalidation-expected flag to predict when it could leave
out Write chunk padding. This is because the Linux server handles
implicit XDR padding on Write chunks correctly, and only Linux
servers can set the connection's remote-invalidation-expected flag.

It's more sensible to use the pad optimization setting instead.

Fixes: 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

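The per-connection copy described in b5f0afbe can be sketched as follows. This is a minimal illustration, not the kernel code: global_pad_optimize, struct conn_state, conn_init() and conn_needs_roundup() are hypothetical names, and the real copy lives in rpcrdma_ia, whose layout is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the global value behind
 * /proc/sys/sunrpc/rdma_pad_optimize. */
static bool global_pad_optimize = true;

/* Hypothetical per-connection state (the commit uses rpcrdma_ia). */
struct conn_state {
	bool pad_optimize;	/* snapshot taken when the transport is created */
};

/* Copy the global setting exactly once, at creation time. */
static void conn_init(struct conn_state *conn)
{
	assert(conn != NULL);
	conn->pad_optimize = global_pad_optimize;
}

/* Marshaling decisions consult only the per-connection copy, so a
 * later write to the global cannot change behavior in mid-operation. */
static bool conn_needs_roundup(const struct conn_state *conn)
{
	return !conn->pad_optimize;
}
```

Two connections created before and after a change to the global keep different settings, which is what lets one client mount servers with different padding needs.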

xprtrdma: Fix Read chunk padding · 24abdf1b

Committed by Chuck Lever on Feb 08, 2017

When pad optimization is disabled, rpcrdma_convert_iovs still
does not add explicit XDR round-up padding to a Read chunk.

Commit 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
incorrectly short-circuited the test for whether round-up padding
is needed that appears later in rpcrdma_convert_iovs.

However, if this is indeed a regular Read chunk (and not a
Position-Zero Read chunk), the tail iovec _always_ contains the
chunk's padding, and never anything else.

So, it's easy to just skip the tail when padding optimization is
enabled, and add the tail in a subsequent Read chunk segment, if
disabled.

Fixes: 677eb17e ("xprtrdma: Fix XDR tail buffer marshalling")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

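The tail handling described in 24abdf1b comes down to XDR's 4-byte alignment rule. A minimal sketch, assuming a regular Read chunk whose tail holds only padding; xdr_pad_len() and read_chunk_len() are hypothetical helpers, not functions from xprtrdma:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XDR items are padded to a multiple of 4 bytes; return the number
 * of pad bytes that follow a payload of the given length. */
static size_t xdr_pad_len(size_t len)
{
	return (4 - (len & 3)) & 3;
}

/* Hypothetical helper: total bytes covered by a regular Read chunk.
 * With pad optimization the tail (pure padding) is skipped; without
 * it the padding is added as one more segment of the chunk. */
static size_t read_chunk_len(size_t payload_len, bool pad_optimize)
{
	assert(payload_len <= SIZE_MAX - 3);	/* guard against wraparound */
	if (pad_optimize)
		return payload_len;
	return payload_len + xdr_pad_len(payload_len);
}
```

For a 5-byte payload the pad is 3 bytes, so the chunk covers 5 bytes with pad optimization enabled and 8 bytes with it disabled.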

10 Feb, 2017 4 commits

NFSv4: Set the connection timeout to match the lease period · 26ae102f

Committed by Trond Myklebust on Feb 08, 2017

Set the timeout for TCP connections to be one lease period to ensure
that we don't lose our lease due to a faulty TCP connection.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Allow changing of the TCP timeout parameters on the fly · 7196dbb0

Committed by Trond Myklebust on Feb 08, 2017

When the NFSv4 server tells us the lease period, we usually want
to adjust down the timeout parameters on the TCP connection to
ensure that we don't miss lease renewals due to a faulty connection.

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Refactor TCP socket timeout code into a helper function · 8d1b8c62

Committed by Trond Myklebust on Feb 08, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

SUNRPC: Remove unused function rpc_get_timeout() · d23bb113

Committed by Trond Myklebust on Feb 08, 2017

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

09 Feb, 2017 7 commits

sunrpc: use simple_read_from_buffer for reading cache flush · 8ccc8691

Committed by Kinglong Mee on Feb 07, 2017

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

sunrpc: record rpc client pointer in seq->private directly · 3f373e81

Committed by Kinglong Mee on Feb 07, 2017

The pos field in rpc_clnt_iter is unused; drop it and record clnt in
seq->private directly.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

sunrpc: update the comments of sunrpc proc path · 6489a8f4

Committed by Kinglong Mee on Feb 07, 2017

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

sunrpc: remove dead codes of cr_magic in rpc_cred · af4926e5

Committed by Kinglong Mee on Feb 07, 2017

No code uses cr_magic anywhere, so remove it.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

sunrpc: rename NFS_NGROUPS to UNX_NGROUPS for auth unix · 5786461b

Committed by Kinglong Mee on Feb 07, 2017

NFS_NGROUPS has been moved to sunrpc; rename it to UNX_NGROUPS.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

sunrpc/nfs: cleanup procfs/pipefs entry in cache_detail · 863d7d9c

Committed by Kinglong Mee on Feb 07, 2017

Recording the flush/channel/content entries is useless; remove them.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

sunrpc: error out if register_shrinker fail · 2864486b

Committed by Kinglong Mee on Feb 07, 2017

register_shrinker() may return an error when registration fails;
error out in that case.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

31 Jan, 2017 1 commit

SUNRPC: two small improvements to rpcauth shrinker. · 4c3ffd05

Committed by NeilBrown on Jan 06, 2017

1/ If we find an entry that is too young to be pruned,
return SHRINK_STOP to ensure we don't get called again.
This is more correct, and avoids wasting a little CPU time.
Prior to 3.12, it can prevent drop_slab() from spinning indefinitely.

2/ Return a precise number from rpcauth_cache_shrink_count(), rather than
rounding down to a multiple of 100 (or whatever sysctl_vfs_cache_pressure is).
This ensures that when we "echo 3 > /proc/sys/vm/drop_caches", this cache is
still purged, even if it has fewer than 100 entries.

Neither of these is really important; they just make behaviour
more predictable, which can be helpful when debugging related issues.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

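The rounding issue in point 2/ of 4c3ffd05 is plain integer arithmetic and can be sketched as follows. count_rounded() and count_precise() are hypothetical names that illustrate the two orders of operation; they are not claimed to be the literal kernel expressions.

```c
#include <assert.h>
#include <limits.h>

/* Divide first: the count is truncated to a multiple of 100 before
 * scaling, so a cache with fewer than 100 entries reports zero. */
static unsigned long count_rounded(unsigned long unused,
				   unsigned long pressure)
{
	return (unused / 100) * pressure;
}

/* Multiply first: small caches still report a non-zero count. */
static unsigned long count_precise(unsigned long unused,
				   unsigned long pressure)
{
	/* Guard against overflow in the intermediate product. */
	assert(pressure == 0 || unused <= ULONG_MAX / pressure);
	return unused * pressure / 100;
}
```

With the default sysctl_vfs_cache_pressure of 100, a cache holding 42 unused entries reports 0 under the first form and 42 under the second, which is why a small cache could previously survive a drop_caches request.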

25 Jan, 2017 1 commit

SUNRPC: cleanup ida information when removing sunrpc module · c929ea0b

Committed by Kinglong Mee on Jan 20, 2017

After removing the sunrpc module, kmemleak reports many leaks like this:
unreferenced object 0xffff88003316b1e0 (size 544):
  comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0
    [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0
    [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150
    [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180
    [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
    [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd]
    [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd]
    [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc]
    [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc]
    [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc]
    [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0
    [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0
    [<ffffffffb0390f1f>] vfs_write+0xef/0x240
    [<ffffffffb0392fbd>] SyS_write+0xad/0x130
    [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9
    [<ffffffffffffffff>] 0xffffffffffffffff

The ida information (dynamically allocated memory) is never cleaned up.

Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 2f048db4 ("SUNRPC: Add an identifier for struct rpc_clnt")
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

13 Jan, 2017 3 commits

svcrdma: avoid duplicate dma unmapping during error recovery · ce1ca7d2

Committed by Sriharsha Basavapatna on Jan 09, 2017

In rdma_read_chunk_frmr() when ib_post_send() fails, the error code path
invokes ib_dma_unmap_sg() to unmap the sg list. It then invokes
svc_rdma_put_frmr() which in turn tries to unmap the same sg list through
ib_dma_unmap_sg() again. This second unmap is invalid and could lead to
problems when the iova being unmapped is subsequently reused. Remove
the call to unmap in rdma_read_chunk_frmr() and let svc_rdma_put_frmr()
handle it.

Fixes: 412a15c0 ("svcrdma: Port to new memory registration API")
Cc: stable@vger.kernel.org
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

sunrpc: don't call sleeping functions from the notifier block callbacks · 546125d1

Committed by Scott Mayhew on Jan 05, 2017

The inet6addr_chain is an atomic notifier chain, so we can't call
anything that might sleep (like lock_sock)... instead of closing the
socket from svc_age_temp_xprts_now (which is called by the notifier
function), just have the rpc service threads do it instead.

Cc: stable@vger.kernel.org
Fixes: c3d4879e "sunrpc: Add a function to close..."
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

svcrpc: don't leak contexts on PROC_DESTROY · 78794d18

Committed by J. Bruce Fields on Jan 09, 2017

Context expiry times are in units of seconds since boot, not unix time.

The use of get_seconds() here therefore sets the expiry time decades in
the future.  This prevents timely freeing of contexts destroyed by
client RPC_GSS_PROC_DESTROY requests.  We'd still free them eventually
(when the module is unloaded or the container shut down), but a lot of
contexts could pile up before then.

Cc: stable@vger.kernel.org
Fixes: c5b29f88 "sunrpc: use seconds since boot in expiry cache"
Reported-by: Andy Adamson <andros@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

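The clock mismatch described in 78794d18 can be illustrated with a small sketch. struct ctx_entry, ctx_expired() and years_until_expiry() are hypothetical names, and the uptime and wall-clock values used with them are made-up inputs rather than measurements.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECS_PER_YEAR (365LL * 24 * 60 * 60)

/* Hypothetical cache entry; expiry is in seconds since boot. */
struct ctx_entry {
	int64_t expiry;
};

/* The cache compares expiry against the seconds-since-boot clock. */
static bool ctx_expired(const struct ctx_entry *e, int64_t now_boot)
{
	assert(e != NULL);
	return now_boot >= e->expiry;
}

/* Whole years until the entry becomes eligible for freeing. */
static int64_t years_until_expiry(const struct ctx_entry *e,
				  int64_t now_boot)
{
	assert(e != NULL);
	return (e->expiry - now_boot) / SECS_PER_YEAR;
}
```

If an entry is stamped with a Unix time from January 2017 (about 1.48 billion seconds) on a machine with one hour of uptime, it is not due for roughly 47 years; stamping it with the seconds-since-boot value makes it expire immediately, as intended for a destroyed context.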

26 Dec, 2016 1 commit

ktime: Get rid of the union · 2456e855

Committed by Thomas Gleixner on Dec 25, 2016

ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64-bit machines and in an endianness-optimized timespec
variant for 32-bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
became completely pointless.

Get rid of the union and just keep ktime_t as a simple typedef of type s64.

The conversion was done with coccinelle and some manual mopping up.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>

25 Dec, 2016 1 commit

Replace <asm/uaccess.h> with <linux/uaccess.h> globally · 7c0f6ba6

Committed by Linus Torvalds on Dec 24, 2016

This was entirely automated, using the script by Al:

  PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
  sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
        $(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

10 Dec, 2016 1 commit

SUNRPC: fix refcounting problems with auth_gss messages. · 1cded9d2

Committed by NeilBrown on Dec 05, 2016

There are two problems with refcounting of auth_gss messages.

First, the reference on the pipe->pipe list (taken by a call
to rpc_queue_upcall()) is not counted.  It seems to be
assumed that a message in pipe->pipe will always also be in
pipe->in_downcall, where it is correctly reference counted.

However, there is no guarantee of this.  I have a report of a
NULL dereference in rpc_pipe_read() which suggests a msg
that has been freed is still on the pipe->pipe list.

One way I imagine this might happen is:
- message is queued for uid=U and auth->service=S1
- rpc.gssd reads this message and starts processing.
  This removes the message from pipe->pipe
- message is queued for uid=U and auth->service=S2
- rpc.gssd replies to the first message. gss_pipe_downcall()
  calls __gss_find_upcall(pipe, U, NULL) and it finds the
  *second* message, as new messages are placed at the head
  of ->in_downcall, and the service type is not checked.
- This second message is removed from ->in_downcall and freed
  by gss_release_msg() (even though it is still on pipe->pipe)
- rpc.gssd tries to read another message, and dereferences a pointer
  to this message that has just been freed.

I fix this by incrementing the reference count before calling
rpc_queue_upcall(), and decrementing it if that fails, or normally in
gss_pipe_destroy_msg().

It seems strange that the reply doesn't target the message more
precisely, but I don't know all the details.  In any case, I think the
reference counting irregularity became a measurable bug when the
extra arg was added to __gss_find_upcall(), hence the Fixes: line
below.

The second problem is that if rpc_queue_upcall() fails, the new
message is not freed. gss_alloc_msg() set the ->count to 1,
gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
then the pointer is discarded so the memory never gets freed.

Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
Cc: stable@vger.kernel.org
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

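The reference accounting described in 1cded9d2 can be traced with a toy model. This is a sketch under stated assumptions: struct msg and the msg_*() helpers are hypothetical stand-ins for the gss_*() functions named in the message, and a plain int replaces the kernel's atomic counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy message: count mirrors the ->count values quoted in the commit. */
struct msg {
	int count;
	bool hashed;	/* on the in_downcall list */
	bool freed;
};

static void msg_init(struct msg *m)
{
	m->count = 1;		/* allocation returns one reference */
	m->hashed = false;
	m->freed = false;
}

static void msg_get(struct msg *m)
{
	assert(!m->freed);
	m->count++;
}

static void msg_put(struct msg *m)
{
	assert(m->count > 0);
	if (--m->count == 0)
		m->freed = true;
}

static void msg_hash(struct msg *m)
{
	m->hashed = true;
	msg_get(m);
}

static void msg_unhash(struct msg *m)
{
	if (m->hashed) {
		m->hashed = false;
		msg_put(m);
	}
}

/* queue_ok simulates whether queueing the upcall succeeds. */
static bool msg_send(struct msg *m, bool queue_ok)
{
	msg_hash(m);		/* count: 2 */
	msg_get(m);		/* reference held by the pipe queue: 3 */
	if (!queue_ok) {
		msg_put(m);	/* the queue never took the message */
		msg_unhash(m);
		msg_put(m);	/* drop the caller's reference: freed */
		return false;
	}
	return true;
}
```

On the success path the queue's own reference keeps the message alive even after the downcall side unhashes it and the caller drops its reference; on the failure path every reference is dropped, so nothing leaks.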

07 Dec, 2016 1 commit

sunrpc: use DEFINE_SPINLOCK() · 3eb15f28

Committed by Fabian Frederick on Dec 04, 2016

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

02 Dec, 2016 1 commit

sunrpc: Don't engage exponential backoff when connection attempt is rejected. · 2c2ee6d2

Committed by NeilBrown on Nov 23, 2016

xs_connect() contains an exponential backoff mechanism so that repeated
connection attempts are delayed by longer and longer amounts.

This is appropriate when the connection failed due to a timeout, but
it is not appropriate when a definitive "no" answer is received.  In such
cases, call_connect_status() imposes a minimum 3-second back-off, so
not having the exponential back-off will never result in immediate
retries.

The current situation is a problem when the NFS server tries to
register with rpcbind but rpcbind isn't running.  All connection
attempts are made on the same "xprt" and as the connection is never
"closed", the exponential back-off delays successive attempts to register,
or de-register, different protocols.  This results in a multi-minute
delay with no benefit.

So, when call_connect_status() receives a definitive "no", use
xprt_conditional_disconnect() to cancel the previous connection attempt.
This will set XPRT_CLOSE_WAIT so that xprt->ops->close() calls xs_close()
which resets the reestablish_timeout.

To ensure xprt_conditional_disconnect() does the right thing, we
ensure that rq_connect_cookie is set before a connection attempt, and
allow xprt_conditional_disconnect() to complete even when the
transport is not fully connected.

Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

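The behaviour 2c2ee6d2 aims for can be sketched as a small state update: keep doubling the delay after timeouts, but start over after a definitive refusal. All names here are hypothetical, the timeout values are assumptions chosen for illustration, and the real code reaches the reset indirectly through xprt_conditional_disconnect() and xs_close().

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values in seconds, for illustration only. */
#define INIT_REEST_SECS 3U
#define MAX_REEST_SECS  300U

enum conn_result { CONN_OK, CONN_TIMEOUT, CONN_REFUSED };

/* Hypothetical transport state holding the next reconnect delay. */
struct xprt_sketch {
	unsigned int reestablish_timeout;
};

static void on_connect_result(struct xprt_sketch *x, enum conn_result r)
{
	assert(x != NULL);
	switch (r) {
	case CONN_TIMEOUT:
		/* No answer: back off exponentially, up to a cap. */
		x->reestablish_timeout *= 2;
		if (x->reestablish_timeout > MAX_REEST_SECS)
			x->reestablish_timeout = MAX_REEST_SECS;
		break;
	case CONN_REFUSED:
	case CONN_OK:
		/* A definitive answer: the next attempt starts afresh. */
		x->reestablish_timeout = INIT_REEST_SECS;
		break;
	}
}
```

Under this model a run of refusals, such as registering several protocols with an rpcbind that is not running, never accumulates a multi-minute delay, while repeated timeouts still back off.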

01 Dec, 2016 10 commits

svcrdma: Further clean-up of svc_rdma_get_inv_rkey() · fafedf81