提交 · 0afa6b4412988019db14c6bfb8c6cbdf120ca9ad · openeuler / Kernel

09 2月, 2018 1 次提交

SUNRPC: Don't call __UDPX_INC_STATS() from a preemptible context · 0afa6b44

由 Trond Myklebust 提交于 2月 09, 2018

Calling __UDPX_INC_STATS() from a preemptible context leads to a
warning of the form:

 BUG: using __this_cpu_add() in preemptible [00000000] code: kworker/u5:0/31
 caller is xs_udp_data_receive_workfn+0x194/0x270
 CPU: 1 PID: 31 Comm: kworker/u5:0 Not tainted 4.15.0-rc8-00076-g90ea9f1b #2
 Workqueue: xprtiod xs_udp_data_receive_workfn
 Call Trace:
  dump_stack+0x85/0xc1
  check_preemption_disabled+0xce/0xe0
  xs_udp_data_receive_workfn+0x194/0x270
  process_one_work+0x318/0x620
  worker_thread+0x20a/0x390
  ? process_one_work+0x620/0x620
  kthread+0x120/0x130
  ? __kthread_bind_mask+0x60/0x60
  ret_from_fork+0x24/0x30

Since we're taking a spinlock in those functions anyway, let's fix the
issue by moving the call so that it occurs under the spinlock.
Reported-by: Nkernel test robot <fengguang.wu@intel.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

0afa6b44

06 2月, 2018 1 次提交

SUNRPC: Ensure we always close the socket after a connection shuts down · 9b30889c

由 Trond Myklebust 提交于 2月 05, 2018

Ensure that we release the TCP socket once it is in the TCP_CLOSE or
TCP_TIME_WAIT state (and only then) so that we don't confuse rkhunter
and its ilk.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

9b30889c

15 1月, 2018 2 次提交

SUNRPC: Add explicit rescheduling points in the receive path · 0af3442a

由 Trond Myklebust 提交于 1月 14, 2018

When reading the reply from the server, insert an explicit
cond_resched() to avoid starving higher priority tasks.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

0af3442a

SUNRPC: Chunk reading of replies from the server · 3d188805

由 Trond Myklebust 提交于 1月 14, 2018

Read the TCP data in chunks of max 2MB so that we do not hog the
socket lock.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3d188805

01 12月, 2017 1 次提交

SUNRPC: Handle ENETDOWN errors · eb5b46fa

由 Trond Myklebust 提交于 11月 30, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

eb5b46fa

30 11月, 2017 1 次提交

SUNRPC: Allow connect to return EHOSTUNREACH · 4ba161a7

由 Trond Myklebust 提交于 11月 24, 2017

Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Tested-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

4ba161a7

18 11月, 2017 1 次提交

net: sunrpc: mark expected switch fall-throughs · e9d47639

由 Gustavo A. R. Silva 提交于 10月 20, 2017

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

e9d47639

02 11月, 2017 1 次提交

License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318

由 Greg Kroah-Hartman 提交于 11月 01, 2017

Many source files in the tree are missing licensing information, which
makes it harder for compliance tools to determine the correct license.

By default all files without license information are under the default
license of the kernel, which is GPL version 2.

Update the files which contain no license information with the 'GPL-2.0'
SPDX license identifier.  The SPDX identifier is a legally binding
shorthand, which can be used instead of the full boiler plate text.

This patch is based on work done by Thomas Gleixner and Kate Stewart and
Philippe Ombredanne.

How this work was done:

Patches were generated and checked against linux-4.14-rc6 for a subset of
the use cases:
 - file had no licensing information it it.
 - file was a */uapi/* one with no licensing information in it,
 - file was a */uapi/* one with existing licensing information,

Further patches will be generated in subsequent months to fix up cases
where non-standard license headers were used, and references to license
had to be inferred by heuristics based on keywords.

The analysis to determine which SPDX License Identifier to be applied to
a file was done in a spreadsheet of side by side results from of the
output of two independent scanners (ScanCode & Windriver) producing SPDX
tag:value files created by Philippe Ombredanne.  Philippe prepared the
base worksheet, and did an initial spot review of a few 1000 files.

The 4.13 kernel was the starting point of the analysis with 60,537 files
assessed.  Kate Stewart did a file by file comparison of the scanner
results in the spreadsheet to determine which SPDX license identifier(s)
to be applied to the file. She confirmed any determination that was not
immediately clear with lawyers working with the Linux Foundation.

Criteria used to select files for SPDX license identifier tagging was:
 - Files considered eligible had to be source code files.
 - Make and config files were included as candidates if they contained >5
   lines of source
 - File already had some variant of a license header in it (even if <5
   lines).

All documentation files were explicitly excluded.

The following heuristics were used to determine which SPDX license
identifiers to apply.

 - when both scanners couldn't find any license traces, file was
   considered to have no license information in it, and the top level
   COPYING file license applied.

   For non */uapi/* files that summary was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0                                              11139

   and resulted in the first patch in this series.

   If that file was a */uapi/* path one, it was "GPL-2.0 WITH
   Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:

   SPDX license identifier                            # files
   ---------------------------------------------------|-------
   GPL-2.0 WITH Linux-syscall-note                        930

   and resulted in the second patch in this series.

 - if a file had some form of licensing information in it, and was one
   of the */uapi/* ones, it was denoted with the Linux-syscall-note if
   any GPL family license was found in the file or had no licensing in
   it (per prior point).  Results summary:

   SPDX license identifier                            # files
   ---------------------------------------------------|------
   GPL-2.0 WITH Linux-syscall-note                       270
   GPL-2.0+ WITH Linux-syscall-note                      169
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
   ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
   LGPL-2.1+ WITH Linux-syscall-note                      15
   GPL-1.0+ WITH Linux-syscall-note                       14
   ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
   LGPL-2.0+ WITH Linux-syscall-note                       4
   LGPL-2.1 WITH Linux-syscall-note                        3
   ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
   ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1

   and that resulted in the third patch in this series.

 - when the two scanners agreed on the detected license(s), that became
   the concluded license(s).

 - when there was disagreement between the two scanners (one detected a
   license but the other didn't, or they both detected different
   licenses) a manual inspection of the file occurred.

 - In most cases a manual inspection of the information in the file
   resulted in a clear resolution of the license that should apply (and
   which scanner probably needed to revisit its heuristics).

 - When it was not immediately clear, the license identifier was
   confirmed with lawyers working with the Linux Foundation.

 - If there was any question as to the appropriate license identifier,
   the file was flagged for further research and to be revisited later
   in time.

In total, over 70 hours of logged manual review was done on the
spreadsheet to determine the SPDX license identifiers to apply to the
source files by Kate, Philippe, Thomas and, in some cases, confirmation
by lawyers working with the Linux Foundation.

Kate also obtained a third independent scan of the 4.13 code base from
FOSSology, and compared selected files where the other two scanners
disagreed against that SPDX file, to see if there was new insights.  The
Windriver scanner is based on an older version of FOSSology in part, so
they are related.

Thomas did random spot checks in about 500 files from the spreadsheets
for the uapi headers and agreed with SPDX license identifier in the
files he inspected. For the non-uapi files Thomas did random spot checks
in about 15000 files.

In initial set of patches against 4.14-rc6, 3 files were found to have
copy/paste license identifier errors, and have been fixed to reflect the
correct identifier.

Additionally Philippe spent 10 hours this week doing a detailed manual
inspection and review of the 12,461 patched files from the initial patch
version early this week with:
 - a full scancode scan run, collecting the matched texts, detected
   license ids and scores
 - reviewing anything where there was a license detected (about 500+
   files) to ensure that the applied SPDX license was correct
 - reviewing anything where there was no detection but the patch license
   was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
   SPDX license was correct

This produced a worksheet with 20 files needing minor correction.  This
worksheet was then exported into 3 different .csv files for the
different types of files to be modified.

These .csv files were then reviewed by Greg.  Thomas wrote a script to
parse the csv files and add the proper SPDX tag to the file, in the
format that the file expected.  This script was further refined by Greg
based on the output to detect more types of files automatically and to
distinguish between header and source .c files (which need different
comment types.)  Finally Greg ran the script using the .csv files to
generate the patches.
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NPhilippe Ombredanne <pombredanne@nexb.com>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b2441318

02 10月, 2017 1 次提交

sunrpc: remove redundant initialization of sock · d099b8af

由 Colin Ian King 提交于 9月 18, 2017

sock is being initialized and then being almost immediately updated
hence the initialized value is not being used and is redundant. Remove
the initialization. Cleans up clang warning:

warning: Value stored to 'sock' during its initialization is never read
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d099b8af

19 8月, 2017 1 次提交

SUNRPC: Add a separate spinlock to protect the RPC request receive list · ce7c252a

由 Trond Myklebust 提交于 8月 16, 2017

This further reduces contention with the transport_lock, and allows us
to convert to using a non-bh-safe spinlock, since the list is now never
accessed from a bh context.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ce7c252a

17 8月, 2017 4 次提交

SUNRPC: Cleanup xs_tcp_read_common() · 040249df

由 Trond Myklebust 提交于 8月 13, 2017

Simplify the code to avoid a full copy of the struct xdr_skb_reader.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

040249df

SUNRPC: Don't loop forever in xs_tcp_data_receive() · 8d6f97d6

由 Trond Myklebust 提交于 8月 12, 2017

Ensure that we don't hog the workqueue thread by requeuing the job
every 64 loops.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

8d6f97d6

SUNRPC: Don't hold the transport lock when receiving backchannel data · c89091c8

由 Trond Myklebust 提交于 8月 16, 2017

The backchannel request has no associated task, so it is going nowhere
until we call xprt_complete_bc_request().
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

c89091c8

SUNRPC: Don't hold the transport lock across socket copy operations · 729749bb

由 Trond Myklebust 提交于 8月 13, 2017

Instead add a mechanism to ensure that the request doesn't disappear
from underneath us while copying from the socket. We do this by
preventing xprt_release() from freeing the XDR buffers until the
flag RPC_TASK_MSG_RECV has been cleared from the request.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>

729749bb

02 8月, 2017 1 次提交

sunrpc: Const-ify all instances of struct rpc_xprt_ops · d31ae254

由 Chuck Lever 提交于 8月 01, 2017

After transport instance creation, these function pointers never
change. Mark them as constant to prevent their use as an attack
vector for code injections.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d31ae254

21 7月, 2017 1 次提交

net/sunrpc/xprt_sock: fix regression in connection error reporting. · 3ffbc1d6

由 NeilBrown 提交于 7月 19, 2017

Commit 3d476263 ("tcp: remove poll() flakes when receiving
RST") in v4.12 changed the order in which ->sk_state_change()
and ->sk_error_report() are called when a socket is shut
down - sk_state_change() is now called first.

This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
xprt_disconnect_done() to wake all pending tasked with -EAGAIN.
When the ->sk_error_report() callback arrives, it is too late to
pass the error on, and it is lost.

As easy way to demonstrate the problem caused is to try to start
rpc.nfsd while rcpbind isn't running.
nfsd will attempt a tcp connection to rpcbind.  A ECONNREFUSED
error is returned, but sunrpc code loses the error and keeps
retrying.  If it saw the ECONNREFUSED, it would abort.

To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
xs_tcp_state_change().

Fixes: 3d476263 ("tcp: remove poll() flakes when receiving RST")
Cc: stable@vger.kernel.org (v4.12)
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3ffbc1d6

01 6月, 2017 1 次提交

SUNRPC: ensure correct error is reported by xs_tcp_setup_socket() · 6ea44adc

由 NeilBrown 提交于 5月 25, 2017

If you attempt a TCP mount from an host that is unreachable in a way
that triggers an immediate error from kernel_connect(), that error
does not propagate up, instead EAGAIN is reported.

This results in call_connect_status receiving the wrong error.

A case that it easy to demonstrate is to attempt to mount from an
address that results in ENETUNREACH, but first deleting any default
route.
Without this patch, the mount.nfs process is persistently runnable
and is hard to kill.  With this patch it exits as it should.

The problem is caused by the fact that xs_tcp_force_close() eventually
calls
      xprt_wake_pending_tasks(xprt, -EAGAIN);
which causes an error return of -EAGAIN.  so when xs_tcp_setup_sock()
calls
      xprt_wake_pending_tasks(xprt, status);
the status is ignored.

Fixes: 4efdd92c ("SUNRPC: Remove TCP client connection reset hack")
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

6ea44adc

28 2月, 2017 1 次提交

lib/vsprintf.c: remove %Z support · 5b5e0928

由 Alexey Dobriyan 提交于 2月 27, 2017

Now that %z is standartised in C99 there is no reason to support %Z.
Unlike %L it doesn't even make format strings smaller.

Use BUILD_BUG_ON in a couple ATM drivers.

In case anyone didn't notice lib/vsprintf.o is about half of SLUB which
is in my opinion is quite an achievement.  Hopefully this patch inspires
someone else to trim vsprintf.c more.

Link: http://lkml.kernel.org/r/20170103230126.GA30170@avx2Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b5e0928

21 2月, 2017 1 次提交

sunrpc: silence uninitialized variable warning · 9761a246

由 Dan Carpenter 提交于 2月 19, 2017

kstrtouint() can return a couple different error codes so the check for
"ret == -EINVAL" is wrong and static analysis tools correctly complain
that we can use "num" without initializing it. It's not super harmful
because we check the bounds. But it's also easy enough to fix.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

9761a246

11 2月, 2017 1 次提交

sunrpc: Allow xprt->ops->timer method to sleep · b977b644

由 Chuck Lever 提交于 2月 08, 2017

The transport lock is needed to protect the xprt_adjust_cwnd() call
in xs_udp_timer, but it is not necessary for accessing the
rq_reply_bytes_recvd or tk_status fields. It is correct to sublimate
the lock into UDP's xs_udp_timer method, where it is required.

The ->timer method has to take the transport lock if needed, but it
can now sleep safely, or even call back into the RPC scheduler.

This is more a clean-up than a fix, but the "issue" was introduced
by my transport switch patches back in 2005.

Fixes: 46c0ee8b ("RPC: separate xprt_timer implementations")
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

b977b644

10 2月, 2017 2 次提交

SUNRPC: Allow changing of the TCP timeout parameters on the fly · 7196dbb0

由 Trond Myklebust 提交于 2月 08, 2017

When the NFSv4 server tells us the lease period, we usually want
to adjust down the timeout parameters on the TCP connection to
ensure that we don't miss lease renewals due to a faulty connection.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

7196dbb0

SUNRPC: Refactor TCP socket timeout code into a helper function · 8d1b8c62

由 Trond Myklebust 提交于 2月 08, 2017

Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

8d1b8c62

08 11月, 2016 1 次提交

udp: do fwd memory scheduling on dequeue · 7c13f97f

由 Paolo Abeni 提交于 11月 04, 2016

A new argument is added to __skb_recv_datagram to provide
an explicit skb destructor, invoked under the receive queue
lock.
The UDP protocol uses such argument to perform memory
reclaiming on dequeue, so that the UDP protocol does not
set anymore skb->desctructor.
Instead explicit memory reclaiming is performed at close() time and
when skbs are removed from the receive queue.
The in kernel UDP protocol users now need to call a
skb_recv_udp() variant instead of skb_recv_datagram() to
properly perform memory accounting on dequeue.

Overall, this allows acquiring only once the receive queue
lock on dequeue.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr sinks	vanilla		patched
1		440		560
3		2150		2300
6		3650		3800
9		4450		4600
12		6250		6450

v1 -> v2:
 - do rmem and allocated memory scheduling under the receive lock
 - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
 - avoid the typdef for the dequeue callback
Suggested-by: NEric Dumazet <edumazet@google.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c13f97f

29 10月, 2016 1 次提交

sunrpc: fix some missing rq_rbuffer assignments · 18e601d6

由 Jeff Layton 提交于 10月 24, 2016

We've been seeing some crashes in testing that look like this:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
PGD 212ca2067 PUD 212ca3067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev parport_pc i2c_piix4 sg parport i2c_core virtio_balloon pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi 8139too ata_piix libata 8139cp mii virtio_pci floppy virtio_ring serio_raw virtio
CPU: 1 PID: 1540 Comm: nfsd Not tainted 4.9.0-rc1 #39
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
task: ffff88020d7ed200 task.stack: ffff880211838000
RIP: 0010:[<ffffffff8135ce99>]  [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
RSP: 0018:ffff88021183bdd0  EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff88020d7fa000 RCX: 000000f400000000
RDX: 0000000000000014 RSI: ffff880212927020 RDI: 0000000000000000
RBP: ffff88021183be30 R08: 01000000ef896996 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880211704ca8
R13: ffff88021473f000 R14: 00000000ef896996 R15: ffff880211704800
FS:  0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000212ca1000 CR4: 00000000000006e0
Stack:
 ffffffffa01ea087 ffffffff63400001 ffff880215145e00 ffff880211bacd00
 ffff88021473f2b8 0000000000000004 00000000d0679d67 ffff880211bacd00
 ffff88020d7fa000 ffff88021473f000 0000000000000000 ffff88020d7faa30
Call Trace:
 [<ffffffffa01ea087>] ? svc_tcp_recvfrom+0x5a7/0x790 [sunrpc]
 [<ffffffffa01f84d8>] svc_recv+0xad8/0xbd0 [sunrpc]
 [<ffffffffa0262d5e>] nfsd+0xde/0x160 [nfsd]
 [<ffffffffa0262c80>] ? nfsd_destroy+0x60/0x60 [nfsd]
 [<ffffffff810a9418>] kthread+0xd8/0xf0
 [<ffffffff816dbdbf>] ret_from_fork+0x1f/0x40
 [<ffffffff810a9340>] ? kthread_park+0x60/0x60
Code: 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 <4c> 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4
RIP  [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
 RSP <ffff88021183bdd0>
CR2: 0000000000000000

Both Bruce and Eryu ran a bisect here and found that the problematic
patch was 68778945 (SUNRPC: Separate buffer pointers for RPC Call and
Reply messages).

That patch changed rpc_xdr_encode to use a new rq_rbuffer pointer to
set up the receive buffer, but didn't change all of the necessary
codepaths to set it properly. In particular the backchannel setup was
missing.

We need to set rq_rbuffer whenever rq_buffer is set. Ensure that it is.
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NChuck Lever <chuck.lever@oracle.com>
Reported-by: NEryu Guan <guaneryu@gmail.com>
Tested-by: NEryu Guan <guaneryu@gmail.com>
Fixes: 68778945 "SUNRPC: Separate buffer pointers..."
Reported-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

18e601d6

23 10月, 2016 1 次提交

udp: use it's own memory accounting schema · 850cbadd

由 Paolo Abeni 提交于 10月 21, 2016

Completely avoid default sock memory accounting and replace it
with udp-specific accounting.

Since the new memory accounting model encapsulates completely
the required locking, remove the socket lock on both enqueue and
dequeue, and avoid using the backlog on enqueue.

Be sure to clean-up rx queue memory on socket destruction, using
udp its own sk_destruct.

Tested using pktgen with random src port, 64 bytes packet,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.

nr readers      Kpps (vanilla)  Kpps (patched)
1               170             440
3               1250            2150
6               3000            3650
9               4200            4450
12              5700            6250

v4 -> v5:
  - avoid unneeded test in first_packet_length

v3 -> v4:
  - remove useless sk_rcvqueues_full() call

v2 -> v3:
  - do not set the now unsed backlog_rcv callback

v1 -> v2:
  - add memory pressure support
  - fixed dropwatch accounting for ipv6
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

850cbadd

20 9月, 2016 3 次提交

sunrpc: fix write space race causing stalls · d48f9ce7

由 David Vrabel 提交于 9月 19, 2016

Write space becoming available may race with putting the task to sleep
in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
race does not work.

This (edited) partial trace illustrates the problem:

   [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
   [2] xs_write_space <-xs_tcp_write_space
   [3] xprt_write_space <-xs_write_space
   [4] rpc_task_sleep: task:43546@5 ...
   [5] xs_write_space <-xs_tcp_write_space

[1] Task 43546 runs but is out of write space.

[2] Space becomes available, xs_write_space() clears the
    SOCKWQ_ASYNC_NOSPACE bit.

[3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
    this has not yet been queued and the wake up is lost.

[4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
    which queues task 43546.

[5] The call to sk->sk_write_space() at the end of xs_nospace() (which
    is supposed to handle the above race) does not call
    xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
    thus the task is not woken.

Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace()
so the second call to sk->sk_write_space() calls xprt_write_space().
Suggested-by: NTrond Myklebust <trondmy@primarydata.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
cc: stable@vger.kernel.org # 4.4
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

d48f9ce7

SUNRPC: Generalize the RPC buffer release API · 3435c74a

由 Chuck Lever 提交于 9月 15, 2016

xprtrdma needs to allocate the Call and Reply buffers separately.
TBH, the reliance on using a single buffer for the pair of XDR
buffers is transport implementation-specific.

Instead of passing just the rq_buffer into the buf_free method, pass
the task structure and let buf_free take care of freeing both
XDR buffers at once.

There's a micro-optimization here. In the common case, both
xprt_release and the transport's buf_free method were checking if
rq_buffer was NULL. Now the check is done only once per RPC.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

3435c74a

SUNRPC: Generalize the RPC buffer allocation API · 5fe6eaa1

由 Chuck Lever 提交于 9月 15, 2016

xprtrdma needs to allocate the Call and Reply buffers separately.
TBH, the reliance on using a single buffer for the pair of XDR
buffers is transport implementation-specific.

Transports that want to allocate separate Call and Reply buffers
will ignore the "size" argument anyway.  Don't bother passing it.

The buf_alloc method can't return two pointers. Instead, make the
method's return value an error code, and set the rq_buffer pointer
in the method itself.

This gives call_allocate an opportunity to terminate an RPC instead
of looping forever when a permanent problem occurs. If a request is
just bogus, or the transport is in a state where it can't allocate
resources for any request, there needs to be a way to kill the RPC
right there and not loop.

This immediately fixes a rare problem in the backchannel send path,
which loops if the server happens to send a CB request whose
call+reply size is larger than a page (which it shouldn't do yet).

One more issue: looks like xprt_inject_disconnect was incorrectly
placed in the failure path in call_allocate. It needs to be in the
success path, as it is for other call-sites.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

5fe6eaa1

03 9月, 2016 1 次提交

sunrpc: fix UDP memory accounting · a41bd25a

由 Paolo Abeni 提交于 8月 25, 2016

The commit f9b2ee71 ("SUNRPC: Move UDP receive data path
into a workqueue context"), as a side effect, moved the
skb_free_datagram() call outside the scope of the related socket
lock, but UDP sockets require such lock to be held for proper
memory accounting.
Fix it by replacing skb_free_datagram() with
skb_free_datagram_locked().

Fixes: f9b2ee71 ("SUNRPC: Move UDP receive data path into a workqueue context")
Reported-and-tested-by: NJan Stancek <jstancek@redhat.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Cc: stable@vger.kernel.org # 4.4+
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

a41bd25a

06 8月, 2016 2 次提交

SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout · 3851f1cd

由 Trond Myklebust 提交于 8月 04, 2016

...and ensure that we propagate it to new transports on the same
client.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

3851f1cd

SUNRPC: Fix reconnection timeouts · 02910177

由 Trond Myklebust 提交于 8月 04, 2016

When the connect attempt fails and backs off, we should start the clock
at the last connection attempt, not time at which we queue up the
reconnect job.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

02910177

05 8月, 2016 1 次提交

SUNRPC: disable the use of IPv6 temporary addresses. · d88e4d82

由 NeilBrown 提交于 8月 04, 2016

If the net.ipv6.conf.*.use_temp_addr sysctl is set to '2',
then TCP connections over IPv6 will prefer a 'private' source
address.
These eventually expire and become invalid, typically after a week,
but the time is configurable.

When the local address becomes invalid the client will not be able to
receive replies from the server.  Eventually the connection will timeout
or break and a new connection will be established, but this can take
half an hour (typically TCP connection break time).

RFC 4941, which describes private IPv6 addresses, acknowledges that some
applications might not work well with them and that the application may
explicitly a request non-temporary (i.e. "public") address.

I believe this is correct for SUNRPC clients.  Without this change, a
client will occasionally experience a long delay if private addresses
have been enabled.

The privacy offered by private addresses is of little value for an NFS
server which requires client authentication.

For NFSv3 this will often not be a problem because idle connections are
closed after 5 minutes.  For NFSv4 connections never go idle due to the
period RENEW (or equivalent) request.
Signed-off-by: NNeilBrown <neilb@suse.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

d88e4d82

02 8月, 2016 1 次提交

SUNRPC: Handle EADDRNOTAVAIL on connection failures · 1f4c17a0

由 Trond Myklebust 提交于 8月 01, 2016

If the connect attempt immediately fails with an EADDRNOTAVAIL error, then
that means our choice of source port number was bad.
This error is expected when we set the SO_REUSEPORT socket option and we
have 2 sockets sharing the same source and destination address and port
combinations.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Fixes: 402e23b4 ("SUNRPC: Fix stupid typo in xs_sock_set_reuseport")
Cc: stable@vger.kernel.org # v4.0+

1f4c17a0

20 7月, 2016 3 次提交

sunrpc: Prevent resvport min/max inversion via sysfs and module parameter · ffb6ca33

由 Frank Sorenson 提交于 7月 08, 2016

The current min/max resvport settings are independently limited
by the entire range of allowed ports, so max_resvport can be
set to a port lower than min_resvport.

Prevent inversion of min/max values when set through sysfs and
module parameter by setting the limits dependent on each other.
Signed-off-by: NFrank Sorenson <sorenson@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ffb6ca33

sunrpc: Prevent resvport min/max inversion via sysctl · e08ea3a9

由 Frank Sorenson 提交于 7月 08, 2016

The current min/max resvport settings are independently limited
by the entire range of allowed ports, so max_resvport can be
set to a port lower than min_resvport.

Prevent inversion of min/max values when set through sysctl by
setting the limits dependent on each other.
Signed-off-by: NFrank Sorenson <sorenson@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e08ea3a9

sunrpc: Fix reserved port range calculation · 5d71899a

由 Frank Sorenson 提交于 7月 08, 2016

The range calculation for choosing the random reserved port will panic
with divide-by-zero when min_resvport == max_resvport, a range of one
port, not zero.

Fix the reserved port range calculation by adding one to the difference.
Signed-off-by: NFrank Sorenson <sorenson@redhat.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

5d71899a

15 6月, 2016 1 次提交

rpc: share one xps between all backchannels · 39a9beab

由 J. Bruce Fields 提交于 5月 17, 2016

The spec allows backchannels for multiple clients to share the same tcp
connection.  When that happens, we need to use the same xprt for all of
them.  Similarly, we need the same xps.

This fixes list corruption introduced by the multipath code.

Cc: stable@vger.kernel.org
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
Acked-by: NTrond Myklebust <trondmy@primarydata.com>

39a9beab

14 6月, 2016 3 次提交

SUNRPC: Fix suspicious enobufs issues. · 9ffadfbc

由 Trond Myklebust 提交于 5月 29, 2016

The current test is racy when dealing with fast NICs.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

9ffadfbc

SUNRPC: RPC transport queue must be low latency · 40a5f1b1

由 Trond Myklebust 提交于 5月 27, 2016

rpciod can easily get congested due to the long list of queued rpc_tasks.
Having the receive queue wait in turn for those tasks to complete can
therefore be a bottleneck.

Address the problem by separating the workqueues into:
- rpciod: manages rpc_tasks
- xprtiod: manages transport related work.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

40a5f1b1

SUNRPC: Consolidate xs_tcp_data_ready and xs_data_ready · 5157b956

由 Trond Myklebust 提交于 5月 29, 2016

The only difference between the two at this point is the reset of
the connection timeout, and since everyone expect tcp ignore that value,
we can just throw it into the generic function.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

5157b956

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功