Commits · de84d89030fa4efa44c02c96c8b4a8176042c4ff · openanolis / cloud-kernel

09 Feb, 2015 6 commits

SUNRPC: TCP/UDP always close the old socket before reconnecting · de84d890

Committed by Trond Myklebust on Feb 08, 2015

It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket()
or xs_tcp_setup_socket(), since they do not own the correct locks. Instead,
do it in xs_connect().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Add helpers to prevent socket create from racing · 718ba5b8

Committed by Trond Myklebust on Feb 08, 2015

The socket lock is currently held by the task that is requesting the
connection be established. While that is efficient in the case where
the connection happens quickly, it is racy in the case where it doesn't.
What we really want is for the connect helper to be able to block access
to the socket while it is being set up.

This patch does so by arranging to transfer the socket lock from the
task that is requesting the connect attempt, and then releasing that
lock once everything is done.
This scheme also gives us automatic protection against collisions with
the RPC close code, so we can kill the cancel_delayed_work_sync()
call in xs_close().
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Ensure xs_reset_transport() resets the close connection flags · 6cc7e908

Committed by Trond Myklebust on Feb 08, 2015

Otherwise, we may end up looping.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Do not clear the source port in xs_reset_transport · 76698b23

Committed by Trond Myklebust on Feb 08, 2015

Now that we can reuse bound ports after a close, we never really want to
clear the transport's source port after it has been set. Doing so really
messes up the NFSv3 DRC on the server.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Handle EADDRINUSE on connect · 3913c78c

Committed by Trond Myklebust on Feb 08, 2015

Now that we're setting SO_REUSEPORT, we still need to handle the
case where a connect() is attempted, but the old socket is still
lingering.
Essentially, all we want to do here is handle the error by waiting
a few seconds and then retrying.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
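
The wait-and-retry handling described in this commit message can be sketched in userspace C. This is a minimal illustration under stated assumptions, not the kernel change itself: `connect_fn`, `connect_with_retry`, and `fake_connect` are hypothetical names, and the attempt limit and delay are parameters chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback type: returns 0 on success or a negative errno. */
typedef int (*connect_fn)(void *ctx);

/*
 * Retry a connect attempt while it fails with -EADDRINUSE, sleeping
 * delay_secs between attempts. Any other result is returned at once.
 */
int connect_with_retry(connect_fn try_connect, void *ctx,
                       int max_attempts, unsigned int delay_secs)
{
    int err = -EINVAL;

    assert(try_connect != NULL);
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        err = try_connect(ctx);
        if (err != -EADDRINUSE)
            return err;          /* success, or a different error */
        if (attempt + 1 < max_attempts)
            sleep(delay_secs);   /* the old socket may still be lingering */
    }
    return err;
}

/* Test double: fails with -EADDRINUSE until *remaining reaches zero. */
int fake_connect(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return -EADDRINUSE;
    }
    return 0;
}
```

Only -EADDRINUSE triggers a retry here; every other outcome is handed straight back to the caller, which keeps the retry policy from masking unrelated failures.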

SUNRPC: Set SO_REUSEPORT socket option for TCP connections · 4dda9c8a

Committed by Trond Myklebust on Feb 08, 2015

When using TCP, we need the ability to reuse port numbers after
a disconnection, so that the NFSv3 server knows that we're the same
client. Currently we use a hack to work around the TCP socket's
TIME_WAIT: we send an RST instead of closing, which doesn't
always work...
The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple
TCP connections to the same source address+port combination, and thus
to use ordinary TCP close() instead of the current hack.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

25 Nov, 2014 3 commits

sunrpc: eliminate RPC_DEBUG · f895b252

Committed by Jeff Layton on Nov 17, 2014

It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

sunrpc: add tracepoints in xs_tcp_data_recv · 1a867a08

Committed by Jeff Layton on Oct 28, 2014

Add tracepoints inside the main loop on xs_tcp_data_recv that allow
us to keep an eye on what's happening during each phase of it.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

sunrpc: add new tracepoints in xprt handling code · 3705ad64

Committed by Jeff Layton on Oct 28, 2014

...so we can keep track of when calls are sent and replies received.
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

25 Sep, 2014 4 commits

NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() · 1aff5256

Committed by NeilBrown on Sep 24, 2014

Now that nfs_release_page() doesn't block indefinitely, other deadlock
avoidance mechanisms aren't needed.
 - it doesn't hurt for kswapd to block occasionally.  If it doesn't
   want to block it would clear __GFP_WAIT.  The current_is_kswapd()
   was only added to avoid deadlocks and we have a new approach for
   that.
 - memory allocation in the SUNRPC layer can very rarely try to
   ->releasepage() a page it is trying to handle.  The deadlock
   is removed as nfs_release_page() doesn't block indefinitely.

So we don't need to set PF_FSTRANS for sunrpc network operations any
more.
Signed-off-by: NeilBrown <neilb@suse.de>
Acked-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

rpc: Add -EPERM processing for xs_udp_send_request() · 3dedbb5c

Committed by Jason Baron on Sep 24, 2014

If an iptables drop rule is added for an nfs server, the client can end up in
a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
is ignored since the prior bits of the packet may have been successfully queued
and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
thinks that because some bits were queued it should return -EAGAIN. We then try
the request again and again, resulting in cpu spinning. Reproducer:

1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d <nfs server ip> -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d <nfs server ip> -j DROP

The softlockup occurs in step 4 above.

The previous patch allows xs_sendpages() to return both a sent count and
any error values that may have occurred. Thus, if we get an -EPERM, return
that to the higher level code.

With this patch in place we can successfully abort the above sequence and
avoid the softlockup.

I also tried the above test case on an nfs mount on tcp and although the system
does not softlockup, I still ended up with the 'hung_task' firing after 120
seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
since -EPERM appears to get ignored much lower down in the stack and does not
propagate up to xs_sendpages(). This case is not quite as insidious as the
softlockup and it is not addressed here.
Reported-by: Yigong Lou <ylou@akamai.com>
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

rpc: return sent and err from xs_sendpages() · f279cd00

Committed by Jason Baron on Sep 24, 2014

If an error is returned after the first bits of a packet have already been
successfully queued, xs_sendpages() will return a positive 'int' value
indicating success. Callers seem to treat this as -EAGAIN.

However, there are cases where it's not a question of waiting for the write
queue to drain. For example, when there is an iptables rule dropping packets
to the destination, the lower level code can return -EPERM only after parts
of the packet have been successfully queued. In this case, we can end up
continuously retrying resulting in a kernel softlockup.

This patch is intended to make no changes in behavior but is in preparation for
subsequent patches that can make decisions based on both the number of bytes
sent by xs_sendpages() and any errors that may have been returned.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
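
The idea of reporting the byte count and the error separately can be sketched as follows. This is a simplified userspace model, not the kernel's xs_sendpages(): `chunk_send_fn`, `send_chunks`, and `drop_after_first` are hypothetical names, and the lower layer is a test double that queues one chunk and then fails with -EPERM, loosely mimicking an iptables DROP rule.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical lower-level send: returns bytes queued or a negative errno. */
typedef int (*chunk_send_fn)(const char *buf, size_t len);

/*
 * Send len bytes in chunks of at most chunk bytes. The number of bytes
 * queued is always stored in *sent; the return value is 0 or the first
 * negative errno, so a late error is not hidden by earlier progress.
 */
int send_chunks(chunk_send_fn send_fn, const char *buf, size_t len,
                size_t chunk, size_t *sent)
{
    assert(send_fn != NULL && sent != NULL && chunk > 0);
    *sent = 0;
    while (*sent < len) {
        size_t want = len - *sent < chunk ? len - *sent : chunk;
        int ret = send_fn(buf + *sent, want);

        if (ret < 0)
            return ret;        /* e.g. -EPERM: the caller should not spin */
        if (ret == 0)
            return -EAGAIN;    /* nothing queued: wait for buffer space */
        *sent += (size_t)ret;
    }
    return 0;
}

/* Test double: accepts the first call, then fails with -EPERM. */
static int calls;

int drop_after_first(const char *buf, size_t len)
{
    (void)buf;
    return calls++ == 0 ? (int)len : -EPERM;
}
```

With a single positive return value, the caller in this scenario would see only "4 bytes sent" and retry indefinitely. With the split result it sees both the 4 bytes and the -EPERM, and can abort.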

SUNRPC: Don't wake tasks during connection abort · a743419f

Committed by Benjamin Coddington on Sep 23, 2014

When aborting a connection to preserve source ports, don't wake the task in
xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the
connection needs to be re-established since it preserves the task's status
instead of setting it to the status of the aborting kernel_connect().

This may also avoid a potential conflict on the socket's lock.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

11 Sep, 2014 1 commit

rpc: xs_bind - do not bind when requesting a random ephemeral port · 0f7a622c

Committed by Chris Perl on Sep 05, 2014

When attempting to establish a local ephemeral endpoint for a TCP or UDP
socket, do not explicitly call bind; instead, let it happen implicitly when the
socket is first used.

The main motivating factor for this change is when TCP runs out of unique
ephemeral ports (i.e.  cannot find any ephemeral ports which are not a part of
*any* TCP connection).  In this situation if you explicitly call bind, then the
call will fail with EADDRINUSE.  However, if you allow the allocation of an
ephemeral port to happen implicitly as part of connect (or other functions),
then ephemeral ports can be reused, so long as the combination of (local_ip,
local_port, remote_ip, remote_port) is unique for TCP sockets on the system.

This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP
sockets the same.

This can allow mount.nfs(8) to continue to function successfully, even in the
face of misbehaving applications which are creating a large number of TCP
connections.
Signed-off-by: Chris Perl <chris.perl@gmail.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

13 Jul, 2014 1 commit

replace strict_strto calls · 00cfaa94

Committed by Daniel Walter on Jun 21, 2014

Replace obsolete strict_strto calls with appropriate kstrto calls.
Signed-off-by: Daniel Walter <dwalter@google.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

01 Jul, 2014 1 commit

SUNRPC: Ensure that we handle ENOBUFS errors correctly. · 3601c4a9

Committed by Trond Myklebust on Jun 30, 2014

Currently, an ENOBUFS error will result in a fatal error for the RPC
call. Normally, we will just want to wait and then retry.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

24 May, 2014 1 commit

sunrpc: Remove sk_no_check setting · 0f8066bd

Committed by Tom Herbert on May 23, 2014

Setting sk_no_check to UDP_CSUM_NORCV seems to have no effect.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Apr, 2014 1 commit

arch: Mass conversion of smp_mb__*() · 4e857c58

Committed by Peter Zijlstra on Mar 17, 2014

Mostly scripted conversion of the smp_mb__* barriers.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>

12 Apr, 2014 1 commit

net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369

Committed by David S. Miller on Apr 11, 2014

Several spots in the kernel perform a sequence like:

	skb_queue_tail(&sk->sk_receive_queue, skb);
	sk->sk_data_ready(sk, skb->len);

But at the moment we place the SKB onto the socket receive queue it
can be consumed and freed up.  So this skb->len access is potentially
to freed up memory.

Furthermore, the skb->len can be modified by the consumer so it is
possible that the value isn't accurate.

And finally, no actual implementation of this callback actually uses
the length argument.  And since nobody actually cared about its
value, lots of call sites pass arbitrary values in such as '0' and
even '1'.

So just remove the length argument from the callback, that way there
is no confusion whatsoever and all of these use-after-free cases get
fixed as a side effect.

Based upon a patch by Eric Dumazet and his suggestion to audit this
issue tree-wide.
Signed-off-by: David S. Miller <davem@davemloft.net>
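
The ownership rule behind this change can be modelled in a few lines of C. The sketch below is a single-threaded stand-in, not kernel code: `struct packet`, `struct sock_model`, `consume_all`, and `deliver` are hypothetical names, and the callback drains the queue synchronously to play the part of a consumer that, in the kernel, may run concurrently.

```c
#include <assert.h>
#include <stdlib.h>

struct packet {
    size_t len;
    struct packet *next;
};

struct sock_model {
    struct packet *head;                        /* receive queue (LIFO for brevity) */
    void (*data_ready)(struct sock_model *sk);  /* callback takes no length */
    size_t consumed;                            /* bytes consumed so far */
};

/* Consumer: drains the queue and frees every packet it takes. */
void consume_all(struct sock_model *sk)
{
    while (sk->head) {
        struct packet *p = sk->head;

        sk->head = p->next;
        sk->consumed += p->len;
        free(p);
    }
}

/*
 * Producer: once the packet is queued it belongs to the consumer, so the
 * producer must not read p again. The callback needs nothing from it.
 */
int deliver(struct sock_model *sk, size_t len)
{
    struct packet *p = malloc(sizeof(*p));

    assert(sk != NULL && sk->data_ready != NULL);
    if (!p)
        return -1;
    p->len = len;
    p->next = sk->head;
    sk->head = p;
    sk->data_ready(sk);   /* p may already have been freed at this point */
    return 0;
}
```

Had `data_ready` taken a length, the natural call would have been `sk->data_ready(sk, p->len)`, which reads `p` after it was handed over. Dropping the argument removes the temptation rather than relying on each call site to read the length early.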

30 Mar, 2014 3 commits

SUNRPC: Clear xpt_bc_xprt if xs_setup_bc_tcp failed · 642aab58

Committed by Kinglong Mee on Mar 24, 2014

Don't move the assignment of args->bc_xprt->xpt_bc_xprt out of xs_setup_bc_tcp,
because rpc_ping (which is in rpc_create) will use it.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NFSD/SUNRPC: Check rpc_xprt out of xs_setup_bc_tcp · d531c008

Committed by Kinglong Mee on Mar 24, 2014

Besides checking rpc_xprt out of xs_setup_bc_tcp,
increase its reference (it's important).
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

NFSD: Free backchannel xprt in bc_destroy · 47f72efa

Committed by Kinglong Mee on Mar 24, 2014

Backchannel xprt isn't freed right now.
Free it in bc_destroy, and put the reference of THIS_MODULE.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

29 Mar, 2014 1 commit

SUNRPC: fix memory leak of peer addresses in XPRT · 315f3812

Committed by Kinglong Mee on Mar 24, 2014

If creating the xprt fails after xs_format_peer_addresses,
sunrpc must free the memory for the peer addresses in the xprt.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

12 Feb, 2014 1 commit

SUNRPC: RPC callbacks may be split across several TCP segments · 2ea24497

Committed by Trond Myklebust on Feb 10, 2014

Since TCP is a stream protocol, our callback read code needs to take into
account the fact that RPC callbacks are not always confined to a single
TCP segment.
This patch adds support for multiple TCP segments by ensuring that we
only remove the rpc_rqst structure from the 'free backchannel requests'
list once the data has been completely received. We rely on the fact
that TCP data is ordered for the duration of the connection.
Reported-by: shaobingqing <shaobingqing@bwstor.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
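
Because TCP delivers a byte stream, a reader has to accumulate data until a whole RPC fragment has arrived. The sketch below illustrates that with ONC RPC record marking (a 4-byte big-endian marker whose top bit flags the last fragment and whose low 31 bits give the length, per RFC 5531). It is a generic illustration, not the kernel's callback read path; `struct frag_reader` and its functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical incremental reader for one record-marked RPC fragment. */
struct frag_reader {
    uint8_t hdr[4];      /* record marker bytes collected so far */
    size_t hdr_have;
    uint32_t frag_len;   /* payload length, valid once hdr_have == 4 */
    int last;            /* "last fragment" bit */
    size_t body_have;    /* payload bytes received so far */
};

void frag_reader_init(struct frag_reader *r)
{
    memset(r, 0, sizeof(*r));
}

/*
 * Feed the bytes of one TCP segment. Returns 1 once the fragment is
 * complete and 0 if more data is needed. *used reports how many bytes of
 * data were consumed, so any leftover can start the next fragment.
 */
int frag_reader_feed(struct frag_reader *r, const uint8_t *data, size_t len,
                     size_t *used)
{
    size_t off = 0;

    assert(r != NULL && used != NULL);
    while (r->hdr_have < 4 && off < len)
        r->hdr[r->hdr_have++] = data[off++];
    if (r->hdr_have == 4) {
        uint32_t marker = (uint32_t)r->hdr[0] << 24 | (uint32_t)r->hdr[1] << 16 |
                          (uint32_t)r->hdr[2] << 8 | (uint32_t)r->hdr[3];
        size_t want, take;

        r->last = (marker & 0x80000000u) != 0;
        r->frag_len = marker & 0x7fffffffu;
        want = r->frag_len - r->body_have;
        take = len - off < want ? len - off : want;
        r->body_have += take;   /* a real reader would copy the payload here */
        off += take;
    }
    *used = off;
    return r->hdr_have == 4 && r->body_have == r->frag_len;
}
```

The reader only reports completion after the final payload byte, which mirrors the commit's point that the request structure should stay reserved until the data has been completely received.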

11 Feb, 2014 1 commit

SUNRPC: Fix races in xs_nospace() · 06ea0bfe

Committed by Trond Myklebust on Feb 11, 2014

When a send failure occurs due to the socket being out of buffer space,
we call xs_nospace() in order to have the RPC task wait until the
socket has drained enough to make it worth while trying again.
The current patch fixes a race in which the socket is drained before
we get round to setting up the machinery in xs_nospace(), and which
is reported to cause hangs.

Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown
Fixes: a9a6b52e (SUNRPC: Don't start the retransmission timer...)
Reported-by: Neil Brown <neilb@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

15 Jan, 2014 1 commit

net: replace macros net_random and net_srandom with direct calls to prandom · 63862b5b

Committed by Aruna-Hewapathirane on Jan 11, 2014

This patch removes the net_random and net_srandom macros and replaces
them with direct calls to the prandom ones. As new commits only seem to
use prandom_u32 there is no use to keep them around.
This change makes it easier to grep for users of prandom_u32.
Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com>
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 Jan, 2014 2 commits

SUNRPC: Add tracepoint for socket errors · e8353c76

Committed by Trond Myklebust on Dec 31, 2013

Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

SUNRPC: Report connection error values to rpc_tasks on the pending queue · 2118071d

Committed by Trond Myklebust on Dec 31, 2013

Currently we only report EAGAIN, which is not descriptive enough for
softconn tasks.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

11 Dec, 2013 1 commit

sunrpc: fix some typos · 28303ca3

Committed by Weng Meiling on Nov 30, 2013

Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

13 Nov, 2013 1 commit

sunrpc: comment typo fix · f06c3d2b

Committed by J. Bruce Fields on Sep 17, 2013

Signed-off-by: J. Bruce Fields <bfields@redhat.com>

09 Nov, 2013 1 commit

SUNRPC: Fix a data corruption issue when retransmitting RPC calls · a6b31d18

Committed by Trond Myklebust on Nov 08, 2013

The following scenario can cause silent data corruption when doing
NFS writes. It has mainly been observed when doing database writes
using O_DIRECT.

1) The RPC client uses sendpage() to do zero-copy of the page data.
2) Due to networking issues, the reply from the server is delayed,
   and so the RPC client times out.
3) The client issues a second sendpage of the page data as part of
   an RPC call retransmission.
4) The reply to the first transmission arrives from the server
   _before_ the client hardware has emptied the TCP socket send
   buffer.
5) After processing the reply, the RPC state machine rules that
   the call is done, and triggers the completion callbacks.
6) The application notices the RPC call is done, and reuses the
   pages to store something else (e.g. a new write).
7) The client NIC drains the TCP socket send buffer. Since the
   page data has now changed, it reads a corrupted version of the
   initial RPC call, and puts it on the wire.

This patch fixes the problem in the following manner:

The ordering guarantees of TCP ensure that when the server sends a
reply, then we know that the _first_ transmission has completed. Using
zero-copy in that situation is therefore safe.
If a timeout occurs, we then send the retransmission using sendmsg()
(i.e. no zero-copy). We then know that the socket contains a full copy of
the data, and so it will retransmit a faithful reproduction even if the
RPC call completes, and the application reuses the O_DIRECT buffer in
the meantime.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org
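
The difference between the two send paths can be shown with a toy model. This is a sketch under stated assumptions and not the kernel's socket layer: `struct send_slot`, `queue_zero_copy`, `queue_copy`, and `wire_bytes` are hypothetical names, with the first function standing in for sendpage() and the second for sendmsg().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SNDBUF_MAX 64

/* Toy send buffer: holds either a reference to caller memory or a copy. */
struct send_slot {
    const char *ref;         /* zero-copy: points at the caller's page */
    char copy[SNDBUF_MAX];   /* copying send: private snapshot */
    size_t len;
    int is_copy;
};

/* Zero-copy queueing, loosely modelled on sendpage(): no snapshot is taken. */
void queue_zero_copy(struct send_slot *s, const char *page, size_t len)
{
    assert(s != NULL && page != NULL);
    s->ref = page;
    s->len = len;
    s->is_copy = 0;
}

/* Copying queueing, loosely modelled on sendmsg(): the data is snapshotted. */
int queue_copy(struct send_slot *s, const char *page, size_t len)
{
    assert(s != NULL && page != NULL);
    if (len > SNDBUF_MAX)
        return -1;
    memcpy(s->copy, page, len);
    s->ref = NULL;
    s->len = len;
    s->is_copy = 1;
    return 0;
}

/* What the NIC would put on the wire when it finally drains the slot. */
const char *wire_bytes(const struct send_slot *s)
{
    return s->is_copy ? s->copy : s->ref;
}
```

If the application overwrites the page after queueing, the zero-copy slot transmits the new contents while the copied slot still transmits the original, which is why the retransmission is sent with a copy.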

31 Oct, 2013 2 commits

SUNRPC: Cleanup xs_destroy() · a1311d87

Committed by Trond Myklebust on Oct 31, 2013

There is no longer any need for a separate xs_local_destroy() helper.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: close a rare race in xs_tcp_setup_socket. · 93dc41bd

Committed by NeilBrown on Oct 31, 2013

We have one report of a crash in xs_tcp_setup_socket.
The call path to the crash is:

  xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.

The 'sock' passed to that last function is NULL.

The only way I can see this happening is a concurrent call to
xs_close:

  xs_close -> xs_reset_transport -> sock_release -> inet_release

inet_release sets:
   sock->sk = NULL;
inet_stream_connect calls
   lock_sock(sock->sk);
which gets NULL.

All calls to xs_close are protected by XPRT_LOCKED as are most
activations of the workqueue which runs xs_tcp_setup_socket.
The exception is xs_tcp_schedule_linger_timeout.

So presumably the timeout queued by the latter fires exactly when some
other code runs xs_close().

To protect against this we can move the cancel_delayed_work_sync()
call from xs_destroy() to xs_close().

As xs_close is never called from the worker scheduled on
->connect_worker, this can never deadlock.
Signed-off-by: NeilBrown <neilb@suse.de>
[Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

29 Oct, 2013 1 commit

sunrpc: comment typo fix · e3bfab18

Committed by J. Bruce Fields on Oct 02, 2013

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

02 Oct, 2013 2 commits

SUNRPC: Only update the TCP connect cookie on a successful connect · 8b71798c

Committed by Trond Myklebust on Sep 26, 2013

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

SUNRPC: Enable the keepalive option for TCP sockets · 7f260e85

Committed by Trond Myklebust on Sep 24, 2013

For NFSv4 we want to avoid retransmitting RPC calls unless the TCP
connection breaks. However we still want to detect TCP connection
breakage as soon as possible. Do this by setting the keepalive option
with the idle timeout and count set to the 'timeo' and 'retrans' mount
options.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
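
The same keepalive knobs are available to userspace through setsockopt(), which makes the mechanism easy to try out. The sketch below is a Linux-specific illustration, not the kernel change: `enable_keepalive` is a hypothetical helper, and how 'timeo' and 'retrans' map onto its arguments is left to the caller as an assumption.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable TCP keepalive on fd. idle_secs is the quiet period before the
 * first probe, intvl_secs the gap between probes, and count the number of
 * unanswered probes before the connection is declared dead.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt).
 */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int count)
{
    int on = 1;

    assert(fd >= 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs,
                   sizeof(idle_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_secs,
                   sizeof(intvl_secs)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0)
        return -1;
    return 0;
}
```

For example, `enable_keepalive(fd, 60, 60, 3)` would start probing after 60 idle seconds and give up after three unanswered probes sent 60 seconds apart.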

05 Sep, 2013 1 commit

SUNRPC: Add tracepoints to help debug socket connection issues · 40b5ea0c

Committed by Trond Myklebust on Sep 04, 2013

Add client side debugging to help trace socket connection/disconnection
and unexpected state change issues.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>

25 Jul, 2013 1 commit

net: add sk_stream_is_writeable() helper · 64dc6130

Committed by Eric Dumazet on Jul 22, 2013

Several call sites use the following hardcoded condition:

sk_stream_wspace(sk) >= sk_stream_min_wspace(sk)

Let's use a helper because TCP_NOTSENT_LOWAT support will change this
condition for TCP sockets.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Jun, 2013 1 commit

net: Convert uses of typedef ctl_table to struct ctl_table · fe2c6338

Committed by Joe Perches on Jun 11, 2013

Reduce the uses of this unnecessary typedef.

Done via perl script:

$ git grep --name-only -w ctl_table net | \
  xargs perl -p -i -e '\
	sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
        s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

Reflow the modified lines that now exceed 80 columns.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 May, 2013 1 commit

sunrpc: server back channel needs no rpcbind method · 2fccbd9c

Committed by J. Bruce Fields on Sep 24, 2012

XPRT_BOUND is set on server backchannel xprts by xs_setup_bc_tcp()
(using xprt_set_bound()), and is never cleared, so ->rpcbind() will
never need to be called.
Reported-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
