- 09 Feb, 2015 6 commits
-
-
Committed by Trond Myklebust
It is not safe to call xs_reset_transport() from inside xs_udp_setup_socket() or xs_tcp_setup_socket(), since they do not own the correct locks. Instead, do it in xs_connect(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
The socket lock is currently held by the task that is requesting the connection be established. While that is efficient in the case where the connection happens quickly, it is racy in the case where it doesn't. What we really want is for the connect helper to be able to block access to the socket while it is being set up. This patch does so by arranging to transfer the socket lock from the task that is requesting the connect attempt, and then releasing that lock once everything is done. This scheme also gives us automatic protection against collisions with the RPC close code, so we can kill the cancel_delayed_work_sync() call in xs_close(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Otherwise, we may end up looping. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Now that we can reuse bound ports after a close, we never really want to clear the transport's source port after it has been set. Doing so really messes up the NFSv3 DRC on the server. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Now that we're setting SO_REUSEPORT, we still need to handle the case where a connect() is attempted, but the old socket is still lingering. Essentially, all we want to do here is handle the error by waiting a few seconds and then retrying. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
When using TCP, we need the ability to reuse port numbers after a disconnection, so that the NFSv3 server knows that we're the same client. Currently we use a hack to work around the TCP socket's TIME_WAIT: we send an RST instead of closing, which doesn't always work... The SO_REUSEPORT option added in Linux 3.9 allows us to bind multiple TCP connections to the same source address+port combination, and thus to use ordinary TCP close() instead of the current hack. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 25 Nov, 2014 3 commits
-
-
Committed by Jeff Layton
It's always set to whatever CONFIG_SUNRPC_DEBUG is, so just use that. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Jeff Layton
Add tracepoints inside the main loop of xs_tcp_data_recv that allow us to keep an eye on what's happening during each phase of it. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Jeff Layton
...so we can keep track of when calls are sent and replies received. Signed-off-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 25 Sep, 2014 4 commits
-
-
Committed by NeilBrown
Now that nfs_release_page() doesn't block indefinitely, other deadlock avoidance mechanisms aren't needed. (1) It doesn't hurt for kswapd to block occasionally. If it doesn't want to block it would clear __GFP_WAIT. The current_is_kswapd() check was only added to avoid deadlocks and we have a new approach for that. (2) Memory allocation in the SUNRPC layer can very rarely try to ->releasepage() a page it is trying to handle. The deadlock is removed as nfs_release_page() doesn't block indefinitely. So we don't need to set PF_FSTRANS for sunrpc network operations any more. Signed-off-by: NeilBrown <neilb@suse.de> Acked-by: Jeff Layton <jlayton@primarydata.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Jason Baron
If an iptables drop rule is added for an nfs server, the client can end up in a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM is ignored since the prior bits of the packet may have been successfully queued and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request() thinks that because some bits were queued it should return -EAGAIN. We then try the request again and again, resulting in cpu spinning.
Reproducer:
1) open a file on the nfs server '/nfs/foo' (mounted using udp)
2) iptables -A OUTPUT -d <nfs server ip> -j DROP
3) write to /nfs/foo
4) close /nfs/foo
5) iptables -D OUTPUT -d <nfs server ip> -j DROP
The softlockup occurs in step 4 above. The previous patch allows xs_sendpages() to return both a sent count and any error values that may have occurred. Thus, if we get an -EPERM, return that to the higher level code. With this patch in place we can successfully abort the above sequence and avoid the softlockup.
I also tried the above test case on an nfs mount on tcp and although the system does not softlockup, I still ended up with the 'hung_task' firing after 120 seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix, since -EPERM appears to get ignored much lower down in the stack and does not propagate up to xs_sendpages(). This case is not quite as insidious as the softlockup and it is not addressed here.
Reported-by: Yigong Lou <ylou@akamai.com> Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Jason Baron
If an error is returned after the first bits of a packet have already been successfully queued, xs_sendpages() will return a positive 'int' value indicating success. Callers seem to treat this as -EAGAIN. However, there are cases where it's not a question of waiting for the write queue to drain. For example, when there is an iptables rule dropping packets to the destination, the lower level code can return -EPERM only after parts of the packet have been successfully queued. In this case, we can end up continuously retrying, resulting in a kernel softlockup. This patch is intended to make no changes in behavior but is in preparation for subsequent patches that can make decisions based both on the number of bytes sent by xs_sendpages() and on any errors that may have been returned. Signed-off-by: Jason Baron <jbaron@akamai.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
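The idea of reporting a sent count and an error separately, as described in the commit message above, can be sketched in plain C. This is an illustration under stated assumptions rather than the kernel's xs_sendpages(): `queue_fn`, `send_pieces` and `budget_queue` are hypothetical names, and the lower layer is simulated by a callback that starts failing with -EPERM once a byte budget is used up.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lower-level hook: returns bytes queued, or a negative errno. */
typedef int (*queue_fn)(const char *buf, size_t len, void *ctx);

/*
 * Queue `len` bytes in pieces of at most `piece` bytes.
 * The return value is 0 or a negative errno. The number of bytes queued
 * before any failure is reported separately through *sent_p, so a partial
 * send no longer hides the error from the caller.
 */
int send_pieces(queue_fn queue, void *ctx, const char *buf, size_t len,
		size_t piece, size_t *sent_p)
{
	size_t sent = 0;
	int err = 0;

	while (sent < len) {
		size_t want = (len - sent < piece) ? len - sent : piece;
		int ret = queue(buf + sent, want, ctx);

		if (ret < 0) {		/* hard error: stop and report it */
			err = ret;
			break;
		}
		if (ret == 0) {		/* no progress: ask the caller to retry */
			err = -EAGAIN;
			break;
		}
		sent += (size_t)ret;
	}

	*sent_p = sent;
	return err;
}

/* Demo hook: accepts bytes until the budget in ctx runs out, then fails with -EPERM. */
int budget_queue(const char *buf, size_t len, void *ctx)
{
	size_t *budget = ctx;

	(void)buf;
	if (*budget == 0)
		return -EPERM;
	if (len > *budget)
		len = *budget;
	*budget -= len;
	return (int)len;
}
```

With a budget of 8 bytes and a 20-byte buffer sent in 4-byte pieces, the caller sees both `sent == 8` and `-EPERM`, and can decide to abort instead of retrying forever.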
-
Committed by Benjamin Coddington
When aborting a connection to preserve source ports, don't wake the task in xs_error_report. This allows tasks with RPC_TASK_SOFTCONN to succeed if the connection needs to be re-established since it preserves the task's status instead of setting it to the status of the aborting kernel_connect(). This may also avoid a potential conflict on the socket's lock. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Cc: stable@vger.kernel.org # 3.14+ Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 11 Sep, 2014 1 commit
-
-
Committed by Chris Perl
When attempting to establish a local ephemeral endpoint for a TCP or UDP socket, do not explicitly call bind; instead, let it happen implicitly when the socket is first used. The main motivating factor for this change is when TCP runs out of unique ephemeral ports (i.e. cannot find any ephemeral ports which are not a part of *any* TCP connection). In this situation, if you explicitly call bind, then the call will fail with EADDRINUSE. However, if you allow the allocation of an ephemeral port to happen implicitly as part of connect (or other functions), then ephemeral ports can be reused, so long as the combination of (local_ip, local_port, remote_ip, remote_port) is unique for TCP sockets on the system. This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP sockets the same. This can allow mount.nfs(8) to continue to function successfully, even in the face of misbehaving applications which are creating a large number of TCP connections. Signed-off-by: Chris Perl <chris.perl@gmail.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
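The implicit binding described in the commit message above can be observed from user space. The sketch below is an illustration only, with hypothetical helper names (`make_loopback_listener`, `connect_without_bind`): it connects a TCP socket without calling bind() first and then reads back the ephemeral port the kernel assigned during connect().

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: listen on 127.0.0.1 at a kernel-chosen port and
 * report the resulting address in *addr. Returns the listening fd or -1.
 */
int make_loopback_listener(struct sockaddr_in *addr)
{
	socklen_t len = sizeof(*addr);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	memset(addr, 0, sizeof(*addr));
	addr->sin_family = AF_INET;
	addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	addr->sin_port = 0;	/* let the kernel pick the listening port */

	if (bind(fd, (struct sockaddr *)addr, sizeof(*addr)) < 0 ||
	    listen(fd, 1) < 0 ||
	    getsockname(fd, (struct sockaddr *)addr, &len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}

/*
 * Connect to *server without calling bind() on the client socket.
 * The kernel assigns the local ephemeral port as part of connect();
 * the chosen local address is written to *local.
 * Returns the connected fd or -1.
 */
int connect_without_bind(const struct sockaddr_in *server,
			 struct sockaddr_in *local)
{
	socklen_t len = sizeof(*local);
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return -1;

	if (connect(fd, (const struct sockaddr *)server, sizeof(*server)) < 0 ||
	    getsockname(fd, (struct sockaddr *)local, &len) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

Because the port is chosen at connect() time, the kernel knows the full (local_ip, local_port, remote_ip, remote_port) tuple and only needs that tuple to be unique, which is why this path can still succeed when an explicit bind() would return EADDRINUSE.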
-
- 13 Jul, 2014 1 commit
-
-
Committed by Daniel Walter
Replace obsolete strict_strto calls with appropriate kstrto calls. Signed-off-by: Daniel Walter <dwalter@google.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 01 Jul, 2014 1 commit
-
-
Committed by Trond Myklebust
Currently, an ENOBUFS error will result in a fatal error for the RPC call. Normally, we will just want to wait and then retry. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
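The "wait and then retry" policy from the commit message above can be sketched as a small bounded retry loop. This is a user-space illustration, not the SUNRPC implementation: `attempt_fn`, `retry_on_enobufs` and `flaky_attempt` are hypothetical names, and the retry limit and delay are arbitrary parameters chosen for the example.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for nanosleep() under strict -std modes */
#endif
#include <errno.h>
#include <time.h>

/* Hypothetical operation: returns 0 on success or a negative errno. */
typedef int (*attempt_fn)(void *ctx);

/*
 * Call `attempt` until it returns something other than -ENOBUFS, sleeping
 * `delay_ms` milliseconds between tries and giving up after `max_tries`
 * attempts. Any other result (success or a different error) is returned
 * unchanged, so only the transient condition is retried.
 */
int retry_on_enobufs(attempt_fn attempt, void *ctx, int max_tries, long delay_ms)
{
	int ret = -ENOBUFS;

	for (int i = 0; i < max_tries; i++) {
		ret = attempt(ctx);
		if (ret != -ENOBUFS)
			break;
		if (i + 1 < max_tries) {
			struct timespec ts = {
				.tv_sec = delay_ms / 1000,
				.tv_nsec = (delay_ms % 1000) * 1000000L,
			};
			nanosleep(&ts, NULL);
		}
	}
	return ret;
}

/* Demo attempt: fails with -ENOBUFS until the counter in ctx reaches zero. */
int flaky_attempt(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -ENOBUFS;
	}
	return 0;
}
```

An operation that fails twice with -ENOBUFS and then succeeds returns 0 from `retry_on_enobufs(flaky_attempt, &failures, 5, 1)` instead of surfacing the transient error as fatal.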
-
- 24 May, 2014 1 commit
-
-
Committed by Tom Herbert
Setting sk_no_check to UDP_CSUM_NORCV seems to have no effect. Signed-off-by: Tom Herbert <therbert@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 18 Apr, 2014 1 commit
-
-
Committed by Peter Zijlstra
Mostly scripted conversion of the smp_mb__* barriers. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
- 12 Apr, 2014 1 commit
-
-
Committed by David S. Miller
Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->sk_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about its value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Mar, 2014 3 commits
-
-
Committed by Kinglong Mee
Don't move the assignment of args->bc_xprt->xpt_bc_xprt out of xs_setup_bc_tcp, because rpc_ping (which is in rpc_create) will use it. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Kinglong Mee
Besides checking rpc_xprt out of xs_setup_bc_tcp, increase its reference (it's important). Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Kinglong Mee
Backchannel xprt isn't freed right now. Free it in bc_destroy, and put the reference of THIS_MODULE. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 29 Mar, 2014 1 commit
-
-
Committed by Kinglong Mee
If creating the xprt fails after xs_format_peer_addresses, sunrpc must free the memory of the peer addresses in the xprt. Signed-off-by: Kinglong Mee <kinglongmee@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 12 Feb, 2014 1 commit
-
-
Committed by Trond Myklebust
Since TCP is a stream protocol, our callback read code needs to take into account the fact that RPC callbacks are not always confined to a single TCP segment. This patch adds support for multiple TCP segments by ensuring that we only remove the rpc_rqst structure from the 'free backchannel requests' list once the data has been completely received. We rely on the fact that TCP data is ordered for the duration of the connection. Reported-by: shaobingqing <shaobingqing@bwstor.com.cn> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 11 Feb, 2014 1 commit
-
-
Committed by Trond Myklebust
When a send failure occurs due to the socket being out of buffer space, we call xs_nospace() in order to have the RPC task wait until the socket has drained enough to make it worthwhile trying again. The current patch fixes a race in which the socket is drained before we get round to setting up the machinery in xs_nospace(), and which is reported to cause hangs. Link: http://lkml.kernel.org/r/20140210170315.33dfc621@notabene.brown Fixes: a9a6b52e (SUNRPC: Don't start the retransmission timer...) Reported-by: Neil Brown <neilb@suse.com> Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 15 Jan, 2014 1 commit
-
-
Committed by Aruna-Hewapathirane
This patch removes the net_random and net_srandom macros and replaces them with direct calls to the prandom ones. As new commits only seem to use prandom_u32 there is no use to keep them around. This change makes it easier to grep for users of prandom_u32. Signed-off-by: Aruna-Hewapathirane <aruna.hewapathirane@gmail.com> Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Jan, 2014 2 commits
-
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
Committed by Trond Myklebust
Currently we only report EAGAIN, which is not descriptive enough for softconn tasks. Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
-
- 11 Dec, 2013 1 commit
-
-
Committed by Weng Meiling
Signed-off-by: Weng Meiling <wengmeiling.weng@huawei.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 13 Nov, 2013 1 commit
-
-
Committed by J. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
- 09 Nov, 2013 1 commit
-
-
Committed by Trond Myklebust
The following scenario can cause silent data corruption when doing NFS writes. It has mainly been observed when doing database writes using O_DIRECT.
1) The RPC client uses sendpage() to do zero-copy of the page data.
2) Due to networking issues, the reply from the server is delayed, and so the RPC client times out.
3) The client issues a second sendpage of the page data as part of an RPC call retransmission.
4) The reply to the first transmission arrives from the server _before_ the client hardware has emptied the TCP socket send buffer.
5) After processing the reply, the RPC state machine rules the call to be done, and triggers the completion callbacks.
6) The application notices the RPC call is done, and reuses the pages to store something else (e.g. a new write).
7) The client NIC drains the TCP socket send buffer. Since the page data has now changed, it reads a corrupted version of the initial RPC call, and puts it on the wire.
This patch fixes the problem in the following manner: The ordering guarantees of TCP ensure that when the server sends a reply, then we know that the _first_ transmission has completed. Using zero-copy in that situation is therefore safe. If a timeout occurs, we then send the retransmission using sendmsg() (i.e. no zero-copy). We then know that the socket contains a full copy of the data, and so it will retransmit a faithful reproduction even if the RPC call completes, and the application reuses the O_DIRECT buffer in the meantime.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@vger.kernel.org
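The rule in the commit message above (zero-copy for the first transmission, a copying send for every retransmission) reduces to a small decision on a per-request counter. The sketch below only models that decision; `struct request_state`, `enum send_path` and `choose_send_path` are hypothetical names and do not correspond to the kernel's data structures.

```c
/* Which send primitive to use for a given transmission of a request. */
enum send_path {
	SEND_ZERO_COPY,	/* stands in for sendpage(): pages are referenced, not copied */
	SEND_COPY	/* stands in for sendmsg(): data is copied into the socket buffer */
};

/* Hypothetical per-request state; only the transmission count matters here. */
struct request_state {
	unsigned int transmissions;
};

/*
 * Pick the send path for the next transmission and record that it happened.
 * The first transmission may use zero-copy, because a reply can only arrive
 * after that transmission has been fully sent. Every retransmission copies
 * the data, so the socket holds its own snapshot even if the caller reuses
 * the pages once the call completes.
 */
enum send_path choose_send_path(struct request_state *req)
{
	enum send_path path = (req->transmissions == 0) ? SEND_ZERO_COPY
							 : SEND_COPY;

	req->transmissions++;
	return path;
}
```

Starting from `transmissions == 0`, the first call yields `SEND_ZERO_COPY` and every later call yields `SEND_COPY`.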
-
- 31 Oct, 2013 2 commits
-
-
Committed by Trond Myklebust
There is no longer any need for a separate xs_local_destroy() helper. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by NeilBrown
We have one report of a crash in xs_tcp_setup_socket. The call path to the crash is: xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested. The 'sock' passed to that last function is NULL. The only way I can see this happening is a concurrent call to xs_close: xs_close -> xs_reset_transport -> sock_release -> inet_release. inet_release sets: sock->sk = NULL; inet_stream_connect calls lock_sock(sock->sk); which gets NULL. All calls to xs_close are protected by XPRT_LOCKED as are most activations of the workqueue which runs xs_tcp_setup_socket. The exception is xs_tcp_schedule_linger_timeout. So presumably the timeout queued by the latter fires exactly when some other code runs xs_close(). To protect against this we can move the cancel_delayed_work_sync() call from xs_destroy() to xs_close(). As xs_close is never called from the worker scheduled on ->connect_worker, this can never deadlock. Signed-off-by: NeilBrown <neilb@suse.de> [Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 29 Oct, 2013 1 commit
-
-
Committed by J. Bruce Fields
Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 02 Oct, 2013 2 commits
-
-
Committed by Trond Myklebust
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
Committed by Trond Myklebust
For NFSv4 we want to avoid retransmitting RPC calls unless the TCP connection breaks. However, we still want to detect TCP connection breakage as soon as possible. Do this by setting the keepalive option with the idle timeout and count set to the 'timeo' and 'retrans' mount options. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
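The keepalive settings mentioned in the commit message above have direct user-space equivalents on Linux. The sketch below is an illustration, not the kernel code: `enable_keepalive` and `get_tcp_int_opt` are hypothetical helpers, the idle time stands in for 'timeo' and the probe count for 'retrans', and the probe interval parameter is an assumption because the message does not say how it is chosen.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: turn on TCP keepalive for `fd`.
 *   idle_secs  - idle time before the first probe (plays the role of 'timeo')
 *   intvl_secs - time between probes (an assumption; not stated in the commit)
 *   probes     - unanswered probes before the connection is declared dead
 *                (plays the role of 'retrans')
 * Returns 0 on success, -1 on failure.
 */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probes)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE,
		       &idle_secs, sizeof(idle_secs)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL,
		       &intvl_secs, sizeof(intvl_secs)) < 0)
		return -1;
	if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT,
		       &probes, sizeof(probes)) < 0)
		return -1;
	return 0;
}

/* Hypothetical helper: read back an integer TCP-level option, or -1 on error. */
int get_tcp_int_opt(int fd, int opt)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(fd, IPPROTO_TCP, opt, &val, &len) < 0)
		return -1;
	return val;
}
```

With these options set, a dead peer is detected after roughly `idle_secs + intvl_secs * probes` seconds without any RPC-level retransmission.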
-
- 05 Sep, 2013 1 commit
-
-
Committed by Trond Myklebust
Add client side debugging to help trace socket connection/disconnection and unexpected state change issues. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
-
- 25 Jul, 2013 1 commit
-
-
Committed by Eric Dumazet
Several call sites use the following hardcoded condition: sk_stream_wspace(sk) >= sk_stream_min_wspace(sk). Let's use a helper because TCP_NOTSENT_LOWAT support will change this condition for TCP sockets. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Jun, 2013 1 commit
-
-
Committed by Joe Perches
Reduce the uses of this unnecessary typedef. Done via perl script: $ git grep --name-only -w ctl_table net | \ xargs perl -p -i -e '\ sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \ s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge' Reflow the modified lines that now exceed 80 columns. Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 15 May, 2013 1 commit
-
-
Committed by J. Bruce Fields
XPRT_BOUND is set on server backchannel xprts by xs_setup_bc_tcp() (using xprt_set_bound()), and is never cleared, so ->rpcbind() will never need to be called. Reported-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-