1. 25 11月, 2014 2 次提交
  2. 25 9月, 2014 4 次提交
    • N
      NFS/SUNRPC: Remove other deadlock-avoidance mechanisms in nfs_release_page() · 1aff5256
      NeilBrown 提交于
      Now that nfs_release_page() doesn't block indefinitely, other deadlock
      avoidance mechanisms aren't needed.
       - it doesn't hurt for kswapd to block occasionally.  If it doesn't
         want to block it would clear __GFP_WAIT.  The current_is_kswapd()
         was only added to avoid deadlocks and we have a new approach for
         that.
       - memory allocation in the SUNRPC layer can very rarely try to
         ->releasepage() a page it is trying to handle.  The deadlock
         is removed as nfs_release_page() doesn't block indefinitely.
      
      So we don't need to set PF_FSTRANS for sunrpc network operations any
      more.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Acked-by: NJeff Layton <jlayton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      1aff5256
    • J
      rpc: Add -EPERM processing for xs_udp_send_request() · 3dedbb5c
      Jason Baron 提交于
      If an iptables drop rule is added for an nfs server, the client can end up in
      a softlockup. Because of the way that xs_sendpages() is structured, the -EPERM
      is ignored since the prior bits of the packet may have been successfully queued
      and thus xs_sendpages() returns a non-zero value. Then, xs_udp_send_request()
      thinks that because some bits were queued it should return -EAGAIN. We then try
      the request again and again, resulting in cpu spinning. Reproducer:
      
      1) open a file on the nfs server '/nfs/foo' (mounted using udp)
      2) iptables -A OUTPUT -d <nfs server ip> -j DROP
      3) write to /nfs/foo
      4) close /nfs/foo
      5) iptables -D OUTPUT -d <nfs server ip> -j DROP
      
      The softlockup occurs in step 4 above.
      
      The previous patch, allows xs_sendpages() to return both a sent count and
      any error values that may have occurred. Thus, if we get an -EPERM, return
      that to the higher level code.
      
      With this patch in place we can successfully abort the above sequence and
      avoid the softlockup.
      
      I also tried the above test case on an nfs mount on tcp and although the system
      does not softlockup, I still ended up with the 'hung_task' firing after 120
      seconds, due to the i/o being stuck. The tcp case appears a bit harder to fix,
      since -EPERM appears to get ignored much lower down in the stack and does not
      propogate up to xs_sendpages(). This case is not quite as insidious as the
      softlockup and it is not addressed here.
      Reported-by: NYigong Lou <ylou@akamai.com>
      Signed-off-by: NJason Baron <jbaron@akamai.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      3dedbb5c
    • J
      rpc: return sent and err from xs_sendpages() · f279cd00
      Jason Baron 提交于
      If an error is returned after the first bits of a packet have already been
      successfully queued, xs_sendpages() will return a positive 'int' value
      indicating success. Callers seem to treat this as -EAGAIN.
      
      However, there are cases where its not a question of waiting for the write
      queue to drain. For example, when there is an iptables rule dropping packets
      to the destination, the lower level code can return -EPERM only after parts
      of the packet have been successfully queued. In this case, we can end up
      continuously retrying resulting in a kernel softlockup.
      
      This patch is intended to make no changes in behavior but is in preparation for
      subsequent patches that can make decisions based on both on the number of bytes
      sent by xs_sendpages() and any errors that may have be returned.
      Signed-off-by: NJason Baron <jbaron@akamai.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      f279cd00
    • B
      SUNRPC: Don't wake tasks during connection abort · a743419f
      Benjamin Coddington 提交于
      When aborting a connection to preserve source ports, don't wake the task in
      xs_error_report.  This allows tasks with RPC_TASK_SOFTCONN to succeed if the
      connection needs to be re-established since it preserves the task's status
      instead of setting it to the status of the aborting kernel_connect().
      
      This may also avoid a potential conflict on the socket's lock.
      Signed-off-by: NBenjamin Coddington <bcodding@redhat.com>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      a743419f
  3. 11 9月, 2014 1 次提交
    • C
      rpc: xs_bind - do not bind when requesting a random ephemeral port · 0f7a622c
      Chris Perl 提交于
      When attempting to establish a local ephemeral endpoint for a TCP or UDP
      socket, do not explicitly call bind, instead let it happen implicilty when the
      socket is first used.
      
      The main motivating factor for this change is when TCP runs out of unique
      ephemeral ports (i.e.  cannot find any ephemeral ports which are not a part of
      *any* TCP connection).  In this situation if you explicitly call bind, then the
      call will fail with EADDRINUSE.  However, if you allow the allocation of an
      ephemeral port to happen implicitly as part of connect (or other functions),
      then ephemeral ports can be reused, so long as the combination of (local_ip,
      local_port, remote_ip, remote_port) is unique for TCP sockets on the system.
      
      This doesn't matter for UDP sockets, but it seemed easiest to treat TCP and UDP
      sockets the same.
      
      This can allow mount.nfs(8) to continue to function successfully, even in the
      face of misbehaving applications which are creating a large number of TCP
      connections.
      Signed-off-by: NChris Perl <chris.perl@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      0f7a622c
  4. 13 7月, 2014 1 次提交
  5. 01 7月, 2014 1 次提交
  6. 24 5月, 2014 1 次提交
  7. 18 4月, 2014 1 次提交
  8. 12 4月, 2014 1 次提交
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
  9. 30 3月, 2014 3 次提交
  10. 29 3月, 2014 1 次提交
  11. 12 2月, 2014 1 次提交
  12. 11 2月, 2014 1 次提交
  13. 15 1月, 2014 1 次提交
  14. 01 1月, 2014 2 次提交
  15. 11 12月, 2013 1 次提交
  16. 13 11月, 2013 1 次提交
  17. 09 11月, 2013 1 次提交
    • T
      SUNRPC: Fix a data corruption issue when retransmitting RPC calls · a6b31d18
      Trond Myklebust 提交于
      The following scenario can cause silent data corruption when doing
      NFS writes. It has mainly been observed when doing database writes
      using O_DIRECT.
      
      1) The RPC client uses sendpage() to do zero-copy of the page data.
      2) Due to networking issues, the reply from the server is delayed,
         and so the RPC client times out.
      
      3) The client issues a second sendpage of the page data as part of
         an RPC call retransmission.
      
      4) The reply to the first transmission arrives from the server
         _before_ the client hardware has emptied the TCP socket send
         buffer.
      5) After processing the reply, the RPC state machine rules that
         the call to be done, and triggers the completion callbacks.
      6) The application notices the RPC call is done, and reuses the
         pages to store something else (e.g. a new write).
      
      7) The client NIC drains the TCP socket send buffer. Since the
         page data has now changed, it reads a corrupted version of the
         initial RPC call, and puts it on the wire.
      
      This patch fixes the problem in the following manner:
      
      The ordering guarantees of TCP ensure that when the server sends a
      reply, then we know that the _first_ transmission has completed. Using
      zero-copy in that situation is therefore safe.
      If a time out occurs, we then send the retransmission using sendmsg()
      (i.e. no zero-copy), We then know that the socket contains a full copy of
      the data, and so it will retransmit a faithful reproduction even if the
      RPC call completes, and the application reuses the O_DIRECT buffer in
      the meantime.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
      a6b31d18
  18. 31 10月, 2013 2 次提交
    • T
      SUNRPC: Cleanup xs_destroy() · a1311d87
      Trond Myklebust 提交于
      There is no longer any need for a separate xs_local_destroy() helper.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      a1311d87
    • N
      SUNRPC: close a rare race in xs_tcp_setup_socket. · 93dc41bd
      NeilBrown 提交于
      We have one report of a crash in xs_tcp_setup_socket.
      The call path to the crash is:
      
        xs_tcp_setup_socket -> inet_stream_connect -> lock_sock_nested.
      
      The 'sock' passed to that last function is NULL.
      
      The only way I can see this happening is a concurrent call to
      xs_close:
      
        xs_close -> xs_reset_transport -> sock_release -> inet_release
      
      inet_release sets:
         sock->sk = NULL;
      inet_stream_connect calls
         lock_sock(sock->sk);
      which gets NULL.
      
      All calls to xs_close are protected by XPRT_LOCKED as are most
      activations of the workqueue which runs xs_tcp_setup_socket.
      The exception is xs_tcp_schedule_linger_timeout.
      
      So presumably the timeout queued by the later fires exactly when some
      other code runs xs_close().
      
      To protect against this we can move the cancel_delayed_work_sync()
      call from xs_destory() to xs_close().
      
      As xs_close is never called from the worker scheduled on
      ->connect_worker, this can never deadlock.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      [Trond: Make it safe to call cancel_delayed_work_sync() on AF_LOCAL sockets]
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      93dc41bd
  19. 29 10月, 2013 1 次提交
  20. 02 10月, 2013 2 次提交
  21. 05 9月, 2013 1 次提交
  22. 25 7月, 2013 1 次提交
  23. 13 6月, 2013 1 次提交
  24. 15 5月, 2013 1 次提交
  25. 26 4月, 2013 1 次提交
  26. 15 4月, 2013 1 次提交
  27. 26 3月, 2013 1 次提交
  28. 10 3月, 2013 1 次提交
    • J
      sunrpc: don't attempt to cancel unitialized work · 190b1ecf
      J. Bruce Fields 提交于
      As of dc107402 "SUNRPC: make AF_LOCAL connect synchronous", we no longer initialize connect_worker in the
      AF_LOCAL case, resulting in warnings like:
      
          WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0() Hardware name: Bochs
          ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
          Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd sunrpc
          Pid: 4816, comm: nfsd Tainted: G        W    3.8.0-rc2-00049-gdc107402 #801
          Call Trace:
           [<ffffffff8156ec00>] ? free_obj_work+0x60/0xa0
           [<ffffffff81046aaf>] warn_slowpath_common+0x7f/0xc0
           [<ffffffff81046ba6>] warn_slowpath_fmt+0x46/0x50
           [<ffffffff8156eccc>] debug_print_object+0x8c/0xb0
           [<ffffffff81055030>] ? timer_debug_hint+0x10/0x10
           [<ffffffff8156f7e3>] debug_object_assert_init+0xe3/0x120
           [<ffffffff81057ebb>] del_timer+0x2b/0x80
           [<ffffffff8109c4e6>] ? mark_held_locks+0x86/0x110
           [<ffffffff81065a29>] try_to_grab_pending+0xd9/0x150
           [<ffffffff81065b57>] __cancel_work_timer+0x27/0xc0
           [<ffffffff81065c03>] cancel_delayed_work_sync+0x13/0x20
           [<ffffffffa0007067>] xs_destroy+0x27/0x80 [sunrpc]
           [<ffffffffa00040d8>] xprt_destroy+0x78/0xa0 [sunrpc]
           [<ffffffffa0006241>] xprt_put+0x21/0x30 [sunrpc]
           [<ffffffffa00030cf>] rpc_free_client+0x10f/0x1a0 [sunrpc]
           [<ffffffffa0002ff3>] ? rpc_free_client+0x33/0x1a0 [sunrpc]
           [<ffffffffa0002f7e>] rpc_release_client+0x6e/0xb0 [sunrpc]
           [<ffffffffa000325d>] rpc_shutdown_client+0xfd/0x1b0 [sunrpc]
           [<ffffffffa0017196>] rpcb_put_local+0x106/0x130 [sunrpc]
          ...
      Acked-by: N"Myklebust, Trond" <Trond.Myklebust@netapp.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      190b1ecf
  29. 01 3月, 2013 1 次提交
  30. 05 2月, 2013 1 次提交
  31. 01 2月, 2013 1 次提交