1. 19 8月, 2017 1 次提交
  2. 17 8月, 2017 4 次提交
  3. 02 8月, 2017 1 次提交
  4. 21 7月, 2017 1 次提交
    • N
      net/sunrpc/xprt_sock: fix regression in connection error reporting. · 3ffbc1d6
      NeilBrown 提交于
      Commit 3d476263 ("tcp: remove poll() flakes when receiving
      RST") in v4.12 changed the order in which ->sk_state_change()
      and ->sk_error_report() are called when a socket is shut
      down - sk_state_change() is now called first.
      
      This causes xs_tcp_state_change() -> xs_sock_mark_closed() ->
      xprt_disconnect_done() to wake all pending tasked with -EAGAIN.
      When the ->sk_error_report() callback arrives, it is too late to
      pass the error on, and it is lost.
      
      As easy way to demonstrate the problem caused is to try to start
      rpc.nfsd while rcpbind isn't running.
      nfsd will attempt a tcp connection to rpcbind.  A ECONNREFUSED
      error is returned, but sunrpc code loses the error and keeps
      retrying.  If it saw the ECONNREFUSED, it would abort.
      
      To fix this, handle the sk->sk_err in the TCP_CLOSE branch of
      xs_tcp_state_change().
      
      Fixes: 3d476263 ("tcp: remove poll() flakes when receiving RST")
      Cc: stable@vger.kernel.org (v4.12)
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      3ffbc1d6
  5. 01 6月, 2017 1 次提交
    • N
      SUNRPC: ensure correct error is reported by xs_tcp_setup_socket() · 6ea44adc
      NeilBrown 提交于
      If you attempt a TCP mount from an host that is unreachable in a way
      that triggers an immediate error from kernel_connect(), that error
      does not propagate up, instead EAGAIN is reported.
      
      This results in call_connect_status receiving the wrong error.
      
      A case that it easy to demonstrate is to attempt to mount from an
      address that results in ENETUNREACH, but first deleting any default
      route.
      Without this patch, the mount.nfs process is persistently runnable
      and is hard to kill.  With this patch it exits as it should.
      
      The problem is caused by the fact that xs_tcp_force_close() eventually
      calls
            xprt_wake_pending_tasks(xprt, -EAGAIN);
      which causes an error return of -EAGAIN.  so when xs_tcp_setup_sock()
      calls
            xprt_wake_pending_tasks(xprt, status);
      the status is ignored.
      
      Fixes: 4efdd92c ("SUNRPC: Remove TCP client connection reset hack")
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      6ea44adc
  6. 28 2月, 2017 1 次提交
  7. 21 2月, 2017 1 次提交
  8. 11 2月, 2017 1 次提交
    • C
      sunrpc: Allow xprt->ops->timer method to sleep · b977b644
      Chuck Lever 提交于
      The transport lock is needed to protect the xprt_adjust_cwnd() call
      in xs_udp_timer, but it is not necessary for accessing the
      rq_reply_bytes_recvd or tk_status fields. It is correct to sublimate
      the lock into UDP's xs_udp_timer method, where it is required.
      
      The ->timer method has to take the transport lock if needed, but it
      can now sleep safely, or even call back into the RPC scheduler.
      
      This is more a clean-up than a fix, but the "issue" was introduced
      by my transport switch patches back in 2005.
      
      Fixes: 46c0ee8b ("RPC: separate xprt_timer implementations")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      b977b644
  9. 10 2月, 2017 2 次提交
  10. 08 11月, 2016 1 次提交
    • P
      udp: do fwd memory scheduling on dequeue · 7c13f97f
      Paolo Abeni 提交于
      A new argument is added to __skb_recv_datagram to provide
      an explicit skb destructor, invoked under the receive queue
      lock.
      The UDP protocol uses such argument to perform memory
      reclaiming on dequeue, so that the UDP protocol does not
      set anymore skb->desctructor.
      Instead explicit memory reclaiming is performed at close() time and
      when skbs are removed from the receive queue.
      The in kernel UDP protocol users now need to call a
      skb_recv_udp() variant instead of skb_recv_datagram() to
      properly perform memory accounting on dequeue.
      
      Overall, this allows acquiring only once the receive queue
      lock on dequeue.
      
      Tested using pktgen with random src port, 64 bytes packet,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr sinks	vanilla		patched
      1		440		560
      3		2150		2300
      6		3650		3800
      9		4450		4600
      12		6250		6450
      
      v1 -> v2:
       - do rmem and allocated memory scheduling under the receive lock
       - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
       - avoid the typdef for the dequeue callback
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c13f97f
  11. 29 10月, 2016 1 次提交
    • J
      sunrpc: fix some missing rq_rbuffer assignments · 18e601d6
      Jeff Layton 提交于
      We've been seeing some crashes in testing that look like this:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
      PGD 212ca2067 PUD 212ca3067 PMD 0
      Oops: 0002 [#1] SMP
      Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev parport_pc i2c_piix4 sg parport i2c_core virtio_balloon pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi 8139too ata_piix libata 8139cp mii virtio_pci floppy virtio_ring serio_raw virtio
      CPU: 1 PID: 1540 Comm: nfsd Not tainted 4.9.0-rc1 #39
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      task: ffff88020d7ed200 task.stack: ffff880211838000
      RIP: 0010:[<ffffffff8135ce99>]  [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
      RSP: 0018:ffff88021183bdd0  EFLAGS: 00010206
      RAX: 0000000000000000 RBX: ffff88020d7fa000 RCX: 000000f400000000
      RDX: 0000000000000014 RSI: ffff880212927020 RDI: 0000000000000000
      RBP: ffff88021183be30 R08: 01000000ef896996 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880211704ca8
      R13: ffff88021473f000 R14: 00000000ef896996 R15: ffff880211704800
      FS:  0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000212ca1000 CR4: 00000000000006e0
      Stack:
       ffffffffa01ea087 ffffffff63400001 ffff880215145e00 ffff880211bacd00
       ffff88021473f2b8 0000000000000004 00000000d0679d67 ffff880211bacd00
       ffff88020d7fa000 ffff88021473f000 0000000000000000 ffff88020d7faa30
      Call Trace:
       [<ffffffffa01ea087>] ? svc_tcp_recvfrom+0x5a7/0x790 [sunrpc]
       [<ffffffffa01f84d8>] svc_recv+0xad8/0xbd0 [sunrpc]
       [<ffffffffa0262d5e>] nfsd+0xde/0x160 [nfsd]
       [<ffffffffa0262c80>] ? nfsd_destroy+0x60/0x60 [nfsd]
       [<ffffffff810a9418>] kthread+0xd8/0xf0
       [<ffffffff816dbdbf>] ret_from_fork+0x1f/0x40
       [<ffffffff810a9340>] ? kthread_park+0x60/0x60
      Code: 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 <4c> 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4
      RIP  [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
       RSP <ffff88021183bdd0>
      CR2: 0000000000000000
      
      Both Bruce and Eryu ran a bisect here and found that the problematic
      patch was 68778945 (SUNRPC: Separate buffer pointers for RPC Call and
      Reply messages).
      
      That patch changed rpc_xdr_encode to use a new rq_rbuffer pointer to
      set up the receive buffer, but didn't change all of the necessary
      codepaths to set it properly. In particular the backchannel setup was
      missing.
      
      We need to set rq_rbuffer whenever rq_buffer is set. Ensure that it is.
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Tested-by: NChuck Lever <chuck.lever@oracle.com>
      Reported-by: NEryu Guan <guaneryu@gmail.com>
      Tested-by: NEryu Guan <guaneryu@gmail.com>
      Fixes: 68778945 "SUNRPC: Separate buffer pointers..."
      Reported-by: NJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      18e601d6
  12. 23 10月, 2016 1 次提交
    • P
      udp: use it's own memory accounting schema · 850cbadd
      Paolo Abeni 提交于
      Completely avoid default sock memory accounting and replace it
      with udp-specific accounting.
      
      Since the new memory accounting model encapsulates completely
      the required locking, remove the socket lock on both enqueue and
      dequeue, and avoid using the backlog on enqueue.
      
      Be sure to clean-up rx queue memory on socket destruction, using
      udp its own sk_destruct.
      
      Tested using pktgen with random src port, 64 bytes packet,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr readers      Kpps (vanilla)  Kpps (patched)
      1               170             440
      3               1250            2150
      6               3000            3650
      9               4200            4450
      12              5700            6250
      
      v4 -> v5:
        - avoid unneeded test in first_packet_length
      
      v3 -> v4:
        - remove useless sk_rcvqueues_full() call
      
      v2 -> v3:
        - do not set the now unsed backlog_rcv callback
      
      v1 -> v2:
        - add memory pressure support
        - fixed dropwatch accounting for ipv6
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      850cbadd
  13. 20 9月, 2016 3 次提交
    • D
      sunrpc: fix write space race causing stalls · d48f9ce7
      David Vrabel 提交于
      Write space becoming available may race with putting the task to sleep
      in xprt_wait_for_buffer_space().  The existing mechanism to avoid the
      race does not work.
      
      This (edited) partial trace illustrates the problem:
      
         [1] rpc_task_run_action: task:43546@5 ... action=call_transmit
         [2] xs_write_space <-xs_tcp_write_space
         [3] xprt_write_space <-xs_write_space
         [4] rpc_task_sleep: task:43546@5 ...
         [5] xs_write_space <-xs_tcp_write_space
      
      [1] Task 43546 runs but is out of write space.
      
      [2] Space becomes available, xs_write_space() clears the
          SOCKWQ_ASYNC_NOSPACE bit.
      
      [3] xprt_write_space() attemts to wake xprt->snd_task (== 43546), but
          this has not yet been queued and the wake up is lost.
      
      [4] xs_nospace() is called which calls xprt_wait_for_buffer_space()
          which queues task 43546.
      
      [5] The call to sk->sk_write_space() at the end of xs_nospace() (which
          is supposed to handle the above race) does not call
          xprt_write_space() as the SOCKWQ_ASYNC_NOSPACE bit is clear and
          thus the task is not woken.
      
      Fix the race by resetting the SOCKWQ_ASYNC_NOSPACE bit in xs_nospace()
      so the second call to sk->sk_write_space() calls xprt_write_space().
      Suggested-by: NTrond Myklebust <trondmy@primarydata.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      cc: stable@vger.kernel.org # 4.4
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      d48f9ce7
    • C
      SUNRPC: Generalize the RPC buffer release API · 3435c74a
      Chuck Lever 提交于
      xprtrdma needs to allocate the Call and Reply buffers separately.
      TBH, the reliance on using a single buffer for the pair of XDR
      buffers is transport implementation-specific.
      
      Instead of passing just the rq_buffer into the buf_free method, pass
      the task structure and let buf_free take care of freeing both
      XDR buffers at once.
      
      There's a micro-optimization here. In the common case, both
      xprt_release and the transport's buf_free method were checking if
      rq_buffer was NULL. Now the check is done only once per RPC.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      3435c74a
    • C
      SUNRPC: Generalize the RPC buffer allocation API · 5fe6eaa1
      Chuck Lever 提交于
      xprtrdma needs to allocate the Call and Reply buffers separately.
      TBH, the reliance on using a single buffer for the pair of XDR
      buffers is transport implementation-specific.
      
      Transports that want to allocate separate Call and Reply buffers
      will ignore the "size" argument anyway.  Don't bother passing it.
      
      The buf_alloc method can't return two pointers. Instead, make the
      method's return value an error code, and set the rq_buffer pointer
      in the method itself.
      
      This gives call_allocate an opportunity to terminate an RPC instead
      of looping forever when a permanent problem occurs. If a request is
      just bogus, or the transport is in a state where it can't allocate
      resources for any request, there needs to be a way to kill the RPC
      right there and not loop.
      
      This immediately fixes a rare problem in the backchannel send path,
      which loops if the server happens to send a CB request whose
      call+reply size is larger than a page (which it shouldn't do yet).
      
      One more issue: looks like xprt_inject_disconnect was incorrectly
      placed in the failure path in call_allocate. It needs to be in the
      success path, as it is for other call-sites.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      5fe6eaa1
  14. 03 9月, 2016 1 次提交
  15. 06 8月, 2016 2 次提交
  16. 05 8月, 2016 1 次提交
    • N
      SUNRPC: disable the use of IPv6 temporary addresses. · d88e4d82
      NeilBrown 提交于
      If the net.ipv6.conf.*.use_temp_addr sysctl is set to '2',
      then TCP connections over IPv6 will prefer a 'private' source
      address.
      These eventually expire and become invalid, typically after a week,
      but the time is configurable.
      
      When the local address becomes invalid the client will not be able to
      receive replies from the server.  Eventually the connection will timeout
      or break and a new connection will be established, but this can take
      half an hour (typically TCP connection break time).
      
      RFC 4941, which describes private IPv6 addresses, acknowledges that some
      applications might not work well with them and that the application may
      explicitly a request non-temporary (i.e. "public") address.
      
      I believe this is correct for SUNRPC clients.  Without this change, a
      client will occasionally experience a long delay if private addresses
      have been enabled.
      
      The privacy offered by private addresses is of little value for an NFS
      server which requires client authentication.
      
      For NFSv3 this will often not be a problem because idle connections are
      closed after 5 minutes.  For NFSv4 connections never go idle due to the
      period RENEW (or equivalent) request.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      d88e4d82
  17. 02 8月, 2016 1 次提交
  18. 20 7月, 2016 3 次提交
  19. 15 6月, 2016 1 次提交
  20. 14 6月, 2016 4 次提交
  21. 18 5月, 2016 1 次提交
  22. 13 5月, 2016 1 次提交
  23. 28 4月, 2016 1 次提交
  24. 14 4月, 2016 1 次提交
  25. 12 4月, 2016 1 次提交
  26. 06 2月, 2016 1 次提交
  27. 07 1月, 2016 1 次提交
    • T
      SUNRPC: Fixup socket wait for memory · 13331a55
      Trond Myklebust 提交于
      We're seeing hangs in the NFS client code, with loops of the form:
      
       RPC: 30317 xmit incomplete (267368 left of 524448)
       RPC: 30317 call_status (status -11)
       RPC: 30317 call_transmit (status 0)
       RPC: 30317 xprt_prepare_transmit
       RPC: 30317 xprt_transmit(524448)
       RPC:       xs_tcp_send_request(267368) = -11
       RPC: 30317 xmit incomplete (267368 left of 524448)
       RPC: 30317 call_status (status -11)
       RPC: 30317 call_transmit (status 0)
       RPC: 30317 xprt_prepare_transmit
       RPC: 30317 xprt_transmit(524448)
      
      Turns out commit ceb5d58b ("net: fix sock_wake_async() rcu protection")
      moved SOCKWQ_ASYNC_NOSPACE out of sock->flags and into sk->sk_wq->flags,
      however it never tried to fix up the code in net/sunrpc.
      
      The new idiom is to use the flags in the RCU protected struct socket_wq.
      While we're at it, clear out the now redundant places where we set/clear
      SOCKWQ_ASYNC_NOSPACE and SOCK_NOSPACE. In principle, sk_stream_wait_memory()
      is supposed to set these for us, so we only need to clear them in the
      particular case of our ->write_space() callback.
      
      Fixes: ceb5d58b ("net: fix sock_wake_async() rcu protection")
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: stable@vger.kernel.org # 4.4
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      13331a55
  28. 28 12月, 2015 1 次提交