  1. 14 Apr, 2021 4 commits
    • SUNRPC: Handle major timeout in xprt_adjust_timeout() · 09252177
      Chris Dion committed
      Currently, if the major timeout is reached but the minor timeout has
      not been reached, ETIMEDOUT is not returned to the caller.
      This can occur if the v4 server is not responding to requests and
      retrans is configured larger than the default of two.
      
      For example, a TCP mount with a configured timeout value of 50 and a
      retransmission count of 3 to a v4 server which is not responding:
      
      1. Initial value and increment set to 5s, maxval set to 20s, retries at 3
      2. Major timeout is set to 20s, minor timeout set to 5s initially
      3. xprt_adjust_timeout() is called after 5s, retry with 10s timeout,
         minor timeout is bumped to 10s
      4. And again after another 10s, 15s total time with minor timeout set
         to 15s
      5. After 20s total time xprt_adjust_timeout() is called as major timeout is
         reached, but skipped because the minor timeout is not reached
             - After this time the CPU spins, continually calling
               xprt_adjust_timeout() and returning 0 for 10 seconds,
               as seen in perf sched:
               39243.913182 [0005]  mount.nfs[3794] 4607.938      0.017   9746.863
      6. This continues until the 15s minor timeout condition is reached (in
         this case for 10 seconds), after which ETIMEDOUT is returned
         to the caller, the CPU spinning stops, and normal operations
         continue
      
      Fixes: 7de62bc0 ("SUNRPC dont update timeout value on connection reset")
      Signed-off-by: Chris Dion <Christopher.Dion@dell.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
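The scenario in the commit message above can be replayed with a small, simplified model. This is a sketch under stated assumptions, not the kernel code: the `Request` class, the one-second polling loop, and the `fixed` flag are hypothetical, and the rule that the minor deadline advances by the current per-try timeout after each adjustment is an assumption about how xprt_adjust_timeout() behaves. Intermediate values may therefore differ from the numbered steps; the sketch only illustrates how checking the minor deadline first can delay the major timeout.

```python
from dataclasses import dataclass

ETIMEDOUT = 110  # Linux errno value for ETIMEDOUT


@dataclass
class Request:
    timeout: int         # current per-try timeout, in seconds
    minor_deadline: int  # absolute time of the next minor timeout
    major_deadline: int  # absolute time of the major timeout
    retries: int = 0


def adjust_timeout(req, now, *, initval, increment, maxval, major_span, fixed):
    """Simplified model: return 0, or -ETIMEDOUT on a major timeout."""
    before_minor = now < req.minor_deadline
    before_major = now < req.major_deadline
    if fixed:
        # Assumed patched ordering: skip only while BOTH deadlines are ahead.
        skip = before_major and before_minor
    else:
        # Assumed old ordering: the minor deadline alone decides.
        skip = before_minor
    if skip:
        return 0
    if before_major:
        # Minor timeout: linear backoff, capped at maxval.
        req.timeout = min(req.timeout + increment, maxval)
        req.retries += 1
        status = 0
    else:
        # Major timeout: reset the backoff and report it to the caller.
        req.timeout = initval
        req.retries = 0
        req.major_deadline += major_span
        status = -ETIMEDOUT
    req.minor_deadline += req.timeout
    return status


def first_major_timeout(fixed, initval=5, increment=5, maxval=20,
                        major_span=20, horizon=60):
    """Poll once per second; return the time of the first -ETIMEDOUT."""
    req = Request(timeout=initval, minor_deadline=initval,
                  major_deadline=major_span)
    for now in range(1, horizon + 1):
        status = adjust_timeout(req, now, initval=initval,
                                increment=increment, maxval=maxval,
                                major_span=major_span, fixed=fixed)
        if status == -ETIMEDOUT:
            return now
    return None


if __name__ == "__main__":
    print("old ordering:", first_major_timeout(fixed=False))
    print("new ordering:", first_major_timeout(fixed=True))
```

Under these assumptions the old ordering first reports the timeout at 30s and the new ordering at 20s, which matches the 10-second window described in step 5.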
    • SUNRPC: Remove trace_xprt_transmit_queued · 6cf23783
      Chuck Lever committed
      This tracepoint can crash when dereferencing snd_task because
      when some transports connect, they put a cookie in that field
      instead of a pointer to an rpc_task.
      
      BUG: KASAN: use-after-free in trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
      Read of size 2 at addr ffff8881a83bd3a0 by task git/331872
      
      CPU: 11 PID: 331872 Comm: git Tainted: G S                5.12.0-rc2-00007-g3ab6e585a7f9 #1453
      Hardware name: Supermicro SYS-6028R-T/X10DRi, BIOS 1.1a 10/16/2015
      Call Trace:
       dump_stack+0x9c/0xcf
       print_address_description.constprop.0+0x18/0x239
       kasan_report+0x174/0x1b0
       trace_event_raw_event_xprt_writelock_event+0x141/0x18e [sunrpc]
       xprt_prepare_transmit+0x8e/0xc1 [sunrpc]
       call_transmit+0x4d/0xc6 [sunrpc]
      
      Fixes: 9ce07ae5 ("SUNRPC: Replace dprintk() call site in xprt_prepare_transmit")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • SUNRPC: Add tracepoint that fires when an RPC is retransmitted · e936a597
      Chuck Lever committed
      A separate tracepoint can be left enabled all the time to capture
      rare but important retransmission events. So for example:
      
      kworker/u26:3-568   [009]   156.967933: xprt_retransmit:      task:44093@5 xid=0xa25dbc79 nfsv3 WRITE ntrans=2
      
      Or, for example, enable all nfs and nfs4 tracepoints, and set up a
      trigger to disable tracing when xprt_retransmit fires to capture
      everything that leads up to it.

      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
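For post-processing captured output, the sample xprt_retransmit line in the commit message above could be parsed roughly as follows. This is a minimal sketch: the regular expression is derived only from that one sample line, and the field names (`task_id`, `client_id`, `program`, `procedure`) are hypothetical labels chosen for illustration, not names taken from the kernel source.

```python
import re

# Pattern inferred from the single sample line in the commit message;
# other kernels or trace formats may lay the line out differently.
LINE_RE = re.compile(
    r"^\s*(?P<comm>.+)-(?P<pid>\d+)\s+\[(?P<cpu>\d+)\]\s+"
    r"(?P<timestamp>\d+\.\d+):\s+xprt_retransmit:\s+"
    r"task:(?P<task_id>\d+)@(?P<client_id>\d+)\s+"
    r"xid=(?P<xid>0x[0-9a-fA-F]+)\s+"
    r"(?P<program>\S+)\s+(?P<procedure>\S+)\s+"
    r"ntrans=(?P<ntrans>\d+)\s*$"
)


def parse_xprt_retransmit(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "comm": fields["comm"],
        "pid": int(fields["pid"]),
        "cpu": int(fields["cpu"]),
        "timestamp": float(fields["timestamp"]),
        "task_id": int(fields["task_id"]),
        "client_id": int(fields["client_id"]),
        "xid": int(fields["xid"], 16),
        "program": fields["program"],
        "procedure": fields["procedure"],
        "ntrans": int(fields["ntrans"]),
    }


SAMPLE = ("kworker/u26:3-568   [009]   156.967933: xprt_retransmit:      "
          "task:44093@5 xid=0xa25dbc79 nfsv3 WRITE ntrans=2")

if __name__ == "__main__":
    print(parse_xprt_retransmit(SAMPLE))
```

A script like this could, for instance, count retransmissions per procedure from a saved trace, skipping any line for which the function returns None.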
    • SUNRPC: Move fault injection call sites · 7638e0bf
      Chuck Lever committed
      I've hit some crashes that occur in the xprt_rdma_inject_disconnect
      path. It appears that, for some providers, rdma_disconnect() can
      take so long that the transport can disconnect and release its
      hardware resources while rdma_disconnect() is still running,
      resulting in a UAF in the provider.
      
      The transport's fault injection method may depend on the stability
      of transport data structures. That means it needs to be invoked
      only from contexts that hold the transport write lock.
      
      Fixes: 4a068258 ("SUNRPC: Transport fault injection")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
  2. 05 Apr, 2021 1 commit
  3. 03 Dec, 2020 4 commits
  4. 21 Sep, 2020 6 commits
  5. 24 Aug, 2020 1 commit
  6. 05 Aug, 2020 1 commit
    • SUNRPC dont update timeout value on connection reset · 7de62bc0
      Olga Kornievskaia committed
      Current behaviour: every time a v3 operation is re-sent to the server
      we update (double) the timeout. No distinction is made as to whether
      the previous timer had expired before the re-send happened.
      
      Here's the scenario:
      1. Client sends a v3 operation
      2. Server RSTs the connection prior to the timeout (e.g., the
         connection is immediately reset)
      3. Client re-sends the v3 operation, but the timeout is now 120 seconds.
      
      As a result, an application sees a 2-minute pause before a retry if
      the server again does not reply.

      Instead, this patch proposes to keep track of when the minor timeout
      should happen and, if it has not happened yet, to leave the timeout
      unchanged. The value is updated based on the previous value to make
      timeouts predictable.

      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
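The difference between the old and proposed behaviour in the commit message above can be illustrated with a short sketch. Everything here is hypothetical: the function name, the `patched` flag, and the `maxval` cap are invented for illustration, and the 60-second starting timeout is only inferred from the 120 seconds mentioned in the scenario.

```python
def timeout_after_resend(timeout, now, minor_deadline, *, patched, maxval=600):
    """Return the per-try timeout (seconds) to use for a re-sent request.

    timeout        -- timeout in effect when the request was last sent
    now            -- time of the re-send
    minor_deadline -- time at which the previous timer would have expired
    patched        -- True models the behaviour proposed by the patch
    maxval         -- hypothetical upper bound on the timeout
    """
    if patched and now < minor_deadline:
        # The previous timer never expired (for example, the connection
        # was reset early), so the timeout is left unchanged.
        return timeout
    # Otherwise back off by doubling, capped at maxval.
    return min(timeout * 2, maxval)


if __name__ == "__main__":
    # Connection reset one second after sending with a 60s timeout.
    print(timeout_after_resend(60, 1, 60, patched=False))
    print(timeout_after_resend(60, 1, 60, patched=True))
```

With the early reset, the unpatched model doubles the timeout to 120 seconds while the patched model keeps it at 60; once the minor deadline has actually passed, both models double it.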
  7. 12 Jun, 2020 2 commits
  8. 17 Mar, 2020 1 commit
  9. 31 Oct, 2019 1 commit
  10. 24 Oct, 2019 1 commit
  11. 21 Sep, 2019 1 commit
  12. 18 Sep, 2019 1 commit
  13. 27 Aug, 2019 1 commit
  14. 19 Jul, 2019 1 commit
  15. 09 Jul, 2019 1 commit
  16. 07 Jul, 2019 3 commits
  17. 22 Jun, 2019 1 commit
  18. 21 May, 2019 1 commit
  19. 26 Apr, 2019 7 commits
  20. 16 Mar, 2019 1 commit