1. 13 Jun, 2015 5 commits
  2. 31 Mar, 2015 13 commits
  3. 07 Mar, 2015 1 commit
  4. 30 Jan, 2015 16 commits
  5. 26 Nov, 2014 5 commits
    • xprtrdma: Display async errors · 7ff11de1
      Committed by Chuck Lever
      An async error upcall is a hard error, and should be reported in
      the system log.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Re-write rpcrdma_flush_cqs() · 5c166bef
      Committed by Chuck Lever
      Currently rpcrdma_flush_cqs() attempts to avoid code duplication,
      and simply invokes rpcrdma_recvcq_upcall and rpcrdma_sendcq_upcall.
      
      1. rpcrdma_flush_cqs() can run concurrently with provider upcalls.
         Both flush_cqs() and the upcalls were invoking ib_poll_cq() in
         different threads using the same wc buffers (ep->rep_recv_wcs
         and ep->rep_send_wcs), added by commit 1c00dd07 ("xprtrmda:
         Reduce calls to ib_poll_cq() in completion handlers").
      
         During transport disconnect processing, this sometimes resulted
         in the same reply getting added to the rpcrdma_tasklets_g list
         more than once, which corrupted the list.
      
      2. The upcall functions drain only a limited number of CQEs,
         thanks to the poll budget added by commit 8301a2c0
         ("xprtrdma: Limit work done by completion handler").
      
      Fixes: a7bc211a ("xprtrdma: On disconnect, don't ignore ... ")
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=276
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
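The two problems described in the rpcrdma_flush_cqs() commit above can be illustrated with a small user-space sketch. This is not the kernel patch: `mock_cq`, `mock_wc`, `mock_poll_cq()`, `drain_with_budget()` and `flush_all()` are hypothetical names invented here, and the mock only imitates the ib_poll_cq() convention of returning the number of completions polled. The sketch assumes a fix along the lines the message implies: the flush path uses its own on-stack completion buffer instead of a shared one, and keeps polling until the queue is empty rather than stopping at a budget.

```c
/* Hypothetical stand-ins for a completion queue and a work completion. */
struct mock_wc { int wr_id; };
struct mock_cq { int pending; int next_id; };

/* Imitates the ib_poll_cq() return convention: the number of completions
 * copied into wc[], which is 0 once the queue is empty. */
static int mock_poll_cq(struct mock_cq *cq, int num_entries, struct mock_wc *wc)
{
	int n = 0;

	while (n < num_entries && cq->pending > 0) {
		wc[n].wr_id = cq->next_id++;
		cq->pending--;
		n++;
	}
	return n;
}

/* Budgeted handler, like the upcalls: stops after `budget` completions,
 * so it may leave entries behind on the queue. */
static int drain_with_budget(struct mock_cq *cq, int budget)
{
	struct mock_wc wc;
	int handled = 0;

	while (budget-- > 0 && mock_poll_cq(cq, 1, &wc) > 0)
		handled++;
	return handled;
}

/* Flush path: a private on-stack wc (nothing shared with another thread)
 * and no budget, so it only returns when the queue is empty. */
static int flush_all(struct mock_cq *cq)
{
	struct mock_wc wc;
	int handled = 0;

	while (mock_poll_cq(cq, 1, &wc) > 0)
		handled++;
	return handled;
}
```

With ten completions pending, a budget of four leaves six behind after one call to `drain_with_budget()`, while `flush_all()` removes whatever remains. Polling one entry at a time into a local variable costs more calls, which is acceptable on a disconnect path that runs rarely.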
    • xprtrdma: Refactor tasklet scheduling · f1a03b76
      Committed by Chuck Lever
      Restore the separate function that schedules the reply handling
      tasklet. I need to call it from two different paths.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: unmap all FMRs during transport disconnect · 467c9674
      Committed by Chuck Lever
      When using RPCRDMA_MTHCAFMR memory registration, after a few
      transport disconnect / reconnect cycles, ib_map_phys_fmr() starts to
      return EINVAL because the provider has exhausted its map pool.
      
      Make sure that all FMRs are unmapped during transport disconnect,
      and that ->send_request remarshals them during an RPC retransmit.
      This resets the transport's MRs to ensure that none are leaked
      during a disconnect.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Cap req_cqinit · e7104a2a
      Committed by Chuck Lever
      Recent work made FRMR registration and invalidation completions
      unsignaled. This greatly reduces the adapter interrupt rate.
      
      Every so often, however, a posted send Work Request is allowed to
      signal. Otherwise, the provider's Work Queue will wrap and the
      workload will hang.
      
      The number of Work Requests that are allowed to remain unsignaled is
      determined by the value of req_cqinit. Currently, this is set to the
      size of the send Work Queue divided by two, minus 1.
      
      For FRMR, the send Work Queue is the maximum number of concurrent
      RPCs (currently 32) times the maximum number of Work Requests an
      RPC might use (currently 7, though some adapters may need more).
      
      For mlx4, this is 224 entries. This leaves completion signaling
      disabled for 111 send Work Requests.
      
      Some providers hold back dispatching Work Requests until a CQE is
      generated.  If completions are disabled, then no CQEs are generated
      for quite some time, and that can stall the Work Queue.
      
      I've seen this occur running xfstests generic/113 over NFSv4, where
      eventually, posting a FAST_REG_MR Work Request fails with -ENOMEM
      because the Work Queue has overflowed. The connection is dropped
      and re-established.
      
      Cap the rep_cqinit setting so completions are not left turned off
      for too long.
      
      BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=269
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
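The arithmetic in the "Cap req_cqinit" message can be checked with a short user-space sketch. The figures 32 and 7 come from the commit text; everything else is an assumption made for illustration. In particular, the helper names, the `send_budget` structure, and the cap value passed by the caller are hypothetical and are not taken from the patch, which does not state its cap in the message.

```c
/* Send Work Queue depth for FRMR: concurrent RPCs times WRs per RPC.
 * With the values quoted in the commit message, 32 * 7 = 224. */
static unsigned int send_queue_depth(unsigned int max_rpcs,
				     unsigned int wrs_per_rpc)
{
	return max_rpcs * wrs_per_rpc;
}

/* Setting described in the message: half the queue depth, minus one.
 * For a depth of 224 this gives 111 unsignaled sends. */
static unsigned int uncapped_cqinit(unsigned int depth)
{
	return depth < 2 ? 0 : depth / 2 - 1;
}

/* Same value, clamped to a caller-supplied cap (hypothetical parameter). */
static unsigned int capped_cqinit(unsigned int depth, unsigned int cap)
{
	unsigned int v = uncapped_cqinit(depth);

	return v > cap ? cap : v;
}

/* Countdown that lets `cqinit` sends go unsignaled, then signals one. */
struct send_budget {
	unsigned int cqinit;	/* unsignaled sends allowed per cycle */
	unsigned int remaining;	/* unsignaled sends left in this cycle */
};

/* Returns 1 when the next posted send should request a completion. */
static int next_send_is_signaled(struct send_budget *b)
{
	if (b->remaining > 0) {
		b->remaining--;
		return 0;
	}
	b->remaining = b->cqinit;
	return 1;
}
```

A smaller cap shortens the longest run of sends that generate no CQE, at the cost of more completion interrupts. The countdown here is a simplification and does not reproduce the exact counting used by the transport.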