1. 20 Aug 2019, 1 commit
  2. 18 Jul 2019, 1 commit
  3. 09 Jul 2019, 8 commits
    • xprtrdma: Modernize ops->connect · 675dd90a
      Committed by Chuck Lever
      Adapt and apply changes that were made to the TCP socket connect
      code. See the following commits for details on the purpose of
      these changes:
      
      Commit 7196dbb0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
      Commit 3851f1cd ("SUNRPC: Limit the reconnect backoff timer to the max RPC message timeout")
      Commit 02910177 ("SUNRPC: Fix reconnection timeouts")
      
      Some common transport code is moved to xprt.c to satisfy the code
      duplication police.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
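      A sketch of the backoff helpers this commit hoists into
      net/sunrpc/xprt.c, modeled on the TCP transport's
      xs_reconnect_delay()/xs_reconnect_backoff(); the fields follow
      struct rpc_xprt, but treat the bodies as illustrative rather
      than the literal diff:

      /* Time until the next connect attempt may be made. */
      unsigned long xprt_reconnect_delay(const struct rpc_xprt *xprt)
      {
              unsigned long start, now = jiffies;

              start = xprt->stat.connect_start + xprt->reestablish_timeout;
              if (time_after(start, now))
                      return start - now;
              return 0;
      }

      /* Double the backoff, clamped between the caller's initial
       * timeout and the transport's configured maximum. */
      void xprt_reconnect_backoff(struct rpc_xprt *xprt, unsigned long init_to)
      {
              xprt->reestablish_timeout <<= 1;
              if (xprt->reestablish_timeout > xprt->max_reconnect_timeout)
                      xprt->reestablish_timeout = xprt->max_reconnect_timeout;
              if (xprt->reestablish_timeout < init_to)
                      xprt->reestablish_timeout = init_to;
      }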
    • xprtrdma: Remove rpcrdma_req::rl_buffer · 5828ceba
      Committed by Chuck Lever
      Clean up.
      
      There is only one remaining function, rpcrdma_buffer_put(), that
      uses this field. Its caller can supply a pointer to the correct
      rpcrdma_buffer, enabling the removal of an 8-byte pointer field
      from a frequently-allocated shared data structure.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
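      The shape of the change, seen from the caller (a sketch based on
      xprt_rdma_free() in transport.c, with surrounding details
      trimmed):

      /* Before: rpcrdma_buffer_put(req) chased req->rl_buffer.
       * After: the caller, which already knows the owning buffer,
       * passes it explicitly, so rl_buffer can be deleted from
       * struct rpcrdma_req. */
      static void xprt_rdma_free(struct rpc_task *task)
      {
              struct rpc_rqst *rqst = task->tk_rqstp;
              struct rpcrdma_xprt *r_xprt =
                      container_of(rqst->rq_xprt, struct rpcrdma_xprt, rx_xprt);
              struct rpcrdma_req *req = rpcr_to_rdmar(rqst);

              rpcrdma_buffer_put(&r_xprt->rx_buf, req);
      }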
    • xprtrdma: Wake RPCs directly in rpcrdma_wc_send path · 0ab11523
      Committed by Chuck Lever
      Eliminate a context switch in the path that handles RPC wake-ups
      when a Receive completion has to wait for a Send completion.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
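      A hedged sketch of the pattern; sc_rqst is an assumed
      back-pointer standing in for however the real code in verbs.c
      finds the waiting request, and locking around the wake-up is
      elided:

      static void rpcrdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
      {
              struct rpcrdma_sendctx *sc =
                      container_of(wc->wr_cqe, struct rpcrdma_sendctx, sc_cqe);

              /* The Send CQ is polled via IB_POLL_WORKQUEUE, i.e. in
               * process context, so when a Receive completion is parked
               * waiting on this Send, the RPC can be completed right
               * here instead of bouncing through another work item. */
              if (sc->sc_rqst)
                      xprt_complete_rqst(sc->sc_rqst->rq_task,
                                         sc->sc_rqst->rq_reply_bytes_recvd);
      }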
    • xprtrdma: Reduce context switching due to Local Invalidation · d8099fed
      Committed by Chuck Lever
      Since commit ba69cd12 ("xprtrdma: Remove support for FMR memory
      registration"), FRWR is the only supported memory registration mode.
      
      We can take advantage of the asynchronous nature of FRWR's LOCAL_INV
      Work Requests to get rid of the completion wait by having the
      LOCAL_INV completion handler take care of DMA unmapping MRs and
      waking the upper layer RPC waiter.
      
      This eliminates two context switches when local invalidation is
      necessary. As a side benefit, we will no longer need the per-xprt
      deferred completion work queue.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
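      Sketch of the asynchronous flow: the last LOCAL_INV WR in the
      chain is signaled, and its completion handler finishes the job
      itself. The handler name follows frwr_ops.c; fr_req is an
      assumed back-pointer for the sketch:

      static void frwr_wc_localinv_done(struct ib_cq *cq, struct ib_wc *wc)
      {
              struct rpcrdma_frwr *frwr =
                      container_of(wc->wr_cqe, struct rpcrdma_frwr, fr_cqe);
              struct rpcrdma_mr *mr =
                      container_of(frwr, struct rpcrdma_mr, frwr);

              /* Runs on the CQ's workqueue (process context): safe to
               * DMA unmap the MR and wake the RPC waiter here, with no
               * detour through a deferred-completion work queue. */
              rpcrdma_mr_unmap_and_put(mr);
              rpcrdma_complete_rqst(frwr->fr_req->rl_reply);
      }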
    • xprtrdma: Add mechanism to place MRs back on the free list · 40088f0e
      Committed by Chuck Lever
      When a marshal operation fails, any MRs that were already set up for
      that request are recycled. Recycling releases MRs and creates new
      ones, which is expensive.
      
      Since commit f2877623 ("xprtrdma: Chain Send to FastReg WRs")
      was merged, recycling FRWRs is unnecessary. Before that commit,
      frwr_map posted the FAST_REG Work Requests itself, so by the
      time marshaling could fail, ownership of the MRs had already
      passed to the NIC; dealing with them had to be deferred until
      those WRs completed.
      
      Since that commit, however, FAST_REG WRs are posted at the same time
      as the Send WR. This means that if marshaling fails, we are certain
      the MRs are safe to simply unmap and place back on the free list
      because neither the Send nor the FAST_REG WRs have been posted yet.
      The kernel still has ownership of the MRs at this point.
      
      This reduces the total number of MRs that the xprt has to create
      under heavy workloads and makes the marshaling logic less brittle.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
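      The mechanism itself is small; a sketch of the helper this
      commit adds (cf. frwr_reset() in frwr_ops.c):

      /* Marshaling failed before the FAST_REG and Send WRs were
       * posted, so the kernel still owns these MRs: no invalidation
       * needed, just unmap them and put them back on the free list. */
      void frwr_reset(struct rpcrdma_req *req)
      {
              while (!list_empty(&req->rl_registered)) {
                      struct rpcrdma_mr *mr;

                      mr = rpcrdma_mr_pop(&req->rl_registered);
                      rpcrdma_mr_unmap_and_put(mr);
              }
      }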
    • xprtrdma: Remove fr_state · 84756894
      Committed by Chuck Lever
      Now that both the Send and Receive completions are handled in
      process context, it is safe to DMA unmap and return MRs to the
      free or recycle lists directly in the completion handlers.
      
      Doing this means a VALID or FLUSHED MR can no longer appear on
      an xprt's MR free list: every MR reaching the list has already
      been unmapped and invalidated. Thus there is no longer any need
      to track each MR's registration state in rpcrdma_frwr, and
      fr_state can be removed.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
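      For reference, the state tracking that becomes dead weight (as
      it appeared in xprt_rdma.h; comments paraphrased):

      enum rpcrdma_frwr_state {
              FRWR_IS_INVALID,        /* ready to be used */
              FRWR_IS_VALID,          /* in use */
              FRWR_FLUSHED_FR,        /* flushed FASTREG WR */
              FRWR_FLUSHED_LI,        /* flushed LOCALINV WR */
      };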
    • xprtrdma: Remove the RPCRDMA_REQ_F_PENDING flag · 5809ea4f
      Committed by Chuck Lever
      Commit 9590d083 ("xprtrdma: Use xprt_pin_rqst in
      rpcrdma_reply_handler") pins incoming RPC/RDMA replies so they
      can be left in the pending requests queue while they are being
      processed without introducing a race between ->buf_free and the
      transport's reply handler. Therefore RPCRDMA_REQ_F_PENDING is no
      longer necessary.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
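      The pinning pattern from commit 9590d083, in outline (the lock
      guarding the receive queue is recv_lock in kernels of that era,
      queue_lock later):

      spin_lock(&xprt->recv_lock);
      rqst = xprt_lookup_rqst(xprt, xid);
      if (rqst)
              xprt_pin_rqst(rqst);    /* ->buf_free cannot reap it now */
      spin_unlock(&xprt->recv_lock);

      /* ... process the reply without holding the lock ... */

      spin_lock(&xprt->recv_lock);
      xprt_unpin_rqst(rqst);
      spin_unlock(&xprt->recv_lock);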
    • xprtrdma: Fix occasional transport deadlock · 05eb06d8
      Committed by Chuck Lever
      Under high I/O workloads, I've noticed that an RPC/RDMA transport
      occasionally deadlocks (IOPS goes to zero, and doesn't recover).
      Diagnosis shows that the sendctx queue is empty, but when sendctxs
      are returned to the queue, the xprt_write_space wake-up never
      occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy.
      
      I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented
      via an atomic bit. Just one of those is sufficient. Removing
      EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock
      un-reproducible.
      
      Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and
      is therefore removed.
      
      Unfortunately this patch does not apply cleanly to stable. If
      needed, someone will have to port it and test it.
      
      Fixes: 2fad6592 ("xprtrdma: Wait on empty sendctx queue")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
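      After the fix, both sides go through the one XPRT_WRITE_SPACE
      bit. Roughly (simplified; see rpcrdma_sendctx_get_locked and
      rpcrdma_sendctx_put_locked in verbs.c for the real code):

      /* Consumer side: the sendctx queue is empty, so ask for a
       * write-space callback and give back the transport for now. */
      if (!sc) {
              xprt_wait_for_buffer_space(xprt);  /* sets XPRT_WRITE_SPACE */
              return -EAGAIN;
      }

      /* Producer side, when a sendctx is returned to the queue: one
       * atomic test-and-clear decides whether a wake-up is owed. */
      xprt_write_space(xprt);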
  4. 26 Apr 2019, 13 commits
  5. 13 Feb 2019, 2 commits
    • xprtrdma: Reduce the doorbell rate (Receive) · e340c2d6
      Committed by Chuck Lever
      Post RECV WRs in batches to reduce the hardware doorbell rate per
      transport. This helps the RPC-over-RDMA client scale better in
      number of transports.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
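      The batching idea in miniature (cf. rpcrdma_post_recvs() in
      verbs.c; rpcrdma_rep_get() is an illustrative stand-in for
      however a rep is obtained):

      struct ib_recv_wr *wr = NULL;
      const struct ib_recv_wr *bad_wr;
      int i, rc;

      /* Link all the Receive WRs into a single chain... */
      for (i = 0; i < needed; i++) {
              struct rpcrdma_rep *rep = rpcrdma_rep_get(r_xprt);

              rep->rr_recv_wr.next = wr;
              wr = &rep->rr_recv_wr;
      }

      /* ...so that one ib_post_recv() call, and therefore one
       * doorbell, covers the whole batch. */
      rc = ib_post_recv(r_xprt->rx_ia.ri_id->qp, wr, &bad_wr);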
    • xprtrdma: Fix sparse warnings · ec482cc1
      Committed by Chuck Lever
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63:    got restricted __be32 [usertype] rq_xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62:    got restricted __be32 [usertype] rq_xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62:    got restricted __be32 [usertype] rq_xid
      
      Fixes: 0a93fbcb ("xprtrdma: Plant XID in on-the-wire RDMA ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
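      The root cause: rq_xid is wire format (__be32), while the trace
      points declare a host-endian u32 xid argument. The sparse-clean
      pattern is an explicit conversion wherever the two meet:

      /* rq_xid is __be32 (wire format); convert explicitly before
       * passing it to an interface declared to take a u32. */
      u32 xid = be32_to_cpu(rqst->rq_xid);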
  6. 03 Jan 2019, 9 commits
  7. 28 Dec 2018, 1 commit
  8. 03 Oct 2018, 2 commits
    • xprtrdma: Simplify RPC wake-ups on connect · 31e62d25
      Committed by Chuck Lever
      Currently, when a connection is established, rpcrdma_conn_upcall
      invokes rpcrdma_conn_func and then
      wake_up_all(&ep->rep_connect_wait). The former wakes waiting RPCs,
      but the connect worker is not done yet, and that leads to races,
      double wakes, and difficulty understanding how this logic is
      supposed to work.
      
      Instead, collect all the "connection established" logic in the
      connect worker (xprt_rdma_connect_worker). A disconnect worker is
      retained to handle provider upcalls safely.
      
      Fixes: 254f91e2 ("xprtrdma: RPC/RDMA must invoke ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
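      The post-patch shape of the worker, approximately (error
      handling trimmed; see xprt_rdma_connect_worker in transport.c):

      static void xprt_rdma_connect_worker(struct work_struct *work)
      {
              struct rpcrdma_xprt *r_xprt =
                      container_of(work, struct rpcrdma_xprt,
                                   rx_connect_worker.work);
              struct rpc_xprt *xprt = &r_xprt->rx_xprt;
              int rc;

              rc = rpcrdma_ep_connect(&r_xprt->rx_ep, &r_xprt->rx_ia);
              xprt_clear_connecting(xprt);
              if (r_xprt->rx_ep.rep_connected > 0) {
                      xprt->stat.connect_count++;
                      xprt->stat.connect_time += (long)jiffies -
                                                 xprt->stat.connect_start;
              }
              /* All "connection established" logic lives here now; only
               * after it is done are the waiting RPCs awakened. */
              xprt_wake_pending_tasks(xprt, rc);
      }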
    • xprtrdma: Explicitly resetting MRs is no longer necessary · 61da886b
      Committed by Chuck Lever
      When a memory operation fails, the MR's driver state might not match
      its hardware state. The only reliable recourse is to dereg the MR.
      This is done in ->ro_recover_mr, which then attempts to allocate a
      fresh MR to replace the released MR.
      
      Since commit e2ac236c ("xprtrdma: Allocate MRs on demand"),
      xprtrdma dynamically allocates MRs. It can add more MRs whenever
      they are needed.
      
      That makes it possible to simply release an MR when a memory
      operation fails, instead of "recovering" it. It will automatically
      be replaced by the on-demand MR allocator.
      
      This commit is a little larger than I wanted, but it replaces
      ->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the
      rb_stale_mrs list with a generic work queue.
      
      Since MRs are no longer orphaned, the mrs_orphaned metric is no
      longer used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
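      A sketch of the release path that replaces ->ro_recover_mr: a
      broken MR is handed to a generic work item, deregistered, and
      freed, and the on-demand allocator mints a replacement later.
      Field names follow struct rpcrdma_mr of that era but should be
      read as illustrative:

      static void rpcrdma_mr_recycle_worker(struct work_struct *work)
      {
              struct rpcrdma_mr *mr =
                      container_of(work, struct rpcrdma_mr, mr_recycle);
              struct rpcrdma_xprt *r_xprt = mr->mr_xprt;

              if (mr->mr_dir != DMA_NONE)
                      ib_dma_unmap_sg(r_xprt->rx_ia.ri_device,
                                      mr->mr_sg, mr->mr_nents, mr->mr_dir);

              /* ib_dereg_mr() can sleep, hence the work-queue context. */
              ib_dereg_mr(mr->frwr.fr_mr);
              kfree(mr);
      }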
  9. 02 Jun 2018, 1 commit
    • xprtrdma: Wait on empty sendctx queue · 2fad6592
      Committed by Chuck Lever
      Currently, when the sendctx queue is exhausted during marshaling, the
      RPC/RDMA transport places the RPC task on the delayq, which forces a
      wait for HZ >> 2 before the marshal and send is retried.
      
      With this change, the transport now places such an RPC task on the
      pending queue, and wakes it just as soon as more sendctxs become
      available. This typically takes less than a millisecond, and the
      write_space waking mechanism is less deadlock-prone.
      
      Moreover, the waiting RPC task is holding the transport's write
      lock, which blocks the transport from sending RPCs. Therefore faster
      recovery from sendctx queue exhaustion is desirable.
      
      Cf. commit 5804891455d5 ("xprtrdma: ->send_request returns -EAGAIN
      when there are no free MRs").
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
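      In outline (this is also where the EMPTY_SCQ bit removed by the
      deadlock fix above came from): the marshal path flags the empty
      queue and returns -EAGAIN, and the put path tests the flag and
      wakes the waiter through the write_space machinery:

      /* Marshal path: no sendctx available right now. */
      sc = rpcrdma_sendctx_get_locked(buf);
      if (!sc) {
              set_bit(RPCRDMA_BUF_F_EMPTY_SCQ, &buf->rb_flags);
              return -EAGAIN;         /* task parks on the pending queue */
      }

      /* Put path: the queue has refilled; wake the waiter promptly
       * instead of letting it sleep out a fixed HZ >> 2 delay. */
      if (test_and_clear_bit(RPCRDMA_BUF_F_EMPTY_SCQ, &buf->rb_flags))
              xprt_write_space(&r_xprt->rx_xprt);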
  10. 12 May 2018, 1 commit
  11. 07 May 2018, 1 commit