1. 24 Oct, 2019 4 commits
    • xprtrdma: Move the rpcrdma_sendctx::sc_wr field · dc15c3d5
      Committed by Chuck Lever
      Clean up: This field is not needed in the Send completion handler,
      so it can be moved to struct rpcrdma_req to reduce the size of
      struct rpcrdma_sendctx, and to reduce the amount of memory that
      is sloshed between the sending process and the Send completion
      process.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Ensure ri_id is stable during MR recycling · 15d9b015
      Committed by Chuck Lever
      ia->ri_id is replaced during a reconnect. The connect_worker runs
      with the transport send lock held to prevent ri_id from being
      dereferenced by the send_request path during this process.
      
      Currently, however, there is no guarantee that ia->ri_id is stable
      in the MR recycling worker, which operates in the background and is
      not serialized with the connect_worker in any way.
      
      But now that Local_Inv completions are being done in process
      context, we can handle the recycling operation there instead of
      deferring the recycling work to another process. Because the
      disconnect path drains all work before allowing tear down to
      proceed, it is guaranteed that Local Invalidations complete only
      while the ri_id pointer is stable.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Manage MRs in context of a single connection · 9d2da4ff
      Committed by Chuck Lever
      MRs are now allocated on demand so we can safely throw them away on
      disconnect. This way an idle transport can disconnect and it won't
      pin hardware MR resources.
      
      Two additional changes:
      
      - Now that all MRs are destroyed on disconnect, there's no need to
        check during header marshaling if a req has MRs to recycle. Each
        req is sent only once per connection, and now rl_registered is
        guaranteed to be empty when rpcrdma_marshal_req is invoked.
      
      - Because MRs are now destroyed in a WQ_MEM_RECLAIM context, they
        also must be allocated in a WQ_MEM_RECLAIM context. This reduces
        the likelihood that device driver memory allocation will trigger
        memory reclaim during NFS writeback.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Add unique trace points for posting Local Invalidate WRs · 4b93dab3
      Committed by Chuck Lever
      When adding frwr_unmap_async way back when, I re-used the existing
      trace_xprtrdma_post_send() trace point to record the return code
      of ib_post_send.
      
      Unfortunately there are some cases where re-using that trace point
      causes a crash. Instead, construct a trace point specific to posting
      Local Invalidate WRs that will always be safe to use in that context,
      and will act as a trace log eye-catcher for Local Invalidation.
      
      Fixes: 84756894 ("xprtrdma: Remove fr_state")
      Fixes: d8099fed ("xprtrdma: Reduce context switching due ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Bill Baker <bill.baker@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 27 Aug, 2019 1 commit
  3. 21 Aug, 2019 6 commits
  4. 20 Aug, 2019 2 commits
  5. 09 Jul, 2019 4 commits
    • xprtrdma: Reduce context switching due to Local Invalidation · d8099fed
      Committed by Chuck Lever
      Since commit ba69cd12 ("xprtrdma: Remove support for FMR memory
      registration"), FRWR is the only supported memory registration mode.
      
      We can take advantage of the asynchronous nature of FRWR's LOCAL_INV
      Work Requests to get rid of the completion wait by having the
      LOCAL_INV completion handler take care of DMA unmapping MRs and
      waking the upper layer RPC waiter.
      
      This eliminates two context switches when local invalidation is
      necessary. As a side benefit, we will no longer need the per-xprt
      deferred completion work queue.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Add mechanism to place MRs back on the free list · 40088f0e
      Committed by Chuck Lever
      When a marshal operation fails, any MRs that were already set up for
      that request are recycled. Recycling releases MRs and creates new
      ones, which is expensive.
      
      Since commit f2877623 ("xprtrdma: Chain Send to FastReg WRs")
      was merged, recycling FRWRs is unnecessary. This is because before
      that commit, frwr_map had already posted FAST_REG Work Requests,
      so ownership of the MRs had already been passed to the NIC and thus
      dealing with them had to be delayed until they completed.
      
      Since that commit, however, FAST_REG WRs are posted at the same time
      as the Send WR. This means that if marshaling fails, we are certain
      the MRs are safe to simply unmap and place back on the free list
      because neither the Send nor the FAST_REG WRs have been posted yet.
      The kernel still has ownership of the MRs at this point.
      
      This reduces the total number of MRs that the xprt has to create
      under heavy workloads and makes the marshaling logic less brittle.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
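The idea in 40088f0e, that MRs whose work requests were never posted can simply be unmapped and put back on the free list, can be sketched in user-space C. This is a minimal model under stated assumptions: `struct toy_mr`, `toy_push`, `toy_release_registered` and `toy_demo_release` are hypothetical names, and the `dma_mapped` flag only stands in for the real DMA unmap step. It is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an MR; not the kernel's struct rpcrdma_mr. */
struct toy_mr {
	struct toy_mr *next;
	int dma_mapped;
};

/* Push one MR onto the head of a singly linked list. */
static void toy_push(struct toy_mr **list, struct toy_mr *mr)
{
	mr->next = *list;
	*list = mr;
}

/*
 * Marshaling failed before anything was posted, so software still owns
 * every MR on the request's list: unmap each one and move it to the
 * free list. Returns the number of MRs moved.
 */
static int toy_release_registered(struct toy_mr **registered,
				  struct toy_mr **free_list)
{
	int moved = 0;

	while (*registered) {
		struct toy_mr *mr = *registered;

		*registered = mr->next;
		mr->dma_mapped = 0;	/* models the DMA unmap step */
		toy_push(free_list, mr);
		moved++;
	}
	return moved;
}

/* Scenario: two MRs were set up for a request whose marshaling then failed. */
static int toy_demo_release(void)
{
	struct toy_mr a = { NULL, 1 }, b = { NULL, 1 };
	struct toy_mr *registered = NULL, *free_list = NULL;
	int moved;

	toy_push(&registered, &a);
	toy_push(&registered, &b);
	moved = toy_release_registered(&registered, &free_list);
	assert(registered == NULL);	/* nothing is left on the request */
	return moved;
}
```

No MR is destroyed or created in this path, which is the saving the commit message describes relative to recycling.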
    • xprtrdma: Remove fr_state · 84756894
      Committed by Chuck Lever
      Now that both the Send and Receive completions are handled in
      process context, it is safe to DMA unmap and return MRs to the
      free or recycle lists directly in the completion handlers.
      
      Doing this means rpcrdma_frwr no longer needs to track the state of
      each MR, meaning that a VALID or FLUSHED MR can no longer appear on
      an xprt's MR free list. Thus there is no longer a need to track the
      MR's registration state in rpcrdma_frwr.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Fix occasional transport deadlock · 05eb06d8
      Committed by Chuck Lever
      Under high I/O workloads, I've noticed that an RPC/RDMA transport
      occasionally deadlocks (IOPS goes to zero, and doesn't recover).
      Diagnosis shows that the sendctx queue is empty, but when sendctxs
      are returned to the queue, the xprt_write_space wake-up never
      occurs. The wake-up logic in rpcrdma_sendctx_put_locked is racy.
      
      I noticed that both EMPTY_SCQ and XPRT_WRITE_SPACE are implemented
      via an atomic bit. Just one of those is sufficient. Removing
      EMPTY_SCQ in favor of the generic bit mechanism makes the deadlock
      un-reproducible.
      
      Without EMPTY_SCQ, rpcrdma_buffer::rb_flags is no longer used and
      is therefore removed.
      
      Unfortunately this patch does not apply cleanly to stable. If
      needed, someone will have to port it and test it.
      
      Fixes: 2fad6592 ("xprtrdma: Wait on empty sendctx queue")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
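The single-bit approach described in 05eb06d8 can be illustrated with C11 atomics. This sketch does not reproduce the original race, and it is not the kernel code: `struct toy_xprt`, `TOY_WRITE_SPACE`, `toy_wait_for_space`, `toy_put_sendctx` and `toy_demo_wakeups` are hypothetical names, and the `wakeups` counter stands in for waking the RPC waiter. It only shows that when one atomic bit records "a sender is waiting", the test and the clear happen in one operation and there is no second flag to fall out of sync with it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define TOY_WRITE_SPACE 0x1u	/* hypothetical stand-in for XPRT_WRITE_SPACE */

struct toy_xprt {
	atomic_uint state;
	unsigned int wakeups;	/* counts modeled wake-ups */
};

/* Sender found no free sendctx: record that it is waiting for space. */
static void toy_wait_for_space(struct toy_xprt *x)
{
	atomic_fetch_or(&x->state, TOY_WRITE_SPACE);
}

/*
 * A sendctx was returned: atomically clear the bit, and wake the waiter
 * only if the bit had been set.
 */
static bool toy_put_sendctx(struct toy_xprt *x)
{
	unsigned int old = atomic_fetch_and(&x->state, ~TOY_WRITE_SPACE);

	if (!(old & TOY_WRITE_SPACE))
		return false;
	x->wakeups++;
	return true;
}

/* Scenario: one put with no waiter, then one waiter followed by two puts. */
static unsigned int toy_demo_wakeups(void)
{
	struct toy_xprt x;

	atomic_init(&x.state, 0);
	x.wakeups = 0;

	toy_put_sendctx(&x);	/* nobody waiting: no wake-up */
	toy_wait_for_space(&x);
	toy_put_sendctx(&x);	/* one waiter: one wake-up */
	toy_put_sendctx(&x);	/* bit already cleared: no second wake-up */
	assert(atomic_load(&x.state) == 0);
	return x.wakeups;
}
```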
  6. 26 Apr, 2019 4 commits
  7. 13 Feb, 2019 1 commit
    • xprtrdma: Fix sparse warnings · ec482cc1
      Committed by Chuck Lever
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63:    got restricted __be32 [usertype] rq_xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62:    got restricted __be32 [usertype] rq_xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62:    got restricted __be32 [usertype] rq_xid
      
      Fixes: 0a93fbcb ("xprtrdma: Plant XID in on-the-wire RDMA ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
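The warnings above arise when a big-endian wire value (`__be32`) is passed where a host-order `unsigned int` is expected. This listing does not show how ec482cc1 resolved them, so the following is only one common remedy for this class of warning: convert explicitly at the boundary. It is a user-space sketch, assuming POSIX `ntohl()`/`htonl()` in place of the kernel's `be32_to_cpu()`/`cpu_to_be32()`; `toy_be32`, `toy_record_xid`, `toy_trace_xid` and `toy_demo_xid` are hypothetical names.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>

typedef uint32_t toy_be32;	/* hypothetical stand-in for the kernel's __be32 */

/* A consumer that expects the XID in host byte order. */
static uint32_t toy_record_xid(uint32_t xid)
{
	return xid;
}

/* Convert explicitly instead of passing the wire value straight through. */
static uint32_t toy_trace_xid(toy_be32 rq_xid)
{
	return toy_record_xid(ntohl(rq_xid));
}

/* Scenario: an XID as it would sit in a request, traced in host order. */
static uint32_t toy_demo_xid(void)
{
	toy_be32 wire = htonl(0x12345678u);
	uint32_t host = toy_trace_xid(wire);

	assert(htonl(host) == wire);	/* the conversion round-trips */
	return host;
}
```

The other usual remedy is to change the callee's parameter type so that it accepts the big-endian value as is.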
  8. 03 Jan, 2019 10 commits
  9. 03 Oct, 2018 3 commits
    • xprtrdma: Name MR trace events consistently · d379eaa8
      Committed by Chuck Lever
      Clean up the names of trace events related to MRs so that it's
      easy to enable these with a glob.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Explicitly resetting MRs is no longer necessary · 61da886b
      Committed by Chuck Lever
      When a memory operation fails, the MR's driver state might not match
      its hardware state. The only reliable recourse is to dereg the MR.
      This is done in ->ro_recover_mr, which then attempts to allocate a
      fresh MR to replace the released MR.
      
      Since commit e2ac236c ("xprtrdma: Allocate MRs on demand"),
      xprtrdma dynamically allocates MRs. It can add more MRs whenever
      they are needed.
      
      That makes it possible to simply release an MR when a memory
      operation fails, instead of "recovering" it. It will automatically
      be replaced by the on-demand MR allocator.
      
      This commit is a little larger than I wanted, but it replaces
      ->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the
      rb_stale_mrs list with a generic work queue.
      
      Since MRs are no longer orphaned, the mrs_orphaned metric is no
      longer used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Create more MRs at a time · c421ece6
      Committed by Chuck Lever
      Some devices require more than 3 MRs to build a single 1MB I/O.
      Ensure that rpcrdma_mrs_create() will add enough MRs to build that
      I/O.
      
      In a subsequent patch I'm changing the MR recovery logic to just
      toss out the MRs. In that case it's possible for ->send_request to
      loop acquiring some MRs, not getting enough, getting called again,
      recycling the previous MRs, then not getting enough, lather rinse
      repeat. Thus first we need to ensure enough MRs are created to
      prevent that loop.
      
      I'm "reusing" ia->ri_max_segs. All of its accessors seem to want the
      maximum number of data segments plus two, so I'm going to bake that
      into the initial calculation.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
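The arithmetic behind "more than 3 MRs" and "data segments plus two" in c421ece6 can be worked through with a small helper. This is a sketch under assumptions, not the kernel's calculation: `toy_div_round_up` and `toy_max_segs` are hypothetical names, and the page count and per-MR depth are assumed inputs. Only the "plus two" is taken from the commit message.

```c
#include <assert.h>

/* Round-up integer division. */
static unsigned int toy_div_round_up(unsigned int n, unsigned int d)
{
	assert(d != 0);
	return (n + d - 1) / d;
}

/*
 * MRs needed for one I/O: enough to cover every data page, plus two.
 * data_pages is the number of pages in the I/O; mr_depth is how many
 * pages a single MR on the device can map.
 */
static unsigned int toy_max_segs(unsigned int data_pages, unsigned int mr_depth)
{
	return toy_div_round_up(data_pages, mr_depth) + 2;
}
```

For example, assuming 4 KiB pages, a 1 MiB I/O spans 256 pages. A device whose MRs each map 64 pages needs 4 MRs for the data alone, so `toy_max_segs(256, 64)` yields 6.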
  10. 31 Jul, 2018 1 commit
    • RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const · d34ac5cd
      Committed by Bart Van Assche
      Since neither ib_post_send() nor ib_post_recv() modify the data structure
      their second argument points at, declare that argument const. This change
      makes it necessary to declare the 'bad_wr' argument const too and also to
      modify all ULPs that call ib_post_send(), ib_post_recv() or
      ib_post_srq_recv(). This patch does not change any functionality but makes
      it possible for the compiler to verify whether the
      ib_post_(send|recv|srq_recv) functions really do not modify the posted
      work request.
      
      To make this possible, only one cast had to be introduced that casts away
      constness, namely in rpcrdma_post_recvs(). The only way I can think of to
      avoid that cast is to introduce an additional loop in that function or to
      change the data type of bad_wr from struct ib_recv_wr ** into int
      (an index that refers to an element in the work request list). However,
      both approaches would require even more extensive changes than this
      patch.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
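The trade-off described in d34ac5cd, a const-qualified posting interface whose `bad_wr` result forces a cast in a caller that wants to modify the unposted requests, can be sketched as follows. The types and functions here (`struct toy_wr`, `toy_post`, `toy_demo_unposted`) are hypothetical and are not the verbs API. A negative `id` is simply this model's way of making a post fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical work request; not the kernel's struct ib_recv_wr. */
struct toy_wr {
	struct toy_wr *next;
	int id;
};

/*
 * Post a chain of work requests without modifying it. On failure,
 * *bad_wr points at the first request that was not posted.
 */
static int toy_post(const struct toy_wr *wr, const struct toy_wr **bad_wr)
{
	for (; wr; wr = wr->next) {
		if (wr->id < 0) {
			*bad_wr = wr;
			return -1;
		}
	}
	*bad_wr = NULL;
	return 0;
}

/* A caller that must edit the unposted requests has to cast away const. */
static int toy_demo_unposted(void)
{
	struct toy_wr c = { NULL, 3 }, b = { &c, -2 }, a = { &b, 1 };
	const struct toy_wr *bad;
	int unposted = 0;

	if (toy_post(&a, &bad) == 0)
		return 0;
	assert(bad != NULL);
	for (struct toy_wr *wr = (struct toy_wr *)bad; wr; wr = wr->next) {
		wr->id = 0;	/* e.g. mark for release; needs a non-const pointer */
		unposted++;
	}
	return unposted;
}
```

The cast is well defined here because the underlying objects were never declared const. The const qualifier only documents and enforces what the posting function itself may do.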
  11. 25 Jul, 2018 1 commit
  12. 02 Jun, 2018 1 commit
  13. 12 May, 2018 2 commits