1. 26 Apr 2019, 4 commits
  2. 13 Feb 2019, 2 commits
    • xprtrdma: Reduce the doorbell rate (Receive) · e340c2d6
      Chuck Lever committed
      Post RECV WRs in batches to reduce the hardware doorbell rate per
      transport. This helps the RPC-over-RDMA client scale better in
      number of transports.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      e340c2d6
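The batching above can be sketched outside the kernel. The struct and helpers below are hypothetical stand-ins for ib_recv_wr and ib_post_recv (only the ->next chaining field is modeled), kept minimal to show how posting a chain of WRs in one call rings the doorbell once instead of once per WR:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ib_recv_wr: only the chaining
 * field matters for this sketch. */
struct recv_wr {
	struct recv_wr *next;
};

static int doorbells;	/* times the (stub) provider doorbell was rung */

/* Stub for ib_post_recv(): one call == one doorbell, regardless of
 * how long the WR chain is. */
static int post_recv(struct recv_wr *first)
{
	(void)first;
	doorbells++;
	return 0;
}

/* Chain 'count' WRs together via ->next and post them with a single
 * call, instead of one post (and one doorbell) per WR. */
static int post_recvs_batched(struct recv_wr *wrs, int count)
{
	int i;

	for (i = 0; i < count - 1; i++)
		wrs[i].next = &wrs[i + 1];
	wrs[count - 1].next = NULL;
	return post_recv(&wrs[0]);
}
```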
    • xprtrdma: Fix sparse warnings · ec482cc1
      Chuck Lever committed
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:375:63:    got restricted __be32 [usertype] rq_xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:432:62:    got restricted __be32 [usertype] rq_xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62: warning: incorrect type in argument 5 (different base types)
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62:    expected unsigned int [usertype] xid
      linux/net/sunrpc/xprtrdma/rpc_rdma.c:489:62:    got restricted __be32 [usertype] rq_xid
      
      Fixes: 0a93fbcb ("xprtrdma: Plant XID in on-the-wire RDMA ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      ec482cc1
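Sparse's "restricted __be32" complaints are typically silenced either by converting the value (e.g. with be32_to_cpu()) before passing it where a plain u32 is expected, or by changing the callee's argument type to __be32. The portable user-space sketch below (be32_to_host / host_to_be32 are illustrative names, not kernel API) shows the byte-order conversion that such a fix performs:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t be32;	/* stand-in for __be32: big-endian storage */

/* Portable equivalent of be32_to_cpu(): assemble the host-order
 * value from the big-endian byte layout. */
static uint32_t be32_to_host(be32 v)
{
	const uint8_t *p = (const uint8_t *)&v;

	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Portable equivalent of cpu_to_be32(). */
static be32 host_to_be32(uint32_t v)
{
	be32 out;
	uint8_t *p = (uint8_t *)&out;

	p[0] = v >> 24;
	p[1] = v >> 16;
	p[2] = v >> 8;
	p[3] = v;
	return out;
}
```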
  3. 03 Jan 2019, 9 commits
  4. 28 Dec 2018, 1 commit
  5. 03 Oct 2018, 2 commits
    • xprtrdma: Simplify RPC wake-ups on connect · 31e62d25
      Chuck Lever committed
      Currently, when a connection is established, rpcrdma_conn_upcall
      invokes rpcrdma_conn_func and then
      wake_up_all(&ep->rep_connect_wait). The former wakes waiting RPCs,
      but the connect worker is not done yet, and that leads to races,
      double wakes, and difficulty understanding how this logic is
      supposed to work.
      
      Instead, collect all the "connection established" logic in the
      connect worker (xprt_rdma_connect_worker). A disconnect worker is
      retained to handle provider upcalls safely.
      
      Fixes: 254f91e2 ("xprtrdma: RPC/RDMA must invoke ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      31e62d25
    • xprtrdma: Explicitly resetting MRs is no longer necessary · 61da886b
      Chuck Lever committed
      When a memory operation fails, the MR's driver state might not match
      its hardware state. The only reliable recourse is to dereg the MR.
      This is done in ->ro_recover_mr, which then attempts to allocate a
      fresh MR to replace the released MR.
      
      Since commit e2ac236c ("xprtrdma: Allocate MRs on demand"),
      xprtrdma dynamically allocates MRs. It can add more MRs whenever
      they are needed.
      
      That makes it possible to simply release an MR when a memory
      operation fails, instead of "recovering" it. It will automatically
      be replaced by the on-demand MR allocator.
      
      This commit is a little larger than I wanted, but it replaces
      ->ro_recover_mr, rb_recovery_lock, rb_recovery_worker, and the
      rb_stale_mrs list with a generic work queue.
      
      Since MRs are no longer orphaned, the mrs_orphaned metric is no
      longer used.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      61da886b
  6. 02 Jun 2018, 1 commit
    • xprtrdma: Wait on empty sendctx queue · 2fad6592
      Chuck Lever committed
      Currently, when the sendctx queue is exhausted during marshaling, the
      RPC/RDMA transport places the RPC task on the delayq, forcing a
      wait of HZ >> 2 before the marshal and send are retried.
      
      With this change, the transport now places such an RPC task on the
      pending queue, and wakes it just as soon as more sendctxs become
      available. This typically takes less than a millisecond, and the
      write_space waking mechanism is less deadlock-prone.
      
      Moreover, the waiting RPC task is holding the transport's write
      lock, which blocks the transport from sending RPCs. Therefore faster
      recovery from sendctx queue exhaustion is desirable.
      
      Cf. commit 5804891455d5 ("xprtrdma: ->send_request returns -EAGAIN
      when there are no free MRs").
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      2fad6592
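The pending-queue behavior can be approximated in user space with a condition variable: the sender blocks until a sendctx is returned, then proceeds immediately, rather than sleeping a fixed interval. The names here (sender, sendctx_put, free_sendctxs) are illustrative, not the transport's actual symbols:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t avail = PTHREAD_COND_INITIALIZER;
static int free_sendctxs;	/* stand-in for the sendctx queue depth */

/* Sender path: instead of a fixed back-off, block until a sendctx
 * is released, then consume it and continue. */
static void *sender(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	while (free_sendctxs == 0)
		pthread_cond_wait(&avail, &lock);
	free_sendctxs--;
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Completion path: returning a sendctx wakes any waiting sender. */
static void sendctx_put(void)
{
	pthread_mutex_lock(&lock);
	free_sendctxs++;
	pthread_cond_signal(&avail);
	pthread_mutex_unlock(&lock);
}
```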
  7. 12 May 2018, 1 commit
  8. 07 May 2018, 5 commits
  9. 02 May 2018, 1 commit
    • xprtrdma: Fix list corruption / DMAR errors during MR recovery · 054f1557
      Chuck Lever committed
      The ro_release_mr methods check whether mr->mr_list is empty.
      Therefore, be sure to always use list_del_init when removing an MR
      linked into a list using that field. Otherwise, when recovering from
      transport failures or device removal, list corruption can result, or
      MRs can get mapped or unmapped an odd number of times, resulting in
      IOMMU-related failures.
      
      In general this fix is appropriate back to v4.8. However, code
      changes since then make it impossible to apply this patch directly
      to stable kernels. The fix would have to be applied by hand or
      reworked for kernels earlier than v4.16.
      
      Backport guidance -- there are several cases:
      - When creating an MR, initialize mr_list so that using list_empty
        on an as-yet-unused MR is safe.
      - When an MR is being handled by the remote invalidation path,
        ensure that mr_list is reinitialized when it is removed from
        rl_registered.
      - When an MR is being handled by rpcrdma_destroy_mrs, it is removed
        from mr_all, but it may still be on an rl_registered list. In
        that case, the MR needs to be removed from that list before being
        released.
      - Other cases are covered by using list_del_init in rpcrdma_mr_pop.
      
      Fixes: 9d6b0409 ('xprtrdma: Place registered MWs on a ... ')
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      054f1557
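The list_del() vs list_del_init() distinction this fix hinges on can be shown with a minimal user-space copy of the kernel's list primitives: after list_del_init(), calling list_empty() on the removed entry itself is well defined and returns true, which is what release paths that test mr->mr_list rely on.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal kernel-style circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static void __list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* list_del() leaves the entry's own pointers poisoned/dangling... */
static void list_del(struct list_head *e)
{
	__list_del(e);
	e->next = e->prev = NULL;
}

/* ...while list_del_init() re-points them at the entry itself, so a
 * later list_empty(entry) check is safe and returns true. */
static void list_del_init(struct list_head *e)
{
	__list_del(e);
	INIT_LIST_HEAD(e);
}

static int list_empty(const struct list_head *h)
{
	return h->next == h;
}
```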
  10. 11 Apr 2018, 3 commits
    • xprtrdma: Chain Send to FastReg WRs · f2877623
      Chuck Lever committed
      With FRWR, the client transport can perform memory registration and
      post a Send with just a single ib_post_send.
      
      This reduces contention between the send_request path and the Send
      Completion handlers, and reduces the overhead of registering a chunk
      that has multiple segments.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      f2877623
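The single-post pattern can be sketched with hypothetical stand-ins for struct ib_send_wr and ib_post_send (only opcode and ->next are modeled): the FastReg WR is linked in front of the Send WR and the pair is handed to the provider in one call instead of two.

```c
#include <assert.h>
#include <stddef.h>

enum wr_op { WR_REG_MR, WR_SEND };

/* Hypothetical stand-in for struct ib_send_wr. */
struct send_wr {
	enum wr_op op;
	struct send_wr *next;
};

static int post_calls;	/* times the (stub) ib_post_send was invoked */

static int post_send(struct send_wr *first)
{
	(void)first;
	post_calls++;
	return 0;
}

/* Link the registration WR in front of the Send WR and post the
 * chain with a single call, rather than one post per WR. */
static int post_reg_then_send(struct send_wr *reg, struct send_wr *send)
{
	reg->op = WR_REG_MR;
	send->op = WR_SEND;
	reg->next = send;
	send->next = NULL;
	return post_send(reg);
}
```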
    • xprtrdma: Remove xprt-specific connect cookie · 8a14793e
      Chuck Lever committed
      Clean up: The generic rq_connect_cookie is sufficient to detect RPC
      Call retransmission.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      8a14793e
    • xprtrdma: Fix latency regression on NUMA NFS/RDMA clients · 6720a899
      Chuck Lever committed
      With v4.15, on one of my NFS/RDMA clients I measured a nearly
      doubling in the latency of small read and write system calls. There
      was no change in server round trip time. The extra latency appears
      in the whole RPC execution path.
      
      "git bisect" settled on commit ccede759 ("xprtrdma: Spread reply
      processing over more CPUs") .
      
      After some experimentation, I found that leaving the WQ bound and
      allowing the scheduler to pick the dispatch CPU seems to eliminate
      the long latencies, and it does not introduce any new regressions.
      
      The fix is implemented by reverting only the part of
      commit ccede759 ("xprtrdma: Spread reply processing over more
      CPUs") that dispatches RPC replies specifically on the CPU where the
      matching RPC call was made.
      
      Interestingly, saving the CPU number and later queuing reply
      processing there was effective _only_ for NFS READ and WRITE
      requests. On my NUMA client, in-kernel RPC reply processing for
      asynchronous RPCs was dispatched on the same CPU where the RPC call
      was made, as expected. However, synchronous RPCs got their reply
      dispatched on some CPU other than the one where the call was
      placed, every time.
      
      Fixes: ccede759 ("xprtrdma: Spread reply processing over ... ")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org # v4.15+
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      6720a899
  11. 23 Jan 2018, 2 commits
  12. 17 Jan 2018, 9 commits