1. 18 May 2016, 3 commits
    • xprtrdma: Faster server reboot recovery · b2dde94b
      Committed by Chuck Lever
      In a cluster failover scenario, it is desirable for the client to
      attempt to reconnect quickly, as an alternate NFS server is already
      waiting to take over for the down server. The client can't see that
      a server IP address has moved to a new server until the existing
      connection is gone.
      
      For fabrics and devices where it is meaningful, set a definite upper
      bound on the amount of time before it is determined that a
      connection is no longer valid. This allows the RPC client to detect
      connection loss in a timely manner, then perform a fresh resolution
      of the server GUID in case it has changed (cluster failover).
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
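This log does not show the mechanism the patch uses, but the general idea can be sketched: the RDMA CM lets a consumer bound retransmission effort through struct rdma_conn_param before calling rdma_connect(), which in turn bounds how long a dead connection can go unnoticed. The knobs and values below are illustrative, not the ones this patch chose; ia->ri_id follows xprtrdma's naming for the transport's rdma_cm_id.

```c
/* Illustrative sketch only: not the actual xprtrdma change.
 * Smaller retry counts mean the HCA gives up retransmitting sooner,
 * so connection loss is reported to the consumer quickly and the
 * client can re-resolve the (possibly moved) server address.
 */
struct rdma_conn_param param;

memset(&param, 0, sizeof(param));
param.responder_resources = 32;   /* illustrative value */
param.initiator_depth = 32;       /* illustrative value */
param.retry_count = 6;            /* transport retries (3-bit field) */
param.rnr_retry_count = 0;        /* fail fast on RNR NAK, don't loop */

rc = rdma_connect(ia->ri_id, &param);
```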
    • xprtrdma: Use core ib_drain_qp() API · 550d7502
      Committed by Chuck Lever
      Clean up: Replace rpcrdma_flush_cqs() and rpcrdma_clean_cqs() with
      the new ib_drain_qp() API.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
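ib_drain_qp() moves the QP to the error state, posts sentinel work requests on both the send and receive queues, and blocks until their flush completions arrive, so every outstanding WR is known to be done. A minimal sketch of the replacement pattern (assuming ia->ri_id is the transport's rdma_cm_id, as in xprtrdma):

```c
/* One core helper replaces the hand-rolled rpcrdma_flush_cqs() /
 * rpcrdma_clean_cqs() polling loops. It must not be called from the
 * CQ's own completion context, and the CQs must not be IB_POLL_DIRECT.
 */
ib_drain_qp(ia->ri_id->qp);
```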
    • xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers · 94931746
      Committed by Chuck Lever
      Send buffer space is shared between the RPC-over-RDMA header and
      an RPC message. A large RPC-over-RDMA header means less space is
      available for the associated RPC message, which then has to be
      moved via an RDMA Read or Write.
      
      As more segments are added to the chunk lists, the header increases
      in size.  Typical modern hardware needs only a few segments to
      convey the maximum payload size, but some devices and registration
      modes may need a lot of segments to convey data payload. Sometimes
      so many are needed that the remaining space in the Send buffer is
      not enough for the RPC message. Sending such a message usually
      fails.
      
      To ensure a transport can always make forward progress, cap the
      number of RDMA segments that are allowed in chunk lists. This
      prevents less-capable devices and memory registrations from
      consuming a large portion of the Send buffer by reducing the
      maximum data payload that can be conveyed with such devices.
      
      For now I choose an arbitrary maximum of 8 RDMA segments. This
      allows a maximum size RPC-over-RDMA header to fit nicely in the
      current 1024 byte inline threshold with over 700 bytes remaining
      for an inline RPC message.
      
      The current maximum data payload of NFS READ or WRITE requests is
      one megabyte. To convey that payload on a client with 4KB pages,
      each chunk segment would need to handle 32 or more data pages. This
      is well within the capabilities of FMR. For physical registration,
      the maximum payload size on platforms with 4KB pages is reduced to
      32KB.
      
      For FRWR, a device's maximum page list depth would need to be at
      least 34 to support the maximum 1MB payload. A device with a smaller
      maximum page list depth means the maximum data payload is reduced
      when using that device.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 15 March 2016, 3 commits
    • xprtrdma: Use new CQ API for RPC-over-RDMA client send CQs · 2fa8f88d
      Committed by Chuck Lever
      Calling ib_poll_cq() to sort through WCs during a completion is a
      common pattern amongst RDMA consumers. Since commit 14d3a3b2
      ("IB: add a proper completion queue abstraction"), WC sorting can
      be handled by the IB core.
      
      By converting to this new API, xprtrdma is made a better neighbor to
      other RDMA consumers, as it allows the core to schedule the delivery
      of completions more fairly amongst all active consumers.
      
      Because each ib_cqe carries a pointer to a completion method, the
      core can now post its own operations on a consumer's QP, and handle
      the completions itself, without changes to the consumer.
      
      Send completions were previously handled entirely in the completion
      upcall handler (i.e., deferring to a process context is unneeded).
      Thus IB_POLL_SOFTIRQ is a direct replacement for the current
      xprtrdma send code path.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
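In the new API each work request embeds a struct ib_cqe whose ->done method the core invokes for the matching completion, replacing wr_id-based dispatch. A hedged sketch of the send-side pattern (structure and variable names are in xprtrdma's style but should be read as illustrative):

```c
/* Context for one posted send; the ib_cqe links the WC back to it. */
struct rpcrdma_sendctx {
	struct ib_cqe sc_cqe;
	/* ... per-send state ... */
};

static void rpcrdma_wc_send(struct ib_cq *cq, struct ib_wc *wc)
{
	struct rpcrdma_sendctx *sc =
		container_of(wc->wr_cqe, struct rpcrdma_sendctx, sc_cqe);
	/* completion handling; the core has already polled and
	 * dispatched the WC, so no ib_poll_cq() loop is needed here */
}

/* IB_POLL_SOFTIRQ matches the old upcall-context handling */
cq = ib_alloc_cq(ia->ri_device, NULL, cq_depth, 0, IB_POLL_SOFTIRQ);

/* When posting: */
sc->sc_cqe.done = rpcrdma_wc_send;
send_wr.wr_cqe = &sc->sc_cqe;
```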
    • xprtrdma: Use new CQ API for RPC-over-RDMA client receive CQs · 552bf225
      Committed by Chuck Lever
      Calling ib_poll_cq() to sort through WCs during a completion is a
      common pattern amongst RDMA consumers. Since commit 14d3a3b2
      ("IB: add a proper completion queue abstraction"), WC sorting can
      be handled by the IB core.
      
      By converting to this new API, xprtrdma is made a better neighbor to
      other RDMA consumers, as it allows the core to schedule the delivery
      of completions more fairly amongst all active consumers.
      
      Because each ib_cqe carries a pointer to a completion method, the
      core can now post its own operations on a consumer's QP, and handle
      the completions itself, without changes to the consumer.
      
      xprtrdma's reply processing is already handled in a work queue, but
      there is some initial order-dependent processing that is done in the
      soft IRQ context before a work item is scheduled.
      
      IB_POLL_SOFTIRQ is a direct replacement for the current xprtrdma
      receive code path.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Devesh Sharma <devesh.sharma@broadcom.com>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
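The receive side follows the same shape: the order-dependent prep runs in the softirq-context callback, and the heavier reply processing is deferred. A hedged sketch (rpcrdma_rep and its fields follow xprtrdma's naming but should be read as illustrative):

```c
static void rpcrdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
{
	struct rpcrdma_rep *rep =
		container_of(wc->wr_cqe, struct rpcrdma_rep, rr_cqe);

	/* order-dependent prep stays here, in softirq context */
	rep->rr_len = wc->byte_len;

	/* heavier reply processing runs later from a workqueue */
	queue_work(rpcrdma_receive_wq, &rep->rr_work);
}

recv_cq = ib_alloc_cq(ia->ri_device, NULL, cq_depth, 0, IB_POLL_SOFTIRQ);
```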
    • xprtrdma: Serialize credit accounting again · 23826c7a
      Committed by Chuck Lever
      Commit fe97b47c ("xprtrdma: Use workqueue to process RPC/RDMA
      replies") replaced the reply tasklet with a workqueue that allows
      RPC replies to be processed in parallel. Thus the credit values in
      RPC-over-RDMA replies can be applied in a different order than in
      which the server sent them.
      
      To fix this, revert commit eba8ff66 ("xprtrdma: Move credit
      update to RPC reply handler"). Reverting is done by hand to
      accommodate code changes that have occurred since then.
      
      Fixes: fe97b47c ("xprtrdma: Use workqueue to process . . .")
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  3. 23 December 2015, 1 commit
  4. 19 December 2015, 3 commits
  5. 03 November 2015, 9 commits
  6. 29 October 2015, 1 commit
  7. 07 October 2015, 1 commit
  8. 28 September 2015, 1 commit
  9. 25 September 2015, 1 commit
  10. 31 August 2015, 1 commit
  11. 06 August 2015, 5 commits
  12. 13 June 2015, 11 commits