1. 21 Jun, 2019 (1 commit)
  2. 24 Apr, 2019 (3 commits)
  3. 06 Feb, 2019 (5 commits)
    • IB/hfi1: Add TID RDMA WRITE functionality into RDMA verbs · 3c6cb20a
      Committed by Kaike Wan
      This patch integrates the TID RDMA WRITE protocol into the normal
      RDMA verbs framework. The TID RDMA WRITE protocol is an end-to-end
      protocol between the hfi1 drivers on two OPA nodes that converts a
      qualified RDMA WRITE request into a TID RDMA WRITE request to avoid
      data copying on the responder side.
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/hfi1: Add an s_acked_ack_queue pointer · 4f9264d1
      Committed by Kaike Wan
      The s_ack_queue is managed by two pointers into the ring:
      r_head_ack_queue and s_tail_ack_queue. r_head_ack_queue is the index of
      where the next received request is going to be placed and s_tail_ack_queue
      is the entry of the request currently being processed. This works
      perfectly fine for normal Verbs as the requests are processed one at a
      time and the s_tail_ack_queue is not moved until the request that it
      points to is fully completed.
      
      In this fashion, s_tail_ack_queue constantly chases r_head_ack_queue and
      the two pointers can easily be used to determine "queue full" and "queue
      empty" conditions.
      
      The detection of these two conditions is important in determining
      when an old entry can safely be overwritten with a newly received
      request and when the resources associated with the old request can
      safely be released.
      
      When pipelined TID RDMA WRITE is introduced into this mix, things look
      very different. r_head_ack_queue is still the point at which a newly
      received request will be inserted, s_tail_ack_queue is still the
      currently processed request. However, with pipelined TID RDMA WRITE
      requests, s_tail_ack_queue moves to the next request once all TID RDMA
      WRITE responses for that request have been sent. The rest of the protocol
      for a particular request is managed by other pointers specific to TID RDMA
      - r_tid_tail and r_tid_ack - which point to the entry for which the
      next TID RDMA DATA packets are going to arrive and the request for
      which the next TID RDMA ACK packets are to be generated, respectively.
      
      What this means is that entries in the ring, which are "behind"
      s_tail_ack_queue (entries which s_tail_ack_queue has gone past) are no
      longer considered complete. This is where the problem is - a newly
      received request could potentially overwrite a still active TID RDMA WRITE
      request.
      
      The reason why the TID RDMA pointers trail s_tail_ack_queue is that the
      normal Verbs send engine uses s_tail_ack_queue as the pointer for the next
      response. Since TID RDMA WRITE responses are processed by the normal Verbs
      send engine, s_tail_ack_queue had to be moved to the next entry once all
      TID RDMA WRITE response packets were sent to get the desired pipelining
      between requests. Doing otherwise would mean that the normal Verbs send
      engine would not be able to send the TID RDMA WRITE responses for the next
      TID RDMA request until the current one is fully completed.
      
      This patch introduces the s_acked_ack_queue index to point to the next
      request to complete on the responder side. For requests other than TID
      RDMA WRITE, s_acked_ack_queue should always be kept in sync with
      s_tail_ack_queue. For TID RDMA WRITE requests, it may fall behind
      s_tail_ack_queue.
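
      Extending the earlier sketch with the new index, a hedged model of
      the change: "queue full" must now be judged against
      s_acked_ack_queue, the oldest not-yet-complete entry, since
      s_tail_ack_queue may already have moved past still-active TID RDMA
      WRITE requests. A simplified model, not the driver's actual code:

      struct ack_ring_tid {
              uint32_t r_head_ack_queue;  /* where a new request lands */
              uint32_t s_tail_ack_queue;  /* next request to respond to */
              uint32_t s_acked_ack_queue; /* next request to complete */
      };

      /* Non-TID requests: the two trailing indices advance together. */
      static inline void advance_non_tid(struct ack_ring_tid *r)
      {
              r->s_tail_ack_queue = ring_next(r->s_tail_ack_queue);
              r->s_acked_ack_queue = r->s_tail_ack_queue;
      }

      /* Overwrite safety is now gated on s_acked_ack_queue. */
      static inline bool ring_full_tid(const struct ack_ring_tid *r)
      {
              return ring_next(r->r_head_ack_queue) == r->s_acked_ack_queue;
      }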
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/hfi1: Increment the retry timeout value for TID RDMA READ request · 039cd3da
      Committed by Kaike Wan
      The RC retry timeout value is based on the estimated time for the
      response packet to come back. However, for a TID RDMA READ request,
      due to the use of header suppression, the driver is normally not
      notified of each incoming response packet, but only of the last TID
      RDMA READ response packet. Consequently, the retry timeout value
      should be extended to
      cover the transaction time for the entire length of a segment (default
      256K) instead of that for a single packet. This patch addresses the
      issue by introducing new retry timer functions to account for multiple
      packets and wrapper functions for backward compatibility.
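
      To make the scaling concrete, a hedged sketch: the per-packet
      timeout model and its constants below are assumptions invented for
      illustration; only the 256K default segment size comes from the
      message.

      #include <stdint.h>

      #define TID_RDMA_SEGMENT_BYTES (256u * 1024u) /* default segment */

      /* Per-packet retry timeout in usec, as normal RC retry logic might
       * estimate it; a placeholder model, not the driver's formula. */
      static uint64_t rc_packet_timeout_usec(uint32_t mtu_bytes)
      {
              return 500 + mtu_bytes / 100;
      }

      /* TID RDMA READ: cover a whole segment's worth of responses, since
       * header suppression defers notification to the last packet. */
      static uint64_t tid_read_timeout_usec(uint32_t mtu_bytes)
      {
              uint32_t pkts =
                      (TID_RDMA_SEGMENT_BYTES + mtu_bytes - 1) / mtu_bytes;
              return rc_packet_timeout_usec(mtu_bytes) * pkts;
      }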
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/hfi1: TID RDMA RcvArray programming and TID allocation · 838b6fd2
      Committed by Kaike Wan
      TID entries are used by hfi1 hardware to receive data payload from
      incoming packets directly into a user buffer and thus avoid data copying
      by software. This patch implements the functions for TID allocation,
      freeing, and programming TID RcvArray entries in hardware for kernel
      clients. TID entries are managed via lists of TID groups similar to PSM.
      Furthermore, to track TID resource allocation for each request, software
      flows are also allocated and freed as needed. Since software flows
      consume a large amount of memory for tracking TID allocation and
      freeing, it is generally desirable to allocate them dynamically on
      the send queue, and only for TID RDMA requests, but to pre-allocate
      them for the receive queue, because the send queue could have
      thousands of entries while the receive queue has only a limited
      number of entries.
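
      An illustrative model of that allocation policy, with all structure
      and field names invented for the sketch: receive-side flows are
      pre-allocated into a small bounded array, while send-side flows are
      attached on demand.

      #include <stdlib.h>

      struct tid_flow {
              int in_use; /* flow-tracking state, elided here */
      };

      struct rcv_queue {
              struct tid_flow flows[16]; /* small and bounded: pre-allocate */
      };

      struct send_request {
              struct tid_flow *flow; /* NULL until a TID RDMA op needs one */
      };

      static int send_request_attach_flow(struct send_request *req)
      {
              req->flow = calloc(1, sizeof(*req->flow)); /* on demand */
              return req->flow ? 0 : -1;
      }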
      Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/hfi: Move RC functions into a header file · 385156c5
      Committed by Kaike Wan
      This patch moves some RC helper functions into a header file so that
      they can be called from both RC and TID RDMA functions. In addition,
      a common function for rewinding a request is created in rdmavt so
      that it can be shared between the qib and hfi1 drivers.
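
      The refactoring pattern, sketched; the helper below and its body
      are invented for illustration, since the real functions live in the
      hfi1/rdmavt sources. File-local helpers become static inline
      functions in a shared header so both paths can call them:

      /* shared header, included by both the RC and TID RDMA code */
      #ifndef SKETCH_RC_H
      #define SKETCH_RC_H

      #include <stdint.h>

      /* Bytes already acknowledged before a restart point; illustrative. */
      static inline uint32_t restart_offset(uint32_t wqe_psn, uint32_t psn,
                                            uint32_t pmtu)
      {
              return (psn - wqe_psn) * pmtu;
      }

      #endif /* SKETCH_RC_H */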
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
      Signed-off-by: Kaike Wan <kaike.wan@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
  4. 04 Oct, 2018 (3 commits)
  5. 20 Jun, 2018 (1 commit)
  6. 10 May, 2018 (1 commit)
  7. 05 Oct, 2017 (1 commit)
  8. 29 Aug, 2017 (2 commits)
  9. 23 Aug, 2017 (1 commit)
  10. 20 Jul, 2017 (1 commit)
  11. 28 Jun, 2017 (1 commit)
  12. 05 May, 2017 (1 commit)
  13. 02 May, 2017 (1 commit)
  14. 29 Apr, 2017 (1 commit)
  15. 06 Apr, 2017 (2 commits)
  16. 19 Feb, 2017 (7 commits)
  17. 12 Dec, 2016 (3 commits)
  18. 17 Sep, 2016 (1 commit)
  19. 03 Aug, 2016 (4 commits)
    • IB/rdmavt: Eliminate redundant opcode test in mr ref clear · fe508272
      Committed by Ira Weiny
      The use of the specific opcode test is redundant since
      all ack entry users correctly manipulate the mr pointer
      to selectively trigger the reference clearing.
      
      The overly specific test hinders the use of implementation
      specific operations.
      
      The change needs to get rid of the union to ensure that
      an atomic value is not seen as an MR pointer.
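
      Modeling the layout change described, with types and names as
      simplified stand-ins: once the MR pointer and the atomic result
      live in separate fields, a non-NULL pointer alone decides whether a
      reference must be dropped, and no opcode test is needed.

      #include <stddef.h>
      #include <stdint.h>

      struct rvt_mregion; /* opaque stand-in */

      struct ack_entry {
              struct rvt_mregion *mr; /* NULL unless a reference is held */
              uint64_t atomic_val;    /* no longer shares storage with mr */
      };

      static void clear_mr_ref(struct ack_entry *e)
      {
              if (e->mr) {
                      /* rvt_put_mr(e->mr); reference drop in the driver */
                      e->mr = NULL;
              }
      }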
      Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Ira Weiny <ira.weiny@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rdmavt, hfi1: Fix NFSoRDMA failure with FRMR enabled · d9b13c20
      Committed by Jianxin Xiong
      Hanging has been observed while writing a file over NFSoRDMA. Dmesg on
      the server contains messages like these:
      
      [  931.992501] svcrdma: Error -22 posting RDMA_READ
      [  952.076879] svcrdma: Error -22 posting RDMA_READ
      [  982.154127] svcrdma: Error -22 posting RDMA_READ
      [ 1012.235884] svcrdma: Error -22 posting RDMA_READ
      [ 1042.319194] svcrdma: Error -22 posting RDMA_READ
      
      Here is why:
      
      With the base memory management extension enabled, FRMR is used instead
      of FMR. The xprtrdma server issues each RDMA read request as the following
      bundle:
      
      (1) IB_WR_REG_MR, signaled;
      (2) IB_WR_RDMA_READ, signaled;
      (3) IB_WR_LOCAL_INV, signaled & fencing.
      
      These requests are signaled. In order to generate completions, the
      fast register work request is processed by the hfi1 send engine
      after being posted to the work queue, and the corresponding lkey is
      not valid until the request is processed. However, the rdmavt
      driver validates the lkey when the RDMA read request is posted, so
      the post fails immediately with error -EINVAL (-22).
      
      This patch changes the work flow of local operations (fast register and
      local invalidate) so that fast register work requests are always
      processed immediately to ensure that the corresponding lkey is valid
      when subsequent work requests are posted. Local invalidate requests are
      processed immediately if fencing is not required and no previous local
      invalidate request is pending.
      
      To allow completion generation for signaled local operations that have
      been processed before posting to the work queue, an internal send flag
      RVT_SEND_COMPLETION_ONLY is added. The hfi1 send engine checks this flag
      and only generates completion for such requests.
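
      A hedged sketch of the completion path just described: the flag
      name comes from the message, but everything else here, including
      the structure and the commented-out helper, is invented. A signaled
      local operation is executed at post time, and the queued entry only
      carries a completion request for the send engine:

      #include <stdint.h>

      #define RVT_SEND_COMPLETION_ONLY 0x1u /* flag named in the message */

      struct swqe {
              uint32_t flags;
              /* rest of the work-queue entry elided */
      };

      static void send_engine_handle(struct swqe *wqe)
      {
              if (wqe->flags & RVT_SEND_COMPLETION_ONLY) {
                      /* work was done at post time; only report it */
                      /* generate_completion(wqe); hypothetical helper */
                      return;
              }
              /* normal processing path ... */
      }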
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/hfi1: Add the capability for reserved operations · 856cc4c2
      Committed by Mike Marciniszyn
      This fix allows for support of in-kernel reserved operations
      without impacting the ULP user.
      
      The low level driver can register a non-zero value which
      will be transparently added to the send queue size and hidden
      from the ULP in every respect.
      
      ULP post sends will never see a full queue due to a reserved
      post send and reserved operations will never exceed that
      registered value.
      
      The s_avail will continue to track the ULP swqe availability, and
      the difference between the registered reserved value and the
      reserved entries in use will track reserved availability.
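
      Illustrative accounting for the scheme; field names follow the
      message where given, while the structure and helper are invented.
      The driver allocates the ULP-visible size plus the registered
      reserve, s_avail never counts reserved slots, and reserved use is
      capped at the registration:

      #include <stdbool.h>
      #include <stdint.h>

      struct send_queue {
              uint32_t size;          /* ULP size + reserved_ops, allocated */
              uint32_t reserved_ops;  /* registered by the low level driver */
              uint32_t reserved_used; /* reserved entries in flight */
              uint32_t s_avail;       /* ULP swqe availability */
      };

      static bool reserve_slot(struct send_queue *sq)
      {
              if (sq->reserved_used >= sq->reserved_ops)
                      return false; /* never exceed the registered value */
              sq->reserved_used++;  /* does not touch s_avail */
              return true;
      }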
      Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
    • IB/rdmavt: Handle local operations in post send · d9f87239
      Committed by Jianxin Xiong
      Some work requests are local operations, such as IB_WR_REG_MR and
      IB_WR_LOCAL_INV. They differ from non-local operations in that:
      
      (1) Local operations can be processed immediately without being posted
      to the send queue if neither fencing nor completion generation is needed.
      However, to ensure correct ordering, once a local operation is posted
      to the work queue due to a fencing or completion requirement, all
      subsequent local operations must also be posted to the work queue
      until all the local operations on the work queue have completed.
      
      (2) Local operations don't send packets over the wire and thus don't
      need (and shouldn't update) the packet sequence numbers.
      
      Define a new flag bit for the post send table to identify local
      operations.
      
      Add a new field to the QP structure to track the number of local
      operations on the send queue to determine if direct processing of new
      local operations should be enabled/disabled.
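
      A condensed model of that bookkeeping, with names invented for the
      sketch: a per-QP count of local operations still on the send queue
      gates immediate processing, together with the fencing and
      completion conditions stated above.

      #include <stdbool.h>

      struct qp_sketch {
              unsigned int local_ops_queued; /* local WRs on the send queue */
      };

      static bool local_op_immediate(const struct qp_sketch *qp,
                                     bool fenced, bool signaled)
      {
              /* immediate only if no fence, no completion, none pending */
              return !fenced && !signaled && qp->local_ops_queued == 0;
      }

      static void local_op_posted(struct qp_sketch *qp)
      {
              qp->local_ops_queued++; /* later local ops must queue too */
      }

      static void local_op_done(struct qp_sketch *qp)
      {
              qp->local_ops_queued--; /* at zero, immediate mode resumes */
      }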
      Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
      Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>