1. 11 Mar, 2016 3 commits
  2. 08 Oct, 2015 1 commit
    • C
      IB: split struct ib_send_wr · e622f2f4
      Christoph Hellwig committed
      This patch splits up struct ib_send_wr so that all non-trivial verbs
      use their own structure which embeds struct ib_send_wr.  This dramatically
      shrinks the size of a WR for most common operations:
      
      sizeof(struct ib_send_wr) (old):	96
      
      sizeof(struct ib_send_wr):		48
      sizeof(struct ib_rdma_wr):		64
      sizeof(struct ib_atomic_wr):		96
      sizeof(struct ib_ud_wr):		88
      sizeof(struct ib_fast_reg_wr):		88
      sizeof(struct ib_bind_mw_wr):		96
      sizeof(struct ib_sig_handover_wr):	80
      
      And with Sagi's pending MR rework the fast registration WR will also be
      down to a reasonable size:
      
      sizeof(struct ib_fastreg_wr):		64
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> [srp, srpt]
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com> [sunrpc]
      Tested-by: Haggai Eran <haggaie@mellanox.com>
      Tested-by: Sagi Grimberg <sagig@mellanox.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      e622f2f4
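
The embedding pattern described in this commit can be sketched in user-space C. This is a minimal illustration, not the kernel's definitions: the struct layouts, the opcode values, and the helper names `to_rdma_wr()` and `remote_addr_of()` are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-in for the generic work request header. */
struct send_wr {
	struct send_wr *next;
	uint64_t wr_id;
	int opcode;
};

/* An RDMA-specific request embeds the generic header and adds only the
 * fields that RDMA operations need. */
struct rdma_wr {
	struct send_wr wr;
	uint64_t remote_addr;
	uint32_t rkey;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static inline struct rdma_wr *to_rdma_wr(struct send_wr *wr)
{
	return container_of(wr, struct rdma_wr, wr);
}

/* Hypothetical opcode values for the example. */
enum { OP_SEND = 0, OP_RDMA_WRITE = 1 };

/* A consumer of the generic header downcasts only after the opcode has
 * told it which wrapper structure the header lives in. */
static uint64_t remote_addr_of(struct send_wr *wr)
{
	assert(wr->opcode == OP_RDMA_WRITE);
	return to_rdma_wr(wr)->remote_addr;
}
```

A plain send then pays only for `struct send_wr`, while operations that need extra fields carry them in their own wrapper.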
  3. 18 Feb, 2015 1 commit
  4. 18 Mar, 2014 1 commit
  5. 09 Jul, 2012 1 commit
    • M
      IB/qib: Avoid returning EBUSY from MR deregister · 6a82649f
      Mike Marciniszyn committed
      A timing issue can occur where qib_mr_dereg can return -EBUSY if the
      MR use count is not zero.
      
      This can occur if the MR is de-registered while RDMA read response
      packets are being progressed from the SDMA ring.  The suspicion is
      that the peer sent an RDMA read request, which has already been copied
      across to the peer.  The peer sees the completion of his request and
      then communicates to the responder that the MR is not needed any
      longer.  The responder tries to de-register the MR, catching some
      responses remaining in the SDMA ring holding the MR use count.
      
      The code now uses a get/put paradigm to track MR use counts and
      coordinates with the MR de-registration process using a completion
      when the count has reached zero.  A timeout on the delay is in place
      to catch other EBUSY issues.
      
      The reference count protocol is as follows:
      - The return to the user counts as 1
      - A reference from the lk_table or the qib_ibdev counts as 1.
      - Transient I/O operations increase/decrease as necessary
      
      A lot of code duplication has been folded into the new routines
      init_qib_mregion() and deinit_qib_mregion().  Additionally, explicit
      initialization of fields to zero is now handled by kzalloc().
      
      Also, the duplicated 'while.*num_sge' code that decrements reference
      counts has been consolidated in qib_put_ss().
      Reviewed-by: Ramkrishna Vepa <ramkrishna.vepa@intel.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      6a82649f
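
The get/put protocol from this commit can be modelled in user-space C11. This is a single-threaded sketch under stated assumptions: `struct mregion`, `mr_get()`, `mr_put()` and `mr_dereg()` are hypothetical names, an atomic flag stands in for the kernel completion, and `-1` stands in for `-EBUSY`. Real code would sleep on the completion with a timeout rather than check the flag once.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a memory region; not the driver's real layout. */
struct mregion {
	atomic_int refcount;  /* starts at 1: the reference returned to the user */
	atomic_bool done;     /* stands in for a kernel completion */
};

static void mr_init(struct mregion *mr)
{
	atomic_init(&mr->refcount, 1);
	atomic_init(&mr->done, false);
}

/* Transient I/O takes a reference for as long as it uses the region. */
static void mr_get(struct mregion *mr)
{
	atomic_fetch_add(&mr->refcount, 1);
}

static void mr_put(struct mregion *mr)
{
	/* fetch_sub returns the previous value; 1 means the last reference
	 * was just dropped, so signal the waiter. */
	if (atomic_fetch_sub(&mr->refcount, 1) == 1)
		atomic_store(&mr->done, true);
}

/* Deregister: drop the user's reference and check whether the region is
 * idle. On failure the reference is restored so a retry stays balanced. */
static int mr_dereg(struct mregion *mr)
{
	mr_put(mr);
	if (atomic_load(&mr->done))
		return 0;
	mr_get(mr);
	return -1; /* stands in for -EBUSY */
}
```

In this model, deregistration fails only while a transient reference is outstanding and succeeds once the matching `mr_put()` has run.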
  6. 15 May, 2012 1 commit
  7. 09 Mar, 2012 1 commit
    • O
      IB: Change CQE "csum_ok" field to a bit flag · d927d505
      Or Gerlitz committed
      Use a bit in wc_flags rather than a whole integer to hold the
      "checksum OK" flag.  By itself, this change doesn't reduce the size of
      struct ib_wc on 64-bit machines -- it stays at 56 bytes because of
      padding.  However, it will allow more fields to be added in the future
      without enlarging the struct.  Also, it will let us have a unified
      approach with future libibverbs checksum offload reporting, because a
      bit flag doesn't break the library ABI.
      
      This patch was suggested during conversation with Liran Liss
      <liranl@mellanox.com>.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Reviewed-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      d927d505
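
A small C sketch of the idea of carrying "checksum OK" as one bit in a flags word. The flag values, the trimmed `struct wc`, and both helper names are assumptions for illustration; consult the kernel headers for the real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values; the real ones live in the kernel headers. */
enum wc_flags {
	WC_GRH        = 1u << 0,
	WC_WITH_IMM   = 1u << 1,
	WC_IP_CSUM_OK = 1u << 2,
};

/* Trimmed work completion: one int carries every boolean property, so a
 * new property costs a bit instead of a new member. */
struct wc {
	uint64_t wr_id;
	uint32_t byte_len;
	int      wc_flags;
};

static inline void wc_set_csum_ok(struct wc *wc, bool ok)
{
	if (ok)
		wc->wc_flags |= WC_IP_CSUM_OK;
	else
		wc->wc_flags &= ~WC_IP_CSUM_OK;
}

static inline bool wc_csum_ok(const struct wc *wc)
{
	return (wc->wc_flags & WC_IP_CSUM_OK) != 0;
}
```

Setting or clearing the checksum bit leaves the other flags untouched, which is what lets later flags be added without changing the structure layout.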
  8. 05 Nov, 2011 1 commit
    • M
      IB/qib: Fix panic in RC error flushing logic · 30ab7e23
      Mike Marciniszyn committed
      The following panic can occur when flushing a QP:
      
          RIP: 0010:[<ffffffffa0168e8b>]  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
          RSP: 0018:ffff8803cdc6fc90  EFLAGS: 00010046
          RAX: 0000000000000000 RBX: ffff8803d84ba000 RCX: 0000000000000000
          RDX: 0000000000000005 RSI: ffffc90015a53430 RDI: ffff8803d84ba000
          RBP: ffff8803cdc6fce0 R08: ffff8803cdc6fc90 R09: 0000000000000001
          R10: 00000000ffffffff R11: 0000000000000000 R12: ffff8803d84ba0c0
          R13: ffff8803d84ba5cc R14: 0000000000000800 R15: 0000000000000246
          FS:  0000000000000000(0000) GS:ffff880036600000(0000) knlGS:0000000000000000
          CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
          CR2: 0000000000000034 CR3: 00000003e44f9000 CR4: 00000000000406f0
          DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
          DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
          Process qib/0 (pid: 1350, threadinfo ffff8803cdc6e000, task ffff88042728a100)
          Stack:
           53544c5553455201 0000000100000005 0000000000000000 ffff8803d84ba000
           0000000000000000 0000000000000000 0000000000000000 0000000000000000
           0000000000000000 0000000000000001 ffff8803cdc6fd30 ffffffffa0165d7a
          Call Trace:
           [<ffffffffa0165d7a>] qib_make_rc_req+0x36a/0xe80 [ib_qib]
           [<ffffffffa0165a10>] ?  qib_make_rc_req+0x0/0xe80 [ib_qib]
           [<ffffffffa01698b3>] qib_do_send+0xf3/0xb60 [ib_qib]
           [<ffffffff814db757>] ? thread_return+0x4e/0x777
           [<ffffffffa01697c0>] ? qib_do_send+0x0/0xb60 [ib_qib]
           [<ffffffff81088bf0>] worker_thread+0x170/0x2a0
           [<ffffffff8108e530>] ?  autoremove_wake_function+0x0/0x40
           [<ffffffff81088a80>] ? worker_thread+0x0/0x2a0
           [<ffffffff8108e1c6>] kthread+0x96/0xa0
           [<ffffffff8100c1ca>] child_rip+0xa/0x20
           [<ffffffff8108e130>] ? kthread+0x0/0xa0
           [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
          RIP  [<ffffffffa0168e8b>] qib_send_complete+0x3b/0x190 [ib_qib]
      
      The RC error state flush logic in qib_make_rc_req() could return all
      of the acked wqes and potentially have emptied the queue.  It would
      then unconditionally try to return a flush completion via
      qib_send_complete() for an invalid wqe, or worse a valid one that is
      not queued.  The panic results when the completion code tries to
      maintain an MR reference count for a NULL MR.
      
      This fix modifies the logic to send only one completion per
      qib_make_rc_req() call, changing the completion status from
      IB_WC_SUCCESS to IB_WC_WR_FLUSH_ERR as the completions progress.
      
      The outer loop will call as many times as necessary to flush the queue.
      Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      30ab7e23
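
The "one completion per call" behaviour can be sketched with a toy queue. This is a model under assumptions, not the driver's code: the `last`/`acked`/`head` indices, `struct send_queue`, `flush_one()` and `flush_all()` are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>

#define QUEUE_SIZE 8

/* Hypothetical send queue model: entries in [last, acked) were
 * acknowledged before the error, entries in [acked, head) were not. */
struct send_queue {
	unsigned last, acked, head;
	unsigned n_success;  /* completions reported as success */
	unsigned n_flushed;  /* completions reported as flush errors */
};

/* Retire at most one entry per call. Returns false when the queue is
 * empty, so an entry that is not queued is never completed. */
static bool flush_one(struct send_queue *q)
{
	if (q->last == q->head)
		return false;
	if (q->last != q->acked) {
		q->n_success++;  /* already acked: report success */
	} else {
		q->n_flushed++;  /* never acked: report a flush error */
		q->acked = (q->acked + 1) % QUEUE_SIZE;
	}
	q->last = (q->last + 1) % QUEUE_SIZE;
	return true;
}

/* The outer loop calls as often as needed to drain the queue. */
static unsigned flush_all(struct send_queue *q)
{
	unsigned n = 0;
	while (flush_one(q))
		n++;
	return n;
}
```

Checking for an empty queue before touching an entry is the step whose absence produced the NULL MR dereference described above.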
  9. 22 Oct, 2011 4 commits
  10. 18 Feb, 2011 1 commit
    • M
      IB/qib: Prevent double completions after a timeout or RNR error · c0af2c05
      Mike Marciniszyn committed
      There is a double completion associated with error handling for RC QPs.
      
      The sequence is:
      
       - The do_rc_ack() routine fields an RNR nack and there are 0
         rnr_retries configured on the QP.
       - qib_error_qp() stops the pending timer
       - qib_rc_send_complete() is called from sdma_complete()
       - qib_rc_send_complete() starts the timer because the msb of the psn
         just completed says an ack is needed.
       - a bunch of flushes occur as ipoib posts WQEs to an error'ed QP
       - rc_timeout() calls qib_restart_rc()
       - qib_restart_rc() calls qib_send_complete() with an
         IB_WC_RETRY_EXC_ERR on a wqe that has already been completed in the
         past
      
      The fix avoids starting the timer since another packet will never
      arrive.
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      c0af2c05
  11. 11 Feb, 2011 1 commit
    • M
      IB/qib: Fix double add_timer() · 414ed90c
      Mike Marciniszyn committed
      The following panic BUG_ON occurs during qib testing:
      
          Kernel BUG at include/linux/timer.h:82
      
          RIP  [<ffffffff881f7109>] :ib_qib:start_timer+0x73/0x89
           RSP <ffffffff80425bd0>
           <0>Kernel panic - not syncing: Fatal exception
           <0>Dumping qib trace buffer from panic
          qib_set_lid INFO: IB0:1 got a lid: 0xf8
          Done dumping qib trace buffer
          BUG: warning at kernel/panic.c:137/panic() (Tainted: G
      
      The flaw is due to a missing state test when processing responses, which
      results in an add_timer() call when the same timer is already queued.
      This code was executing in parallel with a QP destroy on another CPU
      that had changed the state to reset, but the missing test caused the
      response handling code to run on into the panic.
      Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
      414ed90c
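
A toy model of the state test, assuming a QP with only two states and a single retry timer. `struct qp`, `start_timer()` and `handle_response()` are hypothetical names; the counter stands in for the BUG_ON that `add_timer()` hits on an already queued timer.

```c
#include <assert.h>
#include <stdbool.h>

enum qp_state { QP_RESET, QP_RTS };

/* Hypothetical QP model; the real structure has many more fields. */
struct qp {
	enum qp_state state;
	bool timer_pending;
	unsigned double_adds;  /* each increment models one BUG_ON hit */
};

/* Models add_timer(): arming an already queued timer is a bug. */
static void start_timer(struct qp *qp)
{
	if (qp->timer_pending)
		qp->double_adds++;
	qp->timer_pending = true;
}

/* Response handling with the state test in place: a QP that another CPU
 * has already moved to reset is left alone. */
static bool handle_response(struct qp *qp)
{
	if (qp->state != QP_RTS)
		return false;
	start_timer(qp);
	return true;
}
```

Without the early return, a response processed after the state change would reach `start_timer()` with the timer still queued.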
  12. 11 Jan, 2011 1 commit
  13. 23 Oct, 2010 1 commit
  14. 04 Aug, 2010 1 commit
  15. 24 May, 2010 1 commit
  16. 18 Jan, 2009 1 commit
  17. 06 Dec, 2008 1 commit
  18. 21 Sep, 2008 1 commit
  19. 15 Jul, 2008 1 commit
    • S
      RDMA/core: Add memory management extensions support · 00f7ec36
      Steve Wise committed
      This patch adds support for the IB "base memory management extension"
      (BMME) and the equivalent iWARP operations (which the iWARP verbs
      mandate all devices must implement).  The new operations are:
      
       - Allocate an ib_mr for use in fast register work requests.
      
       - Allocate/free physical buffer lists for use in fast register work
         requests.  This allows device drivers to allocate this memory as
         needed for use in posting send requests (e.g. via dma_alloc_coherent).
      
       - New send queue work requests:
         * send with remote invalidate
         * fast register memory region
         * local invalidate memory region
         * RDMA read with invalidate local memory region (iWARP only)
      
      Consumer interface details:
      
       - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
         to indicate device support for these features.
      
       - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
         IB_WR_RDMA_READ_WITH_INV are added.
      
       - A new consumer API function, ib_alloc_mr() is added to allocate
         fast register memory regions.
      
       - New consumer API functions, ib_alloc_fast_reg_page_list() and
         ib_free_fast_reg_page_list() are added to allocate and free
         device-specific memory for fast registration page lists.
      
       - A new consumer API function, ib_update_fast_reg_key(), is added to
         allow the key portion of the R_Key and L_Key of a fast registration
         MR to be updated.  Consumers call this if desired before posting
         an IB_WR_FAST_REG_MR work request.
      
      Consumers can use this as follows:
      
       - MR is allocated with ib_alloc_mr().
      
       - Page list memory is allocated with ib_alloc_fast_reg_page_list().
      
       - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
      
       - MR made VALID and bound to a specific page list via
         ib_post_send(IB_WR_FAST_REG_MR)
      
       - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
         ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
         invalidate operation.
      
       - MR is deallocated with ib_dereg_mr().
      
       - Page lists are deallocated via ib_free_fast_reg_page_list().
      
      Applications can allocate a fast register MR once, and then can
      repeatedly bind the MR to different physical block lists (PBLs) via
      posting work requests to a send queue (SQ).  For each outstanding
      MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
      allocated (the fast_reg_page_list is owned by the low-level driver
      from the consumer posting a work request until the request completes).
      Thus pipelining can be achieved while still allowing device-specific
      page_list processing.
      
      The 32-bit fast register memory key/STag is composed of a 24-bit index
      and an 8-bit key.  The application can change the key each time it
      fast registers, thus allowing more control over the peer's use of the
      key/STag (i.e. it can effectively be changed each time the rkey is
      rebound to a page list).
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      00f7ec36
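
The 24-bit index plus 8-bit key layout can be illustrated with a short C sketch. It assumes the 8-bit key occupies the low byte of the 32-bit value, which the message above does not state; `struct mr_keys` and the helper names are hypothetical stand-ins, not the kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the key fields of a memory region. */
struct mr_keys {
	uint32_t lkey;
	uint32_t rkey;
};

/* Replace the 8-bit key (assumed to be the low byte) while keeping the
 * 24-bit index in the upper bits of both keys. */
static inline void update_fast_reg_key(struct mr_keys *mr, uint8_t newkey)
{
	mr->lkey = (mr->lkey & 0xffffff00u) | newkey;
	mr->rkey = (mr->rkey & 0xffffff00u) | newkey;
}

/* Split a 32-bit key/STag into its two parts. */
static inline uint32_t key_index(uint32_t key)
{
	return key >> 8;
}

static inline uint8_t key_tag(uint32_t key)
{
	return (uint8_t)(key & 0xffu);
}
```

A consumer that wants a fresh key for every registration could call `update_fast_reg_key(&mr, key_tag(mr.rkey) + 1)` before posting the fast register request; the 8-bit value simply wraps around.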
  20. 14 May, 2008 3 commits
  21. 08 May, 2008 1 commit
  22. 17 Apr, 2008 3 commits
    • R
      IB/core: Add support for "send with invalidate" work requests · 0f39cf3d
      Roland Dreier committed
      Add a new IB_WR_SEND_WITH_INV send opcode that can be used to mark a
      "send with invalidate" work request as defined in the iWARP verbs and
      the InfiniBand base memory management extensions.  Also put "imm_data"
      and a new "invalidate_rkey" member in a new "ex" union in struct
      ib_send_wr. The invalidate_rkey member can be used to pass in an
      R_Key/STag to be invalidated.  Add this new union to struct
      ib_uverbs_send_wr.  Add code to copy the invalidate_rkey field in
      ib_uverbs_post_send().
      
      Fix up low-level drivers to deal with the change to struct ib_send_wr,
      and just remove the imm_data initialization from net/sunrpc/xprtrdma/,
      since that code never does any send with immediate operations.
      
      Also, move the existing IB_DEVICE_SEND_W_INV flag to a new bit, since
      the iWARP drivers currently in the tree set the bit.  The amso1100
      driver at least will silently fail to honor the IB_SEND_INVALIDATE bit
      if passed in as part of userspace send requests (since it does not
      implement kernel bypass work request queueing).  Remove the flag from
      all existing drivers that set it until we know which ones are OK.
      
      The value chosen for the new flag is not consecutive, to avoid clashing
      with flags defined in the XRC patches, which are not merged yet but
      which are already in use and are likely to be merged soon.
      
      This resurrects a patch sent long ago by Mikkel Hagen <mhagen@iol.unh.edu>.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      0f39cf3d
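
The "ex" union described in this commit can be sketched as follows. The trimmed `struct send_wr_ex`, the opcode enum, and both helper functions are assumptions for illustration; only the member names `imm_data`, `invalidate_rkey` and `ex` come from the message above.

```c
#include <assert.h>
#include <stdint.h>

enum wr_opcode { WR_SEND, WR_SEND_WITH_IMM, WR_SEND_WITH_INV };

/* Hypothetical, trimmed work request: the two 32-bit values share storage
 * because a request carries either immediate data or an R_Key to
 * invalidate, and the opcode says which. */
struct send_wr_ex {
	uint64_t wr_id;
	enum wr_opcode opcode;
	union {
		uint32_t imm_data;
		uint32_t invalidate_rkey;
	} ex;
};

static struct send_wr_ex make_send_with_inv(uint64_t wr_id, uint32_t rkey)
{
	struct send_wr_ex wr = { .wr_id = wr_id, .opcode = WR_SEND_WITH_INV };
	wr.ex.invalidate_rkey = rkey;
	return wr;
}

/* Returns 1 and stores the key if the request asks the peer to invalidate
 * one; returns 0 for every other opcode. */
static int wr_invalidate_rkey(const struct send_wr_ex *wr, uint32_t *rkey)
{
	if (wr->opcode != WR_SEND_WITH_INV)
		return 0;
	*rkey = wr->ex.invalidate_rkey;
	return 1;
}
```

Because the union is only as large as one of its members, adding the invalidate key does not grow the request.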
    • R
    • R
      IB/ipath: Use PIO buffer for RC ACKs · d98b1937
      Ralph Campbell committed
      This reduces the latency for RC ACKs when a PIO buffer is available.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d98b1937
  23. 12 Mar, 2008 1 commit
  24. 26 Jan, 2008 1 commit
    • R
      IB/ipath: Fix RNR NAK handling · cc65edcf
      Ralph Campbell committed
      This patch fixes a couple of minor problems with RNR NAK handling:
       - The insertion sort was causing extra delay when inserting ahead
         vs. behind an existing entry on the list.
       - A resend of the first packet of a message which is still not ready
         needs another RNR NAK (i.e., it was suppressed when it shouldn't
         have been).
       - Also, the resend tasklet doesn't need to be woken up unless the
         ACK/NAK actually indicates progress has been made.
      Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      cc65edcf
  25. 14 Nov, 2007 1 commit
  26. 10 Oct, 2007 2 commits
  27. 10 Jul, 2007 4 commits