1. 15 Jul 2008, 6 commits
    • IB/mlx4: Add support for blocking multicast loopback packets · 521e575b
      Ron Livne authored
      Add support for handling the IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK
      flag by using the per-multicast group loopback blocking feature of
      mlx4 hardware.
      Signed-off-by: Ron Livne <ronli@voltaire.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      521e575b
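      A minimal consumer-side sketch of requesting this behaviour at QP creation time; the flag name follows the entry above, and the protection domain, CQs, and capacity values are assumptions for illustration:

        /* Sketch only: create a UD QP whose own multicast traffic is not
         * looped back.  pd, send_cq and recv_cq are assumed to exist. */
        struct ib_qp_init_attr init_attr = {
                .send_cq      = send_cq,
                .recv_cq      = recv_cq,
                .qp_type      = IB_QPT_UD,
                .create_flags = IB_QP_CREATE_MULTICAST_BLOCK_LOOPBACK,
                .cap = {
                        .max_send_wr  = 64,
                        .max_recv_wr  = 64,
                        .max_send_sge = 1,
                        .max_recv_sge = 1,
                },
        };
        struct ib_qp *qp = ib_create_qp(pd, &init_attr);

        if (IS_ERR(qp))
                return PTR_ERR(qp);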
    • IB/mlx4: Remove extra code for RESET->ERR QP state transition · 7c27f358
      Roland Dreier authored
      Commit 65adfa91 ("IB/mlx4: Fix RESET to RESET and RESET to ERROR
      transitions") added some extra code to handle a QP state transition
      from RESET to ERROR.  However, the latest 1.2.1 version of the IB spec
      has clarified that this transition is actually not allowed, so we can
      remove this extra code again.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      7c27f358
    • IB/mlx4: Pass congestion management class MADs to the HCA · 6578cf33
      Eli Cohen authored
      ConnectX HCAs support the IB_MGMT_CLASS_CONG_MGMT management class, so
      process MADs of this class through the MAD_IFC firmware command.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      6578cf33
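      In rough form, the change widens the management-class filter so that congestion-management MADs also reach MAD_IFC; a hedged sketch of such a check, where forward_to_mad_ifc() is a hypothetical stand-in for the firmware-command path:

        /* Sketch: accept one more management class in the MAD filter. */
        switch (mad->mad_hdr.mgmt_class) {
        case IB_MGMT_CLASS_PERF_MGMT:
        case IB_MGMT_CLASS_CONG_MGMT:           /* newly accepted class */
                return forward_to_mad_ifc(dev, port_num, mad);
        default:
                return IB_MAD_RESULT_SUCCESS;   /* other classes handled elsewhere */
        }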
    • IB/mlx4: Configure QPs' max message size based on real device capability · d1f2cd89
      Eli Cohen authored
      ConnectX returns the max message size it supports through the
      QUERY_DEV_CAP firmware command.  When modifying a QP to RTR, the max
      message size for the QP must be specified.  This value must not exceed
      the value declared through QUERY_DEV_CAP.  The current code ignores
      the max allowed size and unconditionally sets the value to 2^31.  This
      patch sets all QPs to the max value allowed as returned from firmware.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      d1f2cd89
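      The gist, as a hedged sketch of the QP context setup (field and capability names follow the mlx4 driver but should be treated as assumptions):

        /* Before (conceptually): always advertise 2^31 bytes. */
        /* context->mtu_msgmax = (attr->path_mtu << 5) | 31; */

        /* After: encode the log2 of the limit reported by QUERY_DEV_CAP. */
        context->mtu_msgmax = (attr->path_mtu << 5) |
                              ilog2(dev->dev->caps.max_msg_sz);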
    • RDMA/core: Add memory management extensions support · 00f7ec36
      Steve Wise authored
      This patch adds support for the IB "base memory management extension"
      (BMME) and the equivalent iWARP operations (which the iWARP verbs
      specification mandates all devices implement).  The new operations are:
      
       - Allocate an ib_mr for use in fast register work requests.
      
       - Allocate/free physical buffer lists for use in fast register work
         requests.  This allows device drivers to allocate this memory as
         needed for use in posting send requests (e.g. via dma_alloc_coherent()).
      
       - New send queue work requests:
         * send with remote invalidate
         * fast register memory region
         * local invalidate memory region
         * RDMA read with invalidate local memory region (iWARP only)
      
      Consumer interface details:
      
       - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
         to indicate device support for these features.
      
       - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
         IB_WR_RDMA_READ_WITH_INV are added.
      
       - A new consumer API function, ib_alloc_mr(), is added to allocate
         fast register memory regions.
      
       - New consumer API functions, ib_alloc_fast_reg_page_list() and
         ib_free_fast_reg_page_list() are added to allocate and free
         device-specific memory for fast registration page lists.
      
       - A new consumer API function, ib_update_fast_reg_key(), is added to
         allow the key portion of the R_Key and L_Key of a fast registration
         MR to be updated.  Consumers call this if desired before posting
         an IB_WR_FAST_REG_MR work request.
      
      Consumers can use this as follows (see the sketch after this entry):
      
       - MR is allocated with ib_alloc_mr().
      
       - Page list memory is allocated with ib_alloc_fast_reg_page_list().
      
       - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().
      
       - MR is made VALID and bound to a specific page list via
         ib_post_send(IB_WR_FAST_REG_MR).

       - MR is made INVALID via ib_post_send(IB_WR_LOCAL_INV),
         ib_post_send(IB_WR_RDMA_READ_WITH_INV), or an incoming send with
         invalidate operation.

       - MR is deallocated with ib_dereg_mr().

       - Page lists are deallocated via ib_free_fast_reg_page_list().
      
      Applications can allocate a fast register MR once, and then can
      repeatedly bind the MR to different physical block lists (PBLs) via
      posting work requests to a send queue (SQ).  For each outstanding
      MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
      allocated (the fast_reg_page_list is owned by the low-level driver
      from the consumer posting a work request until the request completes).
      Thus pipelining can be achieved while still allowing device-specific
      page_list processing.
      
      The 32-bit fast register memory key/STag is composed of a 24-bit index
      and an 8-bit key.  The application can change the key each time it
      fast registers, thus allowing more control over the peer's use of the
      key/STag (i.e. it can effectively be changed each time the rkey is
      rebound to a page list).
      Signed-off-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      00f7ec36
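      Putting the steps above together, a hedged consumer-side sketch (function names are as listed in this entry; exact signatures, the page-list length, error handling, and the fast-register WR field layout are assumptions):

        struct ib_mr *mr;
        struct ib_fast_reg_page_list *pl;
        struct ib_send_wr frwr = {}, inv = {}, *bad;
        u8 key = 0;

        mr = ib_alloc_mr(pd, 32);                   /* name as given above */
        pl = ib_alloc_fast_reg_page_list(pd->device, 32);

        ib_update_fast_reg_key(mr, ++key);          /* pick a fresh 8-bit key */

        frwr.opcode = IB_WR_FAST_REG_MR;            /* bind MR to the page list */
        /* ... fill the fast-register fields of the WR (page list, iova,
         *     length, access flags) as the verbs layer requires ... */
        ib_post_send(qp, &frwr, &bad);              /* MR becomes VALID */

        inv.opcode = IB_WR_LOCAL_INV;               /* later: invalidate the MR */
        /* ... set the rkey to invalidate ... */
        ib_post_send(qp, &inv, &bad);

        ib_free_fast_reg_page_list(pl);
        ib_dereg_mr(mr);

      As the entry notes, one fast_reg_page_list is needed per MR-to-PBL binding outstanding in the send queue, so a consumer that pipelines registrations keeps several page lists in flight.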
    • IB/mlx4: Optimize QP stamping · 9670e553
      Eli Cohen authored
      The idea is that for QPs with fixed-size work requests (e.g. selective
      signaling QPs), before stamping the WQE, we read the value of the DS
      field, which gives the effective size of the descriptor as used in the
      previous post.  Then we stamp only that area, since the rest of the
      descriptor is already stamped.
      
      When initializing the send queue buffer, make sure the DS field is
      initialized to the max descriptor size so that the subsequent stamping
      will be done on the entire descriptor area.
      Signed-off-by: Eli Cohen <eli@mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      9670e553
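      A hedged sketch of the optimized stamping (control-segment layout, mask, and stride follow the mlx4 driver and are assumptions here):

        struct mlx4_wqe_ctrl_seg *ctrl;
        int used, off;

        /* Read the DS field left by the previous post: descriptor size in
         * 16-byte units, i.e. how many bytes were actually written. */
        ctrl = get_send_wqe(qp, index & (qp->sq.wqe_cnt - 1));
        used = (ctrl->fence_size & 0x3f) << 4;

        /* Stamp only that area; the rest of the descriptor is still
         * stamped from before. */
        for (off = 64; off < used; off += 64)
                *(__be32 *)((char *)ctrl + off) = cpu_to_be32(0xffffffff);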
  2. 21 May 2008, 1 commit
    • IB/mlx4: Fix creation of kernel QP with max number of send s/g entries · cd155c1c
      Roland Dreier authored
      When creating a kernel QP where the consumer asked for a send queue
      with lots of scatter/gather entries, set_kernel_sq_size() incorrectly
      returned an error if the send queue stride is larger than the
      hardware's maximum send work request descriptor size.  This is not a
      problem; the only issue is to make sure that the actual descriptors
      used do not overflow the maximum descriptor size, so check this instead.
      
      Clamp the returned max_send_sge value to be no bigger than what
      query_device returns for the max_sge to avoid confusing hapless users,
      even if the hardware is capable of handling a few more s/g entries.
      
      This bug caused NFS/RDMA mounts to fail when the server adapter used
      the mlx4 driver.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      cd155c1c
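      In rough pseudo-form, the two adjustments are (names follow the mlx4 driver and query_device() output, treated as assumptions):

        /* 1) Reject only if the descriptor that would actually be built
         *    exceeds the hardware's max send descriptor size, rather than
         *    comparing against the (power-of-2) send queue stride. */
        if (s > dev->dev->caps.max_sq_desc_sz)
                return -EINVAL;

        /* 2) Clamp the value reported back to the consumer so it never
         *    exceeds what query_device() advertises as max_sge. */
        cap->max_send_sge = min(qp->sq.max_gs, dev->dev->caps.max_sq_sg);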
  3. 17 May 2008, 1 commit
  4. 01 May 2008, 1 commit
    • IB/mlx4: Fix off-by-one errors in calls to mlx4_ib_free_cq_buf() · 3ae15e16
      Roland Dreier authored
      When I merged bbf8eed1 ("IB/mlx4: Add support for resizing CQs") I
      changed things around so that mlx4_ib_alloc_cq_buf() and
      mlx4_ib_free_cq_buf() were used everywhere they could be.  However, I
      screwed up the number of entries passed into mlx4_ib_alloc_cq_buf()
      in a couple places -- the function bumps the number of entries
      internally, so the caller shouldn't add 1 as well.
      
      Passing a too-big value for the number of entries to mlx4_ib_free_cq_buf()
      can cause the cleanup to go off the end of an array and corrupt
      allocator state in interesting ways.
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      3ae15e16
  5. 30 Apr 2008, 1 commit
  6. 29 Apr 2008, 1 commit
    • IB: expand ib_umem_get() prototype · cb9fbc5c
      Arthur Kepner authored
      Add a new parameter, dmasync, to the ib_umem_get() prototype.  Use dmasync = 1
      when mapping user-allocated CQs with ib_umem_get().
      Signed-off-by: Arthur Kepner <akepner@sgi.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cb9fbc5c
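      A hedged sketch of the expanded call as a CQ-mapping caller would use it (the buffer address and size come from a hypothetical user command structure):

        /* Map the user buffer backing a CQ; pass dmasync = 1 as the new
         * trailing argument so the DMA mapping is created with the
         * stricter sync/write-ordering semantics. */
        struct ib_umem *umem;

        umem = ib_umem_get(context, ucmd.buf_addr, buf_size,
                           IB_ACCESS_LOCAL_WRITE, 1 /* dmasync */);
        if (IS_ERR(umem))
                return PTR_ERR(umem);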
  7. 24 Apr 2008, 1 commit
  8. 20 Apr 2008, 1 commit
  9. 17 Apr 2008, 12 commits
  10. 15 Feb 2008, 1 commit
  11. 09 Feb 2008, 1 commit
    • IB/mlx4: Use multiple WQ blocks to post smaller send WQEs · ea54b10c
      Jack Morgenstein authored
      ConnectX HCA supports shrinking WQEs, so that a single work request
      can be made of multiple units of wqe_shift.  This way, WRs can differ
      in size, and do not have to be a power of 2 in size, saving memory and
      speeding up send WR posting.  Unfortunately, if we do this then the
      wqe_index field in CQEs can't be used to look up the WR ID anymore, so
      our implementation does this only if selective signaling is off.
      
      Further, on 32-bit platforms, we can't use vmap() to make the QP
      buffer virtually contiguous. Thus we have to use constant-sized WRs to
      make sure a WR is always fully within a single page-sized chunk.
      
      Finally, we use WRs with the NOP opcode to avoid wrapping around the
      queue buffer in the middle of posting a WR, and we set the
      NoErrorCompletion bit to avoid getting completions with error for NOP
      WRs.  However, NEC is only supported starting with firmware 2.2.232,
      so we use constant-sized WRs for older firmware.  And, since MLX QPs
      only support SEND, we use constant-sized WRs in this case.
      
      When stamping during NOP posting, do stamping following setting of the
      NOP WQE valid bit.
      Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      ea54b10c
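      The ordering point in the last paragraph, as a hedged sketch of posting a NOP WQE (opcode and flag names follow the mlx4 headers; the surrounding descriptor setup is assumed):

        /* Finish writing the NOP descriptor first... */
        ctrl->fence_size = size / 16;

        /* ...make sure it is globally visible... */
        wmb();

        /* ...then publish it by writing the opcode/ownership word, with the
         * NoErrorCompletion (NEC) bit set so the NOP never completes in error. */
        ctrl->owner_opcode = cpu_to_be32(MLX4_OPCODE_NOP | MLX4_WQE_CTRL_NEC) |
                (index & qp->sq.wqe_cnt ? cpu_to_be32(1 << 31) : 0);

        /* Only after the valid bit is set is the spare area stamped. */
        stamp_send_wqe(qp, index + qp->sq_spare_wqes, size);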
  12. 07 Feb 2008, 1 commit
  13. 05 Feb 2008, 2 commits
  14. 26 Jan 2008, 1 commit
    • IB/mlx4: Micro-optimize mlx4_ib_poll_one() · b3226184
      Roland Dreier authored
      Rather than byte-swapping cqe->g_mlpath_rqpn each time we extract a
      field from it, byte-swap it once into a temporary variable.  This 
      results in smaller, better code -- e.g., on 32-bit x86:
      
      add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-5 (-5)
      function                                     old     new   delta
      mlx4_ib_poll_cq                             1188    1183      -5
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      b3226184
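      Concretely, the pattern is to swap once into a local variable and then mask the copy (bit positions follow the mlx4 CQE layout and should be treated as assumptions):

        /* One byte swap instead of one per extracted field. */
        u32 g_mlpath_rqpn = be32_to_cpu(cqe->g_mlpath_rqpn);

        wc->src_qp         = g_mlpath_rqpn & 0xffffff;
        wc->wc_flags      |= g_mlpath_rqpn & 0x80000000 ? IB_WC_GRH : 0;
        wc->dlid_path_bits = (g_mlpath_rqpn >> 24) & 0x7f;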
  15. 09 Jan 2008, 1 commit
  16. 31 Oct 2007, 1 commit
  17. 19 Oct 2007, 1 commit
  18. 10 Oct 2007, 5 commits
  19. 24 Sep 2007, 1 commit
    • IB/mlx4: Fix data corruption triggered by wrong headroom marking order · 6e694ea3
      Jack Morgenstein authored
      This is an addendum to commit 0e6e7416 ("IB/mlx4: Handle new FW
      requirement for send request prefetching").  We also need to handle
      prefetch marking properly for S/G segments, or else the HCA may end up
      processing S/G segments that are not fully written and end up sending
      the wrong data.  This can actually cause data corruption in practice,
      especially on systems with relatively slow CPUs (where the HCA is more
      likely to prefetch while the CPU is in the middle of writing a work
      request into memory).
      
      We write S/G segments in reverse order into the WQE, in order to
      guarantee that the first dword of all cachelines containing S/G
      segments is written last (overwriting the headroom invalidation
      pattern).  The entire cacheline will thus contain valid data when the
      invalidation pattern is overwritten.
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Roland Dreier <rolandd@cisco.com>
      6e694ea3
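      A hedged sketch of the reverse-order write described above (segment layout and helper shape follow the mlx4 driver; the exact code is an assumption):

        /* Within each S/G segment, byte_count is the first dword of the
         * cacheline, so it is written last, behind a write barrier: the
         * headroom invalidation pattern is only overwritten once the rest
         * of the segment holds valid data. */
        static void set_data_seg(struct mlx4_wqe_data_seg *dseg,
                                 struct ib_sge *sg)
        {
                dseg->lkey = cpu_to_be32(sg->lkey);
                dseg->addr = cpu_to_be64(sg->addr);

                wmb();

                dseg->byte_count = cpu_to_be32(sg->length);
        }

        /* In the post-send path, fill the segments back to front: */
        struct mlx4_wqe_data_seg *dseg = wqe;
        int i;

        dseg += wr->num_sge - 1;
        for (i = wr->num_sge - 1; i >= 0; --i, --dseg)
                set_data_seg(dseg, wr->sg_list + i);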