1. 26 Mar 2019, 1 commit
  2. 11 Jan 2019, 1 commit
  3. 20 Dec 2018, 2 commits
  4. 23 Nov 2018, 1 commit
  5. 21 Sep 2018, 1 commit
  6. 31 Jul 2018, 2 commits
    • IB/mlx4: Use 4K pages for kernel QP's WQE buffer · f95ccffc
      Authored by Jack Morgenstein
      In the current implementation, the driver tries to allocate contiguous
      memory, and if it fails, it falls back to 4K fragmented allocation.
      
      Once the memory is fragmented, the first allocation might take a lot
      of time, and even fail, which can cause connection failures.
      
      This patch changes the logic to always allocate with 4K granularity,
      since it's more robust and more likely to succeed.
      
      This patch was tested with Lustre and no performance degradation
      was observed.
      
      Note: This commit eliminates the "shrinking WQE" feature. This feature
      depended on using vmap to create a virtually contiguous send WQ.
      vmap use was abandoned due to problems with several processors (see the
      commit cited below). As a result, shrinking WQE was available only with
      physically contiguous send WQs. Allocating such send WQs caused the
      problems described above.
      Therefore, as a side effect of eliminating the use of large physically
      contiguous send WQs, the shrinking WQE feature became unavailable.
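      
      For illustration, a minimal sketch of the always-4K allocation policy
      described above; the structure and helper names are hypothetical, not
      the actual mlx4 code:
      
      #include <linux/kernel.h>
      #include <linux/slab.h>
      #include <linux/dma-mapping.h>
      
      /* Allocate a WQ buffer as order-0 (4K) DMA chunks instead of one large
       * physically contiguous buffer, so badly fragmented memory is not a
       * problem.  Hypothetical names, for illustration only.
       */
      struct wq_frag_buf {
              int nfrags;
              void **bufs;        /* kernel addresses of the 4K fragments */
              dma_addr_t *dma;    /* bus addresses of the 4K fragments    */
      };
      
      static int wq_alloc_fragmented(struct device *dev, size_t size,
                                     struct wq_frag_buf *buf)
      {
              int i = 0;
      
              buf->nfrags = DIV_ROUND_UP(size, PAGE_SIZE);
              buf->bufs = kcalloc(buf->nfrags, sizeof(*buf->bufs), GFP_KERNEL);
              buf->dma = kcalloc(buf->nfrags, sizeof(*buf->dma), GFP_KERNEL);
              if (!buf->bufs || !buf->dma)
                      goto err;
      
              /* Order-0 allocations only: no high-order pages are requested,
               * so this succeeds even when physical memory is fragmented.
               */
              for (i = 0; i < buf->nfrags; i++) {
                      buf->bufs[i] = dma_alloc_coherent(dev, PAGE_SIZE,
                                                        &buf->dma[i], GFP_KERNEL);
                      if (!buf->bufs[i])
                              goto err;
              }
              return 0;
      
      err:
              while (--i >= 0)
                      dma_free_coherent(dev, PAGE_SIZE, buf->bufs[i], buf->dma[i]);
              kfree(buf->bufs);
              kfree(buf->dma);
              return -ENOMEM;
      }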
      
      Warning example:
      worker/20:1: page allocation failure: order:8, mode:0x80d0
      CPU: 20 PID: 513 Comm: kworker/20:1 Tainted: G OE ------------
      Workqueue: ib_cm cm_work_handler [ib_cm]
      Call Trace:
      [<ffffffff81686d81>] dump_stack+0x19/0x1b
      [<ffffffff81186160>] warn_alloc_failed+0x110/0x180
      [<ffffffff8118a954>] __alloc_pages_nodemask+0x9b4/0xba0
      [<ffffffff811ce868>] alloc_pages_current+0x98/0x110
      [<ffffffff81184fae>] __get_free_pages+0xe/0x50
      [<ffffffff8133f6fe>] swiotlb_alloc_coherent+0x5e/0x150
      [<ffffffff81062551>] x86_swiotlb_alloc_coherent+0x41/0x50
      [<ffffffffa056b4c4>] mlx4_buf_direct_alloc.isra.7+0xc4/0x180 [mlx4_core]
      [<ffffffffa056b73b>] mlx4_buf_alloc+0x1bb/0x260 [mlx4_core]
      [<ffffffffa0b15496>] create_qp_common+0x536/0x1000 [mlx4_ib]
      [<ffffffff811c6ef7>] ? dma_pool_free+0xa7/0xd0
      [<ffffffffa0b163c1>] mlx4_ib_create_qp+0x3b1/0xdc0 [mlx4_ib]
      [<ffffffffa0b01bc2>] ? mlx4_ib_create_cq+0x2d2/0x430 [mlx4_ib]
      [<ffffffffa0b21f20>] mlx4_ib_create_qp_wrp+0x10/0x20 [mlx4_ib]
      [<ffffffffa08f152a>] ib_create_qp+0x7a/0x2f0 [ib_core]
      [<ffffffffa06205d4>] rdma_create_qp+0x34/0xb0 [rdma_cm]
      [<ffffffffa08275c9>] kiblnd_create_conn+0xbf9/0x1950 [ko2iblnd]
      [<ffffffffa074077a>] ? cfs_percpt_unlock+0x1a/0xb0 [libcfs]
      [<ffffffffa0835519>] kiblnd_passive_connect+0xa99/0x18c0 [ko2iblnd]
      
      Fixes: 73898db0 ("net/mlx4: Avoid wrong virtual mappings")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      f95ccffc
    • RDMA, core and ULPs: Declare ib_post_send() and ib_post_recv() arguments const · d34ac5cd
      Authored by Bart Van Assche
      Since neither ib_post_send() nor ib_post_recv() modify the data structure
      their second argument points at, declare that argument const. This change
      makes it necessary to declare the 'bad_wr' argument const too and also to
      modify all ULPs that call ib_post_send(), ib_post_recv() or
      ib_post_srq_recv(). This patch does not change any functionality but makes
      it possible for the compiler to verify whether the
      ib_post_(send|recv|srq_recv) really do not modify the posted work request.
      
      To make this possible, only one cast had to be introduced that casts away
      constness, namely in rpcrdma_post_recvs(). The only way I can think of to
      avoid that cast is to introduce an additional loop in that function or to
      change the data type of bad_wr from struct ib_recv_wr ** into int
      (an index that refers to an element in the work request list). However,
      both approaches would require even more extensive changes than this
      patch.
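      
      For illustration, roughly how the constified posting interfaces look
      after such a change (parameter names are illustrative):
      
      struct ib_qp;
      struct ib_srq;
      struct ib_send_wr;
      struct ib_recv_wr;
      
      int ib_post_send(struct ib_qp *qp, const struct ib_send_wr *wr,
                       const struct ib_send_wr **bad_wr);
      int ib_post_recv(struct ib_qp *qp, const struct ib_recv_wr *wr,
                       const struct ib_recv_wr **bad_wr);
      int ib_post_srq_recv(struct ib_srq *srq, const struct ib_recv_wr *wr,
                           const struct ib_recv_wr **bad_wr);
      
      /* A caller's 'bad_wr' pointer becomes const as well, e.g.:
       *
       *     const struct ib_send_wr *bad_wr;
       *     err = ib_post_send(qp, &wr, &bad_wr);
       */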
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      d34ac5cd
  7. 27 Jun 2018, 1 commit
  8. 26 Jun 2018, 1 commit
    • IB/mlx4: Add support for drain SQ & RQ · 1975acd9
      Authored by Yishai Hadas
      This patch follows the logic from ib_core but considers the internal
      device state upon executing the involved commands.
      
      Specifically, upon an internal error state, modifying a QP to the error
      state can be assumed to succeed, since each in-progress WR is going to be
      flushed in error in any case, which is what that modify command expects.
      
      In addition, since the drain should never fail, the driver makes sure
      that post_send/post_recv will succeed even if the device is already in an
      internal error state. That way, once the driver supplies the simulated/SW
      CQEs, the CQE for the drain WR is handled as well.
      
      In the internal error state, the CQE for the drain WR may be completed
      either as part of the main task that handles the error state or by the
      task that issued the drain WR.
      
      Since the above depends on scheduling, the code takes the relevant locks
      and actions to make sure that the completion handler for that WR is
      always called after the post_send/post_recv were issued, but not in
      parallel with the other task that handles the error flow.
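      
      For illustration, a minimal sketch of the generic drain pattern this
      logic mirrors: post a signalled marker WR after moving the QP to the
      error state and wait for its completion. Names and error handling are
      simplified relative to the real ib_core/mlx4 code:
      
      #include <linux/completion.h>
      #include <rdma/ib_verbs.h>
      
      struct drain_cqe {
              struct ib_cqe cqe;
              struct completion done;
      };
      
      static void drain_done(struct ib_cq *cq, struct ib_wc *wc)
      {
              struct drain_cqe *d = container_of(wc->wr_cqe, struct drain_cqe, cqe);
      
              complete(&d->done);
      }
      
      static void drain_sq_example(struct ib_qp *qp)
      {
              struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
              struct drain_cqe sdrain;
              struct ib_rdma_wr swr = {
                      .wr = {
                              .wr_cqe     = &sdrain.cqe,
                              .opcode     = IB_WR_RDMA_WRITE,
                              .send_flags = IB_SEND_SIGNALED,
                      },
              };
              const struct ib_send_wr *bad_swr;
      
              sdrain.cqe.done = drain_done;
              init_completion(&sdrain.done);
      
              /* Moving the QP to the error state flushes outstanding WRs;
               * return values are ignored here for brevity.
               */
              ib_modify_qp(qp, &attr, IB_QP_STATE);
      
              /* The marker WR completes only after everything before it. */
              ib_post_send(qp, &swr.wr, &bad_swr);
              wait_for_completion(&sdrain.done);
      }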
      Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      1975acd9
  9. 19 Jun 2018, 1 commit
  10. 16 Mar 2018, 2 commits
  11. 08 Mar 2018, 1 commit
  12. 14 Nov 2017, 2 commits
  13. 11 Nov 2017, 1 commit
  14. 24 Jul 2017, 4 commits
  15. 18 Jul 2017, 1 commit
  16. 02 May 2017, 1 commit
  17. 25 Jan 2017, 1 commit
  18. 14 Dec 2016, 1 commit
  19. 08 Oct 2016, 1 commit
    • IB/mlx4: Fix possible vl/sl field mismatch in LRH header in QP1 packets · fd10ed8e
      Authored by Jack Morgenstein
      In MLX qp packets, the LRH (built by the driver) has both a VL field
      and an SL field. When building a QP1 packet, the VL field should
      reflect the SLtoVL mapping and not arbitrarily contain zero (as is
      done now). This bug causes credit problems in IB switches at
      high rates of QP1 packets.
      
      The fix is to cache the SL to VL mapping in the driver, and look up
      the VL mapped to the SL provided in the send request when sending
      QP1 packets.
      
      For FW versions which support generating a port_management_config_change
      event with subtype sl-to-vl-table-change, the driver uses that event
      to update its sl-to-vl mapping cache.  Otherwise, the driver snoops
      incoming SMP mads to update the cache.
      
      There remains the case where the FW is running in secure-host mode
      (so no QP0 packets are delivered to the driver), and the FW does not
      generate the sl2vl mapping change event. To support this case, when
      running in secure-host mode the driver updates its sl2vl mapping cache
      (by querying the FW) whenever it receives either a Port Up event or a
      client-reregister event (where the port is still up, but there may have
      been an opensm failover).
      OpenSM modifies the sl2vl mapping before Port Up and Client-reregister
      events occur, so if there is a mapping change, the driver's cache will
      be properly updated.
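      
      For illustration, a minimal sketch of such an sl-to-vl cache; the
      struct and function names are hypothetical, not the actual mlx4 code:
      
      #include <linux/atomic.h>
      #include <linux/types.h>
      
      struct sl2vl_cache {
              atomic64_t sl2vl[2];    /* one packed 16 x 4-bit table per port */
      };
      
      /* Replace the whole 16-entry table in one shot, e.g. from the
       * sl-to-vl-table-change event handler or after querying the FW.
       */
      static void sl2vl_update(struct sl2vl_cache *c, int port, u64 packed)
      {
              atomic64_set(&c->sl2vl[port - 1], packed);
      }
      
      /* Look up the VL for the SL of a send request when building the LRH
       * of a QP1 packet, instead of hard-coding VL 0.
       */
      static u8 sl_to_vl(struct sl2vl_cache *c, int port, u8 sl)
      {
              u64 table = atomic64_read(&c->sl2vl[port - 1]);
      
              return (table >> (4 * (sl & 0xf))) & 0xf;
      }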
      
      Fixes: 225c7b1f ("IB/mlx4: Add a driver for Mellanox ConnectX InfiniBand adapters")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      fd10ed8e
  20. 17 Sep 2016, 1 commit
    • IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV · 8ec07bf8
      Authored by Jack Morgenstein
      When sending QP1 MAD packets which use a GRH, the source GID
      (which consists of the 64-bit subnet prefix, and the 64 bit port GUID)
      must be included in the packet GRH.
      
      For SR-IOV, a GID cache is used, since the source GID needs to be the
      slave's source GID, and not the Hypervisor's GID. This cache also
      included a subnet_prefix. Unfortunately, the subnet_prefix field in
      the cache was never initialized (to the default subnet prefix 0xfe80::0).
      As a result, this field remained all zeroes.  Therefore, when SR-IOV
      was active, all QP1 packets which included a GRH had a source GID
      subnet prefix of all-zeroes.
      
      However, the subnet-prefix should initially be 0xfe80::0 (the default
      subnet prefix). In addition, if OpenSM modifies a port's subnet prefix,
      the new subnet prefix must be used in the GRH when sending QP1 packets.
      To fix this we now initialize the subnet prefix in the SR-IOV GID cache
      to the default subnet prefix. We update the cached value if/when OpenSM
      modifies the port's subnet prefix. We take this cached value when sending
      QP1 packets when SR-IOV is active.
      
      Note that the value is stored as an atomic64. This eliminates any need
      for locking when the subnet prefix is being updated.
      
      Note also that we depend on the FW generating the "port management change"
      event for tracking subnet-prefix changes performed by OpenSM. If running
      early FW (before 2.9.4630), subnet prefix changes will not be tracked (but
      the default subnet prefix will still be stored in the cache; therefore
      users who do not modify the subnet prefix will not have a problem).
      If there is a need for such tracking for early FW as well, we will add
      that capability in a subsequent patch.
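      
      For illustration, a minimal sketch of an atomic64-based subnet-prefix
      cache as described above; the names are hypothetical, not the actual
      mlx4 code:
      
      #include <linux/atomic.h>
      #include <linux/types.h>
      
      #define DEFAULT_SUBNET_PREFIX 0xfe80000000000000ULL    /* fe80::/64 */
      
      struct sriov_gid_cache {
              atomic64_t subnet_prefix;
      };
      
      static void gid_cache_init(struct sriov_gid_cache *c)
      {
              atomic64_set(&c->subnet_prefix, DEFAULT_SUBNET_PREFIX);
      }
      
      /* Called from the "port management change" event handler when OpenSM
       * modifies the port's subnet prefix.
       */
      static void gid_cache_update_prefix(struct sriov_gid_cache *c, u64 prefix)
      {
              atomic64_set(&c->subnet_prefix, prefix);
      }
      
      /* Used when building the GRH source GID for a slave's QP1 packet;
       * no lock is needed against the updater above.
       */
      static u64 gid_cache_prefix(struct sriov_gid_cache *c)
      {
              return atomic64_read(&c->subnet_prefix);
      }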
      
      Fixes: 1ffeb2eb ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      8ec07bf8
  21. 04 Aug 2016, 1 commit
    • IB/mlx4: Add diagnostic hardware counters · 3f85f2aa
      Authored by Mark Bloch
      Expose IB diagnostic hardware counters.
      The counters count IB events and are applicable for IB and RoCE.
      
      The counters can be divided into two groups, per device and per port.
      Device counters are always exposed.
      Port counters are exposed only if the firmware supports per port counters.
      
      rq_num_dup and sq_num_to are only exposed if the firmware supports them;
      if it does, they are exposed both per device and per port.
      rq_num_udsdprd and num_cqovf are device-only counters.
      
      rq denotes responder; sq denotes requester.
      
      |-----------------------|---------------------------------------|
      |	Name		|	Description			|
      |-----------------------|---------------------------------------|
      |rq_num_lle		| Number of local length errors		|
      |-----------------------|---------------------------------------|
      |sq_num_lle		| Number of local length errors		|
      |-----------------------|---------------------------------------|
      |rq_num_lqpoe		| Number of local QP operation errors	|
      |-----------------------|---------------------------------------|
      |sq_num_lqpoe		| Number of local QP operation errors	|
      |-----------------------|---------------------------------------|
      |rq_num_lpe		| Number of local protection errors	|
      |-----------------------|---------------------------------------|
      |sq_num_lpe		| Number of local protection errors	|
      |-----------------------|---------------------------------------|
      |rq_num_wrfe		| Number of CQEs with error		|
      |-----------------------|---------------------------------------|
      |sq_num_wrfe		| Number of CQEs with error		|
      |-----------------------|---------------------------------------|
      |sq_num_mwbe		| Number of Memory Window bind errors	|
      |-----------------------|---------------------------------------|
      |sq_num_bre		| Number of bad response errors		|
      |-----------------------|---------------------------------------|
      |sq_num_rire		| Number of Remote Invalid request	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |rq_num_rire		| Number of Remote Invalid request	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |sq_num_rae		| Number of remote access errors	|
      |-----------------------|---------------------------------------|
      |rq_num_rae		| Number of remote access errors	|
      |-----------------------|---------------------------------------|
      |sq_num_roe		| Number of remote operation errors	|
      |-----------------------|---------------------------------------|
      |sq_num_tree		| Number of transport retries exceeded	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |sq_num_rree		| Number of RNR NAK retries exceeded	|
      |			| errors				|
      |-----------------------|---------------------------------------|
      |rq_num_rnr		| Number of RNR NAKs sent		|
      |-----------------------|---------------------------------------|
      |sq_num_rnr		| Number of RNR NAKs received		|
      |-----------------------|---------------------------------------|
      |rq_num_oos		| Number of Out of Sequence requests	|
      |			| received				|
      |-----------------------|---------------------------------------|
      |sq_num_oos		| Number of Out of Sequence NAKs	|
      |			| received				|
      |-----------------------|---------------------------------------|
      |rq_num_udsdprd		| Number of UD packets silently		|
      |			| discarded on the Receive Queue due to	|
      |			| lack of receive descriptor		|
      |-----------------------|---------------------------------------|
      |rq_num_dup		| Number of duplicate requests received	|
      |-----------------------|---------------------------------------|
      |sq_num_to		| Number of timeouts received		|
      |-----------------------|---------------------------------------|
      |num_cqovf		| Number of CQ overflows		|
      |-----------------------|---------------------------------------|
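      
      For illustration, a sketch of how such counters might be described
      inside a driver as a name/offset table; all names and offsets below are
      placeholders, not the actual mlx4 values:
      
      #include <linux/types.h>
      
      struct diag_counter {
              const char *name;
              u32 offset;     /* byte offset into the FW query result */
      };
      
      #define DIAG_COUNTER(_name, _offset) \
              { .name = #_name, .offset = _offset }
      
      /* Device-wide counters; a similar table, registered only when the FW
       * reports per-port support, would describe the per-port set.
       */
      static const struct diag_counter diag_device_counters[] = {
              DIAG_COUNTER(rq_num_lle, 0x00),     /* placeholder offsets */
              DIAG_COUNTER(sq_num_lle, 0x04),
              DIAG_COUNTER(rq_num_lqpoe, 0x08),
              DIAG_COUNTER(sq_num_lqpoe, 0x0c),
              /* ... remaining counters from the table above ... */
              DIAG_COUNTER(num_cqovf, 0xf0),
      };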
      Signed-off-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      3f85f2aa
  22. 23 Jun 2016, 1 commit
  23. 14 May 2016, 2 commits
  24. 02 Mar 2016, 1 commit
  25. 20 Jan 2016, 2 commits
  26. 24 Dec 2015, 1 commit
  27. 29 Oct 2015, 2 commits
  28. 22 Oct 2015, 3 commits
    • IB/core: Use GID table in AH creation and dmac resolution · dbf727de
      Authored by Matan Barak
      Previously, vlan id and source MAC were used from QP attributes. Since
      the net device is now stored in the GID attributes, they could be used
      instead of getting this information from the QP attributes.
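      
      For illustration, a minimal sketch of deriving the source MAC and vlan
      id from the GID entry's net device; the helper name is hypothetical:
      
      #include <linux/errno.h>
      #include <linux/if_vlan.h>
      #include <linux/netdevice.h>
      #include <linux/string.h>
      
      /* With the net device stored in the GID attributes, vlan and source
       * MAC can be derived from it rather than carried in QP attributes.
       */
      static int smac_vlan_from_ndev(struct net_device *ndev,
                                     u8 *smac, u16 *vlan_id)
      {
              if (!ndev)
                      return -ENODEV;
      
              if (is_vlan_dev(ndev))
                      *vlan_id = vlan_dev_vlan_id(ndev);
              else
                      *vlan_id = 0xffff;      /* no vlan tag */
      
              memcpy(smac, ndev->dev_addr, ETH_ALEN);
              return 0;
      }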
      
      IB_QP_SMAC, IB_QP_ALT_SMAC, IB_QP_VID and IB_QP_ALT_VID were removed
      because there is no known libibverbs that uses them.
      
      This commit also modifies the vendors (mlx4, ocrdma) drivers in order
      to use the new approach.
      
      ocrdma driver changes were done by Somnath Kotur <Somnath.Kotur@Avagotech.Com>
      Signed-off-by: Matan Barak <matanb@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      dbf727de
    • IB/mlx4: Add counter based implementation for QP multicast loopback block · 7b59f0f9
      Authored by Eran Ben Elisha
      The current implementation of MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK is not
      supported when the link layer is Ethernet.
      
      This patch adds a counter-based implementation of multicast loopback
      prevention. HW can drop multicast loopback packets if the sender QP's
      counter index is equal to the receiver QP's counter index. If the
      MLX4_IB_QP_BLOCK_MULTICAST_LOOPBACK flag is set and the link layer is
      Ethernet, create a new counter and attach it to the QP, so that it will
      continue receiving multicast loopback traffic, but not its own.
      
      The decision whether to create a new counter is made at the QP
      modification to RTR, after the QP's port is set. When the QP is destroyed
      or moved back to the reset state, the counter is deleted.
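      
      For illustration, a minimal sketch of the decision made at the RTR
      transition; all names below are hypothetical stand-ins for the driver's
      real structures and counter-allocation calls:
      
      #include <linux/types.h>
      
      struct example_qp {
              u32 flags;
              u32 counter_index;
              int port;
      };
      
      #define EXAMPLE_QP_BLOCK_MCAST_LOOPBACK (1 << 0)
      
      /* Stub standing in for the device's counter allocation call. */
      static int example_counter_alloc(int port, u32 *idx)
      {
              *idx = 1;
              return 0;
      }
      
      static int example_rtr_assign_counter(struct example_qp *qp,
                                            bool link_is_eth)
      {
              u32 idx;
              int err;
      
              if (!(qp->flags & EXAMPLE_QP_BLOCK_MCAST_LOOPBACK) || !link_is_eth)
                      return 0;       /* keep the default per-port counter */
      
              err = example_counter_alloc(qp->port, &idx);
              if (err)
                      return err;
      
              /* HW drops a multicast loopback packet when the sender's
               * counter index equals the receiver's, so a private counter
               * blocks only the QP's own loopback traffic.
               */
              qp->counter_index = idx;
              return 0;
      }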
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      7b59f0f9
    • IB/mlx4: Add IB counters table · 3ba8e31d
      Authored by Eran Ben Elisha
      This is an infrastructure step for allocating and attaching more than
      one counter to QPs on the same port. Allocate a counters table and
      manage the insertion and removal of the counters at load and unload of
      mlx4 IB.
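      
      For illustration, a minimal sketch of such a counters table; the names
      are hypothetical, not the actual mlx4_ib structures:
      
      #include <linux/list.h>
      #include <linux/mutex.h>
      #include <linux/types.h>
      
      struct counter_entry {
              struct list_head list;
              u32 index;
      };
      
      struct counters_table {
              struct list_head counters_list;
              struct mutex mutex;     /* protects counters_list */
              u32 default_counter;
      };
      
      /* Set up at mlx4 IB load time. */
      static void counters_table_init(struct counters_table *t)
      {
              INIT_LIST_HEAD(&t->counters_list);
              mutex_init(&t->mutex);
      }
      
      /* Counters allocated for a port are inserted here and removed again
       * at unload (removal path omitted for brevity).
       */
      static void counters_table_add(struct counters_table *t,
                                     struct counter_entry *e)
      {
              mutex_lock(&t->mutex);
              list_add_tail(&e->list, &t->counters_list);
              mutex_unlock(&t->mutex);
      }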
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      3ba8e31d