1. 30 Nov 2018, 1 commit
    • net/mlx5: Driver events notifier API · 20902be4
      Saeed Mahameed authored
      Use an atomic notifier chain to fire events to mlx5 core driver
      consumers (mlx5e/mlx5_ib), and provide an mlx5 register/unregister
      notifier API (see the usage sketch below).
      
      This API will replace the current mlx5_interface->event callback and
      all the logic around it, especially the delayed-events logic introduced
      by commit 97834eba ("net/mlx5: Delay events till ib registration ends"),
      which is no longer needed with the new API, since an mlx5 interface
      can dynamically register/unregister its notifier.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      20902be4
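      
      A minimal sketch of how a consumer such as mlx5e might use the new
      notifier API. The helper name mlx5_notifier_register() follows the
      commit's description, but the mlx5e_priv fields used here (events_nb,
      wq, update_carrier_work) are assumptions for illustration:
      
          #include <linux/notifier.h>
          #include <linux/mlx5/driver.h>
          
          /* Illustrative handler: called on the core driver's atomic
           * notifier chain for every firmware/driver event. */
          static int mlx5e_dev_event(struct notifier_block *nb,
                                     unsigned long event, void *data)
          {
                  struct mlx5e_priv *priv =
                          container_of(nb, struct mlx5e_priv, events_nb);
          
                  switch (event) {
                  case MLX5_EVENT_TYPE_PORT_CHANGE:
                          /* e.g. refresh carrier state out of atomic context */
                          queue_work(priv->wq, &priv->update_carrier_work);
                          break;
                  default:
                          break;
                  }
                  return NOTIFY_OK;
          }
          
          static int mlx5e_events_init(struct mlx5e_priv *priv)
          {
                  priv->events_nb.notifier_call = mlx5e_dev_event;
                  /* Registration is dynamic, which is what removes the need
                   * for the old delayed-events machinery. */
                  return mlx5_notifier_register(priv->mdev, &priv->events_nb);
          }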
  2. 27 Nov 2018, 5 commits
  3. 21 Nov 2018, 8 commits
  4. 19 Oct 2018, 2 commits
  5. 17 Oct 2018, 1 commit
  6. 11 Oct 2018, 2 commits
    • net/mlx5: WQ, fixes for fragmented WQ buffers API · 37fdffb2
      Tariq Toukan authored
      The mlx5e netdevice used to calculate fragment edges with a call to
      mlx5_wq_cyc_get_frag_size(). This calculation gave the wrong result
      for queues smaller than a PAGE_SIZE (broken by default on PowerPC,
      where PAGE_SIZE == 64KB). Here it is replaced by the correct new
      API calls.
      
      Since (TX/RX) Work Queue buffers are fragmented, we introduce
      changes to the core driver API so that it takes a stride index and
      returns the index of the last stride on the same fragment, plus an
      additional wrapping function that returns the number of physically
      contiguous strides that can be written to the work queue (sketched
      below).
      
      This obsoletes the following API functions, and their buggy
      usage in the EN driver:
      * mlx5_wq_cyc_get_frag_size()
      * mlx5_wq_cyc_ctr2fragix()
      
      The new API improves modularity and hides the details of this
      calculation from the mlx5e netdevice and mlx5_ib RDMA drivers.
      
      The new calculation is also more efficient, and improves performance
      as follows:
      
      Packet rate test: pktgen, UDP / IPv4, 64byte, single ring, 8K ring size.
      
      Before: 16,477,619 pps
      After:  17,085,793 pps
      
      3.7% improvement
      
      Fixes: 3a2f7033 ("net/mlx5: Use order-0 allocations for all WQ types")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      37fdffb2
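      
      A sketch of the shape of the new fragment-aware helpers, assuming a
      fragment holds a power-of-two number of strides; the names
      frag_last_stride() and contig_strides() are hypothetical stand-ins for
      the new core API, not the upstream function names:
      
          /* log_frag_strides is log2(strides per fragment). */
          static inline u16 frag_last_stride(u16 log_frag_strides, u16 ix)
          {
                  u16 frag_strides = 1 << log_frag_strides;
          
                  /* Index of the last stride on the same fragment as ix. */
                  return ix | (frag_strides - 1);
          }
          
          static inline u16 contig_strides(u16 log_frag_strides, u16 ix)
          {
                  /* Number of strides writable contiguously from ix before
                   * crossing a fragment boundary. */
                  return frag_last_stride(log_frag_strides, ix) - ix + 1;
          }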
    • RDMA/netdev: Hoist alloc_netdev_mqs out of the driver · f6a8a19b
      Denis Drozdov authored
      netdev has several interfaces that expect to call alloc_netdev_mqs from
      the core code, with the driver only providing the arguments.  This is
      incompatible with the rdma_netdev interface that returns the netdev
      directly.
      
      Thus re-organize the API used by ipoib so that the verbs core code calls
      alloc_netdev_mqs for the driver. This is done by allowing the drivers to
      provide the allocation parameters via a 'get_params' callback and then
      initializing an allocated netdev as a second step.
      
      Fixes: cd565b4b ("IB/IPoIB: Support acceleration options callbacks")
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Denis Drozdov <denisd@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      f6a8a19b
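      
      A sketch of the two-step pattern described above: the driver reports its
      allocation parameters via a callback, and the verbs core performs the
      alloc_netdev_mqs() call itself. The struct and callback names here
      (rdma_netdev_params, get_params, init_netdev) are illustrative, not the
      exact upstream signatures:
      
          #include <linux/err.h>
          #include <linux/netdevice.h>
          #include <rdma/ib_verbs.h>
          
          /* Illustrative parameter block the driver fills in instead of
           * allocating the netdev itself. */
          struct rdma_netdev_params {
                  int sizeof_priv;        /* driver private area size */
                  unsigned int txqs;      /* TX queue count */
                  unsigned int rxqs;      /* RX queue count */
          };
          
          static struct net_device *
          rdma_core_alloc_netdev(struct ib_device *ibdev,
                                 int (*get_params)(struct ib_device *,
                                                   struct rdma_netdev_params *),
                                 int (*init_netdev)(struct ib_device *,
                                                    struct net_device *),
                                 void (*setup)(struct net_device *))
          {
                  struct rdma_netdev_params p;
                  struct net_device *ndev;
                  int err;
          
                  /* Step 1: ask the driver for its allocation parameters. */
                  err = get_params(ibdev, &p);
                  if (err)
                          return ERR_PTR(err);
          
                  /* Step 2: the core allocates, then the driver initializes. */
                  ndev = alloc_netdev_mqs(p.sizeof_priv, "ib%d", NET_NAME_UNKNOWN,
                                          setup, p.txqs, p.rxqs);
                  if (!ndev)
                          return ERR_PTR(-ENOMEM);
          
                  err = init_netdev(ibdev, ndev);
                  if (err) {
                          free_netdev(ndev);
                          return ERR_PTR(err);
                  }
                  return ndev;
          }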
  7. 02 Oct 2018, 1 commit
  8. 25 Sep 2018, 1 commit
  9. 06 Sep 2018, 8 commits
    • net/mlx5e: Replace PTP clock lock from RW lock to seq lock · 64109f1d
      Shay Agroskin authored
      Changed the "priv.clock.lock" lock from 'rw_lock' to 'seq_lock'
      in order to improve packet rate performance (see the sketch below).
      
      Tested on Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz.
      Sent 64b packets between two peers connected by ConnectX-5,
      and measured packet rate for the receiver in three modes:
      	no time-stamping (base rate)
      	time-stamping using rw_lock (old lock) for critical region
      	time-stamping using seq_lock (new lock) for critical region
      Only the receiver time stamped its packets.
      
      The measured packet rate improvements are:
      
      	Single flow (multiple TX rings to single RX ring):
      		without timestamping:	  4.26 (M packets)/sec
      		with rw-lock (old lock):  4.1  (M packets)/sec
      		with seq-lock (new lock): 4.16 (M packets)/sec
      		1.46% improvement
      
      	Multiple flows (multiple TX rings to six RX rings):
      		without timestamping: 	  22   (M packets)/sec
      		with rw-lock (old lock):  11.7 (M packets)/sec
      		with seq-lock (new lock): 21.3 (M packets)/sec
      		82.05% improvement
      
      The packet rate improvement comes from the seq-lock sparing the
      'readers' any atomic operations. Since 'readers' greatly outnumber
      'writers' contending on this lock, almost all atomic operations are
      saved; this results in a dramatic decrease in overall cache misses.
      Signed-off-by: Shay Agroskin <shayag@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      64109f1d
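      
      A minimal sketch of the rwlock-to-seqlock conversion for a mostly-read
      timestamp path, using the kernel's seqlock API; the struct and field
      names are illustrative, not the actual mlx5 clock structure:
      
          #include <linux/seqlock.h>
          
          struct tstamp_clock {
                  seqlock_t lock;         /* was: rwlock_t */
                  u64 nsec;               /* state read on every RX timestamp */
          };
          
          /* Reader (per-packet hot path): no atomic read-modify-write, just
           * sequence checks; retries only if a writer raced with us. */
          static u64 clock_read_ns(struct tstamp_clock *clock)
          {
                  unsigned int seq;
                  u64 ns;
          
                  do {
                          seq = read_seqbegin(&clock->lock);
                          ns = clock->nsec;
                  } while (read_seqretry(&clock->lock, seq));
          
                  return ns;
          }
          
          /* Writer (rare, e.g. clock adjustment): exclusive access. */
          static void clock_update_ns(struct tstamp_clock *clock, u64 ns)
          {
                  write_seqlock(&clock->lock);
                  clock->nsec = ns;
                  write_sequnlock(&clock->lock);
          }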
    • net/mlx5: Add flow counters idr · 12d6066c
      Vlad Buslov authored
      The previous patch in the series changed the flow counter storage
      structure from an rb_tree to a linked list in order to improve flow
      counter traversal performance. The drawback of that solution is that
      flow counter lookup by id becomes linear in complexity.
      
      Store pointers to flow counters in an idr in order to make lookup
      performance logarithmic again (see the sketch below). The idr is a
      non-intrusive data structure and doesn't require extending the flow
      counter struct with new elements. This means the idr can be used for
      lookup, while the linked list from the previous patch is used for
      traversal, and struct mlx5_fc stays <= 2 cache lines in size.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Amir Vadai <amir@vadai.me>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      12d6066c
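      
      A sketch of the pairing this commit describes: an idr keyed by the
      counter id for fast lookup, next to the list used for traversal. The
      struct names and the embedded list member are assumptions for
      illustration:
      
          #include <linux/idr.h>
          #include <linux/list.h>
          
          struct fc_stats {
                  struct idr counters_idr;   /* id -> counter, for lookup */
                  struct list_head counters; /* for cheap full traversal */
          };
          
          /* Insert the counter under its hardware-assigned id. The idr is
           * non-intrusive: it stores a pointer, so the counter struct needs
           * no new members for the lookup structure itself. */
          static int fc_insert(struct fc_stats *stats, struct mlx5_fc *counter,
                               u32 id)
          {
                  int err;
          
                  err = idr_alloc_u32(&stats->counters_idr, counter, &id, id,
                                      GFP_KERNEL);
                  if (err)
                          return err;
          
                  list_add_tail(&counter->list, &stats->counters);
                  return 0;
          }
          
          /* Lookup by id stays fast even though traversal uses the list. */
          static struct mlx5_fc *fc_lookup(struct fc_stats *stats, u32 id)
          {
                  return idr_find(&stats->counters_idr, id);
          }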
    • net/mlx5: Store flow counters in a list · 9aff93d7
      Vlad Buslov authored
      In order to improve the performance of the flow counter stats query
      loop that traverses all configured flow counters, replace the rb_tree
      with a doubly linked list. This change speeds up flow counter traversal
      by removing the tree walk (profiling data showed that the call to
      rb_next was the top CPU consumer).
      
      However, lookup of a flow counter in the list becomes linear instead of
      logarithmic. This problem is fixed by the next patch in the series,
      which adds an idr for fast lookup. The idr is used because it is not an
      intrusive data structure and doesn't require adding any new members to
      struct mlx5_fc, which allows its control data part to stay <= 1 cache
      line in size.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Amir Vadai <amir@vadai.me>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      9aff93d7
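      
      A brief sketch of the stats-query loop after the change: a linear list
      walk where each step is a single next-pointer dereference, with no
      rb_next() tree walk. The embedded list member and the per-counter query
      helper are illustrative:
      
          #include <linux/list.h>
          
          static void fc_stats_query_all(struct fc_stats *stats)
          {
                  struct mlx5_fc *counter;
          
                  /* Traverse every configured counter in list order. */
                  list_for_each_entry(counter, &stats->counters, list)
                          fc_query_one(counter); /* hypothetical query helper */
          }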
    • net/mlx5: Add new list to store deleted flow counters · 6e5e2283
      Vlad Buslov authored
      In order to prevent the flow counter stats work function from
      traversing the whole flow counter tree while searching for deleted flow
      counters, a new list that stores deleted flow counters is added to
      struct mlx5_fc_stats. A lockless NULL-terminated singly linked list
      data type (sketched below) is used for the following reasons:
       - This use case only needs to add single elements to the list and
       remove/iterate the whole list. A lockless list doesn't require any
       additional synchronization for these operations.
       - The first cache line of the flow counter data structure only has
       space to store a single additional pointer, which precludes using a
       doubly linked list.
      
      Remove the flow counter 'deleted' flag, which is no longer needed.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Amir Vadai <amir@vadai.me>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      6e5e2283
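      
      A sketch of the lockless-list pattern described above, using the
      kernel's llist API: producers add one node at a time with llist_add(),
      and the stats work detaches the whole list at once with
      llist_del_all(). The struct layout and helper names are illustrative:
      
          #include <linux/llist.h>
          
          struct fc_stats {
                  struct llist_head dellist; /* deleted counters pending release */
          };
          
          struct mlx5_fc {
                  struct llist_node dellist; /* one pointer: fits the first cache line */
          };
          
          /* Producer: mark a counter deleted; lock-free single-element add. */
          static void fc_mark_deleted(struct fc_stats *stats,
                                      struct mlx5_fc *counter)
          {
                  llist_add(&counter->dellist, &stats->dellist);
          }
          
          /* Consumer (stats work): atomically detach the whole list and walk
           * it, without locks and without scanning the live counters. */
          static void fc_release_deleted(struct fc_stats *stats)
          {
                  struct llist_node *list = llist_del_all(&stats->dellist);
                  struct mlx5_fc *counter, *tmp;
          
                  llist_for_each_entry_safe(counter, tmp, list, dellist)
                          fc_free(counter); /* hypothetical release helper */
          }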
    • net/mlx5: Change flow counters addlist type to single linked list · 83033688
      Vlad Buslov authored
      In order to prevent the flow counter stats work function from
      traversing the whole flow counter tree while searching for deleted flow
      counters, a new list that stores deleted flow counters will be added to
      struct mlx5_fc_stats. However, the flow counter structure itself has no
      space left in its first cache line to store any more data. To free the
      space needed to store the additional list node, convert the current
      addlist doubly linked list (two pointers per node) to an atomic singly
      linked list (one pointer per node); see the struct sketch after this
      message.
      
      The lockless NULL-terminated singly linked list data type doesn't
      require any additional external synchronization for the operations used
      by the flow counters module (add a single new element, remove all
      elements from the list, and traverse them). Remove addlist_lock, which
      is no longer needed.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Amir Vadai <amir@vadai.me>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      83033688
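      
      A short illustration of the space argument: an intrusive list_head
      costs two pointers per node, while an llist_node costs one, which is
      exactly the pointer freed up for the new deleted-counters list. This is
      a simplified, hypothetical view of the counter struct's first cache
      line, not the real mlx5_fc layout:
      
          #include <linux/list.h>
          #include <linux/llist.h>
          
          /* Before: both lists are doubly linked (2 pointers each). */
          struct fc_before {
                  struct list_head list;     /* traversal list: 2 pointers */
                  struct list_head addlist;  /* pending-add list: 2 pointers */
                  /* ... no room left in the first cache line ... */
          };
          
          /* After: addlist shrinks to one pointer, freeing room for dellist
           * without growing past the first cache line. */
          struct fc_after {
                  struct list_head list;     /* traversal list: 2 pointers */
                  struct llist_node addlist; /* pending-add: 1 pointer */
                  struct llist_node dellist; /* pending-delete: 1 pointer */
          };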
    • net/mlx5: Use u16 for Work Queue buffer strides offset · a0903622
      Tariq Toukan authored
      The minimal stride size is 16 bytes.
      Hence, the number of strides in a fragment (of PAGE_SIZE)
      is <= PAGE_SIZE / 16 <= 4K.
      
      A u16 is sufficient to represent this.
      
      Fixes: d7037ad7 ("net/mlx5: Fix QP fragmented buffer allocation")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      a0903622
    • net/mlx5: Use u16 for Work Queue buffer fragment size · 8d71e818
      Tariq Toukan authored
      The minimal stride size is 16 bytes.
      Hence, the number of strides in a fragment (of PAGE_SIZE)
      is <= PAGE_SIZE / 16 <= 4K.
      
      A u16 is sufficient to represent this.
      
      Fixes: 388ca8be ("IB/mlx5: Implement fragmented completion queue (CQ)")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      8d71e818
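      
      Both commits rest on the same arithmetic: with the largest common
      PAGE_SIZE of 64KB (PowerPC), 64K / 16 = 4K strides per fragment, and
      4096 <= U16_MAX = 65535. A hypothetical compile-time guard for this
      invariant could look like:
      
          #include <linux/build_bug.h>
          #include <linux/kernel.h>
          
          #define MIN_STRIDE_SZ 16 /* minimal WQ stride, in bytes */
          
          static inline void wq_stride_width_check(void)
          {
                  /* Strides per PAGE_SIZE fragment must fit in a u16. */
                  BUILD_BUG_ON(PAGE_SIZE / MIN_STRIDE_SZ > U16_MAX);
          }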
    • net/mlx5: Fix use-after-free in self-healing flow · 76d5581c
      Jack Morgenstein authored
      When the mlx5 health mechanism detects a problem while the driver
      is in the middle of init_one or remove_one, the driver needs to prevent
      the health mechanism from scheduling future work; if future work
      is scheduled, there is a use-after-free problem: the system workqueue
      tries to run the work item (which has already been freed) at its
      scheduled future time.
      
      Prevent this by disabling work item scheduling in the health mechanism
      when the driver is in the middle of init_one() or remove_one(); a
      sketch of this guard pattern follows the message.
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Reviewed-by: Feras Daoud <ferasda@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      76d5581c
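      
      A sketch of the guard pattern the fix describes: a flag, protected by a
      lock, that the health poll checks before queueing work, and that
      init_one()/remove_one() clear while they own the device. All names here
      are illustrative, not the actual mlx5 health structures:
      
          #include <linux/spinlock.h>
          #include <linux/workqueue.h>
          
          struct health {
                  spinlock_t lock;
                  bool work_enabled; /* cleared around init_one()/remove_one() */
                  struct work_struct work;
          };
          
          /* Health poll path (may run in timer context): only schedule the
           * work item while scheduling is still allowed. */
          static void health_queue_work(struct health *h)
          {
                  unsigned long flags;
          
                  spin_lock_irqsave(&h->lock, flags);
                  if (h->work_enabled)
                          schedule_work(&h->work);
                  spin_unlock_irqrestore(&h->lock, flags);
          }
          
          /* Called before teardown in init_one()/remove_one(): forbid new
           * scheduling, then flush anything already queued so the work item
           * can never run against freed memory. */
          static void health_disable_work(struct health *h)
          {
                  unsigned long flags;
          
                  spin_lock_irqsave(&h->lock, flags);
                  h->work_enabled = false;
                  spin_unlock_irqrestore(&h->lock, flags);
                  cancel_work_sync(&h->work);
          }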
  10. 04 Sep 2018, 1 commit
  11. 03 Aug 2018, 1 commit
    • RDMA/netdev: Use priv_destructor for netdev cleanup · 9f49a5b5
      Jason Gunthorpe authored
      Now that the unregister_netdev flow for IPoIB no longer relies on
      external code, we can introduce the use of priv_destructor and
      needs_free_netdev.
      
      The rdma_netdev flow is switched to use the netdev common
      priv_destructor instead of the special free_rdma_netdev, and the IPoIB
      ULP is adjusted accordingly:
       - priv_destructor needs to switch to point to the ULP's destructor,
         which will then call the rdma_ndev's in the right order
       - We need to be careful around the error unwind of register_netdev,
         as it sometimes calls priv_destructor on failure
       - ULPs need to use ndo_init/uninit to ensure proper ordering
         of failures around register_netdev
      
      Switching to priv_destructor is a necessary prerequisite to using
      the rtnl new_link mechanism.
      
      The VNIC user for rdma_netdev should also be revised, but that is left for
      another patch.
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Denis Drozdov <denisd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      9f49a5b5
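      
      A sketch of the chained-destructor arrangement: the ULP saves the
      destructor the rdma_netdev provider installed, interposes its own, and
      calls the saved one last so teardown runs in the right order.
      priv_destructor and needs_free_netdev are real net_device members; the
      ULP structure around them is illustrative:
      
          #include <linux/netdevice.h>
          
          struct ulp_priv {
                  /* saved rdma_netdev destructor, called after ULP cleanup */
                  void (*rn_destructor)(struct net_device *);
          };
          
          static void ulp_priv_destructor(struct net_device *ndev)
          {
                  struct ulp_priv *priv = netdev_priv(ndev);
          
                  /* ULP-level cleanup would go here, then chain to the
                   * rdma_netdev's own destructor in the right order. */
                  if (priv->rn_destructor)
                          priv->rn_destructor(ndev);
          }
          
          static void ulp_hook_destructor(struct net_device *ndev)
          {
                  struct ulp_priv *priv = netdev_priv(ndev);
          
                  priv->rn_destructor = ndev->priv_destructor;
                  ndev->priv_destructor = ulp_priv_destructor;
          
                  /* Let the core free the netdev after priv_destructor runs.
                   * register_netdev() may call priv_destructor on some error
                   * paths, so it must tolerate a partially set up device. */
                  ndev->needs_free_netdev = true;
          }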
  12. 28 Jul 2018, 1 commit
  13. 24 Jul 2018, 1 commit
  14. 19 Jul 2018, 3 commits
  15. 05 Jul 2018, 1 commit
  16. 26 May 2018, 1 commit
    • net/mlx5: Use order-0 allocations for all WQ types · 3a2f7033
      Tariq Toukan authored
      Complete the transition of all WQ types to use fragmented
      order-0 coherent memory instead of high-order allocations.
      
      The CQ-WQ already uses order-0; here we do the same for cyclic and
      linked-list WQs (see the allocation sketch after this message).
      
      This allows the driver to load cleanly on systems with highly
      fragmented coherent memory.
      
      Performance tests:
      ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      Packet rate of 64B packets, single transmit ring, size 8K.
      
      No degradation was observed.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      3a2f7033
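      
      A sketch of the fragmented-buffer idea: instead of one large physically
      contiguous (high-order) coherent allocation that can fail on fragmented
      systems, allocate an array of PAGE_SIZE (order-0) chunks and index into
      them per fragment. Names are illustrative; the mlx5 core has its own
      frag-buf helpers:
      
          #include <linux/dma-mapping.h>
          #include <linux/slab.h>
          
          struct wq_frag {
                  void *buf;
                  dma_addr_t dma;
          };
          
          /* Allocate a WQ buffer as nfrags order-0 coherent fragments rather
           * than a single order-N block, so loading succeeds even when
           * coherent memory is fragmented. */
          static struct wq_frag *wq_frag_buf_alloc(struct device *dev, int nfrags)
          {
                  struct wq_frag *frags;
                  int i;
          
                  frags = kcalloc(nfrags, sizeof(*frags), GFP_KERNEL);
                  if (!frags)
                          return NULL;
          
                  for (i = 0; i < nfrags; i++) {
                          frags[i].buf = dma_alloc_coherent(dev, PAGE_SIZE,
                                                            &frags[i].dma,
                                                            GFP_KERNEL);
                          if (!frags[i].buf)
                                  goto err_free_prev;
                  }
                  return frags;
          
          err_free_prev:
                  while (--i >= 0)
                          dma_free_coherent(dev, PAGE_SIZE, frags[i].buf,
                                            frags[i].dma);
                  kfree(frags);
                  return NULL;
          }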
  17. 25 May 2018, 1 commit
  18. 17 May 2018, 1 commit