1. 19 Oct 2018, 2 commits
  2. 11 Oct 2018, 1 commit
  3. 02 Oct 2018, 1 commit
  4. 25 Sep 2018, 1 commit
  5. 06 Sep 2018, 8 commits
    • net/mlx5e: Replace PTP clock lock from RW lock to seq lock · 64109f1d
      Committed by Shay Agroskin
      Changed "priv.clock.lock" lock from 'rw_lock' to 'seq_lock'
      in order to improve packet rate performance.
      
      Tested on Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz.
      Sent 64b packets between two peers connected by ConnectX-5,
      and measured packet rate for the receiver in three modes:
      	no time-stamping (base rate)
      	time-stamping using rw_lock (old lock) for critical region
      	time-stamping using seq_lock (new lock) for critical region
      Only the receiver time stamped its packets.
      
      The measured packet rate improvements are:
      
      	Single flow (multiple TX rings to single RX ring):
      		without timestamping:	  4.26 (M packets)/sec
      		with rw-lock (old lock):  4.1  (M packets)/sec
      		with seq-lock (new lock): 4.16 (M packets)/sec
      		1.46% improvement
      
      	Multiple flows (multiple TX rings to six RX rings):
      		without timestamping: 	  22   (M packets)/sec
      		with rw-lock (old lock):  11.7 (M packets)/sec
      		with seq-lock (new lock): 21.3 (M packets)/sec
      		82.05% improvement
      
      The packet rate improvement is due to the seq-lock not requiring
      atomic operations for 'readers'.
      Since there are many more 'readers' than 'writers' contending
      on this lock, almost all atomic operations are saved.
      This results in a dramatic decrease in overall
      cache misses.
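
      A minimal sketch of the seq-lock read side, with illustrative structure
      and field names (not the driver's exact code): readers retry if a write
      raced with them and perform no atomic read-modify-write, which is what
      keeps the read-mostly timestamping path cheap.

      	#include <linux/seqlock.h>
      	#include <linux/timecounter.h>

      	struct clock_state {
      		seqlock_t lock;
      		struct timecounter tc;
      	};

      	static u64 read_hw_timestamp(struct clock_state *clock, u64 cqe_ts)
      	{
      		unsigned int seq;
      		u64 ns;

      		do {
      			seq = read_seqbegin(&clock->lock);
      			ns = timecounter_cyc2time(&clock->tc, cqe_ts);
      		} while (read_seqretry(&clock->lock, seq));

      		return ns;
      	}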
      Signed-off-by: Shay Agroskin <shayag@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      64109f1d
    • net/mlx5: Add flow counters idr · 12d6066c
      Committed by Vlad Buslov
      The previous patch in this series changed the flow counter storage structure
      from an rb_tree to a linked list in order to improve flow counter traversal
      performance. The drawback of that solution is that flow counter lookup by
      id becomes linear in complexity.
      
      Store pointers to flow counters in an idr in order to make lookup
      logarithmic again. The idr is a non-intrusive data structure and doesn't
      require extending the flow counter struct with new elements. This means
      that the idr can be used for lookup, while the linked list from the previous
      patch is used for traversal, and struct mlx5_fc stays <= 2 cache lines in size.
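
      A hedged sketch of the resulting pair of structures, with illustrative
      names ('struct fc', 'fc_stats' and the helpers are assumptions, not the
      driver's code): the list serves traversal, the idr serves lookup by id.

      	#include <linux/idr.h>
      	#include <linux/list.h>
      	#include <linux/slab.h>

      	struct fc {
      		struct list_head list;	/* traversal (previous patch) */
      		u32 id;			/* id assigned by the device */
      	};

      	struct fc_stats {
      		struct list_head counters;
      		struct idr counters_idr;	/* id -> struct fc (this patch) */
      	};

      	static int fc_insert(struct fc_stats *stats, struct fc *counter)
      	{
      		int err;

      		/* reserve exactly this id so later lookups are no longer linear */
      		err = idr_alloc(&stats->counters_idr, counter,
      				counter->id, counter->id + 1, GFP_KERNEL);
      		if (err < 0)
      			return err;

      		list_add_tail(&counter->list, &stats->counters);
      		return 0;
      	}

      	static struct fc *fc_lookup(struct fc_stats *stats, u32 id)
      	{
      		return idr_find(&stats->counters_idr, id);
      	}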
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Amir Vadai <amir@vadai.me>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      12d6066c
    • net/mlx5: Store flow counters in a list · 9aff93d7
      Committed by Vlad Buslov
      In order to improve performance of the flow counter stats query loop that
      traverses all configured flow counters, replace the rb_tree with a doubly
      linked list. This change improves performance of traversing flow counters by
      removing the tree traversal (profiling data showed that the call to rb_next
      was the top CPU consumer).
      
      However, lookup of a flow counter in the list becomes linear instead of
      logarithmic. This problem is fixed by the next patch in the series, which adds
      an idr for fast lookup. The idr is used because it is not an intrusive data
      structure and doesn't require adding any new members to struct mlx5_fc,
      which allows its control data part to stay <= 1 cache line in size.
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Amir Vadai <amir@vadai.me>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      9aff93d7
    • net/mlx5: Add new list to store deleted flow counters · 6e5e2283
      Committed by Vlad Buslov
      In order to prevent the flow counters stats work function from traversing the
      whole flow counters tree while searching for deleted flow counters, a new list
      to store deleted flow counters is added to struct mlx5_fc_stats. The lockless
      NULL-terminated singly linked list data type is used for the following
      reasons:
       - This use case only needs to add a single element to the list and
       remove/iterate the whole list. The lockless list doesn't require any
       additional synchronization for these operations.
       - The first cache line of the flow counter data structure only has space to
       store a single additional pointer, which precludes use of a doubly linked list.
      
      Remove flow counter 'deleted' flag that is no longer needed.
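
      A minimal sketch of this lockless-list pattern, with illustrative names
      (the llist calls are real kernel primitives; the structures and helpers
      here are assumptions, not the driver's code):

      	#include <linux/llist.h>
      	#include <linux/slab.h>

      	struct fc {
      		struct llist_node dellist;	/* one pointer fits the first cache line */
      	};

      	struct fc_stats {
      		struct llist_head dellist;	/* deleted, not yet released counters */
      	};

      	static void fc_mark_deleted(struct fc_stats *stats, struct fc *counter)
      	{
      		llist_add(&counter->dellist, &stats->dellist);	/* lock-free add */
      	}

      	static void fc_stats_work(struct fc_stats *stats)
      	{
      		/* atomically take the whole list, then walk and free it */
      		struct llist_node *deleted = llist_del_all(&stats->dellist);
      		struct fc *counter, *tmp;

      		llist_for_each_entry_safe(counter, tmp, deleted, dellist)
      			kfree(counter);
      	}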
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Amir Vadai <amir@vadai.me>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      6e5e2283
    • net/mlx5: Change flow counters addlist type to single linked list · 83033688
      Committed by Vlad Buslov
      In order to prevent the flow counters stats work function from traversing the
      whole flow counters tree while searching for deleted flow counters, a new list
      to store deleted flow counters will be added to struct mlx5_fc_stats. However,
      the flow counter structure itself has no space left to store any more data
      in its first cache line. To free the space needed to store the additional list
      node, convert the current addlist from a doubly linked list (two pointers per
      node) to an atomic singly linked list (one pointer per node).
      
      The lockless NULL-terminated singly linked list data type doesn't require any
      additional external synchronization for the operations used by the flow
      counters module (add a single new element, remove all elements from the list
      and traverse them). Remove the addlist_lock, which is no longer needed.
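
      Illustrative before/after of the space saving (field names are assumptions,
      not the actual struct mlx5_fc layout):

      	#include <linux/list.h>
      	#include <linux/llist.h>

      	struct fc_before {
      		struct list_head addlist;	/* two pointers, plus an external addlist_lock */
      	};

      	struct fc_after {
      		struct llist_node addlist;	/* one pointer, no lock needed for add/del_all */
      		/* the freed pointer slot is later reused for the deleted-counters node */
      	};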
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Amir Vadai <amir@vadai.me>
      Reviewed-by: Paul Blakey <paulb@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      83033688
    • net/mlx5: Use u16 for Work Queue buffer strides offset · a0903622
      Committed by Tariq Toukan
      Minimal stride size is 16.
      Hence, the number of strides in a fragment (of PAGE_SIZE)
      is <= PAGE_SIZE / 16 <= 4K.
      
      u16 is sufficient to represent this.
      
      Fixes: d7037ad7 ("net/mlx5: Fix QP fragmented buffer allocation")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      a0903622
    • net/mlx5: Use u16 for Work Queue buffer fragment size · 8d71e818
      Committed by Tariq Toukan
      Minimal stride size is 16.
      Hence, the number of strides in a fragment (of PAGE_SIZE)
      is <= PAGE_SIZE / 16 <= 4K.
      
      u16 is sufficient to represent this.
      
      Fixes: 388ca8be ("IB/mlx5: Implement fragmented completion queue (CQ)")
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Reviewed-by: Eran Ben Elisha <eranbe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      8d71e818
    • net/mlx5: Fix use-after-free in self-healing flow · 76d5581c
      Committed by Jack Morgenstein
      When the mlx5 health mechanism detects a problem while the driver
      is in the middle of init_one or remove_one, the driver needs to prevent
      the health mechanism from scheduling future work; if future work
      is scheduled, there is a use-after-free problem: the system WQ
      tries to run the work item (which has already been freed) at the scheduled
      future time.
      
      Prevent this by disabling work item scheduling in the health mechanism
      when the driver is in the middle of init_one() or remove_one().
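
      A minimal sketch of the guard, with hypothetical names (this shows the
      pattern, not the driver's actual code): once the flag is set under the
      lock, no new health work can be queued, so nothing can run after the
      work item's memory is freed.

      	#include <linux/spinlock.h>
      	#include <linux/workqueue.h>

      	struct health_state {
      		spinlock_t wq_lock;
      		bool drop_new_work;
      		struct delayed_work poll_work;
      	};

      	static void health_queue_poll(struct health_state *h, unsigned long delay)
      	{
      		unsigned long flags;

      		spin_lock_irqsave(&h->wq_lock, flags);
      		if (!h->drop_new_work)			/* skip if init/remove is in progress */
      			schedule_delayed_work(&h->poll_work, delay);
      		spin_unlock_irqrestore(&h->wq_lock, flags);
      	}

      	static void health_block_new_work(struct health_state *h)
      	{
      		unsigned long flags;

      		spin_lock_irqsave(&h->wq_lock, flags);
      		h->drop_new_work = true;		/* called from init_one()/remove_one() */
      		spin_unlock_irqrestore(&h->wq_lock, flags);
      	}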
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Reviewed-by: Feras Daoud <ferasda@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      76d5581c
  6. 04 Sep 2018, 1 commit
  7. 03 Aug 2018, 1 commit
    • RDMA/netdev: Use priv_destructor for netdev cleanup · 9f49a5b5
      Committed by Jason Gunthorpe
      Now that the unregister_netdev flow for IPoIB no longer relies on external
      code, we can introduce the use of priv_destructor and
      needs_free_netdev.
      
      The rdma_netdev flow is switched to use the netdev common priv_destructor
      instead of the special free_rdma_netdev and the IPOIB ULP adjusted:
       - priv_destructor needs to switch to point to the ULP's destructor
         which will then call the rdma_ndev's in the right order
       - We need to be careful around the error unwind of register_netdev
         as it sometimes calls priv_destructor on failure
       - ULPs need to use ndo_init/uninit to ensure proper ordering
         of failures around register_netdev
      
      Switching to priv_destructor is a necessary pre-requisite to using
      the rtnl new_link mechanism.
      
      The VNIC user for rdma_netdev should also be revised, but that is left for
      another patch.
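
      A hedged sketch of the ULP-side wiring (names are illustrative, not the
      IPoIB code): the ULP chains its own destructor in front of the one the
      rdma_netdev provider installed, and lets the core free the netdev.

      	#include <linux/netdevice.h>

      	struct ulp_priv {
      		void (*rn_destructor)(struct net_device *ndev);	/* provider's hook */
      	};

      	static void ulp_priv_destructor(struct net_device *ndev)
      	{
      		struct ulp_priv *priv = netdev_priv(ndev);

      		/* ULP-specific teardown would run here first ... */
      		if (priv->rn_destructor)
      			priv->rn_destructor(ndev);	/* ... then the rdma_netdev's cleanup */
      	}

      	static void ulp_chain_destructor(struct net_device *ndev)
      	{
      		struct ulp_priv *priv = netdev_priv(ndev);

      		priv->rn_destructor = ndev->priv_destructor;	/* save what the provider set */
      		ndev->priv_destructor = ulp_priv_destructor;	/* run the ULP's part first */
      		ndev->needs_free_netdev = true;			/* core calls free_netdev() */
      	}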
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      Signed-off-by: Denis Drozdov <denisd@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      9f49a5b5
  8. 28 Jul 2018, 1 commit
  9. 24 Jul 2018, 1 commit
  10. 19 Jul 2018, 3 commits
  11. 05 Jul 2018, 1 commit
  12. 26 May 2018, 1 commit
    • net/mlx5: Use order-0 allocations for all WQ types · 3a2f7033
      Committed by Tariq Toukan
      Complete the transition of all WQ types to use fragmented
      order-0 coherent memory instead of high-order allocations.
      
      CQ-WQ already uses order-0.
      Here we do the same for cyclic and linked-list WQs.
      
      This allows the driver to load cleanly on systems with highly
      fragmented coherent memory.
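
      A hedged sketch of the idea (helper and field names are assumptions):
      the queue buffer is built from an array of PAGE_SIZE, order-0 coherent
      fragments instead of one large high-order coherent allocation.

      	#include <linux/dma-mapping.h>
      	#include <linux/errno.h>
      	#include <linux/mm.h>

      	struct wq_frag {
      		void		*buf;
      		dma_addr_t	map;
      	};

      	static int wq_alloc_frags(struct device *dev, struct wq_frag *frags, int nfrags)
      	{
      		int i;

      		for (i = 0; i < nfrags; i++) {
      			frags[i].buf = dma_alloc_coherent(dev, PAGE_SIZE,
      							  &frags[i].map, GFP_KERNEL);
      			if (!frags[i].buf)
      				goto err_free;
      		}
      		return 0;

      	err_free:
      		while (--i >= 0)
      			dma_free_coherent(dev, PAGE_SIZE, frags[i].buf, frags[i].map);
      		return -ENOMEM;
      	}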
      
      Performance tests:
      ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      Packet rate of 64B packets, single transmit ring, size 8K.
      
      No degradation was observed.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      3a2f7033
  13. 25 May 2018, 1 commit
  14. 17 May 2018, 1 commit
  15. 27 Apr 2018, 1 commit
    • net/mlx5: Fix mlx5_get_vector_affinity function · 6082d9c9
      Committed by Israel Rukshin
      Adding the vector offset when calling mlx5_vector2eqn() is wrong,
      because mlx5_vector2eqn() checks whether the EQ index is equal to the vector
      number, and the internal completion vectors that mlx5 allocates
      don't get an EQ index.
      
      The second problem is that using the effective_affinity_mask yields the same
      CPU for different vectors.
      This leads to unmapped queues when calling it from blk_mq_rdma_map_queues().
      This doesn't happen when using the affinity_hint mask.
      
      Fixes: 2572cf57 ("mlx5: fix mlx5_get_vector_affinity to start from completion vector 0")
      Fixes: 05e0cc84 ("net/mlx5: Fix get vector affinity helper function")
      Signed-off-by: Israel Rukshin <israelr@mellanox.com>
      Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      6082d9c9
  16. 20 Mar 2018, 1 commit
    • net/mlx5: Packet pacing enhancement · 05d3ac97
      Committed by Bodong Wang
      Add two new parameters: max_burst_sz and typical_pkt_size (both
      in bytes) to rate limit configurations.
      
      max_burst_sz: The device will schedule bursts of packets, no larger than
      this value, for an SQ connected to this rate.
      A value of 0x0 indicates that packet bursts will be limited to the device
      defaults. This field should be used if bursts of packets must be
      strictly kept under a certain value.
      
      typical_pkt_size: When the rate limit is intended for a stream of
      similar packets, stating the typical packet size can improve the
      accuracy of the rate limiter. The expected packet size will be
      the same for all SQs associated with the same rate limit index.
      
      The Ethernet driver is updated according to this change, but these two
      parameters will be kept as 0 for now, since there is no proper way to get the
      configuration from user space without changing the
      ndo_set_tx_maxrate interface.
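
      Purely illustrative: a rate-limit entry carrying the two new parameters
      (field names here are assumptions for the sketch, not the device interface):

      	#include <linux/types.h>

      	struct rl_entry {
      		u32 rate;		/* the configured rate */
      		u32 max_burst_sz;	/* bytes; 0 = device default burst size */
      		u16 typical_pkt_size;	/* bytes; accuracy hint for uniform streams */
      	};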
      Signed-off-by: Bodong Wang <bodong@mellanox.com>
      Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
      Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
      05d3ac97
  17. 14 Mar 2018, 1 commit
  18. 24 Feb 2018, 1 commit
  19. 15 Feb 2018, 4 commits
    • IB/mlx5: Implement fragmented completion queue (CQ) · 388ca8be
      Committed by Yonatan Cohen
      The current implementation of CQ creation requires contiguous
      memory. Such a requirement is problematic once memory is
      fragmented or the system is low on memory, and it causes
      failures in dma_zalloc_coherent().
      
      This patch implements a new scheme of fragmented CQs to overcome
      this issue by introducing a new type, 'struct mlx5_frag_buf_ctrl',
      to allocate fragmented buffers rather than contiguous ones.
      
      Base the Completion Queues (CQs) on this new fragmented buffer.
      
      It fixes the following crashes:
      kworker/29:0: page allocation failure: order:6, mode:0x80d0
      CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
      Workqueue: ib_cm cm_work_handler [ib_cm]
      Call Trace:
      [<>] dump_stack+0x19/0x1b
      [<>] warn_alloc_failed+0x110/0x180
      [<>] __alloc_pages_slowpath+0x6b7/0x725
      [<>] __alloc_pages_nodemask+0x405/0x420
      [<>] dma_generic_alloc_coherent+0x8f/0x140
      [<>] x86_swiotlb_alloc_coherent+0x21/0x50
      [<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
      [<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
      [<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
      [<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
      [<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
      [<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
      Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      388ca8be
    • net/mlx5: Remove redundant EQ API exports · 3ec5693b
      Committed by Saeed Mahameed
      The EQ structure and API are private to the mlx5_core driver; external
      drivers should not have access to, or the means to manipulate, EQ objects.
      
      Remove redundant exports and move API functions out of the linux/mlx5
      include directory into the driver's mlx5_core.h private include file.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
      3ec5693b
    • net/mlx5: Move CQ completion and event forwarding logic to eq.c · 3ac7afdb
      Committed by Saeed Mahameed
      Since the CQ tree is now per EQ, CQ completion and event forwarding became
      a specific implementation detail of the EQ logic. This patch moves that logic
      to eq.c and makes those functions static.
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
      3ac7afdb
    • net/mlx5: CQ Database per EQ · 02d92f79
      Committed by Saeed Mahameed
      Before this patch the driver had one CQ database protected by one
      spinlock; this spinlock is meant to synchronize between CQ
      adding/removing and CQ IRQ interrupt handling.
      
      On a system with a large number of CPUs and a workload that generates
      lots of interrupts, this global spinlock becomes a very nasty hotspot
      and introduces contention between the active cores, which
      significantly hurts performance and becomes a bottleneck that prevents
      seamless CPU scaling.
      
      To solve this we simply move the CQ database and its spinlock to be per
      EQ (IRQ), thus per core.
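
      A sketch of the change in structure, with illustrative names (not the
      driver's exact layout): the CQ table and its spinlock live inside each
      EQ instead of once per device, so contention becomes per IRQ/core.

      	#include <linux/spinlock.h>
      	#include <linux/radix-tree.h>

      	struct cq_table {
      		spinlock_t		lock;	/* protects the tree below */
      		struct radix_tree_root	tree;	/* cqn -> CQ object */
      	};

      	struct eq {
      		/* ... other EQ fields ... */
      		struct cq_table		cq_table;	/* was: one global table per device */
      	};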
      
      Tested with:
      system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores
      netperf command: ./super_netperf 200 -P 0 -t TCP_RR  -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M
      
      WITHOUT THIS PATCH:
      Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft %steal  %guest  %gnice   %idle
      Average:     all    4.32    0.00   36.15    0.09    0.00   34.02   0.00    0.00    0.00   25.41
      
      Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271
      Overhead  Command          Shared Object                 Symbol
      +   14.28%  swapper          [kernel.vmlinux]              [k] intel_idle
      +   12.25%  swapper          [kernel.vmlinux]              [k] queued_spin_lock_slowpath
      +   10.29%  netserver        [kernel.vmlinux]              [k] queued_spin_lock_slowpath
      +    1.32%  netserver        [kernel.vmlinux]              [k] mlx5e_xmit
      
      WITH THIS PATCH:
      Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
      Average:     all    4.27    0.00   34.31    0.01    0.00   18.71    0.00    0.00    0.00   42.69
      
      Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483
      Overhead  Command          Shared Object             Symbol
      +   23.33%  swapper          [kernel.vmlinux]          [k] intel_idle
      +    1.69%  netserver        [kernel.vmlinux]          [k] mlx5e_xmit
      Tested-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Reviewed-by: Gal Pressman <galp@mellanox.com>
      02d92f79
  20. 05 Feb 2018, 1 commit
  21. 19 Jan 2018, 1 commit
  22. 12 Jan 2018, 1 commit
    • net/mlx5: Fix get vector affinity helper function · 05e0cc84
      Committed by Saeed Mahameed
      mlx5_get_vector_affinity() used to call pci_irq_get_affinity(); after
      reverting the patch that set the device affinity via the PCI_IRQ_AFFINITY
      API, calling pci_irq_get_affinity() became useless and it breaks RDMA
      mlx5 users.  To fix this, this patch provides an alternative way to
      retrieve IRQ vector affinity using the legacy IRQ API, following the
      smp_affinity procfs read implementation.
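
      A hedged sketch of reading an IRQ's affinity the way the smp_affinity
      procfs file does (how the driver maps a completion vector to its IRQ
      number is omitted; the helper name is an assumption):

      	#include <linux/irq.h>
      	#include <linux/irqdesc.h>

      	static const struct cpumask *vector_irq_affinity(unsigned int irq)
      	{
      		struct irq_desc *desc = irq_to_desc(irq);

      		if (!desc)
      			return NULL;

      		return irq_data_get_affinity_mask(irq_desc_get_irq_data(desc));
      	}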
      
      Fixes: 231243c8 ("Revert mlx5: move affinity hints assignments to generic code")
      Fixes: a435393a ("mlx5: move affinity hints assignments to generic code")
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      05e0cc84
  23. 09 Jan 2018, 5 commits