Commits · 22215908d81f61d293e8b128e819a8437f37cc20 · openeuler / Kernel

15 Feb, 2018 4 commits

IB/mlx5: Implement fragmented completion queue (CQ) · 388ca8be

Authored by Yonatan Cohen on Jan 02, 2018

The current implementation of create CQ requires contiguous
memory. This requirement is problematic once memory is
fragmented or the system is low on memory, because it causes
failures in dma_zalloc_coherent().

This patch implements a new scheme of fragmented CQs to overcome
this issue by introducing a new type, 'struct mlx5_frag_buf_ctrl',
to allocate fragmented buffers rather than contiguous ones.

Base the Completion Queues (CQs) on this new fragmented buffer.

It fixes the following crash:
kworker/29:0: page allocation failure: order:6, mode:0x80d0
CPU: 29 PID: 8374 Comm: kworker/29:0 Tainted: G OE 3.10.0
Workqueue: ib_cm cm_work_handler [ib_cm]
Call Trace:
[<>] dump_stack+0x19/0x1b
[<>] warn_alloc_failed+0x110/0x180
[<>] __alloc_pages_slowpath+0x6b7/0x725
[<>] __alloc_pages_nodemask+0x405/0x420
[<>] dma_generic_alloc_coherent+0x8f/0x140
[<>] x86_swiotlb_alloc_coherent+0x21/0x50
[<>] mlx5_dma_zalloc_coherent_node+0xad/0x110 [mlx5_core]
[<>] ? mlx5_db_alloc_node+0x69/0x1b0 [mlx5_core]
[<>] mlx5_buf_alloc_node+0x3e/0xa0 [mlx5_core]
[<>] mlx5_buf_alloc+0x14/0x20 [mlx5_core]
[<>] create_cq_kernel+0x90/0x1f0 [mlx5_ib]
[<>] mlx5_ib_create_cq+0x3b0/0x4e0 [mlx5_ib]
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

388ca8be
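
As a rough illustration of the fragmented-buffer idea in the commit above, the sketch below allocates a queue as separate page-sized fragments and maps an entry index to a fragment and an offset. It is a userspace approximation: `struct frag_buf`, `frag_buf_alloc`, `frag_buf_entry`, `frag_buf_free` and the 4 KiB fragment size are illustrative assumptions, not the mlx5 implementation or the layout of `struct mlx5_frag_buf_ctrl`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define FRAG_SHIFT 12                       /* assumed fragment size: 4 KiB */
#define FRAG_SIZE  ((size_t)1 << FRAG_SHIFT)

/* Hypothetical fragmented buffer; not the layout of mlx5_frag_buf_ctrl. */
struct frag_buf {
	void   **frags;       /* separately allocated fragments */
	size_t   nfrags;      /* number of valid entries in frags[] */
	unsigned log_stride;  /* log2 of the entry size in bytes */
};

/* Release every fragment and the fragment array. */
static void frag_buf_free(struct frag_buf *buf)
{
	for (size_t i = 0; i < buf->nfrags; i++)
		free(buf->frags[i]);
	free(buf->frags);
	buf->frags = NULL;
	buf->nfrags = 0;
}

/* Allocate room for nentries entries as independent fragments. */
static int frag_buf_alloc(struct frag_buf *buf, size_t nentries,
			  unsigned log_stride)
{
	size_t bytes, want;

	assert(nentries > 0 && log_stride <= FRAG_SHIFT);
	bytes = nentries << log_stride;
	want = (bytes + FRAG_SIZE - 1) >> FRAG_SHIFT;

	buf->log_stride = log_stride;
	buf->nfrags = 0;
	buf->frags = calloc(want, sizeof(*buf->frags));
	if (!buf->frags)
		return -1;

	while (buf->nfrags < want) {
		/* Each fragment is a small, zeroed allocation of its own. */
		void *frag = calloc(1, FRAG_SIZE);

		if (!frag) {
			frag_buf_free(buf);
			return -1;
		}
		buf->frags[buf->nfrags++] = frag;
	}
	return 0;
}

/* Map an entry index to its address inside the right fragment. */
static void *frag_buf_entry(const struct frag_buf *buf, size_t ix)
{
	size_t byte_off = ix << buf->log_stride;
	size_t frag = byte_off >> FRAG_SHIFT;

	assert(frag < buf->nfrags);
	return (char *)buf->frags[frag] + (byte_off & (FRAG_SIZE - 1));
}
```

With 64-byte entries (`log_stride` of 6), entries 0 to 63 land in the first fragment and entry 64 starts the second, so no allocation ever needs more than one page of contiguous memory.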

net/mlx5: Remove redundant EQ API exports · 3ec5693b

Authored by Saeed Mahameed on Feb 01, 2018

The EQ structure and API are private to the mlx5_core driver; external
drivers should not have access to, or the means to manipulate, EQ objects.

Remove redundant exports and move API functions out of the linux/mlx5
include directory into the driver's mlx5_core.h private include file.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>

3ec5693b

net/mlx5: Move CQ completion and event forwarding logic to eq.c · 3ac7afdb

Authored by Saeed Mahameed on Feb 01, 2018

Since the CQ tree is now per EQ, CQ completion and event forwarding became
a specific implementation of EQ logic. This patch moves that logic to eq.c
and makes those functions static.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>

3ac7afdb

net/mlx5: CQ Database per EQ · 02d92f79

Authored by Saeed Mahameed on Jan 19, 2018

Before this patch the driver had one CQ database protected by one
spinlock. This spinlock is meant to synchronize between CQ
adding/removing and CQ IRQ interrupt handling.

On a system with a large number of CPUs and a workload that requires
lots of interrupts, this global spinlock becomes a very nasty hotspot
and introduces contention between the active cores, which
significantly hurts performance and becomes a bottleneck that prevents
seamless CPU scaling.

To solve this we simply move the CQ database and its spinlock to be per
EQ (IRQ), thus per core.

Tested with:
system: 2 sockets, 14 cores per socket, hyperthreading, 2x14x2=56 cores
netperf command: ./super_netperf 200 -P 0 -t TCP_RR  -H <server> -l 30 -- -r 300,300 -o -s 1M,1M -S 1M,1M

WITHOUT THIS PATCH:
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft %steal  %guest  %gnice   %idle
Average:     all    4.32    0.00   36.15    0.09    0.00   34.02   0.00    0.00    0.00   25.41

Samples: 2M of event 'cycles:pp', Event count (approx.): 1554616897271
Overhead  Command          Shared Object                 Symbol
+   14.28%  swapper          [kernel.vmlinux]              [k] intel_idle
+   12.25%  swapper          [kernel.vmlinux]              [k] queued_spin_lock_slowpath
+   10.29%  netserver        [kernel.vmlinux]              [k] queued_spin_lock_slowpath
+    1.32%  netserver        [kernel.vmlinux]              [k] mlx5e_xmit

WITH THIS PATCH:
Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    4.27    0.00   34.31    0.01    0.00   18.71    0.00    0.00    0.00   42.69

Samples: 2M of event 'cycles:pp', Event count (approx.): 1498132937483
Overhead  Command          Shared Object             Symbol
+   23.33%  swapper          [kernel.vmlinux]          [k] intel_idle
+    1.69%  netserver        [kernel.vmlinux]          [k] mlx5e_xmit
Tested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Reviewed-by: Gal Pressman <galp@mellanox.com>

02d92f79
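
The locking change described in the commit above can be pictured with the following sketch, in which every EQ owns its own CQ table and its own lock, so add/remove and lookup on one EQ never contend with another EQ. This is a userspace approximation under stated assumptions: the names (`struct eq`, `eq_add_cq`, `eq_has_cq`, `eq_del_cq`), the fixed-size table and the C11 `atomic_flag` spinlock are illustrative; the driver's own data structure and kernel spinlocks are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_EQS    4   /* assumed: one EQ per IRQ vector */
#define CQS_PER_EQ 8   /* assumed fixed-size table, for brevity only */

struct eq {
	atomic_flag lock;              /* guards only this EQ's table */
	bool        used[CQS_PER_EQ];
	uint32_t    cqn[CQS_PER_EQ];
};

static struct eq eqs[NUM_EQS];

static void eqs_init(void)
{
	for (int i = 0; i < NUM_EQS; i++) {
		atomic_flag_clear(&eqs[i].lock);
		for (int j = 0; j < CQS_PER_EQ; j++)
			eqs[i].used[j] = false;
	}
}

static void eq_lock(struct eq *eq)
{
	while (atomic_flag_test_and_set_explicit(&eq->lock,
						 memory_order_acquire))
		;	/* spin until the previous holder releases */
}

static void eq_unlock(struct eq *eq)
{
	atomic_flag_clear_explicit(&eq->lock, memory_order_release);
}

/* Register a CQ number with one EQ; returns 0 on success, -1 if full. */
static int eq_add_cq(struct eq *eq, uint32_t cqn)
{
	int err = -1;

	eq_lock(eq);
	for (int i = 0; i < CQS_PER_EQ; i++) {
		if (!eq->used[i]) {
			eq->used[i] = true;
			eq->cqn[i] = cqn;
			err = 0;
			break;
		}
	}
	eq_unlock(eq);
	return err;
}

/* Lookup as the interrupt path would do it: only this EQ's lock is taken. */
static bool eq_has_cq(struct eq *eq, uint32_t cqn)
{
	bool found = false;

	eq_lock(eq);
	for (int i = 0; i < CQS_PER_EQ; i++) {
		if (eq->used[i] && eq->cqn[i] == cqn) {
			found = true;
			break;
		}
	}
	eq_unlock(eq);
	return found;
}

/* Remove a CQ number from one EQ; returns 0 on success, -1 if absent. */
static int eq_del_cq(struct eq *eq, uint32_t cqn)
{
	int err = -1;

	eq_lock(eq);
	for (int i = 0; i < CQS_PER_EQ; i++) {
		if (eq->used[i] && eq->cqn[i] == cqn) {
			eq->used[i] = false;
			err = 0;
			break;
		}
	}
	eq_unlock(eq);
	return err;
}
```

Because each lock covers a single EQ, two cores handling interrupts for different EQs take different locks, which is the property the benchmark figures in the commit message are attributed to.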

05 Feb, 2018 1 commit

mlx5: fix mlx5_get_vector_affinity to start from completion vector 0 · 2572cf57

Authored by Sagi Grimberg on Feb 05, 2018

The consumers of this routine expect the affinity map of a vector
index relative to the first completion vector. The upper layers are
not aware of the internal/private completion vectors that mlx5 allocates
for its own usage.

Hence, return the affinity map of vector index relative to the first
completion vector.

Fixes: 05e0cc84 ("net/mlx5: Fix get vector affinity helper function")
Reported-by: Logan Gunthorpe <logang@deltatee.com>
Tested-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Cc: <stable@vger.kernel.org> # v4.15
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2572cf57
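
The index translation this fix describes can be sketched as a simple offset. The helper name `comp_vector_to_irq_index` and the number of private vectors are assumptions made for illustration; the driver's actual constant and lookup are not shown in the commit message.

```c
#include <assert.h>

/*
 * Assumed layout: the first few IRQ vectors are private to the driver and
 * completion vectors follow them. The count below is illustrative only.
 */
#define PRIVATE_VECTORS 3

/*
 * Translate a completion vector index, as seen by upper layers (0-based),
 * to the driver-internal IRQ vector index. Returns -1 if out of range.
 */
static int comp_vector_to_irq_index(int comp_vector, int num_comp_vectors)
{
	if (comp_vector < 0 || comp_vector >= num_comp_vectors)
		return -1;
	return PRIVATE_VECTORS + comp_vector;
}
```

An upper layer asking for completion vector 0 is therefore never handed the affinity of one of the private vectors.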

19 Jan, 2018 1 commit

net/mlx5e: Add clock info page to mlx5 core devices · 24d33d2c

Authored by Feras Daoud on Jan 16, 2018

Add a new page to mlx5 core containing clock info data that allows
user-level applications to translate a CQE timestamp to
nanoseconds. The information stored in this page is represented
by mlx5_ib_clock_info.

To synchronize between kernel and user space, a sequence
number is incremented at the beginning and end of each update.
An odd number means the data is being updated, while an even number means
the update has completed. To guarantee that the data structure
was read atomically, the user will:

repeat:
        seq1 = <read sequence>
        goto repeat while seq1 is odd
        <read data structure>
        seq2 = <read sequence>
        if seq1 != seq2 goto repeat
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

24d33d2c
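
The sequence-number protocol in the commit above can be sketched in C11 as follows. The structure and function names (`struct clock_page`, `clock_write`, `clock_read`) and the two payload fields are hypothetical stand-ins; the real shared page is described by mlx5_ib_clock_info and its fields are not reproduced here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared page; stands in for mlx5_ib_clock_info. */
struct clock_page {
	_Atomic uint32_t seq;     /* odd while an update is in progress */
	_Atomic uint64_t cycles;  /* illustrative payload fields */
	_Atomic uint64_t nsec;
};

struct clock_snapshot {
	uint64_t cycles;
	uint64_t nsec;
};

static void clock_page_init(struct clock_page *p)
{
	atomic_init(&p->seq, 0);
	atomic_init(&p->cycles, 0);
	atomic_init(&p->nsec, 0);
}

/* Writer side (single writer assumed): bump seq before and after. */
static void clock_write(struct clock_page *p, uint64_t cycles, uint64_t nsec)
{
	uint32_t s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed); /* odd */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->cycles, cycles, memory_order_relaxed);
	atomic_store_explicit(&p->nsec, nsec, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release); /* even */
}

/* Reader side: retry until the same even sequence brackets the read. */
static struct clock_snapshot clock_read(struct clock_page *p)
{
	struct clock_snapshot out;
	uint32_t seq1, seq2;

	for (;;) {
		seq1 = atomic_load_explicit(&p->seq, memory_order_acquire);
		if (seq1 & 1)
			continue;	/* update in progress, try again */

		out.cycles = atomic_load_explicit(&p->cycles,
						  memory_order_relaxed);
		out.nsec = atomic_load_explicit(&p->nsec,
						memory_order_relaxed);

		atomic_thread_fence(memory_order_acquire);
		seq2 = atomic_load_explicit(&p->seq, memory_order_relaxed);
		if (seq1 == seq2)
			return out;	/* consistent snapshot */
	}
}
```

The reader never blocks the writer; it only repeats the read when an update overlapped it, which suits a page that is written rarely and read often.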

12 Jan, 2018 1 commit

net/mlx5: Fix get vector affinity helper function · 05e0cc84

Authored by Saeed Mahameed on Jan 04, 2018

mlx5_get_vector_affinity used to call pci_irq_get_affinity. After
reverting the patch that sets the device affinity via the PCI_IRQ_AFFINITY
API, calling pci_irq_get_affinity becomes useless and breaks RDMA
mlx5 users. To fix this, this patch provides an alternative way to
retrieve IRQ vector affinity using the legacy IRQ API, following the
smp_affinity read procfs implementation.

Fixes: 231243c8 ("Revert mlx5: move affinity hints assignments to generic code")
Fixes: a435393a ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

05e0cc84

09 Jan, 2018 5 commits

{net, IB}/mlx5: Change set_roce_gid to take a port number · cfe4e37f

Authored by Daniel Jurgens on Jan 04, 2018

In dual-port mode, setting a RoCE GID for any port flows through the
master port's mlx5_core_dev. Provide an interface to set the port when
sending this command.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

cfe4e37f

{net, IB}/mlx5: Manage port association for multiport RoCE · 32f69e4b

Authored by Daniel Jurgens on Jan 04, 2018

When mlx5_ib_add is called, determine if the mlx5 core device being
added is capable of dual port RoCE operation. If it is, determine
whether it is a master device or a slave device using the
num_vhca_ports and affiliate_nic_vport_criteria capabilities.

If the device is a slave, attempt to find a master device to affiliate it
with. Devices that can be affiliated will share a system image guid. If
none are found, place it on a list of unaffiliated ports. If a master is
found, bind the port to it by configuring the port affiliation in the NIC
vport context.

Similarly, when mlx5_ib_remove is called, determine the port type. If it's
a slave port, unaffiliate it from the master device; otherwise just
remove it from the unaffiliated port list.

The IB device is registered as a multiport device, even if a 2nd port is
not available for affiliation. When the 2nd port is affiliated later, the
GID cache must be refreshed in order to get the default GIDs for the 2nd
port in the cache. Export roce_rescan_device to provide a mechanism to
refresh the cache after a new port is bound.

In a multiport configuration, all IB object (QP, MR, PD, etc.) related
commands should flow through the master mlx5_core_dev; other commands
must be sent to the slave port mlx5_core_mdev. An interface is provided
to get the correct mdev for non-IB-object commands.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

32f69e4b

IB/mlx5: Make netdev notifications multiport capable · 7fd8aefb

Authored by Daniel Jurgens on Jan 04, 2018

When multiple RoCE ports are supported, registration for events on
multiple netdevs is required. Refactor the event registration and
handling to support multiple ports.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

7fd8aefb

net/mlx5: Fix race for multiple RoCE enable · 734dc065

Authored by Daniel Jurgens on Jan 04, 2018

There are two potential problems with the existing implementation.

1. Enable and disable can race after the atomic operations.
2. If a command fails the refcount is left in an inconsistent state.

Introduce a lock and perform error checking.

Fixes: a6f7d2af ("net/mlx5: Add support for multiple RoCE enable")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

734dc065
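
The two problems listed in the commit above, and the "lock plus error checking" remedy, can be sketched as a reference-counted enable that only touches the count when the hardware command succeeds. Everything named here (`struct roce_state`, `roce_enable`, `roce_disable`, and the fake hooks `hw_ok` and `hw_fail`) is hypothetical, and a C11 `atomic_flag` stands in for whatever lock the driver uses.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical hardware hook: enable != 0 turns RoCE on. Returns 0 on success. */
typedef int (*hw_cmd_fn)(int enable);

struct roce_state {
	atomic_flag lock;      /* serializes enable and disable */
	int enabled_count;     /* number of users that enabled RoCE */
};

static void roce_state_init(struct roce_state *st)
{
	atomic_flag_clear(&st->lock);
	st->enabled_count = 0;
}

static void roce_lock(struct roce_state *st)
{
	while (atomic_flag_test_and_set_explicit(&st->lock,
						 memory_order_acquire))
		;	/* spin */
}

static void roce_unlock(struct roce_state *st)
{
	atomic_flag_clear_explicit(&st->lock, memory_order_release);
}

/* First enable programs the hardware; the count moves only on success. */
static int roce_enable(struct roce_state *st, hw_cmd_fn hw)
{
	int err = 0;

	roce_lock(st);
	if (st->enabled_count == 0)
		err = hw(1);
	if (!err)
		st->enabled_count++;
	roce_unlock(st);
	return err;
}

/* Last disable programs the hardware; a failed command keeps the count. */
static int roce_disable(struct roce_state *st, hw_cmd_fn hw)
{
	int err = 0;

	roce_lock(st);
	if (st->enabled_count == 0)
		err = -1;		/* unbalanced disable */
	else if (st->enabled_count == 1)
		err = hw(0);
	if (!err)
		st->enabled_count--;
	roce_unlock(st);
	return err;
}

/* Fake hooks used only to exercise the sketch. */
static int hw_ok(int enable)   { (void)enable; return 0; }
static int hw_fail(int enable) { (void)enable; return -1; }
```

Holding the lock across both the hardware command and the count update is what closes the window in which an enable and a disable could interleave.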

net/mlx5: Add DCT command interface · 57cda166

Authored by Moni Shoua on Jan 02, 2018

Add a missing command interface to work with a DCT. It includes creating
a DCT, destroying it, and getting events for it.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

57cda166

29 Dec, 2017 1 commit

IB/mlx5: Extend UAR stuff to support dynamic allocation · 31a78a5a

Authored by Yishai Hadas on Dec 24, 2017

This patch extends the alloc context flow to be prepared for working
with dynamic UAR allocations.

Currently, upon alloc context, a fixed number of UARs is
allocated (named 'static allocation'), and a user application
has no option to ask for more or to control which UAR is used by which
QP.

In this patch the driver prepares its data structures to manage both the
static and the dynamic allocations, and lets the user driver know
the maximum number of dynamic blue-flame registers that are allowed.

Downstream patches from this series will enable the dynamic allocation
and the association as part of QP creation.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

31a78a5a

22 Dec, 2017 1 commit

IB/mlx5: Fix congestion counters in LAG mode · 71a0ff65

Authored by Majd Dibbiny on Dec 21, 2017

Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.

Fixes: e1f24a79 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

71a0ff65

20 Dec, 2017 2 commits

net/mlx5: Cleanup IRQs in case of unload failure · d6b2785c

Authored by Moshe Shemesh on Nov 21, 2017

When mlx5_stop_eqs fails to destroy any of the EQs, it returns with an error.
In such a failure flow the function returns without
releasing all EQ irqs, and then pci_free_irq_vectors fails.
Fix this by only warning on a destroy EQ failure and continuing to release
the other EQs and their irqs.

It fixes the following kernel trace:
kernel: kernel BUG at drivers/pci/msi.c:352!
...
...
kernel: Call Trace:
kernel: pci_disable_msix+0xd3/0x100
kernel: pci_free_irq_vectors+0xe/0x20
kernel: mlx5_load_one.isra.17+0x9f5/0xec0 [mlx5_core]

Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

d6b2785c

Revert "mlx5: move affinity hints assignments to generic code" · 231243c8

Authored by Saeed Mahameed on Nov 10, 2017

Before the offending commit, mlx5 core did the IRQ affinity itself.
It seems that the new generic code has some drawbacks; one
of them is that the user cannot modify irq affinity after
the initial affinity values are assigned.

The issue is still being discussed and a solution in the new generic code
is required; until then we need to revert this patch.

This fixes the following issue:
echo <new affinity> > /proc/irq/<x>/smp_affinity
fails with  -EIO

This reverts commit a435393a.
Note: kept mlx5_get_vector_affinity in include/linux/mlx5/driver.h since
it is used in mlx5_ib driver.

Fixes: a435393a ("mlx5: move affinity hints assignments to generic code")
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jes Sorensen <jsorensen@fb.com>
Reported-by: Jes Sorensen <jsorensen@fb.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

231243c8

05 Nov, 2017 2 commits

net/mlx5: QPTS and QPDPM register firmware command support · 415a64aa

Authored by Huy Nguyen on Jul 18, 2017

The QPTS register allows changing the priority trust state between pcp and
dscp. Add support to get/set the trust state from the device. When the port is
in pcp/dscp trust state, a packet is routed by hardware to the matching priority
based on its pcp/dscp value respectively.

The QPDPM register allows changing the dscp to priority mapping. Add support
to get/set the dscp to priority mapping from the device.
Note that to change a dscp mapping, the "e" bit of this dscp structure
must be set in the QPDPM firmware command.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

415a64aa

net/mlx5: QCAM register firmware command support · c02762eb

Authored by Huy Nguyen on Jul 18, 2017

The QCAM register provides capability bits for all the QoS registers
using the ACCESS_REG command.
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

c02762eb

15 Oct, 2017 1 commit

net/mlx5: PTP code migration to driver core section · 7c39afb3

Authored by Feras Daoud on Aug 15, 2017

PTP code is moved to core section of mlx5 driver in order to share
it between ethernet and infiniband. This movement involves the following
changes:
- Change mlx5e_ prefix to be mlx5_
- Add clock structs to Core
- Add clock object to mlx5_core_dev
- Call Init/Uninit clock from core init/cleanup
- Rename mlx5e_tstamp to be mlx5_clock
Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

7c39afb3

28 Sep, 2017 1 commit

net/mlx5: Fix FPGA capability location · 99d3cd27

Authored by Inbar Karmy on Aug 24, 2017

Currently, the FPGA capability is located in (mdev)->caps.hca_cur.
Change the location to (mdev)->caps.fpga,
since hca_cur is reserved for HCA device capabilities.

Fixes: e29341fb ("net/mlx5: FPGA, Add basic support for Innova")
Signed-off-by: Inbar Karmy <inbark@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

99d3cd27

31 Aug, 2017 2 commits

net/mlx5: Remove the flag MLX5_INTERFACE_STATE_SHUTDOWN · 10a8d007

Authored by Huy Nguyen on Aug 09, 2017

MLX5_INTERFACE_STATE_SHUTDOWN is not used in the code.

Fixes: 5fc7197d ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

10a8d007

net/mlx5: Skip mlx5_unload_one if mlx5_load_one fails · b3cb5388

Authored by Huy Nguyen on Aug 08, 2017

There is an issue where the firmware fails during mlx5_load_one and
the health_care timer detects the issue and schedules a health_care call.
mlx5_load_one then detects the issue, cleans up and quits. After that,
health_care starts and calls mlx5_unload_one to clean up resources
that no longer exist, which causes a kernel panic.

The root cause is that the bit MLX5_INTERFACE_STATE_DOWN is not set
after mlx5_load_one fails. The solution is to remove the bit
MLX5_INTERFACE_STATE_DOWN and quit mlx5_unload_one if the
bit MLX5_INTERFACE_STATE_UP is not set. The bit MLX5_INTERFACE_STATE_DOWN
is redundant and we can use MLX5_INTERFACE_STATE_UP instead.

Fixes: 5fc7197d ("net/mlx5: Add pci shutdown callback")
Signed-off-by: Huy Nguyen <huyn@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

b3cb5388
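
The single-flag approach described in the commit above might look like the sketch below: a failed load never sets the UP bit, so a later unload request finds nothing to tear down. `struct dev`, `load_one`, `unload_one` and `INTERFACE_STATE_UP` are hypothetical stand-ins for the driver's device structure, mlx5_load_one, mlx5_unload_one and MLX5_INTERFACE_STATE_UP, and the locking the driver needs around the state word is omitted for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for MLX5_INTERFACE_STATE_UP; the bit position is illustrative. */
#define INTERFACE_STATE_UP (1UL << 0)

struct dev {
	unsigned long intf_state;
	bool resources_held;   /* models everything load acquires */
};

/* Returns 0 on success. On failure it cleans up and leaves UP clear. */
static int load_one(struct dev *d, bool firmware_ok)
{
	d->resources_held = true;
	if (!firmware_ok) {
		d->resources_held = false;	/* undo partial setup */
		return -1;
	}
	d->intf_state |= INTERFACE_STATE_UP;
	return 0;
}

/* Returns true if a teardown actually happened. */
static bool unload_one(struct dev *d)
{
	if (!(d->intf_state & INTERFACE_STATE_UP))
		return false;	/* load failed or already unloaded: skip */

	assert(d->resources_held);
	d->resources_held = false;
	d->intf_state &= ~INTERFACE_STATE_UP;
	return true;
}
```

A health-check path that calls `unload_one` after a failed `load_one` then becomes a no-op instead of freeing resources a second time.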

29 Aug, 2017 1 commit

net/mlx5: Add XRQ support · 5b3ec3fc

Authored by Artemy Kovalyov on Aug 17, 2017

Add support for the new XRQ (eXtended shared Receive Queue)
hardware object. It supports SRQ semantics with the addition
of extended receive buffer topologies and offloads.

Currently supports the tag matching topology and rendezvous offload.
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5b3ec3fc

25 Aug, 2017 1 commit

IB/mlx5: Enable UMR for MRs created with reg_create · 8b7ff7f3

Authored by Ilya Lesokhin on Aug 17, 2017

This patch is the first step in decoupling UMR usage and
allocation from the MR cache. The only functional change
in this patch is to enable UMR for MRs created with
reg_create.

This change fixes a bug where ODP memory regions that
were not allocated from the MR cache did not have UMR
enabled.
Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

8b7ff7f3

24 Aug, 2017 1 commit

net/mlx5: Remove a leftover unused variable · 07533c67

Authored by Gal Pressman on Aug 21, 2017

mlx5_core_wq is no longer being used and should be removed
from the code.
Signed-off-by: Gal Pressman <galp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

07533c67

23 Aug, 2017 1 commit

mlx5: Replace PCI pool old API · 18c90df9

Authored by Romain Perier on Aug 22, 2017

The PCI pool API is deprecated. This commit replaces the old PCI pool
API with the appropriate functions from the DMA pool API.
Signed-off-by: Romain Perier <romain.perier@collabora.com>
Reviewed-by: Peter Senna Tschudin <peter.senna@collabora.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Tested-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

18c90df9

09 Aug, 2017 2 commits

mlx5: move affinity hints assignments to generic code · a435393a

Authored by Sagi Grimberg on Jul 13, 2017

The generic API takes care of spreading affinity similar to
what mlx5 open coded (and even handles asymmetric
configurations better). Ask the generic API to spread affinity
for us, and feed it pre_vectors that do not participate
in affinity settings (which is an improvement over what we
had before).

The affinity assignments should match what mlx5 tried to
do earlier, but now we do not set affinity for the async, cmd
and pages dedicated vectors.

Also, remove mlx5e_get_cpu and introduce mlx5e_get_node
(used for allocation purposes) and mlx5_get_vector_affinity
(for indirection table construction) as they provide the needed
information. Luckily, we have generic helpers to get the cpumask
and node given an irq vector. mlx5_get_vector_affinity will
be used by mlx5_ib in a subsequent patch.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a435393a

mlx5: convert to generic pci_alloc_irq_vectors · 78249c42

Authored by Sagi Grimberg on Jul 13, 2017

Now that we have generic code to allocate an array
of irq vectors, correctly spread their affinity,
correctly handle cpu hotplug events and more, we're much
better off using it.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>

78249c42

07 Aug, 2017 2 commits

net/mlx5: Delay events till ib registration ends · 97834eba

Authored by Erez Shitrit on Jun 07, 2017

When mlx5_ib registers itself with mlx5_core as an interface, it
calls mlx5_add_device, which calls the mlx5_ib interface add callback.
Only if the latter returns successfully does mlx5_core add
it to the interface list, after which async events are forwarded to mlx5_ib.
Between the mlx5_ib interface add callback and mlx5_core adding the mlx5_ib
interface to its devices list, arriving mlx5_core events can be missed
by the newly registering mlx5_ib interface.
In other words:
thread 1: mlx5_ib: mlx5_register_interface(dev)
thread 1: mlx5_core: mlx5_add_device(dev)
thread 1: mlx5_core: ctx = dev->add => (mlx5_ib)->mlx5_ib_add
thread 2: mlx5_core_event: **new event arrives, forward to dev_list
thread 1: mlx5_core: add_ctx_to_dev_list(ctx)
/* previous event was missed by the new interface.*/
It is ok to miss events before dev->add (mlx5_ib)->mlx5_ib_add_device
but not after.

We fix this race by accumulating the events that come between the
ib_register_device (inside mlx5_add_device->(dev->add)) till the adding
to the list completes and fire them to the new registering interface
after that.

Fixes: f1ee87fe ("net/mlx5: Organize device list API in one place")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

97834eba

net/mlx5: Separate between E-Switch and MPFS · eeb66cdb

Authored by Saeed Mahameed on Jun 04, 2017

Multi-Physical Function Switch (MPFS) is required when a multi-PF
configuration is enabled, to allow passing user-configured unicast MAC
addresses to the requesting PF.

Before this patch eswitch.c used to manage the HW MPFS L2 table.
E-Switch always (regardless of SRIOV) enabled vport(0) (NIC PF) vport
context updates on unicast MAC address list changes, to populate the PF's
MPFS L2 table accordingly.

In a downstream patch we would like to allow compiling the driver without
E-Switch functionality. For that we move the MPFS L2 table logic out
of eswitch.c into its own file, and provide a Kconfig flag (MLX5_MPFS) to
allow compiling out MPFS for those who don't want Multi-PF support.

The NIC PF netdevice will now directly update the MPFS L2 table via the new MPFS
API. A VF netdevice has no access to the MPFS L2 table, so E-Switch will remain
responsible for updating its MPFS L2 table on behalf of its VFs.

Due to this change we also no longer need to enable vport(0) (PF vport)
unicast MAC change events when SRIOV is not enabled.
This means E-Switch is now activated only on SRIOV activation, and is not
required otherwise.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jes Sorensen <jsorensen@fb.com>
Cc: kernel-team@fb.com

eeb66cdb

24 Jul, 2017 2 commits

net/mlx5: Introduce general notification event · 246ac981

Authored by Maor Gottlieb on May 30, 2017

When the delay drop timeout expires, the firmware raises a
general notification event of the DELAY_DROP_TIMEOUT subtype.
In addition, the feature is disabled, so the driver has to
reactivate the timeout.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

246ac981

IB/mlx5: Restore IB guid/policy for virtual functions · 7ecf6d8f

Authored by Bodong Wang on May 30, 2017

When a user sets port_guid, node_guid or policy of an IB virtual
function, save this information in "struct mlx5_vf_context".

This information will be restored later when pci_resume is called.
To make sure this works, one can use aer-inject to generate PCI
errors on mlx5 devices and verify if relevant fields are restored
after PCI resume.
Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

7ecf6d8f

27 Jun, 2017 4 commits

net/mlx5: FPGA, Add SBU infrastructure · a9956d35

Authored by Ilan Tayari on Apr 18, 2017

Add interface to initialize and interact with Innova FPGA SBU
connections.
A client driver may use these functions to set up a high-speed DMA
connection with its SBU hardware logic, and send/receive messages
over this connection.

A later patch in this patchset will make use of these functions for
Innova IPSec offload in mlx5 Ethernet driver.

Add commands to retrieve Innova FPGA SBU capabilities, and to
read/write Innova FPGA configuration space registers and memory,
over internal I2C.

At a high level, the FPGA configuration space is divided as follows:
 0x00000000 - 0x007fffff is reserved for the SBU
 0x00800000 - 0xffffffff is reserved for the Shell
0x400000000 - ...        is DDR memory

A later patchset will add support for accessing FPGA CrSpace and memory
over a high-speed connection. This is the reason for the ACCESS_TYPE
enumeration, which currently only supports I2C.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

a9956d35
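
The address map given in the commit above can be expressed as a small classifier. The enum and function names are hypothetical, and the range between 0x100000000 and 0x3ffffffff is reported as unspecified because the commit message does not describe it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical region labels for the FPGA configuration space. */
enum fpga_region {
	FPGA_REGION_SBU,          /* 0x00000000 - 0x007fffff */
	FPGA_REGION_SHELL,        /* 0x00800000 - 0xffffffff */
	FPGA_REGION_DDR,          /* 0x400000000 and above */
	FPGA_REGION_UNSPECIFIED   /* not covered by the commit message */
};

/* Classify an address according to the ranges listed in the commit. */
static enum fpga_region fpga_addr_region(uint64_t addr)
{
	if (addr <= UINT64_C(0x007fffff))
		return FPGA_REGION_SBU;
	if (addr <= UINT64_C(0xffffffff))
		return FPGA_REGION_SHELL;
	if (addr >= UINT64_C(0x400000000))
		return FPGA_REGION_DDR;
	return FPGA_REGION_UNSPECIFIED;
}
```

A caller could use such a check to reject accesses that fall outside the documented ranges before issuing a read or write.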

net/mlx5: Add support for multiple RoCE enable · a6f7d2af

Authored by Ilan Tayari on Mar 26, 2017

Previously, only mlx5_ib enabled RoCE on the port, but FPGA needs it as
well.
Add support for counting number of enables, so that FPGA and IB can work
in parallel and independently.
Program the HW to enable RoCE on the first enable call, and program to
disable RoCE on the last disable call.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

a6f7d2af

net/mlx5: Add reserved-gids support · 52ec462e

Authored by Ilan Tayari on Mar 26, 2017

Reserved GIDs are entries in the GID table in use by mlx5_core
and its submodules (e.g. FPGA, SRIOV, E-Switch, netdev).
The entries are reserved at the high indexes of the GID table.

A mlx5 submodule may reserve a certain amount of GIDs for its own use
during the load sequence by calling mlx5_core_reserve_gids, and must
also take care to un-reserve these GIDs when it closes.
Reservation is only allowed during the load sequence and before any
interfaces (e.g. mlx5_ib or mlx5_en) are up.

After reservation, a submodule may call mlx5_core_reserved_gid_alloc/
free to allocate entries from the reserved GIDs pool.

Reserve a GID table entry for every supported FPGA QP.

A later patch in the patchset will remove them from being reported to
IB core.
Another such patch will make use of these for FPGA QPs in Innova NIC.

Added lib/mlx5.h to serve as a library for mlx5 submodules, and to
expose only the public mlx5 API. More mlx5 library files will be added in
future submissions.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

52ec462e

net/mlx5: Cancel delayed recovery work when unloading the driver · 2a0165a0

Authored by Mohamad Haj Yahia on Mar 30, 2017

Draining the health workqueue will ignore future health works, including
the one that reports hardware failure, and thus we can't enter error state.
Instead, cancel the recovery flow and make sure only the recovery flow won't
be scheduled.

Fixes: 5e44fca5 ('net/mlx5: Only cancel recovery work when cleaning up device')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

2a0165a0

22 Jun, 2017 1 commit

net/mlx5: Add MCC (Management Component Control) register definitions · 47176289

Authored by Or Gerlitz on Apr 18, 2017

MCC (Management Component Control) allows controlling a firmware
component update.

MCDA (Management Component Data Access) allows reading and writing
a firmware component.

MCQI (Management Component Query Information) allows querying
information about firmware components.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

47176289

16 Jun, 2017 1 commit

net/mlx5: Expose command polling interface · 4525abea

Authored by Majd Dibbiny on Feb 09, 2017

Add a new interface for commands execution that allows the
caller to wait for the command's completion in a busy-wait
loop (polling mode).

This is useful if we want to execute a command in a polling mode
while the driver is working in events mode for the rest of
the commands.
This interface will be used in the downstream patches.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

4525abea

23 May, 2017 1 commit

net/mlx5: Avoid using pending command interface slots · 73dd3a48

Authored by Mohamad Haj Yahia on Feb 23, 2017

Currently, when a firmware command gets stuck or takes a long time to
complete, the driver command times out, and the command slot is
freed and can be used for new commands. If the firmware receives a new
command on the old busy slot, its behavior is unpredictable and could
be harmful.
To fix this, when the driver command times out we return failure,
but we don't free the command slot; we wait for the firmware to
explicitly respond to that command.
Once all the entries are busy, we stop processing new firmware
commands.

Fixes: 9cba4ebc ('net/mlx5: Fix potential deadlock in command mode change')
Signed-off-by: Mohamad Haj Yahia <mohamad@mellanox.com>
Cc: kernel-team@fb.com
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

73dd3a48

14 May, 2017 1 commit

net/mlx5: FPGA, Add basic support for Innova · e29341fb

Authored by Ilan Tayari on Mar 13, 2017

Mellanox Innova is a NIC with ConnectX and an FPGA on the same
board. The FPGA is a bump-on-the-wire and thus affects operation of
the mlx5_core driver on the ConnectX ASIC.

Add basic support for Innova in mlx5_core.

This allows using the Innova card as a regular NIC, by detecting
the FPGA capability bit, and verifying its load state before
initializing ConnectX interfaces.

Also detect FPGA fatal runtime failures and enter error state if
they ever happen.

All new FPGA-related logic is placed in its own subdirectory 'fpga',
which may be built by selecting CONFIG_MLX5_FPGA.
This prepares for further support of various Innova features in later
patchsets.
Additional details about hardware architecture will be provided as
more features get submitted.
Signed-off-by: Ilan Tayari <ilant@mellanox.com>
Reviewed-by: Boris Pismenny <borisp@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

e29341fb
