- 28 Sep 2021, 1 commit
-
-
Committed by Meir Lichtinger

A UID field was added to the alloc_uar and dealloc_uar PRM commands to specify the DevX UID for the UAR. This change enables the firmware to validate user access to its own UAR resources. For kernel-allocated UARs the UID stays 0, as it is today.

Signed-off-by: Meir Lichtinger <meirl@nvidia.com>
Reviewed-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
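Below is a minimal sketch of how such a UID could be carried in the command payload, using the standard mlx5_ifc accessors. The uid field name in the alloc_uar layout and the helper name are assumptions, not a verified excerpt:

```c
/* Hedged sketch: pass a DevX UID through ALLOC_UAR; kernel-owned UARs
 * keep uid == 0, matching the behavior described above. */
int mlx5_cmd_uar_alloc(struct mlx5_core_dev *dev, u32 *uarn, u16 uid)
{
	u32 out[MLX5_ST_SZ_DW(alloc_uar_out)] = {};
	u32 in[MLX5_ST_SZ_DW(alloc_uar_in)] = {};
	int err;

	MLX5_SET(alloc_uar_in, in, opcode, MLX5_CMD_OP_ALLOC_UAR);
	MLX5_SET(alloc_uar_in, in, uid, uid);
	err = mlx5_cmd_exec_inout(dev, alloc_uar, in, out);
	if (err)
		return err;

	*uarn = MLX5_GET(alloc_uar_out, out, uar);
	return 0;
}
```

With the UID in the command, the firmware can reject a DevX user touching a UAR it does not own.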
-
- 12 Aug 2021, 4 commits
-
-
Committed by Parav Pandit

Currently mlx5_core_dev contains an array of capabilities: 19 valid device capabilities, 2 reserved entries and 12 holes. Because of this, the 14 unused entries make mlx5_core_dev allocate 14 * 8K = 112K bytes of memory that is never used, the structure weighs in at roughly 270K bytes, and the allocation is further aligned up to the next power of two, 512K bytes.

By skipping non-existent entries:
(a) 112K bytes are saved,
(b) mlx5_core_dev reduces to 8KB with alignment,
(c) 350KB are saved in alignment.

In the future, individual capability allocation can be used to skip allocating a capability that is disabled at the device level. This patch prepares mlx5_core_dev to hold capabilities through a pointer instead of an inline array.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
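A hedged sketch of the direction, with names modeled on mlx5 conventions rather than the verified upstream layout; the mlx5_hca_cap unit itself is shown under the next entry:

```c
/* Hedged sketch: capabilities held by pointer so unused entries cost
 * no memory; holes in the capability space simply stay NULL. */
struct mlx5_core_caps {
	struct mlx5_hca_cap *hca[MLX5_CAP_NUM];
};

static int mlx5_hca_cap_alloc(struct mlx5_core_dev *dev, int cap_type)
{
	struct mlx5_hca_cap *cap;

	if (dev->caps.hca[cap_type])
		return 0;	/* already allocated */

	cap = kzalloc(sizeof(*cap), GFP_KERNEL);
	if (!cap)
		return -ENOMEM;

	dev->caps.hca[cap_type] = cap;
	return 0;
}
```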
-
Committed by Parav Pandit

In the current code, the current and maximal capabilities are maintained in separate arrays, both per type. To allow such a basic structure to become a dynamically allocated array, move the curr and max fields into a unified structure so that a specific capability can be allocated as one unit.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
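A hedged sketch of the unified unit; the exact macro usage is an assumption based on the mlx5_ifc conventions:

```c
/* Hedged sketch: current and maximal values kept together so one
 * capability can be allocated and freed as a single unit. */
struct mlx5_hca_cap {
	u32 cur[MLX5_UN_SZ_DW(hca_cap_union)];
	u32 max[MLX5_UN_SZ_DW(hca_cap_union)];
};
```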
-
Committed by Leon Romanovsky

The new mlx5_core device structure is allocated through devlink_alloc() with kzalloc(), which ensures that all fields start at zero, ->state included. That makes the checks of that field in mlx5_init_one() completely redundant, because the function is called only once, at the beginning of the mlx5_core_dev lifetime:

  PCI: .probe() -> probe_one() -> mlx5_init_one()

The recovery flow can't run at that time or before it, because the relevant work is initialized later, in mlx5_init_once(). This initialization flow ensures that dev->state can't be MLX5_DEVICE_STATE_UNINITIALIZED at all, so remove the impossible checks.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Committed by Cai Huoqing

Fix typos:

  vectores   ==> vectors
  realeased  ==> released
  erros      ==> errors
  namepsace  ==> namespace
  trafic     ==> traffic
  proccessed ==> processed
  retore     ==> restore
  Currenlty  ==> Currently
  crated     ==> created
  chane      ==> change
  cannnot    ==> cannot
  usuallly   ==> usually
  failes     ==> fails
  importent  ==> important
  reenabled  ==> re-enabled
  alocation  ==> allocation
  recived    ==> received
  tanslation ==> translation

Signed-off-by: Cai Huoqing <caihuoqing@baidu.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 10 Aug 2021, 1 commit
-
-
Committed by Shay Drory

CQ destroy is performed based on the IRQ number stored in cq->irqn. That number wasn't set explicitly during CQ creation and, as might be expected, some users of the mlx5_core_create_cq() API forgot to update it. This caused the synchronization call to target the wrong IRQ, number 0, instead of the real one. As a fix, set the IRQ number directly in mlx5_core_create_cq() and update all users accordingly.

Fixes: 1a86b377 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices")
Fixes: ef1659ad ("IB/mlx5: Add DEVX support for CQ events")
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
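A hedged sketch of the fix direction inside mlx5_core_create_cq(); the EQ lookup helper and field names are assumptions based on surrounding mlx5 code, not the verified diff:

```c
	/* Resolve the completion EQ named in the command payload and record
	 * its IRQ on the CQ, so the destroy path synchronizes the right
	 * vector instead of IRQ 0. */
	eqn = MLX5_GET(cqc, MLX5_ADDR_OF(create_cq_in, in, cq_context), c_eqn);
	eq = mlx5_eqn2comp_eq(dev, eqn);	/* assumed lookup helper */
	if (IS_ERR(eq))
		return PTR_ERR(eq);

	/* ... execute MLX5_CMD_OP_CREATE_CQ ... */

	cq->eq = eq;
	cq->irqn = eq->core.irqn;	/* callers no longer have to set this */
```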
-
- 06 Aug 2021, 1 commit
-
-
Committed by Mark Bloch

Since shared FDB requires changes in two subsystems, first expose the needed core functions so the RDMA side can be changed:

mlx5_lag_is_master(): returns true if a given mlx5 device is the lag master.
mlx5_lag_is_shared_fdb(): returns true if the lag mode is shared FDB.
mlx5_lag_get_peer_mdev(): returns the peer mdev in the lag.

These functions will be used by downstream patches to add shared FDB support on the RDMA side.

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Reviewed-by: Mark Zhang <markzhang@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
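As a sketch, the exported surface implied by the description (exact declarations unverified):

```c
/* Hedged sketch of the lag helpers exposed to the RDMA side. */
bool mlx5_lag_is_master(struct mlx5_core_dev *dev);
bool mlx5_lag_is_shared_fdb(struct mlx5_core_dev *dev);
struct mlx5_core_dev *mlx5_lag_get_peer_mdev(struct mlx5_core_dev *dev);
```

An RDMA-side caller would typically gate its dual-port setup on mlx5_lag_is_master() and reach the second port through mlx5_lag_get_peer_mdev().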
-
- 17 Jun 2021, 1 commit
-
-
Committed by Dmytro Linkin

Running the devlink reload command for a port in switchdev mode corrupts resources: the driver can't release the allocated EQ and reclaim memory pages, because the "rdma" auxiliary device has added CQs which block the EQ from deletion. The erroneous sequence happens during the reload-down phase:

1. detach device - suspends auxiliary devices which support it, destroys the others. During this step "eth-rep" and "rdma-rep" are destroyed and "eth" is suspended.
2. disable SRIOV - moves the device to legacy mode; as part of disablement, rescans drivers. This step adds the "rdma" auxiliary device.
3. destroy EQ table - <failure>.

The driver shouldn't create any device during unload flows. To handle that, implement an MLX5_PRIV_FLAGS_DETACH flag: set it on device detach and clear it on device attach. If the flag is set, the drivers rescan is a no-op.

Fixes: a925b5e3 ("net/mlx5: Register mlx5 devices to auxiliary virtual bus")
Signed-off-by: Dmytro Linkin <dlinkin@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
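A hedged sketch of the flag handling described above (function placement assumed):

```c
/* Hedged sketch: mark the device detached so a drivers rescan during
 * unload flows becomes a no-op and cannot re-add aux devices. */
void mlx5_detach_device(struct mlx5_core_dev *dev)
{
	struct mlx5_priv *priv = &dev->priv;

	/* ... suspend/destroy auxiliary devices ... */
	priv->flags |= MLX5_PRIV_FLAGS_DETACH;
}

int mlx5_rescan_drivers_locked(struct mlx5_core_dev *dev)
{
	struct mlx5_priv *priv = &dev->priv;

	if (priv->flags & MLX5_PRIV_FLAGS_DETACH)
		return 0;	/* device is mid-detach: do nothing */

	/* ... normal rescan of auxiliary devices ... */
	return 0;
}
```

The attach path would clear the flag again before re-adding devices.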
-
- 28 May 2021, 1 commit
-
-
Committed by Paul Blakey

The firmware FT pool is per device, but the software tracking of this pool only serves fs_chains users. If another layer takes a flow table, the pool is not updated, and fs_chains will fail to create a flow table, with no recovery until that flow table is returned. Move the FT pool to be global per device and store it at the cmd level, so all layers can use it.

Signed-off-by: Paul Blakey <paulb@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
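A hedged sketch of what a device-global pool interface could look like (names modeled on the mlx5 flow-steering code, unverified):

```c
/* Hedged sketch: one FT pool per device, usable by every layer. */
int mlx5_ft_pool_init(struct mlx5_core_dev *dev);
void mlx5_ft_pool_destroy(struct mlx5_core_dev *dev);

/* Reserve the largest available table size <= desired_size. */
int mlx5_ft_pool_get_avail_sz(struct mlx5_core_dev *dev,
			      enum fs_flow_table_type table_type,
			      int desired_size);
void mlx5_ft_pool_put_sz(struct mlx5_core_dev *dev, int sz);
```

Because every flow-table create and destroy goes through the cmd layer, accounting there keeps fs_chains and other users in sync.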
-
- 19 May 2021, 1 commit
-
-
Committed by Maor Gottlieb

mlx5_core_dev holds a pointer to a static profile, so when the profile's log_max_qp is overridden by one device, it affects all other mlx5 devices sharing the same profile. Fix it by giving every mlx5 device its own profile instance.

Fixes: 883371c4 ("net/mlx5: Check FW limitations on log_max_qp before setting it")
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
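A hedged sketch of the per-device copy (member placement assumed):

```c
/* Hedged sketch: embed a profile copy instead of pointing at the
 * shared static table, so a log_max_qp override stays local. */
struct mlx5_core_dev {
	/* ... */
	struct mlx5_profile profile;	/* was: struct mlx5_profile * */
};

static int mlx5_mdev_init(struct mlx5_core_dev *dev, int profile_idx)
{
	dev->profile = profile[profile_idx];	/* per-device instance */
	/* ... */
	return 0;
}
```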
-
- 17 Apr 2021, 1 commit
-
-
Committed by Moshe Tal

Add the needed structure layouts and defines for the PDDR register (Port Diagnostics Database Register) and its troubleshooting page. This will be used to get the extended link state from the monitor opcode bits.

Signed-off-by: Moshe Tal <moshet@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
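A hedged sketch in mlx5_ifc style; the widths and names below are illustrative assumptions, not the PRM-accurate layout:

```c
/* Hedged sketch of a PDDR troubleshooting-page layout. */
struct mlx5_ifc_pddr_troubleshooting_page_bits {
	u8	reserved_at_0[0x10];
	u8	group_opcode[0x10];	/* selects the monitor opcode group */
	u8	status_opcode[0x20];	/* source of the extended link state */
	u8	reserved_at_40[0x7c0];
};
```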
-
- 03 Apr 2021, 2 commits
-
-
Committed by Parav Pandit

A device supports 128 rate limiters. A static table allocation consumes 8KB of memory even when no rate is configured. Instead, allocate the table only when at least one rate is configured.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
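A hedged sketch of the lazy allocation (helper name and refcounting assumed):

```c
/* Hedged sketch: allocate the 128-entry table on first configured
 * rate instead of statically, saving 8KB on idle devices. */
static int mlx5_rl_table_get(struct mlx5_rl_table *table)
{
	int i;

	lockdep_assert_held(&table->rl_lock);

	if (table->rl_entry) {
		table->refcount++;
		return 0;
	}

	table->rl_entry = kcalloc(table->max_size, sizeof(*table->rl_entry),
				  GFP_KERNEL);
	if (!table->rl_entry)
		return -ENOMEM;

	/* index is 1-based toward the device rate-limit table */
	for (i = 0; i < table->max_size; i++)
		table->rl_entry[i].index = i + 1;

	table->refcount++;
	return 0;
}
```

A matching put helper would free the table again when the last configured rate goes away.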
-
Committed by Parav Pandit

The mlx5_rl_entry structure is not properly packed, as shown below. Because of this, an array of 9144 bytes is allocated, which aligns up to 16K bytes. Hence, pack the structure and avoid the waste. This saves 8K bytes per mlx5_core_dev struct.

pahole -C mlx5_rl_entry drivers/net/ethernet/mellanox/mlx5/core/en_main.o

Existing layout:

struct mlx5_rl_entry {
        u8                 rl_raw[48];     /*     0    48 */
        u16                index;          /*    48     2 */

        /* XXX 6 bytes hole, try to pack */

        u64                refcount;       /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        u16                uid;            /*    64     2 */
        u8                 dedicated:1;    /*    66: 0  1 */

        /* size: 72, cachelines: 2, members: 5 */
        /* sum members: 60, holes: 1, sum holes: 6 */
        /* sum bitfield members: 1 bits (0 bytes) */
        /* padding: 5 */
        /* bit_padding: 7 bits */
        /* last cacheline: 8 bytes */
};

After alignment:

struct mlx5_rl_entry {
        u8                 rl_raw[48];     /*     0    48 */
        u64                refcount;       /*    48     8 */
        u16                index;          /*    56     2 */
        u16                uid;            /*    58     2 */
        u8                 dedicated:1;    /*    60: 0  1 */

        /* size: 64, cachelines: 1, members: 5 */
        /* padding: 3 */
        /* bit_padding: 7 bits */
};

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 17 Mar 2021, 3 commits
-
-
Committed by Roi Dayan

When switching modes between legacy and switchdev and back, do not reload the ethernet interfaces; just change the profile from the nic profile to the uplink rep profile in switchdev mode.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Committed by Roi Dayan

We re-use the native NIC port net device instance for the uplink representor and the devlink port. When changing profiles we reset the mlx5e priv, but we should keep using the same devlink port, so move it to the mlx5e resources.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Committed by Roi Dayan

This separates resource attributes from other attributes we will want to use, as sketched below.

Signed-off-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
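A hedged sketch of where this and the previous change land; the exact field set is illustrative:

```c
/* Hedged sketch: HW-object attributes grouped apart from state that
 * must survive a profile change, such as the devlink port. */
struct mlx5e_resources {
	struct mlx5e_hw_objs {
		u32			pdn;
		struct mlx5_td		td;
		struct mlx5_core_mkey	mkey;
		struct mlx5_sq_bfreg	bfreg;
	} hw_objs;
	struct devlink_port		dl_port;
};
```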
-
- 13 Mar 2021, 2 commits
-
-
Committed by Tariq Toukan

Currently we allocate a high-order page for EQs. On a fragmented system, e.g. after VF hot remove/add in VMs, there may not be enough contiguous memory for the EQ allocation, which results in crashing the VM. Therefore, use order-0 fragments for the EQ allocations instead.

Performance tests:
ConnectX-5 100Gbps, CPU: Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz
The tests show no sensible degradation.

Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Shay Drory <shayd@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
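A hedged sketch of the allocation switch using the existing frag-buf helper; the size expression and member name are assumptions:

```c
	/* Hedged sketch: order-0 fragments instead of one high-order
	 * buffer, so a fragmented system can still bring up EQs. */
	err = mlx5_frag_buf_alloc_node(dev, nent * sizeof(struct mlx5_eqe),
				       &eq->frag_buf, dev->priv.numa_node);
	if (err)
		return err;

	init_eq_buf(eq);	/* EQEs are then addressed per fragment */
```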
-
Committed by Mikhael Goikhman

The code related to health->recover_work was removed in commit 63cbc552 ("net/mlx5: Handle SW reset of FW in error flow"). Fix struct mlx5_core_health accordingly.

Signed-off-by: Mikhael Goikhman <migo@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 12 Mar 2021, 1 commit
-
-
Committed by Maor Gottlieb

mlx5_is_roce_enabled returns the devlink RoCE init value, so it should be used only when the driver is loaded. Instead, we just need to read the roce_en field. In addition, rename mlx5_is_roce_enabled to mlx5_is_roce_init_enabled.

Fixes: 7a58779e ("IB/mlx5: Improve query port for representor port")
Link: https://lore.kernel.org/r/20210304124517.1100608-2-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
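A hedged sketch of the distinction; the devlink param helper is the generic one, the bodies are assumed, and mlx5_roce_cap_enabled is a hypothetical name for the direct read:

```c
/* Hedged sketch: devlink *init* value, meaningful only once the
 * driver owns the device ... */
static inline bool mlx5_is_roce_init_enabled(struct mlx5_core_dev *dev)
{
	union devlink_param_value val;
	int err;

	err = devlink_param_driverinit_value_get(priv_to_devlink(dev),
					DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
					&val);
	return err ? MLX5_CAP_GEN(dev, roce) : val.vbool;
}

/* ... versus the current capability, safe to read at any time. */
static inline bool mlx5_roce_cap_enabled(struct mlx5_core_dev *dev)
{
	return MLX5_CAP_GEN(dev, roce);
}
```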
-
- 17 Feb 2021, 2 commits
-
-
Committed by Eran Ben Elisha

Internal timer mode (SW clock) requires some PTP-clock-related metadata structs. Real-time mode (HW clock) does not need this metadata. The separation emphasizes the different interfaces of the HW clock and the SW clock.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
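A hedged sketch of the grouping, with the field set modeled on common timecounter usage (assumed, not the verified struct):

```c
/* Hedged sketch: SW-clock-only metadata pulled into its own struct,
 * which real-time (HW clock) mode can leave untouched. */
struct mlx5_timer {
	struct cyclecounter	cycles;
	struct timecounter	tc;
	u32			nominal_c_mult;
	unsigned long		overflow_period;
	struct delayed_work	overflow_work;
};

struct mlx5_clock {
	/* ... state common to both modes ... */
	struct mlx5_timer	timer;
};
```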
-
Committed by Eran Ben Elisha

Add the needed structure layouts and defines for the MTUTC (Management UTC) register. MTUTC will be used for the cyc2time HW translation. In addition, add the cyc2time modify capability bit and the init segment HCA real-time address. Finally, add capability bits indicating which time-stamping format is supported per SQ and RQ, and add ts_format to the queues' context layouts to allow configuration.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 11 Feb 2021, 1 commit
-
-
Committed by Leon Romanovsky

The device list is not stored in mlx5_priv anymore, so delete it as it is unused.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 09 Feb 2021, 1 commit
-
-
Committed by Yishai Hadas

Clean up the synchronize_srcu() from the ODP flow, as it was found to be a very heavy time consumer in dereg_mr. For example, de-registration of 10000 ODP MRs, each with a size of one 2M hugepage, took 19.6 sec, compared to 172 ms for the same number of non-ODP MRs. The new locking scheme uses the wait_event() mechanism, following the use count of the MR instead of using synchronize_srcu(). With that change, the above test took 95 ms, which is even better than the non-ODP flow. Once the srcu usage was fully dropped, a lock was needed to protect the XA access. As part of using the above mechanism we could also remove the num_deferred_work machinery and follow the use count instead.

Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
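A hedged sketch of the use-count scheme (struct and helper names assumed):

```c
/* Hedged sketch: readers hold a refcount; dereg waits for it to
 * drain instead of paying an SRCU grace period per MR. */
static inline void mlx5r_deref_odp_mkey(struct mlx5_core_mkey *mmkey)
{
	if (refcount_dec_and_test(&mmkey->usecount))
		wake_up(&mmkey->wait);
}

static inline void mlx5r_deref_wait_odp_mkey(struct mlx5_core_mkey *mmkey)
{
	wait_event(mmkey->wait, refcount_read(&mmkey->usecount) == 0);
}
```

Page-fault handlers would take the reference while they work, so dereg_mr only sleeps as long as real users exist, which explains the drop from 19.6 sec to 95 ms.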
-
- 06 Feb 2021, 1 commit
-
-
Committed by Parav Pandit

mlx5_port_caps are RDMA-specific capabilities, not used by mlx5_core_dev at all. Move them to mlx5_ib_dev, where they are used, narrowing their scope from multiple drivers to one.

Link: https://lore.kernel.org/r/20210203130133.4057329-2-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 28 Jan 2021, 2 commits
-
-
Committed by Aya Levin

In order to allow the mlx5 core driver to trigger synchronous operations in its consumers, add a blocking events handler: wrappers around blocking_notifier_[call_chain/chain_register/chain_unregister]. Add a trap callback for the action-set operation and notify about the change. The following patches in the set add a listener for this event.

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
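A hedged sketch of the wrappers (the notifier-head field name is assumed):

```c
/* Hedged sketch: a blocking chain lets the core run consumers
 * synchronously, unlike the atomic event chain. */
int mlx5_blocking_notifier_register(struct mlx5_core_dev *dev,
				    struct notifier_block *nb)
{
	struct mlx5_events *events = dev->priv.events;

	return blocking_notifier_chain_register(&events->sw_nh, nb);
}

int mlx5_blocking_notifier_call_chain(struct mlx5_core_dev *dev,
				      unsigned int event, void *data)
{
	struct mlx5_events *events = dev->priv.events;

	return blocking_notifier_call_chain(&events->sw_nh, event, data);
}
```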
-
Committed by Aya Levin

Add devlink traps infrastructure to the mlx5 core driver. Add a traps list to mlx5_priv and a corresponding API:

- mlx5_devlink_trap_report(), to wrap trap reports to devlink
- mlx5_devlink_trap_get_num_active(), to decide whether to open/close trap resources

Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
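A hedged sketch of the counting helper over the per-device trap list (list layout assumed):

```c
/* Hedged sketch: count traps whose action is TRAP; a non-zero count
 * means trap resources must be kept open. */
int mlx5_devlink_trap_get_num_active(struct mlx5_core_dev *dev)
{
	struct mlx5_devlink_trap *trap;
	int count = 0;

	list_for_each_entry(trap, &dev->priv.traps, list)
		if (trap->trap.action == DEVLINK_TRAP_ACTION_TRAP)
			count++;

	return count;
}
```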
-
- 23 Jan 2021, 4 commits
-
-
Committed by Parav Pandit

To handle SF port management outside of the eswitch as an independent software layer, introduce eswitch notifier APIs so that an mlx5 upper layer wishing to support SF port management in switchdev mode can perform its task whenever the eswitch mode is set to switchdev, or before the eswitch is disabled.

Initialize the SF port table on such an eswitch event. Add SF port add and delete functionality in switchdev mode. Destroy all SF ports when the eswitch is disabled. Expose SF port add and delete to the user via devlink commands:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

or by its unique port index:

$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Committed by Parav Pandit

Add an auxiliary device driver for the mlx5 subfunction auxiliary device. An mlx5 subfunction is similar to a PCI PF and VF. For a subfunction, an auxiliary device is created. As a result, when the mlx5 SF auxiliary device binds to the driver, its netdev and rdma device are created; they appear as:

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

In the future, the devlink device instance name will adapt to carry an sfnum annotation, either using an alias or as the devlink instance name, as described in RFC [1].

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Committed by Parav Pandit

Introduce an API to add and delete an auxiliary device for an SF. Each SF has its own dedicated window in PCI BAR 2. An SF device is similar to a PCI PF and VF in that it supports multiple classes of devices such as net, rdma and vdpa. The SF device will be added or removed, in a subsequent patch, during the SF devlink port function state change command.

A subfunction device exposes the user-supplied subfunction number, which will be used by systemd/udev to give its netdevice and rdma device deterministic names.

An mlx5 subfunction auxiliary device example:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active

On activation,

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Committed by Parav Pandit

vhca state events indicate a change in the state of the vhca, which may occur due to SF allocation or deallocation, or when the SF HCA is enabled or disabled. Introduce a vhca state event handler, which will be used by the SF devlink port manager and the SF hardware-id allocator in subsequent patches to act on the event. This enables a single entity to subscribe to, query, and re-arm the event for a function.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Vu Pham <vuhuong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 20 Jan 2021, 1 commit
-
-
Committed by Parav Pandit

This reverts commit fbdd0049. Due to the commit in the Fixes tag, netdevice events were received in only one net namespace of the mlx5_core_dev, so events arriving in any other net namespace were missed. This resulted in an empty GID table and the RDMA device being detached from its net device. Hence, revert back to receiving netdevice events in all net namespaces to restore RDMA functionality in non-init_net net namespaces. The deadlock will have to be addressed in another patch.

Fixes: fbdd0049 ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion")
Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 06 Dec 2020, 1 commit
-
-
Committed by Leon Romanovsky

After the conversion to the auxiliary bus, all of the custom device management is no longer needed; delete it.

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
- 04 Dec 2020, 2 commits
-
-
Committed by Leon Romanovsky

Create auxiliary devices under the new virtual bus. This will replace the custom-made mlx5 ->add()/->remove() interfaces; the next patches will fill in the missing callbacks and remove the old interface logic. Auxiliary drivers attach to the devices in a 1-to-1 manner only, which requires us to create a device for every protocol, so that each device (module) can connect to it.

System with 2 IB and 1 RoCE cards:

[leonro@vm ~]$ lspci |grep nox
00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]

[leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0
mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1
mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2
mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1
mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2

[leonro@vm ~]$ rdma dev
0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456
2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457

System with a RoCE SR-IOV card with 4 VFs:

[leonro@vm ~]$ lspci |grep nox
01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]

[leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0
mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1
mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2
mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3
mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4
mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0
mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1
mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2
mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3
mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4
mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1
mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2
mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3
mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4

[leonro@vm ~]$ rdma dev
0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456
2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457
3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458
4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459

Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
Committed by Leon Romanovsky

Remove the exposed driver version, as was done in other drivers, so the module version will work correctly by displaying the kernel version for which it was compiled. Also move the mlx5_core module name to a general include, so auxiliary drivers can use it as the basis for names in their device ID tables.

Reviewed-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Roi Dayan <roid@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
-
- 27 Nov 2020, 3 commits
-
-
Committed by Parav Pandit

To match the hardware spec, rename peer_pf to host_pf.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Committed by Parav Pandit

A subsequent patch implements a helper API that takes mlx5_core_dev as a const pointer; make its caller APIs take a const pointer as well.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Bodong Wang <bodong@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
Committed by Parav Pandit

The mlx5 command init and cleanup routines are internal to the mlx5_core driver. Hence, avoid exporting them and move their definitions to the driver's internal file mlx5_core.h.

Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
-
- 27 Oct 2020, 1 commit
-
-
Committed by Parav Pandit

When an mlx5 core devlink instance is reloaded in a different net namespace, its associated IB device is deleted and recreated. An example sequence:

$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

The mlx5 IB device needs to attach and detach the netdevice to it through the netdev notifier chain during the load and unload sequence. Below is a call graph of the unload flow:

cleanup_net()
  down_read(&pernet_ops_rwsem); <- first sem acquired
  ops_pre_exit_list()
    pre_exit()
      devlink_pernet_pre_exit()
        devlink_reload()
          mlx5_devlink_reload_down()
            mlx5_unload_one()
              [...]
              mlx5_ib_remove()
                mlx5_ib_unbind_slave_port()
                  mlx5_remove_netdev_notifier()
                    unregister_netdevice_notifier()
                      down_write(&pernet_ops_rwsem); <- recursive lock

Hence, when the net namespace is deleted, the mlx5 reload results in a deadlock. When the deadlock occurs, the devlink mutex is also held, which deadlocks not only the mlx5 device under reload but every process that attempts to access unrelated devlink devices. Fix this by having the mlx5 IB driver register a per-net netdev notifier instead of a global one, which operates on the net namespace without holding pernet_ops_rwsem.

Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
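A hedged sketch of the per-net registration this patch introduced (the accessor and callback names are assumptions):

```c
/* Hedged sketch: register only in the device's own namespace, so
 * unregister never needs pernet_ops_rwsem. */
static int mlx5_ib_add_netdev_notifier(struct mlx5_ib_dev *dev, u32 port_num)
{
	dev->port[port_num].roce.nb.notifier_call = mlx5_netdev_event;
	return register_netdevice_notifier_net(mlx5_core_net(dev->mdev),
					       &dev->port[port_num].roce.nb);
}
```

As the 20 Jan 2021 revert earlier in this log notes, per-net registration misses events from other namespaces, so this approach was later backed out.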
-
- 10 Oct 2020, 1 commit
-
-
Committed by Moshe Shemesh

Once the driver gets a sync_reset_request from firmware, it prepares for the coming reset and sends an acknowledgement. After getting this event the driver expects a device reset: either it triggers a PCI reset itself on the sync_reset_now event, or the PCI reset is triggered by another PF of the same device. So it moves to reset-requested mode, and if it sees a PCI reset triggered by the other PF, it detects the reset and reloads.

Signed-off-by: Moshe Shemesh <moshe@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@nvidia.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 03 Oct 2020, 1 commit
-
-
Committed by Saeed Mahameed

If the PCI device is offline, reclaim_pages_cmd() will still try to call the FW to release FW pages; cmd_exec() in this case returns a silent success without actually calling the FW. This is wrong and causes page leaks. What we should do is detect PCI-offline, or an unavailable command interface, before trying to access the FW, and manually release the FW pages in the driver. In this patch we share the code that checks FW command interface availability and call it in sensitive places, e.g. reclaim_pages_cmd().

Alternative fix:
1. Remove MLX5_CMD_OP_MANAGE_PAGES from mlx5_internal_err_ret_value, the command-success simulation list.
2. Always release FW pages even if cmd_exec fails in reclaim_pages_cmd().

Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
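A hedged sketch of the shared availability check (the name and exact conditions are assumptions):

```c
/* Hedged sketch: one predicate deciding the command interface is
 * unusable, consulted before touching the FW. */
static bool mlx5_cmd_is_down(struct mlx5_core_dev *dev)
{
	return pci_channel_offline(dev->pdev) ||
	       dev->cmd.state != MLX5_CMDIF_STATE_UP ||
	       dev->state == MLX5_DEVICE_STATE_INTERNAL_ERROR;
}
```

reclaim_pages_cmd() can then free the pages in the driver when this returns true, instead of trusting a simulated command success.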
-