  1. 23 Jan 2021 (2 commits)
    • net/mlx5: SF, Add auxiliary device support · 90d010b8
      Authored by Parav Pandit
      Introduce an API to add and delete an auxiliary device for an SF.
      Each SF has its own dedicated window in PCI BAR 2.
      
      An SF device is similar to a PCI PF or VF in that it supports multiple
      classes of devices such as net, rdma and vdpa.
      
      The SF device will be added or removed in a subsequent patch, during
      the SF devlink port function state change command.
      
      A subfunction device exposes the user-supplied subfunction number,
      which systemd/udev then uses to derive a deterministic name for its
      netdevice and rdma device.
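
      As a rough illustration, a minimal sketch of the device-registration
      side on top of the kernel's auxiliary bus API (function names and
      error handling here are illustrative assumptions, not this patch's
      exact code):

      #include <linux/auxiliary_bus.h>
      #include <linux/slab.h>

      static void mlx5_sf_dev_release(struct device *dev)
      {
              kfree(container_of(dev, struct auxiliary_device, dev));
      }

      static int mlx5_sf_dev_add(struct device *parent, u32 sf_index)
      {
              struct auxiliary_device *adev;
              int err;

              adev = kzalloc(sizeof(*adev), GFP_KERNEL);
              if (!adev)
                      return -ENOMEM;

              /* auxiliary_device_add() prefixes the registering module's
               * name, so the device appears as "mlx5_core.sf.<id>" */
              adev->name = "sf";
              adev->id = sf_index;
              adev->dev.parent = parent;
              adev->dev.release = mlx5_sf_dev_release;

              err = auxiliary_device_init(adev);
              if (err) {
                      kfree(adev);
                      return err;
              }

              err = auxiliary_device_add(adev);
              if (err)
                      auxiliary_device_uninit(adev); /* release() frees adev */
              return err;
      }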
      
      An mlx5 subfunction auxiliary device example:
      
      $ devlink dev eswitch set pci/0000:06:00.0 mode switchdev
      
      $ devlink port show
      pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false
      
      $ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
      pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:00:00 state inactive opstate detached
      
      $ devlink port show ens2f0npf0sf88
      pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
        function:
          hw_addr 00:00:00:00:88:88 state inactive opstate detached
      
      $ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active
      
      On activation,
      
      $ ls -l /sys/bus/auxiliary/devices/
      mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4
      
      $ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
      88
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Introduce vhca state event notifier · f3196bb0
      Authored by Parav Pandit
      vhca state events indicate a change in the state of the vhca that may
      occur due to SF allocation, deallocation, or enabling/disabling of
      the SF HCA.
      
      Introduce a vhca state event handler which will be used by the SF
      devlink port manager and the SF hardware id allocator in subsequent
      patches to act on the event.
      
      This enables a single entity to subscribe to, query, and rearm the
      event for a function.
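
      A minimal sketch of how a subscriber might hook this (the event type
      name and the payload handling are assumptions based on this series):

      #include <linux/mlx5/driver.h>
      #include <linux/notifier.h>

      static int vhca_state_event_cb(struct notifier_block *nb,
                                     unsigned long type, void *data)
      {
              if (type != MLX5_EVENT_TYPE_VHCA_STATE_CHANGE) /* assumed name */
                      return NOTIFY_DONE;

              /* parse the EQE payload for the function id, act on the new
               * vhca state, then re-arm the event for that function */
              return NOTIFY_OK;
      }

      static struct notifier_block vhca_nb = {
              .notifier_call = vhca_state_event_cb,
      };

      static void vhca_events_start(struct mlx5_core_dev *dev)
      {
              mlx5_notifier_register(dev, &vhca_nb);
      }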
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Vu Pham <vuhuong@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  2. 14 Jan 2021 (1 commit)
  3. 08 Jan 2021 (1 commit)
  4. 18 Dec 2020 (1 commit)
  5. 06 Dec 2020 (2 commits)
  6. 04 Dec 2020 (4 commits)
    • net/mlx5: Register mlx5 devices to auxiliary virtual bus · a925b5e3
      Authored by Leon Romanovsky
      Create auxiliary devices under the new virtual bus. This will replace
      the custom-made mlx5 ->add()/->remove() interfaces; subsequent patches
      will fill in the missing callbacks and remove the old interface logic.
      
      Auxiliary drivers can attach to auxiliary devices only in a 1-to-1
      manner, so a device must be created for every protocol in order for
      the corresponding driver (module) to be able to connect to it.
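
      For context, a hedged sketch of the driver side that binds to such
      devices (the auxiliary bus API is upstream; the mlx5e names and the
      empty probe/remove bodies are assumptions). The listings below show
      the resulting device links:

      #include <linux/auxiliary_bus.h>
      #include <linux/module.h>

      static int mlx5e_probe(struct auxiliary_device *adev,
                             const struct auxiliary_device_id *id)
      {
              /* set up the ethernet instance for adev's parent PCI device */
              return 0;
      }

      static void mlx5e_remove(struct auxiliary_device *adev)
      {
              /* tear down the ethernet instance */
      }

      static const struct auxiliary_device_id mlx5e_id_table[] = {
              { .name = "mlx5_core.eth" }, /* matches mlx5_core.eth.<N> */
              {},
      };
      MODULE_DEVICE_TABLE(auxiliary, mlx5e_id_table);

      static struct auxiliary_driver mlx5e_driver = {
              .name = "eth",
              .probe = mlx5e_probe,
              .remove = mlx5e_remove,
              .id_table = mlx5e_id_table,
      };
      module_auxiliary_driver(mlx5e_driver);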
      
      System with 2 IB and 1 RoCE cards:
      [leonro@vm ~]$ lspci |grep nox
      00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
      00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
      00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
      [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
       mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
       mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0
       mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1
       mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2
       mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1
       mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2
      [leonro@vm ~]$ rdma dev
      0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
      1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456
      2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457
      
      System with RoCE SR-IOV card with 4 VFs:
      [leonro@vm ~]$ lspci |grep nox
      01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
      01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
      01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
      01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
      01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
      [leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
       mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0
       mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1
       mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2
       mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3
       mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4
       mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0
       mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1
       mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2
       mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3
       mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4
       mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1
       mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2
       mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3
       mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4
      [leonro@vm ~]$ rdma dev
      0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
      1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456
      2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457
      3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458
      4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    • vdpa/mlx5: Make hardware definitions visible to all mlx5 devices · 0aae392b
      Authored by Leon Romanovsky
      Move the mlx5_vdpa IFC header file to the general include folder, so
      mlx5_core will be able to reuse it to check whether VDPA is supported
      prior to creating an auxiliary device.
      
      As part of this move, rename the header file to follow the general
      mlx5 naming scheme.
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    • net/mlx5_core: Clean driver version and name · 17a7612b
      Authored by Leon Romanovsky
      Remove the exposed driver version, as was done in other drivers, so
      the module version will work correctly by displaying the kernel
      version for which it was compiled.
      
      Also move the mlx5_core module name to a general include, so auxiliary
      drivers will be able to use it as the basis for the names in their
      device ID tables.
      Reviewed-by: Parav Pandit <parav@nvidia.com>
      Reviewed-by: Roi Dayan <roid@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    • net/mlx5: DR, Proper handling of unsupported Connect-X6DX SW steering · d421e466
      Authored by Yevgeny Kliteynik
      The STE format differs between ConnectX-5 and ConnectX-6 DX. Currently,
      on ConnectX-6 DX the SW steering would break at some point when building
      STEs without giving a proper error message. Fix this by checking the STE
      format of the current device when initializing a domain: add mlx5_ifc
      definitions for ConnectX-6 DX SW steering, read the FW capability to get
      the current format version, and check this version when the domain is
      being created.
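
      A hedged sketch of the described check (the capability field and the
      enum values are assumptions):

      /* refuse to create an SW steering domain when the device reports
       * an STE format this driver does not implement */
      static int dr_domain_check_ste_format(struct mlx5_core_dev *mdev)
      {
              u8 fmt = MLX5_CAP_GEN(mdev, steering_format_version);

              if (fmt != MLX5_STEERING_FORMAT_CONNECTX_5)
                      return -EOPNOTSUPP; /* e.g. the ConnectX-6 DX format */
              return 0;
      }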
      
      Fixes: 26d688e3 ("net/mlx5: DR, Add Steering entry (STE) utilities")
      Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  7. 27 Nov 2020 (11 commits)
  8. 31 Oct 2020 (1 commit)
  9. 27 Oct 2020 (1 commit)
    • RDMA/mlx5: Fix devlink deadlock on net namespace deletion · fbdd0049
      Authored by Parav Pandit
      When an mlx5 core devlink instance is reloaded in a different net
      namespace, its associated IB device is deleted and recreated.
      
      Example sequence is:
      $ ip netns add foo
      $ devlink dev reload pci/0000:00:08.0 netns foo
      $ ip netns del foo
      
      The mlx5 IB device needs to attach and detach the netdevice to it
      through the netdev notifier chain during the load and unload sequences.
      Below is a call graph of the unload flow.
      
      cleanup_net()
         down_read(&pernet_ops_rwsem); <- first sem acquired
           ops_pre_exit_list()
             pre_exit()
               devlink_pernet_pre_exit()
                 devlink_reload()
                   mlx5_devlink_reload_down()
                     mlx5_unload_one()
                     [...]
                       mlx5_ib_remove()
                         mlx5_ib_unbind_slave_port()
                           mlx5_remove_netdev_notifier()
                             unregister_netdevice_notifier()
                                down_write(&pernet_ops_rwsem); <- recursive lock
      
      Hence, when the net namespace is deleted, the mlx5 reload results in a
      deadlock.
      
      When the deadlock occurs, the devlink mutex is also held. This not only
      deadlocks the mlx5 device under reload, but also deadlocks all processes
      that attempt to access unrelated devlink devices.
      
      Hence, fix this by having the mlx5 IB driver register a per-net netdev
      notifier instead of a global one; the per-net notifier operates on the
      net namespace without holding the pernet_ops_rwsem.
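
      A minimal sketch of the change in registration style (whether the fix
      uses exactly this per-net variant is an assumption):

      #include <linux/netdevice.h>

      static int mlx5_ib_init_netdev_notifier(struct notifier_block *nb,
                                              struct net *net)
      {
              /* Previously: register_netdevice_notifier(nb); its
               * unregister path takes pernet_ops_rwsem, which recurses
               * under cleanup_net() as shown above. The per-net variant
               * does not touch pernet_ops_rwsem. */
              return register_netdevice_notifier_net(net, nb);
      }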
      
      Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
      Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
      Signed-off-by: Parav Pandit <parav@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
  10. 13 Oct 2020 (2 commits)
  11. 10 Oct 2020 (2 commits)
  12. 03 Oct 2020 (2 commits)
    • net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible · b898ce7b
      Authored by Saeed Mahameed
      In case the PCI device is offline, reclaim_pages_cmd() will still try
      to call the FW to release FW pages; cmd_exec() in this case will return
      a silent success without actually calling the FW.
      
      This is wrong and will cause page leaks. What we should do is detect
      that the PCI device is offline or that the command interface is
      unavailable before trying to access the FW, and manually release the
      FW pages in the driver.
      
      In this patch we share the code that checks FW command interface
      availability and call it in sensitive places, e.g. reclaim_pages_cmd().
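
      A hedged sketch of the shared check (helper and field names are
      assumptions, not necessarily the patch's exact code):

      #include <linux/pci.h>
      #include <linux/mlx5/driver.h>

      static bool mlx5_cmd_is_down(struct mlx5_core_dev *dev)
      {
              /* PCI channel gone, or command interface not up */
              return pci_channel_offline(dev->pdev) ||
                     dev->cmd.state != MLX5_CMDIF_STATE_UP;
      }

      static int reclaim_pages_guard(struct mlx5_core_dev *dev)
      {
              if (mlx5_cmd_is_down(dev)) {
                      /* skip the FW call; release the FW pages in the
                       * driver instead (see above) */
                      return -ENXIO;
              }
              return 0; /* safe to issue MLX5_CMD_OP_MANAGE_PAGES */
      }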
      
      Alternative fix:
       1. Remove MLX5_CMD_OP_MANAGE_PAGES from the mlx5_internal_err_ret_value
          command success simulation list.
       2. Always release FW pages even if cmd_exec fails in reclaim_pages_cmd().
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Avoid possible free of command entry while timeout comp handler · 50b2412b
      Authored by Eran Ben Elisha
      Upon a command completion timeout, the driver simulates a forced
      command completion. In a rare case where the real interrupt for that
      command arrives simultaneously, it might release the command entry
      while the forced handler is still accessing it.
      
      Fix that by adding an entry refcount to track the current number of
      allowed handlers. The command entry is released only when this
      refcount is decremented to zero.
      
      The command refcount is always initialized to one. For callback
      commands, the command completion handler is the symmetric flow that
      decrements it; for non-callback commands, it is wait_func().
      
      Before ringing the doorbell, increment the refcount on behalf of the
      real completion handler; once the real completion handler runs, it
      decrements it.
      
      For callback commands, once the delayed work is scheduled, increment
      the refcount. In the callback command completion handler, try to
      cancel the timeout callback; on success, decrement the callback
      refcount, as it will never run.
      
      In addition, gather the entry index free and the entry free into one
      flow for the release of all command types.
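
      A hedged sketch of the scheme (struct and helper names are
      assumptions):

      #include <linux/refcount.h>
      #include <linux/mlx5/driver.h>

      static void cmd_free_ent(struct mlx5_cmd_work_ent *ent); /* combined release */

      /* the entry starts at refcount 1: the waiter's (or callback's) ref */
      static void cmd_ent_get(struct mlx5_cmd_work_ent *ent)
      {
              refcount_inc(&ent->refcnt);
      }

      static void cmd_ent_put(struct mlx5_cmd_work_ent *ent)
      {
              if (!refcount_dec_and_test(&ent->refcnt))
                      return;
              /* last reference: free the entry index and the entry
               * itself in one flow, for all command types */
              cmd_free_ent(ent);
      }

      /* before ringing the doorbell, take a ref on behalf of the real
       * completion handler: cmd_ent_get(ent); the handler puts it back */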
      
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  13. 01 Oct 2020 (1 commit)
  14. 18 Sep 2020 (3 commits)
  15. 16 Sep 2020 (2 commits)
    • net/mlx5e: Add CQE compression support for multi-strides packets · b7cf0806
      Authored by Ofer Levi
      Add CQE compression support for completions of packets that span
      multiple strides in a Striding RQ, per the HW capability.
      In our memory model, we use small strides (256B as of today) for the
      non-linear SKB mode. This feature allows CQE compression to work also
      for multi-stride packets. In this case, decompressing the mini CQE
      array uses the stride index provided by the HW as part of the mini
      CQE. Before this feature, compression was possible only for
      single-stride packets, i.e. for packets of size up to 256 bytes when
      in non-linear mode, and the index was maintained by SW.
      This feature is supported on ConnectX-5 and above.
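
      An illustration of the index selection during decompression (all
      names here are assumptions, not the driver's actual fields):

      #include <linux/types.h>
      #include <asm/byteorder.h>

      struct mini_cqe {
              __be16 stride_idx; /* filled by HW with this feature */
              /* ... remaining mini CQE fields ... */
      };

      /* when expanding a compressed session, use the HW-provided stride
       * index if the capability is on; otherwise fall back to the
       * SW-maintained counter used for single-stride packets */
      static u16 rx_stride_index(const struct mini_cqe *mini,
                                 u16 sw_counter, bool hw_stride_index)
      {
              return hw_stride_index ? be16_to_cpu(mini->stride_idx)
                                     : sw_counter;
      }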
      
      Feature performance test:
      This was whitebox-tested: we reduced the PCI speed from 125Gb/s to
      62.5Gb/s to overload the PCI bus, and manipulated the mlx5 driver to
      drop incoming packets before building the SKB, to achieve low CPU
      utilization.
      The outcome is low CPU utilization with the bottleneck on PCI only.
      Test setup:
      Server: Intel(R) Xeon(R) Silver 4108 CPU @ 1.80GHz, 32 cores
      NIC: ConnectX-6 DX
      Sender side generates 300 byte packets at full PCI bandwidth.
      Receiver side configuration:
      Single channel, one CPU processing with one ring allocated. CPU
      utilization is ~20% while PCI bandwidth is fully utilized.
      For the generated traffic and interface MTU of 4500B (to activate the
      non-linear SKB mode), packet rate improvement is about 19% from ~17.6Mpps
      to ~21Mpps.
      Without this feature, counters show no CQE compression blocks for
      this setup, while with the feature, counters show ~20.7Mpps of
      compressed CQEs in ~500K compression blocks.
      Signed-off-by: Ofer Levi <oferle@mellanox.com>
      Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Always use container_of to find mdev pointer from clock struct · fb609b51
      Authored by Eran Ben Elisha
      The clock struct is part of struct mlx5_core_dev. The code was
      inconsistent: some places used container_of and others used
      clock->mdev.
      
      Align the code to use container_of and remove the clock->mdev pointer.
      While here, fix reverse xmas tree coding style.
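
      The pattern this patch standardizes on, as a short sketch (assuming
      the clock is embedded as the 'clock' member of mlx5_core_dev):

      #include <linux/mlx5/driver.h>

      static struct mlx5_core_dev *clock_to_mdev(struct mlx5_clock *clock)
      {
              return container_of(clock, struct mlx5_core_dev, clock);
      }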
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
  16. 27 Aug 2020 (1 commit)
  17. 29 Jul 2020 (1 commit)
    • net/mlx5e: Modify uplink state on interface up/down · 7d0314b1
      Authored by Ron Diskin
      When setting the PF interface up/down, notify the firmware to update
      the uplink state via MODIFY_VPORT_STATE when the E-Switch is enabled.
      
      This behavior prevents sending traffic out on the uplink port when the
      PF is down, e.g. traffic sent from a VF interface which is still up.
      Currently, when calling mlx5e_open/close(), the driver only sends a
      PAOS command to notify the firmware to set the physical port state to
      up/down; however, this is not sufficient. When a VF is in the "auto"
      state, it follows the uplink state, which was not updated on
      mlx5e_open/close() before this patch.
      
      When switchdev mode is enabled and the uplink representor is first
      enabled, set the uplink port state value back to its FW default,
      "AUTO".
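
      A hedged sketch of the uplink-state update (the vport API shown is
      the upstream one; whether the driver calls exactly this helper, and
      from exactly these points, is an assumption):

      #include <linux/mlx5/driver.h>
      #include <linux/mlx5/vport.h>

      /* called from the PF netdev's open/close paths when the eswitch
       * is active */
      static int set_uplink_state(struct mlx5_core_dev *mdev, bool up)
      {
              return mlx5_modify_vport_admin_state(mdev,
                              MLX5_VPORT_STATE_OP_MOD_UPLINK, 0, 0,
                              up ? MLX5_VPORT_ADMIN_STATE_UP :
                                   MLX5_VPORT_ADMIN_STATE_DOWN);
      }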
      
      Fixes: 63bfd399 ("net/mlx5e: Send PAOS command on interface up/down")
      Signed-off-by: Ron Diskin <rondi@mellanox.com>
      Reviewed-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  18. 28 Jul 2020 (1 commit)
    • net/mlx5: Hold pages RB tree per VF · d6945242
      Authored by Eran Ben Elisha
      Per page request event, the FW requests to allocate or release pages
      for a single function. The driver maintains an FW pages object per
      function, so there is no need to hold one global page database.
      Instead, keep a page database per function, which improves the
      performance of the release flow in all cases, especially for
      "release all pages".
      
      As the range of function IDs is large and not sequential, use an
      xarray to store the per-function-ID page database, with the function
      ID as the key.
      
      Upon the first allocation of a page to a function ID, create the
      per-function page database. This database is released only at
      pagealloc mechanism cleanup.
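
      A hedged sketch of the lookup-or-create flow (field and function
      names are assumptions):

      #include <linux/rbtree.h>
      #include <linux/slab.h>
      #include <linux/xarray.h>

      static struct rb_root *page_root_for(struct xarray *page_root_xa,
                                           u32 func_id)
      {
              struct rb_root *root = xa_load(page_root_xa, func_id);

              if (root)
                      return root;

              root = kzalloc(sizeof(*root), GFP_KERNEL);
              if (!root)
                      return NULL;
              *root = RB_ROOT;
              if (xa_insert(page_root_xa, func_id, root, GFP_KERNEL)) {
                      kfree(root); /* raced with another inserter */
                      root = xa_load(page_root_xa, func_id);
              }
              return root;
      }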
      
      NIC: ConnectX-4 Lx
      CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
      Test case: 32 VFs, measure release pages on one VF as part of FLR
      Before: 0.021 Sec
      After:  0.014 Sec
      
      The improvement depends on the number of VFs and their memory
      utilization. The time measurements above were taken on an idle system.
      Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
      Reviewed-by: Mark Bloch <markb@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
  19. 27 Jul 2020 (1 commit)