提交 · c27971d08abecc91f06214dacc66ce3ce2662a44 · openeuler / Kernel

17 3月, 2021 2 次提交

net/mlx5: Move devlink port from mlx5e priv to mlx5e resources · c27971d0

由 Roi Dayan 提交于 10月 28, 2020

We re-use the native NIC port net device instance for the Uplink
representor, and the devlink port.
When changing profiles we reset the mlx5e priv but we should still
use the devlink port so move it to mlx5e resources.
Signed-off-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

c27971d0

net/mlx5: Move mlx5e hw resources into a sub object · c276aae8

由 Roi Dayan 提交于 1月 26, 2021

This is to separate between resources attributes and other
attributes we will want to use.
Signed-off-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

c276aae8

17 2月, 2021 2 次提交

net/mlx5: Move all internal timer metadata into a dedicated struct · d6f3dc8f

由 Eran Ben Elisha 提交于 2月 12, 2021

Internal timer mode (SW clock) requires some PTP clock related metadata
structs. Real time mode (HW clock) will not need these metadata structs.
This separation emphasize the different interfaces for HW clock and SW
clock.
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NAya Levin <ayal@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

d6f3dc8f

net/mlx5: Add register layout to support real-time time-stamp · ae02d415

由 Eran Ben Elisha 提交于 2月 12, 2021

Add needed structure layouts and defines for MTUTC (Management UTC)
register. MTUTC will be used for cyc2time HW translation.

In addition, add cyc2time modify capability bit and init segment HCA
real time address.

Finally, add capability bits indicating which time-stamping format is
supported per SQ and RQ. Add ts_format in the queue's context layout to
allow configuration.
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NAya Levin <ayal@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

ae02d415

11 2月, 2021 1 次提交

net/mlx5: Delete device list leftover · 5b74df80

由 Leon Romanovsky 提交于 1月 04, 2021

Device list is not stored in mlx5_priv anymore, so delete it as it's not
used.
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

5b74df80

09 2月, 2021 1 次提交

RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flow · db72438c

由 Yishai Hadas 提交于 2月 02, 2021

Cleanup the synchronize_srcu() from the ODP flow as it was found to be a
very heavy time consumer as part of dereg_mr.

For example de-registration of 10000 ODP MRs each with size of 2M hugepage
took 19.6 sec comparing de-registration of same number of non ODP MRs that
took 172 ms.

The new locking scheme uses the wait_event() mechanism which follows the
use count of the MR instead of using synchronize_srcu().

By that change, the time required for the above test took 95 ms which is
even better than the non ODP flow.

Once fully dropped the srcu usage, had to come with a lock to protect the
XA access.

As part of using the above mechanism we could also clean the
num_deferred_work stuff and follow the use count instead.

Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.orgSigned-off-by: NYishai Hadas <yishaih@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

db72438c

06 2月, 2021 1 次提交

IB/mlx5: Move mlx5_port_caps from mlx5_core_dev to mlx5_ib_dev · 3ce60f44

由 Parav Pandit 提交于 2月 03, 2021

mlx5_port_caps are RDMA specific capabilities. These are not used by the
mlx5_core_device at all. Move them to mlx5_ib_dev where it is used and
reduce the scope of it to multiple drivers.

Link: https://lore.kernel.org/r/20210203130133.4057329-2-leon@kernel.orgSigned-off-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

3ce60f44

28 1月, 2021 2 次提交

net/mlx5: Notify on trap action by blocking event · 241dc159

由 Aya Levin 提交于 1月 26, 2021

In order to allow mlx5 core driver to trigger synchronous operations to
its consumers, add a blocking events handler. Add wrappers to
blocking_notifier_[call_chain/chain_register/chain_unregister]. Add trap
callback for action set and notify about this change. Following patches
in the set add a listener for this event.
Signed-off-by: NAya Levin <ayal@nvidia.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

241dc159

net/mlx5: Add support for devlink traps in mlx5 core driver · 3d347b1b

由 Aya Levin 提交于 1月 26, 2021

Add devlink traps infra-structure to mlx5 core driver. Add traps list to
mlx5_priv and corresponding API:
 - mlx5_devlink_trap_report() to wrap trap reports to devlink
 - mlx5_devlink_trap_get_num_active() to decide whether to open/close trap
   resources.
Signed-off-by: NAya Levin <ayal@nvidia.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

3d347b1b

23 1月, 2021 4 次提交

net/mlx5: SF, Add port add delete functionality · 8f010541

由 Parav Pandit 提交于 12月 11, 2020

To handle SF port management outside of the eswitch as independent
software layer, introduce eswitch notifier APIs so that mlx5 upper
layer who wish to support sf port management in switchdev mode can
perform its task whenever eswitch mode is set to switchdev or before
eswitch is disabled.

Initialize sf port table on such eswitch event.

Add SF port add and delete functionality in switchdev mode.
Destroy all SF ports when eswitch is disabled.
Expose SF port add and delete to user via devlink commands.

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:06:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

or by its unique port index:
$ devlink port show pci/0000:06:00.0/32768
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88 -jp
{
    "port": {
        "pci/0000:06:00.0/32768": {
            "type": "eth",
            "netdev": "ens2f0npf0sf88",
            "flavour": "pcisf",
            "controller": 0,
            "pfnum": 0,
            "sfnum": 88,
            "external": false,
            "splittable": false,
            "function": {
                "hw_addr": "00:00:00:00:00:00",
                "state": "inactive",
                "opstate": "detached"
            }
        }
    }
}
Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NVu Pham <vuhuong@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

8f010541

net/mlx5: SF, Add auxiliary device driver · 1958fc2f

由 Parav Pandit 提交于 12月 11, 2020

Add auxiliary device driver for mlx5 subfunction auxiliary device.

A mlx5 subfunction is similar to PCI PF and VF. For a subfunction
an auxiliary device is created.

As a result, when mlx5 SF auxiliary device binds to the driver,
its netdev and rdma device are created, they appear as

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ ls -l /sys/class/net/eth1/device
/sys/class/net/eth1/device -> ../../../mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88

$ devlink dev show
pci/0000:06:00.0
auxiliary/mlx5_core.sf.4

$ devlink port show auxiliary/mlx5_core.sf.4/1
auxiliary/mlx5_core.sf.4/1: type eth netdev p0sf88 flavour virtual port 0 splittable false

$ rdma link show mlx5_0/1
link mlx5_0/1 state ACTIVE physical_state LINK_UP netdev p0sf88

$ rdma dev show
8: rocep6s0f1: node_type ca fw 16.29.0550 node_guid 248a:0703:00b3:d113 sys_image_guid 248a:0703:00b3:d112
13: mlx5_0: node_type ca fw 16.29.0550 node_guid 0000:00ff:fe00:8888 sys_image_guid 248a:0703:00b3:d112

In future, devlink device instance name will adapt to have sfnum
annotation using either an alias or as devlink instance name described
in RFC [1].

[1] https://lore.kernel.org/netdev/20200519092258.GF4655@nanopsycho/Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NVu Pham <vuhuong@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

1958fc2f

net/mlx5: SF, Add auxiliary device support · 90d010b8

由 Parav Pandit 提交于 12月 11, 2020

Introduce API to add and delete an auxiliary device for an SF.
Each SF has its own dedicated window in the PCI BAR 2.

SF device is similar to PCI PF and VF that supports multiple class of
devices such as net, rdma and vdpa.

SF device will be added or removed in subsequent patch during SF
devlink port function state change command.

A subfunction device exposes user supplied subfunction number which will
be further used by systemd/udev to have deterministic name for its
netdevice and rdma device.

An mlx5 subfunction auxiliary device example:

$ devlink dev eswitch set pci/0000:06:00.0 mode switchdev

$ devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0np0 flavour physical port 0 splittable false

$ devlink port add pci/0000:06:00.0 flavour pcisf pfnum 0 sfnum 88
pci/0000:08:00.0/32768: type eth netdev eth6 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:00:00 state inactive opstate detached

$ devlink port show ens2f0npf0sf88
pci/0000:06:00.0/32768: type eth netdev ens2f0npf0sf88 flavour pcisf controller 0 pfnum 0 sfnum 88 external false splittable false
  function:
    hw_addr 00:00:00:00:88:88 state inactive opstate detached

$ devlink port function set ens2f0npf0sf88 hw_addr 00:00:00:00:88:88 state active

On activation,

$ ls -l /sys/bus/auxiliary/devices/
mlx5_core.sf.4 -> ../../../devices/pci0000:00/0000:00:03.0/0000:06:00.0/mlx5_core.sf.4

$ cat /sys/bus/auxiliary/devices/mlx5_core.sf.4/sfnum
88
Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NVu Pham <vuhuong@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

90d010b8

net/mlx5: Introduce vhca state event notifier · f3196bb0

由 Parav Pandit 提交于 12月 11, 2020

vhca state events indicates change in the state of the vhca that may
occur due to a SF allocation, deallocation or enabling/disabling the
SF HCA.

Introduce vhca state event handler which will be used by SF devlink
port manager and SF hardware id allocator in subsequent patches
to act on the event.

This enables single entity to subscribe, query and rearm the event
for a function.
Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NVu Pham <vuhuong@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

f3196bb0

20 1月, 2021 1 次提交

Revert "RDMA/mlx5: Fix devlink deadlock on net namespace deletion" · de641d74

由 Parav Pandit 提交于 1月 17, 2021

This reverts commit fbdd0049.

Due to commit in fixes tag, netdevice events were received only in one net
namespace of mlx5_core_dev. Due to this when netdevice events arrive in
net namespace other than net namespace of mlx5_core_dev, they are missed.

This results in empty GID table due to RDMA device being detached from its
net device.

Hence, revert back to receive netdevice events in all net namespaces to
restore back RDMA functionality in non init_net net namespace. The
deadlock will have to be addressed in another patch.

Fixes: fbdd0049 ("RDMA/mlx5: Fix devlink deadlock on net namespace deletion")
Link: https://lore.kernel.org/r/20210117092633.10690-1-leon@kernel.orgSigned-off-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

de641d74

06 12月, 2020 1 次提交

net/mlx5: Delete custom device management logic · 601c10c8

由 Leon Romanovsky 提交于 10月 05, 2020

After conversion to use auxiliary bus, all custom device management is
not needed anymore, delete it.
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

601c10c8

04 12月, 2020 2 次提交

net/mlx5: Register mlx5 devices to auxiliary virtual bus · a925b5e3

由 Leon Romanovsky 提交于 10月 08, 2020

Create auxiliary devices under new virtual bus. This will replace
the custom-made mlx5 ->add()/->remove() interfaces and next patches
will fill the missing callback and remove the old interface logic.

The attachment of auxiliary drivers to the devices is possible in
1-to-1 manner only and it requires us to create device for every protocol,
so that device (module) will be able to connect to it.

System with 2 IB and 1 RoCE cards:
[leonro@vm ~]$ lspci |grep nox
00:09.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
00:0a.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
00:0b.0 Ethernet controller: Mellanox Technologies MT2910 Family [ConnectX-7]
[leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
 mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.eth.2
 mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/mlx5_core.rdma.0
 mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.rdma.1
 mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.rdma.2
 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:0a.0/mlx5_core.vdpa.1
 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:0b.0/mlx5_core.vdpa.2
[leonro@vm ~]$ rdma dev
0: ibp0s9: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: ibp0s10: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3456 sys_image_guid 5254:00c0:fe12:3456
2: rdmap0s11: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3457 sys_image_guid 5254:00c0:fe12:3457

System with RoCE SR-IOV card with 4 VFs:
[leonro@vm ~]$ lspci |grep nox
01:00.0 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6]
01:00.1 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.2 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.3 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
01:00.4 Ethernet controller: Mellanox Technologies MT28908 Family [ConnectX-6 Virtual Function]
[leonro@vm ~]$ ls -l /sys/bus/auxiliary/devices/
 mlx5_core.eth.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.eth.0
 mlx5_core.eth.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.eth.1
 mlx5_core.eth.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.eth.2
 mlx5_core.eth.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.eth.3
 mlx5_core.eth.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.eth.4
 mlx5_core.rdma.0 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.0/mlx5_core.rdma.0
 mlx5_core.rdma.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.rdma.1
 mlx5_core.rdma.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.rdma.2
 mlx5_core.rdma.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.rdma.3
 mlx5_core.rdma.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.rdma.4
 mlx5_core.vdpa.1 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.1/mlx5_core.vdpa.1
 mlx5_core.vdpa.2 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.2/mlx5_core.vdpa.2
 mlx5_core.vdpa.3 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.3/mlx5_core.vdpa.3
 mlx5_core.vdpa.4 -> ../../../devices/pci0000:00/0000:00:09.0/0000:01:00.4/mlx5_core.vdpa.4
[leonro@vm ~]$ rdma dev
0: rocep1s0f0: node_type ca fw 4.6.9999 node_guid 5254:00c0:fe12:3455 sys_image_guid 5254:00c0:fe12:3455
1: rocep1s0f0v0: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3456
2: rocep1s0f0v1: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3457
3: rocep1s0f0v2: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3458
4: rocep1s0f0v3: node_type ca fw 4.6.9999 node_guid 0000:0000:0000:0000 sys_image_guid 5254:00c0:fe12:3459
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

a925b5e3

net/mlx5_core: Clean driver version and name · 17a7612b

由 Leon Romanovsky 提交于 10月 04, 2020

Remove exposed driver version as it was done in other drivers,
so module version will work correctly by displaying the kernel
version for which it is compiled.

And move mlx5_core module name to general include, so auxiliary drivers
will be able to use it as a basis for a name in their device ID tables.
Reviewed-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

17a7612b

27 11月, 2020 3 次提交

net/mlx5: Rename peer_pf to host_pf · 8a90f2fc

由 Parav Pandit 提交于 11月 20, 2020

To match the hardware spec, rename peer_pf to host_pf.
Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NBodong Wang <bodong@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

8a90f2fc

net/mlx5: Make API mlx5_core_is_ecpf accept const pointer · 3b1e58aa

由 Parav Pandit 提交于 11月 20, 2020

Subsequent patch implements helper API which has mlx5_core_dev
as const pointer, make its caller API too const *.
Signed-off-by: NParav Pandit <parav@nvidia.com>
Reviewed-by: NBodong Wang <bodong@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

3b1e58aa

net/mlx5: Avoid exposing driver internal command helpers · e5dfe6b5

由 Parav Pandit 提交于 11月 20, 2020

mlx5 command init and cleanup routines are internal to mlx5_core driver.
Hence, avoid exporting them and move their definition to mlx5_core
driver's internal file mlx5_core.h
Signed-off-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

e5dfe6b5

27 10月, 2020 1 次提交

RDMA/mlx5: Fix devlink deadlock on net namespace deletion · fbdd0049

由 Parav Pandit 提交于 10月 26, 2020

When a mlx5 core devlink instance is reloaded in different net namespace,
its associated IB device is deleted and recreated.

Example sequence is:
$ ip netns add foo
$ devlink dev reload pci/0000:00:08.0 netns foo
$ ip netns del foo

mlx5 IB device needs to attach and detach the netdevice to it through the
netdev notifier chain during load and unload sequence.  A below call graph
of the unload flow.

cleanup_net()
   down_read(&pernet_ops_rwsem); <- first sem acquired
     ops_pre_exit_list()
       pre_exit()
         devlink_pernet_pre_exit()
           devlink_reload()
             mlx5_devlink_reload_down()
               mlx5_unload_one()
               [...]
                 mlx5_ib_remove()
                   mlx5_ib_unbind_slave_port()
                     mlx5_remove_netdev_notifier()
                       unregister_netdevice_notifier()
                         down_write(&pernet_ops_rwsem);<- recurrsive lock

Hence, when net namespace is deleted, mlx5 reload results in deadlock.

When deadlock occurs, devlink mutex is also held. This not only deadlocks
the mlx5 device under reload, but all the processes which attempt to
access unrelated devlink devices are deadlocked.

Hence, fix this by mlx5 ib driver to register for per net netdev notifier
instead of global one, which operats on the net namespace without holding
the pernet_ops_rwsem.

Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.comSigned-off-by: NParav Pandit <parav@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

fbdd0049

10 10月, 2020 1 次提交

net/mlx5: Handle sync reset request event · 38b9f903

由 Moshe Shemesh 提交于 10月 07, 2020

Once the driver gets sync_reset_request from firmware it prepares for the
coming reset and sends acknowledge.
After getting this event the driver expects device reset, either it will
trigger PCI reset on sync_reset_now event or such PCI reset will be
triggered by another PF of the same device. So it moves to reset
requested mode and if it gets PCI reset triggered by the other PF it
detect the reset and reloads.
Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
Reviewed-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

38b9f903

03 10月, 2020 2 次提交

net/mlx5: cmdif, Avoid skipping reclaim pages if FW is not accessible · b898ce7b

由 Saeed Mahameed 提交于 9月 11, 2020

In case of pci is offline reclaim_pages_cmd() will still try to call
the FW to release FW pages, cmd_exec() in this case will return a silent
success without actually calling the FW.

This is wrong and will cause page leaks, what we should do is to detect
pci offline or command interface un-available before tying to access the
FW and manually release the FW pages in the driver.

In this patch we share the code to check for FW command interface
availability and we call it in sensitive places e.g. reclaim_pages_cmd().

Alternative fix:
 1. Remove MLX5_CMD_OP_MANAGE_PAGES form mlx5_internal_err_ret_value,
    command success simulation list.
 2. Always Release FW pages even if cmd_exec fails in reclaim_pages_cmd().
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

b898ce7b

net/mlx5: Avoid possible free of command entry while timeout comp handler · 50b2412b

由 Eran Ben Elisha 提交于 8月 04, 2020

Upon command completion timeout, driver simulates a forced command
completion. In a rare case where real interrupt for that command arrives
simultaneously, it might release the command entry while the forced
handler might still access it.

Fix that by adding an entry refcount, to track current amount of allowed
handlers. Command entry to be released only when this refcount is
decremented to zero.

Command refcount is always initialized to one. For callback commands,
command completion handler is the symmetric flow to decrement it. For
non-callback commands, it is wait_func().

Before ringing the doorbell, increment the refcount for the real completion
handler. Once the real completion handler is called, it will decrement it.

For callback commands, once the delayed work is scheduled, increment the
refcount. Upon callback command completion handler, we will try to cancel
the timeout callback. In case of success, we need to decrement the callback
refcount as it will never run.

In addition, gather the entry index free and the entry free into a one
flow for all command types release.

Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: NMoshe Shemesh <moshe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

50b2412b

16 9月, 2020 1 次提交

net/mlx5: Always use container_of to find mdev pointer from clock struct · fb609b51

由 Eran Ben Elisha 提交于 5月 13, 2020

Clock struct is part of struct mlx5_core_dev. Code was inconsistent, on
some cases used container_of and on another used clock->mdev.

Align code to use container_of amd remove clock->mdev pointer.
While here, fix reverse xmas tree coding style.
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: NMoshe Shemesh <moshe@mellanox.com>

fb609b51

28 7月, 2020 1 次提交

net/mlx5: Hold pages RB tree per VF · d6945242

由 Eran Ben Elisha 提交于 5月 18, 2020

Per page request event, FW request to allocated or release pages for a
single function. Driver maintains FW pages object per function, so there
is no need to hold one global page data-base. Instead, have a page
data-base per function, which will improve performance release flow in all
cases, especially for "release all pages".

As the range of function IDs is large and not sequential, use xarray to
store a per function ID page data-base, where the function ID is the key.

Upon first allocation of a page to a function ID, create the page
data-base per function. This data-base will be released only at pagealloc
mechanism cleanup.

NIC: ConnectX-4 Lx
CPU: Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Test case: 32 VFs, measure release pages on one VF as part of FLR
Before: 0.021 Sec
After:  0.014 Sec

The improvement depends on amount of VFs and memory utilization
by them. Time measurements above were taken from idle system.
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: NMark Bloch <markb@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

d6945242

17 7月, 2020 1 次提交

net/mlx5: Accel, Add core IPsec support for the Connect-X family · 9a6ad1ad

由 Raed Salem 提交于 11月 18, 2019

This to set the base for downstream patches to support
the new IPsec implementation of the Connect-X family.

Following modifications made:
- Remove accel layer dependency from MLX5_FPGA_IPSEC.
- Introduce accel_ipsec_ops, each IPsec device will
  have to support these ops.
Signed-off-by: NRaed Salem <raeds@mellanox.com>
Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

9a6ad1ad

16 7月, 2020 2 次提交

net/mlx5: Add VDPA interface type to supported enumerations · 2a913f23

由 Eli Cohen 提交于 7月 14, 2020

VDPA is a new interface that will be added in subsequent patches. It
uses mlx5 core devices and resources. Add an interface type for it.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

2a913f23

net/mlx5: Support setting access rights of dma addresses · 1dcb6c36

由 Eli Cohen 提交于 7月 14, 2020

mlx5_fill_page_frag_array() is used to populate dma addresses to
resources that require it, such as QPs, RQs etc. When the resource is
used, PA list permissions are ignored. For resources that use MTT list,
the user is required to provide the access rights. Subsequent patches
use resources that require MTT lists, so modify API and implementation
to support that.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Reviewed-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

1dcb6c36

10 7月, 2020 1 次提交

net/mlx5e: Fix port buffers cell size value · 88b3d5c9

由 Eran Ben Elisha 提交于 6月 22, 2020

Device unit for port buffers size, xoff_threshold and xon_threshold is
cells. Fix a bug in driver where cell unit size was hard-coded to
128 bytes. This hard-coded value is buggy, as it is wrong for some hardware
versions.

Driver to read cell size from SBCAM register and translate bytes to cell
units accordingly.

In order to fix the bug, this patch exposes SBCAM (Shared buffer
capabilities mask) layout and defines.

If SBCAM.cap_cell_size is valid, use it for all bytes to cells
calculations. If not valid, fallback to 128.

Cell size do not change on the fly per device. Instead of issuing SBCAM
access reg command every time such translation is needed, cache it in
mlx5e_dcbx as part of mlx5e_dcbnl_initialize(). Pass dcbx.port_buff_cell_sz
as a param to every function that needs bytes to cells translation.

While fixing the bug, move MLX5E_BUFFER_CELL_SHIFT macro to
en_dcbnl.c, as it is only used by that file.

Fixes: 0696d608 ("net/mlx5e: Receive buffer configuration")
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Reviewed-by: NHuy Nguyen <huyn@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

88b3d5c9

30 5月, 2020 1 次提交

net/mlx5: cmd: Fix memset with byte count warning · 2553f421

由 Saeed Mahameed 提交于 5月 27, 2020

Fix sparse warning:
drivers/net/ethernet/mellanox/mlx5/core/cmd.c:1949:15:
warning: memset with byte count of 271720

mlx5_cmd_stats array is too big to be held inline in mlx5_cmd.
Allocate it separately.
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

2553f421

23 5月, 2020 3 次提交

net/mlx5: Avoid processing commands before cmdif is ready · f7936ddd

由 Eran Ben Elisha 提交于 3月 19, 2020

When driver is reloading during recovery flow, it can't get new commands
till command interface is up again. Otherwise we may get to null pointer
trying to access non initialized command structures.

Add cmdif state to avoid processing commands while cmdif is not ready.

Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

f7936ddd

net/mlx5: Fix a race when moving command interface to events mode · d43b7007

由 Eran Ben Elisha 提交于 3月 18, 2020

After driver creates (via FW command) an EQ for commands, the driver will
be informed on new commands completion by EQE. However, due to a race in
driver's internal command mode metadata update, some new commands will
still be miss-handled by driver as if we are in polling mode. Such commands
can get two non forced completion, leading to already freed command entry
access.

CREATE_EQ command, that maps EQ to the command queue must be posted to the
command queue while it is empty and no other command should be posted.

Add SW mechanism that once the CREATE_EQ command is about to be executed,
all other commands will return error without being sent to the FW. Allow
sending other commands only after successfully changing the driver's
internal command mode metadata.
We can safely return error to all other commands while creating the command
EQ, as all other commands might be sent from the user/application during
driver load. Application can rerun them later after driver's load was
finished.

Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

d43b7007

net/mlx5: Add command entry handling completion · 17d00e83

由 Moshe Shemesh 提交于 12月 27, 2019

When FW response to commands is very slow and all command entries in
use are waiting for completion we can have a race where commands can get
timeout before they get out of the queue and handled. Timeout
completion on uninitialized command will cause releasing command's
buffers before accessing it for initialization and then we will get NULL
pointer exception while trying access it. It may also cause releasing
buffers of another command since we may have timeout completion before
even allocating entry index for this command.
Add entry handling completion to avoid this race.

Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

17d00e83

19 5月, 2020 1 次提交

net/mlx5: Move iseg access helper routines close to mlx5_core driver · 555af0c3

由 Parav Pandit 提交于 5月 15, 2020

Only mlx5_core driver handles fw initialization check and command
interface revision check.
Hence move them inside the mlx5_core driver where it is used.
This avoid exposing these helpers to all mlx5 drivers.
Signed-off-by: NParav Pandit <parav@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

555af0c3

11 5月, 2020 1 次提交

net/mlx5: Replace zero-length array with flexible-array · b6ca09cb

由 Gustavo A. R. Silva 提交于 5月 07, 2020

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

b6ca09cb

02 5月, 2020 1 次提交

net/mlx5: Add support to get lag physical port · c6bc6041

由 Maor Gottlieb 提交于 4月 30, 2020

Add function to get the device physical port of the lag slave.
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Reviewed-by: NLeon Romanovsky <leonro@mellanox.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

c6bc6041

29 4月, 2020 2 次提交

net/mlx5: Add structure layout and defines for MFRL register · 06939536

由 Moshe Shemesh 提交于 4月 24, 2020

Add needed structure layouts and defines for MFRL (Management Firmware
Reset Level) register. This structure will be used for the firmware
upgrade and reset flow in the downstream patches.
Signed-off-by: NMoshe Shemesh <moshe@mellanox.com>
Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

06939536

net/mlx5: Use aligned variable while allocating ICM memory · dff8e2d1

由 Erez Shitrit 提交于 4月 24, 2020

The alignment value is part of the input structure, so use it and spare
extra memory allocation when is not needed.
Now, using the new ability when allocating icm for Direct-Rule
insertion.
Signed-off-by: NAriel Levkovich <lariel@mellanox.com>
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

dff8e2d1

19 4月, 2020 1 次提交

net/mlx5: Move QP logic to mlx5_ib · 333fbaa0

由 Leon Romanovsky 提交于 4月 04, 2020

The mlx5_core doesn't need any functionality coded in qp.c, so move
that file to drivers/infiniband/ be under mlx5_ib responsibility.
Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>

333fbaa0

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功