提交 · 9965bbebae59b3563a4d95e4aed121e8965dfdc2 · openeuler / Kernel

08 2月, 2023 2 次提交

net/mlx5: Expose SF firmware pages counter · 9965bbeb

由 Maher Sanalla 提交于 1月 22, 2023

Currently, each core device has VF pages counter which stores number of
fw pages used by its VFs and SFs.

The current design led to a hang when performing firmware reset on DPU,
where the DPU PFs stalled in sriov unload flow due to waiting on release
of SFs pages instead of waiting on only VFs pages.

Thus, Add a separate counter for SF firmware pages, which will prevent
the stall scenario described above.

Fixes: 1958fc2f ("net/mlx5: SF, Add auxiliary device driver")
Signed-off-by: NMaher Sanalla <msanalla@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

9965bbeb

net/mlx5: Store page counters in a single array · c3bdbaea

由 Maher Sanalla 提交于 1月 22, 2023

Currently, an independent page counter is used for tracking memory usage
for each function type such as VF, PF and host PF (DPU).

For better code-readibilty, use a single array that stores
the number of allocated memory pages for each function type.
Signed-off-by: NMaher Sanalla <msanalla@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

c3bdbaea

10 1月, 2023 1 次提交

net/mlx5: Fix command stats access after free · da2e552b

由 Moshe Shemesh 提交于 11月 28, 2022

Command may fail while driver is reloading and can't accept FW commands
till command interface is reinitialized. Such command failure is being
logged to command stats. This results in NULL pointer access as command
stats structure is being freed and reallocated during mlx5 devlink
reload (see kernel log below).

Fix it by making command stats statically allocated on driver probe.

Kernel log:
[ 2394.808802] BUG: unable to handle kernel paging request at 000000000002a9c0
[ 2394.810610] PGD 0 P4D 0
[ 2394.811811] Oops: 0002 [#1] SMP NOPTI
...
[ 2394.815482] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
...
[ 2394.829505] Call Trace:
[ 2394.830667]  _raw_spin_lock_irq+0x23/0x26
[ 2394.831858]  cmd_status_err+0x55/0x110 [mlx5_core]
[ 2394.833020]  mlx5_access_reg+0xe7/0x150 [mlx5_core]
[ 2394.834175]  mlx5_query_port_ptys+0x78/0xa0 [mlx5_core]
[ 2394.835337]  mlx5e_ethtool_get_link_ksettings+0x74/0x590 [mlx5_core]
[ 2394.836454]  ? kmem_cache_alloc_trace+0x140/0x1c0
[ 2394.837562]  __rh_call_get_link_ksettings+0x33/0x100
[ 2394.838663]  ? __rtnl_unlock+0x25/0x50
[ 2394.839755]  __ethtool_get_link_ksettings+0x72/0x150
[ 2394.840862]  duplex_show+0x6e/0xc0
[ 2394.841963]  dev_attr_show+0x1c/0x40
[ 2394.843048]  sysfs_kf_seq_show+0x9b/0x100
[ 2394.844123]  seq_read+0x153/0x410
[ 2394.845187]  vfs_read+0x91/0x140
[ 2394.846226]  ksys_read+0x4f/0xb0
[ 2394.847234]  do_syscall_64+0x5b/0x1a0
[ 2394.848228]  entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 34f46ae0 ("net/mlx5: Add command failures data to debugfs")
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

da2e552b

30 11月, 2022 1 次提交

net/mlx5: Remove unused ctx variables · 12eb0f84

由 Petr Pavlu 提交于 9月 29, 2022

Remove mlx5_priv.ctx_list and ctx_lock which are no longer used after
commit 601c10c8 ("net/mlx5: Delete custom device management logic").
Signed-off-by: NPetr Pavlu <petr.pavlu@suse.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

12eb0f84

22 11月, 2022 1 次提交

net/mlx5: cmdif, Print info on any firmware cmd failure to tracepoint · 870c2481

由 Moshe Shemesh 提交于 5月 31, 2022

While moving to new CMD API (quiet API), some pre-existing flows may call the new API
function that in case of error, returns the error instead of printing it as previously done.
For such flows we bring back the print but to tracepoint this time for sys admins to
have the ability to check for errors especially for commands using the new quiet API.

Tracepoint output example:
devlink-1333 [001] ..... 822.746922: mlx5_cmd: ACCESS_REG(0x805) op_mod(0x0) failed, status bad resource(0x5), syndrome (0xb06e1f), err(-22)

Fixes: f23519e5 ("net/mlx5: cmdif, Add new api for command execution")
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

870c2481

28 10月, 2022 1 次提交

net/mlx5: Fix possible use-after-free in async command interface · bacd22df

由 Tariq Toukan 提交于 10月 26, 2022

mlx5_cmd_cleanup_async_ctx should return only after all its callback
handlers were completed. Before this patch, the below race between
mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was possible and
lead to a use-after-free:

1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e.
   elevated by 1, a single inflight callback).
2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1.
3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and
   is about to call wake_up().
4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns
   immediately as the condition (num_inflight == 0) holds.
5. mlx5_cmd_cleanup_async_ctx returns.
6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx
   object.
7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed
   object.

Fix it by syncing using a completion object. Mark it completed when
num_inflight reaches 0.

Trace:

BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270
Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0

CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
Call Trace:
 <IRQ>
 dump_stack_lvl+0x57/0x7d
 print_report.cold+0x2d5/0x684
 ? do_raw_spin_lock+0x23d/0x270
 kasan_report+0xb1/0x1a0
 ? do_raw_spin_lock+0x23d/0x270
 do_raw_spin_lock+0x23d/0x270
 ? rwlock_bug.part.0+0x90/0x90
 ? __delete_object+0xb8/0x100
 ? lock_downgrade+0x6e0/0x6e0
 _raw_spin_lock_irqsave+0x43/0x60
 ? __wake_up_common_lock+0xb9/0x140
 __wake_up_common_lock+0xb9/0x140
 ? __wake_up_common+0x650/0x650
 ? destroy_tis_callback+0x53/0x70 [mlx5_core]
 ? kasan_set_track+0x21/0x30
 ? destroy_tis_callback+0x53/0x70 [mlx5_core]
 ? kfree+0x1ba/0x520
 ? do_raw_spin_unlock+0x54/0x220
 mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core]
 ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
 ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
 mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core]
 ? dump_command+0xcc0/0xcc0 [mlx5_core]
 ? lockdep_hardirqs_on_prepare+0x400/0x400
 ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
 cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
 atomic_notifier_call_chain+0xd7/0x1d0
 mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core]
 atomic_notifier_call_chain+0xd7/0x1d0
 ? irq_release+0x140/0x140 [mlx5_core]
 irq_int_handler+0x19/0x30 [mlx5_core]
 __handle_irq_event_percpu+0x1f2/0x620
 handle_irq_event+0xb2/0x1d0
 handle_edge_irq+0x21e/0xb00
 __common_interrupt+0x79/0x1a0
 common_interrupt+0x78/0xa0
 </IRQ>
 <TASK>
 asm_common_interrupt+0x22/0x40
RIP: 0010:default_idle+0x42/0x60
Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00
RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242
RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110
RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc
RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3
R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005
R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000
 ? default_idle_call+0xcc/0x450
 default_idle_call+0xec/0x450
 do_idle+0x394/0x450
 ? arch_cpu_idle_exit+0x40/0x40
 ? do_idle+0x17/0x450
 cpu_startup_entry+0x19/0x20
 start_secondary+0x221/0x2b0
 ? set_cpu_sibling_map+0x2070/0x2070
 secondary_startup_64_no_verify+0xcd/0xdb
 </TASK>

Allocated by task 49502:
 kasan_save_stack+0x1e/0x40
 __kasan_kmalloc+0x81/0xa0
 kvmalloc_node+0x48/0xe0
 mlx5e_bulk_async_init+0x35/0x110 [mlx5_core]
 mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core]
 mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
 mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
 mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
 mlx5e_suspend+0xdb/0x140 [mlx5_core]
 mlx5e_remove+0x89/0x190 [mlx5_core]
 auxiliary_bus_remove+0x52/0x70
 device_release_driver_internal+0x40f/0x650
 driver_detach+0xc1/0x180
 bus_remove_driver+0x125/0x2f0
 auxiliary_driver_unregister+0x16/0x50
 mlx5e_cleanup+0x26/0x30 [mlx5_core]
 cleanup+0xc/0x4e [mlx5_core]
 __x64_sys_delete_module+0x2b5/0x450
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Freed by task 49502:
 kasan_save_stack+0x1e/0x40
 kasan_set_track+0x21/0x30
 kasan_set_free_info+0x20/0x30
 ____kasan_slab_free+0x11d/0x1b0
 kfree+0x1ba/0x520
 mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core]
 mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
 mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
 mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
 mlx5e_suspend+0xdb/0x140 [mlx5_core]
 mlx5e_remove+0x89/0x190 [mlx5_core]
 auxiliary_bus_remove+0x52/0x70
 device_release_driver_internal+0x40f/0x650
 driver_detach+0xc1/0x180
 bus_remove_driver+0x125/0x2f0
 auxiliary_driver_unregister+0x16/0x50
 mlx5e_cleanup+0x26/0x30 [mlx5_core]
 cleanup+0xc/0x4e [mlx5_core]
 __x64_sys_delete_module+0x2b5/0x450
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fixes: e355477e ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API")
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>

bacd22df

04 10月, 2022 2 次提交

net/mlx5: Start health poll at earlier stage of driver load · 9b98d395

由 Moshe Shemesh 提交于 10月 01, 2022

Start health poll at earlier stage, so if fw fatal issue occurred before
or during initialization commands such as init_hca or set_hca_cap the
poll health can detect and indicate that the driver is already in error
state.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

9b98d395

net/mlx5e: xsk: Use umr_mode to calculate striding RQ parameters · 168723c1

由 Maxim Mikityanskiy 提交于 10月 01, 2022

Instead of passing the unaligned flag, pass an enum that indicates the
UMR mode. The next commit will add the third mode (KLM for certain
configurations of XSK), which will be added to this enum instead of
adding another bool flag everywhere.
Signed-off-by: NMaxim Mikityanskiy <maximmi@nvidia.com>
Reviewed-by: NTariq Toukan <tariqt@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

168723c1

28 9月, 2022 4 次提交

net/mlx5: Remove unused structs · b53ff37f

由 Gal Pressman 提交于 9月 07, 2022

Remove structs which are no longer used in the driver:
  mlx5dr_cmd_qp_create_attr
  mlx5_fs_dr_ns
  mlx5_pas
Signed-off-by: NGal Pressman <gal@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

b53ff37f

net/mlx5: Remove unused functions · 66af4fe3

由 Gal Pressman 提交于 9月 07, 2022

Remove functions which are no longer used in the driver:
  mlx5e_ipsec_is_tx_flow
  mlx5_health_flush
  get_cqe_enhanced_num_mini_cqes
  get_cqe_l3_hdr_type
  mlx5_health_flush
  mlx5_fs_is_ipsec_flow
  _mlx5_fs_is_outer_ipproto_flow
  mlx5_fs_is_outer_tcp_flow
  mlx5_fs_is_outer_udp_flow
  mlx5_fs_is_vxlan_flow
  mlx5_fs_is_outer_ipsec_flow
Signed-off-by: NGal Pressman <gal@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

66af4fe3

RDMA/mlx5: Don't set tx affinity when lag is in hash mode · a83bb5df

由 Liu, Changcheng 提交于 9月 07, 2022

In hash mode, without setting tx affinity explicitly, the port select
flow table decides which port is used for the traffic.
If port_select_flow_table_bypass capability is supported and tx affinity
is set explicitly for QP/TIS, they will be added into the explicit affinity
table in FW to check which port is used for the traffic.
1. The overloaded explicit affinity table may affect performance.
   To avoid this, do not set tx affinity explicitly by default.
2. The packets of the same flow need to be transmitted on the same port.
   Because the packets of the same flow use different QPs in slow & fast
   path, it shouldn't set tx affinity explicitly for these QPs.
Signed-off-by: NLiu, Changcheng <jerrliu@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NVlad Buslov <vladbu@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

a83bb5df

net/mlx5: Add support for NPPS with real time mode · f0462bc3

由 Aya Levin 提交于 9月 07, 2022

Add support for setting NPPS. NPPS is currently available in
REAL_TIME_CLOCK mode only. In addition allow the user to set the pulse
duration.

When NPPS pulse duration is not set explicitly by the user, driver set
it to 50% of the NPPS period.
Signed-off-by: NAya Levin <ayal@nvidia.com>
Reviewed-by: NEran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

f0462bc3

05 9月, 2022 2 次提交

RDMA/mlx5: Move function mlx5_core_query_ib_ppcnt() to mlx5_ib · 8a2dd123

由 Chris Mi 提交于 8月 29, 2022

This patch doesn't change any functionality, but move one function
to mlx5_ib because it is not used by mlx5_core.

The actual fix is in the next patch.
Reviewed-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NChris Mi <cmi@nvidia.com>
Link: https://lore.kernel.org/r/fd47b9138412bd94ed30f838026cbb4cf3878150.1661763871.git.leonro@nvidia.comSigned-off-by: NLeon Romanovsky <leon@kernel.org>

8a2dd123

RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile · 9ca05b0f

由 Maher Sanalla 提交于 8月 29, 2022

When the RDMA auxiliary driver probes, it sets its profile based on
devlink driverinit value. The latter might not be in sync with FW yet
(In case devlink reload is not performed), thus causing a mismatch
between RDMA driver and FW. This results in the following FW syndrome
when the RDMA driver tries to adjust RoCE state, which fails the probe:

"0xC1F678 | modify_nic_vport_context: roce_en set on a vport that
doesn't support roce"

To prevent this, select the PF profile based on FW RoCE capability
instead of relying on devlink driverinit value.
To provide backward compatibility of the RoCE disable feature, on older
FW's where roce_rw is not set (FW RoCE capability is read-only), keep
the current behavior e.g., rely on devlink driverinit value.

Fixes: fbfa97b4 ("net/mlx5: Disable roce at HCA level")
Reviewed-by: NShay Drory <shayd@nvidia.com>
Reviewed-by: NMichael Guralnik <michaelgur@nvidia.com>
Reviewed-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NMaher Sanalla <msanalla@nvidia.com>
Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.comSigned-off-by: NLeon Romanovsky <leon@kernel.org>

9ca05b0f

23 8月, 2022 1 次提交

net/mlx5: Avoid false positive lockdep warning by adding lock_class_key · d59b73a6

由 Moshe Shemesh 提交于 8月 03, 2022

Add a lock_class_key per mlx5 device to avoid a false positive
"possible circular locking dependency" warning by lockdep, on flows
which lock more than one mlx5 device, such as adding SF.

kernel log:
 ======================================================
 WARNING: possible circular locking dependency detected
 5.19.0-rc8+ #2 Not tainted
 ------------------------------------------------------
 kworker/u20:0/8 is trying to acquire lock:
 ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]

 but task is already holding lock:
 ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130

 which lock already depends on the new lock.

 the existing dependency chain (in reverse order) is:

 -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}:
        down_write+0x90/0x150
        blocking_notifier_chain_register+0x53/0xa0
        mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
        mlx5_init_one+0x261/0x490 [mlx5_core]
        probe_one+0x430/0x680 [mlx5_core]
        local_pci_probe+0xd6/0x170
        work_for_cpu_fn+0x4e/0xa0
        process_one_work+0x7c2/0x1340
        worker_thread+0x6f6/0xec0
        kthread+0x28f/0x330
        ret_from_fork+0x1f/0x30

 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
        __lock_acquire+0x2fc7/0x6720
        lock_acquire+0x1c1/0x550
        __mutex_lock+0x12c/0x14b0
        mlx5_init_one+0x2e/0x490 [mlx5_core]
        mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
        auxiliary_bus_probe+0x9d/0xe0
        really_probe+0x1e0/0xaa0
        __driver_probe_device+0x219/0x480
        driver_probe_device+0x49/0x130
        __device_attach_driver+0x1b8/0x280
        bus_for_each_drv+0x123/0x1a0
        __device_attach+0x1a3/0x460
        bus_probe_device+0x1a2/0x260
        device_add+0x9b1/0x1b40
        __auxiliary_device_add+0x88/0xc0
        mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
        blocking_notifier_call_chain+0xd5/0x130
        mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
        process_one_work+0x7c2/0x1340
        worker_thread+0x59d/0xec0
        kthread+0x28f/0x330
        ret_from_fork+0x1f/0x30

  other info that might help us debug this:

  Possible unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(&(&notifier->n_head)->rwsem);
                                lock(&dev->intf_state_mutex);
                                lock(&(&notifier->n_head)->rwsem);
   lock(&dev->intf_state_mutex);

  *** DEADLOCK ***

 4 locks held by kworker/u20:0/8:
  #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
  #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
  #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
  #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460

 stack backtrace:
 CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
 Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
 Call Trace:
  <TASK>
  dump_stack_lvl+0x57/0x7d
  check_noncircular+0x278/0x300
  ? print_circular_bug+0x460/0x460
  ? lock_chain_count+0x20/0x20
  ? register_lock_class+0x1880/0x1880
  __lock_acquire+0x2fc7/0x6720
  ? register_lock_class+0x1880/0x1880
  ? register_lock_class+0x1880/0x1880
  lock_acquire+0x1c1/0x550
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  __mutex_lock+0x12c/0x14b0
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? mlx5_init_one+0x2e/0x490 [mlx5_core]
  ? _raw_read_unlock+0x1f/0x30
  ? mutex_lock_io_nested+0x1320/0x1320
  ? __ioremap_caller.constprop.0+0x306/0x490
  ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
  ? iounmap+0x160/0x160
  mlx5_init_one+0x2e/0x490 [mlx5_core]
  mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
  ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
  auxiliary_bus_probe+0x9d/0xe0
  really_probe+0x1e0/0xaa0
  __driver_probe_device+0x219/0x480
  ? auxiliary_match_id+0xe9/0x140
  driver_probe_device+0x49/0x130
  __device_attach_driver+0x1b8/0x280
  ? driver_allows_async_probing+0x140/0x140
  bus_for_each_drv+0x123/0x1a0
  ? bus_for_each_dev+0x1a0/0x1a0
  ? lockdep_hardirqs_on_prepare+0x286/0x400
  ? trace_hardirqs_on+0x2d/0x100
  __device_attach+0x1a3/0x460
  ? device_driver_attach+0x1e0/0x1e0
  ? kobject_uevent_env+0x22d/0xf10
  bus_probe_device+0x1a2/0x260
  device_add+0x9b1/0x1b40
  ? dev_set_name+0xab/0xe0
  ? __fw_devlink_link_to_suppliers+0x260/0x260
  ? memset+0x20/0x40
  ? lockdep_init_map_type+0x21a/0x7d0
  __auxiliary_device_add+0x88/0xc0
  ? auxiliary_device_init+0x86/0xa0
  mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
  blocking_notifier_call_chain+0xd5/0x130
  mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
  ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
  ? lock_downgrade+0x6e0/0x6e0
  ? lockdep_hardirqs_on_prepare+0x286/0x400
  process_one_work+0x7c2/0x1340
  ? lockdep_hardirqs_on_prepare+0x400/0x400
  ? pwq_dec_nr_in_flight+0x230/0x230
  ? rwlock_bug.part.0+0x90/0x90
  worker_thread+0x59d/0xec0
  ? process_one_work+0x1340/0x1340
  kthread+0x28f/0x330
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x1f/0x30
  </TASK>

Fixes: 6a327321 ("net/mlx5: SF, Port function state change support")
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NShay Drory <shayd@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

d59b73a6

28 7月, 2022 1 次提交

RDMA/mlx5: Rename the mkey cache variables and functions · 01137808

由 Aharon Landau 提交于 7月 26, 2022

After replacing the MR cache with an Mkey cache, rename the variables and
functions to fit the new meaning.

Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.comSigned-off-by: NAharon Landau <aharonl@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NJason Gunthorpe <jgg@nvidia.com>

01137808

14 7月, 2022 1 次提交

net/mlx5: Use software VHCA id when it's supported · dc402ccc

由 Yishai Hadas 提交于 6月 02, 2022

Use software VHCA id when it's supported by the firmware.

A unique id is allocated upon mlx5_mdev_init() and freed upon
mlx5_mdev_uninit(), as such it stays the same during the full life cycle
of the device including upon health recovery if occurred.

The conjunction of sw_vhca_id with sw_owner_id will be a global unique
id per function which uses mlx5_core.

The sw_vhca_id is set upon init_hca command and is used to specify the
VHCA that the NIC vport is affiliated with.

This functionality is needed upon migration of VM which is MPV based.
(i.e. multi port device).
Signed-off-by: NYishai Hadas <yishaih@nvidia.com>
Reviewed-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

dc402ccc

12 7月, 2022 1 次提交

net/mlx5: Use devl_ API in mlx5e_devlink_port_register · 7b19119f

由 Moshe Shemesh 提交于 7月 11, 2022

As part of the flows invoked by mlx5_devlink_eswitch_mode_set() get to
mlx5_rescan_drivers_locked() which can call mlx5e_probe()/mlx5e_remove
and register/unregister mlx5e driver ports accordingly. This can lead to
deadlock once mlx5_devlink_eswitch_mode_set() will use devlink lock.
Use devl_port_register/unregister() instead of
devlink_port_register/unregister() and add devlink instance locks in the
driver paths to this function to have it locked while calling devl_ API
function.

If remove or probe were called by module init or module cleanup flows,
need to lock devlink just before calling devl_port_register(), otherwise
it is called by attach/detach or register/unregister flow and we can
have the flow locked. Added flag to distinguish between these cases.

This will be used by the downstream patch to invoke
mlx5_devlink_eswitch_mode_set() with devlink locked.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>

7b19119f

14 6月, 2022 1 次提交

net/mlx5: Manage ICM of type modify-header pattern · 66765836

由 Yevgeny Kliteynik 提交于 6月 07, 2022

Added support for managing new type of ICM for devices that
support sw_owner_v2.
Signed-off-by: NErez Shitrit <erezsh@mellanox.com>
Signed-off-by: NYevgeny Kliteynik <kliteyn@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>
Acked-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

66765836

18 5月, 2022 2 次提交

net/mlx5: Inline db alloc API function · 9b45bde8

由 Tariq Toukan 提交于 1月 25, 2022

Take the wrapper version which picks default node into a header file.
This reduces the number of exported functions.
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

9b45bde8

net/mlx5: Add last command failure syndrome to debugfs · 1d2c717b

由 Moshe Shemesh 提交于 5月 13, 2022

Add syndrome of last command failure per command type to debugfs to ease
debugging of such failure.
last_failed_syndrome - last command failed syndrome returned by FW.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

1d2c717b

10 5月, 2022 5 次提交

net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs · 846e4373

由 Yishai Hadas 提交于 5月 10, 2022

Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a
VF register to be notified for its enablement / disablement by the PF.

Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with
its notifier block and upon VF remove it will call
mlx5_sriov_blocking_notifier_unregister() to drop its registration.

This can give a VF the ability to clean some resources upon disable
before that the command interface goes down and on the other hand sets
some stuff before that it's enabled.

This may be used by a VF which is migration capable in few cases.(e.g.
PF load/unload upon an health recovery).

Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.comSigned-off-by: NYishai Hadas <yishaih@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

846e4373

net/mlx5: Lag, add debugfs to query hardware lag state · 7f46a0b7

由 Mark Bloch 提交于 3月 15, 2022

Lag state has become very complicated with many modes, flags, types and
port selections methods and future work will add additional features.

Add a debugfs to query the current lag state. A new directory named "lag"
will be created under the mlx5 debugfs directory. As the driver has
debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag

For example:
/sys/kernel/debug/mlx5/0000:08:00.0/lag

The following files are exposed:

- state: Returns "active" or "disabled". If "active" it means hardware
         lag is active.

- members: Returns the BDFs of all the members of lag object.

- type: Returns the type of the lag currently configured. Valid only
	if hardware lag is active.
	* "roce" - Members are bare metal PFs.
	* "switchdev" - Members are in switchdev mode.
	* "multipath" - ECMP offloads.

- port_sel_mode: Returns the egress port selection method, valid
		 only if hardware lag is active.
		 * "queue_affinity" - Egress port is selected by
		   the QP/SQ affinity.
		 * "hash" - Egress port is selected by hash done on
		   each packet. Controlled by: xmit_hash_policy of the
		   bond device.
- flags: Returns flags that are specific per lag @type. Valid only if
	 hardware lag is active.
	 * "shared_fdb" - "on" or "off", if "on" single FDB is used.

- mapping: Returns the mapping which is used to select egress port.
	   Valid only if hardware lag is active.
	   If @port_sel_mode is "hash" returns the active egress ports.
	   The hash result will select only active ports.
	   if @port_sel_mode is "queue_affinity" returns the mapping
	   between the configured port affinity of the QP/SQ and actual
	   egress port. For example:
	   * 1:1 - Mapping means if the configured affinity is port 1
	           traffic will egress via port 1.
	   * 1:2 - Mapping means if the configured affinity is port 1
		   traffic will egress via port 2. This can happen
		   if port 1 is down or in active/backup mode and port 1
		   is backup.
Signed-off-by: NMark Bloch <mbloch@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

7f46a0b7

net/mlx5: Support devices with more than 2 ports · 4cd14d44

由 Mark Bloch 提交于 3月 01, 2022

Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
to support NICs with 4 ports.
Signed-off-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

4cd14d44

net/mlx5: Lag, expose number of lag ports · 34a30d76

由 Mark Bloch 提交于 3月 01, 2022

Downstream patches will add support for hardware lag with
more than 2 ports. Add a way for users to query the number of lag ports.
Signed-off-by: NMark Bloch <mbloch@nvidia.com>
Reviewed-by: NMaor Gottlieb <maorg@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

34a30d76

net/mlx5: Add exit route when waiting for FW · 8324a02c

由 Gavin Li 提交于 3月 27, 2022

Currently, removing a device needs to get the driver interface lock before
doing any cleanup. If the driver is waiting in a loop for FW init, there
is no way to cancel the wait, instead the device cleanup waits for the
loop to conclude and release the lock.

To allow immediate response to remove device commands, check the TEARDOWN
flag while waiting for FW init, and exit the loop if it has been set.
Signed-off-by: NGavin Li <gavinl@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

8324a02c

09 4月, 2022 2 次提交

net/mlx5: Remove ipsec_ops function table · f2b41b32

由 Leon Romanovsky 提交于 4月 06, 2022

There is only one IPsec implementation and ipsec_ops is not needed
at all in this situation. Together with removal of ipsec_ops, we can
drop the entry checks as these functions are called for IPsec devices
only.

Link: https://lore.kernel.org/r/bc8dd1c8a77b65dbf5e2cf92c813ffaca2505c5f.1649232994.git.leonro@nvidia.comReviewed-by: NRaed Salem <raeds@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

f2b41b32

net/mlx5: Remove not-needed IPsec config · 54deb0e7

由 Leon Romanovsky 提交于 4月 06, 2022

In current code, the CONFIG_MLX5_IPSEC and CONFIG_MLX5_EN_IPSEC are
the same. So remove useless indirection.

Link: https://lore.kernel.org/r/fd14492cbc01a0d51a5bfedde02bcd2154123fde.1649232994.git.leonro@nvidia.comReviewed-by: NRaed Salem <raeds@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

54deb0e7

18 3月, 2022 2 次提交

net/mlx5: Remove unused fill page array API function · 770c9a3a

由 Tariq Toukan 提交于 5月 20, 2021

mlx5_fill_page_array API function is not used.
Remove it, reduce the number of exported functions.
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

770c9a3a

net/mlx5: Remove unused exported contiguous coherent buffer allocation API · 4206fe40

由 Tariq Toukan 提交于 4月 27, 2021

All WQ types moved to using the fragmented allocation API
for coherent memory. Contiguous API is not used anymore.
Remove it, reduce the number of exported functions.
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

4206fe40

10 3月, 2022 4 次提交

net/mlx5: Add debugfs counters for page commands failures · 32071187

由 Moshe Shemesh 提交于 1月 27, 2022

Add the following new debugfs counters for debug and verbosity:
fw_pages_alloc_failed - number of pages FW requested but driver failed
to allocate.
give_pages_dropped - number of pages given to FW, but command give pages
failed by FW.
reclaim_pages_discard - number of pages which were about to reclaim back
and FW failed the command.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

32071187

net/mlx5: Add pages debugfs · 4e05cbf0

由 Moshe Shemesh 提交于 1月 27, 2022

Add pages debugfs to expose the following counters for debuggability:
fw_pages_total - How many pages were given to FW and not returned yet.
vfs_pages - For SRIOV, how many pages were given to FW for virtual
functions usage.
host_pf_pages - For ECPF, how many pages were given to FW for external
hosts physical functions usage.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

4e05cbf0

net/mlx5: Move debugfs entries to separate struct · 66771a1c

由 Moshe Shemesh 提交于 2月 18, 2022

Move the debugfs entry pointers under priv to their own struct.
Add get function for device debugfs root.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

66771a1c

net/mlx5: Add command failures data to debugfs · 34f46ae0

由 Moshe Shemesh 提交于 1月 27, 2022

Add new counters to command interface debugfs to count command failures.
The following counters added:
total_failed - number of times command failed (any kind of failure).
failed_mbox_status - number of times command failed on bad status
returned by FW.

In addition, add data about last command failure to command interface
debugfs:
last_failed_errno - last command failed returned errno.
last_failed_mbox_status - last bad status returned by FW.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Reviewed-by: NLeon Romanovsky <leonro@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

34f46ae0

27 2月, 2022 1 次提交

net/mlx5: Expose APIs to get/put the mlx5 core device · 1695b97b

由 Yishai Hadas 提交于 2月 24, 2022

Expose an API to get the mlx5 core device from a given VF PCI device if
mlx5_core is its driver.

Upon the get API we stay with the intf_state_mutex locked to make sure
that the device can't be gone/unloaded till the caller will complete
its job over the device, this expects to be for a short period of time
for any flow that the lock is taken.

Upon the put API we unlock the intf_state_mutex.

The use case for those APIs is the migration flow of a VF over VFIO PCI.
In that case the VF doesn't ride on mlx5_core, because the device is
driving *two* different PCI devices, the PF owned by mlx5_core and the
VF owned by the vfio driver.

The mlx5_core of the PF is accessed only during the narrow window of the
VF's ioctl that requires its services.

This allows the PF driver to be more independent of the VF driver, so
long as it doesn't reset the FW.

Link: https://lore.kernel.org/all/20220224142024.147653-6-yishaih@nvidia.comSigned-off-by: NYishai Hadas <yishaih@nvidia.com>
Signed-off-by: NLeon Romanovsky <leonro@nvidia.com>

1695b97b

24 2月, 2022 5 次提交

net/mlx5: Add clarification on sync reset failure · 45fee8ed

由 Moshe Shemesh 提交于 9月 06, 2021

In case devlink reload action fw_activate failed in sync reset stage,
use the new MFRL field reset_state to find why it failed and share this
clarification with the user.
Signed-off-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

45fee8ed

net/mlx5: cmdif, Refactor error handling and reporting of async commands · 0a415276

由 Saeed Mahameed 提交于 3月 31, 2020

Same as the new mlx5_cmd_do API, report all information to callers and
let them handle the error values and outbox parsing.

The user callback status "work->user_callback(status)" is now similar to
the error rc code returned from the blocking mlx5_cmd_do() version,
and now is defined as follows:

 -EREMOTEIO : Command executed by FW, outbox.status != MLX5_CMD_STAT_OK.
              Caller must check FW outbox status.
 0 : Command execution successful,  outbox.status == MLX5_CMD_STAT_OK.
 < 0 : Command couldn't execute, FW or driver induced error.
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

0a415276

net/mlx5: cmdif, Add new api for command execution · f23519e5

由 Saeed Mahameed 提交于 8月 17, 2019

Add mlx5_cmd_do. Unlike mlx5_cmd_exec, this function will not modify
or translate outbox.status.

The function will return:

return = 0: Command was executed, outbox.status == MLX5_CMD_STAT_OK.

return = -EREMOTEIO: Executed, outbox.status != MLX5_CMD_STAT_OK.

return < 0: Command execution couldn't be performed by FW or driver.

And document other mlx5_cmd_exec functions.
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

f23519e5

net/mlx5: cmdif, cmd_check refactoring · 605bef00

由 Saeed Mahameed 提交于 4月 05, 2020

Do not mangle the command outbox in the internal low level cmd_exec and
cmd_invoke functions.

Instead return a proper unique error code and move the driver error
checking to be at a higher level in mlx5_cmd_exec().
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Reviewed-by: NMoshe Shemesh <moshe@nvidia.com>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

605bef00

mlx5: remove unused static inlines · 43c07595

由 Jakub Kicinski 提交于 1月 26, 2022

mlx5 has some unused static inline helpers in include/
while at it also clean static inlines in the driver itself.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSaeed Mahameed <saeedm@nvidia.com>

43c07595

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功