1. 08 Feb 2023 (2 commits)
  2. 10 Jan 2023 (1 commit)
    • net/mlx5: Fix command stats access after free · da2e552b
      Authored by Moshe Shemesh
      A command may fail while the driver is reloading and cannot accept FW
      commands until the command interface is reinitialized. Such a failure
      is logged to the command stats, which results in a NULL pointer
      access, as the command stats structure is freed and reallocated
      during mlx5 devlink reload (see the kernel log below).
      
      Fix it by making the command stats statically allocated at driver
      probe time, as sketched below.
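
      A minimal sketch of the shape of the fix (struct layout abridged;
      field and counter names are assumptions, not copied from the patch):
      the per-opcode stats array becomes part of struct mlx5_cmd itself, so
      it is allocated once with the device and can no longer vanish under a
      concurrent command completion:

        struct mlx5_cmd {
                /* ... */
                /* was: struct mlx5_cmd_stats *stats, freed and reallocated
                 * on devlink reload; now embedded, device lifetime */
                struct mlx5_cmd_stats stats[MLX5_CMD_OP_MAX];
        };

        /* cmd_status_err() can then take the per-opcode lock safely: */
        struct mlx5_cmd_stats *stats = &dev->cmd.stats[opcode];

        spin_lock_irq(&stats->lock);
        stats->failed++;                /* counter name assumed */
        spin_unlock_irq(&stats->lock);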
      
      Kernel log:
      [ 2394.808802] BUG: unable to handle kernel paging request at 000000000002a9c0
      [ 2394.810610] PGD 0 P4D 0
      [ 2394.811811] Oops: 0002 [#1] SMP NOPTI
      ...
      [ 2394.815482] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0
      ...
      [ 2394.829505] Call Trace:
      [ 2394.830667]  _raw_spin_lock_irq+0x23/0x26
      [ 2394.831858]  cmd_status_err+0x55/0x110 [mlx5_core]
      [ 2394.833020]  mlx5_access_reg+0xe7/0x150 [mlx5_core]
      [ 2394.834175]  mlx5_query_port_ptys+0x78/0xa0 [mlx5_core]
      [ 2394.835337]  mlx5e_ethtool_get_link_ksettings+0x74/0x590 [mlx5_core]
      [ 2394.836454]  ? kmem_cache_alloc_trace+0x140/0x1c0
      [ 2394.837562]  __rh_call_get_link_ksettings+0x33/0x100
      [ 2394.838663]  ? __rtnl_unlock+0x25/0x50
      [ 2394.839755]  __ethtool_get_link_ksettings+0x72/0x150
      [ 2394.840862]  duplex_show+0x6e/0xc0
      [ 2394.841963]  dev_attr_show+0x1c/0x40
      [ 2394.843048]  sysfs_kf_seq_show+0x9b/0x100
      [ 2394.844123]  seq_read+0x153/0x410
      [ 2394.845187]  vfs_read+0x91/0x140
      [ 2394.846226]  ksys_read+0x4f/0xb0
      [ 2394.847234]  do_syscall_64+0x5b/0x1a0
      [ 2394.848228]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      Fixes: 34f46ae0 ("net/mlx5: Add command failures data to debugfs")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  3. 30 Nov 2022 (1 commit)
  4. 22 Nov 2022 (1 commit)
  5. 28 Oct 2022 (1 commit)
    • net/mlx5: Fix possible use-after-free in async command interface · bacd22df
      Authored by Tariq Toukan
      mlx5_cmd_cleanup_async_ctx should return only after all of its
      callback handlers have completed. Before this patch, the race below
      between mlx5_cmd_cleanup_async_ctx and mlx5_cmd_exec_cb_handler was
      possible and led to a use-after-free:
      
      1. mlx5_cmd_cleanup_async_ctx is called while num_inflight is 2 (i.e.
         elevated by 1, a single inflight callback).
      2. mlx5_cmd_cleanup_async_ctx decreases num_inflight to 1.
      3. mlx5_cmd_exec_cb_handler is called, decreases num_inflight to 0 and
         is about to call wake_up().
      4. mlx5_cmd_cleanup_async_ctx calls wait_event, which returns
         immediately as the condition (num_inflight == 0) holds.
      5. mlx5_cmd_cleanup_async_ctx returns.
      6. The caller of mlx5_cmd_cleanup_async_ctx frees the mlx5_async_ctx
         object.
      7. mlx5_cmd_exec_cb_handler goes on and calls wake_up() on the freed
         object.
      
      Fix it by synchronizing with a completion object, marked complete
      when num_inflight reaches 0, as sketched below.
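
      A minimal sketch of the completion-based scheme (names assumed rather
      than copied from the patch): the last decrement of num_inflight fires
      the completion, and the cleanup path blocks on it, so the ctx cannot
      be freed while a callback handler is still running:

        struct mlx5_async_ctx {
                struct mlx5_core_dev *dev;
                atomic_t num_inflight;
                struct completion inflight_done;  /* replaces the wait queue */
        };

        static void mlx5_cmd_exec_cb_handler(int status, void *_work)
        {
                struct mlx5_async_work *work = _work;
                struct mlx5_async_ctx *ctx = work->ctx;

                work->user_callback(status, work);
                /* steps 3/7 above: the waker now completes, and never
                 * wakes a freed object */
                if (atomic_dec_and_test(&ctx->num_inflight))
                        complete(&ctx->inflight_done);
        }

        void mlx5_cmd_cleanup_async_ctx(struct mlx5_async_ctx *ctx)
        {
                /* drop the initial reference; wait only if callbacks remain */
                if (!atomic_dec_and_test(&ctx->num_inflight))
                        wait_for_completion(&ctx->inflight_done);
        }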
      
      Trace:
      
      BUG: KASAN: use-after-free in do_raw_spin_lock+0x23d/0x270
      Read of size 4 at addr ffff888139cd12f4 by task swapper/5/0
      
      CPU: 5 PID: 0 Comm: swapper/5 Not tainted 6.0.0-rc3_for_upstream_debug_2022_08_30_13_10 #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      Call Trace:
       <IRQ>
       dump_stack_lvl+0x57/0x7d
       print_report.cold+0x2d5/0x684
       ? do_raw_spin_lock+0x23d/0x270
       kasan_report+0xb1/0x1a0
       ? do_raw_spin_lock+0x23d/0x270
       do_raw_spin_lock+0x23d/0x270
       ? rwlock_bug.part.0+0x90/0x90
       ? __delete_object+0xb8/0x100
       ? lock_downgrade+0x6e0/0x6e0
       _raw_spin_lock_irqsave+0x43/0x60
       ? __wake_up_common_lock+0xb9/0x140
       __wake_up_common_lock+0xb9/0x140
       ? __wake_up_common+0x650/0x650
       ? destroy_tis_callback+0x53/0x70 [mlx5_core]
       ? kasan_set_track+0x21/0x30
       ? destroy_tis_callback+0x53/0x70 [mlx5_core]
       ? kfree+0x1ba/0x520
       ? do_raw_spin_unlock+0x54/0x220
       mlx5_cmd_exec_cb_handler+0x136/0x1a0 [mlx5_core]
       ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
       ? mlx5_cmd_cleanup_async_ctx+0x220/0x220 [mlx5_core]
       mlx5_cmd_comp_handler+0x65a/0x12b0 [mlx5_core]
       ? dump_command+0xcc0/0xcc0 [mlx5_core]
       ? lockdep_hardirqs_on_prepare+0x400/0x400
       ? cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
       cmd_comp_notifier+0x7e/0xb0 [mlx5_core]
       atomic_notifier_call_chain+0xd7/0x1d0
       mlx5_eq_async_int+0x3ce/0xa20 [mlx5_core]
       atomic_notifier_call_chain+0xd7/0x1d0
       ? irq_release+0x140/0x140 [mlx5_core]
       irq_int_handler+0x19/0x30 [mlx5_core]
       __handle_irq_event_percpu+0x1f2/0x620
       handle_irq_event+0xb2/0x1d0
       handle_edge_irq+0x21e/0xb00
       __common_interrupt+0x79/0x1a0
       common_interrupt+0x78/0xa0
       </IRQ>
       <TASK>
       asm_common_interrupt+0x22/0x40
      RIP: 0010:default_idle+0x42/0x60
      Code: c1 83 e0 07 48 c1 e9 03 83 c0 03 0f b6 14 11 38 d0 7c 04 84 d2 75 14 8b 05 eb 47 22 02 85 c0 7e 07 0f 00 2d e0 9f 48 00 fb f4 <c3> 48 c7 c7 80 08 7f 85 e8 d1 d3 3e fe eb de 66 66 2e 0f 1f 84 00
      RSP: 0018:ffff888100dbfdf0 EFLAGS: 00000242
      RAX: 0000000000000001 RBX: ffffffff84ecbd48 RCX: 1ffffffff0afe110
      RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffffffff835cc9bc
      RBP: 0000000000000005 R08: 0000000000000001 R09: ffff88881dec4ac3
      R10: ffffed1103bd8958 R11: 0000017d0ca571c9 R12: 0000000000000005
      R13: ffffffff84f024e0 R14: 0000000000000000 R15: dffffc0000000000
       ? default_idle_call+0xcc/0x450
       default_idle_call+0xec/0x450
       do_idle+0x394/0x450
       ? arch_cpu_idle_exit+0x40/0x40
       ? do_idle+0x17/0x450
       cpu_startup_entry+0x19/0x20
       start_secondary+0x221/0x2b0
       ? set_cpu_sibling_map+0x2070/0x2070
       secondary_startup_64_no_verify+0xcd/0xdb
       </TASK>
      
      Allocated by task 49502:
       kasan_save_stack+0x1e/0x40
       __kasan_kmalloc+0x81/0xa0
       kvmalloc_node+0x48/0xe0
       mlx5e_bulk_async_init+0x35/0x110 [mlx5_core]
       mlx5e_tls_priv_tx_list_cleanup+0x84/0x3e0 [mlx5_core]
       mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
       mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
       mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
       mlx5e_suspend+0xdb/0x140 [mlx5_core]
       mlx5e_remove+0x89/0x190 [mlx5_core]
       auxiliary_bus_remove+0x52/0x70
       device_release_driver_internal+0x40f/0x650
       driver_detach+0xc1/0x180
       bus_remove_driver+0x125/0x2f0
       auxiliary_driver_unregister+0x16/0x50
       mlx5e_cleanup+0x26/0x30 [mlx5_core]
       cleanup+0xc/0x4e [mlx5_core]
       __x64_sys_delete_module+0x2b5/0x450
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Freed by task 49502:
       kasan_save_stack+0x1e/0x40
       kasan_set_track+0x21/0x30
       kasan_set_free_info+0x20/0x30
       ____kasan_slab_free+0x11d/0x1b0
       kfree+0x1ba/0x520
       mlx5e_tls_priv_tx_list_cleanup+0x2e7/0x3e0 [mlx5_core]
       mlx5e_ktls_cleanup_tx+0x38f/0x760 [mlx5_core]
       mlx5e_cleanup_nic_tx+0xa7/0x100 [mlx5_core]
       mlx5e_detach_netdev+0x1ca/0x2b0 [mlx5_core]
       mlx5e_suspend+0xdb/0x140 [mlx5_core]
       mlx5e_remove+0x89/0x190 [mlx5_core]
       auxiliary_bus_remove+0x52/0x70
       device_release_driver_internal+0x40f/0x650
       driver_detach+0xc1/0x180
       bus_remove_driver+0x125/0x2f0
       auxiliary_driver_unregister+0x16/0x50
       mlx5e_cleanup+0x26/0x30 [mlx5_core]
       cleanup+0xc/0x4e [mlx5_core]
       __x64_sys_delete_module+0x2b5/0x450
       do_syscall_64+0x3d/0x90
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      Fixes: e355477e ("net/mlx5: Make mlx5_cmd_exec_cb() a safe API")
      Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Link: https://lore.kernel.org/r/20221026135153.154807-8-saeed@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  6. 04 Oct 2022 (2 commits)
  7. 28 Sep 2022 (4 commits)
  8. 05 Sep 2022 (2 commits)
  9. 23 Aug 2022 (1 commit)
    • net/mlx5: Avoid false positive lockdep warning by adding lock_class_key · d59b73a6
      Authored by Moshe Shemesh
      Add a lock_class_key per mlx5 device to avoid a false-positive
      "possible circular locking dependency" warning from lockdep on flows
      that lock more than one mlx5 device, such as adding an SF.
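
      A minimal sketch of the idea (field placement assumed): registering a
      dynamic lockdep key per device instance puts each device's
      intf_state_mutex in its own lock class, so the PF-then-SF nesting
      seen below is no longer folded into a single class:

        struct mlx5_priv {
                /* ... */
                struct lock_class_key lock_key;
        };

        /* at device init */
        mutex_init(&dev->intf_state_mutex);
        lockdep_register_key(&dev->priv.lock_key);
        lockdep_set_class(&dev->intf_state_mutex, &dev->priv.lock_key);

        /* at device teardown, after the mutex is no longer used */
        mutex_destroy(&dev->intf_state_mutex);
        lockdep_unregister_key(&dev->priv.lock_key);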
      
      kernel log:
       ======================================================
       WARNING: possible circular locking dependency detected
       5.19.0-rc8+ #2 Not tainted
       ------------------------------------------------------
       kworker/u20:0/8 is trying to acquire lock:
       ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core]
      
       but task is already holding lock:
       ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
      
       which lock already depends on the new lock.
      
       the existing dependency chain (in reverse order) is:
      
       -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}:
              down_write+0x90/0x150
              blocking_notifier_chain_register+0x53/0xa0
              mlx5_sf_table_init+0x369/0x4a0 [mlx5_core]
              mlx5_init_one+0x261/0x490 [mlx5_core]
              probe_one+0x430/0x680 [mlx5_core]
              local_pci_probe+0xd6/0x170
              work_for_cpu_fn+0x4e/0xa0
              process_one_work+0x7c2/0x1340
              worker_thread+0x6f6/0xec0
              kthread+0x28f/0x330
              ret_from_fork+0x1f/0x30
      
       -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}:
              __lock_acquire+0x2fc7/0x6720
              lock_acquire+0x1c1/0x550
              __mutex_lock+0x12c/0x14b0
              mlx5_init_one+0x2e/0x490 [mlx5_core]
              mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
              auxiliary_bus_probe+0x9d/0xe0
              really_probe+0x1e0/0xaa0
              __driver_probe_device+0x219/0x480
              driver_probe_device+0x49/0x130
              __device_attach_driver+0x1b8/0x280
              bus_for_each_drv+0x123/0x1a0
              __device_attach+0x1a3/0x460
              bus_probe_device+0x1a2/0x260
              device_add+0x9b1/0x1b40
              __auxiliary_device_add+0x88/0xc0
              mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
              blocking_notifier_call_chain+0xd5/0x130
              mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
              process_one_work+0x7c2/0x1340
              worker_thread+0x59d/0xec0
              kthread+0x28f/0x330
              ret_from_fork+0x1f/0x30
      
        other info that might help us debug this:
      
        Possible unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&(&notifier->n_head)->rwsem);
                                      lock(&dev->intf_state_mutex);
                                      lock(&(&notifier->n_head)->rwsem);
         lock(&dev->intf_state_mutex);
      
        *** DEADLOCK ***
      
       4 locks held by kworker/u20:0/8:
        #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340
        #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340
        #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130
        #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460
      
       stack backtrace:
       CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+
       Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
       Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core]
       Call Trace:
        <TASK>
        dump_stack_lvl+0x57/0x7d
        check_noncircular+0x278/0x300
        ? print_circular_bug+0x460/0x460
        ? lock_chain_count+0x20/0x20
        ? register_lock_class+0x1880/0x1880
        __lock_acquire+0x2fc7/0x6720
        ? register_lock_class+0x1880/0x1880
        ? register_lock_class+0x1880/0x1880
        lock_acquire+0x1c1/0x550
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? lockdep_hardirqs_on_prepare+0x400/0x400
        __mutex_lock+0x12c/0x14b0
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? mlx5_init_one+0x2e/0x490 [mlx5_core]
        ? _raw_read_unlock+0x1f/0x30
        ? mutex_lock_io_nested+0x1320/0x1320
        ? __ioremap_caller.constprop.0+0x306/0x490
        ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core]
        ? iounmap+0x160/0x160
        mlx5_init_one+0x2e/0x490 [mlx5_core]
        mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core]
        ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core]
        auxiliary_bus_probe+0x9d/0xe0
        really_probe+0x1e0/0xaa0
        __driver_probe_device+0x219/0x480
        ? auxiliary_match_id+0xe9/0x140
        driver_probe_device+0x49/0x130
        __device_attach_driver+0x1b8/0x280
        ? driver_allows_async_probing+0x140/0x140
        bus_for_each_drv+0x123/0x1a0
        ? bus_for_each_dev+0x1a0/0x1a0
        ? lockdep_hardirqs_on_prepare+0x286/0x400
        ? trace_hardirqs_on+0x2d/0x100
        __device_attach+0x1a3/0x460
        ? device_driver_attach+0x1e0/0x1e0
        ? kobject_uevent_env+0x22d/0xf10
        bus_probe_device+0x1a2/0x260
        device_add+0x9b1/0x1b40
        ? dev_set_name+0xab/0xe0
        ? __fw_devlink_link_to_suppliers+0x260/0x260
        ? memset+0x20/0x40
        ? lockdep_init_map_type+0x21a/0x7d0
        __auxiliary_device_add+0x88/0xc0
        ? auxiliary_device_init+0x86/0xa0
        mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core]
        blocking_notifier_call_chain+0xd5/0x130
        mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core]
        ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core]
        ? lock_downgrade+0x6e0/0x6e0
        ? lockdep_hardirqs_on_prepare+0x286/0x400
        process_one_work+0x7c2/0x1340
        ? lockdep_hardirqs_on_prepare+0x400/0x400
        ? pwq_dec_nr_in_flight+0x230/0x230
        ? rwlock_bug.part.0+0x90/0x90
        worker_thread+0x59d/0xec0
        ? process_one_work+0x1340/0x1340
        kthread+0x28f/0x330
        ? kthread_complete_and_exit+0x20/0x20
        ret_from_fork+0x1f/0x30
        </TASK>
      
      Fixes: 6a327321 ("net/mlx5: SF, Port function state change support")
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Reviewed-by: Shay Drory <shayd@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  10. 28 Jul 2022 (1 commit)
  11. 14 Jul 2022 (1 commit)
    • net/mlx5: Use software VHCA id when it's supported · dc402ccc
      Authored by Yishai Hadas
      Use software VHCA id when it's supported by the firmware.
      
      A unique id is allocated in mlx5_mdev_init() and freed in
      mlx5_mdev_uninit(); as such, it stays the same over the full life
      cycle of the device, including across health recovery if one occurs.

      The combination of sw_vhca_id and sw_owner_id forms a globally
      unique id per function that uses mlx5_core.

      The sw_vhca_id is set by the init_hca command and is used to specify
      the VHCA that the NIC vport is affiliated with.

      This functionality is needed for migration of a VM that is MPV based
      (i.e. a multi-port device).
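
      A minimal sketch of the id lifetime (helper and constant names are
      assumptions): a driver-global IDA hands out the id once per device in
      mlx5_mdev_init(), so health recovery, which re-runs init_hca but not
      mdev init, keeps the same sw_vhca_id:

        static DEFINE_IDA(sw_vhca_ida);

        /* in mlx5_mdev_init(): 0 is reserved; the upper bound is limited
         * by the width of the FW sw_vhca_id field (MAX_SW_VHCA_ID is an
         * assumed placeholder) */
        id = ida_alloc_range(&sw_vhca_ida, 1, MAX_SW_VHCA_ID, GFP_KERNEL);
        if (id < 0)
                return id;
        dev->priv.sw_vhca_id = id;

        /* in mlx5_mdev_uninit(): */
        ida_free(&sw_vhca_ida, dev->priv.sw_vhca_id);
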
      Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
      Reviewed-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  12. 12 Jul 2022 (1 commit)
    • net/mlx5: Use devl_ API in mlx5e_devlink_port_register · 7b19119f
      Authored by Moshe Shemesh
      Flows invoked by mlx5_devlink_eswitch_mode_set() get to
      mlx5_rescan_drivers_locked(), which can call mlx5e_probe()/
      mlx5e_remove() and register/unregister mlx5e driver ports
      accordingly. This can lead to a deadlock once
      mlx5_devlink_eswitch_mode_set() starts using the devlink lock.
      Use devl_port_register/unregister() instead of
      devlink_port_register/unregister() and take the devlink instance lock
      in the driver paths to this function, so that it is held while
      calling the devl_ API.

      If remove or probe is called from the module init or module cleanup
      flows, the devlink lock must be taken just before calling
      devl_port_register(); otherwise the call comes from an attach/detach
      or register/unregister flow, which can keep the whole flow locked. A
      flag is added to distinguish between these cases, as sketched below.

      This will be used by a downstream patch to invoke
      mlx5_devlink_eswitch_mode_set() with devlink locked.
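
      A minimal sketch of the flag logic (the flag and state-field names
      are assumptions; devl_lock()/devl_unlock()/devl_port_register() are
      the standard devlink API):

        /* module init/cleanup flows own no devlink lock, so take it here;
         * attach/detach and register/unregister flows arrive already
         * holding it for the whole flow */
        if (!test_bit(MLX5E_LOCKED_FLOW, &priv->state)) {
                devl_lock(devlink);
                err = devl_port_register(devlink, devlink_port, port_index);
                devl_unlock(devlink);
        } else {
                err = devl_port_register(devlink, devlink_port, port_index);
        }
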
      Signed-off-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  13. 14 Jun 2022 (1 commit)
  14. 18 May 2022 (2 commits)
  15. 10 May 2022 (5 commits)
    • net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIs · 846e4373
      Authored by Yishai Hadas
      Expose the mlx5_sriov_blocking_notifier_register / unregister APIs to
      let a VF register to be notified of its enablement / disablement by
      the PF.

      Upon VF probe, the VF calls mlx5_sriov_blocking_notifier_register()
      with its notifier block, and upon VF remove it calls
      mlx5_sriov_blocking_notifier_unregister() to drop its registration.

      This gives a VF the ability to clean up resources upon disable,
      before the command interface goes down, and, on the other hand, to
      set things up before it is enabled.

      This may be used by a migration-capable VF in a few cases (e.g. PF
      load/unload upon a health recovery).
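
      A minimal sketch of the VF-side usage (the event names follow the
      series; the callback body is illustrative only):

        static int vf_sriov_event(struct notifier_block *nb,
                                  unsigned long event, void *data)
        {
                switch (event) {
                case MLX5_PF_NOTIFY_DISABLE_VF:
                        /* quiesce while the command interface is still up */
                        break;
                case MLX5_PF_NOTIFY_ENABLE_VF:
                        /* prepare state before the VF comes up */
                        break;
                }
                return NOTIFY_OK;
        }

        static struct notifier_block vf_nb = { .notifier_call = vf_sriov_event };

        /* on VF probe */
        mlx5_sriov_blocking_notifier_register(mdev, vf_id, &vf_nb);
        /* on VF remove */
        mlx5_sriov_blocking_notifier_unregister(mdev, vf_id);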
      
      Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com
      Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
    • net/mlx5: Lag, add debugfs to query hardware lag state · 7f46a0b7
      Authored by Mark Bloch
      Lag state has become very complicated, with many modes, flags, types
      and port selection methods, and future work will add additional
      features.

      Add debugfs files to query the current lag state. A new directory
      named "lag" is created under the mlx5 debugfs directory. As the
      driver has a debugfs directory per PCI function, the location is:
      <debugfs>/mlx5/<BDF>/lag
      
      For example:
      /sys/kernel/debug/mlx5/0000:08:00.0/lag
      
      The following files are exposed:
      
      - state: Returns "active" or "disabled". If "active" it means hardware
               lag is active.
      
      - members: Returns the BDFs of all the members of the lag object.
      
      - type: Returns the type of the lag currently configured. Valid only
      	if hardware lag is active.
      	* "roce" - Members are bare metal PFs.
      	* "switchdev" - Members are in switchdev mode.
      	* "multipath" - ECMP offloads.
      
      - port_sel_mode: Returns the egress port selection method, valid
      		 only if hardware lag is active.
      		 * "queue_affinity" - Egress port is selected by
      		   the QP/SQ affinity.
      		 * "hash" - Egress port is selected by hash done on
      		   each packet. Controlled by: xmit_hash_policy of the
      		   bond device.
      - flags: Returns flags that are specific per lag @type. Valid only if
      	 hardware lag is active.
      	 * "shared_fdb" - "on" or "off", if "on" single FDB is used.
      
      - mapping: Returns the mapping which is used to select egress port.
      	   Valid only if hardware lag is active.
      	   If @port_sel_mode is "hash" returns the active egress ports.
      	   The hash result will select only active ports.
      	   If @port_sel_mode is "queue_affinity" returns the mapping
      	   between the configured port affinity of the QP/SQ and actual
      	   egress port. For example:
      	   * 1:1 - Mapping means if the configured affinity is port 1
      	           traffic will egress via port 1.
      	   * 1:2 - Mapping means if the configured affinity is port 1
      		   traffic will egress via port 2. This can happen
      		   if port 1 is down or in active/backup mode and port 1
      		   is backup.
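
      A minimal sketch of how one such read-only file can be wired up
      (helper names like __mlx5_lag_is_active and mlx5_ldev_add_debugfs are
      assumptions; the debugfs and seq_file calls are standard kernel API):

        static int state_show(struct seq_file *file, void *priv)
        {
                struct mlx5_lag *ldev = file->private;

                seq_printf(file, "%s\n",
                           __mlx5_lag_is_active(ldev) ? "active" : "disabled");
                return 0;
        }
        DEFINE_SHOW_ATTRIBUTE(state);   /* generates state_fops */

        void mlx5_ldev_add_debugfs(struct mlx5_core_dev *dev)
        {
                struct dentry *dir;

                dir = debugfs_create_dir("lag", mlx5_debugfs_get_dev_root(dev));
                debugfs_create_file("state", 0444, dir, dev->priv.lag,
                                    &state_fops);
        }
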
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Support devices with more than 2 ports · 4cd14d44
      Authored by Mark Bloch
      Increase the define MLX5_MAX_PORTS to 4 as the driver is ready
      to support NICs with 4 ports.
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Lag, expose number of lag ports · 34a30d76
      Authored by Mark Bloch
      Downstream patches will add support for hardware lag with
      more than 2 ports. Add a way for users to query the number of lag ports.
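
      A sketch of what the exported query presumably reduces to (struct
      internals and the fallback value are assumptions):

        int mlx5_lag_get_num_ports(struct mlx5_core_dev *dev)
        {
                struct mlx5_lag *ldev = mlx5_lag_dev(dev);

                return ldev ? ldev->ports : 1;
        }
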
      Signed-off-by: Mark Bloch <mbloch@nvidia.com>
      Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
    • net/mlx5: Add exit route when waiting for FW · 8324a02c
      Authored by Gavin Li
      Currently, removing a device needs to take the driver interface lock
      before doing any cleanup. If the driver is waiting in a loop for FW
      init, there is no way to cancel the wait; instead, device cleanup
      waits for the loop to conclude and release the lock.
      
      To allow immediate response to remove device commands, check the TEARDOWN
      flag while waiting for FW init, and exit the loop if it has been set.
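
      A minimal sketch of the resulting wait loop (the bit name is an
      assumption; fw_initializing() and FW_INIT_WAIT_MS follow the existing
      driver code):

        static int wait_fw_init(struct mlx5_core_dev *dev, u32 max_wait_mili)
        {
                unsigned long end = jiffies + msecs_to_jiffies(max_wait_mili);

                while (fw_initializing(dev)) {
                        if (time_after(jiffies, end))
                                return -EBUSY;
                        /* remove requested: stop waiting and bail out */
                        if (test_bit(MLX5_BREAK_FW_WAIT, &dev->intf_state))
                                return -ENOLINK;
                        msleep(FW_INIT_WAIT_MS);
                }
                return 0;
        }
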
      Signed-off-by: Gavin Li <gavinl@nvidia.com>
      Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
  16. 09 Apr 2022 (2 commits)
  17. 18 Mar 2022 (2 commits)
  18. 10 Mar 2022 (4 commits)
  19. 27 Feb 2022 (1 commit)
    • net/mlx5: Expose APIs to get/put the mlx5 core device · 1695b97b
      Authored by Yishai Hadas
      Expose an API to get the mlx5 core device from a given VF PCI device if
      mlx5_core is its driver.
      
      Upon the get API we return with the intf_state_mutex locked, to make
      sure that the device can't go away or be unloaded until the caller
      completes its job with the device; this is expected to be a short
      period of time for any flow that takes the lock.

      Upon the put API we unlock the intf_state_mutex.
      
      The use case for those APIs is the migration flow of a VF over VFIO PCI.
      In that case the VF doesn't ride on mlx5_core, because the device is
      driving *two* different PCI devices, the PF owned by mlx5_core and the
      VF owned by the vfio driver.
      
      The mlx5_core of the PF is accessed only during the narrow window of the
      VF's ioctl that requires its services.
      
      This allows the PF driver to be more independent of the VF driver, so
      long as it doesn't reset the FW.
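
      A minimal sketch of the intended call pattern from a VFIO migration
      ioctl (error handling trimmed; the two function names are the exposed
      APIs):

        struct mlx5_core_dev *mdev;

        mdev = mlx5_vf_get_core_dev(vf_pdev); /* returns with intf_state_mutex held */
        if (!mdev)
                return -ENOTCONN;

        /* ... short-lived work that needs the PF's command interface ... */

        mlx5_vf_put_core_dev(mdev);           /* drops intf_state_mutex */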
      
      Link: https://lore.kernel.org/all/20220224142024.147653-6-yishaih@nvidia.com
      Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
      Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
  20. 24 Feb 2022 (5 commits)