1. 06 May, 2022: 10 commits
  2. 27 Apr, 2022: 4 commits
    • ice: fix use-after-free when deinitializing mailbox snapshot · b668f4cd
      Committed by Jacob Keller
      During ice_sriov_configure, if num_vfs is 0, we are being asked by the
      kernel to remove all VFs.
      
      The driver first de-initializes the snapshot before freeing all the VFs.
      This results in a use-after-free BUG detected by KASAN. The bug occurs
      because the snapshot can still be accessed until all VFs are removed.
      
      Fix this by freeing all the VFs first before calling
      ice_mbx_deinit_snapshot.
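      As a rough illustration of the ordering the fix establishes, here is a minimal userspace sketch. The types and functions (struct pf_model, free_vfs_model(), deinit_snapshot_model(), sriov_configure_model()) are hypothetical stand-ins, not the ice driver's code. The assert() encodes the invariant that the snapshot is only freed once no VF can reach it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the ice driver's real types. */
struct mbx_snapshot {
	int vf_states[4];	/* per-VF mailbox state that VF handlers write */
};

struct pf_model {
	struct mbx_snapshot *snap;
	int num_vfs;		/* while > 0, VF handlers may still touch snap */
};

static void free_vfs_model(struct pf_model *pf)
{
	pf->num_vfs = 0;	/* after this, nothing can reach pf->snap */
}

static void deinit_snapshot_model(struct pf_model *pf)
{
	/* Freeing while VFs still exist is the use-after-free KASAN caught. */
	assert(pf->num_vfs == 0);
	free(pf->snap);
	pf->snap = NULL;
}

/* Fixed ordering for a num_vfs == 0 request: VFs first, snapshot second. */
static int sriov_configure_model(struct pf_model *pf, int num_vfs)
{
	if (num_vfs != 0)
		return -1;	/* enabling VFs is out of scope for this sketch */

	free_vfs_model(pf);
	deinit_snapshot_model(pf);
	return 0;
}
```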
      
      [  +0.032591] ==================================================================
      [  +0.000021] BUG: KASAN: use-after-free in ice_mbx_vf_state_handler+0x1c3/0x410 [ice]
      [  +0.000315] Write of size 28 at addr ffff889908eb6f28 by task kworker/55:2/1530996
      
      [  +0.000029] CPU: 55 PID: 1530996 Comm: kworker/55:2 Kdump: loaded Tainted: G S        I       5.17.0-dirty #1
      [  +0.000022] Hardware name: Dell Inc. PowerEdge R740/0923K0, BIOS 1.6.13 12/17/2018
      [  +0.000013] Workqueue: ice ice_service_task [ice]
      [  +0.000279] Call Trace:
      [  +0.000012]  <TASK>
      [  +0.000011]  dump_stack_lvl+0x33/0x42
      [  +0.000030]  print_report.cold.13+0xb2/0x6b3
      [  +0.000028]  ? ice_mbx_vf_state_handler+0x1c3/0x410 [ice]
      [  +0.000295]  kasan_report+0xa5/0x120
      [  +0.000026]  ? __switch_to_asm+0x21/0x70
      [  +0.000024]  ? ice_mbx_vf_state_handler+0x1c3/0x410 [ice]
      [  +0.000298]  kasan_check_range+0x183/0x1e0
      [  +0.000019]  memset+0x1f/0x40
      [  +0.000018]  ice_mbx_vf_state_handler+0x1c3/0x410 [ice]
      [  +0.000304]  ? ice_conv_link_speed_to_virtchnl+0x160/0x160 [ice]
      [  +0.000297]  ? ice_vsi_dis_spoofchk+0x40/0x40 [ice]
      [  +0.000305]  ice_is_malicious_vf+0x1aa/0x250 [ice]
      [  +0.000303]  ? ice_restore_all_vfs_msi_state+0x160/0x160 [ice]
      [  +0.000297]  ? __mutex_unlock_slowpath.isra.15+0x410/0x410
      [  +0.000022]  ? ice_debug_cq+0xb7/0x230 [ice]
      [  +0.000273]  ? __kasan_slab_alloc+0x2f/0x90
      [  +0.000022]  ? memset+0x1f/0x40
      [  +0.000017]  ? do_raw_spin_lock+0x119/0x1d0
      [  +0.000022]  ? rwlock_bug.part.2+0x60/0x60
      [  +0.000024]  __ice_clean_ctrlq+0x3a6/0xd60 [ice]
      [  +0.000273]  ? newidle_balance+0x5b1/0x700
      [  +0.000026]  ? ice_print_link_msg+0x2f0/0x2f0 [ice]
      [  +0.000271]  ? update_cfs_group+0x1b/0x140
      [  +0.000018]  ? load_balance+0x1260/0x1260
      [  +0.000022]  ? ice_process_vflr_event+0x27/0x130 [ice]
      [  +0.000301]  ice_service_task+0x136e/0x1470 [ice]
      [  +0.000281]  process_one_work+0x3b4/0x6c0
      [  +0.000030]  worker_thread+0x65/0x660
      [  +0.000023]  ? __kthread_parkme+0xe4/0x100
      [  +0.000021]  ? process_one_work+0x6c0/0x6c0
      [  +0.000020]  kthread+0x179/0x1b0
      [  +0.000018]  ? kthread_complete_and_exit+0x20/0x20
      [  +0.000022]  ret_from_fork+0x22/0x30
      [  +0.000026]  </TASK>
      
      [  +0.000018] Allocated by task 10742:
      [  +0.000013]  kasan_save_stack+0x1c/0x40
      [  +0.000018]  __kasan_kmalloc+0x84/0xa0
      [  +0.000016]  kmem_cache_alloc_trace+0x16c/0x2e0
      [  +0.000015]  intel_iommu_probe_device+0xeb/0x860
      [  +0.000015]  __iommu_probe_device+0x9a/0x2f0
      [  +0.000016]  iommu_probe_device+0x43/0x270
      [  +0.000015]  iommu_bus_notifier+0xa7/0xd0
      [  +0.000015]  blocking_notifier_call_chain+0x90/0xc0
      [  +0.000017]  device_add+0x5f3/0xd70
      [  +0.000014]  pci_device_add+0x404/0xa40
      [  +0.000015]  pci_iov_add_virtfn+0x3b0/0x550
      [  +0.000016]  sriov_enable+0x3bb/0x600
      [  +0.000013]  ice_ena_vfs+0x113/0xa79 [ice]
      [  +0.000293]  ice_sriov_configure.cold.17+0x21/0xe0 [ice]
      [  +0.000291]  sriov_numvfs_store+0x160/0x200
      [  +0.000015]  kernfs_fop_write_iter+0x1db/0x270
      [  +0.000018]  new_sync_write+0x21d/0x330
      [  +0.000013]  vfs_write+0x376/0x410
      [  +0.000013]  ksys_write+0xba/0x150
      [  +0.000012]  do_syscall_64+0x3a/0x80
      [  +0.000012]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  +0.000028] Freed by task 10742:
      [  +0.000011]  kasan_save_stack+0x1c/0x40
      [  +0.000015]  kasan_set_track+0x21/0x30
      [  +0.000016]  kasan_set_free_info+0x20/0x30
      [  +0.000012]  __kasan_slab_free+0x104/0x170
      [  +0.000016]  kfree+0x9b/0x470
      [  +0.000013]  devres_destroy+0x1c/0x20
      [  +0.000015]  devm_kfree+0x33/0x40
      [  +0.000012]  ice_mbx_deinit_snapshot+0x39/0x70 [ice]
      [  +0.000295]  ice_sriov_configure+0xb0/0x260 [ice]
      [  +0.000295]  sriov_numvfs_store+0x1bc/0x200
      [  +0.000015]  kernfs_fop_write_iter+0x1db/0x270
      [  +0.000016]  new_sync_write+0x21d/0x330
      [  +0.000012]  vfs_write+0x376/0x410
      [  +0.000012]  ksys_write+0xba/0x150
      [  +0.000012]  do_syscall_64+0x3a/0x80
      [  +0.000012]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  +0.000024] Last potentially related work creation:
      [  +0.000010]  kasan_save_stack+0x1c/0x40
      [  +0.000016]  __kasan_record_aux_stack+0x98/0xa0
      [  +0.000013]  insert_work+0x34/0x160
      [  +0.000015]  __queue_work+0x20e/0x650
      [  +0.000016]  queue_work_on+0x4c/0x60
      [  +0.000015]  nf_nat_masq_schedule+0x297/0x2e0 [nf_nat]
      [  +0.000034]  masq_device_event+0x5a/0x60 [nf_nat]
      [  +0.000031]  raw_notifier_call_chain+0x5f/0x80
      [  +0.000017]  dev_close_many+0x1d6/0x2c0
      [  +0.000015]  unregister_netdevice_many+0x4e3/0xa30
      [  +0.000015]  unregister_netdevice_queue+0x192/0x1d0
      [  +0.000014]  iavf_remove+0x8f9/0x930 [iavf]
      [  +0.000058]  pci_device_remove+0x65/0x110
      [  +0.000015]  device_release_driver_internal+0xf8/0x190
      [  +0.000017]  pci_stop_bus_device+0xb5/0xf0
      [  +0.000014]  pci_stop_and_remove_bus_device+0xe/0x20
      [  +0.000016]  pci_iov_remove_virtfn+0x19c/0x230
      [  +0.000015]  sriov_disable+0x4f/0x170
      [  +0.000014]  ice_free_vfs+0x9a/0x490 [ice]
      [  +0.000306]  ice_sriov_configure+0xb8/0x260 [ice]
      [  +0.000294]  sriov_numvfs_store+0x1bc/0x200
      [  +0.000015]  kernfs_fop_write_iter+0x1db/0x270
      [  +0.000016]  new_sync_write+0x21d/0x330
      [  +0.000012]  vfs_write+0x376/0x410
      [  +0.000012]  ksys_write+0xba/0x150
      [  +0.000012]  do_syscall_64+0x3a/0x80
      [  +0.000012]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      [  +0.000025] The buggy address belongs to the object at ffff889908eb6f00
                     which belongs to the cache kmalloc-96 of size 96
      [  +0.000016] The buggy address is located 40 bytes inside of
                     96-byte region [ffff889908eb6f00, ffff889908eb6f60)
      
      [  +0.000026] The buggy address belongs to the physical page:
      [  +0.000010] page:00000000b7e99a2e refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1908eb6
      [  +0.000016] flags: 0x57ffffc0000200(slab|node=1|zone=2|lastcpupid=0x1fffff)
      [  +0.000024] raw: 0057ffffc0000200 ffffea0069d9fd80 dead000000000002 ffff88810004c780
      [  +0.000015] raw: 0000000000000000 0000000000200020 00000001ffffffff 0000000000000000
      [  +0.000009] page dumped because: kasan: bad access detected
      
      [  +0.000016] Memory state around the buggy address:
      [  +0.000012]  ffff889908eb6e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  +0.000014]  ffff889908eb6e80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  +0.000014] >ffff889908eb6f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  +0.000011]                                   ^
      [  +0.000013]  ffff889908eb6f80: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  +0.000013]  ffff889908eb7000: fa fb fb fb fb fb fb fb fc fc fc fc fa fb fb fb
      [  +0.000012] ==================================================================
      
      Fixes: 0891c896 ("ice: warn about potentially malicious VFs")
      Reported-by: Slawomir Laba <slawomirx.laba@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: wait 5 s for EMP reset after firmware flash · b537752e
      Committed by Petr Oros
      We need to wait 5 s for the EMP reset after a firmware flash. The code was
      taken from the out-of-tree (OOT) driver (ice v1.8.3, downloaded from
      SourceForge). Without this wait, fw_activate leaves the card in an
      inconsistent state that can only be recovered by a second flash/activate.
      Flashing was tested with these firmware versions:
      From -> To
       3.00 -> 3.10/3.20
       3.10 -> 3.00/3.20
       3.20 -> 3.00/3.10
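      A minimal userspace sketch of a fixed settle delay after an EMP reset. The hook wait_for_emp_reset() and the constant EMP_RESET_SETTLE_MS are hypothetical; only the 5 s value comes from this commit message, and where the patched driver actually places its wait is not shown here.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <time.h>

/* 5 s settle time from the commit message; the constant name is hypothetical. */
#define EMP_RESET_SETTLE_MS 5000u

/* Sleep for at least `ms` milliseconds, resuming if a signal interrupts us. */
static int sleep_ms(unsigned int ms)
{
	struct timespec req = {
		.tv_sec = ms / 1000u,
		.tv_nsec = (long)(ms % 1000u) * 1000000L,
	};

	while (nanosleep(&req, &req) == -1) {
		if (errno != EINTR)
			return -1;
		/* req now holds the remaining time; keep sleeping. */
	}
	return 0;
}

/* Hypothetical hook: run after fw_activate triggers an EMP reset and
 * before the driver rebuilds its state on top of the new firmware. */
static int wait_for_emp_reset(unsigned int settle_ms)
{
	return sleep_ms(settle_ms);
}
```

      In this model the caller would run wait_for_emp_reset(EMP_RESET_SETTLE_MS) between fw_activate and the rebuild; the duration is a parameter only so the sketch can be exercised quickly.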
      
      Reproducer:
      [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin
      Preparing to flash
      [fw.mgmt] Erasing
      [fw.mgmt] Erasing done
      [fw.mgmt] Flashing 100%
      [fw.mgmt] Flashing done 100%
      [fw.undi] Erasing
      [fw.undi] Erasing done
      [fw.undi] Flashing 100%
      [fw.undi] Flashing done 100%
      [fw.netlist] Erasing
      [fw.netlist] Erasing done
      [fw.netlist] Flashing 100%
      [fw.netlist] Flashing done 100%
      Activate new firmware by devlink reload
      [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate
      reload_actions_performed:
          fw_activate
      [root@host ~]# ip link show ens7f0
      71: ens7f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
          link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff
          altname enp202s0f0
      
      dmesg after flash:
      [   55.120788] ice: Copyright (c) 2018, Intel Corporation.
      [   55.274734] ice 0000:ca:00.0: Get PHY capabilities failed status = -5, continuing anyway
      [   55.569797] ice 0000:ca:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.28.0
      [   55.603629] ice 0000:ca:00.0: Get PHY capability failed.
      [   55.608951] ice 0000:ca:00.0: ice_init_nvm_phy_type failed: -5
      [   55.647348] ice 0000:ca:00.0: PTP init successful
      [   55.675536] ice 0000:ca:00.0: DCB is enabled in the hardware, max number of TCs supported on this port are 8
      [   55.685365] ice 0000:ca:00.0: FW LLDP is disabled, DCBx/LLDP in SW mode.
      [   55.692179] ice 0000:ca:00.0: Commit DCB Configuration to the hardware
      [   55.701382] ice 0000:ca:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:c9:02.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
      A reboot doesn't help; only a second flash/activate with the OOT driver or
      the patched driver puts the card back into a consistent state.
      
      After patch:
      [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin
      Preparing to flash
      [fw.mgmt] Erasing
      [fw.mgmt] Erasing done
      [fw.mgmt] Flashing 100%
      [fw.mgmt] Flashing done 100%
      [fw.undi] Erasing
      [fw.undi] Erasing done
      [fw.undi] Flashing 100%
      [fw.undi] Flashing done 100%
      [fw.netlist] Erasing
      [fw.netlist] Erasing done
      [fw.netlist] Flashing 100%
      [fw.netlist] Flashing done 100%
      Activate new firmware by devlink reload
      [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate
      reload_actions_performed:
          fw_activate
      [root@host ~]# ip link show ens7f0
      19: ens7f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
          link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff
          altname enp202s0f0
      
      Fixes: 399e27db ("ice: support immediate firmware activation via devlink reload")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Protect vf_state check by cfg_lock in ice_vc_process_vf_msg() · 77d64d28
      Committed by Ivan Vecera
      The previous patch, "ice: Fix incorrect locking in
      ice_vc_process_vf_msg()", fixed an issue with messages sent by the VF
      driver being ignored, but a small race window was still left.
      
      A trace recently caught during an 'ip link set ... vf 0 vlan ...' operation:
      
      [ 7332.995625] ice 0000:3b:00.0: Clearing port VLAN on VF 0
      [ 7333.001023] iavf 0000:3b:01.0: Reset indication received from the PF
      [ 7333.007391] iavf 0000:3b:01.0: Scheduling reset task
      [ 7333.059575] iavf 0000:3b:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 3
      [ 7333.059626] ice 0000:3b:00.0: Invalid message from VF 0, opcode 3, len 4, error -1
      
      Setting a VLAN for a VF causes a reset of the affected VF via
      ice_reset_vf(), which runs with cfg_lock held:
      
      1. ice_notify_vf_reset() informs the IAVF driver that a reset is needed,
         and IAVF schedules its own reset procedure
      2. Bit ICE_VF_STATE_DIS is set in vf->vf_state
      3. Misc initialization steps
      4. ice_sriov_post_vsi_rebuild() -> ice_vf_set_initialized(), which
         clears ICE_VF_STATE_DIS in vf->vf_state
      
      Step 3 is the race window mentioned above: the IAVF reset procedure runs
      in parallel, and one of its steps is sending the
      VIRTCHNL_OP_GET_VF_RESOURCES message (opcode == 3). This message is
      handled in ice_vc_process_vf_msg(), and if it is received during the
      race window, it is marked as invalid and an error is returned to the VF
      driver.
      
      Protect the vf_state check in ice_vc_process_vf_msg() with cfg_lock to
      avoid this race condition.
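      The fix can be modelled in userspace with a pthread mutex standing in for cfg_lock. This is a sketch under assumptions: struct vf_model, reset_vf_model() and process_vf_msg_model() are hypothetical, and the `disabled` flag plays the role of ICE_VF_STATE_DIS. Because the reset sets and clears the flag entirely under the lock, a check made under the same lock cannot observe the transient value from step 3.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of one VF; `disabled` stands in for ICE_VF_STATE_DIS. */
struct vf_model {
	pthread_mutex_t cfg_lock;
	bool disabled;
	int handled;		/* messages that were actually processed */
};

/* Mirrors steps 2-4 above: the whole reset runs under cfg_lock. */
static void reset_vf_model(struct vf_model *vf)
{
	pthread_mutex_lock(&vf->cfg_lock);
	vf->disabled = true;	/* step 2 */
	/* step 3: misc initialization (the former race window) */
	vf->disabled = false;	/* step 4 */
	pthread_mutex_unlock(&vf->cfg_lock);
}

/* Fixed message path: the state check is made under the same lock. */
static int process_vf_msg_model(struct vf_model *vf)
{
	int ret = 0;

	pthread_mutex_lock(&vf->cfg_lock);
	if (vf->disabled)
		ret = -1;	/* would be reported to the VF as invalid */
	else
		vf->handled++;
	pthread_mutex_unlock(&vf->cfg_lock);
	return ret;
}
```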
      
      Fixes: e6ba5273 ("ice: Fix race conditions between virtchnl handling and VF ndo ops")
      Tested-by: Fei Liu <feliu@redhat.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Fix incorrect locking in ice_vc_process_vf_msg() · aaf461af
      Committed by Ivan Vecera
      Using mutex_trylock() in ice_vc_process_vf_msg() is incorrect: when the
      lock cannot be taken, the message sent by the VF is ignored and never
      processed.
      
      Use mutex_lock() instead to fix the issue. This is safe because the
      mutex is used to prevent races between VF-related NDOs and the
      handlers that process request messages from the VF, and these handlers
      run in ice_service_task() context. Additionally, take the mutex before
      the ice_vc_is_opcode_allowed() call to avoid potential races during
      allowlist access.
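      The difference between the two locking calls can be sketched in userspace with pthreads, where pthread_mutex_trylock() returns non-zero when the mutex is already held. The names below are hypothetical and only model the message path, not ice_vc_process_vf_msg() itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model; `handled` counts messages that were actually processed. */
struct pf_msg_model {
	pthread_mutex_t cfg_lock;
	int handled;
};

/* Old behaviour: if an NDO holds cfg_lock, the message is silently dropped. */
static bool process_msg_trylock(struct pf_msg_model *pf)
{
	if (pthread_mutex_trylock(&pf->cfg_lock) != 0)
		return false;	/* dropped and never retried */
	pf->handled++;
	pthread_mutex_unlock(&pf->cfg_lock);
	return true;
}

/* Fixed behaviour: wait for the lock. In the driver, blocking is acceptable
 * because the handlers run in the service task, which may sleep. */
static bool process_msg_lock(struct pf_msg_model *pf)
{
	pthread_mutex_lock(&pf->cfg_lock);
	pf->handled++;
	pthread_mutex_unlock(&pf->cfg_lock);
	return true;
}
```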
      
      Fixes: e6ba5273 ("ice: Fix race conditions between virtchnl handling and VF ndo ops")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  3. 16 Apr, 2022: 4 commits
  4. 14 Apr, 2022: 4 commits
    • ice: Fix memory leak in ice_get_orom_civd_data() · 7c8881b7
      Committed by Jianglei Nie
      A memory chunk was allocated for orom_data in ice_get_orom_civd_data()
      by vzalloc(). But when ice_read_flash_module() fails, the allocated
      memory is not freed, which leads to a memory leak.
      
      Fix it by freeing orom_data when ice_read_flash_module() fails.
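      A minimal sketch of the error-path cleanup, assuming hypothetical helpers: tracked_zalloc() and tracked_free() stand in for vzalloc() and vfree() and count live allocations, and read_flash_fn is not the real signature of ice_read_flash_module().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts live allocations so the sketch can show that nothing leaks. */
static int live_allocs;

static void *tracked_zalloc(size_t len)	/* stand-in for vzalloc() */
{
	void *p = calloc(1, len);

	if (p)
		live_allocs++;
	return p;
}

static void tracked_free(void *p)	/* stand-in for vfree() */
{
	if (p)
		live_allocs--;
	free(p);
}

/* Hypothetical reader type; the real ice_read_flash_module() differs. */
typedef int (*read_flash_fn)(unsigned char *buf, size_t len);

static int get_orom_data_model(read_flash_fn read_flash, size_t len)
{
	unsigned char *orom_data = tracked_zalloc(len);
	int err;

	if (!orom_data)
		return -ENOMEM;

	err = read_flash(orom_data, len);
	if (err) {
		tracked_free(orom_data);	/* the fix: release on the error path */
		return err;
	}

	/* (the CIVD search over orom_data would happen here) */

	tracked_free(orom_data);
	return 0;
}

static int failing_read(unsigned char *buf, size_t len) { (void)buf; (void)len; return -EIO; }
static int ok_read(unsigned char *buf, size_t len) { (void)buf; (void)len; return 0; }
```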
      
      Fixes: af18d886 ("ice: reduce time to read Option ROM CIVD data")
      Signed-off-by: Jianglei Nie <niejianglei2021@163.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix crash in switchdev mode · d2016651
      Committed by Wojciech Drewek
      The following steps end in a crash:
      - modprobe ice
      - devlink dev eswitch set $PF1_PCI mode switchdev
      - echo 64 > /sys/class/net/$PF1/device/sriov_numvfs
      - rmmod ice
      
      Calling ice_eswitch_port_start_xmit() while VF removal is in progress
      results in a NULL pointer dereference. That is because the port
      representor (PR) netdev has not been released yet, but some resources
      have already been freed. Fix it by checking whether the ICE_VF_DIS bit
      is set.
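      A minimal sketch of the guard, assuming hypothetical types: `vf_dis` models the ICE_VF_DIS bit, and `vf_res` models per-VF resources that are freed while the PR netdev still exists. Which status the real handler returns when the bit is set is an assumption here (TX_BUSY_MODEL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the xmit return codes. */
enum { TX_OK_MODEL = 0, TX_BUSY_MODEL = 1 };

struct vf_res_model {
	int queue_id;
};

struct pf_model {
	bool vf_dis;			/* models the ICE_VF_DIS bit */
	struct vf_res_model *vf_res;	/* becomes NULL during VF removal */
};

/* VF removal: the bit is set before any resource is freed. */
static void remove_vfs_model(struct pf_model *pf)
{
	pf->vf_dis = true;
	pf->vf_res = NULL;	/* freed while the PR netdev still exists */
}

static int repr_start_xmit_model(struct pf_model *pf)
{
	/* The fix: bail out before touching resources that are being torn down. */
	if (pf->vf_dis)
		return TX_BUSY_MODEL;

	/* Without the check above, this dereference would hit NULL. */
	return pf->vf_res->queue_id >= 0 ? TX_OK_MODEL : TX_BUSY_MODEL;
}
```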
      
      Call trace:
      [ 1379.595146] BUG: kernel NULL pointer dereference, address: 0000000000000040
      [ 1379.595284] #PF: supervisor read access in kernel mode
      [ 1379.595410] #PF: error_code(0x0000) - not-present page
      [ 1379.595535] PGD 0 P4D 0
      [ 1379.595657] Oops: 0000 [#1] PREEMPT SMP PTI
      [ 1379.595783] CPU: 4 PID: 974 Comm: NetworkManager Kdump: loaded Tainted: G           OE     5.17.0-rc8_mrq_dev-queue+ #12
      [ 1379.595926] Hardware name: Intel Corporation S1200SP/S1200SP, BIOS S1200SP.86B.03.01.0042.013020190050 01/30/2019
      [ 1379.596063] RIP: 0010:ice_eswitch_port_start_xmit+0x46/0xd0 [ice]
      [ 1379.596292] Code: c7 c8 09 00 00 e8 9a c9 fc ff 84 c0 0f 85 82 00 00 00 4c 89 e7 e8 ca 70 fe ff 48 8b 7d 58 48 89 c3 48 85 ff 75 5e 48 8b 53 20 <8b> 42 40 85 c0 74 78 8d 48 01 f0 0f b1 4a 40 75 f2 0f b6 95 84 00
      [ 1379.596456] RSP: 0018:ffffaba0c0d7bad0 EFLAGS: 00010246
      [ 1379.596584] RAX: ffff969c14c71680 RBX: ffff969c14c71680 RCX: 000100107a0f0000
      [ 1379.596715] RDX: 0000000000000000 RSI: ffff969b9d631000 RDI: 0000000000000000
      [ 1379.596846] RBP: ffff969c07b46500 R08: ffff969becfca8ac R09: 0000000000000001
      [ 1379.596977] R10: 0000000000000004 R11: ffffaba0c0d7bbec R12: ffff969b9d631000
      [ 1379.597106] R13: ffffffffc08357a0 R14: ffff969c07b46500 R15: ffff969b9d631000
      [ 1379.597237] FS:  00007f72c0e25c80(0000) GS:ffff969f13500000(0000) knlGS:0000000000000000
      [ 1379.597414] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1379.597562] CR2: 0000000000000040 CR3: 000000012b316006 CR4: 00000000003706e0
      [ 1379.597713] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [ 1379.597863] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [ 1379.598015] Call Trace:
      [ 1379.598153]  <TASK>
      [ 1379.598294]  dev_hard_start_xmit+0xd9/0x220
      [ 1379.598444]  sch_direct_xmit+0x8a/0x340
      [ 1379.598592]  __dev_queue_xmit+0xa3c/0xd30
      [ 1379.598739]  ? packet_parse_headers+0xb4/0xf0
      [ 1379.598890]  packet_sendmsg+0xa15/0x1620
      [ 1379.599038]  ? __check_object_size+0x46/0x140
      [ 1379.599186]  sock_sendmsg+0x5e/0x60
      [ 1379.599330]  ____sys_sendmsg+0x22c/0x270
      [ 1379.599474]  ? import_iovec+0x17/0x20
      [ 1379.599622]  ? sendmsg_copy_msghdr+0x59/0x90
      [ 1379.599771]  ___sys_sendmsg+0x81/0xc0
      [ 1379.599917]  ? __pollwait+0xd0/0xd0
      [ 1379.600061]  ? preempt_count_add+0x68/0xa0
      [ 1379.600210]  ? _raw_write_lock_irq+0x1a/0x40
      [ 1379.600369]  ? ep_done_scan+0xc9/0x110
      [ 1379.600494]  ? _raw_spin_unlock_irqrestore+0x25/0x40
      [ 1379.600622]  ? preempt_count_add+0x68/0xa0
      [ 1379.600747]  ? _raw_spin_lock_irq+0x1a/0x40
      [ 1379.600899]  ? __fget_light+0x8f/0x110
      [ 1379.601024]  __sys_sendmsg+0x49/0x80
      [ 1379.601148]  ? release_ds_buffers+0x50/0xe0
      [ 1379.601274]  do_syscall_64+0x3b/0x90
      [ 1379.601399]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      [ 1379.601525] RIP: 0033:0x7f72c1e2e35d
      
      Fixes: f5396b8a ("ice: switchdev slow path")
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Reported-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: allow creating VFs for !CONFIG_NET_SWITCHDEV · aacca7a8
      Committed by Maciej Fijalkowski
      Currently, on !CONFIG_NET_SWITCHDEV kernel builds it is not possible to
      create VFs properly, because the call to ice_eswitch_configure() returns
      -EOPNOTSUPP. This is because CONFIG_ICE_SWITCHDEV depends on
      CONFIG_NET_SWITCHDEV.
      
      Change the ice_eswitch_configure() implementation for
      !CONFIG_ICE_SWITCHDEV to return 0 instead of -EOPNOTSUPP and let
      ice_ena_vfs() finish its work properly.
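      A sketch of the compile-time stub pattern, with simplified hypothetical names (struct ice_pf_model, eswitch_configure_model(), ena_vfs_model()). CONFIG_ICE_SWITCHDEV is left undefined to model a !CONFIG_NET_SWITCHDEV build.

```c
#include <assert.h>

struct ice_pf_model {
	int num_vfs;
};

/* CONFIG_ICE_SWITCHDEV is deliberately not defined here. */
#ifdef CONFIG_ICE_SWITCHDEV
int eswitch_configure_model(struct ice_pf_model *pf);
#else
static inline int eswitch_configure_model(struct ice_pf_model *pf)
{
	(void)pf;
	return 0;	/* previously -EOPNOTSUPP, which aborted VF creation */
}
#endif

static int ena_vfs_model(struct ice_pf_model *pf, int num_vfs)
{
	int err = eswitch_configure_model(pf);

	if (err)
		return err;	/* with the old stub, every request failed here */
	pf->num_vfs = num_vfs;
	return 0;
}
```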
      
      CC: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Fixes: 1a1c40df ("ice: set and release switchdev environment")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: xsk: check if Rx ring was filled up to the end · d1fc4c6f
      Committed by Maciej Fijalkowski
      __ice_alloc_rx_bufs_zc() checks whether the number of descriptors to be
      allocated would cause the ring to wrap. In that case, the driver issues
      two calls to xsk_buff_alloc_batch(): one that fills the ring up to the
      end and a second one that starts filling descriptors from the
      beginning of the ring.
      
      ice_fill_rx_descs() is a wrapper that takes care of what
      xsk_buff_alloc_batch() gives back to the driver. It works on a
      best-effort basis, so, for example, when the driver asks for 64
      buffers, ice_fill_rx_descs() may assign only 32. This case needs to be
      checked when the ring is being filled up to the end, because in that
      situation ntu might not have reached the end of the ring.
      
      Fix the ring wrap by checking whether nb_buffs_extra has the expected
      value. If not, bump ntu and go directly to the tail update.
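      The fixed wrap handling can be modelled in userspace as below. This is a sketch, not the driver's code: struct rx_ring_model, fill_rx_descs_model() (a best-effort allocator capped by pool_avail, standing in for ice_fill_rx_descs()) and the tail_bumps counter are hypothetical. With ntu = 500 in a 512-entry ring and only 8 free buffers, the model stops at ntu = 508 instead of wrapping to 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ring model. */
struct rx_ring_model {
	unsigned int count;		/* descriptors in the ring */
	unsigned int next_to_use;	/* ntu */
	unsigned int pool_avail;	/* buffers left in the pool */
	unsigned int tail_bumps;	/* times the tail register was written */
};

/* Best effort: may hand out fewer buffers than requested. */
static unsigned int fill_rx_descs_model(struct rx_ring_model *r, unsigned int want)
{
	unsigned int got = want < r->pool_avail ? want : r->pool_avail;

	r->pool_avail -= got;
	return got;
}

static bool alloc_rx_bufs_model(struct rx_ring_model *r, unsigned int count)
{
	unsigned int requested = count;
	unsigned int ntu = r->next_to_use;
	unsigned int nb_buffs_extra = 0, nb_buffs = 0;

	if (ntu + count >= r->count) {
		unsigned int to_end = r->count - ntu;

		nb_buffs_extra = fill_rx_descs_model(r, to_end);
		if (nb_buffs_extra != to_end) {
			/* The fix: the end was not reached, so advance ntu
			 * without wrapping and go straight to the tail update. */
			ntu += nb_buffs_extra;
			goto exit;
		}
		count -= nb_buffs_extra;
		ntu = 0;
	}

	nb_buffs = fill_rx_descs_model(r, count);
	ntu += nb_buffs;
	if (ntu == r->count)
		ntu = 0;
exit:
	if (r->next_to_use != ntu) {
		r->next_to_use = ntu;	/* models writing the tail register */
		r->tail_bumps++;
	}
	return nb_buffs_extra + nb_buffs == requested;
}
```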
      
      Fixes: 3876ff52 ("ice: xsk: Handle SW XDP ring wrap and bump tail more often")
      Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Shwetha Nagaraju <Shwetha.nagaraju@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  5. 13 Apr, 2022: 1 commit
    • ice: Add mpls+tso support · 69e66c04
      Committed by Joe Damato
      Attempt to add mpls+tso support.
      
      I don't have ice hardware available to test myself, but I just implemented
      this feature in i40e and thought it might be useful to implement for ice
      while this is fresh in my brain.
      
      Hoping someone at Intel will be able to test this on my behalf.
      
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  6. 09 Apr, 2022: 1 commit
    • ice: arfs: fix use-after-free when freeing @rx_cpu_rmap · d7442f51
      Committed by Alexander Lobakin
      The CI testing bots triggered the following splat:
      
      [  718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80
      [  718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834
      [  718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S      W IOE     5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1
      [  718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
      [  718.223418] Call Trace:
      [  718.227139]
      [  718.230783]  dump_stack_lvl+0x33/0x42
      [  718.234431]  print_address_description.constprop.9+0x21/0x170
      [  718.238177]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.241885]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.245539]  kasan_report.cold.18+0x7f/0x11b
      [  718.249197]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.252852]  free_irq_cpu_rmap+0x53/0x80
      [  718.256471]  ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice]
      [  718.260174]  ice_remove_arfs+0x5f/0x70 [ice]
      [  718.263810]  ice_rebuild_arfs+0x3b/0x70 [ice]
      [  718.267419]  ice_rebuild+0x39c/0xb60 [ice]
      [  718.270974]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [  718.274472]  ? ice_init_phy_user_cfg+0x360/0x360 [ice]
      [  718.278033]  ? delay_tsc+0x4a/0xb0
      [  718.281513]  ? preempt_count_sub+0x14/0xc0
      [  718.284984]  ? delay_tsc+0x8f/0xb0
      [  718.288463]  ice_do_reset+0x92/0xf0 [ice]
      [  718.292014]  ice_pci_err_resume+0x91/0xf0 [ice]
      [  718.295561]  pci_reset_function+0x53/0x80
      <...>
      [  718.393035] Allocated by task 690:
      [  718.433497] Freed by task 20834:
      [  718.495688] Last potentially related work creation:
      [  718.568966] The buggy address belongs to the object at ffff8881bd127e00
                      which belongs to the cache kmalloc-96 of size 96
      [  718.574085] The buggy address is located 0 bytes inside of
                      96-byte region [ffff8881bd127e00, ffff8881bd127e60)
      [  718.579265] The buggy address belongs to the page:
      [  718.598905] Memory state around the buggy address:
      [  718.601809]  ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  718.604796]  ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
      [  718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  718.610811]                    ^
      [  718.613819]  ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
      [  718.617107]  ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      
      This happens because free_irq_cpu_rmap() is always called *after*
      (devm_)free_irq(), so it tries to work with IRQ descriptors that have
      already been freed. For example, on device reset the driver frees the
      rmap right before allocating a new one (the splat above).
      Make the rmap creation and freeing functions symmetrical with the
      {request,free}_irq() calls, i.e. do that on ifup/ifdown instead
      of on device probe/remove/resume. These operations can be performed
      independently of the actual device aRFS configuration.
      Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers
      only when aRFS is disabled; otherwise, the CPU rmap sets and clears
      its own notifiers, and they must not be touched manually.
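      The intended lifetime ordering can be sketched as a small state model. The names are hypothetical, and the comments only indicate which kernel step (alloc_irq_cpu_rmap(), free_irq_cpu_rmap(), request_irq()/free_irq()) each transition roughly corresponds to; the exact call sites in the patched driver are not shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one netdev's IRQ and CPU-rmap lifetime. */
struct vsi_model {
	bool irqs_requested;	/* IRQs requested for all vectors */
	bool rmap_allocated;	/* CPU rmap exists */
};

/* ifup: create the rmap alongside the IRQs instead of at probe time. */
static void vsi_open_model(struct vsi_model *v)
{
	v->rmap_allocated = true;	/* alloc_irq_cpu_rmap() */
	v->irqs_requested = true;	/* request_irq(); vectors tracked by the rmap */
}

/* ifdown: free the rmap while the IRQ descriptors it refers to still exist. */
static void vsi_close_model(struct vsi_model *v)
{
	assert(v->irqs_requested);	/* freeing after free_irq() was the bug */
	v->rmap_allocated = false;	/* free_irq_cpu_rmap() */
	v->irqs_requested = false;	/* free_irq() */
}
```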
      
      Fixes: 28bf2672 ("ice: Implement aRFS")
      Co-developed-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Tested-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  7. 07 Apr, 2022: 5 commits
  8. 06 Apr, 2022: 3 commits
  9. 05 Apr, 2022: 2 commits
    • ice: Do not skip not enabled queues in ice_vc_dis_qs_msg · 05ef6813
      Committed by Anatolii Gerasymenko
      Remove the check for whether a queue is enabled in ice_vc_dis_qs_msg,
      because queues may have been created but never enabled, and those
      queues still need to be deleted.
      
      The normal workflow for a VF looks like this:
      Enable path:
      VIRTCHNL_OP_ADD_ETH_ADDR (opcode 10)
      VIRTCHNL_OP_CONFIG_VSI_QUEUES (opcode 6)
      VIRTCHNL_OP_ENABLE_QUEUES (opcode 8)
      
      Disable path:
      VIRTCHNL_OP_DISABLE_QUEUES (opcode 9)
      VIRTCHNL_OP_DEL_ETH_ADDR (opcode 11)
      
      The issue appears only under stress conditions, when the VF is enabled
      and disabled very quickly.
      Eventually, queues are created by VIRTCHNL_OP_CONFIG_VSI_QUEUES but
      not enabled by VIRTCHNL_OP_ENABLE_QUEUES.
      In turn, these queues are not deleted by VIRTCHNL_OP_DISABLE_QUEUES,
      because ice_vc_dis_qs_msg checks whether the queues are enabled.
      
      When we bring up the VF again, we see the "Failed to set LAN Tx queue
      context" error during the VIRTCHNL_OP_CONFIG_VSI_QUEUES step. This
      happens because the old 16 queues were not deleted and the VF requests
      16 more, but ice_sched_get_free_qparent in ice_ena_vsi_txq fails to
      find a parent node for the first newly requested queue (because all
      nodes are allocated to the 16 old queues).
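      The effect of dropping the check can be sketched with a hypothetical per-queue model (struct q_model, dis_qs_old(), dis_qs_fixed()). It only tracks the created/enabled state described above, not the real virtchnl handler.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_QS_MODEL 16

/* Hypothetical per-queue state; names are illustrative only. */
struct q_model {
	bool created;	/* set by VIRTCHNL_OP_CONFIG_VSI_QUEUES */
	bool enabled;	/* set by VIRTCHNL_OP_ENABLE_QUEUES */
};

/* The fast up/down case: queues configured, enable never arrived. */
static void setup_created_not_enabled(struct q_model *qs, int n)
{
	for (int i = 0; i < n; i++) {
		qs[i].created = true;
		qs[i].enabled = false;
	}
}

/* Old handler: skipped queues that were never enabled, leaking them. */
static void dis_qs_old(struct q_model *qs, int n)
{
	for (int i = 0; i < n; i++) {
		if (!qs[i].enabled)
			continue;
		qs[i].enabled = false;
		qs[i].created = false;
	}
}

/* Fixed handler: tear down every requested queue, enabled or not. */
static void dis_qs_fixed(struct q_model *qs, int n)
{
	for (int i = 0; i < n; i++) {
		qs[i].enabled = false;
		qs[i].created = false;
	}
}

static int count_created(const struct q_model *qs, int n)
{
	int c = 0;

	for (int i = 0; i < n; i++)
		c += qs[i].created;
	return c;
}
```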
      
      Testing Hints:
      
      Enable and disable the VF fast enough that it is disabled before it
      reaches VIRTCHNL_OP_ENABLE_QUEUES:
      
      while true; do
              ip link set dev ens785f0v0 up
              sleep 0.065 # adjust the delay value for your machine
              ip link set dev ens785f0v0 down
      done
      
      Fixes: 77ca27c4 ("ice: add support for virtchnl_queue_select.[tx|rx]_queues bitmap")
      Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
    • ice: Set txq_teid to ICE_INVAL_TEID on ring creation · ccfee182
      Committed by Anatolii Gerasymenko
      When a VF is freshly created but not yet brought up, ring->txq_teid
      defaults to 0. However, 0 is a valid TEID: on some platforms the root
      node of the Tx scheduler has TEID 0. This can cause the issue shown
      below.
      
      The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).
      
      Testing Hints:
      echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
      ip link set dev ens785f0v0 up
      ip link set dev ens785f0v0 down
      
      If we freshly create a VF and quickly turn it on and off, so that there
      is no time to reach the VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, the
      VIRTCHNL_OP_DISABLE_QUEUES stage fails with this error:
      [  639.531454] disable queue 89 failed 14
      [  639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
      [  639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5
      
      The failure occurs because we send an AQ command to delete queue 89,
      which was never created, and receive an "invalid argument" error from
      the firmware.
      
      Because this queue was never created, its TEID and ring->txq_teid
      have the default value 0.
      ice_dis_vsi_txq has a check against non-existent queues:
      
      node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
      if (!node)
      	continue;
      
      But on some platforms the root node of the Tx scheduler has TEID 0.
      Hence, ice_sched_find_node_by_teid finds a node with TEID 0 (pi->root),
      and we go on to submit an erroneous request to the firmware.
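      A minimal sketch of why 0 is a bad default, assuming a hypothetical flattened scheduler tree whose root has TEID 0. find_node_by_teid_model() is a linear stand-in for ice_sched_find_node_by_teid(), and INVAL_TEID_MODEL reuses the 0xFFFFFFFF value given for ICE_INVAL_TEID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVAL_TEID_MODEL 0xFFFFFFFFu	/* same value as ICE_INVAL_TEID per the text */

/* Hypothetical flattened scheduler tree: element 0 is the root. */
struct sched_node_model {
	uint32_t teid;
};

static const struct sched_node_model *
find_node_by_teid_model(const struct sched_node_model *nodes, size_t n, uint32_t teid)
{
	for (size_t i = 0; i < n; i++)
		if (nodes[i].teid == teid)
			return &nodes[i];
	return NULL;
}

/* Returns 1 if a disable request would be sent to firmware for this queue. */
static int would_send_disable(const struct sched_node_model *nodes, size_t n,
			      uint32_t q_teid)
{
	/* Mirrors the `if (!node) continue;` guard quoted above. */
	return find_node_by_teid_model(nodes, n, q_teid) != NULL;
}
```

      With the old default of 0, the never-created queue "matches" the root and the bogus AQ command goes out; with the sentinel, the lookup fails and the queue is skipped.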
      
      Fixes: 37bb8390 ("ice: Move common functions out of ice_main.c part 7/7")
      Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
  10. 01 Apr, 2022: 3 commits
    • ice: Fix broken IFF_ALLMULTI handling · 1273f895
      Committed by Ivan Vecera
      Handling of all-multicast flag and associated multicast promiscuous
      mode is broken in ice driver. When an user switches allmulticast
      flag on or off the driver checks whether any VLANs are configured
      over the interface (except default VLAN 0).
      
      If any extra VLANs are registered, it enables multicast promiscuous
      mode for all these VLANs (including the default VLAN 0) using the
      ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all
      multicast packets that are untagged or tagged with a known VLAN ID
      are received, and multicast packets tagged with an unknown VLAN ID
      are ignored.
      
      If no extra VLANs are registered (so only VLAN 0 exists), it enables
      multicast promiscuous mode for VLAN 0 and uses the ICE_SW_LKUP_PROMISC
      look-up type. In this situation all multicast packets, including
      tagged ones, are received.
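      The difference between the two look-up types can be modeled as a small receive-side
      predicate. This is a hypothetical sketch and not driver code. The enum values only
      stand in for ICE_SW_LKUP_PROMISC and ICE_SW_LKUP_PROMISC_VLAN, the name
      `mcast_received()` is invented for illustration, and an untagged frame is treated as
      VLAN 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two look-up types described above. */
enum promisc_lookup {
	LKUP_PROMISC,		/* models ICE_SW_LKUP_PROMISC */
	LKUP_PROMISC_VLAN,	/* models ICE_SW_LKUP_PROMISC_VLAN */
};

/* Returns true if a multicast frame tagged with pkt_vid (0 = untagged)
 * would be received. known_vids lists the VLANs that have a multicast
 * promisc rule, VLAN 0 included. */
static bool mcast_received(enum promisc_lookup lkup, uint16_t pkt_vid,
			   const uint16_t *known_vids, size_t n_known)
{
	assert(known_vids != NULL || n_known == 0);
	if (lkup == LKUP_PROMISC)
		return true;		/* any multicast frame, tagged or not */
	for (size_t i = 0; i < n_known; i++)
		if (known_vids[i] == pkt_vid)
			return true;	/* untagged (VLAN 0) or a known VLAN */
	return false;			/* unknown VLAN ID: ignored */
}
```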
      
      The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way:
      
      ice_vsi_sync_fltr() {
        ...
        if (changed_flags & IFF_ALLMULTI) {
          if (netdev->flags & IFF_ALLMULTI) {
            if (vsi->num_vlans > 1)
              ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
            else
              ice_set_promisc(..., ICE_MCAST_PROMISC_BITS);
          } else {
            if (vsi->num_vlans > 1)
              ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
            else
              ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS);
          }
        }
        ...
      }
      
      The code above depends on the value of vsi->num_vlan, which specifies
      the number of VLANs configured over the interface (including VLAN 0).
      This is a problem because that value is modified in the NDO callbacks
      ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid(), which do not update
      the multicast promisc rules that are already installed.
      
      Scenario 1:
      1. ip link set ens7f0 allmulticast on
      2. ip link add vlan10 link ens7f0 type vlan id 10
      3. ip link set ens7f0 allmulticast off
      4. ip link set ens7f0 allmulticast on
      
      [1] In this scenario IFF_ALLMULTI is enabled and the driver calls
          ice_set_promisc(..., ICE_MCAST_PROMISC_BITS), which installs a
          multicast promisc rule with the non-VLAN look-up type.
      [2] Then VLAN with ID 10 is added and vsi->num_vlan is incremented
          to 2.
      [3] The command switches IFF_ALLMULTI off and the driver calls
          ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS), but this
          call is effectively a no-op because it looks for multicast
          promisc rules for VLAN 0 and VLAN 10 with the VLAN look-up type,
          and no such rules exist. So all-multicast silently remains
          enabled in hardware.
      [4] The command tries to switch IFF_ALLMULTI on again. Because
          vsi->num_vlan is 2, the driver calls
          ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS), but this call
          fails (-EEXIST) because the non-VLAN multicast promisc rule
          already exists.
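      Steps [1] to [3] of Scenario 1 can be replayed against a toy rule table. The model
      below is hypothetical: `struct rule`, `add_rule()`, `del_rule()` and
      `replay_scenario1()` are invented for illustration. It only encodes the pre-fix
      behavior described above, where the look-up type is chosen from the current VLAN count
      and adding a VLAN changes only the counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_RULES 8

/* Hypothetical multicast promisc rule: VLAN ID plus look-up type. */
struct rule {
	uint16_t vid;
	bool vlan_lkup;		/* true: VLAN look-up, false: non-VLAN */
};

struct rule_table {
	struct rule r[MAX_RULES];
	size_t n;
};

static void add_rule(struct rule_table *t, uint16_t vid, bool vlan_lkup)
{
	assert(t->n < MAX_RULES);
	t->r[t->n++] = (struct rule){ .vid = vid, .vlan_lkup = vlan_lkup };
}

/* Removes an exact (vid, look-up type) match; returns false if absent. */
static bool del_rule(struct rule_table *t, uint16_t vid, bool vlan_lkup)
{
	for (size_t i = 0; i < t->n; i++) {
		if (t->r[i].vid == vid && t->r[i].vlan_lkup == vlan_lkup) {
			t->r[i] = t->r[--t->n];	/* order does not matter */
			return true;
		}
	}
	return false;
}

/* Replays steps [1]-[3] of Scenario 1 with the pre-fix logic and returns
 * how many rules are still installed after "allmulticast off". */
static size_t replay_scenario1(void)
{
	struct rule_table t = { .n = 0 };
	const uint16_t vids[] = { 0, 10 };
	int num_vlan = 1;			/* only VLAN 0 */

	/* [1] allmulticast on: num_vlan == 1, so non-VLAN look-up */
	add_rule(&t, 0, num_vlan > 1);

	/* [2] VLAN 10 added: only the counter changes */
	num_vlan = 2;

	/* [3] allmulticast off: num_vlan > 1, so VLAN look-up rules for
	 * VLAN 0 and 10 are searched; none exist and nothing is deleted */
	for (size_t i = 0; i < 2; i++)
		del_rule(&t, vids[i], num_vlan > 1);

	return t.n;
}
```

      The replay leaves one rule installed, which is the non-VLAN rule for VLAN 0. This
      matches the silently enabled all-multicast described in step [3].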
      
      Scenario 2:
      1. ip link add vlan10 link ens7f0 type vlan id 10
      2. ip link set ens7f0 allmulticast on
      3. ip link add vlan20 link ens7f0 type vlan id 20
      4. ip link del vlan10 ; ip link del vlan20
      5. ip link set ens7f0 allmulticast off
      
      [1] VLAN with ID 10 is added and vsi->num_vlan == 2.
      [2] The command switches IFF_ALLMULTI on and the driver installs
          multicast promisc rules with the VLAN look-up type for VLAN 0
          and 10.
      [3] VLAN with ID 20 is added and vsi->num_vlan == 3, but no multicast
          promisc rule is added for this new VLAN, so the interface does
          not receive MC packets from VLAN 20.
      [4] Both VLANs are removed, but the multicast rule for VLAN 10 remains
          installed, so the interface still receives multicast packets
          from VLAN 10.
      [5] The command switches IFF_ALLMULTI off and, because vsi->num_vlan
          is 1, the driver tries to remove a multicast promisc rule for
          VLAN 0 with the non-VLAN look-up type, which does not exist.
          All-multicast looks disabled from the user's point of view, but
          it is partially enabled in HW (the interface receives all
          multicast packets that are either untagged or tagged with
          VLAN ID 10).
      
      To resolve these issues the patch introduces these changes:
      1. Adds handling for IFF_ALLMULTI to the ice_vlan_rx_add_vid() and
         ice_vlan_rx_kill_vid() callbacks, so when a VLAN is added/removed
         and IFF_ALLMULTI is enabled, an appropriate multicast promisc
         rule for that VLAN ID is added/removed.
      2. In ice_vlan_rx_add_vid(), when the first VLAN besides VLAN 0 is
         added (so vsi->num_vlan == 2) and IFF_ALLMULTI is enabled, the
         look-up type of the existing multicast promisc rule for VLAN 0 is
         updated to ICE_MCAST_VLAN_PROMISC_BITS.
      3. In ice_vlan_rx_kill_vid(), when the last VLAN besides VLAN 0 is
         removed (so vsi->num_vlan == 1) and IFF_ALLMULTI is enabled, the
         look-up type of the existing multicast promisc rule for VLAN 0 is
         updated to ICE_MCAST_PROMISC_BITS.
      4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY
         bit protection to avoid races with ice_vsi_sync_fltr(), which
         runs in ice_service_task() context.
      5. The ICE_VSI_VLAN_FLTR_CHANGED bit is useless and can be removed.
      6. Error messages are added to the ice_fltr_*_vsi_promisc() helper
         functions so that their callers do not need to print them.
      7. Small improvements to increase readability.
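      Points 1 to 3 can be sketched as a toy model that keeps the rules in step with VLAN
      additions and removals while all-multicast is on. This is a hypothetical sketch, not
      the patch. `struct vsi_model`, `rule_add()`, `rule_del()`, `rule_exists()`,
      `model_add_vid()` and `model_kill_vid()` are invented names. The ICE_CFG_BUSY
      serialization from point 4 is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MC_RULES 8

/* Hypothetical multicast promisc rule: VLAN ID plus look-up type. */
struct mc_rule {
	uint16_t vid;
	bool vlan_lkup;		/* true: VLAN look-up, false: non-VLAN */
};

/* Hypothetical VSI model; ICE_CFG_BUSY locking is omitted on purpose. */
struct vsi_model {
	int num_vlan;		/* number of VLANs, VLAN 0 included */
	bool allmulti;		/* IFF_ALLMULTI currently set */
	struct mc_rule rules[MAX_MC_RULES];
	size_t n_rules;
};

static void rule_add(struct vsi_model *v, uint16_t vid, bool vlan_lkup)
{
	assert(v->n_rules < MAX_MC_RULES);
	v->rules[v->n_rules++] = (struct mc_rule){ vid, vlan_lkup };
}

static bool rule_del(struct vsi_model *v, uint16_t vid, bool vlan_lkup)
{
	for (size_t i = 0; i < v->n_rules; i++) {
		if (v->rules[i].vid == vid && v->rules[i].vlan_lkup == vlan_lkup) {
			v->rules[i] = v->rules[--v->n_rules];
			return true;
		}
	}
	return false;
}

static bool rule_exists(const struct vsi_model *v, uint16_t vid, bool vlan_lkup)
{
	for (size_t i = 0; i < v->n_rules; i++)
		if (v->rules[i].vid == vid && v->rules[i].vlan_lkup == vlan_lkup)
			return true;
	return false;
}

/* Points 1 and 2: update the rules when a VLAN is added. */
static void model_add_vid(struct vsi_model *v, uint16_t vid)
{
	v->num_vlan++;
	if (!v->allmulti)
		return;
	if (v->num_vlan == 2) {
		/* first non-zero VLAN: VLAN 0 rule moves to VLAN look-up */
		rule_del(v, 0, false);
		rule_add(v, 0, true);
	}
	rule_add(v, vid, true);
}

/* Points 1 and 3: the mirror image when a VLAN is removed. */
static void model_kill_vid(struct vsi_model *v, uint16_t vid)
{
	v->num_vlan--;
	if (!v->allmulti)
		return;
	rule_del(v, vid, true);
	if (v->num_vlan == 1) {
		/* last non-zero VLAN gone: VLAN 0 rule back to non-VLAN */
		rule_del(v, 0, true);
		rule_add(v, 0, false);
	}
}
```

      If Scenario 2 is replayed against this model, a rule for VLAN 20 is installed as soon
      as the VLAN is added. After both VLANs are removed, only a non-VLAN rule for VLAN 0
      remains. That is the rule the allmulticast-off path looks for when vsi->num_vlan is 1.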
      
      Fixes: 5eda8afd ("ice: Add support for PF/VF promiscuous mode")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      ice: Fix MAC address setting · 2c0069f3
      Ivan Vecera committed
      Commit 2ccc1c1c ("ice: Remove excess error variables") merged
      the usage of the 'status' and 'err' variables into a single one in
      function ice_set_mac_address(). Unfortunately this causes a
      regression when ice_fltr_add_mac() returns -EEXIST: this return
      value does not indicate an error in this case, but the value of
      'err' remains -EEXIST until the end of the function and is returned
      to the caller.
      
      Before the mentioned commit this did not happen, because the return
      value of ice_fltr_add_mac() was stored in the 'status' variable
      first and, if it was -EEXIST, 'err' remained zero.
      
      Fix the problem by resetting 'err' to zero when ice_fltr_add_mac()
      returns -EEXIST.
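      The single-variable pattern and its fix can be shown with a small, self-contained C
      sketch. The filter table, `add_mac_filter()` (a stand-in for ice_fltr_add_mac()) and
      `set_mac_address_sketch()` are all hypothetical. The sketch does not reproduce
      ice_set_mac_address(). It only demonstrates the error handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MAC_LEN 6

/* Hypothetical single-entry MAC filter table. */
static unsigned char filter_mac[MAC_LEN];
static bool filter_present;

/* Hypothetical stand-in for ice_fltr_add_mac(): returns -EEXIST when the
 * same filter is already installed, 0 when it was added. */
static int add_mac_filter(const unsigned char *mac)
{
	assert(mac != NULL);
	if (filter_present && memcmp(filter_mac, mac, MAC_LEN) == 0)
		return -EEXIST;
	memcpy(filter_mac, mac, MAC_LEN);
	filter_present = true;
	return 0;
}

/* Sketch of the single-'err' pattern, not the real ice_set_mac_address():
 * -EEXIST is reset to 0 so it does not leak out to the caller. */
static int set_mac_address_sketch(const unsigned char *mac)
{
	int err;

	err = add_mac_filter(mac);
	if (err == -EEXIST)
		err = 0;	/* filter already present: not an error here */
	if (err)
		return err;	/* any other failure is still propagated */

	/* later steps of the function would run here */
	return 0;
}
```

      Setting the same address twice returns 0 both times. Without the `err = 0` reset, the
      second call would return -EEXIST.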
      
      Fixes: 2ccc1c1c ("ice: Remove excess error variables")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Acked-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • I
      ice: Clear default forwarding VSI during VSI release · bd8c624c
      Ivan Vecera committed
      A VSI is set as the default forwarding VSI when promisc mode is set
      for the PF interface, when the PF is switched to switchdev mode, or
      when the VF driver asks to enable allmulticast or promisc mode for
      the VF interface (when the vf-true-promisc-support priv flag is off).
      The third case is buggy because in that case the VSI associated with
      the VF remains the default one after the VF is removed.
      
      Reproducer:
      1. Create VF
         echo 1 > /sys/class/net/ens7f0/device/sriov_numvfs
      2. Enable allmulticast or promisc mode on VF
         ip link set ens7f0v0 allmulticast on
         ip link set ens7f0v0 promisc on
      3. Delete VF
         echo 0 > /sys/class/net/ens7f0/device/sriov_numvfs
      4. Try to enable promisc mode on PF
         ip link set ens7f0 promisc on
      
      Although promisc mode on the PF appears to be enabled, the opposite
      is true: ice_vsi_sync_fltr(), which is responsible for IFF_PROMISC
      handling, first checks whether any other VSI is set as the default
      forwarding one, and if so, the function does nothing. At this point
      it is not possible to enable promisc mode on the PF without
      re-probing the device.
      
      To resolve the issue, this patch clears the default forwarding VSI
      during ice_vsi_release() when the VSI to be released is the default
      one.
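      The stale-owner problem and the idea of the fix fit in a few lines of C. In this
      hypothetical model the switch tracks a single default forwarding VSI. The names
      `struct vsi_stub`, `struct sw_model`, `try_set_dflt_vsi()` and `release_vsi()` are
      invented for illustration and are not the driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VSI stand-in; only its identity matters here. */
struct vsi_stub {
	int id;
};

/* Hypothetical switch model: at most one default forwarding VSI. */
struct sw_model {
	struct vsi_stub *dflt_vsi;	/* NULL when no VSI holds the role */
};

/* Mirrors the behavior described above: if another VSI already is the
 * default forwarding VSI, nothing is done. */
static bool try_set_dflt_vsi(struct sw_model *sw, struct vsi_stub *vsi)
{
	if (sw->dflt_vsi && sw->dflt_vsi != vsi)
		return false;
	sw->dflt_vsi = vsi;
	return true;
}

/* Sketch of the fix's idea: a VSI being released gives up the default
 * forwarding role, so it cannot be left behind as a stale owner. */
static void release_vsi(struct sw_model *sw, struct vsi_stub *vsi)
{
	assert(vsi != NULL);
	if (sw->dflt_vsi == vsi)
		sw->dflt_vsi = NULL;
	/* remaining teardown of the VSI would follow here */
}
```

      The model follows the reproducer. The VF VSI becomes the default, and then the VF is
      released. Because release_vsi() clears the role, the PF VSI can take it afterwards.
      Without that check, try_set_dflt_vsi() for the PF would keep returning false.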
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 29 Mar, 2022 3 commits