1. 07 Feb, 2023 1 commit
    • A
      ice: Do not use WQ_MEM_RECLAIM flag for workqueue · 4d159f78
      Anirudh Venkataramanan committed
      When both the ice and irdma drivers are loaded, a warning in
      check_flush_dependency is triggered. This happens because the ice driver's
      workqueue is allocated with the WQ_MEM_RECLAIM flag while the irdma one
      is not.
      
      According to kernel documentation, this flag should be set if the
      workqueue will be involved in the kernel's memory reclamation flow.
      Since it is not, there is no need for the ice driver's WQ to have this
      flag set so remove it.
      
      Example trace:
      
      [  +0.000004] workqueue: WQ_MEM_RECLAIM ice:ice_service_task [ice] is flushing !WQ_MEM_RECLAIM infiniband:0x0
      [  +0.000139] WARNING: CPU: 0 PID: 728 at kernel/workqueue.c:2632 check_flush_dependency+0x178/0x1a0
      [  +0.000011] Modules linked in: bonding tls xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_cha
      in_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink bridge stp llc rfkill vfat fat intel_rapl_msr intel
      _rapl_common isst_if_common skx_edac nfit libnvdimm x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct1
      0dif_pclmul crc32_pclmul ghash_clmulni_intel rapl intel_cstate rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_
      core_mod ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_cm iw_cm iTCO_wdt iTCO_vendor_support ipmi_ssif irdma mei_me ib_uverbs
      ib_core intel_uncore joydev pcspkr i2c_i801 acpi_ipmi mei lpc_ich i2c_smbus intel_pch_thermal ioatdma ipmi_si acpi_power_meter
      acpi_pad xfs libcrc32c sd_mod t10_pi crc64_rocksoft crc64 sg ahci ixgbe libahci ice i40e igb crc32c_intel mdio i2c_algo_bit liba
      ta dca wmi dm_mirror dm_region_hash dm_log dm_mod ipmi_devintf ipmi_msghandler fuse
      [  +0.000161]  [last unloaded: bonding]
      [  +0.000006] CPU: 0 PID: 728 Comm: kworker/0:2 Tainted: G S                 6.2.0-rc2_next-queue-13jan-00458-gc20aabd57164 #1
      [  +0.000006] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0010.010620200716 01/06/2020
      [  +0.000003] Workqueue: ice ice_service_task [ice]
      [  +0.000127] RIP: 0010:check_flush_dependency+0x178/0x1a0
      [  +0.000005] Code: 89 8e 02 01 e8 49 3d 40 00 49 8b 55 18 48 8d 8d d0 00 00 00 48 8d b3 d0 00 00 00 4d 89 e0 48 c7 c7 e0 3b 08
      9f e8 bb d3 07 01 <0f> 0b e9 be fe ff ff 80 3d 24 89 8e 02 00 0f 85 6b ff ff ff e9 06
      [  +0.000004] RSP: 0018:ffff88810a39f990 EFLAGS: 00010282
      [  +0.000005] RAX: 0000000000000000 RBX: ffff888141bc2400 RCX: 0000000000000000
      [  +0.000004] RDX: 0000000000000001 RSI: dffffc0000000000 RDI: ffffffffa1213a80
      [  +0.000003] RBP: ffff888194bf3400 R08: ffffed117b306112 R09: ffffed117b306112
      [  +0.000003] R10: ffff888bd983088b R11: ffffed117b306111 R12: 0000000000000000
      [  +0.000003] R13: ffff888111f84d00 R14: ffff88810a3943ac R15: ffff888194bf3400
      [  +0.000004] FS:  0000000000000000(0000) GS:ffff888bd9800000(0000) knlGS:0000000000000000
      [  +0.000003] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  +0.000003] CR2: 000056035b208b60 CR3: 000000017795e005 CR4: 00000000007706f0
      [  +0.000003] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  +0.000003] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  +0.000002] PKRU: 55555554
      [  +0.000003] Call Trace:
      [  +0.000002]  <TASK>
      [  +0.000003]  __flush_workqueue+0x203/0x840
      [  +0.000006]  ? mutex_unlock+0x84/0xd0
      [  +0.000008]  ? __pfx_mutex_unlock+0x10/0x10
      [  +0.000004]  ? __pfx___flush_workqueue+0x10/0x10
      [  +0.000006]  ? mutex_lock+0xa3/0xf0
      [  +0.000005]  ib_cache_cleanup_one+0x39/0x190 [ib_core]
      [  +0.000174]  __ib_unregister_device+0x84/0xf0 [ib_core]
      [  +0.000094]  ib_unregister_device+0x25/0x30 [ib_core]
      [  +0.000093]  irdma_ib_unregister_device+0x97/0xc0 [irdma]
      [  +0.000064]  ? __pfx_irdma_ib_unregister_device+0x10/0x10 [irdma]
      [  +0.000059]  ? up_write+0x5c/0x90
      [  +0.000005]  irdma_remove+0x36/0x90 [irdma]
      [  +0.000062]  auxiliary_bus_remove+0x32/0x50
      [  +0.000007]  device_release_driver_internal+0xfa/0x1c0
      [  +0.000005]  bus_remove_device+0x18a/0x260
      [  +0.000007]  device_del+0x2e5/0x650
      [  +0.000005]  ? __pfx_device_del+0x10/0x10
      [  +0.000003]  ? mutex_unlock+0x84/0xd0
      [  +0.000004]  ? __pfx_mutex_unlock+0x10/0x10
      [  +0.000004]  ? _raw_spin_unlock+0x18/0x40
      [  +0.000005]  ice_unplug_aux_dev+0x52/0x70 [ice]
      [  +0.000160]  ice_service_task+0x1309/0x14f0 [ice]
      [  +0.000134]  ? __pfx___schedule+0x10/0x10
      [  +0.000006]  process_one_work+0x3b1/0x6c0
      [  +0.000008]  worker_thread+0x69/0x670
      [  +0.000005]  ? __kthread_parkme+0xec/0x110
      [  +0.000007]  ? __pfx_worker_thread+0x10/0x10
      [  +0.000005]  kthread+0x17f/0x1b0
      [  +0.000005]  ? __pfx_kthread+0x10/0x10
      [  +0.000004]  ret_from_fork+0x29/0x50
      [  +0.000009]  </TASK>
      
      Fixes: 940b61af ("ice: Initialize PF and setup miscellaneous interrupt")
      Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
      Signed-off-by: Marcin Szycik <marcin.szycik@linux.intel.com>
      Tested-by: Jakub Andrysiak <jakub.andrysiak@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
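The rule behind the warning in the trace above can be sketched in userspace C. This is only a model: `model_wq`, `MODEL_WQ_MEM_RECLAIM`, and `flush_would_warn` are hypothetical names, and the real check_flush_dependency() considers more than this one condition.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's WQ_MEM_RECLAIM flag bit. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Minimal model of a workqueue: only the name and flags matter here. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when a work item running on current_wq flushes target and
 * the combination matches the one reported in the trace: a queue marked
 * as reclaim-safe waits on a queue that carries no such guarantee.
 */
static bool flush_would_warn(const struct model_wq *current_wq,
			     const struct model_wq *target)
{
	bool cur_reclaim = (current_wq->flags & MODEL_WQ_MEM_RECLAIM) != 0;
	bool tgt_reclaim = (target->flags & MODEL_WQ_MEM_RECLAIM) != 0;

	return cur_reclaim && !tgt_reclaim;
}
```

With the flag dropped from the ice queue, the model no longer reports a dependency problem when the ice service task ends up flushing the infiniband queue.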
  2. 28 Jan, 2023 1 commit
  3. 25 Jan, 2023 1 commit
  4. 09 Dec, 2022 1 commit
    • J
      ice: always call ice_ptp_link_change and make it void · 6b1ff5d3
      Jacob Keller committed
      The ice_ptp_link_change function is currently only called for E822 based
      hardware. Future changes are going to extend this function to perform
      additional tasks on link change.
      
      Always call this function, moving the E810 check from the callers down to
      just before we call the E822-specific function required to restart the PHY.
      
      This function also returns an error value, but none of the callers actually
      check it. In general, the errors it produces are more likely systemic
      problems such as invalid or corrupt port numbers. No caller checks these,
      and so no warning is logged.
      
      Re-order the flag checks so that ICE_FLAG_PTP is checked first. Drop the
      unnecessary check for ICE_FLAG_PTP_SUPPORTED, as ICE_FLAG_PTP will not be
      set except when ICE_FLAG_PTP_SUPPORTED is set.
      
      Convert the port checks to WARN_ON_ONCE, in order to generate a kernel
      stack trace when they are hit.
      
      Convert the function to void since no caller actually checks these return
      values.
      Co-developed-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
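The reordered checks described in this commit can be modelled roughly as follows. All `model_*` names are hypothetical, the warning counter stands in for WARN_ON_ONCE, and the `link_up` bookkeeping is an assumption about what the function does on every device type.

```c
#include <assert.h>
#include <stdbool.h>

enum model_hw_type { MODEL_E810, MODEL_E822 };

struct model_pf {
	bool ptp_enabled;      /* models ICE_FLAG_PTP */
	enum model_hw_type hw;
	int num_ports;         /* assumed to be at most 8 in this model */
	bool link_up[8];
	int phy_restarts;      /* counts E822-specific PHY restarts */
	int warnings;          /* counts would-be WARN_ON_ONCE hits */
};

/* Void on purpose: no caller inspected the old return value. */
static void model_ptp_link_change(struct model_pf *pf, int port, bool linkup)
{
	/* The PTP flag is checked first. */
	if (!pf->ptp_enabled)
		return;

	/* An invalid port is a systemic problem, so warn instead of returning an error. */
	if (port < 0 || port >= pf->num_ports) {
		pf->warnings++;
		return;
	}

	/* Common work that now runs for every device type. */
	pf->link_up[port] = linkup;

	/* The E810 check sits just before the E822-only PHY restart. */
	if (pf->hw == MODEL_E810)
		return;

	pf->phy_restarts++;
}
```

Callers can then invoke the function unconditionally on link change, and the device-specific decision lives in one place.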
  5. 24 Nov, 2022 2 commits
    • B
      ice: Accumulate ring statistics over reset · 288ecf49
      Benjamin Mikailenko committed
      Resets may occur with or without user interaction. For example, a TX hang
      or reconfiguration of parameters will result in a reset. During reset, the
      VSI is freed, freeing any statistics structures inside as well. This
      creates a problem when a reset happens in the background: the statistics
      are set to zero, and a user who then checks ring statistics finds them
      empty instead of populated.
      
      To ensure this doesn't happen, accumulate ring statistics over reset.
      
      Define a new ring statistics structure, ice_ring_stats. The new structure
      lives in the VSI's parent, preserving ring statistics when VSI is freed.
      
      1. Define a new structure vsi_ring_stats in the PF scope
      2. Allocate/free stats only during probe, unload, or change in ring size
      3. Replace previous ring statistics functionality with new structure
      Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
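The ownership change can be illustrated with a small userspace sketch, assuming the statistics storage lives in the parent (PF) and the VSI's rings merely point into it. The `model_*` types and functions are hypothetical and do not mirror the driver's actual structures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct model_ring_stats { uint64_t pkts; uint64_t bytes; };

/* A ring only borrows its statistics; the PF owns the storage. */
struct model_ring { struct model_ring_stats *stats; };

struct model_pf { struct model_ring_stats *ring_stats; int num_rings; };
struct model_vsi { struct model_ring *rings; int num_rings; };

/* Done at probe time or on ring-count change, never on reset. */
static int model_pf_alloc_stats(struct model_pf *pf, int num_rings)
{
	pf->ring_stats = calloc((size_t)num_rings, sizeof(*pf->ring_stats));
	if (!pf->ring_stats)
		return -1;
	pf->num_rings = num_rings;
	return 0;
}

/* (Re)creating the VSI re-attaches rings to the surviving statistics. */
static struct model_vsi *model_vsi_create(struct model_pf *pf)
{
	struct model_vsi *vsi = calloc(1, sizeof(*vsi));

	if (!vsi)
		return NULL;
	vsi->rings = calloc((size_t)pf->num_rings, sizeof(*vsi->rings));
	if (!vsi->rings) {
		free(vsi);
		return NULL;
	}
	vsi->num_rings = pf->num_rings;
	for (int i = 0; i < vsi->num_rings; i++)
		vsi->rings[i].stats = &pf->ring_stats[i];
	return vsi;
}

/* Freeing the VSI during reset releases the rings but not the statistics. */
static void model_vsi_free(struct model_vsi *vsi)
{
	free(vsi->rings);
	free(vsi);
}
```

Because the counters outlive the VSI, a free and re-create cycle (the model's version of a reset) leaves previously accumulated values intact.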
    • B
      ice: Accumulate HW and Netdev statistics over reset · 2fd5e433
      Benjamin Mikailenko committed
      Resets happen with or without user interaction. For example, incidents
      such as a TX hang or a reconfiguration of parameters will result in a reset.
      During reset, hardware and software statistics were set to zero. This
      created a problem when a reset happened in the background: the statistics
      were set to zero, and a user who then checked them found them empty instead
      of populated.
      
      To ensure this doesn't happen, keep accumulating stats over reset.
      
      1. Remove function calls which reset hardware and netdev statistics.
      2. Do not rollover statistics in ice_stat_update40 during reset.
      Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
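As a rough illustration of how a free-running 40-bit hardware counter can feed a software total that is never cleared, consider the sketch below. `model_stat_update40` is a hypothetical userspace model; it shows baseline capture and wraparound handling only, and does not attempt to reproduce the reset-specific rollover handling that item 2 refers to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_40BIT_SPAN (1ULL << 40)

/*
 * hw_value:        current raw reading of a 40-bit hardware counter
 * baseline_loaded: false on the very first read, true afterwards
 * prev:            last raw reading (the baseline)
 * cur:             accumulated software total, never cleared here
 */
static void model_stat_update40(uint64_t hw_value, bool baseline_loaded,
				uint64_t *prev, uint64_t *cur)
{
	hw_value &= MODEL_40BIT_SPAN - 1;

	/* First read only records the baseline; nothing is reported yet. */
	if (!baseline_loaded) {
		*prev = hw_value;
		return;
	}

	/* Add the delta, accounting for one wrap of the 40-bit counter. */
	if (hw_value >= *prev)
		*cur += hw_value - *prev;
	else
		*cur += (hw_value + MODEL_40BIT_SPAN) - *prev;

	*prev = hw_value;
}
```

Since only deltas are added to `cur`, leaving `prev` and `cur` untouched across a reset keeps the reported total monotonic.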
  6. 22 Nov, 2022 1 commit
    • J
      ice: fix handling of burst Tx timestamps · 30f15874
      Jacob Keller committed
      Commit 1229b339 ("ice: Add low latency Tx timestamp read") refactored
      PTP timestamping logic to use a threaded IRQ instead of a separate kthread.
      
      This implementation introduced ice_misc_intr_thread_fn and redefined the
      ice_ptp_process_ts function interface to return a value of whether or not
      the timestamp processing was complete.
      
      ice_misc_intr_thread_fn would take the return value from ice_ptp_process_ts
      and convert it into either IRQ_HANDLED if there were no more timestamps to
      be processed, or IRQ_WAKE_THREAD if the thread should continue processing.
      
      This is not correct, as the kernel does not re-schedule threaded IRQ
      functions automatically. IRQ_WAKE_THREAD can only be used by the main IRQ
      function.
      
      This results in the ice_ptp_process_ts function (and in turn the
      ice_ptp_tx_tstamp function) being called exactly once per
      interrupt.
      
      If an application sends a burst of Tx timestamps without waiting for a
      response, the interrupt will trigger for the first timestamp. However,
      later timestamps may not have arrived yet. This can result in dropped or
      discarded timestamps. Worse, on E822 hardware this results in the interrupt
      logic getting stuck such that no future interrupts will be triggered. The
      result is complete loss of Tx timestamp functionality.
      
      Fix this by modifying the ice_misc_intr_thread_fn to perform its own
      polling of the ice_ptp_process_ts function. We sleep for a few microseconds
      between attempts to avoid wasting significant CPU time. The value was
      chosen to allow time for the Tx timestamps to complete without wasting so
      much time that we overrun application wait budgets in the worst case.
      
      The ice_ptp_process_ts function also currently returns false in the event
      that the Tx tracker is not initialized. This would result in the threaded
      IRQ handler never exiting if it gets started while the tracker is not
      initialized.
      
      Fix the function to appropriately return true when the tracker is not
      initialized.
      
      Note that this will not reproduce with default ptp4l behavior, as the
      program always synchronously waits for a timestamp response before sending
      another timestamp request.
      Reported-by: Siddaraju DH <siddaraju.dh@intel.com>
      Fixes: 1229b339 ("ice: Add low latency Tx timestamp read")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Link: https://lore.kernel.org/r/20221118222729.1565317-1-anthony.l.nguyen@intel.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
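The polling fix can be modelled in userspace as below. Everything named `model_*` is hypothetical, and the "sleep" simply simulates the hardware completing one more timestamp, standing in for the short sleep between attempts that the commit describes.

```c
#include <assert.h>
#include <stdbool.h>

struct model_tx_tracker {
	bool init;      /* tracker initialized? */
	int pending;    /* timestamps requested but not yet delivered */
	int ready;      /* timestamps the hardware has completed */
	int delivered;  /* timestamps handed back to the stack */
};

/* Returns true when no further processing is needed. */
static bool model_process_ts(struct model_tx_tracker *t)
{
	/* An uninitialized tracker has nothing to wait for. */
	if (!t->init)
		return true;

	while (t->ready > 0 && t->pending > 0) {
		t->ready--;
		t->pending--;
		t->delivered++;
	}
	return t->pending == 0;
}

/* Stand-in for a short sleep: the hardware finishes one more timestamp. */
static void model_short_sleep(struct model_tx_tracker *t)
{
	if (t->pending > t->ready)
		t->ready++;
}

/* The thread function polls by itself instead of asking to be re-woken. */
static int model_misc_intr_thread_fn(struct model_tx_tracker *t)
{
	int polls = 1;

	while (!model_process_ts(t)) {
		model_short_sleep(t);
		polls++;
	}
	return polls;
}
```

In this model a burst of three requests with only one timestamp initially ready is drained within a single invocation of the thread function, and an uninitialized tracker returns immediately instead of looping forever.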
  7. 18 Nov, 2022 1 commit
  8. 04 Nov, 2022 2 commits
  9. 29 Oct, 2022 1 commit
  10. 25 Oct, 2022 1 commit
  11. 29 Sep, 2022 1 commit
  12. 27 Sep, 2022 1 commit
  13. 21 Sep, 2022 3 commits
  14. 09 Sep, 2022 1 commit
    • D
      ice: Don't double unplug aux on peer initiated reset · 23c61919
      Dave Ertman committed
      In the IDC callback that is accessed when the aux drivers request a reset,
      the function to unplug the aux devices is called.  This function is also
      called in the ice_prepare_for_reset function. This double call is causing
      a "scheduling while atomic" BUG.
      
      [  662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003
      
      [  662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003
      
      [  662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003
      
      [  662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424
      
      [  662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset
      
      [  662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002
      
      [  662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002
      [  662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe
       r ttm
      [  662.815546]  nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse
      [  662.815557] Preemption disabled at:
      [  662.815558] [<0000000000000000>] 0x0
      [  662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S         OE     5.17.1 #2
      [  662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021
      [  662.815568] Call Trace:
      [  662.815572]  <IRQ>
      [  662.815574]  dump_stack_lvl+0x33/0x42
      [  662.815581]  __schedule_bug.cold.147+0x7d/0x8a
      [  662.815588]  __schedule+0x798/0x990
      [  662.815595]  schedule+0x44/0xc0
      [  662.815597]  schedule_preempt_disabled+0x14/0x20
      [  662.815600]  __mutex_lock.isra.11+0x46c/0x490
      [  662.815603]  ? __ibdev_printk+0x76/0xc0 [ib_core]
      [  662.815633]  device_del+0x37/0x3d0
      [  662.815639]  ice_unplug_aux_dev+0x1a/0x40 [ice]
      [  662.815674]  ice_schedule_reset+0x3c/0xd0 [ice]
      [  662.815693]  irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma]
      [  662.815712]  ? bitmap_find_next_zero_area_off+0x45/0xa0
      [  662.815719]  ice_send_event_to_aux+0x54/0x70 [ice]
      [  662.815741]  ice_misc_intr+0x21d/0x2d0 [ice]
      [  662.815756]  __handle_irq_event_percpu+0x4c/0x180
      [  662.815762]  handle_irq_event_percpu+0xf/0x40
      [  662.815764]  handle_irq_event+0x34/0x60
      [  662.815766]  handle_edge_irq+0x9a/0x1c0
      [  662.815770]  __common_interrupt+0x62/0x100
      [  662.815774]  common_interrupt+0xb4/0xd0
      [  662.815779]  </IRQ>
      [  662.815780]  <TASK>
      [  662.815780]  asm_common_interrupt+0x1e/0x40
      [  662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380
      [  662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
      [  662.815791] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202
      [  662.815793] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f
      [  662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7
      [  662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0
      [  662.815797] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08
      [  662.815798] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000
      [  662.815801]  cpuidle_enter+0x29/0x40
      [  662.815803]  do_idle+0x261/0x2b0
      [  662.815807]  cpu_startup_entry+0x19/0x20
      [  662.815809]  start_secondary+0x114/0x150
      [  662.815813]  secondary_startup_64_no_verify+0xd5/0xdb
      [  662.815818]  </TASK>
      [  662.815846] bad: scheduling from the idle thread!
      [  662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S      W  OE     5.17.1 #2
      [  662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021
      [  662.815853] Call Trace:
      [  662.815855]  <IRQ>
      [  662.815856]  dump_stack_lvl+0x33/0x42
      [  662.815860]  dequeue_task_idle+0x20/0x30
      [  662.815863]  __schedule+0x1c3/0x990
      [  662.815868]  schedule+0x44/0xc0
      [  662.815871]  schedule_preempt_disabled+0x14/0x20
      [  662.815873]  __mutex_lock.isra.11+0x3a8/0x490
      [  662.815876]  ? __ibdev_printk+0x76/0xc0 [ib_core]
      [  662.815904]  device_del+0x37/0x3d0
      [  662.815909]  ice_unplug_aux_dev+0x1a/0x40 [ice]
      [  662.815937]  ice_schedule_reset+0x3c/0xd0 [ice]
      [  662.815961]  irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma]
      [  662.815979]  ? bitmap_find_next_zero_area_off+0x45/0xa0
      [  662.815985]  ice_send_event_to_aux+0x54/0x70 [ice]
      [  662.816011]  ice_misc_intr+0x21d/0x2d0 [ice]
      [  662.816033]  __handle_irq_event_percpu+0x4c/0x180
      [  662.816037]  handle_irq_event_percpu+0xf/0x40
      [  662.816039]  handle_irq_event+0x34/0x60
      [  662.816042]  handle_edge_irq+0x9a/0x1c0
      [  662.816045]  __common_interrupt+0x62/0x100
      [  662.816048]  common_interrupt+0xb4/0xd0
      [  662.816052]  </IRQ>
      [  662.816053]  <TASK>
      [  662.816054]  asm_common_interrupt+0x1e/0x40
      [  662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380
      [  662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
      [  662.816063] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202
      [  662.816065] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f
      [  662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7
      [  662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0
      [  662.816070] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08
      [  662.816071] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000
      [  662.816075]  cpuidle_enter+0x29/0x40
      [  662.816077]  do_idle+0x261/0x2b0
      [  662.816080]  cpu_startup_entry+0x19/0x20
      [  662.816083]  start_secondary+0x114/0x150
      [  662.816087]  secondary_startup_64_no_verify+0xd5/0xdb
      [  662.816091]  </TASK>
      [  662.816169] bad: scheduling from the idle thread!
      
      The correct place to unplug the aux devices for a reset is in the
      prepare_for_reset function, as this is a common place for all reset flows.
      It also has built in protection from being called twice in a single reset
      instance before the aux devices are replugged.
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
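A minimal model of the resulting flow, assuming the peer-initiated request only records that a reset is wanted and that all unplugging happens in the common prepare step behind a "prepared" guard. The `model_*` names and the guard flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct model_pf {
	bool reset_requested;
	bool prepared_for_reset;  /* guard against running the prepare step twice */
	bool aux_plugged;
	int unplug_calls;
};

/*
 * Called from the peer's callback, possibly in interrupt context.
 * It must not sleep, so it only records the request.
 */
static void model_request_reset_from_peer(struct model_pf *pf)
{
	pf->reset_requested = true;
}

/* Common to all reset flows; runs in a context that may sleep. */
static void model_prepare_for_reset(struct model_pf *pf)
{
	if (pf->prepared_for_reset)
		return;

	if (pf->aux_plugged) {
		pf->aux_plugged = false;
		pf->unplug_calls++;
	}
	pf->prepared_for_reset = true;
}

/* After the reset completes, the aux device is plugged again. */
static void model_rebuild(struct model_pf *pf)
{
	pf->reset_requested = false;
	pf->prepared_for_reset = false;
	pf->aux_plugged = true;
}
```

The request path never touches the aux device, so nothing that can sleep is reachable from the interrupt handler in this model, and a repeated prepare call within one reset does not unplug twice.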
  15. 07 Sep, 2022 1 commit
    • T
      ice: Allow operation with reduced device MSI-X · ce462613
      Tony Nguyen committed
      The driver currently takes an all-or-nothing approach to device MSI-X
      vectors: if it does not get its full allocation, it fails and does not
      load. There is no reason it can't work with a reduced number of MSI-X
      vectors. Take a similar approach to commit 741106f7 ("ice: Improve
      MSI-X fallback logic") and, instead, adjust the MSI-X request to make use
      of what is available.
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Petr Oros <poros@redhat.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
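One way to picture "make use of what is available" is the sketch below. The split policy (one miscellaneous vector first, then LAN queues, then RDMA from whatever remains) is purely an assumption for illustration; the commit text does not spell out the driver's actual distribution, and all `model_*` names are hypothetical.

```c
#include <assert.h>

struct model_msix_plan {
	int misc;  /* vectors for miscellaneous interrupts */
	int lan;   /* vectors for LAN queue pairs */
	int rdma;  /* vectors handed to the RDMA peer */
};

static int model_min(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Builds a plan from the number of vectors actually available.
 * Returns 0 on success, -1 if even the bare minimum (one misc vector
 * plus one LAN vector) cannot be satisfied.
 */
static int model_plan_msix(int available, int want_lan, int want_rdma,
			   struct model_msix_plan *plan)
{
	int remaining;

	if (available < 2)
		return -1;

	plan->misc = 1;
	remaining = available - plan->misc;

	/* Shrink the LAN request to fit instead of failing outright. */
	plan->lan = model_min(want_lan, remaining);
	remaining -= plan->lan;

	/* RDMA receives only what is left over. */
	plan->rdma = model_min(want_rdma, remaining);
	return 0;
}
```

With 10 vectors available and a wish for 32 LAN plus 16 RDMA vectors, this model still produces a working plan rather than refusing to load.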
  16. 02 Sep, 2022 2 commits
  17. 22 Aug, 2022 1 commit
    • M
      ice: xsk: use Rx ring's XDP ring when picking NAPI context · 9ead7e74
      Maciej Fijalkowski committed
      Ice driver allocates per cpu XDP queues so that redirect path can safely
      use smp_processor_id() as an index to the array. At the same time
      though, XDP rings are used to pick NAPI context to call napi_schedule()
      or set NAPIF_STATE_MISSED. When user reduces queue count, say to 8, and
      num_possible_cpus() of underlying platform is 44, then this means queue
      vectors with correlated NAPI contexts will carry several XDP queues.
      
      This in turn can result in a broken behavior where NAPI context of
      interest will never be scheduled and AF_XDP socket will not process any
      traffic.
      
      To fix this, change how XDP rings are assigned to Rx
      rings and use this information later on when setting the
      ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated
      queue vector and walk through its Tx ring linked list. Once we stumble
      upon an XDP ring in it, assign this ring to ice_rx_ring::xdp_ring.
      
      Previous [0] approach of fixing this issue was for txonly scenario
      because of the described grouping of XDP rings across queue vectors. So,
      relying on Rx ring meant that NAPI context could be scheduled with a
      queue vector without XDP ring with associated XSK pool.
      
      [0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
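The per-Rx-ring assignment described in the fix can be sketched as a linked-list walk. The structures below are simplified, hypothetical stand-ins for the driver's ring and queue vector types, and clearing the pointer when no XDP ring is found is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_tx_ring {
	bool is_xdp;                 /* true for XDP rings, false for regular Tx */
	int id;
	struct model_tx_ring *next;  /* next Tx ring on the same queue vector */
};

struct model_q_vector {
	struct model_tx_ring *tx_head;  /* head of the vector's Tx ring list */
};

struct model_rx_ring {
	struct model_q_vector *q_vector;
	struct model_tx_ring *xdp_ring;  /* models ice_rx_ring::xdp_ring */
};

/*
 * Walk the Tx rings of the Rx ring's own queue vector and remember the
 * first XDP ring found there, so the NAPI context picked later belongs
 * to the same vector as the Rx ring.
 */
static void model_assign_xdp_ring(struct model_rx_ring *rx)
{
	struct model_tx_ring *ring;

	rx->xdp_ring = NULL;
	for (ring = rx->q_vector->tx_head; ring; ring = ring->next) {
		if (ring->is_xdp) {
			rx->xdp_ring = ring;
			return;
		}
	}
}
```

Deriving the XDP ring from the Rx ring's vector, rather than from a CPU-indexed array, is what keeps the scheduled NAPI context and the socket's traffic on the same vector in this model.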
  18. 18 Aug, 2022 4 commits
  19. 02 Aug, 2022 1 commit
  20. 29 Jul, 2022 3 commits
  21. 27 Jul, 2022 2 commits
  22. 16 Jul, 2022 1 commit
    • Z
      ice: Remove pci_aer_clear_nonfatal_status() call · ca415ea1
      Zhuo Chen committed
      After commit 62b36c3e ("PCI/AER: Remove
      pci_cleanup_aer_uncorrect_error_status() calls"), calls to
      pci_cleanup_aer_uncorrect_error_status() have already been removed. But in
      commit 5995b6d0 ("ice: Implement pci_error_handler ops")
      pci_cleanup_aer_uncorrect_error_status() was used again, so remove it in
      this patch.
      Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Sen Wang <wangsen.harry@bytedance.com>
      Cc: Wenliang Wang <wangwenliang.1995@bytedance.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  23. 13 Jul, 2022 1 commit
  24. 15 Jun, 2022 1 commit
  25. 18 May, 2022 1 commit
    • P
      ice: fix possible under reporting of ethtool Tx and Rx statistics · 31b6298f
      Paul Greenwalt committed
      The hardware statistics counters are not cleared during resets, so the
      driver's first access initializes the baseline and subsequent
      reads report the counters. The statistics counters are read
      during the watchdog subtask when the interface is up. If the baseline
      is not initialized before the interface is up, then there can be a brief
      window in which some traffic can be transmitted/received before the
      initial baseline reading takes place.
      
      Directly initialize ethtool statistics in driver open so the baseline will
      be initialized when the interface is up, and any dropped packets
      incremented before the interface is up won't be reported.
      
      Fixes: 28dc1b86 ("ice: ignore dropped packets during init")
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  26. 09 May, 2022 1 commit
  27. 07 May, 2022 1 commit
    • I
      ice: Fix race during aux device (un)plugging · 486b9eee
      Ivan Vecera committed
      Function ice_plug_aux_dev() assigns the pf->adev field too early, prior to
      aux device initialization, and on the other side ice_unplug_aux_dev()
      starts aux device deinit and only at the end assigns NULL to pf->adev.
      This is wrong because pf->adev should be non-NULL only when the
      aux device is fully initialized and ready. This wrong order causes
      a crash when an ice_send_event_to_aux() call occurs, because that function
      depends on a non-NULL value of pf->adev and does not expect that the
      aux device is half-initialized or half-destroyed.
      After correcting the order the race window is tiny but still there,
      as Leon mentioned, so manipulation of pf->adev needs to be protected
      by a mutex.
      
      Fix the (un-)plugging functions so that the pf->adev field is set after aux
      device init and cleared prior to aux device destroy, and protect the
      pf->adev assignment with a new mutex. This mutex is also held during the
      ice_send_event_to_aux() call to ensure that the aux device is valid during
      that call. Note that the device lock used in ice_send_event_to_aux() needs
      to be kept to avoid a race with aux driver unload.
      
      Reproducer:
      cycle=1
      while :;do
              echo "#### Cycle: $cycle"
      
              ip link set ens7f0 mtu 9000
              ip link add bond0 type bond mode 1 miimon 100
              ip link set bond0 up
              ifenslave bond0 ens7f0
              ip link set bond0 mtu 9000
              ethtool -L ens7f0 combined 1
              ip link del bond0
              ip link set ens7f0 mtu 1500
              sleep 1
      
              let cycle++
      done
      
      In short, when the device is added to or removed from a bond, the aux
      device is unplugged/plugged. When the MTU of the device is changed, an
      event is sent to the aux device asynchronously. This can race with the
      (un)plugging operation, and because pf->adev is set too early (plug) or
      too late (unplug), the function ice_send_event_to_aux() can touch
      uninitialized or destroyed fields. In the crash below, that field is
      pf->adev->dev.mutex.
      
      Crash:
      [   53.372066] bond0: (slave ens7f0): making interface the new active one
      [   53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an u
      p link
      [   53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
      [   53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up
       link
      [   54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed inval
      idating tc mappings. Priority traffic classification disabled!
      [   54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed inval
      idating tc mappings. Priority traffic classification disabled!
      [   54.248204] bond0: (slave ens7f0): Releasing backup interface
      [   54.253955] bond0: (slave ens7f1): making interface the new active one
      [   54.274875] bond0: (slave ens7f1): Releasing backup interface
      [   54.289153] bond0 (unregistering): Released all slaves
      [   55.383179] MII link monitoring set to 100 ms
      [   55.398696] bond0: (slave ens7f0): making interface the new active one
      [   55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
      [   55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an u
      p link
      [   55.412198] #PF: supervisor write access in kernel mode
      [   55.412200] #PF: error_code(0x0002) - not-present page
      [   55.412201] PGD 25d2ad067 P4D 0
      [   55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
      [   55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S
                 5.17.0-13579-g57f2d6540f03 #1
      [   55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up
       link
      [   55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/
      2021
      [   55.430226] Workqueue: ice ice_service_task [ice]
      [   55.468169] RIP: 0010:mutex_unlock+0x10/0x20
      [   55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
      [   55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
      [   55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
      [   55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
      [   55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
      [   55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
      [   55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
      [   55.532076] FS:  0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
      [   55.540163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
      [   55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   55.567305] PKRU: 55555554
      [   55.570018] Call Trace:
      [   55.572474]  <TASK>
      [   55.574579]  ice_service_task+0xaab/0xef0 [ice]
      [   55.579130]  process_one_work+0x1c5/0x390
      [   55.583141]  ? process_one_work+0x390/0x390
      [   55.587326]  worker_thread+0x30/0x360
      [   55.590994]  ? process_one_work+0x390/0x390
      [   55.595180]  kthread+0xe6/0x110
      [   55.598325]  ? kthread_complete_and_exit+0x20/0x20
      [   55.603116]  ret_from_fork+0x1f/0x30
      [   55.606698]  </TASK>
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
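The ordering and locking described in this fix can be sketched in userspace with a pthread mutex standing in for the new kernel mutex. All `model_*` names are hypothetical, and the sketch omits the separate device lock that the commit says must be kept.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct model_adev {
	bool initialized;
	int events;
};

struct model_pf {
	pthread_mutex_t adev_mutex;  /* protects the adev pointer */
	struct model_adev *adev;     /* non-NULL only while the device is usable */
	struct model_adev storage;
};

static void model_plug_aux_dev(struct model_pf *pf)
{
	/* Fully initialize first ... */
	pf->storage.events = 0;
	pf->storage.initialized = true;

	/* ... and only then publish the pointer. */
	pthread_mutex_lock(&pf->adev_mutex);
	pf->adev = &pf->storage;
	pthread_mutex_unlock(&pf->adev_mutex);
}

static void model_unplug_aux_dev(struct model_pf *pf)
{
	struct model_adev *adev;

	/* Unpublish the pointer before tearing anything down. */
	pthread_mutex_lock(&pf->adev_mutex);
	adev = pf->adev;
	pf->adev = NULL;
	pthread_mutex_unlock(&pf->adev_mutex);

	if (adev)
		adev->initialized = false;
}

/* Returns true if the event reached a fully initialized device. */
static bool model_send_event_to_aux(struct model_pf *pf)
{
	bool delivered = false;

	pthread_mutex_lock(&pf->adev_mutex);
	if (pf->adev && pf->adev->initialized) {
		pf->adev->events++;
		delivered = true;
	}
	pthread_mutex_unlock(&pf->adev_mutex);
	return delivered;
}
```

Because the sender holds the same mutex for the whole call, it can only ever observe the pointer as NULL or as pointing to a fully initialized device.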
  28. 06 May, 2022 1 commit
  29. 27 Apr, 2022 1 commit
    • P
      ice: wait 5 s for EMP reset after firmware flash · b537752e
      Petr Oros committed
      We need to wait 5 s for the EMP reset after a firmware flash. The code was
      extracted from the OOT driver (ice v1.8.3 downloaded from sourceforge).
      Without this wait, fw_activate leaves the card in an inconsistent state
      that is recoverable only by a second flash/activate. Flash was tested on
      these firmware versions:
      From -> To
       3.00 -> 3.10/3.20
       3.10 -> 3.00/3.20
       3.20 -> 3.00/3.10
      
      Reproducer:
      [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin
      Preparing to flash
      [fw.mgmt] Erasing
      [fw.mgmt] Erasing done
      [fw.mgmt] Flashing 100%
      [fw.mgmt] Flashing done 100%
      [fw.undi] Erasing
      [fw.undi] Erasing done
      [fw.undi] Flashing 100%
      [fw.undi] Flashing done 100%
      [fw.netlist] Erasing
      [fw.netlist] Erasing done
      [fw.netlist] Flashing 100%
      [fw.netlist] Flashing done 100%
      Activate new firmware by devlink reload
      [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate
      reload_actions_performed:
          fw_activate
      [root@host ~]# ip link show ens7f0
      71: ens7f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
          link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff
          altname enp202s0f0
      
      dmesg after flash:
      [   55.120788] ice: Copyright (c) 2018, Intel Corporation.
      [   55.274734] ice 0000:ca:00.0: Get PHY capabilities failed status = -5, continuing anyway
      [   55.569797] ice 0000:ca:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.28.0
      [   55.603629] ice 0000:ca:00.0: Get PHY capability failed.
      [   55.608951] ice 0000:ca:00.0: ice_init_nvm_phy_type failed: -5
      [   55.647348] ice 0000:ca:00.0: PTP init successful
      [   55.675536] ice 0000:ca:00.0: DCB is enabled in the hardware, max number of TCs supported on this port are 8
      [   55.685365] ice 0000:ca:00.0: FW LLDP is disabled, DCBx/LLDP in SW mode.
      [   55.692179] ice 0000:ca:00.0: Commit DCB Configuration to the hardware
      [   55.701382] ice 0000:ca:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:c9:02.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
      A reboot doesn't help; only a second flash/activate with the OOT or
      patched driver puts the card back in a consistent state.
      
      After patch:
      [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin
      Preparing to flash
      [fw.mgmt] Erasing
      [fw.mgmt] Erasing done
      [fw.mgmt] Flashing 100%
      [fw.mgmt] Flashing done 100%
      [fw.undi] Erasing
      [fw.undi] Erasing done
      [fw.undi] Flashing 100%
      [fw.undi] Flashing done 100%
      [fw.netlist] Erasing
      [fw.netlist] Erasing done
      [fw.netlist] Flashing 100%
      [fw.netlist] Flashing done 100%
      Activate new firmware by devlink reload
      [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate
      reload_actions_performed:
          fw_activate
      [root@host ~]# ip link show ens7f0
      19: ens7f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
          link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff
          altname enp202s0f0
      
      Fixes: 399e27db ("ice: support immediate firmware activation via devlink reload")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>