1. 24 Nov, 2022 2 commits
    • ice: Accumulate ring statistics over reset · 288ecf49
      Benjamin Mikailenko authored
      Resets may occur with or without user interaction. For example, a TX hang
      or reconfiguration of parameters will result in a reset. During reset, the
      VSI is freed, freeing any statistics structures inside as well. This
      creates an issue for the user: a reset happens in the background, the
      statistics are set to zero, and the user then checks ring statistics
      expecting them to be populated.
      
      To ensure this doesn't happen, accumulate ring statistics over reset.
      
      Define a new ring statistics structure, ice_ring_stats. The new structure
      lives in the VSI's parent, preserving ring statistics when VSI is freed.
      
      1. Define a new structure vsi_ring_stats in the PF scope
      2. Allocate/free stats only during probe, unload, or change in ring size
      3. Replace previous ring statistics functionality with new structure
      Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
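The ownership change described in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: the types `pf`, `vsi`, `ring` and `ring_stats` and all function names below are hypothetical stand-ins, not the driver's definitions. The point it shows is that statistics owned by the parent survive a free/rebuild cycle of the child.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not the ice driver's structures. */
struct ring_stats { uint64_t pkts; uint64_t bytes; };

struct ring { struct ring_stats *stats; /* borrowed from the parent */ };

struct vsi { struct ring *rings; unsigned int num_rings; };

struct pf {
	struct ring_stats *ring_stats; /* owned by the parent, outlives the VSI */
	unsigned int num_rings;
	struct vsi *vsi;
};

/* Probe: allocate the long-lived statistics exactly once. */
static int pf_probe(struct pf *pf, unsigned int num_rings)
{
	pf->ring_stats = calloc(num_rings, sizeof(*pf->ring_stats));
	if (!pf->ring_stats)
		return -1;
	pf->num_rings = num_rings;
	pf->vsi = NULL;
	return 0;
}

/* Build (or rebuild after a reset) the VSI and attach the parent's stats. */
static int vsi_setup(struct pf *pf)
{
	struct vsi *vsi = calloc(1, sizeof(*vsi));
	unsigned int i;

	if (!vsi)
		return -1;
	vsi->rings = calloc(pf->num_rings, sizeof(*vsi->rings));
	if (!vsi->rings) {
		free(vsi);
		return -1;
	}
	vsi->num_rings = pf->num_rings;
	for (i = 0; i < pf->num_rings; i++)
		vsi->rings[i].stats = &pf->ring_stats[i];
	pf->vsi = vsi;
	return 0;
}

/* Reset path: the VSI goes away, the statistics do not. */
static void vsi_free(struct pf *pf)
{
	if (!pf->vsi)
		return;
	free(pf->vsi->rings);
	free(pf->vsi);
	pf->vsi = NULL;
}

/* Unload: only here are the statistics released. */
static void pf_remove(struct pf *pf)
{
	vsi_free(pf);
	free(pf->ring_stats);
	pf->ring_stats = NULL;
}
```

Calling `vsi_free()` followed by `vsi_setup()` models a reset: counters written through `rings[i].stats` before the reset are still visible afterwards.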
    • ice: Accumulate HW and Netdev statistics over reset · 2fd5e433
      Benjamin Mikailenko authored
      Resets happen with or without user interaction. For example, incidents
      such as a TX hang or a reconfiguration of parameters will result in a
      reset. During reset, hardware and software statistics were set to zero.
      This created an issue for the user: a reset happens in the background,
      the statistics are set to zero, and the user then checks statistics
      expecting them to be populated.
      
      To ensure this doesn't happen, keep accumulating stats over reset.
      
      1. Remove function calls which reset hardware and netdev statistics.
      2. Do not roll over statistics in ice_stat_update40 during reset.
      Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
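As a rough illustration of accumulating a 40-bit hardware counter across resets, the sketch below keeps a software total plus a baseline of the last raw value. It is not the body of ice_stat_update40: the struct `stat40`, its fields and both functions are hypothetical, and the assumption that a reset should re-capture the baseline while leaving the total untouched is this sketch's reading of the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define COUNTER_BITS 40
#define COUNTER_MASK ((UINT64_C(1) << COUNTER_BITS) - 1)

/* Hypothetical software view of one 40-bit hardware counter. */
struct stat40 {
	bool loaded;     /* a baseline has been captured */
	uint64_t prev;   /* last raw register value seen */
	uint64_t total;  /* accumulated value reported to the user */
};

/* Fold a new raw register reading into the accumulated total. */
static void stat40_update(struct stat40 *s, uint64_t raw)
{
	raw &= COUNTER_MASK;
	if (!s->loaded) {
		/* First read after load or reset: only capture the baseline. */
		s->prev = raw;
		s->loaded = true;
		return;
	}
	/* Subtraction modulo 2^40 also covers a wrap of the register. */
	s->total += (raw - s->prev) & COUNTER_MASK;
	s->prev = raw;
}

/* Reset: forget the baseline but keep what was already accumulated. */
static void stat40_reset(struct stat40 *s)
{
	s->loaded = false;
}
```

With this shape, a background reset costs at most the traffic counted between the last update and the reset; the value the user reads never drops back to zero.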
  2. 18 Nov, 2022 1 commit
  3. 04 Nov, 2022 2 commits
  4. 29 Oct, 2022 1 commit
  5. 25 Oct, 2022 1 commit
  6. 29 Sep, 2022 1 commit
  7. 27 Sep, 2022 1 commit
  8. 21 Sep, 2022 3 commits
  9. 09 Sep, 2022 1 commit
    • ice: Don't double unplug aux on peer initiated reset · 23c61919
      Dave Ertman authored
      In the IDC callback that is accessed when the aux drivers request a reset,
      the function to unplug the aux devices is called.  This function is also
      called in the ice_prepare_for_reset function. This double call is causing
      a "scheduling while atomic" BUG.
      
      [  662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code = 0xffff min_err_code = 0x8003
      [  662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8] status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003
      [  662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr = 0x10000003
      [  662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error, GLPE_CRITERR=0x00011424
      [  662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset
      [  662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002
      [  662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helper ttm
      [  662.815546]  nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse
      [  662.815557] Preemption disabled at:
      [  662.815558] [<0000000000000000>] 0x0
      [  662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S         OE     5.17.1 #2
      [  662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021
      [  662.815568] Call Trace:
      [  662.815572]  <IRQ>
      [  662.815574]  dump_stack_lvl+0x33/0x42
      [  662.815581]  __schedule_bug.cold.147+0x7d/0x8a
      [  662.815588]  __schedule+0x798/0x990
      [  662.815595]  schedule+0x44/0xc0
      [  662.815597]  schedule_preempt_disabled+0x14/0x20
      [  662.815600]  __mutex_lock.isra.11+0x46c/0x490
      [  662.815603]  ? __ibdev_printk+0x76/0xc0 [ib_core]
      [  662.815633]  device_del+0x37/0x3d0
      [  662.815639]  ice_unplug_aux_dev+0x1a/0x40 [ice]
      [  662.815674]  ice_schedule_reset+0x3c/0xd0 [ice]
      [  662.815693]  irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma]
      [  662.815712]  ? bitmap_find_next_zero_area_off+0x45/0xa0
      [  662.815719]  ice_send_event_to_aux+0x54/0x70 [ice]
      [  662.815741]  ice_misc_intr+0x21d/0x2d0 [ice]
      [  662.815756]  __handle_irq_event_percpu+0x4c/0x180
      [  662.815762]  handle_irq_event_percpu+0xf/0x40
      [  662.815764]  handle_irq_event+0x34/0x60
      [  662.815766]  handle_edge_irq+0x9a/0x1c0
      [  662.815770]  __common_interrupt+0x62/0x100
      [  662.815774]  common_interrupt+0xb4/0xd0
      [  662.815779]  </IRQ>
      [  662.815780]  <TASK>
      [  662.815780]  asm_common_interrupt+0x1e/0x40
      [  662.815785] RIP: 0010:cpuidle_enter_state+0xd6/0x380
      [  662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
      [  662.815791] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202
      [  662.815793] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f
      [  662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7
      [  662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0
      [  662.815797] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08
      [  662.815798] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000
      [  662.815801]  cpuidle_enter+0x29/0x40
      [  662.815803]  do_idle+0x261/0x2b0
      [  662.815807]  cpu_startup_entry+0x19/0x20
      [  662.815809]  start_secondary+0x114/0x150
      [  662.815813]  secondary_startup_64_no_verify+0xd5/0xdb
      [  662.815818]  </TASK>
      [  662.815846] bad: scheduling from the idle thread!
      [  662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S      W  OE     5.17.1 #2
      [  662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS SE5C6301.86B.6624.D18.2111021741 11/02/2021
      [  662.815853] Call Trace:
      [  662.815855]  <IRQ>
      [  662.815856]  dump_stack_lvl+0x33/0x42
      [  662.815860]  dequeue_task_idle+0x20/0x30
      [  662.815863]  __schedule+0x1c3/0x990
      [  662.815868]  schedule+0x44/0xc0
      [  662.815871]  schedule_preempt_disabled+0x14/0x20
      [  662.815873]  __mutex_lock.isra.11+0x3a8/0x490
      [  662.815876]  ? __ibdev_printk+0x76/0xc0 [ib_core]
      [  662.815904]  device_del+0x37/0x3d0
      [  662.815909]  ice_unplug_aux_dev+0x1a/0x40 [ice]
      [  662.815937]  ice_schedule_reset+0x3c/0xd0 [ice]
      [  662.815961]  irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma]
      [  662.815979]  ? bitmap_find_next_zero_area_off+0x45/0xa0
      [  662.815985]  ice_send_event_to_aux+0x54/0x70 [ice]
      [  662.816011]  ice_misc_intr+0x21d/0x2d0 [ice]
      [  662.816033]  __handle_irq_event_percpu+0x4c/0x180
      [  662.816037]  handle_irq_event_percpu+0xf/0x40
      [  662.816039]  handle_irq_event+0x34/0x60
      [  662.816042]  handle_edge_irq+0x9a/0x1c0
      [  662.816045]  __common_interrupt+0x62/0x100
      [  662.816048]  common_interrupt+0xb4/0xd0
      [  662.816052]  </IRQ>
      [  662.816053]  <TASK>
      [  662.816054]  asm_common_interrupt+0x1e/0x40
      [  662.816057] RIP: 0010:cpuidle_enter_state+0xd6/0x380
      [  662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49
      [  662.816063] RSP: 0018:ff2c2c4f18edbe80 EFLAGS: 00000202
      [  662.816065] RAX: ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f
      [  662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI: ffffffff93f53ee7
      [  662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000 R09: 000000000002f8c0
      [  662.816070] R10: 0000010c3f18e2cf R11: 000000000000000f R12: 0000009a52da2d08
      [  662.816071] R13: ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000
      [  662.816075]  cpuidle_enter+0x29/0x40
      [  662.816077]  do_idle+0x261/0x2b0
      [  662.816080]  cpu_startup_entry+0x19/0x20
      [  662.816083]  start_secondary+0x114/0x150
      [  662.816087]  secondary_startup_64_no_verify+0xd5/0xdb
      [  662.816091]  </TASK>
      [  662.816169] bad: scheduling from the idle thread!
      
      The correct place to unplug the aux devices for a reset is in the
      ice_prepare_for_reset function, as this is a common place for all reset
      flows. It also has built-in protection against being called twice in a
      single reset instance before the aux devices are replugged.
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
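The idea of doing the unplug only in a common, guarded prepare step can be modelled as follows. This is a sketch under assumptions, not the driver's code: `pf_state`, its fields and the four functions are hypothetical names, and the boolean guard merely stands in for whatever protection the real prepare function uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the PF state; not the driver's types. */
struct pf_state {
	bool reset_requested;     /* set from the (atomic) event callback */
	bool prepared_for_reset;  /* guards against a second unplug */
	bool aux_plugged;
	int unplug_calls;         /* counts real unplug operations */
};

/* May sleep in a real driver, so it must not run in interrupt context. */
static void unplug_aux(struct pf_state *pf)
{
	pf->aux_plugged = false;
	pf->unplug_calls++;
}

/* Called from the peer's event callback: only record the request. */
static void schedule_reset(struct pf_state *pf)
{
	pf->reset_requested = true;
}

/* Common entry point for every reset flow, run from process context. */
static void prepare_for_reset(struct pf_state *pf)
{
	if (pf->prepared_for_reset)
		return; /* already prepared in this reset instance */
	unplug_aux(pf);
	pf->prepared_for_reset = true;
}

/* After the reset completes, replug and re-arm the guard. */
static void rebuild(struct pf_state *pf)
{
	pf->aux_plugged = true;
	pf->prepared_for_reset = false;
	pf->reset_requested = false;
}
```

In this model the event callback never touches the aux device, so nothing that may sleep runs in atomic context, and repeated calls to `prepare_for_reset()` perform a single unplug.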
  10. 07 Sep, 2022 1 commit
    • ice: Allow operation with reduced device MSI-X · ce462613
      Tony Nguyen authored
      The driver currently takes an all-or-nothing approach to device MSI-X
      vectors: if it does not get its full allocation, it fails and does not
      load. There is no reason it can't work with a reduced number of MSI-X
      vectors. Take a similar approach as commit 741106f7 ("ice: Improve
      MSI-X fallback logic") and, instead, adjust the MSI-X request to make use
      of what is available.
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Tested-by: Petr Oros <poros@redhat.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
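A minimal sketch of the "use what is available" idea, as opposed to failing on a short allocation, might look like the following. The split into one miscellaneous vector plus LAN queue vectors, the minimum of one LAN vector, and the names `msix_plan` and `plan_msix` are all assumptions made for illustration; the driver's actual distribution logic is not reproduced here.

```c
#include <assert.h>

/* Hypothetical result of planning the vector distribution. */
struct msix_plan {
	int misc; /* vectors for non-queue interrupts */
	int lan;  /* vectors for LAN queue pairs */
};

/*
 * Fit the request into the vectors that are actually available.
 * wanted_lan is assumed to be at least 1.
 * Returns 0 on success, -1 only if even the bare minimum does not fit.
 */
static int plan_msix(int wanted_lan, int available, struct msix_plan *plan)
{
	const int misc = 1;    /* assumed fixed requirement */
	const int min_lan = 1; /* assumed minimum to pass traffic */

	if (available < misc + min_lan)
		return -1;

	plan->misc = misc;
	plan->lan = wanted_lan;
	if (plan->lan > available - misc)
		plan->lan = available - misc; /* shrink instead of failing */
	return 0;
}
```

For example, asking for 64 LAN vectors when only 17 vectors exist yields 1 miscellaneous and 16 LAN vectors rather than a load failure.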
  11. 02 Sep, 2022 2 commits
  12. 22 Aug, 2022 1 commit
    • ice: xsk: use Rx ring's XDP ring when picking NAPI context · 9ead7e74
      Maciej Fijalkowski authored
      The ice driver allocates per-CPU XDP queues so that the redirect path can
      safely use smp_processor_id() as an index into the array. At the same
      time, though, XDP rings are used to pick the NAPI context on which to call
      napi_schedule() or set NAPIF_STATE_MISSED. When the user reduces the queue
      count, say to 8, and num_possible_cpus() of the underlying platform is 44,
      queue vectors with correlated NAPI contexts will carry several XDP queues.
      
      This in turn can result in broken behavior where the NAPI context of
      interest is never scheduled and the AF_XDP socket does not process any
      traffic.
      
      To fix this, change how XDP rings are assigned to Rx rings and use this
      information later on when setting the ice_tx_ring::xsk_pool pointer. For
      each Rx ring, grab the associated queue vector and walk through its Tx
      rings' linked list. Once an XDP ring is found in it, assign this ring to
      ice_rx_ring::xdp_ring.
      
      Previous [0] approach of fixing this issue was for txonly scenario
      because of the described grouping of XDP rings across queue vectors. So,
      relying on Rx ring meant that NAPI context could be scheduled with a
      queue vector without XDP ring with associated XSK pool.
      
      [0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
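The ring assignment described in this commit (for each Rx ring, walk the Tx ring list of its queue vector and take the first XDP ring) can be sketched as below. The structures are hypothetical reductions with only the fields needed for the walk, and `map_xdp_rings` is an illustrative name, not a driver function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the ring and vector structures. */
struct tx_ring {
	int id;
	bool is_xdp;          /* true for rings used as XDP Tx queues */
	struct tx_ring *next; /* next Tx ring on the same queue vector */
};

struct q_vector {
	struct tx_ring *tx_head; /* linked list of Tx rings */
};

struct rx_ring {
	struct q_vector *q_vector;
	struct tx_ring *xdp_ring; /* result of the mapping */
};

/* For each Rx ring, pick the first XDP ring found on its queue vector. */
static void map_xdp_rings(struct rx_ring *rx, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		struct tx_ring *t;

		rx[i].xdp_ring = NULL;
		if (!rx[i].q_vector)
			continue;
		for (t = rx[i].q_vector->tx_head; t; t = t->next) {
			if (t->is_xdp) {
				rx[i].xdp_ring = t;
				break;
			}
		}
	}
}
```

Because the XDP ring is found through the Rx ring's own queue vector, whatever is later scheduled through that ring belongs to the same vector that services the Rx ring.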
  13. 18 Aug, 2022 4 commits
  14. 02 Aug, 2022 1 commit
  15. 29 Jul, 2022 3 commits
  16. 27 Jul, 2022 2 commits
  17. 16 Jul, 2022 1 commit
    • ice: Remove pci_aer_clear_nonfatal_status() call · ca415ea1
      Zhuo Chen authored
      After commit 62b36c3e ("PCI/AER: Remove
      pci_cleanup_aer_uncorrect_error_status() calls"), calls to
      pci_cleanup_aer_uncorrect_error_status() have already been removed. But in
      commit 5995b6d0 ("ice: Implement pci_error_handler ops")
      pci_cleanup_aer_uncorrect_error_status() was used again, so remove it in
      this patch.
      Signed-off-by: Zhuo Chen <chenzhuo.1@bytedance.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Sen Wang <wangsen.harry@bytedance.com>
      Cc: Wenliang Wang <wangwenliang.1995@bytedance.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  18. 13 Jul, 2022 1 commit
  19. 15 Jun, 2022 1 commit
  20. 18 May, 2022 1 commit
    • ice: fix possible under reporting of ethtool Tx and Rx statistics · 31b6298f
      Paul Greenwalt authored
      The hardware statistics counters are not cleared during resets, so the
      driver's first access initializes the baseline and subsequent reads
      report the counters. The statistics counters are read during the
      watchdog subtask when the interface is up. If the baseline is not
      initialized before the interface is up, there can be a brief window in
      which some traffic is transmitted/received before the initial baseline
      reading takes place.
      
      Directly initialize ethtool statistics in driver open so the baseline will
      be initialized when the interface is up, and any dropped packets
      incremented before the interface is up won't be reported.
      
      Fixes: 28dc1b86 ("ice: ignore dropped packets during init")
      Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  21. 09 May, 2022 1 commit
  22. 07 May, 2022 1 commit
    • ice: Fix race during aux device (un)plugging · 486b9eee
      Ivan Vecera authored
      Function ice_plug_aux_dev() assigns the pf->adev field too early, prior
      to aux device initialization, and on the other side ice_unplug_aux_dev()
      starts aux device deinit and only at the end assigns NULL to pf->adev.
      This is wrong because pf->adev should be non-NULL only when the
      aux device is fully initialized and ready. This wrong order causes
      a crash when an ice_send_event_to_aux() call occurs, because that function
      depends on a non-NULL value of pf->adev and does not assume that the
      aux device is half-initialized or half-destroyed.
      After order correction the race window is tiny but it is still there,
      as Leon mentioned, and manipulation of pf->adev needs to be protected
      by a mutex.
      
      Fix the (un-)plugging functions so the pf->adev field is set after aux
      device init and cleared prior to aux device destroy, and protect the
      pf->adev assignment with a new mutex. This mutex is also held during the
      ice_send_event_to_aux() call to ensure that the aux device is valid
      during that call.
      Note that the device lock used in ice_send_event_to_aux() needs to be
      kept to avoid a race with aux drv unload.
      
      Reproducer:
      cycle=1
      while :;do
              echo "#### Cycle: $cycle"
      
              ip link set ens7f0 mtu 9000
              ip link add bond0 type bond mode 1 miimon 100
              ip link set bond0 up
              ifenslave bond0 ens7f0
              ip link set bond0 mtu 9000
              ethtool -L ens7f0 combined 1
              ip link del bond0
              ip link set ens7f0 mtu 1500
              sleep 1
      
              let cycle++
      done
      
      In short, when the device is added to or removed from a bond, the aux
      device is unplugged/plugged. When the MTU of the device is changed, an
      event is sent to the aux device asynchronously. This can race with the
      (un)plugging operation, and because pf->adev is set too early (plug) or
      too late (unplug), the function ice_send_event_to_aux() can touch
      uninitialized or destroyed fields; in the case of the crash below,
      pf->adev->dev.mutex.
      
      Crash:
      [   53.372066] bond0: (slave ens7f0): making interface the new active one
      [   53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an up link
      [   53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
      [   53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
      [   54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
      [   54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
      [   54.248204] bond0: (slave ens7f0): Releasing backup interface
      [   54.253955] bond0: (slave ens7f1): making interface the new active one
      [   54.274875] bond0: (slave ens7f1): Releasing backup interface
      [   54.289153] bond0 (unregistering): Released all slaves
      [   55.383179] MII link monitoring set to 100 ms
      [   55.398696] bond0: (slave ens7f0): making interface the new active one
      [   55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
      [   55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an up link
      [   55.412198] #PF: supervisor write access in kernel mode
      [   55.412200] #PF: error_code(0x0002) - not-present page
      [   55.412201] PGD 25d2ad067 P4D 0
      [   55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
      [   55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S           5.17.0-13579-g57f2d6540f03 #1
      [   55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
      [   55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/2021
      [   55.430226] Workqueue: ice ice_service_task [ice]
      [   55.468169] RIP: 0010:mutex_unlock+0x10/0x20
      [   55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 <f0> 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
      [   55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
      [   55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
      [   55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
      [   55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
      [   55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
      [   55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
      [   55.532076] FS:  0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
      [   55.540163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
      [   55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   55.567305] PKRU: 55555554
      [   55.570018] Call Trace:
      [   55.572474]  <TASK>
      [   55.574579]  ice_service_task+0xaab/0xef0 [ice]
      [   55.579130]  process_one_work+0x1c5/0x390
      [   55.583141]  ? process_one_work+0x390/0x390
      [   55.587326]  worker_thread+0x30/0x360
      [   55.590994]  ? process_one_work+0x390/0x390
      [   55.595180]  kthread+0xe6/0x110
      [   55.598325]  ? kthread_complete_and_exit+0x20/0x20
      [   55.603116]  ret_from_fork+0x1f/0x30
      [   55.606698]  </TASK>
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
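The ordering rule from this commit (publish the pointer only after init, unpublish before teardown, and hold a mutex around every access) can be sketched in userspace with a pthread mutex. Everything here is a hypothetical model: `aux_dev`, `pf`, and the functions `plug`, `unplug` and `send_event` only mimic the shape of the fix and are not the driver's implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the auxiliary device. */
struct aux_dev {
	int ready;  /* 1 once fully initialized */
	int events; /* events delivered to the device */
};

/* Hypothetical stand-in for the PF private data. */
struct pf {
	pthread_mutex_t adev_mutex; /* protects the adev pointer */
	struct aux_dev *adev;       /* non-NULL only while the device is usable */
};

static void pf_init(struct pf *pf)
{
	pthread_mutex_init(&pf->adev_mutex, NULL);
	pf->adev = NULL;
}

/* Plug: finish initialization first, then publish the pointer. */
static void plug(struct pf *pf, struct aux_dev *adev)
{
	adev->ready = 1;
	adev->events = 0;

	pthread_mutex_lock(&pf->adev_mutex);
	pf->adev = adev;
	pthread_mutex_unlock(&pf->adev_mutex);
}

/* Unplug: unpublish the pointer first, then tear the device down. */
static struct aux_dev *unplug(struct pf *pf)
{
	struct aux_dev *adev;

	pthread_mutex_lock(&pf->adev_mutex);
	adev = pf->adev;
	pf->adev = NULL;
	pthread_mutex_unlock(&pf->adev_mutex);

	if (adev)
		adev->ready = 0;
	return adev;
}

/* Event path: the device cannot disappear while the mutex is held. */
static int send_event(struct pf *pf)
{
	int delivered = 0;

	pthread_mutex_lock(&pf->adev_mutex);
	if (pf->adev) {
		pf->adev->events++;
		delivered = 1;
	}
	pthread_mutex_unlock(&pf->adev_mutex);
	return delivered;
}
```

An event sender therefore sees either no device at all or a fully initialized one, never a half-built or half-destroyed object.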
  23. 06 May, 2022 1 commit
  24. 27 Apr, 2022 1 commit
    • ice: wait 5 s for EMP reset after firmware flash · b537752e
      Petr Oros authored
      We need to wait 5 s for EMP reset after firmware flash. Code was extracted
      from the OOT driver (ice v1.8.3 downloaded from sourceforge). Without this
      wait, fw_activate leaves the card in an inconsistent state that is
      recoverable only by a second flash/activate. Flash was tested on these
      firmware versions:
      From -> To
       3.00 -> 3.10/3.20
       3.10 -> 3.00/3.20
       3.20 -> 3.00/3.10
      
      Reproducer:
      [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin
      Preparing to flash
      [fw.mgmt] Erasing
      [fw.mgmt] Erasing done
      [fw.mgmt] Flashing 100%
      [fw.mgmt] Flashing done 100%
      [fw.undi] Erasing
      [fw.undi] Erasing done
      [fw.undi] Flashing 100%
      [fw.undi] Flashing done 100%
      [fw.netlist] Erasing
      [fw.netlist] Erasing done
      [fw.netlist] Flashing 100%
      [fw.netlist] Flashing done 100%
      Activate new firmware by devlink reload
      [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate
      reload_actions_performed:
          fw_activate
      [root@host ~]# ip link show ens7f0
      71: ens7f0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
          link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff
          altname enp202s0f0
      
      dmesg after flash:
      [   55.120788] ice: Copyright (c) 2018, Intel Corporation.
      [   55.274734] ice 0000:ca:00.0: Get PHY capabilities failed status = -5, continuing anyway
      [   55.569797] ice 0000:ca:00.0: The DDP package was successfully loaded: ICE OS Default Package version 1.3.28.0
      [   55.603629] ice 0000:ca:00.0: Get PHY capability failed.
      [   55.608951] ice 0000:ca:00.0: ice_init_nvm_phy_type failed: -5
      [   55.647348] ice 0000:ca:00.0: PTP init successful
      [   55.675536] ice 0000:ca:00.0: DCB is enabled in the hardware, max number of TCs supported on this port are 8
      [   55.685365] ice 0000:ca:00.0: FW LLDP is disabled, DCBx/LLDP in SW mode.
      [   55.692179] ice 0000:ca:00.0: Commit DCB Configuration to the hardware
      [   55.701382] ice 0000:ca:00.0: 126.024 Gb/s available PCIe bandwidth, limited by 16.0 GT/s PCIe x8 link at 0000:c9:02.0 (capable of 252.048 Gb/s with 16.0 GT/s PCIe x16 link)
      A reboot doesn't help; only a second flash/activate with the OOT or
      patched driver puts the card back in a consistent state.
      
      After patch:
      [root@host ~]# devlink dev flash pci/0000:ca:00.0 file E810_XXVDA4_FH_O_SEC_FW_1p6p1p9_NVM_3p10_PLDMoMCTP_0.11_8000AD7B.bin
      Preparing to flash
      [fw.mgmt] Erasing
      [fw.mgmt] Erasing done
      [fw.mgmt] Flashing 100%
      [fw.mgmt] Flashing done 100%
      [fw.undi] Erasing
      [fw.undi] Erasing done
      [fw.undi] Flashing 100%
      [fw.undi] Flashing done 100%
      [fw.netlist] Erasing
      [fw.netlist] Erasing done
      [fw.netlist] Flashing 100%
      [fw.netlist] Flashing done 100%
      Activate new firmware by devlink reload
      [root@host ~]# devlink dev reload pci/0000:ca:00.0 action fw_activate
      reload_actions_performed:
          fw_activate
      [root@host ~]# ip link show ens7f0
      19: ens7f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
          link/ether b4:96:91:dc:72:e0 brd ff:ff:ff:ff:ff:ff
          altname enp202s0f0
      
      Fixes: 399e27db ("ice: support immediate firmware activation via devlink reload")
      Signed-off-by: Petr Oros <poros@redhat.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  25. 13 Apr, 2022 1 commit
    • ice: Add mpls+tso support · 69e66c04
      Joe Damato authored
      Attempt to add mpls+tso support.
      
      I don't have ice hardware available to test myself, but I just implemented
      this feature in i40e and thought it might be useful to implement for ice
      while this is fresh in my brain.
      
      Hoping someone at Intel will be able to test this on my behalf.
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  26. 09 Apr, 2022 1 commit
    • ice: arfs: fix use-after-free when freeing @rx_cpu_rmap · d7442f51
      Alexander Lobakin authored
      The CI testing bots triggered the following splat:
      
      [  718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80
      [  718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834
      [  718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S      W IOE     5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1
      [  718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
      [  718.223418] Call Trace:
      [  718.227139]
      [  718.230783]  dump_stack_lvl+0x33/0x42
      [  718.234431]  print_address_description.constprop.9+0x21/0x170
      [  718.238177]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.241885]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.245539]  kasan_report.cold.18+0x7f/0x11b
      [  718.249197]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.252852]  free_irq_cpu_rmap+0x53/0x80
      [  718.256471]  ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice]
      [  718.260174]  ice_remove_arfs+0x5f/0x70 [ice]
      [  718.263810]  ice_rebuild_arfs+0x3b/0x70 [ice]
      [  718.267419]  ice_rebuild+0x39c/0xb60 [ice]
      [  718.270974]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [  718.274472]  ? ice_init_phy_user_cfg+0x360/0x360 [ice]
      [  718.278033]  ? delay_tsc+0x4a/0xb0
      [  718.281513]  ? preempt_count_sub+0x14/0xc0
      [  718.284984]  ? delay_tsc+0x8f/0xb0
      [  718.288463]  ice_do_reset+0x92/0xf0 [ice]
      [  718.292014]  ice_pci_err_resume+0x91/0xf0 [ice]
      [  718.295561]  pci_reset_function+0x53/0x80
      <...>
      [  718.393035] Allocated by task 690:
      [  718.433497] Freed by task 20834:
      [  718.495688] Last potentially related work creation:
      [  718.568966] The buggy address belongs to the object at ffff8881bd127e00
                      which belongs to the cache kmalloc-96 of size 96
      [  718.574085] The buggy address is located 0 bytes inside of
                      96-byte region [ffff8881bd127e00, ffff8881bd127e60)
      [  718.579265] The buggy address belongs to the page:
      [  718.598905] Memory state around the buggy address:
      [  718.601809]  ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  718.604796]  ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
      [  718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  718.610811]                    ^
      [  718.613819]  ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
      [  718.617107]  ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      
      This happens because free_irq_cpu_rmap() is always called
      *after* (devm_)free_irq() and thus tries to work with IRQ descs
      that are already freed. For example, on device reset the driver frees
      the rmap right before allocating a new one (the splat above).
      Make the rmap creation and freeing functions symmetrical with the
      {request,free}_irq() calls, i.e. do that on ifup/ifdown instead
      of device probe/remove/resume. These operations can be performed
      independently from the actual device aRFS configuration.
      Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers
      only when aRFS is disabled -- otherwise, CPU rmap sets and clears
      its own and they must not be touched manually.
      
      Fixes: 28bf2672 ("ice: Implement aRFS")
      Co-developed-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Tested-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  27. 06 Apr, 2022 2 commits
  28. 01 Apr, 2022 1 commit
    • ice: Fix broken IFF_ALLMULTI handling · 1273f895
      Ivan Vecera authored
      Handling of the all-multicast flag and the associated multicast
      promiscuous mode is broken in the ice driver. When a user switches the
      allmulticast flag on or off, the driver checks whether any VLANs are
      configured over the interface (except default VLAN 0).
      
      If any extra VLANs are registered it enables multicast promiscuous
      mode for all these VLANs (including default VLAN 0) using
      ICE_SW_LKUP_PROMISC_VLAN look-up type. In this situation all
      multicast packets tagged with known VLAN ID or untagged are received
      and multicast packets tagged with unknown VLAN ID ignored.
      
      If no extra VLANs are registered (so only VLAN 0 exists) it enables
      multicast promiscuous mode for VLAN 0 and uses ICE_SW_LKUP_PROMISC
      look-up type. In this situation any multicast packets including
      tagged ones are received.
      
      The driver handles IFF_ALLMULTI in ice_vsi_sync_fltr() this way:
      
      ice_vsi_sync_fltr() {
        ...
        if (changed_flags & IFF_ALLMULTI) {
          if (netdev->flags & IFF_ALLMULTI) {
            if (vsi->num_vlan > 1)
              ice_set_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
            else
              ice_set_promisc(..., ICE_MCAST_PROMISC_BITS);
          } else {
            if (vsi->num_vlan > 1)
              ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS);
            else
              ice_clear_promisc(..., ICE_MCAST_PROMISC_BITS);
          }
        }
        ...
      }
      
      The code above depends on the value of vsi->num_vlan, which specifies the
      number of VLANs configured over the interface (including VLAN 0). This
      is a problem because that value is modified in the NDO callbacks
      ice_vlan_rx_add_vid() and ice_vlan_rx_kill_vid().
      
      Scenario 1:
      1. ip link set ens7f0 allmulticast on
      2. ip link add vlan10 link ens7f0 type vlan id 10
      3. ip link set ens7f0 allmulticast off
      4. ip link set ens7f0 allmulticast on
      
      [1] In this scenario IFF_ALLMULTI is enabled and the driver calls
          ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) that installs
          multicast promisc rule with non-VLAN look-up type.
      [2] Then VLAN with ID 10 is added and vsi->num_vlan incremented to 2
      [3] Command switches IFF_ALLMULTI off and the driver calls
          ice_clear_promisc(..., ICE_MCAST_VLAN_PROMISC_BITS) but this
          call is effectively NOP because it looks for multicast promisc
          rules for VLAN 0 and VLAN 10 with VLAN look-up type but no such
          rules exist. So the all-multicast remains enabled silently
          in hardware.
      [4] Command tries to switch IFF_ALLMULTI on and the driver calls
          ice_set_promisc(..., ICE_MCAST_PROMISC_BITS) but this call
          fails (-EEXIST) because non-VLAN multicast promisc rule already
          exists.
      
      Scenario 2:
      1. ip link add vlan10 link ens7f0 type vlan id 10
      2. ip link set ens7f0 allmulticast on
      3. ip link add vlan20 link ens7f0 type vlan id 20
      4. ip link del vlan10 ; ip link del vlan20
      5. ip link set ens7f0 allmulticast off
      
      [1] VLAN with ID 10 is added and vsi->num_vlan==2
      [2] Command switches IFF_ALLMULTI on and driver installs multicast
          promisc rules with VLAN look-up type for VLAN 0 and 10
      [3] VLAN with ID 20 is added and vsi->num_vlan==3 but no multicast
          promisc rule is added for this new VLAN so the interface does
          not receive MC packets from VLAN 20
      [4] Both VLANs are removed but multicast rule for VLAN 10 remains
          installed so interface receives multicast packets from VLAN 10
      [5] Command switches IFF_ALLMULTI off and because vsi->num_vlan is 1
          the driver tries to remove multicast promisc rule for VLAN 0
          with non-VLAN look-up that does not exist.
          All-multicast looks disabled from user point of view but it
          is partially enabled in HW (interface receives all multicast
          packets either untagged or tagged with VLAN ID 10)
      
      To resolve these issues the patch introduces these changes:
      1. Adds handling for IFF_ALLMULTI to ice_vlan_rx_add_vid() and
         ice_vlan_rx_kill_vid() callbacks. So when VLAN is added/removed
         and IFF_ALLMULTI is enabled an appropriate multicast promisc
         rule for that VLAN ID is added/removed.
      2. In ice_vlan_rx_add_vid() when first VLAN besides VLAN 0 is added
         so (vsi->num_vlan == 2) and IFF_ALLMULTI is enabled then look-up
         type for existing multicast promisc rule for VLAN 0 is updated
         to ICE_MCAST_VLAN_PROMISC_BITS.
      3. In ice_vlan_rx_kill_vid() when last VLAN besides VLAN 0 is removed
         so (vsi->num_vlan == 1) and IFF_ALLMULTI is enabled then look-up
         type for existing multicast promisc rule for VLAN 0 is updated
         to ICE_MCAST_PROMISC_BITS.
      4. Both ice_vlan_rx_{add,kill}_vid() have to run under ICE_CFG_BUSY
         bit protection to avoid races with ice_vsi_sync_fltr() that runs
         in ice_service_task() context.
      5. Bit ICE_VSI_VLAN_FLTR_CHANGED is useless and can be removed.
      6. Error messages are added to the ice_fltr_*_vsi_promisc() helper
         functions so that their callers do not need them.
      7. Small improvements to increase readability.
      
      Fixes: 5eda8afd ("ice: Add support for PF/VF promiscuous mode")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
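Changes 1 to 3 in the list above can be modelled as a small state machine that keeps the installed multicast promiscuous rules in step with the VLAN set. This is a hedged sketch, not the patch itself: the struct `vsi_model`, the `promisc_any`/`promisc_vlan` rule flags and every function name are hypothetical, and real filter programming, locking (ICE_CFG_BUSY) and error handling are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_VID 4096

/* Hypothetical model of the state relevant to all-multicast handling. */
struct vsi_model {
	bool allmulti;              /* IFF_ALLMULTI currently set */
	int num_vlan;               /* configured VLANs, including VLAN 0 */
	bool vlan[MAX_VID];         /* which VLAN IDs are configured */
	bool promisc_any;           /* rule with non-VLAN look-up type */
	bool promisc_vlan[MAX_VID]; /* rules with VLAN look-up type */
};

static void vsi_init(struct vsi_model *v)
{
	memset(v, 0, sizeof(*v));
	v->vlan[0] = true; /* default VLAN 0 always exists */
	v->num_vlan = 1;
}

/* Flag change: install or remove rules matching the current VLAN set. */
static void set_allmulti(struct vsi_model *v, bool on)
{
	int vid;

	if (v->allmulti == on)
		return;
	v->allmulti = on;
	if (v->num_vlan > 1) {
		for (vid = 0; vid < MAX_VID; vid++)
			if (v->vlan[vid])
				v->promisc_vlan[vid] = on;
	} else {
		v->promisc_any = on;
	}
}

/* VLAN added: with all-multicast on, add its rule and convert VLAN 0. */
static void vlan_add(struct vsi_model *v, int vid)
{
	if (vid <= 0 || vid >= MAX_VID || v->vlan[vid])
		return;
	v->vlan[vid] = true;
	v->num_vlan++;
	if (!v->allmulti)
		return;
	v->promisc_vlan[vid] = true;
	if (v->num_vlan == 2) { /* first VLAN besides VLAN 0 */
		v->promisc_any = false;
		v->promisc_vlan[0] = true;
	}
}

/* VLAN removed: drop its rule and convert VLAN 0 back when it is the last. */
static void vlan_kill(struct vsi_model *v, int vid)
{
	if (vid <= 0 || vid >= MAX_VID || !v->vlan[vid])
		return;
	v->vlan[vid] = false;
	v->num_vlan--;
	if (!v->allmulti)
		return;
	v->promisc_vlan[vid] = false;
	if (v->num_vlan == 1) { /* only VLAN 0 is left */
		v->promisc_vlan[0] = false;
		v->promisc_any = true;
	}
}

/* Number of multicast promiscuous rules currently installed. */
static int rule_count(const struct vsi_model *v)
{
	int vid, n = v->promisc_any ? 1 : 0;

	for (vid = 0; vid < MAX_VID; vid++)
		n += v->promisc_vlan[vid] ? 1 : 0;
	return n;
}
```

Replaying Scenario 1 and Scenario 2 against this model ends with no rule installed once all-multicast is switched off, which is the outcome the original driver logic failed to guarantee.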