1. 07 Feb 2023, 4 commits
    • ice: move vsi_type assignment from ice_vsi_alloc to ice_vsi_cfg · e1588197
      Jacob Keller authored
      The ice_vsi_alloc and ice_vsi_cfg functions are used together to allocate
      and configure a new VSI, called as part of the ice_vsi_setup function.
      
      In the future, with the addition of the subfunction code, the ice driver
      will want to be able to allocate a VSI while delaying its configuration
      until a later point of port activation.
      
      Currently this requires that the port code know what type of VSI should
      be allocated. This is required because ice_vsi_alloc assigns the VSI type.
      
      Refactor the ice_vsi_alloc and ice_vsi_cfg functions so that VSI type
      assignment isn't done until the configuration stage. This will allow the
      devlink port addition logic to reserve a VSI as early as possible before
      the type of the port is known. In this way, the port add can fail in the
      event that all hardware VSI resources are exhausted.
      
      Since the ice_vsi_cfg function already takes the ice_vsi_cfg_params
      structure, this is relatively straightforward.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      e1588197
    • ice: refactor VSI setup to use parameter structure · 5e509ab2
      Jacob Keller authored
      The ice_vsi_setup, ice_vsi_alloc, and ice_vsi_cfg functions have
      grown a large number of parameters. These parameters are used to initialize
      a new VSI, as well as to reconfigure an existing VSI.
      
      Any time we want to add a new parameter to this function chain, even if it
      will usually be unset, we have to change many call sites due to changing
      the function signature.
      
      A future change is going to refactor ice_vsi_alloc and ice_vsi_cfg to move
      the VSI configuration and initialization all into ice_vsi_cfg.
      
      Before this, refactor the VSI setup flow to use a new ice_vsi_cfg_params
      structure. This will contain the configuration (mainly pointers) used to
      initialize a VSI.
      
      Pass this from ice_vsi_setup into the related functions such as
      ice_vsi_alloc, ice_vsi_cfg, and ice_vsi_cfg_def.
      
      Introduce a helper, ice_vsi_to_params, to convert an existing VSI to the
      parameters used to initialize it. This will aid in the flows where we
      rebuild an existing VSI.
      
      Since we also pass ICE_VSI_FLAG_INIT to more functions which do not
      need (or cannot yet have) the VSI parameters, let's make this clear by
      renaming the function parameter to vsi_flags and using a u32 instead of a
      signed integer. The name vsi_flags also makes it clear that we may extend
      the flags in the future.
      
      This change will make it easier to refactor the setup flow in the future,
      and will reduce the complexity required to add a new parameter for
      configuration in the future.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      5e509ab2
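      A minimal C sketch of the parameter-structure idea described in this commit; the field list and the ice_vsi_to_params body are illustrative assumptions, not the driver's verbatim definitions.

      /* Hypothetical bundle of everything needed to configure a VSI. */
      struct ice_vsi_cfg_params {
              struct ice_port_info *pi;   /* switch port the VSI hangs off */
              struct ice_channel *ch;     /* channel, for channel VSIs only */
              struct ice_vf *vf;          /* owning VF, for VF VSIs only */
              enum ice_vsi_type type;     /* VSI type, assigned at config time */
              u32 flags;                  /* e.g. ICE_VSI_FLAG_INIT */
      };

      /* Rebuild the parameters from an already-configured VSI so rebuild
       * paths can reuse the same configuration code.
       */
      static inline struct ice_vsi_cfg_params ice_vsi_to_params(struct ice_vsi *vsi)
      {
              struct ice_vsi_cfg_params params = {};

              params.pi = vsi->port_info;
              params.ch = vsi->ch;
              params.vf = vsi->vf;
              params.type = vsi->type;
              return params;
      }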
    • ice: drop unnecessary VF parameter from several VSI functions · 157acda5
      Jacob Keller authored
      The vsi->vf pointer gets assigned early on during ice_vsi_alloc. Several
      functions currently take a VF pointer, but they can just use the existing
      vsi->vf pointer as needed. Modify these functions to drop the unnecessary
      VF parameter.
      
      Note that ice_vsi_cfg is not changed, as a following change will refactor
      it so that the VF pointer is assigned during ice_vsi_cfg rather than
      ice_vsi_alloc.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Marek Szlosek <marek.szlosek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      157acda5
    • ice: fix function comment referring to ice_vsi_alloc · a2ca73ea
      Jacob Keller authored
      Since commit 1d2e32275de7 ("ice: split ice_vsi_setup into smaller
      functions") ice_vsi_alloc has not been responsible for all of the behavior
      implied by the comment for ice_vsi_setup_vector_base.
      
      Fix the comment to refer to the new function ice_vsi_alloc_def().
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      a2ca73ea
  2. 04 Feb 2023, 5 commits
    • ice: update VSI instead of init in some case · ccf531b2
      Michal Swiatkowski authored
      ice_vsi_cfg() is called from different contexts:
      1) The VSI exists in HW, but it is being reconfigured, for example because
         the queue count changed -> update instead of init should be used
      2) The VSI doesn't exist, because a reset has happened -> the init command
         should be sent

      To support both cases, pass a boolean value which stores what type of
      command has to be sent to HW.
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      ccf531b2
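      A hedged sketch of what the boolean described above selects between: the "add VSI" and "update VSI" admin queue commands. The helper names mirror the driver's switch API but are not guaranteed to match it exactly.

      static int ice_vsi_init_or_update(struct ice_vsi *vsi,
                                        struct ice_vsi_ctx *ctxt, bool init_vsi)
      {
              struct ice_hw *hw = &vsi->back->hw;

              if (init_vsi)
                      /* VSI context is gone (e.g. after a reset): create it */
                      return ice_add_vsi(hw, vsi->idx, ctxt, NULL);

              /* VSI already exists in HW: update it in place */
              return ice_update_vsi(hw, vsi->idx, ctxt, NULL);
      }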
    • ice: move VSI delete outside deconfig · 227bf450
      Michal Swiatkowski authored
      During deconfig the VSI shouldn't be deleted from HW.

      Rewrite the VSI delete function to reflect that sometimes it is only needed
      to remove the VSI from HW without freeing the memory:
      ice_vsi_delete() -> delete from HW and free memory
      ice_vsi_delete_from_hw() -> delete only from HW

      The value returned from ice_vsi_free() is never used. Change the return
      type to void.
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      227bf450
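      A hedged sketch of the resulting split; ice_vsi_clear_hw_ctx() is a hypothetical stand-in for whatever admin queue call actually removes the VSI context.

      /* Remove the VSI context from the hardware switch only. */
      static void ice_vsi_delete_from_hw(struct ice_vsi *vsi)
      {
              ice_vsi_clear_hw_ctx(vsi);      /* hypothetical helper */
      }

      /* Remove from HW and then free the memory. */
      static void ice_vsi_delete(struct ice_vsi *vsi)
      {
              ice_vsi_delete_from_hw(vsi);
              ice_vsi_free(vsi);              /* now returns void */
      }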
    • ice: stop hard coding the ICE_VSI_CTRL location · a696d615
      Jacob Keller authored
      When allocating the ICE_VSI_CTRL, the allocated struct ice_vsi pointer is
      stored into the PF's pf->vsi array at a fixed location. This was
      historically done on the basis that it could provide an O(1) lookup for the
      special control VSI.
      
      Since we store the ctrl_vsi_idx, we already have O(1) lookup regardless of
      where in the array we store this VSI.
      
      Simplify the logic in ice_vsi_alloc by using the same method of storing the
      control VSI as other types of VSIs.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      a696d615
    • ice: split ice_vsi_setup into smaller functions · 6624e780
      Michal Swiatkowski authored
      The main goal is to reuse the same functions in the VSI config and rebuild
      paths.
      To do this, split ice_vsi_setup into smaller pieces and reuse them during
      rebuild.

      ice_vsi_alloc() should only allocate memory, not set the default values
      for the VSI.
      Move setting defaults to a separate function. This will allow configuring an
      already allocated VSI, for example in the reload path.

      The patch mostly moves code around without introducing new
      functionality. The functions ice_vsi_cfg() and ice_vsi_decfg() were
      added, but they use code that already exists.

      Use a flag to pass information about VSI initialization during rebuild
      instead of a boolean value.
      Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      6624e780
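      A hedged sketch of the resulting composition: setup becomes "allocate memory" followed by "configure", so the rebuild path can run the configure step on its own. The signatures are assumptions used purely for illustration.

      static struct ice_vsi *ice_vsi_setup_sketch(struct ice_pf *pf,
                                                  struct ice_vsi_cfg_params *params)
      {
              struct ice_vsi *vsi;

              vsi = ice_vsi_alloc(pf);        /* memory only, no defaults */
              if (!vsi)
                      return NULL;

              if (ice_vsi_cfg(vsi, params)) { /* defaults + HW programming */
                      ice_vsi_free(vsi);
                      return NULL;
              }
              return vsi;
      }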
    • ice: cleanup in VSI config/deconfig code · 0db66d20
      Michal Swiatkowski authored
      Do a few small cleanups:

      1) Rename the function to reflect that it doesn't configure everything
      related to the VSI. ice_vsi_cfg_lan() better fits what the function does.

      ice_vsi_cfg() can then be used to name a function that configures the whole
      VSI.

      2) Remove the unused ethtype field from the VSI. There is no need to set
      ethtype here, because it is never used.

      3) Remove the unnecessary check for ICE_VSI_CHNL. There is a check for
      ICE_VSI_CHNL in ice_vsi_get_qs, so there is no need to check it before
      calling the function.

      4) Simplify the ice_vsi_alloc() call. There is no need to check the type of
      VSI before calling ice_vsi_alloc(). For ICE_VSI_CHNL, vf is always NULL
      (ice_vsi_setup() is called with vf=NULL).
      For ICE_VSI_VF or ICE_VSI_CTRL, ch is always NULL, and for other VSI types
      ch and vf are always NULL.

      5) Remove the unnecessary call to ice_vsi_dis_irq(). ice_vsi_dis_irq() will
      be called in the ice_vsi_close() flow (ice_vsi_close() -> ice_vsi_down() ->
      ice_vsi_dis_irq()).

      6) Don't remove specific filters in release. All HW filters are removed
      in ice_fltr_remove_all(), which is always called in the VSI release flow.
      There is no need to remove only ethertype filters before calling
      ice_fltr_remove_all().

      7) Rename ice_vsi_clear() to ice_vsi_free(). As ice_vsi_clear() only
      frees memory allocated in ice_vsi_alloc(), rename it to ice_vsi_free(),
      which better shows what the function does.

      8) Free the coalesce parameters in rebuild. There is a potential memory
      leak if the configuration of the VSI LAN fails. Free the coalesce
      parameters to avoid it.
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      0db66d20
  3. 25 Jan 2023, 1 commit
  4. 20 Jan 2023, 1 commit
  5. 24 Nov 2022, 3 commits
    • ice: Accumulate ring statistics over reset · 288ecf49
      Benjamin Mikailenko authored
      Resets may occur with or without user interaction. For example, a TX hang
      or reconfiguration of parameters will result in a reset. During reset, the
      VSI is freed, freeing any statistics structures inside it as well. This
      creates an issue for the user where a reset happens in the background,
      statistics are set to zero, and the user checks ring statistics expecting
      them to be populated.

      To ensure this doesn't happen, accumulate ring statistics over reset.

      Define a new ring statistics structure, ice_ring_stats. The new structure
      lives in the VSI's parent, preserving ring statistics when the VSI is freed.
      
      1. Define a new structure vsi_ring_stats in the PF scope
      2. Allocate/free stats only during probe, unload, or change in ring size
      3. Replace previous ring statistics functionality with new structure
      Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      288ecf49
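      A hedged sketch of the PF-scoped statistics container; the structure and field names are illustrative, not the driver's exact layout.

      struct ice_ring_stats {
              u64 pkts;
              u64 bytes;
      };

      /* Lives in the PF (the VSI's parent), so it survives VSI teardown. */
      struct ice_vsi_ring_stats {
              struct ice_ring_stats **tx;     /* one entry per Tx ring */
              struct ice_ring_stats **rx;     /* one entry per Rx ring */
      };

      /* During (re)configuration a ring only re-attaches its pointer; the
       * counters themselves are allocated at probe and freed at unload.
       */
      static void ice_rx_ring_attach_stats(struct ice_rx_ring *ring,
                                           struct ice_vsi_ring_stats *stats,
                                           u16 q_idx)
      {
              ring->ring_stats = stats->rx[q_idx];
      }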
    • ice: Accumulate HW and Netdev statistics over reset · 2fd5e433
      Benjamin Mikailenko authored
      Resets happen with or without user interaction. For example, incidents
      such as a TX hang or a reconfiguration of parameters will result in a
      reset. During reset, hardware and software statistics were set to zero.
      This created an issue for the user where a reset happens in the background,
      statistics are set to zero, and the user checks statistics expecting them
      to be populated.

      To ensure this doesn't happen, keep accumulating stats over reset.
      
      1. Remove function calls which reset hardware and netdev statistics.
      2. Do not rollover statistics in ice_stat_update40 during reset.
      Signed-off-by: Benjamin Mikailenko <benjamin.mikailenko@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      2fd5e433
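      A hedged sketch of the usual offset-based accumulation pattern for a free-running 40-bit HW counter: only the delta since the previous reading is added to the running total, so a driver reset does not zero what the user sees. This illustrates the idea, not the driver's exact ice_stat_update40().

      #define ICE_40_BIT_WRAP (1ULL << 40)

      static void stat_update40_sketch(u64 new_raw, bool prev_valid,
                                       u64 *prev_raw, u64 *total)
      {
              if (!prev_valid) {
                      *prev_raw = new_raw;    /* first read: baseline only */
                      return;
              }

              if (new_raw >= *prev_raw)
                      *total += new_raw - *prev_raw;
              else
                      /* the 40-bit counter wrapped around */
                      *total += new_raw + ICE_40_BIT_WRAP - *prev_raw;

              *prev_raw = new_raw;
      }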
    • ice: Remove and replace ice speed defines with ethtool.h versions · 1d0e28a9
      Brett Creeley authored
      The driver is currently using ICE_LINK_SPEED_* defines that mirror what
      ethtool.h defines, with one exception: ICE_LINK_SPEED_UNKNOWN.
      
      This issue is fixed by the following changes:
      
      1. Replace ICE_LINK_SPEED_UNKNOWN with 0, because SPEED_UNKNOWN in
         ethtool.h is "-1" and that doesn't match the driver's expected behavior.
      2. Transform ICE_LINK_SPEED_*MBPS to SPEED_* using static tables and
         fls()-1 to convert from BIT() to an index in a table.
      Suggested-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      1d0e28a9
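      A hedged sketch of the table-plus-fls() conversion: each driver speed value is a single BIT(), so fls()-1 turns it into a table index. The table contents are illustrative; the real mapping is defined by the hardware interface.

      #include <linux/bitops.h>       /* fls() */
      #include <linux/ethtool.h>      /* SPEED_* */
      #include <linux/kernel.h>       /* ARRAY_SIZE() */

      static const u32 ice_link_speed_tbl[] = {
              SPEED_10,       /* BIT(0) */
              SPEED_100,      /* BIT(1) */
              SPEED_1000,     /* BIT(2) */
              SPEED_2500,     /* BIT(3) */
              SPEED_5000,     /* BIT(4) */
              SPEED_10000,    /* BIT(5) */
              SPEED_20000,    /* BIT(6) */
              SPEED_25000,    /* BIT(7) */
              SPEED_40000,    /* BIT(8) */
              SPEED_50000,    /* BIT(9) */
              SPEED_100000,   /* BIT(10) */
      };

      static u32 ice_speed_to_ethtool(u16 link_speed_bit)
      {
              int idx = fls(link_speed_bit) - 1;

              if (idx < 0 || idx >= ARRAY_SIZE(ice_link_speed_tbl))
                      return 0;       /* unknown: 0, not ethtool's -1 */
              return ice_link_speed_tbl[idx];
      }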
  6. 10 Nov 2022, 1 commit
  7. 27 Sep 2022, 1 commit
  8. 09 Sep 2022, 1 commit
    • ice: Fix crash by keep old cfg when update TCs more than queues · a509702c
      Ding Hui authored
      There are problems if fewer queues are allocated than there are Traffic
      Classes.

      Commit a632b2a4 ("ice: ethtool: Prohibit improper channel config
      for DCB") already disallows setting fewer queues than TCs.

      Another case is if we first set fewer queues and later update the config to
      more TCs due to LLDP; ice_vsi_cfg_tc() will fail but leave dirty
      num_txq/rxq and tc_cfg in the VSI, which will cause an invalid pointer
      access.
      
      [   95.968089] ice 0000:3b:00.1: More TCs defined than queues/rings allocated.
      [   95.968092] ice 0000:3b:00.1: Trying to use more Rx queues (8), than were allocated (1)!
      [   95.968093] ice 0000:3b:00.1: Failed to config TC for VSI index: 0
      [   95.969621] general protection fault: 0000 [#1] SMP NOPTI
      [   95.969705] CPU: 1 PID: 58405 Comm: lldpad Kdump: loaded Tainted: G     U  W  O     --------- -t - 4.18.0 #1
      [   95.969867] Hardware name: O.E.M/BC11SPSCB10, BIOS 8.23 12/30/2021
      [   95.969992] RIP: 0010:devm_kmalloc+0xa/0x60
      [   95.970052] Code: 5c ff ff ff 31 c0 5b 5d 41 5c c3 b8 f4 ff ff ff eb f4 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 89 d1 <8b> 97 60 02 00 00 48 8d 7e 18 48 39 f7 72 3f 55 89 ce 53 48 8b 4c
      [   95.970344] RSP: 0018:ffffc9003f553888 EFLAGS: 00010206
      [   95.970425] RAX: dead000000000200 RBX: ffffea003c425b00 RCX: 00000000006080c0
      [   95.970536] RDX: 00000000006080c0 RSI: 0000000000000200 RDI: dead000000000200
      [   95.970648] RBP: dead000000000200 R08: 00000000000463c0 R09: ffff888ffa900000
      [   95.970760] R10: 0000000000000000 R11: 0000000000000002 R12: ffff888ff6b40100
      [   95.970870] R13: ffff888ff6a55018 R14: 0000000000000000 R15: ffff888ff6a55460
      [   95.970981] FS:  00007f51b7d24700(0000) GS:ffff88903ee80000(0000) knlGS:0000000000000000
      [   95.971108] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   95.971197] CR2: 00007fac5410d710 CR3: 0000000f2c1de002 CR4: 00000000007606e0
      [   95.971309] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   95.971419] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   95.971530] PKRU: 55555554
      [   95.971573] Call Trace:
      [   95.971622]  ice_setup_rx_ring+0x39/0x110 [ice]
      [   95.971695]  ice_vsi_setup_rx_rings+0x54/0x90 [ice]
      [   95.971774]  ice_vsi_open+0x25/0x120 [ice]
      [   95.971843]  ice_open_internal+0xb8/0x1f0 [ice]
      [   95.971919]  ice_ena_vsi+0x4f/0xd0 [ice]
      [   95.971987]  ice_dcb_ena_dis_vsi.constprop.5+0x29/0x90 [ice]
      [   95.972082]  ice_pf_dcb_cfg+0x29a/0x380 [ice]
      [   95.972154]  ice_dcbnl_setets+0x174/0x1b0 [ice]
      [   95.972220]  dcbnl_ieee_set+0x89/0x230
      [   95.972279]  ? dcbnl_ieee_del+0x150/0x150
      [   95.972341]  dcb_doit+0x124/0x1b0
      [   95.972392]  rtnetlink_rcv_msg+0x243/0x2f0
      [   95.972457]  ? dcb_doit+0x14d/0x1b0
      [   95.972510]  ? __kmalloc_node_track_caller+0x1d3/0x280
      [   95.972591]  ? rtnl_calcit.isra.31+0x100/0x100
      [   95.972661]  netlink_rcv_skb+0xcf/0xf0
      [   95.972720]  netlink_unicast+0x16d/0x220
      [   95.972781]  netlink_sendmsg+0x2ba/0x3a0
      [   95.975891]  sock_sendmsg+0x4c/0x50
      [   95.979032]  ___sys_sendmsg+0x2e4/0x300
      [   95.982147]  ? kmem_cache_alloc+0x13e/0x190
      [   95.985242]  ? __wake_up_common_lock+0x79/0x90
      [   95.988338]  ? __check_object_size+0xac/0x1b0
      [   95.991440]  ? _copy_to_user+0x22/0x30
      [   95.994539]  ? move_addr_to_user+0xbb/0xd0
      [   95.997619]  ? __sys_sendmsg+0x53/0x80
      [   96.000664]  __sys_sendmsg+0x53/0x80
      [   96.003747]  do_syscall_64+0x5b/0x1d0
      [   96.006862]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      Only update num_txq/rxq when the checks pass, and restore tc_cfg if setting
      up the queue map failed.

      Fixes: a632b2a4 ("ice: ethtool: Prohibit improper channel config for DCB")
      Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
      Reviewed-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      a509702c
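      A hedged sketch of the fix's shape: the old queue counts and TC configuration are saved up front and restored if the queue-map setup fails, so a failed LLDP-driven update cannot leave dirty state behind. Helper names and fields approximate the driver's.

      static int ice_vsi_cfg_tc_sketch(struct ice_vsi *vsi, u8 ena_tc)
      {
              struct ice_tc_cfg old_tc_cfg = vsi->tc_cfg;
              u16 old_num_txq = vsi->num_txq;
              u16 old_num_rxq = vsi->num_rxq;
              int err;

              err = ice_vsi_setup_q_map_sketch(vsi, ena_tc);  /* may fail */
              if (err) {
                      /* roll back to the previous, known-good config */
                      vsi->tc_cfg = old_tc_cfg;
                      vsi->num_txq = old_num_txq;
                      vsi->num_rxq = old_num_rxq;
              }
              return err;
      }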
  9. 22 Aug 2022, 1 commit
    • ice: xsk: use Rx ring's XDP ring when picking NAPI context · 9ead7e74
      Maciej Fijalkowski authored
      The ice driver allocates per-CPU XDP queues so that the redirect path can
      safely use smp_processor_id() as an index into the array. At the same time,
      though, XDP rings are used to pick the NAPI context to call napi_schedule()
      on or to set NAPIF_STATE_MISSED. When the user reduces the queue count, say
      to 8, and num_possible_cpus() of the underlying platform is 44, this means
      queue vectors with correlated NAPI contexts will carry several XDP queues.
      
      This in turn can result in a broken behavior where NAPI context of
      interest will never be scheduled and AF_XDP socket will not process any
      traffic.
      
      To fix this, let us change the way XDP rings are assigned to Rx
      rings and use this information later on when setting the
      ice_tx_ring::xsk_pool pointer. For each Rx ring, grab the associated
      queue vector and walk through the Tx ring's linked list. Once we stumble
      upon the XDP ring in it, assign this ring to ice_rx_ring::xdp_ring.

      The previous approach [0] to fixing this issue covered only the txonly
      scenario because of the described grouping of XDP rings across queue
      vectors. Relying on the Rx ring meant that a NAPI context could be
      scheduled for a queue vector whose XDP ring had no associated XSK pool.
      
      [0]: https://lore.kernel.org/netdev/20220707161128.54215-1-maciej.fijalkowski@intel.com/
      
      Fixes: 2d4238f5 ("ice: Add support for AF_XDP")
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      9ead7e74
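      A hedged sketch of the new assignment: for each Rx ring, walk the Tx rings hanging off its queue vector and remember the XDP ring found there. Macro and field names approximate the driver's.

      static void ice_map_xdp_rings_sketch(struct ice_vsi *vsi)
      {
              int i;

              ice_for_each_rxq(vsi, i) {
                      struct ice_rx_ring *rx_ring = vsi->rx_rings[i];
                      struct ice_tx_ring *ring;

                      /* Tx rings of one queue vector form a linked list */
                      ice_for_each_tx_ring(ring, rx_ring->q_vector->tx)
                              if (ice_ring_is_xdp(ring)) {
                                      rx_ring->xdp_ring = ring;
                                      break;
                              }
              }
      }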
  10. 18 Aug 2022, 2 commits
  11. 17 Aug 2022, 1 commit
  12. 11 Aug 2022, 1 commit
  13. 29 Jul 2022, 1 commit
    • ice: Introduce enabling promiscuous mode on multiple VF's · d7393425
      Michal Wilczynski authored
      In the current implementation the default VSI switch filter is only able
      to forward traffic to a single VSI. This limits promiscuous mode with the
      private flag 'vf-true-promisc-support' to a single VF; enabling it on a
      second VF won't work. Also, allmulticast support doesn't seem to be
      properly implemented when vf-true-promisc-support is true.

      Use the standard ice_add_rule_internal() function, which already implements
      forwarding to multiple VSIs, instead of constructing the AQ call manually.

      Add a switch filter for allmulticast mode when vf-true-promisc-support is
      enabled. The same filter is added regardless of the flag - it doesn't
      matter for this case.

      Remove unnecessary fields in the switch structure. From now on bookkeeping
      will be done by ice_add_rule_internal().

      Refactor unnecessarily passed function arguments.
      
      To test:
      1) Create two VMs and two VFs. Attach the VFs to the VMs.
      2) Enable promiscuous mode on both of them and check if
         traffic is seen on both of them.
      Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
      Tested-by: Marek Szlosek <marek.szlosek@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      d7393425
  14. 16 Jul 2022, 1 commit
  15. 22 Jun 2022, 1 commit
    • ice: ethtool: Prohibit improper channel config for DCB · a632b2a4
      Anatolii Gerasymenko authored
      Do not allow setting fewer channels via ethtool than there are Traffic
      Classes. There must be at least one channel per Traffic Class.

      If you set fewer channels than there are Traffic Classes, then during
      ice_vsi_rebuild only the requested number of tx/rx rings is allocated in
      ice_vsi_alloc_arrays. But later, ice_vsi_setup_q_map requests at least one
      channel per Traffic Class. This results in setting num_rxq > alloc_rxq and
      num_txq > alloc_txq.
      Later, there is a NULL pointer dereference in
      ice_vsi_map_rings_to_vectors, because we go beyond the rx_rings or
      tx_rings arrays.

      Change ice_set_channels() to return an error if you try to allocate fewer
      channels than there are Traffic Classes.
      Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return a
      status code instead of void.
      Add error handling for ice_vsi_setup_q_map() and
      ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc().
      
      [53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi.
      [53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      [53763.992915] PGD 14b45f5067 P4D 0
      [53763.996444] Oops: 0002 [#1] SMP NOPTI
      [53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE    --------- -  - 4.18.0-240.el8.x86_64 #1
      [53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020
      [53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice]
      [53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4                           c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44
      [53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206
      [53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018
      [53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018
      [53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004
      [53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000
      [53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018
      [53764.090976] FS:  000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000
      [53764.099362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0
      [53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [53764.127747] PKRU: 55555554
      [53764.130781] Call Trace:
      [53764.133564]  ice_vsi_rebuild+0x611/0x870 [ice]
      [53764.138341]  ice_vsi_recfg_qs+0x94/0x100 [ice]
      [53764.143116]  ice_set_channels+0x1a8/0x3e0 [ice]
      [53764.147975]  ethtool_set_channels+0x14e/0x240
      [53764.152667]  dev_ethtool+0xd74/0x2a10
      [53764.156665]  ? __mod_lruvec_state+0x44/0x110
      [53764.161280]  ? __mod_lruvec_state+0x44/0x110
      [53764.165893]  ? page_add_file_rmap+0x15/0x170
      [53764.170518]  ? inet_ioctl+0xd1/0x220
      [53764.174445]  ? netdev_run_todo+0x5e/0x290
      [53764.178808]  dev_ioctl+0xb5/0x550
      [53764.182485]  sock_do_ioctl+0xa0/0x140
      [53764.186512]  sock_ioctl+0x1a8/0x300
      [53764.190367]  ? selinux_file_ioctl+0x161/0x200
      [53764.195090]  do_vfs_ioctl+0xa4/0x640
      [53764.199035]  ksys_ioctl+0x60/0x90
      [53764.202722]  __x64_sys_ioctl+0x16/0x20
      [53764.206845]  do_syscall_64+0x5b/0x1a0
      [53764.210887]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      Fixes: 87324e74 ("ice: Implement ethtool ops for channels")
      Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      a632b2a4
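      A hedged sketch of the new validation in the channel path: refuse a channel count lower than the number of enabled Traffic Classes. Field and helper names are approximations of the driver's.

      static int ice_check_channels_vs_tcs(struct ice_vsi *vsi,
                                           struct ethtool_channels *ch)
      {
              u32 requested = ch->combined_count + ch->rx_count;
              u8 num_tc = vsi->tc_cfg.numtc;

              if (requested < num_tc) {
                      netdev_err(vsi->netdev,
                                 "Cannot set %u channels: need at least one channel per TC (%u TCs enabled)\n",
                                 requested, num_tc);
                      return -EINVAL;
              }
              return 0;
      }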
  16. 10 Jun 2022, 1 commit
  17. 08 Jun 2022, 1 commit
  18. 18 May 2022, 1 commit
    • ice: Fix interrupt moderation settings getting cleared · bf13502e
      Michal Wilczynski authored
      Adaptive-rx and Adaptive-tx are interrupt moderation settings
      that can be enabled/disabled using ethtool:
      ethtool -C ethX adaptive-rx on/off adaptive-tx on/off

      Unfortunately those settings are getting cleared after
      changing the number of queues, or in ethtool terms 'channels':
      ethtool -L ethX rx 1 tx 1

      The clearing was happening due to the introduction of bit fields
      in the ice_ring_container struct. As a result, only the itr_setting
      bits were rebuilt during ice_vsi_rebuild_set_coalesce().

      Introduce an anonymous struct of bitfields and create a
      union to refer to them as a single variable.
      This way the variable can be easily saved and restored.
      
      Fixes: 61dc79ce ("ice: Restore interrupt throttle settings after VSI rebuild")
      Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      bf13502e
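      A hedged sketch of the union trick: the existing bit fields are wrapped in an anonymous struct inside a union, so the whole set can be saved and restored with a single 16-bit assignment. Field widths are illustrative.

      struct ring_container_sketch {
              union {
                      struct {
                              u16 itr_setting:13;     /* ITR value, usecs */
                              u16 itr_reserved:2;
                              u16 itr_mode:1;         /* adaptive or not */
                      };
                      u16 itr_settings;               /* all bits at once */
              };
      };

      static void coalesce_save_restore_example(struct ring_container_sketch *rc)
      {
              u16 saved = rc->itr_settings;   /* one read captures everything */

              /* ... rings and containers are rebuilt here ... */

              rc->itr_settings = saved;       /* one write restores it all */
      }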
  19. 09 Apr 2022, 1 commit
    • ice: arfs: fix use-after-free when freeing @rx_cpu_rmap · d7442f51
      Alexander Lobakin authored
      The CI testing bots triggered the following splat:
      
      [  718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80
      [  718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834
      [  718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S      W IOE     5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1
      [  718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
      [  718.223418] Call Trace:
      [  718.227139]
      [  718.230783]  dump_stack_lvl+0x33/0x42
      [  718.234431]  print_address_description.constprop.9+0x21/0x170
      [  718.238177]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.241885]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.245539]  kasan_report.cold.18+0x7f/0x11b
      [  718.249197]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.252852]  free_irq_cpu_rmap+0x53/0x80
      [  718.256471]  ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice]
      [  718.260174]  ice_remove_arfs+0x5f/0x70 [ice]
      [  718.263810]  ice_rebuild_arfs+0x3b/0x70 [ice]
      [  718.267419]  ice_rebuild+0x39c/0xb60 [ice]
      [  718.270974]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [  718.274472]  ? ice_init_phy_user_cfg+0x360/0x360 [ice]
      [  718.278033]  ? delay_tsc+0x4a/0xb0
      [  718.281513]  ? preempt_count_sub+0x14/0xc0
      [  718.284984]  ? delay_tsc+0x8f/0xb0
      [  718.288463]  ice_do_reset+0x92/0xf0 [ice]
      [  718.292014]  ice_pci_err_resume+0x91/0xf0 [ice]
      [  718.295561]  pci_reset_function+0x53/0x80
      <...>
      [  718.393035] Allocated by task 690:
      [  718.433497] Freed by task 20834:
      [  718.495688] Last potentially related work creation:
      [  718.568966] The buggy address belongs to the object at ffff8881bd127e00
                      which belongs to the cache kmalloc-96 of size 96
      [  718.574085] The buggy address is located 0 bytes inside of
                      96-byte region [ffff8881bd127e00, ffff8881bd127e60)
      [  718.579265] The buggy address belongs to the page:
      [  718.598905] Memory state around the buggy address:
      [  718.601809]  ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  718.604796]  ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
      [  718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  718.610811]                    ^
      [  718.613819]  ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
      [  718.617107]  ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      
      This is because free_irq_cpu_rmap() is always called
      *after* (devm_)free_irq() and thus it tries to work with IRQ descriptors
      that have already been freed. For example, on device reset the driver
      frees the rmap right before allocating a new one (the splat above).
      Make rmap creation and freeing symmetrical with the
      {request,free}_irq() calls, i.e. do that on ifup/ifdown instead
      of device probe/remove/resume. These operations can be performed
      independently of the actual device aRFS configuration.
      Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers
      only when aRFS is disabled -- otherwise, the CPU rmap sets and clears
      its own and they must not be touched manually.
      
      Fixes: 28bf2672 ("ice: Implement aRFS")
      Co-developed-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Tested-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      d7442f51
  20. 05 Apr 2022, 1 commit
    • ice: Set txq_teid to ICE_INVAL_TEID on ring creation · ccfee182
      Anatolii Gerasymenko authored
      When a VF is freshly created, but not brought up, the ring->txq_teid
      value is by default set to 0.
      But 0 is a valid TEID. On some platforms the Root Node of the
      Tx scheduler has TEID = 0. This can cause issues as shown below.

      The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).
      
      Testing Hints:
      echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
      ip link set dev ens785f0v0 up
      ip link set dev ens785f0v0 down
      
      If we have a freshly created VF and quickly turn it on and off, so that
      there is no time to reach the VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then
      the VIRTCHNL_OP_DISABLE_QUEUES stage will fail with an error:
      [  639.531454] disable queue 89 failed 14
      [  639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
      [  639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5

      The reason for the failure is that we are trying to send an AQ command to
      delete queue 89, which has never been created, and we receive an "invalid
      argument" error from firmware.

      As this queue has never been created, its TEID and ring->txq_teid
      have the default value 0.
      ice_dis_vsi_txq has a check against non-existent queues:
      
      node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
      if (!node)
      	continue;
      
      But on some platforms the Root Node of the Tx scheduler has TEID = 0.
      Hence, ice_sched_find_node_by_teid finds a node with TEID = 0 (it is
      pi->root), and we go on to submit an erroneous request to firmware.
      
      Fixes: 37bb8390 ("ice: Move common functions out of ice_main.c part 7/7")
      Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      ccfee182
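      A hedged sketch of the fix: mark the Tx queue TEID as invalid at ring creation, so an unconfigured queue can never be confused with the scheduler root node, whose TEID may legitimately be 0.

      #define ICE_INVAL_TEID 0xFFFFFFFF

      static void ice_tx_ring_init_sketch(struct ice_tx_ring *ring)
      {
              ring->txq_teid = ICE_INVAL_TEID;        /* not programmed yet */
      }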
  21. 01 Apr 2022, 1 commit
    • ice: Clear default forwarding VSI during VSI release · bd8c624c
      Ivan Vecera authored
      A VSI is set as the default forwarding one when promisc mode is set for
      the PF interface, when the PF is switched to switchdev mode, or when a VF
      driver asks to enable allmulticast or promisc mode for the VF
      interface (when the vf-true-promisc-support priv flag is off).
      The third case is buggy because in that case the VSI associated with the
      VF remains the default one after VF removal.
      
      Reproducer:
      1. Create VF
         echo 1 > sys/class/net/ens7f0/device/sriov_numvfs
      2. Enable allmulticast or promisc mode on VF
         ip link set ens7f0v0 allmulticast on
         ip link set ens7f0v0 promisc on
      3. Delete VF
         echo 0 > sys/class/net/ens7f0/device/sriov_numvfs
      4. Try to enable promisc mode on PF
         ip link set ens7f0 promisc on
      
      Although it looks like promisc mode on the PF is enabled, the opposite
      is true, because ice_vsi_sync_fltr(), which is responsible for IFF_PROMISC
      handling, first checks if any other VSI is set as the default forwarding
      one, and if so the function does not do anything. At this point
      it is not possible to enable promisc mode on the PF without re-probing
      the device.

      To resolve the issue, this patch clears the default forwarding VSI
      during ice_vsi_release() when the VSI to be released is the default
      one.
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bd8c624c
  22. 10 Mar 2022, 1 commit
  23. 04 Mar 2022, 5 commits
    • ice: convert VF storage to hash table with krefs and RCU · 3d5985a1
      Jacob Keller authored
      The ice driver stores VF structures in a simple array which is allocated
      once at the time of VF creation. The VF structures are then accessed
      from the array by their VF ID. The ID must be between 0 and the number
      of allocated VFs.
      
      Multiple threads can access this table:
      
       * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
       * interrupts, such as due to messages from the VF using the virtchnl
         communication
       * processing such as device reset
       * commands to add or remove VFs
      
      The current implementation does not keep track of when all threads are
      done operating on a VF and can potentially result in use-after-free
      issues caused by one thread accessing a VF structure after it has been
      released when removing VFs. Some of these are prevented with various
      state flags and checks.
      
      In addition, this structure is quite static and does not support a
      planned future where virtualization can be more dynamic. As we begin to
      look at supporting Scalable IOV with the ice driver (as opposed to just
      supporting Single Root IOV), this structure is not sufficient.
      
      In the future, VFs will be able to be added and removed individually and
      dynamically.
      
      To allow for this, and to better protect against a whole class of
      use-after-free bugs, replace the VF storage with a combination of a hash
      table and krefs to reference track all of the accesses to VFs through
      the hash table.
      
      A hash table still allows efficient lookup of the VF given its ID, but
      also allows adding and removing VFs. It does not require contiguous VF
      IDs.
      
      The use of krefs allows the cleanup of the VF memory to be delayed until
      after all threads have released their reference (by calling ice_put_vf).
      
      To prevent corruption of the hash table, a combination of RCU and the
      mutex table_lock is used. Addition and removal from the hash table use
      the RCU-aware hash macros. This allows simple read-only lookups that
      iterate to locate a single VF to be fast using RCU. Accesses which
      modify the hash table, or which can't take RCU because they sleep, will
      hold the mutex lock.
      
      By using this design, we have a stronger guarantee that the VF structure
      can't be released until after all threads are finished operating on it.
      We also pave the way for the more dynamic Scalable IOV implementation in
      the future.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      3d5985a1
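      A hedged sketch of the storage pattern described above, reduced to a generic "VF entry": an RCU-protected hash table keyed by VF ID, kref reference counting per entry, and a mutex for writers. It shows the pattern, not the driver's verbatim code.

      #include <linux/hashtable.h>
      #include <linux/kref.h>
      #include <linux/mutex.h>
      #include <linux/slab.h>

      struct vf_entry {
              struct hlist_node entry;
              struct rcu_head rcu;
              struct kref refcnt;
              u16 vf_id;
      };

      struct vf_table {
              DECLARE_HASHTABLE(table, 8);    /* 256 buckets */
              struct mutex table_lock;        /* serializes add/remove */
      };

      /* Read side: RCU protects the walk, the kref pins the entry. */
      static struct vf_entry *vf_get_by_id(struct vf_table *vfs, u16 vf_id)
      {
              struct vf_entry *vf;

              rcu_read_lock();
              hash_for_each_possible_rcu(vfs->table, vf, entry, vf_id)
                      if (vf->vf_id == vf_id &&
                          kref_get_unless_zero(&vf->refcnt)) {
                              rcu_read_unlock();
                              return vf;
                      }
              rcu_read_unlock();
              return NULL;
      }

      static void vf_release(struct kref *ref)
      {
              struct vf_entry *vf = container_of(ref, struct vf_entry, refcnt);

              kfree_rcu(vf, rcu);     /* free after all RCU readers finish */
      }

      static void vf_put(struct vf_entry *vf)
      {
              kref_put(&vf->refcnt, vf_release);
      }

      /* Write side: remove under the mutex, then drop the table's reference. */
      static void vf_remove(struct vf_table *vfs, struct vf_entry *vf)
      {
              mutex_lock(&vfs->table_lock);
              hash_del_rcu(&vf->entry);
              mutex_unlock(&vfs->table_lock);
              vf_put(vf);
      }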
    • ice: introduce VF accessor functions · fb916db1
      Jacob Keller authored
      Before we switch the VF data structure storage mechanism to a hash,
      introduce new accessor functions to define the new interface.
      
      * ice_get_vf_by_id is a function used to obtain a reference to a VF from
        the table based on its VF ID
      * ice_has_vfs is used to quickly check if any VFs are configured
      * ice_get_num_vfs is used to get an exact count of how many VFs are
        configured
      
      We can drop the old ice_validate_vf_id function, since every caller was
      just going to immediately access the VF table to get a reference
      anyway. This way we simply use the single ice_get_vf_by_id both to
      validate that the VF ID is within range and to check that a VF with that
      ID exists.
      
      This change enables us to more easily convert the codebase to the hash
      table since most callers now properly use the interface.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      fb916db1
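      A hedged usage sketch of the new accessors: a caller validates and pins the VF in one step instead of indexing a raw array after a separate ID check. The exact signatures are approximations.

      static int ice_example_vf_op(struct ice_pf *pf, u16 vf_id)
      {
              struct ice_vf *vf;
              int err = 0;

              if (!ice_has_vfs(pf))
                      return -ENOENT;                 /* no VFs configured */

              vf = ice_get_vf_by_id(pf, vf_id);       /* takes a reference */
              if (!vf)
                      return -EINVAL;                 /* no VF with this ID */

              /* ... operate on the VF while holding the reference ... */

              ice_put_vf(vf);                         /* drop the reference */
              return err;
      }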
    • ice: factor VF variables to separate structure · 000773c0
      Jacob Keller authored
      We maintain a number of values for VFs within the ice_pf structure. This
      includes the VF table, the number of allocated VFs, the maximum number
      of supported SR-IOV VFs, the number of queue pairs per VF, the number of
      MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
      
      We're about to add a few more variables to this list. Clean this up
      first by extracting these members out into a new ice_vfs structure
      defined in ice_virtchnl_pf.h
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      000773c0
    • ice: convert ice_for_each_vf to include VF entry iterator · c4c2c7db
      Jacob Keller authored
      The ice_for_each_vf macro is intended to be used to loop over all VFs.
      The current implementation relies on an iterator that is the index into
      the VF array in the PF structure. This forces all users to perform a
      lookup themselves.

      This abstraction forces a lot of duplicate work on callers and leaks the
      interface implementation to the caller. Replace this with an
      implementation that includes the VF pointer as the primary iterator. This
      version simplifies callers which just want to iterate over every VF, as
      they no longer need to perform their own lookup.
      
      The "i" iterator value is replaced with a new unsigned int "bkt"
      parameter, as this will match the necessary interface for replacing
      the VF array with a hash table. For now, the bkt is the VF ID, but in
      the future it will simply be the hash bucket index. Document that it
      should not be treated as a VF ID.
      
      This change aims to simplify switching from the array to a hash table. I
      considered alternative implementations such as an xarray but decided
      that the hash table was the simplest and most suitable implementation. I
      also looked at methods to hide the bkt iterator entirely, but I couldn't
      come up with a feasible solution that worked for hash table iterators.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      c4c2c7db
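      A hedged sketch of the iterator's new shape: the loop hands callers the VF pointer directly, and "bkt" is only the hash-bucket cursor, not a VF ID. Macro and field names approximate the driver's.

      #define ice_for_each_vf_sketch(pf, bkt, vf) \
              hash_for_each((pf)->vfs.table, (bkt), (vf), entry)

      static void ice_example_vf_loop(struct ice_pf *pf)
      {
              struct ice_vf *vf;
              unsigned int bkt;

              mutex_lock(&pf->vfs.table_lock);        /* sleeping context */
              ice_for_each_vf_sketch(pf, bkt, vf) {
                      /* work directly with "vf"; no per-ID lookup needed */
              }
              mutex_unlock(&pf->vfs.table_lock);
      }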
    • ice: store VF pointer instead of VF ID · b03d519d
      Jacob Keller authored
      The VSI structure contains a vf_id field used to associate a VSI with a
      VF. This is used mainly for ICE_VSI_VF as well as partially for
      ICE_VSI_CTRL associated with the VFs.
      
      This API was designed with the idea that VFs are stored in a simple
      array that was expected to be static throughout most of the driver's
      life.
      
      We plan on refactoring VF storage in a few key ways:
      
        1) converting from a simple static array to a hash table
        2) using krefs to track VF references obtained from the hash table
        3) use RCU to delay release of VF memory until after all references
           are dropped
      
      This is motivated by the goal to ensure that the lifetime of VF
      structures is accounted for, and prevent various use-after-free bugs.
      
      With the existing vsi->vf_id, the reference tracking for VFs would
      become somewhat convoluted, because each VSI maintains a vf_id field
      which will then require performing a lookup. This means all these flows
      will require reference tracking and proper usage of rcu_read_lock, etc.
      
      We know that the VF VSI will always be backed by a valid VF structure,
      because the VSI is created during VF initialization and removed before
      the VF is destroyed. Rely on this and store a reference to the VF in the
      VSI structure instead of storing a VF ID. This will simplify the usage
      and avoid the need to perform lookups on the hash table in the future.
      
      For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
      ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
      vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
      debugging code which violates this assumption and avoid more disastrous
      panics.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      b03d519d
  24. 03 Mar 2022, 1 commit
  25. 14 Feb 2022, 2 commits