1. 16 Jul 2022, 1 commit
  2. 22 Jun 2022, 1 commit
    • ice: ethtool: Prohibit improper channel config for DCB · a632b2a4
      Anatolii Gerasymenko committed
      Do not allow setting fewer channels via ethtool than there are Traffic
      Classes. There must be at least one channel per Traffic Class.

      If fewer channels are set than there are Traffic Classes, then during
      ice_vsi_rebuild only the requested number of tx/rx rings is allocated
      in ice_vsi_alloc_arrays. Later, ice_vsi_setup_q_map requests at least
      one channel per Traffic Class. This results in num_rxq > alloc_rxq and
      num_txq > alloc_txq. ice_vsi_map_rings_to_vectors then hits a NULL
      pointer dereference, because it goes beyond the end of the rx_rings or
      tx_rings arrays.

      Change ice_set_channels() to return an error on an attempt to allocate
      fewer channels than there are Traffic Classes.
      Change ice_vsi_setup_q_map() and ice_vsi_setup_q_map_mqprio() to return
      a status code instead of void.
      Add error handling for ice_vsi_setup_q_map() and
      ice_vsi_setup_q_map_mqprio() in ice_vsi_init() and ice_vsi_cfg_tc().
      
      [53753.889983] INFO: Flow control is disabled for this traffic class (0) on this vsi.
      [53763.984862] BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
      [53763.992915] PGD 14b45f5067 P4D 0
      [53763.996444] Oops: 0002 [#1] SMP NOPTI
      [53764.000312] CPU: 12 PID: 30661 Comm: ethtool Kdump: loaded Tainted: GOE    --------- -  - 4.18.0-240.el8.x86_64 #1
      [53764.011825] Hardware name: Intel Corporation WilsonCity/WilsonCity, BIOS WLYDCRB1.SYS.0020.P21.2012150710 12/15/2020
      [53764.022584] RIP: 0010:ice_vsi_map_rings_to_vectors+0x7e/0x120 [ice]
      [53764.029089] Code: 41 0d 0f b7 b7 12 05 00 00 0f b6 d0 44 29 de 44 0f b7 c6 44 01 c2 41 39 d0 7d 2d 4c 8b 47 28 44 0f b7 ce 83 c6 01 4f 8b 04 c8 <49> 89 48 28 4c 8b 89 b8 01 00 00 4d 89 08 4c 89 81 b8 01 00 00 44
      [53764.048379] RSP: 0018:ff550dd88ea47b20 EFLAGS: 00010206
      [53764.053884] RAX: 0000000000000002 RBX: 0000000000000004 RCX: ff385ea42fa4a018
      [53764.061301] RDX: 0000000000000006 RSI: 0000000000000005 RDI: ff385e9baeedd018
      [53764.068717] RBP: 0000000000000010 R08: 0000000000000000 R09: 0000000000000004
      [53764.076133] R10: 0000000000000002 R11: 0000000000000004 R12: 0000000000000000
      [53764.083553] R13: 0000000000000000 R14: ff385e658fdd9000 R15: ff385e9baeedd018
      [53764.090976] FS:  000014872c5b5740(0000) GS:ff385e847f100000(0000) knlGS:0000000000000000
      [53764.099362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [53764.105409] CR2: 0000000000000028 CR3: 0000000a820fa002 CR4: 0000000000761ee0
      [53764.112851] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [53764.120301] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [53764.127747] PKRU: 55555554
      [53764.130781] Call Trace:
      [53764.133564]  ice_vsi_rebuild+0x611/0x870 [ice]
      [53764.138341]  ice_vsi_recfg_qs+0x94/0x100 [ice]
      [53764.143116]  ice_set_channels+0x1a8/0x3e0 [ice]
      [53764.147975]  ethtool_set_channels+0x14e/0x240
      [53764.152667]  dev_ethtool+0xd74/0x2a10
      [53764.156665]  ? __mod_lruvec_state+0x44/0x110
      [53764.161280]  ? __mod_lruvec_state+0x44/0x110
      [53764.165893]  ? page_add_file_rmap+0x15/0x170
      [53764.170518]  ? inet_ioctl+0xd1/0x220
      [53764.174445]  ? netdev_run_todo+0x5e/0x290
      [53764.178808]  dev_ioctl+0xb5/0x550
      [53764.182485]  sock_do_ioctl+0xa0/0x140
      [53764.186512]  sock_ioctl+0x1a8/0x300
      [53764.190367]  ? selinux_file_ioctl+0x161/0x200
      [53764.195090]  do_vfs_ioctl+0xa4/0x640
      [53764.199035]  ksys_ioctl+0x60/0x90
      [53764.202722]  __x64_sys_ioctl+0x16/0x20
      [53764.206845]  do_syscall_64+0x5b/0x1a0
      [53764.210887]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      Fixes: 87324e74 ("ice: Implement ethtool ops for channels")
      Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
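The check this commit describes can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `sketch_check_channels()` and its parameters are hypothetical names, and the real ice_set_channels() works on the VSI's traffic class configuration rather than plain integers.

```c
#include <errno.h>

/*
 * Hypothetical helper: reject a channel request that leaves fewer
 * Rx or Tx channels than there are Traffic Classes.
 * Returns 0 when the request is acceptable, -EINVAL otherwise.
 */
static int sketch_check_channels(unsigned int new_rx, unsigned int new_tx,
				 unsigned int num_tc)
{
	if (new_rx < num_tc || new_tx < num_tc)
		return -EINVAL;

	return 0;
}
```

Rejecting the request up front means the rebuild path never sees a ring count smaller than the number of Traffic Classes.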
  3. 10 Jun 2022, 1 commit
  4. 08 Jun 2022, 1 commit
  5. 18 May 2022, 1 commit
    • ice: Fix interrupt moderation settings getting cleared · bf13502e
      Michal Wilczynski committed
      Adaptive-rx and Adaptive-tx are interrupt moderation settings
      that can be enabled or disabled using ethtool:
      ethtool -C ethX adaptive-rx on/off adaptive-tx on/off

      Unfortunately, those settings get cleared after changing the
      number of queues, or 'channels' in ethtool terms:
      ethtool -L ethX rx 1 tx 1

      The clearing was caused by the introduction of bit fields in the
      ice_ring_container struct: only the itr_setting bits were restored
      during ice_vsi_rebuild_set_coalesce().

      Introduce an anonymous struct of bitfields and create a
      union to refer to them as a single variable.
      This way the variable can be easily saved and restored.
      
      Fixes: 61dc79ce ("ice: Restore interrupt throttle settings after VSI rebuild")
      Signed-off-by: Michal Wilczynski <michal.wilczynski@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
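The union-of-bitfields idea from this commit can be sketched as below. The struct name, field names and field widths are assumptions made for illustration, not the driver's actual layout; `uint16_t` bitfields are accepted by GCC and Clang but are implementation-defined in ISO C.

```c
#include <stdint.h>

/* Hypothetical layout; names and widths are illustrative only. */
struct sketch_ring_container {
	union {
		struct {
			uint16_t itr_setting:13;
			uint16_t itr_reserved:2;
			uint16_t itr_mode:1;
		};
		uint16_t itr_settings;	/* all bitfields viewed as one value */
	};
};

/* Save every bitfield at once through the single-variable view. */
static uint16_t sketch_save_itr(const struct sketch_ring_container *rc)
{
	return rc->itr_settings;
}

/* Restore every bitfield at once, so no individual field is forgotten. */
static void sketch_restore_itr(struct sketch_ring_container *rc,
			       uint16_t saved)
{
	rc->itr_settings = saved;
}
```

Because the save and restore go through one 16-bit value, a field added to the anonymous struct later is covered automatically as long as it fits in the same storage.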
  6. 09 Apr 2022, 1 commit
    • ice: arfs: fix use-after-free when freeing @rx_cpu_rmap · d7442f51
      Alexander Lobakin committed
      The CI testing bots triggered the following splat:
      
      [  718.203054] BUG: KASAN: use-after-free in free_irq_cpu_rmap+0x53/0x80
      [  718.206349] Read of size 4 at addr ffff8881bd127e00 by task sh/20834
      [  718.212852] CPU: 28 PID: 20834 Comm: sh Kdump: loaded Tainted: G S      W IOE     5.17.0-rc8_nextqueue-devqueue-02643-g23f3121aca93 #1
      [  718.219695] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
      [  718.223418] Call Trace:
      [  718.227139]
      [  718.230783]  dump_stack_lvl+0x33/0x42
      [  718.234431]  print_address_description.constprop.9+0x21/0x170
      [  718.238177]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.241885]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.245539]  kasan_report.cold.18+0x7f/0x11b
      [  718.249197]  ? free_irq_cpu_rmap+0x53/0x80
      [  718.252852]  free_irq_cpu_rmap+0x53/0x80
      [  718.256471]  ice_free_cpu_rx_rmap.part.11+0x37/0x50 [ice]
      [  718.260174]  ice_remove_arfs+0x5f/0x70 [ice]
      [  718.263810]  ice_rebuild_arfs+0x3b/0x70 [ice]
      [  718.267419]  ice_rebuild+0x39c/0xb60 [ice]
      [  718.270974]  ? asm_sysvec_apic_timer_interrupt+0x12/0x20
      [  718.274472]  ? ice_init_phy_user_cfg+0x360/0x360 [ice]
      [  718.278033]  ? delay_tsc+0x4a/0xb0
      [  718.281513]  ? preempt_count_sub+0x14/0xc0
      [  718.284984]  ? delay_tsc+0x8f/0xb0
      [  718.288463]  ice_do_reset+0x92/0xf0 [ice]
      [  718.292014]  ice_pci_err_resume+0x91/0xf0 [ice]
      [  718.295561]  pci_reset_function+0x53/0x80
      <...>
      [  718.393035] Allocated by task 690:
      [  718.433497] Freed by task 20834:
      [  718.495688] Last potentially related work creation:
      [  718.568966] The buggy address belongs to the object at ffff8881bd127e00
                      which belongs to the cache kmalloc-96 of size 96
      [  718.574085] The buggy address is located 0 bytes inside of
                      96-byte region [ffff8881bd127e00, ffff8881bd127e60)
      [  718.579265] The buggy address belongs to the page:
      [  718.598905] Memory state around the buggy address:
      [  718.601809]  ffff8881bd127d00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  718.604796]  ffff8881bd127d80: 00 00 00 00 00 00 00 00 00 00 fc fc fc fc fc fc
      [  718.607794] >ffff8881bd127e00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      [  718.610811]                    ^
      [  718.613819]  ffff8881bd127e80: 00 00 00 00 00 00 00 00 00 00 00 00 fc fc fc fc
      [  718.617107]  ffff8881bd127f00: fa fb fb fb fb fb fb fb fb fb fb fb fc fc fc fc
      
      This happens because free_irq_cpu_rmap() is always called
      *after* (devm_)free_irq(), so it tries to work with IRQ descs
      that have already been freed. For example, on device reset the
      driver frees the rmap right before allocating a new one (the splat
      above).
      Make the rmap creation and freeing functions symmetrical with the
      {request,free}_irq() calls, i.e. do that on ifup/ifdown instead
      of device probe/remove/resume. These operations can be performed
      independently of the actual device aRFS configuration.
      Also, make sure ice_vsi_free_irq() clears IRQ affinity notifiers
      only when aRFS is disabled; otherwise, the CPU rmap sets and clears
      its own notifiers, and they must not be touched manually.
      
      Fixes: 28bf2672 ("ice: Implement aRFS")
      Co-developed-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Tested-by: Ivan Vecera <ivecera@redhat.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  7. 05 Apr 2022, 1 commit
    • ice: Set txq_teid to ICE_INVAL_TEID on ring creation · ccfee182
      Anatolii Gerasymenko committed
      When a VF is freshly created but not brought up, the ring->txq_teid
      value is set to 0 by default.
      But 0 is a valid TEID. On some platforms the Root Node of the
      Tx scheduler has TEID = 0. This can cause issues as shown below.
      
      The proper way is to set ring->txq_teid to ICE_INVAL_TEID (0xFFFFFFFF).
      
      Testing Hints:
      echo 1 > /sys/class/net/ens785f0/device/sriov_numvfs
      ip link set dev ens785f0v0 up
      ip link set dev ens785f0v0 down
      
      If a freshly created VF is quickly turned on and off, so that there
      is no time to reach the VIRTCHNL_OP_CONFIG_VSI_QUEUES stage, then the
      VIRTCHNL_OP_DISABLE_QUEUES stage fails with the error:
      [  639.531454] disable queue 89 failed 14
      [  639.532233] Failed to disable LAN Tx queues, error: ICE_ERR_AQ_ERROR
      [  639.533107] ice 0000:02:00.0: Failed to stop Tx ring 0 on VSI 5
      
      The reason for the failure is that we try to send an AQ command to
      delete queue 89, which has never been created, and receive an "invalid
      argument" error from firmware.

      As this queue has never been created, its TEID and ring->txq_teid
      have the default value 0.
      ice_dis_vsi_txq has a check against non-existent queues:
      
      node = ice_sched_find_node_by_teid(pi->root, q_teids[i]);
      if (!node)
      	continue;
      
      But on some platforms the Root Node of Tx scheduler has a teid = 0.
      Hence, ice_sched_find_node_by_teid finds a node with teid = 0 (it is
      pi->root), and we go further to submit an erroneous request to firmware.
      
      Fixes: 37bb8390 ("ice: Move common functions out of ice_main.c part 7/7")
      Signed-off-by: Anatolii Gerasymenko <anatolii.gerasymenko@intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
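Why 0 is a poor "no queue" marker can be shown with a small sketch. The node type, the lookup function and `SKETCH_INVAL_TEID` are hypothetical stand-ins for the scheduler tree, ice_sched_find_node_by_teid() and ICE_INVAL_TEID; only the value 0xFFFFFFFF is taken from the commit message.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_INVAL_TEID 0xFFFFFFFFu	/* mirrors ICE_INVAL_TEID's value */

/* Hypothetical scheduler node: first child plus next sibling. */
struct sketch_sched_node {
	uint32_t teid;
	struct sketch_sched_node *child;
	struct sketch_sched_node *sibling;
};

/* Depth-first search for a node with the given TEID. */
static struct sketch_sched_node *
sketch_find_node_by_teid(struct sketch_sched_node *node, uint32_t teid)
{
	struct sketch_sched_node *hit;

	for (; node; node = node->sibling) {
		if (node->teid == teid)
			return node;
		hit = sketch_find_node_by_teid(node->child, teid);
		if (hit)
			return hit;
	}

	return NULL;
}
```

With a root whose TEID is 0, a lookup for a never-configured ring that still holds 0 returns the root, so a "skip non-existent queues" check does not fire; a lookup for `SKETCH_INVAL_TEID` returns NULL and the queue is skipped.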
  8. 01 Apr 2022, 1 commit
    • ice: Clear default forwarding VSI during VSI release · bd8c624c
      Ivan Vecera committed
      A VSI is set as the default forwarding one when promisc mode is set
      for the PF interface, when the PF is switched to switchdev mode, or
      when the VF driver asks to enable allmulticast or promisc mode for the
      VF interface (when the vf-true-promisc-support priv flag is off).
      The third case is buggy because the VSI associated with the VF
      remains the default one after VF removal.
      
      Reproducer:
      1. Create VF
         echo 1 > /sys/class/net/ens7f0/device/sriov_numvfs
      2. Enable allmulticast or promisc mode on VF
         ip link set ens7f0v0 allmulticast on
         ip link set ens7f0v0 promisc on
      3. Delete VF
         echo 0 > /sys/class/net/ens7f0/device/sriov_numvfs
      4. Try to enable promisc mode on PF
         ip link set ens7f0 promisc on
      
      Although it looks as if promisc mode is enabled on the PF, the
      opposite is true, because ice_vsi_sync_fltr(), which is responsible
      for IFF_PROMISC handling, first checks whether any other VSI is set
      as the default forwarding one and, if so, does nothing. At this point
      it is not possible to enable promisc mode on the PF without re-probing
      the device.

      To resolve the issue, this patch clears the default forwarding VSI
      during ice_vsi_release() when the VSI to be released is the default
      one.
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Alice Michael <alice.michael@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
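The stale-default problem and its fix can be modelled in a few lines. This is a simplified sketch with hypothetical types and function names; the real driver tracks the default VSI in its switch info and does considerably more work in ice_vsi_sync_fltr() and ice_vsi_release().

```c
#include <stdbool.h>
#include <stddef.h>

struct sketch_vsi {
	int idx;
};

struct sketch_sw {
	struct sketch_vsi *dflt_vsi;	/* current default forwarding VSI */
};

/* Refuse to act when a different VSI is already the default one. */
static bool sketch_set_promisc(struct sketch_sw *sw, struct sketch_vsi *vsi)
{
	if (sw->dflt_vsi && sw->dflt_vsi != vsi)
		return false;

	sw->dflt_vsi = vsi;
	return true;
}

/* The fix: forget the default VSI when that VSI is being released. */
static void sketch_vsi_release(struct sketch_sw *sw, struct sketch_vsi *vsi)
{
	if (sw->dflt_vsi == vsi)
		sw->dflt_vsi = NULL;
}
```

Without the clearing step in the release function, the pointer would keep naming the removed VF's VSI and every later promisc request from the PF would be refused.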
  9. 10 Mar 2022, 1 commit
  10. 04 Mar 2022, 5 commits
    • ice: convert VF storage to hash table with krefs and RCU · 3d5985a1
      Jacob Keller committed
      The ice driver stores VF structures in a simple array which is allocated
      once at the time of VF creation. The VF structures are then accessed
      from the array by their VF ID. The ID must be between 0 and the number
      of allocated VFs.
      
      Multiple threads can access this table:
      
       * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
       * interrupts, such as due to messages from the VF using the virtchnl
         communication
       * processing such as device reset
       * commands to add or remove VFs
      
      The current implementation does not keep track of when all threads are
      done operating on a VF and can potentially result in use-after-free
      issues caused by one thread accessing a VF structure after it has been
      released when removing VFs. Some of these are prevented with various
      state flags and checks.
      
      In addition, this structure is quite static and does not support a
      planned future where virtualization can be more dynamic. As we begin to
      look at supporting Scalable IOV with the ice driver (as opposed to just
      supporting Single Root IOV), this structure is not sufficient.
      
      In the future, VFs will be able to be added and removed individually and
      dynamically.
      
      To allow for this, and to better protect against a whole class of
      use-after-free bugs, replace the VF storage with a combination of a hash
      table and krefs to reference track all of the accesses to VFs through
      the hash table.
      
      A hash table still allows efficient look up of the VF given its ID, but
      also allows adding and removing VFs. It does not require contiguous VF
      IDs.
      
      The use of krefs allows the cleanup of the VF memory to be delayed until
      after all threads have released their reference (by calling ice_put_vf).
      
      To prevent corruption of the hash table, a combination of RCU and the
      mutex table_lock is used. Addition and removal from the hash table use
      the RCU-aware hash macros. This allows simple read-only look ups that
      iterate to locate a single VF to be fast using RCU. Accesses which
      modify the hash table, or which can't take RCU because they sleep, will
      hold the mutex lock.
      
      By using this design, we have a stronger guarantee that the VF structure
      can't be released until after all threads are finished operating on it.
      We also pave the way for the more dynamic Scalable IOV implementation in
      the future.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
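The reference-counted lookup described in this commit can be approximated in user space as follows. This is a single-threaded sketch under stated assumptions: all names are hypothetical, a C11 atomic counter stands in for the kernel's kref, and the RCU and table_lock protection from the commit message is deliberately left out.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_VF_BUCKETS 8u

struct sketch_vf {
	unsigned int vf_id;
	atomic_int refcnt;		/* stand-in for a kref */
	struct sketch_vf *next;		/* hash chain */
};

struct sketch_vf_table {
	struct sketch_vf *buckets[SKETCH_VF_BUCKETS];
};

/* Drop one reference; free the VF when the last one goes away. */
static bool sketch_put_vf(struct sketch_vf *vf)
{
	if (atomic_fetch_sub(&vf->refcnt, 1) == 1) {
		free(vf);
		return true;
	}
	return false;
}

/* Look up a VF by ID and take a reference on it. */
static struct sketch_vf *sketch_get_vf_by_id(struct sketch_vf_table *t,
					      unsigned int vf_id)
{
	struct sketch_vf *vf;

	for (vf = t->buckets[vf_id % SKETCH_VF_BUCKETS]; vf; vf = vf->next) {
		if (vf->vf_id == vf_id) {
			atomic_fetch_add(&vf->refcnt, 1);
			return vf;
		}
	}
	return NULL;
}

/* Insert a new VF; the table itself holds the initial reference. */
static int sketch_add_vf(struct sketch_vf_table *t, unsigned int vf_id)
{
	struct sketch_vf *vf = malloc(sizeof(*vf));

	if (!vf)
		return -1;
	vf->vf_id = vf_id;
	atomic_init(&vf->refcnt, 1);
	vf->next = t->buckets[vf_id % SKETCH_VF_BUCKETS];
	t->buckets[vf_id % SKETCH_VF_BUCKETS] = vf;
	return 0;
}

/* Unlink a VF from the table and drop the table's reference. */
static void sketch_remove_vf(struct sketch_vf_table *t, unsigned int vf_id)
{
	struct sketch_vf **pp = &t->buckets[vf_id % SKETCH_VF_BUCKETS];

	for (; *pp; pp = &(*pp)->next) {
		if ((*pp)->vf_id == vf_id) {
			struct sketch_vf *vf = *pp;

			*pp = vf->next;
			sketch_put_vf(vf);
			return;
		}
	}
}
```

A caller that obtained a VF before it was removed can keep using it safely; the memory is released only when that caller's own put drops the count to zero.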
    • ice: introduce VF accessor functions · fb916db1
      Jacob Keller committed
      Before we switch the VF data structure storage mechanism to a hash,
      introduce new accessor functions to define the new interface.
      
      * ice_get_vf_by_id is a function used to obtain a reference to a VF from
        the table based on its VF ID
      * ice_has_vfs is used to quickly check if any VFs are configured
      * ice_get_num_vfs is used to get an exact count of how many VFs are
        configured
      
      We can drop the old ice_validate_vf_id function, since every caller was
      just going to immediately access the VF table to get a reference
      anyways. This way we simply use the single ice_get_vf_by_id to both
      validate the VF ID is within range and that there exists a VF with that
      ID.
      
      This change enables us to more easily convert the codebase to the hash
      table since most callers now properly use the interface.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: factor VF variables to separate structure · 000773c0
      Jacob Keller committed
      We maintain a number of values for VFs within the ice_pf structure. This
      includes the VF table, the number of allocated VFs, the maximum number
      of supported SR-IOV VFs, the number of queue pairs per VF, the number of
      MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
      
      We're about to add a few more variables to this list. Clean this up
      first by extracting these members out into a new ice_vfs structure
      defined in ice_virtchnl_pf.h
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: convert ice_for_each_vf to include VF entry iterator · c4c2c7db
      Jacob Keller committed
      The ice_for_each_vf macro is intended to be used to loop over all VFs.
      The current implementation relies on an iterator that is the index into
      the VF array in the PF structure. This forces all users to perform a
      look up themselves.
      
      This abstraction forces a lot of duplicate work on callers and leaks the
      interface implementation to the caller. Replace this with an
      implementation that includes the VF pointer as the primary iterator. This
      version simplifies callers which just want to iterate over every VF, as
      they no longer need to perform their own lookup.
      
      The "i" iterator value is replaced with a new unsigned int "bkt"
      parameter, as this will match the necessary interface for replacing
      the VF array with a hash table. For now, the bkt is the VF ID, but in
      the future it will simply be the hash bucket index. Document that it
      should not be treated as a VF ID.
      
      This change aims to simplify switching from the array to a hash table. I
      considered alternative implementations such as an xarray but decided
      that the hash table was the simplest and most suitable implementation. I
      also looked at methods to hide the bkt iterator entirely, but I couldn't
      come up with a feasible solution that worked for hash table iterators.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
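An iterator macro of the kind described here, one that hands the caller both a bucket index and the VF pointer, might look like the sketch below. It is array-backed and uses hypothetical names; it is not the driver's ice_for_each_vf definition.

```c
#include <stddef.h>

struct sketch_vf_entry {
	unsigned int vf_id;
};

struct sketch_vfs {
	struct sketch_vf_entry *table;
	unsigned int num_alloc;
};

/*
 * Iterate over every VF. "bkt" is only a cursor and should not be
 * treated as a VF ID; "entry" is the pointer callers actually use.
 */
#define sketch_for_each_vf(vfs, bkt, entry)			\
	for ((bkt) = 0, (entry) = (vfs)->table;			\
	     (bkt) < (vfs)->num_alloc;				\
	     (bkt)++, (entry)++)

/* Example caller: no per-iteration lookup is needed. */
static unsigned int sketch_sum_vf_ids(struct sketch_vfs *vfs)
{
	struct sketch_vf_entry *vf;
	unsigned int bkt, sum = 0;

	sketch_for_each_vf(vfs, bkt, vf)
		sum += vf->vf_id;

	return sum;
}
```

Keeping the same three-argument shape means callers would not have to change if the backing store later became a hash table.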
    • ice: store VF pointer instead of VF ID · b03d519d
      Jacob Keller committed
      The VSI structure contains a vf_id field used to associate a VSI with a
      VF. This is used mainly for ICE_VSI_VF as well as partially for
      ICE_VSI_CTRL associated with the VFs.
      
      This API was designed with the idea that VFs are stored in a simple
      array that was expected to be static throughout most of the driver's
      life.
      
      We plan on refactoring VF storage in a few key ways:
      
        1) converting from a simple static array to a hash table
        2) using krefs to track VF references obtained from the hash table
        3) using RCU to delay release of VF memory until after all references
           are dropped
      
      This is motivated by the goal to ensure that the lifetime of VF
      structures is accounted for, and prevent various use-after-free bugs.
      
      With the existing vsi->vf_id, the reference tracking for VFs would
      become somewhat convoluted, because each VSI maintains a vf_id field
      which will then require performing a look up. This means all these flows
      will require reference tracking and proper usage of rcu_read_lock, etc.
      
      We know that the VF VSI will always be backed by a valid VF structure,
      because the VSI is created during VF initialization and removed before
      the VF is destroyed. Rely on this and store a reference to the VF in the
      VSI structure instead of storing a VF ID. This will simplify the usage
      and avoid the need to perform lookups on the hash table in the future.
      
      For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
      ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
      vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
      debugging code which violates this assumption and avoid more disastrous
      panics.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  11. 03 Mar 2022, 1 commit
  12. 14 Feb 2022, 2 commits
  13. 10 Feb 2022, 9 commits
    • ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev · 1babaf77
      Brett Creeley committed
      In order for the driver to support 802.1ad VLAN filtering and offloads,
      it needs to advertise those VLAN features and also support modifying
      those VLAN features, so make the necessary changes to
      ice_set_netdev_features(). By default, enable CTAG insertion/stripping
      and CTAG filtering for both Single and Double VLAN Modes (SVM/DVM).
      Also, in DVM, enable STAG filtering by default. This is done by
      setting the feature bits in netdev->features. Also, in DVM, support
      toggling of STAG insertion/stripping, but don't enable them by
      default. This is done by setting the feature bits in
      netdev->hw_features.
      
      Since 802.1ad VLAN filtering and offloads are only supported in DVM, make
      sure they are not enabled by default and that they cannot be enabled
      during runtime, when the device is in SVM.
      
      Add an implementation for the ndo_fix_features() callback. This is
      needed since the hardware cannot support multiple VLAN ethertypes for
      VLAN insertion/stripping simultaneously and all supported VLAN filtering
      must either be enabled or disabled together.
      
      Disable inner VLAN stripping by default when DVM is enabled. If a VSI
      supports stripping the inner VLAN in DVM, then it will have to configure
      that during runtime. For example if a VF is configured in a port VLAN
      while DVM is enabled it will be allowed to offload inner VLANs.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
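The role of a fix-features callback for the ethertype restriction can be sketched as below. The `SK_*` bits are hypothetical stand-ins for the kernel's NETIF_F_HW_VLAN_* flags, and the policy of dropping the STAG offloads when both ethertypes are requested is an assumption made for this illustration, not something the commit message states.

```c
#include <stdint.h>

/* Hypothetical feature bits for illustration only. */
#define SK_CTAG_RX	(1u << 0)
#define SK_CTAG_TX	(1u << 1)
#define SK_STAG_RX	(1u << 2)
#define SK_STAG_TX	(1u << 3)

#define SK_CTAG_OFFLOADS	(SK_CTAG_RX | SK_CTAG_TX)
#define SK_STAG_OFFLOADS	(SK_STAG_RX | SK_STAG_TX)

/*
 * Adjust a requested feature set so that insertion/stripping is only
 * ever enabled for a single VLAN ethertype at a time.
 */
static uint32_t sketch_fix_features(uint32_t requested)
{
	if ((requested & SK_CTAG_OFFLOADS) && (requested & SK_STAG_OFFLOADS))
		requested &= ~SK_STAG_OFFLOADS;	/* assumed tie-break */

	return requested;
}
```

A request for STAG offloads alone passes through unchanged; only a mixed request is trimmed.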
    • ice: Add hot path support for 802.1Q and 802.1ad VLAN offloads · 0d54d8f7
      Brett Creeley committed
      Currently the driver only supports 802.1Q VLAN insertion and stripping.
      However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q
      and 802.1ad VLAN insertion and stripping will be supported. Unfortunately
      the VSI context parameters only allow for one VLAN ethertype at a time
      for VLAN offloads so only one or the other VLAN ethertype offload can be
      supported at once.
      
      To support this, multiple changes are needed.
      
      Rx path changes:
      
      [1] In DVM, the Rx queue context l2tagsel field needs to be cleared so
      the outermost tag shows up in the l2tag2_2nd field of the Rx flex
      descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain
      1 to support SVM configurations.
      
      [2] Modify the ice_test_staterr() function to take a __le16 instead of
      the ice_32b_rx_flex_desc union pointer so this function can be used for
      both rx_desc->wb.status_error0 and rx_desc->wb.status_error1.
      
      [3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that
      checks if there is a VLAN tag in l2tag1 or l2tag2_2nd.
      
      [4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX
      is enabled in netdev->features. If it is, then this is the VLAN
      ethertype that needs to be added to the stripping VLAN tag. Since
      ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled
      simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad.
      
      Tx path changes:
      
      [1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx
      context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was
      added to the list of tx_flags to handle this case.
      
      [2] When the stack requests the VLAN tag to be offloaded on Tx, the
      driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or
      ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1
      respectively. To determine which location to use, set a bit in the Tx
      ring flags field during ring allocation that can be used to determine
      which field to use in the Tx descriptor. In DVM, always use l2tag2,
      and in SVM, always use l2tag1.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
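Rx path items [2] and [3] can be illustrated with a simplified descriptor. The struct, the function names and the two "tag present" bit positions are hypothetical; the real positions come from the driver's Rx flex descriptor definitions, and the real fields are little-endian (`__le16`), which this host-endian sketch ignores.

```c
#include <stdint.h>

/* Hypothetical "tag present" bits. */
#define SK_STATUS0_L2TAG1P	(1u << 0)
#define SK_STATUS1_L2TAG2P	(1u << 1)

/* Simplified stand-in for the write-back part of the Rx descriptor. */
struct sketch_rx_desc {
	uint16_t status_error0;
	uint16_t status_error1;
	uint16_t l2tag1;
	uint16_t l2tag2_2nd;
};

/* Takes the status word itself, so it works for either status field. */
static int sketch_test_staterr(uint16_t status_err, uint16_t bits)
{
	return (status_err & bits) != 0;
}

/* Return the VLAN tag from whichever field holds one, or 0 if none. */
static uint16_t sketch_get_vlan_tag(const struct sketch_rx_desc *desc)
{
	if (sketch_test_staterr(desc->status_error0, SK_STATUS0_L2TAG1P))
		return desc->l2tag1;

	if (sketch_test_staterr(desc->status_error1, SK_STATUS1_L2TAG2P))
		return desc->l2tag2_2nd;

	return 0;
}
```

Passing the status word rather than the whole descriptor is what lets one test helper serve both status_error0 and status_error1.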
    • ice: Add outer_vlan_ops and VSI specific VLAN ops implementations · c31af68a
      Brett Creeley committed
      Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN
      ops are only available when the device is in Double VLAN Mode (DVM).
      Depending on the VSI type, the requirements for what operations to
      use/allow differ.
      
      By default all VSIs have unsupported inner and outer VSI VLAN ops. This
      implementation was chosen to prevent unexpected crashes due to null
      pointer dereferences. Instead, if a VSI calls an unsupported op, it will
      just return -EOPNOTSUPP.
      
      Add implementations to support modifying outer VLAN fields for VSI
      context. This includes the ability to modify VLAN stripping, insertion,
      and the port VLAN based on the outer VLAN handling fields of the VSI
      context.
      
      These functions should only ever be used if DVM is enabled because that
      means the firmware supports the outer VLAN fields in the VSI context. If
      the device is in DVM, then always use the outer_vlan_ops, else use the
      vlan_ops since the device is in Single VLAN Mode (SVM).
      
      Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to
      ice_vsi_vlan_setup() as the latter function is specific to the PF and
      all other VSI types that need an untagged VLAN 0 filter already do this
      in their specific flows. Without this change, Flow Director is failing
      to initialize because it does not implement any VSI VLAN ops.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
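The "unsupported by default" pattern for an ops table can be sketched like this. The ops struct, its two members and the function names are hypothetical and much smaller than the driver's VSI VLAN ops; only the idea of returning -EOPNOTSUPP instead of leaving NULL pointers comes from the commit message.

```c
#include <errno.h>
#include <stdint.h>

struct sketch_vlan {
	uint16_t vid;
};

/* Hypothetical, reduced ops table. */
struct sketch_vlan_ops {
	int (*add_vlan)(const struct sketch_vlan *vlan);
	int (*ena_stripping)(uint16_t tpid);
};

static int sketch_add_vlan_unsupported(const struct sketch_vlan *vlan)
{
	(void)vlan;
	return -EOPNOTSUPP;
}

static int sketch_ena_stripping_unsupported(uint16_t tpid)
{
	(void)tpid;
	return -EOPNOTSUPP;
}

/* Every op gets a safe stub, so calling one never dereferences NULL. */
static void sketch_init_unsupported_ops(struct sketch_vlan_ops *ops)
{
	ops->add_vlan = sketch_add_vlan_unsupported;
	ops->ena_stripping = sketch_ena_stripping_unsupported;
}
```

A VSI type that does support an operation would overwrite the matching pointer after this initialization; anything left untouched fails cleanly with an error code.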
    • ice: Adjust naming for inner VLAN operations · 7bd527aa
      Brett Creeley committed
      Current operations act on inner VLAN fields. To support double VLAN, outer
      VLAN operations and functions will be implemented. Add the "inner" naming
      to existing VLAN operations to distinguish them from the upcoming outer
      values and functions. Some spacing adjustments are made to align
      values.
      
      Note that "inner" here does not refer to a tunneled VLAN, but to the
      second VLAN in the packet. In SVM the driver uses inner or single VLAN
      filtering and offloads, and in Double VLAN Mode the driver uses the
      inner filtering and offloads for SR-IOV VFs in port VLANs in order to
      support offloading the guest VLAN while a port VLAN is configured.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Use the proto argument for VLAN ops · 2bfefa2d
      Brett Creeley committed
      Currently the proto argument is unused. This is because the driver only
      supports 802.1Q VLAN filtering. This policy is enforced via netdev
      features that the driver sets up when configuring the netdev, so the
      proto argument won't ever be anything other than 802.1Q. However, this
      will allow for future iterations of the driver to seamlessly support
      802.1ad filtering. Begin using the proto argument and extend the related
      structures to support its use.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Introduce ice_vlan struct · fb05ba12
      Brett Creeley committed
      Add a new struct for VLAN related information. Currently this holds
      VLAN ID and priority values, but will be expanded to hold TPID value.
      This reduces the changes necessary if any other values are added in
      future. Remove the action argument from these calls as it's always
      ICE_FWD_VSI.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Add new VSI VLAN ops · bc42afa9
      Brett Creeley committed
      Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and
      offloads require more flexibility when configuring VLANs. The VSI VLAN
      interface will allow flexibility for configuring VLANs for all VSI
      types. Add new files to separate the VSI VLAN ops and move functions to
      make the code more organized.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Add helper function for adding VLAN 0 · 3e0b5971
      Brett Creeley committed
      There are multiple places where VLAN 0 is being added. Create a function
      to be called in order to minimize changes as the implementation is expanded
      to support double VLAN and avoid duplicated code.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • B
      ice: Refactor spoofcheck configuration functions · daf4dd16
      Brett Creeley authored
      Add functions to configure Tx VLAN antispoof based on iproute
      configuration and/or VLAN mode and VF driver support. This is needed
      later so the driver can control when it can be configured. Also, add
      functions that can be used to enable and disable MAC and VLAN
      spoofcheck. Move spoofchk configuration during VSI setup into the
      SR-IOV initialization path and into the post VSI rebuild flow for VF
      VSIs.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      daf4dd16
  14. 30 Dec, 2021 1 commit
  15. 15 Dec, 2021 6 commits
  16. 23 Nov, 2021 1 commit
    • M
      ice: fix vsi->txq_map sizing · 792b2086
      Maciej Fijalkowski authored
      The approach of having an XDP queue per CPU regardless of the user's
      setting exposed a hidden bug that could occur when the Rx queue count
      differs from the Tx queue count. Currently vsi->txq_map's size is equal
      to double vsi->alloc_txq, which is not correct because XDP rings were
      previously based on the Rx queue count. The splat below can be seen
      when ethtool -L is used and XDP rings are configured:
      
      [  682.875339] BUG: kernel NULL pointer dereference, address: 000000000000000f
      [  682.883403] #PF: supervisor read access in kernel mode
      [  682.889345] #PF: error_code(0x0000) - not-present page
      [  682.895289] PGD 0 P4D 0
      [  682.898218] Oops: 0000 [#1] PREEMPT SMP PTI
      [  682.903055] CPU: 42 PID: 2878 Comm: ethtool Tainted: G           OE     5.15.0-rc5+ #1
      [  682.912214] Hardware name: Intel Corp. GRANTLEY/GRANTLEY, BIOS GRRFCRB1.86B.0276.D07.1605190235 05/19/2016
      [  682.923380] RIP: 0010:devres_remove+0x44/0x130
      [  682.928527] Code: 49 89 f4 55 48 89 fd 4c 89 ff 53 48 83 ec 10 e8 92 b9 49 00 48 8b 9d a8 02 00 00 48 8d 8d a0 02 00 00 49 89 c2 48 39 cb 74 0f <4c> 3b 63 10 74 25 48 8b 5b 08 48 39 cb 75 f1 4c 89 ff 4c 89 d6 e8
      [  682.950237] RSP: 0018:ffffc90006a679f0 EFLAGS: 00010002
      [  682.956285] RAX: 0000000000000286 RBX: ffffffffffffffff RCX: ffff88908343a370
      [  682.964538] RDX: 0000000000000001 RSI: ffffffff81690d60 RDI: 0000000000000000
      [  682.972789] RBP: ffff88908343a0d0 R08: 0000000000000000 R09: 0000000000000000
      [  682.981040] R10: 0000000000000286 R11: 3fffffffffffffff R12: ffffffff81690d60
      [  682.989282] R13: ffffffff81690a00 R14: ffff8890819807a8 R15: ffff88908343a36c
      [  682.997535] FS:  00007f08c7bfa740(0000) GS:ffff88a03fd00000(0000) knlGS:0000000000000000
      [  683.006910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  683.013557] CR2: 000000000000000f CR3: 0000001080a66003 CR4: 00000000003706e0
      [  683.021819] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  683.030075] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  683.038336] Call Trace:
      [  683.041167]  devm_kfree+0x33/0x50
      [  683.045004]  ice_vsi_free_arrays+0x5e/0xc0 [ice]
      [  683.050380]  ice_vsi_rebuild+0x4c8/0x750 [ice]
      [  683.055543]  ice_vsi_recfg_qs+0x9a/0x110 [ice]
      [  683.060697]  ice_set_channels+0x14f/0x290 [ice]
      [  683.065962]  ethnl_set_channels+0x333/0x3f0
      [  683.070807]  genl_family_rcv_msg_doit+0xea/0x150
      [  683.076152]  genl_rcv_msg+0xde/0x1d0
      [  683.080289]  ? channels_prepare_data+0x60/0x60
      [  683.085432]  ? genl_get_cmd+0xd0/0xd0
      [  683.089667]  netlink_rcv_skb+0x50/0xf0
      [  683.094006]  genl_rcv+0x24/0x40
      [  683.097638]  netlink_unicast+0x239/0x340
      [  683.102177]  netlink_sendmsg+0x22e/0x470
      [  683.106717]  sock_sendmsg+0x5e/0x60
      [  683.110756]  __sys_sendto+0xee/0x150
      [  683.114894]  ? handle_mm_fault+0xd0/0x2a0
      [  683.119535]  ? do_user_addr_fault+0x1f3/0x690
      [  683.134173]  __x64_sys_sendto+0x25/0x30
      [  683.148231]  do_syscall_64+0x3b/0xc0
      [  683.161992]  entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      Fix this by taking into account the value that num_possible_cpus()
      yields in addition to vsi->alloc_txq instead of doubling the latter.
      
      Fixes: efc2214b ("ice: Add support for XDP")
      Fixes: 22bf877e ("ice: introduce XDP_TX fallback path")
      Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      792b2086
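
The sizing change described in this message can be sketched as a comparison of the two formulas. This is an illustration only: the function names are hypothetical, and `possible_cpus` stands in for the value that `num_possible_cpus()` yields in the kernel.

```c
#include <stddef.h>

/* Old sizing per the message: double the allocated Tx queue count. */
static size_t txq_map_size_old(size_t alloc_txq)
{
	return 2 * alloc_txq;
}

/*
 * Fixed sizing per the message: allocated Tx queues plus one slot per
 * possible CPU, since XDP rings are created per CPU.
 */
static size_t txq_map_size_new(size_t alloc_txq, size_t possible_cpus)
{
	return alloc_txq + possible_cpus;
}
```

For example, with 4 allocated Tx queues on a machine with 64 possible CPUs, the old formula gives 8 entries while the per-CPU XDP rings alone need 64, so the fixed formula's 68 entries are required.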
  17. 30 Oct, 2021 1 commit
  18. 29 Oct, 2021 1 commit
  19. 21 Oct, 2021 2 commits
  20. 20 Oct, 2021 2 commits
    • J
      ice: fix rate limit update after coalesce change · d16a4f45
      Jesse Brandeburg authored
      If the adaptive settings are changed with
        ethtool -C ethx adaptive-rx off adaptive-tx off
      then the interrupt rate limit should be maintained as a user-set value,
      but only if BOTH adaptive settings are off. Fix a bug where the rate
      limit that was being used in adaptive mode stayed set in the
      register but was not reported correctly by ethtool -c ethx. Because of
      long lines, include a small refactor of the q_vector variable.
      
      Fixes: b8b47723 ("ice: refactor interrupt moderation writes")
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      d16a4f45
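
The condition stated in this message (keep the user-set rate limit only when both adaptive settings are off) can be written as a small predicate. The function name and parameters are hypothetical and do not come from the driver.

```c
#include <stdbool.h>

/*
 * Hypothetical predicate: the user-set interrupt rate limit applies only
 * when adaptive moderation is disabled for BOTH Rx and Tx.
 */
static bool user_rate_limit_applies(bool adaptive_rx, bool adaptive_tx)
{
	return !adaptive_rx && !adaptive_tx;
}
```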
    • J
      ice: update dim usage and moderation · d8eb7ad5
      Jesse Brandeburg authored
      The driver was having trouble with unreliable latency when doing single
      threaded ping-pong tests. This was traced to the DIM algorithm landing
      on an interrupt rate that was too slow, which caused high latency. The
      problem was especially noticeable when queues were being switched
      frequently by the scheduler, as happens on default setups today.
      
      To improve this, we raise the upper bound on the interrupt rate to a
      rate limit of 4 microseconds, which means that no vector can generate
      more than 250,000 interrupts per second. The old configuration allowed
      up to 100,000. The driver previously tried to program the rate limit
      too frequently, and if the receive and transmit sides were both active
      on the same vector, the INTRL would be set incorrectly. This change
      fixes that issue as a side effect of the redesign.
      
      This driver will operate from now on with a slightly changed DIM table
      with more emphasis towards latency sensitivity by having more table
      entries with lower latency than with high latency (high being >= 64
      microseconds).
      
      The driver also resets the DIM algorithm state with a new stats set when
      there is no work done and the data becomes stale (older than 1 second),
      for the respective receive or transmit portion of the interrupt.
      
      Add a new helper for setting rate limit, which will be used more
      in a followup patch.
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      d8eb7ad5
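
The relationship between the rate limit interval and the interrupt ceiling quoted in this message is plain arithmetic, sketched below. The function name is hypothetical, and treating an interval of 0 as "no ceiling computed" is an assumption of this sketch, not driver behavior.

```c
#include <stdint.h>

/*
 * Convert a minimum spacing between interrupts (in microseconds) into the
 * maximum number of interrupts one vector can raise per second.
 * Returns 0 when the interval is 0 (no ceiling is computed in this sketch).
 */
static uint32_t max_interrupts_per_sec(uint32_t interval_usecs)
{
	if (interval_usecs == 0)
		return 0;

	return 1000000u / interval_usecs;
}
```

With a 4 microsecond interval this gives the 250,000 interrupts per second stated in the message. By the same arithmetic, the old ceiling of 100,000 corresponds to a 10 microsecond interval.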