1. 15 3月, 2022 1 次提交
    • J
      ice: remove circular header dependencies on ice.h · 649c87c6
      Jacob Keller 提交于
      Several headers in the ice driver include ice.h even though they are
      themselves included by that header. The most notable of these is
      ice_common.h, but several other headers also do this.
      
      Such a recursive inclusion is problematic as it forces headers to be
      included in a strict order, otherwise compilation errors can result. The
      circular inclusions do not trigger an endless loop due to standard
      header inclusion guards, however other errors can occur.
      
      For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
      ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.
      
      ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
      finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
      creates a circular dependency.
      
      The definition in ice_sriov.h requires things from ice_flow.h, but
      ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
      process for expanding ice.h. The current code avoids this issue by
      having an implicit dependency without the include of ice_flow.h.
      
      If we were to fix that so that ice_sriov.h explicitly depends on
      ice_flow.h the following pattern would occur:
      
        ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h
      
      At this point, during the expansion of, the header guard for ice_flow.h
      is already set, so when ice_sriov.h attempts to load the ice_flow.h
      header it is skipped. Then, we go on to begin including the rest of
      ice_sriov.h, including structure definitions which depend on
      ice_rss_hash_cfg. This produces a compiler warning because
      ice_rss_hash_cfg hasn't yet been included. Remember, we're just at the
      start of ice_flow.h!
      
      If the order of headers is incorrect (ice_flow.h is not implicitly
      loaded first in all files which include ice_sriov.h) then we get the
      same failure.
      
      Removing this recursive inclusion requires fixing a few cases where some
      headers depended on the header inclusions from ice.h. In addition, a few
      other changes are also required.
      
      Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
      which is the likely reason that ice_common.h includes ice.h at all. This
      macro implementation requires the full definition of ice_pf in order to
      properly compile.
      
      Fix this by moving it to a function declared in ice_main.c, so that we
      do not require all files to depend on the layout of the ice_pf
      structure.
      
      Note that this change only fixes circular dependencies, but it does not
      fully resolve all implicit dependencies where one header may depend on
      the inclusion of another. I tried to fix as many of the implicit
      dependencies as I noticed, but fixing them all requires a somewhat
      tedious analysis of each header and attempting to compile it separately.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      649c87c6
  2. 11 3月, 2022 1 次提交
    • I
      ice: Fix race condition during interface enslave · 5cb1ebdb
      Ivan Vecera 提交于
      Commit 5dbbbd01 ("ice: Avoid RTNL lock when re-creating
      auxiliary device") changes a process of re-creation of aux device
      so ice_plug_aux_dev() is called from ice_service_task() context.
      This unfortunately opens a race window that can result in dead-lock
      when interface has left LAG and immediately enters LAG again.
      
      Reproducer:
      ```
      #!/bin/sh
      
      ip link add lag0 type bond mode 1 miimon 100
      ip link set lag0
      
      for n in {1..10}; do
              echo Cycle: $n
              ip link set ens7f0 master lag0
              sleep 1
              ip link set ens7f0 nomaster
      done
      ```
      
      This results in:
      [20976.208697] Workqueue: ice ice_service_task [ice]
      [20976.213422] Call Trace:
      [20976.215871]  __schedule+0x2d1/0x830
      [20976.219364]  schedule+0x35/0xa0
      [20976.222510]  schedule_preempt_disabled+0xa/0x10
      [20976.227043]  __mutex_lock.isra.7+0x310/0x420
      [20976.235071]  enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
      [20976.251215]  ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
      [20976.256192]  ib_cache_setup_one+0x33/0xa0 [ib_core]
      [20976.261079]  ib_register_device+0x40d/0x580 [ib_core]
      [20976.266139]  irdma_ib_register_device+0x129/0x250 [irdma]
      [20976.281409]  irdma_probe+0x2c1/0x360 [irdma]
      [20976.285691]  auxiliary_bus_probe+0x45/0x70
      [20976.289790]  really_probe+0x1f2/0x480
      [20976.298509]  driver_probe_device+0x49/0xc0
      [20976.302609]  bus_for_each_drv+0x79/0xc0
      [20976.306448]  __device_attach+0xdc/0x160
      [20976.310286]  bus_probe_device+0x9d/0xb0
      [20976.314128]  device_add+0x43c/0x890
      [20976.321287]  __auxiliary_device_add+0x43/0x60
      [20976.325644]  ice_plug_aux_dev+0xb2/0x100 [ice]
      [20976.330109]  ice_service_task+0xd0c/0xed0 [ice]
      [20976.342591]  process_one_work+0x1a7/0x360
      [20976.350536]  worker_thread+0x30/0x390
      [20976.358128]  kthread+0x10a/0x120
      [20976.365547]  ret_from_fork+0x1f/0x40
      ...
      [20976.438030] task:ip              state:D stack:    0 pid:213658 ppid:213627 flags:0x00004084
      [20976.446469] Call Trace:
      [20976.448921]  __schedule+0x2d1/0x830
      [20976.452414]  schedule+0x35/0xa0
      [20976.455559]  schedule_preempt_disabled+0xa/0x10
      [20976.460090]  __mutex_lock.isra.7+0x310/0x420
      [20976.464364]  device_del+0x36/0x3c0
      [20976.467772]  ice_unplug_aux_dev+0x1a/0x40 [ice]
      [20976.472313]  ice_lag_event_handler+0x2a2/0x520 [ice]
      [20976.477288]  notifier_call_chain+0x47/0x70
      [20976.481386]  __netdev_upper_dev_link+0x18b/0x280
      [20976.489845]  bond_enslave+0xe05/0x1790 [bonding]
      [20976.494475]  do_setlink+0x336/0xf50
      [20976.502517]  __rtnl_newlink+0x529/0x8b0
      [20976.543441]  rtnl_newlink+0x43/0x60
      [20976.546934]  rtnetlink_rcv_msg+0x2b1/0x360
      [20976.559238]  netlink_rcv_skb+0x4c/0x120
      [20976.563079]  netlink_unicast+0x196/0x230
      [20976.567005]  netlink_sendmsg+0x204/0x3d0
      [20976.570930]  sock_sendmsg+0x4c/0x50
      [20976.574423]  ____sys_sendmsg+0x1eb/0x250
      [20976.586807]  ___sys_sendmsg+0x7c/0xc0
      [20976.606353]  __sys_sendmsg+0x57/0xa0
      [20976.609930]  do_syscall_64+0x5b/0x1a0
      [20976.613598]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev()
         is called from ice_service_task() context, aux device is created
         and associated device->lock is taken.
      2. Command 'ip link ... set master...' calls ice's notifier under
         RTNL lock and that notifier calls ice_unplug_aux_dev(). That
         function tries to take aux device->lock but this is already taken
         by ice_plug_aux_dev() in step 1
      3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
         taken in step 2
      4. Dead-lock
      
      The patch fixes this issue by following changes:
      - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev()
        call in ice_service_task()
      - The bit is checked in ice_clear_rdma_cap() and only if it is not set
        then ice_unplug_aux_dev() is called. If it is set (in other words
        plugging of aux device was requested and ice_plug_aux_dev() is
        potentially running) then the function only clears the bit
      - Once ice_plug_aux_dev() call (in ice_service_task) is finished
        the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked
        whether it was already cleared by ice_clear_rdma_cap(). If so then
        aux device is unplugged.
      Signed-off-by: NIvan Vecera <ivecera@redhat.com>
      Co-developed-by: NPetr Oros <poros@redhat.com>
      Signed-off-by: NPetr Oros <poros@redhat.com>
      Reviewed-by: NDave Ertman <david.m.ertman@intel.com>
      Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      5cb1ebdb
  3. 10 3月, 2022 1 次提交
  4. 09 3月, 2022 2 次提交
  5. 04 3月, 2022 4 次提交
    • J
      ice: convert VF storage to hash table with krefs and RCU · 3d5985a1
      Jacob Keller 提交于
      The ice driver stores VF structures in a simple array which is allocated
      once at the time of VF creation. The VF structures are then accessed
      from the array by their VF ID. The ID must be between 0 and the number
      of allocated VFs.
      
      Multiple threads can access this table:
      
       * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
       * interrupts, such as due to messages from the VF using the virtchnl
         communication
       * processing such as device reset
       * commands to add or remove VFs
      
      The current implementation does not keep track of when all threads are
      done operating on a VF and can potentially result in use-after-free
      issues caused by one thread accessing a VF structure after it has been
      released when removing VFs. Some of these are prevented with various
      state flags and checks.
      
      In addition, this structure is quite static and does not support a
      planned future where virtualization can be more dynamic. As we begin to
      look at supporting Scalable IOV with the ice driver (as opposed to just
      supporting Single Root IOV), this structure is not sufficient.
      
      In the future, VFs will be able to be added and removed individually and
      dynamically.
      
      To allow for this, and to better protect against a whole class of
      use-after-free bugs, replace the VF storage with a combination of a hash
      table and krefs to reference track all of the accesses to VFs through
      the hash table.
      
      A hash table still allows efficient look up of the VF given its ID, but
      also allows adding and removing VFs. It does not require contiguous VF
      IDs.
      
      The use of krefs allows the cleanup of the VF memory to be delayed until
      after all threads have released their reference (by calling ice_put_vf).
      
      To prevent corruption of the hash table, a combination of RCU and the
      mutex table_lock are used. Addition and removal from the hash table use
      the RCU-aware hash macros. This allows simple read-only look ups that
      iterate to locate a single VF can be fast using RCU. Accesses which
      modify the hash table, or which can't take RCU because they sleep, will
      hold the mutex lock.
      
      By using this design, we have a stronger guarantee that the VF structure
      can't be released until after all threads are finished operating on it.
      We also pave the way for the more dynamic Scalable IOV implementation in
      the future.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      3d5985a1
    • J
      ice: factor VF variables to separate structure · 000773c0
      Jacob Keller 提交于
      We maintain a number of values for VFs within the ice_pf structure. This
      includes the VF table, the number of allocated VFs, the maximum number
      of supported SR-IOV VFs, the number of queue pairs per VF, the number of
      MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
      
      We're about to add a few more variables to this list. Clean this up
      first by extracting these members out into a new ice_vfs structure
      defined in ice_virtchnl_pf.h
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      000773c0
    • J
      ice: convert ice_for_each_vf to include VF entry iterator · c4c2c7db
      Jacob Keller 提交于
      The ice_for_each_vf macro is intended to be used to loop over all VFs.
      The current implementation relies on an iterator that is the index into
      the VF array in the PF structure. This forces all users to perform a
      look up themselves.
      
      This abstraction forces a lot of duplicate work on callers and leaks the
      interface implementation to the caller. Replace this with an
      implementation that includes the VF pointer the primary iterator. This
      version simplifies callers which just want to iterate over every VF, as
      they no longer need to perform their own lookup.
      
      The "i" iterator value is replaced with a new unsigned int "bkt"
      parameter, as this will match the necessary interface for replacing
      the VF array with a hash table. For now, the bkt is the VF ID, but in
      the future it will simply be the hash bucket index. Document that it
      should not be treated as a VF ID.
      
      This change aims to simplify switching from the array to a hash table. I
      considered alternative implementations such as an xarray but decided
      that the hash table was the simplest and most suitable implementation. I
      also looked at methods to hide the bkt iterator entirely, but I couldn't
      come up with a feasible solution that worked for hash table iterators.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      c4c2c7db
    • J
      ice: store VF pointer instead of VF ID · b03d519d
      Jacob Keller 提交于
      The VSI structure contains a vf_id field used to associate a VSI with a
      VF. This is used mainly for ICE_VSI_VF as well as partially for
      ICE_VSI_CTRL associated with the VFs.
      
      This API was designed with the idea that VFs are stored in a simple
      array that was expected to be static throughout most of the driver's
      life.
      
      We plan on refactoring VF storage in a few key ways:
      
        1) converting from a simple static array to a hash table
        2) using krefs to track VF references obtained from the hash table
        3) use RCU to delay release of VF memory until after all references
           are dropped
      
      This is motivated by the goal to ensure that the lifetime of VF
      structures is accounted for, and prevent various use-after-free bugs.
      
      With the existing vsi->vf_id, the reference tracking for VFs would
      become somewhat convoluted, because each VSI maintains a vf_id field
      which will then require performing a look up. This means all these flows
      will require reference tracking and proper usage of rcu_read_lock, etc.
      
      We know that the VF VSI will always be backed by a valid VF structure,
      because the VSI is created during VF initialization and removed before
      the VF is destroyed. Rely on this and store a reference to the VF in the
      VSI structure instead of storing a VF ID. This will simplify the usage
      and avoid the need to perform lookups on the hash table in the future.
      
      For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
      ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
      vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
      debugging code which violates this assumption and avoid more disastrous
      panics.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      b03d519d
  6. 03 3月, 2022 1 次提交
  7. 19 2月, 2022 1 次提交
    • J
      ice: fix concurrent reset and removal of VFs · fadead80
      Jacob Keller 提交于
      Commit c503e632 ("ice: Stop processing VF messages during teardown")
      introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
      intended to prevent some issues with concurrently handling messages from
      VFs while tearing down the VFs.
      
      This change was motivated by crashes caused while tearing down and
      bringing up VFs in rapid succession.
      
      It turns out that the fix actually introduces issues with the VF driver
      caused because the PF no longer responds to any messages sent by the VF
      during its .remove routine. This results in the VF potentially removing
      its DMA memory before the PF has shut down the device queues.
      
      Additionally, the fix doesn't actually resolve concurrency issues within
      the ice driver. It is possible for a VF to initiate a reset just prior
      to the ice driver removing VFs. This can result in the remove task
      concurrently operating while the VF is being reset. This results in
      similar memory corruption and panics purportedly fixed by that commit.
      
      Fix this concurrency at its root by protecting both the reset and
      removal flows using the existing VF cfg_lock. This ensures that we
      cannot remove the VF while any outstanding critical tasks such as a
      virtchnl message or a reset are occurring.
      
      This locking change also fixes the root cause originally fixed by commit
      c503e632 ("ice: Stop processing VF messages during teardown"), so we
      can simply revert it.
      
      Note that I kept these two changes together because simply reverting the
      original commit alone would leave the driver vulnerable to worse race
      conditions.
      
      Fixes: c503e632 ("ice: Stop processing VF messages during teardown")
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      fadead80
  8. 14 2月, 2022 1 次提交
  9. 11 2月, 2022 2 次提交
  10. 10 2月, 2022 7 次提交
    • B
      ice: Advertise 802.1ad VLAN filtering and offloads for PF netdev · 1babaf77
      Brett Creeley 提交于
      In order for the driver to support 802.1ad VLAN filtering and offloads,
      it needs to advertise those VLAN features and also support modifying
      those VLAN features, so make the necessary changes to
      ice_set_netdev_features(). By default, enable CTAG insertion/stripping
      and CTAG filtering for both Single and Double VLAN Modes (SVM/DVM).
      Also, in DVM, enable STAG filtering by default. This is done by
      setting the feature bits in netdev->features. Also, in DVM, support
      toggling of STAG insertion/stripping, but don't enable them by
      default. This is done by setting the feature bits in
      netdev->hw_features.
      
      Since 802.1ad VLAN filtering and offloads are only supported in DVM, make
      sure they are not enabled by default and that they cannot be enabled
      during runtime, when the device is in SVM.
      
      Add an implementation for the ndo_fix_features() callback. This is
      needed since the hardware cannot support multiple VLAN ethertypes for
      VLAN insertion/stripping simultaneously and all supported VLAN filtering
      must either be enabled or disabled together.
      
      Disable inner VLAN stripping by default when DVM is enabled. If a VSI
      supports stripping the inner VLAN in DVM, then it will have to configure
      that during runtime. For example if a VF is configured in a port VLAN
      while DVM is enabled it will be allowed to offload inner VLANs.
      Signed-off-by: NBrett Creeley <brett.creeley@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      1babaf77
    • B
      ice: Support configuring the device to Double VLAN Mode · a1ffafb0
      Brett Creeley 提交于
      In order to support configuring the device in Double VLAN Mode (DVM),
      the DDP and FW have to support DVM. If both support DVM, the PF that
      downloads the package needs to update the default recipes, set the
      VLAN mode, and update boost TCAM entries.
      
      To support updating the default recipes in DVM, add support for
      updating an existing switch recipe's lkup_idx and mask. This is done
      by first calling the get recipe AQ (0x0292) with the desired recipe
      ID. Then, if that is successful update one of the lookup indices
      (lkup_idx) and its associated mask if the mask is valid otherwise
      the already existing mask will be used.
      
      The VLAN mode of the device has to be configured while the global
      configuration lock is held while downloading the DDP, specifically after
      the DDP has been downloaded. If supported, the device will default to
      DVM.
      Co-developed-by: NDan Nowlin <dan.nowlin@intel.com>
      Signed-off-by: NDan Nowlin <dan.nowlin@intel.com>
      Signed-off-by: NBrett Creeley <brett.creeley@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      a1ffafb0
    • B
      ice: Add outer_vlan_ops and VSI specific VLAN ops implementations · c31af68a
      Brett Creeley 提交于
      Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN
      ops are only available when the device is in Double VLAN Mode (DVM).
      Depending on the VSI type, the requirements for what operations to
      use/allow differ.
      
      By default all VSI's have unsupported inner and outer VSI VLAN ops. This
      implementation was chosen to prevent unexpected crashes due to null
      pointer dereferences. Instead, if a VSI calls an unsupported op, it will
      just return -EOPNOTSUPP.
      
      Add implementations to support modifying outer VLAN fields for VSI
      context. This includes the ability to modify VLAN stripping, insertion,
      and the port VLAN based on the outer VLAN handling fields of the VSI
      context.
      
      These functions should only ever be used if DVM is enabled because that
      means the firmware supports the outer VLAN fields in the VSI context. If
      the device is in DVM, then always use the outer_vlan_ops, else use the
      vlan_ops since the device is in Single VLAN Mode (SVM).
      
      Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to
      ice_vsi_vlan_setup() as the latter function is specific to the PF and
      all other VSI types that need an untagged VLAN 0 filter already do this
      in their specific flows. Without this change, Flow Director is failing
      to initialize because it does not implement any VSI VLAN ops.
      Signed-off-by: NBrett Creeley <brett.creeley@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      c31af68a
    • B
      ice: Adjust naming for inner VLAN operations · 7bd527aa
      Brett Creeley 提交于
      Current operations act on inner VLAN fields. To support double VLAN, outer
      VLAN operations and functions will be implemented. Add the "inner" naming
      to existing VLAN operations to distinguish them from the upcoming outer
      values and functions. Some spacing adjustments are made to align
      values.
      
      Note that the inner is not talking about a tunneled VLAN, but the second
      VLAN in the packet. For SVM the driver uses inner or single VLAN
      filtering and offloads and in Double VLAN Mode the driver uses the
      inner filtering and offloads for SR-IOV VFs in port VLANs in order to
      support offloading the guest VLAN while a port VLAN is configured.
      Signed-off-by: NBrett Creeley <brett.creeley@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      7bd527aa
    • B
      ice: Use the proto argument for VLAN ops · 2bfefa2d
      Brett Creeley 提交于
      Currently the proto argument is unused. This is because the driver only
      supports 802.1Q VLAN filtering. This policy is enforced via netdev
      features that the driver sets up when configuring the netdev, so the
      proto argument won't ever be anything other than 802.1Q. However, this
      will allow for future iterations of the driver to seemlessly support
      802.1ad filtering. Begin using the proto argument and extend the related
      structures to support its use.
      Signed-off-by: NBrett Creeley <brett.creeley@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      2bfefa2d
    • B
      ice: Introduce ice_vlan struct · fb05ba12
      Brett Creeley 提交于
      Add a new struct for VLAN related information. Currently this holds
      VLAN ID and priority values, but will be expanded to hold TPID value.
      This reduces the changes necessary if any other values are added in
      future. Remove the action argument from these calls as it's always
      ICE_FWD_VSI.
      Signed-off-by: NBrett Creeley <brett.creeley@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      fb05ba12
    • B
      ice: Add new VSI VLAN ops · bc42afa9
      Brett Creeley 提交于
      Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and
      offloads require more flexibility when configuring VLANs. The VSI VLAN
      interface will allow flexibility for configuring VLANs for all VSI
      types. Add new files to separate the VSI VLAN ops and move functions to
      make the code more organized.
      Signed-off-by: NBrett Creeley <brett.creeley@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      bc42afa9
  11. 28 1月, 2022 2 次提交
  12. 07 1月, 2022 1 次提交
    • W
      ice: improve switchdev's slow-path · c1e5da5d
      Wojciech Drewek 提交于
      In current switchdev implementation, every VF PR is assigned to
      individual ring on switchdev ctrl VSI. For slow-path traffic, there
      is a mapping VF->ring done in software based on src_vsi value (by
      calling ice_eswitch_get_target_netdev function).
      
      With this change, HW solution is introduced which is more
      efficient. For each VF, src MAC (VF's MAC) filter will be created,
      which forwards packets to the corresponding switchdev ctrl VSI queue
      based on src MAC address.
      
      This filter has to be removed and then replayed in case of
      resetting one VF. Keep information about this rule in repr->mac_rule,
      thanks to that we know which rule has to be removed and replayed
      for a given VF.
      
      In case of CORE/GLOBAL all rules are removed
      automatically. We have to take care of readding them. This is done
      by ice_replay_vsi_adv_rule.
      
      When driver leaves switchdev mode, remove all advanced rules
      from switchdev ctrl VSI. This is done by ice_rem_adv_rule_for_vsi.
      
      Flag repr->rule_added is needed because in some cases reset
      might be triggered before VF sends request to add MAC.
      Co-developed-by: NGrzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: NGrzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: NWojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: NSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      c1e5da5d
  13. 30 12月, 2021 1 次提交
  14. 23 12月, 2021 1 次提交
  15. 22 12月, 2021 2 次提交
  16. 16 12月, 2021 2 次提交
    • J
      ice: tighter control over VSI_DOWN state · 21c6e36b
      Jesse Brandeburg 提交于
      The driver had comments to the effect of: This flag should be set before
      calling this function. While reviewing code it was found that there were
      several violations of this policy, which could introduce hard to find
      bugs or races.
      
      Fix the violations of the "VSI DOWN state must be set before calling
      ice_down" and make checking the state into code with a WARN_ON.
      Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      21c6e36b
    • J
      ice: support immediate firmware activation via devlink reload · 399e27db
      Jacob Keller 提交于
      The ice hardware contains an embedded chip with firmware which can be
      updated using devlink flash. The firmware which runs on this chip is
      referred to as the Embedded Management Processor firmware (EMP
      firmware).
      
      Activating the new firmware image currently requires that the system be
      rebooted. This is not ideal as rebooting the system can cause unwanted
      downtime.
      
      In practical terms, activating the firmware does not always require a
      full system reboot. In many cases it is possible to activate the EMP
      firmware immediately. There are a couple of different scenarios to
      cover.
      
       * The EMP firmware itself can be reloaded by issuing a special update
         to the device called an Embedded Management Processor reset (EMP
         reset). This reset causes the device to reset and reload the EMP
         firmware.
      
       * PCI configuration changes are only reloaded after a cold PCIe reset.
         Unfortunately there is no generic way to trigger this for a PCIe
         device without a system reboot.
      
      When performing a flash update, firmware is capable of responding with
      some information about the specific update requirements.
      
      The driver updates the flash by programming a secondary inactive bank
      with the contents of the new image, and then issuing a command to
      request to switch the active bank starting from the next load.
      
      The response to the final command for updating the inactive NVM flash
      bank includes an indication of the minimum reset required to fully
      update the device. This can be one of the following:
      
       * A full power on is required
       * A cold PCIe reset is required
       * An EMP reset is required
      
      The response to the command to switch flash banks includes an indication
      of whether or not the firmware will allow an EMP reset request.
      
      For most updates, an EMP reset is sufficient to load the new EMP
      firmware without issues. In some cases, this reset is not sufficient
      because the PCI configuration space has changed. When this could cause
      incompatibility with the new EMP image, the firmware is capable of
      rejecting the EMP reset request.
      
      Add logic to ice_fw_update.c to handle the response data flash update
      AdminQ commands.
      
      For the reset level, issue a devlink status notification informing the
      user of how to complete the update with a simple suggestion like
      "Activate new firmware by rebooting the system".
      
      Cache the status of whether or not firmware will restrict the EMP reset
      for use in implementing devlink reload.
      
      Implement support for devlink reload with the "fw_activate" flag. This
      allows user space to request the firmware be activated immediately.
      
      For the .reload_down handler, we will issue a request for the EMP reset
      using the appropriate firmware AdminQ command. If we know that the
      firmware will not allow an EMP reset, simply exit with a suitable
      netlink extended ACK message indicating that the EMP reset is not
      available.
      
      For the .reload_up handler, simply wait until the driver has finished
      resetting. Logic to handle processing of an EMP reset already exists in
      the driver as part of its reset and rebuild flows.
      
      Implement support for the devlink reload interface with the
      "fw_activate" action. This allows userspace to request activation of
      firmware without a reboot.
      
      Note that support for indicating the required reset and EMP reset
      restriction is not supported on old versions of firmware. The driver can
      determine if the two features are supported by checking the device
      capabilities report. I confirmed support has existed since at least
      version 5.5.2 as reported by the 'fw.mgmt' version. Support to issue the
      EMP reset request has existed in all version of the EMP firmware for the
      ice hardware.
      
      Check the device capabilities report to determine whether or not the
      indications are reported by the running firmware. If the reset
      requirement indication is not supported, always assume a full power on
      is necessary. If the reset restriction capability is not supported,
      always assume the EMP reset is available.
      
      Users can verify if the EMP reset has activated the firmware by using
      the devlink info report to check that the 'running' firmware version has
      updated. For example a user might do the following:
      
       # Check current version
       $ devlink dev info
      
       # Update the device
       $ devlink dev flash pci/0000:af:00.0 file firmware.bin
      
       # Confirm stored version updated
       $ devlink dev info
      
       # Reload to activate new firmware
       $ devlink dev reload pci/0000:af:00.0 action fw_activate
      
       # Confirm running version updated
       $ devlink dev info
      
      Finally, this change does *not* implement basic driver-only reload
      support. I did look into trying to do this. However, it requires
      significant refactor of how the ice driver probes and loads everything.
      The ice driver probe and allocation flows were not designed with such
      a reload in mind. Refactoring the flow to support this is beyond the
      scope of this change.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NGurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      399e27db
  17. 15 12月, 2021 8 次提交
  18. 09 12月, 2021 1 次提交
  19. 08 12月, 2021 1 次提交