1. 15 3月, 2022 6 次提交
    • J
      ice: rename ICE_MAX_VF_COUNT to avoid confusion · dc36796e
      Jacob Keller 提交于
      The ICE_MAX_VF_COUNT field is defined in ice_sriov.h. This count is true
      for SR-IOV but will not be true for all VF implementations, such as when
      the ice driver supports Scalable IOV.
      
      Rename this definition to clearly indicate ICE_MAX_SRIOV_VFS.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      dc36796e
    • J
      ice: remove unused definitions from ice_sriov.h · 00a57e29
      Jacob Keller 提交于
      A few more macros exist in ice_sriov.h which are not used anywhere.
      These can be safely removed. Note that ICE_VIRTCHNL_VF_CAP_L2 capability
      is set but never checked anywhere in the driver. Thus it is also safe to
      remove.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      00a57e29
    • J
      ice: convert vf->vc_ops to a const pointer · a7e11710
      Jacob Keller 提交于
      The vc_ops structure is used to allow different handlers for virtchnl
      commands when the driver is in representor mode. The current
      implementation uses a copy of the ops table in each VF, and modifies
      this copy dynamically.
      
      The usual practice in kernel code is to store the ops table in a
      constant structure and point to different versions. This has a number of
      advantages:
      
        1. Reduced memory usage. Each VF merely points to the correct table,
           so they're able to re-use the same constant lookup table in memory.
        2. Consistency. It becomes more difficult to accidentally update or
           edit only one op call. Instead, the code switches to the correct
           able by a single pointer write. In general this is atomic, either
           the pointer is updated or its not.
        3. Code Layout. The VF structure can store a pointer to the table
           without needing to have the full structure definition defined prior
           to the VF structure definition. This will aid in future refactoring
           of code by allowing the VF pointer to be kept in ice_vf_lib.h while
           the virtchnl ops table can be maintained in ice_virtchnl.h
      
      There is one major downside in the case of the vc_ops structure. Most of
      the operations in the table are the same between the two current
      implementations. This can appear to lead to duplication since each
      implementation must now fill in the complete table. It could make
      spotting the differences in the representor mode more challenging.
      Unfortunately, methods to make this less error prone either add
      complexity overhead (macros using CPP token concatenation) or don't work
      on all compilers we support (constant initializer from another constant
      structure).
      
      The cost of maintaining two structures does not out weigh the benefits
      of the constant table model.
      
      While we're making these changes, go ahead and rename the structure and
      implementations with "virtchnl" instead of "vc_vf_". This will more
      closely align with the planned file renaming, and avoid similar names when
      we later introduce a "vf ops" table for separating Scalable IOV and
      Single Root IOV implementations.
      
      Leave the accessor/assignment functions in order to avoid issues with
      compiling with options disabled. The interface makes it easier to handle
      when CONFIG_PCI_IOV is disabled in the kernel.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      a7e11710
    • J
      ice: remove circular header dependencies on ice.h · 649c87c6
      Jacob Keller 提交于
      Several headers in the ice driver include ice.h even though they are
      themselves included by that header. The most notable of these is
      ice_common.h, but several other headers also do this.
      
      Such a recursive inclusion is problematic as it forces headers to be
      included in a strict order, otherwise compilation errors can result. The
      circular inclusions do not trigger an endless loop due to standard
      header inclusion guards, however other errors can occur.
      
      For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
      ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.
      
      ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
      finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
      creates a circular dependency.
      
      The definition in ice_sriov.h requires things from ice_flow.h, but
      ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
      process for expanding ice.h. The current code avoids this issue by
      having an implicit dependency without the include of ice_flow.h.
      
      If we were to fix that so that ice_sriov.h explicitly depends on
      ice_flow.h the following pattern would occur:
      
        ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h
      
      At this point, during the expansion of, the header guard for ice_flow.h
      is already set, so when ice_sriov.h attempts to load the ice_flow.h
      header it is skipped. Then, we go on to begin including the rest of
      ice_sriov.h, including structure definitions which depend on
      ice_rss_hash_cfg. This produces a compiler warning because
      ice_rss_hash_cfg hasn't yet been included. Remember, we're just at the
      start of ice_flow.h!
      
      If the order of headers is incorrect (ice_flow.h is not implicitly
      loaded first in all files which include ice_sriov.h) then we get the
      same failure.
      
      Removing this recursive inclusion requires fixing a few cases where some
      headers depended on the header inclusions from ice.h. In addition, a few
      other changes are also required.
      
      Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
      which is the likely reason that ice_common.h includes ice.h at all. This
      macro implementation requires the full definition of ice_pf in order to
      properly compile.
      
      Fix this by moving it to a function declared in ice_main.c, so that we
      do not require all files to depend on the layout of the ice_pf
      structure.
      
      Note that this change only fixes circular dependencies, but it does not
      fully resolve all implicit dependencies where one header may depend on
      the inclusion of another. I tried to fix as many of the implicit
      dependencies as I noticed, but fixing them all requires a somewhat
      tedious analysis of each header and attempting to compile it separately.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      649c87c6
    • J
      ice: rename ice_virtchnl_pf.c to ice_sriov.c · 0deb0bf7
      Jacob Keller 提交于
      The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
      code for implementing Single Root IOV virtualization resides. This code
      includes support for bringing up and tearing down VFs, hooks into the
      kernel SR-IOV netdev operations, and for handling virtchnl messages from
      VFs.
      
      In the future, we plan to support Scalable IOV in addition to Single
      Root IOV as an alternative virtualization scheme. This implementation
      will re-use some but not all of the code in ice_virtchnl_pf.c
      
      To prepare for this future, we want to refactor and split up the code in
      ice_virtchnl_pf.c into the following scheme:
      
       * ice_vf_lib.[ch]
      
         Basic VF structures and accessors. This is where scheme-independent
         code will reside.
      
       * ice_virtchnl.[ch]
      
         Virtchnl message handling. This is where the bulk of the logic for
         processing messages from VFs using the virtchnl messaging scheme will
         reside. This is separated from ice_vf_lib.c because it is distinct
         and has a bulk of the processing code.
      
       * ice_sriov.[ch]
      
         Single Root IOV implementation, including initialization and the
         routines for interacting with SR-IOV based netdev operations.
      
       * (future) ice_siov.[ch]
      
         Scalable IOV implementation.
      
      As a first step, lets assume that all of the code in
      ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
      ice_sriov.c and its header to ice_sriov.h
      
      Future changes will further split out the code in these files following
      the plan outlined here.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      0deb0bf7
    • J
      ice: rename ice_sriov.c to ice_vf_mbx.c · d775155a
      Jacob Keller 提交于
      The ice_sriov.c file primarily contains code which handles the logic for
      mailbox overflow detection and some other utility functions related to
      the virtualization mailbox.
      
      The bulk of the SR-IOV implementation is actually found in
      ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific.
      
      In the future, the ice driver will support an additional virtualization
      scheme known as Scalable IOV, and the code in this file will be used
      for this alternative implementation.
      
      Rename this file (and its associated header) to ice_vf_mbx.c, so that we
      can later re-use the ice_sriov.c file as the SR-IOV specific file.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      d775155a
  2. 12 3月, 2022 2 次提交
  3. 11 3月, 2022 1 次提交
    • I
      ice: Fix race condition during interface enslave · 5cb1ebdb
      Ivan Vecera 提交于
      Commit 5dbbbd01 ("ice: Avoid RTNL lock when re-creating
      auxiliary device") changes a process of re-creation of aux device
      so ice_plug_aux_dev() is called from ice_service_task() context.
      This unfortunately opens a race window that can result in dead-lock
      when interface has left LAG and immediately enters LAG again.
      
      Reproducer:
      ```
      #!/bin/sh
      
      ip link add lag0 type bond mode 1 miimon 100
      ip link set lag0
      
      for n in {1..10}; do
              echo Cycle: $n
              ip link set ens7f0 master lag0
              sleep 1
              ip link set ens7f0 nomaster
      done
      ```
      
      This results in:
      [20976.208697] Workqueue: ice ice_service_task [ice]
      [20976.213422] Call Trace:
      [20976.215871]  __schedule+0x2d1/0x830
      [20976.219364]  schedule+0x35/0xa0
      [20976.222510]  schedule_preempt_disabled+0xa/0x10
      [20976.227043]  __mutex_lock.isra.7+0x310/0x420
      [20976.235071]  enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
      [20976.251215]  ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
      [20976.256192]  ib_cache_setup_one+0x33/0xa0 [ib_core]
      [20976.261079]  ib_register_device+0x40d/0x580 [ib_core]
      [20976.266139]  irdma_ib_register_device+0x129/0x250 [irdma]
      [20976.281409]  irdma_probe+0x2c1/0x360 [irdma]
      [20976.285691]  auxiliary_bus_probe+0x45/0x70
      [20976.289790]  really_probe+0x1f2/0x480
      [20976.298509]  driver_probe_device+0x49/0xc0
      [20976.302609]  bus_for_each_drv+0x79/0xc0
      [20976.306448]  __device_attach+0xdc/0x160
      [20976.310286]  bus_probe_device+0x9d/0xb0
      [20976.314128]  device_add+0x43c/0x890
      [20976.321287]  __auxiliary_device_add+0x43/0x60
      [20976.325644]  ice_plug_aux_dev+0xb2/0x100 [ice]
      [20976.330109]  ice_service_task+0xd0c/0xed0 [ice]
      [20976.342591]  process_one_work+0x1a7/0x360
      [20976.350536]  worker_thread+0x30/0x390
      [20976.358128]  kthread+0x10a/0x120
      [20976.365547]  ret_from_fork+0x1f/0x40
      ...
      [20976.438030] task:ip              state:D stack:    0 pid:213658 ppid:213627 flags:0x00004084
      [20976.446469] Call Trace:
      [20976.448921]  __schedule+0x2d1/0x830
      [20976.452414]  schedule+0x35/0xa0
      [20976.455559]  schedule_preempt_disabled+0xa/0x10
      [20976.460090]  __mutex_lock.isra.7+0x310/0x420
      [20976.464364]  device_del+0x36/0x3c0
      [20976.467772]  ice_unplug_aux_dev+0x1a/0x40 [ice]
      [20976.472313]  ice_lag_event_handler+0x2a2/0x520 [ice]
      [20976.477288]  notifier_call_chain+0x47/0x70
      [20976.481386]  __netdev_upper_dev_link+0x18b/0x280
      [20976.489845]  bond_enslave+0xe05/0x1790 [bonding]
      [20976.494475]  do_setlink+0x336/0xf50
      [20976.502517]  __rtnl_newlink+0x529/0x8b0
      [20976.543441]  rtnl_newlink+0x43/0x60
      [20976.546934]  rtnetlink_rcv_msg+0x2b1/0x360
      [20976.559238]  netlink_rcv_skb+0x4c/0x120
      [20976.563079]  netlink_unicast+0x196/0x230
      [20976.567005]  netlink_sendmsg+0x204/0x3d0
      [20976.570930]  sock_sendmsg+0x4c/0x50
      [20976.574423]  ____sys_sendmsg+0x1eb/0x250
      [20976.586807]  ___sys_sendmsg+0x7c/0xc0
      [20976.606353]  __sys_sendmsg+0x57/0xa0
      [20976.609930]  do_syscall_64+0x5b/0x1a0
      [20976.613598]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      1. Command 'ip link ... set nomaster' causes that ice_plug_aux_dev()
         is called from ice_service_task() context, aux device is created
         and associated device->lock is taken.
      2. Command 'ip link ... set master...' calls ice's notifier under
         RTNL lock and that notifier calls ice_unplug_aux_dev(). That
         function tries to take aux device->lock but this is already taken
         by ice_plug_aux_dev() in step 1
      3. Later ice_plug_aux_dev() tries to take RTNL lock but this is already
         taken in step 2
      4. Dead-lock
      
      The patch fixes this issue by following changes:
      - Bit ICE_FLAG_PLUG_AUX_DEV is kept to be set during ice_plug_aux_dev()
        call in ice_service_task()
      - The bit is checked in ice_clear_rdma_cap() and only if it is not set
        then ice_unplug_aux_dev() is called. If it is set (in other words
        plugging of aux device was requested and ice_plug_aux_dev() is
        potentially running) then the function only clears the bit
      - Once ice_plug_aux_dev() call (in ice_service_task) is finished
        the bit ICE_FLAG_PLUG_AUX_DEV is cleared but it is also checked
        whether it was already cleared by ice_clear_rdma_cap(). If so then
        aux device is unplugged.
      Signed-off-by: NIvan Vecera <ivecera@redhat.com>
      Co-developed-by: NPetr Oros <poros@redhat.com>
      Signed-off-by: NPetr Oros <poros@redhat.com>
      Reviewed-by: NDave Ertman <david.m.ertman@intel.com>
      Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      5cb1ebdb
  4. 10 3月, 2022 5 次提交
  5. 09 3月, 2022 4 次提交
    • J
      ice: Fix curr_link_speed advertised speed · ad35ffa2
      Jedrzej Jagielski 提交于
      Change curr_link_speed advertised speed, due to
      link_info.link_speed is not equal phy.curr_user_speed_req.
      Without this patch it is impossible to set advertised
      speed to same as link_speed.
      
      Testing Hints: Try to set advertised speed
      to 25G only with 25G default link (use ethtool -s 0x80000000)
      
      Fixes: 48cb27f2 ("ice: Implement handlers for ethtool PHY/link operations")
      Signed-off-by: NGrzegorz Siwik <grzegorz.siwik@intel.com>
      Signed-off-by: NJedrzej Jagielski <jedrzej.jagielski@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      ad35ffa2
    • C
      ice: Don't use GFP_KERNEL in atomic context · 3d97f1af
      Christophe JAILLET 提交于
      ice_misc_intr() is an irq handler. It should not sleep.
      
      Use GFP_ATOMIC instead of GFP_KERNEL when allocating some memory.
      
      Fixes: 348048e7 ("ice: Implement iidc operations")
      Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
      Tested-by: NLeszek Kaliszczuk <leszek.kaliszczuk@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      3d97f1af
    • D
      ice: Fix error with handling of bonding MTU · 97b01291
      Dave Ertman 提交于
      When a bonded interface is destroyed, .ndo_change_mtu can be called
      during the tear-down process while the RTNL lock is held.  This is a
      problem since the auxiliary driver linked to the LAN driver needs to be
      notified of the MTU change, and this requires grabbing a device_lock on
      the auxiliary_device's dev.  Currently this is being attempted in the
      same execution context as the call to .ndo_change_mtu which is causing a
      dead-lock.
      
      Move the notification of the changed MTU to a separate execution context
      (watchdog service task) and eliminate the "before" notification.
      
      Fixes: 348048e7 ("ice: Implement iidc operations")
      Signed-off-by: NDave Ertman <david.m.ertman@intel.com>
      Tested-by: NJonathan Toppins <jtoppins@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      97b01291
    • J
      ice: stop disabling VFs due to PF error responses · 79498d5a
      Jacob Keller 提交于
      The ice_vc_send_msg_to_vf function has logic to detect "failure"
      responses being sent to a VF. If a VF is sent more than
      ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled.
      Almost identical logic also existed in the i40e driver.
      
      This logic was added to the ice driver in commit 1071a835 ("ice:
      Implement virtchnl commands for AVF support") which itself copied from
      the i40e implementation in commit 5c3c48ac ("i40e: implement virtual
      device interface").
      
      Neither commit provides a proper explanation or justification of the
      check. In fact, later commits to i40e changed the logic to allow
      bypassing the check in some specific instances.
      
      The "logic" for this seems to be that error responses somehow indicate a
      malicious VF. This is not really true. The PF might be sending an error
      for any number of reasons such as lack of resources, etc.
      
      Additionally, this causes the PF to log an info message for every failed
      VF response which may confuse users, and can spam the kernel log.
      
      This behavior is not documented as part of any requirement for our
      products and other operating system drivers such as the FreeBSD
      implementation of our drivers do not include this type of check.
      
      In fact, the change from dev_err to dev_info in i40e commit 18b7af57
      ("i40e: Lower some message levels") explains that these messages
      typically don't actually indicate a real issue. It is quite likely that
      a user who hits this in practice will be very confused as the VF will be
      disabled without an obvious way to recover.
      
      We already have robust malicious driver detection logic using actual
      hardware detection mechanisms that detect and prevent invalid device
      usage. Remove the logic since its not a documented requirement and the
      behavior is not intuitive.
      
      Fixes: 1071a835 ("ice: Implement virtchnl commands for AVF support")
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      79498d5a
  6. 08 3月, 2022 1 次提交
  7. 04 3月, 2022 11 次提交
    • J
      ice: convert VF storage to hash table with krefs and RCU · 3d5985a1
      Jacob Keller 提交于
      The ice driver stores VF structures in a simple array which is allocated
      once at the time of VF creation. The VF structures are then accessed
      from the array by their VF ID. The ID must be between 0 and the number
      of allocated VFs.
      
      Multiple threads can access this table:
      
       * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
       * interrupts, such as due to messages from the VF using the virtchnl
         communication
       * processing such as device reset
       * commands to add or remove VFs
      
      The current implementation does not keep track of when all threads are
      done operating on a VF and can potentially result in use-after-free
      issues caused by one thread accessing a VF structure after it has been
      released when removing VFs. Some of these are prevented with various
      state flags and checks.
      
      In addition, this structure is quite static and does not support a
      planned future where virtualization can be more dynamic. As we begin to
      look at supporting Scalable IOV with the ice driver (as opposed to just
      supporting Single Root IOV), this structure is not sufficient.
      
      In the future, VFs will be able to be added and removed individually and
      dynamically.
      
      To allow for this, and to better protect against a whole class of
      use-after-free bugs, replace the VF storage with a combination of a hash
      table and krefs to reference track all of the accesses to VFs through
      the hash table.
      
      A hash table still allows efficient look up of the VF given its ID, but
      also allows adding and removing VFs. It does not require contiguous VF
      IDs.
      
      The use of krefs allows the cleanup of the VF memory to be delayed until
      after all threads have released their reference (by calling ice_put_vf).
      
      To prevent corruption of the hash table, a combination of RCU and the
      mutex table_lock are used. Addition and removal from the hash table use
      the RCU-aware hash macros. This allows simple read-only look ups that
      iterate to locate a single VF can be fast using RCU. Accesses which
      modify the hash table, or which can't take RCU because they sleep, will
      hold the mutex lock.
      
      By using this design, we have a stronger guarantee that the VF structure
      can't be released until after all threads are finished operating on it.
      We also pave the way for the more dynamic Scalable IOV implementation in
      the future.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      3d5985a1
    • J
      ice: introduce VF accessor functions · fb916db1
      Jacob Keller 提交于
      Before we switch the VF data structure storage mechanism to a hash,
      introduce new accessor functions to define the new interface.
      
      * ice_get_vf_by_id is a function used to obtain a reference to a VF from
        the table based on its VF ID
      * ice_has_vfs is used to quickly check if any VFs are configured
      * ice_get_num_vfs is used to get an exact count of how many VFs are
        configured
      
      We can drop the old ice_validate_vf_id function, since every caller was
      just going to immediately access the VF table to get a reference
      anyways. This way we simply use the single ice_get_vf_by_id to both
      validate the VF ID is within range and that there exists a VF with that
      ID.
      
      This change enables us to more easily convert the codebase to the hash
      table since most callers now properly use the interface.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      fb916db1
    • J
      ice: factor VF variables to separate structure · 000773c0
      Jacob Keller 提交于
      We maintain a number of values for VFs within the ice_pf structure. This
      includes the VF table, the number of allocated VFs, the maximum number
      of supported SR-IOV VFs, the number of queue pairs per VF, the number of
      MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
      
      We're about to add a few more variables to this list. Clean this up
      first by extracting these members out into a new ice_vfs structure
      defined in ice_virtchnl_pf.h
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      000773c0
    • J
      ice: convert ice_for_each_vf to include VF entry iterator · c4c2c7db
      Jacob Keller 提交于
      The ice_for_each_vf macro is intended to be used to loop over all VFs.
      The current implementation relies on an iterator that is the index into
      the VF array in the PF structure. This forces all users to perform a
      look up themselves.
      
      This abstraction forces a lot of duplicate work on callers and leaks the
      interface implementation to the caller. Replace this with an
      implementation that includes the VF pointer the primary iterator. This
      version simplifies callers which just want to iterate over every VF, as
      they no longer need to perform their own lookup.
      
      The "i" iterator value is replaced with a new unsigned int "bkt"
      parameter, as this will match the necessary interface for replacing
      the VF array with a hash table. For now, the bkt is the VF ID, but in
      the future it will simply be the hash bucket index. Document that it
      should not be treated as a VF ID.
      
      This change aims to simplify switching from the array to a hash table. I
      considered alternative implementations such as an xarray but decided
      that the hash table was the simplest and most suitable implementation. I
      also looked at methods to hide the bkt iterator entirely, but I couldn't
      come up with a feasible solution that worked for hash table iterators.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      c4c2c7db
    • J
      ice: use ice_for_each_vf for iteration during removal · 19281e86
      Jacob Keller 提交于
      When removing VFs, the driver takes a weird approach of assigning
      pf->num_alloc_vfs to 0 before iterating over the VFs using a temporary
      variable.
      
      This logic has been in the driver for a long time, and seems to have
      been carried forward from i40e.
      
      We want to refactor the way VFs are stored, and iterating over the data
      structure without the ice_for_each_vf interface impedes this work.
      
      The logic relies on implicitly using the num_alloc_vfs as a sort of
      "safe guard" for accessing VF data.
      
      While this sort of guard makes sense for Single Root IOV where all VFs
      are added at once, the data structures don't work for VFs which can be
      added and removed dynamically. We also have a separate state flag,
      ICE_VF_DEINIT_IN_PROGRESS which is a stronger protection against
      concurrent removal and access.
      
      Avoid the custom tmp iteration and replace it with the standard
      ice_for_each_vf iterator. Delay the assignment of num_alloc_vfs until
      after this loop finishes.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      19281e86
    • J
      ice: remove checks in ice_vc_send_msg_to_vf · 59e1f857
      Jacob Keller 提交于
      The ice_vc_send_msg_to_vf function is used by the PF to send a response
      to a VF. This function has overzealous checks to ensure its not passed a
      NULL VF pointer and to ensure that the passed in struct ice_vf has a
      valid vf_id sub-member.
      
      These checks have existed since commit 1071a835 ("ice: Implement
      virtchnl commands for AVF support") and function as simple sanity
      checks.
      
      We are planning to refactor the ice driver to use a hash table along
      with appropriate locks in a future refactor. This change will modify how
      the ice_validate_vf_id function works. Instead of a simple >= check to
      ensure the VF ID is between some range, it will check the hash table to
      see if the specified VF ID is actually in the table. This requires that
      the function properly lock the table to prevent race conditions.
      
      The checks may seem ok at first glance, but they don't really provide
      much benefit.
      
      In order for ice_vc_send_msg_to_vf to have these checks fail, the
      callers must either (1) pass NULL as the VF, (2) construct an invalid VF
      pointer manually, or (3) be using a VF pointer which becomes invalid
      after they obtain it properly using ice_get_vf_by_id.
      
      For (1), a cursory glance over callers of ice_vc_send_msg_to_vf can show
      that in most cases the functions already operate assuming their VF
      pointer is valid, such as by derferencing vf->pf or other members.
      
      They obtain the VF pointer by accessing the VF array using the VF ID,
      which can never produce a NULL value (since its a simple address
      operation on the array it will not be NULL.
      
      The sole exception for (1) is that ice_vc_process_vf_msg will forward a
      NULL VF pointer to this function as part of its goto error handler
      logic. This requires some minor cleanup to simply exit immediately when
      an invalid VF ID is detected (Rather than use the same error flow as
      the rest of the function).
      
      For (2), it is unexpected for a flow to construct a VF pointer manually
      instead of accessing the VF array. Defending against this is likely to
      just hide bad programming.
      
      For (3), it is definitely true that VF pointers could become invalid,
      for example if a thread is processing a VF message while the VF gets
      removed. However, the correct solution is not to add additional checks
      like this which do not guarantee to prevent the race. Instead we plan to
      solve the root of the problem by preventing the possibility entirely.
      
      This solution will require the change to a hash table with proper
      locking and reference counts of the VF structures. When this is done,
      ice_validate_vf_id will require locking of the hash table. This will be
      problematic because all of the callers of ice_vc_send_msg_to_vf will
      already have to take the lock to obtain the VF pointer anyways. With a
      mutex, this leads to a double lock that could hang the kernel thread.
      
      Avoid this by removing the checks which don't provide much value, so
      that we can safely add the necessary protections properly.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      59e1f857
    • J
      ice: move VFLR acknowledge during ice_free_vfs · 44efe75f
      Jacob Keller 提交于
      After removing all VFs, the driver clears the VFLR indication for VFs.
      This has been in ice since the beginning of SR-IOV support in the ice
      driver.
      
      The implementation was copied from i40e, and the motivation for the VFLR
      indication clearing is described in the commit f7414531 ("i40e:
      acknowledge VFLR when disabling SR-IOV")
      
      The commit explains that we need to clear the VFLR indication because
      the virtual function undergoes a VFLR event. If we don't indicate that
      it is complete it can cause an issue when VFs are re-enabled due to
      a "phantom" VFLR.
      
      The register block read was added under a pci_vfs_assigned check
      originally. This was done because we added the check after calling
      pci_disable_sriov. This was later moved to disable SRIOV earlier in the
      flow so that the VF drivers could be torn down before we removed
      functionality.
      
      Move the VFLR acknowledge into the main loop that tears down VF
      resources. This avoids using the tmp value for iterating over VFs
      multiple times. The result will make it easier to refactor the VF array
      in a future change.
      
      It's possible we might want to modify this flow to also stop checking
      pci_vfs_assigned. However, it seems reasonable to keep this change: we
      should only clear the VFLR if we actually disabled SR-IOV.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      44efe75f
    • J
      ice: move clear_malvf call in ice_free_vfs · 294627a6
      Jacob Keller 提交于
      The ice_mbx_clear_malvf function is used to clear the indication and
      count of how many times a VF was detected as malicious. During
      ice_free_vfs, we use this function to ensure that all removed VFs are
      reset to a clean state.
      
      The call currently is done at the end of ice_free_vfs() using a tmp
      value to iterate over all of the entries in the bitmap.
      
      This separate iteration using tmp is problematic for a planned refactor
      of the VF array data structure. To avoid this, lets move the call
      slightly higher into the function inside the loop where we teardown all
      of the VFs. This avoids one use of the tmp value used for iteration.
      We'll fix the other user in a future change.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      294627a6
    • J
      ice: pass num_vfs to ice_set_per_vf_res() · cd0f4f3b
      Jacob Keller 提交于
      We are planning to replace the simple array structure tracking VFs with
      a hash table. This change will also remove the "num_alloc_vfs" variable.
      
      Instead, new access functions to use the hash table as the source of
      truth will be introduced. These will generally be equivalent to existing
      checks, except during VF initialization.
      
      Specifically, ice_set_per_vf_res() cannot use the hash table as it will
      be operating prior to VF structures being inserted into the hash table.
      
      Instead of using pf->num_alloc_vfs, simply pass the num_vfs value in
      from the caller.
      
      Note that a sub-function of ice_set_per_vf_res, ice_determine_res, also
      implicitly depends on pf->num_alloc_vfs. Replace ice_determine_res with
      a simpler inline implementation based on rounddown_pow_of_two. Note that
      we must explicitly check that the argument is non-zero since it does not
      play well with zero as a value.
      
      Instead of using the function and while loop, simply calculate the
      number of queues we have available by dividing by num_vfs. Check if the
      desired queues are available. If not, round down to the nearest power of
      2 that fits within our available queues.
      
      This matches the behavior of ice_determine_res but is easier to follow
      as simple in-line logic. Remove ice_determine_res entirely.
      
      With this change, we no longer depend on the pf->num_alloc_vfs during
      the initialization phase of VFs. This will allow us to safely remove it
      in a future planned refactor of the VF data structures.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      cd0f4f3b
    • J
      ice: store VF pointer instead of VF ID · b03d519d
      Jacob Keller 提交于
      The VSI structure contains a vf_id field used to associate a VSI with a
      VF. This is used mainly for ICE_VSI_VF as well as partially for
      ICE_VSI_CTRL associated with the VFs.
      
      This API was designed with the idea that VFs are stored in a simple
      array that was expected to be static throughout most of the driver's
      life.
      
      We plan on refactoring VF storage in a few key ways:
      
        1) converting from a simple static array to a hash table
        2) using krefs to track VF references obtained from the hash table
        3) use RCU to delay release of VF memory until after all references
           are dropped
      
      This is motivated by the goal to ensure that the lifetime of VF
      structures is accounted for, and prevent various use-after-free bugs.
      
      With the existing vsi->vf_id, the reference tracking for VFs would
      become somewhat convoluted, because each VSI maintains a vf_id field
      which will then require performing a look up. This means all these flows
      will require reference tracking and proper usage of rcu_read_lock, etc.
      
      We know that the VF VSI will always be backed by a valid VF structure,
      because the VSI is created during VF initialization and removed before
      the VF is destroyed. Rely on this and store a reference to the VF in the
      VSI structure instead of storing a VF ID. This will simplify the usage
      and avoid the need to perform lookups on the hash table in the future.
      
      For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
      ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
      vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
      debugging code which violates this assumption and avoid more disastrous
      panics.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      b03d519d
    • J
      ice: refactor unwind cleanup in eswitch mode · df830543
      Jacob Keller 提交于
      The code for supporting eswitch mode and port representors on VFs uses
      an unwind based cleanup flow when handling errors.
      
      These flows are used to cleanup and get everything back to the state
      prior to attempting to switch from legacy to representor mode or back.
      
      The unwind iterations make sense, but complicate a plan to refactor the
      VF array structure. In the future we won't have a clean method of
      reversing an iteration of the VFs.
      
      Instead, we can change the cleanup flow to just iterate over all VF
      structures and clean up appropriately.
      
      First notice that ice_repr_add_for_all_vfs and ice_repr_rem_from_all_vfs
      have an additional step of re-assigning the VC ops. There is no good
      reason to do this outside of ice_repr_add and ice_repr_rem. It can
      simply be done as the last step of these functions.
      
      Second, make sure ice_repr_rem is safe to call on a VF which does not
      have a representor. Check if vf->repr is NULL first and exit early if
      so.
      
      Move ice_repr_rem_from_all_vfs above ice_repr_add_for_all_vfs so that we
      can call it from the cleanup function.
      
      In ice_eswitch.c, replace the unwind iteration with a call to
      ice_eswitch_release_reprs. This will go through all of the VFs and
      revert the VF back to the standard model without the eswitch mode.
      
      To make this safe, ensure this function checks whether or not the
      represent or has been moved. Rely on the metadata destination in
      vf->repr->dst. This must be NULL if the representor has not been moved
      to eswitch mode.
      
      Ensure that we always re-assign this value back to NULL after freeing
      it, and move the ice_eswitch_release_reprs so that it can be called from
      the setup function.
      
      With these changes, eswitch cleanup no longer uses an unwind flow that
      is problematic for the planned VF data structure change.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      df830543
  8. 03 3月, 2022 1 次提交
  9. 19 2月, 2022 5 次提交
    • T
      ice: initialize local variable 'tlv' · 5950bdc8
      Tom Rix 提交于
      Clang static analysis reports this issues
      ice_common.c:5008:21: warning: The left expression of the compound
        assignment is an uninitialized value. The computed value will
        also be garbage
        ldo->phy_type_low |= ((u64)buf << (i * 16));
        ~~~~~~~~~~~~~~~~~ ^
      
      When called from ice_cfg_phy_fec() ldo is the uninitialized local
      variable tlv.  So initialize.
      
      Fixes: ea78ce4d ("ice: add link lenient and default override support")
      Signed-off-by: NTom Rix <trix@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      5950bdc8
    • T
      ice: check the return of ice_ptp_gettimex64 · ed22d9c8
      Tom Rix 提交于
      Clang static analysis reports this issue
      time64.h:69:50: warning: The left operand of '+'
        is a garbage value
        set_normalized_timespec64(&ts_delta, lhs.tv_sec + rhs.tv_sec,
                                             ~~~~~~~~~~ ^
      In ice_ptp_adjtime_nonatomic(), the timespec64 variable 'now'
      is set by ice_ptp_gettimex64().  This function can fail
      with -EBUSY, so 'now' can have a gargbage value.
      So check the return.
      
      Fixes: 06c16d89 ("ice: register 1588 PTP clock device object for E810 devices")
      Signed-off-by: NTom Rix <trix@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      ed22d9c8
    • J
      ice: fix concurrent reset and removal of VFs · fadead80
      Jacob Keller 提交于
      Commit c503e632 ("ice: Stop processing VF messages during teardown")
      introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
      intended to prevent some issues with concurrently handling messages from
      VFs while tearing down the VFs.
      
      This change was motivated by crashes caused while tearing down and
      bringing up VFs in rapid succession.
      
      It turns out that the fix actually introduces issues with the VF driver
      caused because the PF no longer responds to any messages sent by the VF
      during its .remove routine. This results in the VF potentially removing
      its DMA memory before the PF has shut down the device queues.
      
      Additionally, the fix doesn't actually resolve concurrency issues within
      the ice driver. It is possible for a VF to initiate a reset just prior
      to the ice driver removing VFs. This can result in the remove task
      concurrently operating while the VF is being reset. This results in
      similar memory corruption and panics purportedly fixed by that commit.
      
      Fix this concurrency at its root by protecting both the reset and
      removal flows using the existing VF cfg_lock. This ensures that we
      cannot remove the VF while any outstanding critical tasks such as a
      virtchnl message or a reset are occurring.
      
      This locking change also fixes the root cause originally fixed by commit
      c503e632 ("ice: Stop processing VF messages during teardown"), so we
      can simply revert it.
      
      Note that I kept these two changes together because simply reverting the
      original commit alone would leave the driver vulnerable to worse race
      conditions.
      
      Fixes: c503e632 ("ice: Stop processing VF messages during teardown")
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      fadead80
    • M
      ice: fix setting l4 port flag when adding filter · 932645c2
      Michal Swiatkowski 提交于
      Accidentally filter flag for none encapsulated l4 port field is always
      set. Even if user wants to add encapsulated l4 port field.
      
      Remove this unnecessary flag setting.
      
      Fixes: 9e300987 ("ice: VXLAN and Geneve TC support")
      Signed-off-by: NMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: NSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      932645c2
    • W
      ice: Match on all profiles in slow-path · b70bc066
      Wojciech Drewek 提交于
      In switchdev mode, slow-path rules need to match all protocols, in order
      to correctly redirect unfiltered or missed packets to the uplink. To set
      this up for the virtual function to uplink flow, the rule that redirects
      packets to the control VSI must have the tunnel type set to
      ICE_SW_TUN_AND_NON_TUN. As a result of that new tunnel type being set,
      ice_get_compat_fv_bitmap will select ICE_PROF_ALL. At that point all
      profiles would be selected for this rule, resulting in the desired
      behavior. Without this change slow-path would not work with
      tunnel protocols.
      
      Fixes: 8b032a55 ("ice: low level support for tunnels")
      Signed-off-by: NWojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: NSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      b70bc066
  10. 14 2月, 2022 2 次提交
  11. 11 2月, 2022 2 次提交