1. 16 3月, 2022 6 次提交
    • J
      ice: convert ice_reset_vf to take flags · 7eb517e4
      Jacob Keller 提交于
      The ice_reset_vf function takes a boolean parameter which indicates
      whether or not the reset is due to a VFLR event.
      
      This is somewhat confusing to read because readers must interpret what
      "true" and "false" mean when seeing a line of code like
      "ice_reset_vf(vf, false)".
      
      We will want to add another toggle to the ice_reset_vf in a following
      change. To avoid proliferating many arguments, convert this function to
      take flags instead. ICE_VF_RESET_VFLR will indicate if this is a VFLR
      reset. A value of 0 indicates no flags.
      
      One could argue that "ice_reset_vf(vf, 0)" is no more readable than
      "ice_reset_vf(vf, false)".. However, this type of flags interface is
      somewhat common and using 0 to mean "no flags" makes sense in this
      context. We could bother to add a define for "ICE_VF_RESET_PLAIN" or
      something similar, but this can be confusing since its not an actual bit
      flag.
      
      This paves the way to add another flag to the function in a following
      change.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      7eb517e4
    • J
      ice: move reset functionality into ice_vf_lib.c · 16686d7f
      Jacob Keller 提交于
      Now that the reset functions do not rely on Single Root specific
      behavior, move the ice_reset_vf, ice_reset_all_vfs, and
      ice_vf_rebuild_host_cfg functions and their dependent helper functions
      out of ice_sriov.c and into ice_vf_lib.c
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      16686d7f
    • J
      ice: fix a long line warning in ice_reset_vf · 5de95744
      Jacob Keller 提交于
      We're about to move ice_reset_vf out of ice_sriov.c and into
      ice_vf_lib.c
      
      One of the dev_err statements has a checkpatch.pl violation due to
      putting the vf->vf_id on the same line as the dev_err. Fix this style
      issue first before moving the code.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      5de95744
    • J
      ice: introduce VF operations structure for reset flows · 9c6f7878
      Jacob Keller 提交于
      The ice driver currently supports virtualization using Single Root IOV,
      with code in the ice_sriov.c file. In the future, we plan to also
      implement support for Scalable IOV, which uses slightly different
      hardware implementations for some functionality.
      
      To eventually allow this, we introduce a new ice_vf_ops structure which
      will contain the basic operations that are different between the two IOV
      implementations. This primarily includes logic for how to handle the VF
      reset registers, as well as what to do before and after rebuilding the
      VF's VSI.
      
      Implement these ops structures and call the ops table instead of
      directly calling the SR-IOV specific function. This will allow us to
      easily add the Scalable IOV implementation in the future. Additionally,
      it helps separate the generalized VF logic from SR-IOV specifics. This
      change allows us to move the reset logic out of ice_sriov.c and into
      ice_vf_lib.c without placing any Single Root specific details into the
      generic file.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      9c6f7878
    • J
      ice: fix incorrect dev_dbg print mistaking 'i' for vf->vf_id · f5840e0d
      Jacob Keller 提交于
      If we fail to clear the malicious VF indication after a VF reset, the
      dev_dbg message which is printed uses the local variable 'i' when it
      meant to use vf->vf_id. Fix this.
      
      Fixes: 0891c896 ("ice: warn about potentially malicious VFs")
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      f5840e0d
    • J
      ice: introduce ice_vf_lib.c, ice_vf_lib.h, and ice_vf_lib_private.h · 109aba47
      Jacob Keller 提交于
      Introduce the ice_vf_lib.c file along with the ice_vf_lib.h and
      ice_vf_lib_private.h header files.
      
      These files will house the generic VF structures and access functions.
      Move struct ice_vf and its dependent definitions into this new header
      file.
      
      The ice_vf_lib.c is compiled conditionally on CONFIG_PCI_IOV. Some of
      its functionality is required by all driver files. However, some of its
      functionality will only be required by other files also conditionally
      compiled based on CONFIG_PCI_IOV.
      
      Declaring these functions used only in CONFIG_PCI_IOV files in
      ice_vf_lib.h is verbose. This is because we must provide a fallback
      implementation for each function in this header since it is included in
      files which may not be compiled with CONFIG_PCI_IOV.
      
      Instead, introduce a new ice_vf_lib_private.h header which verifies that
      CONFIG_PCI_IOV is enabled. This header is intended to be directly
      included in .c files which are CONFIG_PCI_IOV only. Add a #error
      indication that will complain if the file ever gets included by another
      C file on a kernel with CONFIG_PCI_IOV disabled. Add a comment
      indicating the nature of the file and why it is useful.
      
      This makes it so that we can easily define functions exposed from
      ice_vf_lib.c into other virtualization files without needing to add
      fallback implementations for every single function.
      
      This begins the path to separate out generic code which will be reused
      by other virtualization implementations from ice_sriov.h and ice_sriov.c
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      109aba47
  2. 15 3月, 2022 9 次提交
    • J
      ice: use ice_is_vf_trusted helper function · 1261691d
      Jacob Keller 提交于
      The ice_vc_cfg_promiscuous_mode_msg function directly checks
      ICE_VIRTCHNL_VF_CAP_PRIVILEGE, instead of using the existing helper
      function ice_is_vf_trusted. Switch this to use the helper function so
      that all trusted checks are consistent. This aids in any potential
      future refactor by ensuring consistent code.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      1261691d
    • J
      ice: log an error message when eswitch fails to configure · 2b369448
      Jacob Keller 提交于
      When ice_eswitch_configure fails, print an error message to make it more
      obvious why VF initialization did not succeed.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      2b369448
    • J
      ice: cleanup error logging for ice_ena_vfs · 94ab2488
      Jacob Keller 提交于
      The ice_ena_vfs function and some of its sub-functions like
      ice_set_per_vf_res use a "if (<function>) { <print error> ; <exit> }"
      flow. This flow discards specialized errors reported by the called
      function.
      
      This style is generally not preferred as it makes tracing error sources
      more difficult. It also means we cannot log the actual error received
      properly.
      
      Refactor several calls in the ice_ena_vfs function that do this to catch
      the error in the 'ret' variable. Report this in the messages, and then
      return the more precise error value.
      
      Doing this reveals that ice_set_per_vf_res returns -EINVAL or -EIO in
      places where -ENOSPC makes more sense. Fix these calls up to return the
      more appropriate value.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      94ab2488
    • J
      ice: move ice_set_vf_port_vlan near other .ndo ops · 346f7aa3
      Jacob Keller 提交于
      The ice_set_vf_port_vlan function is located in ice_sriov.c very far
      away from the other .ndo operations that it is similar to. Move this so
      that its located near the other .ndo operation definitions.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      346f7aa3
    • J
      ice: refactor spoofchk control code in ice_sriov.c · a8ea6d86
      Jacob Keller 提交于
      The API to control the VSI spoof checking for a VF VSI has three
      functions: enable, disable, and set. The set function takes the VSI and
      the VF and decides whether to call enable or disable based on the
      vf->spoofchk field.
      
      In some flows, vf->spoofchk is not yet set, such as the function used to
      control the setting for a VF. (vf->spoofchk is only updated after a
      success).
      
      Simplify this API by refactoring ice_vf_set_spoofchk_cfg to be
      "ice_vsi_apply_spoofchk" which takes the boolean and allows all callers
      to avoid having to determine whether to call enable or disable
      themselves.
      
      This matches the expected callers better, and will prevent the need to
      export more than one function when this code must be called from another
      file.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      a8ea6d86
    • J
      ice: rename ICE_MAX_VF_COUNT to avoid confusion · dc36796e
      Jacob Keller 提交于
      The ICE_MAX_VF_COUNT field is defined in ice_sriov.h. This count is true
      for SR-IOV but will not be true for all VF implementations, such as when
      the ice driver supports Scalable IOV.
      
      Rename this definition to clearly indicate ICE_MAX_SRIOV_VFS.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      dc36796e
    • J
      ice: remove unused definitions from ice_sriov.h · 00a57e29
      Jacob Keller 提交于
      A few more macros exist in ice_sriov.h which are not used anywhere.
      These can be safely removed. Note that ICE_VIRTCHNL_VF_CAP_L2 capability
      is set but never checked anywhere in the driver. Thus it is also safe to
      remove.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      00a57e29
    • J
      ice: convert vf->vc_ops to a const pointer · a7e11710
      Jacob Keller 提交于
      The vc_ops structure is used to allow different handlers for virtchnl
      commands when the driver is in representor mode. The current
      implementation uses a copy of the ops table in each VF, and modifies
      this copy dynamically.
      
      The usual practice in kernel code is to store the ops table in a
      constant structure and point to different versions. This has a number of
      advantages:
      
        1. Reduced memory usage. Each VF merely points to the correct table,
           so they're able to re-use the same constant lookup table in memory.
        2. Consistency. It becomes more difficult to accidentally update or
           edit only one op call. Instead, the code switches to the correct
           able by a single pointer write. In general this is atomic, either
           the pointer is updated or its not.
        3. Code Layout. The VF structure can store a pointer to the table
           without needing to have the full structure definition defined prior
           to the VF structure definition. This will aid in future refactoring
           of code by allowing the VF pointer to be kept in ice_vf_lib.h while
           the virtchnl ops table can be maintained in ice_virtchnl.h
      
      There is one major downside in the case of the vc_ops structure. Most of
      the operations in the table are the same between the two current
      implementations. This can appear to lead to duplication since each
      implementation must now fill in the complete table. It could make
      spotting the differences in the representor mode more challenging.
      Unfortunately, methods to make this less error prone either add
      complexity overhead (macros using CPP token concatenation) or don't work
      on all compilers we support (constant initializer from another constant
      structure).
      
      The cost of maintaining two structures does not out weigh the benefits
      of the constant table model.
      
      While we're making these changes, go ahead and rename the structure and
      implementations with "virtchnl" instead of "vc_vf_". This will more
      closely align with the planned file renaming, and avoid similar names when
      we later introduce a "vf ops" table for separating Scalable IOV and
      Single Root IOV implementations.
      
      Leave the accessor/assignment functions in order to avoid issues with
      compiling with options disabled. The interface makes it easier to handle
      when CONFIG_PCI_IOV is disabled in the kernel.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      a7e11710
    • J
      ice: rename ice_virtchnl_pf.c to ice_sriov.c · 0deb0bf7
      Jacob Keller 提交于
      The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
      code for implementing Single Root IOV virtualization resides. This code
      includes support for bringing up and tearing down VFs, hooks into the
      kernel SR-IOV netdev operations, and for handling virtchnl messages from
      VFs.
      
      In the future, we plan to support Scalable IOV in addition to Single
      Root IOV as an alternative virtualization scheme. This implementation
      will re-use some but not all of the code in ice_virtchnl_pf.c
      
      To prepare for this future, we want to refactor and split up the code in
      ice_virtchnl_pf.c into the following scheme:
      
       * ice_vf_lib.[ch]
      
         Basic VF structures and accessors. This is where scheme-independent
         code will reside.
      
       * ice_virtchnl.[ch]
      
         Virtchnl message handling. This is where the bulk of the logic for
         processing messages from VFs using the virtchnl messaging scheme will
         reside. This is separated from ice_vf_lib.c because it is distinct
         and has a bulk of the processing code.
      
       * ice_sriov.[ch]
      
         Single Root IOV implementation, including initialization and the
         routines for interacting with SR-IOV based netdev operations.
      
       * (future) ice_siov.[ch]
      
         Scalable IOV implementation.
      
      As a first step, lets assume that all of the code in
      ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
      ice_sriov.c and its header to ice_sriov.h
      
      Future changes will further split out the code in these files following
      the plan outlined here.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      0deb0bf7
  3. 09 3月, 2022 1 次提交
    • J
      ice: stop disabling VFs due to PF error responses · 79498d5a
      Jacob Keller 提交于
      The ice_vc_send_msg_to_vf function has logic to detect "failure"
      responses being sent to a VF. If a VF is sent more than
      ICE_DFLT_NUM_INVAL_MSGS_ALLOWED then the VF is marked as disabled.
      Almost identical logic also existed in the i40e driver.
      
      This logic was added to the ice driver in commit 1071a835 ("ice:
      Implement virtchnl commands for AVF support") which itself copied from
      the i40e implementation in commit 5c3c48ac ("i40e: implement virtual
      device interface").
      
      Neither commit provides a proper explanation or justification of the
      check. In fact, later commits to i40e changed the logic to allow
      bypassing the check in some specific instances.
      
      The "logic" for this seems to be that error responses somehow indicate a
      malicious VF. This is not really true. The PF might be sending an error
      for any number of reasons such as lack of resources, etc.
      
      Additionally, this causes the PF to log an info message for every failed
      VF response which may confuse users, and can spam the kernel log.
      
      This behavior is not documented as part of any requirement for our
      products and other operating system drivers such as the FreeBSD
      implementation of our drivers do not include this type of check.
      
      In fact, the change from dev_err to dev_info in i40e commit 18b7af57
      ("i40e: Lower some message levels") explains that these messages
      typically don't actually indicate a real issue. It is quite likely that
      a user who hits this in practice will be very confused as the VF will be
      disabled without an obvious way to recover.
      
      We already have robust malicious driver detection logic using actual
      hardware detection mechanisms that detect and prevent invalid device
      usage. Remove the logic since its not a documented requirement and the
      behavior is not intuitive.
      
      Fixes: 1071a835 ("ice: Implement virtchnl commands for AVF support")
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      79498d5a
  4. 04 3月, 2022 10 次提交
    • J
      ice: convert VF storage to hash table with krefs and RCU · 3d5985a1
      Jacob Keller 提交于
      The ice driver stores VF structures in a simple array which is allocated
      once at the time of VF creation. The VF structures are then accessed
      from the array by their VF ID. The ID must be between 0 and the number
      of allocated VFs.
      
      Multiple threads can access this table:
      
       * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
       * interrupts, such as due to messages from the VF using the virtchnl
         communication
       * processing such as device reset
       * commands to add or remove VFs
      
      The current implementation does not keep track of when all threads are
      done operating on a VF and can potentially result in use-after-free
      issues caused by one thread accessing a VF structure after it has been
      released when removing VFs. Some of these are prevented with various
      state flags and checks.
      
      In addition, this structure is quite static and does not support a
      planned future where virtualization can be more dynamic. As we begin to
      look at supporting Scalable IOV with the ice driver (as opposed to just
      supporting Single Root IOV), this structure is not sufficient.
      
      In the future, VFs will be able to be added and removed individually and
      dynamically.
      
      To allow for this, and to better protect against a whole class of
      use-after-free bugs, replace the VF storage with a combination of a hash
      table and krefs to reference track all of the accesses to VFs through
      the hash table.
      
      A hash table still allows efficient look up of the VF given its ID, but
      also allows adding and removing VFs. It does not require contiguous VF
      IDs.
      
      The use of krefs allows the cleanup of the VF memory to be delayed until
      after all threads have released their reference (by calling ice_put_vf).
      
      To prevent corruption of the hash table, a combination of RCU and the
      mutex table_lock are used. Addition and removal from the hash table use
      the RCU-aware hash macros. This allows simple read-only look ups that
      iterate to locate a single VF can be fast using RCU. Accesses which
      modify the hash table, or which can't take RCU because they sleep, will
      hold the mutex lock.
      
      By using this design, we have a stronger guarantee that the VF structure
      can't be released until after all threads are finished operating on it.
      We also pave the way for the more dynamic Scalable IOV implementation in
      the future.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      3d5985a1
    • J
      ice: introduce VF accessor functions · fb916db1
      Jacob Keller 提交于
      Before we switch the VF data structure storage mechanism to a hash,
      introduce new accessor functions to define the new interface.
      
      * ice_get_vf_by_id is a function used to obtain a reference to a VF from
        the table based on its VF ID
      * ice_has_vfs is used to quickly check if any VFs are configured
      * ice_get_num_vfs is used to get an exact count of how many VFs are
        configured
      
      We can drop the old ice_validate_vf_id function, since every caller was
      just going to immediately access the VF table to get a reference
      anyways. This way we simply use the single ice_get_vf_by_id to both
      validate the VF ID is within range and that there exists a VF with that
      ID.
      
      This change enables us to more easily convert the codebase to the hash
      table since most callers now properly use the interface.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      fb916db1
    • J
      ice: factor VF variables to separate structure · 000773c0
      Jacob Keller 提交于
      We maintain a number of values for VFs within the ice_pf structure. This
      includes the VF table, the number of allocated VFs, the maximum number
      of supported SR-IOV VFs, the number of queue pairs per VF, the number of
      MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
      
      We're about to add a few more variables to this list. Clean this up
      first by extracting these members out into a new ice_vfs structure
      defined in ice_virtchnl_pf.h
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      000773c0
    • J
      ice: convert ice_for_each_vf to include VF entry iterator · c4c2c7db
      Jacob Keller 提交于
      The ice_for_each_vf macro is intended to be used to loop over all VFs.
      The current implementation relies on an iterator that is the index into
      the VF array in the PF structure. This forces all users to perform a
      look up themselves.
      
      This abstraction forces a lot of duplicate work on callers and leaks the
      interface implementation to the caller. Replace this with an
      implementation that includes the VF pointer the primary iterator. This
      version simplifies callers which just want to iterate over every VF, as
      they no longer need to perform their own lookup.
      
      The "i" iterator value is replaced with a new unsigned int "bkt"
      parameter, as this will match the necessary interface for replacing
      the VF array with a hash table. For now, the bkt is the VF ID, but in
      the future it will simply be the hash bucket index. Document that it
      should not be treated as a VF ID.
      
      This change aims to simplify switching from the array to a hash table. I
      considered alternative implementations such as an xarray but decided
      that the hash table was the simplest and most suitable implementation. I
      also looked at methods to hide the bkt iterator entirely, but I couldn't
      come up with a feasible solution that worked for hash table iterators.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      c4c2c7db
    • J
      ice: use ice_for_each_vf for iteration during removal · 19281e86
      Jacob Keller 提交于
      When removing VFs, the driver takes a weird approach of assigning
      pf->num_alloc_vfs to 0 before iterating over the VFs using a temporary
      variable.
      
      This logic has been in the driver for a long time, and seems to have
      been carried forward from i40e.
      
      We want to refactor the way VFs are stored, and iterating over the data
      structure without the ice_for_each_vf interface impedes this work.
      
      The logic relies on implicitly using the num_alloc_vfs as a sort of
      "safe guard" for accessing VF data.
      
      While this sort of guard makes sense for Single Root IOV where all VFs
      are added at once, the data structures don't work for VFs which can be
      added and removed dynamically. We also have a separate state flag,
      ICE_VF_DEINIT_IN_PROGRESS which is a stronger protection against
      concurrent removal and access.
      
      Avoid the custom tmp iteration and replace it with the standard
      ice_for_each_vf iterator. Delay the assignment of num_alloc_vfs until
      after this loop finishes.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      19281e86
    • J
      ice: remove checks in ice_vc_send_msg_to_vf · 59e1f857
      Jacob Keller 提交于
      The ice_vc_send_msg_to_vf function is used by the PF to send a response
      to a VF. This function has overzealous checks to ensure its not passed a
      NULL VF pointer and to ensure that the passed in struct ice_vf has a
      valid vf_id sub-member.
      
      These checks have existed since commit 1071a835 ("ice: Implement
      virtchnl commands for AVF support") and function as simple sanity
      checks.
      
      We are planning to refactor the ice driver to use a hash table along
      with appropriate locks in a future refactor. This change will modify how
      the ice_validate_vf_id function works. Instead of a simple >= check to
      ensure the VF ID is between some range, it will check the hash table to
      see if the specified VF ID is actually in the table. This requires that
      the function properly lock the table to prevent race conditions.
      
      The checks may seem ok at first glance, but they don't really provide
      much benefit.
      
      In order for ice_vc_send_msg_to_vf to have these checks fail, the
      callers must either (1) pass NULL as the VF, (2) construct an invalid VF
      pointer manually, or (3) be using a VF pointer which becomes invalid
      after they obtain it properly using ice_get_vf_by_id.
      
      For (1), a cursory glance over callers of ice_vc_send_msg_to_vf can show
      that in most cases the functions already operate assuming their VF
      pointer is valid, such as by derferencing vf->pf or other members.
      
      They obtain the VF pointer by accessing the VF array using the VF ID,
      which can never produce a NULL value (since its a simple address
      operation on the array it will not be NULL.
      
      The sole exception for (1) is that ice_vc_process_vf_msg will forward a
      NULL VF pointer to this function as part of its goto error handler
      logic. This requires some minor cleanup to simply exit immediately when
      an invalid VF ID is detected (Rather than use the same error flow as
      the rest of the function).
      
      For (2), it is unexpected for a flow to construct a VF pointer manually
      instead of accessing the VF array. Defending against this is likely to
      just hide bad programming.
      
      For (3), it is definitely true that VF pointers could become invalid,
      for example if a thread is processing a VF message while the VF gets
      removed. However, the correct solution is not to add additional checks
      like this which do not guarantee to prevent the race. Instead we plan to
      solve the root of the problem by preventing the possibility entirely.
      
      This solution will require the change to a hash table with proper
      locking and reference counts of the VF structures. When this is done,
      ice_validate_vf_id will require locking of the hash table. This will be
      problematic because all of the callers of ice_vc_send_msg_to_vf will
      already have to take the lock to obtain the VF pointer anyways. With a
      mutex, this leads to a double lock that could hang the kernel thread.
      
      Avoid this by removing the checks which don't provide much value, so
      that we can safely add the necessary protections properly.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      59e1f857
    • J
      ice: move VFLR acknowledge during ice_free_vfs · 44efe75f
      Jacob Keller 提交于
      After removing all VFs, the driver clears the VFLR indication for VFs.
      This has been in ice since the beginning of SR-IOV support in the ice
      driver.
      
      The implementation was copied from i40e, and the motivation for the VFLR
      indication clearing is described in the commit f7414531 ("i40e:
      acknowledge VFLR when disabling SR-IOV")
      
      The commit explains that we need to clear the VFLR indication because
      the virtual function undergoes a VFLR event. If we don't indicate that
      it is complete it can cause an issue when VFs are re-enabled due to
      a "phantom" VFLR.
      
      The register block read was added under a pci_vfs_assigned check
      originally. This was done because we added the check after calling
      pci_disable_sriov. This was later moved to disable SRIOV earlier in the
      flow so that the VF drivers could be torn down before we removed
      functionality.
      
      Move the VFLR acknowledge into the main loop that tears down VF
      resources. This avoids using the tmp value for iterating over VFs
      multiple times. The result will make it easier to refactor the VF array
      in a future change.
      
      It's possible we might want to modify this flow to also stop checking
      pci_vfs_assigned. However, it seems reasonable to keep this change: we
      should only clear the VFLR if we actually disabled SR-IOV.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      44efe75f
    • J
      ice: move clear_malvf call in ice_free_vfs · 294627a6
      Jacob Keller 提交于
      The ice_mbx_clear_malvf function is used to clear the indication and
      count of how many times a VF was detected as malicious. During
      ice_free_vfs, we use this function to ensure that all removed VFs are
      reset to a clean state.
      
      The call currently is done at the end of ice_free_vfs() using a tmp
      value to iterate over all of the entries in the bitmap.
      
      This separate iteration using tmp is problematic for a planned refactor
      of the VF array data structure. To avoid this, lets move the call
      slightly higher into the function inside the loop where we teardown all
      of the VFs. This avoids one use of the tmp value used for iteration.
      We'll fix the other user in a future change.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      294627a6
    • J
      ice: pass num_vfs to ice_set_per_vf_res() · cd0f4f3b
      Jacob Keller 提交于
      We are planning to replace the simple array structure tracking VFs with
      a hash table. This change will also remove the "num_alloc_vfs" variable.
      
      Instead, new access functions to use the hash table as the source of
      truth will be introduced. These will generally be equivalent to existing
      checks, except during VF initialization.
      
      Specifically, ice_set_per_vf_res() cannot use the hash table as it will
      be operating prior to VF structures being inserted into the hash table.
      
      Instead of using pf->num_alloc_vfs, simply pass the num_vfs value in
      from the caller.
      
      Note that a sub-function of ice_set_per_vf_res, ice_determine_res, also
      implicitly depends on pf->num_alloc_vfs. Replace ice_determine_res with
      a simpler inline implementation based on rounddown_pow_of_two. Note that
      we must explicitly check that the argument is non-zero since it does not
      play well with zero as a value.
      
      Instead of using the function and while loop, simply calculate the
      number of queues we have available by dividing by num_vfs. Check if the
      desired queues are available. If not, round down to the nearest power of
      2 that fits within our available queues.
      
      This matches the behavior of ice_determine_res but is easier to follow
      as simple in-line logic. Remove ice_determine_res entirely.
      
      With this change, we no longer depend on the pf->num_alloc_vfs during
      the initialization phase of VFs. This will allow us to safely remove it
      in a future planned refactor of the VF data structures.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      cd0f4f3b
    • J
      ice: store VF pointer instead of VF ID · b03d519d
      Jacob Keller 提交于
      The VSI structure contains a vf_id field used to associate a VSI with a
      VF. This is used mainly for ICE_VSI_VF as well as partially for
      ICE_VSI_CTRL associated with the VFs.
      
      This API was designed with the idea that VFs are stored in a simple
      array that was expected to be static throughout most of the driver's
      life.
      
      We plan on refactoring VF storage in a few key ways:
      
        1) converting from a simple static array to a hash table
        2) using krefs to track VF references obtained from the hash table
        3) use RCU to delay release of VF memory until after all references
           are dropped
      
      This is motivated by the goal to ensure that the lifetime of VF
      structures is accounted for, and prevent various use-after-free bugs.
      
      With the existing vsi->vf_id, the reference tracking for VFs would
      become somewhat convoluted, because each VSI maintains a vf_id field
      which will then require performing a look up. This means all these flows
      will require reference tracking and proper usage of rcu_read_lock, etc.
      
      We know that the VF VSI will always be backed by a valid VF structure,
      because the VSI is created during VF initialization and removed before
      the VF is destroyed. Rely on this and store a reference to the VF in the
      VSI structure instead of storing a VF ID. This will simplify the usage
      and avoid the need to perform lookups on the hash table in the future.
      
      For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
      ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
      vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
      debugging code which violates this assumption and avoid more disastrous
      panics.
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      b03d519d
  5. 19 2月, 2022 1 次提交
    • J
      ice: fix concurrent reset and removal of VFs · fadead80
      Jacob Keller 提交于
      Commit c503e632 ("ice: Stop processing VF messages during teardown")
      introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
      intended to prevent some issues with concurrently handling messages from
      VFs while tearing down the VFs.
      
      This change was motivated by crashes caused while tearing down and
      bringing up VFs in rapid succession.
      
      It turns out that the fix actually introduces issues with the VF driver
      caused because the PF no longer responds to any messages sent by the VF
      during its .remove routine. This results in the VF potentially removing
      its DMA memory before the PF has shut down the device queues.
      
      Additionally, the fix doesn't actually resolve concurrency issues within
      the ice driver. It is possible for a VF to initiate a reset just prior
      to the ice driver removing VFs. This can result in the remove task
      concurrently operating while the VF is being reset. This results in
      similar memory corruption and panics purportedly fixed by that commit.
      
      Fix this concurrency at its root by protecting both the reset and
      removal flows using the existing VF cfg_lock. This ensures that we
      cannot remove the VF while any outstanding critical tasks such as a
      virtchnl message or a reset are occurring.
      
      This locking change also fixes the root cause originally fixed by commit
      c503e632 ("ice: Stop processing VF messages during teardown"), so we
      can simply revert it.
      
      Note that I kept these two changes together because simply reverting the
      original commit alone would leave the driver vulnerable to worse race
      conditions.
      
      Fixes: c503e632 ("ice: Stop processing VF messages during teardown")
      Signed-off-by: NJacob Keller <jacob.e.keller@intel.com>
      Tested-by: NKonrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      fadead80
  6. 10 2月, 2022 10 次提交
  7. 07 1月, 2022 1 次提交
    • W
      ice: improve switchdev's slow-path · c1e5da5d
      Wojciech Drewek 提交于
      In current switchdev implementation, every VF PR is assigned to
      individual ring on switchdev ctrl VSI. For slow-path traffic, there
      is a mapping VF->ring done in software based on src_vsi value (by
      calling ice_eswitch_get_target_netdev function).
      
      With this change, HW solution is introduced which is more
      efficient. For each VF, src MAC (VF's MAC) filter will be created,
      which forwards packets to the corresponding switchdev ctrl VSI queue
      based on src MAC address.
      
      This filter has to be removed and then replayed in case of
      resetting one VF. Keep information about this rule in repr->mac_rule,
      thanks to that we know which rule has to be removed and replayed
      for a given VF.
      
      In case of CORE/GLOBAL all rules are removed
      automatically. We have to take care of readding them. This is done
      by ice_replay_vsi_adv_rule.
      
      When driver leaves switchdev mode, remove all advanced rules
      from switchdev ctrl VSI. This is done by ice_rem_adv_rule_for_vsi.
      
      Flag repr->rule_added is needed because in some cases reset
      might be triggered before VF sends request to add MAC.
      Co-developed-by: NGrzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: NGrzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: NWojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: NSandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>
      c1e5da5d
  8. 15 12月, 2021 2 次提交