1. 16 March 2022 (9 commits)
  2. 15 March 2022 (11 commits)
    • ice: use ice_is_vf_trusted helper function · 1261691d
      Jacob Keller authored
      The ice_vc_cfg_promiscuous_mode_msg function checks
      ICE_VIRTCHNL_VF_CAP_PRIVILEGE directly instead of using the existing
      helper function ice_is_vf_trusted. Switch it to the helper so that all
      trusted checks are consistent, which also aids any potential future
      refactoring (a sketch of the change follows this entry).
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      1261691d
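      The following is a minimal sketch of the pattern described above. The
      struct layout, bit position, and the caller-side wrapper are assumptions
      for illustration, not the driver's actual definitions.

      ```
      #include <linux/bitops.h>
      #include <linux/errno.h>
      #include <linux/types.h>

      #define ICE_VIRTCHNL_VF_CAP_PRIVILEGE  0        /* assumed bit position */

      struct ice_vf {
              unsigned long vf_caps;                  /* assumed capability bitmap */
      };

      /* Helper named in the commit; the body here is an assumption. */
      static bool ice_is_vf_trusted(struct ice_vf *vf)
      {
              return test_bit(ICE_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps);
      }

      /* Callers such as ice_vc_cfg_promiscuous_mode_msg() then test the helper
       * instead of open-coding the capability check, so every trusted check
       * reads the same way.
       */
      static int ice_vc_check_trusted(struct ice_vf *vf)
      {
              return ice_is_vf_trusted(vf) ? 0 : -EPERM;
      }
      ```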
    • ice: log an error message when eswitch fails to configure · 2b369448
      Jacob Keller authored
      When ice_eswitch_configure fails, print an error message to make it more
      obvious why VF initialization did not succeed.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      2b369448
    • ice: cleanup error logging for ice_ena_vfs · 94ab2488
      Jacob Keller authored
      The ice_ena_vfs function and some of its sub-functions like
      ice_set_per_vf_res use a "if (<function>) { <print error> ; <exit> }"
      flow. This flow discards specialized errors reported by the called
      function.
      
      This style is generally not preferred as it makes tracing error sources
      more difficult. It also means we cannot log the actual error received
      properly.
      
      Refactor several calls in the ice_ena_vfs function that do this to catch
      the error in the 'ret' variable, report it in the messages, and then
      return the more precise error value (the pattern is sketched after this
      entry).
      
      Doing this reveals that ice_set_per_vf_res returns -EINVAL or -EIO in
      places where -ENOSPC makes more sense. Fix these calls up to return the
      more appropriate value.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      94ab2488
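      Below is a sketch of the before/after error-handling pattern the commit
      describes. The prototype of ice_set_per_vf_res() and the message text are
      simplified assumptions.

      ```
      #include <linux/device.h>

      struct ice_pf;                                  /* opaque here */
      int ice_set_per_vf_res(struct ice_pf *pf);      /* assumed: 0 or -ERRNO */

      static int ice_ena_vfs_sketch(struct ice_pf *pf, struct device *dev)
      {
              int ret;

              /* Old style, "if (<function>) { <print error>; <exit> }", which
               * throws away the callee's specific error code:
               *
               *      if (ice_set_per_vf_res(pf)) {
               *              dev_err(dev, "Not enough resources for all VFs\n");
               *              return -ENOSPC;
               *      }
               */

              /* New style: keep the error in 'ret', report it, and return the
               * more precise value from the callee.
               */
              ret = ice_set_per_vf_res(pf);
              if (ret) {
                      dev_err(dev, "Not enough resources for all VFs, err %d\n",
                              ret);
                      return ret;
              }

              return 0;
      }
      ```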
    • ice: move ice_set_vf_port_vlan near other .ndo ops · 346f7aa3
      Jacob Keller authored
      The ice_set_vf_port_vlan function is located in ice_sriov.c, far away
      from the other .ndo operations that it is similar to. Move it so that it
      sits near the other .ndo operation definitions.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      346f7aa3
    • ice: refactor spoofchk control code in ice_sriov.c · a8ea6d86
      Jacob Keller authored
      The API to control the VSI spoof checking for a VF VSI has three
      functions: enable, disable, and set. The set function takes the VSI and
      the VF and decides whether to call enable or disable based on the
      vf->spoofchk field.
      
      In some flows, vf->spoofchk has not yet been updated, such as in the
      function used to control the setting for a VF (vf->spoofchk is only
      updated after the operation succeeds).
      
      Simplify this API by refactoring ice_vf_set_spoofchk_cfg into
      "ice_vsi_apply_spoofchk", which takes the desired boolean value directly
      so that callers no longer have to decide whether to call enable or
      disable themselves (a sketch follows this entry).
      
      This matches the expected callers better, and avoids the need to export
      more than one function when this code must be called from another file.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      a8ea6d86
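      A sketch of the refactored interface follows. The low-level
      enable/disable helpers are placeholders assumed for illustration; only
      ice_vsi_apply_spoofchk and its boolean parameter come from the commit
      text.

      ```
      #include <linux/types.h>

      struct ice_vsi;                         /* opaque VSI handle */

      /* Assumed low-level helpers that would program the hardware. */
      static int ice_vsi_ena_spoofchk(struct ice_vsi *vsi) { return 0; }
      static int ice_vsi_dis_spoofchk(struct ice_vsi *vsi) { return 0; }

      /* Single entry point: callers pass the value they want to apply instead
       * of consulting vf->spoofchk themselves (which may not be updated yet in
       * some flows).
       */
      static int ice_vsi_apply_spoofchk(struct ice_vsi *vsi, bool enable)
      {
              if (enable)
                      return ice_vsi_ena_spoofchk(vsi);

              return ice_vsi_dis_spoofchk(vsi);
      }
      ```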
    • ice: rename ICE_MAX_VF_COUNT to avoid confusion · dc36796e
      Jacob Keller authored
      The ICE_MAX_VF_COUNT field is defined in ice_sriov.h. This count is true
      for SR-IOV but will not be true for all VF implementations, such as when
      the ice driver supports Scalable IOV.
      
      Rename the definition to ICE_MAX_SRIOV_VFS to make its SR-IOV-specific
      scope clear.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      dc36796e
    • ice: remove unused definitions from ice_sriov.h · 00a57e29
      Jacob Keller authored
      A few more macros exist in ice_sriov.h which are not used anywhere.
      These can be safely removed. Note that ICE_VIRTCHNL_VF_CAP_L2 capability
      is set but never checked anywhere in the driver. Thus it is also safe to
      remove.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      00a57e29
    • ice: convert vf->vc_ops to a const pointer · a7e11710
      Jacob Keller authored
      The vc_ops structure is used to allow different handlers for virtchnl
      commands when the driver is in representor mode. The current
      implementation uses a copy of the ops table in each VF, and modifies
      this copy dynamically.
      
      The usual practice in kernel code is to store the ops table in a
      constant structure and point to different versions. This has a number of
      advantages:
      
        1. Reduced memory usage. Each VF merely points to the correct table,
           so they're able to re-use the same constant lookup table in memory.
        2. Consistency. It becomes more difficult to accidentally update or
           edit only one op call. Instead, the code switches to the correct
           table with a single pointer write. In general this is atomic: either
           the pointer is updated or it is not.
        3. Code Layout. The VF structure can store a pointer to the table
           without needing to have the full structure definition defined prior
           to the VF structure definition. This will aid in future refactoring
           of code by allowing the VF pointer to be kept in ice_vf_lib.h while
           the virtchnl ops table can be maintained in ice_virtchnl.h
      
      There is one major downside in the case of the vc_ops structure. Most of
      the operations in the table are the same between the two current
      implementations. This can appear to lead to duplication since each
      implementation must now fill in the complete table. It could make
      spotting the differences in the representor mode more challenging.
      Unfortunately, methods to make this less error prone either add
      complexity overhead (macros using CPP token concatenation) or don't work
      on all compilers we support (constant initializer from another constant
      structure).
      
      The cost of maintaining two structures does not outweigh the benefits of
      the constant table model (the pattern is sketched after this entry).
      
      While we're making these changes, go ahead and rename the structure and
      implementations with "virtchnl" instead of "vc_vf_". This will more
      closely align with the planned file renaming, and avoid similar names when
      we later introduce a "vf ops" table for separating Scalable IOV and
      Single Root IOV implementations.
      
      Leave the accessor/assignment functions in order to avoid issues with
      compiling with options disabled. The interface makes it easier to handle
      when CONFIG_PCI_IOV is disabled in the kernel.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      a7e11710
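      A minimal sketch of the const-ops-table pattern follows. The member
      names, handler names, and setter helper are assumptions for illustration;
      the real table has one member per virtchnl opcode.

      ```
      #include <linux/types.h>

      struct ice_vf;

      /* Shared, read-only dispatch table. */
      struct ice_virtchnl_ops {
              int (*get_ver_msg)(struct ice_vf *vf, u8 *msg, u16 msglen);
              /* ... one member per virtchnl opcode ... */
      };

      /* Minimal VF: it only *points* at one of the shared const tables. */
      struct ice_vf {
              const struct ice_virtchnl_ops *virtchnl_ops;
      };

      /* Placeholder handlers standing in for the default and representor
       * implementations.
       */
      static int ice_vc_get_ver(struct ice_vf *vf, u8 *msg, u16 msglen)
      {
              return 0;
      }

      static int ice_vc_repr_get_ver(struct ice_vf *vf, u8 *msg, u16 msglen)
      {
              return 0;
      }

      static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = {
              .get_ver_msg = ice_vc_get_ver,
      };

      static const struct ice_virtchnl_ops ice_virtchnl_repr_ops = {
              .get_ver_msg = ice_vc_repr_get_ver,
      };

      /* Switching modes is a single pointer write, not a field-by-field edit
       * of a per-VF copy.
       */
      static void ice_virtchnl_set_repr_ops(struct ice_vf *vf)
      {
              vf->virtchnl_ops = &ice_virtchnl_repr_ops;
      }
      ```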
    • ice: remove circular header dependencies on ice.h · 649c87c6
      Jacob Keller authored
      Several headers in the ice driver include ice.h even though they are
      themselves included by that header. The most notable of these is
      ice_common.h, but several other headers also do this.
      
      Such a recursive inclusion is problematic as it forces headers to be
      included in a strict order, otherwise compilation errors can result. The
      circular inclusions do not trigger an endless loop due to standard
      header inclusion guards, however other errors can occur.
      
      For example, ice_flow.h defines ice_rss_hash_cfg, which is used by
      ice_sriov.h as part of the definition of ice_vf_hash_ip_ctx.
      
      ice_flow.h includes ice_acl.h, which includes ice_common.h, and which
      finally includes ice.h. Since ice.h itself includes ice_sriov.h, this
      creates a circular dependency.
      
      The definition in ice_sriov.h requires things from ice_flow.h, but
      ice_flow.h itself will lead to trying to load ice_sriov.h as part of its
      process for expanding ice.h. The current code avoids this issue by
      having an implicit dependency without the include of ice_flow.h.
      
      If we were to fix that so that ice_sriov.h explicitly depends on
      ice_flow.h the following pattern would occur:
      
        ice_flow.h -> ice_acl.h -> ice_common.h -> ice.h -> ice_sriov.h
      
      At this point, during the expansion of ice.h, the header guard for
      ice_flow.h is already set, so when ice_sriov.h attempts to include
      ice_flow.h the include is skipped. Then, we go on to include the rest of
      ice_sriov.h, including structure definitions which depend on
      ice_rss_hash_cfg. This produces a compiler warning because
      ice_rss_hash_cfg hasn't been defined yet. Remember, we're still only at
      the start of ice_flow.h!
      
      If the order of headers is incorrect (ice_flow.h is not implicitly
      loaded first in all files which include ice_sriov.h) then we get the
      same failure.
      
      Removing this recursive inclusion requires fixing a few cases where some
      headers depended on the header inclusions from ice.h. In addition, a few
      other changes are also required.
      
      Most notably, ice_hw_to_dev is implemented as a macro in ice_osdep.h,
      which is the likely reason that ice_common.h includes ice.h at all. This
      macro implementation requires the full definition of struct ice_pf in
      order to compile.
      
      Fix this by moving it to a function defined in ice_main.c, so that not
      every file has to depend on the layout of the ice_pf structure (sketched
      after this entry).
      
      Note that this change only fixes circular dependencies, but it does not
      fully resolve all implicit dependencies where one header may depend on
      the inclusion of another. I tried to fix as many of the implicit
      dependencies as I noticed, but fixing them all requires a somewhat
      tedious analysis of each header and attempting to compile it separately.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      649c87c6
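      A sketch of the macro-to-function change for ice_hw_to_dev() follows.
      The struct layouts are minimal assumptions; only the idea (declaration
      visible to other headers, definition in ice_main.c where struct ice_pf is
      known) comes from the commit text.

      ```
      #include <linux/kernel.h>
      #include <linux/pci.h>

      struct ice_hw { int dummy_member; };    /* stand-in for the real struct */

      struct ice_pf {
              struct pci_dev *pdev;
              struct ice_hw hw;
              /* ... */
      };

      /* Declaration visible to other headers: callers no longer need the
       * struct ice_pf layout, so ice_common.h does not have to pull in ice.h.
       */
      struct device *ice_hw_to_dev(struct ice_hw *hw);

      /* Definition in ice_main.c, the one place that knows the ice_pf layout. */
      struct device *ice_hw_to_dev(struct ice_hw *hw)
      {
              struct ice_pf *pf = container_of(hw, struct ice_pf, hw);

              return &pf->pdev->dev;
      }
      ```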
    • ice: rename ice_virtchnl_pf.c to ice_sriov.c · 0deb0bf7
      Jacob Keller authored
      The ice_virtchnl_pf.c and ice_virtchnl_pf.h files are where most of the
      code for implementing Single Root IOV virtualization resides. This code
      includes support for bringing up and tearing down VFs, hooks into the
      kernel SR-IOV netdev operations, and for handling virtchnl messages from
      VFs.
      
      In the future, we plan to support Scalable IOV in addition to Single
      Root IOV as an alternative virtualization scheme. This implementation
      will re-use some, but not all, of the code in ice_virtchnl_pf.c.
      
      To prepare for this future, we want to refactor and split up the code in
      ice_virtchnl_pf.c into the following scheme:
      
       * ice_vf_lib.[ch]
      
         Basic VF structures and accessors. This is where scheme-independent
         code will reside.
      
       * ice_virtchnl.[ch]
      
         Virtchnl message handling. This is where the bulk of the logic for
         processing messages from VFs using the virtchnl messaging scheme will
         reside. This is separated from ice_vf_lib.c because it is distinct
         and has a bulk of the processing code.
      
       * ice_sriov.[ch]
      
         Single Root IOV implementation, including initialization and the
         routines for interacting with SR-IOV based netdev operations.
      
       * (future) ice_siov.[ch]
      
         Scalable IOV implementation.
      
      As a first step, let's assume that all of the code in
      ice_virtchnl_pf.[ch] is for Single Root IOV. Rename this file to
      ice_sriov.c and its header to ice_sriov.h.
      
      Future changes will further split out the code in these files following
      the plan outlined here.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      0deb0bf7
    • ice: rename ice_sriov.c to ice_vf_mbx.c · d775155a
      Jacob Keller authored
      The ice_sriov.c file primarily contains code which handles the logic for
      mailbox overflow detection and some other utility functions related to
      the virtualization mailbox.
      
      The bulk of the SR-IOV implementation is actually found in
      ice_virtchnl_pf.c, and this file isn't strictly SR-IOV specific.
      
      In the future, the ice driver will support an additional virtualization
      scheme known as Scalable IOV, and the code in this file will be used
      for this alternative implementation.
      
      Rename this file (and its associated header) to ice_vf_mbx.c, so that
      the ice_sriov.c name can later be reused for the SR-IOV-specific code.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      d775155a
  3. 12 March 2022 (2 commits)
  4. 11 March 2022 (1 commit)
    • ice: Fix race condition during interface enslave · 5cb1ebdb
      Ivan Vecera authored
      Commit 5dbbbd01 ("ice: Avoid RTNL lock when re-creating
      auxiliary device") changed the re-creation of the aux device so that
      ice_plug_aux_dev() is called from the ice_service_task() context. This
      unfortunately opens a race window that can result in a deadlock when an
      interface leaves a LAG and immediately enters it again.
      
      Reproducer:
      ```
      #!/bin/sh
      
      ip link add lag0 type bond mode 1 miimon 100
      ip link set lag0
      
      for n in {1..10}; do
              echo Cycle: $n
              ip link set ens7f0 master lag0
              sleep 1
              ip link set ens7f0 nomaster
      done
      ```
      
      This results in:
      [20976.208697] Workqueue: ice ice_service_task [ice]
      [20976.213422] Call Trace:
      [20976.215871]  __schedule+0x2d1/0x830
      [20976.219364]  schedule+0x35/0xa0
      [20976.222510]  schedule_preempt_disabled+0xa/0x10
      [20976.227043]  __mutex_lock.isra.7+0x310/0x420
      [20976.235071]  enum_all_gids_of_dev_cb+0x1c/0x100 [ib_core]
      [20976.251215]  ib_enum_roce_netdev+0xa4/0xe0 [ib_core]
      [20976.256192]  ib_cache_setup_one+0x33/0xa0 [ib_core]
      [20976.261079]  ib_register_device+0x40d/0x580 [ib_core]
      [20976.266139]  irdma_ib_register_device+0x129/0x250 [irdma]
      [20976.281409]  irdma_probe+0x2c1/0x360 [irdma]
      [20976.285691]  auxiliary_bus_probe+0x45/0x70
      [20976.289790]  really_probe+0x1f2/0x480
      [20976.298509]  driver_probe_device+0x49/0xc0
      [20976.302609]  bus_for_each_drv+0x79/0xc0
      [20976.306448]  __device_attach+0xdc/0x160
      [20976.310286]  bus_probe_device+0x9d/0xb0
      [20976.314128]  device_add+0x43c/0x890
      [20976.321287]  __auxiliary_device_add+0x43/0x60
      [20976.325644]  ice_plug_aux_dev+0xb2/0x100 [ice]
      [20976.330109]  ice_service_task+0xd0c/0xed0 [ice]
      [20976.342591]  process_one_work+0x1a7/0x360
      [20976.350536]  worker_thread+0x30/0x390
      [20976.358128]  kthread+0x10a/0x120
      [20976.365547]  ret_from_fork+0x1f/0x40
      ...
      [20976.438030] task:ip              state:D stack:    0 pid:213658 ppid:213627 flags:0x00004084
      [20976.446469] Call Trace:
      [20976.448921]  __schedule+0x2d1/0x830
      [20976.452414]  schedule+0x35/0xa0
      [20976.455559]  schedule_preempt_disabled+0xa/0x10
      [20976.460090]  __mutex_lock.isra.7+0x310/0x420
      [20976.464364]  device_del+0x36/0x3c0
      [20976.467772]  ice_unplug_aux_dev+0x1a/0x40 [ice]
      [20976.472313]  ice_lag_event_handler+0x2a2/0x520 [ice]
      [20976.477288]  notifier_call_chain+0x47/0x70
      [20976.481386]  __netdev_upper_dev_link+0x18b/0x280
      [20976.489845]  bond_enslave+0xe05/0x1790 [bonding]
      [20976.494475]  do_setlink+0x336/0xf50
      [20976.502517]  __rtnl_newlink+0x529/0x8b0
      [20976.543441]  rtnl_newlink+0x43/0x60
      [20976.546934]  rtnetlink_rcv_msg+0x2b1/0x360
      [20976.559238]  netlink_rcv_skb+0x4c/0x120
      [20976.563079]  netlink_unicast+0x196/0x230
      [20976.567005]  netlink_sendmsg+0x204/0x3d0
      [20976.570930]  sock_sendmsg+0x4c/0x50
      [20976.574423]  ____sys_sendmsg+0x1eb/0x250
      [20976.586807]  ___sys_sendmsg+0x7c/0xc0
      [20976.606353]  __sys_sendmsg+0x57/0xa0
      [20976.609930]  do_syscall_64+0x5b/0x1a0
      [20976.613598]  entry_SYSCALL_64_after_hwframe+0x65/0xca
      
      1. The command 'ip link ... set nomaster' causes ice_plug_aux_dev()
         to be called from the ice_service_task() context; the aux device is
         created and the associated device->lock is taken.
      2. The command 'ip link ... set master ...' calls ice's notifier under
         the RTNL lock, and that notifier calls ice_unplug_aux_dev(). That
         function tries to take the aux device->lock, but it is already held
         by ice_plug_aux_dev() from step 1.
      3. Later, ice_plug_aux_dev() tries to take the RTNL lock, but it is
         already held from step 2.
      4. Deadlock.
      
      The patch fixes this issue with the following changes (sketched after
      this entry):
      - The ICE_FLAG_PLUG_AUX_DEV bit is kept set for the duration of the
        ice_plug_aux_dev() call in ice_service_task().
      - The bit is checked in ice_clear_rdma_cap(), and ice_unplug_aux_dev()
        is called only if it is not set. If it is set (in other words, plugging
        of the aux device was requested and ice_plug_aux_dev() is potentially
        running), the function only clears the bit.
      - Once the ice_plug_aux_dev() call in ice_service_task() has finished,
        the ICE_FLAG_PLUG_AUX_DEV bit is cleared, but it is also checked
        whether it was already cleared by ice_clear_rdma_cap(). If so, the aux
        device is unplugged.
      Signed-off-by: Ivan Vecera <ivecera@redhat.com>
      Co-developed-by: Petr Oros <poros@redhat.com>
      Signed-off-by: Petr Oros <poros@redhat.com>
      Reviewed-by: Dave Ertman <david.m.ertman@intel.com>
      Link: https://lore.kernel.org/r/20220310171641.3863659-1-ivecera@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      5cb1ebdb
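      A sketch of the flag handshake follows. ICE_FLAG_PLUG_AUX_DEV, pf->flags,
      and the function names come from the commit text; the bodies and
      signatures here are simplified illustrations, not the actual fix (the
      real ice_service_task() takes a work_struct, for example).

      ```
      #include <linux/bitops.h>
      #include <linux/types.h>

      enum { ICE_FLAG_PLUG_AUX_DEV = 0, ICE_PF_FLAGS_NBITS };

      struct ice_pf {
              DECLARE_BITMAP(flags, ICE_PF_FLAGS_NBITS);
      };

      void ice_plug_aux_dev(struct ice_pf *pf);
      void ice_unplug_aux_dev(struct ice_pf *pf);

      static void ice_clear_rdma_cap_sketch(struct ice_pf *pf)
      {
              /* If a plug was requested (bit set), ice_plug_aux_dev() may be
               * running from the service task; just clear the bit and let that
               * path perform the unplug. Otherwise unplug directly.
               */
              if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
                      ice_unplug_aux_dev(pf);
      }

      static void ice_service_task_sketch(struct ice_pf *pf)
      {
              if (test_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags)) {
                      ice_plug_aux_dev(pf);

                      /* If ice_clear_rdma_cap() already cleared the bit while
                       * we were plugging, an unplug was requested meanwhile,
                       * so do it now.
                       */
                      if (!test_and_clear_bit(ICE_FLAG_PLUG_AUX_DEV, pf->flags))
                              ice_unplug_aux_dev(pf);
              }
      }
      ```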
  5. 10 March 2022 (5 commits)
  6. 09 March 2022 (4 commits)
    • ice: Fix curr_link_speed advertised speed · ad35ffa2
      Jedrzej Jagielski authored
      Change the advertised speed reported via curr_link_speed, because
      link_info.link_speed is not equal to phy.curr_user_speed_req. Without
      this patch it is impossible to set the advertised speed to the same
      value as link_speed.
      
      Testing hints: try to set the advertised speed to 25G only, with a 25G
      default link (use ethtool -s 0x80000000).
      
      Fixes: 48cb27f2 ("ice: Implement handlers for ethtool PHY/link operations")
      Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
      Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
      Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      ad35ffa2
    • ice: Don't use GFP_KERNEL in atomic context · 3d97f1af
      Christophe JAILLET authored
      ice_misc_intr() is an IRQ handler. It should not sleep.
      
      Use GFP_ATOMIC instead of GFP_KERNEL when allocating memory there
      (a sketch follows this entry).
      
      Fixes: 348048e7 ("ice: Implement iidc operations")
      Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
      Tested-by: Leszek Kaliszczuk <leszek.kaliszczuk@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      3d97f1af
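      A sketch of the constraint: inside an interrupt handler such as
      ice_misc_intr(), allocations must not sleep, so GFP_ATOMIC replaces
      GFP_KERNEL. The event structure and handler body below are simplified
      assumptions.

      ```
      #include <linux/interrupt.h>
      #include <linux/slab.h>

      struct iidc_event_sketch { unsigned long type; }; /* stand-in struct */

      static irqreturn_t ice_misc_intr_sketch(int irq, void *data)
      {
              struct iidc_event_sketch *event;

              /* GFP_KERNEL may sleep and is therefore invalid in atomic
               * (interrupt) context; GFP_ATOMIC never sleeps.
               */
              event = kzalloc(sizeof(*event), GFP_ATOMIC);
              if (!event)
                      return IRQ_HANDLED;

              /* ... fill in and forward the event ... */
              kfree(event);

              return IRQ_HANDLED;
      }
      ```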
    • ice: Fix error with handling of bonding MTU · 97b01291
      Dave Ertman authored
      When a bonded interface is destroyed, .ndo_change_mtu can be called
      during the tear-down process while the RTNL lock is held. This is a
      problem because the auxiliary driver linked to the LAN driver needs to
      be notified of the MTU change, and that requires grabbing a device_lock
      on the auxiliary_device's dev. Currently this is attempted in the same
      execution context as the call to .ndo_change_mtu, which causes a
      deadlock.
      
      Move the notification of the changed MTU to a separate execution context
      (the watchdog service task) and eliminate the "before" notification
      (a sketch follows this entry).
      
      Fixes: 348048e7 ("ice: Implement iidc operations")
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Jonathan Toppins <jtoppins@redhat.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com> (A Contingent worker at Intel)
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      97b01291
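      A sketch of deferring the notification out of the RTNL-held
      .ndo_change_mtu path into the service task. The flag name, the helper,
      and the simplified signatures are assumptions for illustration.

      ```
      #include <linux/bitops.h>
      #include <linux/netdevice.h>

      enum { ICE_FLAG_MTU_CHANGED = 0, ICE_PF_FLAGS_NBITS };

      struct ice_pf {
              DECLARE_BITMAP(flags, ICE_PF_FLAGS_NBITS);
      };

      void ice_send_mtu_event_to_aux(struct ice_pf *pf); /* assumed helper */

      /* .ndo_change_mtu runs under RTNL: just record that an event is due. */
      static int ice_change_mtu_sketch(struct net_device *netdev,
                                       struct ice_pf *pf, int new_mtu)
      {
              netdev->mtu = new_mtu;
              set_bit(ICE_FLAG_MTU_CHANGED, pf->flags);
              return 0;
      }

      /* The watchdog/service task notifies the auxiliary driver later, outside
       * the RTNL-held call chain, so taking the aux device_lock cannot
       * deadlock against it.
       */
      static void ice_service_task_mtu_sketch(struct ice_pf *pf)
      {
              if (test_and_clear_bit(ICE_FLAG_MTU_CHANGED, pf->flags))
                      ice_send_mtu_event_to_aux(pf);
      }
      ```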
    • ice: stop disabling VFs due to PF error responses · 79498d5a
      Jacob Keller authored
      The ice_vc_send_msg_to_vf function has logic to detect "failure"
      responses being sent to a VF. If a VF is sent more than
      ICE_DFLT_NUM_INVAL_MSGS_ALLOWED such responses, the VF is marked as
      disabled. Almost identical logic also existed in the i40e driver.
      
      This logic was added to the ice driver in commit 1071a835 ("ice:
      Implement virtchnl commands for AVF support"), which itself copied the
      i40e implementation from commit 5c3c48ac ("i40e: implement virtual
      device interface").
      
      Neither commit provides a proper explanation or justification of the
      check. In fact, later commits to i40e changed the logic to allow
      bypassing the check in some specific instances.
      
      The "logic" for this seems to be that error responses somehow indicate a
      malicious VF. This is not really true. The PF might be sending an error
      for any number of reasons such as lack of resources, etc.
      
      Additionally, this causes the PF to log an info message for every failed
      VF response which may confuse users, and can spam the kernel log.
      
      This behavior is not documented as part of any requirement for our
      products and other operating system drivers such as the FreeBSD
      implementation of our drivers do not include this type of check.
      
      In fact, the change from dev_err to dev_info in i40e commit 18b7af57
      ("i40e: Lower some message levels") explains that these messages
      typically don't actually indicate a real issue. It is quite likely that
      a user who hits this in practice will be very confused as the VF will be
      disabled without an obvious way to recover.
      
      We already have robust malicious driver detection logic using actual
      hardware detection mechanisms that detect and prevent invalid device
      usage. Remove the logic since it is not a documented requirement and the
      behavior is not intuitive.
      
      Fixes: 1071a835 ("ice: Implement virtchnl commands for AVF support")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      79498d5a
  7. 08 March 2022 (1 commit)
  8. 04 March 2022 (7 commits)
    • ice: convert VF storage to hash table with krefs and RCU · 3d5985a1
      Jacob Keller authored
      The ice driver stores VF structures in a simple array which is allocated
      once at the time of VF creation. The VF structures are then accessed
      from the array by their VF ID. The ID must be between 0 and the number
      of allocated VFs.
      
      Multiple threads can access this table:
      
       * .ndo operations such as .ndo_get_vf_cfg or .ndo_set_vf_trust
       * interrupts, such as due to messages from the VF using the virtchnl
         communication
       * processing such as device reset
       * commands to add or remove VFs
      
      The current implementation does not keep track of when all threads are
      done operating on a VF and can potentially result in use-after-free
      issues caused by one thread accessing a VF structure after it has been
      released when removing VFs. Some of these are prevented with various
      state flags and checks.
      
      In addition, this structure is quite static and does not support a
      planned future where virtualization can be more dynamic. As we begin to
      look at supporting Scalable IOV with the ice driver (as opposed to just
      supporting Single Root IOV), this structure is not sufficient.
      
      In the future, VFs will be able to be added and removed individually and
      dynamically.
      
      To allow for this, and to better protect against a whole class of
      use-after-free bugs, replace the VF storage with a combination of a hash
      table and krefs to reference track all of the accesses to VFs through
      the hash table.
      
      A hash table still allows efficient look up of the VF given its ID, but
      also allows adding and removing VFs. It does not require contiguous VF
      IDs.
      
      The use of krefs allows the cleanup of the VF memory to be delayed until
      after all threads have released their reference (by calling ice_put_vf).
      
      To prevent corruption of the hash table, a combination of RCU and the
      table_lock mutex is used. Addition to and removal from the hash table use
      the RCU-aware hash macros. This allows simple read-only lookups that
      iterate to locate a single VF to be fast under RCU. Accesses which modify
      the hash table, or which can't use RCU because they sleep, hold the mutex
      lock (the scheme is sketched after this entry).
      
      By using this design, we have a stronger guarantee that the VF structure
      can't be released until after all threads are finished operating on it.
      We also pave the way for the more dynamic Scalable IOV implementation in
      the future.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      3d5985a1
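      A sketch of the storage scheme follows: a hash table keyed by VF ID,
      protected by a mutex for writers and RCU for readers, with per-VF krefs.
      Names such as ice_get_vf_by_id, ice_put_vf, and table_lock come from the
      commit text; the bucket count, layouts, and everything else are
      simplified assumptions.

      ```
      #include <linux/hashtable.h>
      #include <linux/kref.h>
      #include <linux/mutex.h>
      #include <linux/rcupdate.h>
      #include <linux/slab.h>
      #include <linux/types.h>

      struct ice_vf {
              struct hlist_node entry;        /* linkage in the VF hash table */
              struct kref refcnt;             /* released via ice_put_vf() */
              struct rcu_head rcu;
              u16 vf_id;
      };

      struct ice_vfs {
              DECLARE_HASHTABLE(table, 8);    /* bucket count is an assumption */
              struct mutex table_lock;        /* writers: add/remove entries */
      };

      static void ice_vfs_init(struct ice_vfs *vfs)
      {
              hash_init(vfs->table);
              mutex_init(&vfs->table_lock);
      }

      static void ice_release_vf(struct kref *ref)
      {
              struct ice_vf *vf = container_of(ref, struct ice_vf, refcnt);

              kfree_rcu(vf, rcu);
      }

      static void ice_put_vf(struct ice_vf *vf)
      {
              kref_put(&vf->refcnt, ice_release_vf);
      }

      /* Fast read-side lookup: RCU only, no mutex. */
      static struct ice_vf *ice_get_vf_by_id(struct ice_vfs *vfs, u16 vf_id)
      {
              struct ice_vf *vf;

              rcu_read_lock();
              hash_for_each_possible_rcu(vfs->table, vf, entry, vf_id) {
                      if (vf->vf_id == vf_id) {
                              struct ice_vf *found = NULL;

                              /* Entry may be mid-teardown: only take a
                               * reference if it is still live.
                               */
                              if (kref_get_unless_zero(&vf->refcnt))
                                      found = vf;
                              rcu_read_unlock();
                              return found;
                      }
              }
              rcu_read_unlock();
              return NULL;
      }

      /* Removal holds the mutex; the memory is freed only after the last
       * ice_put_vf() drops the reference count to zero.
       */
      static void ice_remove_vf(struct ice_vfs *vfs, struct ice_vf *vf)
      {
              mutex_lock(&vfs->table_lock);
              hash_del_rcu(&vf->entry);
              mutex_unlock(&vfs->table_lock);
              ice_put_vf(vf);
      }
      ```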
    • ice: introduce VF accessor functions · fb916db1
      Jacob Keller authored
      Before we switch the VF data structure storage mechanism to a hash,
      introduce new accessor functions to define the new interface.
      
      * ice_get_vf_by_id is a function used to obtain a reference to a VF from
        the table based on its VF ID
      * ice_has_vfs is used to quickly check if any VFs are configured
      * ice_get_num_vfs is used to get an exact count of how many VFs are
        configured
      
      We can drop the old ice_validate_vf_id function, since every caller was
      just going to immediately access the VF table to get a reference anyway.
      This way we simply use the single ice_get_vf_by_id both to validate that
      the VF ID is within range and to confirm that a VF with that ID exists
      (the counting accessors are sketched after this entry).
      
      This change enables us to more easily convert the codebase to the hash
      table since most callers now properly use the interface.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      fb916db1
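      A sketch of the two counting accessors, reusing the hash-table types from
      the sketch above; the commit itself first introduced array-based versions
      with the same signatures, and ice_get_vf_by_id is shown in that earlier
      sketch.

      ```
      #include <linux/hashtable.h>
      #include <linux/rcupdate.h>
      #include <linux/types.h>

      /* Builds on struct ice_vfs / struct ice_vf from the previous sketch. */

      static bool ice_has_vfs(struct ice_vfs *vfs)
      {
              return !hash_empty(vfs->table);
      }

      static u16 ice_get_num_vfs(struct ice_vfs *vfs)
      {
              struct ice_vf *vf;
              unsigned int bkt;
              u16 num_vfs = 0;

              rcu_read_lock();
              hash_for_each_rcu(vfs->table, bkt, vf, entry)
                      num_vfs++;
              rcu_read_unlock();

              return num_vfs;
      }
      ```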
    • ice: factor VF variables to separate structure · 000773c0
      Jacob Keller authored
      We maintain a number of values for VFs within the ice_pf structure. This
      includes the VF table, the number of allocated VFs, the maximum number
      of supported SR-IOV VFs, the number of queue pairs per VF, the number of
      MSI-X vectors per VF, and a bitmap of the VFs with detected MDD events.
      
      We're about to add a few more variables to this list. Clean this up
      first by extracting these members into a new ice_vfs structure defined
      in ice_virtchnl_pf.h (sketched after this entry).
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      000773c0
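      A sketch of the grouping follows, using the array-based layout this
      commit introduces (the hash-table commit above later replaces the table
      member). The member names mirror the commit description; the types and
      the ICE_MAX_SRIOV_VFS value are assumptions.

      ```
      #include <linux/types.h>

      #define ICE_MAX_SRIOV_VFS       256     /* assumed value */

      struct ice_vf;                          /* full definition elsewhere */

      struct ice_vfs {
              struct ice_vf *table;           /* array of allocated VFs */
              u16 num_alloc;                  /* number of allocated VFs */
              u16 num_supported;              /* max VFs supported via SR-IOV */
              u16 num_qps_per;                /* queue pairs per VF */
              u16 num_msix_per;               /* MSI-X vectors per VF */
              DECLARE_BITMAP(malvfs, ICE_MAX_SRIOV_VFS); /* VFs with MDD events */
      };

      struct ice_pf {
              /* ... other PF state ... */
              struct ice_vfs vfs;             /* all VF bookkeeping in one place */
      };
      ```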
    • ice: convert ice_for_each_vf to include VF entry iterator · c4c2c7db
      Jacob Keller authored
      The ice_for_each_vf macro is intended to be used to loop over all VFs.
      The current implementation relies on an iterator that is the index into
      the VF array in the PF structure. This forces all users to perform a
      look up themselves.
      
      This abstraction forces a lot of duplicate work on callers and leaks the
      interface implementation to the caller. Replace it with an
      implementation that includes the VF pointer as the primary iterator
      (sketched after this entry). This version simplifies callers which just
      want to iterate over every VF, as they no longer need to perform their
      own lookup.
      
      The "i" iterator value is replaced with a new unsigned int "bkt"
      parameter, as this will match the necessary interface for replacing
      the VF array with a hash table. For now, the bkt is the VF ID, but in
      the future it will simply be the hash bucket index. Document that it
      should not be treated as a VF ID.
      
      This change aims to simplify switching from the array to a hash table. I
      considered alternative implementations such as an xarray but decided
      that the hash table was the simplest and most suitable implementation. I
      also looked at methods to hide the bkt iterator entirely, but I couldn't
      come up with a feasible solution that worked for hash table iterators.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      c4c2c7db
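      A sketch of the iterator change follows. The old form is the index-based
      loop the commit replaces; the new form is shown in its eventual
      hash-table shape, since that is where the interface is headed (field
      names are assumptions).

      ```
      #include <linux/hashtable.h>

      /* Old: iterate by index; every caller then looks the VF up itself. */
      #define ice_for_each_vf_old(pf, i) \
              for ((i) = 0; (i) < (pf)->num_alloc_vfs; (i)++)

      /* New: the VF pointer is the primary iterator. "bkt" is merely the hash
       * bucket index and must not be treated as a VF ID.
       */
      #define ice_for_each_vf(pf, bkt, vf) \
              hash_for_each((pf)->vfs.table, (bkt), (vf), entry)

      /* Usage:
       *
       *      struct ice_vf *vf;
       *      unsigned int bkt;
       *
       *      ice_for_each_vf(pf, bkt, vf)
       *              do_something_per_vf(vf);        // hypothetical callee
       */
      ```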
    • ice: use ice_for_each_vf for iteration during removal · 19281e86
      Jacob Keller authored
      When removing VFs, the driver takes a weird approach of assigning
      pf->num_alloc_vfs to 0 before iterating over the VFs using a temporary
      variable.
      
      This logic has been in the driver for a long time, and seems to have
      been carried forward from i40e.
      
      We want to refactor the way VFs are stored, and iterating over the data
      structure without the ice_for_each_vf interface impedes this work.
      
      The logic relies on implicitly using num_alloc_vfs as a sort of
      "safeguard" for accessing VF data.
      
      While this sort of guard makes sense for Single Root IOV where all VFs
      are added at once, the data structures don't work for VFs which can be
      added and removed dynamically. We also have a separate state flag,
      ICE_VF_DEINIT_IN_PROGRESS which is a stronger protection against
      concurrent removal and access.
      
      Avoid the custom tmp iteration and replace it with the standard
      ice_for_each_vf iterator. Delay the assignment of num_alloc_vfs until
      after this loop finishes.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      19281e86
    • ice: remove checks in ice_vc_send_msg_to_vf · 59e1f857
      Jacob Keller authored
      The ice_vc_send_msg_to_vf function is used by the PF to send a response
      to a VF. This function has overzealous checks to ensure it is not passed
      a NULL VF pointer and to ensure that the passed-in struct ice_vf has a
      valid vf_id sub-member.
      
      These checks have existed since commit 1071a835 ("ice: Implement
      virtchnl commands for AVF support") and function as simple sanity
      checks.
      
      We are planning to refactor the ice driver to use a hash table along
      with appropriate locks in a future refactor. This change will modify how
      the ice_validate_vf_id function works. Instead of a simple >= check to
      ensure the VF ID is between some range, it will check the hash table to
      see if the specified VF ID is actually in the table. This requires that
      the function properly lock the table to prevent race conditions.
      
      The checks may seem ok at first glance, but they don't really provide
      much benefit.
      
      In order for ice_vc_send_msg_to_vf to have these checks fail, the
      callers must either (1) pass NULL as the VF, (2) construct an invalid VF
      pointer manually, or (3) be using a VF pointer which becomes invalid
      after they obtain it properly using ice_get_vf_by_id.
      
      For (1), a cursory glance over the callers of ice_vc_send_msg_to_vf
      shows that in most cases the functions already operate assuming their VF
      pointer is valid, such as by dereferencing vf->pf or other members.
      
      They obtain the VF pointer by accessing the VF array using the VF ID,
      which can never produce a NULL value (since it is a simple address
      calculation on the array, the result cannot be NULL).
      
      The sole exception for (1) is that ice_vc_process_vf_msg will forward a
      NULL VF pointer to this function as part of its goto error handler
      logic. This requires some minor cleanup to simply exit immediately when
      an invalid VF ID is detected (rather than using the same error flow as
      the rest of the function).
      
      For (2), it is unexpected for a flow to construct a VF pointer manually
      instead of accessing the VF array. Defending against this is likely to
      just hide bad programming.
      
      For (3), it is definitely true that VF pointers could become invalid,
      for example if a thread is processing a VF message while the VF gets
      removed. However, the correct solution is not to add additional checks
      like this which do not guarantee to prevent the race. Instead we plan to
      solve the root of the problem by preventing the possibility entirely.
      
      This solution will require the change to a hash table with proper
      locking and reference counts of the VF structures. When this is done,
      ice_validate_vf_id will require locking of the hash table. This will be
      problematic because all of the callers of ice_vc_send_msg_to_vf will
      already have to take the lock to obtain the VF pointer anyway. With a
      mutex, this leads to a double lock that could hang the kernel thread.
      
      Avoid this by removing the checks which don't provide much value, so
      that we can safely add the necessary protections properly.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      59e1f857
    • ice: move VFLR acknowledge during ice_free_vfs · 44efe75f
      Jacob Keller authored
      After removing all VFs, the driver clears the VFLR indication for VFs.
      This has been in ice since the beginning of SR-IOV support in the ice
      driver.
      
      The implementation was copied from i40e, and the motivation for the VFLR
      indication clearing is described in the commit f7414531 ("i40e:
      acknowledge VFLR when disabling SR-IOV").
      
      The commit explains that we need to clear the VFLR indication because
      the virtual function undergoes a VFLR event. If we don't indicate that
      it is complete it can cause an issue when VFs are re-enabled due to
      a "phantom" VFLR.
      
      The register block read was originally added under a pci_vfs_assigned
      check because, at the time, it ran after the call to pci_disable_sriov.
      The SR-IOV disable was later moved earlier in the flow so that the VF
      drivers could be torn down before we removed functionality.
      
      Move the VFLR acknowledge into the main loop that tears down VF
      resources. This avoids using the tmp value for iterating over VFs
      multiple times. The result will make it easier to refactor the VF array
      in a future change.
      
      It's possible we might want to modify this flow to also stop checking
      pci_vfs_assigned. However, it seems reasonable to keep this change: we
      should only clear the VFLR if we actually disabled SR-IOV.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      44efe75f