1. 03 Mar 2022, 1 commit
  2. 19 Feb 2022, 1 commit
    • ice: fix concurrent reset and removal of VFs · fadead80
      Authored by Jacob Keller
      Commit c503e632 ("ice: Stop processing VF messages during teardown")
      introduced a driver state flag, ICE_VF_DEINIT_IN_PROGRESS, which is
      intended to prevent some issues with concurrently handling messages from
      VFs while tearing down the VFs.
      
      This change was motivated by crashes caused while tearing down and
      bringing up VFs in rapid succession.
      
      It turns out that the fix actually introduces issues with the VF
      driver, because the PF no longer responds to any messages sent by the
      VF during its .remove routine. This results in the VF potentially
      removing its DMA memory before the PF has shut down the device queues.
      
      Additionally, the fix doesn't actually resolve concurrency issues
      within the ice driver. It is possible for a VF to initiate a reset
      just prior to the ice driver removing VFs. This can result in the
      remove task operating concurrently while the VF is being reset,
      leading to similar memory corruption and panics purportedly fixed by
      that commit.
      
      Fix this concurrency at its root by protecting both the reset and
      removal flows using the existing VF cfg_lock. This ensures that we
      cannot remove the VF while any outstanding critical tasks such as a
      virtchnl message or a reset are occurring.
      
      This locking change also fixes the root cause originally fixed by commit
      c503e632 ("ice: Stop processing VF messages during teardown"), so we
      can simply revert it.
      
      Note that I kept these two changes together because reverting the
      original commit alone would leave the driver vulnerable to worse race
      conditions.
      
      Fixes: c503e632 ("ice: Stop processing VF messages during teardown")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
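
      The locking pattern described above can be sketched roughly as
      follows. This is a minimal illustration with hypothetical demo_*
      names; the real driver keeps cfg_lock in struct ice_vf and its flows
      are considerably more involved:

      #include <linux/mutex.h>
      #include <linux/types.h>

      /* Hypothetical per-VF state; stands in for struct ice_vf. */
      struct demo_vf {
          struct mutex cfg_lock;  /* serializes virtchnl, reset and removal */
          bool removed;
      };

      /* Reset path: cannot overlap with removal of the same VF. */
      static void demo_vf_reset(struct demo_vf *vf)
      {
          mutex_lock(&vf->cfg_lock);
          if (!vf->removed) {
              /* safely stop and rebuild the VF's queues here */
          }
          mutex_unlock(&vf->cfg_lock);
      }

      /* Removal path: waits for any in-flight reset or virtchnl message,
       * so the VF's DMA memory is not freed under an active operation. */
      static void demo_vf_remove(struct demo_vf *vf)
      {
          mutex_lock(&vf->cfg_lock);
          vf->removed = true;
          /* shut down queues and free resources here */
          mutex_unlock(&vf->cfg_lock);
      }
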
  3. 14 Feb 2022, 1 commit
  4. 11 Feb 2022, 1 commit
  5. 10 Feb 2022, 3 commits
    • ice: Add ability for PF admin to enable VF VLAN pruning · f1da5a08
      Authored by Brett Creeley
      By default, VFs are able to see all tagged traffic regardless of
      trust and VLAN filters. Based on the behavior of legacy devices (e.g.
      ixgbe, i40e), customers expect VFs to receive all VLAN tagged traffic
      with a matching destination MAC.
      
      Add an ethtool private flag 'vf-vlan-pruning' and set the default to
      off so VFs will receive all VLAN traffic directed towards them. When
      the flag is turned on, the VF will only be able to receive untagged
      traffic or traffic with VLAN tags it has created interfaces for.
      
      Also, the flag cannot be changed while any VFs are allocated; this
      was done to simplify the implementation. So, if this flag is needed,
      the PF admin must enable it before creating VFs. If the user tries to
      change the flag while VFs are active, an unsupported message naming
      the vf-vlan-pruning flag is printed, as sketched below. In case
      multiple flags were specified, this makes it clear to the user which
      flag failed.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
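
      A minimal sketch of the "no change while VFs are allocated" check,
      using hypothetical demo_* names (the real logic lives in the ice
      ethtool private-flags handler). From user space, the flag would be
      toggled with ethtool --set-priv-flags before any VFs are created:

      #include <linux/netdevice.h>
      #include <linux/errno.h>
      #include <linux/bits.h>

      #define DEMO_FLAG_VF_VLAN_PRUNING BIT(0)

      struct demo_pf {
          struct net_device *netdev;
          unsigned int num_alloc_vfs;
          bool vf_vlan_pruning;
      };

      /* ethtool .set_priv_flags-style handler (illustrative shape). */
      static int demo_set_priv_flags(struct demo_pf *pf, u32 flags)
      {
          bool pruning = !!(flags & DEMO_FLAG_VF_VLAN_PRUNING);

          /* Reject the toggle while VFs exist, so active VFs never see
           * their VLAN pruning behavior change underneath them. */
          if (pruning != pf->vf_vlan_pruning && pf->num_alloc_vfs) {
              netdev_err(pf->netdev,
                         "vf-vlan-pruning: cannot change while VFs are allocated\n");
              return -EOPNOTSUPP;
          }

          pf->vf_vlan_pruning = pruning;
          return 0;
      }
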
    • ice: Add outer_vlan_ops and VSI specific VLAN ops implementations · c31af68a
      Authored by Brett Creeley
      Add a new outer_vlan_ops member to the ice_vsi structure as outer VLAN
      ops are only available when the device is in Double VLAN Mode (DVM).
      Depending on the VSI type, the requirements for what operations to
      use/allow differ.
      
      By default all VSIs have unsupported inner and outer VSI VLAN ops. This
      implementation was chosen to prevent unexpected crashes due to null
      pointer dereferences. Instead, if a VSI calls an unsupported op, it will
      just return -EOPNOTSUPP.
      
      Add implementations to support modifying outer VLAN fields for VSI
      context. This includes the ability to modify VLAN stripping, insertion,
      and the port VLAN based on the outer VLAN handling fields of the VSI
      context.
      
      These functions should only ever be used if DVM is enabled because that
      means the firmware supports the outer VLAN fields in the VSI context. If
      the device is in DVM, then always use the outer_vlan_ops, else use the
      vlan_ops since the device is in Single VLAN Mode (SVM).
      
      Also, move adding the untagged VLAN 0 filter from ice_vsi_setup() to
      ice_vsi_vlan_setup() as the latter function is specific to the PF and
      all other VSI types that need an untagged VLAN 0 filter already do this
      in their specific flows. Without this change, Flow Director is failing
      to initialize because it does not implement any VSI VLAN ops.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
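
      The "unsupported by default" pattern can be illustrated with a small
      ops table whose stubs fail cleanly; names are hypothetical and the
      real ice tables carry many more operations:

      #include <linux/errno.h>

      struct demo_vsi;

      /* Each VSI carries an inner and an outer ops table; the outer one is
       * only populated with real implementations when DVM is active. */
      struct demo_vlan_ops {
          int (*ena_stripping)(struct demo_vsi *vsi);
          int (*dis_stripping)(struct demo_vsi *vsi);
      };

      /* Default stub: an unsupported op returns -EOPNOTSUPP instead of
       * crashing on a NULL function pointer. */
      static int demo_vlan_op_unsupported(struct demo_vsi *vsi)
      {
          return -EOPNOTSUPP;
      }

      static const struct demo_vlan_ops demo_vlan_ops_unsupported = {
          .ena_stripping = demo_vlan_op_unsupported,
          .dis_stripping = demo_vlan_op_unsupported,
      };
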
    • ice: Add new VSI VLAN ops · bc42afa9
      Authored by Brett Creeley
      Incoming changes to support 802.1Q and/or 802.1ad VLAN filtering and
      offloads require more flexibility when configuring VLANs. The VSI VLAN
      interface will allow flexibility for configuring VLANs for all VSI
      types. Add new files to separate the VSI VLAN ops and move functions to
      make the code more organized.
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  6. 30 Dec 2021, 1 commit
  7. 16 Dec 2021, 2 commits
    • ice: support immediate firmware activation via devlink reload · 399e27db
      Authored by Jacob Keller
      The ice hardware contains an embedded chip with firmware which can be
      updated using devlink flash. The firmware which runs on this chip is
      referred to as the Embedded Management Processor firmware (EMP
      firmware).
      
      Activating the new firmware image currently requires that the system be
      rebooted. This is not ideal as rebooting the system can cause unwanted
      downtime.
      
      In practical terms, activating the firmware does not always require a
      full system reboot. In many cases it is possible to activate the EMP
      firmware immediately. There are a couple of different scenarios to
      cover.
      
       * The EMP firmware itself can be reloaded by issuing a special update
         to the device called an Embedded Management Processor reset (EMP
         reset). This reset causes the device to reset and reload the EMP
         firmware.
      
       * PCI configuration changes are only reloaded after a cold PCIe reset.
         Unfortunately there is no generic way to trigger this for a PCIe
         device without a system reboot.
      
      When performing a flash update, firmware is capable of responding with
      some information about the specific update requirements.
      
      The driver updates the flash by programming a secondary inactive bank
      with the contents of the new image, and then issuing a command to
      request to switch the active bank starting from the next load.
      
      The response to the final command for updating the inactive NVM flash
      bank includes an indication of the minimum reset required to fully
      update the device. This can be one of the following:
      
       * A full power on is required
       * A cold PCIe reset is required
       * An EMP reset is required
      
      The response to the command to switch flash banks includes an indication
      of whether or not the firmware will allow an EMP reset request.
      
      For most updates, an EMP reset is sufficient to load the new EMP
      firmware without issues. In some cases, this reset is not sufficient
      because the PCI configuration space has changed. When this could cause
      incompatibility with the new EMP image, the firmware is capable of
      rejecting the EMP reset request.
      
      Add logic to ice_fw_update.c to handle the response data of the flash
      update AdminQ commands.
      
      For the reset level, issue a devlink status notification informing the
      user of how to complete the update with a simple suggestion like
      "Activate new firmware by rebooting the system".
      
      Cache the status of whether or not firmware will restrict the EMP reset
      for use in implementing devlink reload.
      
      Implement support for devlink reload with the "fw_activate" flag. This
      allows user space to request the firmware be activated immediately.
      
      For the .reload_down handler, we will issue a request for the EMP reset
      using the appropriate firmware AdminQ command. If we know that the
      firmware will not allow an EMP reset, simply exit with a suitable
      netlink extended ACK message indicating that the EMP reset is not
      available.
      
      For the .reload_up handler, simply wait until the driver has finished
      resetting. Logic to handle processing of an EMP reset already exists in
      the driver as part of its reset and rebuild flows.
      
      Note that indicating the required reset and the EMP reset restriction
      is not supported on old versions of firmware. The driver can
      determine whether the two features are supported by checking the
      device capabilities report. I confirmed support has existed since at
      least version 5.5.2 as reported by the 'fw.mgmt' version. Support to
      issue the EMP reset request has existed in all versions of the EMP
      firmware for the ice hardware.
      
      Check the device capabilities report to determine whether or not the
      indications are reported by the running firmware. If the reset
      requirement indication is not supported, always assume a full power on
      is necessary. If the reset restriction capability is not supported,
      always assume the EMP reset is available.
      
      Users can verify if the EMP reset has activated the firmware by using
      the devlink info report to check that the 'running' firmware version has
      updated. For example a user might do the following:
      
       # Check current version
       $ devlink dev info
      
       # Update the device
       $ devlink dev flash pci/0000:af:00.0 file firmware.bin
      
       # Confirm stored version updated
       $ devlink dev info
      
       # Reload to activate new firmware
       $ devlink dev reload pci/0000:af:00.0 action fw_activate
      
       # Confirm running version updated
       $ devlink dev info
      
      Finally, this change does *not* implement basic driver-only reload
      support. I did look into trying to do this. However, it requires
      significant refactor of how the ice driver probes and loads everything.
      The ice driver probe and allocation flows were not designed with such
      a reload in mind. Refactoring the flow to support this is beyond the
      scope of this change.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
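
      The reload flow maps onto the devlink reload_down/reload_up callbacks
      roughly as below. The devlink_ops signatures are the kernel's; the
      demo_* struct and helpers are hypothetical stand-ins for the ice
      internals:

      #include <net/devlink.h>
      #include <linux/errno.h>
      #include <linux/bits.h>

      struct demo_pf {
          bool fw_emp_reset_disabled;  /* cached from the flash-update response */
      };

      /* Stubs standing in for the AdminQ request that triggers an EMP reset
       * and for waiting until the driver's rebuild completes. */
      static int demo_request_emp_reset(struct demo_pf *pf) { return 0; }
      static int demo_wait_for_reset_done(struct demo_pf *pf) { return 0; }

      static int demo_reload_down(struct devlink *devlink, bool netns_change,
                                  enum devlink_reload_action action,
                                  enum devlink_reload_limit limit,
                                  struct netlink_ext_ack *extack)
      {
          struct demo_pf *pf = devlink_priv(devlink);

          if (pf->fw_emp_reset_disabled) {
              NL_SET_ERR_MSG_MOD(extack,
                                 "EMP reset rejected by firmware; a reboot is required to activate the new image");
              return -ECANCELED;
          }
          return demo_request_emp_reset(pf);
      }

      static int demo_reload_up(struct devlink *devlink,
                                enum devlink_reload_action action,
                                enum devlink_reload_limit limit,
                                u32 *actions_performed,
                                struct netlink_ext_ack *extack)
      {
          *actions_performed = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE);
          return demo_wait_for_reset_done(devlink_priv(devlink));
      }

      static const struct devlink_ops demo_devlink_ops = {
          .reload_actions = BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
          .reload_down    = demo_reload_down,
          .reload_up      = demo_reload_up,
      };
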
    • ice: devlink: add shadow-ram region to snapshot Shadow RAM · 78ad87da
      Authored by Jacob Keller
      We have a region for reading the contents of the NVM flash as
      a snapshot. This region does not allow reading the Shadow RAM, as it
      always passes the FLASH_ONLY bit to the low level firmware interface.
      
      Add a separate shadow-ram region which will allow a snapshot of the
      current contents of the Shadow RAM. This data is built from the NVM
      contents, but is distinct because the device builds up the Shadow RAM
      during initialization, so being able to snapshot its contents can be
      useful when attempting to debug flash-related issues.
      
      Fix the comment description of the nvm-flash region which incorrectly
      stated that it filled the shadow-ram region, and add a comment
      explaining that the nvm-flash region does not actually read the Shadow
      RAM.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
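
      A devlink region with a snapshot hook has roughly this shape. The
      devlink_region_ops structure is the kernel's; the buffer size and the
      elided read helper are hypothetical:

      #include <net/devlink.h>
      #include <linux/vmalloc.h>
      #include <linux/errno.h>

      #define DEMO_SR_SIZE 4096  /* hypothetical Shadow RAM size in bytes */

      /* Snapshot callback: read the current Shadow RAM contents (i.e. do
       * not pass a FLASH_ONLY-style flag) into a buffer devlink will own. */
      static int demo_sram_snapshot(struct devlink *devlink,
                                    const struct devlink_region_ops *ops,
                                    struct netlink_ext_ack *extack, u8 **data)
      {
          u8 *buf = vzalloc(DEMO_SR_SIZE);

          if (!buf)
              return -ENOMEM;
          /* ... fill buf via the firmware's NVM read interface ... */
          *data = buf;
          return 0;
      }

      static const struct devlink_region_ops demo_sram_region_ops = {
          .name       = "shadow-ram",
          .destructor = vfree,
          .snapshot   = demo_sram_snapshot,
      };
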
  8. 15 Dec 2021, 1 commit
  9. 23 Nov 2021, 1 commit
  10. 03 Nov 2021, 1 commit
    • ice: Fix VF true promiscuous mode · 1a8c7778
      Authored by Brett Creeley
      When a VF requests promiscuous mode, is trusted, and true promiscuous
      mode is enabled, the PF driver attempts to enable unicast and/or
      multicast promiscuous mode filters based on the request. This is
      fine, but there are a couple of issues with the current code.
      
      [1] The define to configure the unicast promiscuous mode mask also
          includes bits to configure the multicast promiscuous mode mask, which
          causes multicast to be set/cleared unintentionally.
      [2] All 4 cases for enable/disable unicast/multicast mode are not
          handled in the promiscuous mode message handler, which causes
          unexpected results regarding the current promiscuous mode settings.
      
      To fix [1] make sure any promiscuous mask defines include the correct
      bits for each of the promiscuous modes.
      
      To fix [2] make sure that all 4 cases are handled since there are 2 bits
      (FLAG_VF_UNICAST_PROMISC and FLAG_VF_MULTICAST_PROMISC) that can be
      either set or cleared. Also, since either unicast and/or multicast
      promiscuous configuration can fail, introduce two separate error values
      to handle each of these cases.
      
      Fixes: 01b5e89a ("ice: Add VF promiscuous support")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tony.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
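
      The four cases reduce to handling each flag bit independently with
      its own mask; a distilled sketch with illustrative names:

      #include <linux/bits.h>
      #include <linux/types.h>

      #define DEMO_VF_UNICAST_PROMISC   BIT(0)
      #define DEMO_VF_MULTICAST_PROMISC BIT(1)

      /* Apply a VF promiscuous request to the current state. Each mode is
       * masked independently, so enabling or disabling unicast promiscuous
       * can no longer set/clear the multicast bits as a side effect. */
      static u32 demo_apply_promisc_request(u32 cur, u32 req)
      {
          if (req & DEMO_VF_UNICAST_PROMISC)
              cur |= DEMO_VF_UNICAST_PROMISC;
          else
              cur &= ~DEMO_VF_UNICAST_PROMISC;

          if (req & DEMO_VF_MULTICAST_PROMISC)
              cur |= DEMO_VF_MULTICAST_PROMISC;
          else
              cur &= ~DEMO_VF_MULTICAST_PROMISC;

          return cur;
      }
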
  11. 29 Oct 2021, 2 commits
  12. 21 Oct 2021, 3 commits
  13. 15 Oct 2021, 3 commits
    • ice: make use of ice_for_each_* macros · 2faf63b6
      Authored by Maciej Fijalkowski
      Go through the code base and use the ice_for_each_* macros. While at
      it, introduce an ice_for_each_xdp_txq() macro that can be used for
      looping over the xdp_rings array (see the sketch after this entry).

      This commit does not introduce any new functionality.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
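
      The macros are thin loop wrappers along these lines (illustrative
      reconstruction; see ice.h for the authoritative definitions):

      /* Iterate over a VSI's Tx queues. */
      #define ice_for_each_txq(vsi, i) \
          for ((i) = 0; (i) < (vsi)->num_txq; (i)++)

      /* New in this commit: iterate over the xdp_rings array. */
      #define ice_for_each_xdp_txq(vsi, i) \
          for ((i) = 0; (i) < (vsi)->num_xdp_txq; (i)++)
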
    • ice: introduce XDP_TX fallback path · 22bf877e
      Authored by Maciej Fijalkowski
      Under rare circumstances there might be a situation where the
      requirement of having an XDP Tx queue per CPU cannot be fulfilled and
      some of the Tx resources have to be shared between CPUs. This yields
      a need to place accesses to xdp_ring inside a critical section
      protected by a spinlock. These accesses happen to be in the hot path,
      so let's introduce a static branch that will be triggered from the
      control plane when the driver could not provide a Tx queue dedicated
      to XDP on each CPU (see the sketch after this entry).
      
      Currently, the chosen design is to allow any number of XDP Tx queues
      that is at least half the number of CPUs the platform has. For a
      lower number, the driver bails out with a response to the user that
      there were not enough Tx resources to allow configuring XDP. The
      sharing of rings is signalled via static branch enablement, which in
      turn indicates that the lock for xdp_ring accesses needs to be taken
      in the hot path.
      
      The approach based on a static branch has no impact on the
      performance of the non-fallback path. One thing worth mentioning is
      that the static branch acts as a global driver switch: if one PF runs
      out of Tx resources, the other PFs serviced by the ice driver will
      suffer as well. However, given that the hardware handled by the ice
      driver has 1024 Tx queues per PF, this is currently an unlikely
      scenario.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
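
      A condensed sketch of the fallback mechanism, assuming hypothetical
      demo_* names (the kernel primitives DEFINE_STATIC_KEY_FALSE and
      static_branch_unlikely are real):

      #include <linux/jump_label.h>
      #include <linux/spinlock.h>

      /* Flipped from the control plane (e.g. via static_branch_inc()) when
       * XDP Tx rings must be shared between CPUs. */
      DEFINE_STATIC_KEY_FALSE(demo_xdp_locking_key);

      struct demo_xdp_ring {
          spinlock_t tx_lock;
          /* ... descriptor ring state ... */
      };

      /* Hot path: the lock is taken only when the static branch is
       * enabled, so the dedicated-queue path pays no locking cost at all. */
      static void demo_xmit_xdp(struct demo_xdp_ring *ring)
      {
          if (static_branch_unlikely(&demo_xdp_locking_key))
              spin_lock(&ring->tx_lock);

          /* ... post Tx descriptors ... */

          if (static_branch_unlikely(&demo_xdp_locking_key))
              spin_unlock(&ring->tx_lock);
      }
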
    • ice: split ice_ring onto Tx/Rx separate structs · e72bba21
      Authored by Maciej Fijalkowski
      While it was convenient to have a generic ring structure that served
      both Tx and Rx sides, next commits are going to introduce several
      Tx-specific fields, so in order to avoid hurting the Rx side, let's
      pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs.
      
      The Rx ring could have been handled by the old ice_ring, which would
      reduce the code churn within this patch, but that would make things
      asymmetric.

      Make a union out of the ring container within ice_q_vector so that it
      is possible to iterate over the newly introduced ice_tx_ring.
      
      Remove the @size as it's only accessed from control path and it can be
      calculated pretty easily.
      
      Change definitions of ice_update_ring_stats and
      ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
      used for both Rx and Tx rings.
      
      Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In
      Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
      difference now.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
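
      The shape of the split, showing only an illustrative subset of fields
      (the real ice_tx_ring/ice_rx_ring carry far more state):

      #include <linux/types.h>
      #include <net/xdp.h>

      /* Tx-specific fields added by later patches live here only, so they
       * no longer inflate the Rx ring. */
      struct demo_tx_ring {
          void *desc;           /* descriptor ring memory */
          u16 next_to_use;
          u16 next_to_clean;
      };

      struct demo_rx_ring {
          void *desc;
          u16 next_to_use;
          u16 next_to_clean;
          struct xdp_rxq_info xdp_rxq;  /* sits in its own cacheline in ice */
      };
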
  14. 14 Oct 2021, 1 commit
  15. 12 Oct 2021, 1 commit
  16. 08 Oct 2021, 6 commits
    • ice: add port representor ethtool ops and stats · 7aae80ce
      Authored by Wojciech Drewek
      Introduce the following ethtool operations for VF's representor:
      	-get_drvinfo
      	-get_strings
      	-get_ethtool_stats
      	-get_sset_count
      	-get_link
      
      In all cases, existing operations were used with minor changes that
      allow us to detect if an ethtool op was called for a representor.
      Only VF VSI stats will be available for a representor.
      
      Implement ndo_get_stats64 for port representor. This will update
      VF VSI stats and read them.
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: introduce new type of VSI for switchdev · f66756e0
      Authored by Grzegorz Nitka
      A new type of VSI has to be defined for the switchdev control plane
      VSI. The number of allocated Tx and Rx queues has to be equal to the
      number of VFs, because each port representor should have one Tx and
      one Rx queue.
      
      Also, to not increase the number of used IRQs too much, the control
      plane VSI uses only one q_vector and handles all queues in one IRQ.
      To allow handling all queues in one IRQ, a new function to clean the
      MSI-X vector for the eswitch was introduced. This function will
      schedule napi for each representor instead of scheduling it only for
      one, as the normal clean-IRQ function does (see the sketch after this
      entry).
      
      Only one additional MSI-X vector has to be requested; always try to
      request it in the ice_ena_msix_range function.
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
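
      The one-vector clean function might look roughly like this; the
      demo_* types and the representor list are hypothetical
      simplifications:

      #include <linux/interrupt.h>
      #include <linux/netdevice.h>

      struct demo_repr {
          struct napi_struct napi;
          struct demo_repr *next;
      };

      struct demo_ctrl_vsi {
          struct demo_repr *reprs;  /* all port representors */
      };

      /* Single MSI-X handler for the control plane VSI: schedule napi for
       * every representor instead of just one queue's vector. */
      static irqreturn_t demo_eswitch_msix_clean(int irq, void *data)
      {
          struct demo_ctrl_vsi *vsi = data;
          struct demo_repr *repr;

          for (repr = vsi->reprs; repr; repr = repr->next)
              napi_schedule(&repr->napi);

          return IRQ_HANDLED;
      }
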
    • ice: set and release switchdev environment · 1a1c40df
      Authored by Grzegorz Nitka
      The switchdev environment has to be set up when the user creates VFs
      and the eswitch mode is switchdev. Release is done when the user
      deletes all VFs.
      
      The data path in this implementation is based on the control plane
      VSI. This VSI is used to pass traffic from port representors to the
      corresponding VFs and vice versa. A default Tx rule has to be added
      to forward packets to the control plane VSI; this redirects packets
      from VFs which don't match other rules to the control plane VSI.
      
      On the Rx side, a default rule is added on the uplink VSI to receive
      all traffic that doesn't match other rules. When setting up the
      switchdev environment, all other rules from VFs should be removed;
      packets to VFs will be forwarded by the control plane VSI.
      
      As a VF without any MAC rules can't send any packets because of the
      antispoof mechanism, VSI antispoofing should be turned off on each VF.
      
      To send a packet from a representor to the correct VSI, the
      destination VSI field in the Tx descriptor has to be filled. Allow
      that by setting the destination override bit in the control plane VSI
      security config.
      
      Packets from VFs will be received on the control plane VSI. The
      driver should decide which netdev to forward each packet to. The
      decision is made based on the src_vsi field from the descriptor:
      there is a target netdev list in the control plane VSI struct which
      chooses the netdev based on the src_vsi number.
      Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: introduce VF port representor · 37165e3f
      Authored by Michal Swiatkowski
      A port representor is used to manage a VF from the host side. To
      allow this, each created representor registers a netdevice with a
      random HW address. A devlink port is also created for all
      representors.

      The port representor name is created based on the switch id, or
      managed by the devlink core if the devlink port was registered
      successfully.
      
      Open and stop ndo ops are implemented to allow managing the VF link
      state. The link state is tracked in the VF struct.
      
      Struct ice_netdev_priv is extended with a pointer-to-representor
      field. This is needed to get the correct representor from the netdev
      struct, mostly in ndo calls.
      
      Implement helper functions to check whether a given netdev is a port
      representor's netdev (ice_is_port_repr_netdev) and to get the
      representor from a netdev (ice_netdev_to_repr); see the sketch after
      this entry.
      
      As the driver will mostly create or destroy port representors on all
      VFs rather than a single one, write functions to add and remove a
      representor for each VF.
      
      The representor struct contains a pointer to the source VSI (the VSI
      configured on the VF), a backpointer to the VF, a backpointer to the
      netdev, a q_vector pointer, and a metadata_dst which will be used in
      the data path.
      Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
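
      The two helpers are essentially a NULL check on the representor
      pointer stored in the netdev private data; a sketch with hypothetical
      demo_* names mirroring ice_is_port_repr_netdev()/ice_netdev_to_repr():

      #include <linux/netdevice.h>

      struct demo_repr;

      /* Netdev private data; repr is NULL for ordinary VSI netdevs. */
      struct demo_netdev_priv {
          struct demo_repr *repr;
      };

      static bool demo_is_port_repr_netdev(struct net_device *netdev)
      {
          struct demo_netdev_priv *np = netdev_priv(netdev);

          return np && np->repr;
      }

      static struct demo_repr *demo_netdev_to_repr(struct net_device *netdev)
      {
          struct demo_netdev_priv *np = netdev_priv(netdev);

          return np->repr;
      }
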
    • ice: Move devlink port to PF/VF struct · 2ae0aa47
      Authored by Wojciech Drewek
      Keeping the devlink port inside the VSI data structure causes some
      issues. Since the VF VSI is released during reset, we have to
      unregister the devlink port and register it again every time a reset
      is triggered. With the new changes in the devlink API this might
      cause deadlocks: after calling
      devlink_port_register/devlink_port_unregister, the devlink API is
      going to lock rtnl_mutex. That is an issue when a VF reset is
      triggered in a netlink operation context (like setting the VF MAC
      address or VLAN), because rtnl_lock is already taken by netlink, and
      another rtnl_lock call from the devlink API results in a deadlock.
      
      By moving the devlink port to the PF/VF structs we avoid creating and
      destroying it during reset. With this patch, devlink ports are
      created during ice_probe and destroyed during ice_remove for the PF,
      and created during ice_repr_add and destroyed during ice_repr_rem for
      a VF.
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: support basic E-Switch mode control · 3ea9bd5d
      Authored by Michal Swiatkowski
      Write the set and get eswitch mode functions used by the devlink ops.
      Use a new pf struct member, eswitch_mode, to track the current
      eswitch mode in the driver.

      Changing the eswitch mode is only allowed when there are no VFs
      created (see the sketch after this entry).

      Create a new file for eswitch-related code.

      Add the config flag ICE_SWITCHDEV to allow the user to choose whether
      switchdev support should be enabled or disabled.
      
      Use case examples:
      - show current eswitch mode ('legacy' is the default one)
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode legacy
      
      - move to 'switchdev' mode
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode switchdev
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode switchdev
      
      - create 2 VFs
      [root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs
      
      - unsuccessful attempt to change eswitch mode while VFs are created
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
      devlink answers: Operation not supported
      
      - destroy VFs
      [root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs
      
      - restore 'legacy' mode
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode legacy
      Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
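
      The mode handlers plug into devlink_ops roughly as below; the
      eswitch_mode_get/eswitch_mode_set signatures are devlink's, while
      demo_pf and the VF check are simplified stand-ins:

      #include <net/devlink.h>
      #include <linux/errno.h>

      struct demo_pf {
          u16 eswitch_mode;            /* DEVLINK_ESWITCH_MODE_* */
          unsigned int num_alloc_vfs;
      };

      static int demo_eswitch_mode_get(struct devlink *devlink, u16 *mode)
      {
          struct demo_pf *pf = devlink_priv(devlink);

          *mode = pf->eswitch_mode;
          return 0;
      }

      static int demo_eswitch_mode_set(struct devlink *devlink, u16 mode,
                                       struct netlink_ext_ack *extack)
      {
          struct demo_pf *pf = devlink_priv(devlink);

          /* The mode may only change while no VFs exist; this is what
           * produces the "Operation not supported" answer shown above. */
          if (pf->num_alloc_vfs) {
              NL_SET_ERR_MSG_MOD(extack,
                                 "Changing eswitch mode is not allowed while VFs are created");
              return -EOPNOTSUPP;
          }

          pf->eswitch_mode = mode;
          return 0;
      }
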
  17. 29 Sep 2021, 1 commit
  18. 10 Sep 2021, 1 commit
    • ice: Correctly deal with PFs that do not support RDMA · bfe84435
      Authored by Dave Ertman
      There are two cases where the current PF does not support RDMA
      functionality.  The first is if the NVM loaded on the device is set
      to not support RDMA (common_caps.rdma is false).  The second is if
      the kernel bonding driver has included the current PF in an active
      link aggregate.
      
      When the driver has determined that this PF does not support RDMA, then
      auxiliary devices should not be created on the auxiliary bus.  Without
      a device on the auxiliary bus, even if the irdma driver is present, there
      will be no RDMA activity attempted on this PF.
      
      Currently, in the reset flow, an attempt to create auxiliary devices is
      performed without regard to the ability of the PF.  There needs to be a
      check in ice_aux_plug_dev (as the central point that creates auxiliary
      devices) to see if the PF is in a state to support the functionality.
      
      When disabling and re-enabling RDMA due to the inclusion/removal of the PF
      in a link aggregate, we also need to set/clear the bit which controls
      auxiliary device creation so that a reset recovery in a link aggregate
      situation doesn't try to create auxiliary devices when it shouldn't.
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Reported-by: Yongxin Liu <yongxin.liu@windriver.com>
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
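
      The gating described above boils down to testing one PF flag bit at
      the central creation point; a sketch with hypothetical names (the
      real check sits in ice_aux_plug_dev):

      #include <linux/bitops.h>

      #define DEMO_FLAG_AUX_ENA 0  /* bit: this PF may create aux devices */

      struct demo_pf {
          unsigned long flags;
      };

      /* Central creation point: a reset recovery on a PF that is in a link
       * aggregate (or whose NVM lacks RDMA) plugs nothing. */
      static int demo_aux_plug_dev(struct demo_pf *pf)
      {
          if (!test_bit(DEMO_FLAG_AUX_ENA, &pf->flags))
              return 0;

          /* ... allocate and add the RDMA auxiliary device ... */
          return 0;
      }
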
  19. 10 Aug 2021, 1 commit
  20. 11 Jun 2021, 2 commits
    • ice: register 1588 PTP clock device object for E810 devices · 06c16d89
      Authored by Jacob Keller
      Add a new ice_ptp.c file for holding the basic PTP clock interface
      functions. If the device supports PTP, call the new ice_ptp_init and
      ice_ptp_release functions where appropriate.
      
      If the function owns the hardware resource associated with the PTP
      hardware clock, register with the PTP_1588_CLOCK infrastructure to
      allocate a new clock object that represents the device hardware clock.
      
      Implement basic functionality for reading and setting the clock time,
      performing clock adjustments, and adjusting the clock frequency.
      
      Future changes will introduce functionality for handling related
      features including Tx and Rx timestamps.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
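
      Registration with the PTP_1588_CLOCK infrastructure follows the
      standard ptp_clock_info pattern; the handler bodies below are empty
      stand-ins for the device-specific register accesses:

      #include <linux/module.h>
      #include <linux/ptp_clock_kernel.h>

      static int demo_ptp_gettime(struct ptp_clock_info *info,
                                  struct timespec64 *ts)
      {
          ts->tv_sec = 0;   /* read the HW clock into *ts */
          ts->tv_nsec = 0;
          return 0;
      }

      static int demo_ptp_settime(struct ptp_clock_info *info,
                                  const struct timespec64 *ts)
      {
          return 0;         /* write *ts to the HW clock */
      }

      static int demo_ptp_adjtime(struct ptp_clock_info *info, s64 delta)
      {
          return 0;         /* atomically shift the clock by delta ns */
      }

      static int demo_ptp_adjfine(struct ptp_clock_info *info, long scaled_ppm)
      {
          return 0;         /* tune the clock frequency */
      }

      static const struct ptp_clock_info demo_ptp_info = {
          .owner     = THIS_MODULE,
          .name      = "demo_ptp",
          .max_adj   = 100000000,
          .gettime64 = demo_ptp_gettime,
          .settime64 = demo_ptp_settime,
          .adjtime   = demo_ptp_adjtime,
          .adjfine   = demo_ptp_adjfine,
      };

      /* The clock-owning function then calls ptp_clock_register() with a
       * copy of this info and exposes the clock as /dev/ptpN. */
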
    • ice: add support for sideband messages · 8f5ee3c4
      Authored by Jacob Keller
      In order to support certain device features, including enabling the PTP
      hardware clock, the ice driver needs to control some registers on the
      device PHY.
      
      These registers are accessed by sending sideband messages. For some
      hardware, these messages must be sent over the device admin queue, while
      other hardware has a dedicated control queue for the sideband messages.
      
      Add the neighbor device message structure for sending a message to the
      neighboring device. Where supported, initialize the sideband control
      queue and handle cleanup.
      
      Add a wrapper function for sending sideband control queue messages that
      read or write a neighboring device register.
      
      Because some devices send sideband messages over the AdminQ, also
      increase the length of the admin queue to allow more messages to be
      queued up. This is important because the sideband messages add
      additional pressure on the AQ usage.
      
      This support will be used in following patches to enable support for
      CONFIG_PTP_1588_CLOCK.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  21. 07 Jun 2021, 2 commits
  22. 03 Jun 2021, 1 commit
    • ice: track AF_XDP ZC enabled queues in bitmap · e102db78
      Authored by Maciej Fijalkowski
      Commit c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      silently introduced a regression and broke the Tx side of AF_XDP in
      copy mode. xsk_pool on ice_ring is set based only on the existence of
      the XDP prog on the VSI, which in turn causes ice_clean_tx_irq_zc to
      be executed. That is not something that should happen for copy mode,
      which should use the regular data path ice_clean_tx_irq.

      This results in the following splat when xdpsock is run in txonly or
      l2fwd scenarios in copy mode:
      
      <snip>
      [  106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [  106.057269] #PF: supervisor read access in kernel mode
      [  106.062493] #PF: error_code(0x0000) - not-present page
      [  106.067709] PGD 0 P4D 0
      [  106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
      [  106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
      [  106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
      [  106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
      [  106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
      [  106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
      [  106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
      [  106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
      [  106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
      [  106.157117] FS:  0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
      [  106.165332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
      [  106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  106.192898] PKRU: 55555554
      [  106.195653] Call Trace:
      [  106.198143]  <IRQ>
      [  106.200196]  ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
      [  106.205087]  ice_napi_poll+0x3e/0x590 [ice]
      [  106.209356]  __napi_poll+0x2a/0x160
      [  106.212911]  net_rx_action+0xd6/0x200
      [  106.216634]  __do_softirq+0xbf/0x29b
      [  106.220274]  irq_exit_rcu+0x88/0xc0
      [  106.223819]  common_interrupt+0x7b/0xa0
      [  106.227719]  </IRQ>
      [  106.229857]  asm_common_interrupt+0x1e/0x40
      </snip>
      
      Fix this by introducing a bitmap of queues that are zero-copy
      enabled; each bit, corresponding to a queue id that an xsk pool is
      being configured on, is set/cleared within ice_xsk_pool_{en,dis}able
      and checked within ice_xsk_pool(). The latter is the function used
      for deciding which napi poll routine is executed (see the sketch
      after this entry). The idea is taken from our other drivers such as
      i40e and ixgbe.
      
      Fixes: c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
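
      The essence of the fix, with hypothetical demo_* names standing in
      for the VSI fields and for ice_xsk_pool_{en,dis}able/ice_xsk_pool():

      #include <linux/bitops.h>
      #include <linux/types.h>

      struct demo_vsi {
          unsigned long *af_xdp_zc_qps;  /* bitmap, one bit per queue pair */
      };

      /* Set on pool enable, cleared on pool disable. */
      static void demo_xsk_pool_set_state(struct demo_vsi *vsi, u16 qid,
                                          bool ena)
      {
          if (ena)
              set_bit(qid, vsi->af_xdp_zc_qps);
          else
              clear_bit(qid, vsi->af_xdp_zc_qps);
      }

      /* Checked when picking the napi clean routine: only a queue whose
       * bit is set really runs in zero-copy mode and may take the ZC path. */
      static bool demo_xsk_pool_enabled(struct demo_vsi *vsi, u16 qid)
      {
          return test_bit(qid, vsi->af_xdp_zc_qps);
      }
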
  23. 29 May 2021, 3 commits