1. 08 Oct 2021 (3 commits)
    • ice: introduce VF port representor · 37165e3f
      Authored by Michal Swiatkowski
      A port representor is used to manage a VF from the host side. To
      allow this, each created representor registers a netdevice with a
      random HW address. A devlink port is also created for each
      representor.
      
      The port representor name is based on the switch ID, or is managed
      by the devlink core if the devlink port was registered successfully.
      
      The open and stop ndo ops are implemented to allow managing the VF
      link state. The link state is tracked in the VF struct.
      
      Struct ice_netdev_priv is extended with a pointer to the
      representor. This is needed to get the correct representor from the
      netdev struct, mostly in ndo calls.
      
      Implement helper functions to check whether a given netdev is a
      port representor's netdev (ice_is_port_repr_netdev) and to get the
      representor from a netdev (ice_netdev_to_repr).
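      
      A minimal sketch of the two helpers, assuming the representor
      netdev uses its own netdev_ops table and that ice_netdev_priv
      carries the new repr pointer (illustrative, not the exact driver
      internals):
      
          static bool ice_is_port_repr_netdev(struct net_device *netdev)
          {
                  /* A representor netdev is recognized by its ops table. */
                  return netdev && netdev->netdev_ops == &ice_repr_netdev_ops;
          }
      
          static struct ice_repr *ice_netdev_to_repr(struct net_device *netdev)
          {
                  struct ice_netdev_priv *np = netdev_priv(netdev);
      
                  return np->repr;
          }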
      
      As the driver will mostly create or destroy port representors on
      all VFs rather than on a single one, add functions that add and
      remove a representor for each VF.
      
      The representor struct contains a pointer to the source VSI (the
      VSI configured on the VF), a backpointer to the VF, a backpointer
      to the netdev, a q_vector pointer, and a metadata_dst, which will
      be used in the data path.
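      
      Roughly, per the description above (field names illustrative):
      
          struct ice_repr {
                  struct ice_vsi *src_vsi;       /* VSI configured on the VF */
                  struct ice_vf *vf;             /* backpointer to the VF */
                  struct ice_q_vector *q_vector;
                  struct net_device *netdev;     /* backpointer to the netdev */
                  struct metadata_dst *dst;      /* used in the data path */
          };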
      Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: Move devlink port to PF/VF struct · 2ae0aa47
      Authored by Wojciech Drewek
      Keeping the devlink port inside the VSI data structure causes some
      issues. Since the VF VSI is released during reset, we have to
      unregister the devlink port and register it again every time a
      reset is triggered. With the new changes in the devlink API this
      can cause deadlocks: after calling
      devlink_port_register/devlink_port_unregister, the devlink API
      locks rtnl_mutex. That is an issue when a VF reset is triggered in
      netlink operation context (such as setting a VF MAC address or
      VLAN), because rtnl_lock is already held by netlink; another
      rtnl_lock call from the devlink API results in a deadlock.
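      
      The problematic call chain, sketched as comments (the ndo shown is
      just one example of a netlink operation that resets the VF):
      
          /* rtnl_lock is taken by the netlink core before the ndo runs */
          rtnl_lock();
            ice_set_vf_mac()
              ice_reset_vf()                /* reset releases the VF VSI */
                devlink_port_unregister()   /* new devlink API locks rtnl_mutex */
                  rtnl_lock();              /* already held -> deadlock */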
      
      By moving the devlink port to the PF/VF struct, we avoid creating
      and destroying it during reset. With this patch, devlink ports are
      created in ice_probe and destroyed in ice_remove for the PF, and
      created in ice_repr_add and destroyed in ice_repr_rem for VFs.
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: support basic E-Switch mode control · 3ea9bd5d
      Authored by Michal Swiatkowski
      Add set and get eswitch mode functions used by the devlink ops.
      Use the new pf struct member eswitch_mode to track the current
      eswitch mode in the driver.
      
      Changing the eswitch mode is only allowed when no VFs are created.
      
      Create a new file for eswitch-related code.
      
      Add the config flag ICE_SWITCHDEV to let the user choose whether
      switchdev support should be enabled or disabled.
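      
      A hedged sketch of the two devlink ops (the VF-count check and the
      exact pf fields are illustrative; error handling may differ):
      
          static int ice_eswitch_mode_get(struct devlink *devlink, u16 *mode)
          {
                  struct ice_pf *pf = devlink_priv(devlink);
      
                  *mode = pf->eswitch_mode;
                  return 0;
          }
      
          static int ice_eswitch_mode_set(struct devlink *devlink, u16 mode,
                                          struct netlink_ext_ack *extack)
          {
                  struct ice_pf *pf = devlink_priv(devlink);
      
                  if (pf->num_alloc_vfs) {
                          NL_SET_ERR_MSG_MOD(extack, "Remove VFs before changing eswitch mode");
                          return -EOPNOTSUPP;
                  }
      
                  switch (mode) {
                  case DEVLINK_ESWITCH_MODE_LEGACY:
                  case DEVLINK_ESWITCH_MODE_SWITCHDEV:
                          pf->eswitch_mode = mode;
                          return 0;
                  default:
                          return -EINVAL;
                  }
          }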
      
      Use case examples:
      - show current eswitch mode ('legacy' is the default one)
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode legacy
      
      - move to 'switchdev' mode
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode switchdev
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode switchdev
      
      - create 2 VFs
      [root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs
      
      - unsuccessful attempt to change eswitch mode while VFs are created
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
      devlink answers: Operation not supported
      
      - destroy VFs
      [root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs
      
      - restore 'legacy' mode
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode legacy
      Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  2. 29 Sep 2021 (1 commit)
  3. 10 Sep 2021 (1 commit)
    • ice: Correctly deal with PFs that do not support RDMA · bfe84435
      Authored by Dave Ertman
      There are two cases where the current PF does not support RDMA
      functionality.  The first is if the NVM loaded on the device is set
      to not support RDMA (common_caps.rdma is false).  The second is if
      the kernel bonding driver has included the current PF in an active
      link aggregate.
      
      When the driver has determined that this PF does not support RDMA, then
      auxiliary devices should not be created on the auxiliary bus.  Without
      a device on the auxiliary bus, even if the irdma driver is present, there
      will be no RDMA activity attempted on this PF.
      
      Currently, in the reset flow, an attempt to create auxiliary devices is
      performed without regard to the ability of the PF.  There needs to be a
      check in ice_aux_plug_dev (as the central point that creates auxiliary
      devices) to see if the PF is in a state to support the functionality.
      
      When disabling and re-enabling RDMA due to the inclusion/removal of the PF
      in a link aggregate, we also need to set/clear the bit which controls
      auxiliary device creation so that a reset recovery in a link aggregate
      situation doesn't try to create auxiliary devices when it shouldn't.
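      
      A sketch of the gate, following the commit's naming for the entry
      point; the flag name and the creation helper are illustrative:
      
          static int ice_aux_plug_dev(struct ice_pf *pf)
          {
                  /* Skip auxiliary device creation when this PF cannot
                   * support RDMA (the NVM disabled it, or the PF is part
                   * of an active link aggregate).
                   */
                  if (!test_bit(ICE_FLAG_AUX_ENA, pf->flags))
                          return 0;
      
                  return ice_create_aux_dev(pf);  /* hypothetical helper */
          }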
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Reported-by: Yongxin Liu <yongxin.liu@windriver.com>
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 10 Aug 2021 (1 commit)
  5. 11 Jun 2021 (2 commits)
    • ice: register 1588 PTP clock device object for E810 devices · 06c16d89
      Authored by Jacob Keller
      Add a new ice_ptp.c file for holding the basic PTP clock interface
      functions. If the device supports PTP, call the new ice_ptp_init and
      ice_ptp_release functions where appropriate.
      
      If the function owns the hardware resource associated with the PTP
      hardware clock, register with the PTP_1588_CLOCK infrastructure to
      allocate a new clock object that represents the device hardware clock.
      
      Implement basic functionality for reading and setting the clock time,
      performing clock adjustments, and adjusting the clock frequency.
      
      Future changes will introduce functionality for handling related
      features including Tx and Rx timestamps.
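      
      A condensed sketch of the registration path, assuming the usual
      PTP_1588_CLOCK pattern (the callback and field names here are
      illustrative):
      
          static struct ptp_clock_info ice_ptp_info = {
                  .owner      = THIS_MODULE,
                  .name       = "ice ptp",
                  .max_adj    = 100000000,          /* illustrative limit */
                  .adjfine    = ice_ptp_adjfine,    /* frequency adjustment */
                  .adjtime    = ice_ptp_adjtime,    /* clock adjustment */
                  .gettimex64 = ice_ptp_gettimex64, /* read clock time */
                  .settime64  = ice_ptp_settime64,  /* set clock time */
          };
      
          /* In ice_ptp_init(), when this function owns the clock: */
          pf->ptp.clock = ptp_clock_register(&ice_ptp_info, ice_pf_to_dev(pf));
          if (IS_ERR(pf->ptp.clock))
                  return PTR_ERR(pf->ptp.clock);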
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: add support for sideband messages · 8f5ee3c4
      Authored by Jacob Keller
      In order to support certain device features, including enabling the PTP
      hardware clock, the ice driver needs to control some registers on the
      device PHY.
      
      These registers are accessed by sending sideband messages. For some
      hardware, these messages must be sent over the device admin queue, while
      other hardware has a dedicated control queue for the sideband messages.
      
      Add the neighbor device message structure for sending a message to the
      neighboring device. Where supported, initialize the sideband control
      queue and handle cleanup.
      
      Add a wrapper function for sending sideband control queue messages that
      read or write a neighboring device register.
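      
      A hedged sketch of that wrapper (the message layout and names are
      illustrative, not the actual firmware interface):
      
          struct ice_sb_msg {
                  u8  dest_dev;       /* neighboring device, e.g. the PHY */
                  u8  opcode;         /* register read or write */
                  u16 msg_addr_low;   /* register address, low 16 bits */
                  u16 msg_addr_high;  /* register address, high 16 bits */
                  u32 data;           /* payload to write, or read result */
          };
      
          static int ice_sb_rw_reg(struct ice_hw *hw, struct ice_sb_msg *msg)
          {
                  /* Post on the sideband control queue, or on the AdminQ
                   * for hardware without a dedicated queue, then wait for
                   * the completion carrying the register value.
                   */
                  return ice_sb_send_msg(hw, msg);  /* hypothetical helper */
          }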
      
      Because some devices send sideband messages over the AdminQ, also
      increase the length of the admin queue to allow more messages to be
      queued up. This is important because the sideband messages add
      pressure on AQ usage.
      
      This support will be used in following patches to enable
      CONFIG_PTP_1588_CLOCK support.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  6. 07 Jun 2021 (2 commits)
  7. 03 Jun 2021 (1 commit)
    • ice: track AF_XDP ZC enabled queues in bitmap · e102db78
      Authored by Maciej Fijalkowski
      Commit c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      silently introduced a regression and broke the Tx side of AF_XDP in
      copy mode. xsk_pool on ice_ring is set based only on the existence
      of the XDP prog on the VSI, which in turn picks ice_clean_tx_irq_zc
      to be executed. That should not happen for copy mode, which should
      use the regular data path ice_clean_tx_irq.
      
      This results in the following splat when xdpsock is run in txonly
      or l2fwd scenarios in copy mode:
      
      <snip>
      [  106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [  106.057269] #PF: supervisor read access in kernel mode
      [  106.062493] #PF: error_code(0x0000) - not-present page
      [  106.067709] PGD 0 P4D 0
      [  106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
      [  106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
      [  106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
      [  106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
      [  106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
      [  106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
      [  106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
      [  106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
      [  106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
      [  106.157117] FS:  0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
      [  106.165332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
      [  106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  106.192898] PKRU: 55555554
      [  106.195653] Call Trace:
      [  106.198143]  <IRQ>
      [  106.200196]  ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
      [  106.205087]  ice_napi_poll+0x3e/0x590 [ice]
      [  106.209356]  __napi_poll+0x2a/0x160
      [  106.212911]  net_rx_action+0xd6/0x200
      [  106.216634]  __do_softirq+0xbf/0x29b
      [  106.220274]  irq_exit_rcu+0x88/0xc0
      [  106.223819]  common_interrupt+0x7b/0xa0
      [  106.227719]  </IRQ>
      [  106.229857]  asm_common_interrupt+0x1e/0x40
      </snip>
      
      Fix this by introducing a bitmap of zero-copy enabled queues: each
      bit, corresponding to a queue id that an xsk pool is configured on,
      is set/cleared within ice_xsk_pool_{en,dis}able and checked within
      ice_xsk_pool(), the function that decides which NAPI poll routine
      is executed. The idea is taken from our other drivers such as i40e
      and ixgbe.
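      
      A sketch of the fixed lookup, assuming a per-VSI bitmap named
      af_xdp_zc_qps with one bit per queue id that has an attached pool:
      
          static struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring)
          {
                  struct ice_vsi *vsi = ring->vsi;
                  u16 qid = ring->q_index;
      
                  if (!ice_is_xdp_ena_vsi(vsi) ||
                      !test_bit(qid, vsi->af_xdp_zc_qps))
                          return NULL;  /* copy mode: use the regular path */
      
                  return xsk_get_pool_from_qid(vsi->netdev, qid);
          }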
      
      Fixes: c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  8. 29 May 2021 (3 commits)
  9. 23 Apr 2021 (1 commit)
  10. 15 Apr 2021 (4 commits)
  11. 08 Apr 2021 (1 commit)
  12. 01 Apr 2021 (3 commits)
  13. 30 Mar 2021 (3 commits)
  14. 23 Mar 2021 (2 commits)
  15. 23 Feb 2021 (1 commit)
  16. 09 Feb 2021 (3 commits)
  17. 06 Feb 2021 (1 commit)
  18. 27 Jan 2021 (1 commit)
    • ice: Fix MSI-X vector fallback logic · f3fe97f6
      Authored by Brett Creeley
      The current MSI-X enablement logic tries to enable best-case MSI-X
      vectors and, if that fails, falls back to a bare-minimum set: a
      single MSI-X vector for 1 Tx and 1 Rx queue and a single MSI-X
      vector for the OICR interrupt. Unfortunately, the driver fails to
      load when it doesn't get as many MSI-X vectors as requested, for a
      couple of reasons.
      
      First, the code to allocate MSI-X in the driver tries to allocate
      num_online_cpus() MSI-X vectors for LAN traffic without caring
      about the number of MSI-X vectors actually enabled/requested from
      the kernel for LAN traffic. So, when calling ice_get_res() for the
      PF VSI, it returns failure because the number of available vectors
      is less than requested. Fix this by not allowing the PF VSI to
      allocate more than pf->num_lan_msix MSI-X vectors and
      pf->num_lan_msix Rx/Tx queues. The number of queues is limited
      because we don't want more than 1 Tx/Rx queue per interrupt, due to
      performance concerns.
      
      Second, the driver assigns pf->num_lan_msix = 2, to account for LAN
      traffic and the OICR. However, pf->num_lan_msix is only meant for LAN
      MSI-X. This is causing a failure when the PF VSI tries to
      allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has
      already been reserved, so there may not be enough MSI-X vectors left.
      Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the
      ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for
      the failure case.
      
      Update the related defines used in ice_ena_msix_range() to align with
      the above behavior and remove the unused RDMA defines because RDMA is
      currently not supported. Also, remove the now incorrect comment.
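      
      The resulting floor, with illustrative define values:
      
          #define ICE_MIN_LAN_TXRX_MSIX 1  /* one vector for 1 Tx + 1 Rx queue */
          #define ICE_OICR_MSIX         1  /* one vector for the OICR */
          #define ICE_MIN_MSIX          (ICE_MIN_LAN_TXRX_MSIX + ICE_OICR_MSIX)
      
          /* In the fallback path of ice_ena_msix_range(): */
          pf->num_lan_msix = ICE_MIN_LAN_TXRX_MSIX;  /* LAN only; OICR counted separately */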
      
      Fixes: 152b978a ("ice: Rework ice_ena_msix_range")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  19. 10 Dec 2020 (1 commit)
  20. 10 Oct 2020 (1 commit)
    • ice: refactor devlink_port to be per-VSI · 48d40025
      Authored by Jacob Keller
      Currently, the devlink_port structure is stored within the ice_pf. This
      made sense because we create a single devlink_port for each PF. This
      setup does not mesh with the abstractions in the driver very well, and
      led to a flow where we accidentally call devlink_port_unregister twice
      during error cleanup.
      
      In particular, if devlink_port_register or devlink_port_unregister are
      called twice, this leads to a kernel panic. This appears to occur during
      some possible flows while cleaning up from a failure during driver
      probe.
      
      If register_netdev fails, we call devlink_port_unregister in
      ice_cfg_netdev as it cleans up. Later, we call
      devlink_port_unregister again, since we assume that we must clean
      up the port associated with the PF structure.
      
      This occurs because we clean up the devlink_port for the main PF
      even though it was not allocated. We allocated the port within a
      per-VSI function for managing the main netdev, but did not release
      the port when cleaning up that VSI; the allocation and destruction
      are not aligned.
      
      Instead of attempting to manage the devlink_port as part of the PF
      structure, manage it as part of the PF VSI. Doing this has advantages,
      as we can match the de-allocation of the devlink_port with the
      unregister_netdev associated with the main PF VSI.
      
      Moving the port to the VSI is preferable as it paves the way for
      handling devlink ports allocated for other purposes such as SR-IOV VFs.
      
      Since we're changing how we allocate the devlink_port, also change
      the indexing. Originally, we indexed the port using the PF id
      number. This came from an old goal of sharing a devlink instance
      for each physical function. Managing devlink instances across
      multiple function drivers is not workable. Instead, let's set the
      port number to the logical port number returned by firmware and set
      the index using the VSI index (sometimes referred to as the VSI
      handle).
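      
      A sketch of the new registration, using the real devlink helpers
      but hedged on the exact ice fields:
      
          struct devlink_port_attrs attrs = {};
      
          attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
          attrs.phys.port_number = pf->hw.port_info->lport;  /* logical port from FW */
          devlink_port_attrs_set(&vsi->devlink_port, &attrs);
      
          /* Index by the VSI index (VSI handle) instead of the PF id. */
          err = devlink_port_register(devlink, &vsi->devlink_port, vsi->idx);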
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  21. 26 Sep 2020 (1 commit)
    • intel-ethernet: clean up W=1 warnings in kdoc · b50f7bca
      Authored by Jesse Brandeburg
      This takes care of all of the trivial W=1 fixes in the Intel
      Ethernet drivers, which allows developers and maintainers to
      build more of the networking tree with more complete warning
      checks.
      
      There are three classes of kdoc warnings fixed:
       - cannot understand function prototype: 'x'
       - Excess function parameter 'x' description in 'y'
       - Function parameter or member 'x' not described in 'y'
      
      All of the changes were trivial comment updates on
      function headers.
      
      Inspired by Lee Jones' series of wireless work to do the same.
      Compile tested only, and passes simple test of
      $ git ls-files *.[ch] | egrep drivers/net/ethernet/intel | \
        xargs scripts/kernel-doc -none
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 01 Sep 2020 (1 commit)
  23. 01 Aug 2020 (1 commit)
    • ice: add useful statistics · a8fffd7a
      Authored by Jesse Brandeburg
      Display and count some useful hot-path statistics. The usefulness is as
      follows:
      
      - tx_restart: use to determine if the transmit ring size is too small or
        if the transmit interrupt rate is too low.
      - rx_gro_dropped: use to count drops from GRO layer, which previously were
        completely uncounted when occurring.
      - tx_busy: use to determine when the driver is miscounting number of
        descriptors needed for an skb.
      - tx_timeout: as in our other drivers, count the number of times we've reset
        due to timeout, because the kernel only prints a warning once per netdev.
      
      Several of these were already counted but not displayed.
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  24. 29 Jul 2020 (1 commit)
    • ice: implement device flash update via devlink · d69ea414
      Authored by Jacob Keller
      Use the newly added pldmfw library to implement device flash update for
      the Intel ice networking device driver. This support uses the devlink
      flash update interface.
      
      The main parts of the flash include the Option ROM, the netlist module,
      and the main NVM data. The PLDM firmware file contains modules for each
      of these components.
      
      Using the pldmfw library, the provided firmware file will be scanned for
      the three major components, "fw.undi" for the Option ROM, "fw.mgmt" for
      the main NVM module containing the primary device firmware, and
      "fw.netlist" containing the netlist module.
      
      The flash is separated into two banks, the active bank containing the
      running firmware, and the inactive bank which we use for update. Each
      module is updated in a staged process. First, the inactive bank is
      erased, preparing the device for update. Second, the contents of the
      component are copied to the inactive portion of the flash. After all
      components are updated, the driver signals the device to switch the
      active bank during the next EMP reset (which would usually occur during
      the next reboot).
      
      Although the firmware AdminQ interface does report an immediate status
      for each command, the NVM erase and NVM write commands receive status
      asynchronously. The driver must not continue writing until previous
      erase and write commands have finished. The real status of the NVM
      commands is returned over the receive AdminQ. Implement a simple
      interface that uses a wait queue so that the main update thread can
      sleep until the completion status is reported by firmware. For erasing
      the inactive banks, this can take quite a while in practice.
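      
      A minimal sketch of the wait pattern (names illustrative):
      
          struct ice_fw_completion {
                  wait_queue_head_t wq;
                  bool done;    /* set by the AdminQ receive handler */
                  int status;   /* firmware status for the command */
          };
      
          static int ice_wait_fw_event(struct ice_fw_completion *cmpl,
                                       unsigned long timeout)
          {
                  /* Sleep until the async completion arrives over the
                   * receive AdminQ; erase can take a long time.
                   */
                  if (!wait_event_timeout(cmpl->wq, cmpl->done, timeout))
                          return -ETIMEDOUT;
      
                  return cmpl->status;
          }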
      
      To help visualize the process to the devlink application and other
      applications based on the devlink netlink interface, status is reported
      via the devlink_flash_update_status_notify. While we do report status
      after each 4k block when writing, there is no real status we can report
      during erasing. We simply must wait for the complete module erasure to
      finish.
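      
      Progress while writing might be reported roughly like this (the
      notify call is the real devlink API; the component string and the
      offset/length variables are examples):
      
          devlink_flash_update_status_notify(devlink, "Flashing",
                                             "fw.mgmt", offset, length);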
      
      With this implementation, basic flash update for the ice hardware is
      supported.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>