1. 15 October 2021 (2 commits)
    • ice: introduce XDP_TX fallback path · 22bf877e
      Committed by Maciej Fijalkowski
      Under rare circumstances there might be a situation where the
      requirement of having one XDP Tx queue per CPU cannot be fulfilled
      and some of the Tx resources have to be shared between CPUs. This
      creates a need to place the xdp_ring accesses inside a critical
      section protected by a spinlock. These accesses happen to be in
      the hot path, so let's introduce a static branch that is enabled
      from the control plane when the driver could not provide a Tx
      queue dedicated to XDP on each CPU.

      Currently, the chosen design allows any number of XDP Tx queues
      that is at least half the number of CPUs the platform has. For a
      lower number, the driver bails out with a response to the user
      that there were not enough Tx resources to allow configuring XDP.
      The sharing of rings is signalled via static branch enablement,
      which in turn indicates that the lock for xdp_ring accesses needs
      to be taken in the hot path (see the sketch after this entry).
      
      The static branch approach has no performance impact on the
      non-fallback path. One thing worth mentioning is that the static
      branch acts as a global driver switch, meaning that if one PF runs
      out of Tx resources, the other PFs serviced by the ice driver will
      suffer as well. However, given that the HW handled by the ice
      driver has 1024 Tx queues per PF, this is currently an unlikely
      scenario.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      22bf877e
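      A minimal sketch of the pattern described above, with invented
      names (xdp_locking_key, xdp_tx_ring); not the actual ice code. The
      static branch, flipped from the control plane, guards the spinlock
      so the dedicated-queue fast path pays no locking cost:

      #include <linux/cpumask.h>
      #include <linux/jump_label.h>
      #include <linux/spinlock.h>

      DEFINE_STATIC_KEY_FALSE(xdp_locking_key); /* off: no sharing */

      struct xdp_tx_ring {
          spinlock_t tx_lock; /* taken only when rings are shared */
          /* ... descriptor ring state ... */
      };

      static void xdp_ring_xmit(struct xdp_tx_ring *ring)
      {
          /* hot path: a patched-out NOP until the key is enabled */
          if (static_branch_unlikely(&xdp_locking_key))
              spin_lock(&ring->tx_lock);

          /* ... post Tx descriptors ... */

          if (static_branch_unlikely(&xdp_locking_key))
              spin_unlock(&ring->tx_lock);
      }

      /* control plane: enable locking when rings must be shared */
      static void xdp_cfg_tx_sharing(unsigned int num_xdp_rings)
      {
          if (num_xdp_rings < num_possible_cpus())
              static_branch_enable(&xdp_locking_key);
      }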
    • ice: split ice_ring onto Tx/Rx separate structs · e72bba21
      Committed by Maciej Fijalkowski
      While it was convenient to have a generic ring structure that
      served both the Tx and Rx sides, the next commits are going to
      introduce several Tx-specific fields, so in order to avoid hurting
      the Rx side, let's pull the Tx ring out into new ice_tx_ring and
      ice_rx_ring structs.

      The Rx ring could have kept the old ice_ring, which would reduce
      the code churn within this patch, but that would make things
      asymmetric.

      Make a union out of the ring container within ice_q_vector so that
      it is possible to iterate over the newly introduced ice_tx_ring
      (see the layout sketch after this entry).

      Remove @size, as it is only accessed from the control path and can
      be calculated easily.

      Change the definitions of ice_update_ring_stats and
      ice_fetch_u64_stats_per_ring so that they are ring-agnostic and
      can be used for both Rx and Tx rings.

      The sizes of the Rx and Tx ring structs are 256 and 192 bytes,
      respectively. In the Rx ring, xdp_rxq_info occupies its own
      cacheline, so that is now the major difference.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      e72bba21
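      A hedged sketch of the resulting layout, with placeholder fields
      rather than the real ice definitions; the union is the piece that
      keeps iteration code type-correct for both ring kinds:

      #include <linux/types.h>
      #include <net/xdp.h>

      struct ice_tx_ring {
          void *desc;                  /* descriptor ring memory */
          u16 next_to_use;
          u16 next_to_clean;
          /* Tx-only fields can now grow without bloating Rx */
      };

      struct ice_rx_ring {
          void *desc;
          u16 next_to_use;
          u16 next_to_clean;
          struct xdp_rxq_info xdp_rxq; /* Rx-only; own cacheline in ice */
      };

      struct ice_ring_container {
          union {                      /* one container, either kind */
              struct ice_tx_ring *tx_ring;
              struct ice_rx_ring *rx_ring;
          };
          u16 itr_setting;             /* illustrative extra field */
      };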
  2. 14 October 2021 (1 commit)
  3. 12 October 2021 (1 commit)
  4. 08 October 2021 (6 commits)
    • ice: add port representor ethtool ops and stats · 7aae80ce
      Committed by Wojciech Drewek
      Introduce the following ethtool operations for the VF's port
      representor:
      	-get_drvinfo
      	-get_strings
      	-get_ethtool_stats
      	-get_sset_count
      	-get_link

      In all cases, existing operations are reused with minor changes
      that allow us to detect whether the ethtool op was called for a
      representor. Only VF VSI stats are available for a representor.

      Implement ndo_get_stats64 for the port representor. This updates
      the VF VSI stats and reads them (see the sketch after this entry).
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      7aae80ce
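      A minimal sketch of wiring a reduced ethtool_ops set onto a
      representor netdev; the repr_* helper names are invented for
      illustration, not the ice implementation:

      #include <linux/ethtool.h>
      #include <linux/netdevice.h>
      #include <linux/string.h>

      static void repr_get_drvinfo(struct net_device *netdev,
                                   struct ethtool_drvinfo *drvinfo)
      {
          strscpy(drvinfo->driver, "ice-repr", sizeof(drvinfo->driver));
      }

      static u32 repr_get_link(struct net_device *netdev)
      {
          /* carrier tracks the VF link state set via open/stop */
          return netif_carrier_ok(netdev);
      }

      static const struct ethtool_ops repr_ethtool_ops = {
          .get_drvinfo = repr_get_drvinfo,
          .get_link    = repr_get_link,
          /* .get_strings / .get_sset_count / .get_ethtool_stats would
           * expose only the VF VSI counters, as described above
           */
      };

      /* hooked up when the representor netdev is allocated */
      static void repr_assign_ethtool_ops(struct net_device *netdev)
      {
          netdev->ethtool_ops = &repr_ethtool_ops;
      }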
    • ice: introduce new type of VSI for switchdev · f66756e0
      Committed by Grzegorz Nitka
      A new type of VSI has to be defined for the switchdev control
      plane VSI. The number of allocated Tx and Rx queues has to be
      equal to the number of VFs, because each port representor should
      have one Tx and one Rx queue.

      Also, to avoid increasing the number of used irqs too much, the
      control plane VSI uses only one q_vector and handles all queues in
      one irq. To allow handling all queues in one irq, a new function
      to clean the eswitch msix was introduced. This function schedules
      napi for each representor instead of scheduling it for only one,
      as the normal irq clean function does (see the sketch after this
      entry).

      Only one additional msix has to be requested; always try to
      request it in the ice_ena_msix_range function.
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      f66756e0
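      A sketch of that clean routine under assumed types (struct repr
      and eswitch_ctx are invented stand-ins for the driver's own
      structures): one shared vector schedules NAPI on every
      representor.

      #include <linux/interrupt.h>
      #include <linux/netdevice.h>

      struct repr {
          struct napi_struct napi;
      };

      struct eswitch_ctx {
          struct repr **reprs; /* one per VF */
          int num_reprs;
      };

      static irqreturn_t eswitch_msix_clean_rings(int irq, void *data)
      {
          struct eswitch_ctx *ctx = data;
          int i;

          /* single shared vector: kick NAPI for every representor
           * queue pair instead of just one ring
           */
          for (i = 0; i < ctx->num_reprs; i++)
              if (ctx->reprs[i])
                  napi_schedule(&ctx->reprs[i]->napi);

          return IRQ_HANDLED;
      }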
    • ice: set and release switchdev environment · 1a1c40df
      Committed by Grzegorz Nitka
      The switchdev environment has to be set up when the user creates
      VFs and the eswitch mode is switchdev. It is released when the
      user deletes all VFs.

      The data path in this implementation is based on the control plane
      VSI. This VSI is used to pass traffic from the port representors
      to the corresponding VFs and vice versa. A default Tx rule has to
      be added to forward packets to the control plane VSI. This
      redirects packets from VFs which don't match other rules to the
      control plane VSI.

      On the Rx side, a default rule is added on the uplink VSI to
      receive all traffic that doesn't match other rules. When setting
      up the switchdev environment, all other rules from the VFs should
      be removed. Packets to the VFs will then be forwarded by the
      control plane VSI.

      As a VF without any MAC rules can't send any packets because of
      the antispoof mechanism, VSI antispoof should be turned off on
      each VF.

      To send a packet from a representor to the correct VSI, the
      destination VSI field in the Tx descriptor has to be filled in.
      Allow that by setting the destination override bit in the control
      plane VSI security config.

      Packets from the VFs are received on the control plane VSI, and
      the driver has to decide which netdev to forward each packet to.
      The decision is made based on the src_vsi field from the
      descriptor: a target netdev list in the control plane VSI struct
      selects the netdev based on the src_vsi number (see the sketch
      after this entry).
      Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      1a1c40df
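      A sketch of the Rx demultiplex step with invented names (ctrl_vsi,
      target_netdevs): the control plane VSI picks the representor
      netdev by the src_vsi value taken from the descriptor.

      #include <linux/netdevice.h>
      #include <linux/skbuff.h>

      struct ctrl_vsi {
          struct net_device **target_netdevs; /* indexed by src_vsi */
          u16 num_vsi;
      };

      static void ctrl_vsi_rx_demux(struct ctrl_vsi *vsi,
                                    struct sk_buff *skb, u16 src_vsi)
      {
          /* steer the frame to the netdev that mirrors the
           * originating VF VSI; drop if there is no mapping
           */
          if (src_vsi < vsi->num_vsi && vsi->target_netdevs[src_vsi]) {
              skb->dev = vsi->target_netdevs[src_vsi];
              netif_receive_skb(skb);
          } else {
              kfree_skb(skb);
          }
      }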
    • ice: introduce VF port representor · 37165e3f
      Committed by Michal Swiatkowski
      A port representor is used to manage a VF from the host side. To
      allow this, each created representor registers a netdevice with a
      random HW address. A devlink port is also created for all
      representors.

      The port representor name is created based on the switch id, or is
      managed by the devlink core if the devlink port was registered
      successfully.

      Open and stop ndo ops are implemented to allow managing the VF
      link state. The link state is tracked in the VF struct.

      Struct ice_netdev_priv is extended with a pointer-to-representor
      field. This is needed to get the correct representor from the
      netdev struct, mostly in ndo calls.

      Implement helper functions to check whether a given netdev belongs
      to a port representor (ice_is_port_repr_netdev) and to get the
      representor from a netdev (ice_netdev_to_repr); a sketch of both
      follows this entry.

      As the driver will mostly create or destroy port representors on
      all VFs rather than on a single one, write functions to add and
      remove a representor for each VF.

      The representor struct contains a pointer to the source VSI (the
      VSI configured on the VF), a backpointer to the VF, a backpointer
      to the netdev, a q_vector pointer, and the metadata_dst which will
      be used in the data path.
      Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      37165e3f
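      The two helpers named above, sketched under an assumed layout
      where ice_netdev_priv carries a repr pointer; the real driver may
      detect representors differently.

      #include <linux/netdevice.h>

      struct ice_repr; /* opaque here */

      struct ice_netdev_priv {
          struct ice_repr *repr; /* NULL on non-representor netdevs */
      };

      static inline struct ice_repr *
      ice_netdev_to_repr(struct net_device *netdev)
      {
          struct ice_netdev_priv *np = netdev_priv(netdev);

          return np->repr;
      }

      static inline bool ice_is_port_repr_netdev(struct net_device *netdev)
      {
          return netdev && ice_netdev_to_repr(netdev);
      }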
    • ice: Move devlink port to PF/VF struct · 2ae0aa47
      Committed by Wojciech Drewek
      Keeping the devlink port inside the VSI data structure causes some
      issues. Since the VF VSI is released during reset, the devlink
      port has to be unregistered and registered again every time a
      reset is triggered. With the new changes in the devlink API this
      can cause deadlocks: after calling
      devlink_port_register/devlink_port_unregister, the devlink API is
      going to lock rtnl_mutex. That is a problem when a VF reset is
      triggered in a netlink operation context (like setting the VF MAC
      address or VLAN), because rtnl_lock is already taken by netlink,
      and another rtnl_lock call from the devlink API results in a
      deadlock.

      By moving the devlink port to the PF/VF structs we avoid
      creating/destroying it during reset. With this patch, the PF
      devlink port is created during ice_probe and destroyed during
      ice_remove, while the VF devlink port is created during
      ice_repr_add and destroyed during ice_repr_rem (see the sketch
      after this entry).
      Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      2ae0aa47
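      A hedged sketch of the probe-time registration, with an invented
      helper name and abbreviated attrs; the point is the one-shot
      lifetime outside any reset path.

      #include <net/devlink.h>

      static int pf_create_devlink_port(struct devlink *devlink,
                                        struct devlink_port *port,
                                        u32 pf_id)
      {
          struct devlink_port_attrs attrs = {
              .flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL,
              .phys.port_number = pf_id,
          };

          /* registered once in probe; unregistered only in remove,
           * so VF resets never reach rtnl_mutex through devlink
           */
          devlink_port_attrs_set(port, &attrs);
          return devlink_port_register(devlink, port, pf_id);
      }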
    • ice: support basic E-Switch mode control · 3ea9bd5d
      Committed by Michal Swiatkowski
      Write the set and get eswitch mode functions used by the devlink
      ops (a sketch follows this entry). Use the new pf struct member
      eswitch_mode to track the current eswitch mode in the driver.

      Changing the eswitch mode is only allowed when no VFs are created.

      Create a new file for eswitch-related code.

      Add the ICE_SWITCHDEV config flag to allow the user to choose
      whether switchdev support should be enabled or disabled.
      
      Use case examples:
      - show current eswitch mode ('legacy' is the default one)
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode legacy
      
      - move to 'switchdev' mode
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode
      switchdev
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode switchdev
      
      - create 2 VFs
      [root@localhost]# echo 2 > /sys/class/net/ens4f1/device/sriov_numvfs
      
      - unsuccessful attempt to change eswitch mode while VFs are created
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
      devlink answers: Operation not supported
      
      - destroy VFs
      [root@localhost]# echo 0 > /sys/class/net/ens4f1/device/sriov_numvfs
      
      - restore 'legacy' mode
      [root@localhost]# devlink dev eswitch set pci/0000:03:00.1 mode legacy
      [root@localhost]# devlink dev eswitch show pci/0000:03:00.1
      pci/0000:03:00.1: mode legacy
      Co-developed-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
      Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      3ea9bd5d
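      A sketch of the devlink ops pair with assumed struct fields (my_pf
      and num_vfs are placeholders); the VF-count guard is what produces
      the "Operation not supported" answer shown above.

      #include <linux/errno.h>
      #include <linux/netlink.h>
      #include <net/devlink.h>

      struct my_pf {
          u16 eswitch_mode;
          u16 num_vfs;
      };

      static int pf_eswitch_mode_get(struct devlink *devlink, u16 *mode)
      {
          struct my_pf *pf = devlink_priv(devlink);

          *mode = pf->eswitch_mode;
          return 0;
      }

      static int pf_eswitch_mode_set(struct devlink *devlink, u16 mode,
                                     struct netlink_ext_ack *extack)
      {
          struct my_pf *pf = devlink_priv(devlink);

          /* refuse to flip modes while VFs exist */
          if (pf->num_vfs) {
              NL_SET_ERR_MSG_MOD(extack,
                                 "Remove VFs before changing mode");
              return -EOPNOTSUPP;
          }

          pf->eswitch_mode = mode;
          return 0;
      }

      static const struct devlink_ops pf_devlink_ops = {
          .eswitch_mode_get = pf_eswitch_mode_get,
          .eswitch_mode_set = pf_eswitch_mode_set,
      };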
  5. 29 September 2021 (1 commit)
  6. 10 September 2021 (1 commit)
    • ice: Correctly deal with PFs that do not support RDMA · bfe84435
      Committed by Dave Ertman
      There are two cases where the current PF does not support RDMA
      functionality.  The first is if the NVM loaded on the device is set
      to not support RDMA (common_caps.rdma is false).  The second is if
      the kernel bonding driver has included the current PF in an active
      link aggregate.
      
      When the driver has determined that this PF does not support RDMA, then
      auxiliary devices should not be created on the auxiliary bus.  Without
      a device on the auxiliary bus, even if the irdma driver is present, there
      will be no RDMA activity attempted on this PF.
      
      Currently, in the reset flow, an attempt to create auxiliary
      devices is performed without regard to the ability of the PF.
      There needs to be a check in ice_aux_plug_dev (as the central
      point that creates auxiliary devices) to see if the PF is in a
      state to support the functionality (see the sketch after this
      entry).
      
      When disabling and re-enabling RDMA due to the inclusion/removal of the PF
      in a link aggregate, we also need to set/clear the bit which controls
      auxiliary device creation so that a reset recovery in a link aggregate
      situation doesn't try to create auxiliary devices when it shouldn't.
      
      Fixes: f9f5301e ("ice: Register auxiliary device to provide RDMA")
      Reported-by: Yongxin Liu <yongxin.liu@windriver.com>
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bfe84435
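      A sketch of that central check under assumed names (my_pf and
      PF_FLAG_AUX_ENA are invented): both the NVM capability and the
      runtime flag must allow RDMA before the auxiliary device is
      plugged.

      #include <linux/auxiliary_bus.h>
      #include <linux/bitops.h>

      enum { PF_FLAG_AUX_ENA }; /* cleared while the PF is in a LAG */

      struct my_pf {
          unsigned long flags;
          bool rdma_supported;           /* from NVM common_caps */
          struct auxiliary_device adev;  /* assumed already initialized */
      };

      static int pf_aux_plug_dev(struct my_pf *pf)
      {
          /* central gate: reset recovery also funnels through here,
           * so an incapable PF never gets an auxiliary device
           */
          if (!pf->rdma_supported ||
              !test_bit(PF_FLAG_AUX_ENA, &pf->flags))
              return 0;

          return auxiliary_device_add(&pf->adev);
      }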
  7. 10 August 2021 (1 commit)
  8. 11 June 2021 (2 commits)
    • ice: register 1588 PTP clock device object for E810 devices · 06c16d89
      Committed by Jacob Keller
      Add a new ice_ptp.c file for holding the basic PTP clock interface
      functions. If the device supports PTP, call the new ice_ptp_init and
      ice_ptp_release functions where appropriate.
      
      If the function owns the hardware resource associated with the PTP
      hardware clock, register with the PTP_1588_CLOCK infrastructure to
      allocate a new clock object that represents the device hardware clock.
      
      Implement basic functionality for reading and setting the clock
      time, performing clock adjustments, and adjusting the clock
      frequency (see the sketch after this entry).
      
      Future changes will introduce functionality for handling related
      features including Tx and Rx timestamps.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      06c16d89
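      A minimal sketch of registering with the PTP_1588_CLOCK framework;
      the callbacks are stubbed and all names are illustrative, not the
      ice implementation.

      #include <linux/module.h>
      #include <linux/ptp_clock_kernel.h>

      static int my_ptp_adjfine(struct ptp_clock_info *info,
                                long scaled_ppm)
      {
          /* program the frequency adjustment into the HW clock */
          return 0;
      }

      static int my_ptp_adjtime(struct ptp_clock_info *info, s64 delta)
      {
          /* atomically shift the HW clock by delta nanoseconds */
          return 0;
      }

      static struct ptp_clock_info my_ptp_info = {
          .owner   = THIS_MODULE,
          .name    = "my_ptp_clk",
          .max_adj = 100000000,
          .adjfine = my_ptp_adjfine,
          .adjtime = my_ptp_adjtime,
          /* .gettimex64 / .settime64 back clock_gettime/settime */
      };

      /* only the function that owns the HW clock resource registers */
      static struct ptp_clock *my_ptp_register(struct device *dev)
      {
          return ptp_clock_register(&my_ptp_info, dev);
      }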
    • ice: add support for sideband messages · 8f5ee3c4
      Committed by Jacob Keller
      In order to support certain device features, including enabling the PTP
      hardware clock, the ice driver needs to control some registers on the
      device PHY.
      
      These registers are accessed by sending sideband messages. For some
      hardware, these messages must be sent over the device admin queue, while
      other hardware has a dedicated control queue for the sideband messages.
      
      Add the neighbor device message structure for sending a message to the
      neighboring device. Where supported, initialize the sideband control
      queue and handle cleanup.
      
      Add a wrapper function for sending sideband control queue messages that
      read or write a neighboring device register.
      
      Because some devices send sideband messages over the AdminQ, also
      increase the length of the admin queue to allow more messages to
      be queued up. This is important because the sideband messages add
      additional pressure on the AQ usage.

      This support will be used in following patches to enable support
      for CONFIG_PTP_1588_CLOCK (a sketch of the wrapper's shape follows
      this entry).
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      8f5ee3c4
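      A shape-only sketch of such a wrapper, with an entirely invented
      message layout and a hypothetical send_and_wait() transport; the
      real queue format is hardware-defined.

      #include <linux/types.h>

      enum { SBQ_OP_RD, SBQ_OP_WR };

      struct sbq_msg {
          u8  opcode;   /* SBQ_OP_RD or SBQ_OP_WR */
          u16 dest_dev; /* neighbor device, e.g. the PHY */
          u32 reg_addr;
          u32 data;     /* value to write, or readback result */
      };

      /* hypothetical transport: post to the sideband control queue
       * (or the AdminQ where no dedicated queue exists) and wait
       */
      static int send_and_wait(void *cq, struct sbq_msg *msg)
      {
          return 0; /* stubbed for the sketch */
      }

      static int sbq_rw_reg(void *cq, struct sbq_msg *msg, bool write)
      {
          msg->opcode = write ? SBQ_OP_WR : SBQ_OP_RD;
          /* on a read, the completion returns the value in msg->data */
          return send_and_wait(cq, msg);
      }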
  9. 07 June 2021 (2 commits)
  10. 03 June 2021 (1 commit)
    • ice: track AF_XDP ZC enabled queues in bitmap · e102db78
      Committed by Maciej Fijalkowski
      Commit c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      silently introduced a regression and broke the Tx side of AF_XDP
      in copy mode. xsk_pool on ice_ring is set based only on the
      existence of the XDP prog on the VSI, which in turn causes
      ice_clean_tx_irq_zc to be executed. That should not happen in copy
      mode, which should use the regular ice_clean_tx_irq data path.

      This results in the following splat when xdpsock is run in txonly
      or l2fwd scenarios in copy mode:
      
      <snip>
      [  106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [  106.057269] #PF: supervisor read access in kernel mode
      [  106.062493] #PF: error_code(0x0000) - not-present page
      [  106.067709] PGD 0 P4D 0
      [  106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
      [  106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
      [  106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
      [  106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
      [  106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
      [  106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
      [  106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
      [  106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
      [  106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
      [  106.157117] FS:  0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
      [  106.165332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
      [  106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  106.192898] PKRU: 55555554
      [  106.195653] Call Trace:
      [  106.198143]  <IRQ>
      [  106.200196]  ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
      [  106.205087]  ice_napi_poll+0x3e/0x590 [ice]
      [  106.209356]  __napi_poll+0x2a/0x160
      [  106.212911]  net_rx_action+0xd6/0x200
      [  106.216634]  __do_softirq+0xbf/0x29b
      [  106.220274]  irq_exit_rcu+0x88/0xc0
      [  106.223819]  common_interrupt+0x7b/0xa0
      [  106.227719]  </IRQ>
      [  106.229857]  asm_common_interrupt+0x1e/0x40
      </snip>
      
      Fix this by introducing a bitmap of zero-copy enabled queues,
      where each bit, corresponding to a queue id that an xsk pool is
      being configured on, is set/cleared within
      ice_xsk_pool_{en,dis}able and checked within ice_xsk_pool(). The
      latter is the function used for deciding which napi poll routine
      is executed (see the sketch after this entry).
      The idea is taken from our other drivers such as i40e and ixgbe.
      
      Fixes: c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      e102db78
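      A sketch of the fix's mechanism with assumed names (my_vsi,
      af_xdp_zc_qps): the data path consults the bitmap rather than the
      mere presence of an XDP program, so copy mode stays on the regular
      clean routine.

      #include <linux/bitops.h>
      #include <linux/types.h>

      struct xsk_buff_pool; /* opaque here */

      struct my_vsi {
          unsigned long *af_xdp_zc_qps; /* one bit per queue pair */
          u16 num_queues;
          struct xsk_buff_pool **pools;
      };

      static void vsi_xsk_pool_enable(struct my_vsi *vsi, u16 qid)
      {
          set_bit(qid, vsi->af_xdp_zc_qps);
      }

      static void vsi_xsk_pool_disable(struct my_vsi *vsi, u16 qid)
      {
          clear_bit(qid, vsi->af_xdp_zc_qps);
      }

      /* data path: NULL steers the ring to the regular clean routine */
      static struct xsk_buff_pool *vsi_xsk_pool(struct my_vsi *vsi, u16 qid)
      {
          if (qid >= vsi->num_queues ||
              !test_bit(qid, vsi->af_xdp_zc_qps))
              return NULL;

          return vsi->pools[qid];
      }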
  11. 29 May 2021 (3 commits)
  12. 23 April 2021 (1 commit)
  13. 15 April 2021 (4 commits)
  14. 08 April 2021 (1 commit)
  15. 01 April 2021 (3 commits)
  16. 30 March 2021 (3 commits)
  17. 23 March 2021 (2 commits)
  18. 23 February 2021 (1 commit)
  19. 09 February 2021 (3 commits)
  20. 06 February 2021 (1 commit)