1. 15 October 2021, 1 commit
  2. 18 June 2021, 1 commit
  3. 11 June 2021, 3 commits
    • ice: enable transmit timestamps for E810 devices · ea9b847c
      Jacob Keller authored
      Add support for enabling Tx timestamp requests for outgoing packets on
      E810 devices.
      
      The ice hardware can support multiple outstanding Tx timestamp requests.
      When sending a descriptor to hardware, a Tx timestamp request is made by
      setting a request bit, and assigning an index that represents which Tx
      timestamp index to store the timestamp in.
      
      Hardware makes no effort to synchronize the index use, so it is up to
      software to ensure that Tx timestamp indexes are not re-used before the
      timestamp is reported back.
      
      To do this, introduce a Tx timestamp tracker which will keep track of
      currently in-use indexes.
      
      In the hot path, if a packet has a timestamp request, an index will be
      requested from the tracker. Unfortunately, this does require a lock as
      the indexes are shared across all queues on a PHY. There are not enough
      indexes to reliably assign only 1 to each queue.
      
      For the E810 devices, the timestamp indexes are not shared across PHYs,
      so each port can have its own tracking.
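
      A minimal sketch of such a tracker, assuming a small fixed pool of
      hardware indexes shared by every queue on the port; structure and
      function names here are illustrative, not the driver's actual symbols:

      #include <linux/bitops.h>
      #include <linux/jiffies.h>
      #include <linux/skbuff.h>
      #include <linux/spinlock.h>

      #define TX_TSTAMP_IDX_MAX 64

      struct tx_tstamp_entry {
              struct sk_buff *skb;    /* packet waiting for its timestamp */
              unsigned long start;    /* jiffies when the request was made */
      };

      struct tx_tstamp_tracker {
              spinlock_t lock;        /* indexes are shared by all queues on the PHY */
              DECLARE_BITMAP(in_use, TX_TSTAMP_IDX_MAX);
              struct tx_tstamp_entry entries[TX_TSTAMP_IDX_MAX];
              u8 len;                 /* number of indexes the hardware supports */
      };

      /* Hot path: reserve a free index for a packet that requested a Tx
       * timestamp, or return -1 if every index is still in flight.
       */
      static s8 tx_tstamp_request_idx(struct tx_tstamp_tracker *tx,
                                      struct sk_buff *skb)
      {
              unsigned long idx;
              s8 ret = -1;

              spin_lock(&tx->lock);
              idx = find_first_zero_bit(tx->in_use, tx->len);
              if (idx < tx->len) {
                      set_bit(idx, tx->in_use);
                      tx->entries[idx].skb = skb_get(skb);
                      tx->entries[idx].start = jiffies;
                      ret = idx;
              }
              spin_unlock(&tx->lock);

              return ret;
      }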
      
      Once hardware captures a timestamp, an interrupt is fired. In this
      interrupt, trigger a new work item that will figure out which timestamp
      was completed, and report the timestamp back to the stack.
      
      This function loops through the Tx timestamp indexes and checks whether
      there is now a valid timestamp. If so, it clears the PHY timestamp
      indication in the PHY memory, locks and removes the SKB and bit in the
      tracker, then reports the timestamp to the stack.
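
      A sketch of that completion step, reusing the tracker sketched above;
      the caller is assumed to have already read and extended the 40-bit PHY
      value into a 64-bit nanosecond count:

      static void tx_tstamp_complete(struct tx_tstamp_tracker *tx, u8 idx, u64 ns)
      {
              struct skb_shared_hwtstamps hwtstamps = {};
              struct sk_buff *skb;

              spin_lock(&tx->lock);
              skb = tx->entries[idx].skb;
              tx->entries[idx].skb = NULL;
              clear_bit(idx, tx->in_use);
              spin_unlock(&tx->lock);

              if (!skb)
                      return;

              hwtstamps.hwtstamp = ns_to_ktime(ns);
              skb_tstamp_tx(skb, &hwtstamps);  /* report the timestamp to the stack */
              dev_kfree_skb_any(skb);
      }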
      
      It is possible in some cases that a timestamp request will be initiated
      but never completed. This might occur if the packet is dropped by
      software or hardware before it reaches the PHY.
      
      Add a task to the periodic work function that will check whether
      a timestamp request is more than a few seconds old. If so, the timestamp
      index is cleared in the PHY, and the SKB is released.
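
      A sketch of that periodic check, again reusing the tracker sketched
      above; the two-second threshold stands in for "a few seconds" and
      clearing the PHY latch itself is omitted:

      static void tx_tstamp_flush_stale(struct tx_tstamp_tracker *tx)
      {
              unsigned long stale_after = msecs_to_jiffies(2000);
              unsigned long idx;

              spin_lock(&tx->lock);
              for_each_set_bit(idx, tx->in_use, tx->len) {
                      if (time_before(jiffies, tx->entries[idx].start + stale_after))
                              continue;       /* request is still fresh */

                      /* the request never completed: drop the index and the skb */
                      clear_bit(idx, tx->in_use);
                      dev_kfree_skb_any(tx->entries[idx].skb);
                      tx->entries[idx].skb = NULL;
              }
              spin_unlock(&tx->lock);
      }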
      
      Just as with Rx timestamps, the Tx timestamps are only 40 bits wide, and
      use the same overall logic for extending to 64 bits of nanoseconds.
      
With this change, E810 devices should be able to support basic PTP
functionality.
      
      Future changes will extend the support to cover the E822-based devices.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: enable receive hardware timestamping · 77a78115
      Jacob Keller authored
      Add SIOCGHWTSTAMP and SIOCSHWTSTAMP ioctl handlers to respond to
      requests to enable timestamping support. If the request is for enabling
      Rx timestamps, set a bit in the Rx descriptors to indicate that receive
      timestamps should be reported.
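
      A minimal sketch of such a handler for the Rx side, using the standard
      hwtstamp_config interface; port_set_rx_tstamp() is a hypothetical
      helper standing in for whatever programs the Rx descriptor flag:

      #include <linux/net_tstamp.h>
      #include <linux/netdevice.h>
      #include <linux/uaccess.h>

      static void port_set_rx_tstamp(struct net_device *netdev, bool ena); /* hypothetical */

      static int hwtstamp_set(struct net_device *netdev, struct ifreq *ifr)
      {
              struct hwtstamp_config config;

              if (copy_from_user(&config, ifr->ifr_data, sizeof(config)))
                      return -EFAULT;

              switch (config.rx_filter) {
              case HWTSTAMP_FILTER_NONE:
                      port_set_rx_tstamp(netdev, false);
                      break;
              default:
                      /* hardware timestamps all packets or none, so any other
                       * filter is upgraded to "all"
                       */
                      port_set_rx_tstamp(netdev, true);
                      config.rx_filter = HWTSTAMP_FILTER_ALL;
                      break;
              }

              return copy_to_user(ifr->ifr_data, &config, sizeof(config)) ?
                      -EFAULT : 0;
      }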
      
      Hardware captures receive timestamps in the PHY which only captures part
      of the timer, and reports only 40 bits into the Rx descriptor. The upper
      32 bits represent the contents of GLTSYN_TIME_L at the point of packet
      reception, while the lower 8 bits represent the upper 8 bits of
      GLTSYN_TIME_0.
      
      The networking and PTP stack expect 64 bit timestamps in nanoseconds. To
      support this, implement some logic to extend the timestamps by using the
      full PHC time.
      
      If the Rx timestamp was captured prior to the PHC time, then the real
      timestamp is
      
        PHC - (lower_32_bits(PHC) - timestamp)
      
      If the Rx timestamp was captured after the PHC time, then the real
      timestamp is
      
        PHC + (timestamp - lower_32_bits(PHC))
      
      These calculations are correct as long as neither the PHC timestamp nor
      the Rx timestamps are more than 2^32-1 nanoseconds old. Further, we can
      detect when the Rx timestamp is before or after the PHC as long as the
      PHC timestamp is no more than 2^31-1 nanoseconds old.
      
      In that case, we calculate the delta between the lower 32 bits of the
      PHC and the Rx timestamp. If it's larger than 2^31-1 then the Rx
      timestamp must have been captured in the past. If it's smaller, then the
      Rx timestamp must have been captured after PHC time.
      
      Add an ice_ptp_extend_32b_ts function that relies on a cached copy of
      the PHC time and implements this algorithm to calculate the proper upper
      32 bits of the Rx timestamps.
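
      A sketch of that calculation, assuming the cached PHC time and the
      32-bit Rx value are both in nanoseconds; it follows the description
      above rather than the driver's exact code:

      /* Extend a 32-bit timestamp to 64 bits using a cached PHC time that is
       * known to be within ~2^31 ns of the capture time.
       */
      static u64 extend_32b_ts(u64 cached_phc_time, u32 in_tstamp)
      {
              u32 phc_lo = lower_32_bits(cached_phc_time);
              u32 delta = in_tstamp - phc_lo;

              if (delta > (U32_MAX / 2)) {
                      /* large delta: the timestamp was captured before the
                       * cached PHC time
                       */
                      delta = phc_lo - in_tstamp;
                      return cached_phc_time - delta;
              }

              /* otherwise it was captured at or after the cached PHC time */
              return cached_phc_time + delta;
      }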
      
      Cache the PHC time periodically in all of the Rx rings. This enables
      each Rx ring to simply call the extension function with a recent copy of
      the PHC time. By ensuring that the PHC time is kept up to date
      periodically, we ensure this algorithm doesn't use stale data and
      produce incorrect results.
      
      To cache the time, introduce a kthread worker and a kthread work item
      to periodically store the PHC time. It might seem like we should use the .do_aux_work
      interface of the PTP clock. This doesn't work because all PFs must cache
      this time, but only one PF owns the PTP clock device.
      
      Thus, the ice driver will manage its own kthread instead of relying on
      the PTP do_aux_work handler.
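
      A sketch of that machinery using the kernel's kthread worker API;
      read_phc_time() and update_rx_ring_cached_time() are hypothetical
      placeholders for the driver's own helpers:

      #include <linux/kthread.h>

      struct phc_cache {
              struct kthread_worker *worker;
              struct kthread_delayed_work work;
              u64 cached_phc_time;
      };

      static u64 read_phc_time(void);                    /* hypothetical */
      static void update_rx_ring_cached_time(u64 now);   /* hypothetical */

      static void phc_cache_work(struct kthread_work *work)
      {
              struct phc_cache *cache = container_of(work, struct phc_cache,
                                                     work.work);

              cache->cached_phc_time = read_phc_time();
              update_rx_ring_cached_time(cache->cached_phc_time);

              /* re-arm well inside the ~2^31 ns validity window */
              kthread_queue_delayed_work(cache->worker, &cache->work,
                                         msecs_to_jiffies(500));
      }

      static int phc_cache_start(struct phc_cache *cache)
      {
              cache->worker = kthread_create_worker(0, "phc_cache");
              if (IS_ERR(cache->worker))
                      return PTR_ERR(cache->worker);

              kthread_init_delayed_work(&cache->work, phc_cache_work);
              kthread_queue_delayed_work(cache->worker, &cache->work, 0);
              return 0;
      }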
      
      With this change, the driver can now report Rx timestamps on all
      incoming packets.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: add support for sideband messages · 8f5ee3c4
      Jacob Keller authored
      In order to support certain device features, including enabling the PTP
      hardware clock, the ice driver needs to control some registers on the
      device PHY.
      
      These registers are accessed by sending sideband messages. For some
      hardware, these messages must be sent over the device admin queue, while
      other hardware has a dedicated control queue for the sideband messages.
      
      Add the neighbor device message structure for sending a message to the
      neighboring device. Where supported, initialize the sideband control
      queue and handle cleanup.
      
      Add a wrapper function for sending sideband control queue messages that
      read or write a neighboring device register.
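
      A hypothetical sketch of the wrapper's shape; the message layout,
      queue-selection check, and send helpers below are illustrative
      stand-ins, not the driver's actual sideband API:

      enum sb_opcode { SB_OP_READ, SB_OP_WRITE };

      struct sb_msg {
              u8  dest_dev;   /* neighboring device, e.g. the external PHY */
              u8  opcode;     /* SB_OP_READ or SB_OP_WRITE */
              u16 reg_addr;   /* register offset within the neighbor */
              u32 data;       /* write payload, or read result on completion */
      };

      static bool sbq_supported(struct ice_hw *hw);                   /* hypothetical */
      static int sbq_send_msg(struct ice_hw *hw, void *msg, u16 len); /* hypothetical */
      static int aq_send_msg(struct ice_hw *hw, void *msg, u16 len);  /* hypothetical */

      static int sb_rw_reg(struct ice_hw *hw, struct sb_msg *msg)
      {
              /* prefer the dedicated sideband control queue when the hardware
               * has one, otherwise fall back to the (lengthened) admin queue
               */
              if (sbq_supported(hw))
                      return sbq_send_msg(hw, msg, sizeof(*msg));

              return aq_send_msg(hw, msg, sizeof(*msg));
      }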
      
      Because some devices send sideband messages over the AdminQ, also
      increase the length of the admin queue to allow more messages to be
      queued up. This is important because the sideband messages place
      additional pressure on the AQ.
      
      This support will be used in following patches to enable support for
      CONFIG_PTP_1588_CLOCK.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  4. 10 June 2021, 1 commit
  5. 07 June 2021, 5 commits
  6. 04 June 2021, 1 commit
  7. 03 June 2021, 1 commit
    • ice: track AF_XDP ZC enabled queues in bitmap · e102db78
      Maciej Fijalkowski authored
      Commit c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      silently introduced a regression and broke the Tx side of AF_XDP in copy
      mode. xsk_pool on ice_ring is set only based on the existence of the XDP
      prog on the VSI which in turn picks ice_clean_tx_irq_zc to be executed.
      That is not something that should happen for copy mode as it should use
      the regular data path ice_clean_tx_irq.
      
      This results in the following splat when xdpsock is run in txonly or l2fwd
      scenarios in copy mode:
      
      <snip>
      [  106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [  106.057269] #PF: supervisor read access in kernel mode
      [  106.062493] #PF: error_code(0x0000) - not-present page
      [  106.067709] PGD 0 P4D 0
      [  106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
      [  106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
      [  106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
      [  106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
      [  106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
      [  106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
      [  106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
      [  106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
      [  106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
      [  106.157117] FS:  0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
      [  106.165332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
      [  106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  106.192898] PKRU: 55555554
      [  106.195653] Call Trace:
      [  106.198143]  <IRQ>
      [  106.200196]  ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
      [  106.205087]  ice_napi_poll+0x3e/0x590 [ice]
      [  106.209356]  __napi_poll+0x2a/0x160
      [  106.212911]  net_rx_action+0xd6/0x200
      [  106.216634]  __do_softirq+0xbf/0x29b
      [  106.220274]  irq_exit_rcu+0x88/0xc0
      [  106.223819]  common_interrupt+0x7b/0xa0
      [  106.227719]  </IRQ>
      [  106.229857]  asm_common_interrupt+0x1e/0x40
      </snip>
      
      Fix this by introducing a bitmap of zero-copy enabled queues: the bit
      corresponding to a queue id on which an xsk pool is configured is
      set/cleared within ice_xsk_pool_{en,dis}able and checked within
      ice_xsk_pool(), the function that decides which NAPI poll routine is
      executed. The idea is taken from our other drivers such as i40e and
      ixgbe.
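
      A minimal sketch of the bitmap bookkeeping, assuming a per-VSI bitmap
      sized to the queue count; helper names are illustrative:

      #include <linux/bitops.h>

      static void zc_qid_enable(unsigned long *af_xdp_zc_qps, u16 qid)
      {
              set_bit(qid, af_xdp_zc_qps);    /* queue now has an xsk pool */
      }

      static void zc_qid_disable(unsigned long *af_xdp_zc_qps, u16 qid)
      {
              clear_bit(qid, af_xdp_zc_qps);
      }

      /* Hot path: treat the queue as zero-copy only when its bit is set, so
       * copy-mode AF_XDP keeps using the regular clean-irq routines.
       */
      static bool zc_qid_enabled(const unsigned long *af_xdp_zc_qps, u16 qid)
      {
              return test_bit(qid, af_xdp_zc_qps);
      }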
      
      Fixes: c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  8. 29 May 2021, 1 commit
  9. 15 April 2021, 8 commits
  10. 08 April 2021, 1 commit
  11. 01 April 2021, 5 commits
  12. 30 March 2021, 2 commits
    • ice: remove DCBNL_DEVRESET bit from PF state · 741b7b74
      Dave Ertman authored
      The original purpose of the ICE_DCBNL_DEVRESET was to protect
      the driver during DCBNL device resets.  But, the flow for
      DCBNL device resets now consists of only calls up the stack
      such as dev_close() and dev_open() that will result in NDO calls
      to the driver.  These will be handled with state changes from the
      stack.  There is also a problem of dev_close and dev_open being
      blocked by reset-in-progress checks that likewise use the
      ICE_DCBNL_DEVRESET bit.
      
      Since the ICE_DCBNL_DEVRESET bit is not necessary for protecting
      the driver from DCBNL device resets and it is actually blocking
      changes coming from the DCBNL interface, remove the bit from the
      PF state and don't block driver function based on DCBNL reset in
      progress.
      
      Fixes: b94b013e ("ice: Implement DCBNL support")
      Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: prevent ice_open and ice_stop during reset · e95fc857
      Krzysztof Goreczny authored
      There is a possibility of race between ice_open or ice_stop calls
      performed by OS and reset handling routine both trying to modify VSI
      resources. Observed scenarios:
      - reset handler deallocates memory in ice_vsi_free_arrays and ice_open
        tries to access it in ice_vsi_cfg_txq leading to driver crash
      - reset handler deallocates memory in ice_vsi_free_arrays and ice_close
        tries to access it in ice_down leading to driver crash
      - reset handler clears port scheduler topology and sets port state to
        ICE_SCHED_PORT_STATE_INIT leading to ice_ena_vsi_txq fail in ice_open
      
      To prevent this, additional checks in ice_open and ice_stop are
      introduced to make sure that OS is not allowed to alter VSI config while
      reset is in progress.
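
      A sketch of the guard, following the description above; the split into
      an *_internal helper is one way to preserve the normal open path and
      should be read as illustrative:

      static int guarded_open(struct net_device *netdev)
      {
              struct ice_netdev_priv *np = netdev_priv(netdev);
              struct ice_pf *pf = np->vsi->back;

              if (ice_is_reset_in_progress(pf->state)) {
                      netdev_err(netdev, "can't open net device while reset is in progress\n");
                      return -EBUSY;
              }

              return ice_open_internal(netdev);   /* unchanged open path */
      }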
      
      Fixes: cdedef59 ("ice: Configure VSIs for Tx/Rx")
      Signed-off-by: Krzysztof Goreczny <krzysztof.goreczny@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  13. 23 March 2021, 1 commit
  14. 09 February 2021, 4 commits
  15. 27 January 2021, 1 commit
    • ice: Fix MSI-X vector fallback logic · f3fe97f6
      Brett Creeley authored
      The current MSI-X enablement logic tries to enable best-case MSI-X
      vectors and if that fails we only support a bare-minimum set. This
      includes a single MSI-X for 1 Tx and 1 Rx queue and a single MSI-X
      for the OICR interrupt. Unfortunately, the driver fails to load when we
      don't get as many MSI-X vectors as requested, for a couple of reasons.
      
      First, the code to allocate MSI-X in the driver tries to allocate
      num_online_cpus() MSI-X for LAN traffic without caring about the number
      of MSI-X actually enabled/requested from the kernel for LAN traffic.
      So, when calling ice_get_res() for the PF VSI, it returns failure
      because the number of available vectors is less than requested. Fix
      this by not allowing the PF VSI to allocate more than
      pf->num_lan_msix MSI-X vectors and pf->num_lan_msix Rx/Tx queues.
      The number of queues is limited because we don't want more than
      1 Tx/Rx queue per interrupt due to performance concerns.
      
      Second, the driver assigns pf->num_lan_msix = 2, to account for LAN
      traffic and the OICR. However, pf->num_lan_msix is only meant for LAN
      MSI-X. This is causing a failure when the PF VSI tries to
      allocate/reserve the minimum pf->num_lan_msix because the OICR MSI-X has
      already been reserved, so there may not be enough MSI-X vectors left.
      Fix this by setting pf->num_lan_msix = 1 for the failure case. Then the
      ICE_MIN_MSIX accounts for the LAN MSI-X and the OICR MSI-X needed for
      the failure case.
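
      A sketch of the fallback shape; the macro values reflect the commit
      text (one LAN vector plus one OICR vector) and the allocation call is a
      stand-in for the driver's actual MSI-X reservation path:

      #define SKETCH_MIN_LAN_MSIX     1
      #define SKETCH_OICR_MSIX        1
      #define SKETCH_MIN_MSIX         (SKETCH_MIN_LAN_MSIX + SKETCH_OICR_MSIX)

      static int sketch_ena_msix_range(struct ice_pf *pf)
      {
              int needed = num_online_cpus() + SKETCH_OICR_MSIX;
              int got;

              got = pci_alloc_irq_vectors(pf->pdev, SKETCH_MIN_MSIX, needed,
                                          PCI_IRQ_MSIX);
              if (got < 0)
                      return got;

              /* hand the LAN path only the vectors actually granted, and cap
               * queues at one Tx/Rx pair per interrupt
               */
              pf->num_lan_msix = max_t(int, got - SKETCH_OICR_MSIX,
                                       SKETCH_MIN_LAN_MSIX);
              return got;
      }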
      
      Update the related defines used in ice_ena_msix_range() to align with
      the above behavior and remove the unused RDMA defines because RDMA is
      currently not supported. Also, remove the now incorrect comment.
      
      Fixes: 152b978a ("ice: Rework ice_ena_msix_range")
      Signed-off-by: Brett Creeley <brett.creeley@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  16. 10 October 2020, 1 commit
    • ice: refactor devlink_port to be per-VSI · 48d40025
      Jacob Keller authored
      Currently, the devlink_port structure is stored within the ice_pf. This
      made sense because we create a single devlink_port for each PF. This
      setup does not mesh with the abstractions in the driver very well, and
      led to a flow where we accidentally call devlink_port_unregister twice
      during error cleanup.
      
      In particular, if devlink_port_register or devlink_port_unregister are
      called twice, this leads to a kernel panic. This appears to occur during
      some possible flows while cleaning up from a failure during driver
      probe.
      
      If register_netdev fails, then we will call devlink_port_unregister in
      ice_cfg_netdev as it cleans up. Later, we again call
      devlink_port_unregister since we assume that we must cleanup the port
      that is associated with the PF structure.
      
      This occurs because we clean up the devlink_port for the main PF even
      though it was not allocated. We allocated the port within a per-VSI
      function for managing the main netdev, but did not release the port when
      cleaning up that VSI; the allocation and destruction are not aligned.
      
      Instead of attempting to manage the devlink_port as part of the PF
      structure, manage it as part of the PF VSI. Doing this has advantages,
      as we can match the de-allocation of the devlink_port with the
      unregister_netdev associated with the main PF VSI.
      
      Moving the port to the VSI is preferable as it paves the way for
      handling devlink ports allocated for other purposes such as SR-IOV VFs.
      
      Since we're changing up how we allocate the devlink_port, also change
      the indexing. Originally, we indexed the port using the PF id number.
      This came from an old goal of sharing a devlink for each physical
      function. Managing devlink instances across multiple function drivers is
      not workable. Instead, lets set the port number to the logical port
      number returned by firmware and set the index using the VSI index
      (sometimes referred to as VSI handle).
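
      A sketch of how the port might be registered under the new scheme,
      assuming the contemporary devlink attrs API; the driver's exact
      plumbing and error handling are omitted:

      #include <net/devlink.h>

      static int register_pf_devlink_port(struct devlink *devlink,
                                          struct devlink_port *port,
                                          u32 fw_logical_port, u16 vsi_idx)
      {
              struct devlink_port_attrs attrs = {};

              attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL;
              attrs.phys.port_number = fw_logical_port;   /* from firmware */
              devlink_port_attrs_set(port, &attrs);

              /* index by the VSI handle rather than the PF id */
              return devlink_port_register(devlink, port, vsi_idx);
      }
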
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
  17. 25 September 2020, 2 commits
    • ice: fix memory leak in ice_vsi_setup · f6a07271
      Jacob Keller authored
      During ice_vsi_setup, if ice_cfg_vsi_lan fails, it does not properly
      release memory associated with the VSI rings. If we had used devres
      allocations for the rings, this would be ok. However, we use kzalloc and
      kfree_rcu for these ring structures.
      
      Using the correct label to cleanup the rings during ice_vsi_setup
      highlights an issue in the ice_vsi_clear_rings function: it can leave
      behind stale ring pointers in the q_vectors structure.
      
      When releasing rings, we must also ensure that no q_vector associated
      with the VSI will point to this ring again. To resolve this, loop over
      all q_vectors and release their ring mapping. Because we are about to
      free all rings, no q_vector should remain pointing to any of the rings
      in this VSI.
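
      A sketch of that loop; the ring-container field names follow the ice
      driver of that era but should be read as illustrative:

      static void clear_q_vector_rings(struct ice_vsi *vsi)
      {
              int i;

              ice_for_each_q_vector(vsi, i) {
                      struct ice_q_vector *q_vector = vsi->q_vectors[i];

                      if (!q_vector)
                              continue;

                      /* all rings are about to be freed, so no q_vector may
                       * keep a pointer to any of them
                       */
                      q_vector->tx.ring = NULL;
                      q_vector->rx.ring = NULL;
              }
      }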
      
      Fixes: 5513b920 ("ice: Update Tx scheduler tree for VSI multi-Tx queue support")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: fix memory leak if register_netdev fails · 135f4b9e
      Jacob Keller authored
      The ice_setup_pf_sw function can cause a memory leak if register_netdev
      fails, due to accidentally failing to free the VSI rings. Fix the memory
      leak by using ice_vsi_release, ensuring we actually go through the full
      teardown process.
      
      This should be safe even if the netdevice is not registered because we
      will have set the netdev pointer to NULL, ensuring ice_vsi_release won't
      call unregister_netdev.
      
      An alternative fix would be moving management of the PF VSI netdev into
      the main VSI setup code. This is complicated and likely requires
      a significant refactor of how we manage VSIs.
      
      Fixes: 3a858ba3 ("ice: Add support for VSI allocation and deallocation")
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Aaron Brown <aaron.f.brown@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  18. 01 September 2020, 1 commit