1. 20 October 2021: 3 commits
  2. 15 October 2021: 5 commits
    • ice: make use of ice_for_each_* macros · 2faf63b6
      Authored by Maciej Fijalkowski
      Go through the code base and use the ice_for_each_* macros. While at
      it, introduce an ice_for_each_xdp_txq() macro that can be used for
      looping over the xdp_rings array.
      
      This commit does not introduce any new functionality.
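      
      As an illustration, here is a minimal sketch of what such a macro
      could look like, mirroring the existing ice_for_each_* pattern (the
      num_xdp_txq field name is an assumption based on the driver's
      conventions):
      
        /* iterate over the VSI's XDP Tx rings; field name is assumed */
        #define ice_for_each_xdp_txq(vsi, i) \
                for ((i) = 0; (i) < (vsi)->num_xdp_txq; (i)++)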
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: introduce XDP_TX fallback path · 22bf877e
      Authored by Maciej Fijalkowski
      Under rare circumstances the requirement of one XDP Tx queue per CPU
      cannot be fulfilled, and some of the Tx resources have to be shared
      between CPUs. This creates a need to place the xdp_ring accesses
      inside a critical section protected by a spinlock. These accesses
      happen to be in the hot path, so introduce a static branch that is
      enabled from the control plane when the driver cannot provide a Tx
      queue dedicated to XDP on each CPU.
      
      Currently, the chosen design allows any number of XDP Tx queues that
      is at least half the number of CPUs on the platform. For a lower
      count, the driver bails out and tells the user that there were not
      enough Tx resources to allow configuring XDP. Ring sharing is
      signalled by enabling the static branch, which in turn indicates that
      the lock for xdp_ring accesses needs to be taken in the hot path.
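      
      A minimal sketch of the hot-path pattern this describes (the key name
      and the tx_lock field are illustrative assumptions, not necessarily
      the exact symbols this patch adds):
      
        /* control plane: enabled only when Tx rings must be shared */
        DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
      
        /* hot path: the spinlock is taken only when the branch is on */
        if (static_branch_unlikely(&ice_xdp_locking_key))
                spin_lock(&xdp_ring->tx_lock);
        /* ... produce Tx descriptors onto xdp_ring ... */
        if (static_branch_unlikely(&ice_xdp_locking_key))
                spin_unlock(&xdp_ring->tx_lock);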
      
      The static-branch approach has no impact on the performance of the
      non-fallback path. One thing worth mentioning is that the static
      branch acts as a global driver switch, meaning that if one PF runs
      out of Tx resources, the other PFs serviced by the ice driver will
      suffer as well. However, given that the hardware handled by the ice
      driver has 1024 Tx queues per PF, this is currently an unlikely
      scenario.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: unify xdp_rings accesses · 0bb4f9ec
      Authored by Maciej Fijalkowski
      There has been a long-standing issue of improper xdp_rings indexing
      for the XDP_TX and XDP_REDIRECT actions. Given that rx_ring->q_index
      is currently mixed with smp_processor_id(), there can be a situation
      where Tx descriptors are produced onto an XDP Tx ring but the tail is
      never bumped, for example when a particular queue id is pinned to a
      non-matching IRQ line.
      
      Address this problem by ignoring the user ring count setting and
      always initializing the xdp_rings array to num_possible_cpus() size.
      Then, always use smp_processor_id() as the index into the xdp_rings
      array. This provides serialization, as at any given time only a
      single softirq can run on a particular CPU.
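      
      A minimal sketch of the resulting access pattern (the vsi layout is
      an assumption following the driver's conventions):
      
        /* control path: size the array by CPU count, not user setting */
        vsi->num_xdp_txq = num_possible_cpus();
      
        /* hot path: index by the executing CPU; softirq execution
         * serializes accesses per CPU
         */
        xdp_ring = vsi->xdp_rings[smp_processor_id()];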
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: split ice_ring onto Tx/Rx separate structs · e72bba21
      Authored by Maciej Fijalkowski
      While it was convenient to have a generic ring structure that served
      both the Tx and Rx sides, the next commits are going to introduce
      several Tx-specific fields, so to avoid hurting the Rx side, pull the
      ring apart into new ice_tx_ring and ice_rx_ring structs.
      
      The Rx ring could have kept the old ice_ring, which would reduce the
      code churn within this patch, but that would make things asymmetric.
      
      Make a union out of the ring container within ice_q_vector so that it
      is possible to iterate over the newly introduced ice_tx_ring.
      
      Remove @size, as it is only accessed from the control path and can be
      calculated easily.
      
      Change the definitions of ice_update_ring_stats and
      ice_fetch_u64_stats_per_ring so that they are ring-agnostic and can
      be used for both Rx and Tx rings.
      
      The Rx and Tx ring structs are 256 and 192 bytes, respectively. In
      the Rx ring, xdp_rxq_info occupies its own cacheline, which is now
      the major difference between them.
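      
      A minimal sketch of what the ring container union could look like
      (simplified; surrounding fields are omitted and the exact layout is
      an assumption):
      
        struct ice_ring_container {
                union {
                        struct ice_tx_ring *tx_ring; /* Tx iteration */
                        struct ice_rx_ring *rx_ring; /* Rx iteration */
                };
                /* ... ITR settings, stats, and other fields ... */
        };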
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: remove ring_active from ice_ring · e93d1c37
      Authored by Maciej Fijalkowski
      This field is dead; the driver makes no use of it. Simply remove it.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  3. 14 October 2021: 1 commit
  4. 08 October 2021: 3 commits
  5. 29 September 2021: 1 commit
  6. 18 June 2021: 1 commit
  7. 11 June 2021: 3 commits
    • ice: enable transmit timestamps for E810 devices · ea9b847c
      Authored by Jacob Keller
      Add support for enabling Tx timestamp requests for outgoing packets on
      E810 devices.
      
      The ice hardware can support multiple outstanding Tx timestamp
      requests. When sending a descriptor to hardware, a Tx timestamp
      request is made by setting a request bit and assigning an index that
      selects which Tx timestamp slot the timestamp will be stored in.
      
      Hardware makes no effort to synchronize the index use, so it is up to
      software to ensure that Tx timestamp indexes are not re-used before the
      timestamp is reported back.
      
      To do this, introduce a Tx timestamp tracker which will keep track of
      currently in-use indexes.
      
      In the hot path, if a packet has a timestamp request, an index is
      requested from the tracker. Unfortunately, this requires a lock, as
      the indexes are shared across all queues on a PHY. There are not
      enough indexes to reliably assign only one to each queue.
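      
      A minimal sketch of such an index request, assuming a bitmap-based
      tracker with hypothetical tx->lock, tx->in_use, and tx->len fields:
      
        /* grab a free Tx timestamp index, or tx->len if none are free */
        spin_lock(&tx->lock);
        idx = find_first_zero_bit(tx->in_use, tx->len);
        if (idx < tx->len)
                set_bit(idx, tx->in_use);
        spin_unlock(&tx->lock);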
      
      For the E810 devices, the timestamp indexes are not shared across PHYs,
      so each port can have its own tracking.
      
      Once hardware captures a timestamp, an interrupt is fired. In this
      interrupt, trigger a new work item that will figure out which timestamp
      was completed, and report the timestamp back to the stack.
      
      This function loops through the Tx timestamp indexes and checks whether
      there is now a valid timestamp. If so, it clears the PHY timestamp
      indication in the PHY memory, locks and removes the SKB and bit in the
      tracker, then reports the timestamp to the stack.
      
      It is possible in some cases that a timestamp request will be initiated
      but never completed. This might occur if the packet is dropped by
      software or hardware before it reaches the PHY.
      
      Add a task to the periodic work function that will check whether
      a timestamp request is more than a few seconds old. If so, the timestamp
      index is cleared in the PHY, and the SKB is released.
      
      Just as with Rx timestamps, the Tx timestamps are only 40 bits wide, and
      use the same overall logic for extending to 64 bits of nanoseconds.
      
      With this change, E810 devices should be able to perform basic PTP
      functionality.
      
      Future changes will extend the support to cover the E822-based devices.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: enable receive hardware timestamping · 77a78115
      Authored by Jacob Keller
      Add SIOCGHWTSTAMP and SIOCSHWTSTAMP ioctl handlers to respond to
      requests to enable timestamping support. If the request is for enabling
      Rx timestamps, set a bit in the Rx descriptors to indicate that receive
      timestamps should be reported.
      
      Hardware captures receive timestamps in the PHY which only captures part
      of the timer, and reports only 40 bits into the Rx descriptor. The upper
      32 bits represent the contents of GLTSYN_TIME_L at the point of packet
      reception, while the lower 8 bits represent the upper 8 bits of
      GLTSYN_TIME_0.
      
      The networking and PTP stacks expect 64-bit timestamps in
      nanoseconds. To support this, implement some logic to extend the
      timestamps by using the full PHC time.
      
      If the Rx timestamp was captured prior to the PHC time, then the real
      timestamp is
      
        PHC - (lower_32_bits(PHC) - timestamp)
      
      If the Rx timestamp was captured after the PHC time, then the real
      timestamp is
      
        PHC + (timestamp - lower_32_bits(PHC))
      
      These calculations are correct as long as neither the PHC timestamp
      nor the Rx timestamps are more than 2^32-1 nanoseconds old. Further,
      we can detect whether the Rx timestamp is before or after the PHC as
      long as the PHC timestamp is no more than 2^31-1 nanoseconds old.
      
      In that case, we calculate the delta between the lower 32 bits of the
      PHC and the Rx timestamp. If it's larger than 2^31-1 then the Rx
      timestamp must have been captured in the past. If it's smaller, then the
      Rx timestamp must have been captured after PHC time.
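      
      A minimal sketch of this extension logic (a simplified stand-in for
      the function described below, assuming the cached PHC time is recent
      enough):
      
        static u64 ptp_extend_32b_ts(u64 cached_phc_time, u32 in_tstamp)
        {
                u32 phc_lo = lower_32_bits(cached_phc_time);
                u32 delta = in_tstamp - phc_lo;
      
                /* a delta above 2^31-1 means the Rx timestamp predates
                 * the cached PHC sample
                 */
                if (delta > (U32_MAX / 2))
                        return cached_phc_time - (phc_lo - in_tstamp);
      
                return cached_phc_time + delta;
        }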
      
      Add an ice_ptp_extend_32b_ts function that relies on a cached copy of
      the PHC time and implements this algorithm to calculate the proper
      upper 32 bits of the Rx timestamps.
      
      Cache the PHC time periodically in all of the Rx rings. This enables
      each Rx ring to simply call the extension function with a recent copy of
      the PHC time. By ensuring that the PHC time is kept up to date
      periodically, we ensure this algorithm doesn't use stale data and
      produce incorrect results.
      
      To cache the time, introduce a kthread worker and a kthread work item
      that periodically store the Rx time. It might seem like we should use
      the .do_aux_work interface of the PTP clock. That doesn't work
      because all PFs must cache this time, but only one PF owns the PTP
      clock device.
      
      Thus, the ice driver will manage its own kthread instead of relying on
      the PTP do_aux_work handler.
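      
      A minimal sketch of that periodic caching arrangement (the field
      names, the read_phc_time_ns() helper, and the refresh interval are
      all assumptions):
      
        /* from the delayed work function: refresh the cached PHC time,
         * then re-arm the work on the driver-owned kthread worker
         */
        WRITE_ONCE(pf->ptp.cached_phc_time, read_phc_time_ns(pf));
        kthread_queue_delayed_work(pf->ptp.kworker, &pf->ptp.work,
                                   msecs_to_jiffies(500));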
      
      With this change, the driver can now report Rx timestamps on all
      incoming packets.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: add support for sideband messages · 8f5ee3c4
      Authored by Jacob Keller
      In order to support certain device features, including enabling the PTP
      hardware clock, the ice driver needs to control some registers on the
      device PHY.
      
      These registers are accessed by sending sideband messages. For some
      hardware, these messages must be sent over the device admin queue, while
      other hardware has a dedicated control queue for the sideband messages.
      
      Add the neighbor device message structure for sending a message to the
      neighboring device. Where supported, initialize the sideband control
      queue and handle cleanup.
      
      Add a wrapper function for sending sideband control queue messages that
      read or write a neighboring device register.
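      
      A minimal sketch of a sideband message layout such a wrapper might
      take (the struct and field names here are illustrative assumptions):
      
        /* read or write a single register on a neighboring device */
        struct sbq_msg {
                u8  dest_dev;      /* neighboring device to address */
                u8  opcode;        /* read or write */
                u16 msg_addr_low;  /* register address, low 16 bits */
                u16 msg_addr_high; /* register address, high 16 bits */
                u32 data;          /* value to write, or value read back */
        };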
      
      Because some devices send sideband messages over the AdminQ, also
      increase the length of the admin queue to allow more messages to be
      queued up. This is important because the sideband messages put
      additional pressure on AQ usage.
      
      This support will be used in the following patches to enable support
      for CONFIG_PTP_1588_CLOCK.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  8. 10 June 2021: 1 commit
  9. 07 June 2021: 5 commits
  10. 04 June 2021: 1 commit
  11. 03 June 2021: 1 commit
    • ice: track AF_XDP ZC enabled queues in bitmap · e102db78
      Authored by Maciej Fijalkowski
      Commit c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      silently introduced a regression and broke the Tx side of AF_XDP in
      copy mode. The xsk_pool on ice_ring is set based only on the
      existence of an XDP prog on the VSI, which in turn causes
      ice_clean_tx_irq_zc to be executed. That should not happen in copy
      mode, which should use the regular data path ice_clean_tx_irq.
      
      This results in the following splat when xdpsock is run in the txonly
      or l2fwd scenarios in copy mode:
      
      <snip>
      [  106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [  106.057269] #PF: supervisor read access in kernel mode
      [  106.062493] #PF: error_code(0x0000) - not-present page
      [  106.067709] PGD 0 P4D 0
      [  106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
      [  106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
      [  106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
      [  106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
      [  106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
      [  106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
      [  106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
      [  106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
      [  106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
      [  106.157117] FS:  0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
      [  106.165332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
      [  106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  106.192898] PKRU: 55555554
      [  106.195653] Call Trace:
      [  106.198143]  <IRQ>
      [  106.200196]  ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
      [  106.205087]  ice_napi_poll+0x3e/0x590 [ice]
      [  106.209356]  __napi_poll+0x2a/0x160
      [  106.212911]  net_rx_action+0xd6/0x200
      [  106.216634]  __do_softirq+0xbf/0x29b
      [  106.220274]  irq_exit_rcu+0x88/0xc0
      [  106.223819]  common_interrupt+0x7b/0xa0
      [  106.227719]  </IRQ>
      [  106.229857]  asm_common_interrupt+0x1e/0x40
      </snip>
      
      Fix this by introducing a bitmap of zero-copy enabled queues, where
      each bit, corresponding to a queue id that an xsk pool is being
      configured on, is set/cleared within ice_xsk_pool_{en,dis}able and
      checked within ice_xsk_pool(). The latter is the function used to
      decide which napi poll routine is executed. The idea is taken from
      our other drivers, such as i40e and ixgbe.
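      
      A minimal sketch of that check (the af_xdp_zc_qps bitmap name is an
      assumption modeled on the i40e/ixgbe approach mentioned above):
      
        /* return the xsk pool only if ZC was enabled for this queue id */
        static struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring)
        {
                struct ice_vsi *vsi = ring->vsi;
                u16 qid = ring->q_index;
      
                if (!test_bit(qid, vsi->af_xdp_zc_qps))
                        return NULL;
      
                return xsk_get_pool_from_qid(vsi->netdev, qid);
        }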
      
      Fixes: c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  12. 29 May 2021: 1 commit
  13. 15 April 2021: 8 commits
  14. 08 April 2021: 1 commit
  15. 01 April 2021: 5 commits