1. 20 October 2021: 3 commits
  2. 15 October 2021: 5 commits
    • ice: make use of ice_for_each_* macros · 2faf63b6
      Authored by Maciej Fijalkowski
      Go through the code base and use the ice_for_each_* macros. While at
      it, introduce an ice_for_each_xdp_txq() macro that can be used for
      looping over the xdp_rings array.
      
      This commit does not introduce any new functionality.
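      
      As an illustration, here is a minimal sketch of what such a macro
      could look like, mirroring the existing ice_for_each_* pattern (the
      num_xdp_txq field name is an assumption based on the driver's
      conventions):
      
        /* iterate over the VSI's XDP Tx rings; field name is assumed */
        #define ice_for_each_xdp_txq(vsi, i) \
                for ((i) = 0; (i) < (vsi)->num_xdp_txq; (i)++)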
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: introduce XDP_TX fallback path · 22bf877e
      Authored by Maciej Fijalkowski
      Under rare circumstances the requirement of one XDP Tx queue per CPU
      cannot be fulfilled, and some of the Tx resources have to be shared
      between CPUs. This creates a need to place the xdp_ring accesses
      inside a critical section protected by a spinlock. These accesses
      happen to be in the hot path, so introduce a static branch that is
      enabled from the control plane when the driver cannot provide a Tx
      queue dedicated to XDP on each CPU.
      
      Currently, the chosen design allows any number of XDP Tx queues that
      is at least half the number of CPUs on the platform. For a lower
      count, the driver bails out and tells the user that there were not
      enough Tx resources to allow configuring XDP. Ring sharing is
      signalled by enabling the static branch, which in turn indicates that
      the lock for xdp_ring accesses needs to be taken in the hot path.
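      
      A minimal sketch of the hot-path pattern this describes (the key name
      and the tx_lock field are illustrative assumptions, not necessarily
      the exact symbols this patch adds):
      
        /* control plane: enabled only when Tx rings must be shared */
        DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
      
        /* hot path: the spinlock is taken only when the branch is on */
        if (static_branch_unlikely(&ice_xdp_locking_key))
                spin_lock(&xdp_ring->tx_lock);
        /* ... produce Tx descriptors onto xdp_ring ... */
        if (static_branch_unlikely(&ice_xdp_locking_key))
                spin_unlock(&xdp_ring->tx_lock);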
      
      The static-branch approach has no impact on the performance of the
      non-fallback path. One thing worth mentioning is that the static
      branch acts as a global driver switch, meaning that if one PF runs
      out of Tx resources, the other PFs serviced by the ice driver will
      suffer as well. However, given that the hardware handled by the ice
      driver has 1024 Tx queues per PF, this is currently an unlikely
      scenario.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: unify xdp_rings accesses · 0bb4f9ec
      Authored by Maciej Fijalkowski
      There has been a long-standing issue of improper xdp_rings indexing
      for the XDP_TX and XDP_REDIRECT actions. Given that rx_ring->q_index
      is currently mixed with smp_processor_id(), there can be a situation
      where Tx descriptors are produced onto an XDP Tx ring but the tail is
      never bumped, for example when a particular queue id is pinned to a
      non-matching IRQ line.
      
      Address this problem by ignoring the user ring count setting and
      always initializing the xdp_rings array to num_possible_cpus() size.
      Then, always use smp_processor_id() as the index into the xdp_rings
      array. This provides serialization, as at any given time only a
      single softirq can run on a particular CPU.
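      
      A minimal sketch of the resulting access pattern (the vsi layout is
      an assumption following the driver's conventions):
      
        /* control path: size the array by CPU count, not user setting */
        vsi->num_xdp_txq = num_possible_cpus();
      
        /* hot path: index by the executing CPU; softirq execution
         * serializes accesses per CPU
         */
        xdp_ring = vsi->xdp_rings[smp_processor_id()];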
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: split ice_ring onto Tx/Rx separate structs · e72bba21
      Authored by Maciej Fijalkowski
      While it was convenient to have a generic ring structure that served
      both the Tx and Rx sides, the next commits are going to introduce
      several Tx-specific fields, so to avoid hurting the Rx side, pull the
      ring apart into new ice_tx_ring and ice_rx_ring structs.
      
      The Rx ring could have kept the old ice_ring, which would reduce the
      code churn within this patch, but that would make things asymmetric.
      
      Make a union out of the ring container within ice_q_vector so that it
      is possible to iterate over the newly introduced ice_tx_ring.
      
      Remove @size, as it is only accessed from the control path and can be
      calculated easily.
      
      Change the definitions of ice_update_ring_stats and
      ice_fetch_u64_stats_per_ring so that they are ring-agnostic and can
      be used for both Rx and Tx rings.
      
      The Rx and Tx ring structs are 256 and 192 bytes, respectively. In
      the Rx ring, xdp_rxq_info occupies its own cacheline, which is now
      the major difference between them.
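      
      A minimal sketch of what the ring container union could look like
      (simplified; surrounding fields are omitted and the exact layout is
      an assumption):
      
        struct ice_ring_container {
                union {
                        struct ice_tx_ring *tx_ring; /* Tx iteration */
                        struct ice_rx_ring *rx_ring; /* Rx iteration */
                };
                /* ... ITR settings, stats, and other fields ... */
        };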
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: remove ring_active from ice_ring · e93d1c37
      Authored by Maciej Fijalkowski
      This field is dead; the driver makes no use of it. Simply remove it.
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Gurucharan G <gurucharanx.g@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  3. 14 October 2021: 1 commit
  4. 08 October 2021: 3 commits
  5. 29 September 2021: 1 commit
  6. 18 June 2021: 1 commit
  7. 11 June 2021: 3 commits
    • ice: enable transmit timestamps for E810 devices · ea9b847c
      Authored by Jacob Keller
      Add support for enabling Tx timestamp requests for outgoing packets on
      E810 devices.
      
      The ice hardware can support multiple outstanding Tx timestamp
      requests. When sending a descriptor to hardware, a Tx timestamp
      request is made by setting a request bit and assigning an index that
      selects which Tx timestamp slot the timestamp will be stored in.
      
      Hardware makes no effort to synchronize the index use, so it is up to
      software to ensure that Tx timestamp indexes are not re-used before the
      timestamp is reported back.
      
      To do this, introduce a Tx timestamp tracker which will keep track of
      currently in-use indexes.
      
      In the hot path, if a packet has a timestamp request, an index is
      requested from the tracker. Unfortunately, this requires a lock, as
      the indexes are shared across all queues on a PHY. There are not
      enough indexes to reliably assign only one to each queue.
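      
      A minimal sketch of such an index request, assuming a bitmap-based
      tracker with hypothetical tx->lock, tx->in_use, and tx->len fields:
      
        /* grab a free Tx timestamp index, or tx->len if none are free */
        spin_lock(&tx->lock);
        idx = find_first_zero_bit(tx->in_use, tx->len);
        if (idx < tx->len)
                set_bit(idx, tx->in_use);
        spin_unlock(&tx->lock);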
      
      For the E810 devices, the timestamp indexes are not shared across PHYs,
      so each port can have its own tracking.
      
      Once hardware captures a timestamp, an interrupt is fired. In this
      interrupt, trigger a new work item that will figure out which timestamp
      was completed, and report the timestamp back to the stack.
      
      This function loops through the Tx timestamp indexes and checks whether
      there is now a valid timestamp. If so, it clears the PHY timestamp
      indication in the PHY memory, locks and removes the SKB and bit in the
      tracker, then reports the timestamp to the stack.
      
      It is possible in some cases that a timestamp request will be initiated
      but never completed. This might occur if the packet is dropped by
      software or hardware before it reaches the PHY.
      
      Add a task to the periodic work function that will check whether
      a timestamp request is more than a few seconds old. If so, the timestamp
      index is cleared in the PHY, and the SKB is released.
      
      Just as with Rx timestamps, the Tx timestamps are only 40 bits wide, and
      use the same overall logic for extending to 64 bits of nanoseconds.
      
      With this change, E810 devices should be able to perform basic PTP
      functionality.
      
      Future changes will extend the support to cover the E822-based devices.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: enable receive hardware timestamping · 77a78115
      Authored by Jacob Keller
      Add SIOCGHWTSTAMP and SIOCSHWTSTAMP ioctl handlers to respond to
      requests to enable timestamping support. If the request is for enabling
      Rx timestamps, set a bit in the Rx descriptors to indicate that receive
      timestamps should be reported.
      
      Hardware captures receive timestamps in the PHY which only captures part
      of the timer, and reports only 40 bits into the Rx descriptor. The upper
      32 bits represent the contents of GLTSYN_TIME_L at the point of packet
      reception, while the lower 8 bits represent the upper 8 bits of
      GLTSYN_TIME_0.
      
      The networking and PTP stacks expect 64-bit timestamps in
      nanoseconds. To support this, implement some logic to extend the
      timestamps by using the full PHC time.
      
      If the Rx timestamp was captured prior to the PHC time, then the real
      timestamp is
      
        PHC - (lower_32_bits(PHC) - timestamp)
      
      If the Rx timestamp was captured after the PHC time, then the real
      timestamp is
      
        PHC + (timestamp - lower_32_bits(PHC))
      
      These calculations are correct as long as neither the PHC timestamp
      nor the Rx timestamps are more than 2^32-1 nanoseconds old. Further,
      we can detect whether the Rx timestamp is before or after the PHC as
      long as the PHC timestamp is no more than 2^31-1 nanoseconds old.
      
      In that case, we calculate the delta between the lower 32 bits of the
      PHC and the Rx timestamp. If it's larger than 2^31-1 then the Rx
      timestamp must have been captured in the past. If it's smaller, then the
      Rx timestamp must have been captured after PHC time.
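      
      A minimal sketch of this extension logic (a simplified stand-in for
      the function described below, assuming the cached PHC time is recent
      enough):
      
        static u64 ptp_extend_32b_ts(u64 cached_phc_time, u32 in_tstamp)
        {
                u32 phc_lo = lower_32_bits(cached_phc_time);
                u32 delta = in_tstamp - phc_lo;
      
                /* a delta above 2^31-1 means the Rx timestamp predates
                 * the cached PHC sample
                 */
                if (delta > (U32_MAX / 2))
                        return cached_phc_time - (phc_lo - in_tstamp);
      
                return cached_phc_time + delta;
        }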
      
      Add an ice_ptp_extend_32b_ts function that relies on a cached copy of
      the PHC time and implements this algorithm to calculate the proper
      upper 32 bits of the Rx timestamps.
      
      Cache the PHC time periodically in all of the Rx rings. This enables
      each Rx ring to simply call the extension function with a recent copy of
      the PHC time. By ensuring that the PHC time is kept up to date
      periodically, we ensure this algorithm doesn't use stale data and
      produce incorrect results.
      
      To cache the time, introduce a kthread worker and a kthread work item
      that periodically store the Rx time. It might seem like we should use
      the .do_aux_work interface of the PTP clock. That doesn't work
      because all PFs must cache this time, but only one PF owns the PTP
      clock device.
      
      Thus, the ice driver will manage its own kthread instead of relying on
      the PTP do_aux_work handler.
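      
      A minimal sketch of that periodic caching arrangement (the field
      names, the read_phc_time_ns() helper, and the refresh interval are
      all assumptions):
      
        /* from the delayed work function: refresh the cached PHC time,
         * then re-arm the work on the driver-owned kthread worker
         */
        WRITE_ONCE(pf->ptp.cached_phc_time, read_phc_time_ns(pf));
        kthread_queue_delayed_work(pf->ptp.kworker, &pf->ptp.work,
                                   msecs_to_jiffies(500));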
      
      With this change, the driver can now report Rx timestamps on all
      incoming packets.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
    • ice: add support for sideband messages · 8f5ee3c4
      Authored by Jacob Keller
      In order to support certain device features, including enabling the PTP
      hardware clock, the ice driver needs to control some registers on the
      device PHY.
      
      These registers are accessed by sending sideband messages. For some
      hardware, these messages must be sent over the device admin queue, while
      other hardware has a dedicated control queue for the sideband messages.
      
      Add the neighbor device message structure for sending a message to the
      neighboring device. Where supported, initialize the sideband control
      queue and handle cleanup.
      
      Add a wrapper function for sending sideband control queue messages that
      read or write a neighboring device register.
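      
      A minimal sketch of a sideband message layout such a wrapper might
      take (the struct and field names here are illustrative assumptions):
      
        /* read or write a single register on a neighboring device */
        struct sbq_msg {
                u8  dest_dev;      /* neighboring device to address */
                u8  opcode;        /* read or write */
                u16 msg_addr_low;  /* register address, low 16 bits */
                u16 msg_addr_high; /* register address, high 16 bits */
                u32 data;          /* value to write, or value read back */
        };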
      
      Because some devices send sideband messages over the AdminQ, also
      increase the length of the admin queue to allow more messages to be
      queued up. This is important because the sideband messages put
      additional pressure on AQ usage.
      
      This support will be used in the following patches to enable support
      for CONFIG_PTP_1588_CLOCK.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  8. 10 June 2021: 1 commit
  9. 07 June 2021: 5 commits
  10. 04 June 2021: 1 commit
  11. 03 June 2021: 1 commit
    • ice: track AF_XDP ZC enabled queues in bitmap · e102db78
      Authored by Maciej Fijalkowski
      Commit c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      silently introduced a regression and broke the Tx side of AF_XDP in
      copy mode. The xsk_pool on ice_ring is set based only on the
      existence of an XDP prog on the VSI, which in turn causes
      ice_clean_tx_irq_zc to be executed. That should not happen in copy
      mode, which should use the regular data path ice_clean_tx_irq.
      
      This results in the following splat when xdpsock is run in the txonly
      or l2fwd scenarios in copy mode:
      
      <snip>
      [  106.050195] BUG: kernel NULL pointer dereference, address: 0000000000000030
      [  106.057269] #PF: supervisor read access in kernel mode
      [  106.062493] #PF: error_code(0x0000) - not-present page
      [  106.067709] PGD 0 P4D 0
      [  106.070293] Oops: 0000 [#1] PREEMPT SMP NOPTI
      [  106.074721] CPU: 61 PID: 0 Comm: swapper/61 Not tainted 5.12.0-rc2+ #45
      [  106.081436] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
      [  106.092027] RIP: 0010:xp_raw_get_dma+0x36/0x50
      [  106.096551] Code: 74 14 48 b8 ff ff ff ff ff ff 00 00 48 21 f0 48 c1 ee 30 48 01 c6 48 8b 87 90 00 00 00 48 89 f2 81 e6 ff 0f 00 00 48 c1 ea 0c <48> 8b 04 d0 48 83 e0 fe 48 01 f0 c3 66 66 2e 0f 1f 84 00 00 00 00
      [  106.115588] RSP: 0018:ffffc9000d694e50 EFLAGS: 00010206
      [  106.120893] RAX: 0000000000000000 RBX: ffff88984b8c8a00 RCX: ffff889852581800
      [  106.128137] RDX: 0000000000000006 RSI: 0000000000000000 RDI: ffff88984cd8b800
      [  106.135383] RBP: ffff888123b50001 R08: ffff889896800000 R09: 0000000000000800
      [  106.142628] R10: 0000000000000000 R11: ffffffff826060c0 R12: 00000000000000ff
      [  106.149872] R13: 0000000000000000 R14: 0000000000000040 R15: ffff888123b50018
      [  106.157117] FS:  0000000000000000(0000) GS:ffff8897e0f40000(0000) knlGS:0000000000000000
      [  106.165332] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.171163] CR2: 0000000000000030 CR3: 000000000560a004 CR4: 00000000007706e0
      [  106.178408] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  106.185653] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [  106.192898] PKRU: 55555554
      [  106.195653] Call Trace:
      [  106.198143]  <IRQ>
      [  106.200196]  ice_clean_tx_irq_zc+0x183/0x2a0 [ice]
      [  106.205087]  ice_napi_poll+0x3e/0x590 [ice]
      [  106.209356]  __napi_poll+0x2a/0x160
      [  106.212911]  net_rx_action+0xd6/0x200
      [  106.216634]  __do_softirq+0xbf/0x29b
      [  106.220274]  irq_exit_rcu+0x88/0xc0
      [  106.223819]  common_interrupt+0x7b/0xa0
      [  106.227719]  </IRQ>
      [  106.229857]  asm_common_interrupt+0x1e/0x40
      </snip>
      
      Fix this by introducing a bitmap of zero-copy enabled queues, where
      each bit, corresponding to a queue id that an xsk pool is being
      configured on, is set/cleared within ice_xsk_pool_{en,dis}able and
      checked within ice_xsk_pool(). The latter is the function used to
      decide which napi poll routine is executed. The idea is taken from
      our other drivers, such as i40e and ixgbe.
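      
      A minimal sketch of that check (the af_xdp_zc_qps bitmap name is an
      assumption modeled on the i40e/ixgbe approach mentioned above):
      
        /* return the xsk pool only if ZC was enabled for this queue id */
        static struct xsk_buff_pool *ice_xsk_pool(struct ice_ring *ring)
        {
                struct ice_vsi *vsi = ring->vsi;
                u16 qid = ring->q_index;
      
                if (!test_bit(qid, vsi->af_xdp_zc_qps))
                        return NULL;
      
                return xsk_get_pool_from_qid(vsi->netdev, qid);
        }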
      
      Fixes: c7a21904 ("ice: Remove xsk_buff_pool from VSI structure")
      Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
      Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
  12. 29 May 2021: 1 commit
  13. 15 April 2021: 8 commits
  14. 08 April 2021: 1 commit
  15. 01 April 2021: 5 commits