Commits · aae425efdfd1b1d8452260a3cb49344ebf20b1f5 · openeuler / Kernel

14 Oct, 2022 1 commit

Committed by Jan Sokolowski on Oct 12, 2022

During reallocation of RX buffers, new DMA mappings are created for
those buffers.

steps for reproduction:
while :
do
for ((i=0; i<=8160; i=i+32))
do
ethtool -G enp130s0f0 rx $i tx $i
sleep 0.5
ethtool -g enp130s0f0
done
done

This resulted in a crash:
i40e 0000:01:00.1: Unable to allocate memory for the Rx descriptor ring, size=65536
Driver BUG
WARNING: CPU: 0 PID: 4300 at net/core/xdp.c:141 xdp_rxq_info_unreg+0x43/0x50
Call Trace:
i40e_free_rx_resources+0x70/0x80 [i40e]
i40e_set_ringparam+0x27c/0x800 [i40e]
ethnl_set_rings+0x1b2/0x290
genl_family_rcv_msg_doit.isra.15+0x10f/0x150
genl_family_rcv_msg+0xb3/0x160
? rings_fill_reply+0x1a0/0x1a0
genl_rcv_msg+0x47/0x90
? genl_family_rcv_msg+0x160/0x160
netlink_rcv_skb+0x4c/0x120
genl_rcv+0x24/0x40
netlink_unicast+0x196/0x230
netlink_sendmsg+0x204/0x3d0
sock_sendmsg+0x4c/0x50
__sys_sendto+0xee/0x160
? handle_mm_fault+0xbe/0x1e0
? syscall_trace_enter+0x1d3/0x2c0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7f5eac8b035b
Missing register, driver bug
WARNING: CPU: 0 PID: 4300 at net/core/xdp.c:119 xdp_rxq_info_unreg_mem_model+0x69/0x140
Call Trace:
xdp_rxq_info_unreg+0x1e/0x50
i40e_free_rx_resources+0x70/0x80 [i40e]
i40e_set_ringparam+0x27c/0x800 [i40e]
ethnl_set_rings+0x1b2/0x290
genl_family_rcv_msg_doit.isra.15+0x10f/0x150
genl_family_rcv_msg+0xb3/0x160
? rings_fill_reply+0x1a0/0x1a0
genl_rcv_msg+0x47/0x90
? genl_family_rcv_msg+0x160/0x160
netlink_rcv_skb+0x4c/0x120
genl_rcv+0x24/0x40
netlink_unicast+0x196/0x230
netlink_sendmsg+0x204/0x3d0
sock_sendmsg+0x4c/0x50
__sys_sendto+0xee/0x160
? handle_mm_fault+0xbe/0x1e0
? syscall_trace_enter+0x1d3/0x2c0
__x64_sys_sendto+0x24/0x30
do_syscall_64+0x5b/0x1a0
entry_SYSCALL_64_after_hwframe+0x65/0xca
RIP: 0033:0x7f5eac8b035b

This happened because new buffers with a different RX ring count should
substitute the older ones, but those buffers were freed in
i40e_configure_rx_ring and reallocated again with i40e_alloc_rx_bi,
so the kfree on rx_bi leaked the already mapped DMA.

Fix this by reallocating ZC with the rx_bi_zc struct when a BPF program loads.
Additionally, reallocate back to rx_bi when the BPF program unloads.

If a BPF program is loaded/unloaded and XSK pools are created, reallocate
RX queues accordingly in the XDP_SETUP_XSK_POOL handler.

Fixes: be1222b5 ("i40e: Separate kernel allocated rx_bi rings from AF_XDP rings")
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Chandan <chandanx.rout@intel.com> (A Contingent Worker at Intel)
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: David S. Miller <davem@davemloft.net>

aae425ef

03 Sep, 2022 1 commit

i40e: Fix ADQ rate limiting for PF · 45bb006d

Committed by Przemyslaw Patynowski on Aug 09, 2022

Fix HW rate limiting for ADQ.
Fall back to kernel queue selection for ADQ, as it is the network stack
that decides which queue to use for transmit with ADQ configured.
Reset the PF after creation of the VMDq2 VSIs required for ADQ, so as to
reprogram TX queue contexts in i40e_configure_tx_ring.
Without this patch, the PF would limit TX rate only according to TC0.

Fixes: a9ce82f7 ("i40e: Enable 'channel' mode in mqprio for TC configs")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

45bb006d

16 Aug, 2022 1 commit

i40e: Fix tunnel checksum offload with fragmented traffic · 2c648209

Committed by Przemyslaw Patynowski on Jul 27, 2022

Fix checksum offload on VXLAN tunnels.
When the MPLS protocol is not used, set the L4 header to the transport
header of the skb. This fixes the case where a user tries to offload
checksums of VXLAN-tunneled traffic.

Steps for reproduction (requires link partner with tunnels):
ip l s enp130s0f0 up
ip a f enp130s0f0
ip a a 10.10.110.2/24 dev enp130s0f0
ip l s enp130s0f0 mtu 1600
ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev \
enp130s0f0 dstport 4789
ip l s vxlan12_sut up
ip a a 20.10.110.2/24 dev vxlan12_sut
iperf3 -c 20.10.110.1 #should connect

Without this patch, the TX descriptor used wrong data because the
L4 header pointed to the wrong address. The NIC would then drop those
packets internally, due to the incorrect TX descriptor data, which
increased the GLV_TEPC register.

Fixes: b4fb2d33 ("i40e: Add support for MPLS + TSO")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Tested-by: Marek Szlosek <marek.szlosek@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

2c648209

01 Jul, 2022 1 commit

intel: remove unused macros · fda35af9

Committed by Jesse Brandeburg on Jun 24, 2022

As found by the compile option -Wunused-macros, remove these macros
that are never used by the code.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

fda35af9

25 Jun, 2022 1 commit

i40e: read the XDP program once per NAPI · 78f31931

Committed by Ciara Loftus on Jun 23, 2022

Similar to how it's done in the ice driver since 'eb087cd8 ("ice:
propagate xdp_ring onto rx_ring")', read the XDP program once per NAPI
instead of once per descriptor cleaned. I measured an improvement in
throughput of 2% for the AF_XDP xdpsock l2fwd benchmark for zero copy mode
and 1% for copy mode.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Link: https://lore.kernel.org/r/20220623100852.7867-1-ciara.loftus@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

78f31931
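
The change in 78f31931 amounts to hoisting a read out of the per-descriptor loop. A minimal Python model of that idea might look like the following; `ToyRing`, `read_prog`, and both `poll_*` functions are hypothetical stand-ins for illustration and are not i40e code.

```python
class ToyRing:
    """Toy Rx ring; xdp_prog stands in for the ring's XDP program pointer."""

    def __init__(self, xdp_prog):
        self._xdp_prog = xdp_prog
        self.prog_reads = 0  # how many times the program pointer was read

    def read_prog(self):
        self.prog_reads += 1
        return self._xdp_prog


def poll_per_descriptor(ring, descriptors):
    """Old pattern: read the program for every descriptor cleaned."""
    results = []
    for desc in descriptors:
        prog = ring.read_prog()
        results.append(prog(desc))
    return results


def poll_once_per_napi(ring, descriptors):
    """New pattern: read the program once and reuse it for the whole poll."""
    prog = ring.read_prog()
    return [prog(desc) for desc in descriptors]
```

Both functions produce the same per-descriptor results; only the number of program reads per poll differs, which is the kind of saving the commit message attributes its throughput gain to.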

22 Jun, 2022 1 commit

intel/i40e: delete if NULL check before dev_kfree_skb · 56878d49

Committed by Bernard Zhao on May 10, 2022

dev_kfree_skb checks whether the input parameter is NULL and does the
right thing, so there is no need to check again.
This change cleans up the code a bit.
Signed-off-by: Bernard Zhao <zhaojunkui2008@126.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

56878d49

15 Jun, 2022 1 commit

i40e: add xdp frags support to ndo_xdp_xmit · fe63ec97

Committed by Lorenzo Bianconi on Jun 13, 2022

Add the capability to map non-linear xdp frames in XDP_TX and ndo_xdp_xmit
callback.
Tested-by: Sarkar Tirthendu <tirthendu.sarkar@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fe63ec97

13 Apr, 2022 2 commits

i40e: Add tx_stopped stat · f728fa01

Committed by Joe Damato on Mar 24, 2022

Track TX queue stop events and export the new stat with ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

f728fa01

i40e: Add support for MPLS + TSO · b4fb2d33

Committed by Joe Damato on Mar 02, 2022

This change adds support for TSO of MPLS packets.

In my tests with tcpdump it seems to work. Note this test setup has
a 9000 byte MTU:

MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234:
  Flags [P.], seq 593345:644401, ack 0, win 420,
  options [nop,nop,TS val 45022534 ecr 1722291395], length 51056

IP dstip.1234 > srcip.50086: Flags [.], ack 593345, win 122,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 602289, win 105,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 620177, win 71,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

MPLS (label 100, exp 0, [S], ttl 64) IP srcip.50086 > dstip.1234:
  Flags [P.], seq 644401:655953, ack 0, win 420,
  options [nop,nop,TS val 45022534 ecr 1722291395], length 11552

IP dstip.1234 > srcip.50086: Flags [.], ack 638065, win 37,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 644401, win 25,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 653345, win 8,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0

IP dstip.1234 > srcip.50086: Flags [.], ack 655953, win 3,
  options [nop,nop,TS val 1722291395 ecr 45022534], length 0
Signed-off-by: Joe Damato <jdamato@fastly.com>
Co-developed-by: Mike Gallo <mgallo@fastly.com>
Signed-off-by: Mike Gallo <mgallo@fastly.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

b4fb2d33

09 Feb, 2022 4 commits

i40e: Add a stat for tracking busy rx pages · b76bc129

Committed by Joe Damato on Dec 17, 2021

In some cases, pages cannot be reused by i40e because the page is busy. Add
a counter for this event.

Busy page count is accessible via ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

b76bc129

i40e: Add a stat for tracking pages waived · cb963b98

Committed by Joe Damato on Dec 17, 2021

In some cases, pages cannot be reused because they are not associated with
the correct NUMA zone. Knowing how often pages are waived helps users to
understand the interaction between the driver's memory usage and their
system.

Pass rx_stats through to i40e_can_reuse_rx_page to allow tracking when
pages are waived.

The page waive count is accessible via ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

cb963b98

i40e: Add a stat tracking new RX page allocations · 453f8305

Committed by Joe Damato on Dec 17, 2021

Add a counter for new page allocations in the i40e RX path. This stat is
accessible with ethtool.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

453f8305

i40e: Remove rx page reuse double count · 89bb0983

Committed by Joe Damato on Dec 17, 2021

Page reuse was being tracked from two locations:
  - i40e_reuse_rx_page (via i40e_clean_rx_irq), and
  - i40e_alloc_mapped_page

Remove the double count and only count reuse from i40e_alloc_mapped_page
when the page is about to be reused.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

89bb0983
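
The double count described in 89bb0983 can be illustrated with a small Python model. The function names below echo the two call sites named in the commit message, but their bodies, the `count_here` switch, and the `RxStats` class are assumptions made purely for this sketch.

```python
from dataclasses import dataclass


@dataclass
class RxStats:
    page_reuse_count: int = 0


def reuse_rx_page(stats, count_here):
    """Clean path: hands the buffer back; the old code also counted here."""
    if count_here:
        stats.page_reuse_count += 1


def alloc_mapped_page(stats, has_page):
    """Refill path: a buffer that still holds a page is about to be reused."""
    if has_page:
        stats.page_reuse_count += 1
        return True
    return False  # a fresh page would be allocated instead


def recycle_one_page(count_in_clean_path):
    """Run one page through both paths and report the resulting counter."""
    stats = RxStats()
    reuse_rx_page(stats, count_in_clean_path)
    alloc_mapped_page(stats, has_page=True)
    return stats.page_reuse_count
```

With counting enabled in both places, a single recycled page yields a count of 2; counting only in the refill path yields 1, which matches the behaviour the commit describes as the goal.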

28 Jan, 2022 1 commit

i40e: xsk: Move tmp desc array from driver to pool · d1bc532e

Committed by Magnus Karlsson on Jan 25, 2022

Move desc_array from the driver to the pool. The reason behind this is
that we can then reuse this array as a temporary storage for descriptors
in all zero-copy drivers that use the batched interface. This will make
it easier to add batching to more drivers.

i40e is the only driver that has a batched Tx zero-copy
implementation, so no need to touch any other driver.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/bpf/20220125160446.78976-6-maciej.fijalkowski@intel.com

d1bc532e

29 Dec, 2021 1 commit

i40e: switch to napi_build_skb() · 6e19cf7d

Committed by Alexander Lobakin on Nov 23, 2021

napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
i40e driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

6e19cf7d

14 Dec, 2021 1 commit

bpf: Let bpf_warn_invalid_xdp_action() report more info · c8064e5b

Committed by Paolo Abeni on Nov 30, 2021

In non-trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If users need additional information, they can fetch it via a kernel
probe, leveraging the arguments added here.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com

c8064e5b

20 Aug, 2021 1 commit

i40e: Fix ATR queue selection · a222be59

Committed by Arkadiusz Kubalewski on Aug 18, 2021

Without this patch, ATR does not work. Receive/transmit uses queue
selection based on SW DCB hashing method.

If traffic classes are not configured for PF, then use
netdev_pick_tx function for selecting queue for packet transmission.
Instead of calling i40e_swdcb_skb_tx_hash, call netdev_pick_tx,
which ensures that packet is transmitted/received from CPU that is
running the application.

Reproduction steps:
1. Load the i40e driver
2. Map each MSI interrupt of the i40e port to a CPU
3. Disable ntuple, enable ATR, e.g.:
ethtool -K $interface ntuple off
ethtool --set-priv-flags $interface flow-director-atr
4. Run an application that generates traffic and is bound to a
single CPU, e.g.:
taskset -c 9 netperf -H 1.1.1.1 -t TCP_RR -l 10
5. Observe behavior:
The application's traffic should be restricted to the CPU provided in
taskset.

Fixes: 89ec1f08 ("i40e: Fix queue-to-TC mapping on Tx")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a222be59

23 Jul, 2021 1 commit

i40e: Fix queue-to-TC mapping on Tx · 89ec1f08

Committed by Jedrzej Jagielski on Jun 02, 2021

In SW DCB mode the packets sent receive incorrect UP tags. They are
constructed correctly and put into tx_ring, but UP is later remapped by
HW on the basis of TCTUPR register contents according to Tx queue
selected, and BW used is consistent with the new UP values. This is
caused by Tx queue selection in kernel not taking into account DCB
configuration. This patch fixes the issue by implementing the
ndo_select_queue NDO callback.

Fixes: fd0a05ce ("i40e: transmit, receive, and NAPI")
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Imam Hassan Reza Biswas <imam.hassan.reza.biswas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

89ec1f08
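
To make the idea behind 89ec1f08 concrete, here is a toy Python model of a DCB-aware queue choice: the packet's user priority (UP) selects a traffic class, and the queue is then picked only from that class's queue range. The `select_queue` function, both mapping tables, and the hash-modulo spreading are assumptions for illustration and do not reproduce the driver's actual ndo_select_queue callback.

```python
def select_queue(priority, flow_hash, prio_to_tc, tc_queues):
    """Pick a Tx queue inside the queue range of the packet's traffic class.

    prio_to_tc maps a user priority to a traffic class index.
    tc_queues maps a traffic class index to (first_queue, queue_count).
    """
    tc = prio_to_tc[priority]
    first_queue, queue_count = tc_queues[tc]
    # Spread flows across the class's queues without leaving its range.
    return first_queue + flow_hash % queue_count


# Hypothetical configuration: two traffic classes with four queues each.
PRIO_TO_TC = {0: 0, 3: 1}
TC_QUEUES = {0: (0, 4), 1: (4, 4)}
```

With this configuration, a priority-3 packet always lands on queues 4 to 7, so a queue-based UP remap of the kind the commit describes would leave its tag unchanged.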

25 Jun, 2021 1 commit

intel: Remove rcu_read_lock() around XDP program invocation · 49589b23

Committed by Toke Høiland-Jørgensen on Jun 24, 2021

The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around
XDP program invocations. However, the actual lifetime of the objects
referred by the XDP program invocation is longer, all the way through to
the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
small. This turns out to be harmless because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), but it makes the
rcu_read_lock() misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20210624160609.292325-12-toke@redhat.com

49589b23

03 Jun, 2021 1 commit

i40e: add correct exception tracing for XDP · f6c10b48

Committed by Magnus Karlsson on May 10, 2021

Add missing exception tracing to XDP when a number of different errors
can occur. The support was only partial. Several errors were not
logged, which would confuse users, who would not know where and
why the packets disappeared.

Fixes: 74608d17 ("i40e: add support for XDP_TX action")
Fixes: 0a714186 ("i40e: add AF_XDP zero-copy Rx support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

f6c10b48

08 May, 2021 1 commit

i40e: fix broken XDP support · ae4393df

Committed by Magnus Karlsson on Apr 26, 2021

Commit 12738ac4 ("i40e: Fix sparse errors in i40e_txrx.c") broke
XDP support in the i40e driver. That commit was fixing a sparse error
in the code by introducing a new variable xdp_res instead of
overloading this into the skb pointer. The problem is that the code
later uses the skb pointer in if statements and these were not
extended to also test for the new xdp_res variable. Fix this by adding
the correct tests for xdp_res in these places.

The skb pointer was used to store the result of the XDP program by
overloading the results in the error pointer
ERR_PTR(-result). Therefore, the allocation failure test that used to
only test for !skb now needs to be extended to also consider !xdp_res.

i40e_cleanup_headers() had a check based on the skb value being
an error pointer, i.e. a result from the XDP program != XDP_PASS, and
if so started to process a new packet immediately, instead of populating
skb fields and sending the skb to the stack. This check is not needed
anymore, since we have added an explicit test for xdp_res being set
and if so just continue to pick the next packet from the NIC.

Fixes: 12738ac4 ("i40e: Fix sparse errors in i40e_txrx.c")
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ae4393df
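
The regression fixed in ae4393df can be sketched in Python as a receive loop that tracks the XDP result separately from the skb but still tests only the skb. This is a heavily simplified model: the result codes, `run_xdp`, `build_skb`, and both loop functions are hypothetical and only mirror the shape of the checks described in the commit message.

```python
XDP_PASS = 0      # hypothetical result codes for this sketch
XDP_CONSUMED = 1


def clean_rx_buggy(packets, run_xdp, build_skb):
    """xdp_res is a separate variable, but the failure test only checks skb."""
    delivered, alloc_failed = [], False
    for pkt in packets:
        xdp_res = run_xdp(pkt)
        skb = build_skb(pkt) if xdp_res == XDP_PASS else None
        if skb is None:
            # Wrongly treats "XDP handled the packet" as an allocation failure.
            alloc_failed = True
            break
        delivered.append(skb)
    return delivered, alloc_failed


def clean_rx_fixed(packets, run_xdp, build_skb):
    """Test xdp_res explicitly and move on to the next packet when it is set."""
    delivered, alloc_failed = [], False
    for pkt in packets:
        xdp_res = run_xdp(pkt)
        if xdp_res != XDP_PASS:
            continue  # XDP consumed the packet; nothing to hand to the stack
        skb = build_skb(pkt)
        if skb is None:
            alloc_failed = True  # a genuine allocation failure
            break
        delivered.append(skb)
    return delivered, alloc_failed
```

In the buggy variant, the first packet that XDP drops or redirects stops the whole poll with a spurious allocation failure; the fixed variant keeps processing the remaining packets.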

09 Apr, 2021 1 commit

i40e: Fix sparse errors in i40e_txrx.c · 12738ac4

Committed by Arkadiusz Kubalewski on Mar 26, 2021

Remove error handling through pointers. Instead use plain int
to return value from i40e_run_xdp(...).

Previously:
- sparse errors were produced during compilation:
i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing possible ERR_PTR()

- sk_buff* was used to return the value, but it never held a valid
pointer to an sk_buff. The returned value was always an int handled as
a pointer.

Fixes: 0c8493d9 ("i40e: add XDP support for pass and drop actions")
Fixes: 2e689312 ("i40e: split XDP_TX tail and XDP_REDIRECT map flushing")
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: Dave Switzer <david.switzer@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

12738ac4

24 Mar, 2021 1 commit

intel: clean up mismatched header comments · 262de08f

Committed by Jesse Brandeburg on Mar 18, 2021

A bunch of header comments were showing warnings when compiling
with W=1. Fix them all at once. This changes only comments.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

262de08f

18 Mar, 2021 1 commit

bpf, devmap: Move drop error path to devmap for XDP_REDIRECT · fdc13979

Committed by Lorenzo Bianconi on Mar 08, 2021

We want to change the current ndo_xdp_xmit drop semantics because it will
allow us to implement better queue overflow handling. This is working
towards the larger goal of an XDP TX queue-hook. Move XDP_REDIRECT error
path handling from each XDP ethernet driver to devmap code. According to
the new APIs, the driver running the ndo_xdp_xmit pointer will break the tx
loop whenever the hw reports a tx error, and it will just return to the devmap
caller the number of successfully transmitted frames. It will be devmap's
responsibility to free dropped frames.

Move each XDP ndo_xdp_xmit capable driver to the new APIs:

- veth
- virtio-net
- mvneta
- mvpp2
- socionext
- amazon ena
- bnxt
- freescale (dpaa2, dpaa)
- xen-frontend
- qede
- ice
- igb
- ixgbe
- i40e
- mlx5
- ti (cpsw, cpsw-new)
- tun
- sfc
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org

fdc13979
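
The division of labour described in fdc13979 can be modelled in a few lines of Python: the driver stops at the first transmit error and reports how many frames went out, and the caller frees the rest. `driver_xdp_xmit`, `devmap_flush`, `hw_send`, and `free_frame` are hypothetical names for this sketch, not kernel APIs.

```python
def driver_xdp_xmit(frames, hw_send):
    """Driver side: break out of the tx loop on the first hardware error.

    Returns the number of frames that were transmitted successfully.
    """
    sent = 0
    for frame in frames:
        if not hw_send(frame):
            break
        sent += 1
    return sent


def devmap_flush(frames, hw_send, free_frame):
    """Caller side: free every frame the driver did not transmit."""
    sent = driver_xdp_xmit(frames, hw_send)
    for frame in frames[sent:]:
        free_frame(frame)
    return sent
```

Note that the frames after the failing one are not attempted at all in this model; they are simply handed back to the caller as dropped, which keeps the free path in one place.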

12 Mar, 2021 1 commit

i40e: move headroom initialization to i40e_configure_rx_ring · a8660626

Committed by Maciej Fijalkowski on Mar 03, 2021

i40e_rx_offset(), which is supposed to initialize the Rx buffer headroom,
relies on the I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.

Currently, the call site of that function is placed incorrectly
within i40e_setup_rx_descriptors(), where the Rx ring's build skb flag is not
set yet. This causes XDP_REDIRECT to be partially broken due to the
inability to create an xdp_frame in the headroom space, as the headroom is
0.

For the record, below is the call graph:

i40e_vsi_open
 i40e_vsi_setup_rx_resources
  i40e_setup_rx_descriptors
   i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below

 i40e_vsi_configure_rx
  i40e_configure_rx_ring
   set_ring_build_skb_enabled(ring) <-- set build_skb flag

Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
the flag setting.

Fixes: f7bb0d71 ("i40e: store the result of i40e_rx_offset() onto i40e_ring")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

a8660626
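
The ordering problem in a8660626 is easy to reproduce in a toy Python model: a value derived from a flag is computed before the flag is set. The headroom constant, the `ToyRxRing` class, and the `compute_offset` switches are assumptions for illustration; only the call order follows the call graph quoted in the commit message.

```python
HYPOTHETICAL_HEADROOM = 256  # stand-in value, not the driver's constant


class ToyRxRing:
    def __init__(self):
        self.build_skb_enabled = False
        self.rx_offset = 0


def rx_offset(ring):
    """Headroom depends on whether the build_skb flag is already set."""
    return HYPOTHETICAL_HEADROOM if ring.build_skb_enabled else 0


def setup_rx_descriptors(ring, compute_offset):
    if compute_offset:  # old call site: the flag is still clear here
        ring.rx_offset = rx_offset(ring)


def configure_rx_ring(ring, compute_offset):
    ring.build_skb_enabled = True
    if compute_offset:  # new call site: runs after the flag is set
        ring.rx_offset = rx_offset(ring)


def open_vsi(offset_in_configure):
    """Mimic the open path: set up descriptors first, then configure the ring."""
    ring = ToyRxRing()
    setup_rx_descriptors(ring, compute_offset=not offset_in_configure)
    configure_rx_ring(ring, compute_offset=offset_in_configure)
    return ring.rx_offset
```

Calling `open_vsi(False)` models the old placement and leaves the headroom at 0, while `open_vsi(True)` models the fix and yields the non-zero headroom.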

20 Feb, 2021 1 commit

i40e: Fix endianness conversions · b32cddd2

Committed by Norbert Ciosek on Feb 05, 2021

Fixes the following sparse warnings:
i40e_main.c:5953:32: warning: cast from restricted __le16
i40e_main.c:8008:29: warning: incorrect type in assignment (different base types)
i40e_main.c:8008:29: expected unsigned int [assigned] [usertype] ipa
i40e_main.c:8008:29: got restricted __le32 [usertype]
i40e_main.c:8008:29: warning: incorrect type in assignment (different base types)
i40e_main.c:8008:29: expected unsigned int [assigned] [usertype] ipa
i40e_main.c:8008:29: got restricted __le32 [usertype]
i40e_txrx.c:1950:59: warning: incorrect type in initializer (different base types)
i40e_txrx.c:1950:59: expected unsigned short [usertype] vlan_tag
i40e_txrx.c:1950:59: got restricted __le16 [usertype] l2tag1
i40e_txrx.c:1953:40: warning: cast to restricted __le16
i40e_xsk.c:448:38: warning: invalid assignment: |=
i40e_xsk.c:448:38: left side has type restricted __le64
i40e_xsk.c:448:38: right side has type int

Fixes: 2f4b411a ("i40e: Enable cloud filters via tc-flower")
Fixes: 2a508c64 ("i40e: fix VLAN.TCI == 0 RX HW offload")
Fixes: 3106c580 ("i40e: Use batched xsk Tx interfaces to increase performance")
Fixes: 8f88b303 ("i40e: Add infrastructure for queue channel support")
Signed-off-by: Norbert Ciosek <norbertx.ciosek@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

b32cddd2

19 Feb, 2021 1 commit

i40e: Fix flow for IPv6 next header (extension header) · 92c60580

Committed by Slawomir Laba on Sep 10, 2020

When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.

The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with I40E_TX_CTX_EXT_IP_IPV6.
The extension header was not skipped at all.

The ipv6_skip_exthdr function checks whether the next header of the IPv6
header is an extension header, and doesn't modify the l4_proto pointer
if it points to a protocol header value, so it's safe to omit the
comparison of the exthdr and l4.hdr pointers. ipv6_skip_exthdr can
return the value -1. This means that the skipping process failed
and there is something wrong with the packet, so it will be dropped.

Fixes: a3fd9d88 ("i40e/i40evf: Handle IPv6 extension headers in checksum offload")
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

92c60580
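
The header walk that 92c60580 relies on can be approximated in Python. This is a simplified sketch of the general technique, not a port of the kernel's ipv6_skip_exthdr: it handles only the common extension header types, ignores fragment offsets, and the `skip_exthdr` name, its arguments, and its return shape are assumptions made here.

```python
# Next-header values treated as extension headers in this sketch:
# hop-by-hop (0), routing (43), fragment (44), AH (51), destination opts (60).
EXT_HEADERS = {0, 43, 44, 51, 60}
NEXTHDR_FRAGMENT = 44
NEXTHDR_AUTH = 51


def skip_exthdr(payload, nexthdr):
    """Walk extension headers in `payload` (the bytes after the fixed header).

    Returns (offset, protocol) of the first non-extension header, or -1 when
    a header is truncated, loosely mirroring the -1 failure value the commit
    message mentions.
    """
    offset = 0
    while nexthdr in EXT_HEADERS:
        if offset + 2 > len(payload):
            return -1
        following = payload[offset]          # byte 0: next header type
        if nexthdr == NEXTHDR_FRAGMENT:
            hdrlen = 8                       # fragment header is fixed size
        elif nexthdr == NEXTHDR_AUTH:
            hdrlen = (payload[offset + 1] + 2) * 4
        else:
            hdrlen = (payload[offset + 1] + 1) * 8
        if offset + hdrlen > len(payload):
            return -1
        offset += hdrlen
        nexthdr = following
    return offset, nexthdr
```

When the first header is already a protocol such as TCP (6), the loop body never runs and the function returns offset 0 with the protocol unchanged, which is the property the commit uses to drop the extra pointer comparison.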

13 Feb, 2021 3 commits

i40e: store the result of i40e_rx_offset() onto i40e_ring · f7bb0d71

Committed by Maciej Fijalkowski on Jan 18, 2021

Output of i40e_rx_offset() is based on ethtool's priv flag setting,
which when changed, causes PF reset (disables napi, frees irqs, loads
different Rx mem model, etc.). This means that within napi its result is
constant and there is no reason to call it per each processed frame.

Add new 'rx_offset' field to i40e_ring that is meant to hold the
i40e_rx_offset() result and use it within i40e_clean_rx_irq().
Furthermore, use it within i40e_alloc_mapped_page().

Last but not least, un-inline the function of interest so that compiler
makes the decision about inlining as it lives in .c file.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

f7bb0d71

i40e: adjust i40e_is_non_eop · d06e2f05

Committed by Maciej Fijalkowski on Jan 18, 2021

i40e_is_non_eop had a leftover comment and unused skb argument which was
used for placing the skb onto rx_buf in case when current buffer was
non-eop one. This is not relevant anymore as commit e72e5659
("i40e/i40evf: Moves skb from i40e_rx_buffer to i40e_ring") pulled the
non-complete skb handling out of rx_bufs up to rx_ring.  Therefore,
let's adjust the function arguments that i40e_is_non_eop takes.

Furthermore, since there is already a function responsible for bumping
the ntc, make use of that and drop that logic from i40e_is_non_eop so
that the scope of this function is limited to what the name actually
states.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

d06e2f05

i40e: drop misleading function comments · 4a14994a

Committed by Maciej Fijalkowski on Jan 18, 2021

i40e_cleanup_headers has a statement about check against skb being
linear or not which is not relevant anymore, so let's remove it.

Same case for i40e_can_reuse_rx_page, it references things that are not
present there anymore.
Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

4a14994a

11 Feb, 2021 2 commits

i40e: VLAN field for flow director · a9219b33

Committed by Przemyslaw Patynowski on Dec 18, 2020

Allow user to specify VLAN field and add it to flow director. Show VLAN
field in "ethtool -n ethx" command.
Handle VLAN type and tag field provided by ethtool command. Refactored
filter addition, by replacing static arrays with runtime dummy packet
creation, which allows specifying VLAN field.
Previously, VLAN field was omitted.
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

a9219b33

i40e: Add flow director support for IPv6 · efca91e8

Committed by Przemyslaw Patynowski on Dec 18, 2020

Flow director for IPv6 is not supported.
1) Implementation of support for IPv6 flow director.
2) Added handlers for addition of TCP6, UDP6, SCTP6, IPv6.
3) Refactored legacy code to make it more generic.
4) Added packet templates for TCP6, UDP6, SCTP6, IPv6.
5) Added handling of IPv6 source and destination address for flow director.
6) Improved argument passing for source and destination port in TCP6, UDP6
and SCTP6.
7) Added handling of ethtool -n for IPv6, TCP6, UDP6, SCTP6.
8) Used correct bit flag regarding FLEXOFF field of flow director data
descriptor.

Without this patch, there would be no support for flow director on IPv6,
TCP6, UDP6, SCTP6.
Tested based on x710 datasheet by using:
ethtool -N enp133s0f0 flow-type tcp4 src-port 13 dst-port 37 user-def 0x44142 action 1
ethtool -N enp133s0f0 flow-type tcp6 src-port 13 dst-port 40 user-def 0x44142 action 2
ethtool -N enp133s0f0 flow-type udp4 src-port 20 dst-port 40 user-def 0x44142 action 3
ethtool -N enp133s0f0 flow-type udp6 src-port 25 dst-port 40 user-def 0x44142 action 4
ethtool -N enp133s0f0 flow-type sctp4 src-port 55 dst-port 65 user-def 0x44142 action 5
ethtool -N enp133s0f0 flow-type sctp6 src-port 60 dst-port 40 user-def 0x44142 action 6
ethtool -N enp133s0f0 flow-type ip4 src-ip 1.1.1.1 dst-ip 1.1.1.4 user-def 0x44142 action 7
ethtool -N enp133s0f0 flow-type ip6 src-ip fe80::3efd:feff:fe6f:bbbb dst-ip fe80::3efd:feff:fe6f:aaaa user-def 0x44142 action 8
Then send traffic from client which matches the criteria provided to ethtool.
Observe that packets are redirected to user set queues with ethtool -S <interface>
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

efca91e8

05 Feb, 2021 1 commit

net: use the new dev_page_is_reusable() instead of private versions · a79afa78

Committed by Alexander Lobakin on Feb 02, 2021

Now we can remove a bunch of identical functions from the drivers and
make them use common dev_page_is_reusable(). All {,un}likely() checks
are omitted since it's already present in this helper.
Also update some comments near the call sites.
Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

a79afa78

09 Jan, 2021 2 commits

net, xdp: Introduce xdp_prepare_buff utility routine · be9df4af

Committed by Lorenzo Bianconi on Dec 22, 2020

Introduce xdp_prepare_buff utility routine to initialize per-descriptor
xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in
all XDP capable drivers.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

be9df4af

net, xdp: Introduce xdp_init_buff utility routine · 43b5169d

Committed by Lorenzo Bianconi on Dec 22, 2020

Introduce xdp_init_buff utility routine to initialize xdp_buff fields
const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on
xdp_init_buff in all XDP capable drivers.
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

43b5169d
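
The split between the two helpers from this entry and from be9df4af can be sketched in userspace as follows. This is a simplified model under stated assumptions: the struct layout and the parameter lists follow my reading of the upstream helpers, while `struct xdp_buff_model`, `struct rxq_model`, `init_buff_model()`, `prepare_buff_model()`, `poll_model()` and `HEADROOM_MODEL` are hypothetical names introduced only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define HEADROOM_MODEL 256 /* hypothetical headroom in front of packet data */

struct rxq_model {
	uint32_t queue_index;
};

/* Simplified stand-in for struct xdp_buff. */
struct xdp_buff_model {
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_meta;
	unsigned char *data_hard_start;
	struct rxq_model *rxq;
	uint32_t frame_sz;
};

/* Fields that stay constant for a whole NAPI poll: set once. */
static inline void init_buff_model(struct xdp_buff_model *xdp,
				   uint32_t frame_sz, struct rxq_model *rxq)
{
	xdp->frame_sz = frame_sz;
	xdp->rxq = rxq;
}

/* Fields that change per descriptor: set for every received frame. */
static inline void prepare_buff_model(struct xdp_buff_model *xdp,
				      unsigned char *hard_start, int headroom,
				      int data_len, bool meta_valid)
{
	unsigned char *data = hard_start + headroom;

	xdp->data_hard_start = hard_start;
	xdp->data = data;
	xdp->data_end = data + data_len;
	/* data_meta beyond data marks metadata as unsupported */
	xdp->data_meta = meta_valid ? data : data + 1;
}

/* Model of a poll loop; returns the payload bytes seen in this poll. */
static int poll_model(struct rxq_model *rxq, unsigned char *frames,
		      uint32_t frame_sz, const int *lens, int budget)
{
	struct xdp_buff_model xdp;
	int total = 0;

	init_buff_model(&xdp, frame_sz, rxq); /* once per poll */

	for (int i = 0; i < budget; i++) {
		prepare_buff_model(&xdp, frames + (size_t)i * frame_sz,
				   HEADROOM_MODEL, lens[i], true);
		total += (int)(xdp.data_end - xdp.data);
	}

	return total;
}
```

The design point is that the per-poll fields are written once outside the loop, and only the pointer fields are rewritten per descriptor.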

10 Dec, 2020 1 commit

i40e: avoid premature Rx buffer reuse · 75aab4e1

Committed by Björn Töpel on Aug 25, 2020

The page recycle code incorrectly relied on the assumption that a page
fragment could not be freed inside xdp_do_redirect(). As a result, page
fragments that are still used by the stack/XDP redirect can be reused
and overwritten.

To avoid this, store the page count prior to invoking xdp_do_redirect().

Longer explanation:

Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.

t0: Page is allocated, and put on the Rx ring
              +---------------
used by NIC ->| upper buffer
(rx_buffer)   +---------------
              | lower buffer
              +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX

t1: Buffer is received, and passed to the stack (e.g.)
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 1

t2: Buffer is received, and redirected
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------

Now, prior to calling xdp_do_redirect():
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

This means that buffer *cannot* be flipped/reused, because the skb is
still using it.

The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
  page count  == USHRT_MAX - 1
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!

To work around this, the page count is stored prior to calling
xdp_do_redirect().

Note that this is not optimal, since the NIC could actually reuse the
"lower buffer" again. However, then we need to track whether
XDP_REDIRECT consumed the buffer or not.

Fixes: d9314c47 ("i40e: add support for XDP_REDIRECT")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

75aab4e1
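
The t0 to t2 walkthrough in this commit message can be replayed with a small userspace model. It is only a sketch: it assumes the reuse test is "page count minus pagecnt_bias must not exceed 1", and `struct rx_buffer_model`, `can_reuse_model()`, `redirect_and_free_model()` and `reuse_decision_model()` are hypothetical names, not the driver's functions.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical model of the two counters the commit message tracks. */
struct rx_buffer_model {
	unsigned int page_count;   /* stand-in for the page's refcount */
	unsigned int pagecnt_bias; /* references still held by the driver */
};

/* Reusable only if at most one reference exists beyond the bias. */
static bool can_reuse_model(unsigned int pgcnt,
			    const struct rx_buffer_model *buf)
{
	return (pgcnt - buf->pagecnt_bias) <= 1;
}

/* Model of a redirect that frees the fragment: one page ref is dropped. */
static void redirect_and_free_model(struct rx_buffer_model *buf)
{
	buf->page_count--;
}

/*
 * Replays t0..t2. With snapshot_before_redirect == true the page count
 * is sampled before the redirect (the approach the commit describes);
 * with false it is read afterwards (the problematic ordering).
 */
static bool reuse_decision_model(bool snapshot_before_redirect)
{
	struct rx_buffer_model buf = { USHRT_MAX, USHRT_MAX }; /* t0 */
	unsigned int pgcnt;

	buf.pagecnt_bias--; /* t1: upper half handed to the stack as skb */
	buf.pagecnt_bias--; /* t2: lower half handed to XDP redirect */

	pgcnt = buf.page_count; /* sampled before the redirect */
	redirect_and_free_model(&buf);

	if (!snapshot_before_redirect)
		pgcnt = buf.page_count; /* stale logic: read after the free */

	return can_reuse_model(pgcnt, &buf);
}
```

In this model, `reuse_decision_model(false)` reports the buffer as reusable although the skb still owns the other half, while `reuse_decision_model(true)` refuses reuse, matching the counter values listed in the message.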

01 Dec, 2020 1 commit

xsk: Propagate napi_id to XDP socket Rx path · b02e5a0e

Committed by Björn Töpel on Nov 30, 2020

Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket picks up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com

b02e5a0e
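
A minimal sketch of the propagation idea, assuming the id is recorded when the Rx queue info is registered and copied to the socket the first time a frame arrives. All names (`struct rxq_info_model`, `struct xsk_model`, `rxq_info_reg_model()`, `xsk_rx_model()`) are hypothetical, and treating 0 as "no NAPI id yet" is an assumption made for this model only.

```c
/* Hypothetical stand-in for the per-queue Rx info structure. */
struct rxq_info_model {
	unsigned int queue_index;
	unsigned int napi_id; /* new field carried alongside the queue */
};

/* Hypothetical stand-in for an XDP socket. */
struct xsk_model {
	unsigned int napi_id; /* 0 means "not set yet" in this model */
};

/* Registration records which NAPI context services this queue. */
static void rxq_info_reg_model(struct rxq_info_model *rxq,
			       unsigned int queue_index, unsigned int napi_id)
{
	rxq->queue_index = queue_index;
	rxq->napi_id = napi_id;
}

/* Rx path: the socket picks up the id once, on the first frame. */
static void xsk_rx_model(struct xsk_model *xs,
			 const struct rxq_info_model *rxq)
{
	if (xs->napi_id == 0)
		xs->napi_id = rxq->napi_id;
}
```

Once the socket carries the id, a busy-polling caller has what it needs to look up the matching NAPI structure.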

18 Nov, 2020 1 commit

i40e: Use batched xsk Tx interfaces to increase performance · 3106c580

Committed by Magnus Karlsson on Nov 16, 2020

Use the new batched xsk interfaces for the Tx path in the i40e driver
to improve performance. On my machine, this yields a throughput
increase of 4% for the l2fwd sample app in xdpsock. If we instead just
look at the Tx part, this patch set increases throughput by more than
20% for Tx.

Note that I had to explicitly loop unroll the inner loop to get to
this performance level, by using a pragma. It is honored by both clang
and gcc and should be ignored by versions that do not support
it. Using the -funroll-loops compiler command line switch on the
source file resulted in loop unrolling at a higher level that
led to a performance decrease instead of an increase.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com

3106c580
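
The pragma-based unrolling mentioned above might look roughly like the sketch below. It is a userspace model, not the driver code: the batch size of 4, the `loop_unrolled_for` macro with its plain `for` fallback, and the names `struct tx_desc_model`, `xmit_batch_model()` and `fill_tx_model()` are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define PKTS_PER_BATCH 4 /* assumed batch size for this sketch */

/* Pick an unroll pragma the compiler understands, else a plain for. */
#if defined(__clang__)
#define loop_unrolled_for _Pragma("clang loop unroll_count(4)") for
#elif defined(__GNUC__) && __GNUC__ >= 8
#define loop_unrolled_for _Pragma("GCC unroll 4") for
#else
#define loop_unrolled_for for
#endif

/* Hypothetical stand-in for a Tx descriptor handed over by the socket. */
struct tx_desc_model {
	uint64_t addr;
	uint32_t len;
};

/* Handle exactly PKTS_PER_BATCH descriptors; inner loop is unrolled. */
static void xmit_batch_model(const struct tx_desc_model *desc,
			     uint64_t *total_bytes)
{
	unsigned int i;

	loop_unrolled_for(i = 0; i < PKTS_PER_BATCH; i++)
		*total_bytes += desc[i].len;
}

/* Process full batches first, then the leftover descriptors one by one. */
static uint64_t fill_tx_model(const struct tx_desc_model *descs,
			      size_t nb_pkts)
{
	size_t batched = nb_pkts - (nb_pkts % PKTS_PER_BATCH);
	uint64_t total_bytes = 0;
	size_t i;

	for (i = 0; i < batched; i += PKTS_PER_BATCH)
		xmit_batch_model(&descs[i], &total_bytes);

	for (; i < nb_pkts; i++)
		total_bytes += descs[i].len;

	return total_bytes;
}
```

Scoping the pragma to the one inner loop keeps the unrolling local, which is the contrast the message draws with applying -funroll-loops to the whole source file.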

26 Sep, 2020 1 commit

intel-ethernet: clean up W=1 warnings in kdoc · b50f7bca

Committed by Jesse Brandeburg on Sep 25, 2020

This takes care of all of the trivial W=1 fixes in the Intel
Ethernet drivers, which allows developers and maintainers to
build more of the networking tree with more complete warning
checks.

There are three classes of kdoc warnings fixed:
 - cannot understand function prototype: 'x'
 - Excess function parameter 'x' description in 'y'
 - Function parameter or member 'x' not described in 'y'

All of the changes were trivial comment updates on
function headers.

Inspired by Lee Jones' series of wireless work to do the same.
Compile tested only, and passes simple test of
$ git ls-files *.[ch] | egrep drivers/net/ethernet/intel | \
  xargs scripts/kernel-doc -none
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b50f7bca
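
For readers unfamiliar with the three warning classes, the following sketch shows a function header that kernel-doc should accept. The function `clamp_ring_size_model()` is hypothetical and exists only to carry the comment.

```c
/**
 * clamp_ring_size_model - limit a requested ring size to a supported range
 * @requested: ring size asked for by the user
 * @min: smallest supported ring size
 * @max: largest supported ring size
 *
 * Return: @requested limited to the range [@min, @max].
 */
static unsigned int clamp_ring_size_model(unsigned int requested,
					  unsigned int min, unsigned int max)
{
	if (requested < min)
		return min;
	if (requested > max)
		return max;

	return requested;
}
```

Dropping the `@max` line would produce a "Function parameter or member 'max' not described" warning, and documenting a parameter the function does not take would produce an "Excess function parameter" warning.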

15 Sep, 2020 1 commit

i40e: use 16B HW descriptors instead of 32B · f0064bfd

Committed by Björn Töpel on Aug 25, 2020

The i40e NIC supports two flavors of HW descriptors, 16 and 32
byte. The latter has, obviously, room for more offloading
information. However, the only fields of the 32B HW descriptor that
are used by the driver are also available in the 16B descriptor.

In other words, reading and writing 32 bytes instead of 16 bytes is a
waste of bus bandwidth.

This commit starts using 16 byte descriptors instead of 32 byte
descriptors.

For AF_XDP the rx_drop benchmark was improved by 2%.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

f0064bfd
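
The bandwidth argument is simple arithmetic, sketched here with two illustrative layouts. The struct and field names are hypothetical and are not the hardware descriptor definitions. Only the sizes matter for the comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative 16-byte descriptor: two 64-bit words. */
struct rx_desc_16b_model {
	uint64_t pkt_addr;
	uint64_t hdr_addr;
};

/* Illustrative 32-byte descriptor: the same two words plus extra space. */
struct rx_desc_32b_model {
	uint64_t pkt_addr;
	uint64_t hdr_addr;
	uint64_t extra1;
	uint64_t extra2;
};

_Static_assert(sizeof(struct rx_desc_16b_model) == 16, "16B layout");
_Static_assert(sizeof(struct rx_desc_32b_model) == 32, "32B layout");

/* Bytes of descriptor memory touched for a ring of desc_count entries. */
static size_t ring_bytes_model(size_t desc_count, size_t desc_size)
{
	return desc_count * desc_size;
}
```

For a ring of 512 descriptors, for example, the model gives 8192 bytes with the 16-byte layout versus 16384 bytes with the 32-byte layout, so every pass over the ring moves half as much descriptor data.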
