提交 · 49589b23d5a92dff4a7cb705608dff7dd13ef709 · openeuler / Kernel

25 6月, 2021 1 次提交

intel: Remove rcu_read_lock() around XDP program invocation · 49589b23

由 Toke Høiland-Jørgensen 提交于 6月 24, 2021

The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around
XDP program invocations. However, the actual lifetime of the objects
referred by the XDP program invocation is longer, all the way through to
the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
small. This turns out to be harmless because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), but it makes the
rcu_read_lock() misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.
Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20210624160609.292325-12-toke@redhat.com

49589b23

03 6月, 2021 1 次提交

i40e: add correct exception tracing for XDP · f6c10b48

由 Magnus Karlsson 提交于 5月 10, 2021

Add missing exception tracing to XDP when a number of different errors
can occur. The support was only partial. Several errors where not
logged which would confuse the user quite a lot not knowing where and
why the packets disappeared.

Fixes: 74608d17 ("i40e: add support for XDP_TX action")
Fixes: 0a714186 ("i40e: add AF_XDP zero-copy Rx support")
Reported-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Tested-by: NKiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

f6c10b48

08 5月, 2021 1 次提交

i40e: fix broken XDP support · ae4393df

由 Magnus Karlsson 提交于 4月 26, 2021

Commit 12738ac4 ("i40e: Fix sparse errors in i40e_txrx.c") broke
XDP support in the i40e driver. That commit was fixing a sparse error
in the code by introducing a new variable xdp_res instead of
overloading this into the skb pointer. The problem is that the code
later uses the skb pointer in if statements and these where not
extended to also test for the new xdp_res variable. Fix this by adding
the correct tests for xdp_res in these places.

The skb pointer was used to store the result of the XDP program by
overloading the results in the error pointer
ERR_PTR(-result). Therefore, the allocation failure test that used to
only test for !skb now need to be extended to also consider !xdp_res.

i40e_cleanup_headers() had a check that based on the skb value being
an error pointer, i.e. a result from the XDP program != XDP_PASS, and
if so start to process a new packet immediately, instead of populating
skb fields and sending the skb to the stack. This check is not needed
anymore, since we have added an explicit test for xdp_res being set
and if so just do continue to pick the next packet from the NIC.

Fixes: 12738ac4 ("i40e: Fix sparse errors in i40e_txrx.c")
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Tested-by: NJesper Dangaard Brouer <brouer@redhat.com>
Reported-by: NJesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

ae4393df

09 4月, 2021 1 次提交

i40e: Fix sparse errors in i40e_txrx.c · 12738ac4

由 Arkadiusz Kubalewski 提交于 3月 26, 2021

Remove error handling through pointers. Instead use plain int
to return value from i40e_run_xdp(...).

Previously:
- sparse errors were produced during compilation:
i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing possible ERR_PTR()

- sk_buff* was used to return value, but it has never had valid
pointer to sk_buff. Returned value was always int handled as
a pointer.

Fixes: 0c8493d9 ("i40e: add XDP support for pass and drop actions")
Fixes: 2e689312 ("i40e: split XDP_TX tail and XDP_REDIRECT map flushing")
Signed-off-by: NAleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: NArkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Tested-by: NDave Switzer <david.switzer@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

12738ac4

24 3月, 2021 1 次提交

intel: clean up mismatched header comments · 262de08f

由 Jesse Brandeburg 提交于 3月 18, 2021

A bunch of header comments were showing warnings when compiling
with W=1. Fix them all at once. This changes only comments.
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

262de08f

18 3月, 2021 1 次提交

bpf, devmap: Move drop error path to devmap for XDP_REDIRECT · fdc13979

由 Lorenzo Bianconi 提交于 3月 08, 2021

We want to change the current ndo_xdp_xmit drop semantics because it will
allow us to implement better queue overflow handling. This is working
towards the larger goal of a XDP TX queue-hook. Move XDP_REDIRECT error
path handling from each XDP ethernet driver to devmap code. According to
the new APIs, the driver running the ndo_xdp_xmit pointer, will break tx
loop whenever the hw reports a tx error and it will just return to devmap
caller the number of successfully transmitted frames. It will be devmap
responsibility to free dropped frames.

Move each XDP ndo_xdp_xmit capable driver to the new APIs:

- veth
- virtio-net
- mvneta
- mvpp2
- socionext
- amazon ena
- bnxt
- freescale (dpaa2, dpaa)
- xen-frontend
- qede
- ice
- igb
- ixgbe
- i40e
- mlx5
- ti (cpsw, cpsw-new)
- tun
- sfc
Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NIoana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: NCamelia Groza <camelia.groza@nxp.com>
Acked-by: NEdward Cree <ecree.xilinx@gmail.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NShay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org

fdc13979

12 3月, 2021 1 次提交

i40e: move headroom initialization to i40e_configure_rx_ring · a8660626

由 Maciej Fijalkowski 提交于 3月 03, 2021

i40e_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on I40E_RXR_FLAGS_BUILD_SKB_ENABLED flag.

Currently, the callsite of mentioned function is placed incorrectly
within i40e_setup_rx_descriptors() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.

For the record, below is the call graph:

i40e_vsi_open
 i40e_vsi_setup_rx_resources
  i40e_setup_rx_descriptors
   i40e_rx_offset() <-- sets offset to 0 as build_skb flag is set below

 i40e_vsi_configure_rx
  i40e_configure_rx_ring
   set_ring_build_skb_enabled(ring) <-- set build_skb flag

Fix this by moving i40e_rx_offset() to i40e_configure_rx_ring() after
the flag setting.

Fixes: f7bb0d71 ("i40e: store the result of i40e_rx_offset() onto i40e_ring")
Reported-by: NJesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Tested-by: NJesper Dangaard Brouer <brouer@redhat.com>
Tested-by: NKiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

a8660626

20 2月, 2021 1 次提交

i40e: Fix endianness conversions · b32cddd2

由 Norbert Ciosek 提交于 2月 05, 2021

Fixes the following sparse warnings:
i40e_main.c:5953:32: warning: cast from restricted __le16
i40e_main.c:8008:29: warning: incorrect type in assignment (different base types)
i40e_main.c:8008:29: expected unsigned int [assigned] [usertype] ipa
i40e_main.c:8008:29: got restricted __le32 [usertype]
i40e_main.c:8008:29: warning: incorrect type in assignment (different base types)
i40e_main.c:8008:29: expected unsigned int [assigned] [usertype] ipa
i40e_main.c:8008:29: got restricted __le32 [usertype]
i40e_txrx.c:1950:59: warning: incorrect type in initializer (different base types)
i40e_txrx.c:1950:59: expected unsigned short [usertype] vlan_tag
i40e_txrx.c:1950:59: got restricted __le16 [usertype] l2tag1
i40e_txrx.c:1953:40: warning: cast to restricted __le16
i40e_xsk.c:448:38: warning: invalid assignment: |=
i40e_xsk.c:448:38: left side has type restricted __le64
i40e_xsk.c:448:38: right side has type int

Fixes: 2f4b411a ("i40e: Enable cloud filters via tc-flower")
Fixes: 2a508c64 ("i40e: fix VLAN.TCI == 0 RX HW offload")
Fixes: 3106c580 ("i40e: Use batched xsk Tx interfaces to increase performance")
Fixes: 8f88b303 ("i40e: Add infrastructure for queue channel support")
Signed-off-by: NNorbert Ciosek <norbertx.ciosek@intel.com>
Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

b32cddd2

19 2月, 2021 1 次提交

i40e: Fix flow for IPv6 next header (extension header) · 92c60580

由 Slawomir Laba 提交于 9月 10, 2020

When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.

The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with I40E_TX_CTX_EXT_IP_IPV6.
The extension header was not skipped at all.

The ipv6_skip_exthdr function does check if next header of the IPV6
header is an extension header and doesn't modify the l4_proto pointer
if it points to a protocol header value so its safe to omit the
comparison of exthdr and l4.hdr pointers. The ipv6_skip_exthdr can
return value -1. This means that the skipping process failed
and there is something wrong with the packet so it will be dropped.

Fixes: a3fd9d88 ("i40e/i40evf: Handle IPv6 extension headers in checksum offload")
Signed-off-by: NSlawomir Laba <slawomirx.laba@intel.com>
Signed-off-by: NPrzemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Reviewed-by: NAleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

92c60580

13 2月, 2021 3 次提交

i40e: store the result of i40e_rx_offset() onto i40e_ring · f7bb0d71

由 Maciej Fijalkowski 提交于 1月 18, 2021

Output of i40e_rx_offset() is based on ethtool's priv flag setting,
which when changed, causes PF reset (disables napi, frees irqs, loads
different Rx mem model, etc.). This means that within napi its result is
constant and there is no reason to call it per each processed frame.

Add new 'rx_offset' field to i40e_ring that is meant to hold the
i40e_rx_offset() result and use it within i40e_clean_rx_irq().
Furthermore, use it within i40e_alloc_mapped_page().

Last but not least, un-inline the function of interest so that compiler
makes the decision about inlining as it lives in .c file.
Reviewed-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

f7bb0d71

i40e: adjust i40e_is_non_eop · d06e2f05

由 Maciej Fijalkowski 提交于 1月 18, 2021

i40e_is_non_eop had a leftover comment and unused skb argument which was
used for placing the skb onto rx_buf in case when current buffer was
non-eop one. This is not relevant anymore as commit e72e5659
("i40e/i40evf: Moves skb from i40e_rx_buffer to i40e_ring") pulled the
non-complete skb handling out of rx_bufs up to rx_ring.  Therefore,
let's adjust the function arguments that i40e_is_non_eop takes.

Furthermore, since there is already a function responsible for bumping
the ntc, make use of that and drop that logic from i40e_is_non_eop so
that the scope of this function is limited to what the name actually
states.
Reviewed-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

d06e2f05

i40e: drop misleading function comments · 4a14994a

由 Maciej Fijalkowski 提交于 1月 18, 2021

i40e_cleanup_headers has a statement about check against skb being
linear or not which is not relevant anymore, so let's remove it.

Same case for i40e_can_reuse_rx_page, it references things that are not
present there anymore.
Reviewed-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NMaciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

4a14994a

11 2月, 2021 2 次提交

i40e: VLAN field for flow director · a9219b33

由 Przemyslaw Patynowski 提交于 12月 18, 2020

Allow user to specify VLAN field and add it to flow director. Show VLAN
field in "ethtool -n ethx" command.
Handle VLAN type and tag field provided by ethtool command. Refactored
filter addition, by replacing static arrays with runtime dummy packet
creation, which allows specifying VLAN field.
Previously, VLAN field was omitted.
Signed-off-by: NPrzemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

a9219b33

i40e: Add flow director support for IPv6 · efca91e8

由 Przemyslaw Patynowski 提交于 12月 18, 2020

Flow director for IPv6 is not supported.
1) Implementation of support for IPv6 flow director.
2) Added handlers for addition of TCP6, UDP6, SCTP6, IPv6.
3) Refactored legacy code to make it more generic.
4) Added packet templates for TCP6, UDP6, SCTP6, IPv6.
5) Added handling of IPv6 source and destination address for flow director.
6) Improved argument passing for source and destination portin TCP6, UDP6
and SCTP6.
7) Added handling of ethtool -n for IPv6, TCP6,UDP6, SCTP6.
8) Used correct bit flag regarding FLEXOFF field of flow director data
descriptor.

Without this patch, there would be no support for flow director on IPv6,
TCP6, UDP6, SCTP6.
Tested based on x710 datasheet by using:
ethtool -N enp133s0f0 flow-type tcp4 src-port 13 dst-port 37 user-def 0x44142 action 1
ethtool -N enp133s0f0 flow-type tcp6 src-port 13 dst-port 40 user-def 0x44142 action 2
ethtool -N enp133s0f0 flow-type udp4 src-port 20 dst-port 40 user-def 0x44142 action 3
ethtool -N enp133s0f0 flow-type udp6 src-port 25 dst-port 40 user-def 0x44142 action 4
ethtool -N enp133s0f0 flow-type sctp4 src-port 55 dst-port 65 user-def 0x44142 action 5
ethtool -N enp133s0f0 flow-type sctp6 src-port 60 dst-port 40 user-def 0x44142 action 6
ethtool -N enp133s0f0 flow-type ip4 src-ip 1.1.1.1 dst-ip 1.1.1.4 user-def 0x44142 action 7
ethtool -N enp133s0f0 flow-type ip6 src-ip fe80::3efd:feff:fe6f:bbbb dst-ip fe80::3efd:feff:fe6f:aaaa user-def 0x44142 action 8
Then send traffic from client which matches the criteria provided to ethtool.
Observe that packets are redirected to user set queues with ethtool -S <interface>
Signed-off-by: NPrzemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Tested-by: NTony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

efca91e8

05 2月, 2021 1 次提交

net: use the new dev_page_is_reusable() instead of private versions · a79afa78

由 Alexander Lobakin 提交于 2月 02, 2021

Now we can remove a bunch of identical functions from the drivers and
make them use common dev_page_is_reusable(). All {,un}likely() checks
are omitted since it's already present in this helper.
Also update some comments near the call sites.
Suggested-by: NDavid Rientjes <rientjes@google.com>
Suggested-by: NJakub Kicinski <kuba@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: NAlexander Lobakin <alobakin@pm.me>
Reviewed-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>

a79afa78

09 1月, 2021 2 次提交

net, xdp: Introduce xdp_prepare_buff utility routine · be9df4af

由 Lorenzo Bianconi 提交于 12月 22, 2020

Introduce xdp_prepare_buff utility routine to initialize per-descriptor
xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in
all XDP capable drivers.
Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NShay Agroskin <shayagr@amazon.com>
Acked-by: NMartin Habets <habetsm.xilinx@gmail.com>
Acked-by: NCamelia Groza <camelia.groza@nxp.com>
Acked-by: NMarcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>

be9df4af

net, xdp: Introduce xdp_init_buff utility routine · 43b5169d

由 Lorenzo Bianconi 提交于 12月 22, 2020

Introduce xdp_init_buff utility routine to initialize xdp_buff fields
const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on
xdp_init_buff in all XDP capable drivers.
Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NShay Agroskin <shayagr@amazon.com>
Acked-by: NMartin Habets <habetsm.xilinx@gmail.com>
Acked-by: NCamelia Groza <camelia.groza@nxp.com>
Acked-by: NMarcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>

43b5169d

10 12月, 2020 1 次提交

i40e: avoid premature Rx buffer reuse · 75aab4e1

由 Björn Töpel 提交于 8月 25, 2020

The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Longer explanation:

Intel NICs have a recycle mechanism. The main idea is that a page is
split into two parts. One part is owned by the driver, one part might
be owned by someone else, such as the stack.

t0: Page is allocated, and put on the Rx ring
              +---------------
used by NIC ->| upper buffer
(rx_buffer)   +---------------
              | lower buffer
              +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX

t1: Buffer is received, and passed to the stack (e.g.)
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 1

t2: Buffer is received, and redirected
              +---------------
              | upper buff (skb)
              +---------------
used by NIC ->| lower buffer
(rx_buffer)   +---------------

Now, prior calling xdp_do_redirect():
  page count  == USHRT_MAX
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

This means that buffer *cannot* be flipped/reused, because the skb is
still using it.

The problem arises when xdp_do_redirect() actually frees the
segment. Then we get:
  page count  == USHRT_MAX - 1
  rx_buffer->pagecnt_bias == USHRT_MAX - 2

From a recycle perspective, the buffer can be flipped and reused,
which means that the skb data area is passed to the Rx HW ring!

To work around this, the page count is stored prior calling
xdp_do_redirect().

Note that this is not optimal, since the NIC could actually reuse the
"lower buffer" again. However, then we need to track whether
XDP_REDIRECT consumed the buffer or not.

Fixes: d9314c47 ("i40e: add support for XDP_REDIRECT")
Reported-and-analyzed-by: NLi RongQing <lirongqing@baidu.com>
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Tested-by: NGeorge Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

75aab4e1

01 12月, 2020 1 次提交

xsk: Propagate napi_id to XDP socket Rx path · b02e5a0e

由 Björn Töpel 提交于 11月 30, 2020

Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket pick up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NIlias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NTariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com

b02e5a0e

18 11月, 2020 1 次提交

i40e: Use batched xsk Tx interfaces to increase performance · 3106c580

由 Magnus Karlsson 提交于 11月 16, 2020

Use the new batched xsk interfaces for the Tx path in the i40e driver
to improve performance. On my machine, this yields a throughput
increase of 4% for the l2fwd sample app in xdpsock. If we instead just
look at the Tx part, this patch set increases throughput with above
20% for Tx.

Note that I had to explicitly loop unroll the inner loop to get to
this performance level, by using a pragma. It is honored by both clang
and gcc and should be ignored by versions that do not support
it. Using the -funroll-loops compiler command line switch on the
source file resulted in a loop unrolling on a higher level that
lead to a performance decrease instead of an increase.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Link: https://lore.kernel.org/bpf/1605525167-14450-6-git-send-email-magnus.karlsson@gmail.com

3106c580

26 9月, 2020 1 次提交

intel-ethernet: clean up W=1 warnings in kdoc · b50f7bca

由 Jesse Brandeburg 提交于 9月 25, 2020

This takes care of all of the trivial W=1 fixes in the Intel
Ethernet drivers, which allows developers and maintainers to
build more of the networking tree with more complete warning
checks.

There are three classes of kdoc warnings fixed:
 - cannot understand function prototype: 'x'
 - Excess function parameter 'x' description in 'y'
 - Function parameter or member 'x' not described in 'y'

All of the changes were trivial comment updates on
function headers.

Inspired by Lee Jones' series of wireless work to do the same.
Compile tested only, and passes simple test of
$ git ls-files *.[ch] | egrep drivers/net/ethernet/intel | \
  xargs scripts/kernel-doc -none
Signed-off-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b50f7bca

15 9月, 2020 3 次提交

i40e: use 16B HW descriptors instead of 32B · f0064bfd

由 Björn Töpel 提交于 8月 25, 2020

The i40e NIC supports two flavors of HW descriptors, 16 and 32
byte. The latter has, obviously, room for more offloading
information. However, the only fields of the 32B HW descriptor that is
being used by the driver, is also available in the 16B descriptor.

In other words; Reading and writing 32 bytes instead of 16 byte is a
waste of bus bandwidth.

This commit starts using 16 byte descriptors instead of 32 byte
descriptors.

For AF_XDP the rx_drop benchmark was improved by 2%.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

f0064bfd

i40e, xsk: remove HW descriptor prefetch in AF_XDP path · f78bd130

由 Björn Töpel 提交于 8月 25, 2020

The software prefetching of HW descriptors has a negative impact on
the performance. Therefore, it is now removed.

Performance for the rx_drop benchmark increased with 2%.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

f78bd130

i40e: optimise prefetch page refcount · 1fa5cef2

由 Li RongQing 提交于 8月 18, 2020

refcount of rx_buffer page will be added here originally, so prefetchw
is needed, but after commit 1793668c ("i40e/i40evf: Update code to
better handle incrementing page count"), and refcount is not added
every time, so change prefetchw as prefetch.

Now it mainly services page_address(), but which accesses struct page
only when WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL is defined otherwise
it returns address based on offset, so we prefetch it conditionally.

Jakub suggested to define prefetch_page_address in a common header.
Reported-by: Nkernel test robot <lkp@intel.com>
Suggested-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Reviewed-by: NJesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

1fa5cef2

01 9月, 2020 1 次提交

xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem · 1742b3d5

由 Magnus Karlsson 提交于 8月 28, 2020

Replace the explicit umem reference passed to the driver in AF_XDP
zero-copy mode with the buffer pool instead. This in preparation for
extending the functionality of the zero-copy mode so that umems can be
shared between queues on the same netdev and also between netdevs. In
this commit, only an umem reference has been added to the buffer pool
struct. But later commits will add other entities to it. These are
going to be entities that are different between different queue ids
and netdevs even though the umem is shared between them.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com

1742b3d5

27 8月, 2020 1 次提交

net: Take common prefetch code structure into a function · f468f21b

由 Tariq Toukan 提交于 8月 26, 2020

Many device drivers use the same prefetch code structure to
deal with small L1 cacheline size.
Take this code into a function and call it from the drivers.
Suggested-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
Reviewed-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f468f21b

02 7月, 2020 3 次提交

i40e: eliminate division in napi_poll data path · 4b5539c0

由 Magnus Karlsson 提交于 6月 23, 2020

Eliminate a division in the napi_poll data path. This division is
executed even though it is only needed in the rare case when there are
not enough interrupt lines so they have to be shared between queue
pairs. Instead, just test for this case and only execute the division
if needed. The code has been lifted from the ice driver.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

4b5539c0

i40e: optimize AF_XDP Tx completion path · 5574ff7b

由 Magnus Karlsson 提交于 6月 23, 2020

Improve the performance of the AF_XDP zero-copy Tx completion
path. When there are no XDP buffers being sent using XDP_TX or
XDP_REDIRECT, we do not have go through the SW ring to clean up any
entries since the AF_XDP path does not use these. In these cases, just
fast forward the next-to-use counter and skip going through the SW
ring. The limit on the maximum number of entries to complete is also
removed since the algorithm is now O(1). To simplify the code path, the
maximum number of entries to complete for the XDP path is therefore
also increased from 256 to 512 (the default number of Tx HW
descriptors). This should be fine since the completion in the XDP path
is faster than in the SKB path that has 256 as the maximum number.

This patch provides around 4% throughput improvement for the l2fwd
application in xdpsock on my machine.
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: NSridhar Samudrala <sridhar.samudrala@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

5574ff7b

ethernet/intel: Convert fallthrough code comments · 5463fce6

由 Jeff Kirsher 提交于 6月 03, 2020

Convert all the remaining 'fall through" code comments to the newer
'fallthrough;' keyword.
Suggested-by: NJoe Perches <joe@perches.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: NAaron Brown <aaron.f.brown@intel.com>
Signed-off-by: NTony Nguyen <anthony.l.nguyen@intel.com>

5463fce6

02 6月, 2020 1 次提交

xdp: Rename convert_to_xdp_frame in xdp_convert_buff_to_frame · 1b698fa5

由 Lorenzo Bianconi 提交于 5月 28, 2020

In order to use standard 'xdp' prefix, rename convert_to_xdp_frame
utility routine in xdp_convert_buff_to_frame and replace all the
occurrences
Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/6344f739be0d1a08ab2b9607584c4d5478c8c083.1590698295.git.lorenzo@kernel.org

1b698fa5

22 5月, 2020 2 次提交

i40e: Separate kernel allocated rx_bi rings from AF_XDP rings · be1222b5

由 Björn Töpel 提交于 5月 20, 2020

Continuing the path to support MEM_TYPE_XSK_BUFF_POOL, the AF_XDP
zero-copy/sk_buff rx_bi rings are now separate. Functions to properly
allocate the different rings are added as well.

v3->v4: Made i40e_fd_handle_status() static. (kbuild test robot)
v4->v5: Fix kdoc for i40e_clean_programming_status(). (Jakub)
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-8-bjorn.topel@gmail.com

be1222b5

i40e: Refactor rx_bi accesses · e1675f97

由 Björn Töpel 提交于 5月 20, 2020

As a first step to migrate i40e to the new MEM_TYPE_XSK_BUFF_POOL
APIs, code that accesses the rx_bi (SW/shadow ring) is refactored to
use an accessor function.
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-7-bjorn.topel@gmail.com

e1675f97

15 5月, 2020 1 次提交

i40e: Add XDP frame size to driver · 24104024

由 Jesper Dangaard Brouer 提交于 5月 14, 2020

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
normal MTU frame size is 2048 bytes (and headroom 192 bytes). For
larger MTUs the driver still use page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, driver instead advance its rx_buffer->page_offset with the frame
size "truesize".

For XDP frame size calculations, this mean that in PAGE_SIZE larger
than 4K mode the frame_sz change on a per packet basis. For the page
split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be
updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There are zero headroom in this mode, which is a
requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Link: https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul

24104024

30 10月, 2019 1 次提交

i40e: Add UDP segmentation offload support · 3fd8ed56

由 Josh Hunt 提交于 10月 11, 2019

Based on a series from Alexander Duyck this change adds UDP segmentation
offload support to the i40e driver.

CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: NJosh Hunt <johunt@akamai.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

3fd8ed56

31 7月, 2019 1 次提交

net: Use skb_frag_off accessors · b54c9d5b

由 Jonathan Lemon 提交于 7月 30, 2019

Use accessor functions for skb fragment's page_offset instead
of direct references, in preparation for bvec conversion.
Signed-off-by: NJonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b54c9d5b

23 7月, 2019 1 次提交

net: Use skb accessors in network drivers · d7840976

由 Matthew Wilcox (Oracle) 提交于 7月 22, 2019

In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d7840976

15 6月, 2019 1 次提交

i40e: Use signed variable · 97e42ef4

由 Mitch Williams 提交于 4月 24, 2019

The counter variable in i40e_clean_tx_irq starts out negative and climbs
to 0. So it should not be defined as a u16. This was working by accident
due to the fact the u16 overflows and underflows predictably.

Replace the u16 with int, which is signed and can handle the negativity.
Signed-off-by: NMitch Williams <mitch.a.williams@intel.com>
Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

97e42ef4

24 4月, 2019 1 次提交

net: pass net_device argument to the eth_get_headlen · c43f1255

由 Stanislav Fomichev 提交于 4月 22, 2019

Update all users of eth_get_headlen to pass network device, fetch
network namespace from it and pass it down to the flow dissector.
This commit is a noop until administrator inserts BPF flow dissector
program.

Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
Cc: Salil Mehta <salil.mehta@huawei.com>
Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Igor Russkikh <igor.russkikh@aquantia.com>
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

c43f1255

08 4月, 2019 1 次提交

drivers: Remove explicit invocations of mmiowb() · fb24ea52

由 Will Deacon 提交于 2月 22, 2019

mmiowb() is now implied by spin_unlock() on architectures that require
it, so there is no reason to call it from driver code. This patch was
generated using coccinelle:

	@mmiowb@
	@@
	- mmiowb();

and invoked as:

$ for d in drivers include/linux/qed sound; do \
spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done

NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
spin_unlock(). However, pairing each mmiowb() removal in this patch with
the corresponding call to spin_unlock() is not at all trivial, so there
is a small chance that this change may regress any drivers incorrectly
relying on mmiowb() to order MMIO writes between CPUs using lock-free
synchronisation. If you've ended up bisecting to this commit, you can
reintroduce the mmiowb() calls using wmb() instead, which should restore
the old behaviour on all architectures other than some esoteric ia64
systems.
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NWill Deacon <will.deacon@arm.com>

fb24ea52

02 4月, 2019 1 次提交

net: move skb->xmit_more hint to softnet data · 6b16f9ee

由 Florian Westphal 提交于 4月 01, 2019

There are two reasons for this.

First, the xmit_more flag conceptually doesn't fit into the skb, as
xmit_more is not a property related to the skb.
Its only a hint to the driver that the stack is about to transmit another
packet immediately.

Second, it was only done this way to not have to pass another argument
to ndo_start_xmit().

We can place xmit_more in the softnet data, next to the device recursion.
The recursion counter is already written to on each transmit. The "more"
indicator is placed right next to it.

Drivers can use the netdev_xmit_more() helper instead of skb->xmit_more
to check the "more packets coming" hint.

skb->xmit_more is retained (but always 0) to not cause build breakage.

This change takes care of the simple s/skb->xmit_more/netdev_xmit_more()/
conversions.  Remaining drivers are converted in the next patches.
Suggested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6b16f9ee

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功