  1. 22 May 2020 (2 commits)
  2. 15 May 2020 (1 commit)
    • i40e: Add XDP frame size to driver · 24104024
      Committed by Jesper Dangaard Brouer
      This driver uses different memory models depending on PAGE_SIZE at
      compile time. For a 4K PAGE_SIZE it uses page splitting, meaning
      that for a normal MTU the frame size is 2048 bytes (with 192 bytes
      of headroom). For larger MTUs the driver still uses page splitting,
      by allocating order-1 pages (8192 bytes) for RX frames. For a
      PAGE_SIZE larger than 4K, the driver instead advances its
      rx_buffer->page_offset by the frame's "truesize".
      
      For XDP frame size calculations, this means that in the
      larger-than-4K PAGE_SIZE mode, frame_sz changes on a per-packet
      basis. For the page-split 4K PAGE_SIZE mode, xdp.frame_sz is more
      constant and can be updated once, outside the main NAPI loop (both
      strategies are sketched after this entry).
      
      The default setting in the driver uses build_skb(), which provides
      the necessary headroom and tailroom for XDP-redirect in the RX
      frame (in both modes).
      
      There is one complication, which is legacy-rx mode (configurable
      via ethtool priv-flags). There is zero headroom in this mode, while
      headroom is a requirement for XDP-redirect to work. The conversion
      to xdp_frame (convert_to_xdp_frame) will detect this insufficient
      space, and the xdp_do_redirect() call will fail. This is deemed
      acceptable, as it allows other XDP actions to still work in
      legacy mode. In legacy mode with a larger PAGE_SIZE we also accept,
      due to the lacking tailroom, that xdp_adjust_tail shrink doesn't
      work.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Cc: intel-wired-lan@lists.osuosl.org
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: Alexander Duyck <alexander.duyck@gmail.com>
      Link: https://lore.kernel.org/bpf/158945346494.97035.12809400414566061815.stgit@firesoul
      24104024
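      
      A minimal sketch of the two frame_sz strategies above, assuming the
      i40e_rx_frame_truesize() helper and the NAPI loop shape of this
      commit; constants and placement are illustrative, not verbatim
      driver code:
      
      	struct xdp_buff xdp;
      
      #if (PAGE_SIZE < 8192)
      	/* 4K page splitting: truesize is fixed by the ring config,
      	 * so frame_sz can be set once, before the RX loop. */
      	xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
      #endif
      
      	while (likely(total_rx_packets < budget)) {
      		/* ... fetch descriptor, rx_buffer and packet size ... */
      #if (PAGE_SIZE >= 8192)
      		/* Larger pages: page_offset advances by a per-packet
      		 * truesize, so frame_sz must follow each frame. */
      		xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size);
      #endif
      		/* ... run the XDP program, build the skb ... */
      	}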
  3. 30 October 2019 (1 commit)
  4. 31 July 2019 (1 commit)
  5. 23 July 2019 (1 commit)
  6. 15 June 2019 (1 commit)
  7. 24 April 2019 (1 commit)
    • net: pass net_device argument to the eth_get_headlen · c43f1255
      Committed by Stanislav Fomichev
      Update all users of eth_get_headlen to pass the network device,
      fetch the network namespace from it, and pass it down to the flow
      dissector (the signature change is sketched after this entry).
      This commit is a no-op until an administrator attaches a BPF flow
      dissector program.
      
      Cc: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Cc: intel-wired-lan@lists.osuosl.org
      Cc: Yisen Zhuang <yisen.zhuang@huawei.com>
      Cc: Salil Mehta <salil.mehta@huawei.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Cc: Igor Russkikh <igor.russkikh@aquantia.com>
      Signed-off-by: Stanislav Fomichev <sdf@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c43f1255
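      
      A sketch of the signature change with a representative Intel-style
      call site; the ring and header-size names are assumptions for
      illustration:
      
      	/* before: u32 eth_get_headlen(void *data, unsigned int len);
      	 * after:  u32 eth_get_headlen(const struct net_device *dev,
      	 *                             void *data, unsigned int len);
      	 */
      	headlen = eth_get_headlen(rx_ring->netdev, xdp->data,
      				  I40E_RX_HDR_SIZE);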
  8. 08 April 2019 (1 commit)
    • drivers: Remove explicit invocations of mmiowb() · fb24ea52
      Committed by Will Deacon
      mmiowb() is now implied by spin_unlock() on architectures that require
      it, so there is no reason to call it from driver code. This patch was
      generated using coccinelle:
      
      	@mmiowb@
      	@@
      	- mmiowb();
      
      and invoked as:
      
      $ for d in drivers include/linux/qed sound; do \
      spatch --include-headers --sp-file mmiowb.cocci --dir $d --in-place; done
      
      NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
      spin_unlock(). However, pairing each mmiowb() removal in this patch with
      the corresponding call to spin_unlock() is not at all trivial, so there
      is a small chance that this change may regress any drivers incorrectly
      relying on mmiowb() to order MMIO writes between CPUs using lock-free
      synchronisation. If you've ended up bisecting to this commit, you
      can reintroduce the mmiowb() calls using wmb() instead (a sketch
      follows this entry), which should restore the old behaviour on all
      architectures other than some esoteric ia64 systems.
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      fb24ea52
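      
      A minimal sketch of the suggested fallback, assuming a typical
      doorbell write in a driver's transmit path (register and variable
      names are hypothetical):
      
      	writel(val, tx_ring->tail);
      	wmb();	/* was: mmiowb(); wmb() is a stronger barrier here */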
  9. 02 April 2019 (1 commit)
    • net: move skb->xmit_more hint to softnet data · 6b16f9ee
      Committed by Florian Westphal
      There are two reasons for this.
      
      First, the xmit_more flag conceptually doesn't fit into the skb, as
      xmit_more is not a property of the skb itself. It is only a hint to
      the driver that the stack is about to transmit another packet
      immediately.
      
      Second, it was only done this way to avoid passing another argument
      to ndo_start_xmit().
      
      We can place xmit_more in the softnet data, next to the device
      recursion counter. The recursion counter is already written to on
      each transmit, and the "more" indicator is placed right next to it.
      
      Drivers can use the netdev_xmit_more() helper instead of
      skb->xmit_more to check the "more packets coming" hint (a typical
      call site is sketched after this entry).
      
      skb->xmit_more is retained (but always 0) so as not to cause build
      breakage.
      
      This change takes care of the simple
      s/skb->xmit_more/netdev_xmit_more()/ conversions. The remaining
      drivers are converted in the next patches.
      Suggested-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6b16f9ee
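      
      A sketch of the common tail-bump pattern after this change,
      assuming Intel-style ring helpers (txring_txq() and the tail
      register name follow the i40e sources; treat the exact shape as
      illustrative):
      
      	/* Only hit the doorbell when the queue stalled or the stack
      	 * has no more packets queued behind this one. */
      	if (netif_xmit_stopped(txring_txq(tx_ring)) ||
      	    !netdev_xmit_more())
      		writel(i, tx_ring->tail);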
  10. 22 February 2019 (1 commit)
    • i40e: fix XDP_REDIRECT/XDP xmit ring cleanup race · 59eb2a88
      Committed by Björn Töpel
      When the driver clears the XDP xmit ring due to re-configuration or
      teardown, in-progress ndo_xdp_xmit calls must be taken into
      consideration.
      
      The ndo_xdp_xmit function is typically called from a NAPI context
      that the driver does not control. Therefore, we must be careful not
      to clear the XDP ring while a call is ongoing. This patch adds a
      synchronize_rcu() to wait for NAPI contexts (preempt-disable
      regions and softirqs) prior to clearing the queue. Further, the
      __I40E_CONFIG_BUSY flag is checked in the ndo_xdp_xmit
      implementation to avoid touching the XDP xmit queue during
      re-configuration (sketched after this entry).
      
      Fixes: d9314c47 ("i40e: add support for XDP_REDIRECT")
      Fixes: 123cecd4 ("i40e: added queue pair disable/enable functions")
      Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      59eb2a88
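      
      A sketch of the fix as described above; the flag, error code and
      signature follow the commit text and the i40e sources of that era,
      while the surrounding details are illustrative:
      
      	int i40e_xdp_xmit(struct net_device *dev, int n,
      			  struct xdp_frame **frames, u32 flags)
      	{
      		struct i40e_netdev_priv *np = netdev_priv(dev);
      
      		/* Reject xmit while queues are being reconfigured;
      		 * the teardown path sets this flag, then calls
      		 * synchronize_rcu() to wait out NAPI callers already
      		 * past this check before clearing the xmit ring. */
      		if (test_bit(__I40E_CONFIG_BUSY, np->vsi->back->state))
      			return -ENXIO;
      
      		/* ... validate the queue, transmit the n frames ... */
      		return n;
      	}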
  11. 13 December 2018 (2 commits)
  12. 22 November 2018 (1 commit)
    • ethernet/intel: consolidate NAPI and NAPI exit · 0bcd952f
      Committed by Jesse Brandeburg
      While reviewing code, I noticed that Eric Dumazet recommends that
      drivers check the return code of napi_complete_done, and use it to
      decide whether to re-enable interrupts when exiting poll. One of
      the Intel drivers was already fixed (ixgbe).
      
      Looking at the Intel drivers as a whole, we are handling our
      polling and NAPI exit in a few different ways based on whether we
      have multiqueue and whether we have Tx cleanup included. Several
      drivers had the bug of exiting NAPI with return 0, which appears
      to mess up the accounting in the stack.
      
      Consolidate all the NAPI routines to use the best known way of
      exiting and to mostly look like each other (the resulting pattern
      is sketched after this entry):
      1) check the return code of napi_complete_done to control
         interrupt enabling
      2) return the actual amount of work done
      3) return the full budget immediately if NAPI needs to poll again
      
      Tested the changes on e1000e with a high interrupt rate set; they
      show about an 8% reduction in CPU utilization when busy polling,
      because we aren't re-enabling interrupts when we're about to be
      polled.
      Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      0bcd952f
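      
      A sketch of the consolidated poll-exit pattern following the three
      rules above; the clean/enable helper names are hypothetical:
      
      	static int example_napi_poll(struct napi_struct *napi, int budget)
      	{
      		int work_done = example_clean_rings(napi, budget);
      
      		/* Rule 3: more work pending, stay in polling mode. */
      		if (work_done == budget)
      			return budget;
      
      		/* Rule 1: only re-arm interrupts if NAPI really
      		 * completed, i.e. we're not about to be busy-polled. */
      		if (likely(napi_complete_done(napi, work_done)))
      			example_enable_irq(napi);
      
      		/* Rule 2: report the actual amount of work done. */
      		return work_done;
      	}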
  13. 15 November 2018 (1 commit)
  14. 08 November 2018 (1 commit)
  15. 26 October 2018 (1 commit)
  16. 26 September 2018 (2 commits)
  17. 30 August 2018 (5 commits)
  18. 08 August 2018 (1 commit)
  19. 28 June 2018 (1 commit)
  20. 20 June 2018 (1 commit)
  21. 05 June 2018 (2 commits)
  22. 03 June 2018 (2 commits)
  23. 25 May 2018 (1 commit)
    • xdp: change ndo_xdp_xmit API to support bulking · 735fc405
      Committed by Jesper Dangaard Brouer
      This patch changes the ndo_xdp_xmit API to support bulking of
      xdp_frames (the new callback shape is sketched after this entry).
      
      When the kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge
      slowdown. Most of the slowdown is caused by DMA API indirect
      function calls, but also by the net_device->ndo_xdp_xmit() call.
      
      Benchmarking the patch with CONFIG_RETPOLINE, using xdp_redirect_map
      with a single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
      improved performance:
       for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
       for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps
      
      With frames available as a bulk inside the driver's ndo_xdp_xmit
      call, further optimizations are possible, like bulk DMA-mapping
      for TX.
      
      Testing without CONFIG_RETPOLINE shows the same performance for
      physical NIC drivers.
      
      The virtual NIC driver tun sees a huge performance boost, as it can
      avoid per-frame producer locking and instead amortize the locking
      cost over the bulk.
      
      V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
      V4: Isolated ndo, driver changes and callers.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      735fc405
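      
      A sketch of the callback change as of this patchset (the later
      flags argument is not yet present); the return-value note is an
      assumption drawn from the drivers converted here:
      
      	/* before: one frame per (retpoline-expensive) indirect call */
      	int (*ndo_xdp_xmit)(struct net_device *dev, struct xdp_frame *xdpf);
      
      	/* after: a bulk of n frames per call; the driver returns how
      	 * many of the n frames it accepted for transmit */
      	int (*ndo_xdp_xmit)(struct net_device *dev, int n,
      			    struct xdp_frame **xdp);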
  24. 01 May 2018 (1 commit)
  25. 28 April 2018 (1 commit)
  26. 17 April 2018 (3 commits)
    • xdp: transition into using xdp_frame for ndo_xdp_xmit · 44fa2dbd
      Committed by Jesper Dangaard Brouer
      Change the ndo_xdp_xmit API to take a struct xdp_frame instead of a
      struct xdp_buff. This brings xdp_return_frame and ndo_xdp_xmit in
      sync (the conversion step is sketched after this entry).
      
      This builds towards changing the API further to become a bulk API,
      because xdp_buff is not a queue-able object while xdp_frame is.
      
      V4: Adjust for commit 59655a5b ("tuntap: XDP_TX can use native XDP")
      V7: Adjust for commit d9314c47 ("i40e: add support for XDP_REDIRECT")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      44fa2dbd
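      
      A sketch of the redirect transmit path after this change, assuming
      the convert_to_xdp_frame() helper of this era; the call site and
      error handling are illustrative:
      
      	struct xdp_frame *xdpf = convert_to_xdp_frame(xdp);
      
      	if (unlikely(!xdpf))
      		return -EOVERFLOW;	/* headroom too small for metadata */
      
      	err = dev->netdev_ops->ndo_xdp_xmit(dev, xdpf);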
    • xdp: transition into using xdp_frame for return API · 03993094
      Committed by Jesper Dangaard Brouer
      Changing the xdp_return_frame() API to take a struct xdp_frame as
      its argument seems like a natural choice, but there are some subtle
      performance details here that need extra care.
      
      Dereferencing the xdp_frame on a remote CPU during DMA-TX
      completion changes the cache line to the "Shared" state. Later,
      when the page is reused for RX, this xdp_frame cache line is
      written, which changes the state to "Modified".
      
      This situation already happens (naturally) for virtio_net, tun and
      cpumap, as the xdp_frame pointer is the queued object. In tun and
      cpumap, the ptr_ring is used for efficiently transferring cache
      lines (with pointers) between CPUs. Thus, the only option is to
      dereference the xdp_frame.
      
      It is only the ixgbe driver that had an optimization by which it
      could avoid dereferencing the xdp_frame. The driver already has a
      TX-ring queue, which (in the case of remote DMA-TX completion) has
      to be transferred between CPUs anyhow. In this data area we stored
      a struct xdp_mem_info and a data pointer, which allowed us to avoid
      dereferencing the xdp_frame.
      
      To compensate for this, a prefetchw is used to tell the cache
      coherency protocol about our access pattern (sketched after this
      entry). My benchmarks show that this prefetchw is enough to
      compensate for it in the ixgbe driver.
      
      V7: Adjust for commit d9314c47 ("i40e: add support for XDP_REDIRECT")
      V8: Adjust for commit bd658dda ("net/mlx5e: Separate dma base address
      and offset in dma_sync call")
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      03993094
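      
      A sketch of the compensating prefetch on the RX side: the
      xdp_frame metadata lives at the start of the packet headroom, so
      warming that cache line for write hides the Shared-to-Modified
      transition left behind by the remote TX-completion dereference.
      The exact placement is an assumption:
      
      	prefetchw(xdp.data_hard_start);	/* xdp_frame area will be written */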
    • i40e: convert to use generic xdp_frame and xdp_return_frame API · b411ef11
      Committed by Jesper Dangaard Brouer
      Also convert the i40e driver, which very recently got XDP_REDIRECT
      support in commit d9314c47 ("i40e: add support for XDP_REDIRECT").
      
      V7: This patch got added in V7 of this patchset.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b411ef11
  27. 27 March 2018 (3 commits)