1. 20 Jul 2021, 4 commits
  2. 17 Apr 2021, 1 commit
  3. 12 Apr 2021, 3 commits
    • veth: refine napi usage · 47e550e0
      Authored by Paolo Abeni
      After the previous patch, when enabling GRO, locally generated
      TCP traffic experiences some measurable overhead, as it traverses
      the GRO engine without any chance of aggregation.
      
      This change refines the NAPI receive path admission test to avoid
      unnecessary GRO overhead in most scenarios when GRO is enabled
      on a veth peer.
      
      Only skbs that are eligible for aggregation enter the GRO layer;
      the others go through the traditional receive path.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
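      
      A minimal C sketch of such an admission test, in the context of
      drivers/net/veth.c; the predicate name and the exact conditions are
      assumptions, not the upstream check:
      
      /* Illustrative only: pick the receive path per skb. */
      static bool veth_skb_gro_candidate(const struct sk_buff *skb)
      {
              /* Assumed heuristic: GSO packets and packets still carrying a
               * partial checksum are plausible aggregation candidates.
               */
              return skb_is_gso(skb) || skb->ip_summed == CHECKSUM_PARTIAL;
      }
      
      static void veth_rx_one(struct veth_rq *rq, struct sk_buff *skb)
      {
              if (veth_skb_gro_candidate(skb))
                      napi_gro_receive(&rq->xdp_napi, skb);   /* GRO layer */
              else
                      netif_receive_skb(skb);                 /* traditional path */
      }
      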
    • veth: allow enabling NAPI even without XDP · d3256efd
      Authored by Paolo Abeni
      Currently the veth device has the GRO feature bit set, even if
      no GRO aggregation is possible with the default configuration,
      as the veth device does not hook into the GRO engine.
      
      Flipping the GRO feature bit from user-space is a no-op unless
      XDP is enabled. In that scenario GRO could actually take place, but
      TSO is forced off on the peer device.
      
      This change allows user-space to really control the GRO feature,
      with no need for an XDP program.
      
      The GRO feature bit is now cleared by default - so that there are no
      user-visible behavior changes with the default configuration.
      
      When the GRO bit is set, the per-queue NAPI instances are initialized
      and registered. On xmit, when NAPI instances are available, we try
      to use them.
      
      Some additional checks are in place to ensure we initialize/delete NAPIs
      only when needed in case of overlapping XDP and GRO configuration
      changes.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
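      
      A minimal sketch, in the context of drivers/net/veth.c, of how a
      feature-change hook could bring the per-queue NAPIs up or down when
      only the GRO bit flips; the helper names mirror the idea and are not
      necessarily the exact upstream code:
      
      static int veth_set_features_sketch(struct net_device *dev,
                                          netdev_features_t features)
      {
              netdev_features_t changed = dev->features ^ features;
              struct veth_priv *priv = netdev_priv(dev);
      
              /* Nothing to do unless the GRO bit itself changed; when an XDP
               * program is attached, it already owns the NAPI instances.
               */
              if (!(changed & NETIF_F_GRO) || priv->_xdp_prog)
                      return 0;
      
              if (features & NETIF_F_GRO)
                      return veth_napi_enable(dev);   /* init + register per-queue NAPIs */
      
              veth_napi_del(dev);                     /* tear the NAPIs back down */
              return 0;
      }
      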
    • veth: use skb_orphan_partial instead of skb_orphan · c75fb320
      Authored by Paolo Abeni
      As described by commit 9c4c3252 ("skbuff: preserve sock
      reference when scrubbing the skb."), orphaning an skb
      in the TX path will cause out-of-order delivery.
      
      Let's use skb_orphan_partial() instead of skb_orphan(), so
      that we keep the sk around for the sake of queue selection while
      still avoiding the problem fixed by commit 4bf9ffa0 ("veth:
      Orphan skb before GRO").
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
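      
      A short sketch of the spot in the veth transmit path where the call
      changes; the surrounding context is illustrative, not the actual
      function body:
      
      static netdev_tx_t veth_xmit_sketch(struct sk_buff *skb,
                                          struct net_device *dev)
      {
              /* Previously skb_orphan(skb) dropped skb->sk entirely.
               * A partial orphan keeps the socket around for TX queue
               * selection while still detaching enough state before the
               * skb enters the peer's GRO path.
               */
              skb_orphan_partial(skb);
      
              /* ... hand the skb to the peer device as before ... */
              return NETDEV_TX_OK;
      }
      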
  4. 31 Mar 2021, 1 commit
  5. 18 Mar 2021, 1 commit
  6. 06 Mar 2021, 1 commit
  7. 04 Feb 2021, 1 commit
  8. 21 Jan 2021, 1 commit
  9. 09 Jan 2021, 2 commits
  10. 01 Dec 2020, 1 commit
  11. 17 Nov 2020, 1 commit
  12. 12 Oct 2020, 1 commit
    • bpf: Add redirect_peer helper · 9aa1206e
      Authored by Daniel Borkmann
      Add an efficient ingress-to-ingress netns switch that can be used from tc BPF
      programs in order to redirect traffic from host ns ingress into a container
      veth device ingress without having to go via the CPU backlog queue [0]. For
      local containers this can also be utilized, and the path via the CPU backlog
      queue then only needs to be taken once, not twice. On a high level this borrows
      from ipvlan, which does a similar switch in __netif_receive_skb_core() and then
      iterates via another_round. This helps to reduce latency for the mentioned use cases.
      
      Pod to remote pod with redirect(), TCP_RR [1]:
      
        # percpu_netperf 10.217.1.33
                RT_LATENCY:         122.450         (per CPU:         122.666         122.401         122.333         122.401 )
              MEAN_LATENCY:         121.210         (per CPU:         121.100         121.260         121.320         121.160 )
            STDDEV_LATENCY:         120.040         (per CPU:         119.420         119.910         125.460         115.370 )
               MIN_LATENCY:          46.500         (per CPU:          47.000          47.000          47.000          45.000 )
               P50_LATENCY:         118.500         (per CPU:         118.000         119.000         118.000         119.000 )
               P90_LATENCY:         127.500         (per CPU:         127.000         128.000         127.000         128.000 )
               P99_LATENCY:         130.750         (per CPU:         131.000         131.000         129.000         132.000 )
      
          TRANSACTION_RATE:       32666.400         (per CPU:        8152.200        8169.842        8174.439        8169.897 )
      
      Pod to remote pod with redirect_peer(), TCP_RR:
      
        # percpu_netperf 10.217.1.33
                RT_LATENCY:          44.449         (per CPU:          43.767          43.127          45.279          45.622 )
              MEAN_LATENCY:          45.065         (per CPU:          44.030          45.530          45.190          45.510 )
            STDDEV_LATENCY:          84.823         (per CPU:          66.770          97.290          84.380          90.850 )
               MIN_LATENCY:          33.500         (per CPU:          33.000          33.000          34.000          34.000 )
               P50_LATENCY:          43.250         (per CPU:          43.000          43.000          43.000          44.000 )
               P90_LATENCY:          46.750         (per CPU:          46.000          47.000          47.000          47.000 )
               P99_LATENCY:          52.750         (per CPU:          51.000          54.000          53.000          53.000 )
      
          TRANSACTION_RATE:       90039.500         (per CPU:       22848.186       23187.089       22085.077       21919.130 )
      
        [0] https://linuxplumbersconf.org/event/7/contributions/674/attachments/568/1002/plumbers_2020_cilium_load_balancer.pdf
        [1] https://github.com/borkmann/netperf_scripts/blob/master/percpu_netperf
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20201010234006.7075-3-daniel@iogearbox.net
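      
      A minimal sketch of a tc BPF program using the new helper; the ifindex
      value and the section name are assumptions for illustration:
      
      #include <linux/bpf.h>
      #include <bpf/bpf_helpers.h>
      
      /* ifindex of the host-side veth whose netns peer should receive the packet */
      #define HOST_VETH_IFINDEX 42
      
      SEC("tc")
      int redirect_to_container(struct __sk_buff *skb)
      {
              /* Switch directly into the peer's netns and deliver to the peer
               * device's ingress, bypassing the CPU backlog queue.
               */
              return bpf_redirect_peer(HOST_VETH_IFINDEX, 0);
      }
      
      char _license[] SEC("license") = "GPL";
      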
  13. 11 Sep 2020, 1 commit
    • net: remove napi_hash_del() from driver-facing API · 5198d545
      Authored by Jakub Kicinski
      We allow drivers to call napi_hash_del() before calling
      netif_napi_del() to batch RCU grace periods. This makes
      the API asymmetric and leaks internal implementation details.
      Soon we will want the grace period to protect more than just
      the NAPI hash table.
      
      Restructure the API and have drivers call a new function -
      __netif_napi_del() if they want to take care of RCU waits.
      
      Note that only core was checking the return status from
      napi_hash_del() so the new helper does not report if the
      NAPI was actually deleted.
      
      Some notes on driver oddness:
       - veth observed the grace period before calling netif_napi_del()
         but that should not matter
       - myri10ge observed normal RCU flavor
       - bnx2x and enic did not actually observe the grace period
         (unless they did so implicitly)
       - virtio_net and enic only unhashed Rx NAPIs
      
      The last two points seem to indicate that the calls to
      napi_hash_del() were a leftover rather than an optimization.
      Regardless, it's easy enough to correct them.
      
      This patch may introduce extra synchronize_net() calls for
      interfaces which set NAPI_STATE_NO_BUSY_POLL and depend on
      free_netdev() to call netif_napi_del(). This seems inevitable
      since we want to use RCU for netpoll dev->napi_list traversal,
      and almost no drivers set IFF_DISABLE_NETPOLL.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
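      
      A small sketch of the new driver-side pattern described above, for a
      hypothetical multi-queue driver that wants to batch the RCU wait
      itself (struct drv_priv and its fields are assumptions):
      
      static void drv_del_napis(struct drv_priv *priv)
      {
              int i;
      
              /* __netif_napi_del() unhashes and unlinks but does not wait */
              for (i = 0; i < priv->num_queues; i++)
                      __netif_napi_del(&priv->queues[i].napi);
      
              /* one grace period covers every instance deleted above */
              synchronize_net();
      }
      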
  14. 24 Aug 2020, 1 commit
  15. 20 Aug 2020, 1 commit
  16. 26 Jul 2020, 1 commit
  17. 02 Jun 2020, 2 commits
  18. 15 May 2020, 2 commits
  19. 27 Mar 2020, 2 commits
  20. 20 Mar 2020, 5 commits
  21. 06 Mar 2020, 1 commit
    • veth: ignore peer tx_dropped when counting local rx_dropped · e25d5dbc
      Authored by Jiang Lidong
      When local NET_RX backlog is full due to traffic overrun,
      peer veth tx_dropped counter increases. At that time, list
      local veth stats, rx_dropped has double value of peer
      tx_dropped, even bigger than transmit packets by peer.
      
      In NET_RX softirq process, if any packet drop case happens,
      it increases dev's rx_dropped counter and returns NET_RX_DROP.
      
      At veth tx side, it records any error returned from peer netif_rx
      into local dev tx_dropped counter.
      
      In veth get stats process, it puts local dev rx_dropped and
      peer dev tx_dropped into together as local rx_drpped value.
      So that it shows double value of real dropped packets number in
      this case.
      
      This patch ignores peer tx_dropped when counting local rx_dropped,
      since peer tx_dropped is duplicated to local rx_dropped at most cases.
      Signed-off-by: Jiang Lidong <jianglidong3@jd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
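      
      A rough sketch, in the context of drivers/net/veth.c, of what the stats
      aggregation looks like once the peer's tx_dropped is left out; the
      counter storage shown here is illustrative, not the driver's actual
      per-cpu layout:
      
      static void veth_get_stats64_sketch(struct net_device *dev,
                                          struct rtnl_link_stats64 *tot)
      {
              struct veth_priv *priv = netdev_priv(dev);
              struct net_device *peer;
      
              tot->rx_dropped = dev->stats.rx_dropped;   /* local drops only */
      
              rcu_read_lock();
              peer = rcu_dereference(priv->peer);
              if (peer) {
                      /* Before this patch the peer's tx_dropped was also folded
                       * into tot->rx_dropped here, double counting the same
                       * dropped packets; it is now left out.
                       */
              }
              rcu_read_unlock();
      }
      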
  22. 27 Jan 2020, 1 commit
    • bpf, xdp: Remove no longer required rcu_read_{un}lock() · b23bfa56
      Authored by John Fastabend
      Now that we depend on call_rcu() and synchronize_rcu() to also wait
      for preempt-disabled regions to complete, the RCU read-side critical
      section in __dev_map_flush() is no longer required, except in a few
      special cases in drivers that need it for other reasons.
      
      These originally ensured the map reference was safe while a map was
      also being freed, and additionally that bpf program updates via
      ndo_bpf did not happen while flush updates were in flight. But under
      the new rules flush can only be called from preempt-disabled NAPI
      context. The synchronize_rcu() on the map free path and the call_rcu()
      on the delete path ensure the reference there is safe. So let's remove
      the rcu_read_lock()/rcu_read_unlock() pair to avoid any confusion
      around how this is being protected.
      
      If the rcu_read_lock() were required, it would mean the above logic
      was in error and the original patch would also be wrong.
      
      With the above done, the rcu_read_lock() is placed in the driver
      code where it is needed, in a driver-dependent way. I think this
      helps readability of the code, so we know where and why we are
      taking read locks. Most drivers will not need rcu_read_lock() here,
      and XDP drivers already take rcu_read_lock() in their RX paths when
      reading xdp programs, so this makes things symmetric instead of having
      half of the RCU critical sections defined in the driver and the other
      half in devmap.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/bpf/1580084042-11598-4-git-send-email-john.fastabend@gmail.com
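      
      A small sketch of the driver-side placement described above, for a
      hypothetical driver whose XDP transmit path dereferences RCU-protected
      state; the helper and structure names are assumptions:
      
      static int drv_xdp_xmit_sketch(struct net_device *dev, int n,
                                     struct xdp_frame **frames, u32 flags)
      {
              struct drv_priv *priv = netdev_priv(dev);
              int sent;
      
              rcu_read_lock();        /* taken here, in the driver, not in devmap */
              sent = drv_xmit_frames(priv, frames, n);   /* hypothetical helper */
              rcu_read_unlock();
      
              return sent;
      }
      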
  23. 17 Jan 2020, 1 commit
    • xdp: Use bulking for non-map XDP_REDIRECT and consolidate code paths · 1d233886
      Authored by Toke Høiland-Jørgensen
      Since the bulk queue used by XDP_REDIRECT now lives in struct net_device,
      we can re-use the bulking for the non-map version of the bpf_redirect()
      helper. This is a simple matter of having xdp_do_redirect_slow() queue the
      frame on the bulk queue instead of sending it out with __bpf_tx_xdp().
      
      Unfortunately we can't make the bpf_redirect() helper return an error if
      the ifindex doesn't exist (as bpf_redirect_map() does), because we don't
      have a reference to the network namespace of the ingress device at the time
      the helper is called. So we have to leave it as-is and keep the device
      lookup in xdp_do_redirect_slow().
      
      This leaves less reason to have the non-map redirect code in a
      separate function, so we get rid of the xdp_do_redirect_slow() function
      entirely. This does lose us the tracepoint disambiguation, but fortunately
      the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint
      entry structures. This means both can contain a map index, so we can just
      amend the tracepoint definitions so we always emit the xdp_redirect(_err)
      tracepoints, but with the map ID only populated if a map is present. This
      means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep
      the definitions around in case someone is still listening for them.
      
      With this change, the performance of the xdp_redirect sample program goes
      from 5Mpps to 8.4Mpps (a 68% increase).
      
      Since the flush functions are no longer map-specific, rename the flush()
      functions to drop _map from their names. One of the renamed functions is
      the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To
      keep from having to update all drivers, use a #define to keep the old name
      working, and only update the virtual drivers in this patch.
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/157918768505.1458396.17518057312953572912.stgit@toke.dk
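      
      A sketch of the rename-with-alias trick mentioned above, so drivers
      still calling the old name keep building while the virtual drivers
      move to the new one; header placement and the driver snippet are
      illustrative:
      
      /* header excerpt, placement illustrative */
      void xdp_do_flush(void);                        /* new, map-agnostic name */
      #define xdp_do_flush_map        xdp_do_flush    /* old name kept as an alias */
      
      /* tail of a driver's NAPI poll loop, illustrative */
      static void drv_xdp_finish_sketch(bool did_redirect)
      {
              if (did_redirect)
                      xdp_do_flush();         /* flush the per-CPU bulk queues */
      }
      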
  24. 08 Nov 2019, 1 commit
  25. 25 Jun 2019, 1 commit
    • veth: Support bulk XDP_TX · 9cda7807
      Authored by Toshiaki Makita
      XDP_TX is similar to XDP_REDIRECT in that it essentially redirects packets
      to the device itself. XDP_REDIRECT has a bulk transmit mechanism to avoid
      the heavy cost of indirect calls, and it also reduces lock acquisitions on
      destination devices that need locks, like veth and tun.
      
      XDP_TX does not use indirect calls, but drivers which require locks can
      benefit from the bulk transmit for XDP_TX as well.
      
      This patch introduces a bulk transmit mechanism in veth using a bulk queue
      on the stack, and improves XDP_TX performance by about 9%.
      
      Here are single-core/single-flow XDP_TX test results. CPU consumptions
      are taken from "perf report --no-child".
      
      - Before:
      
        7.26 Mpps
      
        _raw_spin_lock  7.83%
        veth_xdp_xmit  12.23%
      
      - After:
      
        7.94 Mpps
      
        _raw_spin_lock  1.08%
        veth_xdp_xmit   6.10%
      
      v2:
      - Use stack for bulk queue instead of a global variable.
      Signed-off-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
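      
      A sketch of an on-stack bulk queue for XDP_TX as described above:
      frames accumulate on the stack and are flushed in a single
      veth_xdp_xmit() call so the destination lock is taken once per batch;
      the size constant and helper names are assumptions:
      
      #define VETH_XDP_TX_BULK_SIZE   16
      
      struct veth_xdp_tx_bq {
              struct xdp_frame *q[VETH_XDP_TX_BULK_SIZE];
              unsigned int count;
      };
      
      static void veth_xdp_flush_bq_sketch(struct net_device *dev,
                                           struct veth_xdp_tx_bq *bq)
      {
              if (!bq->count)
                      return;
              /* one locked transmit for the whole batch */
              veth_xdp_xmit(dev, bq->count, bq->q, 0);
              bq->count = 0;
      }
      
      static void veth_xdp_tx_enqueue_sketch(struct net_device *dev,
                                             struct veth_xdp_tx_bq *bq,
                                             struct xdp_frame *frame)
      {
              if (bq->count == VETH_XDP_TX_BULK_SIZE)
                      veth_xdp_flush_bq_sketch(dev, bq);
              bq->q[bq->count++] = frame;
      }
      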
  26. 19 Jun 2019, 1 commit
  27. 21 May 2019, 1 commit