Commits · fae55527ac1164b66bee983a4d82ade2bfedb332 · openeuler / Kernel

18 Aug, 2019 · 12 commits

selftests/bpf: fix race in test_tcp_rtt test · fae55527

Committed by Petar Penkov on Aug 16, 2019

There is a race in this test between receiving the ACK for the
single-byte packet sent in the test, and reading the values from the
map.

This patch fixes this by having the client wait until there are no more
unacknowledged packets.

Before:
for i in {1..1000}; do ../net/in_netns.sh ./test_tcp_rtt; \
done | grep -c PASSED
< trimmed error messages >
993

After:
for i in {1..10000}; do ../net/in_netns.sh ./test_tcp_rtt; \
done | grep -c PASSED
10000
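
The waiting step described above could be sketched as follows. This is a minimal sketch, assuming the client polls TCP_INFO and looks at tcpi_unacked; the function name, retry count, and sleep interval are illustrative choices and not taken from the patch text.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes struct tcp_info and usleep() on glibc */
#endif
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Poll TCP_INFO until the socket reports no unacknowledged segments.
 * Returns 0 once everything is ACKed, -1 on a getsockopt() error or
 * after `retries` polls without success.
 */
int wait_for_ack(int fd, int retries)
{
	struct tcp_info info;
	socklen_t optlen;
	int i;

	for (i = 0; i < retries; i++) {
		optlen = sizeof(info);
		if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &optlen) < 0)
			return -1;
		if (info.tcpi_unacked == 0)
			return 0;
		usleep(10); /* arbitrary back-off between polls */
	}
	return -1;
}
```

A caller would invoke this after sending the single byte and before reading the map, so the RTT callback has had a chance to run.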

Fixes: b5587398 ("selftests/bpf: test BPF_SOCK_OPS_RTT_CB")
Signed-off-by: Petar Penkov <ppenkov@google.com>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

fae55527

libbpf: relicense bpf_helpers.h and bpf_endian.h · 929ffa6e

Committed by Andrii Nakryiko on Aug 15, 2019

bpf_helpers.h and bpf_endian.h contain useful macros and BPF helper
definitions essential to almost every BPF program, which makes them
useful not just for selftests. To be able to expose them as part of
libbpf, though, we need them to be dual-licensed as LGPL-2.1 OR
BSD-2-Clause. This patch updates licensing of those two files.
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Hechao Li <hechaol@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Lawrence Brakmo <brakmo@fb.com>
Acked-by: Adam Barth <arb@fb.com>
Acked-by: Roman Gushchin <guro@fb.com>
Acked-by: Josef Bacik <jbacik@fb.com>
Acked-by: Joe Stringer <joe@wand.net.nz>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Acked-by: David Ahern <dsahern@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Acked-by: Lorenz Bauer <lmb@cloudflare.com>
Acked-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Acked-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Petar Penkov <ppenkov@google.com>
Acked-by: Teng Qin <palmtenor@gmail.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Michal Rostecki <mrostecki@opensuse.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

929ffa6e

net: Don't call XDP_SETUP_PROG when nothing is changed · c14a9f63

Committed by Maxim Mikityanskiy on Aug 14, 2019

Don't uninstall an XDP program when none is installed, and don't install
an XDP program that has the same ID as the one already installed.

dev_change_xdp_fd doesn't perform any checks in case it uninstalls an
XDP program. It means that the driver's ndo_bpf can be called with
XDP_SETUP_PROG asking to set it to NULL even if it's already NULL. This
case happens if the user runs `ip link set eth0 xdp off` when there is
no XDP program attached.

The symmetrical case is possible when the user tries to set the program
that is already set.

The drivers typically perform some heavy operations on XDP_SETUP_PROG,
so they all have to handle these cases internally to return early if
they happen. This patch puts this check into the kernel code, so that
all drivers will benefit from it.
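
The early-return check described above can be modelled in plain userspace C. This is a toy sketch, assuming that program ID 0 stands for "no program attached"; the struct and function names are hypothetical and do not correspond to the kernel's dev_change_xdp_fd implementation.

```c
#include <stdint.h>

/* Toy device: prog_id 0 means "no XDP program attached". */
struct toy_dev {
	uint32_t prog_id;
	unsigned int setup_calls; /* how often the "driver" had to reconfigure */
};

/* Stand-in for the driver's expensive XDP_SETUP_PROG handling. */
void toy_driver_setup(struct toy_dev *dev, uint32_t prog_id)
{
	dev->prog_id = prog_id;
	dev->setup_calls++;
}

/* Core-side check: only call into the driver when something changes. */
void toy_change_xdp(struct toy_dev *dev, uint32_t requested_id)
{
	/* Covers both "off" with nothing attached and re-attaching the same ID. */
	if (dev->prog_id == requested_id)
		return;
	toy_driver_setup(dev, requested_id);
}
```

Keeping the comparison in the common path means each driver no longer needs its own copy of it.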
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

c14a9f63

Merge branch 'bpf-af-xdp-wakeup' · c8186c80

Committed by Daniel Borkmann on Aug 17, 2019

Magnus Karlsson says:

====================
This patch set adds support for a new flag called need_wakeup in the
AF_XDP Tx and fill rings. When this flag is set by the driver, it
means that the application has to explicitly wake up the kernel Rx
(for the bit in the fill ring) or kernel Tx (for the bit in the Tx ring)
processing by issuing a syscall. poll() can wake up both and sendto()
will wake up Tx processing only.

The main reason for introducing this new flag is to be able to
efficiently support the case when application and driver are executing
on the same core. Previously, the driver was just busy-spinning on the
fill ring if it ran out of buffers in the HW and there were none to
get from the fill ring. This approach works when the application and
driver are running on different cores as the application can replenish
the fill ring while the driver is busy-spinning. Though, this is a
lousy approach if both of them are running on the same core as the
probability of the fill ring getting more entries when the driver is
busy-spinning is zero. With this new feature the driver now sets the
need_wakeup flag and returns to the application. The application can
then replenish the fill queue and then explicitly wake up the Rx
processing in the kernel using the syscall poll(). For Tx, the flag is
only set to one if the driver has no outstanding Tx completion
interrupts. If it has some, the flag is zero as it will be woken up by
a completion interrupt anyway. This flag can also be used in other
situations where the driver needs to be woken up explicitly.

As a nice side effect, this new flag also improves the Tx performance
of the case where application and driver are running on two different
cores as it reduces the number of syscalls to the kernel. The kernel
tells user space if it needs to be woken up by a syscall, and this
eliminates many of the syscalls. The Rx performance of the 2-core case
is on the other hand slightly worse, since there is a need to use a
syscall now to wake up the driver, instead of the driver
busy-spinning. It does waste fewer CPU cycles though, which might lead
to better overall system performance.

This new flag needs some simple driver support. If the driver does not
support it, the Rx flag is always zero and the Tx flag is always
one. This makes any application relying on this feature default to the
old behavior of not requiring any syscalls in the Rx path and always
having to call sendto() in the Tx path.

For backwards compatibility reasons, this feature has to be explicitly
turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend
that you always turn it on as it has a large positive performance
impact for the one core case and does not degrade 2 core performance
and actually improves it for Tx heavy workloads.

Here are some performance numbers measured on my local,
non-performance optimized development system. That is why you are
seeing numbers lower than the ones from Björn and Jesper. 64 byte
packets at 40Gbit/s line rate. All results in Mpps. Cores == 1 means
that both application and driver are executing on the same core. Cores
== 2 means that they are on different cores.

                              Applications
need_wakeup  cores    txpush    rxdrop      l2fwd
---------------------------------------------------------------
     n         1       0.07      0.06        0.03
     y         1       21.6      8.2         6.5
     n         2       32.3      11.7        8.7
     y         2       33.1      11.7        8.7

Overall, the need_wakeup flag provides the same or better performance
in all the micro-benchmarks. The reduction of sendto() calls in txpush
is large. Only a few per second are needed. For l2fwd, the drop is 50%
for the 1 core case and more than 99.9% for the 2 core case. Do not
know why I am not seeing the same drop for the 1 core case yet.

The name and inspiration of the flag has been taken from io_uring by
Jens Axboe. Details about this feature in io_uring can be found in
http://kernel.dk/io_uring.pdf, section 8.3. It also addresses most of
the denial of service and sendto() concerns raised by Maxim
Mikityanskiy in https://www.spinics.net/lists/netdev/msg554657.html.

The typical Tx part of an application will have to change from:

ret = sendto(fd,....)

to:

if (xsk_ring_prod__needs_wakeup(&xsk->tx))
       ret = sendto(fd,....)

and the Rx part from:

rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
if (!rcvd)
       return;

to:

rcvd = xsk_ring_cons__peek(&xsk->rx, BATCH_SIZE, &idx_rx);
if (!rcvd) {
       if (xsk_ring_prod__needs_wakeup(&xsk->umem->fq))
              ret = poll(fd,.....);
       return;
}

v3 -> v4:
* Maxim found a possible race in the Tx part of the driver. The
  setting of the flag needs to happen before the sending, otherwise it
  might trigger this race. Fixed in ixgbe and i40e driver.
* Mellanox support contributed by Maxim
* Removed the XSK_DRV_CAN_SLEEP flag as it was not used
  anymore. Thanks to Sridhar for discovering this.
* For consistency the feature is now always called need_wakeup. There
  were some places where it was referred to as might_sleep, but they
  have been removed. Thanks to Sridhar for spotting.
* Fixed some typos in the commit messages

v2 -> v3:
* Converted the Mellanox driver to the new ndo in patch 1 as pointed
  out by Maxim
* Fixed the compatibility code of XDP_MMAP_OFFSETS so it now works.

v1 -> v2:
* Fixed bisectability problem pointed out by Jakub
* Added missing initialization of the Tx need_wakeup flag to 1

This patch has been applied against commit b753c5a7 ("Merge branch 'r8152-RX-improve'")

Structure of the patch set:

Patch 1: Replaces the ndo_xsk_async_xmit with ndo_xsk_wakeup to
         support waking up both Rx and Tx processing
Patch 2: Implements the need_wakeup functionality in common code
Patch 3-4: Add need_wakeup support to the i40e and ixgbe drivers
Patch 5: Add need_wakeup support to libbpf
Patch 6: Add need_wakeup support to the xdpsock sample application
Patch 7-8: Add need_wakeup support to the Mellanox mlx5 driver
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

c8186c80

net/mlx5e: Add AF_XDP need_wakeup support · a7bd4018

Committed by Maxim Mikityanskiy on Aug 14, 2019

This commit adds support for the new need_wakeup feature of AF_XDP. The
applications can opt-in by using the XDP_USE_NEED_WAKEUP bind() flag.
When this feature is enabled, some behavior changes:

RX side: If the Fill Ring is empty, instead of busy-polling, set the
flag to tell the application to kick the driver when it refills the Fill
Ring.

TX side: If there are pending completions or packets queued for
transmission, set the flag to tell the application that it can skip the
sendto() syscall and save time.

The performance testing was performed on a machine with the following
configuration:

- 24 cores of Intel Xeon E5-2620 v3 @ 2.40 GHz
- Mellanox ConnectX-5 Ex with 100 Gbit/s link

The results with retpoline disabled:

       | without need_wakeup  | with need_wakeup     |
       |----------------------|----------------------|
       | one core | two cores | one core | two cores |
-------|----------|-----------|----------|-----------|
txonly | 20.1     | 33.5      | 29.0     | 34.2      |
rxdrop | 0.065    | 14.1      | 12.0     | 14.1      |
l2fwd  | 0.032    | 7.3       | 6.6      | 7.2       |

"One core" means the application and NAPI run on the same core. "Two
cores" means they are pinned to different cores.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a7bd4018

net/mlx5e: Move the SW XSK code from NAPI poll to a separate function · 871aa189

Committed by Maxim Mikityanskiy on Aug 14, 2019

Two XSK tasks are performed during NAPI polling, that are not bound to
hardware interrupts: TXing packets and polling for frames in the Fill
Ring. They are special in that the hardware doesn't know about
these tasks, so it doesn't trigger interrupts if there is still some
work to be done; it's our driver's responsibility to ensure NAPI will be
rescheduled if needed.

Create a new function to handle these tasks and move the corresponding
code from mlx5e_napi_poll to the new function to improve modularity and
prepare for the changes in the following patch.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

871aa189

samples/bpf: add use of need_wakeup flag in xdpsock · 46738f73

Committed by Magnus Karlsson on Aug 14, 2019

This commit adds use of the need_wakeup flag to the xdpsock sample
application. It is turned on by default as we think it is a feature
that seems to always produce a performance benefit, if the application
has been written taking advantage of it. It can be turned off in the
sample app by using the '-m' command line option.

The txpush and l2fwd sub applications have also been updated to
support poll() with multiple sockets.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

46738f73

libbpf: add support for need_wakeup flag in AF_XDP part · a4500432

Committed by Magnus Karlsson on Aug 14, 2019

This commit adds support for the new need_wakeup flag in AF_XDP. The
xsk_socket__create function is updated to handle this and a new
function is introduced called xsk_ring_prod__needs_wakeup(). This
function can be used by the application to check if Rx and/or Tx
processing needs to be explicitly woken up.
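
The check an application performs can be illustrated with a toy model that does not depend on libbpf. This is only a sketch, assuming the flag is a single bit in a per-ring flags word; the `toy_` names and the bit value are stand-ins, not the real libbpf or kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_NEED_WAKEUP (1u << 0) /* stand-in for the kernel's flag bit */

struct toy_ring {
	uint32_t flags; /* set by the driver, read by the application */
};

/* True when the driver asked to be woken up by a syscall. */
bool toy_ring_needs_wakeup(const struct toy_ring *r)
{
	return (r->flags & TOY_RING_NEED_WAKEUP) != 0;
}

/* Count how many of n observed Tx ring states would require a sendto() kick. */
size_t toy_count_tx_kicks(const struct toy_ring *snapshots, size_t n)
{
	size_t i, kicks = 0;

	for (i = 0; i < n; i++)
		if (toy_ring_needs_wakeup(&snapshots[i]))
			kicks++;
	return kicks;
}
```

Without the flag, every one of the n states would cost a syscall; with it, only the states where the bit is set do.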
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a4500432

ixgbe: add support for AF_XDP need_wakeup feature · 5c129241

Committed by Magnus Karlsson on Aug 14, 2019

This patch adds support for the need_wakeup feature of AF_XDP. If the
application has told the kernel that it might sleep using the new bind
flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has
no more buffers on the NIC Rx ring and yield to the application. For
Tx, it will set the flag if it has no outstanding Tx completion
interrupts and return to the application.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

5c129241

i40e: add support for AF_XDP need_wakeup feature · 3d0c5f1c

Committed by Magnus Karlsson on Aug 14, 2019

This patch adds support for the need_wakeup feature of AF_XDP. If the
application has told the kernel that it might sleep using the new bind
flag XDP_USE_NEED_WAKEUP, the driver will then set this flag if it has
no more buffers on the NIC Rx ring and yield to the application. For
Tx, it will set the flag if it has no outstanding Tx completion
interrupts and return to the application.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

3d0c5f1c

xsk: add support for need_wakeup flag in AF_XDP rings · 77cd0d7b

Committed by Magnus Karlsson on Aug 14, 2019

This commit adds support for a new flag called need_wakeup in the
AF_XDP Tx and fill rings. When this flag is set, it means that the
application has to explicitly wake up the kernel Rx (for the bit in
the fill ring) or kernel Tx (for the bit in the Tx ring) processing by
issuing a syscall. poll() can wake up both depending on the flags
submitted and sendto() will wake up tx processing only.

The main reason for introducing this new flag is to be able to
efficiently support the case when application and driver are executing
on the same core. Previously, the driver was just busy-spinning on the
fill ring if it ran out of buffers in the HW and there were none on
the fill ring. This approach works when the application is running on
another core as it can replenish the fill ring while the driver is
busy-spinning. Though, this is a lousy approach if both of them are
running on the same core as the probability of the fill ring getting
more entries when the driver is busy-spinning is zero. With this new
feature the driver now sets the need_wakeup flag and returns to the
application. The application can then replenish the fill queue and
then explicitly wake up the Rx processing in the kernel using the
syscall poll(). For Tx, the flag is only set to one if the driver has
no outstanding Tx completion interrupts. If it has some, the flag is
zero as it will be woken up by a completion interrupt anyway.

As a nice side effect, this new flag also improves the performance of
the case where application and driver are running on two different
cores as it reduces the number of syscalls to the kernel. The kernel
tells user space if it needs to be woken up by a syscall, and this
eliminates many of the syscalls.

This flag needs some simple driver support. If the driver does not
support this, the Rx flag is always zero and the Tx flag is always
one. This makes any application relying on this feature default to the
old behaviour of not requiring any syscalls in the Rx path and always
having to call sendto() in the Tx path.

For backwards compatibility reasons, this feature has to be explicitly
turned on using a new bind flag (XDP_USE_NEED_WAKEUP). I recommend
that you always turn it on as it has so far always had a positive
performance impact.

The name and inspiration of the flag has been taken from io_uring by
Jens Axboe. Details about this feature in io_uring can be found in
http://kernel.dk/io_uring.pdf, section 8.3.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

77cd0d7b

xsk: replace ndo_xsk_async_xmit with ndo_xsk_wakeup · 9116e5e2

Committed by Magnus Karlsson on Aug 14, 2019

This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new
ndo provides the same functionality as before but with the addition of
a new flags field that is used to specify if Rx, Tx or both should be
woken up. The previous ndo only woke up Tx, as implied by the
name. The i40e and ixgbe drivers (which are all the supported ones)
are updated with this new interface.

This new ndo will be used by the new need_wakeup functionality of XDP
sockets that need to be able to wake up both Rx and Tx driver
processing.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

9116e5e2

16 Aug, 2019 · 14 commits

btf: fix return value check in btf_vmlinux_init() · e0325006

Committed by Wei Yongjun on Aug 16, 2019

In case of error, the function kobject_create_and_add() returns a NULL
pointer, not ERR_PTR(). The IS_ERR() test in the return value check
should be replaced with a NULL test.
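
Why the IS_ERR() test misses a NULL return can be seen with a small userspace re-creation of the error-pointer convention. This is a sketch, assuming the usual encoding where the top 4095 addresses represent error codes; the lowercase helper names are local to this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr(long error)
{
	return (void *)error;
}

/* True only for pointers in the last MAX_ERRNO addresses; NULL is not one of them. */
bool is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Correct failure check for a function that reports errors by returning NULL. */
bool create_failed(const void *obj)
{
	return obj == NULL;
}
```

Since is_err(NULL) evaluates to false, an IS_ERR()-style check lets a failed NULL result pass as success.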

Fixes: 341dfcf8 ("btf: expose BTF info through sysfs")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

e0325006

Merge branch 'fix-printf' · 82c4c3b7

Committed by Alexei Starovoitov on Aug 15, 2019

Quentin Monnet says:

====================
Because the "__printf()" attributes were used only where the functions are
implemented, and not in header files, the checks have not been enforced on
all the calls to printf()-like functions, and a number of errors slipped in
bpftool over time.

This set cleans up such errors, and then moves the "__printf()" attributes
to header files, so that the checks are performed at all locations.
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

82c4c3b7

tools: bpftool: move "__printf()" attributes to header file · 8918dc42

Committed by Quentin Monnet on Aug 15, 2019

Some functions in bpftool have "__printf()" format attributes to tell
the compiler they should expect printf()-like arguments. But because
these attributes are not used for the function prototypes in the header
files, the compiler does not run the checks everywhere the functions are
used, and some mistakes on format string and corresponding arguments
slipped in over time.

Let's move the __printf() attributes to the correct places.
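
The effect of placing the attribute on the prototype can be sketched with a small stand-alone function. This is an illustrative example, assuming GCC or Clang; `format_err` is a hypothetical helper and not one of bpftool's functions.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Prototype as it would appear in a header. Because the attribute sits on
 * the declaration that callers see, the compiler checks the format string
 * against the arguments at every call site.
 */
int format_err(char *buf, size_t len, const char *fmt, ...)
	__attribute__((format(printf, 3, 4)));

/* Definition: formats the message into buf and returns vsnprintf()'s result. */
int format_err(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With the attribute visible, a call such as format_err(buf, len, "%s", 42) would draw a -Wformat warning instead of compiling silently.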

Note: We add guards around the definition of GCC_VERSION in
tools/include/linux/compiler-gcc.h to prevent a conflict in jit_disasm.c
on GCC_VERSION from headers pulled via libbfd.

Fixes: c101189b ("tools: bpftool: fix -Wmissing declaration warnings")
Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8918dc42

tools: bpftool: fix format string for p_err() in detect_common_prefix() · b0ead6d7

Committed by Quentin Monnet on Aug 15, 2019

There is one call to the p_err() function in detect_common_prefix()
where the message to print is passed directly as the first argument,
without using a format string. This is harmless, but may trigger
warnings if the "__printf()" attribute is used correctly for the p_err()
function. Let's fix it by using a "%s" format string.

Fixes: ba95c745 ("tools: bpftool: add "prog run" subcommand to test-run programs")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b0ead6d7

tools: bpftool: fix format string for p_err() in query_flow_dissector() · 8a15d5ce

Committed by Quentin Monnet on Aug 15, 2019

The format string passed to one call to the p_err() function in
query_flow_dissector() does not match the value that should be printed,
resulting in some garbage integer being printed instead of
strerror(errno) if /proc/self/ns/net cannot be opened. Let's fix the
format string.

Fixes: 7f0c57fe ("bpftool: show flow_dissector attachment status")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

8a15d5ce

tools: bpftool: fix argument for p_err() in BTF do_dump() · ed4a3983

Committed by Quentin Monnet on Aug 15, 2019

The last argument passed to one call to the p_err() function is not
correct, it should be "*argv" instead of "**argv". This may lead to a
segmentation fault error if BTF id cannot be parsed correctly. Let's fix
this.
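
The difference between the two arguments can be shown with a small parser. This is a sketch with a hypothetical `parse_id` helper and message text; it is not the bpftool code, only an instance of passing `*argv` (a string) to a "%s" conversion.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Parse *argv as an unsigned ID. On failure, write a message quoting the
 * offending token into errbuf and return -1; return 0 on success.
 */
int parse_id(char **argv, unsigned long *id, char *errbuf, size_t len)
{
	char *end;

	errno = 0;
	*id = strtoul(*argv, &end, 0);
	if (errno || end == *argv || *end != '\0') {
		/* "%s" expects a char *: *argv is the token, **argv is only its first char. */
		snprintf(errbuf, len, "can't parse %s as ID", *argv);
		return -1;
	}
	return 0;
}
```

Passing `**argv` here would hand a small integer to "%s", which is then dereferenced as an address and can crash.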

Fixes: c93cc690 ("bpftool: add ability to dump BTF types")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

ed4a3983

tools: bpftool: fix format strings and arguments for jsonw_printf() · 22c349e8

Committed by Quentin Monnet on Aug 15, 2019

There are some mismatches between format strings and arguments passed to
jsonw_printf() in the BTF dumper for bpftool, which seem harmless but
may result in warnings if the "__printf()" attribute is used correctly
for jsonw_printf(). Let's fix relevant format strings and type cast.

Fixes: b12d6ec0 ("bpf: btf: add btf print functionality")
Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

22c349e8

tools: bpftool: fix arguments for p_err() in do_event_pipe() · 9def249d

Committed by Quentin Monnet on Aug 15, 2019

The last argument passed to some calls to the p_err() function is not
correct, it should be "*argv" instead of "**argv". This may lead to a
segmentation fault error if CPU IDs or indices from the command line
cannot be parsed correctly. Let's fix this.

Fixes: f412eed9 ("tools: bpftool: add simple perf event output reader")
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9def249d

libbpf: make libbpf.map source of truth for libbpf version · dadb81d0

Committed by Andrii Nakryiko on Aug 14, 2019

Currently the libbpf version is specified in 2 places: libbpf.map and
Makefile. They easily get out of sync and it's very easy to update one
but forget to update the other. In addition, the GitHub projection of
libbpf has to maintain its own version, which has to be kept in sync
manually, which is a very error-prone approach.

This patch makes libbpf.map the source of truth for the libbpf version and
uses shell invocation to parse out the correct full and major libbpf version
to use during build. Now, once a new release cycle starts, we need to
make sure to add an (initially) empty section to libbpf.map
with the correct latest version.

This also will make it possible to keep the GitHub projection consistent
with kernel sources version of libbpf by adopting similar parsing of
version from libbpf.map.

v2->v3:
- grep -o + sort -rV (Andrey);

v1->v2:
- eager version vars evaluation (Jakub);
- simplified version regex (Andrey);

Cc: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Acked-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

dadb81d0

Merge branch 'bpftool-net-attach' · 37b7c058

Committed by Alexei Starovoitov on Aug 15, 2019

Daniel T. Lee says:

====================
Currently, bpftool net only supports dumping progs attached on the
interface. To attach an XDP prog on an interface, the user must use another
tool (e.g. iproute2). With this patch, `bpftool net attach/detach` lets the
user attach/detach an XDP prog on an interface.

    # bpftool prog
        16: xdp  name xdp_prog1  tag 539ec6ce11b52f98  gpl
        loaded_at 2019-08-07T08:30:17+0900  uid 0
        ...
        20: xdp  name xdp_fwd_prog  tag b9cb69f121e4a274  gpl
        loaded_at 2019-08-07T08:30:17+0900  uid 0

    # bpftool net attach xdpdrv id 16 dev enp6s0np0
    # bpftool net
    xdp:
        enp6s0np0(4) driver id 16

    # bpftool net attach xdpdrv id 20 dev enp6s0np0 overwrite
    # bpftool net
    xdp:
        enp6s0np0(4) driver id 20

    # bpftool net detach xdpdrv dev enp6s0np0
    # bpftool net
    xdp:

While this patch only contains support for XDP, through `net
attach/detach`, bpftool can further support other prog attach types.

XDP attach/detach tested on Mellanox ConnectX-4 and Netronome Agilio.

---
Changes in v5:
  - fix wrong error message, from errno to err with do_attach/detach

Changes in v4:
  - rename variable, attach/detach error message enhancement
  - bash-completion cleanup, doc update with brief description (attach
    types)

Changes in v3:
  - added 'overwrite' option for replacing previously attached XDP prog
  - command argument order has been changed ('ATTACH_TYPE' comes first)
  - add 'dev' keyword in front of <devname>
  - added bash-completion and documentation

Changes in v2:
  - command 'load/unload' changed to 'attach/detach' for the consistency
====================
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

37b7c058

tools: bpftool: add documentation for net attach/detach · cb9d9968

Committed by Daniel T. Lee on Aug 13, 2019

Since the new sub-command 'net attach/detach' has been added for
attaching an XDP program on an interface, this commit documents the
usage and sample output of `net attach/detach`.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

cb9d9968

tools: bpftool: add bash-completion for net attach/detach · 10a708c2

Committed by Daniel T. Lee on Aug 13, 2019

This commit adds bash-completion for the new "net attach/detach"
subcommand for attaching an XDP program on an interface.
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

10a708c2

tools: bpftool: add net detach command to detach XDP on interface · 37c7f863

Committed by Daniel T. Lee on Aug 13, 2019

With this commit, the attached XDP prog can be detached using
`bpftool net detach`. Detaching the BPF prog is done through libbpf
'bpf_set_link_xdp_fd' with the progfd set to -1.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

37c7f863

tools: bpftool: add net attach command to attach XDP on interface · 04949ccc

Committed by Daniel T. Lee on Aug 13, 2019

With this commit, the user can attach an XDP prog on an interface using
`bpftool net attach`. A new enum type 'net_attach_type' has been added; as
stated in the cover letter, 'attach' means that the prog will be attached
on an interface.

With the 'overwrite' option as an argument, an attached XDP program can be
replaced. A new helper 'net_parse_dev' has been added to parse the network
device argument.

BPF prog will be attached through libbpf 'bpf_set_link_xdp_fd'.
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

04949ccc

15 Aug, 2019 · 1 commit

tools: bpftool: compile with $(EXTRA_WARNINGS) · a9436dca

Committed by Quentin Monnet on Aug 14, 2019

Compile bpftool with $(EXTRA_WARNINGS), as defined in
scripts/Makefile.include, and fix the new warnings produced.

Simply leave -Wswitch-enum out of the warning list, as we have several
switch-case structures where it is not desirable to process all values
of an enum.

Remove -Wshadow from the warnings we manually add to CFLAGS, as it is
handled in $(EXTRA_WARNINGS).
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

a9436dca

14 Aug, 2019 · 13 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · c162610c

Committed by Jakub Kicinski on Aug 13, 2019

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for net-next:

1) Rename mss field to mss_option field in synproxy, from Fernando Mancera.

2) Use SYSCTL_{ZERO,ONE} definitions in conntrack, from Matteo Croce.

3) More strict validation of IPVS sysctl values, from Junwei Hu.

4) Remove unnecessary spaces on the right-hand side of assignments,
   from yangxingwu.

5) Add offload support for bitwise operation.

6) Extend the nft_offload_reg structure to store immediate data.

7) Collapse several ip_set header files into ip_set.h, from
   Jeremy Sowden.

8) Make netfilter headers compile with CONFIG_KERNEL_HEADER_TEST=y,
   from Jeremy Sowden.

9) Fix several sparse warnings due to missing prototypes, from
   Valdis Kletnieks.

10) Use static lock initialiser to ensure connlabel spinlock is
    initialized at boot time to fix sched/act_ct.c, patch
    from Florian Westphal.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

c162610c

Merge branch 'r8152-RX-improve' · b753c5a7

Committed by Jakub Kicinski on Aug 13, 2019

Hayes says:

====================
v2:
For patch #2, replace list_for_each_safe with list_for_each_entry_safe.
Remove unlikely in WARN_ON. Adjust the coding style.

For patch #4, replace list_for_each_safe with list_for_each_entry_safe.
Remove "else" after "continue".

For patch #5, replace sysfs with ethtool to modify rx_copybreak and
rx_pending.

v1:
Different chips use different rx buffer sizes.

Use skb_add_rx_frag() to reduce memory copy for RX.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

b753c5a7

r8152: change rx_copybreak and rx_pending through ethtool · e4a5017a

Committed by Hayes Wang on Aug 13, 2019

Allow rx_copybreak and rx_pending to be modified through
ethtool.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

e4a5017a

r8152: support skb_add_rx_frag · 47922fcd

Committed by Hayes Wang on Aug 13, 2019

Use skb_add_rx_frag() to reduce the memory copy for rx data.

Use a new rx_used list to store the rx buffers that cannot be
reused yet.

Also, the total number of rx buffers may be increased or decreased
dynamically, limited by RTL8152_MAX_RX_AGG.
Signed-off-by: NHayes Wang <hayeswang@realtek.com>
Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>

47922fcd
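
The trade-off behind skb_add_rx_frag() and a copybreak threshold can be sketched in plain userspace C. This is an illustration only, not the r8152 code: `struct rx_pkt`, `rx_deliver()`, and the threshold values are hypothetical. Only the idea comes from the commit messages: small packets are copied, while large ones reference the receive buffer in place, which is why such a buffer cannot be reused immediately.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a received packet; not a kernel structure. */
struct rx_pkt {
	unsigned char linear[256]; /* small copy area, like an skb's linear part */
	size_t linear_len;         /* bytes copied into linear[] */
	const unsigned char *frag; /* borrowed pointer into the rx buffer, or NULL */
	size_t frag_len;           /* bytes referenced through frag */
};

/*
 * Deliver len bytes located at data inside a receive buffer.
 * Returns 1 if the rx buffer is still referenced by the packet (so it
 * cannot be reused yet), 0 if the data was copied and the buffer is free.
 */
static int rx_deliver(struct rx_pkt *pkt, const unsigned char *data,
		      size_t len, size_t copybreak)
{
	memset(pkt, 0, sizeof(*pkt));

	if (len < copybreak && len <= sizeof(pkt->linear)) {
		/* Small packet: copying is cheaper than pinning the buffer. */
		memcpy(pkt->linear, data, len);
		pkt->linear_len = len;
		return 0;
	}

	/* Large packet: reference the data in place instead of copying it. */
	pkt->frag = data;
	pkt->frag_len = len;
	return 1;
}
```

A return value of 1 corresponds to the situation in which a driver would park the buffer on a list such as rx_used until the packet is consumed.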

r8152: use alloc_pages for rx buffer · d55d7089

Committed by Hayes Wang on Aug 13, 2019

Replace kmalloc_node() with alloc_pages() for rx buffer.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

d55d7089

r8152: replace array with linking list for rx information · 252df8b8

Committed by Hayes Wang on Aug 13, 2019

The original method uses an array to store the rx information. The
new one uses a list to link each rx structure, which makes it possible
to increase or decrease the number of rx structures dynamically.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

252df8b8
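
Why a list makes the number of rx structures adjustable can be shown with a minimal userspace sketch. The names `struct rx_desc`, `struct rx_pool`, `rx_pool_grow()`, and `rx_pool_shrink()` are hypothetical; the driver itself uses the kernel's list helpers and a richer structure.

```c
#include <stdlib.h>

/* Hypothetical rx descriptor; the driver's real structure has more fields. */
struct rx_desc {
	struct rx_desc *next;
	void *buffer;
};

struct rx_pool {
	struct rx_desc *head;
	size_t count;
};

/* Add one descriptor to the pool; returns 0 on success, -1 on allocation failure. */
static int rx_pool_grow(struct rx_pool *pool, size_t buf_size)
{
	struct rx_desc *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->buffer = malloc(buf_size);
	if (!d->buffer) {
		free(d);
		return -1;
	}
	d->next = pool->head;
	pool->head = d;
	pool->count++;
	return 0;
}

/* Remove one descriptor from the pool; returns 0, or -1 if the pool is empty. */
static int rx_pool_shrink(struct rx_pool *pool)
{
	struct rx_desc *d = pool->head;

	if (!d)
		return -1;
	pool->head = d->next;
	pool->count--;
	free(d->buffer);
	free(d);
	return 0;
}
```

With a fixed array the capacity is decided at compile time; with a list, growing or shrinking is a constant-time pointer update.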

r8152: separate the rx buffer size · ec5791c2

Committed by Hayes Wang on Aug 13, 2019

Different chips may accept different rx buffer sizes. The RTL8152
supports 16K bytes, and the RTL8153 supports 32K bytes.
Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

ec5791c2
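
A per-chip buffer size can be expressed as a small lookup. The sketch below uses the two sizes stated in the commit message; the enum `chip_family` and the function `rx_buf_size()` are hypothetical names, and the driver identifies chips by its own version codes.

```c
#include <stddef.h>

/* Hypothetical chip identifiers; the driver distinguishes chips differently. */
enum chip_family { CHIP_RTL8152, CHIP_RTL8153 };

/* Return the rx buffer size in bytes for a chip family. */
static size_t rx_buf_size(enum chip_family chip)
{
	switch (chip) {
	case CHIP_RTL8152:
		return 16 * 1024;
	case CHIP_RTL8153:
		return 32 * 1024;
	}
	return 0; /* unknown chip */
}
```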

Merge branch 'net-phy-let-phy_speed_down-up-support-speeds-1Gbps' · e070ca37

Committed by Jakub Kicinski on Aug 13, 2019

Heiner says:

====================
So far phy_speed_down/up can be used up to 1Gbps only. Remove this
restriction and add needed helpers to phy-core.c

v2:
- remove unused parameter in patch 1
- rename __phy_speed_down to phy_speed_down_core in patch 2
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

e070ca37

net: phy: let phy_speed_down/up support speeds >1Gbps · 65b27995

Committed by Heiner Kallweit on Aug 12, 2019

So far phy_speed_down/up can be used up to 1Gbps only. Remove this
restriction by using the new helper phy_speed_down_core. The new member
adv_old in struct phy_device is used by phy_speed_up to restore the modes
that were advertised before phy_speed_down was called. Don't simply
advertise what is supported, because a user may have intentionally
removed modes from the advertisement.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

65b27995
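
The save-and-restore behaviour described for adv_old can be sketched as follows. This is a simplified userspace model, not the phylib code: link modes are a plain bitmask, and `struct phy_sketch`, `speed_down()`, and `speed_up()` are hypothetical names.

```c
/* Hypothetical PHY state; link modes are modelled as a plain bitmask. */
struct phy_sketch {
	unsigned int advertising; /* modes currently advertised */
	unsigned int adv_old;     /* modes advertised before speed-down, 0 if none saved */
};

/* Restrict advertisement to slow_modes, remembering what was advertised before. */
static void speed_down(struct phy_sketch *phy, unsigned int slow_modes)
{
	unsigned int reduced = phy->advertising & slow_modes;

	if (reduced == phy->advertising)
		return; /* nothing to remove, nothing to save */
	phy->adv_old = phy->advertising;
	phy->advertising = reduced;
}

/* Restore exactly the saved advertisement, not everything the PHY supports. */
static void speed_up(struct phy_sketch *phy)
{
	if (!phy->adv_old)
		return; /* speed_down() did not change anything */
	phy->advertising = phy->adv_old;
	phy->adv_old = 0;
}
```

Restoring from the saved value rather than from the supported set preserves any modes the user deliberately removed from the advertisement.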

net: phy: add phy_speed_down_core and phy_resolve_min_speed · 331c56ac

Committed by Heiner Kallweit on Aug 12, 2019

phy_speed_down_core provides most of the functionality for
phy_speed_down. It makes use of new helper phy_resolve_min_speed that is
based on the sorting of the settings[] array. In certain cases it may be
helpful to be able to exclude legacy half duplex modes, therefore
prepare phy_resolve_min_speed() for it.

v2:
- rename __phy_speed_down to phy_speed_down_core
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

331c56ac
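
How a sorted settings table yields the minimum common speed can be illustrated with a small userspace sketch. The table contents, bit assignments, and the names `struct setting`, `resolve_min_speed()`, and `SPEED_UNKNOWN_SKETCH` are hypothetical; only the approach (rely on the table's sort order, optionally skip half-duplex modes) is taken from the commit message.

```c
#include <stddef.h>

#define SPEED_UNKNOWN_SKETCH (-1) /* hypothetical "no common mode" value */

/* Hypothetical settings table, sorted from fastest to slowest. */
struct setting { int speed; int full_duplex; unsigned int bit; };

static const struct setting settings[] = {
	{ 1000, 1, 1u << 5 },
	{ 1000, 0, 1u << 4 },
	{ 100,  1, 1u << 3 },
	{ 100,  0, 1u << 2 },
	{ 10,   1, 1u << 1 },
	{ 10,   0, 1u << 0 },
};

/*
 * Return the lowest speed advertised by both sides. Because the table is
 * sorted by descending speed, scanning from the end finds the minimum first.
 */
static int resolve_min_speed(unsigned int local, unsigned int partner,
			     int fdx_only)
{
	unsigned int common = local & partner;
	size_t i = sizeof(settings) / sizeof(settings[0]);

	while (i-- > 0) {
		if (!(common & settings[i].bit))
			continue;
		if (fdx_only && !settings[i].full_duplex)
			continue; /* skip legacy half-duplex modes */
		return settings[i].speed;
	}
	return SPEED_UNKNOWN_SKETCH;
}
```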

net: phy: add __set_linkmode_max_speed · 7b261e0e

Committed by Heiner Kallweit on Aug 12, 2019

We will need the functionality of __set_linkmode_max_speed also for
linkmode bitmaps other than phydev->supported. Therefore split it.

v2:
- remove unused parameter from __set_linkmode_max_speed
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

7b261e0e
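
The point of the split is that the speed cap operates on whichever bitmap the caller passes in, not only on the supported set. A minimal userspace sketch of that idea, with a hypothetical table and a hypothetical function `limit_max_speed()`:

```c
#include <stddef.h>

/* Hypothetical link-mode table; bit numbers and layout are illustrative only. */
struct mode { int speed; unsigned int bit; };

static const struct mode modes[] = {
	{ 10000, 1u << 4 },
	{ 1000,  1u << 3 },
	{ 100,   1u << 2 },
	{ 10,    1u << 1 },
};

/* Clear every mode faster than max_speed from the caller-supplied bitmap. */
static void limit_max_speed(int max_speed, unsigned int *bitmap)
{
	size_t i;

	for (i = 0; i < sizeof(modes) / sizeof(modes[0]); i++) {
		if (modes[i].speed > max_speed)
			*bitmap &= ~modes[i].bit;
	}
}
```

The same function can then be applied to a "supported" bitmap or to an "advertising" bitmap without duplication.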

net: devlink: remove redundant rtnl lock assert · 043b8413

Committed by Vlad Buslov on Aug 12, 2019

It is enough for the caller of devlink_compat_switch_id_get() to hold the net
device to guarantee that the devlink port is not destroyed concurrently. Remove
the rtnl lock assertion and modify the comment to warn users that they must hold
either the rtnl lock or a reference to the net device. This is necessary to
accommodate the future implementation of rtnl-unlocked TC offload driver
callbacks.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

043b8413

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 708852dc

Committed by Jakub Kicinski on Aug 13, 2019

Daniel Borkmann says:

====================
The following pull-request contains BPF updates for your *net-next* tree.

There is a small merge conflict in libbpf (Cc Andrii so he's in the loop
as well):

        for (i = 1; i <= btf__get_nr_types(btf); i++) {
                t = (struct btf_type *)btf__type_by_id(btf, i);

                if (!has_datasec && btf_is_var(t)) {
                        /* replace VAR with INT */
                        t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
  <<<<<<< HEAD
                        /*
                         * using size = 1 is the safest choice, 4 will be too
                         * big and cause kernel BTF validation failure if
                         * original variable took less than 4 bytes
                         */
                        t->size = 1;
                        *(int *)(t+1) = BTF_INT_ENC(0, 0, 8);
                } else if (!has_datasec && kind == BTF_KIND_DATASEC) {
  =======
                        t->size = sizeof(int);
                        *(int *)(t + 1) = BTF_INT_ENC(0, 0, 32);
                } else if (!has_datasec && btf_is_datasec(t)) {
  >>>>>>> 72ef80b5
                        /* replace DATASEC with STRUCT */

The conflict is between the two commits 1d4126c4 ("libbpf: sanitize VAR to
conservative 1-byte INT") and b03bc685 ("libbpf: convert libbpf code to
use new btf helpers"), so we need to pick the sanitization fixup as well as
use the new btf_is_datasec() helper and the whitespace cleanup. The result
looks like the following:

  [...]
                if (!has_datasec && btf_is_var(t)) {
                        /* replace VAR with INT */
                        t->info = BTF_INFO_ENC(BTF_KIND_INT, 0, 0);
                        /*
                         * using size = 1 is the safest choice, 4 will be too
                         * big and cause kernel BTF validation failure if
                         * original variable took less than 4 bytes
                         */
                        t->size = 1;
                        *(int *)(t + 1) = BTF_INT_ENC(0, 0, 8);
                } else if (!has_datasec && btf_is_datasec(t)) {
                        /* replace DATASEC with STRUCT */
  [...]
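
The two sides of the conflict differ in how wide the replacement INT is: 1 byte and 8 bits on one side, sizeof(int) and 32 bits on the other. The sketch below shows how the chosen values pack into the type words. The `_SKETCH` macros are written from memory of libbpf's encoding helpers and should be treated as an assumption, and `struct int_type_sketch` is a hypothetical stand-in rather than the real struct btf_type.

```c
#include <stdint.h>

/* Assumed to match libbpf's internal encoding macros; verify before relying on them. */
#define BTF_KIND_INT_SKETCH 1
#define BTF_INFO_ENC_SKETCH(kind, kind_flag, vlen) \
	(((uint32_t)!!(kind_flag) << 31) | ((uint32_t)(kind) << 24) | ((vlen) & 0xffff))
#define BTF_INT_ENC_SKETCH(encoding, bits_offset, nr_bits) \
	(((uint32_t)(encoding) << 24) | ((uint32_t)(bits_offset) << 16) | (nr_bits))

/* Minimal stand-in for the type record followed by its INT descriptor word. */
struct int_type_sketch {
	uint32_t info;
	uint32_t size;     /* size of the type in bytes */
	uint32_t int_data; /* encoding, bit offset, and width in bits */
};

/* Build the conservative replacement for a VAR: a 1-byte, 8-bit INT. */
static struct int_type_sketch sanitized_var(void)
{
	struct int_type_sketch t;

	t.info = BTF_INFO_ENC_SKETCH(BTF_KIND_INT_SKETCH, 0, 0);
	t.size = 1;
	t.int_data = BTF_INT_ENC_SKETCH(0, 0, 8);
	return t;
}
```

The byte size and the bit width have to agree (1 byte with 8 bits), which is why the resolution takes both lines from the sanitization commit rather than mixing the two sides.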

The main changes are:

1) Addition of core parts of the compile once - run everywhere (co-re) effort,
   that is, relocation of field offsets in libbpf as well as exposure of the
   kernel's own BTF via sysfs and loading through libbpf, from Andrii.

   More info on co-re: http://vger.kernel.org/bpfconf2019.html#session-2
   and http://vger.kernel.org/lpc-bpf2018.html#session-2

2) Enable passing input flags to the BPF flow dissector to customize parsing
   and allow it to stop early, similar to the C-based one, from Stanislav.

3) Add a BPF helper function that allows generating SYN cookies from XDP and
   tc BPF, from Petar.

4) Add devmap hash-based map type for more flexibility in device lookup for
   redirects, from Toke.

5) Improvements to XDP forwarding sample code now utilizing recently enabled
   devmap lookups, from Jesper.

6) Add support for reporting the effective cgroup progs in bpftool, from Jakub
   and Takshak.

7) Fix reading kernel config from bpftool via /proc/config.gz, from Peter.

8) Fix AF_XDP umem pages mapping for 32 bit architectures, from Ivan.

9) Follow-up to add two more BPF loop tests for the selftest suite, from Alexei.

10) Add perf event output helper also for other skb-based program types, from Allan.

11) Fix a co-re related compilation error in selftests, from Yonghong.
====================
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>

708852dc

openeuler / Kernel: synced successfully more than 1 year ago
