提交 · 3131cf66d3033aa40db5b5d72a2673315b61c862 · openeuler / Kernel

15 9月, 2020 2 次提交

samples/bpf: Fix one packet sending in xdpsock · 3131cf66

由 Magnus Karlsson 提交于 9月 10, 2020

Fix the sending of a single packet (or small burst) in xdpsock when
executing in copy mode. Currently, the l2fwd application in xdpsock
only transmits the packets after a batch of them has been received,
which might be confusing if you only send one packet and expect that
it is returned pronto. Fix this by calling sendto() more often and add
a comment in the code that states that this can be optimized if
needed.
Reported-by: NTirthendu Sarkar <tirthendu.sarkar@intel.com>
Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/1599726666-8431-2-git-send-email-magnus.karlsson@gmail.com

3131cf66

s390/bpf: Fix multiple tail calls · d72714c1

由 Ilya Leoshkevich 提交于 9月 10, 2020

In order to branch around tail calls (due to out-of-bounds index,
exceeding tail call count or missing tail call target), JIT uses
label[0] field, which contains the address of the instruction following
the tail call. When there are multiple tail calls, label[0] value comes
from handling of a previous tail call, which is incorrect.

Fix by getting rid of label array and resolving the label address
locally: for all 3 branches that jump to it, emit 0 offsets at the
beginning, and then backpatch them with the correct value.

Also, do not use the long jump infrastructure: the tail call sequence
is known to be short, so make all 3 jumps short.

Fixes: 6651ee07 ("s390/bpf: implement bpf_tail_call() helper")
Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200909232141.3099367-1-iii@linux.ibm.com

d72714c1

11 9月, 2020 22 次提交

Merge branch 'improve-bpf-tcp-cc-init' · 2bab48c5

由 Alexei Starovoitov 提交于 9月 10, 2020

Neal Cardwell says:

====================
This patch series reorganizes TCP congestion control initialization so that if
EBPF code called by tcp_init_transfer() sets the congestion control algorithm
by calling setsockopt(TCP_CONGESTION) then the TCP stack initializes the
congestion control module immediately, instead of having tcp_init_transfer()
later initialize the congestion control module.

This increases flexibility for the EBPF code that runs at connection
establishment time, and simplifies the code.

This has the following benefits:

(1) This allows CC module customizations made by the EBPF called in
    tcp_init_transfer() to persist, and not be wiped out by a later
    call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
    for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for for CC in the case where the
    EBPF called in tcp_init_transfer() wants to set the CC to a new CC
    algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
    and net/ipv4/tcp_cong.c, which currently both have some complexity
    to special-case CC initialization to avoid double CC
    initialization if EBPF sets the CC.

changes in v2:

o rebase onto bpf-next

o add another follow-on simplification suggested by Martin KaFai Lau:
   "tcp: simplify tcp_set_congestion_control() load=false case"

changes in v3:

o no change in commits

o resent patch series from @gmail.com, since mail from ncardwell@google.com
  stopped being accepted at netdev@vger.kernel.org mid-way through processing
  the v2 patch series (between patches 2 and 3), confusing patchwork about
  which patches belonged to the v2 patch series
====================
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

2bab48c5

tcp: Simplify tcp_set_congestion_control() load=false case · 5050bef8

由 Neal Cardwell 提交于 9月 10, 2020

Simplify tcp_set_congestion_control() by removing the initialization
code path for the !load case.

There are only two call sites for tcp_set_congestion_control(). The
EBPF call site is the only one that passes load=false; it also passes
cap_net_admin=true. Because of that, the exact same behavior can be
achieved by removing the special if (!load) branch of the logic. Both
before and after this commit, the EBPF case will call
bpf_try_module_get(), and if that succeeds then call
tcp_reinit_congestion_control() or if that fails then return EBUSY.

Note that this returns the logic to a structure very similar to the
structure before:
  commit 91b5b21c ("bpf: Add support for changing congestion control")
except that the CAP_NET_ADMIN status is passed in as a function
argument.

This clean-up was suggested by Martin KaFai Lau.
Suggested-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Lawrence Brakmo <brakmo@fb.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Kevin Yang <yyd@google.com>

5050bef8

tcp: simplify _bpf_setsockopt(): Remove flags argument · 5cdc744c

由 Neal Cardwell 提交于 9月 10, 2020

Now that the previous patches have removed the code that uses the
flags argument to _bpf_setsockopt(), we can remove that argument.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NKevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>

5cdc744c

tcp: simplify tcp_set_congestion_control(): Always reinitialize · 29a94932

由 Neal Cardwell 提交于 9月 10, 2020

Now that the previous patches ensure that all call sites for
tcp_set_congestion_control() want to initialize congestion control, we
can simplify tcp_set_congestion_control() by removing the reinit
argument and the code to support it.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NKevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>

29a94932

tcp: Simplify EBPF TCP_CONGESTION to always init CC · e7b10a4d

由 Neal Cardwell 提交于 9月 10, 2020

Now that the previous patch ensures we don't initialize the congestion
control twice, when EBPF sets the congestion control algorithm at
connection establishment we can simplify the code by simply
initializing the congestion control module at that time.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NKevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>

e7b10a4d

tcp: Only init congestion control if not initialized already · 8919a9b3

由 Neal Cardwell 提交于 9月 10, 2020

Change tcp_init_transfer() to only initialize congestion control if it
has not been initialized already.

With this new approach, we can arrange things so that if the EBPF code
sets the congestion control by calling setsockopt(TCP_CONGESTION) then
tcp_init_transfer() will not re-initialize the CC module.

This is an approach that has the following beneficial properties:

(1) This allows CC module customizations made by the EBPF called in
    tcp_init_transfer() to persist, and not be wiped out by a later
    call to tcp_init_congestion_control() in tcp_init_transfer().

(2) Does not flip the order of EBPF and CC init, to avoid causing bugs
    for existing code upstream that depends on the current order.

(3) Does not cause 2 initializations for for CC in the case where the
    EBPF called in tcp_init_transfer() wants to set the CC to a new CC
    algorithm.

(4) Allows follow-on simplifications to the code in net/core/filter.c
    and net/ipv4/tcp_cong.c, which currently both have some complexity
    to special-case CC initialization to avoid double CC
    initialization if EBPF sets the CC.
Signed-off-by: NNeal Cardwell <ncardwell@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYuchung Cheng <ycheng@google.com>
Acked-by: NKevin Yang <yyd@google.com>
Cc: Lawrence Brakmo <brakmo@fb.com>

8919a9b3

tools: bpftool: Automate generation for "SEE ALSO" sections in man pages · 18841da9

由 Quentin Monnet 提交于 9月 10, 2020

The "SEE ALSO" sections of bpftool's manual pages refer to bpf(2),
bpf-helpers(7), then all existing bpftool man pages (save the current
one).

This leads to nearly-identical lists being duplicated in all manual
pages. Ideally, when a new page is created, all lists should be updated
accordingly, but this has led to omissions and inconsistencies multiple
times in the past.

Let's take it out of the RST files and generate the "SEE ALSO" sections
automatically in the Makefile when generating the man pages. The lists
are not really useful in the RST anyway because all other pages are
available in the same directory.

v3:
- Fix conflict with a previous patchset that introduced RST2MAN_OPTS
  variable passed to rst2man.

v2:
- Use "echo -n" instead of "printf" in Makefile, to avoid any risk of
  passing a format string directly to the command.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200910203935.25304-1-quentin@isovalent.com

18841da9

bpf: Fix comment for helper bpf_current_task_under_cgroup() · 1aef5b43

由 Song Liu 提交于 9月 10, 2020

This should be "current" not "skb".

Fixes: c6b5fb86 ("bpf: add documentation for eBPF helpers (42-50)")
Signed-off-by: NSong Liu <songliubraving@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/bpf/20200910203314.70018-1-songliubraving@fb.com

1aef5b43

selftests/bpf: Define string const as global for test_sysctl_prog.c · 6e057fc1

由 Yonghong Song 提交于 9月 10, 2020

When tweaking llvm optimizations, I found that selftest build failed
with the following error:
libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1
libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name'
in section '.rodata.str1.1'
Error: failed to open BPF object file: Relocation failed
make: *** [/work/net-next/tools/testing/selftests/bpf/test_sysctl_prog.skel.h] Error 255
make: *** Deleting file `/work/net-next/tools/testing/selftests/bpf/test_sysctl_prog.skel.h'

The local string constant "tcp_mem_name" is put into '.rodata.str1.1' section
which libbpf cannot handle. Using untweaked upstream llvm, "tcp_mem_name"
is completely inlined after loop unrolling.

Commit 7fb5eefd ("selftests/bpf: Fix test_sysctl_loop{1, 2}
failure due to clang change") solved a similar problem by defining
the string const as a global. Let us do the same here
for test_sysctl_prog.c so it can weather future potential llvm changes.
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200910202718.956042-1-yhs@fb.com

6e057fc1

selftests/bpf: Fix test_ksyms on non-SMP kernels · 90a1deda

由 Ilya Leoshkevich 提交于 9月 10, 2020

On non-SMP kernels __per_cpu_start is not 0, so look it up in kallsyms.
Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200910171336.3161995-1-iii@linux.ibm.com

90a1deda

bpf: Plug hole in struct bpf_sk_lookup_kern · d66423fb

由 Lorenz Bauer 提交于 9月 10, 2020

As Alexei points out, struct bpf_sk_lookup_kern has two 4-byte holes.
This leads to suboptimal instructions being generated (IPv4, x86):

    1372                    struct bpf_sk_lookup_kern ctx = {
       0xffffffff81b87f30 <+624>:   xor    %eax,%eax
       0xffffffff81b87f32 <+626>:   mov    $0x6,%ecx
       0xffffffff81b87f37 <+631>:   lea    0x90(%rsp),%rdi
       0xffffffff81b87f3f <+639>:   movl   $0x110002,0x88(%rsp)
       0xffffffff81b87f4a <+650>:   rep stos %rax,%es:(%rdi)
       0xffffffff81b87f4d <+653>:   mov    0x8(%rsp),%eax
       0xffffffff81b87f51 <+657>:   mov    %r13d,0x90(%rsp)
       0xffffffff81b87f59 <+665>:   incl   %gs:0x7e4970a0(%rip)
       0xffffffff81b87f60 <+672>:   mov    %eax,0x8c(%rsp)
       0xffffffff81b87f67 <+679>:   movzwl 0x10(%rsp),%eax
       0xffffffff81b87f6c <+684>:   mov    %ax,0xa8(%rsp)
       0xffffffff81b87f74 <+692>:   movzwl 0x38(%rsp),%eax
       0xffffffff81b87f79 <+697>:   mov    %ax,0xaa(%rsp)

Fix this by moving around sport and dport. pahole confirms there
are no more holes:

    struct bpf_sk_lookup_kern {
        u16                        family;       /*     0     2 */
        u16                        protocol;     /*     2     2 */
        __be16                     sport;        /*     4     2 */
        u16                        dport;        /*     6     2 */
        struct {
                __be32             saddr;        /*     8     4 */
                __be32             daddr;        /*    12     4 */
        } v4;                                    /*     8     8 */
        struct {
                const struct in6_addr  * saddr;  /*    16     8 */
                const struct in6_addr  * daddr;  /*    24     8 */
        } v6;                                    /*    16    16 */
        struct sock *              selected_sk;  /*    32     8 */
        bool                       no_reuseport; /*    40     1 */

        /* size: 48, cachelines: 1, members: 8 */
        /* padding: 7 */
        /* last cacheline: 48 bytes */
    };

The assembly also doesn't contain the pesky rep stos anymore:

    1372                    struct bpf_sk_lookup_kern ctx = {
       0xffffffff81b87f60 <+624>:   movzwl 0x10(%rsp),%eax
       0xffffffff81b87f65 <+629>:   movq   $0x0,0xa8(%rsp)
       0xffffffff81b87f71 <+641>:   movq   $0x0,0xb0(%rsp)
       0xffffffff81b87f7d <+653>:   mov    %ax,0x9c(%rsp)
       0xffffffff81b87f85 <+661>:   movzwl 0x38(%rsp),%eax
       0xffffffff81b87f8a <+666>:   movq   $0x0,0xb8(%rsp)
       0xffffffff81b87f96 <+678>:   mov    %ax,0x9e(%rsp)
       0xffffffff81b87f9e <+686>:   mov    0x8(%rsp),%eax
       0xffffffff81b87fa2 <+690>:   movq   $0x0,0xc0(%rsp)
       0xffffffff81b87fae <+702>:   movl   $0x110002,0x98(%rsp)
       0xffffffff81b87fb9 <+713>:   mov    %eax,0xa0(%rsp)
       0xffffffff81b87fc0 <+720>:   mov    %r13d,0xa4(%rsp)

1: https://lore.kernel.org/bpf/CAADnVQKE6y9h2fwX6OS837v-Uf+aBXnT_JXiN_bbo2gitZQ3tA@mail.gmail.com/

Fixes: e9ddbb77 ("bpf: Introduce SK_LOOKUP program type with a dedicated attach point")
Suggested-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Link: https://lore.kernel.org/bpf/20200910110248.198326-1-lmb@cloudflare.com

d66423fb

tools: bpftool: Add "inner_map" to "bpftool map create" outer maps · e3b9626f

由 Quentin Monnet 提交于 9月 10, 2020

There is no support for creating maps of types array-of-map or
hash-of-map in bpftool. This is because the kernel needs an inner_map_fd
to collect metadata on the inner maps to be supported by the new map,
but bpftool does not provide a way to pass this file descriptor.

Add a new optional "inner_map" keyword that can be used to pass a
reference to a map, retrieve a fd to that map, and pass it as the
inner_map_fd.

Add related documentation and bash completion. Note that we can
reference the inner map by its name, meaning we can have several times
the keyword "name" with different meanings (mandatory outer map name,
and possibly a name to use to find the inner_map_fd). The bash
completion will offer it just once, and will not suggest "name" on the
following command:

    # bpftool map create /sys/fs/bpf/my_outer_map type hash_of_maps \
        inner_map name my_inner_map [TAB]

Fixing that specific case seems too convoluted. Completion will work as
expected, however, if the outer map name comes first and the "inner_map
name ..." is passed second.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200910102652.10509-4-quentin@isovalent.com

e3b9626f

tools: bpftool: Keep errors for map-of-map dumps if distinct from ENOENT · 86233ce3

由 Quentin Monnet 提交于 9月 10, 2020

When dumping outer maps or prog_array maps, and on lookup failure,
bpftool simply skips the entry with no error message. This is because
the kernel returns non-zero when no value is found for the provided key,
which frequently happen for those maps if they have not been filled.

When such a case occurs, errno is set to ENOENT. It seems unlikely we
could receive other error codes at this stage (we successfully retrieved
map info just before), but to be on the safe side, let's skip the entry
only if errno was ENOENT, and not for the other errors.

v3: New patch
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200910102652.10509-3-quentin@isovalent.com

86233ce3

tools: bpftool: Clean up function to dump map entry · a20693b6

由 Quentin Monnet 提交于 9月 10, 2020

The function used to dump a map entry in bpftool is a bit difficult to
follow, as a consequence to earlier refactorings. There is a variable
("num_elems") which does not appear to be necessary, and the error
handling would look cleaner if moved to its own function. Let's clean it
up. No functional change.

v2:
- v1 was erroneously removing the check on fd maps in an attempt to get
  support for outer map dumps. This is already working. Instead, v2
  focuses on cleaning up the dump_map_elem() function, to avoid
  similar confusion in the future.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200910102652.10509-2-quentin@isovalent.com

a20693b6

selftests: bpf: Test iterating a sockmap · 2f7de986

由 Lorenz Bauer 提交于 9月 09, 2020

Add a test that exercises a basic sockmap / sockhash iteration. For
now we simply count the number of elements seen. Once sockmap update
from iterators works we can extend this to perform a full copy.
Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200909162712.221874-4-lmb@cloudflare.com

2f7de986

net: Allow iterating sockmap and sockhash · 03653515

由 Lorenz Bauer 提交于 9月 09, 2020

Add bpf_iter support for sockmap / sockhash, based on the bpf_sk_storage and
hashtable implementation. sockmap and sockhash share the same iteration
context: a pointer to an arbitrary key and a pointer to a socket. Both
pointers may be NULL, and so BPF has to perform a NULL check before accessing
them. Technically it's not possible for sockhash iteration to yield a NULL
socket, but we ignore this to be able to use a single iteration point.

Iteration will visit all keys that remain unmodified during the lifetime of
the iterator. It may or may not visit newly added ones.

Switch from using rcu_dereference_raw to plain rcu_dereference, so we gain
another guard rail if CONFIG_PROVE_RCU is enabled.
Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200909162712.221874-3-lmb@cloudflare.com

03653515

net: sockmap: Remove unnecessary sk_fullsock checks · 654785a1

由 Lorenz Bauer 提交于 9月 09, 2020

The lookup paths for sockmap and sockhash currently include a check
that returns NULL if the socket we just found is not a full socket.
However, this check is not necessary. On insertion we ensure that
we have a full socket (caveat around sock_ops), so request sockets
are not a problem. Time-wait sockets are allocated separate from
the original socket and then fed into the hashdance. They don't
affect the sockets already stored in the sockmap.
Suggested-by: NJakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200909162712.221874-2-lmb@cloudflare.com

654785a1

tools: bpftool: Include common options from separate file · f28ef96d

由 Quentin Monnet 提交于 9月 09, 2020

Nearly all man pages for bpftool have the same common set of option
flags (--help, --version, --json, --pretty, --debug). The description is
duplicated across all the pages, which is more difficult to maintain if
the description of an option changes. It may also be confusing to sort
out what options are not "common" and should not be copied when creating
new manual pages.

Let's move the description for those common options to a separate file,
which is included with a RST directive when generating the man pages.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200909162500.17010-3-quentin@isovalent.com

f28ef96d

tools: bpftool: Print optional built-in features along with version · 82b8cf0a

由 Quentin Monnet 提交于 9月 09, 2020

Bpftool has a number of features that can be included or left aside
during compilation. This includes:

- Support for libbfd, providing the disassembler for JIT-compiled
  programs.
- Support for BPF skeletons, used for profiling programs or iterating on
  the PIDs of processes associated with BPF objects.

In order to make it easy for users to understand what features were
compiled for a given bpftool binary, print the status of the two
features above when showing the version number for bpftool ("bpftool -V"
or "bpftool version"). Document this in the main manual page. Example
invocations:

    $ bpftool version
    ./bpftool v5.9.0-rc1
    features: libbfd, skeletons

    $ bpftool -p version
    {
        "version": "5.9.0-rc1",
        "features": {
            "libbfd": true,
            "skeletons": true
        }
    }

Some other parameters are optional at compilation
("DISASM_FOUR_ARGS_SIGNATURE", LIBCAP support) but they do not impact
significantly bpftool's behaviour from a user's point of view, so their
status is not reported.

Available commands and supported program types depend on the version
number, and are therefore not reported either. Note that they are
already available, albeit without JSON, via bpftool's help messages.

v3:
- Use a simple list instead of boolean values for plain output.

v2:
- Fix JSON (object instead or array for the features).
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200909162500.17010-2-quentin@isovalent.com

82b8cf0a

selftests, bpftool: Add bpftool (and eBPF helpers) documentation build · 41d5c37b

由 Quentin Monnet 提交于 9月 09, 2020

eBPF selftests include a script to check that bpftool builds correctly
with different command lines. Let's add one build for bpftool's
documentation so as to detect errors or warning reported by rst2man when
compiling the man pages. Also add a build to the selftests Makefile to
make sure we build bpftool documentation along with bpftool when
building the selftests.

This also builds and checks warnings for the man page for eBPF helpers,
which is built along bpftool's documentation.

This change adds rst2man as a dependency for selftests (it comes with
Python's "docutils").

v2:
- Use "--exit-status=1" option for rst2man instead of counting lines
  from stderr.
- Also build bpftool as part as the selftests build (and not only when
  the tests are actually run).
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200909162251.15498-3-quentin@isovalent.com

41d5c37b

tools: bpftool: Log info-level messages when building bpftool man pages · 16f3ddfb

由 Quentin Monnet 提交于 9月 09, 2020

To build man pages for bpftool (and for eBPF helper functions), rst2man
can log different levels of information. Let's make it log all levels
to keep the RST files clean.

Doing so, rst2man complains about double colons, used for literal
blocks, that look like underlines for section titles. Let's add the
necessary blank lines.

v2:
- Use "--verbose" instead of "-r 1" (same behaviour but more readable).
- Pass it through a RST2MAN_OPTS variable so we can easily pass other
  options too.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200909162251.15498-2-quentin@isovalent.com

16f3ddfb

bpf: Remove duplicate headers · e9091bb7

由 Chen Zhou 提交于 9月 08, 2020

Remove duplicate headers which are included twice.
Signed-off-by: NChen Zhou <chenzhou10@huawei.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200908132201.184005-1-chenzhou10@huawei.com

e9091bb7

10 9月, 2020 2 次提交

perf: Stop using deprecated bpf_program__title() · 8081ede1

由 Andrii Nakryiko 提交于 9月 08, 2020

Switch from deprecated bpf_program__title() API to
bpf_program__section_name(). Also drop unnecessary error checks because
neither bpf_program__title() nor bpf_program__section_name() can fail or
return NULL.

Fixes: 52109584 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"")
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Reviewed-by: NTobias Klauser <tklauser@distanz.ch>
Acked-by: NJiri Olsa <jolsa@redhat.com>
Link: https://lore.kernel.org/bpf/20200908180127.1249-1-andriin@fb.com

8081ede1

selftests/bpf: Fix test_sysctl_loop{1, 2} failure due to clang change · 7fb5eefd

由 Yonghong Song 提交于 9月 09, 2020

Andrii reported that with latest clang, when building selftests, we have
error likes:
  error: progs/test_sysctl_loop1.c:23:16: in function sysctl_tcp_mem i32 (%struct.bpf_sysctl*):
  Looks like the BPF stack limit of 512 bytes is exceeded.
  Please move large on stack variables into BPF per-cpu array map.

The error is triggered by the following LLVM patch:
  https://reviews.llvm.org/D87134

For example, the following code is from test_sysctl_loop1.c:
  static __always_inline int is_tcp_mem(struct bpf_sysctl *ctx)
  {
    volatile char tcp_mem_name[] = "net/ipv4/tcp_mem/very_very_very_very_long_pointless_string";
    ...
  }
Without the above LLVM patch, the compiler did optimization to load the string
(59 bytes long) with 7 64bit loads, 1 8bit load and 1 16bit load,
occupying 64 byte stack size.

With the above LLVM patch, the compiler only uses 8bit loads, but subregister is 32bit.
So stack requirements become 4 * 59 = 236 bytes. Together with other stuff on
the stack, total stack size exceeds 512 bytes, hence compiler complains and quits.

To fix the issue, removing "volatile" key word or changing "volatile" to
"const"/"static const" does not work, the string is put in .rodata.str1.1 section,
which libbpf did not process it and errors out with
  libbpf: elf: skipping unrecognized data section(6) .rodata.str1.1
  libbpf: prog 'sysctl_tcp_mem': bad map relo against '.L__const.is_tcp_mem.tcp_mem_name'
          in section '.rodata.str1.1'

Defining the string const as global variable can fix the issue as it puts the string constant
in '.rodata' section which is recognized by libbpf. In the future, when libbpf can process
'.rodata.str*.*' properly, the global definition can be changed back to local definition.

Defining tcp_mem_name as a global, however, triggered a verifier failure.
   ./test_progs -n 7/21
  libbpf: load bpf program failed: Permission denied
  libbpf: -- BEGIN DUMP LOG ---
  libbpf:
  invalid stack off=0 size=1
  verification time 6975 usec
  stack depth 160+64
  processed 889 insns (limit 1000000) max_states_per_insn 4 total_states
  14 peak_states 14 mark_read 10

  libbpf: -- END LOG --
  libbpf: failed to load program 'sysctl_tcp_mem'
  libbpf: failed to load object 'test_sysctl_loop2.o'
  test_bpf_verif_scale:FAIL:114
  #7/21 test_sysctl_loop2.o:FAIL
This actually exposed a bpf program bug. In test_sysctl_loop{1,2}, we have code
like
  const char tcp_mem_name[] = "<...long string...>";
  ...
  char name[64];
  ...
  for (i = 0; i < sizeof(tcp_mem_name); ++i)
      if (name[i] != tcp_mem_name[i])
          return 0;
In the above code, if sizeof(tcp_mem_name) > 64, name[i] access may be
out of bound. The sizeof(tcp_mem_name) is 59 for test_sysctl_loop1.c and
79 for test_sysctl_loop2.c.

Without promotion-to-global change, old compiler generates code where
the overflowed stack access is actually filled with valid value, so hiding
the bpf program bug. With promotion-to-global change, the code is different,
more specifically, the previous loading constants to stack is gone, and
"name" occupies stack[-64:0] and overflow access triggers a verifier error.
To fix the issue, adjust "name" buffer size properly.
Reported-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200909171542.3673449-1-yhs@fb.com

7fb5eefd

09 9月, 2020 2 次提交

selftests/bpf: Add test for map_ptr arithmetic · e6054fc1

由 Yonghong Song 提交于 9月 08, 2020

Change selftest map_ptr_kern.c with disabling inlining for
one of subtests, which will fail the test without previous
verifier change. Also added to verifier test for both
"map_ptr += scalar" and "scalar += map_ptr" arithmetic.
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200908175703.2463721-1-yhs@fb.com

e6054fc1

bpf: Permit map_ptr arithmetic with opcode add and offset 0 · 7c696732

由 Yonghong Song 提交于 9月 08, 2020

Commit 41c48f3a ("bpf: Support access
to bpf map fields") added support to access map fields
with CORE support. For example,

            struct bpf_map {
                    __u32 max_entries;
            } __attribute__((preserve_access_index));

            struct bpf_array {
                    struct bpf_map map;
                    __u32 elem_size;
            } __attribute__((preserve_access_index));

            struct {
                    __uint(type, BPF_MAP_TYPE_ARRAY);
                    __uint(max_entries, 4);
                    __type(key, __u32);
                    __type(value, __u32);
            } m_array SEC(".maps");

            SEC("cgroup_skb/egress")
            int cg_skb(void *ctx)
            {
                    struct bpf_array *array = (struct bpf_array *)&m_array;

                    /* .. array->map.max_entries .. */
            }

In kernel, bpf_htab has similar structure,

	    struct bpf_htab {
		    struct bpf_map map;
                    ...
            }

In the above cg_skb(), to access array->map.max_entries, with CORE, the clang will
generate two builtin's.
            base = &m_array;
            /* access array.map */
            map_addr = __builtin_preserve_struct_access_info(base, 0, 0);
            /* access array.map.max_entries */
            max_entries_addr = __builtin_preserve_struct_access_info(map_addr, 0, 0);
	    max_entries = *max_entries_addr;

In the current llvm, if two builtin's are in the same function or
in the same function after inlining, the compiler is smart enough to chain
them together and generates like below:
            base = &m_array;
            max_entries = *(base + reloc_offset); /* reloc_offset = 0 in this case */
and we are fine.

But if we force no inlining for one of functions in test_map_ptr() selftest, e.g.,
check_default(), the above two __builtin_preserve_* will be in two different
functions. In this case, we will have code like:
   func check_hash():
            reloc_offset_map = 0;
            base = &m_array;
            map_base = base + reloc_offset_map;
            check_default(map_base, ...)
   func check_default(map_base, ...):
            max_entries = *(map_base + reloc_offset_max_entries);

In kernel, map_ptr (CONST_PTR_TO_MAP) does not allow any arithmetic.
The above "map_base = base + reloc_offset_map" will trigger a verifier failure.
  ; VERIFY(check_default(&hash->map, map));
  0: (18) r7 = 0xffffb4fe8018a004
  2: (b4) w1 = 110
  3: (63) *(u32 *)(r7 +0) = r1
   R1_w=invP110 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0
  ; VERIFY_TYPE(BPF_MAP_TYPE_HASH, check_hash);
  4: (18) r1 = 0xffffb4fe8018a000
  6: (b4) w2 = 1
  7: (63) *(u32 *)(r1 +0) = r2
   R1_w=map_value(id=0,off=0,ks=4,vs=8,imm=0) R2_w=invP1 R7_w=map_value(id=0,off=4,ks=4,vs=8,imm=0) R10=fp0
  8: (b7) r2 = 0
  9: (18) r8 = 0xffff90bcb500c000
  11: (18) r1 = 0xffff90bcb500c000
  13: (0f) r1 += r2
  R1 pointer arithmetic on map_ptr prohibited

To fix the issue, let us permit map_ptr + 0 arithmetic which will
result in exactly the same map_ptr.
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200908175702.2463625-1-yhs@fb.com

7c696732

07 9月, 2020 3 次提交

tools, bpf: Synchronise BPF UAPI header with tools · bc0b5a03

由 Quentin Monnet 提交于 9月 04, 2020

Synchronise the bpf.h header under tools, to report the fixes recently
brought to the documentation for the BPF helpers.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904161454.31135-4-quentin@isovalent.com

bc0b5a03

bpf: Fix formatting in documentation for BPF helpers · 938c3efd

由 Quentin Monnet 提交于 9月 04, 2020

Fix a formatting error in the description of bpf_load_hdr_opt() (rst2man
complains about a wrong indentation, but what is missing is actually a
blank line before the bullet list).

Fix and harmonise the formatting for other helpers.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904161454.31135-3-quentin@isovalent.com

938c3efd

tools: bpftool: Fix formatting in bpftool-link documentation · 1a7581b1

由 Quentin Monnet 提交于 9月 04, 2020

Fix a formatting error in the documentation for bpftool-link, so that
the man page can build correctly.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904161454.31135-2-quentin@isovalent.com

1a7581b1

04 9月, 2020 9 次提交

samples, bpf: Add xsk_fwd test file to .gitignore · f9bec5d7

由 Daniel T. Lee 提交于 9月 04, 2020

This commit adds xsk_fwd test file to .gitignore which is newly added
to samples/bpf.
Signed-off-by: NDaniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904063434.24963-2-danieltimlee@gmail.com

f9bec5d7

samples, bpf: Replace bpf_program__title() with bpf_program__section_name() · 698584df

由 Daniel T. Lee 提交于 9月 04, 2020

From commit 52109584 ("libbpf: Deprecate notion of BPF program
"title" in favor of "section name""), the term title has been replaced
with section name in libbpf.

Since the bpf_program__title() has been deprecated, this commit
switches this function to bpf_program__section_name(). Due to
this commit, the compilation warning issue has also been resolved.

Fixes: 52109584 ("libbpf: Deprecate notion of BPF program "title" in favor of "section name"")
Signed-off-by: NDaniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904063434.24963-1-danieltimlee@gmail.com

698584df

libbpf: Fix potential multiplication overflow · 8eb62958

由 Andrii Nakryiko 提交于 9月 03, 2020

Detected by LGTM static analyze in Github repo, fix potential multiplication
overflow before result is casted to size_t.

Fixes: 8505e870 ("libbpf: Implement generalized .BTF.ext func/line info adjustment")
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904041611.1695163-2-andriin@fb.com

8eb62958

libbpf: Fix another __u64 cast in printf · 17e54b09

由 Andrii Nakryiko 提交于 9月 03, 2020

Another issue of __u64 needing either %lu or %llu, depending on the
architecture. Fix with cast to `unsigned long long`.

Fixes: 7e06aad5 ("libbpf: Add multi-prog section support for struct_ops")
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200904041611.1695163-1-andriin@fb.com

17e54b09

selftests/bpf: Fix check in global_data_init. · 95cec14b

由 Hao Luo 提交于 9月 03, 2020

The returned value of bpf_object__open_file() should be checked with
libbpf_get_error() rather than NULL. This fix prevents test_progs from
crash when test_global_data.o is not present.
Signed-off-by: NHao Luo <haoluo@google.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200903200528.747884-1-haoluo@google.com

95cec14b

Merge branch 'libbpf-support-bpf-to-bpf-calls' · b4ff7ad7

由 Alexei Starovoitov 提交于 9月 03, 2020

Andrii Nakryiko says:

====================
Currently, libbpf supports a limited form of BPF-to-BPF subprogram calls. The
restriction is that entry-point BPF program should use *all* of defined
sub-programs in BPF .o file. If any of the subprograms is not used, such
entry-point BPF program will be rejected by verifier as containing unreachable
dead code. This is not a big limitation for cases with single entry-point BPF
programs, but is quite a heavy restriction for multi-programs that use only
partially overlapping set of subprograms.

This patch set removes all such restrictions and adds complete support for
using BPF sub-program calls on BPF side. This is achieved through libbpf
tracking subprograms individually and detecting which subprograms are used by
any given entry-point BPF program, and subsequently only appending and
relocating code for just those used subprograms.

In addition, libbpf now also supports multiple entry-point BPF programs within
the same ELF section. This allows to structure code so that there are few
variants of BPF programs of the same type and attaching to the same target
(e.g., for tracepoints and kprobes) without the need to worry about ELF
section name clashes.

This patch set opens way for more wider adoption of BPF subprogram calls,
especially for real-world production use-cases with complicated net of
subprograms. This will allow to further scale BPF verification process through
good use of global functions, which can be verified independently. This is
also important prerequisite for static linking which allows static BPF
libraries to not worry about naming clashes for section names, as well as use
static non-inlined functions (subprograms) without worries of verifier
rejecting program due to dead code.

Patch set is structured as follows:
- patched 1-6 contain all the libbpf changes necessary to support multi-prog
  sections and bpf2bpf subcalls;
- patch 7 adds dedicated selftests validating all combinations of possible
  sub-calls (within and across sections, static vs global functions);
- patch 8 deprecated bpf_program__title() in favor of
  bpf_program__section_name(). The intent was to also deprecate
  bpf_object__find_program_by_title() as it's now non-sensical with multiple
  programs per section. But there were too many selftests uses of this and
  I didn't want to delay this patches further and make it even bigger, so left
  it for a follow up cleanup;
- patches 9-10 remove uses for title-related APIs from bpftool and
  bpf_program__title() use from selftests;
- patch 11 is converting fexit_bpf2bpf to have explicit subtest (it does
  contain 4 subtests, which are not handled as sub-tests);
- patches 12-14 convert few complicated BPF selftests to use __noinline
  functions to further validate correctness of libbpf's bpf2bpf processing
  logic.

v2->v3:
  - explained subprog relocation algorithm in more details (Alexei);
  - pyperf, strobelight and cls_redirect got new subprog variants, leaving
    other modes intact (Alexei);
v1->v2:
  - rename DEPRECATED to LIBBPF_DEPRECATED to avoid name clashes;
  - fix test_subprogs build;
  - convert a bunch of complicated selftests to __noinline (Alexei).
====================
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

b4ff7ad7

selftests/bpf: Add __noinline variant of cls_redirect selftest · ee333df5

由 Andrii Nakryiko 提交于 9月 03, 2020

As one of the most complicated and close-to-real-world programs, cls_redirect
is a good candidate to exercise libbpf's logic of handling bpf2bpf calls. So
add variant with using explicit __noinline for majority of functions except
few most basic ones. If those few functions are inlined, verifier starts to
complain about program instruction limit of 1mln instructions being exceeded,
most probably due to instruction overhead of doing a sub-program call.
Convert user-space part of selftest to have to sub-tests: with and without
inlining.
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Lorenz Bauer <lmb@cloudflare.com>
Link: https://lore.kernel.org/bpf/20200903203542.15944-15-andriin@fb.com

ee333df5

selftests/bpf: Modernize xdp_noinline test w/ skeleton and __noinline · baaf680e

由 Andrii Nakryiko 提交于 9月 03, 2020

Update xdp_noinline to use BPF skeleton and force __noinline on helper
sub-programs. Also, split existing logic into v4- and v6-only to complicate
sub-program calling patterns (partially overlapped sets of functions for
entry-level BPF programs).
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-14-andriin@fb.com

baaf680e

selftests/bpf: Add subprogs to pyperf, strobemeta, and l4lb_noinline tests · fab45be1

由 Andrii Nakryiko 提交于 9月 03, 2020

Add use of non-inlined subprogs to few bigger selftests to excercise libbpf's
bpf2bpf handling logic. Also split l4lb_all selftest into two sub-tests.
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200903203542.15944-13-andriin@fb.com

fab45be1

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功