1. 28 Nov 2019, 1 commit
  2. 26 Nov 2019, 1 commit
  3. 25 Nov 2019, 15 commits
  4. 20 Nov 2019, 3 commits
    • selftests/bpf: Enforce no-ALU32 for test_progs-no_alu32 · 24f65050
      Authored by Andrii Nakryiko
      With the most recent Clang, alu32 is enabled by default if -mcpu=probe or
      -mcpu=v3 is specified. Use a separate build rule with -mcpu=v2 to enforce no
      ALU32 mode.
      Suggested-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20191120002510.4130605-1-andriin@fb.com
    • libbpf: Fix call relocation offset calculation bug · a0d7da26
      Authored by Andrii Nakryiko
      When relocating a subprogram call, libbpf doesn't take into account
      relo->text_off, which comes from the symbol's value. This generally works fine
      for subprograms implemented as static functions, but breaks for global
      functions.
      
      Taking a simplified test_pkt_access.c as an example:
      
      __attribute__ ((noinline))
      static int test_pkt_access_subprog1(volatile struct __sk_buff *skb)
      {
              return skb->len * 2;
      }
      
      __attribute__ ((noinline))
      static int test_pkt_access_subprog2(int val, volatile struct __sk_buff *skb)
      {
              return skb->len + val;
      }
      
      SEC("classifier/test_pkt_access")
      int test_pkt_access(struct __sk_buff *skb)
      {
              if (test_pkt_access_subprog1(skb) != skb->len * 2)
                      return TC_ACT_SHOT;
              if (test_pkt_access_subprog2(2, skb) != skb->len + 2)
                      return TC_ACT_SHOT;
              return TC_ACT_UNSPEC;
      }
      
      When compiled, we get two relocations pointing to the '.text' symbol, whose
      st_value is 0 (the beginning of the .text section):
      
      0000000000000008  000000050000000a R_BPF_64_32            0000000000000000 .text
      0000000000000040  000000050000000a R_BPF_64_32            0000000000000000 .text
      
      The test_pkt_access_subprog1 and test_pkt_access_subprog2 offsets (the targets
      of the two calls) are encoded within the call instructions' imm32 part as -1
      and 2, respectively:
      
      0000000000000000 test_pkt_access_subprog1:
             0:       61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
             1:       64 00 00 00 01 00 00 00 w0 <<= 1
             2:       95 00 00 00 00 00 00 00 exit
      
      0000000000000018 test_pkt_access_subprog2:
             3:       61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
             4:       04 00 00 00 02 00 00 00 w0 += 2
             5:       95 00 00 00 00 00 00 00 exit
      
      0000000000000000 test_pkt_access:
             0:       bf 16 00 00 00 00 00 00 r6 = r1
      ===>   1:       85 10 00 00 ff ff ff ff call -1
             2:       bc 01 00 00 00 00 00 00 w1 = w0
             3:       b4 00 00 00 02 00 00 00 w0 = 2
             4:       61 62 00 00 00 00 00 00 r2 = *(u32 *)(r6 + 0)
             5:       64 02 00 00 01 00 00 00 w2 <<= 1
             6:       5e 21 08 00 00 00 00 00 if w1 != w2 goto +8 <LBB0_3>
             7:       bf 61 00 00 00 00 00 00 r1 = r6
      ===>   8:       85 10 00 00 02 00 00 00 call 2
             9:       bc 01 00 00 00 00 00 00 w1 = w0
            10:       61 62 00 00 00 00 00 00 r2 = *(u32 *)(r6 + 0)
            11:       04 02 00 00 02 00 00 00 w2 += 2
            12:       b4 00 00 00 ff ff ff ff w0 = -1
            13:       1e 21 01 00 00 00 00 00 if w1 == w2 goto +1 <LBB0_3>
            14:       b4 00 00 00 02 00 00 00 w0 = 2
      0000000000000078 LBB0_3:
            15:       95 00 00 00 00 00 00 00 exit
      
      Now, if we compile the example with global functions, the setup changes.
      Relocations are now against the test_pkt_access_subprog1 and
      test_pkt_access_subprog2 symbols specifically, with test_pkt_access_subprog2
      pointing 24 bytes into its section (.text), i.e., 3 instructions in:
      
      0000000000000008  000000070000000a R_BPF_64_32            0000000000000000 test_pkt_access_subprog1
      0000000000000048  000000080000000a R_BPF_64_32            0000000000000018 test_pkt_access_subprog2
      
      Call instructions now encode offsets relative to their target function symbols,
      and both are set to -1:
      
      0000000000000000 test_pkt_access_subprog1:
             0:       61 10 00 00 00 00 00 00 r0 = *(u32 *)(r1 + 0)
             1:       64 00 00 00 01 00 00 00 w0 <<= 1
             2:       95 00 00 00 00 00 00 00 exit
      
      0000000000000018 test_pkt_access_subprog2:
             3:       61 20 00 00 00 00 00 00 r0 = *(u32 *)(r2 + 0)
             4:       0c 10 00 00 00 00 00 00 w0 += w1
             5:       95 00 00 00 00 00 00 00 exit
      
      0000000000000000 test_pkt_access:
             0:       bf 16 00 00 00 00 00 00 r6 = r1
      ===>   1:       85 10 00 00 ff ff ff ff call -1
             2:       bc 01 00 00 00 00 00 00 w1 = w0
             3:       b4 00 00 00 02 00 00 00 w0 = 2
             4:       61 62 00 00 00 00 00 00 r2 = *(u32 *)(r6 + 0)
             5:       64 02 00 00 01 00 00 00 w2 <<= 1
             6:       5e 21 09 00 00 00 00 00 if w1 != w2 goto +9 <LBB2_3>
             7:       b4 01 00 00 02 00 00 00 w1 = 2
             8:       bf 62 00 00 00 00 00 00 r2 = r6
      ===>   9:       85 10 00 00 ff ff ff ff call -1
            10:       bc 01 00 00 00 00 00 00 w1 = w0
            11:       61 62 00 00 00 00 00 00 r2 = *(u32 *)(r6 + 0)
            12:       04 02 00 00 02 00 00 00 w2 += 2
            13:       b4 00 00 00 ff ff ff ff w0 = -1
            14:       1e 21 01 00 00 00 00 00 if w1 == w2 goto +1 <LBB2_3>
            15:       b4 00 00 00 02 00 00 00 w0 = 2
      0000000000000080 LBB2_3:
            16:       95 00 00 00 00 00 00 00 exit
      
      Thus the right formula for the target call offset after relocation must take
      into account the relocation's target symbol value (its offset within the
      section), the call instruction's imm32 offset, and, subtracting to get a
      relative instruction offset, the instruction index of the call instruction
      itself. All of that is shifted by the number of instructions in the main
      program, since all sub-programs are copied over after the main program.
      
      Convert a few selftests relying on bpf-to-bpf calls to use global functions
      instead of static ones.
      
      Fixes: 48cca7e4 ("libbpf: add support for bpf_call")
      Reported-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191119224447.3781271-1-andriin@fb.com
    • net-af_xdp: Use correct number of channels from ethtool · 3de88c91
      Authored by Luigi Rizzo
      Drivers use different fields to report the number of channels, so take
      the maximum of all data channels (rx, tx, combined) when determining the
      size of the xsk map. The current code used only 'combined', which was set
      to 0 in some drivers, e.g. mlx4.
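
      The idea can be sketched as follows. The struct here is a minimal stand-in
      for the relevant fields of ethtool's struct ethtool_channels (real code would
      use <linux/ethtool.h>), and the helper name is hypothetical:

```c
/* Minimal stand-in for the channel-count fields a driver reports via
 * ETHTOOL_GCHANNELS; the real struct is struct ethtool_channels. */
struct channels {
	unsigned int rx_count;
	unsigned int tx_count;
	unsigned int combined_count;
};

/* Hypothetical helper: size the xsk map by the largest of the data
 * channel counts, since drivers populate different fields (mlx4, for
 * example, leaves combined_count at 0 and reports rx/tx instead). */
static unsigned int xsk_map_channels(const struct channels *ch)
{
	unsigned int n = ch->rx_count;

	if (ch->tx_count > n)
		n = ch->tx_count;
	if (ch->combined_count > n)
		n = ch->combined_count;
	return n;
}
```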
      
      Tested: compiled and ran xdpsock -q 3 -r -S on mlx4
      Signed-off-by: Luigi Rizzo <lrizzo@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
      Link: https://lore.kernel.org/bpf/20191119001951.92930-1-lrizzo@google.com
  5. 19 Nov 2019, 8 commits
  6. 18 Nov 2019, 7 commits
  7. 17 Nov 2019, 1 commit
    • selftests: net: avoid ptl lock contention in tcp_mmap · 597b01ed
      Authored by Eric Dumazet
      tcp_mmap is used as a reference program for TCP rx zerocopy,
      so it is important to point out some potential issues.
      
      If multiple threads are concurrently using getsockopt(...
      TCP_ZEROCOPY_RECEIVE), there is a chance the low-level mm
      functions compete for a shared ptl lock if the vmas are arbitrarily placed.

      Instead of letting the mm layer place the chunks back to back,
      this patch enforces an alignment so that each thread uses
      a different ptl lock.
      
      Performance measured on a 100 Gbit NIC, with 8 tcp_mmap clients
      launched at the same time:
      
      $ for f in {1..8}; do ./tcp_mmap -H 2002:a05:6608:290:: & done
      
      In the following run, we reproduce the old behavior by requesting no alignment:
      
      $ tcp_mmap -sz -C $((128*1024)) -a 4096
      received 32768 MB (100 % mmap'ed) in 9.69532 s, 28.3516 Gbit
        cpu usage user:0.08634 sys:3.86258, 120.511 usec per MB, 171839 c-switches
      received 32768 MB (100 % mmap'ed) in 25.4719 s, 10.7914 Gbit
        cpu usage user:0.055268 sys:21.5633, 659.745 usec per MB, 9065 c-switches
      received 32768 MB (100 % mmap'ed) in 28.5419 s, 9.63069 Gbit
        cpu usage user:0.057401 sys:23.8761, 730.392 usec per MB, 14987 c-switches
      received 32768 MB (100 % mmap'ed) in 28.655 s, 9.59268 Gbit
        cpu usage user:0.059689 sys:23.8087, 728.406 usec per MB, 18509 c-switches
      received 32768 MB (100 % mmap'ed) in 28.7808 s, 9.55074 Gbit
        cpu usage user:0.066042 sys:23.4632, 718.056 usec per MB, 24702 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8259 s, 9.5358 Gbit
        cpu usage user:0.056547 sys:23.6628, 723.858 usec per MB, 23518 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8808 s, 9.51767 Gbit
        cpu usage user:0.059357 sys:23.8515, 729.703 usec per MB, 14691 c-switches
      received 32768 MB (100 % mmap'ed) in 28.8879 s, 9.51534 Gbit
        cpu usage user:0.047115 sys:23.7349, 725.769 usec per MB, 21773 c-switches
      
      With the new behavior (automatic alignment based on Hugepagesize),
      the system overhead is dramatically reduced.
      
      $ tcp_mmap -sz -C $((128*1024))
      received 32768 MB (100 % mmap'ed) in 13.5339 s, 20.3103 Gbit
        cpu usage user:0.122644 sys:3.4125, 107.884 usec per MB, 168567 c-switches
      received 32768 MB (100 % mmap'ed) in 16.0335 s, 17.1439 Gbit
        cpu usage user:0.132428 sys:3.55752, 112.608 usec per MB, 188557 c-switches
      received 32768 MB (100 % mmap'ed) in 17.5506 s, 15.6621 Gbit
        cpu usage user:0.155405 sys:3.24889, 103.891 usec per MB, 226652 c-switches
      received 32768 MB (100 % mmap'ed) in 19.1924 s, 14.3222 Gbit
        cpu usage user:0.135352 sys:3.35583, 106.542 usec per MB, 207404 c-switches
      received 32768 MB (100 % mmap'ed) in 22.3649 s, 12.2906 Gbit
        cpu usage user:0.142429 sys:3.53187, 112.131 usec per MB, 250225 c-switches
      received 32768 MB (100 % mmap'ed) in 22.5336 s, 12.1986 Gbit
        cpu usage user:0.140654 sys:3.61971, 114.757 usec per MB, 253754 c-switches
      received 32768 MB (100 % mmap'ed) in 22.5483 s, 12.1906 Gbit
        cpu usage user:0.134035 sys:3.55952, 112.718 usec per MB, 252997 c-switches
      received 32768 MB (100 % mmap'ed) in 22.6442 s, 12.139 Gbit
        cpu usage user:0.126173 sys:3.71251, 117.147 usec per MB, 253728 c-switches
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Soheil Hassas Yeganeh <soheil@google.com>
      Cc: Arjun Roy <arjunroy@google.com>
      Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 16 Nov 2019, 4 commits