提交 · 18d4e900a4500c54af56b9ad39f4d3b378eb0661 · openeuler / Kernel

16 5月, 2020 11 次提交

bpf: Selftests, improve test_sockmap total bytes counter · 18d4e900

由 John Fastabend 提交于 5月 13, 2020

The recv thread in test_sockmap waits to receive all bytes from sender but
in the case we use pop data it may wait for more bytes then actually being
sent. This stalls the test harness for multiple seconds. Because this
happens in multiple tests it slows time to run the selftest.

Fix by doing a better job of accounting for total bytes when pop helpers
are used.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/158939726542.15176.5964532245173539540.stgit@john-Precision-5820-Tower

18d4e900

bpf: Selftests, print error in test_sockmap error cases · 248aba1d

由 John Fastabend 提交于 5月 13, 2020

Its helpful to know the error value if an error occurs.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/158939724566.15176.12079885932643225626.stgit@john-Precision-5820-Tower

248aba1d

bpf: Selftests, sockmap test prog run without setting cgroup · 13a5f3ff

由 John Fastabend 提交于 5月 13, 2020

Running test_sockmap with arguments to specify a test pattern requires
including a cgroup argument. Instead of requiring this if the option is
not provided create one

This is not used by selftest runs but I use it when I want to test a
specific test. Most useful when developing new code and/or tests.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/158939722675.15176.6294210959489131688.stgit@john-Precision-5820-Tower

13a5f3ff

bpf: Selftests, remove prints from sockmap tests · d79a3212

由 John Fastabend 提交于 5月 13, 2020

The prints in the test_sockmap programs were only useful when we
didn't have enough control over test infrastructure to know from
user program what was being pushed into kernel side.

Now that we have or will shortly have better test controls lets
remove the printers. This means we can remove half the programs
and cleanup bpf side.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/158939720756.15176.9806965887313279429.stgit@john-Precision-5820-Tower

d79a3212

bpf: Selftests, move sockmap bpf prog header into progs · 991e35ee

由 John Fastabend 提交于 5月 13, 2020

Moves test_sockmap_kern.h into progs directory but does not change
code at all.
Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/158939718921.15176.5766299102332077086.stgit@john-Precision-5820-Tower

991e35ee

selftests/bpf: Move test_align under test_progs · 3b09d27c

由 Stanislav Fomichev 提交于 5月 15, 2020

There is a much higher chance we can see the regressions if the
test is part of test_progs.
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200515194904.229296-2-sdf@google.com

3b09d27c

selftests/bpf: Fix test_align verifier log patterns · 5366d226

由 Stanislav Fomichev 提交于 5月 15, 2020

Commit 294f2fc6 ("bpf: Verifer, adjust_scalar_min_max_vals to always
call update_reg_bounds()") changed the way verifier logs some of its state,
adjust the test_align accordingly. Where possible, I tried to not copy-paste
the entire log line and resorted to dropping the last closing brace instead.

Fixes: 294f2fc6 ("bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()")
Signed-off-by: NStanislav Fomichev <sdf@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200515194904.229296-1-sdf@google.com

5366d226

libbpf, hashmap: Fix signedness warnings · 8d35d74f

由 Ian Rogers 提交于 5月 15, 2020

Fixes the following warnings:

hashmap.c: In function ‘hashmap__clear’:
hashmap.h:150:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
150 | for (bkt = 0; bkt < map->cap; bkt++) \

hashmap.c: In function ‘hashmap_grow’:
hashmap.h:150:20: error: comparison of integer expressions of different signedness: ‘int’ and ‘size_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]
150 | for (bkt = 0; bkt < map->cap; bkt++) \
Signed-off-by: NIan Rogers <irogers@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200515165007.217120-4-irogers@google.com

8d35d74f

libbpf, hashmap: Remove unused #include · f516acd5

由 Ian Rogers 提交于 5月 15, 2020

Remove #include of libbpf_internal.h that is unused.

Discussed in this thread:
https://lore.kernel.org/lkml/CAEf4BzZRmiEds_8R8g4vaAeWvJzPb4xYLnpF0X2VNY8oTzkphQ@mail.gmail.com/Signed-off-by: NIan Rogers <irogers@google.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200515165007.217120-3-irogers@google.com

f516acd5

selftests: mptcp: pm: rm the right tmp file · 9a2dbb59

由 Matthieu Baerts 提交于 5月 15, 2020

"$err" is a variable pointing to a temp file. "$out" is not: only used
as a local variable in "check()" and representing the output of a
command line.

Fixes: eedbc685 (selftests: add PM netlink functional tests)
Signed-off-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a2dbb59

selftests: implement flower classifier terse dump tests · e7534fd4

由 Vlad Buslov 提交于 5月 15, 2020

Implement two basic tests to verify terse dump functionality of flower
classifier:

- Test that verifies that terse dump works.

- Test that verifies that terse dump doesn't print filter key.
Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e7534fd4

15 5月, 2020 15 次提交

selftests/bpf: Use CAP_BPF and CAP_PERFMON in tests · 81626001

由 Alexei Starovoitov 提交于 5月 13, 2020

Make all test_verifier test exercise CAP_BPF and CAP_PERFMON
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200513230355.7858-4-alexei.starovoitov@gmail.com

81626001

bpf, bpftool: Allow probing for CONFIG_HZ from kernel config · 0ee52c0f

由 Daniel Borkmann 提交于 5月 13, 2020

In Cilium we've recently switched to make use of bpf_jiffies64() for
parts of our tc and XDP datapath since bpf_ktime_get_ns() is more
expensive and high-precision is not needed for our timeouts we have
anyway. Our agent has a probe manager which picks up the json of
bpftool's feature probe and we also use the macro output in our C
programs e.g. to have workarounds when helpers are not available on
older kernels.

Extend the kernel config info dump to also include the kernel's
CONFIG_HZ, and rework the probe_kernel_image_config() for allowing a
macro dump such that CONFIG_HZ can be propagated to BPF C code as a
simple define if available via config. Latter allows to have _compile-
time_ resolution of jiffies <-> sec conversion in our code since all
are propagated as known constants.

Given we cannot generally assume availability of kconfig everywhere,
we also have a kernel hz probe [0] as a fallback. Potentially, bpftool
could have an integrated probe fallback as well, although to derive it,
we might need to place it under 'bpftool feature probe full' or similar
given it would slow down the probing process overall. Yet 'full' doesn't
fit either for us since we don't want to pollute the kernel log with
warning messages from bpf_probe_write_user() and bpf_trace_printk() on
agent startup; I've left it out for the time being.

[0] https://github.com/cilium/cilium/blob/master/bpf/cilium-probe-kernel-hz.cSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Reviewed-by: NQuentin Monnet <quentin@isovalent.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20200513075849.20868-1-daniel@iogearbox.net

0ee52c0f

selftests/bpf: Xdp_adjust_tail add grow tail tests · 7ae2e00e

由 Jesper Dangaard Brouer 提交于 5月 14, 2020

Extend BPF selftest xdp_adjust_tail with grow tail tests, which is added
as subtest's. The first grow test stays in same form as original shrink
test. The second grow test use the newer bpf_prog_test_run_xattr() calls,
and does extra checking of data contents.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158945350567.97035.9632611946765811876.stgit@firesoul

7ae2e00e

selftests/bpf: Adjust BPF selftest for xdp_adjust_tail · 68545fb6

由 Jesper Dangaard Brouer 提交于 5月 14, 2020

Current selftest for BPF-helper xdp_adjust_tail only shrink tail.
Make it more clear that this is a shrink test case.
Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/158945350058.97035.17280775016196207372.stgit@firesoul

68545fb6

selftests/bpf: Test for sk helpers in cgroup skb · 68e916bc

由 Andrey Ignatov 提交于 5月 14, 2020

Test bpf_sk_lookup_tcp, bpf_sk_release, bpf_sk_cgroup_id and
bpf_sk_ancestor_cgroup_id helpers from cgroup skb program.

The test creates a testing cgroup, starts a TCPv6 server inside the
cgroup and creates two client sockets: one inside testing cgroup and one
outside.

Then it attaches cgroup skb program to the cgroup that checks all TCP
segments coming to the server and allows only those coming from the
cgroup of the server. If a segment comes from a peer outside of the
cgroup, it'll be dropped.

Finally the test checks that client from inside testing cgroup can
successfully connect to the server, but client outside the cgroup fails
to connect by timeout.

The main goal of the test is to check newly introduced
bpf_sk_{,ancestor_}cgroup_id helpers.

It also checks a couple of socket lookup helpers (tcp & release), but
lookup helpers were introduced much earlier and covered by other tests.
Here it's mostly checked that they can be called from cgroup skb.
Signed-off-by: NAndrey Ignatov <rdna@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/171f4c5d75e8ff4fe1c4e8c1c12288b5240a4549.1589486450.git.rdna@fb.com

68e916bc

selftests/bpf: Add connect_fd_to_fd, connect_wait net helpers · 383724e1

由 Andrey Ignatov 提交于 5月 14, 2020

Add two new network helpers.

connect_fd_to_fd connects an already created client socket fd to address
of server fd. Sometimes it's useful to separate client socket creation
and connecting this socket to a server, e.g. if client socket has to be
created in a cgroup different from that of server cgroup.

Additionally connect_to_fd is now implemented using connect_fd_to_fd,
both helpers don't treat EINPROGRESS as an error and let caller decide
how to proceed with it.

connect_wait is a helper to work with non-blocking client sockets so
that if connect_to_fd or connect_fd_to_fd returned -1 with errno ==
EINPROGRESS, caller can wait for connect to finish or for connection
timeout. The helper returns -1 on error, 0 on timeout (1sec,
hard-coded), and positive number on success.
Signed-off-by: NAndrey Ignatov <rdna@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/1403fab72300f379ca97ead4820ae43eac4414ef.1589486450.git.rdna@fb.com

383724e1

bpf: Introduce bpf_sk_{, ancestor_}cgroup_id helpers · f307fa2c

由 Andrey Ignatov 提交于 5月 14, 2020

With having ability to lookup sockets in cgroup skb programs it becomes
useful to access cgroup id of retrieved sockets so that policies can be
implemented based on origin cgroup of such socket.

For example, a container running in a cgroup can have cgroup skb ingress
program that can lookup peer socket that is sending packets to a process
inside the container and decide whether those packets should be allowed
or denied based on cgroup id of the peer.

More specifically such ingress program can implement intra-host policy
"allow incoming packets only from this same container and not from any
other container on same host" w/o relying on source IP addresses since
quite often it can be the case that containers share same IP address on
the host.

Introduce two new helpers for this use-case: bpf_sk_cgroup_id() and
bpf_sk_ancestor_cgroup_id().

These helpers are similar to existing bpf_skb_{,ancestor_}cgroup_id
helpers with the only difference that sk is used to get cgroup id
instead of skb, and share code with them.

See documentation in UAPI for more details.
Signed-off-by: NAndrey Ignatov <rdna@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/f5884981249ce911f63e9b57ecd5d7d19154ff39.1589486450.git.rdna@fb.com

f307fa2c

selftest/bpf: Fix spelling mistake "SIGALARM" -> "SIGALRM" · 5b0004d9

由 Colin Ian King 提交于 5月 14, 2020

There is a spelling mistake in an error message, fix it.
Signed-off-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200514121529.259668-1-colin.king@canonical.com

5b0004d9

selftests/bpf: Test narrow loads for bpf_sock_addr.user_port · 0645f7eb

由 Andrey Ignatov 提交于 5月 13, 2020

Test 1,2,4-byte loads from bpf_sock_addr.user_port in sock_addr
programs.
Signed-off-by: NAndrey Ignatov <rdna@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/e5c734a58cca4041ab30cb5471e644246f8cdb5a.1589420814.git.rdna@fb.com

0645f7eb

bpf: Support narrow loads from bpf_sock_addr.user_port · 7aebfa1b

由 Andrey Ignatov 提交于 5月 13, 2020

bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly
code in BPF programs, like:

	volatile __u32 user_port = ctx->user_port;
	__u16 port = bpf_ntohs(user_port);

Since otherwise clang may optimize the load to be 2-byte and it's
rejected by verifier.

Add support for 1- and 2-byte loads same way as it's supported for other
fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc.
Signed-off-by: NAndrey Ignatov <rdna@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com

7aebfa1b

devlink: refactor end checks in devlink_nl_cmd_region_read_dumpit · 5a46b062

由 Jakub Kicinski 提交于 5月 13, 2020

Clean up after recent fixes, move address calculations
around and change the variable init, so that we can have
just one start_offset == end_offset check.

Make the check a little stricter to preserve the -EINVAL
error if requested start offset is larger than the region
itself.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a46b062

selftests/bpf: Enforce returning 0 for fentry/fexit programs · 6d74f64b

由 Yonghong Song 提交于 5月 13, 2020

There are a few fentry/fexit programs returning non-0.
The tests with these programs will break with the previous
patch which enfoced return-0 rules. Fix them properly.

Fixes: ac065870 ("selftests/bpf: Add BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE macros")
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200514053207.1298479-1-yhs@fb.com

6d74f64b

selftests: fix flower parent qdisc · 0531b035

由 Vlad Buslov 提交于 5月 14, 2020

Flower tests used to create ingress filter with specified parent qdisc
"parent ffff:" but dump them on "ingress". With recent commit that fixed
tcm_parent handling in dump those are not considered same parent anymore,
which causes iproute2 tc to emit additional "parent ffff:" in first line of
filter dump output. The change in output causes filter match in tests to
fail.

Prevent parent qdisc output when dumping filters in flower tests by always
correctly specifying "ingress" parent both when creating and dumping
filters.

Fixes: a7df4870 ("net_sched: fix tcm_parent in tc filter dump")
Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0531b035

libbpf: Fix register naming in PT_REGS s390 macros · 516d8d49

由 Sumanth Korikkar 提交于 5月 13, 2020

Fix register naming in PT_REGS s390 macros

Fixes: b8ebce86 ("libbpf: Provide CO-RE variants of PT_REGS macros")
Signed-off-by: NSumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Reviewed-by: NJulian Wiedmann <jwi@linux.ibm.com>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513154414.29972-1-sumanthk@linux.ibm.com

516d8d49

bpf: Fix bug in mmap() implementation for BPF array map · 333291ce

由 Andrii Nakryiko 提交于 5月 12, 2020

mmap() subsystem allows user-space application to memory-map region with
initial page offset. This wasn't taken into account in initial implementation
of BPF array memory-mapping. This would result in wrong pages, not taking into
account requested page shift, being memory-mmaped into user-space. This patch
fixes this gap and adds a test for such scenario.

Fixes: fc970227 ("bpf: Add mmap() support for BPF_MAP_TYPE_ARRAY")
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512235925.3817805-1-andriin@fb.com

333291ce

14 5月, 2020 6 次提交

bpf: Change btf_iter func proto prefix to "bpf_iter_" · 21aef70e

由 Yonghong Song 提交于 5月 13, 2020

This is to be consistent with tracing and lsm programs
which have prefix "bpf_trace_" and "bpf_lsm_" respectively.
Suggested-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513180216.2949387-1-yhs@fb.com

21aef70e

tools/bpf: selftests : Explain bpf_iter test failures with llvm 10.0.0 · 99aaf53e

由 Yonghong Song 提交于 5月 13, 2020

Commit 6879c042 ("tools/bpf: selftests: Add bpf_iter selftests")
added self tests for bpf_iter feature. But two subtests
ipv6_route and netlink needs llvm latest 10.x release branch
or trunk due to a bug in llvm BPF backend. This patch added
the file README.rst to document these two failures
so people using llvm 10.0.0 can be aware of them.
Suggested-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200513180215.2949237-1-yhs@fb.com

99aaf53e

selftest/bpf: Add BPF triggering benchmark · c5d420c3

由 Andrii Nakryiko 提交于 5月 12, 2020

It is sometimes desirable to be able to trigger BPF program from user-space
with minimal overhead. sys_enter would seem to be a good candidate, yet in
a lot of cases there will be a lot of noise from syscalls triggered by other
processes on the system. So while searching for low-overhead alternative, I've
stumbled upon getpgid() syscall, which seems to be specific enough to not
suffer from accidental syscall by other apps.

This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe,
fentry and fmod_ret with returning error (so that syscall would not be
executed), to determine the lowest-overhead way. Here are results on my
machine (using benchs/run_bench_trigger.sh script):

  base      :    9.200 ± 0.319M/s
  tp        :    6.690 ± 0.125M/s
  rawtp     :    8.571 ± 0.214M/s
  kprobe    :    6.431 ± 0.048M/s
  fentry    :    8.955 ± 0.241M/s
  fmodret   :    8.903 ± 0.135M/s

So it seems like fmodret doesn't give much benefit for such lightweight
syscall. Raw tracepoint is pretty decent despite additional filtering logic,
but it will be called for any other syscall in the system, which rules it out.
Fentry, though, seems to be adding the least amoung of overhead and achieves
97.3% of performance of baseline no-BPF-attached syscall.

Using getpgid() seems to be preferable to set_task_comm() approach from
test_overhead, as it's about 2.35x faster in a baseline performance.
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com

c5d420c3

selftest/bpf: Fmod_ret prog and implement test_overhead as part of bench · 4eaf0b5c

由 Andrii Nakryiko 提交于 5月 12, 2020

Add fmod_ret BPF program to existing test_overhead selftest. Also re-implement
user-space benchmarking part into benchmark runner to compare results. Results
with ./bench are consistently somewhat lower than test_overhead's, but relative
performance of various types of BPF programs stay consisten (e.g., kretprobe is
noticeably slower). This slowdown seems to be coming from the fact that
test_overhead is single-threaded, while benchmark always spins off at least
one thread for producer. This has been confirmed by hacking multi-threaded
test_overhead variant and also single-threaded bench variant. Resutls are
below. run_bench_rename.sh script from benchs/ subdirectory was used to
produce results for ./bench.

Single-threaded implementations
===============================

/* bench: single-threaded, atomics */
base      :    4.622 ± 0.049M/s
kprobe    :    3.673 ± 0.052M/s
kretprobe :    2.625 ± 0.052M/s
rawtp     :    4.369 ± 0.089M/s
fentry    :    4.201 ± 0.558M/s
fexit     :    4.309 ± 0.148M/s
fmodret   :    4.314 ± 0.203M/s

/* selftest: single-threaded, no atomics */
task_rename base        4555K events per sec
task_rename kprobe      3643K events per sec
task_rename kretprobe   2506K events per sec
task_rename raw_tp      4303K events per sec
task_rename fentry      4307K events per sec
task_rename fexit       4010K events per sec
task_rename fmod_ret    3984K events per sec

Multi-threaded implementations
==============================

/* bench: multi-threaded w/ atomics */
base      :    3.910 ± 0.023M/s
kprobe    :    3.048 ± 0.037M/s
kretprobe :    2.300 ± 0.015M/s
rawtp     :    3.687 ± 0.034M/s
fentry    :    3.740 ± 0.087M/s
fexit     :    3.510 ± 0.009M/s
fmodret   :    3.485 ± 0.050M/s

/* selftest: multi-threaded w/ atomics */
task_rename base        3872K events per sec
task_rename kprobe      3068K events per sec
task_rename kretprobe   2350K events per sec
task_rename raw_tp      3731K events per sec
task_rename fentry      3639K events per sec
task_rename fexit       3558K events per sec
task_rename fmod_ret    3511K events per sec

/* selftest: multi-threaded, no atomics */
task_rename base        3945K events per sec
task_rename kprobe      3298K events per sec
task_rename kretprobe   2451K events per sec
task_rename raw_tp      3718K events per sec
task_rename fentry      3782K events per sec
task_rename fexit       3543K events per sec
task_rename fmod_ret    3526K events per sec

Note that the fact that ./bench benchmark always uses atomic increments for
counting, while test_overhead doesn't, doesn't influence test results all that
much.
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-4-andriin@fb.com

4eaf0b5c

selftests/bpf: Add benchmark runner infrastructure · 8e7c2a02

由 Andrii Nakryiko 提交于 5月 12, 2020

While working on BPF ringbuf implementation, testing, and benchmarking, I've
developed a pretty generic and modular benchmark runner, which seems to be
generically useful, as I've already used it for one more purpose (testing
fastest way to trigger BPF program, to minimize overhead of in-kernel code).

This patch adds generic part of benchmark runner and sets up Makefile for
extending it with more sets of benchmarks.

Benchmarker itself operates by spinning up specified number of producer and
consumer threads, setting up interval timer sending SIGALARM signal to
application once a second. Every second, current snapshot with hits/drops
counters are collected and stored in an array. Drops are useful for
producer/consumer benchmarks in which producer might overwhelm consumers.

Once test finishes after given amount of warm-up and testing seconds, mean and
stddev are calculated (ignoring warm-up results) and is printed out to stdout.
This setup seems to give consistent and accurate results.

To validate behavior, I added two atomic counting tests: global and local.
For global one, all the producer threads are atomically incrementing same
counter as fast as possible. This, of course, leads to huge drop of
performance once there is more than one producer thread due to CPUs fighting
for the same memory location.

Local counting, on the other hand, maintains one counter per each producer
thread, incremented independently. Once per second, all counters are read and
added together to form final "counting throughput" measurement. As expected,
such setup demonstrates linear scalability with number of producers (as long
as there are enough physical CPU cores, of course). See example output below.
Also, this setup can nicely demonstrate disastrous effects of false sharing,
if care is not taken to take those per-producer counters apart into
independent cache lines.

Demo output shows global counter first with 1 producer, then with 4. Both
total and per-producer performance significantly drop. The last run is local
counter with 4 producers, demonstrating near-perfect scalability.

$ ./bench -a -w1 -d2 -p1 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 24.822us): hits 148.179M/s (148.179M/prod), drops 0.000M/s
Iter 1 ( 37.939us): hits 149.308M/s (149.308M/prod), drops 0.000M/s
Iter 2 (-10.774us): hits 150.717M/s (150.717M/prod), drops 0.000M/s
Iter 3 ( 3.807us): hits 151.435M/s (151.435M/prod), drops 0.000M/s
Summary: hits 150.488 ± 1.079M/s (150.488M/prod), drops 0.000 ± 0.000M/s

$ ./bench -a -w1 -d2 -p4 count-global
Setting up benchmark 'count-global'...
Benchmark 'count-global' started.
Iter 0 ( 60.659us): hits 53.910M/s ( 13.477M/prod), drops 0.000M/s
Iter 1 (-17.658us): hits 53.722M/s ( 13.431M/prod), drops 0.000M/s
Iter 2 ( 5.865us): hits 53.495M/s ( 13.374M/prod), drops 0.000M/s
Iter 3 ( 0.104us): hits 53.606M/s ( 13.402M/prod), drops 0.000M/s
Summary: hits 53.608 ± 0.113M/s ( 13.402M/prod), drops 0.000 ± 0.000M/s

$ ./bench -a -w1 -d2 -p4 count-local
Setting up benchmark 'count-local'...
Benchmark 'count-local' started.
Iter 0 ( 23.388us): hits 640.450M/s (160.113M/prod), drops 0.000M/s
Iter 1 ( 2.291us): hits 605.661M/s (151.415M/prod), drops 0.000M/s
Iter 2 ( -6.415us): hits 607.092M/s (151.773M/prod), drops 0.000M/s
Iter 3 ( -1.361us): hits 601.796M/s (150.449M/prod), drops 0.000M/s
Summary: hits 604.849 ± 2.739M/s (151.212M/prod), drops 0.000 ± 0.000M/s

Benchmark runner supports setting thread affinity for producer and consumer
threads. You can use -a flag for default CPU selection scheme, where first
consumer gets CPU #0, next one gets CPU #1, and so on. Then producer threads
pick up next CPU and increment one-by-one as well. But user can also specify
a set of CPUs independently for producers and consumers with --prod-affinity
1,2-10,15 and --cons-affinity <set-of-cpus>. The latter allows to force
producers and consumers to share same set of CPUs, if necessary.
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-3-andriin@fb.com

8e7c2a02

selftests/bpf: Extract parse_num_list into generic testing_helpers.c · cd49291c

由 Andrii Nakryiko 提交于 5月 12, 2020

Add testing_helpers.c, which will contain generic helpers for test runners and
tests needing some common generic functionality, like parsing a set of
numbers.
Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200512192445.2351848-2-andriin@fb.com

cd49291c

13 5月, 2020 2 次提交

libbpf: Fix probe code to return EPERM if encountered · fd9eef1a

由 Eelco Chaudron 提交于 5月 12, 2020

When the probe code was failing for any reason ENOTSUP was returned, even
if this was due to not having enough lock space. This patch fixes this by
returning EPERM to the user application, so it can respond and increase
the RLIMIT_MEMLOCK size.
Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/158927424896.2342.10402475603585742943.stgit@ebuild

fd9eef1a

selftests/bpf: Install generated test progs · 309b81f0

由 Yauheni Kaliuta 提交于 5月 13, 2020

Before commit 74b5a596 ("selftests/bpf: Replace test_progs and
test_maps w/ general rule") selftests/bpf used generic install
target from selftests/lib.mk to install generated bpf test progs
by mentioning them in TEST_GEN_FILES variable.

Take that functionality back.

Fixes: 74b5a596 ("selftests/bpf: Replace test_progs and test_maps w/ general rule")
Signed-off-by: NYauheni Kaliuta <yauheni.kaliuta@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Link: https://lore.kernel.org/bpf/20200513021722.7787-1-yauheni.kaliuta@redhat.com

309b81f0

12 5月, 2020 4 次提交

tools/bootconfig: Fix apply_xbc() to return zero on success · 9d82ccda

由 Steven Rostedt (VMware) 提交于 5月 08, 2020

The return of apply_xbc() returns the result of the last write() call, which
is not what is expected. It should only return zero on success.

Link: https://lore.kernel.org/r/20200508093059.GF9365@kadam

Fixes: 88426044 ("tools/bootconfig: Fix resource leak in apply_xbc()")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Tested-by: NMasami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>

9d82ccda

tools, bpf: Synchronise BPF UAPI header with tools · ff20460e

由 Quentin Monnet 提交于 5月 11, 2020

Synchronise the bpf.h header under tools, to report the fixes recently
brought to the documentation for the BPF helpers.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200511161536.29853-5-quentin@isovalent.com

ff20460e

tools, bpftool: Minor fixes for documentation · c8caa0bb

由 Quentin Monnet 提交于 5月 11, 2020

Bring minor improvements to bpftool documentation. Fix or harmonise
formatting, update map types (including in interactive help), improve
description for "map create", fix a build warning due to a missing line
after the double-colon for the "bpftool prog profile" example,
complete/harmonise/sort the list of related bpftool man pages in
footers.

v2:
- Remove (instead of changing) mark-up on "value" in bpftool-map.rst,
  when it does not refer to something passed on the command line.
- Fix an additional typo ("hexadeximal") in the same file.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200511161536.29853-3-quentin@isovalent.com

c8caa0bb

tools, bpftool: Poison and replace kernel integer typedefs · 6e7e034e

由 Quentin Monnet 提交于 5月 11, 2020

Replace the use of kernel-only integer typedefs (u8, u32, etc.) by their
user space counterpart (__u8, __u32, etc.).

Similarly to what libbpf does, poison the typedefs to avoid introducing
them again in the future.
Signed-off-by: NQuentin Monnet <quentin@isovalent.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20200511161536.29853-2-quentin@isovalent.com

6e7e034e

11 5月, 2020 1 次提交

bpf, libbpf: Replace zero-length array with flexible-array · 385bbf7b

由 Gustavo A. R. Silva 提交于 5月 07, 2020

The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:

struct foo {
        int stuff;
        struct boo array[];
};

By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.

Also, notice that, dynamic memory allocations won't be affected by
this change:

"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]

sizeof(flexible-array-member) triggers a warning because flexible array
members have incomplete type[1]. There are some instances of code in
which the sizeof operator is being incorrectly/erroneously applied to
zero-length arrays and the result is zero. Such instances may be hiding
some bugs. So, this work (flexible-array member conversions) will also
help to get completely rid of those sorts of issues.

This issue was found with the help of Coccinelle.

[1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
[2] https://github.com/KSPP/linux/issues/21
[3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour")
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NYonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20200507185057.GA13981@embeddedor

385bbf7b

10 5月, 2020 1 次提交

bpf, runqslower: include proper uapi/bpf.h · b4563fac

由 Song Liu 提交于 5月 04, 2020

runqslower doesn't specify include path for uapi/bpf.h. This causes the
following warning:

In file included from runqslower.c:10:
.../tools/testing/selftests/bpf/tools/include/bpf/bpf.h:234:38:
warning: 'enum bpf_stats_type' declared inside parameter list will not
be visible outside of this definition or declaration
  234 | LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type);

Fix this by adding -I tools/includ/uapi to the Makefile.
Reported-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NSong Liu <songliubraving@fb.com>
Acked-by: NAndrii Nakryiko <andriin@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

b4563fac

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功