- 15 5月, 2020 1 次提交
-
-
由 Andrey Ignatov 提交于
bpf_sock_addr.user_port supports only 4-byte load and it leads to ugly code in BPF programs, like: volatile __u32 user_port = ctx->user_port; __u16 port = bpf_ntohs(user_port); Since otherwise clang may optimize the load to be 2-byte and it's rejected by verifier. Add support for 1- and 2-byte loads same way as it's supported for other fields in bpf_sock_addr like user_ip4, msg_src_ip4, etc. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/c1e983f4c17573032601d0b2b1f9d1274f24bc16.1589420814.git.rdna@fb.com
-
- 14 5月, 2020 6 次提交
-
-
由 Yonghong Song 提交于
This is to be consistent with tracing and lsm programs which have prefix "bpf_trace_" and "bpf_lsm_" respectively. Suggested-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513180216.2949387-1-yhs@fb.com
-
由 Yonghong Song 提交于
Commit 6879c042 ("tools/bpf: selftests: Add bpf_iter selftests") added self tests for bpf_iter feature. But two subtests ipv6_route and netlink needs llvm latest 10.x release branch or trunk due to a bug in llvm BPF backend. This patch added the file README.rst to document these two failures so people using llvm 10.0.0 can be aware of them. Suggested-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200513180215.2949237-1-yhs@fb.com
-
由 Andrii Nakryiko 提交于
It is sometimes desirable to be able to trigger BPF program from user-space with minimal overhead. sys_enter would seem to be a good candidate, yet in a lot of cases there will be a lot of noise from syscalls triggered by other processes on the system. So while searching for low-overhead alternative, I've stumbled upon getpgid() syscall, which seems to be specific enough to not suffer from accidental syscall by other apps. This set of benchmarks compares tp, raw_tp w/ filtering by syscall ID, kprobe, fentry and fmod_ret with returning error (so that syscall would not be executed), to determine the lowest-overhead way. Here are results on my machine (using benchs/run_bench_trigger.sh script): base : 9.200 ± 0.319M/s tp : 6.690 ± 0.125M/s rawtp : 8.571 ± 0.214M/s kprobe : 6.431 ± 0.048M/s fentry : 8.955 ± 0.241M/s fmodret : 8.903 ± 0.135M/s So it seems like fmodret doesn't give much benefit for such lightweight syscall. Raw tracepoint is pretty decent despite additional filtering logic, but it will be called for any other syscall in the system, which rules it out. Fentry, though, seems to be adding the least amoung of overhead and achieves 97.3% of performance of baseline no-BPF-attached syscall. Using getpgid() seems to be preferable to set_task_comm() approach from test_overhead, as it's about 2.35x faster in a baseline performance. Signed-off-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-5-andriin@fb.com
-
由 Andrii Nakryiko 提交于
Add fmod_ret BPF program to existing test_overhead selftest. Also re-implement user-space benchmarking part into benchmark runner to compare results. Results with ./bench are consistently somewhat lower than test_overhead's, but relative performance of various types of BPF programs stay consisten (e.g., kretprobe is noticeably slower). This slowdown seems to be coming from the fact that test_overhead is single-threaded, while benchmark always spins off at least one thread for producer. This has been confirmed by hacking multi-threaded test_overhead variant and also single-threaded bench variant. Resutls are below. run_bench_rename.sh script from benchs/ subdirectory was used to produce results for ./bench. Single-threaded implementations =============================== /* bench: single-threaded, atomics */ base : 4.622 ± 0.049M/s kprobe : 3.673 ± 0.052M/s kretprobe : 2.625 ± 0.052M/s rawtp : 4.369 ± 0.089M/s fentry : 4.201 ± 0.558M/s fexit : 4.309 ± 0.148M/s fmodret : 4.314 ± 0.203M/s /* selftest: single-threaded, no atomics */ task_rename base 4555K events per sec task_rename kprobe 3643K events per sec task_rename kretprobe 2506K events per sec task_rename raw_tp 4303K events per sec task_rename fentry 4307K events per sec task_rename fexit 4010K events per sec task_rename fmod_ret 3984K events per sec Multi-threaded implementations ============================== /* bench: multi-threaded w/ atomics */ base : 3.910 ± 0.023M/s kprobe : 3.048 ± 0.037M/s kretprobe : 2.300 ± 0.015M/s rawtp : 3.687 ± 0.034M/s fentry : 3.740 ± 0.087M/s fexit : 3.510 ± 0.009M/s fmodret : 3.485 ± 0.050M/s /* selftest: multi-threaded w/ atomics */ task_rename base 3872K events per sec task_rename kprobe 3068K events per sec task_rename kretprobe 2350K events per sec task_rename raw_tp 3731K events per sec task_rename fentry 3639K events per sec task_rename fexit 3558K events per sec task_rename fmod_ret 3511K events per sec /* selftest: multi-threaded, no atomics */ task_rename base 3945K events per sec task_rename kprobe 3298K events per sec task_rename kretprobe 2451K events per sec task_rename raw_tp 3718K events per sec task_rename fentry 3782K events per sec task_rename fexit 3543K events per sec task_rename fmod_ret 3526K events per sec Note that the fact that ./bench benchmark always uses atomic increments for counting, while test_overhead doesn't, doesn't influence test results all that much. Signed-off-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-4-andriin@fb.com
-
由 Andrii Nakryiko 提交于
While working on BPF ringbuf implementation, testing, and benchmarking, I've developed a pretty generic and modular benchmark runner, which seems to be generically useful, as I've already used it for one more purpose (testing fastest way to trigger BPF program, to minimize overhead of in-kernel code). This patch adds generic part of benchmark runner and sets up Makefile for extending it with more sets of benchmarks. Benchmarker itself operates by spinning up specified number of producer and consumer threads, setting up interval timer sending SIGALARM signal to application once a second. Every second, current snapshot with hits/drops counters are collected and stored in an array. Drops are useful for producer/consumer benchmarks in which producer might overwhelm consumers. Once test finishes after given amount of warm-up and testing seconds, mean and stddev are calculated (ignoring warm-up results) and is printed out to stdout. This setup seems to give consistent and accurate results. To validate behavior, I added two atomic counting tests: global and local. For global one, all the producer threads are atomically incrementing same counter as fast as possible. This, of course, leads to huge drop of performance once there is more than one producer thread due to CPUs fighting for the same memory location. Local counting, on the other hand, maintains one counter per each producer thread, incremented independently. Once per second, all counters are read and added together to form final "counting throughput" measurement. As expected, such setup demonstrates linear scalability with number of producers (as long as there are enough physical CPU cores, of course). See example output below. Also, this setup can nicely demonstrate disastrous effects of false sharing, if care is not taken to take those per-producer counters apart into independent cache lines. Demo output shows global counter first with 1 producer, then with 4. Both total and per-producer performance significantly drop. The last run is local counter with 4 producers, demonstrating near-perfect scalability. $ ./bench -a -w1 -d2 -p1 count-global Setting up benchmark 'count-global'... Benchmark 'count-global' started. Iter 0 ( 24.822us): hits 148.179M/s (148.179M/prod), drops 0.000M/s Iter 1 ( 37.939us): hits 149.308M/s (149.308M/prod), drops 0.000M/s Iter 2 (-10.774us): hits 150.717M/s (150.717M/prod), drops 0.000M/s Iter 3 ( 3.807us): hits 151.435M/s (151.435M/prod), drops 0.000M/s Summary: hits 150.488 ± 1.079M/s (150.488M/prod), drops 0.000 ± 0.000M/s $ ./bench -a -w1 -d2 -p4 count-global Setting up benchmark 'count-global'... Benchmark 'count-global' started. Iter 0 ( 60.659us): hits 53.910M/s ( 13.477M/prod), drops 0.000M/s Iter 1 (-17.658us): hits 53.722M/s ( 13.431M/prod), drops 0.000M/s Iter 2 ( 5.865us): hits 53.495M/s ( 13.374M/prod), drops 0.000M/s Iter 3 ( 0.104us): hits 53.606M/s ( 13.402M/prod), drops 0.000M/s Summary: hits 53.608 ± 0.113M/s ( 13.402M/prod), drops 0.000 ± 0.000M/s $ ./bench -a -w1 -d2 -p4 count-local Setting up benchmark 'count-local'... Benchmark 'count-local' started. Iter 0 ( 23.388us): hits 640.450M/s (160.113M/prod), drops 0.000M/s Iter 1 ( 2.291us): hits 605.661M/s (151.415M/prod), drops 0.000M/s Iter 2 ( -6.415us): hits 607.092M/s (151.773M/prod), drops 0.000M/s Iter 3 ( -1.361us): hits 601.796M/s (150.449M/prod), drops 0.000M/s Summary: hits 604.849 ± 2.739M/s (151.212M/prod), drops 0.000 ± 0.000M/s Benchmark runner supports setting thread affinity for producer and consumer threads. You can use -a flag for default CPU selection scheme, where first consumer gets CPU #0, next one gets CPU #1, and so on. Then producer threads pick up next CPU and increment one-by-one as well. But user can also specify a set of CPUs independently for producers and consumers with --prod-affinity 1,2-10,15 and --cons-affinity <set-of-cpus>. The latter allows to force producers and consumers to share same set of CPUs, if necessary. Signed-off-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-3-andriin@fb.com
-
由 Andrii Nakryiko 提交于
Add testing_helpers.c, which will contain generic helpers for test runners and tests needing some common generic functionality, like parsing a set of numbers. Signed-off-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200512192445.2351848-2-andriin@fb.com
-
- 13 5月, 2020 2 次提交
-
-
由 Eelco Chaudron 提交于
When the probe code was failing for any reason ENOTSUP was returned, even if this was due to not having enough lock space. This patch fixes this by returning EPERM to the user application, so it can respond and increase the RLIMIT_MEMLOCK size. Signed-off-by: NEelco Chaudron <echaudro@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/158927424896.2342.10402475603585742943.stgit@ebuild
-
由 Yauheni Kaliuta 提交于
Before commit 74b5a596 ("selftests/bpf: Replace test_progs and test_maps w/ general rule") selftests/bpf used generic install target from selftests/lib.mk to install generated bpf test progs by mentioning them in TEST_GEN_FILES variable. Take that functionality back. Fixes: 74b5a596 ("selftests/bpf: Replace test_progs and test_maps w/ general rule") Signed-off-by: NYauheni Kaliuta <yauheni.kaliuta@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200513021722.7787-1-yauheni.kaliuta@redhat.com
-
- 12 5月, 2020 3 次提交
-
-
由 Quentin Monnet 提交于
Synchronise the bpf.h header under tools, to report the fixes recently brought to the documentation for the BPF helpers. Signed-off-by: NQuentin Monnet <quentin@isovalent.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200511161536.29853-5-quentin@isovalent.com
-
由 Quentin Monnet 提交于
Bring minor improvements to bpftool documentation. Fix or harmonise formatting, update map types (including in interactive help), improve description for "map create", fix a build warning due to a missing line after the double-colon for the "bpftool prog profile" example, complete/harmonise/sort the list of related bpftool man pages in footers. v2: - Remove (instead of changing) mark-up on "value" in bpftool-map.rst, when it does not refer to something passed on the command line. - Fix an additional typo ("hexadeximal") in the same file. Signed-off-by: NQuentin Monnet <quentin@isovalent.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200511161536.29853-3-quentin@isovalent.com
-
由 Quentin Monnet 提交于
Replace the use of kernel-only integer typedefs (u8, u32, etc.) by their user space counterpart (__u8, __u32, etc.). Similarly to what libbpf does, poison the typedefs to avoid introducing them again in the future. Signed-off-by: NQuentin Monnet <quentin@isovalent.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200511161536.29853-2-quentin@isovalent.com
-
- 11 5月, 2020 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] sizeof(flexible-array-member) triggers a warning because flexible array members have incomplete type[1]. There are some instances of code in which the sizeof operator is being incorrectly/erroneously applied to zero-length arrays and the result is zero. Such instances may be hiding some bugs. So, this work (flexible-array member conversions) will also help to get completely rid of those sorts of issues. This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732 ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20200507185057.GA13981@embeddedor
-
- 10 5月, 2020 11 次提交
-
-
由 Song Liu 提交于
runqslower doesn't specify include path for uapi/bpf.h. This causes the following warning: In file included from runqslower.c:10: .../tools/testing/selftests/bpf/tools/include/bpf/bpf.h:234:38: warning: 'enum bpf_stats_type' declared inside parameter list will not be visible outside of this definition or declaration 234 | LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type); Fix this by adding -I tools/includ/uapi to the Makefile. Reported-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NSong Liu <songliubraving@fb.com> Acked-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Yonghong Song 提交于
The added test includes the following subtests: - test verifier change for btf_id_or_null - test load/create_iter/read for ipv6_route/netlink/bpf_map/task/task_file - test anon bpf iterator - test anon bpf iterator reading one char at a time - test file bpf iterator - test overflow (single bpf program output not overflow) - test overflow (single bpf program output overflows) - test bpf prog returning 1 The ipv6_route tests the following verifier change - access fields in the variable length array of the structure. The netlink load tests the following verifier change - put a btf_id ptr value in a stack and accessible to tracing/iter programs. The anon bpf iterator also tests link auto attach through skeleton. $ test_progs -n 2 #2/1 btf_id_or_null:OK #2/2 ipv6_route:OK #2/3 netlink:OK #2/4 bpf_map:OK #2/5 task:OK #2/6 task_file:OK #2/7 anon:OK #2/8 anon-read-one-char:OK #2/9 file:OK #2/10 overflow:OK #2/11 overflow-e2big:OK #2/12 prog-ret-1:OK #2 bpf_iter:OK Summary: 1/12 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175923.2477637-1-yhs@fb.com
-
由 Yonghong Song 提交于
The implementation is arbitrary, just to show how the bpf programs can be written for bpf_map/task/task_file. They can be costomized for specific needs. For example, for bpf_map, the iterator prints out: $ cat /sys/fs/bpf/my_bpf_map id refcnt usercnt locked_vm 3 2 0 20 6 2 0 20 9 2 0 20 12 2 0 20 13 2 0 20 16 2 0 20 19 2 0 20 %%% END %%% For task, the iterator prints out: $ cat /sys/fs/bpf/my_task tgid gid 1 1 2 2 .... 1944 1944 1948 1948 1949 1949 1953 1953 === END === For task/file, the iterator prints out: $ cat /sys/fs/bpf/my_task_file tgid gid fd file 1 1 0 ffffffff95c97600 1 1 1 ffffffff95c97600 1 1 2 ffffffff95c97600 .... 1895 1895 255 ffffffff95c8fe00 1932 1932 0 ffffffff95c8fe00 1932 1932 1 ffffffff95c8fe00 1932 1932 2 ffffffff95c8fe00 1932 1932 3 ffffffff95c185c0 This is able to print out all open files (fd and file->f_op), so user can compare f_op against a particular kernel file operations to find what it is. For example, from /proc/kallsyms, we can find ffffffff95c185c0 r eventfd_fops so we will know tgid 1932 fd 3 is an eventfd file descriptor. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175922.2477576-1-yhs@fb.com
-
由 Yonghong Song 提交于
Two bpf programs are added in this patch for netlink and ipv6_route target. On my VM, I am able to achieve identical results compared to /proc/net/netlink and /proc/net/ipv6_route. $ cat /proc/net/netlink sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode 000000002c42d58b 0 0 00000000 0 0 0 2 0 7 00000000a4e8b5e1 0 1 00000551 0 0 0 2 0 18719 00000000e1b1c195 4 0 00000000 0 0 0 2 0 16422 000000007e6b29f9 6 0 00000000 0 0 0 2 0 16424 .... 00000000159a170d 15 1862 00000002 0 0 0 2 0 1886 000000009aca4bc9 15 3918224839 00000002 0 0 0 2 0 19076 00000000d0ab31d2 15 1 00000002 0 0 0 2 0 18683 000000008398fb08 16 0 00000000 0 0 0 2 0 27 $ cat /sys/fs/bpf/my_netlink sk Eth Pid Groups Rmem Wmem Dump Locks Drops Inode 000000002c42d58b 0 0 00000000 0 0 0 2 0 7 00000000a4e8b5e1 0 1 00000551 0 0 0 2 0 18719 00000000e1b1c195 4 0 00000000 0 0 0 2 0 16422 000000007e6b29f9 6 0 00000000 0 0 0 2 0 16424 .... 00000000159a170d 15 1862 00000002 0 0 0 2 0 1886 000000009aca4bc9 15 3918224839 00000002 0 0 0 2 0 19076 00000000d0ab31d2 15 1 00000002 0 0 0 2 0 18683 000000008398fb08 16 0 00000000 0 0 0 2 0 27 $ cat /proc/net/ipv6_route fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo $ cat /sys/fs/bpf/my_ipv6_route fe800000000000000000000000000000 40 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000001 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo 00000000000000000000000000000001 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000003 00000000 80200001 lo fe80000000000000c04b03fffe7827ce 80 00000000000000000000000000000000 00 00000000000000000000000000000000 00000000 00000002 00000000 80200001 eth0 ff000000000000000000000000000000 08 00000000000000000000000000000000 00 00000000000000000000000000000000 00000100 00000003 00000000 00000001 eth0 00000000000000000000000000000000 00 00000000000000000000000000000000 00 00000000000000000000000000000000 ffffffff 00000001 00000000 00200200 lo Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175921.2477493-1-yhs@fb.com
-
由 Yonghong Song 提交于
Currently, only one command is supported bpftool iter pin <bpf_prog.o> <path> It will pin the trace/iter bpf program in the object file <bpf_prog.o> to the <path> where <path> should be on a bpffs mount. For example, $ bpftool iter pin ./bpf_iter_ipv6_route.o \ /sys/fs/bpf/my_route User can then do a `cat` to print out the results: $ cat /sys/fs/bpf/my_route fe800000000000000000000000000000 40 00000000000000000000000000000000 ... 00000000000000000000000000000000 00 00000000000000000000000000000000 ... 00000000000000000000000000000001 80 00000000000000000000000000000000 ... fe800000000000008c0162fffebdfd57 80 00000000000000000000000000000000 ... ff000000000000000000000000000000 08 00000000000000000000000000000000 ... 00000000000000000000000000000000 00 00000000000000000000000000000000 ... The implementation for ipv6_route iterator is in one of subsequent patches. This patch also added BPF_LINK_TYPE_ITER to link query. In the future, we may add additional parameters to pin command by parameterizing the bpf iterator. For example, a map_id or pid may be added to let bpf program only traverses a single map or task, similar to kernel seq_file single_open(). We may also add introspection command for targets/iterators by leveraging the bpf_iter itself. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200509175920.2477247-1-yhs@fb.com
-
由 Yonghong Song 提交于
These two helpers will be used later in bpf_iter bpf program bpf_iter_netlink.c. Put them in bpf_helpers.h since they could be useful in other cases. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175919.2477104-1-yhs@fb.com
-
由 Yonghong Song 提交于
Two new libbpf APIs are added to support bpf_iter: - bpf_program__attach_iter Given a bpf program and additional parameters, which is none now, returns a bpf_link. - bpf_iter_create syscall level API to create a bpf iterator. The macro BPF_SEQ_PRINTF are also introduced. The format looks like: BPF_SEQ_PRINTF(seq, "task id %d\n", pid); This macro can help bpf program writers with nicer bpf_seq_printf syntax similar to the kernel one. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175917.2476936-1-yhs@fb.com
-
由 Yonghong Song 提交于
Two helpers bpf_seq_printf and bpf_seq_write, are added for writing data to the seq_file buffer. bpf_seq_printf supports common format string flag/width/type fields so at least I can get identical results for netlink and ipv6_route targets. For bpf_seq_printf and bpf_seq_write, return value -EOVERFLOW specifically indicates a write failure due to overflow, which means the object will be repeated in the next bpf invocation if object collection stays the same. Note that if the object collection is changed, depending how collection traversal is done, even if the object still in the collection, it may not be visited. For bpf_seq_printf, format %s, %p{i,I}{4,6} needs to read kernel memory. Reading kernel memory may fail in the following two cases: - invalid kernel address, or - valid kernel address but requiring a major fault If reading kernel memory failed, the %s string will be an empty string and %p{i,I}{4,6} will be all 0. Not returning error to bpf program is consistent with what bpf_trace_printk() does for now. bpf_seq_printf may return -EBUSY meaning that internal percpu buffer for memory copy of strings or other pointees is not available. Bpf program can return 1 to indicate it wants the same object to be repeated. Right now, this should not happen on no-RT kernels since migrate_disable(), which guards bpf prog call, calls preempt_disable(). Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175914.2476661-1-yhs@fb.com
-
由 Yonghong Song 提交于
A new bpf command BPF_ITER_CREATE is added. The anonymous bpf iterator is seq_file based. The seq_file private data are referenced by targets. The bpf_iter infrastructure allocated additional space at seq_file->private before the space used by targets to store some meta data, e.g., prog: prog to run session_id: an unique id for each opened seq_file seq_num: how many times bpf programs are queried in this session done_stop: an internal state to decide whether bpf program should be called in seq_ops->stop() or not The seq_num will start from 0 for valid objects. The bpf program may see the same seq_num more than once if - seq_file buffer overflow happens and the same object is retried by bpf_seq_read(), or - the bpf program explicitly requests a retry of the same object Since module is not supported for bpf_iter, all target registeration happens at __init time, so there is no need to change bpf_iter_unreg_target() as it is used mostly in error path of the init function at which time no bpf iterators have been created yet. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175905.2475770-1-yhs@fb.com
-
由 Yonghong Song 提交于
Given a bpf program, the step to create an anonymous bpf iterator is: - create a bpf_iter_link, which combines bpf program and the target. In the future, there could be more information recorded in the link. A link_fd will be returned to the user space. - create an anonymous bpf iterator with the given link_fd. The bpf_iter_link can be pinned to bpffs mount file system to create a file based bpf iterator as well. The benefit to use of bpf_iter_link: - using bpf link simplifies design and implementation as bpf link is used for other tracing bpf programs. - for file based bpf iterator, bpf_iter_link provides a standard way to replace underlying bpf programs. - for both anonymous and free based iterators, bpf link query capability can be leveraged. The patch added support of tracing/iter programs for BPF_LINK_CREATE. A new link type BPF_LINK_TYPE_ITER is added to facilitate link querying. Currently, only prog_id is needed, so there is no additional in-kernel show_fdinfo() and fill_link_info() hook is needed for BPF_LINK_TYPE_ITER link. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175901.2475084-1-yhs@fb.com
-
由 Yonghong Song 提交于
A bpf_iter program is a tracing program with attach type BPF_TRACE_ITER. The load attribute attach_btf_id is used by the verifier against a particular kernel function, which represents a target, e.g., __bpf_iter__bpf_map for target bpf_map which is implemented later. The program return value must be 0 or 1 for now. 0 : successful, except potential seq_file buffer overflow which is handled by seq_file reader. 1 : request to restart the same object In the future, other return values may be used for filtering or teminating the iterator. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200509175900.2474947-1-yhs@fb.com
-
- 09 5月, 2020 3 次提交
-
-
由 Stanislav Fomichev 提交于
We want to have a tighter control on what ports we bind to in the BPF_CGROUP_INET{4,6}_CONNECT hooks even if it means connect() becomes slightly more expensive. The expensive part comes from the fact that we now need to call inet_csk_get_port() that verifies that the port is not used and allocates an entry in the hash table for it. Since we can't rely on "snum || !bind_address_no_port" to prevent us from calling POST_BIND hook anymore, let's add another bind flag to indicate that the call site is BPF program. v5: * fix wrong AF_INET (should be AF_INET6) in the bpf program for v6 v3: * More bpf_bind documentation refinements (Martin KaFai Lau) * Add UDP tests as well (Martin KaFai Lau) * Don't start the thread, just do socket+bind+listen (Martin KaFai Lau) v2: * Update documentation (Andrey Ignatov) * Pass BIND_FORCE_ADDRESS_NO_PORT conditionally (Andrey Ignatov) Signed-off-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-5-sdf@google.com
-
由 Stanislav Fomichev 提交于
1. Move pkt_v4 and pkt_v6 into network_helpers and adjust the users. 2. Copy-paste spin_lock_thread into two tests that use it. Signed-off-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAndrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-3-sdf@google.com
-
由 Stanislav Fomichev 提交于
Move the following routines that let us start a background listener thread and connect to a server by fd to the test_prog: * start_server - socket+bind+listen * connect_to_fd - connect to the server identified by fd These will be used in the next commit. Also, extend these helpers to support AF_INET6 and accept the family as an argument. v5: * drop pthread.h (Martin KaFai Lau) * add SO_SNDTIMEO (Martin KaFai Lau) v4: * export extra helper to start server without a thread (Martin KaFai Lau) * tcp_rtt is no longer starting background thread (Martin KaFai Lau) v2: * put helpers into network_helpers.c (Andrii Nakryiko) Signed-off-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200508174611.228805-2-sdf@google.com
-
- 03 5月, 2020 1 次提交
-
-
由 Vincent Cheng 提交于
Add adjust_phase to ptp_clock_caps capability to allow user to query if a PHC driver supports adjust phase with ioctl PTP_CLOCK_GETCAPS command. Signed-off-by: NVincent Cheng <vincent.cheng.xh@renesas.com> Reviewed-by: NRichard Cochran <richardcochran@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 5月, 2020 5 次提交
-
-
由 Stanislav Fomichev 提交于
Andrey pointed out that we can use reno instead of dctcp for CC tests and drop CONFIG_TCP_CONG_DCTCP=y requirement. Fixes: beecf11b ("bpf: Bpf_{g,s}etsockopt for struct bpf_sock_addr") Suggested-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200501224320.28441-1-sdf@google.com
-
由 Stanislav Fomichev 提交于
Currently, bpf_getsockopt and bpf_setsockopt helpers operate on the 'struct bpf_sock_ops' context in BPF_PROG_TYPE_SOCK_OPS program. Let's generalize them and make them available for 'struct bpf_sock_addr'. That way, in the future, we can allow those helpers in more places. As an example, let's expose those 'struct bpf_sock_addr' based helpers to BPF_CGROUP_INET{4,6}_CONNECT hooks. That way we can override CC before the connection is made. v3: * Expose custom helpers for bpf_sock_addr context instead of doing generic bpf_sock argument (as suggested by Daniel). Even with try_socket_lock that doesn't sleep we have a problem where context sk is already locked and socket lock is non-nestable. v2: * s/BPF_PROG_TYPE_CGROUP_SOCKOPT/BPF_PROG_TYPE_SOCK_OPS/ Signed-off-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200430233152.199403-1-sdf@google.com
-
由 Song Liu 提交于
Add test for BPF_ENABLE_STATS, which should enable run_time_ns stats. ~/selftests/bpf# ./test_progs -t enable_stats -v test_enable_stats:PASS:skel_open_and_load 0 nsec test_enable_stats:PASS:get_stats_fd 0 nsec test_enable_stats:PASS:attach_raw_tp 0 nsec test_enable_stats:PASS:get_prog_info 0 nsec test_enable_stats:PASS:check_stats_enabled 0 nsec test_enable_stats:PASS:check_run_cnt_valid 0 nsec Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200430071506.1408910-4-songliubraving@fb.com
-
由 Song Liu 提交于
bpf_enable_stats() is added to enable given stats. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200430071506.1408910-3-songliubraving@fb.com
-
由 Song Liu 提交于
Currently, sysctl kernel.bpf_stats_enabled controls BPF runtime stats. Typical userspace tools use kernel.bpf_stats_enabled as follows: 1. Enable kernel.bpf_stats_enabled; 2. Check program run_time_ns; 3. Sleep for the monitoring period; 4. Check program run_time_ns again, calculate the difference; 5. Disable kernel.bpf_stats_enabled. The problem with this approach is that only one userspace tool can toggle this sysctl. If multiple tools toggle the sysctl at the same time, the measurement may be inaccurate. To fix this problem while keep backward compatibility, introduce a new bpf command BPF_ENABLE_STATS. On success, this command enables stats and returns a valid fd. BPF_ENABLE_STATS takes argument "type". Currently, only one type, BPF_STATS_RUN_TIME, is supported. We can extend the command to support other types of stats in the future. With BPF_ENABLE_STATS, user space tool would have the following flow: 1. Get a fd with BPF_ENABLE_STATS, and make sure it is valid; 2. Check program run_time_ns; 3. Sleep for the monitoring period; 4. Check program run_time_ns again, calculate the difference; 5. Close the fd. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20200430071506.1408910-2-songliubraving@fb.com
-
- 30 4月, 2020 7 次提交
-
-
由 Jakub Sitnicki 提交于
Check that verifier allows passing a map of type: BPF_MAP_TYPE_REUSEPORT_SOCKARRARY, or BPF_MAP_TYPE_SOCKMAP, or BPF_MAP_TYPE_SOCKHASH ... to bpf_sk_select_reuseport helper. Suggested-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200430104738.494180-1-jakub@cloudflare.com
-
由 Andrii Nakryiko 提交于
Some versions of GCC falsely detect that vi might not be initialized. That's not true, but let's silence it with NULL initialization. Signed-off-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200430021436.1522502-1-andriin@fb.com
-
由 Jakub Sitnicki 提交于
Update bpf_sk_assign test to fetch the server socket from SOCKMAP, now that map lookup from BPF in SOCKMAP is enabled. This way the test TC BPF program doesn't need to know what address server socket is bound to. Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429181154.479310-4-jakub@cloudflare.com
-
由 Jakub Sitnicki 提交于
Now that bpf_map_lookup_elem() is white-listed for SOCKMAP/SOCKHASH, replace the tests which check that verifier prevents lookup on these map types with ones that ensure that lookup operation is permitted, but only with a release of acquired socket reference. Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429181154.479310-3-jakub@cloudflare.com
-
由 Quentin Monnet 提交于
The new libcap dependency is not used for an essential feature of bpftool, and we could imagine building the tool without checks on CAP_SYS_ADMIN by disabling probing features as an unprivileged users. Make it so, in order to avoid a hard dependency on libcap, and to ease packaging/embedding of bpftool. Signed-off-by: NQuentin Monnet <quentin@isovalent.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429144506.8999-4-quentin@isovalent.com
-
由 Quentin Monnet 提交于
There is demand for a way to identify what BPF helper functions are available to unprivileged users. To do so, allow unprivileged users to run "bpftool feature probe" to list BPF-related features. This will only show features accessible to those users, and may not reflect the full list of features available (to administrators) on the system. To avoid the case where bpftool is inadvertently run as non-root and would list only a subset of the features supported by the system when it would be expected to list all of them, running as unprivileged is gated behind the "unprivileged" keyword passed to the command line. When used by a privileged user, this keyword allows to drop the CAP_SYS_ADMIN and to list the features available to unprivileged users. Note that this addsd a dependency on libpcap for compiling bpftool. Note that there is no particular reason why the probes were restricted to root, other than the fact I did not need them for unprivileged and did not bother with the additional checks at the time probes were added. Signed-off-by: NQuentin Monnet <quentin@isovalent.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429144506.8999-3-quentin@isovalent.com
-
由 Quentin Monnet 提交于
The "full_mode" variable used for switching between full or partial feature probing (i.e. with or without probing helpers that will log warnings in kernel logs) was piped from the main do_probe() function down to probe_helpers_for_progtype(), where it is needed. Define it as a global variable: the calls will be more readable, and if other similar flags were to be used in the future, we could use global variables as well instead of extending again the list of arguments with new flags. Signed-off-by: NQuentin Monnet <quentin@isovalent.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200429144506.8999-2-quentin@isovalent.com
-