- 18 3月, 2021 1 次提交
-
-
由 Alexei Starovoitov 提交于
The fexit/fmod_ret programs can be attached to kernel functions that can sleep. The synchronize_rcu_tasks() will not wait for such tasks to complete. In such case the trampoline image will be freed and when the task wakes up the return IP will point to freed memory causing the crash. Solve this by adding percpu_ref_get/put for the duration of trampoline and separate trampoline vs its image life times. The "half page" optimization has to be removed, since first_half->second_half->first_half transition cannot be guaranteed to complete in deterministic time. Every trampoline update becomes a new image. The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and call_rcu_tasks. Together they will wait for the original function and trampoline asm to complete. The trampoline is patched from nop to jmp to skip fexit progs. They are freed independently from the trampoline. The image with fentry progs only will be freed via call_rcu_tasks_trace+call_rcu_tasks which will wait for both sleepable and non-sleepable progs to complete. Fixes: fec56f58 ("bpf: Introduce BPF trampoline") Reported-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: Paul E. McKenney <paulmck@kernel.org> # for RCU Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
-
- 10 3月, 2021 2 次提交
-
-
由 Björn Töpel 提交于
Currently the bpf_redirect_map() implementation dispatches to the correct map-lookup function via a switch-statement. To avoid the dispatching, this change adds bpf_redirect_map() as a map operation. Each map provides its bpf_redirect_map() version, and correct function is automatically selected by the BPF verifier. A nice side-effect of the code movement is that the map lookup functions are now local to the map implementation files, which removes one additional function call. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20210308112907.559576-2-bjorn.topel@gmail.com
-
由 Yonghong Song 提交于
For kuprobe and tracepoint bpf programs, kernel calls trace_call_bpf() which calls BPF_PROG_RUN_ARRAY_CHECK() to run the program array. Currently, BPF_PROG_RUN_ARRAY_CHECK() also calls bpf_cgroup_storage_set() to set percpu cgroup local storage with NULL value. This is due to Commit 394e40a2 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage") which modified __BPF_PROG_RUN_ARRAY() to call bpf_cgroup_storage_set() and this macro is also used by BPF_PROG_RUN_ARRAY_CHECK(). kuprobe and tracepoint programs are not allowed to call bpf_get_local_storage() helper hence does not access percpu cgroup local storage. Let us change BPF_PROG_RUN_ARRAY_CHECK() not to modify percpu cgroup local storage. The issue is observed when I tried to debug [1] where percpu data is overwritten due to preempt_disable -> migration_disable change. This patch does not completely fix the above issue, which will be addressed separately, e.g., multiple cgroup prog runs may preempt each other. But it does fix any potential issue caused by tracing program overwriting percpu cgroup storage: - in a busy system, a tracing program is to run between bpf_cgroup_storage_set() and the cgroup prog run. - a kprobe program is triggered by a helper in cgroup prog before bpf_get_local_storage() is called. [1] https://lore.kernel.org/bpf/CAKH8qBuXCfUz=w8L+Fj74OaUpbosO29niYwTki7e3Ag044_aww@mail.gmail.com/T Fixes: 394e40a2 ("bpf: extend bpf_prog_array to store pointers to the cgroup storage") Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NRoman Gushchin <guro@fb.com> Link: https://lore.kernel.org/bpf/20210309185028.3763817-1-yhs@fb.com
-
- 05 3月, 2021 1 次提交
-
-
由 Lorenz Bauer 提交于
Allow to pass sk_lookup programs to PROG_TEST_RUN. User space provides the full bpf_sk_lookup struct as context. Since the context includes a socket pointer that can't be exposed to user space we define that PROG_TEST_RUN returns the cookie of the selected socket or zero in place of the socket pointer. We don't support testing programs that select a reuseport socket, since this would mean running another (unrelated) BPF program from the sk_lookup test handler. Signed-off-by: NLorenz Bauer <lmb@cloudflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210303101816.36774-3-lmb@cloudflare.com
-
- 27 2月, 2021 6 次提交
-
-
由 Yonghong Song 提交于
This patch added support for hashmap, percpu hashmap, lru hashmap and percpu lru hashmap. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204927.3885020-1-yhs@fb.com
-
由 Yonghong Song 提交于
The bpf_for_each_map_elem() helper is introduced which iterates all map elements with a callback function. The helper signature looks like long bpf_for_each_map_elem(map, callback_fn, callback_ctx, flags) and for each map element, the callback_fn will be called. For example, like hashmap, the callback signature may look like long callback_fn(map, key, val, callback_ctx) There are two known use cases for this. One is from upstream ([1]) where a for_each_map_elem helper may help implement a timeout mechanism in a more generic way. Another is from our internal discussion for a firewall use case where a map contains all the rules. The packet data can be compared to all these rules to decide allow or deny the packet. For array maps, users can already use a bounded loop to traverse elements. Using this helper can avoid using bounded loop. For other type of maps (e.g., hash maps) where bounded loop is hard or impossible to use, this helper provides a convenient way to operate on all elements. For callback_fn, besides map and map element, a callback_ctx, allocated on caller stack, is also passed to the callback function. This callback_ctx argument can provide additional input and allow to write to caller stack for output. If the callback_fn returns 0, the helper will iterate through next element if available. If the callback_fn returns 1, the helper will stop iterating and returns to the bpf program. Other return values are not used for now. Currently, this helper is only available with jit. It is possible to make it work with interpreter with so effort but I leave it as the future work. [1]: https://lore.kernel.org/bpf/20210122205415.113822-1-xiyou.wangcong@gmail.com/Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210226204925.3884923-1-yhs@fb.com
-
由 Cong Wang 提交于
It is only used within sock_map.c so can become static. Suggested-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-7-xiyou.wangcong@gmail.com
-
由 Cong Wang 提交于
As suggested by John, clean up sockmap related Kconfigs: Reduce the scope of CONFIG_BPF_STREAM_PARSER down to TCP stream parser, to reflect its name. Make the rest sockmap code simply depend on CONFIG_BPF_SYSCALL and CONFIG_INET, the latter is still needed at this point because of TCP/UDP proto update. And leave CONFIG_NET_SOCK_MSG untouched, as it is used by non-sockmap cases. Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NLorenz Bauer <lmb@cloudflare.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NJakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210223184934.6054-2-xiyou.wangcong@gmail.com
-
由 Dmitrii Banshchikov 提交于
Instead of using integer literal here and there use macro name for better context. Signed-off-by: NDmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210225202629.585485-1-me@ubique.spb.ru
-
由 Song Liu 提交于
To access per-task data, BPF programs usually creates a hash table with pid as the key. This is not ideal because: 1. The user need to estimate the proper size of the hash table, which may be inaccurate; 2. Big hash tables are slow; 3. To clean up the data properly during task terminations, the user need to write extra logic. Task local storage overcomes these issues and offers a better option for these per-task data. Task local storage is only available to BPF_LSM. Now enable it for tracing programs. Unlike LSM programs, tracing programs can be called in IRQ contexts. Helpers that access task local storage are updated to use raw_spin_lock_irqsave() instead of raw_spin_lock_bh(). Tracing programs can attach to functions on the task free path, e.g. exit_creds(). To avoid allocating task local storage after bpf_task_storage_free(). bpf_task_storage_get() is updated to not allocate new storage when the task is not refcounted (task->usage == 0). Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NKP Singh <kpsingh@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20210225234319.336131-2-songliubraving@fb.com
-
- 12 2月, 2021 1 次提交
-
-
由 Florent Revest 提交于
This needs a new helper that: - can work in a sleepable context (using sock_gen_cookie) - takes a struct sock pointer and checks that it's not NULL Signed-off-by: NFlorent Revest <revest@chromium.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NKP Singh <kpsingh@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210210111406.785541-2-revest@chromium.org
-
- 11 2月, 2021 4 次提交
-
-
由 Alexei Starovoitov 提交于
Since both sleepable and non-sleepable programs execute under migrate_disable add recursion prevention mechanism to both types of programs when they're executed via bpf trampoline. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210210033634.62081-5-alexei.starovoitov@gmail.com
-
由 Alexei Starovoitov 提交于
Since sleepable programs don't migrate from the cpu the excution stats can be computed for them as well. Reuse the same infrastructure for both sleepable and non-sleepable programs. run_cnt -> the number of times the program was executed. run_time_ns -> the program execution time in nanoseconds including the off-cpu time when the program was sleeping. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NKP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20210210033634.62081-4-alexei.starovoitov@gmail.com
-
由 Alexei Starovoitov 提交于
Move bpf_prog_stats from prog->aux into prog to avoid one extra load in critical path of program execution. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210210033634.62081-2-alexei.starovoitov@gmail.com
-
由 Andrei Matei 提交于
Before this patch, variable offset access to the stack was dissalowed for regular instructions, but was allowed for "indirect" accesses (i.e. helpers). This patch removes the restriction, allowing reading and writing to the stack through stack pointers with variable offsets. This makes stack-allocated buffers more usable in programs, and brings stack pointers closer to other types of pointers. The motivation is being able to use stack-allocated buffers for data manipulation. When the stack size limit is sufficient, allocating buffers on the stack is simpler than per-cpu arrays, or other alternatives. In unpriviledged programs, variable-offset reads and writes are disallowed (they were already disallowed for the indirect access case) because the speculative execution checking code doesn't support them. Additionally, when writing through a variable-offset stack pointer, if any pointers are in the accessible range, there's possilibities of later leaking pointers because the write cannot be tracked precisely. Writes with variable offset mark the whole range as initialized, even though we don't know which stack slots are actually written. This is in order to not reject future reads to these slots. Note that this doesn't affect writes done through helpers; like before, helpers need the whole stack range to be initialized to begin with. All the stack slots are in range are considered scalars after the write; variable-offset register spills are not tracked. For reads, all the stack slots in the variable range needs to be initialized (but see above about what writes do), otherwise the read is rejected. All register spilled in stack slots that might be read are marked as having been read, however reads through such pointers don't do register filling; the target register will always be either a scalar or a constant zero. Signed-off-by: NAndrei Matei <andreimatei1@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210207011027.676572-2-andreimatei1@gmail.com
-
- 28 1月, 2021 1 次提交
-
-
由 Stanislav Fomichev 提交于
At the moment, BPF_CGROUP_INET{4,6}_BIND hooks can rewrite user_port to the privileged ones (< ip_unprivileged_port_start), but it will be rejected later on in the __inet_bind or __inet6_bind. Let's add another return value to indicate that CAP_NET_BIND_SERVICE check should be ignored. Use the same idea as we currently use in cgroup/egress where bit #1 indicates CN. Instead, for cgroup/bind{4,6}, bit #1 indicates that CAP_NET_BIND_SERVICE should be bypassed. v5: - rename flags to be less confusing (Andrey Ignatov) - rework BPF_PROG_CGROUP_INET_EGRESS_RUN_ARRAY to work on flags and accept BPF_RET_SET_CN (no behavioral changes) v4: - Add missing IPv6 support (Martin KaFai Lau) v3: - Update description (Martin KaFai Lau) - Fix capability restore in selftest (Martin KaFai Lau) v2: - Switch to explicit return code (Martin KaFai Lau) Signed-off-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAndrey Ignatov <rdna@fb.com> Link: https://lore.kernel.org/bpf/20210127193140.3170382-1-sdf@google.com
-
- 13 1月, 2021 3 次提交
-
-
由 Andrii Nakryiko 提交于
Add support for directly accessing kernel module variables from BPF programs using special ldimm64 instructions. This functionality builds upon vmlinux ksym support, but extends ldimm64 with src_reg=BPF_PSEUDO_BTF_ID to allow specifying kernel module BTF's FD in insn[1].imm field. During BPF program load time, verifier will resolve FD to BTF object and will take reference on BTF object itself and, for module BTFs, corresponding module as well, to make sure it won't be unloaded from under running BPF program. The mechanism used is similar to how bpf_prog keeps track of used bpf_maps. One interesting change is also in how per-CPU variable is determined. The logic is to find .data..percpu data section in provided BTF, but both vmlinux and module each have their own .data..percpu entries in BTF. So for module's case, the search for DATASEC record needs to look at only module's added BTF types. This is implemented with custom search function. Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Acked-by: NHao Luo <haoluo@google.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-6-andrii@kernel.org
-
由 Andrii Nakryiko 提交于
__bpf_free_used_maps() is always defined in kernel/bpf/core.c, while include/linux/bpf.h is guarding it behind CONFIG_BPF_SYSCALL. Move it out of that guard region and fix compiler warning. Fixes: a2ea0746 ("bpf: Fix missing prog untrack in release_maps") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-4-andrii@kernel.org
-
由 Andrii Nakryiko 提交于
Add bpf_patch_call_args() prototype. This function is called from BPF verifier and only if CONFIG_BPF_JIT_ALWAYS_ON is not defined. This fixes compiler warning about missing prototype in some kernel configurations. Fixes: 1ea47e01 ("bpf: add support for bpf_call to interpreter") Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20210112075520.4103414-2-andrii@kernel.org
-
- 09 12月, 2020 1 次提交
-
-
由 Florent Revest 提交于
This moves the bpf_sock_from_file definition into net/core/filter.c which only gets compiled with CONFIG_NET and also moves the helper proto usage next to other tracing helpers that are conditional on CONFIG_NET. This avoids ld: kernel/trace/bpf_trace.o: in function `bpf_sock_from_file': bpf_trace.c:(.text+0xe23): undefined reference to `sock_from_file' When compiling a kernel with BPF and without NET. Reported-by: Nkernel test robot <lkp@intel.com> Reported-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NFlorent Revest <revest@chromium.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NRandy Dunlap <rdunlap@infradead.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NKP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/bpf/20201208173623.1136863-1-revest@chromium.org
-
- 04 12月, 2020 1 次提交
-
-
由 Andrii Nakryiko 提交于
Remove a permeating assumption thoughout BPF verifier of vmlinux BTF. Instead, wherever BTF type IDs are involved, also track the instance of struct btf that goes along with the type ID. This allows to gradually add support for kernel module BTFs and using/tracking module types across BPF helper calls and registers. This patch also renames btf_id() function to btf_obj_id() to minimize naming clash with using btf_id to denote BTF *type* ID, rather than BTF *object*'s ID. Also, altough btf_vmlinux can't get destructed and thus doesn't need refcounting, module BTFs need that, so apply BTF refcounting universally when BPF program is using BTF-powered attachment (tp_btf, fentry/fexit, etc). This makes for simpler clean up code. Now that BTF type ID is not enough to uniquely identify a BTF type, extend BPF trampoline key to include BTF object ID. To differentiate that from target program BPF ID, set 31st bit of type ID. BTF type IDs (at least currently) are not allowed to take full 32 bits, so there is no danger of confusing that bit with a valid BTF type ID. Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201203204634.1325171-10-andrii@kernel.org
-
- 03 12月, 2020 3 次提交
-
-
由 Roman Gushchin 提交于
Do not use rlimit-based memory accounting for bpf progs. It has been replaced with memcg-based memory accounting. Signed-off-by: NRoman Gushchin <guro@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NSong Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-34-guro@fb.com
-
由 Roman Gushchin 提交于
Remove rlimit-based accounting infrastructure code, which is not used anymore. To provide a backward compatibility, use an approximation of the bpf map memory footprint as a "memlock" value, available to a user via map info. The approximation is based on the maximal number of elements and key and value sizes. Signed-off-by: NRoman Gushchin <guro@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NSong Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20201201215900.3569844-33-guro@fb.com
-
由 Roman Gushchin 提交于
Bpf maps can be updated from an interrupt context and in such case there is no process which can be charged. It makes the memory accounting of bpf maps non-trivial. Fortunately, after commit 4127c650 ("mm: kmem: enable kernel memcg accounting from interrupt contexts") and commit b87d8cef ("mm, memcg: rework remote charging API to support nesting") it's finally possible. To make the ownership model simple and consistent, when the map is created, the memory cgroup of the current process is recorded. All subsequent allocations related to the bpf map are charged to the same memory cgroup. It includes allocations made by any processes (even if they do belong to a different cgroup) and from interrupts. This commit introduces 3 new helpers, which will be used by following commits to enable the accounting of bpf maps memory: - bpf_map_kmalloc_node() - bpf_map_kzalloc() - bpf_map_alloc_percpu() They are wrapping popular memory allocation functions. They set the active memory cgroup to the map's memory cgroup and add __GFP_ACCOUNT to the passed gfp flags. Then they call into the corresponding memory allocation function and restore the original active memory cgroup. These helpers are supposed to use everywhere except the map creation path. During the map creation when the map structure is allocated by itself, it cannot be passed to those helpers. In those cases default memory allocation function will be used with the __GFP_ACCOUNT flag. Signed-off-by: NRoman Gushchin <guro@fb.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20201201215900.3569844-7-guro@fb.com
-
- 19 11月, 2020 1 次提交
-
-
由 Dmitrii Banshchikov 提交于
The helper uses CLOCK_MONOTONIC_COARSE source of time that is less accurate but more performant. We have a BPF CGROUP_SKB firewall that supports event logging through bpf_perf_event_output(). Each event has a timestamp and currently we use bpf_ktime_get_ns() for it. Use of bpf_ktime_get_coarse_ns() saves ~15-20 ns in time required for event logging. bpf_ktime_get_ns(): EgressLogByRemoteEndpoint 113.82ns 8.79M bpf_ktime_get_coarse_ns(): EgressLogByRemoteEndpoint 95.40ns 10.48M Signed-off-by: NDmitrii Banshchikov <me@ubique.spb.ru> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201117184549.257280-1-me@ubique.spb.ru
-
- 11 11月, 2020 1 次提交
-
-
由 Andrii Nakryiko 提交于
Add kernel module listener that will load/validate and unload module BTF. Module BTFs gets ID generated for them, which makes it possible to iterate them with existing BTF iteration API. They are given their respective module's names, which will get reported through GET_OBJ_INFO API. They are also marked as in-kernel BTFs for tooling to distinguish them from user-provided BTFs. Also, similarly to vmlinux BTF, kernel module BTFs are exposed through sysfs as /sys/kernel/btf/<module-name>. This is convenient for user-space tools to inspect module BTF contents and dump their types with existing tools: [vmuser@archvm bpf]$ ls -la /sys/kernel/btf total 0 drwxr-xr-x 2 root root 0 Nov 4 19:46 . drwxr-xr-x 13 root root 0 Nov 4 19:46 .. ... -r--r--r-- 1 root root 888 Nov 4 19:46 irqbypass -r--r--r-- 1 root root 100225 Nov 4 19:46 kvm -r--r--r-- 1 root root 35401 Nov 4 19:46 kvm_intel -r--r--r-- 1 root root 120 Nov 4 19:46 pcspkr -r--r--r-- 1 root root 399 Nov 4 19:46 serio_raw -r--r--r-- 1 root root 4094095 Nov 4 19:46 vmlinux Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://lore.kernel.org/bpf/20201110011932.3201430-5-andrii@kernel.org
-
- 07 11月, 2020 1 次提交
-
-
由 KP Singh 提交于
The currently available bpf_get_current_task returns an unsigned integer which can be used along with BPF_CORE_READ to read data from the task_struct but still cannot be used as an input argument to a helper that accepts an ARG_PTR_TO_BTF_ID of type task_struct. In order to implement this helper a new return type, RET_PTR_TO_BTF_ID, is added. This is similar to RET_PTR_TO_BTF_ID_OR_NULL but does not require checking the nullness of returned pointer. Signed-off-by: NKP Singh <kpsingh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NSong Liu <songliubraving@fb.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20201106103747.2780972-6-kpsingh@chromium.org
-
- 29 10月, 2020 1 次提交
-
-
由 Yonghong Song 提交于
Commit e679654a ("bpf: Fix a rcu_sched stall issue with bpf task/task_file iterator") tries to fix rcu stalls warning which is caused by bpf task_file iterator when running "bpftool prog". rcu: INFO: rcu_sched self-detected stall on CPU rcu: \x097-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913 \x09(t=21031 jiffies g=2534773 q=179750) NMI backtrace for cpu 7 CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G W 5.8.0-00004-g68bfc7f8c1b4 #6 Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019 Call Trace: <IRQ> dump_stack+0x57/0x70 nmi_cpu_backtrace.cold+0x14/0x53 ? lapic_can_unplug_cpu.cold+0x39/0x39 nmi_trigger_cpumask_backtrace+0xb7/0xc7 rcu_dump_cpu_stacks+0xa2/0xd0 rcu_sched_clock_irq.cold+0x1ff/0x3d9 ? tick_nohz_handler+0x100/0x100 update_process_times+0x5b/0x90 tick_sched_timer+0x5e/0xf0 __hrtimer_run_queues+0x12a/0x2a0 hrtimer_interrupt+0x10e/0x280 __sysvec_apic_timer_interrupt+0x51/0xe0 asm_call_on_stack+0xf/0x20 </IRQ> sysvec_apic_timer_interrupt+0x6f/0x80 ... task_file_seq_next+0x52/0xa0 bpf_seq_read+0xb9/0x320 vfs_read+0x9d/0x180 ksys_read+0x5f/0xe0 do_syscall_64+0x38/0x60 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The fix is to limit the number of bpf program runs to be one million. This fixed the program in most cases. But we also found under heavy load, which can increase the wallclock time for bpf_seq_read(), the warning may still be possible. For example, calling bpf_delay() in the "while" loop of bpf_seq_read(), which will introduce artificial delay, the warning will show up in my qemu run. static unsigned q; volatile unsigned *p = &q; volatile unsigned long long ll; static void bpf_delay(void) { int i, j; for (i = 0; i < 10000; i++) for (j = 0; j < 10000; j++) ll += *p; } There are two ways to fix this issue. One is to reduce the above one million threshold to say 100,000 and hopefully rcu warning will not show up any more. Another is to introduce a target feature which enables bpf_seq_read() calling cond_resched(). This patch took second approach as the first approach may cause more -EAGAIN failures for read() syscalls. Note that not all bpf_iter targets can permit cond_resched() in bpf_seq_read() as some, e.g., netlink seq iterator, rcu read lock critical section spans through seq_ops->next() -> seq_ops->show() -> seq_ops->next(). For the kernel code with the above hack, "bpftool p" roughly takes 38 seconds to finish on my VM with 184 bpf program runs. Using the following command, I am able to collect the number of context switches: perf stat -e context-switches -- ./bpftool p >& log Without this patch, 69 context-switches With this patch, 75 context-switches This patch added additional 6 context switches, roughly every 6 seconds to reschedule, to avoid lengthy no-rescheduling which may cause the above RCU warnings. Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201028061054.1411116-1-yhs@fb.com
-
- 12 10月, 2020 1 次提交
-
-
由 Daniel Borkmann 提交于
Recent work in f4d05259 ("bpf: Add map_meta_equal map ops") and 134fede4 ("bpf: Relax max_entries check for most of the inner map types") added support for dynamic inner max elements for most map-in-map types. Exceptions were maps like array or prog array where the map_gen_lookup() callback uses the maps' max_entries field as a constant when emitting instructions. We recently implemented Maglev consistent hashing into Cilium's load balancer which uses map-in-map with an outer map being hash and inner being array holding the Maglev backend table for each service. This has been designed this way in order to reduce overall memory consumption given the outer hash map allows to avoid preallocating a large, flat memory area for all services. Also, the number of service mappings is not always known a-priori. The use case for dynamic inner array map entries is to further reduce memory overhead, for example, some services might just have a small number of back ends while others could have a large number. Right now the Maglev backend table for small and large number of backends would need to have the same inner array map entries which adds a lot of unneeded overhead. Dynamic inner array map entries can be realized by avoiding the inlined code generation for their lookup. The lookup will still be efficient since it will be calling into array_map_lookup_elem() directly and thus avoiding retpoline. The patch adds a BPF_F_INNER_MAP flag to map creation which therefore skips inline code generation and relaxes array_map_meta_equal() check to ignore both maps' max_entries. This also still allows to have faster lookups for map-in-map when BPF_F_INNER_MAP is not specified and hence dynamic max_entries not needed. Example code generation where inner map is dynamic sized array: # bpftool p d x i 125 int handle__sys_enter(void * ctx): ; int handle__sys_enter(void *ctx) 0: (b4) w1 = 0 ; int key = 0; 1: (63) *(u32 *)(r10 -4) = r1 2: (bf) r2 = r10 ; 3: (07) r2 += -4 ; inner_map = bpf_map_lookup_elem(&outer_arr_dyn, &key); 4: (18) r1 = map[id:468] 6: (07) r1 += 272 7: (61) r0 = *(u32 *)(r2 +0) 8: (35) if r0 >= 0x3 goto pc+5 9: (67) r0 <<= 3 10: (0f) r0 += r1 11: (79) r0 = *(u64 *)(r0 +0) 12: (15) if r0 == 0x0 goto pc+1 13: (05) goto pc+1 14: (b7) r0 = 0 15: (b4) w6 = -1 ; if (!inner_map) 16: (15) if r0 == 0x0 goto pc+6 17: (bf) r2 = r10 ; 18: (07) r2 += -4 ; val = bpf_map_lookup_elem(inner_map, &key); 19: (bf) r1 = r0 | No inlining but instead 20: (85) call array_map_lookup_elem#149280 | call to array_map_lookup_elem() ; return val ? *val : -1; | for inner array lookup. 21: (15) if r0 == 0x0 goto pc+1 ; return val ? *val : -1; 22: (61) r6 = *(u32 *)(r0 +0) ; } 23: (bc) w0 = w6 24: (95) exit Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20201010234006.7075-4-daniel@iogearbox.net
-
- 03 10月, 2020 2 次提交
-
-
由 Hao Luo 提交于
Add bpf_this_cpu_ptr() to help access percpu var on this cpu. This helper always returns a valid pointer, therefore no need to check returned value for NULL. Also note that all programs run with preemption disabled, which means that the returned pointer is stable during all the execution of the program. Signed-off-by: NHao Luo <haoluo@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-6-haoluo@google.com
-
由 Hao Luo 提交于
Add bpf_per_cpu_ptr() to help bpf programs access percpu vars. bpf_per_cpu_ptr() has the same semantic as per_cpu_ptr() in the kernel except that it may return NULL. This happens when the cpu parameter is out of range. So the caller must check the returned value. Signed-off-by: NHao Luo <haoluo@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200929235049.2533242-5-haoluo@google.com
-
- 30 9月, 2020 2 次提交
-
-
由 Toke Høiland-Jørgensen 提交于
This enables support for attaching freplace programs to multiple attach points. It does this by amending the UAPI for bpf_link_Create with a target btf ID that can be used to supply the new attachment point along with the target program fd. The target must be compatible with the target that was supplied at program load time. The implementation reuses the checks that were factored out of check_attach_btf_id() to ensure compatibility between the BTF types of the old and new attachment. If these match, a new bpf_tracing_link will be created for the new attach target, allowing multiple attachments to co-exist simultaneously. The code could theoretically support multiple-attach of other types of tracing programs as well, but since I don't have a use case for any of those, there is no API support for doing so. Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355169.48470.17165680973640685368.stgit@toke.dk
-
由 Toke Høiland-Jørgensen 提交于
In preparation for allowing multiple attachments of freplace programs, move the references to the target program and trampoline into the bpf_tracing_link structure when that is created. To do this atomically, introduce a new mutex in prog->aux to protect writing to the two pointers to target prog and trampoline, and rename the members to make it clear that they are related. With this change, it is no longer possible to attach the same tracing program multiple times (detaching in-between), since the reference from the tracing program to the target disappears on the first attach. However, since the next patch will let the caller supply an attach target, that will also make it possible to attach to the same place multiple times. Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/160138355059.48470.2503076992210324984.stgit@toke.dk
-
- 29 9月, 2020 5 次提交
-
-
由 Alan Maguire 提交于
A helper is added to support tracing kernel type information in BPF using the BPF Type Format (BTF). Its signature is long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr, u32 btf_ptr_size, u64 flags); struct btf_ptr * specifies - a pointer to the data to be traced - the BTF id of the type of data pointed to - a flags field is provided for future use; these flags are not to be confused with the BTF_F_* flags below that control how the btf_ptr is displayed; the flags member of the struct btf_ptr may be used to disambiguate types in kernel versus module BTF, etc; the main distinction is the flags relate to the type and information needed in identifying it; not how it is displayed. For example a BPF program with a struct sk_buff *skb could do the following: static struct btf_ptr b = { }; b.ptr = skb; b.type_id = __builtin_btf_type_id(struct sk_buff, 1); bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0); Default output looks like this: (struct sk_buff){ .transport_header = (__u16)65535, .mac_header = (__u16)65535, .end = (sk_buff_data_t)192, .head = (unsigned char *)0x000000007524fd8b, .data = (unsigned char *)0x000000007524fd8b, .truesize = (unsigned int)768, .users = (refcount_t){ .refs = (atomic_t){ .counter = (int)1, }, }, } Flags modifying display are as follows: - BTF_F_COMPACT: no formatting around type information - BTF_F_NONAME: no struct/union member names/types - BTF_F_PTR_RAW: show raw (unobfuscated) pointer values; equivalent to %px. - BTF_F_ZERO: show zero-valued struct/union members; they are not displayed by default Signed-off-by: NAlan Maguire <alan.maguire@oracle.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
-
由 Alan Maguire 提交于
It will be used later for BPF structure display support Signed-off-by: NAlan Maguire <alan.maguire@oracle.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/1601292670-1616-2-git-send-email-alan.maguire@oracle.com
-
由 Toke Høiland-Jørgensen 提交于
The check_attach_btf_id() function really does three things: 1. It performs a bunch of checks on the program to ensure that the attachment is valid. 2. It stores a bunch of state about the attachment being requested in the verifier environment and struct bpf_prog objects. 3. It allocates a trampoline for the attachment. This patch splits out (1.) and (3.) into separate functions which will perform the checks, but return the computed values instead of directly modifying the environment. This is done in preparation for reusing the checks when the actual attachment is happening, which will allow tracing programs to have multiple (compatible) attachments. This also fixes a bug where a bunch of checks were skipped if a trampoline already existed for the tracing target. Fixes: 6ba43b76 ("bpf: Attachment verification for BPF_MODIFY_RETURN") Fixes: 1e6c62a8 ("bpf: Introduce sleepable BPF programs") Acked-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Toke Høiland-Jørgensen 提交于
In preparation for moving code around, change a bunch of references to env->log (and the verbose() logging helper) to use bpf_log() and a direct pointer to struct bpf_verifier_log. While we're touching the function signature, mark the 'prog' argument to bpf_check_type_match() as const. Also enhance the bpf_verifier_log_needed() check to handle NULL pointers for the log struct so we can re-use the code with logging disabled. Acked-by: NAndrii Nakryiko <andriin@fb.com> Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Song Liu 提交于
Add .test_run for raw_tracepoint. Also, introduce a new feature that runs the target program on a specific CPU. This is achieved by a new flag in bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for BPF programs that handle perf_event and other percpu resources, as the program can access these resource locally. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
-
- 26 9月, 2020 2 次提交
-
-
由 John Fastabend 提交于
The meaning of PTR_TO_BTF_ID_OR_NULL differs slightly from other types denoted with the *_OR_NULL type. For example the types PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL can be used for branch analysis because the type PTR_TO_SOCKET is guaranteed to _not_ have a null value. In contrast PTR_TO_BTF_ID and BTF_TO_BTF_ID_OR_NULL have slightly different meanings. A PTR_TO_BTF_TO_ID may be a pointer to NULL value, but it is safe to read this pointer in the program context because the program context will handle any faults. The fallout is for PTR_TO_BTF_ID the verifier can assume reads are safe, but can not use the type in branch analysis. Additionally, authors need to be extra careful when passing PTR_TO_BTF_ID into helpers. In general helpers consuming type PTR_TO_BTF_ID will need to assume it may be null. Seeing the above is not obvious to readers without the back knowledge lets add a comment in the type definition. Editorial comment, as networking and tracing programs get closer and more tightly merged we may need to consider a new type that we can ensure is non-null for branch analysis and also passing into helpers. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NLorenz Bauer <lmb@cloudflare.com>
-
由 Martin KaFai Lau 提交于
There is a constant need to add more fields into the bpf_tcp_sock for the bpf programs running at tc, sock_ops...etc. A current workaround could be to use bpf_probe_read_kernel(). However, other than making another helper call for reading each field and missing CO-RE, it is also not as intuitive to use as directly reading "tp->lsndtime" for example. While already having perfmon cap to do bpf_probe_read_kernel(), it will be much easier if the bpf prog can directly read from the tcp_sock. This patch tries to do that by using the existing casting-helpers bpf_skc_to_*() whose func_proto returns a btf_id. For example, the func_proto of bpf_skc_to_tcp_sock returns the btf_id of the kernel "struct tcp_sock". These helpers are also added to is_ptr_cast_function(). It ensures the returning reg (BPF_REF_0) will also carries the ref_obj_id. That will keep the ref-tracking works properly. The bpf_skc_to_* helpers are made available to most of the bpf prog types in filter.c. The bpf_skc_to_* helpers will be limited by perfmon cap. This patch adds a ARG_PTR_TO_BTF_ID_SOCK_COMMON. The helper accepting this arg can accept a btf-id-ptr (PTR_TO_BTF_ID + &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON]) or a legacy-ctx-convert-skc-ptr (PTR_TO_SOCK_COMMON). The bpf_skc_to_*() helpers are changed to take ARG_PTR_TO_BTF_ID_SOCK_COMMON such that they will accept pointer obtained from skb->sk. Instead of specifying both arg_type and arg_btf_id in the same func_proto which is how the current ARG_PTR_TO_BTF_ID does, the arg_btf_id of the new ARG_PTR_TO_BTF_ID_SOCK_COMMON is specified in the compatible_reg_types[] in verifier.c. The reason is the arg_btf_id is always the same. Discussion in this thread: https://lore.kernel.org/bpf/20200922070422.1917351-1-kafai@fb.com/ The ARG_PTR_TO_BTF_ID_ part gives a clear expectation that the helper is expecting a PTR_TO_BTF_ID which could be NULL. This is the same behavior as the existing helper taking ARG_PTR_TO_BTF_ID. The _SOCK_COMMON part means the helper is also expecting the legacy SOCK_COMMON pointer. By excluding the _OR_NULL part, the bpf prog cannot call helper with a literal NULL which doesn't make sense in most cases. e.g. bpf_skc_to_tcp_sock(NULL) will be rejected. All PTR_TO_*_OR_NULL reg has to do a NULL check first before passing into the helper or else the bpf prog will be rejected. This behavior is nothing new and consistent with the current expectation during bpf-prog-load. [ ARG_PTR_TO_BTF_ID_SOCK_COMMON will be used to replace ARG_PTR_TO_SOCK* of other existing helpers later such that those existing helpers can take the PTR_TO_BTF_ID returned by the bpf_skc_to_*() helpers. The only special case is bpf_sk_lookup_assign() which can accept a literal NULL ptr. It has to be handled specially in another follow up patch if there is a need (e.g. by renaming ARG_PTR_TO_SOCKET_OR_NULL to ARG_PTR_TO_BTF_ID_SOCK_COMMON_OR_NULL). ] [ When converting the older helpers that take ARG_PTR_TO_SOCK* in the later patch, if the kernel does not support BTF, ARG_PTR_TO_BTF_ID_SOCK_COMMON will behave like ARG_PTR_TO_SOCK_COMMON because no reg->type could have PTR_TO_BTF_ID in this case. It is not a concern for the newer-btf-only helper like the bpf_skc_to_*() here though because these helpers must require BTF vmlinux to begin with. ] Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20200925000350.3855720-1-kafai@fb.com
-