- 04 11月, 2022 1 次提交
-
-
由 Kumar Kartikeya Dwivedi 提交于
To prepare the BPF verifier to handle special fields in both map values and program allocated types coming from program BTF, we need to refactor the kptr_off_tab handling code into something more generic and reusable across both cases to avoid code duplication. Later patches also require passing this data to helpers at runtime, so that they can work on user defined types, initialize them, destruct them, etc. The main observation is that both map values and such allocated types point to a type in program BTF, hence they can be handled similarly. We can prepare a field metadata table for both cases and store them in struct bpf_map or struct btf depending on the use case. Hence, refactor the code into generic btf_record and btf_field member structs. The btf_record represents the fields of a specific btf_type in user BTF. The cnt indicates the number of special fields we successfully recognized, and field_mask is a bitmask of fields that were found, to enable quick determination of availability of a certain field. Subsequently, refactor the rest of the code to work with these generic types, remove assumptions about kptr and kptr_off_tab, rename variables to more meaningful names, etc. Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221103191013.1236066-7-memxor@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 01 11月, 2022 1 次提交
-
-
由 Thomas Gleixner 提交于
Now that the 32bit UP oddity is gone and 32bit uses always a sequence count, there is no need for the fetch_irq() variants anymore. Convert to the regular interface. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/bpf/20221026123110.331690-1-bigeasy@linutronix.de
-
- 26 10月, 2022 2 次提交
-
-
由 Yonghong Song 提交于
Similar to sk/inode/task storage, implement similar cgroup local storage. There already exists a local storage implementation for cgroup-attached bpf programs. See map type BPF_MAP_TYPE_CGROUP_STORAGE and helper bpf_get_local_storage(). But there are use cases such that non-cgroup attached bpf progs wants to access cgroup local storage data. For example, tc egress prog has access to sk and cgroup. It is possible to use sk local storage to emulate cgroup local storage by storing data in socket. But this is a waste as it could be lots of sockets belonging to a particular cgroup. Alternatively, a separate map can be created with cgroup id as the key. But this will introduce additional overhead to manipulate the new map. A cgroup local storage, similar to existing sk/inode/task storage, should help for this use case. The life-cycle of storage is managed with the life-cycle of the cgroup struct. i.e. the storage is destroyed along with the owning cgroup with a call to bpf_cgrp_storage_free() when cgroup itself is deleted. The userspace map operations can be done by using a cgroup fd as a key passed to the lookup, update and delete operations. Typically, the following code is used to get the current cgroup: struct task_struct *task = bpf_get_current_task_btf(); ... task->cgroups->dfl_cgrp ... and in structure task_struct definition: struct task_struct { .... struct css_set __rcu *cgroups; .... } With sleepable program, accessing task->cgroups is not protected by rcu_read_lock. So the current implementation only supports non-sleepable program and supporting sleepable program will be the next step together with adding rcu_read_lock protection for rcu tagged structures. Since map name BPF_MAP_TYPE_CGROUP_STORAGE has been used for old cgroup local storage support, the new map name BPF_MAP_TYPE_CGRP_STORAGE is used for cgroup storage available to non-cgroup-attached bpf programs. The old cgroup storage supports bpf_get_local_storage() helper to get the cgroup data. The new cgroup storage helper bpf_cgrp_storage_get() can provide similar functionality. While old cgroup storage pre-allocates storage memory, the new mechanism can also pre-allocate with a user space bpf_map_update_elem() call to avoid potential run-time memory allocation failure. Therefore, the new cgroup storage can provide all functionality w.r.t. the old one. So in uapi bpf.h, the old BPF_MAP_TYPE_CGROUP_STORAGE is alias to BPF_MAP_TYPE_CGROUP_STORAGE_DEPRECATED to indicate the old cgroup storage can be deprecated since the new one can provide the same functionality. Acked-by: NDavid Vernet <void@manifault.com> Signed-off-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20221026042850.673791-1-yhs@fb.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org> -
由 Martin KaFai Lau 提交于
The commit 64696c40 ("bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline") removed prog->active check for struct_ops prog. The bpf_lsm and bpf_iter is also using trampoline. Like struct_ops, the bpf_lsm and bpf_iter have fixed hooks for the prog to attach. The kernel does not call the same hook in a recursive way. This patch also removes the prog->active check for bpf_lsm and bpf_iter. A later patch has a test to reproduce the recursion issue for a sleepable bpf_lsm program. This patch appends the '_recur' naming to the existing enter and exit functions that track the prog->active counter. New __bpf_prog_{enter,exit}[_sleepable] function are added to skip the prog->active tracking. The '_struct_ops' version is also removed. It also moves the decision on picking the enter and exit function to the new bpf_trampoline_{enter,exit}(). It returns the '_recur' ones for all tracing progs to use. For bpf_lsm, bpf_iter, struct_ops (no prog->active tracking after 64696c40), and bpf_lsm_cgroup (no prog->active tracking after 69fd337a), it will return the functions that don't track the prog->active. Signed-off-by: NMartin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20221025184524.3526117-2-martin.lau@linux.devSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 22 9月, 2022 1 次提交
-
-
由 Jiri Olsa 提交于
We got report from sysbot [1] about warnings that were caused by bpf program attached to contention_begin raw tracepoint triggering the same tracepoint by using bpf_trace_printk helper that takes trace_printk_lock lock. Call Trace: <TASK> ? trace_event_raw_event_bpf_trace_printk+0x5f/0x90 bpf_trace_printk+0x2b/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 bpf_trace_printk+0x3f/0xe0 bpf_prog_a9aec6167c091eef_prog+0x1f/0x24 bpf_trace_run2+0x26/0x90 native_queued_spin_lock_slowpath+0x1c6/0x2b0 _raw_spin_lock_irqsave+0x44/0x50 __unfreeze_partials+0x5b/0x160 ... The can be reproduced by attaching bpf program as raw tracepoint on contention_begin tracepoint. The bpf prog calls bpf_trace_printk helper. Then by running perf bench the spin lock code is forced to take slow path and call contention_begin tracepoint. Fixing this by skipping execution of the bpf program if it's already running, Using bpf prog 'active' field, which is being currently used by trampoline programs for the same reason. Moving bpf_prog_inc_misses_counter to syscall.c because trampoline.c is compiled in just for CONFIG_BPF_JIT option. Reviewed-by: NStanislav Fomichev <sdf@google.com> Reported-by: syzbot+2251879aa068ad9c960d@syzkaller.appspotmail.com [1] https://lore.kernel.org/bpf/YxhFe3EwqchC%2FfYf@krava/T/#tSigned-off-by: NJiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220916071914.7156-1-jolsa@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 17 9月, 2022 2 次提交
-
-
由 Wang Yufen 提交于
Use kvmemdup_bpfptr helper instead of open-coding to simplify the code. Signed-off-by: NWang Yufen <wangyufen@huawei.com> Acked-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/1663058433-14089-1-git-send-email-wangyufen@huawei.comSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org>
-
由 Lee Jones 提交于
The documentation for find_vpid() clearly states: "Must be called with the tasklist_lock or rcu_read_lock() held." Presently we do neither for find_vpid() instance in bpf_task_fd_query(). Add proper rcu_read_lock/unlock() to fix the issue. Fixes: 41bdc4b4 ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY") Signed-off-by: NLee Jones <lee@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220912133855.1218900-1-lee@kernel.org
-
- 08 9月, 2022 2 次提交
-
-
由 Kumar Kartikeya Dwivedi 提交于
Enable support for kptrs in percpu BPF arraymap by wiring up the freeing of these kptrs from percpu map elements. Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20220904204145.3089-3-memxor@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Jules Irenge 提交于
Sparse reported a warning at bpf_map_free_kptrs() "warning: Using plain integer as NULL pointer" During the process of fixing this warning, it was discovered that the current code erroneously writes to the pointer variable instead of deferencing and writing to the actual kptr. Hence, Sparse tool accidentally helped to uncover this problem. Fix this by doing WRITE_ONCE(*p, 0) instead of WRITE_ONCE(p, 0). Note that the effect of this bug is that unreferenced kptrs will not be cleared during check_and_free_fields. It is not a problem if the clearing is not done during map_free stage, as there is nothing to free for them. Fixes: 14a324f6 ("bpf: Wire up freeing of referenced kptr") Signed-off-by: NJules Irenge <jbi.octave@gmail.com> Link: https://lore.kernel.org/r/Yxi3pJaK6UDjVJSy@playgroundSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 05 9月, 2022 1 次提交
-
-
由 Alexei Starovoitov 提交于
SLAB_TYPESAFE_BY_RCU makes kmem_caches non mergeable and slows down kmem_cache_destroy. All bpf_mem_cache are safe to share across different maps and programs. Convert SLAB_TYPESAFE_BY_RCU to batched call_rcu. This change solves the memory consumption issue, avoids kmem_cache_destroy latency and keeps bpf hash map performance the same. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220902211058.60789-10-alexei.starovoitov@gmail.com
-
- 26 8月, 2022 1 次提交
-
-
由 Benjamin Tissoires 提交于
Add BPF_MAP_GET_FD_BY_ID and BPF_MAP_DELETE_PROG. Only BPF_MAP_GET_FD_BY_ID needs to be amended to be able to access the bpf pointer either from the userspace or the kernel. Acked-by: NYonghong Song <yhs@fb.com> Signed-off-by: NBenjamin Tissoires <benjamin.tissoires@redhat.com> Link: https://lore.kernel.org/r/20220824134055.1328882-7-benjamin.tissoires@redhat.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 18 8月, 2022 1 次提交
-
-
由 YiFei Zhu 提交于
The verifier cannot perform sufficient validation of any pointers passed into bpf_attr and treats them as integers rather than pointers. The helper will then read from arbitrary pointers passed into it. Restrict the helper to CAP_PERFMON since the security model in BPF of arbitrary kernel read is CAP_BPF + CAP_PERFMON. Fixes: af2ac3e1 ("bpf: Prepare bpf syscall to be used from kernel and user space.") Signed-off-by: NYiFei Zhu <zhuyifei@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220816205517.682470-1-zhuyifei@google.com
-
- 11 8月, 2022 2 次提交
-
-
由 Alexei Starovoitov 提交于
Shut up this warning: kernel/bpf/syscall.c:5089:5: warning: no previous prototype for function 'kern_sys_bpf' [-Wmissing-prototypes] int kern_sys_bpf(int cmd, union bpf_attr *attr, unsigned int size) Reported-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Alexei Starovoitov 提交于
The verifier cannot perform sufficient validation of bpf_attr->test.ctx_in pointer, therefore bpf programs should not be allowed to call BPF_PROG_RUN command from within the program. To fix this issue split bpf_sys_bpf() bpf helper into normal kern_sys_bpf() kernel function that can only be used by the kernel light skeleton directly. Reported-by: NYiFei Zhu <zhuyifei@google.com> Fixes: b1d18a75 ("bpf: Extend sys_bpf commands for bpf_syscall programs.") Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 08 8月, 2022 1 次提交
-
-
由 Stanislav Fomichev 提交于
When attaching to program, the program itself might not be attached to anything (and, hence, might not have attach_btf), so we can't unconditionally use 'prog->aux->dst_prog->aux->attach_btf'. Instead, use bpf_prog_get_target_btf to pick proper target BTF: * when attached to dst_prog, use dst_prog->aux->btf * when attached to kernel btf, use prog->aux->attach_btf Fixes: b79c9fc9 ("bpf: implement BPF_PROG_QUERY for BPF_LSM_CGROUP") Signed-off-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NHao Luo <haoluo@google.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220804201140.1340684-1-sdf@google.com
-
- 13 7月, 2022 1 次提交
-
-
由 Roman Gushchin 提交于
The memory consumed by a bpf map is always accounted to the memory cgroup of the process which created the map. The map can outlive the memory cgroup if it's used by processes in other cgroups or is pinned on bpffs. In this case the map pins the original cgroup in the dying state. For other types of objects (slab objects, non-slab kernel allocations, percpu objects and recently LRU pages) there is a reparenting process implemented: on cgroup offlining charged objects are getting reassigned to the parent cgroup. Because all charges and statistics are fully recursive it's a fairly cheap operation. For efficiency and consistency with other types of objects, let's do the same for bpf maps. Fortunately thanks to the objcg API, the required changes are minimal. Please, note that individual allocations (slabs, percpu and large kmallocs) already have the reparenting mechanism. This commit adds it to the saved map->memcg pointer by replacing it to map->objcg. Because dying cgroups are not visible for a user and all charges are recursive, this commit doesn't bring any behavior changes for a user. v2: added a missing const qualifier Signed-off-by: NRoman Gushchin <roman.gushchin@linux.dev> Reviewed-by: NShakeel Butt <shakeelb@google.com> Link: https://lore.kernel.org/r/20220711162827.184743-1-roman.gushchin@linux.devSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 30 6月, 2022 2 次提交
-
-
由 Stanislav Fomichev 提交于
We have two options: 1. Treat all BPF_LSM_CGROUP the same, regardless of attach_btf_id 2. Treat BPF_LSM_CGROUP+attach_btf_id as a separate hook point I was doing (2) in the original patch, but switching to (1) here: * bpf_prog_query returns all attached BPF_LSM_CGROUP programs regardless of attach_btf_id * attach_btf_id is exported via bpf_prog_info Reviewed-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-6-sdf@google.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Stanislav Fomichev 提交于
Allow attaching to lsm hooks in the cgroup context. Attaching to per-cgroup LSM works exactly like attaching to other per-cgroup hooks. New BPF_LSM_CGROUP is added to trigger new mode; the actual lsm hook we attach to is signaled via existing attach_btf_id. For the hooks that have 'struct socket' or 'struct sock' as its first argument, we use the cgroup associated with that socket. For the rest, we use 'current' cgroup (this is all on default hierarchy == v2 only). Note that for some hooks that work on 'struct sock' we still take the cgroup from 'current' because some of them work on the socket that hasn't been properly initialized yet. Behind the scenes, we allocate a shim program that is attached to the trampoline and runs cgroup effective BPF programs array. This shim has some rudimentary ref counting and can be shared between several programs attaching to the same lsm hook from different cgroups. Note that this patch bloats cgroup size because we add 211 cgroup_bpf_attach_type(s) for simplicity sake. This will be addressed in the subsequent patch. Also note that we only add non-sleepable flavor for now. To enable sleepable use-cases, bpf_prog_run_array_cg has to grab trace rcu, shim programs have to be freed via trace rcu, cgroup_bpf.effective should be also trace-rcu-managed + maybe some other changes that I'm not aware of. Reviewed-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20220628174314.1216643-4-sdf@google.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 17 6月, 2022 1 次提交
-
-
由 Joanne Koong 提交于
This patch does two things: 1) Marks the dynptr bpf_func_proto structs that were added in [1] as static, as pointed out by the kernel test robot in [2]. 2) There are some bpf_func_proto structs marked as extern which can instead be statically defined. [1] https://lore.kernel.org/bpf/20220523210712.3641569-1-joannelkoong@gmail.com/ [2] https://lore.kernel.org/bpf/62ab89f2.Pko7sI08RAKdF8R6%25lkp@intel.com/Reported-by: Nkernel test robot <lkp@intel.com> Signed-off-by: NJoanne Koong <joannelkoong@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220616225407.1878436-1-joannelkoong@gmail.com
-
- 03 6月, 2022 1 次提交
-
-
由 Pu Lehui 提交于
We found that 32-bit environment can not print BPF line info due to a data inconsistency between jited_ksyms[0] and jited_linfo[0]. For example: jited_kyms[0] = 0xb800067c, jited_linfo[0] = 0xffffffffb800067c We know that both of them store BPF func address, but due to the different data extension operations when extended to u64, they may not be the same. We need to unify the data extension operations of them. Signed-off-by: NPu Lehui <pulehui@huawei.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/CAEf4BzZ-eDcdJZgJ+Np7Y=V-TVjDDvOMqPwzKjyWrh=i5juv4w@mail.gmail.com Link: https://lore.kernel.org/bpf/20220530092815.1112406-2-pulehui@huawei.com
-
- 21 5月, 2022 1 次提交
-
-
由 Alan Maguire 提交于
With unprivileged BPF disabled, all cmds associated with the BPF syscall are blocked to users without CAP_BPF/CAP_SYS_ADMIN. However there are use cases where we may wish to allow interactions with BPF programs without being able to load and attach them. So for example, a process with required capabilities loads/attaches a BPF program, and a process with less capabilities interacts with it; retrieving perf/ring buffer events, modifying map-specified config etc. With all BPF syscall commands blocked as a result of unprivileged BPF being disabled, this mode of interaction becomes impossible for processes without CAP_BPF. As Alexei notes "The bpf ACL model is the same as traditional file's ACL. The creds and ACLs are checked at open(). Then during file's write/read additional checks might be performed. BPF has such functionality already. Different map_creates have capability checks while map_lookup has: map_get_sys_perms(map, f) & FMODE_CAN_READ. In other words it's enough to gate FD-receiving parts of bpf with unprivileged_bpf_disabled sysctl. The rest is handled by availability of FD and access to files in bpffs." So key fd creation syscall commands BPF_PROG_LOAD and BPF_MAP_CREATE are blocked with unprivileged BPF disabled and no CAP_BPF. And as Alexei notes, map creation with unprivileged BPF disabled off blocks creation of maps aside from array, hash and ringbuf maps. Programs responsible for loading and attaching the BPF program can still control access to its pinned representation by restricting permissions on the pin path, as with normal files. Signed-off-by: NAlan Maguire <alan.maguire@oracle.com> Acked-by: NYonghong Song <yhs@fb.com> Acked-by: NShung-Hsi Yu <shung-hsi.yu@suse.com> Acked-by: NKP Singh <kpsingh@kernel.org> Link: https://lore.kernel.org/r/1652970334-30510-2-git-send-email-alan.maguire@oracle.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 11 5月, 2022 4 次提交
-
-
由 Kui-Feng Lee 提交于
Pass a cookie along with BPF_LINK_CREATE requests. Add a bpf_cookie field to struct bpf_tracing_link to attach a cookie. The cookie of a bpf_tracing_link is available by calling bpf_get_attach_cookie when running the BPF program of the attached link. The value of a cookie will be set at bpf_tramp_run_ctx by the trampoline of the link. Signed-off-by: NKui-Feng Lee <kuifeng@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220510205923.3206889-4-kuifeng@fb.com
-
由 Kui-Feng Lee 提交于
BPF trampolines will create a bpf_tramp_run_ctx, a bpf_run_ctx, on stacks and set/reset the current bpf_run_ctx before/after calling a bpf_prog. Signed-off-by: NKui-Feng Lee <kuifeng@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220510205923.3206889-3-kuifeng@fb.com
-
由 Kui-Feng Lee 提交于
Replace struct bpf_tramp_progs with struct bpf_tramp_links to collect struct bpf_tramp_link(s) for a trampoline. struct bpf_tramp_link extends bpf_link to act as a linked list node. arch_prepare_bpf_trampoline() accepts a struct bpf_tramp_links to collects all bpf_tramp_link(s) that a trampoline should call. Change BPF trampoline and bpf_struct_ops to pass bpf_tramp_links instead of bpf_tramp_progs. Signed-off-by: NKui-Feng Lee <kuifeng@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220510205923.3206889-2-kuifeng@fb.com
-
由 Dmitrii Dolgov 提交于
Implement bpf_link iterator to traverse links via bpf_seq_file operations. The changeset is mostly shamelessly copied from commit a228a64f ("bpf: Add bpf_prog iterator") Signed-off-by: NDmitrii Dolgov <9erthalion6@gmail.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20220510155233.9815-2-9erthalion6@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 26 4月, 2022 3 次提交
-
-
由 Kumar Kartikeya Dwivedi 提交于
A destructor kfunc can be defined as void func(type *), where type may be void or any other pointer type as per convenience. In this patch, we ensure that the type is sane and capture the function pointer into off_desc of ptr_off_tab for the specific pointer offset, with the invariant that the dtor pointer is always set when 'kptr_ref' tag is applied to the pointer's pointee type, which is indicated by the flag BPF_MAP_VALUE_OFF_F_REF. Note that only BTF IDs whose destructor kfunc is registered, thus become the allowed BTF IDs for embedding as referenced kptr. Hence it serves the purpose of finding dtor kfunc BTF ID, as well acting as a check against the whitelist of allowed BTF IDs for this purpose. Finally, wire up the actual freeing of the referenced pointer if any at all available offsets, so that no references are leaked after the BPF map goes away and the BPF program previously moved the ownership a referenced pointer into it. The behavior is similar to BPF timers, where bpf_map_{update,delete}_elem will free any existing referenced kptr. The same case is with LRU map's bpf_lru_push_free/htab_lru_push_free functions, which are extended to reset unreferenced and free referenced kptr. Note that unlike BPF timers, kptr is not reset or freed when map uref drops to zero. Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-8-memxor@gmail.com -
由 Kumar Kartikeya Dwivedi 提交于
Since now there might be at most 10 offsets that need handling in copy_map_value, the manual shuffling and special case is no longer going to work. Hence, let's generalise the copy_map_value function by using a sorted array of offsets to skip regions that must be avoided while copying into and out of a map value. When the map is created, we populate the offset array in struct map, Then, copy_map_value uses this sorted offset array is used to memcpy while skipping timer, spin lock, and kptr. The array is allocated as in most cases none of these special fields would be present in map value, hence we can save on space for the common case by not embedding the entire object inside bpf_map struct. Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-6-memxor@gmail.com
-
由 Kumar Kartikeya Dwivedi 提交于
This commit introduces a new pointer type 'kptr' which can be embedded in a map value to hold a PTR_TO_BTF_ID stored by a BPF program during its invocation. When storing such a kptr, BPF program's PTR_TO_BTF_ID register must have the same type as in the map value's BTF, and loading a kptr marks the destination register as PTR_TO_BTF_ID with the correct kernel BTF and BTF ID. Such kptr are unreferenced, i.e. by the time another invocation of the BPF program loads this pointer, the object which the pointer points to may not longer exist. Since PTR_TO_BTF_ID loads (using BPF_LDX) are patched to PROBE_MEM loads by the verifier, it would safe to allow user to still access such invalid pointer, but passing such pointers into BPF helpers and kfuncs should not be permitted. A future patch in this series will close this gap. The flexibility offered by allowing programs to dereference such invalid pointers while being safe at runtime frees the verifier from doing complex lifetime tracking. As long as the user may ensure that the object remains valid, it can ensure data read by it from the kernel object is valid. The user indicates that a certain pointer must be treated as kptr capable of accepting stores of PTR_TO_BTF_ID of a certain type, by using a BTF type tag 'kptr' on the pointed to type of the pointer. Then, this information is recorded in the object BTF which will be passed into the kernel by way of map's BTF information. The name and kind from the map value BTF is used to look up the in-kernel type, and the actual BTF and BTF ID is recorded in the map struct in a new kptr_off_tab member. For now, only storing pointers to structs is permitted. An example of this specification is shown below: #define __kptr __attribute__((btf_type_tag("kptr"))) struct map_value { ... struct task_struct __kptr *task; ... }; Then, in a BPF program, user may store PTR_TO_BTF_ID with the type task_struct into the map, and then load it later. Note that the destination register is marked PTR_TO_BTF_ID_OR_NULL, as the verifier cannot know whether the value is NULL or not statically, it must treat all potential loads at that map value offset as loading a possibly NULL pointer. Only BPF_LDX, BPF_STX, and BPF_ST (with insn->imm = 0 to denote NULL) are allowed instructions that can access such a pointer. On BPF_LDX, the destination register is updated to be a PTR_TO_BTF_ID, and on BPF_STX, it is checked whether the source register type is a PTR_TO_BTF_ID with same BTF type as specified in the map BTF. The access size must always be BPF_DW. For the map in map support, the kptr_off_tab for outer map is copied from the inner map's kptr_off_tab. It was chosen to do a deep copy instead of introducing a refcount to kptr_off_tab, because the copy only needs to be done when paramterizing using inner_map_fd in the map in map case, hence would be unnecessary for all other users. It is not permitted to use MAP_FREEZE command and mmap for BPF map having kptrs, similar to the bpf_timer case. A kptr also requires that BPF program has both read and write access to the map (hence both BPF_F_RDONLY_PROG and BPF_F_WRONLY_PROG are disallowed). Note that check_map_access must be called from both check_helper_mem_access and for the BPF instructions, hence the kptr check must distinguish between ACCESS_DIRECT and ACCESS_HELPER, and reject ACCESS_HELPER cases. We rename stack_access_src to bpf_access_src and reuse it for this purpose. Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220424214901.2743946-2-memxor@gmail.com
-
- 23 4月, 2022 1 次提交
-
-
由 Andrii Nakryiko 提交于
Allow attaching BTF-aware TRACING programs, previously attachable only through BPF_RAW_TRACEPOINT_OPEN command, through LINK_CREATE command: - BTF-aware raw tracepoints (tp_btf in libbpf lingo); - fentry/fexit/fmod_ret programs; - BPF LSM programs. This change converges all bpf_link-based attachments under LINK_CREATE command allowing to further extend the API with features like BPF cookie under "multiplexed" link_create section of bpf_attr. Non-BTF-aware raw tracepoints are left under BPF_RAW_TRACEPOINT_OPEN, but there is nothing preventing opening them up to LINK_CREATE as well. Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NKuifeng Lee <kuifeng@fb.com> Link: https://lore.kernel.org/bpf/20220421033945.3602803-2-andrii@kernel.org
-
- 14 4月, 2022 1 次提交
-
-
由 Yan Zhu 提交于
We're moving sysctls out of kernel/sysctl.c as it is a mess. We already moved all filesystem sysctls out. And with time the goal is to move all sysctls out to their own subsystem/actual user. kernel/sysctl.c has grown to an insane mess and its easy to run into conflicts with it. The effort to move them out into various subsystems is part of this. Signed-off-by: NYan Zhu <zhuyan34@huawei.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Cc: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/bpf/20220407070759.29506-1-zhuyan34@huawei.com
-
- 18 3月, 2022 2 次提交
-
-
由 Jiri Olsa 提交于
Adding support to call bpf_get_attach_cookie helper from kprobe programs attached with kprobe multi link. The cookie is provided by array of u64 values, where each value is paired with provided function address or symbol with the same array index. When cookie array is provided it's sorted together with addresses (check bpf_kprobe_multi_cookie_swap). This way we can find cookie based on the address in bpf_get_attach_cookie helper. Suggested-by: NAndrii Nakryiko <andrii@kernel.org> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-7-jolsa@kernel.org
-
由 Jiri Olsa 提交于
Adding new link type BPF_LINK_TYPE_KPROBE_MULTI that attaches kprobe program through fprobe API. The fprobe API allows to attach probe on multiple functions at once very fast, because it works on top of ftrace. On the other hand this limits the probe point to the function entry or return. The kprobe program gets the same pt_regs input ctx as when it's attached through the perf API. Adding new attach type BPF_TRACE_KPROBE_MULTI that allows attachment kprobe to multiple function with new link. User provides array of addresses or symbols with count to attach the kprobe program to. The new link_create uapi interface looks like: struct { __u32 flags; __u32 cnt; __aligned_u64 syms; __aligned_u64 addrs; } kprobe_multi; The flags field allows single BPF_TRACE_KPROBE_MULTI bit to create return multi kprobe. Signed-off-by: NMasami Hiramatsu <mhiramat@kernel.org> Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-4-jolsa@kernel.org
-
- 10 3月, 2022 1 次提交
-
-
由 Toke Høiland-Jørgensen 提交于
This adds support for running XDP programs through BPF_PROG_RUN in a mode that enables live packet processing of the resulting frames. Previous uses of BPF_PROG_RUN for XDP returned the XDP program return code and the modified packet data to userspace, which is useful for unit testing of XDP programs. The existing BPF_PROG_RUN for XDP allows userspace to set the ingress ifindex and RXQ number as part of the context object being passed to the kernel. This patch reuses that code, but adds a new mode with different semantics, which can be selected with the new BPF_F_TEST_XDP_LIVE_FRAMES flag. When running BPF_PROG_RUN in this mode, the XDP program return codes will be honoured: returning XDP_PASS will result in the frame being injected into the networking stack as if it came from the selected networking interface, while returning XDP_TX and XDP_REDIRECT will result in the frame being transmitted out that interface. XDP_TX is translated into an XDP_REDIRECT operation to the same interface, since the real XDP_TX action is only possible from within the network drivers themselves, not from the process context where BPF_PROG_RUN is executed. Internally, this new mode of operation creates a page pool instance while setting up the test run, and feeds pages from that into the XDP program. The setup cost of this is amortised over the number of repetitions specified by userspace. To support the performance testing use case, we further optimise the setup step so that all pages in the pool are pre-initialised with the packet data, and pre-computed context and xdp_frame objects stored at the start of each page. This makes it possible to entirely avoid touching the page content on each XDP program invocation, and enables sending up to 9 Mpps/core on my test box. Because the data pages are recycled by the page pool, and the test runner doesn't re-initialise them for each run, subsequent invocations of the XDP program will see the packet data in the state it was after the last time it ran on that particular page. This means that an XDP program that modifies the packet before redirecting it has to be careful about which assumptions it makes about the packet content, but that is only an issue for the most naively written programs. Enabling the new flag is only allowed when not setting ctx_out and data_out in the test specification, since using it means frames will be redirected somewhere else, so they can't be returned. Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220309105346.100053-2-toke@redhat.com
-
- 24 2月, 2022 1 次提交
-
-
由 Tom Rix 提交于
Add leading space to spdx tag Use // for spdx c file comment Replacements resereved to reserved inbetween to in between everytime to every time intutivie to intuitive currenct to current encontered to encountered referenceing to referencing upto to up to exectuted to executed Signed-off-by: NTom Rix <trix@redhat.com> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NSong Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220220184055.3608317-1-trix@redhat.com
-
- 19 2月, 2022 1 次提交
-
-
由 Eric Dumazet 提交于
As stated in the comment found in maybe_wait_bpf_programs(), the synchronize_rcu() barrier is only needed before returning to userspace, not after each deletion in the batch. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20220218181801.2971275-1-eric.dumazet@gmail.com
-
- 18 2月, 2022 1 次提交
-
-
由 Eric Dumazet 提交于
syzbot reported various soft lockups caused by bpf batch operations. INFO: task kworker/1:1:27 blocked for more than 140 seconds. INFO: task hung in rcu_barrier Nothing prevents batch ops to process huge amount of data, we need to add schedule points in them. Note that maybe_wait_bpf_programs(map) calls from generic_map_delete_batch() can be factorized by moving the call after the loop. This will be done later in -next tree once we get this fix merged, unless there is strong opinion doing this optimization sooner. Fixes: aa2e93b8 ("bpf: Add generic support for update and delete batch ops") Fixes: cb4d03ab ("bpf: Add generic support for lookup batch op") Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Reviewed-by: NStanislav Fomichev <sdf@google.com> Acked-by: NBrian Vazquez <brianvv@google.com> Link: https://lore.kernel.org/bpf/20220217181902.808742-1-eric.dumazet@gmail.com
-
- 11 2月, 2022 2 次提交
-
-
由 Alexei Starovoitov 提交于
The main change is a move of the single line #include "iterators.lskel.h" from iterators/iterators.c to bpf_preload_kern.c. Which means that generated light skeleton can be used from user space or user mode driver like iterators.c or from the kernel module or the kernel itself. The direct use of light skeleton from the kernel module simplifies the code, since UMD is no longer necessary. The libbpf.a required user space and UMD. The CO-RE in the kernel and generated "loader bpf program" used by the light skeleton are capable to perform complex loading operations traditionally provided by libbpf. In addition UMD approach was launching UMD process every time bpffs has to be mounted. With light skeleton in the kernel the bpf_preload kernel module loads bpf iterators once and pins them multiple times into different bpffs mounts. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NYonghong Song <yhs@fb.com> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209232001.27490-6-alexei.starovoitov@gmail.com
-
由 Alexei Starovoitov 提交于
bpf_sycall programs can be used directly by the kernel modules to load programs and create maps via kernel skeleton. . Export bpf_sys_bpf syscall wrapper to be used in kernel skeleton. . Export bpf_map_get to be used in kernel skeleton. . Allow prog_run cmd for bpf_syscall programs with recursion check. . Enable link_create and raw_tp_open cmds. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NYonghong Song <yhs@fb.com> Acked-by: NAndrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209232001.27490-2-alexei.starovoitov@gmail.com
-
- 22 1月, 2022 2 次提交
-
-
由 Toke Hoiland-Jorgensen 提交于
The check for tail call map compatibility ensures that tail calls only happen between maps of the same type. To ensure backwards compatibility for XDP frags we need a similar type of check for cpumap and devmap programs, so move the state from bpf_array_aux into bpf_map, add xdp_has_frags to the check, and apply the same check to cpumap and devmap. Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Co-developed-by: NLorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: NToke Hoiland-Jorgensen <toke@redhat.com> Link: https://lore.kernel.org/r/f19fd97c0328a39927f3ad03e1ca6b43fd53cdfd.1642758637.git.lorenzo@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Lorenzo Bianconi 提交于
Introduce BPF_F_XDP_HAS_FRAGS and the related field in bpf_prog_aux in order to notify the driver the loaded program support xdp frags. Acked-by: NToke Hoiland-Jorgensen <toke@redhat.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/db2e8075b7032a356003f407d1b0deb99adaa0ed.1642758637.git.lorenzo@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-