- 28 12月, 2017 2 次提交
-
-
由 Alexei Starovoitov 提交于
fix off by one error in max call depth check and add a test Fixes: f4d7e40a ("bpf: introduce function calls (verification)") Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Alexei Starovoitov 提交于
Instead of computing max stack depth for current call chain during the main verifier pass track stack depth of each function independently and after do_check() is done do another pass over all instructions analyzing depth of all possible call stacks. Fixes: f4d7e40a ("bpf: introduce function calls (verification)") Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 24 12月, 2017 1 次提交
-
-
由 Gianluca Borello 提交于
Commit cc2b14d5 ("bpf: teach verifier to recognize zero initialized stack") introduced a very relaxed check when comparing stacks of different states, effectively returning a positive result in many cases where it shouldn't. This can create problems in cases such as this following C pseudocode: long var; long *x = bpf_map_lookup(...); if (!x) return; if (*x != 0xbeef) var = 0; else var = 1; /* This is the key part, calling a helper causes an explored state * to be saved with the information that "var" is on the stack as * STACK_ZERO, since the helper is first met by the verifier after * the "var = 0" assignment. This state will however be wrongly used * also for the "var = 1" case, so the verifier assumes "var" is always * 0 and will replace the NULL assignment with nops, because the * search pruning prevents it from exploring the faulty branch. */ bpf_ktime_get_ns(); if (var) *(long *)0 = 0xbeef; Fix the issue by making sure that the stack is fully explored before returning a positive comparison result. Also attach a couple tests that highlight the bad behavior. In the first test, without this fix instructions 16 and 17 are replaced with nops instead of being rejected by the verifier. The second test, instead, allows a program to make a potentially illegal read from the stack. Fixes: cc2b14d5 ("bpf: teach verifier to recognize zero initialized stack") Signed-off-by: NGianluca Borello <g.borello@gmail.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 21 12月, 2017 11 次提交
-
-
由 Daniel Borkmann 提交于
Currently a dump of an xlated prog (post verifier stage) doesn't correlate used helpers as well as maps. The prog info lists involved map ids, however there's no correlation of where in the program they are used as of today. Likewise, bpftool does not correlate helper calls with the target functions. The latter can be done w/o any kernel changes through kallsyms, and also has the advantage that this works with inlined helpers and BPF calls. Example, via interpreter: # tc filter show dev foo ingress filter protocol all pref 49152 bpf chain 0 filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \ direct-action not_in_hw id 1 tag c74773051b364165 <-- prog id:1 * Output before patch (calls/maps remain unclear): # bpftool prog dump xlated id 1 <-- dump prog id:1 0: (b7) r1 = 2 1: (63) *(u32 *)(r10 -4) = r1 2: (bf) r2 = r10 3: (07) r2 += -4 4: (18) r1 = 0xffff95c47a8d4800 6: (85) call unknown#73040 7: (15) if r0 == 0x0 goto pc+18 8: (bf) r2 = r10 9: (07) r2 += -4 10: (bf) r1 = r0 11: (85) call unknown#73040 12: (15) if r0 == 0x0 goto pc+23 [...] * Output after patch: # bpftool prog dump xlated id 1 0: (b7) r1 = 2 1: (63) *(u32 *)(r10 -4) = r1 2: (bf) r2 = r10 3: (07) r2 += -4 4: (18) r1 = map[id:2] <-- map id:2 6: (85) call bpf_map_lookup_elem#73424 <-- helper call 7: (15) if r0 == 0x0 goto pc+18 8: (bf) r2 = r10 9: (07) r2 += -4 10: (bf) r1 = r0 11: (85) call bpf_map_lookup_elem#73424 12: (15) if r0 == 0x0 goto pc+23 [...] # bpftool map show id 2 <-- show/dump/etc map id:2 2: hash_of_maps flags 0x0 key 4B value 4B max_entries 3 memlock 4096B Example, JITed, same prog: # tc filter show dev foo ingress filter protocol all pref 49152 bpf chain 0 filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \ direct-action not_in_hw id 3 tag c74773051b364165 jited # bpftool prog show id 3 3: sched_cls tag c74773051b364165 loaded_at Dec 19/13:48 uid 0 xlated 384B jited 257B memlock 4096B map_ids 2 # bpftool prog dump xlated id 3 0: (b7) r1 = 2 1: (63) *(u32 *)(r10 -4) = r1 2: (bf) r2 = r10 3: (07) r2 += -4 4: (18) r1 = map[id:2] <-- map id:2 6: (85) call __htab_map_lookup_elem#77408 <-+ inlined rewrite 7: (15) if r0 == 0x0 goto pc+2 | 8: (07) r0 += 56 | 9: (79) r0 = *(u64 *)(r0 +0) <-+ 10: (15) if r0 == 0x0 goto pc+24 11: (bf) r2 = r10 12: (07) r2 += -4 [...] Example, same prog, but kallsyms disabled (in that case we are also not allowed to pass any relative offsets, etc, so prog becomes pointer sanitized on dump): # sysctl kernel.kptr_restrict=2 kernel.kptr_restrict = 2 # bpftool prog dump xlated id 3 0: (b7) r1 = 2 1: (63) *(u32 *)(r10 -4) = r1 2: (bf) r2 = r10 3: (07) r2 += -4 4: (18) r1 = map[id:2] 6: (85) call bpf_unspec#0 7: (15) if r0 == 0x0 goto pc+2 [...] Example, BPF calls via interpreter: # bpftool prog dump xlated id 1 0: (85) call pc+2#__bpf_prog_run_args32 1: (b7) r0 = 1 2: (95) exit 3: (b7) r0 = 2 4: (95) exit Example, BPF calls via JIT: # sysctl net.core.bpf_jit_enable=1 net.core.bpf_jit_enable = 1 # sysctl net.core.bpf_jit_kallsyms=1 net.core.bpf_jit_kallsyms = 1 # bpftool prog dump xlated id 1 0: (85) call pc+2#bpf_prog_3b185187f1855c4c_F 1: (b7) r0 = 1 2: (95) exit 3: (b7) r0 = 2 4: (95) exit And finally, an example for tail calls that is now working as well wrt correlation: # bpftool prog dump xlated id 2 [...] 10: (b7) r2 = 8 11: (85) call bpf_trace_printk#-41312 12: (bf) r1 = r6 13: (18) r2 = map[id:1] 15: (b7) r3 = 0 16: (85) call bpf_tail_call#12 17: (b7) r1 = 42 18: (6b) *(u16 *)(r6 +46) = r1 19: (b7) r0 = 0 20: (95) exit # bpftool map show id 1 1: prog_array flags 0x0 key 4B value 4B max_entries 1 memlock 4096B Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> -
由 Daniel Borkmann 提交于
Right now kallsyms handling is not working with JITed subprogs. The reason is that when in 1c2a088a ("bpf: x64: add JIT support for multi-function programs") in jit_subprogs() they are passed to bpf_prog_kallsyms_add(), then their prog type is 0, which BPF core will think it's a cBPF program as only cBPF programs have a 0 type. Thus, they need to inherit the type from the main prog. Once that is fixed, they are indeed added to the BPF kallsyms infra, but their tag is 0. Therefore, since intention is to add them as bpf_prog_F_<tag>, we need to pass them to bpf_prog_calc_tag() first. And once this is resolved, there is a use-after-free on prog cleanup: we remove the kallsyms entry from the main prog, later walk all subprogs and call bpf_jit_free() on them. However, the kallsyms linkage was never released on them. Thus, do that for all subprogs right in __bpf_prog_put() when refcount hits 0. Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Alexei Starovoitov 提交于
Do not allow root to convert valid pointers into unknown scalars. In particular disallow: ptr &= reg ptr <<= reg ptr += ptr and explicitly allow: ptr -= ptr since pkt_end - pkt == length 1. This minimizes amount of address leaks root can do. In the future may need to further tighten the leaks with kptr_restrict. 2. If program has such pointer math it's likely a user mistake and when verifier complains about it right away instead of many instructions later on invalid memory access it's easier for users to fix their progs. 3. when register holding a pointer cannot change to scalar it allows JITs to optimize better. Like 32-bit archs could use single register for pointers instead of a pair required to hold 64-bit scalars. 4. reduces architecture dependent behavior. Since code: r1 = r10; r1 &= 0xff; if (r1 ...) will behave differently arm64 vs x64 and offloaded vs native. A significant chunk of ptr mangling was allowed by commit f1174f77 ("bpf/verifier: rework value tracking") yet some of it was allowed even earlier. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Alexei Starovoitov 提交于
There were various issues related to the limited size of integers used in the verifier: - `off + size` overflow in __check_map_access() - `off + reg->off` overflow in check_mem_access() - `off + reg->var_off.value` overflow or 32-bit truncation of `reg->var_off.value` in check_mem_access() - 32-bit truncation in check_stack_boundary() Make sure that any integer math cannot overflow by not allowing pointer math with large values. Also reduce the scope of "scalar op scalar" tracking. Fixes: f1174f77 ("bpf/verifier: rework value tracking") Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jann Horn 提交于
This could be made safe by passing through a reference to env and checking for env->allow_ptr_leaks, but it would only work one way and is probably not worth the hassle - not doing it will not directly lead to program rejection. Fixes: f1174f77 ("bpf/verifier: rework value tracking") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jann Horn 提交于
Force strict alignment checks for stack pointers because the tracking of stack spills relies on it; unaligned stack accesses can lead to corruption of spilled registers, which is exploitable. Fixes: f1174f77 ("bpf/verifier: rework value tracking") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jann Horn 提交于
Prevent indirect stack accesses at non-constant addresses, which would permit reading and corrupting spilled pointers. Fixes: f1174f77 ("bpf/verifier: rework value tracking") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jann Horn 提交于
32-bit ALU ops operate on 32-bit values and have 32-bit outputs. Adjust the verifier accordingly. Fixes: f1174f77 ("bpf/verifier: rework value tracking") Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jann Horn 提交于
Properly handle register truncation to a smaller size. The old code first mirrors the clearing of the high 32 bits in the bitwise tristate representation, which is correct. But then, it computes the new arithmetic bounds as the intersection between the old arithmetic bounds and the bounds resulting from the bitwise tristate representation. Therefore, when coerce_reg_to_32() is called on a number with bounds [0xffff'fff8, 0x1'0000'0007], the verifier computes [0xffff'fff8, 0xffff'ffff] as bounds of the truncated number. This is incorrect: The truncated number could also be in the range [0, 7], and no meaningful arithmetic bounds can be computed in that case apart from the obvious [0, 0xffff'ffff]. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16996 for this issue. v2: - flip the mask during arithmetic bounds calculation (Ben Hutchings) v3: - add CVE number (Ben Hutchings) Fixes: b03c9f9f ("bpf/verifier: track signed and unsigned min/max values") Signed-off-by: NJann Horn <jannh@google.com> Acked-by: NEdward Cree <ecree@solarflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Jann Horn 提交于
Distinguish between BPF_ALU64|BPF_MOV|BPF_K (load 32-bit immediate, sign-extended to 64-bit) and BPF_ALU|BPF_MOV|BPF_K (load 32-bit immediate, zero-padded to 64-bit); only perform sign extension in the first case. Starting with v4.14, this is exploitable by unprivileged users as long as the unprivileged_bpf_disabled sysctl isn't set. Debian assigned CVE-2017-16995 for this issue. v3: - add CVE number (Ben Hutchings) Fixes: 48461135 ("bpf: allow access into map value arrays") Signed-off-by: NJann Horn <jannh@google.com> Acked-by: NEdward Cree <ecree@solarflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Edward Cree 提交于
Incorrect signed bounds were being computed. If the old upper signed bound was positive and the old lower signed bound was negative, this could cause the new upper signed bound to be too low, leading to security issues. Fixes: b03c9f9f ("bpf/verifier: track signed and unsigned min/max values") Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NEdward Cree <ecree@solarflare.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> [jannh@google.com: changed description to reflect bug impact] Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 19 12月, 2017 3 次提交
-
-
由 Yonghong Song 提交于
The tools/testing/selftests/bpf test program test_dev_cgroup fails with the following error when compiled with llvm 6.0. (I did not try with earlier versions.) libbpf: load bpf program failed: Permission denied libbpf: -- BEGIN DUMP LOG --- libbpf: 0: (61) r2 = *(u32 *)(r1 +4) 1: (b7) r0 = 0 2: (55) if r2 != 0x1 goto pc+8 R0=inv0 R1=ctx(id=0,off=0,imm=0) R2=inv1 R10=fp0 3: (69) r2 = *(u16 *)(r1 +0) invalid bpf_context access off=0 size=2 ... The culprit is the following statement in dev_cgroup.c: short type = ctx->access_type & 0xFFFF; This code is typical as the ctx->access_type is assigned as below in kernel/bpf/cgroup.c: struct bpf_cgroup_dev_ctx ctx = { .access_type = (access << 16) | dev_type, .major = major, .minor = minor, }; The compiler converts it to u16 access while the verifier cgroup_dev_is_valid_access rejects any non u32 access. This patch permits the field access_type to be accessible with type u16 and u8 as well. Signed-off-by: NYonghong Song <yhs@fb.com> Tested-by: NRoman Gushchin <guro@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> -
由 Colin Ian King 提交于
Function skip_callee is local to the source and does not need to be in global scope, so make it static. Also return NULL rather than 0. Cleans up two sparse warnings: symbol 'skip_callee' was not declared. Should it be static? Using plain integer as NULL pointer Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Colin Ian King 提交于
Trivial fix to spelling mistake in error message text. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 18 12月, 2017 7 次提交
-
-
由 Josef Bacik 提交于
Things got moved around between the original bpf_override_return patches and the final version, and now the ftrace kprobe dispatcher assumes if you modified the ip that you also enabled preemption. Make a comment of this and enable preemption, this fixes the lockdep splat that happened when using this feature. Fixes: 9802d865 ("bpf: add a bpf_override_function helper") Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Alexei Starovoitov 提交于
Typical JIT does several passes over bpf instructions to compute total size and relative offsets of jumps and calls. With multitple bpf functions calling each other all relative calls will have invalid offsets intially therefore we need to additional last pass over the program to emit calls with correct offsets. For example in case of three bpf functions: main: call foo call bpf_map_lookup exit foo: call bar exit bar: exit We will call bpf_int_jit_compile() indepedently for main(), foo() and bar() x64 JIT typically does 4-5 passes to converge. After these initial passes the image for these 3 functions will be good except call targets, since start addresses of foo() and bar() are unknown when we were JITing main() (note that call bpf_map_lookup will be resolved properly during initial passes). Once start addresses of 3 functions are known we patch call_insn->imm to point to right functions and call bpf_int_jit_compile() again which needs only one pass. Additional safety checks are done to make sure this last pass doesn't produce image that is larger or smaller than previous pass. When constant blinding is on it's applied to all functions at the first pass, since doing it once again at the last pass can change size of the JITed code. Tested on x64 and arm64 hw with JIT on/off, blinding on/off. x64 jits bpf-to-bpf calls correctly while arm64 falls back to interpreter. All other JITs that support normal BPF_CALL will behave the same way since bpf-to-bpf call is equivalent to bpf-to-kernel call from JITs point of view. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Alexei Starovoitov 提交于
global bpf_jit_enable variable is tested multiple times in JITs, blinding and verifier core. The malicious root can try to toggle it while loading the programs. This race condition was accounted for and there should be no issues, but it's safer to avoid this race condition. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Alexei Starovoitov 提交于
though bpf_call is still the same call instruction and calling convention 'bpf to bpf' and 'bpf to helper' is the same the interpreter has to oparate on 'struct bpf_insn *'. To distinguish these two cases add a kernel internal opcode and mark call insns with it. This opcode is seen by interpreter only. JITs will never see it. Also add tiny bit of debug code to aid interpreter debugging. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Alexei Starovoitov 提交于
programs with function calls are often passing various pointers via stack. When all calls are inlined llvm flattens stack accesses and optimizes away extra branches. When functions are not inlined it becomes the job of the verifier to recognize zero initialized stack to avoid exploring paths that program will not take. The following program would fail otherwise: ptr = &buffer_on_stack; *ptr = 0; ... func_call(.., ptr, ...) { if (..) *ptr = bpf_map_lookup(); } ... if (*ptr != 0) { // Access (*ptr)->field is valid. // Without stack_zero tracking such (*ptr)->field access // will be rejected } since stack slots are no longer uniform invalid | spill | misc add liveness marking to all slots, but do it in 8 byte chunks. So if nothing was read or written in [fp-16, fp-9] range it will be marked as LIVE_NONE. If any byte in that range was read, it will be marked LIVE_READ and stacksafe() check will perform byte-by-byte verification. If all bytes in the range were written the slot will be marked as LIVE_WRITTEN. This significantly speeds up state equality comparison and reduces total number of states processed. before after bpf_lb-DLB_L3.o 2051 2003 bpf_lb-DLB_L4.o 3287 3164 bpf_lb-DUNKNOWN.o 1080 1080 bpf_lxc-DDROP_ALL.o 24980 12361 bpf_lxc-DUNKNOWN.o 34308 16605 bpf_netdev.o 15404 10962 bpf_overlay.o 7191 6679 Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> -
由 Alexei Starovoitov 提交于
Allow arbitrary function calls from bpf function to another bpf function. To recognize such set of bpf functions the verifier does: 1. runs control flow analysis to detect function boundaries 2. proceeds with verification of all functions starting from main(root) function It recognizes that the stack of the caller can be accessed by the callee (if the caller passed a pointer to its stack to the callee) and the callee can store map_value and other pointers into the stack of the caller. 3. keeps track of the stack_depth of each function to make sure that total stack depth is still less than 512 bytes 4. disallows pointers to the callee stack to be stored into the caller stack, since they will be invalid as soon as the callee returns 5. to reuse all of the existing state_pruning logic each function call is considered to be independent call from the verifier point of view. The verifier pretends to inline all function calls it sees are being called. It stores the callsite instruction index as part of the state to make sure that two calls to the same callee from two different places in the caller will be different from state pruning point of view 6. more safety checks are added to liveness analysis Implementation details: . struct bpf_verifier_state is now consists of all stack frames that led to this function . struct bpf_func_state represent one stack frame. It consists of registers in the given frame and its stack . propagate_liveness() logic had a premature optimization where mark_reg_read() and mark_stack_slot_read() were manually inlined with loop iterating over parents for each register or stack slot. Undo this optimization to reuse more complex mark_*_read() logic . skip_callee() logic is not necessary from safety point of view, but without it mark_*_read() markings become too conservative, since after returning from the funciton call a read of r6-r9 will incorrectly propagate the read marks into callee causing inefficient pruning later . mark_*_read() logic is now aware of control flow which makes it more complex. In the future the plan is to rewrite liveness to be hierarchical. So that liveness can be done within basic block only and control flow will be responsible for propagation of liveness information along cfg and between calls. . tail_calls and ld_abs insns are not allowed in the programs with bpf-to-bpf calls . returning stack pointers to the caller or storing them into stack frame of the caller is not allowed Testing: . no difference in cilium processed_insn numbers . large number of tests follows in next patches Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Alexei Starovoitov 提交于
Allow arbitrary function calls from bpf function to another bpf function. Since the beginning of bpf all bpf programs were represented as a single function and program authors were forced to use always_inline for all functions in their C code. That was causing llvm to unnecessary inflate the code size and forcing developers to move code to header files with little code reuse. With a bit of additional complexity teach verifier to recognize arbitrary function calls from one bpf function to another as long as all of functions are presented to the verifier as a single bpf program. New program layout: r6 = r1 // some code .. r1 = .. // arg1 r2 = .. // arg2 call pc+1 // function call pc-relative exit .. = r1 // access arg1 .. = r2 // access arg2 .. call pc+20 // second level of function call ... It allows for better optimized code and finally allows to introduce the core bpf libraries that can be reused in different projects, since programs are no longer limited by single elf file. With function calls bpf can be compiled into multiple .o files. This patch is the first step. It detects programs that contain multiple functions and checks that calls between them are valid. It splits the sequence of bpf instructions (one program) into a set of bpf functions that call each other. Calls to only known functions are allowed. In the future the verifier may allow calls to unresolved functions and will do dynamic linking. This logic supports statically linked bpf functions only. Such function boundary detection could have been done as part of control flow graph building in check_cfg(), but it's cleaner to separate function boundary detection vs control flow checks within a subprogram (function) into logically indepedent steps. Follow up patches may split check_cfg() further, but not check_subprogs(). Only allow bpf-to-bpf calls for root only and for non-hw-offloaded programs. These restrictions can be relaxed in the future. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 17 12月, 2017 1 次提交
-
-
由 Will Deacon 提交于
[ Note, this is a Git cherry-pick of the following commit: 506458ef ("locking/barriers: Convert users of lockless_dereference() to READ_ONCE()") ... for easier x86 PTI code testing and back-porting. ] READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by: NWill Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 16 12月, 2017 1 次提交
-
-
由 Daniel Borkmann 提交于
Some JITs don't cache skb context on stack in prologue, so when LD_ABS/IND is used and helper calls yield bpf_helper_changes_pkt_data() as true, then they temporarily save/restore skb pointer. However, the assumption that skb always has to be in r1 is a bit of a gamble. Right now it turned out to be true for all helpers listed in bpf_helper_changes_pkt_data(), but lets enforce that from verifier side, so that we make this a guarantee and bail out if the func proto is misconfigured in future helpers. In case of BPF helper calls from cBPF, bpf_helper_changes_pkt_data() is completely unrelevant here (since cBPF is context read-only) and therefore always false. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 15 12月, 2017 6 次提交
-
-
由 Steven Rostedt 提交于
Daniel Wagner reported a crash on the BeagleBone Black SoC. This is a single CPU architecture, and does not have a functional arch_send_call_function_single_ipi() implementation which can crash the kernel if that is called. As it only has one CPU, it shouldn't be called, but if the kernel is compiled for SMP, the push/pull RT scheduling logic now calls it for irq_work if the one CPU is overloaded, it can use that function to call itself and crash the kernel. Ideally, we should disable the SCHED_FEAT(RT_PUSH_IPI) if the system only has a single CPU. But SCHED_FEAT is a constant if sched debugging is turned off. Another fix can also be used, and this should also help with normal SMP machines. That is, do not initiate the pull code if there's only one RT overloaded CPU, and that CPU happens to be the current CPU that is scheduling in a lower priority task. Even on a system with many CPUs, if there's many RT tasks waiting to run on a single CPU, and that CPU schedules in another RT task of lower priority, it will initiate the PULL logic in case there's a higher priority RT task on another CPU that is waiting to run. But if there is no other CPU with waiting RT tasks, it will initiate the RT pull logic on itself (as it still has RT tasks waiting to run). This is a wasted effort. Not only does this help with SMP code where the current CPU is the only one with RT overloaded tasks, it should also solve the issue that Daniel encountered, because it will prevent the PULL logic from executing, as there's only one CPU on the system, and the check added here will cause it to exit the RT pull code. Reported-by: NDaniel Wagner <wagi@monom.org> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-rt-users <linux-rt-users@vger.kernel.org> Cc: stable@vger.kernel.org Fixes: 4bdced5c ("sched/rt: Simplify the IPI based RT balancing logic") Link: http://lkml.kernel.org/r/20171202130454.4cbbfe8d@vmware.local.homeSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
timer_create() specifies via sigevent->sigev_notify the signal delivery for the new timer. The valid modes are SIGEV_NONE, SIGEV_SIGNAL, SIGEV_THREAD and (SIGEV_SIGNAL | SIGEV_THREAD_ID). The sanity check in good_sigevent() is only checking the valid combination for the SIGEV_THREAD_ID bit, i.e. SIGEV_SIGNAL, but if SIGEV_THREAD_ID is not set it accepts any random value. This has no real effects on the posix timer and signal delivery code, but it affects show_timer() which handles the output of /proc/$PID/timers. That function uses a string array to pretty print sigev_notify. The access to that array has no bound checks, so random sigev_notify cause access beyond the array bounds. Add proper checks for the valid notify modes and remove the SIGEV_THREAD_ID masking from various code pathes as SIGEV_NONE can never be set in combination with SIGEV_THREAD_ID. Reported-by: NEric Biggers <ebiggers3@gmail.com> Reported-by: NDmitry Vyukov <dvyukov@google.com> Reported-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: stable@vger.kernel.org
-
由 Steven Rostedt (VMware) 提交于
The stack tracer records a stack dump whenever it sees a stack usage that is more than what it ever saw before. This can happen at any function that is being traced. If it happens when the CPU is going idle (or other strange locations), RCU may not be watching, and in this case, the recording of the stack trace will trigger a warning. There's been lots of efforts to make hacks to allow stack tracing to proceed even if RCU is not watching, but this only causes more issues to appear. Simply do not trace a stack if RCU is not watching. It probably isn't a bad stack anyway. Acked-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
-
由 Sudip Mukherjee 提交于
gcc toggle -fisolate-erroneous-paths-dereference (default at -O2 onwards) isolates faulty code paths such as null pointer access, divide by zero etc. If gcc port doesnt implement __builtin_trap, an abort() is generated which causes kernel link error. In this case, gcc is generating abort due to 'divide by zero' in lib/mpi/mpih-div.c. Currently 'frv' and 'arc' are failing. Previously other arch was also broken like m32r was fixed by commit d22e3d69 ("m32r: fix build failure"). Let's define this weak function which is common for all arch and fix the problem permanently. We can even remove the arch specific 'abort' after this is done. Link: http://lkml.kernel.org/r/1513118956-8718-1-git-send-email-sudipm.mukherjee@gmail.comSigned-off-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com> Cc: Alexey Brodkin <Alexey.Brodkin@synopsys.com> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Cc: Sudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thiago Rafael Becker 提交于
In testing, we found that nfsd threads may call set_groups in parallel for the same entry cached in auth.unix.gid, racing in the call of groups_sort, corrupting the groups for that entry and leading to permission denials for the client. This patch: - Make groups_sort globally visible. - Move the call to groups_sort to the modifiers of group_info - Remove the call to groups_sort from set_groups Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.comSigned-off-by: NThiago Rafael Becker <thiago.becker@gmail.com> Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NNeilBrown <neilb@suse.com> Acked-by: N"J. Bruce Fields" <bfields@fieldses.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry Vyukov 提交于
Fix a silly copy-paste bug. We truncated u32 args to u16. Link: http://lkml.kernel.org/r/20171207101134.107168-1-dvyukov@google.com Fixes: ded97d2c ("kcov: support comparison operands collection") Signed-off-by: NDmitry Vyukov <dvyukov@google.com> Cc: syzkaller@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 12月, 2017 1 次提交
-
-
由 Yonghong Song 提交于
Commit f371b304 ("bpf/tracing: allow user space to query prog array on the same tp") introduced a perf ioctl command to query prog array attached to the same perf tracepoint. The commit introduced a compilation error under certain config conditions, e.g., (1). CONFIG_BPF_SYSCALL is not defined, or (2). CONFIG_TRACING is defined but neither CONFIG_UPROBE_EVENTS nor CONFIG_KPROBE_EVENTS is defined. Error message: kernel/events/core.o: In function `perf_ioctl': core.c:(.text+0x98c4): undefined reference to `bpf_event_query_prog_array' This patch fixed this error by guarding the real definition under CONFIG_BPF_EVENTS and provided static inline dummy function if CONFIG_BPF_EVENTS was not defined. It renamed the function from bpf_event_query_prog_array to perf_event_query_prog_array and moved the definition from linux/bpf.h to linux/trace_events.h so the definition is in proximity to other prog_array related functions. Fixes: f371b304 ("bpf/tracing: allow user space to query prog array on the same tp") Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 13 12月, 2017 5 次提交
-
-
由 Eric Dumazet 提交于
While using large percpu maps, htab_map_alloc() can hold cpu for hundreds of ms. This patch adds cond_resched() calls to percpu alloc/free call sites, all running in process context. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Daniel Borkmann 提交于
When tracing and networking programs are both attached in the system and both use event-output helpers that eventually call into perf_event_output(), then we could end up in a situation where the tracing attached program runs in user context while a cls_bpf program is triggered on that same CPU out of softirq context. Since both rely on the same per-cpu perf_sample_data, we could potentially corrupt it. This can only ever happen in a combination of the two types; all tracing programs use a bpf_prog_active counter to bail out in case a program is already running on that CPU out of a different context. XDP and cls_bpf programs by themselves don't have this issue as they run in the same context only. Therefore, split both perf_sample_data so they cannot be accessed from each other. Fixes: 20b9d7ac ("bpf: avoid excessive stack usage for perf_sample_data") Reported-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Tested-by: NSong Liu <songliubraving@fb.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Josef Bacik 提交于
Error injection is sloppy and very ad-hoc. BPF could fill this niche perfectly with it's kprobe functionality. We could make sure errors are only triggered in specific call chains that we care about with very specific situations. Accomplish this with the bpf_override_funciton helper. This will modify the probe'd callers return value to the specified value and set the PC to an override function that simply returns, bypassing the originally probed function. This gives us a nice clean way to implement systematic error injection for all of our code paths. Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NJosef Bacik <jbacik@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Josef Bacik 提交于
Using BPF we can override kprob'ed functions and return arbitrary values. Obviously this can be a bit unsafe, so make this feature opt-in for functions. Simply tag a function with KPROBE_ERROR_INJECT_SYMBOL in order to give BPF access to that function for error injection purposes. Signed-off-by: NJosef Bacik <jbacik@fb.com> Acked-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Yonghong Song 提交于
Commit e87c6bc3 ("bpf: permit multiple bpf attachments for a single perf event") added support to attach multiple bpf programs to a single perf event. Although this provides flexibility, users may want to know what other bpf programs attached to the same tp interface. Besides getting visibility for the underlying bpf system, such information may also help consolidate multiple bpf programs, understand potential performance issues due to a large array, and debug (e.g., one bpf program which overwrites return code may impact subsequent program results). Commit 2541517c ("tracing, perf: Implement BPF programs attached to kprobes") utilized the existing perf ioctl interface and added the command PERF_EVENT_IOC_SET_BPF to attach a bpf program to a tracepoint. This patch adds a new ioctl command, given a perf event fd, to query the bpf program array attached to the same perf tracepoint event. The new uapi ioctl command: PERF_EVENT_IOC_QUERY_BPF The new uapi/linux/perf_event.h structure: struct perf_event_query_bpf { __u32 ids_len; __u32 prog_cnt; __u32 ids[0]; }; User space provides buffer "ids" for kernel to copy to. When returning from the kernel, the number of available programs in the array is set in "prog_cnt". The usage: struct perf_event_query_bpf *query = malloc(sizeof(*query) + sizeof(u32) * ids_len); query.ids_len = ids_len; err = ioctl(pmu_efd, PERF_EVENT_IOC_QUERY_BPF, query); if (err == 0) { /* query.prog_cnt is the number of available progs, * number of progs in ids: (ids_len == 0) ? 0 : query.prog_cnt */ } else if (errno == ENOSPC) { /* query.ids_len number of progs copied, * query.prog_cnt is the number of available progs */ } else { /* other errors */ } Signed-off-by: NYonghong Song <yhs@fb.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 12 12月, 2017 2 次提交
-
-
由 Ingo Molnar 提交于
This code (CONFIG_LOCKDEP_CROSSRELEASE=y and CONFIG_LOCKDEP_COMPLETIONS=y), while it found a number of old bugs initially, was also causing too many false positives that caused people to disable lockdep - which is arguably a worse overall outcome. If we disable cross-release by default but keep the code upstream then in practice the most likely outcome is that we'll allow the situation to degrade gradually, by allowing entropy to introduce more and more false positives, until it overwhelms maintenance capacity. Another bad side effect was that people were trying to work around the false positives by uglifying/complicating unrelated code. There's a marked difference between annotating locking operations and uglifying good code just due to bad lock debugging code ... This gradual decrease in quality happened to a number of debugging facilities in the kernel, and lockdep is pretty complex already, so we cannot risk this outcome. Either cross-release checking can be done right with no false positives, or it should not be included in the upstream kernel. ( Note that it might make sense to maintain it out of tree and go through the false positives every now and then and see whether new bugs were introduced. ) Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org> -
由 Will Deacon 提交于
When CONFIG_GENERIC_LOCKBEAK=y, locking structures grow an extra int ->break_lock field which is used to implement raw_spin_is_contended() by setting the field to 1 when waiting on a lock and clearing it to zero when holding a lock. However, there are a few problems with this approach: - There is a write-write race between a CPU successfully taking the lock (and subsequently writing break_lock = 0) and a waiter waiting on the lock (and subsequently writing break_lock = 1). This could result in a contended lock being reported as uncontended and vice-versa. - On machines with store buffers, nothing guarantees that the writes to break_lock are visible to other CPUs at any particular time. - READ_ONCE/WRITE_ONCE are not used, so the field is potentially susceptible to harmful compiler optimisations, Consequently, the usefulness of this field is unclear and we'd be better off removing it and allowing architectures to implement raw_spin_is_contended() by providing a definition of arch_spin_is_contended(), as they can when CONFIG_GENERIC_LOCKBREAK=n. Signed-off-by: NWill Deacon <will.deacon@arm.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sebastian Ott <sebott@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1511894539-7988-3-git-send-email-will.deacon@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-