1. 18 Jan, 2018: 1 commit
    • bpf: add new jited info fields in bpf_dev_offload and bpf_prog_info · fcfb126d
      Committed by Jiong Wang
      For host JIT, there are "jited_len"/"bpf_func" fields in struct bpf_prog
      used by all host JIT targets to get the jited image and its length. For
      offload, however, targets are likely to have different offload mechanisms,
      and this info is kept in device private data fields.
      
      Therefore, the BPF_OBJ_GET_INFO_BY_FD syscall needs a unified way to get
      JIT length and contents info for offload targets.
      
      One way is to introduce a new callback that parses device private data and
      then fills those fields in bpf_prog_info. This might be a little heavy. The
      other way is to add generic fields which are initialized by all offload
      targets.
      
      This patch follows the second approach: it introduces two new fields in
      struct bpf_dev_offload and teaches bpf_prog_get_info_by_fd about them, so
      that it fills in the correct jited_prog_len and jited_prog_insns in
      bpf_prog_info.
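      A minimal user-space model of the lookup described above, assuming
      simplified stand-in structures: model_offload, model_prog, model_info and
      model_fill_jited_info() are hypothetical names, not the kernel's types or
      field names. The kernel copies to user memory with copy_to_user(); memcpy()
      stands in for it here. The point is that the offload branch reads generic
      fields instead of calling into a driver-specific parser.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified stand-ins for struct bpf_dev_offload,
 * struct bpf_prog and struct bpf_prog_info. */
struct model_offload {
    const uint8_t *jited_image;   /* generic field filled in by the offload driver */
    uint32_t jited_len;
};

struct model_prog {
    const uint8_t *bpf_func;        /* host JIT image */
    uint32_t jited_len;             /* host JIT image length */
    struct model_offload *offload;  /* non-NULL when the program is device-bound */
};

struct model_info {
    uint32_t jited_prog_len;        /* in: user buffer size, out: image length */
    uint8_t *jited_prog_insns;      /* user-supplied buffer */
};

static void model_fill_jited_info(const struct model_prog *prog,
                                  struct model_info *info)
{
    uint32_t ulen = info->jited_prog_len;
    const uint8_t *img;
    uint32_t len;

    /* Pick the image source: generic offload fields or the host JIT fields. */
    if (prog->offload) {
        img = prog->offload->jited_image;
        len = prog->offload->jited_len;
    } else {
        img = prog->bpf_func;
        len = prog->jited_len;
    }

    /* Always report the real length; copy at most what the user buffer holds. */
    info->jited_prog_len = len;
    if (ulen && len && img && info->jited_prog_insns)
        memcpy(info->jited_prog_insns, img, ulen < len ? ulen : len);
}
```

      Reporting the full length even when the buffer is smaller lets a caller
      query the size first and then retry with a large enough buffer.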
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Jiong Wang <jiong.wang@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  2. 17 Jan, 2018: 3 commits
  3. 16 Jan, 2018: 2 commits
    • tracing: Prevent PROFILE_ALL_BRANCHES when FORTIFY_SOURCE=y · 68e76e03
      Committed by Randy Dunlap
      I regularly get 50 MB - 60 MB files during kernel randconfig builds.
      These large files mostly contain many repeats (e.g., 124,594) of:
      
      In file included from ../include/linux/string.h:6:0,
                       from ../include/linux/uuid.h:20,
                       from ../include/linux/mod_devicetable.h:13,
                       from ../scripts/mod/devicetable-offsets.c:3:
      ../include/linux/compiler.h:64:4: warning: '______f' is static but declared in inline function 'strcpy' which is not static [enabled by default]
          ______f = {     \
          ^
      ../include/linux/compiler.h:56:23: note: in expansion of macro '__trace_if'
                             ^
      ../include/linux/string.h:425:2: note: in expansion of macro 'if'
        if (p_size == (size_t)-1 && q_size == (size_t)-1)
        ^
      
      This only happens when CONFIG_FORTIFY_SOURCE=y and
      CONFIG_PROFILE_ALL_BRANCHES=y, so prevent PROFILE_ALL_BRANCHES if
      FORTIFY_SOURCE=y.
      
      Link: http://lkml.kernel.org/r/9199446b-a141-c0c3-9678-a3f9107f2750@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
    • ring-buffer: Bring back context level recursive checks · a0e3a18f
      Committed by Steven Rostedt (VMware)
      Commit 1a149d7d ("ring-buffer: Rewrite trace_recursive_(un)lock() to be
      simpler") replaced the context level recursion checks with a simple counter.
      This would prevent the ring buffer code from recursively calling itself more
      than the max number of contexts that exist (normal, softirq, irq, nmi). But
      this change caused a lockup in a specific case: during suspend and resume
      using a global clock. A stack dump added to see where this occurred showed
      that the issue was in the trace global clock itself:
      
        trace_buffer_lock_reserve+0x1c/0x50
        __trace_graph_entry+0x2d/0x90
        trace_graph_entry+0xe8/0x200
        prepare_ftrace_return+0x69/0xc0
        ftrace_graph_caller+0x78/0xa8
        queued_spin_lock_slowpath+0x5/0x1d0
        trace_clock_global+0xb0/0xc0
        ring_buffer_lock_reserve+0xf9/0x390
      
      The function graph tracer traced queued_spin_lock_slowpath(), which was
      called by trace_clock_global(). This pointed out that trace_clock_global()
      is not reentrant, as it takes a spin lock. It depended on the ring buffer
      recursive lock to prevent that from happening.
      
      Removing the context detection and keeping only a maximum number of
      allowable recursions allowed trace_clock_global() to be entered again and
      try to retake the spinlock it already held, causing a deadlock.
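      To make the difference between the two guards concrete, here is a
      simplified, single-threaded sketch. It is not the kernel's
      trace_recursive_lock() (which keeps its state per CPU buffer); the enum,
      counter_lock() and context_lock() are hypothetical names. A depth counter
      lets the same context re-enter, which is how trace_clock_global() could try
      to retake its own spinlock, while one bit per context refuses a second
      entry from the same context.

```c
#include <stdbool.h>

/* Hypothetical model of the four contexts named in the commit message. */
enum trace_ctx { CTX_NORMAL, CTX_SOFTIRQ, CTX_IRQ, CTX_NMI, CTX_MAX };

/* Counter-based guard (the reverted behaviour): only caps total depth. */
static int recursion_depth;

static bool counter_lock(void)
{
    if (recursion_depth >= CTX_MAX)
        return false;              /* too deep: refuse */
    recursion_depth++;
    return true;                   /* same-context re-entry is NOT detected */
}

static void counter_unlock(void)
{
    recursion_depth--;
}

/* Context-level guard: one bit per context, so a second entry from the
 * same context is refused even when the total depth is small. */
static unsigned int context_bits;

static bool context_lock(enum trace_ctx ctx)
{
    unsigned int bit = 1u << ctx;

    if (context_bits & bit)
        return false;              /* already inside the ring buffer in this context */
    context_bits |= bit;
    return true;
}

static void context_unlock(enum trace_ctx ctx)
{
    context_bits &= ~(1u << ctx);
}
```

      A nested interrupt still gets through the context guard because it sets a
      different bit; only true recursion within one context is blocked.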
      
      Fixes: 1a149d7d ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler")
      Reported-by: David Weinehall <david.weinehall@gmail.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  4. 15 Jan, 2018: 7 commits
  5. 14 Jan, 2018: 1 commit
  6. 13 Jan, 2018: 4 commits
  7. 11 Jan, 2018: 3 commits
    • bpf, array: fix overflow in max_entries and undefined behavior in index_mask · bbeb6e43
      Committed by Daniel Borkmann
      syzkaller tried to allocate a map with 0xfffffffd entries from within a
      userns, and thus unprivileged. With the recently added logic in b2157399
      ("bpf: prevent out-of-bounds speculation"), we round max_entries up to the
      next power of two for unprivileged users so that we can apply proper
      masking into potentially zeroed-out map slots.
      
      However, this will generate an index_mask of 0xffffffff, and therefore
      adding 1 to it overflows into a new max_entries of 0. This passes
      allocation, etc., and later map accesses are still checked against the
      original attr->max_entries value of 0xfffffffd, therefore triggering GPFs
      all over the place. Thus, bail out on overflow in such a case.
      
      Moreover, on 32-bit archs roundup_pow_of_two() cannot be used either,
      since fls_long(max_entries - 1) can result in 32, and 1UL << 32 is
      undefined in 32-bit space. Therefore, do this by hand in a 64-bit variable.
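      The two fixes can be sketched in user-space C, assuming a hypothetical
      helper round_max_entries() and a portable fls32() standing in for the
      kernel's fls_long(). The shift is done in a 64-bit variable so a shift
      count of 32 is well defined, and the overflow check catches the
      0xfffffffd case from the syzkaller report. The explicit rejection of 0 is
      an addition for the sketch; the kernel validates that separately.

```c
#include <stdbool.h>
#include <stdint.h>

/* 1-based index of the most significant set bit; 0 for x == 0.
 * Portable stand-in for fls_long() on a 32-bit value. */
static unsigned int fls32(uint32_t x)
{
    unsigned int r = 0;

    while (x) {
        r++;
        x >>= 1;
    }
    return r;
}

/* Hypothetical helper: compute index_mask and the rounded-up entry count for
 * an unprivileged array map. Returns false when the request cannot be served. */
static bool round_max_entries(uint32_t max_entries, uint32_t *index_mask,
                              uint32_t *rounded)
{
    uint64_t mask64;
    uint32_t new_max;

    if (max_entries == 0)
        return false;                        /* sketch-only guard */

    /* fls32() may return 32; shifting a 64-bit value by 32 is defined. */
    mask64 = (1ULL << fls32(max_entries - 1)) - 1;
    *index_mask = (uint32_t)mask64;

    new_max = *index_mask + 1;               /* wraps to 0 if index_mask == 0xffffffff */
    if (new_max < max_entries)               /* overflow: bail out */
        return false;

    *rounded = new_max;
    return true;
}
```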
      
      This fixes all the issues triggered by syzkaller's reproducers.
      
      Fixes: b2157399 ("bpf: prevent out-of-bounds speculation")
      Reported-by: syzbot+b0efb8e572d01bce1ae0@syzkaller.appspotmail.com
      Reported-by: syzbot+6c15e9744f75f2364773@syzkaller.appspotmail.com
      Reported-by: syzbot+d2f5524fb46fd3b312ee@syzkaller.appspotmail.com
      Reported-by: syzbot+61d23c95395cc90dbc2b@syzkaller.appspotmail.com
      Reported-by: syzbot+0d363c942452cca68c01@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: arsh is not supported in 32 bit alu thus reject it · 7891a87e
      Committed by Daniel Borkmann
      The following snippet was throwing an 'unknown opcode cc' warning
      in BPF interpreter:
      
        0: (18) r0 = 0x0
        2: (7b) *(u64 *)(r10 -16) = r0
        3: (cc) (u32) r0 s>>= (u32) r0
        4: (95) exit
      
      Although a number of JITs do support BPF_ALU | BPF_ARSH | BPF_{K,X}
      generation, not all of them do, and neither does the interpreter. We can
      leave the existing ones and implement it later in bpf-next for the
      remaining ones, but for the time being, reject this properly in the
      verifier.
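      The opcode 0xcc in the snippet decodes as BPF_ALU (0x04) | BPF_X (0x08) |
      BPF_ARSH (0xc0), i.e. a 32-bit arithmetic right shift. The sketch below
      shows the decoding and the rule "ARSH only in the 64-bit ALU class". The
      constant values follow the BPF uapi headers but are redefined under local,
      hypothetical names, and arsh_allowed() is an illustration, not the
      verifier's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode encoding values as in the BPF uapi headers, under local names so
 * the sketch compiles on its own. */
#define OP_CLASS(code) ((code) & 0x07)
#define OP_OP(code)    ((code) & 0xf0)
#define CLASS_ALU      0x04   /* 32-bit ALU */
#define CLASS_ALU64    0x07   /* 64-bit ALU */
#define OP_ARSH        0xc0   /* arithmetic (sign-extending) right shift */

/* Hypothetical helper: false for instructions the rule above would reject. */
static bool arsh_allowed(uint8_t code)
{
    /* The 0xc0 op bits mean something else outside the ALU classes. */
    if (OP_CLASS(code) != CLASS_ALU && OP_CLASS(code) != CLASS_ALU64)
        return true;
    if (OP_OP(code) != OP_ARSH)
        return true;
    return OP_CLASS(code) == CLASS_ALU64;
}
```

      With this rule, 0xcc (32-bit, register source) and 0xc4 (32-bit,
      immediate) are rejected, while 0xcf and 0xc7 (the ALU64 forms) pass.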
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Reported-by: syzbot+93c4904c5c70348a6890@syzkaller.appspotmail.com
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf: fix spelling mistake: "obusing" -> "abusing" · 40950343
      Committed by Colin Ian King
      Trivial fix to spelling mistake in error message text.
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  8. 10 Jan, 2018: 3 commits
    • bpf: export function to write into verifier log buffer · 430e68d1
      Committed by Quentin Monnet
      Rename the BPF verifier `verbose()` to `bpf_verifier_log_write()` and
      export it, so that other components (in particular, drivers for BPF
      offload) can reuse the user buffer log to dump error messages at
      verification time.
      
      Renaming `verbose()` was necessary to avoid exporting such a generic name
      to the global namespace. However, to avoid too much pain for backports,
      the calls to `verbose()` in the kernel BPF verifier were not changed.
      Instead, function aliasing is used to make `verbose` point to
      `bpf_verifier_log_write`. Another solution would be to write a wrapper
      around `verbose()`, but since it is a variadic function, I don't see a
      clean way to do so without creating two identical wrappers, one for the
      verifier and one to export.
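      The aliasing trick can be illustrated with a small sketch, assuming GCC or
      Clang on an ELF target (the alias attribute is a compiler extension and is
      not available on every platform). log_write(), verbose() and the fixed
      log_buf are hypothetical stand-ins for the verifier's log writer and its
      user log buffer. Because verbose is an alias rather than a wrapper, there
      is no need to forward a va_list through a second variadic function.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed-size buffer standing in for the verifier's user log. */
static char log_buf[256];
static size_t log_len;

/* Descriptively named, exportable variadic writer. */
void log_write(const char *fmt, ...)
{
    size_t avail = sizeof(log_buf) - log_len;
    va_list args;
    int n;

    if (avail <= 1)
        return;                              /* log is full */
    va_start(args, fmt);
    n = vsnprintf(log_buf + log_len, avail, fmt, args);
    va_end(args);
    if (n < 0)
        return;
    /* vsnprintf() reports the untruncated length; clamp to what was stored. */
    log_len += (size_t)n < avail ? (size_t)n : avail - 1;
}

/* The old short name becomes a second name for the same function, so existing
 * call sites compile unchanged. GCC/Clang extension; the alias target must be
 * defined in this translation unit. */
static void verbose(const char *fmt, ...) __attribute__((alias("log_write")));
```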
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • membarrier: Disable preemption when calling smp_call_function_many() · 54167607
      Committed by Mathieu Desnoyers
      smp_call_function_many() requires disabling preemption around the call.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: <stable@vger.kernel.org> # v4.14+
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Andrew Hunter <ahh@google.com>
      Cc: Avi Kivity <avi@scylladb.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Cc: Dave Watson <davejwatson@fb.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Maged Michael <maged.michael@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20171215192310.25293-1-mathieu.desnoyers@efficios.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • bpf: introduce BPF_JIT_ALWAYS_ON config · 290af866
      Committed by Alexei Starovoitov
      The BPF interpreter has been used as part of the Spectre variant 2 attack
      (CVE-2017-5715).

      A quote from the Google Project Zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      
      To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
      option, which removes the interpreter from the kernel in favor of JIT-only
      mode. So far, eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64

      The start of the JITed program is randomized and the code page is marked
      as read-only. In addition, "constant blinding" can be turned on with
      net.core.bpf_jit_harden.
      
      v2->v3:
      - move __bpf_prog_ret0 under ifdef (Daniel)
      
      v1->v2:
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  9. 09 Jan, 2018: 3 commits
    • bpf: prevent out-of-bounds speculation · b2157399
      Committed by Alexei Starovoitov
      Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
      memory accesses under a bounds check may be speculated even if the
      bounds check fails, providing a primitive for building a side channel.
      
      To avoid leaking kernel data, round up array-based maps and mask the index
      after the bounds check, so that a speculated load with an out-of-bounds
      index will load either a valid value from the array or zero from the
      padded area.
      
      Unconditionally mask the index for all array types, even when max_entries
      is not rounded up to a power of 2 (the root-user case). When the map is
      created by an unprivileged user, generate a sequence of BPF insns that
      includes an AND operation to make sure that the JITed code includes the
      same 'index & index_mask' operation.
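      The check-then-mask pattern can be modeled in user space as follows. This
      is a sketch under stated assumptions: struct model_array and
      model_lookup() are hypothetical, and the storage is assumed to be
      allocated with index_mask + 1 slots whose padding is zero-filled. Plain C
      cannot demonstrate speculation itself; the model only shows that the
      masked index can never leave the allocation, whatever value it holds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an array map whose storage is rounded up to a
 * power of two, with the extra (padding) slots zero-filled. */
struct model_array {
    uint32_t max_entries;   /* size requested by the user */
    uint32_t index_mask;    /* rounded-up size minus one */
    uint64_t *value;        /* index_mask + 1 slots */
};

/* Bounds check first, then mask. If a CPU speculates past the check, the
 * masked index still lands inside the allocated, padded storage. */
static const uint64_t *model_lookup(const struct model_array *a, uint32_t index)
{
    if (index >= a->max_entries)
        return NULL;
    return &a->value[index & a->index_mask];
}
```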
      
      If a prog_array map is created by an unprivileged user, replace
        bpf_tail_call(ctx, map, index);
      with
        if (index < max_entries) {
          index &= map->index_mask;
          bpf_tail_call(ctx, map, index);
        }
      (along with rounding up to a power of 2) to prevent out-of-bounds
      speculation. There is a secondary, redundant 'if (index >= max_entries)'
      check in the interpreter and in all JITs, but it can be optimized away
      later if necessary.
      
      Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
      cannot be used by unpriv, so no changes there.
      
      That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
      all architectures with and without JIT.
      
      v2->v3:
      Daniel noticed that attack potentially can be crafted via syscall commands
      without loading the program, so add masking to those paths as well.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: fix verifier GPF in kmalloc failure path · 5896351e
      Committed by Alexei Starovoitov
      syzbot reported the following panic in the verifier triggered
      by kmalloc error injection:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      RIP: 0010:copy_func_state kernel/bpf/verifier.c:403 [inline]
      RIP: 0010:copy_verifier_state+0x364/0x590 kernel/bpf/verifier.c:431
      Call Trace:
       pop_stack+0x8c/0x270 kernel/bpf/verifier.c:449
       push_stack kernel/bpf/verifier.c:491 [inline]
       check_cond_jmp_op kernel/bpf/verifier.c:3598 [inline]
       do_check+0x4b60/0xa050 kernel/bpf/verifier.c:4731
       bpf_check+0x3296/0x58c0 kernel/bpf/verifier.c:5489
       bpf_prog_load+0xa2a/0x1b00 kernel/bpf/syscall.c:1198
       SYSC_bpf kernel/bpf/syscall.c:1807 [inline]
       SyS_bpf+0x1044/0x4420 kernel/bpf/syscall.c:1769
      
      When copy_verifier_state() aborts midway because of a kmalloc failure,
      some of the frames may have been only partially copied, while the current
      free_verifier_state() loop
      for (i = 0; i <= state->curframe; i++)
      assumed that all frames are non-NULL.
      Fix it simply by adding an 'if (!state)' check to free_func_state().
      Also, to avoid stressing the frame-copy logic further, free
      env->cur_state right away if kzalloc fails in push_stack().
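      A reduced model of why the NULL check matters, assuming hypothetical
      stand-in structures (model_func_state, model_verifier_state) rather than
      the kernel's bpf_func_state and bpf_verifier_state. Each frame owns a
      heap-allocated stack, so freeing a frame dereferences it; after an aborted
      copy, some slots up to curframe are still NULL, and the early return
      keeps the loop from dereferencing them.

```c
#include <stdlib.h>

#define MODEL_MAX_FRAMES 8   /* hypothetical call-depth limit */

/* Hypothetical stand-ins for the verifier's per-frame and overall state. */
struct model_func_state {
    int *stack;              /* heap-allocated, like a real frame's stack */
};

struct model_verifier_state {
    struct model_func_state *frame[MODEL_MAX_FRAMES];
    int curframe;            /* index of the innermost frame */
};

/* The fix: tolerate a frame that was never allocated. Without the check,
 * state->stack would dereference NULL. */
static void model_free_func_state(struct model_func_state *state)
{
    if (!state)
        return;
    free(state->stack);
    free(state);
}

/* Same loop shape as free_verifier_state(): walks frames 0..curframe,
 * some of which may be NULL after a partially failed copy. */
static void model_free_verifier_state(struct model_verifier_state *st)
{
    for (int i = 0; i <= st->curframe; i++) {
        model_free_func_state(st->frame[i]);
        st->frame[i] = NULL;
    }
}
```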
      
      Fixes: f4d7e40a ("bpf: introduce function calls (verification)")
      Reported-by: syzbot+32ac5a3e473f2e01cfc7@syzkaller.appspotmail.com
      Reported-by: syzbot+fa99e24f3c29d269a7d5@syzkaller.appspotmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • locking/lockdep: Remove cross-release leftovers · 527187d2
      Committed by Ingo Molnar
      There are two cross-release leftover facilities:
      
       - the crossrelease_hist_*() irq-tracing callbacks (NOPs currently)
       - the complete_release_commit() callback (NOP as well)
      
      Remove them.
      
      Cc: David Sterba <dsterba@suse.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 07 Jan, 2018: 2 commits
    • bpf: sockmap missing NULL psock check · 5731a879
      Committed by John Fastabend
      Add a psock NULL check to handle a racing sock event that can acquire
      sk_callback_lock before this code path but after the xchg happens,
      causing the refcnt to hit zero and the sock user data (psock) to be NULL
      and queued for garbage collection.

      Also add a comment in the code, because this is a bit subtle and not
      obvious in my opinion.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: implement syscall command BPF_MAP_GET_NEXT_KEY for stacktrace map · 16f07c55
      Committed by Yonghong Song
      Currently, the bpf syscall command BPF_MAP_GET_NEXT_KEY is not supported
      for the stacktrace map. However, there are use cases where user space
      wants to enumerate all stacktrace map entries, for which the
      BPF_MAP_GET_NEXT_KEY command is really helpful. In addition, if user space
      wants to delete all map entries to save memory without closing the map
      file descriptor, BPF_MAP_GET_NEXT_KEY may help improve performance when
      map entries are sparsely populated.
      
      The implementation behaves like the BPF_MAP_GET_NEXT_KEY implementation
      in hashtab. If the user provides a NULL key pointer or an invalid key, the
      first key is returned. Otherwise, the first valid key after the input
      parameter "key" is returned, or -ENOENT if no valid key can be found.
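      The semantics above can be sketched over a simple bucket array, assuming a
      hypothetical struct model_stack_map in which a NULL bucket means that no
      stack trace is stored under that id. This mirrors the described behavior
      rather than reproducing the kernel's function; model_get_next_key() is an
      illustrative name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: a stack map as an array of buckets indexed by stack id;
 * a NULL bucket means "no stack trace stored under this id". */
struct model_stack_map {
    uint32_t n_buckets;
    void **buckets;
};

static int model_get_next_key(const struct model_stack_map *m,
                              const uint32_t *key, uint32_t *next_key)
{
    uint32_t id;

    if (!key || *key >= m->n_buckets || !m->buckets[*key])
        id = 0;                     /* NULL or invalid key: start from the beginning */
    else
        id = *key + 1;              /* valid key: continue right after it */

    while (id < m->n_buckets && !m->buckets[id])
        id++;                       /* skip empty buckets (sparse map) */

    if (id >= m->n_buckets)
        return -ENOENT;             /* no further valid key */
    *next_key = id;
    return 0;
}
```

      A caller enumerates the map by starting with a NULL key and feeding each
      returned key back in until -ENOENT comes back. Restarting from the first
      key on an invalid input matches the hashtab behavior the commit refers to.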
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  11. 06 Jan, 2018: 1 commit
  12. 05 Jan, 2018: 4 commits
  13. 31 Dec, 2017: 6 commits