1. 10 1月, 2018 1 次提交
    • A
      bpf: introduce BPF_JIT_ALWAYS_ON config · 290af866
      Alexei Starovoitov 提交于
      The BPF interpreter has been used as part of the spectre 2 attack CVE-2017-5715.
      
      A quote from goolge project zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      
      To make attacker job harder introduce BPF_JIT_ALWAYS_ON config
      option that removes interpreter from the kernel in favor of JIT-only mode.
      So far eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64
      
      The start of JITed program is randomized and code page is marked as read-only.
      In addition "constant blinding" can be turned on with net.core.bpf_jit_harden
      
      v2->v3:
      - move __bpf_prog_ret0 under ifdef (Daniel)
      
      v1->v2:
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      290af866
  2. 09 1月, 2018 1 次提交
    • A
      bpf: prevent out-of-bounds speculation · b2157399
      Alexei Starovoitov 提交于
      Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
      memory accesses under a bounds check may be speculated even if the
      bounds check fails, providing a primitive for building a side channel.
      
      To avoid leaking kernel data round up array-based maps and mask the index
      after bounds check, so speculated load with out of bounds index will load
      either valid value from the array or zero from the padded area.
      
      Unconditionally mask index for all array types even when max_entries
      are not rounded to power of 2 for root user.
      When map is created by unpriv user generate a sequence of bpf insns
      that includes AND operation to make sure that JITed code includes
      the same 'index & index_mask' operation.
      
      If prog_array map is created by unpriv user replace
        bpf_tail_call(ctx, map, index);
      with
        if (index >= max_entries) {
          index &= map->index_mask;
          bpf_tail_call(ctx, map, index);
        }
      (along with roundup to power 2) to prevent out-of-bounds speculation.
      There is secondary redundant 'if (index >= max_entries)' in the interpreter
      and in all JITs, but they can be optimized later if necessary.
      
      Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
      cannot be used by unpriv, so no changes there.
      
      That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
      all architectures with and without JIT.
      
      v2->v3:
      Daniel noticed that attack potentially can be crafted via syscall commands
      without loading the program, so add masking to those paths as well.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      b2157399
  3. 07 1月, 2018 1 次提交
  4. 06 1月, 2018 1 次提交
  5. 21 12月, 2017 9 次提交
  6. 16 12月, 2017 1 次提交
    • D
      bpf: guarantee r1 to be ctx in case of bpf_helper_changes_pkt_data · 04514d13
      Daniel Borkmann 提交于
      Some JITs don't cache skb context on stack in prologue, so when
      LD_ABS/IND is used and helper calls yield bpf_helper_changes_pkt_data()
      as true, then they temporarily save/restore skb pointer. However,
      the assumption that skb always has to be in r1 is a bit of a
      gamble. Right now it turned out to be true for all helpers listed
      in bpf_helper_changes_pkt_data(), but lets enforce that from verifier
      side, so that we make this a guarantee and bail out if the func
      proto is misconfigured in future helpers.
      
      In case of BPF helper calls from cBPF, bpf_helper_changes_pkt_data()
      is completely unrelevant here (since cBPF is context read-only) and
      therefore always false.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      04514d13
  7. 13 12月, 2017 1 次提交
  8. 01 12月, 2017 1 次提交
  9. 28 11月, 2017 1 次提交
  10. 23 11月, 2017 2 次提交
    • A
      bpf: fix branch pruning logic · c131187d
      Alexei Starovoitov 提交于
      when the verifier detects that register contains a runtime constant
      and it's compared with another constant it will prune exploration
      of the branch that is guaranteed not to be taken at runtime.
      This is all correct, but malicious program may be constructed
      in such a way that it always has a constant comparison and
      the other branch is never taken under any conditions.
      In this case such path through the program will not be explored
      by the verifier. It won't be taken at run-time either, but since
      all instructions are JITed the malicious program may cause JITs
      to complain about using reserved fields, etc.
      To fix the issue we have to track the instructions explored by
      the verifier and sanitize instructions that are dead at run time
      with NOPs. We cannot reject such dead code, since llvm generates
      it for valid C code, since it doesn't do as much data flow
      analysis as the verifier does.
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c131187d
    • G
      bpf: introduce ARG_PTR_TO_MEM_OR_NULL · db1ac496
      Gianluca Borello 提交于
      With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper
      argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO
      and the verifier can prove the value of this next argument is 0. However,
      most helpers are just interested in handling <!NULL, 0>, so forcing them to
      deal with <NULL, 0> makes the implementation of those helpers more
      complicated for no apparent benefits, requiring them to explicitly handle
      those corner cases with checks that bpf programs could start relying upon,
      preventing the possibility of removing them later.
      
      Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL
      even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type
      ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case.
      
      Currently, the only helper that needs this is bpf_csum_diff_proto(), so
      change arg1 and arg3 to this new type as well.
      
      Also add a new battery of tests that explicitly test the
      !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the
      various <NULL, 0> variations are focused on bpf_csum_diff, so cover also
      other helpers.
      Signed-off-by: NGianluca Borello <g.borello@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      db1ac496
  11. 21 11月, 2017 7 次提交
  12. 16 11月, 2017 1 次提交
  13. 15 11月, 2017 1 次提交
    • E
      bpf: fix lockdep splat · 89ad2fa3
      Eric Dumazet 提交于
      pcpu_freelist_pop() needs the same lockdep awareness than
      pcpu_freelist_populate() to avoid a false positive.
      
       [ INFO: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected ]
      
       switchto-defaul/12508 [HC0[0]:SC0[6]:HE0:SE0] is trying to acquire:
        (&htab->buckets[i].lock){......}, at: [<ffffffff9dc099cb>] __htab_percpu_map_update_elem+0x1cb/0x300
      
       and this task is already holding:
        (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}, at: [<ffffffff9e135848>] __dev_queue_xmit+0
      x868/0x1240
       which would create a new lock dependency:
        (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...} -> (&htab->buckets[i].lock){......}
      
       but this new dependency connects a SOFTIRQ-irq-safe lock:
        (dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2){+.-...}
       ... which became SOFTIRQ-irq-safe at:
         [<ffffffff9db5931b>] __lock_acquire+0x42b/0x1f10
         [<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
         [<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
         [<ffffffff9e135848>] __dev_queue_xmit+0x868/0x1240
         [<ffffffff9e136240>] dev_queue_xmit+0x10/0x20
         [<ffffffff9e1965d9>] ip_finish_output2+0x439/0x590
         [<ffffffff9e197410>] ip_finish_output+0x150/0x2f0
         [<ffffffff9e19886d>] ip_output+0x7d/0x260
         [<ffffffff9e19789e>] ip_local_out+0x5e/0xe0
         [<ffffffff9e197b25>] ip_queue_xmit+0x205/0x620
         [<ffffffff9e1b8398>] tcp_transmit_skb+0x5a8/0xcb0
         [<ffffffff9e1ba152>] tcp_write_xmit+0x242/0x1070
         [<ffffffff9e1baffc>] __tcp_push_pending_frames+0x3c/0xf0
         [<ffffffff9e1b3472>] tcp_rcv_established+0x312/0x700
         [<ffffffff9e1c1acc>] tcp_v4_do_rcv+0x11c/0x200
         [<ffffffff9e1c3dc2>] tcp_v4_rcv+0xaa2/0xc30
         [<ffffffff9e191107>] ip_local_deliver_finish+0xa7/0x240
         [<ffffffff9e191a36>] ip_local_deliver+0x66/0x200
         [<ffffffff9e19137d>] ip_rcv_finish+0xdd/0x560
         [<ffffffff9e191e65>] ip_rcv+0x295/0x510
         [<ffffffff9e12ff88>] __netif_receive_skb_core+0x988/0x1020
         [<ffffffff9e130641>] __netif_receive_skb+0x21/0x70
         [<ffffffff9e1306ff>] process_backlog+0x6f/0x230
         [<ffffffff9e132129>] net_rx_action+0x229/0x420
         [<ffffffff9da07ee8>] __do_softirq+0xd8/0x43d
         [<ffffffff9e282bcc>] do_softirq_own_stack+0x1c/0x30
         [<ffffffff9dafc2f5>] do_softirq+0x55/0x60
         [<ffffffff9dafc3a8>] __local_bh_enable_ip+0xa8/0xb0
         [<ffffffff9db4c727>] cpu_startup_entry+0x1c7/0x500
         [<ffffffff9daab333>] start_secondary+0x113/0x140
      
       to a SOFTIRQ-irq-unsafe lock:
        (&head->lock){+.+...}
       ... which became SOFTIRQ-irq-unsafe at:
       ...  [<ffffffff9db5971f>] __lock_acquire+0x82f/0x1f10
         [<ffffffff9db5b32c>] lock_acquire+0xbc/0x1b0
         [<ffffffff9da05e38>] _raw_spin_lock+0x38/0x50
         [<ffffffff9dc0b7fa>] pcpu_freelist_pop+0x7a/0xb0
         [<ffffffff9dc08b2c>] htab_map_alloc+0x50c/0x5f0
         [<ffffffff9dc00dc5>] SyS_bpf+0x265/0x1200
         [<ffffffff9e28195f>] entry_SYSCALL_64_fastpath+0x12/0x17
      
       other info that might help us debug this:
      
       Chain exists of:
         dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2 --> &htab->buckets[i].lock --> &head->lock
      
        Possible interrupt unsafe locking scenario:
      
              CPU0                    CPU1
              ----                    ----
         lock(&head->lock);
                                      local_irq_disable();
                                      lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
                                      lock(&htab->buckets[i].lock);
         <Interrupt>
           lock(dev_queue->dev->qdisc_class ?: &qdisc_tx_lock#2);
      
        *** DEADLOCK ***
      
      Fixes: e19494ed ("bpf: introduce percpu_freelist")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      89ad2fa3
  14. 14 11月, 2017 1 次提交
    • Y
      bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics · 9fd29c08
      Yonghong Song 提交于
      For helpers, the argument type ARG_CONST_SIZE_OR_ZERO permits the
      access size to be 0 when accessing the previous argument (arg).
      Right now, it requires the arg needs to be NULL when size passed
      is 0 or could be 0. It also requires a non-NULL arg when the size
      is proved to be non-0.
      
      This patch changes verifier ARG_CONST_SIZE_OR_ZERO behavior
      such that for size-0 or possible size-0, it is not required
      the arg equal to NULL.
      
      There are a couple of reasons for this semantics change, and
      all of them intends to simplify user bpf programs which
      may improve user experience and/or increase chances of
      verifier acceptance. Together with the next patch which
      changes bpf_probe_read arg2 type from ARG_CONST_SIZE to
      ARG_CONST_SIZE_OR_ZERO, the following two examples, which
      fail the verifier currently, are able to get verifier acceptance.
      
      Example 1:
         unsigned long len = pend - pstart;
         len = len > MAX_PAYLOAD_LEN ? MAX_PAYLOAD_LEN : len;
         len &= MAX_PAYLOAD_LEN;
         bpf_probe_read(data->payload, len, pstart);
      
      It does not have test for "len > 0" and it failed the verifier.
      Users may not be aware that they have to add this test.
      Converting the bpf_probe_read helper to have
      ARG_CONST_SIZE_OR_ZERO helps the above code get
      verifier acceptance.
      
      Example 2:
        Here is one example where llvm "messed up" the code and
        the verifier fails.
      
      ......
         unsigned long len = pend - pstart;
         if (len > 0 && len <= MAX_PAYLOAD_LEN)
           bpf_probe_read(data->payload, len, pstart);
      ......
      
      The compiler generates the following code and verifier fails:
      ......
      39: (79) r2 = *(u64 *)(r10 -16)
      40: (1f) r2 -= r8
      41: (bf) r1 = r2
      42: (07) r1 += -1
      43: (25) if r1 > 0xffe goto pc+3
        R0=inv(id=0) R1=inv(id=0,umax_value=4094,var_off=(0x0; 0xfff))
        R2=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=4095,imm=0) R7=inv(id=0)
        R8=inv(id=0) R9=inv0 R10=fp0
      44: (bf) r1 = r6
      45: (bf) r3 = r8
      46: (85) call bpf_probe_read#45
      R2 min value is negative, either use unsigned or 'var &= const'
      ......
      
      The compiler optimization is correct. If r1 = 0,
      r1 - 1 = 0xffffffffffffffff > 0xffe.  If r1 != 0, r1 - 1 will not wrap.
      r1 > 0xffe at insn #43 can actually capture
      both "r1 > 0" and "len <= MAX_PAYLOAD_LEN".
      This however causes an issue in verifier as the value range of arg2
      "r2" does not properly get refined and lead to verification failure.
      
      Relaxing bpf_prog_read arg2 from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO
      allows the following simplied code:
         unsigned long len = pend - pstart;
         if (len <= MAX_PAYLOAD_LEN)
           bpf_probe_read(data->payload, len, pstart);
      
      The llvm compiler will generate less complex code and the
      verifier is able to verify that the program is okay.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fd29c08
  15. 11 11月, 2017 2 次提交
  16. 05 11月, 2017 6 次提交
  17. 03 11月, 2017 3 次提交
    • C
      bpf: fix verifier NULL pointer dereference · 8c01c4f8
      Craig Gallek 提交于
      do_check() can fail early without allocating env->cur_state under
      memory pressure.  Syzkaller found the stack below on the linux-next
      tree because of this.
      
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] SMP KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 1 PID: 27062 Comm: syz-executor5 Not tainted 4.14.0-rc7+ #106
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        task: ffff8801c2c74700 task.stack: ffff8801c3e28000
        RIP: 0010:free_verifier_state kernel/bpf/verifier.c:347 [inline]
        RIP: 0010:bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533
        RSP: 0018:ffff8801c3e2f5c8 EFLAGS: 00010202
        RAX: dffffc0000000000 RBX: 00000000fffffff4 RCX: 0000000000000000
        RDX: 0000000000000070 RSI: ffffffff817d5aa9 RDI: 0000000000000380
        RBP: ffff8801c3e2f668 R08: 0000000000000000 R09: 1ffff100387c5d9f
        R10: 00000000218c4e80 R11: ffffffff85b34380 R12: ffff8801c4dc6a28
        R13: 0000000000000000 R14: ffff8801c4dc6a00 R15: ffff8801c4dc6a20
        FS:  00007f311079b700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004d4a24 CR3: 00000001cbcd0000 CR4: 00000000001406e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         bpf_prog_load+0xcbb/0x18e0 kernel/bpf/syscall.c:1166
         SYSC_bpf kernel/bpf/syscall.c:1690 [inline]
         SyS_bpf+0xae9/0x4620 kernel/bpf/syscall.c:1652
         entry_SYSCALL_64_fastpath+0x1f/0xbe
        RIP: 0033:0x452869
        RSP: 002b:00007f311079abe8 EFLAGS: 00000212 ORIG_RAX: 0000000000000141
        RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452869
        RDX: 0000000000000030 RSI: 0000000020168000 RDI: 0000000000000005
        RBP: 00007f311079aa20 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000212 R12: 00000000004b7550
        R13: 00007f311079ab58 R14: 00000000004b7560 R15: 0000000000000000
        Code: df 48 c1 ea 03 80 3c 02 00 0f 85 e6 0b 00 00 4d 8b 6e 20 48 b8 00 00 00 00 00 fc ff df 49 8d bd 80 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b6 0b 00 00 49 8b bd 80 03 00 00 e8 d6 0c 26
        RIP: free_verifier_state kernel/bpf/verifier.c:347 [inline] RSP: ffff8801c3e2f5c8
        RIP: bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533 RSP: ffff8801c3e2f5c8
        ---[ end trace c8d37f339dc64004 ]---
      
      Fixes: 638f5b90 ("bpf: reduce verifier memory consumption")
      Fixes: 1969db47 ("bpf: fix verifier memory leaks")
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c01c4f8
    • A
      bpf: fix out-of-bounds access warning in bpf_check · eba0c929
      Arnd Bergmann 提交于
      The bpf_verifer_ops array is generated dynamically and may be
      empty depending on configuration, which then causes an out
      of bounds access:
      
      kernel/bpf/verifier.c: In function 'bpf_check':
      kernel/bpf/verifier.c:4320:29: error: array subscript is above array bounds [-Werror=array-bounds]
      
      This adds a check to the start of the function as a workaround.
      I would assume that the function is never called in that configuration,
      so the warning is probably harmless.
      
      Fixes: 00176a34 ("bpf: remove the verifier ops from program structure")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eba0c929
    • A
      bpf: fix link error without CONFIG_NET · 7cce782e
      Arnd Bergmann 提交于
      I ran into this link error with the latest net-next plus linux-next
      trees when networking is disabled:
      
      kernel/bpf/verifier.o:(.rodata+0x2958): undefined reference to `tc_cls_act_analyzer_ops'
      kernel/bpf/verifier.o:(.rodata+0x2970): undefined reference to `xdp_analyzer_ops'
      
      It seems that the code was written to deal with varying contents of
      the arrray, but the actual #ifdef was missing. Both tc_cls_act_analyzer_ops
      and xdp_analyzer_ops are defined in the core networking code, so adding
      a check for CONFIG_NET seems appropriate here, and I've verified this with
      many randconfig builds
      
      Fixes: 4f9218aa ("bpf: move knowledge about post-translation offsets out of verifier")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7cce782e