1. 30 June 2017, 1 commit
    • bpf: prevent leaking pointer via xadd on unpriviledged · 6bdf6abc
      Committed by Daniel Borkmann
      Leaking kernel addresses to unprivileged users is generally disallowed;
      for example, the verifier rejects the following:
      
        0: (b7) r0 = 0
        1: (18) r2 = 0xffff897e82304400
        3: (7b) *(u64 *)(r1 +48) = r2
        R2 leaks addr into ctx
      
      Doing pointer arithmetic on them is also forbidden, so that they
      don't turn into an unknown value and then get leaked out. However,
      xadd is a special case where we don't check the src reg for being
      a pointer register, e.g. the following will pass:
      
        0: (b7) r0 = 0
        1: (7b) *(u64 *)(r1 +48) = r0
        2: (18) r2 = 0xffff897e82304400 ; map
        4: (db) lock *(u64 *)(r1 +48) += r2
        5: (95) exit
      
      We could store the pointer into skb->cb, lose the type context,
      and then read it back from there to eventually leak it out of a
      map value. Or, more easily, in a different variant:
      
         0: (bf) r6 = r1
         1: (7a) *(u64 *)(r10 -8) = 0
         2: (bf) r2 = r10
         3: (07) r2 += -8
         4: (18) r1 = 0x0
         6: (85) call bpf_map_lookup_elem#1
         7: (15) if r0 == 0x0 goto pc+3
         R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R6=ctx R10=fp
         8: (b7) r3 = 0
         9: (7b) *(u64 *)(r0 +0) = r3
        10: (db) lock *(u64 *)(r0 +0) += r6
        11: (b7) r0 = 0
        12: (95) exit
      
        from 7 to 11: R0=inv,min_value=0,max_value=0 R6=ctx R10=fp
        11: (b7) r0 = 0
        12: (95) exit
      
      Prevent this by checking xadd src reg for pointer types. Also
      add a couple of test cases related to this.
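
      The fix boils down to rejecting pointer-typed source registers for
      BPF_XADD. A minimal sketch of the added check, assuming the verifier's
      existing is_pointer_value() and verbose() helpers (the real patch wires
      this into the verifier's BPF_XADD handling; function name here is for
      illustration):

      static int check_xadd_src(struct bpf_verifier_env *env,
      			  struct bpf_insn *insn)
      {
      	/* the src reg must hold a scalar, never a pointer, since the
      	 * added value could otherwise be read back, e.g. out of a map
      	 * value, and leaked to unprivileged user space
      	 */
      	if (is_pointer_value(env, insn->src_reg)) {
      		verbose("R%d leaks addr into mem\n", insn->src_reg);
      		return -EACCES;
      	}
      	return 0;
      }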
      
      Fixes: 1be7f75d ("bpf: enable non-root eBPF programs")
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 26 May 2017, 1 commit
  3. 01 May 2017, 1 commit
  4. 29 April 2017, 1 commit
  5. 22 April 2017, 1 commit
  6. 07 April 2017, 1 commit
  7. 02 April 2017, 1 commit
  8. 25 March 2017, 1 commit
  9. 23 March 2017, 1 commit
  10. 13 March 2017, 1 commit
  11. 11 February 2017, 4 commits
  12. 07 February 2017, 2 commits
  13. 25 January 2017, 1 commit
    • bpf: enable verifier to better track const alu ops · 3fadc801
      Committed by Daniel Borkmann
      William reported a couple of issues in relation to direct packet
      access. The typical scheme is to check for data + [off] <= data_end,
      where [off] can be either an immediate or come from a tracked
      register that contains an immediate; depending on the branch, we
      can then access the data. However, when [off] is calculated in a
      more "complex" way, either for the mentioned test itself or for an
      access after the test, the verifier will stop tracking the
      CONST_IMM marked register and will mark it as an UNKNOWN_VALUE one.
      
      When such an UNKNOWN_VALUE typed register is added to a pkt() marked
      register, the verifier bails out in check_packet_ptr_add() as it
      finds the register's imm value below 48. In the first example below,
      that is due to evaluate_reg_imm_alu() not handling right shifts and
      thus marking the register as UNKNOWN_VALUE via the helper
      __mark_reg_unknown_value(), which resets imm to 0.
      
      In the second case, the same happens at the time when r4 is set
      to r4 &= r5, where it transitions to UNKNOWN_VALUE from
      evaluate_reg_imm_alu(). Later on, r4 is shifted right by 3 inside
      evaluate_reg_alu(), where the register's imm turns into 3. That
      is, for registers with type UNKNOWN_VALUE, an imm of 0 means that
      we don't know what value the register has, and imm > 0 means
      that the value has [imm] upper zero bits. E.g. when shifting
      an UNKNOWN_VALUE register by 3 to the right, no matter what value
      it had, we know that the 3 uppermost bits must be zero now.
      This is to make sure that ALU operations with unknown registers
      don't overflow. Meaning, once we know that we have more than 48
      upper zero bits, or, in other words, cannot go beyond a 0xffff
      offset with ALU ops, such an addition will track the target register
      as a new pkt() register with a new id, but 0 offset and 0 range,
      so a new data/data_end test will be required for it. If the source
      register is a CONST_IMM one that is to be added to the pkt() register,
      or the source instruction is an add instruction with an immediate
      value, then it will get added if it stays within the max 0xffff bounds.
      From there, the pkt() type can be accessed, provided reg->off + imm
      is within the access range of pkt().
      
        [...]
        from 28 to 30: R0=imm1,min_value=1,max_value=1
          R1=pkt(id=0,off=0,r=22) R2=pkt_end
          R3=imm144,min_value=144,max_value=144
          R4=imm0,min_value=0,max_value=0
          R5=inv48,min_value=2054,max_value=2054 R10=fp
        30: (bf) r5 = r3
        31: (07) r5 += 23
        32: (77) r5 >>= 3
        33: (bf) r6 = r1
        34: (0f) r6 += r5
        cannot add integer value with 0 upper zero bits to ptr_to_packet
      
        [...]
        from 52 to 80: R0=imm1,min_value=1,max_value=1
          R1=pkt(id=0,off=0,r=34) R2=pkt_end R3=inv
          R4=imm272 R5=inv56,min_value=17,max_value=17
          R6=pkt(id=0,off=26,r=34) R10=fp
        80: (07) r4 += 71
        81: (18) r5 = 0xfffffff8
        83: (5f) r4 &= r5
        84: (77) r4 >>= 3
        85: (0f) r1 += r4
        cannot add integer value with 3 upper zero bits to ptr_to_packet
      
      Thus, to get the above use cases working, evaluate_reg_imm_alu() has
      been extended to handle further ALU ops. This is fine, because we
      operate strictly within the realm of CONST_IMM types, so here we
      don't care about overflows: they would happen in the simulated as
      well as the real execution, and the interaction with pkt() in
      check_packet_ptr_add() checks the actual imm value once it is added
      to pkt(); before that, it's irrelevant.
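
      For illustration, a sketch of the kind of constant folding that
      evaluate_reg_imm_alu() can now perform when both operands are tracked
      constants (the opcode set and the standalone function shape are
      assumptions for this sketch, not the exact patch):

      /* fold dst op src while both stay CONST_IMM, so the result remains
       * a tracked constant instead of degrading to UNKNOWN_VALUE
       */
      static s64 const_imm_alu(u8 opcode, s64 dst_imm, s64 src_imm)
      {
      	switch (opcode) {
      	case BPF_ADD: return dst_imm + src_imm;
      	case BPF_SUB: return dst_imm - src_imm;
      	case BPF_AND: return dst_imm & src_imm;
      	case BPF_OR:  return dst_imm | src_imm;
      	case BPF_LSH: return dst_imm << src_imm;
      	case BPF_RSH: return (u64)dst_imm >> src_imm;
      	default:      return 0; /* caller marks the reg UNKNOWN_VALUE */
      	}
      }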
      
      With regards to 06c1c049 ("bpf: allow helpers access to variable
      memory"), which works on UNKNOWN_VALUE registers, the verifier now
      becomes a bit smarter as it can better resolve ALU ops, so we need
      to adapt two test cases there: min/max bound tracking only becomes
      necessary when registers were spilled to stack. So while a mask was
      set before to track the upper bound for the UNKNOWN_VALUE case, it's
      now resolved directly as CONST_IMM, and such constructs are only
      necessary when, e.g., registers are spilled.
      
      For commit 6b173873 ("bpf: recognize 64bit immediate loads as
      consts"), which initially enabled dw load tracking only for the nfp
      jit/analyzer, I did a couple of tests on large, complex programs and
      we don't increase complexity badly (my tests were in the ~3% range
      on average). I've added a couple of tests similar to the affected
      code above, and they work fine with the verifier now.
      Reported-by: William Tu <u9012063@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Gianluca Borello <g.borello@gmail.com>
      Cc: William Tu <u9012063@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 12 January 2017, 1 commit
  15. 10 January 2017, 3 commits
    • bpf: allow helpers access to variable memory · 06c1c049
      Committed by Gianluca Borello
      Currently, helpers that read and write from/to the stack can do so using
      a pair of arguments of type ARG_PTR_TO_STACK and ARG_CONST_STACK_SIZE.
      ARG_CONST_STACK_SIZE accepts a constant register of type CONST_IMM, so
      that the verifier can safely check the memory access. However, requiring
      the argument to be a constant can be limiting in some circumstances.
      
      Since the current logic keeps track of the minimum and maximum value of
      a register throughout the simulated execution, ARG_CONST_STACK_SIZE can
      be changed to also accept an UNKNOWN_VALUE register in case its
      boundaries have been set and the range doesn't cause invalid memory
      accesses.
      
      One common situation when this is useful:
      
      int len;
      char buf[BUFSIZE]; /* BUFSIZE is 128 */
      
      if (some_condition)
      	len = 42;
      else
      	len = 84;
      
      some_helper(..., buf, len & (BUFSIZE - 1));
      
      The compiler can often decide to assign the constant values 42 or 84
      to a variable on the stack, instead of keeping it in a register. When
      the variable is then read back from the stack into the register in
      order to be passed to the helper, the verifier will not be able to
      recognize the register as constant (the verifier is not currently
      tracking all constant writes into memory), and the program won't be
      valid.
      
      However, by allowing the helper to accept an UNKNOWN_VALUE register,
      this program will work because the bitwise AND operation will set the
      range of possible values for the UNKNOWN_VALUE register to [0, BUFSIZE),
      so the verifier can guarantee the helper call will be safe (assuming the
      argument is of type ARG_CONST_STACK_SIZE_OR_ZERO, otherwise one more
      check against 0 would be needed). Custom ranges can be set not only with
      ALU operations, but also by explicitly comparing the UNKNOWN_VALUE
      register with constants.
      
      Another very common example happens when intercepting system call
      arguments and accessing user-provided data of variable size using
      bpf_probe_read(). One can load at runtime the user-provided length in an
      UNKNOWN_VALUE register, and then read that exact amount of data up to a
      compile-time determined limit in order to fit into the proper local
      storage allocated on the stack, without having to guess a suboptimal
      access size at compile time.
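
      For instance, a sketch of such a kprobe program, assuming x86-64
      pt_regs field names and a size argument that accepts zero
      (ARG_CONST_STACK_SIZE_OR_ZERO, as discussed above; otherwise add an
      explicit check against 0):

      SEC("kprobe/sys_write")
      int bpf_sys_write(struct pt_regs *ctx)
      {
      	char buf[256];
      	/* user-provided length, only known at runtime: the register
      	 * holding it is tracked as UNKNOWN_VALUE
      	 */
      	int len = (int)ctx->dx;

      	/* bound it to [0, 255] so the verifier can prove the access */
      	len &= sizeof(buf) - 1;
      	bpf_probe_read(buf, len, (void *)ctx->si);
      	return 0;
      }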
      
      Also, in case the helpers accepting the UNKNOWN_VALUE register operate
      in raw mode, disable the raw mode so that the program is required to
      initialize all memory, since there is no guarantee the helper will fill
      it completely, which would leave the possibility of a data leak (only
      relevant when the memory used by the helper is the stack, not when
      using a pointer to a map element value or packet). In other words,
      ARG_PTR_TO_RAW_STACK will be treated as ARG_PTR_TO_STACK.
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: allow adjusted map element values to spill · f0318d01
      Committed by Gianluca Borello
      commit 48461135 ("bpf: allow access into map value arrays")
      introduces the ability to do pointer math inside a map element value via
      the PTR_TO_MAP_VALUE_ADJ register type.
      
      The current support doesn't handle the case where a PTR_TO_MAP_VALUE_ADJ
      is spilled into the stack, limiting several use cases, especially when
      generating bpf code from a compiler.
      
      Handle this case by explicitly enabling the register type
      PTR_TO_MAP_VALUE_ADJ to be spilled. Also, make sure that min_value and
      max_value are reset just for BPF_LDX operations that don't result in a
      restore of a spilled register from stack.
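
      A sketch of the restricted-C shape that ends up spilling such an
      adjusted pointer (map, struct and variable names are hypothetical;
      whether a spill actually happens depends on the compiler's register
      pressure):

      struct foo {
      	unsigned iter;
      	int array[16];
      };

      u32 key = 0;
      struct foo *p = bpf_map_lookup_elem(&my_map, &key);
      if (p) {
      	/* q becomes PTR_TO_MAP_VALUE_ADJ after the pointer math */
      	int *q = &p->array[p->iter & 15];
      	/* under register pressure, the compiler may spill q to the
      	 * stack here and reload it right before the store below
      	 */
      	*q = 42;
      }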
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: allow helpers access to map element values · 5722569b
      Committed by Gianluca Borello
      Enable helpers to directly access a map element value by passing a
      register type PTR_TO_MAP_VALUE (or PTR_TO_MAP_VALUE_ADJ) to helper
      arguments ARG_PTR_TO_STACK or ARG_PTR_TO_RAW_STACK.
      
      This enables several use cases. For example, a typical tracing program
      might want to capture pathnames passed to sys_open() with:
      
      struct trace_data {
      	char pathname[PATHLEN];
      };
      
      SEC("kprobe/sys_open")
      void bpf_sys_open(struct pt_regs *ctx)
      {
      	struct trace_data data;
      	bpf_probe_read(data.pathname, sizeof(data.pathname), ctx->di);
      
      	/* consume data.pathname, for example via
      	 * bpf_trace_printk() or bpf_perf_event_output()
      	 */
      }
      
      Such a program could easily hit the stack limit in case PATHLEN needs to
      be large or more local variables need to exist, both of which are quite
      common scenarios. Allowing direct helper access to map element values,
      one could do:
      
      struct bpf_map_def SEC("maps") scratch_map = {
      	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
      	.key_size = sizeof(u32),
      	.value_size = sizeof(struct trace_data),
      	.max_entries = 1,
      };
      
      SEC("kprobe/sys_open")
      void bpf_sys_open(struct pt_regs *ctx)
      {
      	int id = 0;
      	struct trace_data *p = bpf_map_lookup_elem(&scratch_map, &id);
      	if (!p)
      		return;
      	bpf_probe_read(p->pathname, sizeof(p->pathname), ctx->di);
      
      	/* consume p->pathname, for example via
      	 * bpf_trace_printk() or bpf_perf_event_output()
      	 */
      }
      
      And wouldn't risk exhausting the stack.
      
      Code changes are loosely modeled after commit 6841de8b ("bpf: allow
      helpers access the packet directly"). Unlike with PTR_TO_PACKET, these
      changes just work with ARG_PTR_TO_STACK and ARG_PTR_TO_RAW_STACK (not
      ARG_PTR_TO_MAP_KEY, ARG_PTR_TO_MAP_VALUE, ...): adding those would be
      trivial, but since there is not currently a use case for that, it's
      reasonable to limit the set of changes.
      
      Also, add new tests to make sure accesses to map element values from
      helpers never go out of boundary, even when adjusted.
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 17 December 2016, 2 commits
    • bpf, test_verifier: fix a test case error result on unprivileged · 0eb6984f
      Committed by Daniel Borkmann
      Running ./test_verifier as an unprivileged user makes 1 out of 98 tests fail:
      
        [...]
        #71 unpriv: check that printk is disallowed FAIL
        Unexpected error message!
        0: (7a) *(u64 *)(r10 -8) = 0
        1: (bf) r1 = r10
        2: (07) r1 += -8
        3: (b7) r2 = 8
        4: (bf) r3 = r1
        5: (85) call bpf_trace_printk#6
        unknown func bpf_trace_printk#6
        [...]
      
      The test case itself is correct; it's just that the expected error
      message changed with ebb676da ("bpf: Print function name in addition
      to function id"). This is the same as issue 2) in e00c7b21 ("bpf: fix
      multiple issues in selftest suite and samples"), so just fix up the
      function name.
      
      Fixes: ebb676da ("bpf: Print function name in addition to function id")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: fix regression on verifier pruning wrt map lookups · a08dd0da
      Committed by Daniel Borkmann
      Commit 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL
      registers") introduced a regression where existing programs stopped
      loading due to reaching the verifier's maximum complexity limit,
      whereas prior to this commit they were loading just fine; the affected
      program has roughly 2k instructions.
      
      What was found is that state pruning couldn't be performed effectively
      anymore due to mismatches of the verifier's register state, in
      particular in the id tracking. It doesn't mean that 57a09bf0 is
      incorrect per se, but rather that the verifier needs to perform a lot
      more work for the same program with regards to the involved map lookups.
      
      Since commit 57a09bf0 is only about tracking registers with type
      PTR_TO_MAP_VALUE_OR_NULL, the id is only needed to follow registers
      until they are promoted through pattern matching with a NULL check to
      either PTR_TO_MAP_VALUE or UNKNOWN_VALUE type. After that point, the
      id becomes irrelevant for the transitioned types.
      
      For UNKNOWN_VALUE, id is already reset to 0 via mark_reg_unknown_value(),
      but not so for PTR_TO_MAP_VALUE where id is becoming stale. It's even
      transferred further into other types that don't make use of it. Among
      others, one example is where UNKNOWN_VALUE is set on function call
      return with RET_INTEGER return type.
      
      states_equal() will then fall through to the memcmp() on register
      state; note that the second memcmp() uses offsetofend(), so the id is
      part of that since d2a4dd37 ("bpf: fix state equivalence"). But the
      bisect already pointed to 57a09bf0, where we really reach beyond the
      complexity limit. What I found was that states_equal() often failed in
      this case due to id mismatches in spilled regs holding registers of
      type PTR_TO_MAP_VALUE. Unlike non-spilled regs, spilled regs just
      perform a memcmp() on their reg state and don't have any other
      optimizations in place, therefore the id was also relevant in this
      case for making a pruning decision.
      
      We can safely reset id to 0 as well when converting to PTR_TO_MAP_VALUE.
      For the affected program, it resulted in a ~17 fold reduction of
      complexity and let the program load fine again. Selftest suite also
      runs fine. The only other place where env->id_gen is used currently is
      through direct packet access, but for these cases id is long living, thus
      a different scenario.
      
      Also, the current logic in mark_map_regs() is not fully correct when
      marking the NULL branch with UNKNOWN_VALUE. We need to cache the
      destination reg's id in any case. Otherwise, once we have marked that
      reg as UNKNOWN_VALUE, its id is reset, and any subsequent registers
      that hold the original id and are of type PTR_TO_MAP_VALUE_OR_NULL
      won't be marked as UNKNOWN_VALUE anymore, since mark_map_reg() reuses
      the uncached regs[regno].id that was just overridden. Note, we don't
      need to cache it outside of mark_map_regs(), since it's called once on
      this_branch and the other time on other_branch, which are two
      independent verifier states.
      A test case for this is added here, too.
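
      A sketch of the resulting helper shape, with the id reset on promotion
      and the caller caching the id before the marking loop (close to, but
      not necessarily identical with, the actual patch; spilled stack slots
      would need the same treatment):

      static void mark_map_reg(struct bpf_reg_state *regs, u32 regno,
      			 u32 id, bool is_null)
      {
      	struct bpf_reg_state *reg = &regs[regno];

      	if (reg->type == PTR_TO_MAP_VALUE_OR_NULL && reg->id == id) {
      		reg->type = is_null ? UNKNOWN_VALUE : PTR_TO_MAP_VALUE;
      		/* the id has served its purpose; a stale value would
      		 * defeat pruning decisions in states_equal()
      		 */
      		reg->id = 0;
      	}
      }

      static void mark_map_regs(struct bpf_verifier_state *state, u32 regno,
      			  bool is_null)
      {
      	u32 id = state->regs[regno].id;	/* cache before it gets reset */
      	u32 i;

      	for (i = 0; i < MAX_BPF_REG; i++)
      		mark_map_reg(state->regs, i, id, is_null);
      }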
      
      Fixes: 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 06 December 2016, 2 commits
  18. 01 December 2016, 1 commit
  19. 28 November 2016, 1 commit
    • bpf: fix multiple issues in selftest suite and samples · e00c7b21
      Committed by Daniel Borkmann
      1) test_lru_map and test_lru_dist fail to build on my machine since
         the sys/resource.h header is not included.
      
      2) test_verifier fails in one test case where we try to call an invalid
         function, since the verifier log output changed wrt printing function
         names.
      
      3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
         retrieving the number of possible CPUs. This is broken at least in our
         scenario and really just doesn't work.
      
         glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
         First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
         if that fails, depending on the config, it either tries to count CPUs
         in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
         If /proc/cpuinfo has some issue, it returns just 1 in the worst
         case. This oddity is nothing new [1], but the semantics/behaviour
         seem to be settled.
         _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
         that fails it looks into /proc/stat for cpuX entries, and if also that
         fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
         unlikely all breaks down).
      
         While that might match num_possible_cpus() from the kernel in some
         cases, it's really not guaranteed with CPU hotplugging, and can
         result in a buffer overflow since the array in user space could
         have too few slots; on a percpu map lookup, the kernel would then
         write beyond the memory of the value buffer.
      
         William Tu reported such mismatches:
      
           [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
           happens when CPU hotadd is enabled. For example, in Fusion when
           setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
           -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
           will be 2 [2]. [...]
      
         Documentation/cputopology.txt says /sys/devices/system/cpu/possible
         outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
         so the first step is to fix the _SC_NPROCESSORS_CONF calls with our
         own implementation. Later, we could add support to bpf(2) for passing
         a mask via CPU_SET(3), for example, to select just a subset of CPUs.
      
         BPF samples code needs this fix as well (at least so that people stop
         copying this). Thus, define bpf_num_possible_cpus() once in selftests
         and import it from there for the sample code to avoid duplicating it.
         The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.
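
         A sketch of what such a bpf_num_possible_cpus() can look like,
         parsing the range list (e.g. "0-3" or "0,2-3") from
         /sys/devices/system/cpu/possible (error handling simplified; the
         actual selftest helper may differ):

         #include <stdio.h>

         static unsigned int bpf_num_possible_cpus(void)
         {
         	static const char *fcpu = "/sys/devices/system/cpu/possible";
         	unsigned int start, end, possible = 0;
         	char buf[128];
         	FILE *fp = fopen(fcpu, "r");

         	if (!fp)
         		return 1; /* conservative fallback, assumption */

         	while (fgets(buf, sizeof(buf), fp)) {
         		char *n = buf;

         		while (*n) {
         			/* a token is either "a-b" or a single cpu */
         			if (sscanf(n, "%u-%u", &start, &end) == 2)
         				possible += end - start + 1;
         			else if (sscanf(n, "%u", &start) == 1)
         				possible += 1;
         			while (*n && *n != ',')
         				n++;
         			if (*n == ',')
         				n++;
         		}
         	}
         	fclose(fp);
         	return possible;
         }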
      
      After all three issues are fixed, the test suite runs fine again:
      
        # make run_tests | grep self
        selftests: test_verifier [PASS]
        selftests: test_maps [PASS]
        selftests: test_lru_map [PASS]
        selftests: test_kmod.sh [PASS]
      
        [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
        [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html
      
      Fixes: 3059303f ("samples/bpf: update tracex[23] examples to use per-cpu maps")
      Fixes: 86af8b41 ("Add sample for adding simple drop program to link")
      Fixes: df570f57 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
      Fixes: e1559671 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
      Fixes: ebb676da ("bpf: Print function name in addition to function id")
      Fixes: 5db58faf ("bpf: Add tests for the LRU bpf_htab")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: William Tu <u9012063@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 19 October 2016, 1 commit
    • bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers · 57a09bf0
      Committed by Thomas Graf
      A BPF program is required to check the return register of a
      map_elem_lookup() call before accessing memory. The verifier keeps
      track of this by converting the type of the result register from
      PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
      jump ensures safety. This check is currently exclusively performed
      for the result register 0.
      
      In the event the compiler reorders instructions, BPF_MOV64_REG
      instructions may be moved before the conditional jump which causes
      them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the
      verifier objects when the register is accessed:
      
      0: (b7) r1 = 10
      1: (7b) *(u64 *)(r10 -8) = r1
      2: (bf) r2 = r10
      3: (07) r2 += -8
      4: (18) r1 = 0x59c00000
      6: (85) call 1
      7: (bf) r4 = r0
      8: (15) if r0 == 0x0 goto pc+1
       R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
      9: (7a) *(u64 *)(r4 +0) = 0
      R4 invalid mem access 'map_value_or_null'
      
      This commit extends the verifier to keep track of all identical
      PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
      assigning them an ID and then marking them all when the conditional
      jump is observed.
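
      In restricted C, the reordered shape corresponds to something like the
      following sketch (map and variable names are hypothetical):

      u32 key = 0;
      long *v = bpf_map_lookup_elem(&map, &key);
      long *alias = v;	/* the BPF_MOV64_REG hoisted above the branch */

      if (v)
      	*alias = 0;	/* alias must be promoted to PTR_TO_MAP_VALUE too */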
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 18 October 2016, 2 commits
  22. 29 September 2016, 1 commit
    • bpf: allow access into map value arrays · 48461135
      Committed by Josef Bacik
      Suppose you have a map array value that is something like this
      
      struct foo {
      	unsigned iter;
      	int array[SOME_CONSTANT];
      };
      
      You can easily insert this into an array, but you cannot modify the
      contents of foo->array[] after the fact.  This is because we have no
      way to verify at verification time that we won't go off the end of
      the array.  This patch provides a start for this work.  We accomplish
      it by keeping track of the minimum and maximum value a register could
      hold while we're checking the code.  Then, at the time we try to
      access a MAP_VALUE, we verify that the maximum offset into that region
      is a valid access into that memory region.  So, in practice, code such
      as this
      
      unsigned index = 0;
      
      if (foo->iter >= SOME_CONSTANT)
      	foo->iter = index;
      else
      	index = foo->iter++;
      foo->array[index] = bar;
      
      would be allowed, as we can verify that index will always be between 0
      and SOME_CONSTANT-1.  If you wish to use signed values, you'll have to
      add an extra check to make sure the index isn't less than 0, or do
      something like index %= SOME_CONSTANT.
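
      For the signed case, a sketch of that extra check (reusing the names
      from the example above):

      int index = foo->iter;

      /* the bounds tracking must prove both index >= 0 and
       * index < SOME_CONSTANT before the array store is allowed
       */
      if (index < 0 || index >= SOME_CONSTANT)
      	return 0;
      foo->array[index] = bar;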
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  23. 21 September 2016, 1 commit
  24. 09 September 2016, 1 commit
    • bpf: fix range propagation on direct packet access · 2d2be8ca
      Committed by Daniel Borkmann
      LLVM can generate code that tests for direct packet access via
      skb->data/data_end in a way that currently gets rejected by the
      verifier, example:
      
        [...]
         7: (61) r3 = *(u32 *)(r6 +80)
         8: (61) r9 = *(u32 *)(r6 +76)
         9: (bf) r2 = r9
        10: (07) r2 += 54
        11: (3d) if r3 >= r2 goto pc+12
         R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
         R9=pkt(id=0,off=0,r=0) R10=fp
        12: (18) r4 = 0xffffff7a
        14: (05) goto pc+430
        [...]
      
        from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
                       R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
        24: (7b) *(u64 *)(r10 -40) = r1
        25: (b7) r1 = 0
        26: (63) *(u32 *)(r6 +56) = r1
        27: (b7) r2 = 40
        28: (71) r8 = *(u8 *)(r9 +20)
        invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)
      
      The reason why this gets rejected despite a proper test is that we
      currently call find_good_pkt_pointers() only in case where we detect
      tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
      derived, for example, from a register of type pkt(id=Y,off=0,r=0)
      pointing to skb->data. find_good_pkt_pointers() then fills the range
      in the current branch to pkt(id=Y,off=0,r=Z) on success.
      
      For above case, we need to extend that to recognize pkt_end >= rX
      pattern and mark the other branch that is taken on success with the
      appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
      Since eBPF only operates on BPF_JGT (>) and BPF_JGE (>=), these are
      the only two practical options LLVM could have generated for the test;
      there is no such thing as BPF_JLT (<) or BPF_JLE (<=) that we would
      need to take into account as well.
      
      After the fix:
      
        [...]
         7: (61) r3 = *(u32 *)(r6 +80)
         8: (61) r9 = *(u32 *)(r6 +76)
         9: (bf) r2 = r9
        10: (07) r2 += 54
        11: (3d) if r3 >= r2 goto pc+12
         R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
         R9=pkt(id=0,off=0,r=0) R10=fp
        12: (18) r4 = 0xffffff7a
        14: (05) goto pc+430
        [...]
      
        from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
                       R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
        24: (7b) *(u64 *)(r10 -40) = r1
        25: (b7) r1 = 0
        26: (63) *(u32 *)(r6 +56) = r1
        27: (b7) r2 = 40
        28: (71) r8 = *(u8 *)(r9 +20)
        29: (bf) r1 = r8
        30: (25) if r8 > 0x3c goto pc+47
         R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
         R9=pkt(id=0,off=0,r=54) R10=fp
        31: (b7) r1 = 1
        [...]
      
      Verifier test cases are also added in this work, one that demonstrates
      the mentioned example here and one that tries a bad packet access for
      the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
      pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
      with both test variants (>, >=).
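
      In restricted C, the now-accepted test shape corresponds to something
      like the following sketch (the usual __sk_buff data/data_end casts;
      offsets taken from the log above):

      void *data = (void *)(long)skb->data;
      void *data_end = (void *)(long)skb->data_end;
      __u8 byte = 0;

      /* LLVM may emit this as 'if pkt_end >= rX', i.e. the BPF_JGE
       * pattern with pkt_end on the left that the fix now recognizes
       * for range propagation via find_good_pkt_pointers()
       */
      if (data + 54 <= data_end)
      	byte = *((__u8 *)data + 20);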
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 13 August 2016, 1 commit
  26. 07 May 2016, 1 commit
  27. 15 April 2016, 1 commit
    • bpf, samples: add test cases for raw stack · 3f2050e2
      Committed by Daniel Borkmann
      This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
      verifier behaviour.
      
        [...]
        #84 raw_stack: no skb_load_bytes OK
        #85 raw_stack: skb_load_bytes, no init OK
        #86 raw_stack: skb_load_bytes, init OK
        #87 raw_stack: skb_load_bytes, spilled regs around bounds OK
        #88 raw_stack: skb_load_bytes, spilled regs corruption OK
        #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
        #90 raw_stack: skb_load_bytes, spilled regs + data OK
        #91 raw_stack: skb_load_bytes, invalid access 1 OK
        #92 raw_stack: skb_load_bytes, invalid access 2 OK
        #93 raw_stack: skb_load_bytes, invalid access 3 OK
        #94 raw_stack: skb_load_bytes, invalid access 4 OK
        #95 raw_stack: skb_load_bytes, invalid access 5 OK
        #96 raw_stack: skb_load_bytes, invalid access 6 OK
        #97 raw_stack: skb_load_bytes, large access OK
        Summary: 98 PASSED, 0 FAILED
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 09 March 2016, 1 commit
  29. 13 October 2015, 1 commit
    • bpf: add unprivileged bpf tests · bf508877
      Committed by Alexei Starovoitov
      Add new tests samples/bpf/test_verifier:
      
      unpriv: return pointer
        checks that pointer cannot be returned from the eBPF program
      
      unpriv: add const to pointer
      unpriv: add pointer to pointer
      unpriv: neg pointer
        checks that pointer arithmetic is disallowed
      
      unpriv: cmp pointer with const
      unpriv: cmp pointer with pointer
        checks that comparison of pointers is disallowed
        Only one case is allowed: 'void *value = bpf_map_lookup_elem(..); if (value == 0) ...'
      
      unpriv: check that printk is disallowed
        since bpf_trace_printk is not available to unprivileged
      
      unpriv: pass pointer to helper function
        checks that pointers cannot be passed to functions that expect integers
        If a function expects a pointer, the verifier allows only that type of
        pointer, e.g. the 1st argument of bpf_map_lookup_elem() must be a
        pointer to a map.
        (applies to non-root as well)
      
      unpriv: indirectly pass pointer on stack to helper function
        checks that pointer stored into stack cannot be used as part of key
        passed into bpf_map_lookup_elem()
      
      unpriv: mangle pointer on stack 1
      unpriv: mangle pointer on stack 2
        checks that writing into stack slot that already contains a pointer
        is disallowed
      
      unpriv: read pointer from stack in small chunks
        checks that < 8 byte read from stack slot that contains a pointer is
        disallowed
      
      unpriv: write pointer into ctx
        checks that storing pointers into skb->fields is disallowed
      
      unpriv: write pointer into map elem value
        checks that storing pointers into element values is disallowed
        For example:
        int bpf_prog(struct __sk_buff *skb)
        {
          u32 key = 0;
          u64 *value = bpf_map_lookup_elem(&map, &key);
          if (value)
             *value = (u64) skb;
          return 0;
        }
        will be rejected.
      
      unpriv: partial copy of pointer
        checks that doing 32-bit register mov from register containing
        a pointer is disallowed
      
      unpriv: pass pointer to tail_call
        checks that passing pointer as an index into bpf_tail_call
        is disallowed
      
      unpriv: cmp map pointer with zero
        checks that comparing map pointer with constant is disallowed
      
      unpriv: write into frame pointer
        checks that frame pointer is read-only (applies to root too)
      
      unpriv: cmp of frame pointer
        checks that R10 cannot be used in a comparison
      
      unpriv: cmp of stack pointer
        checks that Rx = R10 - imm is ok, but comparing Rx is not
      
      unpriv: obfuscate stack pointer
        checks that Rx = R10 - imm is ok, but Rx -= imm is not
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 27 July 2015, 1 commit
  31. 07 June 2015, 1 commit
    • bpf: allow programs to write to certain skb fields · d691f9e8
      Committed by Alexei Starovoitov
      Allow programs to read/write the skb->mark and tc_index fields, as
      well as ((struct qdisc_skb_cb *)cb)->data.
      
      mark and tc_index are generically useful in TC.
      cb[0]-cb[4] are primarily used to pass arguments from one
      program to another called via bpf_tail_call() which can
      be seen in sockex3_kern.c example.
      
      All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
      mark, tc_index are writeable from tc_cls_act only.
      cb[0]-cb[4] are writeable by both sockets and tc_cls_act.
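
      A sketch of a tc_cls_act program exercising the newly writable fields
      (map definition in the samples' style; names are hypothetical):

      struct bpf_map_def SEC("maps") jmp_table = {
      	.type = BPF_MAP_TYPE_PROG_ARRAY,
      	.key_size = sizeof(u32),
      	.value_size = sizeof(u32),
      	.max_entries = 8,
      };

      SEC("classifier")
      int cls_main(struct __sk_buff *skb)
      {
      	skb->mark = 0xcafe;	/* writable from tc_cls_act */
      	skb->tc_index = 1;	/* writable from tc_cls_act */
      	skb->cb[0] = 42;	/* argument for the tail-called program */

      	bpf_tail_call(skb, &jmp_table, 0);

      	/* only reached if the tail call fails */
      	return 0;
      }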
      
      Add verifier tests and improve sample code.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>