1. 09 Aug 2017, 1 commit
    • bpf/verifier: rework value tracking · f1174f77
      Authored by Edward Cree
      Unifies adjusted and unadjusted register value types (e.g. FRAME_POINTER is
       now just a PTR_TO_STACK with zero offset).
      Tracks value alignment by means of tracking known & unknown bits.  This
       also replaces the 'reg->imm' (leading zero bits) calculations for (what
       were) UNKNOWN_VALUEs.
      If pointer leaks are allowed, and adjust_ptr_min_max_vals returns -EACCES,
       treat the pointer as an unknown scalar and try again, because we might be
       able to conclude something about the result (e.g. pointer & 0x40 is either
       0 or 0x40).
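      A minimal userspace sketch of the known/unknown-bits idea (the
       representation below is illustrative, not the kernel's exact one):
       a value is a pair (value, mask) where mask bits are unknown, and
       ANDing a fully unknown value with 0x40 yields exactly the two
       possibilities noted above.

      	#include <stdint.h>
      	#include <stdio.h>

      	struct bits { uint64_t value, mask; };	/* mask bits are unknown */

      	/* AND: a result bit can only be 1 if it may be 1 on both sides. */
      	static struct bits bits_and(struct bits a, struct bits b)
      	{
      		uint64_t may_a = a.value | a.mask;
      		uint64_t may_b = b.value | b.mask;
      		uint64_t known1 = a.value & b.value;

      		return (struct bits){ known1, may_a & may_b & ~known1 };
      	}

      	int main(void)
      	{
      		struct bits unknown = { 0, ~0ULL };	/* pointer treated as unknown scalar */
      		struct bits c40 = { 0x40, 0 };		/* constant 0x40 */
      		struct bits r = bits_and(unknown, c40);

      		/* prints value=0 mask=0x40: the result is either 0 or 0x40 */
      		printf("value=%#llx mask=%#llx\n",
      		       (unsigned long long)r.value, (unsigned long long)r.mask);
      		return 0;
      	}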
      Verifier hooks in the netronome/nfp driver were changed to match the new
       data structures.
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Jul 2017, 1 commit
    • bpf: fix mixed signed/unsigned derived min/max value bounds · 4cabc5b1
      Authored by Daniel Borkmann
      Edward reported that there's an issue in min/max value bounds
      tracking when signed and unsigned compares both provide hints
      on limits for unknown variables. E.g. a program such as the
      following should have been rejected:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff8a94cda93400
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        11: (65) if r1 s> 0x1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=0,max_value=1 R1=inv,min_value=0,max_value=1
        R2=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      What happens is that in the first part ...
      
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = -1
        10: (2d) if r1 > r2 goto pc+3
      
      ... r1 carries an unsigned value, and is compared as unsigned
      against a register carrying an immediate. The verifier deduces in
      reg_set_min_max() that since the compare is unsigned and the
      operation is greater than (>), in the fall-through/false case r1's
      minimum bound must be 0 and its maximum bound must be r2's value.
      The latter exceeds the allowed range and thus the max value is
      reset back to being 'invalid', aka BPF_REGISTER_MAX_RANGE. Thus,
      r1's state is now 'R1=inv,min_value=0'. The subsequent test ...
      
        11: (65) if r1 s> 0x1 goto pc+2
      
      ... is a signed compare of r1 with immediate value 1. Here, the
      verifier deduces in reg_set_min_max() that since the compare
      is signed this time and the operation is greater than (>), in the
      fall-through/false case r1's maximum bound must be 1. Combined
      with the prior test, r1 ends up with the following state:
      R1=inv,min_value=0,max_value=1. Given that the value it actually
      holds is -8, the bounds are wrongly deduced. When this is added
      to r0, which holds the map_value(_adj) type, the subsequent store
      in the above case goes through check_mem_access(), which invokes
      check_map_access_adj(); that probes whether the map memory is in
      bounds based on min_value and max_value as well as the access
      size, since the actual unknown value is min_value <= x <= max_value;
      commit fce366a9 ("bpf, verifier: fix alu ops against map_value{,
      _adj} register types") provides some more explanation of the
      semantics.
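
      For illustration, a small userspace C program shows why the value
      -8 slips through both tests (this demonstrates the integer
      semantics only, it is not verifier code):

      	#include <stdint.h>
      	#include <stdio.h>

      	int main(void)
      	{
      		uint64_t r1 = (uint64_t)-8;	/* 0xfffffffffffffff8 */
      		uint64_t r2 = (uint64_t)-1;	/* 0xffffffffffffffff */

      		/* insn 10: unsigned r1 > r2 is false, so the fall-through
      		 * records min_value=0. */
      		printf("r1 >  r2 (u64): %d\n", r1 > r2);
      		/* insn 11: signed r1 s> 1 is also false, so the fall-through
      		 * records max_value=1 on top of that. */
      		printf("r1 s> 1  (s64): %d\n", (int64_t)r1 > 1);
      		return 0;
      	}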
      
      It's worth noting in this context that in the current code,
      min_value and max_value tracking is used for two things: i)
      dynamic map value access via check_map_access_adj(), and, since
      commit 06c1c049 ("bpf: allow helpers access to variable memory"),
      ii) enforcement at check_helper_mem_access() when passing a
      memory address (pointer to packet, map value, stack) and length
      pair to a helper, where the length is an unknown value defining
      an access range through min_value/max_value. The min_value/
      max_value tracking is /not/ used in the direct packet access
      case to track ranges. However, the issue also affects case ii);
      for example, the following crafted program based on the same
      principle must be rejected as well:
      
         0: (b7) r2 = 0
         1: (bf) r3 = r10
         2: (07) r3 += -512
         3: (7a) *(u64 *)(r10 -16) = -8
         4: (79) r4 = *(u64 *)(r10 -16)
         5: (b7) r6 = -1
         6: (2d) if r4 > r6 goto pc+5
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0 R6=imm-1,max_value=18446744073709551615,min_align=1 R10=fp
         7: (65) if r4 s> 0x1 goto pc+4
        R1=ctx R2=imm0,min_value=0,max_value=0,min_align=2147483648 R3=fp-512
        R4=inv,min_value=0,max_value=1 R6=imm-1,max_value=18446744073709551615,min_align=1
        R10=fp
         8: (07) r4 += 1
         9: (b7) r5 = 0
        10: (6a) *(u16 *)(r10 -512) = 0
        11: (85) call bpf_skb_load_bytes#26
        12: (b7) r0 = 0
        13: (95) exit
      
      Meaning, while we initialize the stack slot for the maximum access
      the verifier thinks we perform (the [1,2] range), in reality we
      pass -7 as the length, which is interpreted as u32 in the helper.
      Thus, this issue is relevant also for the case of helper ranges.
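
      For illustration, the length confusion in plain C (assuming, as
      stated above, that the helper takes its length argument as u32):

      	#include <stdint.h>
      	#include <stdio.h>

      	int main(void)
      	{
      		uint64_t r4 = (uint64_t)-8 + 1;	/* insn 8: r4 += 1, i.e. -7 */
      		uint32_t len = (uint32_t)r4;	/* what the helper sees */

      		printf("len = %u\n", len);	/* 4294967289, not within [1,2] */
      		return 0;
      	}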
      Resetting both bounds in check_reg_overflow() in case only one
      of them exceeds limits is also not enough, as a similar test can
      be created that uses values which are within range; here, too,
      the learned min value in r1 is incorrect when mixed with a later
      signed test used to create a range:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff880ad081fa00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+7
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+3
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (65) if r1 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        12: (0f) r0 += r1
        13: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=3,max_value=4
        R1=inv,min_value=3,max_value=4 R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        14: (b7) r0 = 0
        15: (95) exit
      
      This leaves us with two options for fixing this: i) invalidate
      all prior learned information once we switch signed context, or
      ii) track min/max signed and unsigned boundaries separately as
      done in [0]. (Given the latter introduces major changes throughout
      the whole verifier, it's rather net-next material, thus this
      patch follows option i), meaning we can derive bounds either
      from only signed tests or only unsigned tests.) There is still the
      case of adjust_reg_min_max_vals(), where we adjust bounds on ALU
      operations; programs like the following, where boundaries on the
      reg get mixed in context later on when bounds are merged on the
      dst reg, must be rejected, too:
      
         0: (7a) *(u64 *)(r10 -8) = 0
         1: (bf) r2 = r10
         2: (07) r2 += -8
         3: (18) r1 = 0xffff89b2bf87ce00
         5: (85) call bpf_map_lookup_elem#1
         6: (15) if r0 == 0x0 goto pc+6
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R10=fp
         7: (7a) *(u64 *)(r10 -16) = -8
         8: (79) r1 = *(u64 *)(r10 -16)
         9: (b7) r2 = 2
        10: (3d) if r2 >= r1 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R10=fp
        11: (b7) r7 = 1
        12: (65) if r7 s> 0x0 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,max_value=0 R10=fp
        13: (b7) r0 = 0
        14: (95) exit
      
        from 12 to 15: R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0
        R1=inv,min_value=3 R2=imm2,min_value=2,max_value=2,min_align=2 R7=imm1,min_value=1 R10=fp
        15: (0f) r7 += r1
        16: (65) if r7 s> 0x4 goto pc+2
        R0=map_value(ks=8,vs=8,id=0),min_value=0,max_value=0 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        17: (0f) r0 += r7
        18: (72) *(u8 *)(r0 +0) = 0
        R0=map_value_adj(ks=8,vs=8,id=0),min_value=4,max_value=4 R1=inv,min_value=3
        R2=imm2,min_value=2,max_value=2,min_align=2 R7=inv,min_value=4,max_value=4 R10=fp
        19: (b7) r0 = 0
        20: (95) exit
      
      Meaning, in adjust_reg_min_max_vals() we must also reset range
      values on the dst when src/dst registers have mixed signed/
      unsigned derived min/max value bounds with one unbounded value,
      as otherwise they can be added together, deducing false boundaries.
      Once both boundaries are established from either ALU ops or
      compare operations without mixing signed/unsigned insns, they
      can safely be added to other regs that also have both boundaries
      established. Adding a reg with one unbounded side to a map value
      whose bounded side has been learned without mixing ops is
      possible, but the resulting map value won't recover from that,
      meaning such an op is considered invalid at the time of actual
      access. Invalid bounds are set on the dst reg in case i) the src
      reg, or ii) the dst reg already had them. The only way to recover
      would be to perform i) ALU ops, but only 'add' is allowed on map
      value types, or ii) comparisons, but these are disallowed on
      pointers in case they span a range. This is fine, as only BPF_JEQ
      and BPF_JNE may be performed on PTR_TO_MAP_VALUE_OR_NULL registers,
      which potentially turn them into PTR_TO_MAP_VALUE type depending
      on the branch, so only here min/max values cannot be invalidated
      for them.
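
      A self-contained model of that policy (the field name echoes the
      patch's value_from_signed; the shape of the check is a sketch,
      not the kernel code):

      	#include <stdbool.h>
      	#include <stdint.h>

      	/* Reduced model of one register's learned range. */
      	struct range {
      		int64_t min, max;
      		bool from_signed;	/* bounds learned via signed compares? */
      		bool bounded;		/* both sides established? */
      	};

      	/* dst may keep a range across 'dst += src' only if both ranges
      	 * are fully established and learned with the same signedness;
      	 * otherwise the dst range must be reset to unbounded. */
      	static bool range_add_ok(const struct range *dst,
      				 const struct range *src)
      	{
      		return dst->bounded && src->bounded &&
      		       dst->from_signed == src->from_signed;
      	}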
      
      In terms of state pruning, value_from_signed is considered
      as well in states_equal() when dealing with adjusted map values.
      With regards to breaking existing programs, there is a small
      risk, but the use-cases where this could occur are rather narrow,
      and mixing compares in them is probably unlikely.
      
      Joint work with Josef and Edward.
      
        [0] https://lists.iovisor.org/pipermail/iovisor-dev/2017-June/000822.html
      
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Reported-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Edward Cree <ecree@solarflare.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 24 Jun 2017, 1 commit
    • bpf: possibly avoid extra masking for narrower load in verifier · 23994631
      Authored by Yonghong Song
      Commit 31fd8581 ("bpf: permits narrower load from bpf program
      context fields") permits narrower loads for certain ctx fields.
      That commit, however, generates a masking operation even when
      the prog-specific ctx conversion already produces the result
      with the narrower size.
      
      For example, for __sk_buff->protocol, the ctx conversion
      loads the data into a register with a 2-byte load.
      A narrower 2-byte load should not generate masking.
      For __sk_buff->vlan_present, the conversion function
      sets the result to either 0 or 1, essentially a byte.
      A narrower 2-byte or 1-byte load should not generate masking either.
      
      To avoid unnecessary masking, the prog-specific *_is_valid_access
      now passes converted_op_size back to the verifier, which indicates
      the valid data width after the anticipated conversion.
      Based on this information, the verifier is able to avoid
      unnecessary masking.
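
      A sketch of the resulting decision (names follow the commit
      message; the surrounding plumbing is an assumption):

      	#include <stdbool.h>

      	/* converted_op_size: width in bytes the prog-specific ctx
      	 * conversion actually loads; size: width the program asked
      	 * for. Mask only if the request is narrower than that. */
      	static bool needs_mask(int converted_op_size, int size)
      	{
      		return size < converted_op_size;
      	}

      	/* __sk_buff->protocol converts to a 2-byte load, so a 2-byte
      	 * read gives needs_mask(2, 2) == false (no AND emitted),
      	 * while a 1-byte read gives needs_mask(2, 1) == true. */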
      
      Since we want more information back from the prog-specific
      *_is_valid_access checks, the returned values are packed into
      one data structure for clarity.
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 15 Jun 2017, 1 commit
    • bpf: permits narrower load from bpf program context fields · 31fd8581
      Authored by Yonghong Song
      Currently, the verifier will reject a program if it contains a
      narrower load from the bpf context structure. For example,
              __u8 h = __sk_buff->hash, or
              __u16 p = __sk_buff->protocol, or
              __u32 sample_period = bpf_perf_event_data->sample_period
      which are narrower loads of 4-byte or 8-byte fields.
      
      This patch solves the issue by:
        . Introduce a new parameter ctx_field_size to carry the
          field size of a narrower load from the prog-type-specific
          *_is_valid_access validator back to the verifier.
        . A non-zero ctx_field_size for a memory access indicates
          (1) the underlying prog-type-specific convert_ctx_accesses
              supports non-whole-field access, and
          (2) the current insn is a narrower or whole-field access.
        . In the verifier, for such loads where the load memory size
          is less than ctx_field_size, the load is transformed into a
          full field load followed by proper masking (see the sketch
          after this list).
        . Currently, __sk_buff and bpf_perf_event_data->sample_period
          support narrower loads.
        . Narrower stores are still not allowed, as typical ctx stores
          are just normal stores.
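
      A userspace model of that transformation for '__u8 h =
      __sk_buff->hash' (the 'hash' field is 4 bytes wide; the offsets
      involved are illustrative):

      	#include <stdint.h>

      	/* The 1-byte load of the 4-byte field becomes a full 4-byte
      	 * load plus an AND mask down to the requested width. */
      	static uint8_t narrow_load_u8(const uint32_t *field)
      	{
      		uint32_t full = *field;	/* full field load */
      		return full & 0xff;	/* proper masking for a u8 access */
      	}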
      
      Because of this change, some tests in the verifier test suite
      will fail; these tests are removed. As a bonus, some out-of-bound
      __sk_buff->cb accesses are renamed to the proper field name, and
      two redundant "skb cb oob" tests are removed.
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 12 May 2017, 2 commits
    • bpf: Add strict alignment flag for BPF_PROG_LOAD. · e07b98d9
      Authored by David S. Miller
      Add a new field, "prog_flags", and an initial flag value
      BPF_F_STRICT_ALIGNMENT.
      
      When set, the verifier will enforce strict pointer alignment
      regardless of the setting of CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS.
      
      The verifier, in this mode, will also use a fixed value of "2" in
      place of NET_IP_ALIGN.
      
      This facilitates test cases that will exercise and validate this part
      of the verifier even when run on architectures where alignment doesn't
      matter.
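
      A hedged usage sketch of the new field via the raw syscall (the
      program setup around it is assumed to be done by the caller):

      	#include <stdint.h>
      	#include <unistd.h>
      	#include <sys/syscall.h>
      	#include <linux/bpf.h>

      	/* Load a program with strict alignment checking enabled;
      	 * 'insns' and 'insn_cnt' are prepared elsewhere. */
      	static int load_strict(const struct bpf_insn *insns,
      			       uint32_t insn_cnt)
      	{
      		union bpf_attr attr = {
      			.prog_type  = BPF_PROG_TYPE_SOCKET_FILTER,
      			.insn_cnt   = insn_cnt,
      			.insns      = (uint64_t)(unsigned long)insns,
      			.license    = (uint64_t)(unsigned long)"GPL",
      			.prog_flags = BPF_F_STRICT_ALIGNMENT,	/* the new flag */
      		};

      		return syscall(__NR_bpf, BPF_PROG_LOAD, &attr, sizeof(attr));
      	}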
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: Track alignment of register values in the verifier. · d1174416
      Authored by David S. Miller
      Currently if we add only constant values to pointers we can fully
      validate the alignment, and properly check if we need to reject the
      program on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS architectures.
      
      However, once an unknown value is introduced we only allow byte sized
      memory accesses which is too restrictive.
      
      Add logic to track the known minimum alignment of register values,
      and propagate this state into registers containing pointers.
      
      The most common paradigm that makes use of this new logic is computing
      the transport header using the IP header length field.  For example:
      
      	struct ethhdr *ep = skb->data;
      	struct iphdr *iph = (struct iphdr *) (ep + 1);
      	struct tcphdr *th;
       ...
      	n = iph->ihl;
      	th = ((void *)iph + (n * 4));
      	port = th->dest;
      
      The existing code will reject the load of th->dest because it cannot
      validate that the alignment is at least 2 once "n * 4" is added to
      the packet pointer.
      
      In the new code, the register holding "n * 4" will have a reg->min_align
      value of 4, because any value multiplied by 4 will be at least 4 byte
      aligned.  (actually, the eBPF code emitted by the compiler in this case
      is most likely to use a shift left by 2, but the end result is identical)
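
      A self-contained sketch of that propagation rule (the helper name
      is hypothetical): multiplying by a constant multiplies the
      guaranteed alignment by the constant's largest power-of-two factor.

      	#include <stdint.h>

      	/* Minimum alignment of (x * c) when x is known a-aligned:
      	 * (c & -c) is c's lowest set bit, i.e. its largest
      	 * power-of-two factor, so c = 4 yields 4*a-alignment. */
      	static uint32_t min_align_mul(uint32_t a, uint32_t c)
      	{
      		return a * (c & -c);
      	}

      	/* min_align_mul(1, 4) == 4: any n * 4 is at least 4-byte
      	 * aligned, matching reg->min_align = 4 above. */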
      
      At the critical addition:
      
      	th = ((void *)iph + (n * 4));
      
      The register holding 'th' will start with reg->off value of 14.  The
      pointer addition will transform that reg into something that looks like:
      
      	reg->aux_off = 14
      	reg->aux_off_align = 4
      
      Next, the verifier will look at the th->dest load, and it will see
      a load offset of 2, and first check:
      
      	if (reg->aux_off_align % size)
      
      which will pass because aux_off_align is 4.  reg_off will be computed:
      
      	reg_off = reg->off;
       ...
      		reg_off += reg->aux_off;
      
      plus we have off==2, and it will thus check:
      
      	if ((NET_IP_ALIGN + reg_off + off) % size != 0)
      
      which evaluates to:
      
      	if ((NET_IP_ALIGN + 14 + 2) % size != 0)
      
      On strict alignment architectures, NET_IP_ALIGN is 2, thus:
      
      	if ((2 + 14 + 2) % size != 0)
      
      which passes.
      
      These pointer transformations and checks work regardless of whether
      the constant offset or the variable with known alignment is added
      first to the pointer register.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
  6. 17 Mar 2017, 1 commit
  7. 09 Dec 2016, 1 commit
  8. 17 Nov 2016, 1 commit
    • bpf: fix range arithmetic for bpf map access · f23cc643
      Authored by Josef Bacik
      I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
      invalid accesses to bpf map entries.  Fix this up by doing a few things:
      
      1) Kill BPF_MOD support.  This doesn't actually get used by the compiler in real
      life and just adds extra complexity.
      
      2) Fix the logic for BPF_AND: don't allow AND of negative numbers, and set the
      minimum value to 0 for positive ANDs (see the sketch below).
      
      3) Don't do operations on the ranges if they are set to the limits, as they are
      by definition undefined, and allowing arithmetic operations on those values
      could make them appear valid when they really aren't.
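
      A self-contained sketch of item 2's rule (the kernel marks the
      unknown case with its BPF_REGISTER_MIN/MAX_RANGE sentinels; plain
      int64 limits stand in for them here):

      	#include <stdint.h>

      	/* Bounds of 'dst &= src' when src is known to lie in
      	 * [min_val, max_val]. */
      	static void bound_and(int64_t *dst_min, int64_t *dst_max,
      			      int64_t min_val, int64_t max_val)
      	{
      		if (min_val < 0) {
      			*dst_min = INT64_MIN;	/* negative AND: no usable bounds */
      			*dst_max = INT64_MAX;
      		} else {
      			*dst_min = 0;		/* x & m is non-negative for m >= 0 */
      			*dst_max = max_val;	/* and its bits are a subset of m's */
      		}
      	}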
      
      This fixes the testcase provided by Jann as well as a few other theoretical
      problems.
      Reported-by: Jann Horn <jannh@google.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 19 Oct 2016, 1 commit
    • bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers · 57a09bf0
      Authored by Thomas Graf
      A BPF program is required to check the return register of a
      map_elem_lookup() call before accessing memory. The verifier keeps
      track of this by converting the type of the result register from
      PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
      jump ensures safety. This check is currently exclusively performed
      for the result register 0.
      
      In the event the compiler reorders instructions, BPF_MOV64_REG
      instructions may be moved before the conditional jump which causes
      them to keep their type PTR_TO_MAP_VALUE_OR_NULL to which the
      verifier objects when the register is accessed:
      
      0: (b7) r1 = 10
      1: (7b) *(u64 *)(r10 -8) = r1
      2: (bf) r2 = r10
      3: (07) r2 += -8
      4: (18) r1 = 0x59c00000
      6: (85) call 1
      7: (bf) r4 = r0
      8: (15) if r0 == 0x0 goto pc+1
       R0=map_value(ks=8,vs=8) R4=map_value_or_null(ks=8,vs=8) R10=fp
      9: (7a) *(u64 *)(r4 +0) = 0
      R4 invalid mem access 'map_value_or_null'
      
      This commit extends the verifier to keep track of all identical
      PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
      assigning them an ID and then marking them all when the conditional
      jump is observed.
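
      A reduced, self-contained model of that marking step (types and
      names are simplified from the kernel's; the walk itself is what
      the commit describes):

      	#include <stdint.h>

      	enum reg_type {
      		UNKNOWN_VALUE,
      		PTR_TO_MAP_VALUE,
      		PTR_TO_MAP_VALUE_OR_NULL,
      	};

      	struct reg_state {
      		enum reg_type type;
      		uint32_t id;	/* shared by all copies of one lookup result */
      	};

      	#define MAX_BPF_REG 11

      	/* After 'if rX == 0', convert every register that carries the
      	 * same id as rX: PTR_TO_MAP_VALUE on the non-NULL branch,
      	 * UNKNOWN_VALUE on the NULL branch. */
      	static void mark_map_regs(struct reg_state *regs, int regno,
      				  enum reg_type new_type)
      	{
      		uint32_t id = regs[regno].id;

      		for (int i = 0; i < MAX_BPF_REG; i++)
      			if (regs[i].type == PTR_TO_MAP_VALUE_OR_NULL &&
      			    regs[i].id == id)
      				regs[i].type = new_type;
      	}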
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Reviewed-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 29 Sep 2016, 1 commit
    • bpf: allow access into map value arrays · 48461135
      Authored by Josef Bacik
      Suppose you have a map array value that is something like this
      
      struct foo {
      	unsigned iter;
      	int array[SOME_CONSTANT];
      };
      
      You can easily insert this into an array, but you cannot modify the contents of
      foo->array[] after the fact.  This is because we have no way to verify we won't
      go off the end of the array at verification time.  This patch provides a start
      for this work.  We accomplish this by keeping track of a minimum and maximum
      value a register could be while we're checking the code.  Then at the time we
      try to do an access into a MAP_VALUE we verify that the maximum offset into that
      region is a valid access into that memory region.  So in practice, code such as
      this
      
      unsigned index = 0;
      
      if (foo->iter >= SOME_CONSTANT)
      	foo->iter = index;
      else
      	index = foo->iter++;
      foo->array[index] = bar;
      
      would be allowed, as we can verify that index will always be between 0 and
      SOME_CONSTANT-1.  If you wish to use signed values you'll have to have an extra
      check to make sure the index isn't less than 0, or do something like index %=
      SOME_CONSTANT.
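
      A self-contained sketch of the access check this enables (names
      approximate the idea, not the exact kernel helpers): at the time
      of a MAP_VALUE access, both extremes of the tracked range must
      stay inside the value region.

      	#include <stdint.h>

      	/* reg_min/reg_max: tracked register range; off/size: the
      	 * access; value_size: size of the map value region. */
      	static int check_map_access_range(int64_t reg_min, int64_t reg_max,
      					  int off, int size, int value_size)
      	{
      		if (reg_min + off < 0)
      			return -1;	/* could underflow the value region */
      		if (reg_max + off + size > value_size)
      			return -1;	/* could overrun the value region */
      		return 0;
      	}

      	/* For foo->array[index] with index in [0, SOME_CONSTANT-1],
      	 * the maximum offset stays within sizeof(struct foo). */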
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 22 Sep 2016, 2 commits