1. 01 12月, 2016 1 次提交
  2. 17 11月, 2016 1 次提交
    • J
      bpf: fix range arithmetic for bpf map access · f23cc643
      Josef Bacik 提交于
      I made some invalid assumptions with BPF_AND and BPF_MOD that could result in
      invalid accesses to bpf map entries.  Fix this up by doing a few things
      
      1) Kill BPF_MOD support.  This doesn't actually get used by the compiler in real
      life and just adds extra complexity.
      
      2) Fix the logic for BPF_AND, don't allow AND of negative numbers and set the
      minimum value to 0 for positive AND's.
      
      3) Don't do operations on the ranges if they are set to the limits, as they are
      by definition undefined, and allowing arithmetic operations on those values
      could make them appear valid when they really aren't.
      
      This fixes the testcase provided by Jann as well as a few other theoretical
      problems.
      Reported-by: NJann Horn <jannh@google.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f23cc643
  3. 29 9月, 2016 1 次提交
    • J
      bpf: allow access into map value arrays · 48461135
      Josef Bacik 提交于
      Suppose you have a map array value that is something like this
      
      struct foo {
      	unsigned iter;
      	int array[SOME_CONSTANT];
      };
      
      You can easily insert this into an array, but you cannot modify the contents of
      foo->array[] after the fact.  This is because we have no way to verify we won't
      go off the end of the array at verification time.  This patch provides a start
      for this work.  We accomplish this by keeping track of a minimum and maximum
      value a register could be while we're checking the code.  Then at the time we
      try to do an access into a MAP_VALUE we verify that the maximum offset into that
      region is a valid access into that memory region.  So in practice, code such as
      this
      
      unsigned index = 0;
      
      if (foo->iter >= SOME_CONSTANT)
      	foo->iter = index;
      else
      	index = foo->iter++;
      foo->array[index] = bar;
      
      would be allowed, as we can verify that index will always be between 0 and
      SOME_CONSTANT-1.  If you wish to use signed values you'll have to have an extra
      check to make sure the index isn't less than 0, or do something like index %=
      SOME_CONSTANT.
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48461135
  4. 27 9月, 2016 1 次提交
    • M
      bpf: Set register type according to is_valid_access() · 1955351d
      Mickaël Salaün 提交于
      This prevent future potential pointer leaks when an unprivileged eBPF
      program will read a pointer value from its context. Even if
      is_valid_access() returns a pointer type, the eBPF verifier replace it
      with UNKNOWN_VALUE. The register value that contains a kernel address is
      then allowed to leak. Moreover, this fix allows unprivileged eBPF
      programs to use functions with (legitimate) pointer arguments.
      
      Not an issue currently since reg_type is only set for PTR_TO_PACKET or
      PTR_TO_PACKET_END in XDP and TC programs that can only be loaded as
      privileged. For now, the only unprivileged eBPF program allowed is for
      socket filtering and all the types from its context are UNKNOWN_VALUE.
      However, this fix is important for future unprivileged eBPF programs
      which could use pointers in their context.
      Signed-off-by: NMickaël Salaün <mic@digikod.net>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1955351d
  5. 22 9月, 2016 4 次提交
  6. 21 9月, 2016 2 次提交
    • D
      bpf: direct packet write and access for helpers for clsact progs · 36bbef52
      Daniel Borkmann 提交于
      This work implements direct packet access for helpers and direct packet
      write in a similar fashion as already available for XDP types via commits
      4acf6c0b ("bpf: enable direct packet data write for xdp progs") and
      6841de8b ("bpf: allow helpers access the packet directly"), and as a
      complementary feature to the already available direct packet read for tc
      (cls/act) programs.
      
      For enabling this, we need to introduce two helpers, bpf_skb_pull_data()
      and bpf_csum_update(). The first is generally needed for both, read and
      write, because they would otherwise only be limited to the current linear
      skb head. Usually, when the data_end test fails, programs just bail out,
      or, in the direct read case, use bpf_skb_load_bytes() as an alternative
      to overcome this limitation. If such data sits in non-linear parts, we
      can just pull them in once with the new helper, retest and eventually
      access them.
      
      At the same time, this also makes sure the skb is uncloned, which is, of
      course, a necessary condition for direct write. As this needs to be an
      invariant for the write part only, the verifier detects writes and adds
      a prologue that is calling bpf_skb_pull_data() to effectively unclone the
      skb from the very beginning in case it is indeed cloned. The heuristic
      makes use of a similar trick that was done in 233577a2 ("net: filter:
      constify detection of pkt_type_offset"). This comes at zero cost for other
      programs that do not use the direct write feature. Should a program use
      this feature only sparsely and has read access for the most parts with,
      for example, drop return codes, then such write action can be delegated
      to a tail called program for mitigating this cost of potential uncloning
      to a late point in time where it would have been paid similarly with the
      bpf_skb_store_bytes() as well. Advantage of direct write is that the
      writes are inlined whereas the helper cannot make any length assumptions
      and thus needs to generate a call to memcpy() also for small sizes, as well
      as cost of helper call itself with sanity checks are avoided. Plus, when
      direct read is already used, we don't need to cache or perform rechecks
      on the data boundaries (due to verifier invalidating previous checks for
      helpers that change skb->data), so more complex programs using rewrites
      can benefit from switching to direct read plus write.
      
      For direct packet access to helpers, we save the otherwise needed copy into
      a temp struct sitting on stack memory when use-case allows. Both facilities
      are enabled via may_access_direct_pkt_data() in verifier. For now, we limit
      this to map helpers and csum_diff, and can successively enable other helpers
      where we find it makes sense. Helpers that definitely cannot be allowed for
      this are those part of bpf_helper_changes_skb_data() since they can change
      underlying data, and those that write into memory as this could happen for
      packet typed args when still cloned. bpf_csum_update() helper accommodates
      for the fact that we need to fixup checksum_complete when using direct write
      instead of bpf_skb_store_bytes(), meaning the programs can use available
      helpers like bpf_csum_diff(), and implement csum_add(), csum_sub(),
      csum_block_add(), csum_block_sub() equivalents in eBPF together with the
      new helper. A usage example will be provided for iproute2's examples/bpf/
      directory.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      36bbef52
    • D
      bpf, verifier: enforce larger zero range for pkt on overloading stack buffs · b399cf64
      Daniel Borkmann 提交于
      Current contract for the following two helper argument types is:
      
        * ARG_CONST_STACK_SIZE: passed argument pair must be (ptr, >0).
        * ARG_CONST_STACK_SIZE_OR_ZERO: passed argument pair can be either
          (NULL, 0) or (ptr, >0).
      
      With 6841de8b ("bpf: allow helpers access the packet directly"), we can
      pass also raw packet data to helpers, so depending on the argument type
      being PTR_TO_PACKET, we now either assert memory via check_packet_access()
      or check_stack_boundary(). As a result, the tests in check_packet_access()
      currently allow more than intended with regards to reg->imm.
      
      Back in 969bf05e ("bpf: direct packet access"), check_packet_access()
      was fine to ignore size argument since in check_mem_access() size was
      bpf_size_to_bytes() derived and prior to the call to check_packet_access()
      guaranteed to be larger than zero.
      
      However, for the above two argument types, it currently means, we can have
      a <= 0 size and thus breaking current guarantees for helpers. Enforce a
      check for size <= 0 and bail out if so.
      
      check_stack_boundary() doesn't have such an issue since it already tests
      for access_size <= 0 and bails out, resp. access_size == 0 in case of NULL
      pointer passed when allowed.
      
      Fixes: 6841de8b ("bpf: allow helpers access the packet directly")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b399cf64
  7. 09 9月, 2016 1 次提交
    • D
      bpf: fix range propagation on direct packet access · 2d2be8ca
      Daniel Borkmann 提交于
      LLVM can generate code that tests for direct packet access via
      skb->data/data_end in a way that currently gets rejected by the
      verifier, example:
      
        [...]
         7: (61) r3 = *(u32 *)(r6 +80)
         8: (61) r9 = *(u32 *)(r6 +76)
         9: (bf) r2 = r9
        10: (07) r2 += 54
        11: (3d) if r3 >= r2 goto pc+12
         R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
         R9=pkt(id=0,off=0,r=0) R10=fp
        12: (18) r4 = 0xffffff7a
        14: (05) goto pc+430
        [...]
      
        from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv
                       R6=ctx R9=pkt(id=0,off=0,r=0) R10=fp
        24: (7b) *(u64 *)(r10 -40) = r1
        25: (b7) r1 = 0
        26: (63) *(u32 *)(r6 +56) = r1
        27: (b7) r2 = 40
        28: (71) r8 = *(u8 *)(r9 +20)
        invalid access to packet, off=20 size=1, R9(id=0,off=0,r=0)
      
      The reason why this gets rejected despite a proper test is that we
      currently call find_good_pkt_pointers() only in case where we detect
      tests like rX > pkt_end, where rX is of type pkt(id=Y,off=Z,r=0) and
      derived, for example, from a register of type pkt(id=Y,off=0,r=0)
      pointing to skb->data. find_good_pkt_pointers() then fills the range
      in the current branch to pkt(id=Y,off=0,r=Z) on success.
      
      For above case, we need to extend that to recognize pkt_end >= rX
      pattern and mark the other branch that is taken on success with the
      appropriate pkt(id=Y,off=0,r=Z) type via find_good_pkt_pointers().
      Since eBPF operates on BPF_JGT (>) and BPF_JGE (>=), these are the
      only two practical options to test for from what LLVM could have
      generated, since there's no such thing as BPF_JLT (<) or BPF_JLE (<=)
      that we would need to take into account as well.
      
      After the fix:
      
        [...]
         7: (61) r3 = *(u32 *)(r6 +80)
         8: (61) r9 = *(u32 *)(r6 +76)
         9: (bf) r2 = r9
        10: (07) r2 += 54
        11: (3d) if r3 >= r2 goto pc+12
         R1=inv R2=pkt(id=0,off=54,r=0) R3=pkt_end R4=inv R6=ctx
         R9=pkt(id=0,off=0,r=0) R10=fp
        12: (18) r4 = 0xffffff7a
        14: (05) goto pc+430
        [...]
      
        from 11 to 24: R1=inv R2=pkt(id=0,off=54,r=54) R3=pkt_end R4=inv
                       R6=ctx R9=pkt(id=0,off=0,r=54) R10=fp
        24: (7b) *(u64 *)(r10 -40) = r1
        25: (b7) r1 = 0
        26: (63) *(u32 *)(r6 +56) = r1
        27: (b7) r2 = 40
        28: (71) r8 = *(u8 *)(r9 +20)
        29: (bf) r1 = r8
        30: (25) if r8 > 0x3c goto pc+47
         R1=inv56 R2=imm40 R3=pkt_end R4=inv R6=ctx R8=inv56
         R9=pkt(id=0,off=0,r=54) R10=fp
        31: (b7) r1 = 1
        [...]
      
      Verifier test cases are also added in this work, one that demonstrates
      the mentioned example here and one that tries a bad packet access for
      the current/fall-through branch (the one with types pkt(id=X,off=Y,r=0),
      pkt(id=X,off=0,r=0)), then a case with good and bad accesses, and two
      with both test variants (>, >=).
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d2be8ca
  8. 03 9月, 2016 2 次提交
  9. 13 8月, 2016 3 次提交
  10. 04 8月, 2016 1 次提交
    • J
      bpf: fix method of PTR_TO_PACKET reg id generation · 1f415a74
      Jakub Kicinski 提交于
      Using per-register incrementing ID can lead to
      find_good_pkt_pointers() confusing registers which
      have completely different values.  Consider example:
      
      0: (bf) r6 = r1
      1: (61) r8 = *(u32 *)(r6 +76)
      2: (61) r0 = *(u32 *)(r6 +80)
      3: (bf) r7 = r8
      4: (07) r8 += 32
      5: (2d) if r8 > r0 goto pc+9
       R0=pkt_end R1=ctx R6=ctx R7=pkt(id=0,off=0,r=32) R8=pkt(id=0,off=32,r=32) R10=fp
      6: (bf) r8 = r7
      7: (bf) r9 = r7
      8: (71) r1 = *(u8 *)(r7 +0)
      9: (0f) r8 += r1
      10: (71) r1 = *(u8 *)(r7 +1)
      11: (0f) r9 += r1
      12: (07) r8 += 32
      13: (2d) if r8 > r0 goto pc+1
       R0=pkt_end R1=inv56 R6=ctx R7=pkt(id=0,off=0,r=32) R8=pkt(id=1,off=32,r=32) R9=pkt(id=1,off=0,r=32) R10=fp
      14: (71) r1 = *(u8 *)(r9 +16)
      15: (b7) r7 = 0
      16: (bf) r0 = r7
      17: (95) exit
      
      We need to get a UNKNOWN_VALUE with imm to force id
      generation so lines 0-5 make r7 a valid packet pointer.
      We then read two different bytes from the packet and
      add them to copies of the constructed packet pointer.
      r8 (line 9) and r9 (line 11) will get the same id of 1,
      independently.  When either of them is validated (line
      13) - find_good_pkt_pointers() will also mark the other
      as safe.  This leads to access on line 14 being mistakenly
      considered safe.
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f415a74
  11. 20 7月, 2016 2 次提交
  12. 02 7月, 2016 2 次提交
  13. 16 6月, 2016 1 次提交
  14. 21 5月, 2016 2 次提交
    • A
      bpf: teach verifier to recognize imm += ptr pattern · 1b9b69ec
      Alexei Starovoitov 提交于
      Humans don't write C code like:
        u8 *ptr = skb->data;
        int imm = 4;
        imm += ptr;
      but from llvm backend point of view 'imm' and 'ptr' are registers and
      imm += ptr may be preferred vs ptr += imm depending which register value
      will be used further in the code, while verifier can only recognize ptr += imm.
      That caused small unrelated changes in the C code of the bpf program to
      trigger rejection by the verifier. Therefore teach the verifier to recognize
      both ptr += imm and imm += ptr.
      For example:
      when R6=pkt(id=0,off=0,r=62) R7=imm22
      after r7 += r6 instruction
      will be R6=pkt(id=0,off=0,r=62) R7=pkt(id=0,off=22,r=62)
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b9b69ec
    • A
      bpf: support decreasing order in direct packet access · d91b28ed
      Alexei Starovoitov 提交于
      when packet headers are accessed in 'decreasing' order (like TCP port
      may be fetched before the program reads IP src) the llvm may generate
      the following code:
      [...]                // R7=pkt(id=0,off=22,r=70)
      r2 = *(u32 *)(r7 +0) // good access
      [...]
      r7 += 40             // R7=pkt(id=0,off=62,r=70)
      r8 = *(u32 *)(r7 +0) // good access
      [...]
      r1 = *(u32 *)(r7 -20) // this one will fail though it's within a safe range
                            // it's doing *(u32*)(skb->data + 42)
      Fix verifier to recognize such code pattern
      
      Alos turned out that 'off > range' condition is not a verifier bug.
      It's a buggy program that may do something like:
      if (ptr + 50 > data_end)
        return 0;
      ptr += 60;
      *(u32*)ptr;
      in such case emit
      "invalid access to packet, off=0 size=4, R1(id=0,off=60,r=50)" error message,
      so all information is available for the program author to fix the program.
      
      Fixes: 969bf05e ("bpf: direct packet access")
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d91b28ed
  15. 17 5月, 2016 1 次提交
  16. 07 5月, 2016 3 次提交
    • A
      bpf: improve verifier state equivalence · 735b4333
      Alexei Starovoitov 提交于
      since UNKNOWN_VALUE type is weaker than CONST_IMM we can un-teach
      verifier its recognition of constants in conditional branches
      without affecting safety.
      Ex:
      if (reg == 123) {
        .. here verifier was marking reg->type as CONST_IMM
           instead keep reg as UNKNOWN_VALUE
      }
      
      Two verifier states with UNKNOWN_VALUE are equivalent, whereas
      CONST_IMM_X != CONST_IMM_Y, since CONST_IMM is used for stack range
      verification and other cases.
      So help search pruning by marking registers as UNKNOWN_VALUE
      where possible instead of CONST_IMM.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      735b4333
    • A
      bpf: direct packet access · 969bf05e
      Alexei Starovoitov 提交于
      Extended BPF carried over two instructions from classic to access
      packet data: LD_ABS and LD_IND. They're highly optimized in JITs,
      but due to their design they have to do length check for every access.
      When BPF is processing 20M packets per second single LD_ABS after JIT
      is consuming 3% cpu. Hence the need to optimize it further by amortizing
      the cost of 'off < skb_headlen' over multiple packet accesses.
      One option is to introduce two new eBPF instructions LD_ABS_DW and LD_IND_DW
      with similar usage as skb_header_pointer().
      The kernel part for interpreter and x64 JIT was implemented in [1], but such
      new insns behave like old ld_abs and abort the program with 'return 0' if
      access is beyond linear data. Such hidden control flow is hard to workaround
      plus changing JITs and rolling out new llvm is incovenient.
      
      Therefore allow cls_bpf/act_bpf program access skb->data directly:
      int bpf_prog(struct __sk_buff *skb)
      {
        struct iphdr *ip;
      
        if (skb->data + sizeof(struct iphdr) + ETH_HLEN > skb->data_end)
            /* packet too small */
            return 0;
      
        ip = skb->data + ETH_HLEN;
      
        /* access IP header fields with direct loads */
        if (ip->version != 4 || ip->saddr == 0x7f000001)
            return 1;
        [...]
      }
      
      This solution avoids introduction of new instructions. llvm stays
      the same and all JITs stay the same, but verifier has to work extra hard
      to prove safety of the above program.
      
      For XDP the direct store instructions can be allowed as well.
      
      The skb->data is NET_IP_ALIGNED, so for common cases the verifier can check
      the alignment. The complex packet parsers where packet pointer is adjusted
      incrementally cannot be tracked for alignment, so allow byte access in such cases
      and misaligned access on architectures that define efficient_unaligned_access
      
      [1] https://git.kernel.org/cgit/linux/kernel/git/ast/bpf.git/?h=ld_abs_dwSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      969bf05e
    • A
      bpf: cleanup verifier code · 1a0dc1ac
      Alexei Starovoitov 提交于
      cleanup verifier code and prepare it for addition of "pointer to packet" logic
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1a0dc1ac
  17. 29 4月, 2016 2 次提交
  18. 27 4月, 2016 1 次提交
  19. 15 4月, 2016 2 次提交
    • D
      bpf, verifier: add ARG_PTR_TO_RAW_STACK type · 435faee1
      Daniel Borkmann 提交于
      When passing buffers from eBPF stack space into a helper function, we have
      ARG_PTR_TO_STACK argument type for helpers available. The verifier makes sure
      that such buffers are initialized, within boundaries, etc.
      
      However, the downside with this is that we have a couple of helper functions
      such as bpf_skb_load_bytes() that fill out the passed buffer in the expected
      success case anyway, so zero initializing them prior to the helper call is
      unneeded/wasted instructions in the eBPF program that can be avoided.
      
      Therefore, add a new helper function argument type called ARG_PTR_TO_RAW_STACK.
      The idea is to skip the STACK_MISC check in check_stack_boundary() and color
      the related stack slots as STACK_MISC after we checked all call arguments.
      
      Helper functions using ARG_PTR_TO_RAW_STACK must make sure that every path of
      the helper function will fill the provided buffer area, so that we cannot leak
      any uninitialized stack memory. This f.e. means that error paths need to
      memset() the buffers, but the expected fast-path doesn't have to do this
      anymore.
      
      Since there's no such helper needing more than at most one ARG_PTR_TO_RAW_STACK
      argument, we can keep it simple and don't need to check for multiple areas.
      Should in future such a use-case really appear, we have check_raw_mode() that
      will make sure we implement support for it first.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      435faee1
    • D
      bpf, verifier: add bpf_call_arg_meta for passing meta data · 33ff9823
      Daniel Borkmann 提交于
      Currently, when the verifier checks calls in check_call() function, we
      call check_func_arg() for all 5 arguments e.g. to make sure expected types
      are correct. In some cases, we collect meta data (here: map pointer) to
      perform additional checks such as checking stack boundary on key/value
      sizes for subsequent arguments. As we're going to extend the meta data,
      add a generic struct bpf_call_arg_meta that we can use for passing into
      check_func_arg().
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      33ff9823
  20. 14 4月, 2016 1 次提交
  21. 11 4月, 2016 1 次提交
  22. 09 4月, 2016 1 次提交
    • D
      bpf, verifier: further improve search pruning · 07016151
      Daniel Borkmann 提交于
      The verifier needs to go through every path of the program in
      order to check that it terminates safely, which can be quite a
      lot of instructions that need to be processed f.e. in cases with
      more branchy programs. With search pruning from f1bca824 ("bpf:
      add search pruning optimization to verifier") the search space can
      already be reduced significantly when the verifier detects that
      a previously walked path with same register and stack contents
      terminated already (see verifier's states_equal()), so the search
      can skip walking those states.
      
      When working with larger programs of > ~2000 (out of max 4096)
      insns, we found that the current limit of 32k instructions is easily
      hit. For example, a case we ran into is that the search space cannot
      be pruned due to branches at the beginning of the program that make
      use of certain stack space slots (STACK_MISC), which are never used
      in the remaining program (STACK_INVALID). Therefore, the verifier
      needs to walk paths for the slots in STACK_INVALID state, but also
      all remaining paths with a stack structure, where the slots are in
      STACK_MISC, which can nearly double the search space needed. After
      various experiments, we find that a limit of 64k processed insns is
      a more reasonable choice when dealing with larger programs in practice.
      This still allows to reject extreme crafted cases that can have a
      much higher complexity (f.e. > ~300k) within the 4096 insns limit
      due to search pruning not being able to take effect.
      
      Furthermore, we found that a lot of states can be pruned after a
      call instruction, f.e. we were able to reduce the search state by
      ~35% in some cases with this heuristic, trade-off is to keep a bit
      more states in env->explored_states. Usually, call instructions
      have a number of preceding register assignments and/or stack stores,
      where search pruning has a better chance to suceed in states_equal()
      test. The current code marks the branch targets with STATE_LIST_MARK
      in case of conditional jumps, and the next (t + 1) instruction in
      case of unconditional jump so that f.e. a backjump will walk it. We
      also did experiments with using t + insns[t].off + 1 as a marker in
      the unconditionally jump case instead of t + 1 with the rationale
      that these two branches of execution that converge after the label
      might have more potential of pruning. We found that it was a bit
      better, but not necessarily significantly better than the current
      state, perhaps also due to clang not generating back jumps often.
      Hence, we left that as is for now.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07016151
  23. 08 4月, 2016 1 次提交
  24. 22 2月, 2016 1 次提交
    • D
      bpf: add new arg_type that allows for 0 sized stack buffer · 8e2fe1d9
      Daniel Borkmann 提交于
      Currently, when we pass a buffer from the eBPF stack into a helper
      function, the function proto indicates argument types as ARG_PTR_TO_STACK
      and ARG_CONST_STACK_SIZE pair. If R<X> contains the former, then R<X+1>
      must be of the latter type. Then, verifier checks whether the buffer
      points into eBPF stack, is initialized, etc. The verifier also guarantees
      that the constant value passed in R<X+1> is greater than 0, so helper
      functions don't need to test for it and can always assume a non-NULL
      initialized buffer as well as non-0 buffer size.
      
      This patch adds a new argument types ARG_CONST_STACK_SIZE_OR_ZERO that
      allows to also pass NULL as R<X> and 0 as R<X+1> into the helper function.
      Such helper functions, of course, need to be able to handle these cases
      internally then. Verifier guarantees that either R<X> == NULL && R<X+1> == 0
      or R<X> != NULL && R<X+1> != 0 (like the case of ARG_CONST_STACK_SIZE), any
      other combinations are not possible to load.
      
      I went through various options of extending the verifier, and introducing
      the type ARG_CONST_STACK_SIZE_OR_ZERO seems to have most minimal changes
      needed to the verifier.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e2fe1d9
  25. 20 2月, 2016 1 次提交
    • A
      bpf: introduce BPF_MAP_TYPE_STACK_TRACE · d5a3b1f6
      Alexei Starovoitov 提交于
      add new map type to store stack traces and corresponding helper
      bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id
      @ctx: struct pt_regs*
      @map: pointer to stack_trace map
      @flags: bits 0-7 - numer of stack frames to skip
              bit 8 - collect user stack instead of kernel
              bit 9 - compare stacks by hash only
              bit 10 - if two different stacks hash into the same stackid
                       discard old
              other bits - reserved
      Return: >= 0 stackid on success or negative error
      
      stackid is a 32-bit integer handle that can be further combined with
      other data (including other stackid) and used as a key into maps.
      
      Userspace will access stackmap using standard lookup/delete syscall commands to
      retrieve full stack trace for given stackid.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d5a3b1f6
  26. 11 2月, 2016 1 次提交
    • D
      bpf: fix branch offset adjustment on backjumps after patching ctx expansion · a1b14d27
      Daniel Borkmann 提交于
      When ctx access is used, the kernel often needs to expand/rewrite
      instructions, so after that patching, branch offsets have to be
      adjusted for both forward and backward jumps in the new eBPF program,
      but for backward jumps it fails to account the delta. Meaning, for
      example, if the expansion happens exactly on the insn that sits at
      the jump target, it doesn't fix up the back jump offset.
      
      Analysis on what the check in adjust_branches() is currently doing:
      
        /* adjust offset of jmps if necessary */
        if (i < pos && i + insn->off + 1 > pos)
          insn->off += delta;
        else if (i > pos && i + insn->off + 1 < pos)
          insn->off -= delta;
      
      First condition (forward jumps):
      
        Before:                         After:
      
        insns[0]                        insns[0]
        insns[1] <--- i/insn            insns[1] <--- i/insn
        insns[2] <--- pos               insns[P] <--- pos
        insns[3]                        insns[P]  `------| delta
        insns[4] <--- target_X          insns[P]   `-----|
        insns[5]                        insns[3]
                                        insns[4] <--- target_X
                                        insns[5]
      
      First case is if we cross pos-boundary and the jump instruction was
      before pos. This is handeled correctly. I.e. if i == pos, then this
      would mean our jump that we currently check was the patchlet itself
      that we just injected. Since such patchlets are self-contained and
      have no awareness of any insns before or after the patched one, the
      delta is correctly not adjusted. Also, for the second condition in
      case of i + insn->off + 1 == pos, means we jump to that newly patched
      instruction, so no offset adjustment are needed. That part is correct.
      
      Second condition (backward jumps):
      
        Before:                         After:
      
        insns[0]                        insns[0]
        insns[1] <--- target_X          insns[1] <--- target_X
        insns[2] <--- pos <-- target_Y  insns[P] <--- pos <-- target_Y
        insns[3]                        insns[P]  `------| delta
        insns[4] <--- i/insn            insns[P]   `-----|
        insns[5]                        insns[3]
                                        insns[4] <--- i/insn
                                        insns[5]
      
      Second interesting case is where we cross pos-boundary and the jump
      instruction was after pos. Backward jump with i == pos would be
      impossible and pose a bug somewhere in the patchlet, so the first
      condition checking i > pos is okay only by itself. However, i +
      insn->off + 1 < pos does not always work as intended to trigger the
      adjustment. It works when jump targets would be far off where the
      delta wouldn't matter. But, for example, where the fixed insn->off
      before pointed to pos (target_Y), it now points to pos + delta, so
      that additional room needs to be taken into account for the check.
      This means that i) both tests here need to be adjusted into pos + delta,
      and ii) for the second condition, the test needs to be <= as pos
      itself can be a target in the backjump, too.
      
      Fixes: 9bac3d6d ("bpf: allow extended BPF programs access skb fields")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a1b14d27