1. 24 12月, 2017 1 次提交
    • G
      bpf: fix stacksafe exploration when comparing states · fd05e57b
      Gianluca Borello 提交于
      Commit cc2b14d5 ("bpf: teach verifier to recognize zero initialized
      stack") introduced a very relaxed check when comparing stacks of different
      states, effectively returning a positive result in many cases where it
      shouldn't.
      
      This can create problems in cases such as this following C pseudocode:
      
      long var;
      long *x = bpf_map_lookup(...);
      if (!x)
              return;
      
      if (*x != 0xbeef)
              var = 0;
      else
              var = 1;
      
      /* This is the key part, calling a helper causes an explored state
       * to be saved with the information that "var" is on the stack as
       * STACK_ZERO, since the helper is first met by the verifier after
       * the "var = 0" assignment. This state will however be wrongly used
       * also for the "var = 1" case, so the verifier assumes "var" is always
       * 0 and will replace the NULL assignment with nops, because the
       * search pruning prevents it from exploring the faulty branch.
       */
      bpf_ktime_get_ns();
      
      if (var)
              *(long *)0 = 0xbeef;
      
      Fix the issue by making sure that the stack is fully explored before
      returning a positive comparison result.
      
      Also attach a couple tests that highlight the bad behavior. In the first
      test, without this fix instructions 16 and 17 are replaced with nops
      instead of being rejected by the verifier.
      
      The second test, instead, allows a program to make a potentially illegal
      read from the stack.
      
      Fixes: cc2b14d5 ("bpf: teach verifier to recognize zero initialized stack")
      Signed-off-by: NGianluca Borello <g.borello@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      fd05e57b
  2. 21 12月, 2017 2 次提交
    • D
      bpf: allow for correlation of maps and helpers in dump · 7105e828
      Daniel Borkmann 提交于
      Currently a dump of an xlated prog (post verifier stage) doesn't
      correlate used helpers as well as maps. The prog info lists
      involved map ids, however there's no correlation of where in the
      program they are used as of today. Likewise, bpftool does not
      correlate helper calls with the target functions.
      
      The latter can be done w/o any kernel changes through kallsyms,
      and also has the advantage that this works with inlined helpers
      and BPF calls.
      
      Example, via interpreter:
      
        # tc filter show dev foo ingress
        filter protocol all pref 49152 bpf chain 0
        filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \
                            direct-action not_in_hw id 1 tag c74773051b364165   <-- prog id:1
      
        * Output before patch (calls/maps remain unclear):
      
        # bpftool prog dump xlated id 1             <-- dump prog id:1
         0: (b7) r1 = 2
         1: (63) *(u32 *)(r10 -4) = r1
         2: (bf) r2 = r10
         3: (07) r2 += -4
         4: (18) r1 = 0xffff95c47a8d4800
         6: (85) call unknown#73040
         7: (15) if r0 == 0x0 goto pc+18
         8: (bf) r2 = r10
         9: (07) r2 += -4
        10: (bf) r1 = r0
        11: (85) call unknown#73040
        12: (15) if r0 == 0x0 goto pc+23
        [...]
      
        * Output after patch:
      
        # bpftool prog dump xlated id 1
         0: (b7) r1 = 2
         1: (63) *(u32 *)(r10 -4) = r1
         2: (bf) r2 = r10
         3: (07) r2 += -4
         4: (18) r1 = map[id:2]                     <-- map id:2
         6: (85) call bpf_map_lookup_elem#73424     <-- helper call
         7: (15) if r0 == 0x0 goto pc+18
         8: (bf) r2 = r10
         9: (07) r2 += -4
        10: (bf) r1 = r0
        11: (85) call bpf_map_lookup_elem#73424
        12: (15) if r0 == 0x0 goto pc+23
        [...]
      
        # bpftool map show id 2                     <-- show/dump/etc map id:2
        2: hash_of_maps  flags 0x0
              key 4B  value 4B  max_entries 3  memlock 4096B
      
      Example, JITed, same prog:
      
        # tc filter show dev foo ingress
        filter protocol all pref 49152 bpf chain 0
        filter protocol all pref 49152 bpf chain 0 handle 0x1 foo.o:[ingress] \
                        direct-action not_in_hw id 3 tag c74773051b364165 jited
      
        # bpftool prog show id 3
        3: sched_cls  tag c74773051b364165
              loaded_at Dec 19/13:48  uid 0
              xlated 384B  jited 257B  memlock 4096B  map_ids 2
      
        # bpftool prog dump xlated id 3
         0: (b7) r1 = 2
         1: (63) *(u32 *)(r10 -4) = r1
         2: (bf) r2 = r10
         3: (07) r2 += -4
         4: (18) r1 = map[id:2]                      <-- map id:2
         6: (85) call __htab_map_lookup_elem#77408   <-+ inlined rewrite
         7: (15) if r0 == 0x0 goto pc+2                |
         8: (07) r0 += 56                              |
         9: (79) r0 = *(u64 *)(r0 +0)                <-+
        10: (15) if r0 == 0x0 goto pc+24
        11: (bf) r2 = r10
        12: (07) r2 += -4
        [...]
      
      Example, same prog, but kallsyms disabled (in that case we are
      also not allowed to pass any relative offsets, etc, so prog
      becomes pointer sanitized on dump):
      
        # sysctl kernel.kptr_restrict=2
        kernel.kptr_restrict = 2
      
        # bpftool prog dump xlated id 3
         0: (b7) r1 = 2
         1: (63) *(u32 *)(r10 -4) = r1
         2: (bf) r2 = r10
         3: (07) r2 += -4
         4: (18) r1 = map[id:2]
         6: (85) call bpf_unspec#0
         7: (15) if r0 == 0x0 goto pc+2
        [...]
      
      Example, BPF calls via interpreter:
      
        # bpftool prog dump xlated id 1
         0: (85) call pc+2#__bpf_prog_run_args32
         1: (b7) r0 = 1
         2: (95) exit
         3: (b7) r0 = 2
         4: (95) exit
      
      Example, BPF calls via JIT:
      
        # sysctl net.core.bpf_jit_enable=1
        net.core.bpf_jit_enable = 1
        # sysctl net.core.bpf_jit_kallsyms=1
        net.core.bpf_jit_kallsyms = 1
      
        # bpftool prog dump xlated id 1
         0: (85) call pc+2#bpf_prog_3b185187f1855c4c_F
         1: (b7) r0 = 1
         2: (95) exit
         3: (b7) r0 = 2
         4: (95) exit
      
      And finally, an example for tail calls that is now working
      as well wrt correlation:
      
        # bpftool prog dump xlated id 2
        [...]
        10: (b7) r2 = 8
        11: (85) call bpf_trace_printk#-41312
        12: (bf) r1 = r6
        13: (18) r2 = map[id:1]
        15: (b7) r3 = 0
        16: (85) call bpf_tail_call#12
        17: (b7) r1 = 42
        18: (6b) *(u16 *)(r6 +46) = r1
        19: (b7) r0 = 0
        20: (95) exit
      
        # bpftool map show id 1
        1: prog_array  flags 0x0
              key 4B  value 4B  max_entries 1  memlock 4096B
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      7105e828
    • D
      bpf: fix kallsyms handling for subprogs · 4f74d809
      Daniel Borkmann 提交于
      Right now kallsyms handling is not working with JITed subprogs.
      The reason is that when in 1c2a088a ("bpf: x64: add JIT support
      for multi-function programs") in jit_subprogs() they are passed
      to bpf_prog_kallsyms_add(), then their prog type is 0, which BPF
      core will think it's a cBPF program as only cBPF programs have a
      0 type. Thus, they need to inherit the type from the main prog.
      
      Once that is fixed, they are indeed added to the BPF kallsyms
      infra, but their tag is 0. Therefore, since intention is to add
      them as bpf_prog_F_<tag>, we need to pass them to bpf_prog_calc_tag()
      first. And once this is resolved, there is a use-after-free on
      prog cleanup: we remove the kallsyms entry from the main prog,
      later walk all subprogs and call bpf_jit_free() on them. However,
      the kallsyms linkage was never released on them. Thus, do that
      for all subprogs right in __bpf_prog_put() when refcount hits 0.
      
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      4f74d809
  3. 19 12月, 2017 2 次提交
  4. 18 12月, 2017 6 次提交
    • A
      bpf: x64: add JIT support for multi-function programs · 1c2a088a
      Alexei Starovoitov 提交于
      Typical JIT does several passes over bpf instructions to
      compute total size and relative offsets of jumps and calls.
      With multitple bpf functions calling each other all relative calls
      will have invalid offsets intially therefore we need to additional
      last pass over the program to emit calls with correct offsets.
      For example in case of three bpf functions:
      main:
        call foo
        call bpf_map_lookup
        exit
      foo:
        call bar
        exit
      bar:
        exit
      
      We will call bpf_int_jit_compile() indepedently for main(), foo() and bar()
      x64 JIT typically does 4-5 passes to converge.
      After these initial passes the image for these 3 functions
      will be good except call targets, since start addresses of
      foo() and bar() are unknown when we were JITing main()
      (note that call bpf_map_lookup will be resolved properly
      during initial passes).
      Once start addresses of 3 functions are known we patch
      call_insn->imm to point to right functions and call
      bpf_int_jit_compile() again which needs only one pass.
      Additional safety checks are done to make sure this
      last pass doesn't produce image that is larger or smaller
      than previous pass.
      
      When constant blinding is on it's applied to all functions
      at the first pass, since doing it once again at the last
      pass can change size of the JITed code.
      
      Tested on x64 and arm64 hw with JIT on/off, blinding on/off.
      x64 jits bpf-to-bpf calls correctly while arm64 falls back to interpreter.
      All other JITs that support normal BPF_CALL will behave the same way
      since bpf-to-bpf call is equivalent to bpf-to-kernel call from
      JITs point of view.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1c2a088a
    • A
      bpf: fix net.core.bpf_jit_enable race · 60b58afc
      Alexei Starovoitov 提交于
      global bpf_jit_enable variable is tested multiple times in JITs,
      blinding and verifier core. The malicious root can try to toggle
      it while loading the programs. This race condition was accounted
      for and there should be no issues, but it's safer to avoid
      this race condition.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      60b58afc
    • A
      bpf: add support for bpf_call to interpreter · 1ea47e01
      Alexei Starovoitov 提交于
      though bpf_call is still the same call instruction and
      calling convention 'bpf to bpf' and 'bpf to helper' is the same
      the interpreter has to oparate on 'struct bpf_insn *'.
      To distinguish these two cases add a kernel internal opcode and
      mark call insns with it.
      This opcode is seen by interpreter only. JITs will never see it.
      Also add tiny bit of debug code to aid interpreter debugging.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1ea47e01
    • A
      bpf: teach verifier to recognize zero initialized stack · cc2b14d5
      Alexei Starovoitov 提交于
      programs with function calls are often passing various
      pointers via stack. When all calls are inlined llvm
      flattens stack accesses and optimizes away extra branches.
      When functions are not inlined it becomes the job of
      the verifier to recognize zero initialized stack to avoid
      exploring paths that program will not take.
      The following program would fail otherwise:
      
      ptr = &buffer_on_stack;
      *ptr = 0;
      ...
      func_call(.., ptr, ...) {
        if (..)
          *ptr = bpf_map_lookup();
      }
      ...
      if (*ptr != 0) {
        // Access (*ptr)->field is valid.
        // Without stack_zero tracking such (*ptr)->field access
        // will be rejected
      }
      
      since stack slots are no longer uniform invalid | spill | misc
      add liveness marking to all slots, but do it in 8 byte chunks.
      So if nothing was read or written in [fp-16, fp-9] range
      it will be marked as LIVE_NONE.
      If any byte in that range was read, it will be marked LIVE_READ
      and stacksafe() check will perform byte-by-byte verification.
      If all bytes in the range were written the slot will be
      marked as LIVE_WRITTEN.
      This significantly speeds up state equality comparison
      and reduces total number of states processed.
      
                          before   after
      bpf_lb-DLB_L3.o       2051    2003
      bpf_lb-DLB_L4.o       3287    3164
      bpf_lb-DUNKNOWN.o     1080    1080
      bpf_lxc-DDROP_ALL.o   24980   12361
      bpf_lxc-DUNKNOWN.o    34308   16605
      bpf_netdev.o          15404   10962
      bpf_overlay.o         7191    6679
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      cc2b14d5
    • A
      bpf: introduce function calls (verification) · f4d7e40a
      Alexei Starovoitov 提交于
      Allow arbitrary function calls from bpf function to another bpf function.
      
      To recognize such set of bpf functions the verifier does:
      1. runs control flow analysis to detect function boundaries
      2. proceeds with verification of all functions starting from main(root) function
      It recognizes that the stack of the caller can be accessed by the callee
      (if the caller passed a pointer to its stack to the callee) and the callee
      can store map_value and other pointers into the stack of the caller.
      3. keeps track of the stack_depth of each function to make sure that total
      stack depth is still less than 512 bytes
      4. disallows pointers to the callee stack to be stored into the caller stack,
      since they will be invalid as soon as the callee returns
      5. to reuse all of the existing state_pruning logic each function call
      is considered to be independent call from the verifier point of view.
      The verifier pretends to inline all function calls it sees are being called.
      It stores the callsite instruction index as part of the state to make sure
      that two calls to the same callee from two different places in the caller
      will be different from state pruning point of view
      6. more safety checks are added to liveness analysis
      
      Implementation details:
      . struct bpf_verifier_state is now consists of all stack frames that
        led to this function
      . struct bpf_func_state represent one stack frame. It consists of
        registers in the given frame and its stack
      . propagate_liveness() logic had a premature optimization where
        mark_reg_read() and mark_stack_slot_read() were manually inlined
        with loop iterating over parents for each register or stack slot.
        Undo this optimization to reuse more complex mark_*_read() logic
      . skip_callee() logic is not necessary from safety point of view,
        but without it mark_*_read() markings become too conservative,
        since after returning from the funciton call a read of r6-r9
        will incorrectly propagate the read marks into callee causing
        inefficient pruning later
      . mark_*_read() logic is now aware of control flow which makes it
        more complex. In the future the plan is to rewrite liveness
        to be hierarchical. So that liveness can be done within
        basic block only and control flow will be responsible for
        propagation of liveness information along cfg and between calls.
      . tail_calls and ld_abs insns are not allowed in the programs with
        bpf-to-bpf calls
      . returning stack pointers to the caller or storing them into stack
        frame of the caller is not allowed
      
      Testing:
      . no difference in cilium processed_insn numbers
      . large number of tests follows in next patches
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      f4d7e40a
    • A
      bpf: introduce function calls (function boundaries) · cc8b0b92
      Alexei Starovoitov 提交于
      Allow arbitrary function calls from bpf function to another bpf function.
      
      Since the beginning of bpf all bpf programs were represented as a single function
      and program authors were forced to use always_inline for all functions
      in their C code. That was causing llvm to unnecessary inflate the code size
      and forcing developers to move code to header files with little code reuse.
      
      With a bit of additional complexity teach verifier to recognize
      arbitrary function calls from one bpf function to another as long as
      all of functions are presented to the verifier as a single bpf program.
      New program layout:
      r6 = r1    // some code
      ..
      r1 = ..    // arg1
      r2 = ..    // arg2
      call pc+1  // function call pc-relative
      exit
      .. = r1    // access arg1
      .. = r2    // access arg2
      ..
      call pc+20 // second level of function call
      ...
      
      It allows for better optimized code and finally allows to introduce
      the core bpf libraries that can be reused in different projects,
      since programs are no longer limited by single elf file.
      With function calls bpf can be compiled into multiple .o files.
      
      This patch is the first step. It detects programs that contain
      multiple functions and checks that calls between them are valid.
      It splits the sequence of bpf instructions (one program) into a set
      of bpf functions that call each other. Calls to only known
      functions are allowed. In the future the verifier may allow
      calls to unresolved functions and will do dynamic linking.
      This logic supports statically linked bpf functions only.
      
      Such function boundary detection could have been done as part of
      control flow graph building in check_cfg(), but it's cleaner to
      separate function boundary detection vs control flow checks within
      a subprogram (function) into logically indepedent steps.
      Follow up patches may split check_cfg() further, but not check_subprogs().
      
      Only allow bpf-to-bpf calls for root only and for non-hw-offloaded programs.
      These restrictions can be relaxed in the future.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      cc8b0b92
  5. 13 12月, 2017 1 次提交
  6. 01 12月, 2017 6 次提交
  7. 23 11月, 2017 2 次提交
    • A
      bpf: fix branch pruning logic · c131187d
      Alexei Starovoitov 提交于
      when the verifier detects that register contains a runtime constant
      and it's compared with another constant it will prune exploration
      of the branch that is guaranteed not to be taken at runtime.
      This is all correct, but malicious program may be constructed
      in such a way that it always has a constant comparison and
      the other branch is never taken under any conditions.
      In this case such path through the program will not be explored
      by the verifier. It won't be taken at run-time either, but since
      all instructions are JITed the malicious program may cause JITs
      to complain about using reserved fields, etc.
      To fix the issue we have to track the instructions explored by
      the verifier and sanitize instructions that are dead at run time
      with NOPs. We cannot reject such dead code, since llvm generates
      it for valid C code, since it doesn't do as much data flow
      analysis as the verifier does.
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      c131187d
    • G
      bpf: introduce ARG_PTR_TO_MEM_OR_NULL · db1ac496
      Gianluca Borello 提交于
      With the current ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM semantics, an helper
      argument can be NULL when the next argument type is ARG_CONST_SIZE_OR_ZERO
      and the verifier can prove the value of this next argument is 0. However,
      most helpers are just interested in handling <!NULL, 0>, so forcing them to
      deal with <NULL, 0> makes the implementation of those helpers more
      complicated for no apparent benefits, requiring them to explicitly handle
      those corner cases with checks that bpf programs could start relying upon,
      preventing the possibility of removing them later.
      
      Solve this by making ARG_PTR_TO_MEM/ARG_PTR_TO_UNINIT_MEM never accept NULL
      even when ARG_CONST_SIZE_OR_ZERO is set, and introduce a new argument type
      ARG_PTR_TO_MEM_OR_NULL to explicitly deal with the NULL case.
      
      Currently, the only helper that needs this is bpf_csum_diff_proto(), so
      change arg1 and arg3 to this new type as well.
      
      Also add a new battery of tests that explicitly test the
      !ARG_PTR_TO_MEM_OR_NULL combination: all the current ones testing the
      various <NULL, 0> variations are focused on bpf_csum_diff, so cover also
      other helpers.
      Signed-off-by: NGianluca Borello <g.borello@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      db1ac496
  8. 14 11月, 2017 1 次提交
    • Y
      bpf: improve verifier ARG_CONST_SIZE_OR_ZERO semantics · 9fd29c08
      Yonghong Song 提交于
      For helpers, the argument type ARG_CONST_SIZE_OR_ZERO permits the
      access size to be 0 when accessing the previous argument (arg).
      Right now, it requires the arg needs to be NULL when size passed
      is 0 or could be 0. It also requires a non-NULL arg when the size
      is proved to be non-0.
      
      This patch changes verifier ARG_CONST_SIZE_OR_ZERO behavior
      such that for size-0 or possible size-0, it is not required
      the arg equal to NULL.
      
      There are a couple of reasons for this semantics change, and
      all of them intends to simplify user bpf programs which
      may improve user experience and/or increase chances of
      verifier acceptance. Together with the next patch which
      changes bpf_probe_read arg2 type from ARG_CONST_SIZE to
      ARG_CONST_SIZE_OR_ZERO, the following two examples, which
      fail the verifier currently, are able to get verifier acceptance.
      
      Example 1:
         unsigned long len = pend - pstart;
         len = len > MAX_PAYLOAD_LEN ? MAX_PAYLOAD_LEN : len;
         len &= MAX_PAYLOAD_LEN;
         bpf_probe_read(data->payload, len, pstart);
      
      It does not have test for "len > 0" and it failed the verifier.
      Users may not be aware that they have to add this test.
      Converting the bpf_probe_read helper to have
      ARG_CONST_SIZE_OR_ZERO helps the above code get
      verifier acceptance.
      
      Example 2:
        Here is one example where llvm "messed up" the code and
        the verifier fails.
      
      ......
         unsigned long len = pend - pstart;
         if (len > 0 && len <= MAX_PAYLOAD_LEN)
           bpf_probe_read(data->payload, len, pstart);
      ......
      
      The compiler generates the following code and verifier fails:
      ......
      39: (79) r2 = *(u64 *)(r10 -16)
      40: (1f) r2 -= r8
      41: (bf) r1 = r2
      42: (07) r1 += -1
      43: (25) if r1 > 0xffe goto pc+3
        R0=inv(id=0) R1=inv(id=0,umax_value=4094,var_off=(0x0; 0xfff))
        R2=inv(id=0) R6=map_value(id=0,off=0,ks=4,vs=4095,imm=0) R7=inv(id=0)
        R8=inv(id=0) R9=inv0 R10=fp0
      44: (bf) r1 = r6
      45: (bf) r3 = r8
      46: (85) call bpf_probe_read#45
      R2 min value is negative, either use unsigned or 'var &= const'
      ......
      
      The compiler optimization is correct. If r1 = 0,
      r1 - 1 = 0xffffffffffffffff > 0xffe.  If r1 != 0, r1 - 1 will not wrap.
      r1 > 0xffe at insn #43 can actually capture
      both "r1 > 0" and "len <= MAX_PAYLOAD_LEN".
      This however causes an issue in verifier as the value range of arg2
      "r2" does not properly get refined and lead to verification failure.
      
      Relaxing bpf_prog_read arg2 from ARG_CONST_SIZE to ARG_CONST_SIZE_OR_ZERO
      allows the following simplied code:
         unsigned long len = pend - pstart;
         if (len <= MAX_PAYLOAD_LEN)
           bpf_probe_read(data->payload, len, pstart);
      
      The llvm compiler will generate less complex code and the
      verifier is able to verify that the program is okay.
      Signed-off-by: NYonghong Song <yhs@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fd29c08
  9. 11 11月, 2017 2 次提交
  10. 05 11月, 2017 3 次提交
  11. 03 11月, 2017 3 次提交
    • C
      bpf: fix verifier NULL pointer dereference · 8c01c4f8
      Craig Gallek 提交于
      do_check() can fail early without allocating env->cur_state under
      memory pressure.  Syzkaller found the stack below on the linux-next
      tree because of this.
      
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] SMP KASAN
        Dumping ftrace buffer:
           (ftrace buffer empty)
        Modules linked in:
        CPU: 1 PID: 27062 Comm: syz-executor5 Not tainted 4.14.0-rc7+ #106
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        task: ffff8801c2c74700 task.stack: ffff8801c3e28000
        RIP: 0010:free_verifier_state kernel/bpf/verifier.c:347 [inline]
        RIP: 0010:bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533
        RSP: 0018:ffff8801c3e2f5c8 EFLAGS: 00010202
        RAX: dffffc0000000000 RBX: 00000000fffffff4 RCX: 0000000000000000
        RDX: 0000000000000070 RSI: ffffffff817d5aa9 RDI: 0000000000000380
        RBP: ffff8801c3e2f668 R08: 0000000000000000 R09: 1ffff100387c5d9f
        R10: 00000000218c4e80 R11: ffffffff85b34380 R12: ffff8801c4dc6a28
        R13: 0000000000000000 R14: ffff8801c4dc6a00 R15: ffff8801c4dc6a20
        FS:  00007f311079b700(0000) GS:ffff8801db300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00000000004d4a24 CR3: 00000001cbcd0000 CR4: 00000000001406e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        Call Trace:
         bpf_prog_load+0xcbb/0x18e0 kernel/bpf/syscall.c:1166
         SYSC_bpf kernel/bpf/syscall.c:1690 [inline]
         SyS_bpf+0xae9/0x4620 kernel/bpf/syscall.c:1652
         entry_SYSCALL_64_fastpath+0x1f/0xbe
        RIP: 0033:0x452869
        RSP: 002b:00007f311079abe8 EFLAGS: 00000212 ORIG_RAX: 0000000000000141
        RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452869
        RDX: 0000000000000030 RSI: 0000000020168000 RDI: 0000000000000005
        RBP: 00007f311079aa20 R08: 0000000000000000 R09: 0000000000000000
        R10: 0000000000000000 R11: 0000000000000212 R12: 00000000004b7550
        R13: 00007f311079ab58 R14: 00000000004b7560 R15: 0000000000000000
        Code: df 48 c1 ea 03 80 3c 02 00 0f 85 e6 0b 00 00 4d 8b 6e 20 48 b8 00 00 00 00 00 fc ff df 49 8d bd 80 03 00 00 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 b6 0b 00 00 49 8b bd 80 03 00 00 e8 d6 0c 26
        RIP: free_verifier_state kernel/bpf/verifier.c:347 [inline] RSP: ffff8801c3e2f5c8
        RIP: bpf_check+0xcf4/0x19c0 kernel/bpf/verifier.c:4533 RSP: ffff8801c3e2f5c8
        ---[ end trace c8d37f339dc64004 ]---
      
      Fixes: 638f5b90 ("bpf: reduce verifier memory consumption")
      Fixes: 1969db47 ("bpf: fix verifier memory leaks")
      Signed-off-by: NCraig Gallek <kraig@google.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c01c4f8
    • A
      bpf: fix out-of-bounds access warning in bpf_check · eba0c929
      Arnd Bergmann 提交于
      The bpf_verifer_ops array is generated dynamically and may be
      empty depending on configuration, which then causes an out
      of bounds access:
      
      kernel/bpf/verifier.c: In function 'bpf_check':
      kernel/bpf/verifier.c:4320:29: error: array subscript is above array bounds [-Werror=array-bounds]
      
      This adds a check to the start of the function as a workaround.
      I would assume that the function is never called in that configuration,
      so the warning is probably harmless.
      
      Fixes: 00176a34 ("bpf: remove the verifier ops from program structure")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eba0c929
    • A
      bpf: fix link error without CONFIG_NET · 7cce782e
      Arnd Bergmann 提交于
      I ran into this link error with the latest net-next plus linux-next
      trees when networking is disabled:
      
      kernel/bpf/verifier.o:(.rodata+0x2958): undefined reference to `tc_cls_act_analyzer_ops'
      kernel/bpf/verifier.o:(.rodata+0x2970): undefined reference to `xdp_analyzer_ops'
      
      It seems that the code was written to deal with varying contents of
      the arrray, but the actual #ifdef was missing. Both tc_cls_act_analyzer_ops
      and xdp_analyzer_ops are defined in the core networking code, so adding
      a check for CONFIG_NET seems appropriate here, and I've verified this with
      many randconfig builds
      
      Fixes: 4f9218aa ("bpf: move knowledge about post-translation offsets out of verifier")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7cce782e
  12. 02 11月, 2017 2 次提交
  13. 01 11月, 2017 2 次提交
  14. 22 10月, 2017 2 次提交
  15. 18 10月, 2017 5 次提交
    • J
      bpf: move knowledge about post-translation offsets out of verifier · 4f9218aa
      Jakub Kicinski 提交于
      Use the fact that verifier ops are now separate from program
      ops to define a separate set of callbacks for verification of
      already translated programs.
      
      Since we expect the analyzer ops to be defined only for
      a small subset of all program types initialize their array
      by hand (don't use linux/bpf_types.h).
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f9218aa
    • J
      bpf: remove the verifier ops from program structure · 00176a34
      Jakub Kicinski 提交于
      Since the verifier ops don't have to be associated with
      the program for its entire lifetime we can move it to
      verifier's struct bpf_verifier_env.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      00176a34
    • J
      bpf: split verifier and program ops · 7de16e3a
      Jakub Kicinski 提交于
      struct bpf_verifier_ops contains both verifier ops and operations
      used later during program's lifetime (test_run).  Split the runtime
      ops into a different structure.
      
      BPF_PROG_TYPE() will now append ## _prog_ops or ## _verifier_ops
      to the names.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7de16e3a
    • J
      bpf: disallow arithmetic operations on context pointer · 28e33f9d
      Jakub Kicinski 提交于
      Commit f1174f77 ("bpf/verifier: rework value tracking")
      removed the crafty selection of which pointer types are
      allowed to be modified.  This is OK for most pointer types
      since adjust_ptr_min_max_vals() will catch operations on
      immutable pointers.  One exception is PTR_TO_CTX which is
      now allowed to be offseted freely.
      
      The intent of aforementioned commit was to allow context
      access via modified registers.  The offset passed to
      ->is_valid_access() verifier callback has been adjusted
      by the value of the variable offset.
      
      What is missing, however, is taking the variable offset
      into account when the context register is used.  Or in terms
      of the code adding the offset to the value passed to the
      ->convert_ctx_access() callback.  This leads to the following
      eBPF user code:
      
           r1 += 68
           r0 = *(u32 *)(r1 + 8)
           exit
      
      being translated to this in kernel space:
      
         0: (07) r1 += 68
         1: (61) r0 = *(u32 *)(r1 +180)
         2: (95) exit
      
      Offset 8 is corresponding to 180 in the kernel, but offset
      76 is valid too.  Verifier will "accept" access to offset
      68+8=76 but then "convert" access to offset 8 as 180.
      Effective access to offset 248 is beyond the kernel context.
      (This is a __sk_buff example on a debug-heavy kernel -
      packet mark is 8 -> 180, 76 would be data.)
      
      Dereferencing the modified context pointer is not as easy
      as dereferencing other types, because we have to translate
      the access to reading a field in kernel structures which is
      usually at a different offset and often of a different size.
      To allow modifying the pointer we would have to make sure
      that given eBPF instruction will always access the same
      field or the fields accessed are "compatible" in terms of
      offset and size...
      
      Disallow dereferencing modified context pointers and add
      to selftests the test case described here.
      
      Fixes: f1174f77 ("bpf/verifier: rework value tracking")
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NEdward Cree <ecree@solarflare.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      28e33f9d
    • J
      bpf: XDP_REDIRECT enable use of cpumap · 9c270af3
      Jesper Dangaard Brouer 提交于
      This patch connects cpumap to the xdp_do_redirect_map infrastructure.
      
      Still no SKB allocation are done yet.  The XDP frames are transferred
      to the other CPU, but they are simply refcnt decremented on the remote
      CPU.  This served as a good benchmark for measuring the overhead of
      remote refcnt decrement.  If driver page recycle cache is not
      efficient then this, exposes a bottleneck in the page allocator.
      
      A shout-out to MST's ptr_ring, which is the secret behind is being so
      efficient to transfer memory pointers between CPUs, without constantly
      bouncing cache-lines between CPUs.
      
      V3: Handle !CONFIG_BPF_SYSCALL pointed out by kbuild test robot.
      
      V4: Make Generic-XDP aware of cpumap type, but don't allow redirect yet,
       as implementation require a separate upstream discussion.
      
      V5:
       - Fix a maybe-uninitialized pointed out by kbuild test robot.
       - Restrict bpf-prog side access to cpumap, open when use-cases appear
       - Implement cpu_map_enqueue() as a more simple void pointer enqueue
      
      V6:
       - Allow cpumap type for usage in helper bpf_redirect_map,
         general bpf-prog side restriction moved to earlier patch.
      Signed-off-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c270af3