1. 16 November 2019 (3 commits)
    • bpf: Introduce BPF trampoline · fec56f58
      Alexei Starovoitov committed
      Introduce the BPF trampoline concept to allow kernel code to call into BPF
      programs with practically zero overhead.  The trampoline generation logic is
      architecture dependent: it converts the native calling convention into the BPF
      calling convention.  The BPF ISA is 64-bit (even on 32-bit architectures).
      Registers R1 to R5 are used to pass arguments into BPF functions, and the main
      BPF program accepts only a single argument, "ctx", in R1.  The native CPU
      calling conventions differ: x86-64 passes the first 6 arguments in registers
      and the rest on the stack, x86-32 passes the first 3 arguments in registers,
      sparc64 passes the first 6 in registers, and so on.
      
      Trampolines between BPF and the kernel already exist.  The BPF_CALL_x macros in
      include/linux/filter.h statically compile trampolines from BPF into kernel
      helpers; they convert up to five u64 arguments into kernel C pointers and
      integers.  On 64-bit architectures these BPF-to-kernel trampolines are no-ops.
      On 32-bit architectures they are meaningful.
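
      As a reminder, a helper defined via these macros reads roughly like this
      (sketch modelled on kernel/bpf/helpers.c; the body is simplified):

        BPF_CALL_2(bpf_map_lookup_elem, struct bpf_map *, map, void *, key)
        {
                /* The macro expands into a shim taking up to five u64s from
                 * the BPF side and handing them over as the typed C
                 * parameters declared above.
                 */
                WARN_ON_ONCE(!rcu_read_lock_held());
                return (unsigned long)map->ops->map_lookup_elem(map, key);
        }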
      
      The opposite job, kernel-to-BPF trampolines, is done by the CAST_TO_U64 macros
      and __bpf_trace_##call() shim functions in include/trace/bpf_probe.h. They
      convert kernel function arguments into an array of u64s that the BPF program
      consumes via the R1=ctx pointer.
      
      This patch set does the same job as the __bpf_trace_##call() static
      trampolines, but dynamically, for any kernel function. There are ~22k global
      kernel functions that are attachable via a nop at function entry. The function
      arguments and types are described in BTF.  The job of the
      btf_distill_func_proto() function is to extract useful information from BTF
      into a "function model" that architecture-dependent trampoline generators use
      to generate assembly code that casts kernel function arguments into an array
      of u64s.  For example, the kernel function eth_type_trans() takes two
      pointers. They will be cast to u64 and stored on the stack of the generated
      trampoline, and a pointer to that stack space will be passed into the BPF
      program in R1. On x86-64 such a generated trampoline consumes 16 bytes of
      stack and two stores of %rdi and %rsi into the stack. The verifier makes sure
      that only two u64s are accessed read-only by the BPF program, recognizes the
      precise type of the pointers being accessed, and does not allow typecasting a
      pointer to a different type within the BPF program.
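
      In C terms, the trampoline generated for eth_type_trans() behaves roughly
      like the sketch below; this is illustrative only, the real trampoline is
      x86-64 assembly emitted by arch_prepare_bpf_trampoline().

        /* Conceptual C equivalent of the generated trampoline for
         * eth_type_trans(struct sk_buff *skb, struct net_device *dev).
         */
        u64 args[2];

        args[0] = (u64)skb;             /* store of %rdi into the stack */
        args[1] = (u64)dev;             /* store of %rsi into the stack */
        /* run the attached program with R1 = ctx = &args[0] */
        prog->bpf_func(args, prog->insnsi);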
      
      The tracing use case in the datacenter demonstrated that certain key kernel
      functions (like tcp_retransmit_skb) have 2 or more kprobes that are always
      active, and other functions have both a kprobe and a kretprobe.  So it is
      essential to keep both kernel code and BPF programs executing at maximum
      speed. Hence the BPF trampoline is re-generated every time a new program is
      attached or detached, to maintain maximum performance.
      
      To avoid the high cost of retpolines, the attached BPF programs are called
      directly. __bpf_prog_enter()/__bpf_prog_exit() are used to support per-program
      execution stats.  In the future this logic will be optimized further by adding
      support for bpf_stats_enabled_key inside the generated assembly code. The
      introduction of preemptible and sleepable BPF programs will completely remove
      the need to call __bpf_prog_enter()/__bpf_prog_exit().
      
      Detaching a BPF program from the trampoline should not fail. To avoid memory
      allocation in the detach path, half of the page is used as a reserve and
      flipped after each attach/detach. 2k bytes is enough to call 40+ BPF programs
      directly, which is sufficient for BPF tracing use cases. This limit can be
      increased in the future.
      
      BPF_TRACE_FENTRY programs have access to the raw kernel function arguments,
      while BPF_TRACE_FEXIT programs have access to the kernel return value as well.
      Often a kprobe BPF program remembers function arguments in a map while a
      kretprobe fetches the arguments from that map and analyzes them together with
      the return value. BPF_TRACE_FEXIT accelerates this typical use case.
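
      For illustration, a libbpf-based fexit program for such a hook could look
      roughly like the sketch below. It assumes the BPF_PROG() convenience macro
      from libbpf's bpf_tracing.h and a bpftool-generated vmlinux.h; it is not part
      of this patch.

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_tracing.h>

        /* One fexit hook sees both the arguments and the return value,
         * so no kprobe + kretprobe map round-trip is needed.
         */
        SEC("fexit/eth_type_trans")
        int BPF_PROG(trace_eth_type_trans, struct sk_buff *skb,
                     struct net_device *dev, __be16 ret)
        {
                bpf_printk("eth_type_trans(%p, %p) -> 0x%x", skb, dev, ret);
                return 0;
        }

        char LICENSE[] SEC("license") = "GPL";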
      
      Recursion prevention for kprobe BPF programs is done via the per-CPU
      bpf_prog_active counter. In practice that turned out to be a mistake: it
      caused programs to randomly skip execution, and the tracing tools missed the
      results they were looking for. Hence the BPF trampoline doesn't provide
      built-in recursion prevention; that is the job of the BPF program itself and
      will be addressed in follow-up patches.
      
      The BPF trampoline is intended to be used beyond tracing and fentry/fexit use
      cases in the future, for example to remove the retpoline cost from XDP programs.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20191114185720.1641606-5-ast@kernel.org
    • bpf: Add bpf_arch_text_poke() helper · 5964b200
      Alexei Starovoitov committed
      Add a bpf_arch_text_poke() helper that is used by the BPF trampoline logic to
      patch nops/calls in kernel text into calls into a BPF trampoline, and to patch
      calls/nops inside BPF programs too.
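
      The shape of the interface is roughly the following (sketch; the exact enum
      values used by the patch may differ):

        /* Patch the instruction at 'ip' so that a nop/call/jump pointing at
         * old_addr becomes a nop/call/jump pointing at new_addr.
         */
        enum bpf_text_poke_type {
                BPF_MOD_CALL,
                BPF_MOD_JUMP,
        };

        int bpf_arch_text_poke(void *ip, enum bpf_text_poke_type t,
                               void *old_addr, void *new_addr);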
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20191114185720.1641606-4-ast@kernel.org
    • bpf: Refactor x86 JIT into helpers · 3b2744e6
      Alexei Starovoitov committed
      Refactor x86 JITing of LDX, STX, CALL instructions into separate helper
      functions.  No functional changes in the LDX and STX helpers.  There is a
      minor change in the CALL helper: it will populate the target address correctly
      on the first pass of the JIT instead of the second pass. That won't reduce the
      total number of JIT passes, though.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Song Liu <songliubraving@fb.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20191114185720.1641606-3-ast@kernel.org
  2. 17 October 2019 (1 commit)
    • bpf: Add support for BTF pointers to x86 JIT · 3dec541b
      Alexei Starovoitov committed
      A pointer to a BTF object is a pointer to a kernel object or NULL.
      Such pointers can only be used by BPF_LDX instructions.
      The verifier changed their opcode from LDX|MEM|size
      to LDX|PROBE_MEM|size to make JITing easier.
      The number of entries in the extable is the number of BPF_LDX insns
      that access kernel memory via a "pointer to BTF type".
      Only these load instructions can fault.
      Since the x86 extable is relative, it has to be allocated in the same
      memory region as the JITed code.
      Allocate it prior to the last pass of JITing and let the last pass populate it.
      A pointer to the extable in bpf_prog_aux is necessary to make page fault
      handling fast.
      Page fault handling is done in two steps:
      1. bpf_prog_kallsyms_find() finds the BPF program that page faulted.
         It's done by walking an rb-tree.
      2. Then the extable for the given BPF program is binary searched.
      This process is similar to how page faulting is done for kernel modules.
      The exception handler skips over the faulting x86 instruction and
      initializes the destination register with zero. This mimics the exact
      behavior of bpf_probe_read() (when probe_kernel_read() faults, dest is zeroed).
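
      The x86 fixup handler for such faults is, in sketch form, along these lines
      (simplified; the exact fixup encoding is illustrative):

        static bool ex_handler_bpf(const struct exception_table_entry *x,
                                   struct pt_regs *regs, int trapnr)
        {
                /* Illustrative encoding: upper bits select the pt_regs slot
                 * of the destination register, low byte is the length of the
                 * faulting load to skip.
                 */
                u32 reg = x->fixup >> 8;

                *(unsigned long *)((unsigned long)regs + reg) = 0; /* dst = 0 */
                regs->ip += x->fixup & 0xff;                       /* skip insn */
                return true;
        }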
      
      JITs for other architectures can add support in a similar way.
      Until then they will reject the unknown opcode and fall back to the interpreter.

      Since the extable should be aligned and placed near the JITed code,
      make bpf_jit_binary_alloc() return a 4-byte aligned image offset,
      so that the extable aligning formula in bpf_int_jit_compile() doesn't need
      to rely on the internal implementation of bpf_jit_binary_alloc().
      On x86, gcc defaults to 16-byte alignment for regular kernel functions
      due to better performance. JITed code may be aligned to 16 in the future,
      but it will use 4 in the meantime.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191016032505.2089704-10-ast@kernel.org
  3. 05 October 2019 (1 commit)
  4. 02 August 2019 (1 commit)
    • bpf: fix x64 JIT code generation for jmp to 1st insn · 7c2e988f
      Alexei Starovoitov committed
      The introduction of bounded loops exposed an old bug in the x64 JIT.
      The JIT maintains an array of offsets to the end of all instructions to
      compute jmp offsets:
      addrs[0] - offset of the end of the 1st insn (that includes the prologue).
      addrs[1] - offset of the end of the 2nd insn.
      The JIT didn't keep the offset of the beginning of the 1st insn,
      since classic BPF didn't have backward jumps and valid extended BPF
      couldn't have a branch to the 1st insn, because it didn't allow loops.
      With bounded loops it's possible to construct a valid program that
      jumps backwards to the 1st insn.
      Fix the JIT by computing:
      addrs[0] - offset of the end of the prologue == start of the 1st insn.
      addrs[1] - offset of the end of the 1st insn.
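
      In code, the change to the initial size estimate boils down to the following
      (sketch; the real diff also grows the addrs[] allocation by one entry):

        /* addrs[] now has len + 1 entries: addrs[0] marks the end of the
         * prologue, addrs[i] the end of the i-th BPF insn.
         */
        for (proglen = 0, i = 0; i <= len; i++) {
                proglen += 64;          /* worst-case size estimate per insn */
                addrs[i] = proglen;
        }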
      
      v1->v2:
      - Yonghong noticed a bug in the jit linfo.
        Fix it by passing 'addrs + 1' to bpf_prog_fill_jited_linfo(),
        since it expects the insn_to_jit_off array to contain offsets to the last byte.
      
      Reported-by: syzbot+35101610ff3e83119b1b@syzkaller.appspotmail.com
      Fixes: 2589726d ("bpf: introduce bounded loops")
      Fixes: 0a14842f ("net: filter: Just In Time compiler for x86-64")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Song Liu <songliubraving@fb.com>
  5. 15 June 2019 (1 commit)
  6. 05 June 2019 (1 commit)
  7. 27 January 2019 (1 commit)
  8. 10 December 2018 (1 commit)
    • bpf: Add bpf_line_info support · c454a46b
      Martin KaFai Lau committed
      This patch adds bpf_line_info support.
      
      It accepts an array of bpf_line_info objects during BPF_PROG_LOAD.
      The "line_info", "line_info_cnt" and "line_info_rec_size" are added
      to the "union bpf_attr".  The "line_info_rec_size" makes
      bpf_line_info extensible in the future.
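
      For reference, the new UAPI pieces look roughly like this (sketch of
      include/uapi/linux/bpf.h; surrounding fields are omitted):

        struct bpf_line_info {
                __u32   insn_off;       /* BPF insn the line info applies to */
                __u32   file_name_off;  /* file name offset in the BTF string section */
                __u32   line_off;       /* source line offset in the BTF string section */
                __u32   line_col;       /* line number and column, packed */
        };

        /* added to the BPF_PROG_LOAD part of union bpf_attr */
        __u32           line_info_rec_size;    /* userspace sizeof(struct bpf_line_info) */
        __aligned_u64   line_info;              /* pointer to the line_info array */
        __u32           line_info_cnt;          /* number of bpf_line_info records */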
      
      The new "check_btf_line()" ensures the userspace line_info is valid
      for the kernel to use.
      
      When the verifier is translating/patching the bpf_prog (through
      "bpf_patch_insn_single()"), the line_infos' insn_off is also
      adjusted by the newly added "bpf_adj_linfo()".
      
      If the bpf_prog is jited, this patch also provides the jited addrs (in
      aux->jited_linfo) for the corresponding line_info.insn_off.
      "bpf_prog_fill_jited_linfo()" is added to fill the aux->jited_linfo.
      It is currently called by the x86 jit.  Other jits can also use
      "bpf_prog_fill_jited_linfo()" and it will be done in the followup patches.
      In the future, if it deemed necessary, a particular jit could also provide
      its own "bpf_prog_fill_jited_linfo()" implementation.
      
      A few "*line_info*" fields are added to the bpf_prog_info such
      that the user can get the xlated line_info back (i.e. the line_info
      with its insn_off reflecting the translated prog).  The jited_line_info
      is available if the prog is jited.  It is an array of __u64.
      If the prog is not jited, jited_line_info_cnt is 0.
      
      The verifier's verbose log with line_info will be done in
      a follow-up patch.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  9. 13 June 2018 (1 commit)
    • treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook committed
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
              kmalloc_array(a, b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
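
      A typical conversion produced by the script looks like this (illustrative
      example, not a hunk from this patch):

        /* before: open-coded multiplication can overflow */
        ptrs = kmalloc(count * sizeof(*ptrs), GFP_KERNEL);

        /* after: overflow-checked two-factor form */
        ptrs = kmalloc_array(count, sizeof(*ptrs), GFP_KERNEL);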
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
  10. 04 May 2018 (1 commit)
  11. 03 May 2018 (2 commits)
    • bpf, x64: fix memleak when not converging on calls · 39f56ca9
      Daniel Borkmann committed
      The JIT logic in jit_subprogs() is as follows: for all subprogs we
      allocate a bpf_prog_alloc(), populate it (prog->is_func = 1 here),
      and pass it to bpf_int_jit_compile(). If a failure occurred during
      JIT and prog->jited is not set, then we bail out from attempting to
      JIT the whole program, and punt to the interpreter instead. In case
      JITing was successful, we fix up BPF call offsets and do another
      pass through bpf_int_jit_compile() (extra_pass is true at that point)
      to complete JITing of the calls. Given that this requires passing JIT
      context around, addrs and jit_data from the x86 JIT are freed in the
      extra_pass in bpf_int_jit_compile() when calls are involved (if not,
      they can be freed immediately). However, if in the original pass the
      JIT image didn't converge, then we leak addrs and jit_data, since image
      itself is NULL, prog->is_func is set and extra_pass is false in that
      case, meaning both become unreachable and are never cleaned up.
      Therefore we need to free them on !image as well. Only the x64
      JIT is affected.
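
      The fix is essentially to extend the cleanup condition in
      bpf_int_jit_compile() so that a NULL image also releases the per-program
      JIT state, roughly:

        if (!image || !prog->is_func || extra_pass) {
                kfree(addrs);
                kfree(jit_data);
                prog->aux->jit_data = NULL;
        } else {
                /* stash addrs, ctx, image and header in jit_data
                 * for the upcoming extra pass
                 */
        }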
      
      Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, x64: fix memleak when not converging after image · 3aab8884
      Daniel Borkmann committed
      While reviewing the x64 JIT code, I noticed that we leak the previously
      allocated JIT image in the case where proglen != oldproglen during the JIT
      passes. Prior to commit e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT
      compiler") we would just break out of the loop and use the image as the
      JITed prog, since it could only shrink in size anyway. After e0ee9c12,
      we bail out to the out_addrs label where we free addrs and jit_data, but
      not the image coming from bpf_jit_binary_alloc().
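
      The fix routes that error path through a label that also frees the binary
      image, roughly along these lines (sketch only; label names and surrounding
      code are simplified):

        if (image && proglen != oldproglen) {
                pr_err("bpf_jit: proglen=%d != oldproglen=%d!\n",
                       proglen, oldproglen);
                goto out_image;   /* was: goto out_addrs, leaking the image */
        }
        /* ... */
        out_image:
                bpf_jit_binary_free(header);  /* free the bpf_jit_binary_alloc() image */
        out_addrs:
                kfree(addrs);
                kfree(jit_data);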
      
      Fixes: e0ee9c12 ("x86: bpf_jit: fix two bugs in eBPF JIT compiler")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  12. 02 May 2018 (1 commit)
    • x86/bpf: Clean up non-standard comments, to make the code more readable · a2c7a983
      Ingo Molnar committed
      So by chance I looked into x86 assembly in arch/x86/net/bpf_jit_comp.c and
      noticed the weird and inconsistent comment style it mistakenly learned from
      the networking code:
      
       /* Multi-line comment ...
        * ... looks like this.
        */
      
      Fix this to use the standard comment style specified in Documentation/CodingStyle
      and used in arch/x86/ as well:
      
       /*
        * Multi-line comment ...
        * ... looks like this.
        */
      
      Also, to quote Linus's ... more explicit views about this:
      
        http://article.gmane.org/gmane.linux.kernel.cryptoapi/21066
      
        > But no, the networking code picked *none* of the above sane formats.
        > Instead, it picked these two models that are just half-arsed
        > shit-for-brains:
        >
        >  (no)
        >      /* This is disgusting drug-induced
        >        * crap, and should die
        >        */
        >
        >   (no-no-no)
        >       /* This is also very nasty
        >        * and visually unbalanced */
        >
        > Please. The networking code actually has the *worst* possible comment
        > style. You can literally find that (no-no-no) style, which is just
        > really horribly disgusting and worse than the otherwise fairly similar
        > (d) in pretty much every way.
      
      Also improve the comments and some other details while at it:
      
       - Don't mix same-line and previous-line comment style on otherwise
         identical code patterns within the same function,
      
       - capitalize 'BPF' and x86 register names consistently,
      
       - capitalize sentences consistently,
      
       - instead of 'x64' use 'x86-64': x64 is a Microsoft specific term,
      
       - use more consistent punctuation,
      
       - use standard coding style in macros as well,
      
       - fix typos and a few other minor details.
      
      Consistent coding style is not optional, at least in arch/x86/.
      
      No change in functionality.
      
      ( In case this commit causes conflicts with pending development code
        I'll be glad to help resolve any conflicts! )
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  13. 27 April 2018 (1 commit)
    • x86/bpf: Clean up non-standard comments, to make the code more readable · 5f26c501
      Ingo Molnar committed
      So by chance I looked into x86 assembly in arch/x86/net/bpf_jit_comp.c and
      noticed the weird and inconsistent comment style it mistakenly learned from
      the networking code:
      
       /* Multi-line comment ...
        * ... looks like this.
        */
      
      Fix this to use the standard comment style specified in Documentation/CodingStyle
      and used in arch/x86/ as well:
      
       /*
        * Multi-line comment ...
        * ... looks like this.
        */
      
      Also, to quote Linus's ... more explicit views about this:
      
        http://article.gmane.org/gmane.linux.kernel.cryptoapi/21066
      
        > But no, the networking code picked *none* of the above sane formats.
        > Instead, it picked these two models that are just half-arsed
        > shit-for-brains:
        >
        >  (no)
        >      /* This is disgusting drug-induced
        >        * crap, and should die
        >        */
        >
        >   (no-no-no)
        >       /* This is also very nasty
        >        * and visually unbalanced */
        >
        > Please. The networking code actually has the *worst* possible comment
        > style. You can literally find that (no-no-no) style, which is just
        > really horribly disgusting and worse than the otherwise fairly similar
        > (d) in pretty much every way.
      
      Also improve the comments and some other details while at it:
      
       - Don't mix same-line and previous-line comment style on otherwise
         identical code patterns within the same function,
      
       - capitalize 'BPF' and x86 register names consistently,
      
       - capitalize sentences consistently,
      
       - instead of 'x64' use 'x86-64': x64 is a Microsoft specific term,
      
       - use more consistent punctuation,
      
       - use standard coding style in macros as well,
      
       - fix typos and a few other minor details.
      
      Consistent coding style is not optional, at least in arch/x86/.
      
      No change in functionality.
      
      ( In case this commit causes conflicts with pending development code
        I'll be glad to help resolve any conflicts! )
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  14. 25 April 2018 (1 commit)
    • bpf, x64: fix JIT emission for dead code · 1612a981
      Gianluca Borello committed
      Commit 2a5418a1 ("bpf: improve dead code sanitizing") replaced dead
      code with a series of ja-1 instructions, for safety. That made JIT
      compilation much more complex for some BPF programs. One instance of such
      programs is, for example:
      
      bool flag = false
      ...
      /* A bunch of other code */
      ...
      if (flag)
              do_something()
      
      In some cases llvm is not able to remove the code for do_something() at
      compile time, so the generated BPF program ends up with a large number
      of dead instructions. In one specific real-life example, there are two
      series of ~500 and ~1000 dead instructions in the program. When the
      verifier replaces them with a series of ja-1 instructions, it causes an
      interesting behavior at JIT time.

      During the first pass, since all the instructions are estimated at 64
      bytes, the ja-1 instructions end up being translated as 5-byte JMP
      instructions (0xE9), since the jump offsets become increasingly large (>
      127) as each instruction gets discovered to be 5 bytes instead of the
      estimated 64.

      Starting from the second pass, the first N instructions of the ja-1
      sequence get translated into 2-byte JMPs (0xEB) because the jump offsets
      become <= 127 this time. In particular, N is defined as roughly 127 / (5
      - 2) ~= 42. So, each further pass will make the subsequent N JMP
      instructions shrink from 5 to 2 bytes, making the image shrink every time.
      This means that in order to have the entire program converge, there need
      to be, in the real example above, at least ~1000 / 42 ~= 24 passes just
      for translating the dead code. If we add this number to the passes needed
      to translate the other non-dead code, it brings such a program to 40+
      passes, and the JIT doesn't complete. Ultimately the userspace loader
      fails because such a BPF program was supposed to be part of a prog array
      whose owner is being JITed.
      
      While it is certainly possible to try to refactor such programs to help
      the compiler remove dead code, the behavior is not really intuitive and it
      puts further burden on the BPF developer, who is not expecting such
      behavior. To make things worse, such programs worked just fine in all
      the kernel releases prior to the ja-1 fix.

      A possible approach to mitigate this behavior consists in noticing that
      for ja-1 instructions we don't really need to rely on the estimated size
      of the previous and current instructions; we know that a -1 BPF jump
      offset can be safely translated into a 0xEB instruction with a jump offset
      of -2.
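
      In the JIT, that boils down to special-casing the -1 offset when emitting
      BPF_JA, roughly (sketch of the relevant case in do_jit(); surrounding
      emission code is omitted):

        case BPF_JMP | BPF_JA:
                if (insn->off == -1)
                        /* A ja-1 only appears in sanitized (unreachable)
                         * dead code and jumps to itself, so it always fits
                         * the short form: 2-byte 0xEB jmp with offset -2.
                         */
                        jmp_offset = -2;
                else
                        jmp_offset = addrs[i + insn->off] - addrs[i];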
      
      This fix brings the BPF program in the previous example back to
      completion in ~9 passes.
      
      Fixes: 2a5418a1 ("bpf: improve dead code sanitizing")
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  15. 08 March 2018 (1 commit)
    • bpf, x64: increase number of passes · 6007b080
      Daniel Borkmann committed
      In Cilium some of the main programs we run today are hitting 9 passes
      in x64's JIT compiler, and we have already had cases where we surpassed
      the limit, at which point the JIT punts the program to the interpreter
      instead, leading to insertion failures due to CONFIG_BPF_JIT_ALWAYS_ON
      or insertion failures due to the prog array owner being JITed but the
      program to insert not (both must have the same JITed/non-JITed property).

      In one concrete case the program image shrunk from 12,767 bytes down to
      10,288 bytes, and the image converged after 16 steps. I've measured that
      this took 340us in the JIT until it converged on my i7-6600U. Thus,
      increase the original limit we had from day one, when the JIT covered
      cBPF only, before we run into the case (as with the complexity limit)
      where we trip over this and hit program rejections.
      Also add a cond_resched() into the compilation loop; the JIT process
      runs without any locks and may sleep anyway.
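
      The shape of the change in bpf_int_jit_compile() is roughly as follows
      (sketch; PASS_LIMIT stands in for the new, larger bound and the image
      allocation logic is elided):

        for (pass = 0; pass < PASS_LIMIT; pass++) {
                proglen = do_jit(prog, addrs, image, oldproglen, &ctx);
                if (proglen <= 0) {
                        image = NULL;
                        goto out_addrs;
                }
                /* ... allocate image once proglen stops changing ... */
                oldproglen = proglen;
                cond_resched();         /* no locks held; JIT may sleep */
        }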
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  16. 28 February 2018 (1 commit)
  17. 24 February 2018 (5 commits)
  18. 23 February 2018 (1 commit)
    • bpf, x64: implement retpoline for tail call · a493a87f
      Daniel Borkmann committed
      Implement a retpoline [0] for the BPF tail call JIT'ing that converts
      the indirect jump via jmp %rax, which is used to make the long jump into
      another JITed BPF image. Since this is subject to speculative execution,
      we need to control the transient instruction sequence here as well
      when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
      The latter also aligns with what gcc / clang emit (e.g. [1]).
      
      JIT dump after patch:
      
        # bpftool p d x i 1
         0: (18) r2 = map[id:1]
         2: (b7) r3 = 0
         3: (85) call bpf_tail_call#12
         4: (b7) r0 = 2
         5: (95) exit
      
      With CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000072  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000072  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000072  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	callq  0x000000000000006d  |+
        66:	pause                      |
        68:	lfence                     |
        6b:	jmp    0x0000000000000066  |
        6d:	mov    %rax,(%rsp)         |
        71:	retq                       |
        72:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        + retpoline for indirect jump
      
      Without CONFIG_RETPOLINE:
      
        # bpftool p d j i 1
        [...]
        33:	cmp    %edx,0x24(%rsi)
        36:	jbe    0x0000000000000063  |*
        38:	mov    0x24(%rbp),%eax
        3e:	cmp    $0x20,%eax
        41:	ja     0x0000000000000063  |
        43:	add    $0x1,%eax
        46:	mov    %eax,0x24(%rbp)
        4c:	mov    0x90(%rsi,%rdx,8),%rax
        54:	test   %rax,%rax
        57:	je     0x0000000000000063  |
        59:	mov    0x28(%rax),%rax
        5d:	add    $0x25,%rax
        61:	jmpq   *%rax               |-
        63:	mov    $0x2,%eax
        [...]
      
        * relative fall-through jumps in error case
        - plain indirect jump as before
      
        [0] https://support.google.com/faqs/answer/7625886
        [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  19. 27 January 2018 (1 commit)
  20. 20 January 2018 (2 commits)
  21. 18 December 2017 (2 commits)
    • bpf: x64: add JIT support for multi-function programs · 1c2a088a
      Alexei Starovoitov committed
      A typical JIT does several passes over the bpf instructions to
      compute the total size and relative offsets of jumps and calls.
      With multiple bpf functions calling each other, all relative calls
      will have invalid offsets initially; therefore we need an additional
      last pass over the program to emit calls with correct offsets.
      For example, in the case of three bpf functions:
      For example in case of three bpf functions:
      main:
        call foo
        call bpf_map_lookup
        exit
      foo:
        call bar
        exit
      bar:
        exit
      
      We will call bpf_int_jit_compile() independently for main(), foo() and bar().
      The x64 JIT typically does 4-5 passes to converge.
      After these initial passes the images for these 3 functions
      will be good except for the call targets, since the start addresses of
      foo() and bar() are unknown while we are JITing main()
      (note that the call to bpf_map_lookup will be resolved properly
      during the initial passes).
      Once the start addresses of the 3 functions are known, we patch
      call_insn->imm to point to the right functions and call
      bpf_int_jit_compile() again, which needs only one pass.
      Additional safety checks are done to make sure this
      last pass doesn't produce an image that is larger or smaller
      than the previous pass.

      When constant blinding is on, it's applied to all functions
      at the first pass, since doing it once again at the last
      pass could change the size of the JITed code.
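
      Conceptually, the flow in jit_subprogs() looks like the sketch below;
      fixup_call_imms() is a made-up name for the call-target fixup step.

        /* pass 1: JIT every subprog; sizes converge, but cross-function
         * call targets are still placeholders
         */
        for (i = 0; i < prog->aux->func_cnt; i++)
                func[i] = bpf_int_jit_compile(func[i]);

        /* every subprog now has a known start address: rewrite the imm of
         * each bpf-to-bpf call insn to point at the real target
         */
        fixup_call_imms(func, prog->aux->func_cnt);     /* hypothetical helper */

        /* extra pass: re-emit each image once with correct call offsets;
         * the size must not change
         */
        for (i = 0; i < prog->aux->func_cnt; i++)
                func[i] = bpf_int_jit_compile(func[i]);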
      
      Tested on x64 and arm64 hw with JIT on/off, blinding on/off.
      x64 JITs bpf-to-bpf calls correctly while arm64 falls back to the interpreter.
      All other JITs that support normal BPF_CALL will behave the same way,
      since a bpf-to-bpf call is equivalent to a bpf-to-kernel call from the
      JIT's point of view.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • bpf: fix net.core.bpf_jit_enable race · 60b58afc
      Alexei Starovoitov committed
      The global bpf_jit_enable variable is tested multiple times in the JITs,
      blinding and verifier core. A malicious root can try to toggle
      it while loading programs. This race condition was accounted
      for and there should be no issues, but it's safer to avoid
      the race altogether.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  22. 04 October 2017 (1 commit)
  23. 01 September 2017 (1 commit)
  24. 10 August 2017 (1 commit)
  25. 07 June 2017 (1 commit)
  26. 01 June 2017 (3 commits)
  27. 09 May 2017 (1 commit)
  28. 29 April 2017 (1 commit)
  29. 22 February 2017 (1 commit)