1. 11 Feb, 2017: 4 commits
  2. 07 Feb, 2017: 2 commits
  3. 26 Jan, 2017: 2 commits
  4. 25 Jan, 2017: 2 commits
    • bpf: enable verifier to better track const alu ops · 3fadc801
      Committed by Daniel Borkmann
      William reported a couple of issues in relation to direct packet
      access. The typical scheme is to check for data + [off] <= data_end,
      where [off] can either be an immediate or come from a tracked
      register that contains an immediate; depending on the branch, we
      can then access the data. However, when [off] is calculated in a
      more "complex" way, either for the mentioned test itself or for an
      access after the test, the verifier stops tracking the CONST_IMM
      marked register and marks it as an UNKNOWN_VALUE one.
      
      When that UNKNOWN_VALUE typed register is added to a pkt() marked
      register, the verifier bails out in check_packet_ptr_add() as it
      finds the register's imm value below 48. In the first example
      below, that is due to evaluate_reg_imm_alu() not handling right
      shifts and thus marking the register as UNKNOWN_VALUE via the
      helper __mark_reg_unknown_value(), which resets imm to 0.
      
      In the second case the same happens at the time when r4 is set
      to r4 &= r5, where it transitions to UNKNOWN_VALUE from
      evaluate_reg_imm_alu(). Later on, r4 is shifted right by 3 inside
      evaluate_reg_alu(), where the register's imm turns into 3. That
      is, for registers with type UNKNOWN_VALUE, an imm of 0 means that
      we don't know what value the register has, and imm > 0 means that
      the value has [imm] upper zero bits. For example, when shifting
      an UNKNOWN_VALUE register by 3 to the right, no matter what value
      it had, we know that the 3 uppermost bits must now be zero. This
      is to make sure that ALU operations with unknown registers don't
      overflow. Meaning, once we know that we have more than 48 upper
      zero bits, or, in other words, cannot go beyond a 0xffff offset
      with ALU ops, such an addition will track the target register as
      a new pkt() register with a new id, but 0 offset and 0 range, so
      a new data/data_end test will be required for it. If the source
      register is a CONST_IMM one that is to be added to the pkt()
      register, or the source instruction is an add instruction with an
      immediate value, then it will get added if it stays within the
      max 0xffff bounds. From there, the pkt() type can be accessed
      should reg->off + imm be within the access range of pkt().
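
      As a hedged illustration (not the reporter's reproducer; the field
      names and the offset computation are made up, clang may constant-fold
      such arithmetic, and the actual reproducers operate at the
      instruction level as in the logs below; the usual SEC() macro and
      uapi types from the selftests headers are assumed), the pattern
      boils down to a program that derives the packet offset through ALU
      ops such as add, and, and right shift before re-checking against
      data_end:

      SEC("xdp")
      int xdp_prog(struct xdp_md *ctx)
      {
      	void *data     = (void *)(long)ctx->data;
      	void *data_end = (void *)(long)ctx->data_end;
      	__u64 off;

      	if (data + 1 > data_end)	/* initial data/data_end test */
      		return XDP_DROP;

      	/* offset derived via ALU ops, mirroring insns 80-84 below */
      	off = *(__u8 *)data;
      	off = ((off + 71) & 0xfffffff8) >> 3;

      	if (data + off + 1 > data_end)	/* new test for the new pkt id */
      		return XDP_DROP;

      	return *(__u8 *)(data + off) ? XDP_PASS : XDP_DROP;
      }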
      
        [...]
        from 28 to 30: R0=imm1,min_value=1,max_value=1
          R1=pkt(id=0,off=0,r=22) R2=pkt_end
          R3=imm144,min_value=144,max_value=144
          R4=imm0,min_value=0,max_value=0
          R5=inv48,min_value=2054,max_value=2054 R10=fp
        30: (bf) r5 = r3
        31: (07) r5 += 23
        32: (77) r5 >>= 3
        33: (bf) r6 = r1
        34: (0f) r6 += r5
        cannot add integer value with 0 upper zero bits to ptr_to_packet
      
        [...]
        from 52 to 80: R0=imm1,min_value=1,max_value=1
          R1=pkt(id=0,off=0,r=34) R2=pkt_end R3=inv
          R4=imm272 R5=inv56,min_value=17,max_value=17
          R6=pkt(id=0,off=26,r=34) R10=fp
        80: (07) r4 += 71
        81: (18) r5 = 0xfffffff8
        83: (5f) r4 &= r5
        84: (77) r4 >>= 3
        85: (0f) r1 += r4
        cannot add integer value with 3 upper zero bits to ptr_to_packet
      
      Thus, to get the above use cases working, evaluate_reg_imm_alu()
      has been extended to handle further ALU ops. This is fine, because
      we operate strictly within the realm of CONST_IMM types, so we
      don't care about overflows here, as they will happen in the
      simulated as well as the real execution, and the interaction with
      pkt() in check_packet_ptr_add() will check the actual imm value
      once it is added to pkt(), but it's irrelevant before that.
      
      With regard to 06c1c049 ("bpf: allow helpers access to variable
      memory"), which works on UNKNOWN_VALUE registers, the verifier now
      becomes a bit smarter as it can better resolve ALU ops, so two
      test cases there need to be adapted, as min/max bound tracking
      only becomes necessary when registers were spilled to the stack.
      So while a mask was set before to track the upper bound for the
      UNKNOWN_VALUE case, it is now resolved directly as CONST_IMM, and
      such constructs are only necessary when, e.g., registers are
      spilled.
      
      For commit 6b173873 ("bpf: recognize 64bit immediate loads as
      consts"), which initially enabled dw load tracking only for the
      nfp jit/analyzer, I did a couple of tests on large, complex
      programs and we don't increase complexity badly (my tests were in
      the ~3% range on average). I've added a couple of tests similar
      to the affected code above, and it now works fine with the
      verifier.
      Reported-by: William Tu <u9012063@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Gianluca Borello <g.borello@gmail.com>
      Cc: William Tu <u9012063@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: add prog tag test case to bpf selftests · 62b64660
      Committed by Daniel Borkmann
      Add the test case used to compare the results from fdinfo with
      af_alg's output on the tag. Tests range from minimum to maximum
      sized programs, with and without maps included.
      
        # ./test_tag
        test_tag: OK (40945 tests)
      
      Tested on x86_64 and s390x.
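
      A minimal sketch of what the comparison amounts to (not the
      selftest itself; prog_fd and the raw instruction image are assumed
      to come from an already loaded program, and for programs with maps
      the kernel hashes a rewritten image with map references masked,
      which this sketch ignores): read the tag the kernel reports via
      fdinfo and recompute it as the truncated SHA-1 of the instructions
      over an AF_ALG hash socket.

      #include <stdio.h>
      #include <string.h>
      #include <unistd.h>
      #include <sys/socket.h>
      #include <linux/if_alg.h>

      static void fdinfo_tag(int prog_fd, char *tag)
      {
      	char path[64], line[256];
      	FILE *f;

      	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", prog_fd);
      	f = fopen(path, "r");
      	while (f && fgets(line, sizeof(line), f))
      		if (sscanf(line, "prog_tag:\t%s", tag) == 1)
      			break;
      	if (f)
      		fclose(f);
      }

      static void alg_sha1(const void *data, size_t size, unsigned char *out)
      {
      	struct sockaddr_alg sa = {
      		.salg_family = AF_ALG,
      		.salg_type   = "hash",
      		.salg_name   = "sha1",
      	};
      	int tfm = socket(AF_ALG, SOCK_SEQPACKET, 0), op;

      	bind(tfm, (struct sockaddr *)&sa, sizeof(sa));
      	op = accept(tfm, NULL, NULL);
      	write(op, data, size);	/* hash the raw insn image */
      	read(op, out, 20);	/* the prog tag is the first 8 digest bytes */
      	close(op);
      	close(tfm);
      }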
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 24 Jan, 2017: 1 commit
  6. 20 Jan, 2017: 2 commits
  7. 19 Jan, 2017: 1 commit
  8. 18 Jan, 2017: 2 commits
  9. 17 Jan, 2017: 3 commits
    • perf probe: Fix to probe on gcc generated functions in modules · 613f050d
      Committed by Masami Hiramatsu
      Fix probing on gcc-generated functions in modules. Since probing
      on a module is based on its symbol name, it should be adjusted to
      the actual symbols.
      
      E.g. without this fix, perf probe shows a probe definition on a
      non-existent symbol, as below.
      
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -F in_range*
        in_range.isra.12
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range
        p:probe/in_range nf_nat:in_range+0
      
      With this fix, perf probe correctly shows a probe on the
      gcc-generated symbol.
      
        $ perf probe -m build-x86_64/net/netfilter/nf_nat.ko -D in_range
        p:probe/in_range nf_nat:in_range.isra.12+0
      
      This also fixes the same problem on an online module, as below.
      
        $ perf probe -m i915 -D assert_plane
        p:probe/assert_plane i915:assert_plane.constprop.134+0
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148411450673.9978.14905987549651656075.stgit@devbox
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Add error checks to offline probe post-processing · 3e96dac7
      Committed by Masami Hiramatsu
      Add error checks to the post-processing and improve it for offline
      probe events:
      
       - Post-processing fails if no matched symbol is found in the map
         (-ENOENT) or strdup() fails (-ENOMEM).
      
       - Even if the symbol name is the same, it updates the symbol
         address and offset.
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148411443738.9978.4617979132625405545.stgit@devbox
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • perf probe: Fix to show correct locations for events on modules · d2d4edbe
      Committed by Masami Hiramatsu
      Fix showing correct locations for events on modules by relocating
      the given address instead of retrying after failure.
      
      This happens when the module text size is big enough, i.e. bigger
      than sh_addr, because the original code retries with the given
      address + sh_addr if it fails to find a CU DIE at the given
      address.
      
      Any address smaller than sh_addr always fails, and it retries with
      the correct address, but addresses bigger than sh_addr will get a
      CU DIE which is at the given address (not adjusted by sh_addr).
      
      In my environment (x86-64), the sh_addr of the ".text" section is
      0x10030. Since i915 is a huge kernel module, we can see this issue
      as below.
      
        $ grep "[Tt] .*\[i915\]" /proc/kallsyms | sort | head -n1
        ffffffffc0270000 t i915_switcheroo_can_switch	[i915]
      
      ffffffffc0270000 + 0x10030 = ffffffffc0280030, so we'll check
      symbols crossing this boundary.
      
        $ grep "[Tt] .*\[i915\]" /proc/kallsyms | grep -B1 ^ffffffffc028\
        | head -n 2
        ffffffffc027ff80 t haswell_init_clock_gating	[i915]
        ffffffffc0280110 t valleyview_init_clock_gating	[i915]
      
      So set up probes on both functions and see what happens.
      
        $ sudo ./perf probe -m i915 -a haswell_init_clock_gating \
              -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
        	perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        $ sudo ./perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on i915_vga_set_decode:4@gpu/drm/i915/i915_drv.c in i915)
      
      As you can see, haswell_init_clock_gating is correctly shown,
      but valleyview_init_clock_gating is not.
      
      With this patch, both events are shown correctly.
      
        $ sudo ./perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
      
      Committer notes:
      
      In my case:
      
        # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
      	  perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        # perf probe -l
          probe:haswell_init_clock_gating (on i915_getparam+432@gpu/drm/i915/i915_drv.c in i915)
          probe:valleyview_init_clock_gating (on __i915_printk+240@gpu/drm/i915/i915_drv.c in i915)
        #
      
        # readelf -SW /lib/modules/4.9.0+/build/vmlinux | egrep -w '.text|Name'
         [Nr] Name   Type      Address          Off    Size   ES Flg Lk Inf Al
         [ 1] .text  PROGBITS  ffffffff81000000 200000 822fd3 00  AX  0   0 4096
        #
      
        So both are b0rked, now with the fix:
      
        # perf probe -m i915 -a haswell_init_clock_gating -a valleyview_init_clock_gating
        Added new events:
          probe:haswell_init_clock_gating (on haswell_init_clock_gating in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating in i915)
      
        You can now use it in all perf tools, such as:
      
      	perf record -e probe:valleyview_init_clock_gating -aR sleep 1
      
        # perf probe -l
          probe:haswell_init_clock_gating (on haswell_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
          probe:valleyview_init_clock_gating (on valleyview_init_clock_gating@gpu/drm/i915/intel_pm.c in i915)
        #
      
      Both look correct.
      Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/148411436777.9978.1440275861947194930.stgit@devbox
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  10. 12 Jan, 2017: 2 commits
  11. 11 Jan, 2017: 1 commit
  12. 10 Jan, 2017: 3 commits
    • bpf: allow helpers access to variable memory · 06c1c049
      Committed by Gianluca Borello
      Currently, helpers that read and write from/to the stack can do so using
      a pair of arguments of type ARG_PTR_TO_STACK and ARG_CONST_STACK_SIZE.
      ARG_CONST_STACK_SIZE accepts a constant register of type CONST_IMM, so
      that the verifier can safely check the memory access. However, requiring
      the argument to be a constant can be limiting in some circumstances.
      
      Since the current logic keeps track of the minimum and maximum value of
      a register throughout the simulated execution, ARG_CONST_STACK_SIZE can
      be changed to also accept an UNKNOWN_VALUE register in case its
      boundaries have been set and the range doesn't cause invalid memory
      accesses.
      
      One common situation when this is useful:
      
      int len;
      char buf[BUFSIZE]; /* BUFSIZE is 128 */
      
      if (some_condition)
      	len = 42;
      else
      	len = 84;
      
      some_helper(..., buf, len & (BUFSIZE - 1));
      
      The compiler can often decide to assign the constant values 42 or
      84 to a variable on the stack, instead of keeping it in a register.
      When the variable is then read back from the stack into a register
      in order to
      be passed to the helper, the verifier will not be able to recognize the
      register as constant (the verifier is not currently tracking all
      constant writes into memory), and the program won't be valid.
      
      However, by allowing the helper to accept an UNKNOWN_VALUE register,
      this program will work because the bitwise AND operation will set the
      range of possible values for the UNKNOWN_VALUE register to [0, BUFSIZE),
      so the verifier can guarantee the helper call will be safe (assuming the
      argument is of type ARG_CONST_STACK_SIZE_OR_ZERO, otherwise one more
      check against 0 would be needed). Custom ranges can be set not only with
      ALU operations, but also by explicitly comparing the UNKNOWN_VALUE
      register with constants.
      
      Another very common example happens when intercepting system call
      arguments and accessing user-provided data of variable size using
      bpf_probe_read(). One can load the user-provided length at runtime
      into an UNKNOWN_VALUE register, and then read that exact amount of
      data, up to a compile-time determined limit, in order to fit into
      the proper local storage allocated on the stack, without having to
      guess a suboptimal access size at compile time (see the sketch
      below).
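
      A hedged sketch of that pattern, in the style of the example above
      (the kprobe target, the pt_regs fields used for the arguments and
      the usual SEC()/bpf_probe_read() definitions from bpf_helpers.h
      are assumptions; as noted above, the explicit zero check is only
      needed if the size argument is not ARG_CONST_STACK_SIZE_OR_ZERO):

      #define BUFSIZE 256	/* compile-time limit, power of two */

      SEC("kprobe/sys_write")
      int bpf_sys_write(struct pt_regs *ctx)
      {
      	char buf[BUFSIZE];
      	/* user-provided length, only known at run time */
      	unsigned long len = ctx->dx;

      	len &= BUFSIZE - 1;	/* bound the register to [0, BUFSIZE) */
      	if (!len)
      		return 0;
      	bpf_probe_read(buf, len, (void *)ctx->si);
      	/* consume buf, e.g. via bpf_perf_event_output() */
      	return 0;
      }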
      
      Also, in case a helper accepting the UNKNOWN_VALUE register
      operates in raw mode, disable the raw mode so that the program is
      required to initialize all memory, since there is no guarantee the
      helper will fill it completely, leaving possibilities for a data
      leak (only relevant when the memory used by the helper is the
      stack, not when using a pointer to a map element value or a
      packet). In other words, ARG_PTR_TO_RAW_STACK will be treated as
      ARG_PTR_TO_STACK.
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: allow adjusted map element values to spill · f0318d01
      Committed by Gianluca Borello
      commit 48461135 ("bpf: allow access into map value arrays")
      introduces the ability to do pointer math inside a map element value via
      the PTR_TO_MAP_VALUE_ADJ register type.
      
      The current support doesn't handle the case where a PTR_TO_MAP_VALUE_ADJ
      is spilled into the stack, limiting several use cases, especially when
      generating bpf code from a compiler.
      
      Handle this case by explicitly enabling the register type
      PTR_TO_MAP_VALUE_ADJ to be spilled. Also, make sure that min_value and
      max_value are reset just for BPF_LDX operations that don't result in a
      restore of a spilled register from stack.
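
      A hedged illustration of the kind of code a compiler can emit (the
      map layout, kprobe target and bounds are made up, and the usual
      bpf_helpers.h definitions are assumed): after the bounds check,
      'p + idx' is a PTR_TO_MAP_VALUE_ADJ, and with enough live
      variables clang may spill that adjusted pointer to the stack and
      reload it before the access, which previously made the verifier
      reject the program.

      struct bpf_map_def SEC("maps") scratch = {
      	.type        = BPF_MAP_TYPE_ARRAY,
      	.key_size    = sizeof(u32),
      	.value_size  = 64,
      	.max_entries = 1,
      };

      SEC("kprobe/sys_open")
      int bpf_sys_open(struct pt_regs *ctx)
      {
      	int key = 0;
      	u64 idx = ctx->di;
      	char *p = bpf_map_lookup_elem(&scratch, &key);

      	if (!p)
      		return 0;
      	if (idx >= 64)		/* bound the adjustment */
      		return 0;
      	p += idx;		/* p becomes PTR_TO_MAP_VALUE_ADJ */
      	/* ... more work; the compiler may spill/reload p here ... */
      	return *p;
      }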
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bpf: allow helpers access to map element values · 5722569b
      Committed by Gianluca Borello
      Enable helpers to directly access a map element value by passing a
      register type PTR_TO_MAP_VALUE (or PTR_TO_MAP_VALUE_ADJ) to helper
      arguments ARG_PTR_TO_STACK or ARG_PTR_TO_RAW_STACK.
      
      This enables several use cases. For example, a typical tracing program
      might want to capture pathnames passed to sys_open() with:
      
      struct trace_data {
      	char pathname[PATHLEN];
      };
      
      SEC("kprobe/sys_open")
      void bpf_sys_open(struct pt_regs *ctx)
      {
      	struct trace_data data;
      	bpf_probe_read(data.pathname, sizeof(data.pathname), ctx->di);
      
      	/* consume data.pathname, for example via
      	 * bpf_trace_printk() or bpf_perf_event_output()
      	 */
      }
      
      Such a program could easily hit the stack limit in case PATHLEN needs to
      be large or more local variables need to exist, both of which are quite
      common scenarios. By allowing direct helper access to map element
      values, one could instead do:
      
      struct bpf_map_def SEC("maps") scratch_map = {
      	.type = BPF_MAP_TYPE_PERCPU_ARRAY,
      	.key_size = sizeof(u32),
      	.value_size = sizeof(struct trace_data),
      	.max_entries = 1,
      };
      
      SEC("kprobe/sys_open")
      int bpf_sys_open(struct pt_regs *ctx)
      {
      	int id = 0;
      	struct trace_data *p = bpf_map_lookup_elem(&scratch_map, &id);
      	if (!p)
      		return 0;
      	bpf_probe_read(p->pathname, sizeof(p->pathname), ctx->di);
      
      	/* consume p->pathname, for example via
      	 * bpf_trace_printk() or bpf_perf_event_output()
      	 */
      	return 0;
      }
      
      And it wouldn't risk exhausting the stack.
      
      Code changes are loosely modeled after commit 6841de8b ("bpf: allow
      helpers access the packet directly"). Unlike with PTR_TO_PACKET, these
      changes just work with ARG_PTR_TO_STACK and ARG_PTR_TO_RAW_STACK (not
      ARG_PTR_TO_MAP_KEY, ARG_PTR_TO_MAP_VALUE, ...): adding those would be
      trivial, but since there is not currently a use case for that, it's
      reasonable to limit the set of changes.
      
      Also, add new tests to make sure accesses to map element values
      from helpers never go out of bounds, even when adjusted.
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 06 Jan, 2017: 5 commits
  14. 04 Jan, 2017: 6 commits
  15. 03 Jan, 2017: 4 commits