1. 30 4月, 2016 4 次提交
  2. 15 4月, 2016 2 次提交
    • D
      bpf, samples: add test cases for raw stack · 3f2050e2
      Daniel Borkmann 提交于
      This adds test cases mostly around ARG_PTR_TO_RAW_STACK to check the
      verifier behaviour.
      
        [...]
        #84 raw_stack: no skb_load_bytes OK
        #85 raw_stack: skb_load_bytes, no init OK
        #86 raw_stack: skb_load_bytes, init OK
        #87 raw_stack: skb_load_bytes, spilled regs around bounds OK
        #88 raw_stack: skb_load_bytes, spilled regs corruption OK
        #89 raw_stack: skb_load_bytes, spilled regs corruption 2 OK
        #90 raw_stack: skb_load_bytes, spilled regs + data OK
        #91 raw_stack: skb_load_bytes, invalid access 1 OK
        #92 raw_stack: skb_load_bytes, invalid access 2 OK
        #93 raw_stack: skb_load_bytes, invalid access 3 OK
        #94 raw_stack: skb_load_bytes, invalid access 4 OK
        #95 raw_stack: skb_load_bytes, invalid access 5 OK
        #96 raw_stack: skb_load_bytes, invalid access 6 OK
        #97 raw_stack: skb_load_bytes, large access OK
        Summary: 98 PASSED, 0 FAILED
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f2050e2
    • D
      bpf, samples: don't zero data when not needed · 02413cab
      Daniel Borkmann 提交于
      Remove the zero initialization in the sample programs where appropriate.
      Note that this is an optimization which is now possible, old programs
      still doing the zero initialization are just fine as well. Also, make
      sure we don't have padding issues when we don't memset() the entire
      struct anymore.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02413cab
  3. 08 4月, 2016 3 次提交
  4. 07 4月, 2016 3 次提交
    • N
      samples/bpf: Enable powerpc support · 138d6153
      Naveen N. Rao 提交于
      Add the necessary definitions for building bpf samples on ppc.
      
      Since ppc doesn't store function return address on the stack, modify how
      PT_REGS_RET() and PT_REGS_FP() work.
      
      Also, introduce PT_REGS_IP() to access the instruction pointer.
      
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      138d6153
    • N
      samples/bpf: Use llc in PATH, rather than a hardcoded value · 128d1514
      Naveen N. Rao 提交于
      While at it, remove the generation of .s files and fix some typos in the
      related comment.
      
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      128d1514
    • N
      samples/bpf: Fix build breakage with map_perf_test_user.c · 77e63534
      Naveen N. Rao 提交于
      Building BPF samples is failing with the below error:
      
      samples/bpf/map_perf_test_user.c: In function ‘main’:
      samples/bpf/map_perf_test_user.c:134:9: error: variable ‘r’ has
      initializer but incomplete type
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
               ^
      samples/bpf/map_perf_test_user.c:134:21: error: ‘RLIM_INFINITY’
      undeclared (first use in this function)
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                           ^
      samples/bpf/map_perf_test_user.c:134:21: note: each undeclared
      identifier is reported only once for each function it appears in
      samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
      struct initializer [enabled by default]
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
               ^
      samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
      for ‘r’) [enabled by default]
      samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
      struct initializer [enabled by default]
      samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
      for ‘r’) [enabled by default]
      samples/bpf/map_perf_test_user.c:134:16: error: storage size of ‘r’
      isn’t known
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                      ^
      samples/bpf/map_perf_test_user.c:139:2: warning: implicit declaration of
      function ‘setrlimit’ [-Wimplicit-function-declaration]
        setrlimit(RLIMIT_MEMLOCK, &r);
        ^
      samples/bpf/map_perf_test_user.c:139:12: error: ‘RLIMIT_MEMLOCK’
      undeclared (first use in this function)
        setrlimit(RLIMIT_MEMLOCK, &r);
                  ^
      samples/bpf/map_perf_test_user.c:134:16: warning: unused variable ‘r’
      [-Wunused-variable]
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                      ^
      make[2]: *** [samples/bpf/map_perf_test_user.o] Error 1
      
      Fix this by including the necessary header file.
      
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77e63534
  5. 09 3月, 2016 7 次提交
  6. 20 2月, 2016 1 次提交
    • A
      samples/bpf: offwaketime example · a6ffe7b9
      Alexei Starovoitov 提交于
      This is simplified version of Brendan Gregg's offwaketime:
      This program shows kernel stack traces and task names that were blocked and
      "off-CPU", along with the stack traces and task names for the threads that woke
      them, and the total elapsed time from when they blocked to when they were woken
      up. The combined stacks, task names, and total time is summarized in kernel
      context for efficiency.
      
      Example:
      $ sudo ./offwaketime | flamegraph.pl > demo.svg
      Open demo.svg in the browser as FlameGraph visualization.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6ffe7b9
  7. 06 2月, 2016 3 次提交
  8. 17 11月, 2015 1 次提交
  9. 03 11月, 2015 1 次提交
    • D
      bpf: add sample usages for persistent maps/progs · 42984d7c
      Daniel Borkmann 提交于
      This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN
      and BPF_OBJ_GET commands can be used.
      
      Example with maps:
      
        # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
        bpf: map fd:3 (Success)
        bpf: pin ret:(0,Success)
        bpf: fd:3 u->(1:42) ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):42 ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
        bpf: get fd:3 (Success)
        bpf: fd:3 u->(1:24) ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):24 ret:(0,Success)
      
        # ./fds_example -F /sys/fs/bpf/m2 -P -m
        bpf: map fd:3 (Success)
        bpf: pin ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):0 ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m2 -G -m
        bpf: get fd:3 (Success)
      
      Example with progs:
      
        # ./fds_example -F /sys/fs/bpf/p -P -p
        bpf: prog fd:3 (Success)
        bpf: pin ret:(0,Success)
        bpf sock:4 <- fd:3 attached ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/p -G -p
        bpf: get fd:3 (Success)
        bpf: sock:4 <- fd:3 attached ret:(0,Success)
      
        # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o
        bpf: prog fd:5 (Success)
        bpf: pin ret:(0,Success)
        bpf: sock:3 <- fd:5 attached ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/p2 -G -p
        bpf: get fd:3 (Success)
        bpf: sock:4 <- fd:3 attached ret:(0,Success)
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42984d7c
  10. 28 10月, 2015 1 次提交
  11. 22 10月, 2015 1 次提交
  12. 13 10月, 2015 1 次提交
    • A
      bpf: add unprivileged bpf tests · bf508877
      Alexei Starovoitov 提交于
      Add new tests samples/bpf/test_verifier:
      
      unpriv: return pointer
        checks that pointer cannot be returned from the eBPF program
      
      unpriv: add const to pointer
      unpriv: add pointer to pointer
      unpriv: neg pointer
        checks that pointer arithmetic is disallowed
      
      unpriv: cmp pointer with const
      unpriv: cmp pointer with pointer
        checks that comparison of pointers is disallowed
        Only one case allowed 'void *value = bpf_map_lookup_elem(..); if (value == 0) ...'
      
      unpriv: check that printk is disallowed
        since bpf_trace_printk is not available to unprivileged
      
      unpriv: pass pointer to helper function
        checks that pointers cannot be passed to functions that expect integers
        If function expects a pointer the verifier allows only that type of pointer.
        Like 1st argument of bpf_map_lookup_elem() must be pointer to map.
        (applies to non-root as well)
      
      unpriv: indirectly pass pointer on stack to helper function
        checks that pointer stored into stack cannot be used as part of key
        passed into bpf_map_lookup_elem()
      
      unpriv: mangle pointer on stack 1
      unpriv: mangle pointer on stack 2
        checks that writing into stack slot that already contains a pointer
        is disallowed
      
      unpriv: read pointer from stack in small chunks
        checks that < 8 byte read from stack slot that contains a pointer is
        disallowed
      
      unpriv: write pointer into ctx
        checks that storing pointers into skb->fields is disallowed
      
      unpriv: write pointer into map elem value
        checks that storing pointers into element values is disallowed
        For example:
        int bpf_prog(struct __sk_buff *skb)
        {
          u32 key = 0;
          u64 *value = bpf_map_lookup_elem(&map, &key);
          if (value)
             *value = (u64) skb;
        }
        will be rejected.
      
      unpriv: partial copy of pointer
        checks that doing 32-bit register mov from register containing
        a pointer is disallowed
      
      unpriv: pass pointer to tail_call
        checks that passing pointer as an index into bpf_tail_call
        is disallowed
      
      unpriv: cmp map pointer with zero
        checks that comparing map pointer with constant is disallowed
      
      unpriv: write into frame pointer
        checks that frame pointer is read-only (applies to root too)
      
      unpriv: cmp of frame pointer
        checks that R10 cannot be using in comparison
      
      unpriv: cmp of stack pointer
        checks that Rx = R10 - imm is ok, but comparing Rx is not
      
      unpriv: obfuscate stack pointer
        checks that Rx = R10 - imm is ok, but Rx -= imm is not
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf508877
  13. 18 9月, 2015 1 次提交
    • A
      bpf: add bpf_redirect() helper · 27b29f63
      Alexei Starovoitov 提交于
      Existing bpf_clone_redirect() helper clones skb before redirecting
      it to RX or TX of destination netdev.
      Introduce bpf_redirect() helper that does that without cloning.
      
      Benchmarked with two hosts using 10G ixgbe NICs.
      One host is doing line rate pktgen.
      Another host is configured as:
      $ tc qdisc add dev $dev ingress
      $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
         action bpf run object-file tcbpf1_kern.o section clone_redirect_xmit drop
      so it receives the packet on $dev and immediately xmits it on $dev + 1
      The section 'clone_redirect_xmit' in tcbpf1_kern.o file has the program
      that does bpf_clone_redirect() and performance is 2.0 Mpps
      
      $ tc filter add dev $dev root pref 10 u32 match u32 0 0 flowid 1:2 \
         action bpf run object-file tcbpf1_kern.o section redirect_xmit drop
      which is using bpf_redirect() - 2.4 Mpps
      
      and using cls_bpf with integrated actions as:
      $ tc filter add dev $dev root pref 10 \
        bpf run object-file tcbpf1_kern.o section redirect_xmit integ_act classid 1
      performance is 2.5 Mpps
      
      To summarize:
      u32+act_bpf using clone_redirect - 2.0 Mpps
      u32+act_bpf using redirect - 2.4 Mpps
      cls_bpf using redirect - 2.5 Mpps
      
      For comparison linux bridge in this setup is doing 2.1 Mpps
      and ixgbe rx + drop in ip_rcv - 7.8 Mpps
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27b29f63
  14. 13 8月, 2015 1 次提交
    • K
      bpf: fix build warnings and add function read_trace_pipe() · 5ed3ccbd
      Kaixu Xia 提交于
      There are two improvements in this patch:
       1. Fix the build warnings;
       2. Add function read_trace_pipe() to print the result on
          the screen;
      
      Before this patch, we can get the result through /sys/kernel/de
      bug/tracing/trace_pipe and get nothing on the screen.
      By applying this patch, the result can be printed on the screen.
        $ ./tracex6
      	...
               tracex6-705   [003] d..1   131.428593: : CPU-3   19981414
                  sshd-683   [000] d..1   131.428727: : CPU-0   221682321
                  sshd-683   [000] d..1   131.428821: : CPU-0   221808766
                  sshd-683   [000] d..1   131.428950: : CPU-0   221982984
                  sshd-683   [000] d..1   131.429045: : CPU-0   222111851
               tracex6-705   [003] d..1   131.429168: : CPU-3   20757551
                  sshd-683   [000] d..1   131.429170: : CPU-0   222281240
                  sshd-683   [000] d..1   131.429261: : CPU-0   222403340
                  sshd-683   [000] d..1   131.429378: : CPU-0   222561024
      	...
      Signed-off-by: NKaixu Xia <xiakaixu@huawei.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5ed3ccbd
  15. 10 8月, 2015 1 次提交
  16. 27 7月, 2015 1 次提交
  17. 09 7月, 2015 1 次提交
  18. 23 6月, 2015 1 次提交
    • D
      bpf: BPF based latency tracing · 0fb1170e
      Daniel Wagner 提交于
      BPF offers another way to generate latency histograms. We attach
      kprobes at trace_preempt_off and trace_preempt_on and calculate the
      time it takes to from seeing the off/on transition.
      
      The first array is used to store the start time stamp. The key is the
      CPU id. The second array stores the log2(time diff). We need to use
      static allocation here (array and not hash tables). The kprobes
      hooking into trace_preempt_on|off should not calling any dynamic
      memory allocation or free path. We need to avoid recursivly
      getting called. Besides that, it reduces jitter in the measurement.
      
      CPU 0
            latency        : count     distribution
             1 -> 1        : 0        |                                        |
             2 -> 3        : 0        |                                        |
             4 -> 7        : 0        |                                        |
             8 -> 15       : 0        |                                        |
            16 -> 31       : 0        |                                        |
            32 -> 63       : 0        |                                        |
            64 -> 127      : 0        |                                        |
           128 -> 255      : 0        |                                        |
           256 -> 511      : 0        |                                        |
           512 -> 1023     : 0        |                                        |
          1024 -> 2047     : 0        |                                        |
          2048 -> 4095     : 166723   |*************************************** |
          4096 -> 8191     : 19870    |***                                     |
          8192 -> 16383    : 6324     |                                        |
         16384 -> 32767    : 1098     |                                        |
         32768 -> 65535    : 190      |                                        |
         65536 -> 131071   : 179      |                                        |
        131072 -> 262143   : 18       |                                        |
        262144 -> 524287   : 4        |                                        |
        524288 -> 1048575  : 1363     |                                        |
      CPU 1
            latency        : count     distribution
             1 -> 1        : 0        |                                        |
             2 -> 3        : 0        |                                        |
             4 -> 7        : 0        |                                        |
             8 -> 15       : 0        |                                        |
            16 -> 31       : 0        |                                        |
            32 -> 63       : 0        |                                        |
            64 -> 127      : 0        |                                        |
           128 -> 255      : 0        |                                        |
           256 -> 511      : 0        |                                        |
           512 -> 1023     : 0        |                                        |
          1024 -> 2047     : 0        |                                        |
          2048 -> 4095     : 114042   |*************************************** |
          4096 -> 8191     : 9587     |**                                      |
          8192 -> 16383    : 4140     |                                        |
         16384 -> 32767    : 673      |                                        |
         32768 -> 65535    : 179      |                                        |
         65536 -> 131071   : 29       |                                        |
        131072 -> 262143   : 4        |                                        |
        262144 -> 524287   : 1        |                                        |
        524288 -> 1048575  : 364      |                                        |
      CPU 2
            latency        : count     distribution
             1 -> 1        : 0        |                                        |
             2 -> 3        : 0        |                                        |
             4 -> 7        : 0        |                                        |
             8 -> 15       : 0        |                                        |
            16 -> 31       : 0        |                                        |
            32 -> 63       : 0        |                                        |
            64 -> 127      : 0        |                                        |
           128 -> 255      : 0        |                                        |
           256 -> 511      : 0        |                                        |
           512 -> 1023     : 0        |                                        |
          1024 -> 2047     : 0        |                                        |
          2048 -> 4095     : 40147    |*************************************** |
          4096 -> 8191     : 2300     |*                                       |
          8192 -> 16383    : 828      |                                        |
         16384 -> 32767    : 178      |                                        |
         32768 -> 65535    : 59       |                                        |
         65536 -> 131071   : 2        |                                        |
        131072 -> 262143   : 0        |                                        |
        262144 -> 524287   : 1        |                                        |
        524288 -> 1048575  : 174      |                                        |
      CPU 3
            latency        : count     distribution
             1 -> 1        : 0        |                                        |
             2 -> 3        : 0        |                                        |
             4 -> 7        : 0        |                                        |
             8 -> 15       : 0        |                                        |
            16 -> 31       : 0        |                                        |
            32 -> 63       : 0        |                                        |
            64 -> 127      : 0        |                                        |
           128 -> 255      : 0        |                                        |
           256 -> 511      : 0        |                                        |
           512 -> 1023     : 0        |                                        |
          1024 -> 2047     : 0        |                                        |
          2048 -> 4095     : 29626    |*************************************** |
          4096 -> 8191     : 2704     |**                                      |
          8192 -> 16383    : 1090     |                                        |
         16384 -> 32767    : 160      |                                        |
         32768 -> 65535    : 72       |                                        |
         65536 -> 131071   : 32       |                                        |
        131072 -> 262143   : 26       |                                        |
        262144 -> 524287   : 12       |                                        |
        524288 -> 1048575  : 298      |                                        |
      
      All this is based on the trace3 examples written by
      Alexei Starovoitov <ast@plumgrid.com>.
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fb1170e
  19. 16 6月, 2015 1 次提交
    • A
      bpf: introduce current->pid, tgid, uid, gid, comm accessors · ffeedafb
      Alexei Starovoitov 提交于
      eBPF programs attached to kprobes need to filter based on
      current->pid, uid and other fields, so introduce helper functions:
      
      u64 bpf_get_current_pid_tgid(void)
      Return: current->tgid << 32 | current->pid
      
      u64 bpf_get_current_uid_gid(void)
      Return: current_gid << 32 | current_uid
      
      bpf_get_current_comm(char *buf, int size_of_buf)
      stores current->comm into buf
      
      They can be used from the programs attached to TC as well to classify packets
      based on current task fields.
      
      Update tracex2 example to print histogram of write syscalls for each process
      instead of aggregated for all.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ffeedafb
  20. 07 6月, 2015 2 次提交
    • A
      bpf: allow programs to write to certain skb fields · d691f9e8
      Alexei Starovoitov 提交于
      allow programs read/write skb->mark, tc_index fields and
      ((struct qdisc_skb_cb *)cb)->data.
      
      mark and tc_index are generically useful in TC.
      cb[0]-cb[4] are primarily used to pass arguments from one
      program to another called via bpf_tail_call() which can
      be seen in sockex3_kern.c example.
      
      All fields of 'struct __sk_buff' are readable to socket and tc_cls_act progs.
      mark, tc_index are writeable from tc_cls_act only.
      cb[0]-cb[4] are writeable by both sockets and tc_cls_act.
      
      Add verifier tests and improve sample code.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d691f9e8
    • A
      bpf: make programs see skb->data == L2 for ingress and egress · 3431205e
      Alexei Starovoitov 提交于
      eBPF programs attached to ingress and egress qdiscs see inconsistent skb->data.
      For ingress L2 header is already pulled, whereas for egress it's present.
      This is known to program writers which are currently forced to use
      BPF_LL_OFF workaround.
      Since programs don't change skb internal pointers it is safe to do
      pull/push right around invocation of the program and earlier taps and
      later pt->func() will not be affected.
      Multiple taps via packet_rcv(), tpacket_rcv() are doing the same trick
      around run_filter/BPF_PROG_RUN even if skb_shared.
      
      This fix finally allows programs to use optimized LD_ABS/IND instructions
      without BPF_LL_OFF for higher performance.
      tc ingress + cls_bpf + samples/bpf/tcbpf1_kern.o
             w/o JIT   w/JIT
      before  20.5     23.6 Mpps
      after   21.8     26.6 Mpps
      
      Old programs with BPF_LL_OFF will still work as-is.
      
      We can now undo most of the earlier workaround commit:
      a166151c ("bpf: fix bpf helpers to use skb->mac_header relative offsets")
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3431205e
  21. 22 5月, 2015 2 次提交
    • A
      samples/bpf: bpf_tail_call example for networking · 530b2c86
      Alexei Starovoitov 提交于
      Usage:
      $ sudo ./sockex3
      IP     src.port -> dst.port               bytes      packets
      127.0.0.1.42010 -> 127.0.0.1.12865         1568            8
      127.0.0.1.59526 -> 127.0.0.1.33778     11422636       173070
      127.0.0.1.33778 -> 127.0.0.1.59526  11260224828       341974
      127.0.0.1.12865 -> 127.0.0.1.42010         1832           12
      IP     src.port -> dst.port               bytes      packets
      127.0.0.1.42010 -> 127.0.0.1.12865         1568            8
      127.0.0.1.59526 -> 127.0.0.1.33778     23198092       351486
      127.0.0.1.33778 -> 127.0.0.1.59526  22972698518       698616
      127.0.0.1.12865 -> 127.0.0.1.42010         1832           12
      
      this example is similar to sockex2 in a way that it accumulates per-flow
      statistics, but it does packet parsing differently.
      sockex2 inlines full packet parser routine into single bpf program.
      This sockex3 example have 4 independent programs that parse vlan, mpls, ip, ipv6
      and one main program that starts the process.
      bpf_tail_call() mechanism allows each program to be small and be called
      on demand potentially multiple times, so that many vlan, mpls, ip in ip,
      gre encapsulations can be parsed. These and other protocol parsers can
      be added or removed at runtime. TLVs can be parsed in similar manner.
      Note, tail_call_cnt dynamic check limits the number of tail calls to 32.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      530b2c86
    • A
      samples/bpf: bpf_tail_call example for tracing · 5bacd780
      Alexei Starovoitov 提交于
      kprobe example that demonstrates how future seccomp programs may look like.
      It attaches to seccomp_phase1() function and tail-calls other BPF programs
      depending on syscall number.
      
      Existing optimized classic BPF seccomp programs generated by Chrome look like:
      if (sd.nr < 121) {
        if (sd.nr < 57) {
          if (sd.nr < 22) {
            if (sd.nr < 7) {
              if (sd.nr < 4) {
                if (sd.nr < 1) {
                  check sys_read
                } else {
                  if (sd.nr < 3) {
                    check sys_write and sys_open
                  } else {
                    check sys_close
                  }
                }
              } else {
            } else {
          } else {
        } else {
      } else {
      }
      
      the future seccomp using native eBPF may look like:
        bpf_tail_call(&sd, &syscall_jmp_table, sd.nr);
      which is simpler, faster and leaves more room for per-syscall checks.
      
      Usage:
      $ sudo ./tracex5
      <...>-366   [001] d...     4.870033: : read(fd=1, buf=00007f6d5bebf000, size=771)
      <...>-369   [003] d...     4.870066: : mmap
      <...>-369   [003] d...     4.870077: : syscall=110 (one of get/set uid/pid/gid)
      <...>-369   [003] d...     4.870089: : syscall=107 (one of get/set uid/pid/gid)
         sh-369   [000] d...     4.891740: : read(fd=0, buf=00000000023d1000, size=512)
         sh-369   [000] d...     4.891747: : write(fd=1, buf=00000000023d3000, size=512)
         sh-369   [000] d...     4.891747: : read(fd=1, buf=00000000023d3000, size=512)
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5bacd780
  22. 13 5月, 2015 1 次提交