1. 02 12月, 2016 1 次提交
    • T
      bpf: Add tests and samples for LWT-BPF · f74599f7
      Thomas Graf 提交于
      Adds a series of tests to verify the functionality of attaching
      BPF programs at LWT hooks.
      
      Also adds a sample which collects a histogram of packet sizes which
      pass through an LWT hook.
      
      $ ./lwt_len_hist.sh
      Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.253.2 () port 0 AF_INET : demo
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    39857.69
             1 -> 1        : 0        |                                      |
             2 -> 3        : 0        |                                      |
             4 -> 7        : 0        |                                      |
             8 -> 15       : 0        |                                      |
            16 -> 31       : 0        |                                      |
            32 -> 63       : 22       |                                      |
            64 -> 127      : 98       |                                      |
           128 -> 255      : 213      |                                      |
           256 -> 511      : 1444251  |********                              |
           512 -> 1023     : 660610   |***                                   |
          1024 -> 2047     : 535241   |**                                    |
          2048 -> 4095     : 19       |                                      |
          4096 -> 8191     : 180      |                                      |
          8192 -> 16383    : 5578023  |************************************* |
         16384 -> 32767    : 632099   |***                                   |
         32768 -> 65535    : 6575     |                                      |
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f74599f7
  2. 01 12月, 2016 1 次提交
  3. 28 11月, 2016 1 次提交
    • D
      bpf: fix multiple issues in selftest suite and samples · e00c7b21
      Daniel Borkmann 提交于
      1) The test_lru_map and test_lru_dist fails building on my machine since
         the sys/resource.h header is not included.
      
      2) test_verifier fails in one test case where we try to call an invalid
         function, since the verifier log output changed wrt printing function
         names.
      
      3) Current selftest suite code relies on sysconf(_SC_NPROCESSORS_CONF) for
         retrieving the number of possible CPUs. This is broken at least in our
         scenario and really just doesn't work.
      
         glibc tries a number of things for retrieving _SC_NPROCESSORS_CONF.
         First it tries equivalent of /sys/devices/system/cpu/cpu[0-9]* | wc -l,
         if that fails, depending on the config, it either tries to count CPUs
         in /proc/cpuinfo, or returns the _SC_NPROCESSORS_ONLN value instead.
         If /proc/cpuinfo has some issue, it returns just 1 worst case. This
         oddity is nothing new [1], but semantics/behaviour seems to be settled.
         _SC_NPROCESSORS_ONLN will parse /sys/devices/system/cpu/online, if
         that fails it looks into /proc/stat for cpuX entries, and if also that
         fails for some reason, /proc/cpuinfo is consulted (and returning 1 if
         unlikely all breaks down).
      
         While that might match num_possible_cpus() from the kernel in some
         cases, it's really not guaranteed with CPU hotplugging, and can result
         in a buffer overflow since the array in user space could have too few
         number of slots, and on perpcu map lookup, the kernel will write beyond
         that memory of the value buffer.
      
         William Tu reported such mismatches:
      
           [...] The fact that sysconf(_SC_NPROCESSORS_CONF) != num_possible_cpu()
           happens when CPU hotadd is enabled. For example, in Fusion when
           setting vcpu.hotadd = "TRUE" or in KVM, setting ./qemu-system-x86_64
           -smp 2, maxcpus=4 ... the num_possible_cpu() will be 4 and sysconf()
           will be 2 [2]. [...]
      
         Documentation/cputopology.txt says /sys/devices/system/cpu/possible
         outputs cpu_possible_mask. That is the same as in num_possible_cpus(),
         so first step would be to fix the _SC_NPROCESSORS_CONF calls with our
         own implementation. Later, we could add support to bpf(2) for passing
         a mask via CPU_SET(3), for example, to just select a subset of CPUs.
      
         BPF samples code needs this fix as well (at least so that people stop
         copying this). Thus, define bpf_num_possible_cpus() once in selftests
         and import it from there for the sample code to avoid duplicating it.
         The remaining sysconf(_SC_NPROCESSORS_CONF) in samples are unrelated.
      
      After all three issues are fixed, the test suite runs fine again:
      
        # make run_tests | grep self
        selftests: test_verifier [PASS]
        selftests: test_maps [PASS]
        selftests: test_lru_map [PASS]
        selftests: test_kmod.sh [PASS]
      
        [1] https://www.sourceware.org/ml/libc-alpha/2011-06/msg00079.html
        [2] https://www.mail-archive.com/netdev@vger.kernel.org/msg121183.html
      
      Fixes: 3059303f ("samples/bpf: update tracex[23] examples to use per-cpu maps")
      Fixes: 86af8b41 ("Add sample for adding simple drop program to link")
      Fixes: df570f57 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY")
      Fixes: e1559671 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_HASH")
      Fixes: ebb676da ("bpf: Print function name in addition to function id")
      Fixes: 5db58faf ("bpf: Add tests for the LRU bpf_htab")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: William Tu <u9012063@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e00c7b21
  4. 26 11月, 2016 1 次提交
    • D
      samples: bpf: add userspace example for attaching eBPF programs to cgroups · d8c5b17f
      Daniel Mack 提交于
      Add a simple userpace program to demonstrate the new API to attach eBPF
      programs to cgroups. This is what it does:
      
       * Create arraymap in kernel with 4 byte keys and 8 byte values
      
       * Load eBPF program
      
         The eBPF program accesses the map passed in to store two pieces of
         information. The number of invocations of the program, which maps
         to the number of packets received, is stored to key 0. Key 1 is
         incremented on each iteration by the number of bytes stored in
         the skb.
      
       * Detach any eBPF program previously attached to the cgroup
      
       * Attach the new program to the cgroup using BPF_PROG_ATTACH
      
       * Once a second, read map[0] and map[1] to see how many bytes and
         packets were seen on any socket of tasks in the given cgroup.
      
      The program takes a cgroup path as 1st argument, and either "ingress"
      or "egress" as 2nd. Optionally, "drop" can be passed as 3rd argument,
      which will make the generated eBPF program return 0 instead of 1, so
      the kernel will drop the packet.
      
      libbpf gained two new wrappers for the new syscall commands.
      Signed-off-by: NDaniel Mack <daniel@zonque.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8c5b17f
  5. 16 11月, 2016 1 次提交
    • M
      bpf: Add tests for the LRU bpf_htab · 5db58faf
      Martin KaFai Lau 提交于
      This patch has some unit tests and a test_lru_dist.
      
      The test_lru_dist reads in the numeric keys from a file.
      The files used here are generated by a modified fio-genzipf tool
      originated from the fio test suit.  The sample data file can be
      found here: https://github.com/iamkafai/bpf-lru
      
      The zipf.* data files have 100k numeric keys and the key is also
      ranged from 1 to 100k.
      
      The test_lru_dist outputs the number of unique keys (nr_unique).
      F.e. The following means, 61239 of them is unique out of 100k keys.
      nr_misses means it cannot be found in the LRU map, so nr_misses
      must be >= nr_unique. test_lru_dist also simulates a perfect LRU
      map as a comparison:
      
      [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
      /root/zipf.100k.a1_01.out 4000 1
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x0):
          task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
          task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
      ....
      test_parallel_lru_dist (map_type:9 map_flags:0x2):
          task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
          task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
      
      [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
      /root/zipf.100k.a0_01.out 40000 1
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x0):
          task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
          task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x2):
          task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
          task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
      
      LRU map has also been added to map_perf_test:
      /* Global LRU */
      [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
      ./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
       1 cpus: 2934082 updates
       4 cpus: 7391434 updates
       8 cpus: 6500576 updates
      
      /* Percpu LRU */
      [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
      ./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
        1 cpus: 2896553 updates
        4 cpus: 9766395 updates
        8 cpus: 17460553 updates
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5db58faf
  6. 13 11月, 2016 1 次提交
    • M
      bpf: Add test for bpf_redirect to ipip/ip6tnl · 90e02896
      Martin KaFai Lau 提交于
      The test creates two netns, ns1 and ns2.  The host (the default netns)
      has an ipip or ip6tnl dev configured for tunneling traffic to the ns2.
      
          ping VIPS from ns1 <----> host <--tunnel--> ns2 (VIPs at loopback)
      
      The test is to have ns1 pinging VIPs configured at the loopback
      interface in ns2.
      
      The VIPs are 10.10.1.102 and 2401:face::66 (which are configured
      at lo@ns2). [Note: 0x66 => 102].
      
      At ns1, the VIPs are routed _via_ the host.
      
      At the host, bpf programs are installed at the veth to redirect packets
      from a veth to the ipip/ip6tnl.  The test is configured in a way so
      that both ingress and egress can be tested.
      
      At ns2, the ipip/ip6tnl dev is configured with the local and remote address
      specified.  The return path is routed to the dev ipip/ip6tnl.
      
      During egress test, the host also locally tests pinging the VIPs to ensure
      that bpf_redirect at egress also works for the direct egress (i.e. not
      forwarding from dev ve1 to ve2).
      Acked-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90e02896
  7. 18 10月, 2016 1 次提交
    • D
      bpf: add initial suite for selftests · 5aa5bd14
      Daniel Borkmann 提交于
      Add a start of a test suite for kernel selftests. This moves test_verifier
      and test_maps over to tools/testing/selftests/bpf/ along with various
      code improvements and also adds a script for invoking test_bpf module.
      The test suite can simply be run via selftest framework, f.e.:
      
        # cd tools/testing/selftests/bpf/
        # make
        # make run_tests
      
      Both test_verifier and test_maps were kind of misplaced in samples/bpf/
      directory and we were looking into adding them to selftests for a while
      now, so it can be picked up by kbuild bot et al and hopefully also get
      more exposure and thus new test case additions.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5aa5bd14
  8. 03 9月, 2016 2 次提交
  9. 20 8月, 2016 1 次提交
    • W
      samples/bpf: Add tunnel set/get tests. · 6afb1e28
      William Tu 提交于
      The patch creates sample code exercising bpf_skb_{set,get}_tunnel_key,
      and bpf_skb_{set,get}_tunnel_opt for GRE, VXLAN, and GENEVE.  A native
      tunnel device is created in a namespace to interact with a lwtunnel
      device out of the namespace, with metadata enabled.  The bpf_skb_set_*
      program is attached to tc egress and bpf_skb_get_* is attached to egress
      qdisc.  A ping between two tunnels is used to verify correctness and
      the result of bpf_skb_get_* printed by bpf_trace_printk.
      Signed-off-by: NWilliam Tu <u9012063@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6afb1e28
  10. 13 8月, 2016 1 次提交
    • S
      samples/bpf: Add test_current_task_under_cgroup test · 9e6e60ec
      Sargun Dhillon 提交于
      This test has a BPF program which writes the last known pid to call the
      sync syscall within a given cgroup to a map.
      
      The user mode program creates its own mount namespace, and mounts the
      cgroupsv2  hierarchy in there, as on all current test systems
      (Ubuntu 16.04, Debian), the cgroupsv2 vfs is unmounted by default.
      Once it does this, it proceeds to test.
      
      The test checks for positive and negative condition. It ensures that
      when it's part of a given cgroup, its pid is captured in the map,
      and that when it leaves the cgroup, this doesn't happen.
      
      It populate a cgroups arraymap prior to execution in userspace. This means
      that the program must be run in the same cgroups namespace as the programs
      that are being traced.
      Signed-off-by: NSargun Dhillon <sargun@sargun.me>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9e6e60ec
  11. 26 7月, 2016 1 次提交
    • S
      samples/bpf: Add test/example of using bpf_probe_write_user bpf helper · cf9b1199
      Sargun Dhillon 提交于
      This example shows using a kprobe to act as a dnat mechanism to divert
      traffic for arbitrary endpoints. It rewrite the arguments to a syscall
      while they're still in userspace, and before the syscall has a chance
      to copy the argument into kernel space.
      
      Although this is an example, it also acts as a test because the mapped
      address is 255.255.255.255:555 -> real address, and that's not a legal
      address to connect to. If the helper is broken, the example will fail
      on the intermediate steps, as well as the final step to verify the
      rewrite of userspace memory succeeded.
      Signed-off-by: NSargun Dhillon <sargun@sargun.me>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf9b1199
  12. 20 7月, 2016 2 次提交
    • B
      bpf: add sample for xdp forwarding and rewrite · 764cbcce
      Brenden Blanco 提交于
      Add a sample that rewrites and forwards packets out on the same
      interface. Observed single core forwarding performance of ~10Mpps.
      
      Since the mlx4 driver under test recycles every single packet page, the
      perf output shows almost exclusively just the ring management and bpf
      program work. Slowdowns are likely occurring due to cache misses.
      Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      764cbcce
    • B
      Add sample for adding simple drop program to link · 86af8b41
      Brenden Blanco 提交于
      Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
      hook of a link. With the drop-only program, observed single core rate is
      ~20Mpps.
      
      Other tests were run, for instance without the dropcnt increment or
      without reading from the packet header, the packet rate was mostly
      unchanged.
      
      $ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex)
      proto 17:   20403027 drops/s
      
      ./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
      Running... ctrl^C to stop
      Device: eth4@0
      Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags)
        5056638pps 2427Mb/sec (2427186240bps) errors: 0
      Device: eth4@1
      Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags)
        5133311pps 2463Mb/sec (2463989280bps) errors: 0
      Device: eth4@2
      Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags)
        5077431pps 2437Mb/sec (2437166880bps) errors: 0
      Device: eth4@3
      Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags)
        5043067pps 2420Mb/sec (2420672160bps) errors: 0
      
      perf report --no-children:
       26.05%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_process_rx_cq
       17.84%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_alloc_frags
        5.52%  ksoftirqd/0  [mlx4_en]         [k] mlx4_en_free_frag
        4.90%  swapper      [kernel.vmlinux]  [k] poll_idle
        4.14%  ksoftirqd/0  [kernel.vmlinux]  [k] get_page_from_freelist
        2.78%  ksoftirqd/0  [kernel.vmlinux]  [k] __free_pages_ok
        2.57%  ksoftirqd/0  [kernel.vmlinux]  [k] bpf_map_lookup_elem
        2.51%  swapper      [mlx4_en]         [k] mlx4_en_process_rx_cq
        1.94%  ksoftirqd/0  [kernel.vmlinux]  [k] percpu_array_map_lookup_elem
        1.45%  swapper      [mlx4_en]         [k] mlx4_en_alloc_frags
        1.35%  ksoftirqd/0  [kernel.vmlinux]  [k] free_one_page
        1.33%  swapper      [kernel.vmlinux]  [k] intel_idle
        1.04%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5c5
        0.96%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c58d
        0.93%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6ee
        0.92%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c6b9
        0.89%  ksoftirqd/0  [kernel.vmlinux]  [k] __alloc_pages_nodemask
        0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c686
        0.83%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5d5
        0.78%  ksoftirqd/0  [mlx4_en]         [k] mlx4_alloc_pages.isra.23
        0.77%  ksoftirqd/0  [mlx4_en]         [k] 0x000000000001c5b4
        0.77%  ksoftirqd/0  [kernel.vmlinux]  [k] net_rx_action
      
      machine specs:
       receiver - Intel E5-1630 v3 @ 3.70GHz
       sender - Intel E5645 @ 2.40GHz
       Mellanox ConnectX-3 @40G
      Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86af8b41
  13. 02 7月, 2016 1 次提交
    • M
      cgroup: bpf: Add an example to do cgroup checking in BPF · a3f74617
      Martin KaFai Lau 提交于
      test_cgrp2_array_pin.c:
      A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY),
      pouplates/updates it with a cgroup2's backed fd and pins it to a
      bpf-fs's file.  The pinned file can be loaded by tc and then used
      by the bpf prog later.  This program can also update an existing pinned
      array and it could be useful for debugging/testing purpose.
      
      test_cgrp2_tc_kern.c:
      A bpf prog which should be loaded by tc.  It is to demonstrate
      the usage of bpf_skb_in_cgroup.
      
      test_cgrp2_tc.sh:
      A script that glues the test_cgrp2_array_pin.c and
      test_cgrp2_tc_kern.c together.  The idea is like:
      1. Load the test_cgrp2_tc_kern.o by tc
      2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
         with a cgroup fd
      3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
         dropped because of a match on the cgroup
      
      Most of the lines in test_cgrp2_tc.sh is the boilerplate
      to setup the cgroup/bpf-fs/net-devices/netns...etc.  It is
      not bulletproof on errors but should work well enough and
      give enough debug info if things did not go well.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3f74617
  14. 07 5月, 2016 1 次提交
    • A
      samples/bpf: add 'pointer to packet' tests · 65d472fb
      Alexei Starovoitov 提交于
      parse_simple.c - packet parser exapmle with single length check that
      filters out udp packets for port 9
      
      parse_varlen.c - variable length parser that understand multiple vlan headers,
      ipip, ipip6 and ip options to filter out udp or tcp packets on port 9.
      The packet is parsed layer by layer with multitple length checks.
      
      parse_ldabs.c - classic style of packet parsing using LD_ABS instruction.
      Same functionality as parse_simple.
      
      simple = 24.1Mpps per core
      varlen = 22.7Mpps
      ldabs  = 21.4Mpps
      
      Parser with LD_ABS instructions is slower than full direct access parser
      which does more packet accesses and checks.
      
      These examples demonstrate the choice bpf program authors can make between
      flexibility of the parser vs speed.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      65d472fb
  15. 30 4月, 2016 4 次提交
  16. 08 4月, 2016 1 次提交
    • A
      samples/bpf: add tracepoint vs kprobe performance tests · e3edfdec
      Alexei Starovoitov 提交于
      the first microbenchmark does
      fd=open("/proc/self/comm");
      for() {
        write(fd, "test");
      }
      and on 4 cpus in parallel:
                                            writes per sec
      base (no tracepoints, no kprobes)         930k
      with kprobe at __set_task_comm()          420k
      with tracepoint at task:task_rename       730k
      
      For kprobe + full bpf program manully fetches oldcomm, newcomm via bpf_probe_read.
      For tracepint bpf program does nothing, since arguments are copied by tracepoint.
      
      2nd microbenchmark does:
      fd=open("/dev/urandom");
      for() {
        read(fd, buf);
      }
      and on 4 cpus in parallel:
                                             reads per sec
      base (no tracepoints, no kprobes)         300k
      with kprobe at urandom_read()             279k
      with tracepoint at random:urandom_read    290k
      
      bpf progs attached to kprobe and tracepoint are noop.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3edfdec
  17. 07 4月, 2016 1 次提交
  18. 09 3月, 2016 2 次提交
  19. 20 2月, 2016 1 次提交
    • A
      samples/bpf: offwaketime example · a6ffe7b9
      Alexei Starovoitov 提交于
      This is simplified version of Brendan Gregg's offwaketime:
      This program shows kernel stack traces and task names that were blocked and
      "off-CPU", along with the stack traces and task names for the threads that woke
      them, and the total elapsed time from when they blocked to when they were woken
      up. The combined stacks, task names, and total time is summarized in kernel
      context for efficiency.
      
      Example:
      $ sudo ./offwaketime | flamegraph.pl > demo.svg
      Open demo.svg in the browser as FlameGraph visualization.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6ffe7b9
  20. 17 11月, 2015 1 次提交
  21. 03 11月, 2015 1 次提交
    • D
      bpf: add sample usages for persistent maps/progs · 42984d7c
      Daniel Borkmann 提交于
      This patch adds a couple of stand-alone examples on how BPF_OBJ_PIN
      and BPF_OBJ_GET commands can be used.
      
      Example with maps:
      
        # ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
        bpf: map fd:3 (Success)
        bpf: pin ret:(0,Success)
        bpf: fd:3 u->(1:42) ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):42 ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
        bpf: get fd:3 (Success)
        bpf: fd:3 u->(1:24) ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):24 ret:(0,Success)
      
        # ./fds_example -F /sys/fs/bpf/m2 -P -m
        bpf: map fd:3 (Success)
        bpf: pin ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
        bpf: get fd:3 (Success)
        bpf: fd:3 l->(1):0 ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/m2 -G -m
        bpf: get fd:3 (Success)
      
      Example with progs:
      
        # ./fds_example -F /sys/fs/bpf/p -P -p
        bpf: prog fd:3 (Success)
        bpf: pin ret:(0,Success)
        bpf sock:4 <- fd:3 attached ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/p -G -p
        bpf: get fd:3 (Success)
        bpf: sock:4 <- fd:3 attached ret:(0,Success)
      
        # ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o
        bpf: prog fd:5 (Success)
        bpf: pin ret:(0,Success)
        bpf: sock:3 <- fd:5 attached ret:(0,Success)
        # ./fds_example -F /sys/fs/bpf/p2 -G -p
        bpf: get fd:3 (Success)
        bpf: sock:4 <- fd:3 attached ret:(0,Success)
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42984d7c
  22. 22 10月, 2015 1 次提交
  23. 10 8月, 2015 1 次提交
  24. 23 6月, 2015 1 次提交
    • D
      bpf: BPF based latency tracing · 0fb1170e
      Daniel Wagner 提交于
      BPF offers another way to generate latency histograms. We attach
      kprobes at trace_preempt_off and trace_preempt_on and calculate the
      time it takes to from seeing the off/on transition.
      
      The first array is used to store the start time stamp. The key is the
      CPU id. The second array stores the log2(time diff). We need to use
      static allocation here (array and not hash tables). The kprobes
      hooking into trace_preempt_on|off should not calling any dynamic
      memory allocation or free path. We need to avoid recursivly
      getting called. Besides that, it reduces jitter in the measurement.
      
      CPU 0
            latency        : count     distribution
             1 -> 1        : 0        |                                        |
             2 -> 3        : 0        |                                        |
             4 -> 7        : 0        |                                        |
             8 -> 15       : 0        |                                        |
            16 -> 31       : 0        |                                        |
            32 -> 63       : 0        |                                        |
            64 -> 127      : 0        |                                        |
           128 -> 255      : 0        |                                        |
           256 -> 511      : 0        |                                        |
           512 -> 1023     : 0        |                                        |
          1024 -> 2047     : 0        |                                        |
          2048 -> 4095     : 166723   |*************************************** |
          4096 -> 8191     : 19870    |***                                     |
          8192 -> 16383    : 6324     |                                        |
         16384 -> 32767    : 1098     |                                        |
         32768 -> 65535    : 190      |                                        |
         65536 -> 131071   : 179      |                                        |
        131072 -> 262143   : 18       |                                        |
        262144 -> 524287   : 4        |                                        |
        524288 -> 1048575  : 1363     |                                        |
      CPU 1
            latency        : count     distribution
             1 -> 1        : 0        |                                        |
             2 -> 3        : 0        |                                        |
             4 -> 7        : 0        |                                        |
             8 -> 15       : 0        |                                        |
            16 -> 31       : 0        |                                        |
            32 -> 63       : 0        |                                        |
            64 -> 127      : 0        |                                        |
           128 -> 255      : 0        |                                        |
           256 -> 511      : 0        |                                        |
           512 -> 1023     : 0        |                                        |
          1024 -> 2047     : 0        |                                        |
          2048 -> 4095     : 114042   |*************************************** |
          4096 -> 8191     : 9587     |**                                      |
          8192 -> 16383    : 4140     |                                        |
         16384 -> 32767    : 673      |                                        |
         32768 -> 65535    : 179      |                                        |
         65536 -> 131071   : 29       |                                        |
        131072 -> 262143   : 4        |                                        |
        262144 -> 524287   : 1        |                                        |
        524288 -> 1048575  : 364      |                                        |
      CPU 2
            latency        : count     distribution
             1 -> 1        : 0        |                                        |
             2 -> 3        : 0        |                                        |
             4 -> 7        : 0        |                                        |
             8 -> 15       : 0        |                                        |
            16 -> 31       : 0        |                                        |
            32 -> 63       : 0        |                                        |
            64 -> 127      : 0        |                                        |
           128 -> 255      : 0        |                                        |
           256 -> 511      : 0        |                                        |
           512 -> 1023     : 0        |                                        |
          1024 -> 2047     : 0        |                                        |
          2048 -> 4095     : 40147    |*************************************** |
          4096 -> 8191     : 2300     |*                                       |
          8192 -> 16383    : 828      |                                        |
         16384 -> 32767    : 178      |                                        |
         32768 -> 65535    : 59       |                                        |
         65536 -> 131071   : 2        |                                        |
        131072 -> 262143   : 0        |                                        |
        262144 -> 524287   : 1        |                                        |
        524288 -> 1048575  : 174      |                                        |
      CPU 3
            latency        : count     distribution
             1 -> 1        : 0        |                                        |
             2 -> 3        : 0        |                                        |
             4 -> 7        : 0        |                                        |
             8 -> 15       : 0        |                                        |
            16 -> 31       : 0        |                                        |
            32 -> 63       : 0        |                                        |
            64 -> 127      : 0        |                                        |
           128 -> 255      : 0        |                                        |
           256 -> 511      : 0        |                                        |
           512 -> 1023     : 0        |                                        |
          1024 -> 2047     : 0        |                                        |
          2048 -> 4095     : 29626    |*************************************** |
          4096 -> 8191     : 2704     |**                                      |
          8192 -> 16383    : 1090     |                                        |
         16384 -> 32767    : 160      |                                        |
         32768 -> 65535    : 72       |                                        |
         65536 -> 131071   : 32       |                                        |
        131072 -> 262143   : 26       |                                        |
        262144 -> 524287   : 12       |                                        |
        524288 -> 1048575  : 298      |                                        |
      
      All this is based on the trace3 examples written by
      Alexei Starovoitov <ast@plumgrid.com>.
      Signed-off-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: linux-kernel@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0fb1170e
  25. 22 5月, 2015 2 次提交
    • A
      samples/bpf: bpf_tail_call example for networking · 530b2c86
      Alexei Starovoitov 提交于
      Usage:
      $ sudo ./sockex3
      IP     src.port -> dst.port               bytes      packets
      127.0.0.1.42010 -> 127.0.0.1.12865         1568            8
      127.0.0.1.59526 -> 127.0.0.1.33778     11422636       173070
      127.0.0.1.33778 -> 127.0.0.1.59526  11260224828       341974
      127.0.0.1.12865 -> 127.0.0.1.42010         1832           12
      IP     src.port -> dst.port               bytes      packets
      127.0.0.1.42010 -> 127.0.0.1.12865         1568            8
      127.0.0.1.59526 -> 127.0.0.1.33778     23198092       351486
      127.0.0.1.33778 -> 127.0.0.1.59526  22972698518       698616
      127.0.0.1.12865 -> 127.0.0.1.42010         1832           12
      
      this example is similar to sockex2 in a way that it accumulates per-flow
      statistics, but it does packet parsing differently.
      sockex2 inlines full packet parser routine into single bpf program.
      This sockex3 example have 4 independent programs that parse vlan, mpls, ip, ipv6
      and one main program that starts the process.
      bpf_tail_call() mechanism allows each program to be small and be called
      on demand potentially multiple times, so that many vlan, mpls, ip in ip,
      gre encapsulations can be parsed. These and other protocol parsers can
      be added or removed at runtime. TLVs can be parsed in similar manner.
      Note, tail_call_cnt dynamic check limits the number of tail calls to 32.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      530b2c86
    • A
      samples/bpf: bpf_tail_call example for tracing · 5bacd780
      Alexei Starovoitov 提交于
      kprobe example that demonstrates how future seccomp programs may look like.
      It attaches to seccomp_phase1() function and tail-calls other BPF programs
      depending on syscall number.
      
      Existing optimized classic BPF seccomp programs generated by Chrome look like:
      if (sd.nr < 121) {
        if (sd.nr < 57) {
          if (sd.nr < 22) {
            if (sd.nr < 7) {
              if (sd.nr < 4) {
                if (sd.nr < 1) {
                  check sys_read
                } else {
                  if (sd.nr < 3) {
                    check sys_write and sys_open
                  } else {
                    check sys_close
                  }
                }
              } else {
            } else {
          } else {
        } else {
      } else {
      }
      
      the future seccomp using native eBPF may look like:
        bpf_tail_call(&sd, &syscall_jmp_table, sd.nr);
      which is simpler, faster and leaves more room for per-syscall checks.
      
      Usage:
      $ sudo ./tracex5
      <...>-366   [001] d...     4.870033: : read(fd=1, buf=00007f6d5bebf000, size=771)
      <...>-369   [003] d...     4.870066: : mmap
      <...>-369   [003] d...     4.870077: : syscall=110 (one of get/set uid/pid/gid)
      <...>-369   [003] d...     4.870089: : syscall=107 (one of get/set uid/pid/gid)
         sh-369   [000] d...     4.891740: : read(fd=0, buf=00000000023d1000, size=512)
         sh-369   [000] d...     4.891747: : write(fd=1, buf=00000000023d3000, size=512)
         sh-369   [000] d...     4.891747: : read(fd=1, buf=00000000023d3000, size=512)
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5bacd780
  26. 13 5月, 2015 1 次提交
  27. 07 4月, 2015 1 次提交
    • A
      tc: bpf: add checksum helpers · 91bc4822
      Alexei Starovoitov 提交于
      Commit 608cd71a ("tc: bpf: generalize pedit action") has added the
      possibility to mangle packet data to BPF programs in the tc pipeline.
      This patch adds two helpers bpf_l3_csum_replace() and bpf_l4_csum_replace()
      for fixing up the protocol checksums after the packet mangling.
      
      It also adds 'flags' argument to bpf_skb_store_bytes() helper to avoid
      unnecessary checksum recomputations when BPF programs adjusting l3/l4
      checksums and documents all three helpers in uapi header.
      
      Moreover, a sample program is added to show how BPF programs can make use
      of the mangle and csum helpers.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91bc4822
  28. 02 4月, 2015 4 次提交
    • A
      samples/bpf: Add kmem_alloc()/free() tracker tool · 9811e353
      Alexei Starovoitov 提交于
      One BPF program attaches to kmem_cache_alloc_node() and
      remembers all allocated objects in the map.
      Another program attaches to kmem_cache_free() and deletes
      corresponding object from the map.
      
      User space walks the map every second and prints any objects
      which are older than 1 second.
      
      Usage:
      
      	$ sudo tracex4
      
      Then start few long living processes. The 'tracex4' will print
      something like this:
      
      	obj 0xffff880465928000 is 13sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff88043181c280 is 13sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff880465848000 is  8sec old was allocated at ip ffffffff8105dc32
      	obj 0xffff8804338bc280 is 15sec old was allocated at ip ffffffff8105dc32
      
      	$ addr2line -fispe vmlinux ffffffff8105dc32
      	do_fork at fork.c:1665
      
      As soon as processes exit the memory is reclaimed and 'tracex4'
      prints nothing.
      
      Similar experiment can be done with the __kmalloc()/kfree() pair.
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-10-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9811e353
    • A
      samples/bpf: Add IO latency analysis (iosnoop/heatmap) tool · 5c7fc2d2
      Alexei Starovoitov 提交于
      BPF C program attaches to
      blk_mq_start_request()/blk_update_request() kprobe events to
      calculate IO latency.
      
      For every completed block IO event it computes the time delta
      in nsec and records in a histogram map:
      
      	map[log10(delta)*10]++
      
      User space reads this histogram map every 2 seconds and prints
      it as a 'heatmap' using gray shades of text terminal. Black
      spaces have many events and white spaces have very few events.
      Left most space is the smallest latency, right most space is
      the largest latency in the range.
      
      Usage:
      
      	$ sudo ./tracex3
      	and do 'sudo dd if=/dev/sda of=/dev/null' in other terminal.
      
      Observe IO latencies and how different activity (like 'make
      kernel') affects it.
      
      Similar experiments can be done for network transmit latencies,
      syscalls, etc.
      
      '-t' flag prints the heatmap using normal ascii characters:
      
      $ sudo ./tracex3 -t
        heatmap of IO latency
        # - many events with this latency
          - few events
      	|1us      |10us     |100us    |1ms      |10ms     |100ms    |1s |10s
      				 *ooo. *O.#.                                    # 221
      			      .  *#     .                                       # 125
      				 ..   .o#*..                                    # 55
      			    .  . .  .  .#O                                      # 37
      				 .#                                             # 175
      				       .#*.                                     # 37
      				  #                                             # 199
      		      .              . *#*.                                     # 55
      				       *#..*                                    # 42
      				  #                                             # 266
      			      ...***Oo#*OO**o#* .                               # 629
      				  #                                             # 271
      				      . .#o* o.*o*                              # 221
      				. . o* *#O..                                    # 50
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-9-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5c7fc2d2
    • A
      samples/bpf: Add counting example for kfree_skb() function calls and the write() syscall · d822a192
      Alexei Starovoitov 提交于
      this example has two probes in one C file that attach to
      different kprove events and use two different maps.
      
      1st probe is x64 specific equivalent of dropmon. It attaches to
      kfree_skb, retrevies 'ip' address of kfree_skb() caller and
      counts number of packet drops at that 'ip' address. User space
      prints 'location - count' map every second.
      
      2nd probe attaches to kprobe:sys_write and computes a histogram
      of different write sizes
      
      Usage:
      	$ sudo tracex2
      	location 0xffffffff81695995 count 1
      	location 0xffffffff816d0da9 count 2
      
      	location 0xffffffff81695995 count 2
      	location 0xffffffff816d0da9 count 2
      
      	location 0xffffffff81695995 count 3
      	location 0xffffffff816d0da9 count 2
      
      	557145+0 records in
      	557145+0 records out
      	285258240 bytes (285 MB) copied, 1.02379 s, 279 MB/s
      		   syscall write() stats
      	     byte_size       : count     distribution
      	       1 -> 1        : 3        |                                      |
      	       2 -> 3        : 0        |                                      |
      	       4 -> 7        : 0        |                                      |
      	       8 -> 15       : 0        |                                      |
      	      16 -> 31       : 2        |                                      |
      	      32 -> 63       : 3        |                                      |
      	      64 -> 127      : 1        |                                      |
      	     128 -> 255      : 1        |                                      |
      	     256 -> 511      : 0        |                                      |
      	     512 -> 1023     : 1118968  |************************************* |
      
      Ctrl-C at any time. Kernel will auto cleanup maps and programs
      
      	$ addr2line -ape ./bld_x64/vmlinux 0xffffffff81695995
      	0xffffffff816d0da9 0xffffffff81695995:
      	./bld_x64/../net/ipv4/icmp.c:1038 0xffffffff816d0da9:
      	./bld_x64/../net/unix/af_unix.c:1231
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-8-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d822a192
    • A
      samples/bpf: Add simple non-portable kprobe filter example · b896c4f9
      Alexei Starovoitov 提交于
      tracex1_kern.c - C program compiled into BPF.
      
      It attaches to kprobe:netif_receive_skb()
      
      When skb->dev->name == "lo", it prints sample debug message into
      trace_pipe via bpf_trace_printk() helper function.
      
      tracex1_user.c - corresponding user space component that:
        - loads BPF program via bpf() syscall
        - opens kprobes:netif_receive_skb event via perf_event_open()
          syscall
        - attaches the program to event via ioctl(event_fd,
          PERF_EVENT_IOC_SET_BPF, prog_fd);
        - prints from trace_pipe
      
      Note, this BPF program is non-portable. It must be recompiled
      with current kernel headers. kprobe is not a stable ABI and
      BPF+kprobe scripts may no longer be meaningful when kernel
      internals change.
      
      No matter in what way the kernel changes, neither the kprobe,
      nor the BPF program can ever crash or corrupt the kernel,
      assuming the kprobes, perf and BPF subsystem has no bugs.
      
      The verifier will detect that the program is using
      bpf_trace_printk() and the kernel will print 'this is a DEBUG
      kernel' warning banner, which means that bpf_trace_printk()
      should be used for debugging of the BPF program only.
      
      Usage:
      $ sudo tracex1
                  ping-19826 [000] d.s2 63103.382648: : skb ffff880466b1ca00 len 84
                  ping-19826 [000] d.s2 63103.382684: : skb ffff880466b1d300 len 84
      
                  ping-19826 [000] d.s2 63104.382533: : skb ffff880466b1ca00 len 84
                  ping-19826 [000] d.s2 63104.382594: : skb ffff880466b1d300 len 84
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Cc: Arnaldo Carvalho de Melo <acme@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/1427312966-8434-7-git-send-email-ast@plumgrid.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b896c4f9
  29. 06 12月, 2014 2 次提交
    • A
      samples: bpf: large eBPF program in C · fbe33108
      Alexei Starovoitov 提交于
      sockex2_kern.c is purposefully large eBPF program in C.
      llvm compiles ~200 lines of C code into ~300 eBPF instructions.
      
      It's similar to __skb_flow_dissect() to demonstrate that complex packet parsing
      can be done by eBPF.
      Then it uses (struct flow_keys)->dst IP address (or hash of ipv6 dst) to keep
      stats of number of packets per IP.
      User space loads eBPF program, attaches it to loopback interface and prints
      dest_ip->#packets stats every second.
      
      Usage:
      $sudo samples/bpf/sockex2
      ip 127.0.0.1 count 19
      ip 127.0.0.1 count 178115
      ip 127.0.0.1 count 369437
      ip 127.0.0.1 count 559841
      ip 127.0.0.1 count 750539
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fbe33108
    • A
      samples: bpf: trivial eBPF program in C · a8085782
      Alexei Starovoitov 提交于
      this example does the same task as previous socket example
      in assembler, but this one does it in C.
      
      eBPF program in kernel does:
          /* assume that packet is IPv4, load one byte of IP->proto */
          int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
          long *value;
      
          value = bpf_map_lookup_elem(&my_map, &index);
          if (value)
              __sync_fetch_and_add(value, 1);
      
      Corresponding user space reads map[tcp], map[udp], map[icmp]
      and prints protocol stats every second
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8085782