1. 03 5月, 2017 1 次提交
  2. 18 4月, 2017 3 次提交
    • M
      bpf: lru: Add map-in-map LRU example · 3a5795b8
      Martin KaFai Lau 提交于
      This patch adds a map-in-map LRU example.
      If we know only a subset of cores will use the
      LRU, we can allocate a common LRU list per targeting core
      and store it into an array-of-hashs.
      
      It allows using the common LRU map with map-update performance
      comparable to the BPF_F_NO_COMMON_LRU map but without wasting memory
      on the unused cores that we know they will never access the LRU map.
      
      BPF_F_NO_COMMON_LRU:
      > map_perf_test 32 8 10000000 10000000 | awk '{sum += $3}END{print sum}'
      9234314 (9.23M/s)
      
      map-in-map LRU:
      > map_perf_test 512 8 1260000 80000000 | awk '{sum += $3}END{print sum}'
      9962743 (9.96M/s)
      
      Notes that the max_entries for the map-in-map LRU test is 1260000 which
      is the max_entries for each inner LRU map.  8 processes have been
      started, so 8 * 1260000 = 10080000 (~10M) which is close to what is
      used in the BPF_F_NO_COMMON_LRU test.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a5795b8
    • M
      bpf: Allow bpf sample programs (*_user.c) to change bpf_map_def · 9fd63d05
      Martin KaFai Lau 提交于
      The current bpf_map_def is statically defined during compile
      time.  This patch allows the *_user.c program to change it during
      runtime.  It is done by adding load_bpf_file_fixup_map() which
      takes a callback.  The callback will be called before creating
      each map so that it has a chance to modify the bpf_map_def.
      
      The current usecase is to change max_entries in map_perf_test.
      It is interesting to test with a much bigger map size in
      some cases (e.g. the following patch on bpf_lru_map.c).
      However,  it is hard to find one size to fit all testing
      environment.  Hence, it is handy to take the max_entries
      as a cmdline arg and then configure the bpf_map_def during
      runtime.
      
      This patch adds two cmdline args.  One is to configure
      the map's max_entries.  Another is to configure the max_cnt
      which controls how many times a syscall is called.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9fd63d05
    • M
      bpf: lru: Refactor LRU map tests in map_perf_test · bf8db5d2
      Martin KaFai Lau 提交于
      One more LRU test will be added later in this patch series.
      In this patch, we first move all existing LRU map tests into
      a single syscall (connect) first so that the future new
      LRU test can be added without hunting another syscall.
      
      One of the map name is also changed from percpu_lru_hash_map
      to nocommon_lru_hash_map to avoid the confusion with percpu_hash_map.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf8db5d2
  3. 17 3月, 2017 1 次提交
    • A
      samples/bpf: add map_lookup microbenchmark · 95ff141e
      Alexei Starovoitov 提交于
      $ map_perf_test 128
      speed of HASH bpf_map_lookup_elem() in lookups per second
      	w/o JIT		w/JIT
      before	46M		58M
      after	42M		74M
      
      perf report
      before:
          54.23%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
          14.24%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
           8.84%  map_perf_test  [kernel.kallsyms]  [k] htab_map_lookup_elem
           5.93%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
           2.30%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
           1.49%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler
      
      after:
          60.03%  map_perf_test  [kernel.kallsyms]  [k] __htab_map_lookup_elem
          18.07%  map_perf_test  [kernel.kallsyms]  [k] lookup_elem_raw
           2.91%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
           1.94%  map_perf_test  [kernel.kallsyms]  [k] _einittext
           1.90%  map_perf_test  [kernel.kallsyms]  [k] __audit_syscall_exit
           1.72%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler
      
      Notice that bpf_map_lookup_elem() and htab_map_lookup_elem() are trivial
      functions, yet they take sizeable amount of cpu time.
      htab_map_gen_lookup() removes bpf_map_lookup_elem() and converts
      htab_map_lookup_elem() into three BPF insns which causing cpu time
      for bpf_prog_da4fc6a3f41761a2() slightly increase.
      
      $ map_perf_test 256
      speed of ARRAY bpf_map_lookup_elem() in lookups per second
      	w/o JIT		w/JIT
      before	97M		174M
      after	64M		280M
      
      before:
          37.33%  map_perf_test  [kernel.kallsyms]  [k] array_map_lookup_elem
          13.95%  map_perf_test  [kernel.kallsyms]  [k] bpf_map_lookup_elem
           6.54%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
           4.57%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler
      
      after:
          32.86%  map_perf_test  [kernel.kallsyms]  [k] bpf_prog_da4fc6a3f41761a2
           6.54%  map_perf_test  [kernel.kallsyms]  [k] kprobe_ftrace_handler
      
      array_map_gen_lookup() removes calls to array_map_lookup_elem()
      and bpf_map_lookup_elem() and replaces them with 7 bpf insns.
      
      The performance without JIT is slower, since executing extra insns
      in the interpreter is slower than running native C code,
      but with JIT the performance gains are obvious,
      since native C->x86 code is replaced with fewer bpf->x86 instructions.
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95ff141e
  4. 24 1月, 2017 1 次提交
  5. 16 11月, 2016 1 次提交
    • M
      bpf: Add tests for the LRU bpf_htab · 5db58faf
      Martin KaFai Lau 提交于
      This patch has some unit tests and a test_lru_dist.
      
      The test_lru_dist reads in the numeric keys from a file.
      The files used here are generated by a modified fio-genzipf tool
      originated from the fio test suit.  The sample data file can be
      found here: https://github.com/iamkafai/bpf-lru
      
      The zipf.* data files have 100k numeric keys and the key is also
      ranged from 1 to 100k.
      
      The test_lru_dist outputs the number of unique keys (nr_unique).
      F.e. The following means, 61239 of them is unique out of 100k keys.
      nr_misses means it cannot be found in the LRU map, so nr_misses
      must be >= nr_unique. test_lru_dist also simulates a perfect LRU
      map as a comparison:
      
      [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
      /root/zipf.100k.a1_01.out 4000 1
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x0):
          task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31603(/100000)
          task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
      ....
      test_parallel_lru_dist (map_type:9 map_flags:0x2):
          task:0 BPF LRU: nr_unique:23093(/100000) nr_misses:31710(/100000)
          task:0 Perfect LRU: nr_unique:23093(/100000 nr_misses:34328(/100000)
      
      [root@arch-fb-vm1 ~]# ~/devshare/fb-kernel/linux/samples/bpf/test_lru_dist \
      /root/zipf.100k.a0_01.out 40000 1
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x0):
          task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67054(/100000)
          task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
      ...
      test_parallel_lru_dist (map_type:9 map_flags:0x2):
          task:0 BPF LRU: nr_unique:61239(/100000) nr_misses:67068(/100000)
          task:0 Perfect LRU: nr_unique:61239(/100000 nr_misses:66993(/100000)
      
      LRU map has also been added to map_perf_test:
      /* Global LRU */
      [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
      ./map_perf_test 16 $i | awk '{r += $3}END{print r " updates"}'; done
       1 cpus: 2934082 updates
       4 cpus: 7391434 updates
       8 cpus: 6500576 updates
      
      /* Percpu LRU */
      [root@kerneltest003.31.prn1 ~]# for i in 1 4 8; do echo -n "$i cpus: "; \
      ./map_perf_test 32 $i | awk '{r += $3}END{print r " updates"}'; done
        1 cpus: 2896553 updates
        4 cpus: 9766395 updates
        8 cpus: 17460553 updates
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5db58faf
  6. 07 4月, 2016 1 次提交
    • N
      samples/bpf: Fix build breakage with map_perf_test_user.c · 77e63534
      Naveen N. Rao 提交于
      Building BPF samples is failing with the below error:
      
      samples/bpf/map_perf_test_user.c: In function ‘main’:
      samples/bpf/map_perf_test_user.c:134:9: error: variable ‘r’ has
      initializer but incomplete type
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
               ^
      samples/bpf/map_perf_test_user.c:134:21: error: ‘RLIM_INFINITY’
      undeclared (first use in this function)
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                           ^
      samples/bpf/map_perf_test_user.c:134:21: note: each undeclared
      identifier is reported only once for each function it appears in
      samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
      struct initializer [enabled by default]
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
               ^
      samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
      for ‘r’) [enabled by default]
      samples/bpf/map_perf_test_user.c:134:9: warning: excess elements in
      struct initializer [enabled by default]
      samples/bpf/map_perf_test_user.c:134:9: warning: (near initialization
      for ‘r’) [enabled by default]
      samples/bpf/map_perf_test_user.c:134:16: error: storage size of ‘r’
      isn’t known
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                      ^
      samples/bpf/map_perf_test_user.c:139:2: warning: implicit declaration of
      function ‘setrlimit’ [-Wimplicit-function-declaration]
        setrlimit(RLIMIT_MEMLOCK, &r);
        ^
      samples/bpf/map_perf_test_user.c:139:12: error: ‘RLIMIT_MEMLOCK’
      undeclared (first use in this function)
        setrlimit(RLIMIT_MEMLOCK, &r);
                  ^
      samples/bpf/map_perf_test_user.c:134:16: warning: unused variable ‘r’
      [-Wunused-variable]
        struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
                      ^
      make[2]: *** [samples/bpf/map_perf_test_user.o] Error 1
      
      Fix this by including the necessary header file.
      
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      77e63534
  7. 09 3月, 2016 1 次提交