1. 29 10月, 2021 7 次提交
    • K
      bpf: Add bpf_kallsyms_lookup_name helper · d6aef08a
      Kumar Kartikeya Dwivedi 提交于
      This helper allows us to get the address of a kernel symbol from inside
      a BPF_PROG_TYPE_SYSCALL prog (used by gen_loader), so that we can
      relocate typeless ksym vars.
      Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Link: https://lore.kernel.org/bpf/20211028063501.2239335-2-memxor@gmail.com
      d6aef08a
    • A
      Merge branch 'Implement bloom filter map' · 2895f48f
      Alexei Starovoitov 提交于
      Joanne Koong says:
      
      ====================
      
      This patchset adds a new kind of bpf map: the bloom filter map.
      Bloom filters are a space-efficient probabilistic data structure
      used to quickly test whether an element exists in a set.
      For a brief overview about how bloom filters work,
      https://en.wikipedia.org/wiki/Bloom_filter
      may be helpful.
      
      One example use-case is an application leveraging a bloom filter
      map to determine whether a computationally expensive hashmap
      lookup can be avoided. If the element was not found in the bloom
      filter map, the hashmap lookup can be skipped.
      
      This patchset includes benchmarks for testing the performance of
      the bloom filter for different entry sizes and different number of
      hash functions used, as well as comparisons for hashmap lookups
      with vs. without the bloom filter.
      
      A high level overview of this patchset is as follows:
      1/5 - kernel changes for adding bloom filter map
      2/5 - libbpf changes for adding map_extra flags
      3/5 - tests for the bloom filter map
      4/5 - benchmarks for bloom filter lookup/update throughput and false positive
      rate
      5/5 - benchmarks for how hashmap lookups perform with vs. without the bloom
      filter
      
      v5 -> v6:
      * in 1/5: remove "inline" from the hash function, add check in syscall to
      fail out in cases where map_extra is not 0 for non-bloom-filter maps,
      fix alignment matching issues, move "map_extra flags" comments to inside
      the bpf_attr struct, add bpf_map_info map_extra changes here, add map_extra
      assignment in bpf_map_get_info_by_fd, change hash value_size to u32 instead of
      a u64
      * in 2/5: remove bpf_map_info map_extra changes, remove TODO comment about
      extending BTF arrays to cover u64s, cast to unsigned long long for %llx when
      printing out map_extra flags
      * in 3/5: use __type(value, ...) instead of __uint(value_size, ...) for values
      and keys
      * in 4/5: fix wrong bounds for the index when iterating through random values,
      update commit message to include update+lookup benchmark results for 8 byte
      and 64-byte value sizes, remove explicit global bool initializaton to false
      for hashmap_use_bloom and count_false_hits variables
      
      v4 -> v5:
      * Change the "bitset map with bloom filter capabilities" to a bloom filter map
      with max_entries signifying the number of unique entries expected in the bloom
      filter, remove bitset tests
      * Reduce verbiage by changing "bloom_filter" to "bloom", and renaming progs to
      more concise names.
      * in 2/5: remove "map_extra" from struct definitions that are frozen, create a
      "bpf_create_map_params" struct to propagate map_extra to the kernel at map
      creation time, change map_extra to __u64
      * in 4/5: check pthread condition variable in a loop when generating initial
      map data, remove "err" checks where not pragmatic, generate random values
      for the hashmap in the setup() instead of in the bpf program, add check_args()
      for checking that there aren't more requested entries than possible unique
      entries for the specified value size
      * in 5/5: Update commit message with updated benchmark data
      
      v3 -> v4:
      * Generalize the bloom filter map to be a bitset map with bloom filter
      capabilities
      * Add map_extra flags; pass in nr_hash_funcs through lower 4 bits of map_extra
      for the bitset map
      * Add tests for the bitset map (non-bloom filter) functionality
      * In the benchmarks, stats are computed only as monotonic increases, and place
      stats in a struct instead of as a percpu_array bpf map
      
      v2 -> v3:
      * Add libbpf changes for supporting nr_hash_funcs, instead of passing the
      number of hash functions through map_flags.
      * Separate the hashing logic in kernel/bpf/bloom_filter.c into a helper
      function
      
      v1 -> v2:
      * Remove libbpf changes, and pass the number of hash functions through
      map_flags instead.
      * Default to using 5 hash functions if no number of hash functions
      is specified.
      * Use set_bit instead of spinlocks in the bloom filter bitmap. This
      improved the speed significantly. For example, using 5 hash functions
      with 100k entries, there was roughly a 35% speed increase.
      * Use jhash2 (instead of jhash) for u32-aligned value sizes. This
      increased the speed by roughly 5 to 15%. When using jhash2 on value
      sizes non-u32 aligned (truncating any remainder bits), there was not
      a noticeable difference.
      * Add test for using the bloom filter as an inner map.
      * Reran the benchmarks, updated the commit messages to correspond to
      the new results.
      ====================
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      2895f48f
    • J
      bpf/benchs: Add benchmarks for comparing hashmap lookups w/ vs. w/out bloom filter · f44bc543
      Joanne Koong 提交于
      This patch adds benchmark tests for comparing the performance of hashmap
      lookups without the bloom filter vs. hashmap lookups with the bloom filter.
      
      Checking the bloom filter first for whether the element exists should
      overall enable a higher throughput for hashmap lookups, since if the
      element does not exist in the bloom filter, we can avoid a costly lookup in
      the hashmap.
      
      On average, using 5 hash functions in the bloom filter tended to perform
      the best across the widest range of different entry sizes. The benchmark
      results using 5 hash functions (running on 8 threads on a machine with one
      numa node, and taking the average of 3 runs) were roughly as follows:
      
      value_size = 4 bytes -
      	10k entries: 30% faster
      	50k entries: 40% faster
      	100k entries: 40% faster
      	500k entres: 70% faster
      	1 million entries: 90% faster
      	5 million entries: 140% faster
      
      value_size = 8 bytes -
      	10k entries: 30% faster
      	50k entries: 40% faster
      	100k entries: 50% faster
      	500k entres: 80% faster
      	1 million entries: 100% faster
      	5 million entries: 150% faster
      
      value_size = 16 bytes -
      	10k entries: 20% faster
      	50k entries: 30% faster
      	100k entries: 35% faster
      	500k entres: 65% faster
      	1 million entries: 85% faster
      	5 million entries: 110% faster
      
      value_size = 40 bytes -
      	10k entries: 5% faster
      	50k entries: 15% faster
      	100k entries: 20% faster
      	500k entres: 65% faster
      	1 million entries: 75% faster
      	5 million entries: 120% faster
      Signed-off-by: NJoanne Koong <joannekoong@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-6-joannekoong@fb.com
      f44bc543
    • J
      bpf/benchs: Add benchmark tests for bloom filter throughput + false positive · 57fd1c63
      Joanne Koong 提交于
      This patch adds benchmark tests for the throughput (for lookups + updates)
      and the false positive rate of bloom filter lookups, as well as some
      minor refactoring of the bash script for running the benchmarks.
      
      These benchmarks show that as the number of hash functions increases,
      the throughput and the false positive rate of the bloom filter decreases.
      >From the benchmark data, the approximate average false-positive rates
      are roughly as follows:
      
      1 hash function = ~30%
      2 hash functions = ~15%
      3 hash functions = ~5%
      4 hash functions = ~2.5%
      5 hash functions = ~1%
      6 hash functions = ~0.5%
      7 hash functions  = ~0.35%
      8 hash functions = ~0.15%
      9 hash functions = ~0.1%
      10 hash functions = ~0%
      
      For reference data, the benchmarks run on one thread on a machine
      with one numa node for 1 to 5 hash functions for 8-byte and 64-byte
      values are as follows:
      
      1 hash function:
        50k entries
      	8-byte value
      	    Lookups - 51.1 M/s operations
      	    Updates - 33.6 M/s operations
      	    False positive rate: 24.15%
      	64-byte value
      	    Lookups - 15.7 M/s operations
      	    Updates - 15.1 M/s operations
      	    False positive rate: 24.2%
        100k entries
      	8-byte value
      	    Lookups - 51.0 M/s operations
      	    Updates - 33.4 M/s operations
      	    False positive rate: 24.04%
      	64-byte value
      	    Lookups - 15.6 M/s operations
      	    Updates - 14.6 M/s operations
      	    False positive rate: 24.06%
        500k entries
      	8-byte value
      	    Lookups - 50.5 M/s operations
      	    Updates - 33.1 M/s operations
      	    False positive rate: 27.45%
      	64-byte value
      	    Lookups - 15.6 M/s operations
      	    Updates - 14.2 M/s operations
      	    False positive rate: 27.42%
        1 mil entries
      	8-byte value
      	    Lookups - 49.7 M/s operations
      	    Updates - 32.9 M/s operations
      	    False positive rate: 27.45%
      	64-byte value
      	    Lookups - 15.4 M/s operations
      	    Updates - 13.7 M/s operations
      	    False positive rate: 27.58%
        2.5 mil entries
      	8-byte value
      	    Lookups - 47.2 M/s operations
      	    Updates - 31.8 M/s operations
      	    False positive rate: 30.94%
      	64-byte value
      	    Lookups - 15.3 M/s operations
      	    Updates - 13.2 M/s operations
      	    False positive rate: 30.95%
        5 mil entries
      	8-byte value
      	    Lookups - 41.1 M/s operations
      	    Updates - 28.1 M/s operations
      	    False positive rate: 31.01%
      	64-byte value
      	    Lookups - 13.3 M/s operations
      	    Updates - 11.4 M/s operations
      	    False positive rate: 30.98%
      
      2 hash functions:
        50k entries
      	8-byte value
      	    Lookups - 34.1 M/s operations
      	    Updates - 20.1 M/s operations
      	    False positive rate: 9.13%
      	64-byte value
      	    Lookups - 8.4 M/s operations
      	    Updates - 7.9 M/s operations
      	    False positive rate: 9.21%
        100k entries
      	8-byte value
      	    Lookups - 33.7 M/s operations
      	    Updates - 18.9 M/s operations
      	    False positive rate: 9.13%
      	64-byte value
      	    Lookups - 8.4 M/s operations
      	    Updates - 7.7 M/s operations
      	    False positive rate: 9.19%
        500k entries
      	8-byte value
      	    Lookups - 32.7 M/s operations
      	    Updates - 18.1 M/s operations
      	    False positive rate: 12.61%
      	64-byte value
      	    Lookups - 8.4 M/s operations
      	    Updates - 7.5 M/s operations
      	    False positive rate: 12.61%
        1 mil entries
      	8-byte value
      	    Lookups - 30.6 M/s operations
      	    Updates - 18.9 M/s operations
      	    False positive rate: 12.54%
      	64-byte value
      	    Lookups - 8.0 M/s operations
      	    Updates - 7.0 M/s operations
      	    False positive rate: 12.52%
        2.5 mil entries
      	8-byte value
      	    Lookups - 25.3 M/s operations
      	    Updates - 16.7 M/s operations
      	    False positive rate: 16.77%
      	64-byte value
      	    Lookups - 7.9 M/s operations
      	    Updates - 6.5 M/s operations
      	    False positive rate: 16.88%
        5 mil entries
      	8-byte value
      	    Lookups - 20.8 M/s operations
      	    Updates - 14.7 M/s operations
      	    False positive rate: 16.78%
      	64-byte value
      	    Lookups - 7.0 M/s operations
      	    Updates - 6.0 M/s operations
      	    False positive rate: 16.78%
      
      3 hash functions:
        50k entries
      	8-byte value
      	    Lookups - 25.1 M/s operations
      	    Updates - 14.6 M/s operations
      	    False positive rate: 7.65%
      	64-byte value
      	    Lookups - 5.8 M/s operations
      	    Updates - 5.5 M/s operations
      	    False positive rate: 7.58%
        100k entries
      	8-byte value
      	    Lookups - 24.7 M/s operations
      	    Updates - 14.1 M/s operations
      	    False positive rate: 7.71%
      	64-byte value
      	    Lookups - 5.8 M/s operations
      	    Updates - 5.3 M/s operations
      	    False positive rate: 7.62%
        500k entries
      	8-byte value
      	    Lookups - 22.9 M/s operations
      	    Updates - 13.9 M/s operations
      	    False positive rate: 2.62%
      	64-byte value
      	    Lookups - 5.6 M/s operations
      	    Updates - 4.8 M/s operations
      	    False positive rate: 2.7%
        1 mil entries
      	8-byte value
      	    Lookups - 19.8 M/s operations
      	    Updates - 12.6 M/s operations
      	    False positive rate: 2.60%
      	64-byte value
      	    Lookups - 5.3 M/s operations
      	    Updates - 4.4 M/s operations
      	    False positive rate: 2.69%
        2.5 mil entries
      	8-byte value
      	    Lookups - 16.2 M/s operations
      	    Updates - 10.7 M/s operations
      	    False positive rate: 4.49%
      	64-byte value
      	    Lookups - 4.9 M/s operations
      	    Updates - 4.1 M/s operations
      	    False positive rate: 4.41%
        5 mil entries
      	8-byte value
      	    Lookups - 18.8 M/s operations
      	    Updates - 9.2 M/s operations
      	    False positive rate: 4.45%
      	64-byte value
      	    Lookups - 5.2 M/s operations
      	    Updates - 3.9 M/s operations
      	    False positive rate: 4.54%
      
      4 hash functions:
        50k entries
      	8-byte value
      	    Lookups - 19.7 M/s operations
      	    Updates - 11.1 M/s operations
      	    False positive rate: 1.01%
      	64-byte value
      	    Lookups - 4.4 M/s operations
      	    Updates - 4.0 M/s operations
      	    False positive rate: 1.00%
        100k entries
      	8-byte value
      	    Lookups - 19.5 M/s operations
      	    Updates - 10.9 M/s operations
      	    False positive rate: 1.00%
      	64-byte value
      	    Lookups - 4.3 M/s operations
      	    Updates - 3.9 M/s operations
      	    False positive rate: 0.97%
        500k entries
      	8-byte value
      	    Lookups - 18.2 M/s operations
      	    Updates - 10.6 M/s operations
      	    False positive rate: 2.05%
      	64-byte value
      	    Lookups - 4.3 M/s operations
      	    Updates - 3.7 M/s operations
      	    False positive rate: 2.05%
        1 mil entries
      	8-byte value
      	    Lookups - 15.5 M/s operations
      	    Updates - 9.6 M/s operations
      	    False positive rate: 1.99%
      	64-byte value
      	    Lookups - 4.0 M/s operations
      	    Updates - 3.4 M/s operations
      	    False positive rate: 1.99%
        2.5 mil entries
      	8-byte value
      	    Lookups - 13.8 M/s operations
      	    Updates - 7.7 M/s operations
      	    False positive rate: 3.91%
      	64-byte value
      	    Lookups - 3.7 M/s operations
      	    Updates - 3.6 M/s operations
      	    False positive rate: 3.78%
        5 mil entries
      	8-byte value
      	    Lookups - 13.0 M/s operations
      	    Updates - 6.9 M/s operations
      	    False positive rate: 3.93%
      	64-byte value
      	    Lookups - 3.5 M/s operations
      	    Updates - 3.7 M/s operations
      	    False positive rate: 3.39%
      
      5 hash functions:
        50k entries
      	8-byte value
      	    Lookups - 16.4 M/s operations
      	    Updates - 9.1 M/s operations
      	    False positive rate: 0.78%
      	64-byte value
      	    Lookups - 3.5 M/s operations
      	    Updates - 3.2 M/s operations
      	    False positive rate: 0.77%
        100k entries
      	8-byte value
      	    Lookups - 16.3 M/s operations
      	    Updates - 9.0 M/s operations
      	    False positive rate: 0.79%
      	64-byte value
      	    Lookups - 3.5 M/s operations
      	    Updates - 3.2 M/s operations
      	    False positive rate: 0.78%
        500k entries
      	8-byte value
      	    Lookups - 15.1 M/s operations
      	    Updates - 8.8 M/s operations
      	    False positive rate: 1.82%
      	64-byte value
      	    Lookups - 3.4 M/s operations
      	    Updates - 3.0 M/s operations
      	    False positive rate: 1.78%
        1 mil entries
      	8-byte value
      	    Lookups - 13.2 M/s operations
      	    Updates - 7.8 M/s operations
      	    False positive rate: 1.81%
      	64-byte value
      	    Lookups - 3.2 M/s operations
      	    Updates - 2.8 M/s operations
      	    False positive rate: 1.80%
        2.5 mil entries
      	8-byte value
      	    Lookups - 10.5 M/s operations
      	    Updates - 5.9 M/s operations
      	    False positive rate: 0.29%
      	64-byte value
      	    Lookups - 3.2 M/s operations
      	    Updates - 2.4 M/s operations
      	    False positive rate: 0.28%
        5 mil entries
      	8-byte value
      	    Lookups - 9.6 M/s operations
      	    Updates - 5.7 M/s operations
      	    False positive rate: 0.30%
      	64-byte value
      	    Lookups - 3.2 M/s operations
      	    Updates - 2.7 M/s operations
      	    False positive rate: 0.30%
      Signed-off-by: NJoanne Koong <joannekoong@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-5-joannekoong@fb.com
      57fd1c63
    • J
      selftests/bpf: Add bloom filter map test cases · ed9109ad
      Joanne Koong 提交于
      This patch adds test cases for bpf bloom filter maps. They include tests
      checking against invalid operations by userspace, tests for using the
      bloom filter map as an inner map, and a bpf program that queries the
      bloom filter map for values added by a userspace program.
      Signed-off-by: NJoanne Koong <joannekoong@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-4-joannekoong@fb.com
      ed9109ad
    • J
      libbpf: Add "map_extra" as a per-map-type extra flag · 47512102
      Joanne Koong 提交于
      This patch adds the libbpf infrastructure for supporting a
      per-map-type "map_extra" field, whose definition will be
      idiosyncratic depending on map type.
      
      For example, for the bloom filter map, the lower 4 bits of
      map_extra is used to denote the number of hash functions.
      
      Please note that until libbpf 1.0 is here, the
      "bpf_create_map_params" struct is used as a temporary
      means for propagating the map_extra field to the kernel.
      Signed-off-by: NJoanne Koong <joannekoong@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-3-joannekoong@fb.com
      47512102
    • J
      bpf: Add bloom filter map implementation · 9330986c
      Joanne Koong 提交于
      This patch adds the kernel-side changes for the implementation of
      a bpf bloom filter map.
      
      The bloom filter map supports peek (determining whether an element
      is present in the map) and push (adding an element to the map)
      operations.These operations are exposed to userspace applications
      through the already existing syscalls in the following way:
      
      BPF_MAP_LOOKUP_ELEM -> peek
      BPF_MAP_UPDATE_ELEM -> push
      
      The bloom filter map does not have keys, only values. In light of
      this, the bloom filter map's API matches that of queue stack maps:
      user applications use BPF_MAP_LOOKUP_ELEM/BPF_MAP_UPDATE_ELEM
      which correspond internally to bpf_map_peek_elem/bpf_map_push_elem,
      and bpf programs must use the bpf_map_peek_elem and bpf_map_push_elem
      APIs to query or add an element to the bloom filter map. When the
      bloom filter map is created, it must be created with a key_size of 0.
      
      For updates, the user will pass in the element to add to the map
      as the value, with a NULL key. For lookups, the user will pass in the
      element to query in the map as the value, with a NULL key. In the
      verifier layer, this requires us to modify the argument type of
      a bloom filter's BPF_FUNC_map_peek_elem call to ARG_PTR_TO_MAP_VALUE;
      as well, in the syscall layer, we need to copy over the user value
      so that in bpf_map_peek_elem, we know which specific value to query.
      
      A few things to please take note of:
       * If there are any concurrent lookups + updates, the user is
      responsible for synchronizing this to ensure no false negative lookups
      occur.
       * The number of hashes to use for the bloom filter is configurable from
      userspace. If no number is specified, the default used will be 5 hash
      functions. The benchmarks later in this patchset can help compare the
      performance of using different number of hashes on different entry
      sizes. In general, using more hashes decreases both the false positive
      rate and the speed of a lookup.
       * Deleting an element in the bloom filter map is not supported.
       * The bloom filter map may be used as an inner map.
       * The "max_entries" size that is specified at map creation time is used
      to approximate a reasonable bitmap size for the bloom filter, and is not
      otherwise strictly enforced. If the user wishes to insert more entries
      into the bloom filter than "max_entries", they may do so but they should
      be aware that this may lead to a higher false positive rate.
      Signed-off-by: NJoanne Koong <joannekoong@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027234504.30744-2-joannekoong@fb.com
      9330986c
  2. 28 10月, 2021 11 次提交
    • T
      bpf, tests: Add module parameter test_suite to test_bpf module · b066abba
      Tiezhu Yang 提交于
      After commit 9298e63e ("bpf/tests: Add exhaustive tests of ALU
      operand magnitudes"), when modprobe test_bpf.ko with JIT on mips64,
      there exists segment fault due to the following reason:
      
        [...]
        ALU64_MOV_X: all register value magnitudes jited:1
        Break instruction in kernel code[#1]
        [...]
      
      It seems that the related JIT implementations of some test cases
      in test_bpf() have problems. At this moment, I do not care about
      the segment fault while I just want to verify the test cases of
      tail calls.
      
      Based on the above background and motivation, add the following
      module parameter test_suite to the test_bpf.ko:
      
        test_suite=<string>: only the specified test suite will be run, the
        string can be "test_bpf", "test_tail_calls" or "test_skb_segment".
      
      If test_suite is not specified, but test_id, test_name or test_range
      is specified, set 'test_bpf' as the default test suite. This is useful
      to only test the corresponding test suite when specifying the valid
      test_suite string.
      
      Any invalid test suite will result in -EINVAL being returned and no
      tests being run. If the test_suite is not specified or specified as
      empty string, it does not change the current logic, all of the test
      cases will be run.
      
      Here are some test results:
      
       # dmesg -c
       # modprobe test_bpf
       # dmesg | grep Summary
       test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed]
       test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
       test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_bpf
       # dmesg | tail -1
       test_bpf: Summary: 1009 PASSED, 0 FAILED, [0/997 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_tail_calls
       # dmesg
       test_bpf: #0 Tail call leaf jited:0 21 PASS
       [...]
       test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS
       test_bpf: test_tail_calls: Summary: 8 PASSED, 0 FAILED, [0/8 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_skb_segment
       # dmesg
       test_bpf: #0 gso_with_rx_frags PASS
       test_bpf: #1 gso_linear_no_head_frag PASS
       test_bpf: test_skb_segment: Summary: 2 PASSED, 0 FAILED
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_id=1
       # dmesg
       test_bpf: test_bpf: set 'test_bpf' as the default test_suite.
       test_bpf: #1 TXA jited:0 54 51 50 PASS
       test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_bpf test_name=TXA
       # dmesg
       test_bpf: #1 TXA jited:0 54 50 51 PASS
       test_bpf: Summary: 1 PASSED, 0 FAILED, [0/1 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_tail_calls test_range=6,7
       # dmesg
       test_bpf: #6 Tail call error path, NULL target jited:0 41 PASS
       test_bpf: #7 Tail call error path, index out of range jited:0 32 PASS
       test_bpf: test_tail_calls: Summary: 2 PASSED, 0 FAILED, [0/2 JIT'ed]
      
       # rmmod test_bpf
       # dmesg -c
       # modprobe test_bpf test_suite=test_skb_segment test_id=1
       # dmesg
       test_bpf: #1 gso_linear_no_head_frag PASS
       test_bpf: test_skb_segment: Summary: 1 PASSED, 0 FAILED
      
      By the way, the above segment fault has been fixed in the latest bpf-next
      tree which contains the mips64 JIT rework.
      Signed-off-by: NTiezhu Yang <yangtiezhu@loongson.cn>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: NJohan Almbladh <johan.almbladh@anyfinetworks.com>
      Acked-by: NJohan Almbladh <johan.almbladh@anyfinetworks.com>
      Link: https://lore.kernel.org/bpf/1635384321-28128-1-git-send-email-yangtiezhu@loongson.cn
      b066abba
    • T
      riscv, bpf: Add BPF exception tables · 252c765b
      Tong Tiangen 提交于
      When a tracing BPF program attempts to read memory without using the
      bpf_probe_read() helper, the verifier marks the load instruction with
      the BPF_PROBE_MEM flag. Since the riscv JIT does not currently recognize
      this flag it falls back to the interpreter.
      
      Add support for BPF_PROBE_MEM, by appending an exception table to the
      BPF program. If the load instruction causes a data abort, the fixup
      infrastructure finds the exception table and fixes up the fault, by
      clearing the destination register and jumping over the faulting
      instruction.
      
      A more generic solution would add a "handler" field to the table entry,
      like on x86 and s390. The same issue in ARM64 is fixed in 80083428
      ("bpf, arm64: Add BPF exception tables").
      Signed-off-by: NTong Tiangen <tongtiangen@huawei.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Tested-by: NPu Lehui <pulehui@huawei.com>
      Tested-by: NBjörn Töpel <bjorn@kernel.org>
      Acked-by: NBjörn Töpel <bjorn@kernel.org>
      Link: https://lore.kernel.org/bpf/20211027111822.3801679-1-tongtiangen@huawei.com
      252c765b
    • A
      Merge branch 'selftests/bpf: parallel mode improvement' · 03e6a7a9
      Andrii Nakryiko 提交于
      Yucong Sun says:
      
      ====================
      
      Several patches to improve parallel execution mode, updating vmtest.sh
      and fixed two previously dropped patches according to feedback.
      ====================
      Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
      03e6a7a9
    • Y
      selftests/bpf: Adding a namespace reset for tc_redirect · e1ef62a4
      Yucong Sun 提交于
      This patch delete ns_src/ns_dst/ns_redir namespaces before recreating
      them, making the test more robust.
      Signed-off-by: NYucong Sun <sunyucong@gmail.com>
      Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211025223345.2136168-5-fallentree@fb.com
      e1ef62a4
    • Y
      selftests/bpf: Fix attach_probe in parallel mode · 9e7240fb
      Yucong Sun 提交于
      This patch makes attach_probe uses its own method as attach point,
      avoiding conflict with other tests like bpf_cookie.
      Signed-off-by: NYucong Sun <sunyucong@gmail.com>
      Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211025223345.2136168-4-fallentree@fb.com
      9e7240fb
    • Y
      selfetests/bpf: Update vmtest.sh defaults · 547208a3
      Yucong Sun 提交于
      Increase memory to 4G, 8 SMP core with host cpu passthrough. This
      make it run faster in parallel mode and more likely to succeed.
      Signed-off-by: NYucong Sun <sunyucong@gmail.com>
      Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211025223345.2136168-2-fallentree@fb.com
      547208a3
    • A
      Merge branch 'bpf: use 32bit safe version of u64_stats' · f9d532fc
      Alexei Starovoitov 提交于
      Eric Dumazet says:
      
      ====================
      
      From: Eric Dumazet <edumazet@google.com>
      
      Two first patches fix bugs added in 5.1 and 5.5
      
      Third patch replaces the u64 fields in struct bpf_prog_stats
      with u64_stats_t ones to avoid possible sampling errors,
      in case of load/store stearing.
      ====================
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      f9d532fc
    • E
      bpf: Use u64_stats_t in struct bpf_prog_stats · 61a0abae
      Eric Dumazet 提交于
      Commit 316580b6 ("u64_stats: provide u64_stats_t type")
      fixed possible load/store tearing on 64bit arches.
      
      For instance the following C code
      
      stats->nsecs += sched_clock() - start;
      
      Could be rightfully implemented like this by a compiler,
      confusing concurrent readers a lot:
      
      stats->nsecs += sched_clock();
      // arbitrary delay
      stats->nsecs -= start;
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026214133.3114279-4-eric.dumazet@gmail.com
      61a0abae
    • E
      bpf: Fixes possible race in update_prog_stats() for 32bit arches · d979617a
      Eric Dumazet 提交于
      It seems update_prog_stats() suffers from same issue fixed
      in the prior patch:
      
      As it can run while interrupts are enabled, it could
      be re-entered and the u64_stats syncp could be mangled.
      
      Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026214133.3114279-3-eric.dumazet@gmail.com
      d979617a
    • E
      bpf: Avoid races in __bpf_prog_run() for 32bit arches · f941eadd
      Eric Dumazet 提交于
      __bpf_prog_run() can run from non IRQ contexts, meaning
      it could be re entered if interrupted.
      
      This calls for the irq safe variant of u64_stats_update_{begin|end},
      or risk a deadlock.
      
      This patch is a nop on 64bit arches, fortunately.
      
      syzbot report:
      
      WARNING: inconsistent lock state
      5.12.0-rc3-syzkaller #0 Not tainted
      --------------------------------
      inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
      udevd/4013 [HC0[0]:SC0[0]:HE1:SE1] takes:
      ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: sk_filter include/linux/filter.h:867 [inline]
      ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: do_one_broadcast net/netlink/af_netlink.c:1468 [inline]
      ff7c9dec (&(&pstats->syncp)->seq){+.?.}-{0:0}, at: netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520
      {IN-SOFTIRQ-W} state was registered at:
        lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510
        lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483
        do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline]
        do_write_seqcount_begin include/linux/seqlock.h:545 [inline]
        u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline]
        bpf_prog_run_pin_on_cpu include/linux/filter.h:624 [inline]
        bpf_prog_run_clear_cb+0x1bc/0x270 include/linux/filter.h:755
        run_filter+0xa0/0x17c net/packet/af_packet.c:2031
        packet_rcv+0xc0/0x3e0 net/packet/af_packet.c:2104
        dev_queue_xmit_nit+0x2bc/0x39c net/core/dev.c:2387
        xmit_one net/core/dev.c:3588 [inline]
        dev_hard_start_xmit+0x94/0x518 net/core/dev.c:3609
        sch_direct_xmit+0x11c/0x1f0 net/sched/sch_generic.c:313
        qdisc_restart net/sched/sch_generic.c:376 [inline]
        __qdisc_run+0x194/0x7f8 net/sched/sch_generic.c:384
        qdisc_run include/net/pkt_sched.h:136 [inline]
        qdisc_run include/net/pkt_sched.h:128 [inline]
        __dev_xmit_skb net/core/dev.c:3795 [inline]
        __dev_queue_xmit+0x65c/0xf84 net/core/dev.c:4150
        dev_queue_xmit+0x14/0x18 net/core/dev.c:4215
        neigh_resolve_output net/core/neighbour.c:1491 [inline]
        neigh_resolve_output+0x170/0x228 net/core/neighbour.c:1471
        neigh_output include/net/neighbour.h:510 [inline]
        ip6_finish_output2+0x2e4/0x9fc net/ipv6/ip6_output.c:117
        __ip6_finish_output net/ipv6/ip6_output.c:182 [inline]
        __ip6_finish_output+0x164/0x3f8 net/ipv6/ip6_output.c:161
        ip6_finish_output+0x2c/0xb0 net/ipv6/ip6_output.c:192
        NF_HOOK_COND include/linux/netfilter.h:290 [inline]
        ip6_output+0x74/0x294 net/ipv6/ip6_output.c:215
        dst_output include/net/dst.h:448 [inline]
        NF_HOOK include/linux/netfilter.h:301 [inline]
        NF_HOOK include/linux/netfilter.h:295 [inline]
        mld_sendpack+0x2a8/0x7e4 net/ipv6/mcast.c:1679
        mld_send_cr net/ipv6/mcast.c:1975 [inline]
        mld_ifc_timer_expire+0x1e8/0x494 net/ipv6/mcast.c:2474
        call_timer_fn+0xd0/0x570 kernel/time/timer.c:1431
        expire_timers kernel/time/timer.c:1476 [inline]
        __run_timers kernel/time/timer.c:1745 [inline]
        run_timer_softirq+0x2e4/0x384 kernel/time/timer.c:1758
        __do_softirq+0x204/0x7ac kernel/softirq.c:345
        do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
        invoke_softirq kernel/softirq.c:228 [inline]
        __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
        irq_exit+0x10/0x3c kernel/softirq.c:446
        __handle_domain_irq+0xb4/0x120 kernel/irq/irqdesc.c:692
        handle_domain_irq include/linux/irqdesc.h:176 [inline]
        gic_handle_irq+0x84/0xac drivers/irqchip/irq-gic.c:370
        __irq_svc+0x5c/0x94 arch/arm/kernel/entry-armv.S:205
        debug_smp_processor_id+0x0/0x24 lib/smp_processor_id.c:53
        rcu_read_lock_held_common kernel/rcu/update.c:108 [inline]
        rcu_read_lock_sched_held+0x24/0x7c kernel/rcu/update.c:123
        trace_lock_acquire+0x24c/0x278 include/trace/events/lock.h:13
        lock_acquire+0x3c/0x74 kernel/locking/lockdep.c:5481
        rcu_lock_acquire include/linux/rcupdate.h:267 [inline]
        rcu_read_lock include/linux/rcupdate.h:656 [inline]
        avc_has_perm_noaudit+0x6c/0x260 security/selinux/avc.c:1150
        selinux_inode_permission+0x140/0x220 security/selinux/hooks.c:3141
        security_inode_permission+0x44/0x60 security/security.c:1268
        inode_permission.part.0+0x5c/0x13c fs/namei.c:521
        inode_permission fs/namei.c:494 [inline]
        may_lookup fs/namei.c:1652 [inline]
        link_path_walk.part.0+0xd4/0x38c fs/namei.c:2208
        link_path_walk fs/namei.c:2189 [inline]
        path_lookupat+0x3c/0x1b8 fs/namei.c:2419
        filename_lookup+0xa8/0x1a4 fs/namei.c:2453
        user_path_at_empty+0x74/0x90 fs/namei.c:2733
        do_readlinkat+0x5c/0x12c fs/stat.c:417
        __do_sys_readlink fs/stat.c:450 [inline]
        sys_readlink+0x24/0x28 fs/stat.c:447
        ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64
        0x7eaa4974
      irq event stamp: 298277
      hardirqs last  enabled at (298277): [<802000d0>] no_work_pending+0x4/0x34
      hardirqs last disabled at (298276): [<8020c9b8>] do_work_pending+0x9c/0x648 arch/arm/kernel/signal.c:676
      softirqs last  enabled at (298216): [<8020167c>] __do_softirq+0x584/0x7ac kernel/softirq.c:372
      softirqs last disabled at (298201): [<8024dff4>] do_softirq_own_stack include/asm-generic/softirq_stack.h:10 [inline]
      softirqs last disabled at (298201): [<8024dff4>] invoke_softirq kernel/softirq.c:228 [inline]
      softirqs last disabled at (298201): [<8024dff4>] __irq_exit_rcu+0x1d8/0x200 kernel/softirq.c:422
      
      other info that might help us debug this:
       Possible unsafe locking scenario:
      
             CPU0
             ----
        lock(&(&pstats->syncp)->seq);
        <Interrupt>
          lock(&(&pstats->syncp)->seq);
      
       *** DEADLOCK ***
      
      1 lock held by udevd/4013:
       #0: 82b09c5c (rcu_read_lock){....}-{1:2}, at: sk_filter_trim_cap+0x54/0x434 net/core/filter.c:139
      
      stack backtrace:
      CPU: 1 PID: 4013 Comm: udevd Not tainted 5.12.0-rc3-syzkaller #0
      Hardware name: ARM-Versatile Express
      Backtrace:
      [<81802550>] (dump_backtrace) from [<818027c4>] (show_stack+0x18/0x1c arch/arm/kernel/traps.c:252)
       r7:00000080 r6:600d0093 r5:00000000 r4:82b58344
      [<818027ac>] (show_stack) from [<81809e98>] (__dump_stack lib/dump_stack.c:79 [inline])
      [<818027ac>] (show_stack) from [<81809e98>] (dump_stack+0xb8/0xe8 lib/dump_stack.c:120)
      [<81809de0>] (dump_stack) from [<81804a00>] (print_usage_bug.part.0+0x228/0x230 kernel/locking/lockdep.c:3806)
       r7:86bcb768 r6:81a0326c r5:830f96a8 r4:86bcb0c0
      [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (print_usage_bug kernel/locking/lockdep.c:3776 [inline])
      [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (valid_state kernel/locking/lockdep.c:3818 [inline])
      [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock_irq kernel/locking/lockdep.c:4021 [inline])
      [<818047d8>] (print_usage_bug.part.0) from [<802bb1b8>] (mark_lock.part.0+0xc34/0x136c kernel/locking/lockdep.c:4478)
       r10:83278fe8 r9:82c6d748 r8:00000000 r7:82c6d2d4 r6:00000004 r5:86bcb768
       r4:00000006
      [<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_lock kernel/locking/lockdep.c:4442 [inline])
      [<802ba584>] (mark_lock.part.0) from [<802bc644>] (mark_usage kernel/locking/lockdep.c:4391 [inline])
      [<802ba584>] (mark_lock.part.0) from [<802bc644>] (__lock_acquire+0x9bc/0x3318 kernel/locking/lockdep.c:4854)
       r10:86bcb768 r9:86bcb0c0 r8:00000001 r7:00040000 r6:0000075a r5:830f96a8
       r4:00000000
      [<802bbc88>] (__lock_acquire) from [<802bfb90>] (lock_acquire.part.0+0xf0/0x41c kernel/locking/lockdep.c:5510)
       r10:00000000 r9:600d0013 r8:00000000 r7:00000000 r6:828a2680 r5:828a2680
       r4:861e5bc8
      [<802bfaa0>] (lock_acquire.part.0) from [<802bff28>] (lock_acquire+0x6c/0x74 kernel/locking/lockdep.c:5483)
       r10:8146137c r9:00000000 r8:00000001 r7:00000000 r6:00000000 r5:00000000
       r4:ff7c9dec
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin_nested include/linux/seqlock.h:520 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (do_write_seqcount_begin include/linux/seqlock.h:545 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (u64_stats_update_begin include/linux/u64_stats_sync.h:129 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (__bpf_prog_run_save_cb include/linux/filter.h:727 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (bpf_prog_run_save_cb include/linux/filter.h:741 [inline])
      [<802bfebc>] (lock_acquire) from [<81381eb4>] (sk_filter_trim_cap+0x26c/0x434 net/core/filter.c:149)
       r10:a4095dd0 r9:ff7c9dd0 r8:e44be000 r7:8146137c r6:00000001 r5:8611ba80
       r4:00000000
      [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (sk_filter include/linux/filter.h:867 [inline])
      [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (do_one_broadcast net/netlink/af_netlink.c:1468 [inline])
      [<81381c48>] (sk_filter_trim_cap) from [<8146137c>] (netlink_broadcast_filtered+0x27c/0x4fc net/netlink/af_netlink.c:1520)
       r10:00000001 r9:833d6b1c r8:00000000 r7:8572f864 r6:8611ba80 r5:8698d800
       r4:8572f800
      [<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_broadcast net/netlink/af_netlink.c:1544 [inline])
      [<81461100>] (netlink_broadcast_filtered) from [<81463e60>] (netlink_sendmsg+0x3d0/0x478 net/netlink/af_netlink.c:1925)
       r10:00000000 r9:00000002 r8:8698d800 r7:000000b7 r6:8611b900 r5:861e5f50
       r4:86aa3000
      [<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg_nosec net/socket.c:654 [inline])
      [<81463a90>] (netlink_sendmsg) from [<81321f54>] (sock_sendmsg+0x3c/0x4c net/socket.c:674)
       r10:00000000 r9:861e5dd4 r8:00000000 r7:86570000 r6:00000000 r5:86570000
       r4:861e5f50
      [<81321f18>] (sock_sendmsg) from [<813234d0>] (____sys_sendmsg+0x230/0x29c net/socket.c:2350)
       r5:00000040 r4:861e5f50
      [<813232a0>] (____sys_sendmsg) from [<8132549c>] (___sys_sendmsg+0xac/0xe4 net/socket.c:2404)
       r10:00000128 r9:861e4000 r8:00000000 r7:00000000 r6:86570000 r5:861e5f50
       r4:00000000
      [<813253f0>] (___sys_sendmsg) from [<81325684>] (__sys_sendmsg net/socket.c:2433 [inline])
      [<813253f0>] (___sys_sendmsg) from [<81325684>] (__do_sys_sendmsg net/socket.c:2442 [inline])
      [<813253f0>] (___sys_sendmsg) from [<81325684>] (sys_sendmsg+0x58/0xa0 net/socket.c:2440)
       r8:80200224 r7:00000128 r6:00000000 r5:7eaa541c r4:86570000
      [<8132562c>] (sys_sendmsg) from [<80200060>] (ret_fast_syscall+0x0/0x2c arch/arm/mm/proc-v7.S:64)
      Exception stack(0x861e5fa8 to 0x861e5ff0)
      5fa0:                   00000000 00000000 0000000c 7eaa541c 00000000 00000000
      5fc0: 00000000 00000000 76fbf840 00000128 00000000 0000008f 7eaa541c 000563f8
      5fe0: 00056110 7eaa53e0 00036cec 76c9bf44
       r6:76fbf840 r5:00000000 r4:00000000
      
      Fixes: 492ecee8 ("bpf: enable program stats")
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026214133.3114279-2-eric.dumazet@gmail.com
      f941eadd
    • J
      libbpf: Deprecate bpf_objects_list · 689624f0
      Joe Burton 提交于
      Add a flag to `enum libbpf_strict_mode' to disable the global
      `bpf_objects_list', preventing race conditions when concurrent threads
      call bpf_object__open() or bpf_object__close().
      
      bpf_object__next() will return NULL if this option is set.
      
      Callers may achieve the same workflow by tracking bpf_objects in
      application code.
      
        [0] Closes: https://github.com/libbpf/libbpf/issues/293Signed-off-by: NJoe Burton <jevburton@google.com>
      Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20211026223528.413950-1-jevburton.kernel@gmail.com
      689624f0
  3. 26 10月, 2021 22 次提交