1. 25 June 2019 (1 commit)
  2. 19 June 2019 (1 commit)
  3. 18 June 2019 (6 commits)
    • libbpf: allow specifying map definitions using BTF · abd29c93
      Authored by Andrii Nakryiko
      This patch adds support for a new way to define BPF maps. It relies on
      BTF to describe mandatory and optional attributes of a map, and
      naturally captures type information of the key and value. This
      eliminates the need for the BPF_ANNOTATE_KV_PAIR hack and ensures
      key/value sizes are always in sync with the key/value types.
      
      Relying on BTF, this approach allows for both forward and backward
      compatibility w.r.t. extending supported map definition features. By
      default, any unrecognized attributes are treated as an error, but it's
      possible to relax this using the MAPS_RELAX_COMPAT flag. New attributes
      added in the future will need to be optional.
      
      The outline of the new map definition ("BTF-defined maps" for short) is as follows:
      1. All maps should be defined in the .maps ELF section. It's possible to
         have both "legacy" map definitions in `maps` sections and BTF-defined
         maps in .maps sections; everything will still work transparently.
      2. The map declaration and initialization is done through
         a global/static variable of a struct type with a few mandatory and
         some optional fields:
         - the type field is mandatory and specifies the type of the BPF map;
         - key/value fields are mandatory and capture key/value type/size information;
         - the max_entries attribute is optional; if max_entries is not specified or
           initialized, it has to be provided at runtime through the libbpf API
           before loading the bpf_object;
         - map_flags is optional and, if not defined, is assumed to be 0.
      3. Key/value fields should be **a pointer** to a type describing
         key/value. The pointee type is recorded as the map's key/value type
         and used for size determination. Using a pointer avoids allocating
         excessive amounts of space in the corresponding ELF sections for
         large key/value types.
      4. As some maps disallow having a BTF type ID associated with the
         key/value, it's possible to specify key/value sizes explicitly,
         without associating a BTF type ID, via the key_size and value_size
         fields (see the example below).
      
      Here's an example of a simple ARRAY map definition:
      
      struct my_value { int x, y, z; };
      
      struct {
      	int type;
      	int max_entries;
      	int *key;
      	struct my_value *value;
      } btf_map SEC(".maps") = {
      	.type = BPF_MAP_TYPE_ARRAY,
      	.max_entries = 16,
      };
      
      This will define a BPF ARRAY map 'btf_map' with 16 elements. The key will
      be of type int, so the key size will be 4 bytes. The value is struct
      my_value of size 12 bytes. This map can be used from C code exactly the
      same way as existing maps defined through struct bpf_map_def.
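
      As a minimal usage sketch (the program section and logic here are
      illustrative, not part of the original patch; the usual bpf_helpers.h
      definitions are assumed), the map is accessed through the standard
      helpers:

      /* illustrative usage of the BTF-defined 'btf_map' above */
      SEC("xdp")
      int use_btf_map(struct xdp_md *ctx)
      {
      	struct my_value *val;
      	int key = 0;

      	/* lookup works the same as with a legacy bpf_map_def map */
      	val = bpf_map_lookup_elem(&btf_map, &key);
      	if (val)
      		val->x = 1;

      	return XDP_PASS;
      }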
      
      Here's an example of a STACKMAP definition (which currently disallows
      BTF type IDs for key/value):
      
      struct {
      	__u32 type;
      	__u32 max_entries;
      	__u32 map_flags;
      	__u32 key_size;
      	__u32 value_size;
      } stackmap SEC(".maps") = {
      	.type = BPF_MAP_TYPE_STACK_TRACE,
      	.max_entries = 128,
      	.map_flags = BPF_F_STACK_BUILD_ID,
      	.key_size = sizeof(__u32),
      	.value_size = PERF_MAX_STACK_DEPTH * sizeof(struct bpf_stack_build_id),
      };
      
      This approach extends naturally to support map-in-map, by making the
      value field another struct that describes the inner map. This feature is
      not implemented yet. It's also possible to incrementally add features
      like pinning with full backward and forward compatibility. Support for
      static initialization of BPF_MAP_TYPE_PROG_ARRAY using pointers to BPF
      programs is also on the roadmap.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: split initialization and loading of BTF · 063183bf
      Authored by Andrii Nakryiko
      Libbpf sanitizes BTF before loading it into the kernel if the kernel
      doesn't support some of the newer BTF features. This strips out
      important information (e.g., DATASEC and VAR descriptions) that is
      needed for map construction. This patch splits BTF processing into an
      initialization step, in which BTF is initialized from ELF with all the
      original data preserved, and a sanitization/loading step, which
      ensures that BTF is safe to load into the kernel. This makes it
      possible to use full BTF information to construct maps, while still
      loading valid BTF into older kernels.
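
      The shape of the split, heavily simplified (function names as in the
      patch, to the best of recollection):

      /* step 1: parse .BTF/.BTF.ext out of the ELF and keep all original
       * data (DATASEC/VAR included) so it can drive map construction */
      err = bpf_object__init_btf(obj, btf_data, btf_ext_data);
      ...
      /* step 2, just before load: downgrade unsupported kinds, then load
       * the now-safe BTF into the kernel */
      err = bpf_object__sanitize_and_load_btf(obj);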
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: identify maps by section index in addition to offset · db48814b
      Authored by Andrii Nakryiko
      To support maps defined in multiple sections, it's important to
      identify a map not just by its offset within a section, but by the
      section index as well. This patch adds tracking of the section index.

      For global data, we record the section index of the corresponding
      .data/.bss/.rodata ELF section for uniformity, and thus don't need
      a special offset value for those maps.
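
      Conceptually, the identifying pair looks like this (a sketch; field
      names may differ slightly from the patch):

      struct bpf_map {
      	int sec_idx;       /* ELF section index of the map definition */
      	size_t sec_offset; /* offset of the definition within that section */
      	/* ... */
      };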
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: refactor map initialization · bf829271
      Authored by Andrii Nakryiko
      User-defined and global data map initialization had become pretty
      complicated and unnecessarily convoluted. This patch splits out the
      logic for global data map and user-defined map initialization. It also
      removes the need to pre-calculate how many maps will be initialized,
      instead allowing new maps to be appended as they are discovered, which
      will be used later for BTF-defined map definitions.
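
      The growth pattern is roughly the following (a simplified sketch, not
      the exact helper from the patch):

      static struct bpf_map *bpf_object__add_map(struct bpf_object *obj)
      {
      	struct bpf_map *new_maps;

      	/* grow the array by one as each new map is discovered */
      	new_maps = realloc(obj->maps, (obj->nr_maps + 1) * sizeof(*obj->maps));
      	if (!new_maps)
      		return NULL;

      	obj->maps = new_maps;
      	memset(&obj->maps[obj->nr_maps], 0, sizeof(*obj->maps));
      	return &obj->maps[obj->nr_maps++];
      }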
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: streamline ELF parsing error-handling · 01b29d1d
      Authored by Andrii Nakryiko
      Simplify the ELF parsing logic by exiting early, as there is no common
      clean-up path to execute. That makes it unnecessary to track when err
      was set and when it was cleared. It also reduces nesting in some places.
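
      The pattern is the usual early-return style (illustrative, not
      verbatim from the patch):

      data = elf_getdata(scn, NULL);
      if (!data) {
      	pr_warning("failed to get section data\n");
      	/* exit early; no err variable to track, no goto to a
      	 * common clean-up label */
      	return -LIBBPF_ERRNO__FORMAT;
      }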
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • libbpf: extract BTF loading logic · 9c6660d0
      Authored by Andrii Nakryiko
      In preparation for adding BTF-based BPF map loading, extract the .BTF
      and .BTF.ext loading logic.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  4. 15 June 2019 (1 commit)
  5. 11 June 2019 (1 commit)
  6. 07 June 2019 (1 commit)
  7. 01 June 2019 (1 commit)
    • libbpf: Return btf_fd for load_sk_storage_btf · cfd49210
      Authored by Michal Rostecki
      Before this change, the load_sk_storage_btf function expected
      libbpf__probe_raw_btf to return a BTF descriptor, but in fact it
      returned information about whether the probe was successful (0 or 1).
      load_sk_storage_btf used that value as the argument to the close
      function, which resulted in closing stdout and thus terminating the
      process that called it.
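
      In other words, the buggy call chain boiled down to this (a simplified
      sketch; argument names abbreviated):

      /* the probe returned 1 (success), not a BTF fd ... */
      btf_fd = libbpf__probe_raw_btf(raw_types, types_len, strs, strs_len);
      /* ... so this closed file descriptor 1, i.e. stdout */
      close(btf_fd);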
      
      That bug was visible in bpftool. The `bpftool feature` subcommand was
      always exiting too early (because of the closed stdout) and didn't
      display all requested probes. `bpftool -j feature` and `bpftool -p
      feature` were not returning a valid JSON object.
      
      This change renames the libbpf__probe_raw_btf function to
      libbpf__load_raw_btf, which now returns a BTF descriptor, as expected in
      load_sk_storage_btf.
      
      v2:
      - Fix typo in the commit message.
      
      v3:
      - Simplify BTF descriptor handling in bpf_object__probe_btf_* functions.
      - Rename libbpf__probe_raw_btf function to libbpf__load_raw_btf and
      return a BTF descriptor.
      
      v4:
      - Fix typo in the commit message.
      
      Fixes: d7c4b398 ("libbpf: detect supported kernel BTF features and sanitize BTF")
      Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  8. 30 May 2019 (10 commits)
  9. 28 May 2019 (2 commits)
  10. 25 May 2019 (1 commit)
  11. 17 May 2019 (1 commit)
  12. 16 May 2019 (1 commit)
  13. 13 May 2019 (1 commit)
    • libbpf: detect supported kernel BTF features and sanitize BTF · d7c4b398
      Authored by Andrii Nakryiko
      Depending on the versions of libbpf, Clang, and the kernel in use,
      it's possible to have valid BPF object files with valid BTF
      information that still won't load successfully, because Clang emits
      newer BTF features (e.g., BTF_KIND_FUNC, .BTF.ext's
      line_info/func_info, BTF_KIND_DATASEC, etc.) that are not yet
      supported by older kernels.
      
      This patch adds detection of BTF features and sanitizes the BPF
      object's BTF by substituting unsupported BTF kinds with supported
      ones that have a compatible layout:
        - BTF_KIND_FUNC -> BTF_KIND_TYPEDEF
        - BTF_KIND_FUNC_PROTO -> BTF_KIND_ENUM
        - BTF_KIND_VAR -> BTF_KIND_INT
        - BTF_KIND_DATASEC -> BTF_KIND_STRUCT
      
      Replacement is done in such a way as to preserve as much information
      as possible (names, sizes, etc.) without violating the kernel's
      validation rules.
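
      For example, downgrading a single FUNC type in place might look like
      this (a sketch; the real code handles each kind's payload explicitly):

      struct btf_type *t = (struct btf_type *)btf__type_by_id(btf, type_id);

      if (!has_func && BTF_INFO_KIND(t->info) == BTF_KIND_FUNC)
      	/* TYPEDEF has a compatible layout: keep the name, rewrite the kind */
      	t->info = BTF_INFO_ENC(BTF_KIND_TYPEDEF, 0, 0);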
      
      v2->v3:
        - remove duplicate #defines from libbpf_util.h
      
      v1->v2:
        - add internal libbpf_internal.h w/ common stuff
        - switch SK storage BTF to use new libbpf__probe_raw_btf()
      Reported-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  14. 27 April 2019 (1 commit)
  15. 26 April 2019 (2 commits)
  16. 17 April 2019 (1 commit)
  17. 13 April 2019 (1 commit)
  18. 11 April 2019 (1 commit)
    • libbpf: Fix build with gcc-8 · d5adbdd7
      Authored by Andrey Ignatov
      Reported in [1].
      
      With gcc 8.3.0 the following error is issued:
      
        cc -Ibpf@sta -I. -I.. -I.././include -I.././include/uapi
        -fdiagnostics-color=always -fsanitize=address,undefined -fno-omit-frame-pointer
        -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Werror -g -fPIC -g -O2
        -Werror -Wall -Wno-pointer-arith -Wno-sign-compare  -MD -MQ
        'bpf@sta/src_libbpf.c.o' -MF 'bpf@sta/src_libbpf.c.o.d' -o
        'bpf@sta/src_libbpf.c.o' -c ../src/libbpf.c
        ../src/libbpf.c: In function 'bpf_object__elf_collect':
        ../src/libbpf.c:947:18: error: 'map_def_sz' may be used uninitialized in this
        function [-Werror=maybe-uninitialized]
           if (map_def_sz <= sizeof(struct bpf_map_def)) {
               ~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        ../src/libbpf.c:827:18: note: 'map_def_sz' was declared here
          int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0;
                          ^~~~~~~~~~
      
      According to [2], -Wmaybe-uninitialized is enabled by -Wall. The same
      error is generated by clang's -Wconditional-uninitialized.
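
      The warning is silenced by giving map_def_sz a definite initial value
      (shown as a sketch; the actual hunk may differ):

      -	int i, map_idx, map_def_sz, nr_syms, nr_maps = 0, nr_maps_glob = 0;
      +	int i, map_idx, map_def_sz = 0, nr_syms, nr_maps = 0, nr_maps_glob = 0;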
      
      [1] https://github.com/libbpf/libbpf/pull/29#issuecomment-481902601
      [2] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html
      
      Fixes: d859900c ("bpf, libbpf: support global data/bss/rodata sections")
      Reported-by: Evgeny Vereshchagin <evvers@ya.ru>
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  19. 10 April 2019 (3 commits)
    • bpf, libbpf: add support for BTF Var and DataSec · 1713d68b
      Authored by Daniel Borkmann
      This adds libbpf support for the BTF Var and DataSec kinds. The main
      point here is that libbpf needs to do some preparatory work before the
      whole BTF object can be loaded into the kernel: fixing up the DataSec
      size, which is taken from the ELF section size, and non-static
      variable offsets, which need to be taken from the ELF's symbol table.

      Upstream LLVM doesn't fix these up since, at the time of BTF emission,
      it is too early in the compilation process and this information isn't
      available yet; hence the loader needs to take care of it.
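
      A heavily simplified sketch of the loader-side fixup (var_elf_offset
      is a hypothetical helper standing in for the symbol lookup):

      struct btf_var_secinfo *vsi;

      /* DATASEC size comes from the ELF section header ... */
      t->size = sec_size;

      /* ... and each contained variable's offset from its ELF symbol */
      for (i = 0, vsi = (struct btf_var_secinfo *)(t + 1); i < vlen; i++, vsi++)
      	vsi->offset = var_elf_offset(obj, vsi->type); /* hypothetical */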
      
      Note: deduplication handling was not in the scope of this work and
      needs to be addressed in a future commit.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://reviews.llvm.org/D59441
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, libbpf: support global data/bss/rodata sections · d859900c
      Authored by Daniel Borkmann
      This work adds BPF loader support for global data sections
      to libbpf. This allows writing BPF programs in a more natural,
      C-like way, by being able to define global variables and
      const data.
      
      Back at LPC 2018 [0] we presented a first prototype which
      implemented support for global data sections by extending the
      BPF syscall: union bpf_attr would get an additional memory/size
      pair for each section passed during prog load, in order to later
      add this base address into the ldimm64 instruction along with
      the user-provided offset when accessing a variable. The consensus
      from LPC was that for proper upstream support it would be
      more desirable to use maps instead of a bpf_attr extension,
      as this allows for introspection of these sections as well
      as potential live updates of their content. This work follows
      that path by taking the following steps on the loader side:
      
       1) In the bpf_object__elf_collect() step we pick up ".data",
          ".rodata", and ".bss" section information.
      
       2) If present, in bpf_object__init_internal_map() we add
          a map to the obj's map array for each of the present
          sections. Given that section size and access
          properties can differ, a single-entry array map is
          created with a value size corresponding to the
          ELF section size of .data, .bss or .rodata. These
          internal maps are integrated into the normal map
          handling of libbpf such that when the user traverses all
          obj maps, they can be differentiated from user-created
          ones via bpf_map__is_internal(). In later steps, when
          we actually create these maps in the kernel via
          bpf_object__create_maps(), the content of the .data and
          .rodata sections is copied into their maps through
          bpf_map_update_elem(). For .bss this is not necessary,
          since the array map is already zero-initialized by default.
          Additionally, the .rodata map is frozen as read-only
          after setup, such that writes are possible from neither
          the program nor the syscall side.
      
       3) In the bpf_program__collect_reloc() step, we record the
          corresponding map, insn index, and relocation type for
          the global data.
      
       4) And last but not least, in the actual relocation step in
          bpf_program__relocate(), we mark the ldimm64 instruction
          with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
          imm field the map's file descriptor is stored, similarly
          to BPF_PSEUDO_MAP_FD, and in the second imm field
          (as ldimm64 is 2-insn wide) we store the access offset
          into the section (see the sketch after this list). Given
          these maps have only a single element, ldimm64's off
          remains zero in both parts.
      
       5) On the kernel side, this specially marked BPF_PSEUDO_MAP_VALUE
          load will then store the actual target address in order
          to have 'map-lookup'-free access, that is, the actual
          map value base address + offset. The destination register
          in the verifier will then be marked as PTR_TO_MAP_VALUE,
          containing the fixed offset as reg->off and the backing BPF
          map as reg->map_ptr. Meaning, it's treated like any other
          normal map value from the verification side, only with
          efficient, direct value access instead of an actual call to
          the map lookup helper as in the typical case.
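
      As referenced in step 4) above, the loader-side instruction patching
      amounts to roughly the following (simplified):

      /* ldimm64 is two instructions wide: the map fd goes into the
       * first imm field, the offset into the section's map value
       * into the second */
      insn[0].src_reg = BPF_PSEUDO_MAP_VALUE;
      insn[0].imm = map_fd;
      insn[1].imm = offset_in_section;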
      
      Currently, only support for static global variables has been
      added, and libbpf rejects non-static global variables from
      loading. This restriction can be lifted once we have proper
      semantics for how BPF will treat multi-object BPF loads. On the
      BTF side, libbpf will set the value type id of the types
      corresponding to the ".bss", ".data" and ".rodata" names, which
      LLVM will emit without the object name prefix. The key type will
      be left as zero, thus making use of the key-less BTF option in
      array maps.
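
      For instance, with this patch a program can declare (an illustrative
      snippet, not taken from the patch itself):

      static __u32 num0;                          /* lands in .bss    */
      static __u32 num1 = 42;                     /* lands in .data   */
      static const char str2[] = "some string";   /* lands in .rodata */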
      
      A simple example dump of a program using global vars in each
      section:
      
        # bpftool prog
        [...]
        6784: sched_cls  name load_static_dat  tag a7e1291567277844  gpl
              loaded_at 2019-03-11T15:39:34+0000  uid 0
              xlated 1776B  jited 993B  memlock 4096B  map_ids 2238,2237,2235,2236,2239,2240
      
        # bpftool map show id 2237
        2237: array  name test_glo.bss  flags 0x0
              key 4B  value 64B  max_entries 1  memlock 4096B
        # bpftool map show id 2235
        2235: array  name test_glo.data  flags 0x0
              key 4B  value 64B  max_entries 1  memlock 4096B
        # bpftool map show id 2236
        2236: array  name test_glo.rodata  flags 0x80
              key 4B  value 96B  max_entries 1  memlock 4096B
      
        # bpftool prog dump xlated id 6784
        int load_static_data(struct __sk_buff * skb):
        ; int load_static_data(struct __sk_buff *skb)
           0: (b7) r6 = 0
        ; test_reloc(number, 0, &num0);
           1: (63) *(u32 *)(r10 -4) = r6
           2: (bf) r2 = r10
        ; int load_static_data(struct __sk_buff *skb)
           3: (07) r2 += -4
        ; test_reloc(number, 0, &num0);
           4: (18) r1 = map[id:2238]
           6: (18) r3 = map[id:2237][0]+0    <-- direct addr in .bss area
           8: (b7) r4 = 0
           9: (85) call array_map_update_elem#100464
          10: (b7) r1 = 1
        ; test_reloc(number, 1, &num1);
        [...]
        ; test_reloc(string, 2, str2);
         120: (18) r8 = map[id:2237][0]+16   <-- same here at offset +16
         122: (18) r1 = map[id:2239]
         124: (18) r3 = map[id:2237][0]+16
         126: (b7) r4 = 0
         127: (85) call array_map_update_elem#100464
         128: (b7) r1 = 120
        ; str1[5] = 'x';
         129: (73) *(u8 *)(r9 +5) = r1
        ; test_reloc(string, 3, str1);
         130: (b7) r1 = 3
         131: (63) *(u32 *)(r10 -4) = r1
         132: (b7) r9 = 3
         133: (bf) r2 = r10
        ; int load_static_data(struct __sk_buff *skb)
         134: (07) r2 += -4
        ; test_reloc(string, 3, str1);
         135: (18) r1 = map[id:2239]
         137: (18) r3 = map[id:2235][0]+16   <-- direct addr in .data area
         139: (b7) r4 = 0
         140: (85) call array_map_update_elem#100464
         141: (b7) r1 = 111
        ; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
         142: (73) *(u8 *)(r8 +6) = r1       <-- further access based on .bss data
         143: (b7) r1 = 108
         144: (73) *(u8 *)(r8 +5) = r1
        [...]
      
      For the Cilium use case in particular, this enables migrating
      configuration constants from the Cilium daemon's generated header
      defines into global data sections, such that expensive runtime
      recompilations with LLVM can be avoided altogether. Instead, the ELF
      file effectively becomes a "template": it is compiled only once (!),
      and the Cilium daemon then rewrites the relevant configuration data
      directly in the ELF's .data or .rodata sections instead of recompiling
      the program. The updated ELF is then loaded into the kernel and
      atomically replaces the existing program in the networking datapath.
      More info in [0].
      
      Based upon a recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't
      fail for static variables").
      
        [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
            http://vger.kernel.org/lpc-bpf2018.html#session-3
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
    • bpf, libbpf: refactor relocation handling · f8c7a4d4
      Authored by Joe Stringer
      Adjust the relocation code slightly, with no functional changes,
      so that upcoming patches introducing support for relocations
      into the .data, .rodata and .bss sections can be added independently
      of these changes.
      Signed-off-by: Joe Stringer <joe@wand.net.nz>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
  20. 07 April 2019 (1 commit)
  21. 04 April 2019 (1 commit)
    • libbpf: teach libbpf about log_level bit 2 · da11b417
      Authored by Alexei Starovoitov
      Allow bpf_prog_load_xattr() to specify log_level for program loading.
      
      Teach libbpf to accept log_level with bit 2 set.
      
      Increase the default BPF_LOG_BUF_SIZE from 256k to 16M. There is no
      downside to increasing it to the maximum allowed by old kernels. The
      existing 256k limit caused ENOSPC errors, and users were unable to see
      the verifier error, which is printed at the end of the verifier log.

      If ENOSPC is hit, double the verifier log buffer and try again to
      capture the verifier error.
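
      The retry then looks roughly like this (a sketch of the loop, not the
      exact libbpf code):

      log_buf_size = BPF_LOG_BUF_SIZE;
      for (;;) {
      	log_buf = malloc(log_buf_size);
      	if (!log_buf)
      		return -ENOMEM;

      	fd = bpf_load_program_xattr(&load_attr, log_buf, log_buf_size);
      	if (fd >= 0 || errno != ENOSPC)
      		break;

      	/* the verifier log didn't fit: double the buffer and retry */
      	log_buf_size *= 2;
      	free(log_buf);
      }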
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  22. 20 March 2019 (1 commit)