1. 16 1月, 2020 1 次提交
  2. 10 1月, 2020 1 次提交
    • M
      bpf: libbpf: Add STRUCT_OPS support · 590a0088
      Martin KaFai Lau 提交于
      This patch adds BPF STRUCT_OPS support to libbpf.
      
      The only sec_name convention is SEC(".struct_ops") to identify the
      struct_ops implemented in BPF,
      e.g. To implement a tcp_congestion_ops:
      
      SEC(".struct_ops")
      struct tcp_congestion_ops dctcp = {
      	.init           = (void *)dctcp_init,  /* <-- a bpf_prog */
      	/* ... some more func prts ... */
      	.name           = "bpf_dctcp",
      };
      
      Each struct_ops is defined as a global variable under SEC(".struct_ops")
      as above.  libbpf creates a map for each variable and the variable name
      is the map's name.  Multiple struct_ops is supported under
      SEC(".struct_ops").
      
      In the bpf_object__open phase, libbpf will look for the SEC(".struct_ops")
      section and find out what is the btf-type the struct_ops is
      implementing.  Note that the btf-type here is referring to
      a type in the bpf_prog.o's btf.  A "struct bpf_map" is added
      by bpf_object__add_map() as other maps do.  It will then
      collect (through SHT_REL) where are the bpf progs that the
      func ptrs are referring to.  No btf_vmlinux is needed in
      the open phase.
      
      In the bpf_object__load phase, the map-fields, which depend
      on the btf_vmlinux, are initialized (in bpf_map__init_kern_struct_ops()).
      It will also set the prog->type, prog->attach_btf_id, and
      prog->expected_attach_type.  Thus, the prog's properties do
      not rely on its section name.
      [ Currently, the bpf_prog's btf-type ==> btf_vmlinux's btf-type matching
        process is as simple as: member-name match + btf-kind match + size match.
        If these matching conditions fail, libbpf will reject.
        The current targeting support is "struct tcp_congestion_ops" which
        most of its members are function pointers.
        The member ordering of the bpf_prog's btf-type can be different from
        the btf_vmlinux's btf-type. ]
      
      Then, all obj->maps are created as usual (in bpf_object__create_maps()).
      
      Once the maps are created and prog's properties are all set,
      the libbpf will proceed to load all the progs.
      
      bpf_map__attach_struct_ops() is added to register a struct_ops
      map to a kernel subsystem.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200109003514.3856730-1-kafai@fb.com
      590a0088
  3. 09 1月, 2020 1 次提交
  4. 20 12月, 2019 1 次提交
    • A
      libbpf: Introduce bpf_prog_attach_xattr · cdbee383
      Andrey Ignatov 提交于
      Introduce a new bpf_prog_attach_xattr function that, in addition to
      program fd, target fd and attach type, accepts an extendable struct
      bpf_prog_attach_opts.
      
      bpf_prog_attach_opts relies on DECLARE_LIBBPF_OPTS macro to maintain
      backward and forward compatibility and has the following "optional"
      attach attributes:
      
      * existing attach_flags, since it's not required when attaching in NONE
        mode. Even though it's quite often used in MULTI and OVERRIDE mode it
        seems to be a good idea to reduce number of arguments to
        bpf_prog_attach_xattr;
      
      * newly introduced attribute of BPF_PROG_ATTACH command: replace_prog_fd
        that is fd of previously attached cgroup-bpf program to replace if
        BPF_F_REPLACE flag is used.
      
      The new function is named to be consistent with other xattr-functions
      (bpf_prog_test_run_xattr, bpf_create_map_xattr, bpf_load_program_xattr).
      
      The struct bpf_prog_attach_opts is supposed to be used with
      DECLARE_LIBBPF_OPTS macro.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/bd6e0732303eb14e4b79cb128268d9e9ad6db208.1576741281.git.rdna@fb.com
      cdbee383
  5. 19 12月, 2019 1 次提交
    • A
      libbpf: Add bpf_link__disconnect() API to preserve underlying BPF resource · d6958706
      Andrii Nakryiko 提交于
      There are cases in which BPF resource (program, map, etc) has to outlive
      userspace program that "installed" it in the system in the first place.
      When BPF program is attached, libbpf returns bpf_link object, which
      is supposed to be destroyed after no longer necessary through
      bpf_link__destroy() API. Currently, bpf_link destruction causes both automatic
      detachment and frees up any resources allocated to for bpf_link in-memory
      representation. This is inconvenient for the case described above because of
      coupling of detachment and resource freeing.
      
      This patch introduces bpf_link__disconnect() API call, which marks bpf_link as
      disconnected from its underlying BPF resouces. This means that when bpf_link
      is destroyed later, all its memory resources will be freed, but BPF resource
      itself won't be detached.
      
      This design allows to follow strict and resource-leak-free design by default,
      while giving easy and straightforward way for user code to opt for keeping BPF
      resource attached beyond lifetime of a bpf_link. For some BPF programs (i.e.,
      FS-based tracepoints, kprobes, raw tracepoint, etc), user has to make sure to
      pin BPF program to prevent kernel to automatically detach it on process exit.
      This should typically be achived by pinning BPF program (or map in some cases)
      in BPF FS.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191218225039.2668205-1-andriin@fb.com
      d6958706
  6. 16 12月, 2019 5 次提交
  7. 11 12月, 2019 1 次提交
  8. 16 11月, 2019 2 次提交
  9. 11 11月, 2019 2 次提交
  10. 03 11月, 2019 1 次提交
  11. 31 10月, 2019 1 次提交
  12. 21 10月, 2019 1 次提交
  13. 06 10月, 2019 1 次提交
    • A
      libbpf: add bpf_object__open_{file, mem} w/ extensible opts · 2ce8450e
      Andrii Nakryiko 提交于
      Add new set of bpf_object__open APIs using new approach to optional
      parameters extensibility allowing simpler ABI compatibility approach.
      
      This patch demonstrates an approach to implementing libbpf APIs that
      makes it easy to extend existing APIs with extra optional parameters in
      such a way, that ABI compatibility is preserved without having to do
      symbol versioning and generating lots of boilerplate code to handle it.
      To facilitate succinct code for working with options, add OPTS_VALID,
      OPTS_HAS, and OPTS_GET macros that hide all the NULL, size, and zero
      checks.
      
      Additionally, newly added libbpf APIs are encouraged to follow similar
      pattern of having all mandatory parameters as formal function parameters
      and always have optional (NULL-able) xxx_opts struct, which should
      always have real struct size as a first field and the rest would be
      optional parameters added over time, which tune the behavior of existing
      API, if specified by user.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      2ce8450e
  14. 02 10月, 2019 1 次提交
  15. 31 8月, 2019 1 次提交
  16. 21 8月, 2019 1 次提交
  17. 16 8月, 2019 1 次提交
    • A
      libbpf: make libbpf.map source of truth for libbpf version · dadb81d0
      Andrii Nakryiko 提交于
      Currently libbpf version is specified in 2 places: libbpf.map and
      Makefile. They easily get out of sync and it's very easy to update one,
      but forget to update another one. In addition, Github projection of
      libbpf has to maintain its own version which has to be remembered to be
      kept in sync manually, which is very error-prone approach.
      
      This patch makes libbpf.map a source of truth for libbpf version and
      uses shell invocation to parse out correct full and major libbpf version
      to use during build. Now we need to make sure that once new release
      cycle starts, we need to add (initially) empty section to libbpf.map
      with correct latest version.
      
      This also will make it possible to keep Github projection consistent
      with kernel sources version of libbpf by adopting similar parsing of
      version from libbpf.map.
      
      v2->v3:
      - grep -o + sort -rV (Andrey);
      
      v1->v2:
      - eager version vars evaluation (Jakub);
      - simplified version regex (Andrey);
      
      Cc: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NAndrey Ignatov <rdna@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      dadb81d0
  18. 08 7月, 2019 1 次提交
    • A
      libbpf: add perf buffer API · fb84b822
      Andrii Nakryiko 提交于
      BPF_MAP_TYPE_PERF_EVENT_ARRAY map is often used to send data from BPF program
      to user space for additional processing. libbpf already has very low-level API
      to read single CPU perf buffer, bpf_perf_event_read_simple(), but it's hard to
      use and requires a lot of code to set everything up. This patch adds
      perf_buffer abstraction on top of it, abstracting setting up and polling
      per-CPU logic into simple and convenient API, similar to what BCC provides.
      
      perf_buffer__new() sets up per-CPU ring buffers and updates corresponding BPF
      map entries. It accepts two user-provided callbacks: one for handling raw
      samples and one for get notifications of lost samples due to buffer overflow.
      
      perf_buffer__new_raw() is similar, but provides more control over how
      perf events are set up (by accepting user-provided perf_event_attr), how
      they are handled (perf_event_header pointer is passed directly to
      user-provided callback), and on which CPUs ring buffers are created
      (it's possible to provide a list of CPUs and corresponding map keys to
      update). This API allows advanced users fuller control.
      
      perf_buffer__poll() is used to fetch ring buffer data across all CPUs,
      utilizing epoll instance.
      
      perf_buffer__free() does corresponding clean up and unsets FDs from BPF map.
      
      All APIs are not thread-safe. User should ensure proper locking/coordination if
      used in multi-threaded set up.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      fb84b822
  19. 06 7月, 2019 5 次提交
  20. 11 6月, 2019 1 次提交
  21. 28 5月, 2019 1 次提交
  22. 25 5月, 2019 2 次提交
    • A
      libbpf: add btf_dump API for BTF-to-C conversion · 351131b5
      Andrii Nakryiko 提交于
      BTF contains enough type information to allow generating valid
      compilable C header w/ correct layout of structs/unions and all the
      typedef/enum definitions. This patch adds a new "object" - btf_dump to
      facilitate dumping BTF as valid C. btf_dump__dump_type() is the main API
      which takes care of dumping out (through user-provided printf-like
      callback function) C definitions for given type ID and it's required
      dependencies. This allows for not just dumping out entirety of BTF types,
      but also selective filtering based on user-provided criterias w/ minimal
      set of dependent types.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      351131b5
    • A
      libbpf: add btf__parse_elf API to load .BTF and .BTF.ext · e6c64855
      Andrii Nakryiko 提交于
      Loading BTF and BTF.ext from ELF file is a common need. Instead of
      requiring every user to re-implement it, let's provide this API from
      libbpf itself. It's mostly copy/paste from `bpftool btf dump`
      implementation, which will be switched to libbpf's version in next
      patch. btf__parse_elf allows to load BTF and optionally BTF.ext.
      This is also useful for tests that need to load/work with BTF, loaded
      from test ELF files.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      e6c64855
  23. 10 4月, 2019 2 次提交
    • D
      bpf, libbpf: add support for BTF Var and DataSec · 1713d68b
      Daniel Borkmann 提交于
      This adds libbpf support for BTF Var and DataSec kinds. Main point
      here is that libbpf needs to do some preparatory work before the
      whole BTF object can be loaded into the kernel, that is, fixing up
      of DataSec size taken from the ELF section size and non-static
      variable offset which needs to be taken from the ELF's string section.
      
      Upstream LLVM doesn't fix these up since at time of BTF emission
      it is too early in the compilation process thus this information
      isn't available yet, hence loader needs to take care of it.
      
      Note, deduplication handling has not been in the scope of this work
      and needs to be addressed in a future commit.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://reviews.llvm.org/D59441Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      1713d68b
    • D
      bpf, libbpf: support global data/bss/rodata sections · d859900c
      Daniel Borkmann 提交于
      This work adds BPF loader support for global data sections
      to libbpf. This allows to write BPF programs in more natural
      C-like way by being able to define global variables and const
      data.
      
      Back at LPC 2018 [0] we presented a first prototype which
      implemented support for global data sections by extending BPF
      syscall where union bpf_attr would get additional memory/size
      pair for each section passed during prog load in order to later
      add this base address into the ldimm64 instruction along with
      the user provided offset when accessing a variable. Consensus
      from LPC was that for proper upstream support, it would be
      more desirable to use maps instead of bpf_attr extension as
      this would allow for introspection of these sections as well
      as potential live updates of their content. This work follows
      this path by taking the following steps from loader side:
      
       1) In bpf_object__elf_collect() step we pick up ".data",
          ".rodata", and ".bss" section information.
      
       2) If present, in bpf_object__init_internal_map() we add
          maps to the obj's map array that corresponds to each
          of the present sections. Given section size and access
          properties can differ, a single entry array map is
          created with value size that is corresponding to the
          ELF section size of .data, .bss or .rodata. These
          internal maps are integrated into the normal map
          handling of libbpf such that when user traverses all
          obj maps, they can be differentiated from user-created
          ones via bpf_map__is_internal(). In later steps when
          we actually create these maps in the kernel via
          bpf_object__create_maps(), then for .data and .rodata
          sections their content is copied into the map through
          bpf_map_update_elem(). For .bss this is not necessary
          since array map is already zero-initialized by default.
          Additionally, for .rodata the map is frozen as read-only
          after setup, such that neither from program nor syscall
          side writes would be possible.
      
       3) In bpf_program__collect_reloc() step, we record the
          corresponding map, insn index, and relocation type for
          the global data.
      
       4) And last but not least in the actual relocation step in
          bpf_program__relocate(), we mark the ldimm64 instruction
          with src_reg = BPF_PSEUDO_MAP_VALUE where in the first
          imm field the map's file descriptor is stored as similarly
          done as in BPF_PSEUDO_MAP_FD, and in the second imm field
          (as ldimm64 is 2-insn wide) we store the access offset
          into the section. Given these maps have only single element
          ldimm64's off remains zero in both parts.
      
       5) On kernel side, this special marked BPF_PSEUDO_MAP_VALUE
          load will then store the actual target address in order
          to have a 'map-lookup'-free access. That is, the actual
          map value base address + offset. The destination register
          in the verifier will then be marked as PTR_TO_MAP_VALUE,
          containing the fixed offset as reg->off and backing BPF
          map as reg->map_ptr. Meaning, it's treated as any other
          normal map value from verification side, only with
          efficient, direct value access instead of actual call to
          map lookup helper as in the typical case.
      
      Currently, only support for static global variables has been
      added, and libbpf rejects non-static global variables from
      loading. This can be lifted until we have proper semantics
      for how BPF will treat multi-object BPF loads. From BTF side,
      libbpf will set the value type id of the types corresponding
      to the ".bss", ".data" and ".rodata" names which LLVM will
      emit without the object name prefix. The key type will be
      left as zero, thus making use of the key-less BTF option in
      array maps.
      
      Simple example dump of program using globals vars in each
      section:
      
        # bpftool prog
        [...]
        6784: sched_cls  name load_static_dat  tag a7e1291567277844  gpl
              loaded_at 2019-03-11T15:39:34+0000  uid 0
              xlated 1776B  jited 993B  memlock 4096B  map_ids 2238,2237,2235,2236,2239,2240
      
        # bpftool map show id 2237
        2237: array  name test_glo.bss  flags 0x0
              key 4B  value 64B  max_entries 1  memlock 4096B
        # bpftool map show id 2235
        2235: array  name test_glo.data  flags 0x0
              key 4B  value 64B  max_entries 1  memlock 4096B
        # bpftool map show id 2236
        2236: array  name test_glo.rodata  flags 0x80
              key 4B  value 96B  max_entries 1  memlock 4096B
      
        # bpftool prog dump xlated id 6784
        int load_static_data(struct __sk_buff * skb):
        ; int load_static_data(struct __sk_buff *skb)
           0: (b7) r6 = 0
        ; test_reloc(number, 0, &num0);
           1: (63) *(u32 *)(r10 -4) = r6
           2: (bf) r2 = r10
        ; int load_static_data(struct __sk_buff *skb)
           3: (07) r2 += -4
        ; test_reloc(number, 0, &num0);
           4: (18) r1 = map[id:2238]
           6: (18) r3 = map[id:2237][0]+0    <-- direct addr in .bss area
           8: (b7) r4 = 0
           9: (85) call array_map_update_elem#100464
          10: (b7) r1 = 1
        ; test_reloc(number, 1, &num1);
        [...]
        ; test_reloc(string, 2, str2);
         120: (18) r8 = map[id:2237][0]+16   <-- same here at offset +16
         122: (18) r1 = map[id:2239]
         124: (18) r3 = map[id:2237][0]+16
         126: (b7) r4 = 0
         127: (85) call array_map_update_elem#100464
         128: (b7) r1 = 120
        ; str1[5] = 'x';
         129: (73) *(u8 *)(r9 +5) = r1
        ; test_reloc(string, 3, str1);
         130: (b7) r1 = 3
         131: (63) *(u32 *)(r10 -4) = r1
         132: (b7) r9 = 3
         133: (bf) r2 = r10
        ; int load_static_data(struct __sk_buff *skb)
         134: (07) r2 += -4
        ; test_reloc(string, 3, str1);
         135: (18) r1 = map[id:2239]
         137: (18) r3 = map[id:2235][0]+16   <-- direct addr in .data area
         139: (b7) r4 = 0
         140: (85) call array_map_update_elem#100464
         141: (b7) r1 = 111
        ; __builtin_memcpy(&str2[2], "hello", sizeof("hello"));
         142: (73) *(u8 *)(r8 +6) = r1       <-- further access based on .bss data
         143: (b7) r1 = 108
         144: (73) *(u8 *)(r8 +5) = r1
        [...]
      
      For Cilium use-case in particular, this enables migrating configuration
      constants from Cilium daemon's generated header defines into global
      data sections such that expensive runtime recompilations with LLVM can
      be avoided altogether. Instead, the ELF file becomes effectively a
      "template", meaning, it is compiled only once (!) and the Cilium daemon
      will then rewrite relevant configuration data from the ELF's .data or
      .rodata sections directly instead of recompiling the program. The
      updated ELF is then loaded into the kernel and atomically replaces
      the existing program in the networking datapath. More info in [0].
      
      Based upon recent fix in LLVM, commit c0db6b6bd444 ("[BPF] Don't fail
      for static variables").
      
        [0] LPC 2018, BPF track, "ELF relocation for static data in BPF",
            http://vger.kernel.org/lpc-bpf2018.html#session-3Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      d859900c
  24. 20 3月, 2019 1 次提交
  25. 26 2月, 2019 1 次提交
    • M
      libbpf: add support for using AF_XDP sockets · 1cad0788
      Magnus Karlsson 提交于
      This commit adds AF_XDP support to libbpf. The main reason for this is
      to facilitate writing applications that use AF_XDP by offering
      higher-level APIs that hide many of the details of the AF_XDP
      uapi. This is in the same vein as libbpf facilitates XDP adoption by
      offering easy-to-use higher level interfaces of XDP
      functionality. Hopefully this will facilitate adoption of AF_XDP, make
      applications using it simpler and smaller, and finally also make it
      possible for applications to benefit from optimizations in the AF_XDP
      user space access code. Previously, people just copied and pasted the
      code from the sample application into their application, which is not
      desirable.
      
      The interface is composed of two parts:
      
      * Low-level access interface to the four rings and the packet
      * High-level control plane interface for creating and setting
        up umems and af_xdp sockets as well as a simple XDP program.
      Tested-by: NBjörn Töpel <bjorn.topel@intel.com>
      Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1cad0788
  26. 15 2月, 2019 2 次提交
    • A
      libbpf: Introduce bpf_object__btf · 789f6bab
      Andrey Ignatov 提交于
      Add new accessor for bpf_object to get opaque struct btf * from it.
      
      struct btf * is needed for all operations with BTF and it's present in
      bpf_object. The only thing missing is a way to get it.
      
      Example use-case is to get BTF key_type_id and value_type_id for a map in
      bpf_object. It can be done with btf__get_map_kv_tids() but that function
      requires struct btf *.
      
      Similar API can be added for struct btf_ext but no use-case for it yet.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      789f6bab
    • A
      libbpf: Introduce bpf_map__resize · 1a11a4c7
      Andrey Ignatov 提交于
      Add bpf_map__resize() to change max_entries for a map.
      
      Quite often necessary map size is unknown at compile time and can be
      calculated only at run time.
      
      Currently the following approach is used to do so:
      * bpf_object__open_buffer() to open Elf file from a buffer;
      * bpf_object__find_map_by_name() to find relevant map;
      * bpf_map__def() to get map attributes and create struct
        bpf_create_map_attr from them;
      * update max_entries in bpf_create_map_attr;
      * bpf_create_map_xattr() to create new map with updated max_entries;
      * bpf_map__reuse_fd() to replace the map in bpf_object with newly
        created one.
      
      And after all this bpf_object can finally be loaded. The map will have
      new size.
      
      It 1) is quite a lot of steps; 2) doesn't take BTF into account.
      
      For "2)" even more steps should be made and some of them require changes
      to libbpf (e.g. to get struct btf * from bpf_object).
      
      Instead the whole problem can be solved by introducing simple
      bpf_map__resize() API that checks the map and sets new max_entries if
      the map is not loaded yet.
      
      So the new steps are:
      * bpf_object__open_buffer() to open Elf file from a buffer;
      * bpf_object__find_map_by_name() to find relevant map;
      * bpf_map__resize() to update max_entries.
      
      That's much simpler and works with BTF.
      Signed-off-by: NAndrey Ignatov <rdna@fb.com>
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      1a11a4c7
  27. 09 2月, 2019 1 次提交