1. 17 5月, 2021 1 次提交
    • K
      libbpf: Add low level TC-BPF management API · 715c5ce4
      Kumar Kartikeya Dwivedi 提交于
      This adds functions that wrap the netlink API used for adding, manipulating,
      and removing traffic control filters.
      
      The API summary:
      
      A bpf_tc_hook represents a location where a TC-BPF filter can be attached.
      This means that creating a hook leads to creation of the backing qdisc,
      while destruction either removes all filters attached to a hook, or destroys
      qdisc if requested explicitly (as discussed below).
      
      The TC-BPF API functions operate on this bpf_tc_hook to attach, replace,
      query, and detach tc filters. All functions return 0 on success, and a
      negative error code on failure.
      
      bpf_tc_hook_create - Create a hook
      Parameters:
      	@hook - Cannot be NULL, ifindex > 0, attach_point must be set to
      		proper enum constant. Note that parent must be unset when
      		attach_point is one of BPF_TC_INGRESS or BPF_TC_EGRESS. Note
      		that as an exception BPF_TC_INGRESS|BPF_TC_EGRESS is also a
      		valid value for attach_point.
      
      		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
      
      bpf_tc_hook_destroy - Destroy a hook
      Parameters:
      	@hook - Cannot be NULL. The behaviour depends on value of
      		attach_point. If BPF_TC_INGRESS, all filters attached to
      		the ingress hook will be detached. If BPF_TC_EGRESS, all
      		filters attached to the egress hook will be detached. If
      		BPF_TC_INGRESS|BPF_TC_EGRESS, the clsact qdisc will be
      		deleted, also detaching all filters. As before, parent must
      		be unset for these attach_points, and set for BPF_TC_CUSTOM.
      
      		It is advised that if the qdisc is operated on by many programs,
      		then the program at least check that there are no other existing
      		filters before deleting the clsact qdisc. An example is shown
      		below:
      
      		DECLARE_LIBBPF_OPTS(bpf_tc_hook, .ifindex = if_nametoindex("lo"),
      				    .attach_point = BPF_TC_INGRESS);
      		/* set opts as NULL, as we're not really interested in
      		 * getting any info for a particular filter, but just
      	 	 * detecting its presence.
      		 */
      		r = bpf_tc_query(&hook, NULL);
      		if (r == -ENOENT) {
      			/* no filters */
      			hook.attach_point = BPF_TC_INGRESS|BPF_TC_EGREESS;
      			return bpf_tc_hook_destroy(&hook);
      		} else {
      			/* failed or r == 0, the latter means filters do exist */
      			return r;
      		}
      
      		Note that there is a small race between checking for no
      		filters and deleting the qdisc. This is currently unavoidable.
      
      		Returns -EOPNOTSUPP when hook has attach_point as BPF_TC_CUSTOM.
      
      bpf_tc_attach - Attach a filter to a hook
      Parameters:
      	@hook - Cannot be NULL. Represents the hook the filter will be
      		attached to. Requirements for ifindex and attach_point are
      		same as described in bpf_tc_hook_create, but BPF_TC_CUSTOM
      		is also supported.  In that case, parent must be set to the
      		handle where the filter will be attached (using BPF_TC_PARENT).
      		E.g. to set parent to 1:16 like in tc command line, the
      		equivalent would be BPF_TC_PARENT(1, 16).
      
      	@opts - Cannot be NULL. The following opts are optional:
      		* handle   - The handle of the filter
      		* priority - The priority of the filter
      			     Must be >= 0 and <= UINT16_MAX
      		Note that when left unset, they will be auto-allocated by
      		the kernel. The following opts must be set:
      		* prog_fd - The fd of the loaded SCHED_CLS prog
      		The following opts must be unset:
      		* prog_id - The ID of the BPF prog
      		The following opts are optional:
      		* flags - Currently only BPF_TC_F_REPLACE is allowed. It
      			  allows replacing an existing filter instead of
      			  failing with -EEXIST.
      		The following opts will be filled by bpf_tc_attach on a
      		successful attach operation if they are unset:
      		* handle   - The handle of the attached filter
      		* priority - The priority of the attached filter
      		* prog_id  - The ID of the attached SCHED_CLS prog
      		This way, the user can know what the auto allocated values
      		for optional opts like handle and priority are for the newly
      		attached filter, if they were unset.
      
      		Note that some other attributes are set to fixed default
      		values listed below (this holds for all bpf_tc_* APIs):
      		protocol as ETH_P_ALL, direct action mode, chain index of 0,
      		and class ID of 0 (this can be set by writing to the
      		skb->tc_classid field from the BPF program).
      
      bpf_tc_detach
      Parameters:
      	@hook - Cannot be NULL. Represents the hook the filter will be
      		detached from. Requirements are same as described above
      		in bpf_tc_attach.
      
      	@opts - Cannot be NULL. The following opts must be set:
      		* handle, priority
      		The following opts must be unset:
      		* prog_fd, prog_id, flags
      
      bpf_tc_query
      Parameters:
      	@hook - Cannot be NULL. Represents the hook where the filter lookup will
      		be performed. Requirements are same as described above in
      		bpf_tc_attach().
      
      	@opts - Cannot be NULL. The following opts must be set:
      		* handle, priority
      		The following opts must be unset:
      		* prog_fd, prog_id, flags
      		The following fields will be filled by bpf_tc_query upon a
      		successful lookup:
      		* prog_id
      
      Some usage examples (using BPF skeleton infrastructure):
      
      BPF program (test_tc_bpf.c):
      
      	#include <linux/bpf.h>
      	#include <bpf/bpf_helpers.h>
      
      	SEC("classifier")
      	int cls(struct __sk_buff *skb)
      	{
      		return 0;
      	}
      
      Userspace loader:
      
      	struct test_tc_bpf *skel = NULL;
      	int fd, r;
      
      	skel = test_tc_bpf__open_and_load();
      	if (!skel)
      		return -ENOMEM;
      
      	fd = bpf_program__fd(skel->progs.cls);
      
      	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex =
      			    if_nametoindex("lo"), .attach_point =
      			    BPF_TC_INGRESS);
      	/* Create clsact qdisc */
      	r = bpf_tc_hook_create(&hook);
      	if (r < 0)
      		goto end;
      
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = fd);
      	r = bpf_tc_attach(&hook, &opts);
      	if (r < 0)
      		goto end;
      	/* Print the auto allocated handle and priority */
      	printf("Handle=%u", opts.handle);
      	printf("Priority=%u", opts.priority);
      
      	opts.prog_fd = opts.prog_id = 0;
      	bpf_tc_detach(&hook, &opts);
      end:
      	test_tc_bpf__destroy(skel);
      
      This is equivalent to doing the following using tc command line:
        # tc qdisc add dev lo clsact
        # tc filter add dev lo ingress bpf obj foo.o sec classifier da
        # tc filter del dev lo ingress handle <h> prio <p> bpf
      ... where the handle and priority can be found using:
        # tc filter show dev lo ingress
      
      Another example replacing a filter (extending prior example):
      
      	/* We can also choose both (or one), let's try replacing an
      	 * existing filter.
      	 */
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, replace_opts, .handle =
      			    opts.handle, .priority = opts.priority,
      			    .prog_fd = fd);
      	r = bpf_tc_attach(&hook, &replace_opts);
      	if (r == -EEXIST) {
      		/* Expected, now use BPF_TC_F_REPLACE to replace it */
      		replace_opts.flags = BPF_TC_F_REPLACE;
      		return bpf_tc_attach(&hook, &replace_opts);
      	} else if (r < 0) {
      		return r;
      	}
      	/* There must be no existing filter with these
      	 * attributes, so cleanup and return an error.
      	 */
      	replace_opts.prog_fd = replace_opts.prog_id = 0;
      	bpf_tc_detach(&hook, &replace_opts);
      	return -1;
      
      To obtain info of a particular filter:
      
      	/* Find info for filter with handle 1 and priority 50 */
      	DECLARE_LIBBPF_OPTS(bpf_tc_opts, info_opts, .handle = 1,
      			    .priority = 50);
      	r = bpf_tc_query(&hook, &info_opts);
      	if (r == -ENOENT)
      		printf("Filter not found");
      	else if (r < 0)
      		return r;
      	printf("Prog ID: %u", info_opts.prog_id);
      	return 0;
      Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com>
      Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> # libbpf API design
      [ Daniel: also did major patch cleanup ]
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210512103451.989420-3-memxor@gmail.com
      715c5ce4
  2. 12 5月, 2021 1 次提交
  3. 09 4月, 2021 1 次提交
  4. 26 3月, 2021 2 次提交
  5. 19 3月, 2021 1 次提交
    • A
      libbpf: Add BPF static linker APIs · faf6ed32
      Andrii Nakryiko 提交于
      Introduce BPF static linker APIs to libbpf. BPF static linker allows to
      perform static linking of multiple BPF object files into a single combined
      resulting object file, preserving all the BPF programs, maps, global
      variables, etc.
      
      Data sections (.bss, .data, .rodata, .maps, maps, etc) with the same name are
      concatenated together. Similarly, code sections are also concatenated. All the
      symbols and ELF relocations are also concatenated in their respective ELF
      sections and are adjusted accordingly to the new object file layout.
      
      Static variables and functions are handled correctly as well, adjusting BPF
      instructions offsets to reflect new variable/function offset within the
      combined ELF section. Such relocations are referencing STT_SECTION symbols and
      that stays intact.
      
      Data sections in different files can have different alignment requirements, so
      that is taken care of as well, adjusting sizes and offsets as necessary to
      satisfy both old and new alignment requirements.
      
      DWARF data sections are stripped out, currently. As well as LLLVM_ADDRSIG
      section, which is ignored by libbpf in bpf_object__open() anyways. So, in
      a way, BPF static linker is an analogue to `llvm-strip -g`, which is a pretty
      nice property, especially if resulting .o file is then used to generate BPF
      skeleton.
      
      Original string sections are ignored and instead we construct our own set of
      unique strings using libbpf-internal `struct strset` API.
      
      To reduce the size of the patch, all the .BTF and .BTF.ext processing was
      moved into a separate patch.
      
      The high-level API consists of just 4 functions:
        - bpf_linker__new() creates an instance of BPF static linker. It accepts
          output filename and (currently empty) options struct;
        - bpf_linker__add_file() takes input filename and appends it to the already
          processed ELF data; it can be called multiple times, one for each BPF
          ELF object file that needs to be linked in;
        - bpf_linker__finalize() needs to be called to dump final ELF contents into
          the output file, specified when bpf_linker was created; after
          bpf_linker__finalize() is called, no more bpf_linker__add_file() and
          bpf_linker__finalize() calls are allowed, they will return error;
        - regardless of whether bpf_linker__finalize() was called or not,
          bpf_linker__free() will free up all the used resources.
      
      Currently, BPF static linker doesn't resolve cross-object file references
      (extern variables and/or functions). This will be added in the follow up patch
      set.
      Signed-off-by: NAndrii Nakryiko <andrii@kernel.org>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210318194036.3521577-7-andrii@kernel.org
      faf6ed32
  6. 17 3月, 2021 1 次提交
  7. 15 12月, 2020 1 次提交
  8. 30 9月, 2020 1 次提交
  9. 04 9月, 2020 1 次提交
  10. 22 8月, 2020 1 次提交
    • A
      libbpf: Add perf_buffer APIs for better integration with outside epoll loop · dca5612f
      Andrii Nakryiko 提交于
      Add a set of APIs to perf_buffer manage to allow applications to integrate
      perf buffer polling into existing epoll-based infrastructure. One example is
      applications using libevent already and wanting to plug perf_buffer polling,
      instead of relying on perf_buffer__poll() and waste an extra thread to do it.
      But perf_buffer is still extremely useful to set up and consume perf buffer
      rings even for such use cases.
      
      So to accomodate such new use cases, add three new APIs:
        - perf_buffer__buffer_cnt() returns number of per-CPU buffers maintained by
          given instance of perf_buffer manager;
        - perf_buffer__buffer_fd() returns FD of perf_event corresponding to
          a specified per-CPU buffer; this FD is then polled independently;
        - perf_buffer__consume_buffer() consumes data from single per-CPU buffer,
          identified by its slot index.
      
      To support a simpler, but less efficient, way to integrate perf_buffer into
      external polling logic, also expose underlying epoll FD through
      perf_buffer__epoll_fd() API. It will need to be followed by
      perf_buffer__poll(), wasting extra syscall, or perf_buffer__consume(), wasting
      CPU to iterate buffers with no data. But could be simpler and more convenient
      for some cases.
      
      These APIs allow for great flexiblity, but do not sacrifice general usability
      of perf_buffer.
      
      Also exercise and check new APIs in perf_buffer selftest.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Reviewed-by: NAlan Maguire <alan.maguire@oracle.com>
      Link: https://lore.kernel.org/bpf/20200821165927.849538-1-andriin@fb.com
      dca5612f
  11. 07 8月, 2020 1 次提交
  12. 02 8月, 2020 1 次提交
  13. 26 7月, 2020 2 次提交
  14. 18 7月, 2020 1 次提交
  15. 29 6月, 2020 1 次提交
    • A
      libbpf: Support disabling auto-loading BPF programs · d9297581
      Andrii Nakryiko 提交于
      Currently, bpf_object__load() (and by induction skeleton's load), will always
      attempt to prepare, relocate, and load into kernel every single BPF program
      found inside the BPF object file. This is often convenient and the right thing
      to do and what users expect.
      
      But there are plenty of cases (especially with BPF development constantly
      picking up the pace), where BPF application is intended to work with old
      kernels, with potentially reduced set of features. But on kernels supporting
      extra features, it would like to take a full advantage of them, by employing
      extra BPF program. This could be a choice of using fentry/fexit over
      kprobe/kretprobe, if kernel is recent enough and is built with BTF. Or BPF
      program might be providing optimized bpf_iter-based solution that user-space
      might want to use, whenever available. And so on.
      
      With libbpf and BPF CO-RE in particular, it's advantageous to not have to
      maintain two separate BPF object files to achieve this. So to enable such use
      cases, this patch adds ability to request not auto-loading chosen BPF
      programs. In such case, libbpf won't attempt to perform relocations (which
      might fail due to old kernel), won't try to resolve BTF types for
      BTF-aware (tp_btf/fentry/fexit/etc) program types, because BTF might not be
      present, and so on. Skeleton will also automatically skip auto-attachment step
      for such not loaded BPF programs.
      
      Overall, this feature allows to simplify development and deployment of
      real-world BPF applications with complicated compatibility requirements.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200625232629.3444003-2-andriin@fb.com
      d9297581
  16. 23 6月, 2020 1 次提交
    • A
      libbpf: Add a bunch of attribute getters/setters for map definitions · 1bdb6c9a
      Andrii Nakryiko 提交于
      Add a bunch of getter for various aspects of BPF map. Some of these attribute
      (e.g., key_size, value_size, type, etc) are available right now in struct
      bpf_map_def, but this patch adds getter allowing to fetch them individually.
      bpf_map_def approach isn't very scalable, when ABI stability requirements are
      taken into account. It's much easier to extend libbpf and add support for new
      features, when each aspect of BPF map has separate getter/setter.
      
      Getters follow the common naming convention of not explicitly having "get" in
      its name: bpf_map__type() returns map type, bpf_map__key_size() returns
      key_size. Setters, though, explicitly have set in their name:
      bpf_map__set_type(), bpf_map__set_key_size().
      
      This patch ensures we now have a getter and a setter for the following
      map attributes:
        - type;
        - max_entries;
        - map_flags;
        - numa_node;
        - key_size;
        - value_size;
        - ifindex.
      
      bpf_map__resize() enforces unnecessary restriction of max_entries > 0. It is
      unnecessary, because libbpf actually supports zero max_entries for some cases
      (e.g., for PERF_EVENT_ARRAY map) and treats it specially during map creation
      time. To allow setting max_entries=0, new bpf_map__set_max_entries() setter is
      added. bpf_map__resize()'s behavior is preserved for backwards compatibility
      reasons.
      
      Map ifindex getter is added as well. There is a setter already, but no
      corresponding getter. Fix this assymetry as well. bpf_map__set_ifindex()
      itself is converted from void function into error-returning one, similar to
      other setters. The only error returned right now is -EBUSY, if BPF map is
      already loaded and has corresponding FD.
      
      One lacking attribute with no ability to get/set or even specify it
      declaratively is numa_node. This patch fixes this gap and both adds
      programmatic getter/setter, as well as adds support for numa_node field in
      BTF-defined map.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20200621062112.3006313-1-andriin@fb.com
      1bdb6c9a
  17. 02 6月, 2020 3 次提交
    • J
      libbpf: Add support for bpf_link-based netns attachment · d60d81ac
      Jakub Sitnicki 提交于
      Add bpf_program__attach_nets(), which uses LINK_CREATE subcommand to create
      an FD-based kernel bpf_link, for attach types tied to network namespace,
      that is BPF_FLOW_DISSECTOR for the moment.
      Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200531082846.2117903-7-jakub@cloudflare.com
      d60d81ac
    • A
      libbpf: Add BPF ring buffer support · bf99c936
      Andrii Nakryiko 提交于
      Declaring and instantiating BPF ring buffer doesn't require any changes to
      libbpf, as it's just another type of maps. So using existing BTF-defined maps
      syntax with __uint(type, BPF_MAP_TYPE_RINGBUF) and __uint(max_elements,
      <size-of-ring-buf>) is all that's necessary to create and use BPF ring buffer.
      
      This patch adds BPF ring buffer consumer to libbpf. It is very similar to
      perf_buffer implementation in terms of API, but also attempts to fix some
      minor problems and inconveniences with existing perf_buffer API.
      
      ring_buffer support both single ring buffer use case (with just using
      ring_buffer__new()), as well as allows to add more ring buffers, each with its
      own callback and context. This allows to efficiently poll and consume
      multiple, potentially completely independent, ring buffers, using single
      epoll instance.
      
      The latter is actually a problem in practice for applications
      that are using multiple sets of perf buffers. They have to create multiple
      instances for struct perf_buffer and poll them independently or in a loop,
      each approach having its own problems (e.g., inability to use a common poll
      timeout). struct ring_buffer eliminates this problem by aggregating many
      independent ring buffer instances under the single "ring buffer manager".
      
      Second, perf_buffer's callback can't return error, so applications that need
      to stop polling due to error in data or data signalling the end, have to use
      extra mechanisms to signal that polling has to stop. ring_buffer's callback
      can return error, which will be passed through back to user code and can be
      acted upon appropariately.
      
      Two APIs allow to consume ring buffer data:
        - ring_buffer__poll(), which will wait for data availability notification
          and will consume data only from reported ring buffer(s); this API allows
          to efficiently use resources by reading data only when it becomes
          available;
        - ring_buffer__consume(), will attempt to read new records regardless of
          data availablity notification sub-system. This API is useful for cases
          when lowest latency is required, in expense of burning CPU resources.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200529075424.3139988-3-andriin@fb.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
      bf99c936
    • E
      libbpf: Add API to consume the perf ring buffer content · 272d51af
      Eelco Chaudron 提交于
      This new API, perf_buffer__consume, can be used as follows:
      
      - When you have a perf ring where wakeup_events is higher than 1,
        and you have remaining data in the rings you would like to pull
        out on exit (or maybe based on a timeout).
      
      - For low latency cases where you burn a CPU that constantly polls
        the queues.
      Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/159048487929.89441.7465713173442594608.stgit@ebuildSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
      272d51af
  18. 10 5月, 2020 1 次提交
  19. 15 4月, 2020 1 次提交
  20. 31 3月, 2020 1 次提交
  21. 30 3月, 2020 2 次提交
  22. 29 3月, 2020 1 次提交
  23. 03 3月, 2020 1 次提交
    • A
      libbpf: Add bpf_link pinning/unpinning · c016b68e
      Andrii Nakryiko 提交于
      With bpf_link abstraction supported by kernel explicitly, add
      pinning/unpinning API for links. Also allow to create (open) bpf_link from BPF
      FS file.
      
      This API allows to have an "ephemeral" FD-based BPF links (like raw tracepoint
      or fexit/freplace attachments) surviving user process exit, by pinning them in
      a BPF FS, which is an important use case for long-running BPF programs.
      
      As part of this, expose underlying FD for bpf_link. While legacy bpf_link's
      might not have a FD associated with them (which will be expressed as
      a bpf_link with fd=-1), kernel's abstraction is based around FD-based usage,
      so match it closely. This, subsequently, allows to have a generic
      pinning/unpinning API for generalized bpf_link. For some types of bpf_links
      kernel might not support pinning, in which case bpf_link__pin() will return
      error.
      
      With FD being part of generic bpf_link, also get rid of bpf_link_fd in favor
      of using vanialla bpf_link.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200303043159.323675-3-andriin@fb.com
      c016b68e
  24. 21 2月, 2020 1 次提交
  25. 25 1月, 2020 1 次提交
    • A
      libbpf: Improve handling of failed CO-RE relocations · d7a25270
      Andrii Nakryiko 提交于
      Previously, if libbpf failed to resolve CO-RE relocation for some
      instructions, it would either return error immediately, or, if
      .relaxed_core_relocs option was set, would replace relocatable offset/imm part
      of an instruction with a bogus value (-1). Neither approach is good, because
      there are many possible scenarios where relocation is expected to fail (e.g.,
      when some field knowingly can be missing on specific kernel versions). On the
      other hand, replacing offset with invalid one can hide programmer errors, if
      this relocation failue wasn't anticipated.
      
      This patch deprecates .relaxed_core_relocs option and changes the approach to
      always replacing instruction, for which relocation failed, with invalid BPF
      helper call instruction. For cases where this is expected, BPF program should
      already ensure that that instruction is unreachable, in which case this
      invalid instruction is going to be silently ignored. But if instruction wasn't
      guarded, BPF program will be rejected at verification step with verifier log
      pointing precisely to the place in assembly where the problem is.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200124053837.2434679-1-andriin@fb.com
      d7a25270
  26. 23 1月, 2020 1 次提交
  27. 10 1月, 2020 1 次提交
    • M
      bpf: libbpf: Add STRUCT_OPS support · 590a0088
      Martin KaFai Lau 提交于
      This patch adds BPF STRUCT_OPS support to libbpf.
      
      The only sec_name convention is SEC(".struct_ops") to identify the
      struct_ops implemented in BPF,
      e.g. To implement a tcp_congestion_ops:
      
      SEC(".struct_ops")
      struct tcp_congestion_ops dctcp = {
      	.init           = (void *)dctcp_init,  /* <-- a bpf_prog */
      	/* ... some more func prts ... */
      	.name           = "bpf_dctcp",
      };
      
      Each struct_ops is defined as a global variable under SEC(".struct_ops")
      as above.  libbpf creates a map for each variable and the variable name
      is the map's name.  Multiple struct_ops is supported under
      SEC(".struct_ops").
      
      In the bpf_object__open phase, libbpf will look for the SEC(".struct_ops")
      section and find out what is the btf-type the struct_ops is
      implementing.  Note that the btf-type here is referring to
      a type in the bpf_prog.o's btf.  A "struct bpf_map" is added
      by bpf_object__add_map() as other maps do.  It will then
      collect (through SHT_REL) where are the bpf progs that the
      func ptrs are referring to.  No btf_vmlinux is needed in
      the open phase.
      
      In the bpf_object__load phase, the map-fields, which depend
      on the btf_vmlinux, are initialized (in bpf_map__init_kern_struct_ops()).
      It will also set the prog->type, prog->attach_btf_id, and
      prog->expected_attach_type.  Thus, the prog's properties do
      not rely on its section name.
      [ Currently, the bpf_prog's btf-type ==> btf_vmlinux's btf-type matching
        process is as simple as: member-name match + btf-kind match + size match.
        If these matching conditions fail, libbpf will reject.
        The current targeting support is "struct tcp_congestion_ops" which
        most of its members are function pointers.
        The member ordering of the bpf_prog's btf-type can be different from
        the btf_vmlinux's btf-type. ]
      
      Then, all obj->maps are created as usual (in bpf_object__create_maps()).
      
      Once the maps are created and prog's properties are all set,
      the libbpf will proceed to load all the progs.
      
      bpf_map__attach_struct_ops() is added to register a struct_ops
      map to a kernel subsystem.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200109003514.3856730-1-kafai@fb.com
      590a0088
  28. 09 1月, 2020 1 次提交
  29. 19 12月, 2019 2 次提交
    • A
      libbpf: Allow to augment system Kconfig through extra optional config · 8601fd42
      Andrii Nakryiko 提交于
      Instead of all or nothing approach of overriding Kconfig file location, allow
      to extend it with extra values and override chosen subset of values though
      optional user-provided extra config, passed as a string through open options'
      .kconfig option. If same config key is present in both user-supplied config
      and Kconfig, user-supplied one wins. This allows applications to more easily
      test various conditions despite host kernel's real configuration. If all of
      BPF object's __kconfig externs are satisfied from user-supplied config, system
      Kconfig won't be read at all.
      
      Simplify selftests by not needing to create temporary Kconfig files.
      Suggested-by: NAlexei Starovoitov <ast@fb.com>
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191219002837.3074619-3-andriin@fb.com
      8601fd42
    • A
      libbpf: Add bpf_link__disconnect() API to preserve underlying BPF resource · d6958706
      Andrii Nakryiko 提交于
      There are cases in which BPF resource (program, map, etc) has to outlive
      userspace program that "installed" it in the system in the first place.
      When BPF program is attached, libbpf returns bpf_link object, which
      is supposed to be destroyed after no longer necessary through
      bpf_link__destroy() API. Currently, bpf_link destruction causes both automatic
      detachment and frees up any resources allocated to for bpf_link in-memory
      representation. This is inconvenient for the case described above because of
      coupling of detachment and resource freeing.
      
      This patch introduces bpf_link__disconnect() API call, which marks bpf_link as
      disconnected from its underlying BPF resouces. This means that when bpf_link
      is destroyed later, all its memory resources will be freed, but BPF resource
      itself won't be detached.
      
      This design allows to follow strict and resource-leak-free design by default,
      while giving easy and straightforward way for user code to opt for keeping BPF
      resource attached beyond lifetime of a bpf_link. For some BPF programs (i.e.,
      FS-based tracepoints, kprobes, raw tracepoint, etc), user has to make sure to
      pin BPF program to prevent kernel to automatically detach it on process exit.
      This should typically be achived by pinning BPF program (or map in some cases)
      in BPF FS.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191218225039.2668205-1-andriin@fb.com
      d6958706
  30. 18 12月, 2019 1 次提交
  31. 16 12月, 2019 4 次提交
    • A
      libbpf: Support libbpf-provided extern variables · 166750bc
      Andrii Nakryiko 提交于
      Add support for extern variables, provided to BPF program by libbpf. Currently
      the following extern variables are supported:
        - LINUX_KERNEL_VERSION; version of a kernel in which BPF program is
          executing, follows KERNEL_VERSION() macro convention, can be 4- and 8-byte
          long;
        - CONFIG_xxx values; a set of values of actual kernel config. Tristate,
          boolean, strings, and integer values are supported.
      
      Set of possible values is determined by declared type of extern variable.
      Supported types of variables are:
      - Tristate values. Are represented as `enum libbpf_tristate`. Accepted values
        are **strictly** 'y', 'n', or 'm', which are represented as TRI_YES, TRI_NO,
        or TRI_MODULE, respectively.
      - Boolean values. Are represented as bool (_Bool) types. Accepted values are
        'y' and 'n' only, turning into true/false values, respectively.
      - Single-character values. Can be used both as a substritute for
        bool/tristate, or as a small-range integer:
        - 'y'/'n'/'m' are represented as is, as characters 'y', 'n', or 'm';
        - integers in a range [-128, 127] or [0, 255] (depending on signedness of
          char in target architecture) are recognized and represented with
          respective values of char type.
      - Strings. String values are declared as fixed-length char arrays. String of
        up to that length will be accepted and put in first N bytes of char array,
        with the rest of bytes zeroed out. If config string value is longer than
        space alloted, it will be truncated and warning message emitted. Char array
        is always zero terminated. String literals in config have to be enclosed in
        double quotes, just like C-style string literals.
      - Integers. 8-, 16-, 32-, and 64-bit integers are supported, both signed and
        unsigned variants. Libbpf enforces parsed config value to be in the
        supported range of corresponding integer type. Integers values in config can
        be:
        - decimal integers, with optional + and - signs;
        - hexadecimal integers, prefixed with 0x or 0X;
        - octal integers, starting with 0.
      
      Config file itself is searched in /boot/config-$(uname -r) location with
      fallback to /proc/config.gz, unless config path is specified explicitly
      through bpf_object_open_opts' kernel_config_path option. Both gzipped and
      plain text formats are supported. Libbpf adds explicit dependency on zlib
      because of this, but this shouldn't be a problem, given libelf already depends
      on zlib.
      
      All detected extern variables, are put into a separate .extern internal map.
      It, similarly to .rodata map, is marked as read-only from BPF program side, as
      well as is frozen on load. This allows BPF verifier to track extern values as
      constants and perform enhanced branch prediction and dead code elimination.
      This can be relied upon for doing kernel version/feature detection and using
      potentially unsupported field relocations or BPF helpers in a CO-RE-based BPF
      program, while still having a single version of BPF program running on old and
      new kernels. Selftests are validating this explicitly for unexisting BPF
      helper.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191214014710.3449601-3-andriin@fb.com
      166750bc
    • A
      libbpf: Add BPF object skeleton support · d66562fb
      Andrii Nakryiko 提交于
      Add new set of APIs, allowing to open/load/attach BPF object through BPF
      object skeleton, generated by bpftool for a specific BPF object file. All the
      xxx_skeleton() APIs wrap up corresponding bpf_object_xxx() APIs, but
      additionally also automate map/program lookups by name, global data
      initialization and mmap()-ing, etc.  All this greatly improves and simplifies
      userspace usability of working with BPF programs. See follow up patches for
      examples.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191214014341.3442258-13-andriin@fb.com
      d66562fb
    • A
      libbpf: Expose BPF program's function name · 01af3bf0
      Andrii Nakryiko 提交于
      Add APIs to get BPF program function name, as opposed to bpf_program__title(),
      which returns BPF program function's section name. Function name has a benefit
      of being a valid C identifier and uniquely identifies a specific BPF program,
      while section name can be duplicated across multiple independent BPF programs.
      
      Add also bpf_object__find_program_by_name(), similar to
      bpf_object__find_program_by_title(), to facilitate looking up BPF programs by
      their C function names.
      
      Convert one of selftests to new API for look up.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20191214014341.3442258-9-andriin@fb.com
      01af3bf0
    • A
      libbpf: Extract common user-facing helpers · 544402d4
      Andrii Nakryiko 提交于
      LIBBPF_API and DECLARE_LIBBPF_OPTS are needed in many public libbpf API
      headers. Extract them into libbpf_common.h to avoid unnecessary
      interdependency between btf.h, libbpf.h, and bpf.h or code duplication.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20191214014341.3442258-6-andriin@fb.com
      544402d4