  1. 15 Nov 2022, 1 commit
  2. 27 Sep 2022, 1 commit
    • libbpf: Fix the case of running as non-root with capabilities · 6a4ab886
      Authored by Jon Doron
      When running rootless with special capabilities like
      FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH, the "access" API will not
      properly check whether a file is really accessible or not.
      
      From the access(2) man page:
      "
      The check is done using the calling process's real UID and GID, rather
      than the effective IDs as is done when actually attempting an operation
      (e.g., open(2)) on the file.  Similarly, for the root user, the check
      uses the set of permitted capabilities  rather than the set of effective
      capabilities; ***and for non-root users, the check uses an empty set of
      capabilities.***
      "
      
      What that means is that for non-root users the access API will not
      properly validate whether the process really has permission to access
      a file.
      
      To resolve this, this patch replaces all the access() API calls with
      faccessat() using the AT_EACCESS flag.
      Signed-off-by: Jon Doron <jond@wiz.io>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com
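      
      For illustration, a minimal sketch of the substitution described above
      (the can_read() helper and its path argument are hypothetical, not part
      of the patch):
      
      #include <fcntl.h>
      #include <unistd.h>
      
      /* Hypothetical helper showing the before/after of this patch. */
      static int can_read(const char *path)
      {
              /* Before: access() checks the real UID/GID and, for non-root
               * callers, an empty capability set, so FOWNER / DAC_OVERRIDE /
               * DAC_READ_SEARCH are ignored:
               *
               *     return access(path, R_OK) == 0;
               */
      
              /* After: AT_EACCESS makes the check use the effective IDs and
               * capabilities, matching what open(2) would actually enforce. */
              return faccessat(AT_FDCWD, path, R_OK, AT_EACCESS) == 0;
      }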
  3. 24 Sep 2022, 1 commit
  4. 22 Sep 2022, 2 commits
    • bpf: Add libbpf logic for user-space ring buffer · b66ccae0
      Authored by David Vernet
      Now that all of the logic is in place in the kernel to support user-space
      produced ring buffers, we can add the user-space logic to libbpf. This
      patch therefore adds the following public symbols to libbpf:
      
      struct user_ring_buffer *
      user_ring_buffer__new(int map_fd,
      		      const struct user_ring_buffer_opts *opts);
      void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
      void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                               __u32 size, int timeout_ms);
      void user_ring_buffer__submit(struct user_ring_buffer *rb, void *sample);
      void user_ring_buffer__discard(struct user_ring_buffer *rb, void *sample);
      void user_ring_buffer__free(struct user_ring_buffer *rb);
      
      A user-space producer must first create a struct user_ring_buffer * object
      with user_ring_buffer__new(), and can then reserve samples in the
      ring buffer using one of the following two symbols:
      
      void *user_ring_buffer__reserve(struct user_ring_buffer *rb, __u32 size);
      void *user_ring_buffer__reserve_blocking(struct user_ring_buffer *rb,
                                               __u32 size, int timeout_ms);
      
      With user_ring_buffer__reserve(), a pointer to a region of 'size' bytes
      in the ring buffer will be returned if sufficient space is available in
      the buffer. user_ring_buffer__reserve_blocking() provides similar
      semantics, but will block for up to 'timeout_ms' in epoll_wait if there
      is insufficient space in the buffer. This function has the guarantee
      from the kernel that it will receive at least one event notification
      per invocation of bpf_ringbuf_drain(), provided that at least one
      sample is drained and the BPF program did not pass the
      BPF_RB_NO_WAKEUP flag to bpf_ringbuf_drain().
      
      Once a sample is reserved, it must either be committed to the ring buffer
      with user_ring_buffer__submit(), or discarded with
      user_ring_buffer__discard().
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220920000100.477320-4-void@manifault.com
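      
      As an illustration, a minimal producer sketch built on the API above
      (map_fd is assumed to refer to a BPF_MAP_TYPE_USER_RINGBUF map, and
      struct my_sample is a hypothetical sample layout):
      
      #include <bpf/libbpf.h>
      
      struct my_sample { int value; };  /* hypothetical sample layout */
      
      static int produce_one(int map_fd)
      {
              struct user_ring_buffer *rb;
              struct my_sample *s;
      
              rb = user_ring_buffer__new(map_fd, NULL /* default opts */);
              if (!rb)
                      return -1;
      
              /* Returns NULL (errno == ENOSPC) if the buffer is full. */
              s = user_ring_buffer__reserve(rb, sizeof(*s));
              if (!s) {
                      user_ring_buffer__free(rb);
                      return -1;
              }
              s->value = 42;
      
              /* Commit the sample to the kernel; to abandon it instead, call
               * user_ring_buffer__discard(rb, s). */
              user_ring_buffer__submit(rb, s);
      
              user_ring_buffer__free(rb);
              return 0;
      }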
    • bpf: Define new BPF_MAP_TYPE_USER_RINGBUF map type · 583c1f42
      Authored by David Vernet
      We want to support a ringbuf map type where samples are published from
      user-space, to be consumed by BPF programs. BPF currently supports a
      kernel -> user-space circular ring buffer via the BPF_MAP_TYPE_RINGBUF
      map type.  We'll need to define a new map type for user-space -> kernel,
      as none of the helpers exported for BPF_MAP_TYPE_RINGBUF will apply
      to a user-space producer ring buffer, and we'll want to add one or
      more helper functions that would not apply for a kernel-producer
      ring buffer.
      
      This patch therefore adds a new BPF_MAP_TYPE_USER_RINGBUF map type
      definition. The map type is useless in its current form, as there is no
      way to access or use it for anything until we add one or more BPF
      helpers. A follow-on patch will therefore add a new helper function
      that allows BPF programs to run callbacks on samples that are published
      to the ring buffer.
      Signed-off-by: David Vernet <void@manifault.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220920000100.477320-2-void@manifault.com
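      
      On the BPF side such a map is declared like any other BTF-defined map;
      a minimal sketch (the map name and size are arbitrary):
      
      #include <linux/bpf.h>
      #include <bpf/bpf_helpers.h>
      
      /* A user-space -> kernel ring buffer; max_entries is the buffer size
       * in bytes and must be a power-of-2 multiple of the page size. */
      struct {
              __uint(type, BPF_MAP_TYPE_USER_RINGBUF);
              __uint(max_entries, 256 * 1024);
      } user_ringbuf SEC(".maps");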
  5. 17 Sep 2022, 1 commit
  6. 18 Aug 2022, 4 commits
  7. 16 Aug 2022, 1 commit
  8. 12 Aug 2022, 1 commit
  9. 11 Aug 2022, 1 commit
  10. 09 Aug 2022, 1 commit
  11. 08 Aug 2022, 1 commit
  12. 05 Aug 2022, 1 commit
  13. 29 Jul 2022, 1 commit
    • libbpf: Support PPC in arch_specific_syscall_pfx · 64893e83
      Authored by Daniel Müller
      Commit 708ac5be ("libbpf: add ksyscall/kretsyscall sections support
      for syscall kprobes") added the arch_specific_syscall_pfx() function,
      which returns a string representing the architecture in use. As it turns
      out this function is currently not aware of Power PC, where NULL is
      returned. That's being flagged by the libbpf CI system, which builds for
      ppc64le and the compiler sees a NULL pointer being passed in to a %s
      format string.
      With this change we add representations for two more architectures, for
      Power PC and Power PC 64, and also adjust the string format logic to
      handle NULL pointers gracefully, in an attempt to prevent similar issues
      with other architectures in the future.
      
      Fixes: 708ac5be ("libbpf: add ksyscall/kretsyscall sections support for syscall kprobes")
      Signed-off-by: Daniel Müller <deso@posteo.net>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220728222345.3125975-1-deso@posteo.net
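      
      A hedged sketch of the NULL-tolerant formatting described above (this
      is not libbpf's exact code; the helper declaration and buffer handling
      are illustrative):
      
      #include <stdio.h>
      
      /* Returns an arch prefix such as "x64" or "powerpc", or NULL for
       * architectures the table doesn't know about. */
      extern const char *arch_specific_syscall_pfx(void);
      
      static void syscall_fn_name(char *buf, size_t sz, const char *syscall)
      {
              const char *pfx = arch_specific_syscall_pfx();
      
              /* Never pass NULL to %s; fall back to an empty prefix. */
              snprintf(buf, sz, "__%s_sys_%s", pfx ? pfx : "", syscall);
      }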
  14. 20 Jul 2022, 5 commits
    • libbpf: make RINGBUF map size adjustments more eagerly · 597fbc46
      Authored by Andrii Nakryiko
      Make libbpf adjust the RINGBUF map size (rounding it up to the closest
      power of 2 that is a multiple of page_size) more eagerly: during the
      open phase when initializing the map, and on explicit calls to
      bpf_map__set_max_entries().
      
      Such an approach allows the user to check the actual size of the BPF
      ringbuf even before it's created in the kernel, and also prevents
      various edge-case scenarios where the BPF ringbuf size could get out
      of sync with what it would be in the kernel. One of them (reported in
      [0]) occurs during an attempt to pin/reuse a BPF ringbuf.
      
      Move adjust_ringbuf_sz() helper closer to its first actual use. The
      implementation of the helper is unchanged.
      
      Also make the detection of whether a bpf_object is already loaded more
      robust by checking obj->loaded explicitly, given that map->fd can be
      < 0 even when the bpf_object is already loaded, due to the ability to
      disable map creation with bpf_map__set_autocreate(map, false).
      
        [0] Closes: https://github.com/libbpf/libbpf/pull/530
      
      Fixes: 0087a681 ("libbpf: Automatically fix up BPF_MAP_TYPE_RINGBUF size, if necessary")
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/r/20220715230952.2219271-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
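      
      A hedged sketch of the rounding rule described above (libbpf's real
      adjust_ringbuf_sz() also guards against overflow):
      
      #include <unistd.h>
      
      /* Round sz up to the smallest power-of-2 multiple of the page size,
       * which is what the kernel requires of a RINGBUF map's max_entries. */
      static size_t round_up_ringbuf_sz(size_t sz)
      {
              size_t page_sz = sysconf(_SC_PAGE_SIZE);
              size_t mul;
      
              for (mul = 1; mul * page_sz < sz; mul <<= 1)
                      ;
              /* page_sz is itself a power of 2, so the result is one too. */
              return mul * page_sz;
      }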
    • libbpf: fallback to tracefs mount point if debugfs is not mounted · a1ac9fd6
      Authored by Andrii Nakryiko
      Teach libbpf to fall back to the tracefs mount point
      (/sys/kernel/tracing) if debugfs (/sys/kernel/debug/tracing) isn't
      mounted.
      Acked-by: Yonghong Song <yhs@fb.com>
      Suggested-by: Connor O'Brien <connoro@google.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20220715185736.898848-1-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
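      
      A hedged sketch of the fallback order (paths are from the commit
      message; libbpf's actual detection logic may differ):
      
      #include <unistd.h>
      
      static const char *tracefs_mount_point(void)
      {
              /* Prefer the traditional debugfs location... */
              if (access("/sys/kernel/debug/tracing", F_OK) == 0)
                      return "/sys/kernel/debug/tracing";
              /* ...and fall back to the plain tracefs mount point. */
              return "/sys/kernel/tracing";
      }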
    • libbpf: add ksyscall/kretsyscall sections support for syscall kprobes · 708ac5be
      Authored by Andrii Nakryiko
      Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and corresponding
      kretsyscall variants (for return kprobes) to allow users to kprobe
      syscall functions in kernel. These special sections allow to ignore
      complexities and differences between kernel versions and host
      architectures when it comes to syscall wrapper and corresponding
      __<arch>_sys_<syscall> vs __se_sys_<syscall> differences, depending on
      whether host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER (though libbpf
      itself doesn't rely on /proc/config.gz for detecting this, see
      BPF_KSYSCALL patch for how it's done internally).
      
      Combined with the use of the BPF_KSYSCALL() macro, this allows users
      to just specify the intended syscall name and the expected input
      arguments and leave dealing with all the variations to libbpf.
      
      In addition to SEC("ksyscall+") and SEC("kretsyscall+"), add the
      bpf_program__attach_ksyscall() API, which allows specifying the
      syscall name at runtime and providing the associated BPF cookie value.
      
      At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do
      not handle all the calling convention quirks for mmap(), clone() and
      compat syscalls. They also only attach to "native" syscall interfaces:
      if the host system supports compat syscalls or defines 32-bit syscalls
      in a 64-bit kernel, such syscall interfaces won't be attached to by
      libbpf.
      
      These limitations may or may not change in the future. Therefore it is
      recommended to use SEC("kprobe") for these syscalls, or whenever
      working with compat and 32-bit interfaces is required.
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
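      
      For example, a minimal sketch of a BPF program using these sections
      (the traced syscall and program names are arbitrary; vmlinux.h is
      assumed to be generated as usual):
      
      #include "vmlinux.h"
      #include <bpf/bpf_helpers.h>
      #include <bpf/bpf_tracing.h>
      
      char LICENSE[] SEC("license") = "GPL";
      
      /* libbpf resolves this to __<arch>_sys_close / __se_sys_close etc. */
      SEC("ksyscall/close")
      int BPF_KSYSCALL(close_entry, int fd)
      {
              bpf_printk("close() called with fd %d", fd);
              return 0;
      }
      
      SEC("kretsyscall/close")
      int BPF_KRETPROBE(close_exit, long ret)
      {
              bpf_printk("close() returned %ld", ret);
              return 0;
      }
      
      At runtime the same attachment can be made with
      bpf_program__attach_ksyscall(prog, "close", NULL), optionally passing
      a bpf_ksyscall_opts with a BPF cookie instead of NULL.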
    • libbpf: improve BPF_KPROBE_SYSCALL macro and rename it to BPF_KSYSCALL · 6f5d467d
      Authored by Andrii Nakryiko
      Improve BPF_KPROBE_SYSCALL (and rename it to the shorter BPF_KSYSCALL,
      to match libbpf's SEC("ksyscall") section name added in the next
      patch) to use a __kconfig variable to determine how to properly fetch
      syscall arguments.
      
      Instead of relying on hard-coded knowledge of whether the kernel's
      architecture uses a syscall wrapper or not (which only reflects the
      latest kernel versions, is not necessarily true for older kernels,
      and won't necessarily hold for later kernel versions on some
      particular host architecture), determine this at runtime by attempting
      to create a perf_event (with a fallback to kprobe event creation
      through tracefs on legacy kernels, just like the kprobe attachment
      code does) for the kernel function that would correspond to the bpf()
      syscall on a system that has CONFIG_ARCH_HAS_SYSCALL_WRAPPER set
      (e.g., for x86-64 it would try '__x64_sys_bpf').
      
      If the host kernel uses a syscall wrapper, the syscall kernel
      function's first argument is a pointer to a struct pt_regs that in
      turn contains the syscall arguments. In that case we need to use
      bpf_probe_read_kernel() to fetch the actual arguments (which we do
      through the BPF_CORE_READ() macro) from the inner pt_regs.
      
      But if the kernel doesn't use the syscall wrapper approach, input
      arguments can be read from struct pt_regs directly, with no probe
      reading.
      
      All this feature detection is done without requiring /proc/config.gz
      existence and parsing, and the BPF-side helper code uses the newly
      added LINUX_HAS_SYSCALL_WRAPPER virtual __kconfig extern to stay in
      sync with libbpf's user-side feature detection.
      
      The BPF_KSYSCALL() macro can be used both with SEC("kprobe") programs
      that define the syscall function explicitly (e.g.,
      SEC("kprobe/__x64_sys_bpf")) and with the SEC("ksyscall") programs
      added in the next patch (which are the same kprobe programs with the
      added benefit of libbpf determining the correct kernel function name
      automatically).
      
      Kretprobe and kretsyscall (added in the next patch) programs don't
      need BPF_KSYSCALL, as they don't provide access to input arguments.
      The normal BPF_KRETPROBE is completely sufficient and is recommended.
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20220714070755.3235561-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
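      
      To illustrate what the macro hides, a hedged sketch of fetching the
      first argument by hand in the wrapper case (x86-64 names;
      BPF_KSYSCALL does the equivalent automatically):
      
      #include "vmlinux.h"
      #include <bpf/bpf_helpers.h>
      #include <bpf/bpf_tracing.h>
      
      char LICENSE[] SEC("license") = "GPL";
      
      /* With CONFIG_ARCH_HAS_SYSCALL_WRAPPER, the kprobe's first parameter
       * is a pointer to an inner struct pt_regs holding the real syscall
       * arguments, which must be probe-read. */
      SEC("kprobe/__x64_sys_bpf")
      int BPF_KPROBE(sys_bpf_wrapped, struct pt_regs *regs)
      {
              /* PT_REGS_PARM1_CORE() probe-reads via BPF_CORE_READ(). */
              int cmd = PT_REGS_PARM1_CORE(regs);
      
              bpf_printk("bpf(cmd=%d)", cmd);
              return 0;
      }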
    • libbpf: generalize virtual __kconfig externs and use it for USDT · 55d00c37
      Authored by Andrii Nakryiko
      Libbpf currently supports a single virtual __kconfig extern:
      LINUX_KERNEL_VERSION. LINUX_KERNEL_VERSION doesn't come from
      /proc/config.gz and is instead filled out by libbpf itself.
      
      This patch generalizes this approach to support more such virtual
      __kconfig externs. One such extern, added in this patch, is
      LINUX_HAS_BPF_COOKIE, which is used by the BPF-side USDT support code
      in usdt.bpf.h instead of a CO-RE-based enum detection approach for
      detecting the bpf_get_attach_cookie() BPF helper. This allows removing
      an otherwise unneeded CO-RE dependency and keeps the user-space and
      BPF-side parts of libbpf's USDT support strictly in sync in terms of
      their feature detection.
      
      We'll use a similar approach for syscall wrapper detection for the
      BPF_KSYSCALL() BPF-side macro in a follow-up patch.
      
      Generally, libbpf currently reserves the CONFIG_ prefix for Kconfig
      values and LINUX_ for virtual libbpf-backed externs. In the future we
      might extend the set of supported prefixes. This can be done without
      any breaking changes, as currently any __kconfig extern with an
      unrecognized name is rejected.
      
      For LINUX_xxx externs we support the normal "weak rule": if libbpf
      doesn't recognize a given LINUX_xxx extern but that extern is marked
      as __weak, it is not rejected and defaults to zero. This follows the
      CONFIG_xxx handling logic and allows BPF applications to
      opportunistically use newer libbpf virtual externs without
      unnecessarily breaking on older libbpf versions.
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20220714070755.3235561-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
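      
      On the BPF side such an extern is declared and used like any other
      __kconfig value; a hedged sketch (the exact declaration in usdt.bpf.h
      may differ in its qualifiers):
      
      #include "vmlinux.h"
      #include <bpf/bpf_helpers.h>
      
      /* Filled out by libbpf at load time; __weak makes it default to zero
       * on older libbpf versions that don't recognize the name. */
      extern bool LINUX_HAS_BPF_COOKIE __kconfig __weak;
      
      static __always_inline __u64 attach_cookie(void *ctx)
      {
              if (LINUX_HAS_BPF_COOKIE)
                      return bpf_get_attach_cookie(ctx);
              return 0; /* fallback on kernels without BPF cookie support */
      }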
  15. 16 Jul 2022, 1 commit
    • libbpf: perfbuf: Add API to get the ring buffer · 9ff5efde
      Authored by Jon Doron
      Add support for writing a custom event reader, by exposing the ring
      buffer.
      
      With the new API perf_buffer__buffer() you will get access to the
      raw mmap()'ed per-CPU underlying memory of the ring buffer.
      
      This region contains both the perf buffer data and the header
      (struct perf_event_mmap_page), which manages the ring buffer
      state (head/tail positions; when accessing the head/tail positions
      it's important to take SMP ordering into consideration).
      With this type of low-level access one can implement different types
      of consumers; here are a few simple examples where this API helps:
      
      1. perf_event_read_simple allocates using malloc; perhaps you want
         to handle the wrap-around in some other way.
      2. Since the perf buffer is per-CPU, the order of the events is not
         guaranteed. For example:
         Given 3 events where each event has a timestamp t0 < t1 < t2,
         and the events are spread over more than 1 CPU, we can end
         up with the following state in the ring buf:
         CPU[0] => [t0, t2]
         CPU[1] => [t1]
         When you consume the events from CPU[0], you can know that a t1 is
         missing (assuming there are no drops, and your event data
         contains a sequential index).
         So one can simply do the following: for CPU[0], store the
         addresses of t0 and t2 in an array (without moving the tail, so
         the data does not perish), then move on to CPU[1] and add the
         address of t1 to the same array.
         You end up with something like:
         void *arr[] = {&t0, &t1, &t2}; now you can consume it in order
         and move the tails as you process.
      3. Assuming there are multiple CPUs and we want to start draining the
         messages from them, we can "pick" which one to start with
         according to the remaining free space in the ring buffer.
      Signed-off-by: Jon Doron <jond@wiz.io>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220715181122.149224-1-arilou@gmail.com
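      
      A hedged sketch of a consumer peeking at one CPU's raw ring via the
      new API (pb and buf_idx are assumed to come from the usual perf_buffer
      setup; the acquire load stands in for the SMP-safe head read mentioned
      above):
      
      #include <bpf/libbpf.h>
      #include <linux/perf_event.h>
      
      /* Returns non-zero if the ring identified by buf_idx has unread data,
       * without consuming anything. */
      static int ring_has_data(struct perf_buffer *pb, int buf_idx)
      {
              struct perf_event_mmap_page *header;
              void *base;
              size_t sz;
              __u64 head, tail;
              int err;
      
              err = perf_buffer__buffer(pb, buf_idx, &base, &sz);
              if (err)
                      return err;
      
              header = base;
              /* data_head must be read with acquire semantics (SMP). */
              head = __atomic_load_n(&header->data_head, __ATOMIC_ACQUIRE);
              tail = header->data_tail; /* only the consumer writes the tail */
      
              return head != tail;
      }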
  16. 14 Jul 2022, 2 commits
  17. 06 Jul 2022, 4 commits
  18. 30 Jun 2022, 1 commit
  19. 29 Jun 2022, 9 commits
  20. 25 Jun 2022, 1 commit