1. 20 July 2022 (3 commits)
    • libbpf: add ksyscall/kretsyscall sections support for syscall kprobes · 708ac5be
      Committed by Andrii Nakryiko
      Add SEC("ksyscall")/SEC("ksyscall/<syscall_name>") and the corresponding
      kretsyscall variants (for return kprobes) to allow users to kprobe
      syscall functions in the kernel. These special sections let users ignore
      the complexities and differences between kernel versions and host
      architectures when it comes to the syscall wrapper, i.e., the
      __<arch>_sys_<syscall> vs __se_sys_<syscall> naming difference, which
      depends on whether the host kernel has CONFIG_ARCH_HAS_SYSCALL_WRAPPER
      (though libbpf itself doesn't rely on /proc/config.gz to detect this;
      see the BPF_KSYSCALL patch for how it's done internally).
      
      Combined with the BPF_KSYSCALL() macro, this lets users specify just the
      intended syscall name and the expected input arguments, and leave
      dealing with all the variations to libbpf.
      
      In addition to SEC("ksyscall+") and SEC("kretsyscall+"), add a
      bpf_program__attach_ksyscall() API which allows specifying the syscall
      name at runtime and providing an associated BPF cookie value.
      
      At the moment SEC("ksyscall") and bpf_program__attach_ksyscall() do not
      handle all the calling convention quirks for mmap(), clone() and compat
      syscalls. They also only attach to "native" syscall interfaces: if the
      host system supports compat syscalls or defines 32-bit syscalls in a
      64-bit kernel, such syscall interfaces won't be attached to by libbpf.

      These limitations may or may not change in the future, so it is
      recommended to use SEC("kprobe") for these syscalls, or whenever working
      with compat and 32-bit interfaces is required. (A minimal usage sketch
      follows this entry.)
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20220714070755.3235561-5-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
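      As a minimal sketch (not part of the commit; program and syscall names
      are only illustrative), the BPF side of a ksyscall program might look
      like this, assuming bpf_tracing.h provides BPF_KSYSCALL():

        /* ksyscall_example.bpf.c: kprobe the unlinkat() syscall without
         * spelling out __x64_sys_unlinkat vs __se_sys_unlinkat by hand */
        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_core_read.h>
        #include <bpf/bpf_tracing.h>

        char LICENSE[] SEC("license") = "GPL";

        SEC("ksyscall/unlinkat")
        int BPF_KSYSCALL(handle_unlinkat, int dfd, const char *pathname, int flag)
        {
                bpf_printk("unlinkat: dfd=%d flag=%d", dfd, flag);
                return 0;
        }

      With SEC("ksyscall/<name>"), skeleton auto-attach is enough; the
      explicit attach API is for picking the syscall name at runtime and
      passing a BPF cookie. A hedged user-space fragment (the skeleton name is
      hypothetical) could look like:

        LIBBPF_OPTS(bpf_ksyscall_opts, opts, .bpf_cookie = 123);
        struct bpf_link *link = bpf_program__attach_ksyscall(
                        skel->progs.handle_unlinkat, "unlinkat", &opts);
        if (!link)
                fprintf(stderr, "attach failed: %d\n", -errno);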
    • libbpf: improve BPF_KPROBE_SYSCALL macro and rename it to BPF_KSYSCALL · 6f5d467d
      Committed by Andrii Nakryiko
      Improve BPF_KPROBE_SYSCALL (and rename it to the shorter BPF_KSYSCALL to
      match libbpf's SEC("ksyscall") section name, added in the next patch) to
      use a __kconfig variable to determine how to properly fetch syscall
      arguments.
      
      Instead of relying on hard-coded knowledge of whether the kernel's
      architecture uses a syscall wrapper or not (which only reflects the
      latest kernel versions, is not necessarily true for older kernels, and
      won't necessarily hold for later kernel versions on some particular
      host architecture), determine this at runtime by attempting to create a
      perf_event (with a fallback to kprobe event creation through tracefs on
      legacy kernels, just like the kprobe attachment code does) for the
      kernel function that would correspond to the bpf() syscall on a system
      that has CONFIG_ARCH_HAS_SYSCALL_WRAPPER set (e.g., for x86-64 it would
      try '__x64_sys_bpf').
      
      If the host kernel uses a syscall wrapper, the syscall kernel function's
      first argument is a pointer to struct pt_regs that in turn contains the
      syscall arguments. In that case we need to use bpf_probe_read_kernel()
      (which we do through the BPF_CORE_READ() macro) to fetch the actual
      arguments from the inner pt_regs.

      But if the kernel doesn't use the syscall wrapper approach, input
      arguments can be read from struct pt_regs directly, with no probe
      reading.
      
      All this feature detection is done without requiring /proc/config.gz to
      exist or be parsed, and the BPF-side helper code uses the newly added
      LINUX_HAS_SYSCALL_WRAPPER virtual __kconfig extern to stay in sync with
      libbpf's user-space feature detection.
      
      The BPF_KSYSCALL() macro can be used both with SEC("kprobe") programs
      that name the syscall function explicitly (e.g.,
      SEC("kprobe/__x64_sys_bpf")) and with the SEC("ksyscall") programs added
      in the next patch (which are the same kprobe programs, with the added
      benefit of libbpf determining the correct kernel function name
      automatically).

      Kretprobe and kretsyscall programs (added in the next patch) don't need
      BPF_KSYSCALL, as they don't provide access to input arguments; plain
      BPF_KRETPROBE is completely sufficient and is recommended. (A sketch of
      the two argument-fetching paths follows this entry.)
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20220714070755.3235561-4-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
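      As an illustrative sketch of the two argument-fetching paths described
      above (this is a simplified hand-written equivalent, not the actual
      macro from bpf_tracing.h, and it assumes the PT_REGS_*_SYSCALL helpers
      are available):

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_core_read.h>
        #include <bpf/bpf_tracing.h>

        char LICENSE[] SEC("license") = "GPL";

        extern bool LINUX_HAS_SYSCALL_WRAPPER __kconfig;

        SEC("ksyscall/bpf")
        int handle_sys_bpf(struct pt_regs *ctx)
        {
                int cmd;

                if (LINUX_HAS_SYSCALL_WRAPPER) {
                        /* wrapper case: the probed function's only argument
                         * points to the inner pt_regs holding the syscall
                         * args, which must be probe-read (the CO-RE helper
                         * does that under the hood) */
                        struct pt_regs *regs = (struct pt_regs *)PT_REGS_PARM1(ctx);

                        cmd = (int)PT_REGS_PARM1_CORE_SYSCALL(regs);
                } else {
                        /* no wrapper: syscall args live directly in ctx */
                        cmd = (int)PT_REGS_PARM1_SYSCALL(ctx);
                }

                bpf_printk("bpf() called, cmd=%d", cmd);
                return 0;
        }

      The real BPF_KSYSCALL() macro generates equivalent logic so that the
      program body can simply declare typed syscall arguments.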
    • libbpf: generalize virtual __kconfig externs and use it for USDT · 55d00c37
      Committed by Andrii Nakryiko
      Libbpf currently supports a single virtual __kconfig extern:
      LINUX_KERNEL_VERSION. It doesn't come from /proc/config.gz and is
      instead filled out by libbpf itself.
      
      This patch generalizes this approach to support more such virtual
      __kconfig externs. One such extern added in this patch is
      LINUX_HAS_BPF_COOKIE, which is used by the BPF-side USDT support code in
      usdt.bpf.h instead of the CO-RE-based enum detection approach for
      detecting the bpf_get_attach_cookie() BPF helper. This allows removing
      an otherwise unneeded CO-RE dependency and keeps the user-space and
      BPF-side parts of libbpf's USDT support strictly in sync in terms of
      their feature detection.
      
      We'll use a similar approach for syscall wrapper detection for the
      BPF_KSYSCALL() BPF-side macro in a follow-up patch.
      
      Currently libbpf reserves the CONFIG_ prefix for Kconfig values and the
      LINUX_ prefix for virtual libbpf-backed externs. In the future we might
      extend the set of supported prefixes. This can be done without any
      breaking changes, as any __kconfig extern with an unrecognized name is
      currently rejected.
      
      For LINUX_xxx externs we support the usual "weak rule": if libbpf
      doesn't recognize a given LINUX_xxx extern but that extern is marked
      __weak, it is not rejected and defaults to zero. This follows the
      CONFIG_xxx handling logic and allows BPF applications to
      opportunistically use newer libbpf virtual externs without unnecessarily
      breaking on older libbpf versions. (A short declaration sketch follows
      this entry.)
      Tested-by: Alan Maguire <alan.maguire@oracle.com>
      Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/r/20220714070755.3235561-2-andrii@kernel.org
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
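      As a short sketch (program and function names are illustrative; only the
      externs named above are known libbpf-provided values), BPF-side code can
      declare and use such virtual __kconfig externs roughly like this:

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>
        #include <bpf/bpf_tracing.h>

        char LICENSE[] SEC("license") = "GPL";

        /* filled in by libbpf at load time, not read from /proc/config.gz */
        extern int LINUX_KERNEL_VERSION __kconfig;
        /* __weak: older libbpf versions that don't know this extern still
         * load the object; the value then defaults to zero ("not supported") */
        extern bool LINUX_HAS_BPF_COOKIE __kconfig __weak;

        SEC("kprobe/do_nanosleep")
        int probe_nanosleep(struct pt_regs *ctx)
        {
                __u64 cookie = 0;

                /* the .kconfig map is frozen before load, so the verifier can
                 * eliminate this branch as dead code on kernels without the
                 * bpf_get_attach_cookie() helper */
                if (LINUX_HAS_BPF_COOKIE)
                        cookie = bpf_get_attach_cookie(ctx);

                bpf_printk("kernel=%d cookie=%llu", LINUX_KERNEL_VERSION, cookie);
                return 0;
        }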
  2. 16 July 2022 (1 commit)
    • libbpf: perfbuf: Add API to get the ring buffer · 9ff5efde
      Committed by Jon Doron
      Add support for writing a custom event reader by exposing the ring
      buffer.
      
      The new perf_buffer__buffer() API gives access to the raw mmap()'ed
      per-CPU memory underlying the ring buffer.
      
      This region contains both the perf buffer data and the header
      (struct perf_event_mmap_page), which manages the ring buffer state
      (head/tail positions; when accessing the head/tail positions it is
      important to take SMP ordering into consideration). With this kind of
      low-level access, one can implement different types of consumers. Here
      are a few simple examples where this API helps:
      
      1. perf_event_read_simple allocates memory with malloc; perhaps you want
         to handle the wrap-around in some other way.
      2. Since the perf buffer is per-CPU, the order of events is not
         guaranteed. For example, given 3 events with timestamps t0 < t1 < t2,
         spread across more than one CPU, we can end up with the following
         state in the ring buffers:
         CPU[0] => [t0, t2]
         CPU[1] => [t1]
         When you consume the events from CPU[0], you can tell that a t1 is
         missing (assuming there are no drops and your event data contains
         a sequential index). So, for CPU[0], you can store the addresses of
         t0 and t2 in an array (without moving the tail, so the data is not
         overwritten), then move on to CPU[1] and store the address of t1 in
         the same array. You end up with something like
         void *arr[] = {&t0, &t1, &t2}, which you can consume in order,
         moving the tails as you process.
      3. Assuming there are multiple CPUs and we want to start draining
         messages from them, we can "pick" which one to start with according
         to the remaining free space in each ring buffer.
      (A minimal usage sketch follows this entry.)
      Signed-off-by: Jon Doron <jond@wiz.io>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20220715181122.149224-1-arilou@gmail.com
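      A minimal sketch of inspecting the raw per-CPU rings with this API
      (assuming 'pb' comes from an earlier perf_buffer__new() call; this is an
      illustration, not a complete consumer):

        #include <stdio.h>
        #include <linux/perf_event.h>
        #include <bpf/libbpf.h>

        /* print how many bytes are pending in each per-CPU ring of 'pb' */
        static void dump_ring_state(struct perf_buffer *pb)
        {
                for (size_t i = 0; i < perf_buffer__buffer_cnt(pb); i++) {
                        void *buf;
                        size_t buf_size;

                        if (perf_buffer__buffer(pb, (int)i, &buf, &buf_size))
                                continue;

                        /* the mapping starts with the control header page;
                         * sample data follows at header->data_offset */
                        struct perf_event_mmap_page *header = buf;
                        __u64 head = header->data_head; /* written by kernel */
                        __u64 tail = header->data_tail; /* advanced by reader */

                        /* a real consumer must pair these accesses with the
                         * SMP barriers documented in <linux/perf_event.h>
                         * before reading samples or publishing a new tail */
                        printf("ring %zu: %llu bytes pending of %zu\n",
                               i, (unsigned long long)(head - tail), buf_size);
                }
        }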
  3. 14 July 2022 (2 commits)
  4. 06 July 2022 (4 commits)
  5. 30 June 2022 (1 commit)
  6. 29 June 2022 (9 commits)
  7. 25 June 2022 (1 commit)
  8. 17 June 2022 (1 commit)
  9. 14 June 2022 (1 commit)
  10. 09 June 2022 (1 commit)
  11. 08 June 2022 (2 commits)
  12. 04 June 2022 (1 commit)
  13. 03 June 2022 (4 commits)
  14. 24 May 2022 (1 commit)
  15. 17 May 2022 (1 commit)
  16. 13 May 2022 (1 commit)
    • libbpf: Add safer high-level wrappers for map operations · 737d0646
      Committed by Andrii Nakryiko
      Add high-level API wrappers for the most common and typical BPF map
      operations. They work directly on instances of struct bpf_map * (so you
      don't have to call bpf_map__fd()) and validate key/value size
      expectations.
      
      These helpers require users to specify the key (and value, where
      appropriate) size when performing lookup/update/delete/etc. This forces
      users to actually think about and validate those sizes themselves. This
      is a good thing, because the kernel expects the user to implicitly
      provide correctly sized key/value buffers and will just read/write the
      necessary amount of data. If the user doesn't set up the buffers
      correctly (which has bitten people with per-CPU maps especially), the
      kernel either randomly overwrites stack data or returns -EFAULT,
      depending on the user's luck and circumstances. These high-level APIs
      are meant to prevent such unpleasant and hard-to-debug bugs.
      
      This patch also adds a bpf_map_delete_elem_flags() low-level API and
      requires passing flags to the bpf_map__delete_elem() API for consistency
      across all similar APIs, even though the kernel currently doesn't expect
      any extra flags for the BPF_MAP_DELETE_ELEM operation. (A usage sketch
      follows this entry.)
      
      List of map operations that get these high-level APIs:
      
        - bpf_map_lookup_elem;
        - bpf_map_update_elem;
        - bpf_map_delete_elem;
        - bpf_map_lookup_and_delete_elem;
        - bpf_map_get_next_key.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20220512220713.2617964-1-andrii@kernel.org
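      A brief usage sketch (the skeleton and map names are illustrative, and
      <stdio.h> plus <bpf/libbpf.h> are assumed to be included); the wrappers
      take explicit key/value sizes plus a flags argument and validate them
      against the map definition:

        __u32 key = 0;
        __u64 value = 1;
        int err;

        /* sizes are checked against the map definition; a mismatch returns an
         * error instead of silently corrupting stack memory */
        err = bpf_map__update_elem(skel->maps.counters, &key, sizeof(key),
                                   &value, sizeof(value), BPF_ANY);
        if (err)
                fprintf(stderr, "update failed: %d\n", err);

        err = bpf_map__lookup_elem(skel->maps.counters, &key, sizeof(key),
                                   &value, sizeof(value), 0);
        if (!err)
                printf("counters[%u] = %llu\n", key, (unsigned long long)value);

        /* delete takes a flags argument too, for consistency (0 for now) */
        err = bpf_map__delete_elem(skel->maps.counters, &key, sizeof(key), 0);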
  17. 11 May 2022 (3 commits)
  18. 09 May 2022 (1 commit)
  19. 29 April 2022 (2 commits)
    • libbpf: Allow to opt-out from creating BPF maps · ec41817b
      Committed by Andrii Nakryiko
      Add a bpf_map__set_autocreate() API that allows users to opt out of
      libbpf automatically creating a BPF map during BPF object load.
      
      This is a useful feature when building a CO-RE-enabled BPF application
      that takes advantage of some new-ish BPF map type (e.g., socket-local
      storage) if the kernel supports it, but otherwise falls back to an
      alternative (e.g., an extra HASH map). In such a case, being able to
      disable the creation of a map the kernel doesn't support allows the BPF
      object file, with all its other maps and programs, to be created and
      loaded successfully.
      
      It's still up to the user to make sure that no "live" code in any of
      their BPF programs references such a map instance, which can be achieved
      by guarding that code with a CO-RE relocation check or with .rodata
      global variables.
      
      If the user fails to properly guard such code to turn it into "dead
      code", libbpf will helpfully post-process the BPF verifier log and
      provide a more meaningful error along with the name of the map that
      needs to be guarded properly. So, instead of:

        ; value = bpf_map_lookup_elem(&missing_map, &zero);
        4: (85) call unknown#2001000000
        invalid func unknown#2001000000

      ... the user will see:

        ; value = bpf_map_lookup_elem(&missing_map, &zero);
        4: <invalid BPF map reference>
        BPF map 'missing_map' is referenced but wasn't created

      (A minimal usage sketch follows this entry.)
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220428041523.4089853-4-andrii@kernel.org
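      A minimal sketch of the opt-out flow inside an application's setup code
      (the skeleton, map name, and feature-probe function are all hypothetical
      placeholders, not real libbpf APIs):

        struct my_app_bpf *skel = my_app_bpf__open();   /* hypothetical skeleton */
        if (!skel)
                return -1;

        /* if the running kernel lacks the newer map type, skip creating it;
         * any BPF code referencing it must be guarded into dead code */
        if (!kernel_supports_sk_storage())              /* hypothetical probe */
                bpf_map__set_autocreate(skel->maps.sk_storage_map, false);

        if (my_app_bpf__load(skel)) {
                my_app_bpf__destroy(skel);
                return -1;
        }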
    • libbpf: Use libbpf_mem_ensure() when allocating new map · 69721203
      Committed by Andrii Nakryiko
      Reuse libbpf_mem_ensure() when adding a new map to the list of maps
      inside bpf_object. It takes care of properly resizing and reallocating
      the map array and zeroing out newly allocated memory. (A generic sketch
      of this pattern follows this entry.)
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20220428041523.4089853-3-andrii@kernel.org
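      As a generic illustration of the ensure-capacity idiom the commit refers
      to (this is not libbpf's actual internal helper or signature, just a
      sketch of the pattern):

        #include <errno.h>
        #include <stdlib.h>
        #include <string.h>

        /* grow *arr (current capacity *cap, elements of elem_sz bytes) so it
         * can hold at least 'need' elements; newly added slots are zeroed */
        static int ensure_mem(void **arr, size_t *cap, size_t elem_sz, size_t need)
        {
                size_t new_cap;
                void *tmp;

                if (need <= *cap)
                        return 0;

                new_cap = *cap * 3 / 2;
                if (new_cap < need)
                        new_cap = need;

                tmp = realloc(*arr, new_cap * elem_sz);
                if (!tmp)
                        return -ENOMEM;

                memset((char *)tmp + *cap * elem_sz, 0, (new_cap - *cap) * elem_sz);
                *arr = tmp;
                *cap = new_cap;
                return 0;
        }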