1. 29 9月, 2020 30 次提交
    • A
      libbpf: Add BTF writing APIs · 4a3b33f8
      Andrii Nakryiko 提交于
      Add APIs for appending new BTF types at the end of BTF object.
      
      Each BTF kind has either one API of the form btf__add_<kind>(). For types
      that have variable amount of additional items (struct/union, enum, func_proto,
      datasec), additional API is provided to emit each such item. E.g., for
      emitting a struct, one would use the following sequence of API calls:
      
      btf__add_struct(...);
      btf__add_field(...);
      ...
      btf__add_field(...);
      
      Each btf__add_field() will ensure that the last BTF type is of STRUCT or
      UNION kind and will automatically increment that type's vlen field.
      
      All the strings are provided as C strings (const char *), not a string offset.
      This significantly improves usability of BTF writer APIs. All such strings
      will be automatically appended to string section or existing string will be
      re-used, if such string was already added previously.
      
      Each API attempts to do all the reasonable validations, like enforcing
      non-empty names for entities with required names, proper value bounds, various
      bit offset restrictions, etc.
      
      Type ID validation is minimal because it's possible to emit a type that refers
      to type that will be emitted later, so libbpf has no way to enforce such
      cases. User must be careful to properly emit all the necessary types and
      specify type IDs that will be valid in the finally generated BTF.
      
      Each of btf__add_<kind>() APIs return new type ID on success or negative
      value on error. APIs like btf__add_field() that emit additional items
      return zero on success and negative value on error.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200929020533.711288-2-andriin@fb.com
      4a3b33f8
    • A
      Merge branch 'bpf: add helpers to support BTF-based kernel' · 98b972d2
      Alexei Starovoitov 提交于
      Alan Maguire says:
      
      ====================
      This series attempts to provide a simple way for BPF programs (and in
      future other consumers) to utilize BPF Type Format (BTF) information
      to display kernel data structures in-kernel.  The use case this
      functionality is applied to here is to support a snprintf()-like
      helper to copy a BTF representation of kernel data to a string,
      and a BPF seq file helper to display BTF data for an iterator.
      
      There is already support in kernel/bpf/btf.c for "show" functionality;
      the changes here generalize that support from seq-file specific
      verifier display to the more generic case and add another specific
      use case; rather than seq_printf()ing the show data, it is copied
      to a supplied string using a snprintf()-like function.  Other future
      consumers of the show functionality could include a bpf_printk_btf()
      function which printk()ed the data instead.  Oops messaging in
      particular would be an interesting application for such functionality.
      
      The above potential use case hints at a potential reply to
      a reasonable objection that such typed display should be
      solved by tracing programs, where the in-kernel tracing records
      data and the userspace program prints it out.  While this
      is certainly the recommended approach for most cases, I
      believe having an in-kernel mechanism would be valuable
      also.  Critically in BPF programs it greatly simplifies
      debugging and tracing of such data to invoking a simple
      helper.
      
      One challenge raised in an earlier iteration of this work -
      where the BTF printing was implemented as a printk() format
      specifier - was that the amount of data printed per
      printk() was large, and other format specifiers were far
      simpler.  Here we sidestep that concern by printing
      components of the BTF representation as we go for the
      seq file case, and in the string case the snprintf()-like
      operation is intended to be a basis for perf event or
      ringbuf output.  The reasons for avoiding bpf_trace_printk
      are that
      
      1. bpf_trace_printk() strings are restricted in size and
      cannot display anything beyond trivial data structures; and
      2. bpf_trace_printk() is for debugging purposes only.
      
      As Alexei suggested, a bpf_trace_puts() helper could solve
      this in the future but it still would be limited by the
      1000 byte limit for traced strings.
      
      Default output for an sk_buff looks like this (zeroed fields
      are omitted):
      
      (struct sk_buff){
       .transport_header = (__u16)65535,
       .mac_header = (__u16)65535,
       .end = (sk_buff_data_t)192,
       .head = (unsigned char *)0x000000007524fd8b,
       .data = (unsigned char *)0x000000007524fd8b,
       .truesize = (unsigned int)768,
       .users = (refcount_t){
        .refs = (atomic_t){
         .counter = (int)1,
        },
       },
      }
      
      Flags can modify aspects of output format; see patch 3
      for more details.
      
      Changes since v6:
      
      - Updated safe data size to 32, object name size to 80.
        This increases the number of safe copies done, but performance is
        not a key goal here. WRT name size the largest type name length
        in bpf-next according to "pahole -s" is 64 bytes, so that still gives
        room for additional type qualifiers, parens etc within the name limit
        (Alexei, patch 2)
      - Remove inlines and converted as many #defines to functions as was
        possible.  In a few cases - btf_show_type_value[s]() specifically -
        I left these as macros as btf_show_type_value[s]() prepends and
        appends format strings to the format specifier (in order to include
        indentation, delimiters etc so a macro makes that simpler (Alexei,
        patch 2)
      - Handle btf_resolve_size() error in btf_show_obj_safe() (Alexei, patch 2)
      - Removed clang loop unroll in BTF snprintf test (Alexei)
      - switched to using bpf_core_type_id_kernel(type) as suggested by Andrii,
        and Alexei noted that __builtin_btf_type_id(,1) should be used (patch 4)
      - Added skip logic if __builtin_btf_type_id is not available (patches 4,8)
      - Bumped limits on bpf iters to support printing larger structures (Alexei,
        patch 5)
      - Updated overflow bpf_iter tests to reflect new iter max size (patch 6)
      - Updated seq helper to use type id only (Alexei, patch 7)
      - Updated BTF task iter test to use task struct instead of struct fs_struct
        since new limits allow a task_struct to be displayed (patch 8)
      - Fixed E2BIG handling in iter task (Alexei, patch 8)
      
      Changes since v5:
      
      - Moved btf print prepare into patch 3, type show seq
        with flags into patch 2 (Alexei, patches 2,3)
      - Fixed build bot warnings around static declarations
        and printf attributes
      - Renamed functions to snprintf_btf/seq_printf_btf
        (Alexei, patches 3-6)
      
      Changes since v4:
      
      - Changed approach from a BPF trace event-centric design to one
        utilizing a snprintf()-like helper and an iter helper (Alexei,
        patches 3,5)
      - Added tests to verify BTF output (patch 4)
      - Added support to tests for verifying BTF type_id-based display
        as well as type name via __builtin_btf_type_id (Andrii, patch 4).
      - Augmented task iter tests to cover the BTF-based seq helper.
        Because a task_struct's BTF-based representation would overflow
        the PAGE_SIZE limit on iterator data, the "struct fs_struct"
        (task->fs) is displayed for each task instead (Alexei, patch 6).
      
      Changes since v3:
      
      - Moved to RFC since the approach is different (and bpf-next is
        closed)
      - Rather than using a printk() format specifier as the means
        of invoking BTF-enabled display, a dedicated BPF helper is
        used.  This solves the issue of printk() having to output
        large amounts of data using a complex mechanism such as
        BTF traversal, but still provides a way for the display of
        such data to be achieved via BPF programs.  Future work could
        include a bpf_printk_btf() function to invoke display via
        printk() where the elements of a data structure are printk()ed
       one at a time.  Thanks to Petr Mladek, Andy Shevchenko and
        Rasmus Villemoes who took time to look at the earlier printk()
        format-specifier-focused version of this and provided feedback
        clarifying the problems with that approach.
      - Added trace id to the bpf_trace_printk events as a means of
        separating output from standard bpf_trace_printk() events,
        ensuring it can be easily parsed by the reader.
      - Added bpf_trace_btf() helper tests which do simple verification
        of the various display options.
      
      Changes since v2:
      
      - Alexei and Yonghong suggested it would be good to use
        probe_kernel_read() on to-be-shown data to ensure safety
        during operation.  Safe copy via probe_kernel_read() to a
        buffer object in "struct btf_show" is used to support
        this.  A few different approaches were explored
        including dynamic allocation and per-cpu buffers. The
        downside of dynamic allocation is that it would be done
        during BPF program execution for bpf_trace_printk()s using
        %pT format specifiers. The problem with per-cpu buffers
        is we'd have to manage preemption and since the display
        of an object occurs over an extended period and in printk
        context where we'd rather not change preemption status,
        it seemed tricky to manage buffer safety while considering
        preemption.  The approach of utilizing stack buffer space
        via the "struct btf_show" seemed like the simplest approach.
        The stack size of the associated functions which have a
        "struct btf_show" on their stack to support show operation
        (btf_type_snprintf_show() and btf_type_seq_show()) stays
        under 500 bytes. The compromise here is the safe buffer we
        use is small - 256 bytes - and as a result multiple
        probe_kernel_read()s are needed for larger objects. Most
        objects of interest are smaller than this (e.g.
        "struct sk_buff" is 224 bytes), and while task_struct is a
        notable exception at ~8K, performance is not the priority for
        BTF-based display. (Alexei and Yonghong, patch 2).
      - safe buffer use is the default behaviour (and is mandatory
        for BPF) but unsafe display - meaning no safe copy is done
        and we operate on the object itself - is supported via a
        'u' option.
      - pointers are prefixed with 0x for clarity (Alexei, patch 2)
      - added additional comments and explanations around BTF show
        code, especially around determining whether objects such
        zeroed. Also tried to comment safe object scheme used. (Yonghong,
        patch 2)
      - added late_initcall() to initialize vmlinux BTF so that it would
        not have to be initialized during printk operation (Alexei,
        patch 5)
      - removed CONFIG_BTF_PRINTF config option as it is not needed;
        CONFIG_DEBUG_INFO_BTF can be used to gate test behaviour and
        determining behaviour of type-based printk can be done via
        retrieval of BTF data; if it's not there BTF was unavailable
        or broken (Alexei, patches 4,6)
      - fix bpf_trace_printk test to use vmlinux.h and globals via
        skeleton infrastructure, removing need for perf events
        (Andrii, patch 8)
      
      Changes since v1:
      
      - changed format to be more drgn-like, rendering indented type info
        along with type names by default (Alexei)
      - zeroed values are omitted (Arnaldo) by default unless the '0'
        modifier is specified (Alexei)
      - added an option to print pointer values without obfuscation.
        The reason to do this is the sysctls controlling pointer display
        are likely to be irrelevant in many if not most tracing contexts.
        Some questions on this in the outstanding questions section below...
      - reworked printk format specifer so that we no longer rely on format
        %pT<type> but instead use a struct * which contains type information
        (Rasmus). This simplifies the printk parsing, makes use more dynamic
        and also allows specification by BTF id as well as name.
      - removed incorrect patch which tried to fix dereferencing of resolved
        BTF info for vmlinux; instead we skip modifiers for the relevant
        case (array element type determination) (Alexei).
      - fixed issues with negative snprintf format length (Rasmus)
      - added test cases for various data structure formats; base types,
        typedefs, structs, etc.
      - tests now iterate through all typedef, enum, struct and unions
        defined for vmlinux BTF and render a version of the target dummy
        value which is either all zeros or all 0xff values; the idea is this
        exercises the "skip if zero" and "print everything" cases.
      - added support in BPF for using the %pT format specifier in
        bpf_trace_printk()
      - added BPF tests which ensure %pT format specifier use works (Alexei).
      ====================
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      98b972d2
    • A
      selftests/bpf: Add test for bpf_seq_printf_btf helper · b72091bd
      Alan Maguire 提交于
      Add a test verifying iterating over tasks and displaying BTF
      representation of task_struct succeeds.
      Suggested-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: NAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-9-git-send-email-alan.maguire@oracle.com
      b72091bd
    • A
      bpf: Add bpf_seq_printf_btf helper · eb411377
      Alan Maguire 提交于
      A helper is added to allow seq file writing of kernel data
      structures using vmlinux BTF.  Its signature is
      
      long bpf_seq_printf_btf(struct seq_file *m, struct btf_ptr *ptr,
                              u32 btf_ptr_size, u64 flags);
      
      Flags and struct btf_ptr definitions/use are identical to the
      bpf_snprintf_btf helper, and the helper returns 0 on success
      or a negative error value.
      Suggested-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: NAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-8-git-send-email-alan.maguire@oracle.com
      eb411377
    • A
      selftests/bpf: Fix overflow tests to reflect iter size increase · eb58bbf2
      Alan Maguire 提交于
      bpf iter size increase to PAGE_SIZE << 3 means overflow tests assuming
      page size need to be bumped also.
      Signed-off-by: NAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-7-git-send-email-alan.maguire@oracle.com
      eb58bbf2
    • A
      bpf: Bump iter seq size to support BTF representation of large data structures · af653209
      Alan Maguire 提交于
      BPF iter size is limited to PAGE_SIZE; if we wish to display BTF-based
      representations of larger kernel data structures such as task_struct,
      this will be insufficient.
      Suggested-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com>
      Signed-off-by: NAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-6-git-send-email-alan.maguire@oracle.com
      af653209
    • A
      selftests/bpf: Add bpf_snprintf_btf helper tests · 076a95f5
      Alan Maguire 提交于
      Tests verifying snprintf()ing of various data structures,
      flags combinations using a tp_btf program. Tests are skipped
      if __builtin_btf_type_id is not available to retrieve BTF
      type ids.
      Signed-off-by: NAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-5-git-send-email-alan.maguire@oracle.com
      076a95f5
    • A
      bpf: Add bpf_snprintf_btf helper · c4d0bfb4
      Alan Maguire 提交于
      A helper is added to support tracing kernel type information in BPF
      using the BPF Type Format (BTF).  Its signature is
      
      long bpf_snprintf_btf(char *str, u32 str_size, struct btf_ptr *ptr,
      		      u32 btf_ptr_size, u64 flags);
      
      struct btf_ptr * specifies
      
      - a pointer to the data to be traced
      - the BTF id of the type of data pointed to
      - a flags field is provided for future use; these flags
        are not to be confused with the BTF_F_* flags
        below that control how the btf_ptr is displayed; the
        flags member of the struct btf_ptr may be used to
        disambiguate types in kernel versus module BTF, etc;
        the main distinction is the flags relate to the type
        and information needed in identifying it; not how it
        is displayed.
      
      For example a BPF program with a struct sk_buff *skb
      could do the following:
      
      	static struct btf_ptr b = { };
      
      	b.ptr = skb;
      	b.type_id = __builtin_btf_type_id(struct sk_buff, 1);
      	bpf_snprintf_btf(str, sizeof(str), &b, sizeof(b), 0, 0);
      
      Default output looks like this:
      
      (struct sk_buff){
       .transport_header = (__u16)65535,
       .mac_header = (__u16)65535,
       .end = (sk_buff_data_t)192,
       .head = (unsigned char *)0x000000007524fd8b,
       .data = (unsigned char *)0x000000007524fd8b,
       .truesize = (unsigned int)768,
       .users = (refcount_t){
        .refs = (atomic_t){
         .counter = (int)1,
        },
       },
      }
      
      Flags modifying display are as follows:
      
      - BTF_F_COMPACT:	no formatting around type information
      - BTF_F_NONAME:		no struct/union member names/types
      - BTF_F_PTR_RAW:	show raw (unobfuscated) pointer values;
      			equivalent to %px.
      - BTF_F_ZERO:		show zero-valued struct/union members;
      			they are not displayed by default
      Signed-off-by: NAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-4-git-send-email-alan.maguire@oracle.com
      c4d0bfb4
    • A
      bpf: Move to generic BTF show support, apply it to seq files/strings · 31d0bc81
      Alan Maguire 提交于
      generalize the "seq_show" seq file support in btf.c to support
      a generic show callback of which we support two instances; the
      current seq file show, and a show with snprintf() behaviour which
      instead writes the type data to a supplied string.
      
      Both classes of show function call btf_type_show() with different
      targets; the seq file or the string to be written.  In the string
      case we need to track additional data - length left in string to write
      and length to return that we would have written (a la snprintf).
      
      By default show will display type information, field members and
      their types and values etc, and the information is indented
      based upon structure depth. Zeroed fields are omitted.
      
      Show however supports flags which modify its behaviour:
      
      BTF_SHOW_COMPACT - suppress newline/indent.
      BTF_SHOW_NONAME - suppress show of type and member names.
      BTF_SHOW_PTR_RAW - do not obfuscate pointer values.
      BTF_SHOW_UNSAFE - do not copy data to safe buffer before display.
      BTF_SHOW_ZERO - show zeroed values (by default they are not shown).
      Signed-off-by: NAlan Maguire <alan.maguire@oracle.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/1601292670-1616-3-git-send-email-alan.maguire@oracle.com
      31d0bc81
    • A
      76654e67
    • A
      libbpf: Add btf__new_empty() to create an empty BTF object · a871b043
      Andrii Nakryiko 提交于
      Add an ability to create an empty BTF object from scratch. This is going to be
      used by pahole for BTF encoding. And also by selftest for convenient creation
      of BTF objects.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-7-andriin@fb.com
      a871b043
    • A
      libbpf: Allow modification of BTF and add btf__add_str API · 919d2b1d
      Andrii Nakryiko 提交于
      Allow internal BTF representation to switch from default read-only mode, in
      which raw BTF data is a single non-modifiable block of memory with BTF header,
      types, and strings layed out sequentially and contiguously in memory, into
      a writable representation with types and strings data split out into separate
      memory regions, that can be dynamically expanded.
      
      Such writable internal representation is transparent to users of libbpf APIs,
      but allows to append new types and strings at the end of BTF, which is
      a typical use case when generating BTF programmatically. All the basic
      guarantees of BTF types and strings layout is preserved, i.e., user can get
      `struct btf_type *` pointer and read it directly. Such btf_type pointers might
      be invalidated if BTF is modified, so some care is required in such mixed
      read/write scenarios.
      
      Switch from read-only to writable configuration happens automatically the
      first time when user attempts to modify BTF by either adding a new type or new
      string. It is still possible to get raw BTF data, which is a single piece of
      memory that can be persisted in ELF section or into a file as raw BTF. Such
      raw data memory is also still owned by BTF and will be freed either when BTF
      object is freed or if another modification to BTF happens, as any modification
      invalidates BTF raw representation.
      
      This patch adds the first two BTF manipulation APIs: btf__add_str(), which
      allows to add arbitrary strings to BTF string section, and btf__find_str()
      which allows to find existing string offset, but not add it if it's missing.
      All the added strings are automatically deduplicated. This is achieved by
      maintaining an additional string lookup index for all unique strings. Such
      index is built when BTF is switched to modifiable mode. If at that time BTF
      strings section contained duplicate strings, they are not de-duplicated. This
      is done specifically to not modify the existing content of BTF (types, their
      string offsets, etc), which can cause confusion and is especially important
      property if there is struct btf_ext associated with struct btf. By following
      this "imperfect deduplication" process, btf_ext is kept consitent and correct.
      If deduplication of strings is necessary, it can be forced by doing BTF
      deduplication, at which point all the strings will be eagerly deduplicated and
      all string offsets both in struct btf and struct btf_ext will be updated.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-6-andriin@fb.com
      919d2b1d
    • A
      libbpf: Extract generic string hashing function for reuse · 7d9c71e1
      Andrii Nakryiko 提交于
      Calculating a hash of zero-terminated string is a common need when using
      hashmap, so extract it for reuse.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-5-andriin@fb.com
      7d9c71e1
    • A
      libbpf: Generalize common logic for managing dynamically-sized arrays · 192f5a1f
      Andrii Nakryiko 提交于
      Managing dynamically-sized array is a common, but not trivial functionality,
      which significant amount of logic and code to implement properly. So instead
      of re-implementing it all the time, extract it into a helper function ans
      reuse.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-4-andriin@fb.com
      192f5a1f
    • A
      libbpf: Remove assumption of single contiguous memory for BTF data · b8604247
      Andrii Nakryiko 提交于
      Refactor internals of struct btf to remove assumptions that BTF header, type
      data, and string data are layed out contiguously in a memory in a single
      memory allocation. Now we have three separate pointers pointing to the start
      of each respective are: header, types, strings. In the next patches, these
      pointers will be re-assigned to point to independently allocated memory areas,
      if BTF needs to be modified.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-3-andriin@fb.com
      b8604247
    • A
      libbpf: Refactor internals of BTF type index · 740e69c3
      Andrii Nakryiko 提交于
      Refactor implementation of internal BTF type index to not use direct pointers.
      Instead it uses offset relative to the start of types data section. This
      allows for types data to be reallocatable, enabling implementation of
      modifiable BTF.
      
      As now getting type by ID has an extra indirection step, convert all internal
      type lookups to a new helper btf_type_id(), that returns non-const pointer to
      a type by its ID.
      Signed-off-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/20200926011357.2366158-2-andriin@fb.com
      740e69c3
    • T
      selftests: Remove fmod_ret from test_overhead · b000def2
      Toke Høiland-Jørgensen 提交于
      The test_overhead prog_test included an fmod_ret program that attached to
      __set_task_comm() in the kernel. However, this function was never listed as
      allowed for return modification, so this only worked because of the
      verifier skipping tests when a trampoline already existed for the attach
      point. Now that the verifier checks have been fixed, remove fmod_ret from
      the test so it works again.
      
      Fixes: 4eaf0b5c ("selftest/bpf: Fmod_ret prog and implement test_overhead as part of bench")
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      b000def2
    • T
      bpf: verifier: refactor check_attach_btf_id() · f7b12b6f
      Toke Høiland-Jørgensen 提交于
      The check_attach_btf_id() function really does three things:
      
      1. It performs a bunch of checks on the program to ensure that the
         attachment is valid.
      
      2. It stores a bunch of state about the attachment being requested in
         the verifier environment and struct bpf_prog objects.
      
      3. It allocates a trampoline for the attachment.
      
      This patch splits out (1.) and (3.) into separate functions which will
      perform the checks, but return the computed values instead of directly
      modifying the environment. This is done in preparation for reusing the
      checks when the actual attachment is happening, which will allow tracing
      programs to have multiple (compatible) attachments.
      
      This also fixes a bug where a bunch of checks were skipped if a trampoline
      already existed for the tracing target.
      
      Fixes: 6ba43b76 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
      Fixes: 1e6c62a8 ("bpf: Introduce sleepable BPF programs")
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      f7b12b6f
    • T
      bpf: change logging calls from verbose() to bpf_log() and use log pointer · efc68158
      Toke Høiland-Jørgensen 提交于
      In preparation for moving code around, change a bunch of references to
      env->log (and the verbose() logging helper) to use bpf_log() and a direct
      pointer to struct bpf_verifier_log. While we're touching the function
      signature, mark the 'prog' argument to bpf_check_type_match() as const.
      
      Also enhance the bpf_verifier_log_needed() check to handle NULL pointers
      for the log struct so we can re-use the code with logging disabled.
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      efc68158
    • T
      bpf: disallow attaching modify_return tracing functions to other BPF programs · 1af9270e
      Toke Høiland-Jørgensen 提交于
      From the checks and commit messages for modify_return, it seems it was
      never the intention that it should be possible to attach a tracing program
      with expected_attach_type == BPF_MODIFY_RETURN to another BPF program.
      However, check_attach_modify_return() will only look at the function name,
      so if the target function starts with "security_", the attach will be
      allowed even for bpf2bpf attachment.
      
      Fix this oversight by also blocking the modification if a target program is
      supplied.
      
      Fixes: 18644cec ("bpf: Fix use-after-free in fmod_ret check")
      Fixes: 6ba43b76 ("bpf: Attachment verification for BPF_MODIFY_RETURN")
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      1af9270e
    • A
      Merge branch 'Sockmap copying' · 84a20d8e
      Alexei Starovoitov 提交于
      Lorenz Bauer says:
      
      ====================
      Changes in v2:
      - Check sk_fullsock in map_update_elem (Martin)
      
      Enable calling map_update_elem on sockmaps from bpf_iter context. This
      in turn allows us to copy a sockmap by iterating its elements.
      
      The change itself is tiny, all thanks to the ground work from Martin,
      whose series [1] this patch is based on. I updated the tests to do some
      copying, and also included two cleanups.
      
      I'm sending this out now rather than when Martin's series has landed
      because I hope this can get in before the merge window (potentially)
      closes this weekend.
      
      1: https://lore.kernel.org/bpf/20200925000337.3853598-1-kafai@fb.com/
      ====================
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      84a20d8e
    • L
      selftest: bpf: Test copying a sockmap and sockhash · 5b87adc3
      Lorenz Bauer 提交于
      Since we can now call map_update_elem(sockmap) from bpf_iter context
      it's possible to copy a sockmap or sockhash in the kernel. Add a
      selftest which exercises this.
      Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200928090805.23343-5-lmb@cloudflare.com
      5b87adc3
    • L
      selftests: bpf: Remove shared header from sockmap iter test · 27870317
      Lorenz Bauer 提交于
      The shared header to define SOCKMAP_MAX_ENTRIES is a bit overkill.
      Dynamically allocate the sock_fd array based on bpf_map__max_entries
      instead.
      Suggested-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Acked-by: NYonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20200928090805.23343-4-lmb@cloudflare.com
      27870317
    • L
      selftests: bpf: Add helper to compare socket cookies · 26c3270d
      Lorenz Bauer 提交于
      We compare socket cookies to ensure that insertion into a sockmap worked.
      Pull this out into a helper function for use in other tests.
      Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200928090805.23343-3-lmb@cloudflare.com
      26c3270d
    • L
      bpf: sockmap: Enable map_update_elem from bpf_iter · 6550f2dd
      Lorenz Bauer 提交于
      Allow passing a pointer to a BTF struct sock_common* when updating
      a sockmap or sockhash. Since BTF pointers can fault and therefore be
      NULL at runtime we need to add an additional !sk check to
      sock_map_update_elem. Since we may be passed a request or timewait
      socket we also need to check sk_fullsock. Doing this allows calling
      map_update_elem on sockmap from bpf_iter context, which uses
      BTF pointers.
      Signed-off-by: NLorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20200928090805.23343-2-lmb@cloudflare.com
      6550f2dd
    • L
      bpf, cpumap: Remove rcpu pointer from cpu_map_build_skb signature · efa90b50
      Lorenzo Bianconi 提交于
      Get rid of bpf_cpu_map_entry pointer in cpu_map_build_skb routine
      signature since it is no longer needed.
      Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/33cb9b7dc447de3ea6fd6ce713ac41bca8794423.1601292015.git.lorenzo@kernel.org
      efa90b50
    • S
      selftests/bpf: Add raw_tp_test_run · 09d8ad16
      Song Liu 提交于
      This test runs test_run for raw_tracepoint program. The test covers ctx
      input, retval output, and running on correct cpu.
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200925205432.1777-4-songliubraving@fb.com
      09d8ad16
    • S
      libbpf: Support test run of raw tracepoint programs · 88f7fe72
      Song Liu 提交于
      Add bpf_prog_test_run_opts() with support of new fields in bpf_attr.test,
      namely, flags and cpu. Also extend _opts operations to support outputs via
      opts.
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200925205432.1777-3-songliubraving@fb.com
      88f7fe72
    • S
      bpf: Enable BPF_PROG_TEST_RUN for raw_tracepoint · 1b4d60ec
      Song Liu 提交于
      Add .test_run for raw_tracepoint. Also, introduce a new feature that runs
      the target program on a specific CPU. This is achieved by a new flag in
      bpf_attr.test, BPF_F_TEST_RUN_ON_CPU. When this flag is set, the program
      is triggered on cpu with id bpf_attr.test.cpu. This feature is needed for
      BPF programs that handle perf_event and other percpu resources, as the
      program can access these resource locally.
      Signed-off-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NAndrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200925205432.1777-2-songliubraving@fb.com
      1b4d60ec
    • M
      xsk: Fix possible crash in socket_release when out-of-memory · 1fd17c8c
      Magnus Karlsson 提交于
      Fix possible crash in socket_release when an out-of-memory error has
      occurred in the bind call. If a socket using the XDP_SHARED_UMEM flag
      encountered an error in xp_create_and_assign_umem, the bind code
      jumped to the exit routine but erroneously forgot to set the err value
      before jumping. This meant that the exit routine thought the setup
      went well and set the state of the socket to XSK_BOUND. The xsk socket
      release code will then, at application exit, think that this is a
      properly setup socket, when it is not, leading to a crash when all
      fields in the socket have in fact not been initialized properly. Fix
      this by setting the err variable in xsk_bind so that the socket is not
      set to XSK_BOUND which leads to the clean-up in xsk_release not being
      triggered.
      
      Fixes: 1c1efc2a ("xsk: Create and free buffer pool independently from umem")
      Reported-by: syzbot+ddc7b4944bc61da19b81@syzkaller.appspotmail.com
      Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/1601112373-10595-1-git-send-email-magnus.karlsson@gmail.com
      1fd17c8c
  2. 26 9月, 2020 10 次提交