1. 25 June 2020, 21 commits
  2. 24 June 2020, 8 commits
    • libbpf: Prevent loading vmlinux BTF twice · 192b6638
      Committed by Andrii Nakryiko
      Prevent loading/parsing vmlinux BTF twice in some cases: for CO-RE relocations
      and for BTF-aware hooks (tp_btf, fentry/fexit, etc).
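
      A minimal sketch of the underlying idea, load vmlinux BTF once and reuse
      the cached copy; struct my_obj, obj_get_vmlinux_btf() and
      load_vmlinux_btf_from_kernel() are hypothetical names for illustration,
      not libbpf's actual API:

        struct btf;                                     /* opaque BTF handle */
        struct btf *load_vmlinux_btf_from_kernel(void); /* hypothetical loader */

        struct my_obj {
                struct btf *btf_vmlinux; /* cached vmlinux BTF, NULL until first use */
        };

        /* Both the CO-RE relocation path and the BTF-aware hook path would
         * call this accessor, so the expensive load/parse happens at most once. */
        static struct btf *obj_get_vmlinux_btf(struct my_obj *obj)
        {
                if (!obj->btf_vmlinux)
                        obj->btf_vmlinux = load_vmlinux_btf_from_kernel();
                return obj->btf_vmlinux;
        }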
      
      Fixes: a6ed02ca ("libbpf: Load btf_vmlinux only once per object.")
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200624043805.1794620-1-andriin@fb.com
    • 135c783f
    • tools, bpftool: Fix variable shadowing in emit_obj_refs_json() · 54b66c22
      Committed by Quentin Monnet
      Building bpftool yields the following complaint:
      
          pids.c: In function 'emit_obj_refs_json':
          pids.c:175:80: warning: declaration of 'json_wtr' shadows a global declaration [-Wshadow]
            175 | void emit_obj_refs_json(struct obj_refs_table *table, __u32 id, json_writer_t *json_wtr)
                |                                                                 ~~~~~~~~~~~~~~~^~~~~~~~
          In file included from pids.c:11:
          main.h:141:23: note: shadowed declaration is here
            141 | extern json_writer_t *json_wtr;
                |                       ^~~~~~~~
      
      Let's rename the variable.
      
      v2:
      - Rename the variable instead of calling the global json_wtr directly.
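
      A minimal sketch of the shadowing problem and the fix; the simplified
      signature and the json_writer parameter name are illustrative, not the
      exact pids.c change:

        #include <stdio.h>

        typedef struct { FILE *out; } json_writer_t;  /* stand-in for the real type */

        json_writer_t *json_wtr;                       /* global writer, as in main.h */

        /* Before: a parameter also named json_wtr shadows the global and
         * triggers -Wshadow. */
        void emit_obj_refs_json_old(unsigned int id, json_writer_t *json_wtr)
        {
                fprintf(json_wtr->out, "{\"id\":%u}\n", id);
        }

        /* After: renaming the parameter removes the shadowing while still
         * using the caller-supplied writer. */
        void emit_obj_refs_json_new(unsigned int id, json_writer_t *json_writer)
        {
                fprintf(json_writer->out, "{\"id\":%u}\n", id);
        }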
      Signed-off-by: Quentin Monnet <quentin@isovalent.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200623213600.16643-1-quentin@isovalent.com
    • 4e608675
    • tools, bpftool: Correctly evaluate $(BUILD_BPF_SKELS) in Makefile · 9d9d8cc2
      Committed by Tobias Klauser
      Currently, if the clang-bpf-co-re feature is not available, the build
      fails with e.g.
      
        CC       prog.o
      prog.c:1462:10: fatal error: profiler.skel.h: No such file or directory
       1462 | #include "profiler.skel.h"
            |          ^~~~~~~~~~~~~~~~~
      
      This is due to the fact that the BPFTOOL_WITHOUT_SKELETONS macro is not
      defined, despite BUILD_BPF_SKELS not being set. Fix this by correctly
      evaluating $(BUILD_BPF_SKELS) when deciding whether to add
      -DBPFTOOL_WITHOUT_SKELETONS to CFLAGS.
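
      A simplified sketch of how such a guard is consumed on the C side (not
      the exact prog.c layout); with -DBPFTOOL_WITHOUT_SKELETONS in CFLAGS the
      generated header is never included, so the build succeeds even without
      clang-bpf-co-re support:

        #include <stdio.h>

        #ifdef BPFTOOL_WITHOUT_SKELETONS

        /* Stub used when skeletons are not built. */
        static int do_profile(int argc, char **argv)
        {
                (void)argc; (void)argv;
                fprintf(stderr, "bpftool was built without skeleton support\n");
                return -1;
        }

        #else /* !BPFTOOL_WITHOUT_SKELETONS */

        #include "profiler.skel.h" /* generated only when BUILD_BPF_SKELS is set */
        /* ... profiler implementation using the generated skeleton ... */

        #endif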
      
      Fixes: 05aca6da ("tools/bpftool: Generalize BPF skeleton support and generate vmlinux.h")
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: Quentin Monnet <quentin@isovalent.com>
      Acked-by: Andrii Nakryiko <andriin@fb.com>
      Link: https://lore.kernel.org/bpf/20200623103710.10370-1-tklauser@distanz.ch
    • selftests/bpf: Add variable-length data concat pattern less than test · 2fde1747
      Committed by John Fastabend
      Extend the original variable-length tests with a case that exercises a
      common existing pattern of testing for < 0 to detect errors. Note that
      because the verifier also tracks upper bounds, and we know the value
      cannot be greater than MAX_LEN here, we can skip the upper-bound check.

      With ALU64-enabled compilation, converting from the long return type to
      int in the probe helpers results in an extra instruction pattern,
      <<= 32, s>>= 32. The trade-off is that the non-ALU64 case works. If you
      really care about every extra insn (the XDP case?), then you should
      probably keep using the original int type.

      In addition, adding a sign-extension (sext) instruction to BPF might
      help the verifier avoid these kinds of tricks in the general case.
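
      A hedged sketch of the pattern this test exercises; the buffer name,
      size and attach point are illustrative, not the selftest's actual code:

        #include "vmlinux.h"
        #include <bpf/bpf_helpers.h>

        #define MAX_LEN 256

        char buf[MAX_LEN];
        long used = 0;

        SEC("raw_tp/sys_enter")
        int check_lt_zero(void *ctx)
        {
                struct task_struct *task = (void *)bpf_get_current_task();
                long len;

                /* long return type: one signed check catches all errors */
                len = bpf_probe_read_kernel_str(buf, MAX_LEN, task->comm);
                if (len < 0)
                        return 0;
                /* the verifier already bounds len by MAX_LEN, so no explicit
                 * upper-bound check is needed before using it as an offset */
                used = len;
                return 0;
        }

        char _license[] SEC("license") = "GPL";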
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200623032224.4020118-3-andriin@fb.com
    • selftests/bpf: Add variable-length data concatenation pattern test · 5e85c6bb
      Committed by Andrii Nakryiko
      Add a selftest that validates variable-length data reading and
      concatenation into one big shared data array. This is a common pattern
      in production monitoring and tracing applications, which can potentially
      read a lot of data but usually read much less. Such a pattern makes it
      possible to determine precisely how much data needs to be sent over the
      perfbuf/ringbuf and to maximize efficiency.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200623032224.4020118-2-andriin@fb.com
    • bpf: Switch most helper return values from 32-bit int to 64-bit long · bdb7b79b
      Committed by Andrii Nakryiko
      Switch most of the BPF helper definitions from returning int to long.
      These definitions come from comments in the BPF UAPI header and are used
      to generate bpf_helper_defs.h (under libbpf), which is later included
      and used from BPF programs.
      
      In the actual in-kernel implementation, all the helpers are defined as
      returning u64, but for historical reasons most of them are declared in
      the UAPI as returning int (usually to return 0 on success and a negative
      value on error).
      
      This quite often causes Clang to generate sub-optimal code, because the
      compiler believes the return value is 32-bit, and in many cases it has
      to be up-converted (usually with a pair of 32-bit bit shifts) to a
      64-bit value before it can be used further in BPF code.
      
      Besides just "polluting" the code, these 32-bit shifts quite often cause
      problems in cases where the return value matters. This is especially
      true for the family of bpf_probe_read_str() functions. There are a few
      other similar helpers (e.g., bpf_read_branch_records()) whose return
      value is used by BPF program logic to record variable-length data and
      process it. In such cases, the BPF program carefully manages offsets
      within some array or map to read variable-length data, and it is crucial
      for the BPF verifier to track the possible range of register values to
      prove that all accesses happen within the given memory bounds. Those
      extraneous zero-extending bit shifts inserted by Clang (and quite often
      interleaved with other code, which makes the issue even more challenging
      and sometimes requires extra per-variable compiler barriers) throw off
      the verifier's logic and make it mark registers as having an unknown
      variable offset. We'll study this pattern a bit later below.
      
      Another common pattern is to check the return value of a BPF helper for
      a non-zero state to detect error conditions and attempt alternative
      actions in that case. Even in this simple and straightforward case, the
      mismatch between 32-bit int and BPF's native 64-bit registers quite
      often leads to sub-optimal and unnecessary extra code. We'll look at
      this pattern as well.
      
      Clang's BPF target supports two modes of code generation: ALU32, in
      which it is capable of using the lower 32-bit parts of registers, and
      no-ALU32, in which only full 64-bit registers are used. ALU32 mode
      somewhat mitigates the problems described above, but not in all cases.
      
      This patch switches all the cases in which BPF helpers return 0 or a
      negative error from returning int to returning long. It is shown below
      that such a change in definition leads to equivalent or better code.
      No-ALU32 mode benefits more, while ALU32 mode either doesn't degrade or
      still gets improved code generation.

      Another class of cases switched from int to long is the
      bpf_probe_read_str()-like helpers, which encode the successful case as a
      non-negative value while still returning a negative value for errors.
      
      In all such cases, correctness is preserved thanks to the two's
      complement encoding of negative values and the fact that all helpers
      return values whose absolute value fits in 32 bits. Two's complement
      ensures that for negative values the upper 32 bits are all ones and,
      when truncated, leave a valid negative 32-bit number with the same
      value. Non-negative values have the upper 32 bits set to zero and
      similarly preserve their value when the high 32 bits are truncated. This
      means that simply casting to int/u32 is correct and efficient (and in
      ALU32 mode doesn't require any extra shifts).
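
      A small standalone illustration of the truncation argument (plain
      userspace C, not kernel code):

        #include <stdint.h>
        #include <stdio.h>

        int main(void)
        {
                /* A helper conceptually returns a 64-bit value whose absolute
                 * value fits in 32 bits, e.g. -EFAULT (-14) or a small length. */
                int64_t err = -14;
                int64_t len = 200;

                /* Two's complement keeps the negative value intact across the
                 * 64->32 truncation, and small non-negative values already have
                 * all-zero upper bits, so a plain cast is enough. */
                printf("%d %d\n", (int32_t)err, (int32_t)len); /* prints: -14 200 */
                return 0;
        }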
      
      To minimize the chances of regressions, two code patterns were
      investigated, as mentioned above. For both patterns, BPF assembly was
      analyzed in ALU32/NO-ALU32 compiler modes, both with the current 32-bit
      int return type and with the new 64-bit long return type.
      
      Case 1. Variable-length data reading and concatenation. This is a quite
      ubiquitous pattern in tracing/monitoring applications, reading data like
      a process's environment variables, file paths, etc. In this case, many
      pieces of string-like variable-length data are read into a single big
      buffer, and at the end of the process only the part of the array
      containing actual data is sent to user-space for further processing.
      This case is tested in the test_varlen.c selftest (in the next patch).
      The code flow is roughly as follows:
      
        void *payload = &sample->payload;
        u64 len;
      
        len = bpf_probe_read_kernel_str(payload, MAX_SZ1, &source_data1);
        if (len <= MAX_SZ1) {
            payload += len;
            sample->len1 = len;
        }
        len = bpf_probe_read_kernel_str(payload, MAX_SZ2, &source_data2);
        if (len <= MAX_SZ2) {
            payload += len;
            sample->len2 = len;
        }
        /* and so on */
        sample->total_len = payload - &sample->payload;
        /* send over, e.g., perf buffer */
      
      There are two variations with slightly different generated code: when
      len is a 64-bit integer and when it is a 32-bit integer. Both variations
      were analysed. The BPF assembly instructions between two successive
      invocations of bpf_probe_read_kernel_str() were used to check for code
      regressions. Results are below, followed by a short analysis. The left
      side uses helpers with the int return type, the right one is after the
      switch to long.
      
      ALU32 + INT                                ALU32 + LONG
      ===========                                ============
      
      64-BIT (13 insns):                         64-BIT (10 insns):
      ------------------------------------       ------------------------------------
        17:   call 115                             17:   call 115
        18:   if w0 > 256 goto +9 <LBB0_4>         18:   if r0 > 256 goto +6 <LBB0_4>
        19:   w1 = w0                              19:   r1 = 0 ll
        20:   r1 <<= 32                            21:   *(u64 *)(r1 + 0) = r0
        21:   r1 s>>= 32                           22:   r6 = 0 ll
        22:   r2 = 0 ll                            24:   r6 += r0
        24:   *(u64 *)(r2 + 0) = r1              00000000000000c8 <LBB0_4>:
        25:   r6 = 0 ll                            25:   r1 = r6
        27:   r6 += r1                             26:   w2 = 256
      00000000000000e0 <LBB0_4>:                   27:   r3 = 0 ll
        28:   r1 = r6                              29:   call 115
        29:   w2 = 256
        30:   r3 = 0 ll
        32:   call 115
      
      32-BIT (11 insns):                         32-BIT (12 insns):
      ------------------------------------       ------------------------------------
        17:   call 115                             17:   call 115
        18:   if w0 > 256 goto +7 <LBB1_4>         18:   if w0 > 256 goto +8 <LBB1_4>
        19:   r1 = 0 ll                            19:   r1 = 0 ll
        21:   *(u32 *)(r1 + 0) = r0                21:   *(u32 *)(r1 + 0) = r0
        22:   w1 = w0                              22:   r0 <<= 32
        23:   r6 = 0 ll                            23:   r0 >>= 32
        25:   r6 += r1                             24:   r6 = 0 ll
      00000000000000d0 <LBB1_4>:                   26:   r6 += r0
        26:   r1 = r6                            00000000000000d8 <LBB1_4>:
        27:   w2 = 256                             27:   r1 = r6
        28:   r3 = 0 ll                            28:   w2 = 256
        30:   call 115                             29:   r3 = 0 ll
                                                   31:   call 115
      
      In ALU32 mode, the variant using a 64-bit length variable clearly wins
      and avoids the unnecessary zero-extension bit shifts. In practice this
      is even more important, because the BPF code won't need extra checks to
      "prove" to the verifier that payload/len stay within valid bounds.
      
      The 32-bit len variant is one instruction longer. Clang decided to do
      the 64-to-32 cast with two bit shifts instead of the equivalent
      `w1 = w0` assignment. The former uses an extra register. The latter
      might potentially lose some range information, but not for a 32-bit
      value. So in this case the verifier infers that r0 is [0, 256] after the
      check at 18:, and shifting 32 bits left and then right keeps that range
      intact. We should probably look into Clang's logic and see why it
      chooses bit shifts over sub-register assignments here.
      
      NO-ALU32 + INT                             NO-ALU32 + LONG
      ==============                             ===============
      
      64-BIT (14 insns):                         64-BIT (10 insns):
      ------------------------------------       ------------------------------------
        17:   call 115                             17:   call 115
        18:   r0 <<= 32                            18:   if r0 > 256 goto +6 <LBB0_4>
        19:   r1 = r0                              19:   r1 = 0 ll
        20:   r1 >>= 32                            21:   *(u64 *)(r1 + 0) = r0
        21:   if r1 > 256 goto +7 <LBB0_4>         22:   r6 = 0 ll
        22:   r0 s>>= 32                           24:   r6 += r0
        23:   r1 = 0 ll                          00000000000000c8 <LBB0_4>:
        25:   *(u64 *)(r1 + 0) = r0                25:   r1 = r6
        26:   r6 = 0 ll                            26:   r2 = 256
        28:   r6 += r0                             27:   r3 = 0 ll
      00000000000000e8 <LBB0_4>:                   29:   call 115
        29:   r1 = r6
        30:   r2 = 256
        31:   r3 = 0 ll
        33:   call 115
      
      32-BIT (13 insns):                         32-BIT (13 insns):
      ------------------------------------       ------------------------------------
        17:   call 115                             17:   call 115
        18:   r1 = r0                              18:   r1 = r0
        19:   r1 <<= 32                            19:   r1 <<= 32
        20:   r1 >>= 32                            20:   r1 >>= 32
        21:   if r1 > 256 goto +6 <LBB1_4>         21:   if r1 > 256 goto +6 <LBB1_4>
        22:   r2 = 0 ll                            22:   r2 = 0 ll
        24:   *(u32 *)(r2 + 0) = r0                24:   *(u32 *)(r2 + 0) = r0
        25:   r6 = 0 ll                            25:   r6 = 0 ll
        27:   r6 += r1                             27:   r6 += r1
      00000000000000e0 <LBB1_4>:                 00000000000000e0 <LBB1_4>:
        28:   r1 = r6                              28:   r1 = r6
        29:   r2 = 256                             29:   r2 = 256
        30:   r3 = 0 ll                            30:   r3 = 0 ll
        32:   call 115                             32:   call 115
      
      In NO-ALU32 mode, for the 64-bit len variable, Clang generates much
      better code, as expected, eliminating the unnecessary bit shifts. For
      the 32-bit len, the code is identical.

      So overall, only the ALU32 32-bit len case is more-or-less equivalent,
      and the difference stems from an internal Clang decision rather than
      from the compiler lacking enough information about the types.
      
      Case 2. Let's look at the simpler case of checking the return value of a
      BPF helper for errors. The code is very simple:
      
        long bla;
        if (bpf_probe_read_kernel(&bla, sizeof(bla), 0))
            return 1;
        else
            return 0;
      
      ALU32 + CHECK (9 insns)                    ALU32 + CHECK (9 insns)
      ====================================       ====================================
        0:    r1 = r10                             0:    r1 = r10
        1:    r1 += -8                             1:    r1 += -8
        2:    w2 = 8                               2:    w2 = 8
        3:    r3 = 0                               3:    r3 = 0
        4:    call 113                             4:    call 113
        5:    w1 = w0                              5:    r1 = r0
        6:    w0 = 1                               6:    w0 = 1
        7:    if w1 != 0 goto +1 <LBB2_2>          7:    if r1 != 0 goto +1 <LBB2_2>
        8:    w0 = 0                               8:    w0 = 0
      0000000000000048 <LBB2_2>:                 0000000000000048 <LBB2_2>:
        9:    exit                                 9:    exit
      
      Almost identical code; the only difference is the use of a full-register
      assignment (r1 = r0) vs half-registers (w1 = w0) in instruction #5. On
      32-bit architectures, the new BPF assembly might be slightly less
      optimal, in theory. But one can argue that's not a big issue, given that
      the use of full registers is still prevalent (e.g., for parameter
      passing).
      
      NO-ALU32 + CHECK (11 insns)                NO-ALU32 + CHECK (9 insns)
      ====================================       ====================================
        0:    r1 = r10                             0:    r1 = r10
        1:    r1 += -8                             1:    r1 += -8
        2:    r2 = 8                               2:    r2 = 8
        3:    r3 = 0                               3:    r3 = 0
        4:    call 113                             4:    call 113
        5:    r1 = r0                              5:    r1 = r0
        6:    r1 <<= 32                            6:    r0 = 1
        7:    r1 >>= 32                            7:    if r1 != 0 goto +1 <LBB2_2>
        8:    r0 = 1                               8:    r0 = 0
        9:    if r1 != 0 goto +1 <LBB2_2>        0000000000000048 <LBB2_2>:
       10:    r0 = 0                               9:    exit
      0000000000000058 <LBB2_2>:
       11:    exit
      
      NO-ALU32 is a clear improvement, getting rid of unnecessary zero-extension bit
      shifts.
      Signed-off-by: Andrii Nakryiko <andriin@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200623032224.4020118-1-andriin@fb.com
  3. 23 June 2020, 11 commits