1. 08 Jul 2021, 2 commits
  2. 16 Jun 2021, 1 commit
  3. 26 May 2021, 1 commit
    • xdp: Extend xdp_redirect_map with broadcast support · e624d4ed
      Committed by Hangbin Liu
      This patch adds two flags, BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS, to
      extend xdp_redirect_map with broadcast support.

      With BPF_F_BROADCAST, the packet will be broadcast to all the interfaces
      in the map. With BPF_F_EXCLUDE_INGRESS, the ingress interface will be
      excluded when broadcasting.
      
      When getting the devices in a dev hash map via dev_map_hash_get_next_key(),
      it is possible to fall back to the first key when a device has been
      removed, which duplicates packets on some interfaces. To avoid this,
      walk all the buckets instead. For dev array maps, the whole map is also
      walked to find valid interfaces.
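      The full-map walk can be illustrated with a small userspace model. This is
      a sketch under stated assumptions, not the devmap code: the map is a
      hypothetical array of ifindexes in which 0 marks an empty slot, and
      broadcast_targets() is an illustrative name. Because every slot is visited
      exactly once, a removed entry is simply skipped instead of restarting the
      iteration from the first key.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an array-style devmap: map[i] == 0 means the slot is
 * empty, any other value is an ifindex. Collects every interface that should
 * receive a clone of the packet and returns how many were found. */
static size_t broadcast_targets(const int *map, size_t n, int ingress_ifindex,
                                bool exclude_ingress, int *out)
{
    size_t count = 0;

    /* Walk the whole map; never restart from the first key. */
    for (size_t i = 0; i < n; i++) {
        int ifindex = map[i];

        if (ifindex == 0)
            continue;                  /* empty slot, e.g. a removed device */
        if (exclude_ingress && ifindex == ingress_ifindex)
            continue;                  /* models BPF_F_EXCLUDE_INGRESS */
        out[count++] = ifindex;        /* one packet clone per target */
    }
    return count;
}
```

      With the flag set, the interface the packet arrived on is left out, so a
      map holding {2, empty, 5, 7} and ingress ifindex 5 yields the targets 2
      and 7.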
      
      Function bpf_clear_redirect_map() was removed in
      commit ee75aef2 ("bpf, xdp: Restructure redirect actions").
      Add it back as we need to use ri->map again.
      
      With test topology:
        +-------------------+             +-------------------+
        | Host A (i40e 10G) |  ---------- | eno1(i40e 10G)    |
        +-------------------+             |                   |
                                          |   Host B          |
        +-------------------+             |                   |
        | Host C (i40e 10G) |  ---------- | eno2(i40e 10G)    |
        +-------------------+             |                   |
                                          |          +------+ |
                                          | veth0 -- | Peer | |
                                          | veth1 -- |      | |
                                          | veth2 -- |  NS  | |
                                          |          +------+ |
                                          +-------------------+
      
      On Host A:
       # pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64
      
      On Host B (Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G memory):
      Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
      All the veth peers in the NS have an XDP_DROP program loaded. The
      forward_map max_entries in xdp_redirect_map_multi is modified to 4.
      
      Testing the performance impact on the regular xdp_redirect path with and
      without patch (to check impact of additional check for broadcast mode):
      
      Version          | Test                                | Generic | Native
      5.12 rc4         | redirect_map        i40e->i40e      |    2.0M |  9.7M
      5.12 rc4         | redirect_map        i40e->veth      |    1.7M | 11.8M
      5.12 rc4 + patch | redirect_map        i40e->i40e      |    2.0M |  9.6M
      5.12 rc4 + patch | redirect_map        i40e->veth      |    1.7M | 11.7M
      
      Testing the performance when cloning packets with the redirect_map_multi
      test, using a redirect map size of 4, filled with 1-3 devices:
      
      Version          | Test                                | Generic | Native
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x1) |    1.7M | 11.4M
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x2) |    1.1M |  4.3M
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x3) |    0.8M |  2.6M

      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
  4. 25 May 2021, 1 commit
  5. 19 May 2021, 3 commits
  6. 28 Apr 2021, 1 commit
    • bpf: Implement formatted output helpers with bstr_printf · 48cac3f4
      Committed by Florent Revest
      BPF has three formatted output helpers: bpf_trace_printk, bpf_seq_printf
      and bpf_snprintf. Their signatures specify that all arguments are
      provided from the BPF world as u64s (in an array or as registers). All
      of these helpers are currently implemented by calling functions such as
      snprintf() that take a variable number of arguments; the compiler places
      these arguments in a va_list, which is then passed to vsnprintf().

      Commit d9c9e4db ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
      introduced a bpf_printf_prepare function that fills an array of
      sanitized u64 arguments along with an array of "modifiers" that indicate
      what the "real" size of each argument should be (given by the format
      specifier). The BPF_CAST_FMT_ARG macro consumes these arrays and casts
      each argument to its real size. However, the C promotion rules
      implicitly cast them all back to u64s. Therefore, the arguments given to
      snprintf are u64s and the va_list constructed by the compiler uses 64
      bits for each argument. On 64-bit machines this happens to work, because
      32-bit arguments in va_lists need to occupy 64 bits anyway, but on
      32-bit architectures it breaks the layout of the va_list expected by the
      called function and mangles values.
      
      In commit 88a5c690 ("bpf: fix bpf_trace_printk on 32 bit archs"), this
      problem had been solved for bpf_trace_printk only, with a "horrid
      workaround" that emitted multiple calls to trace_printk where each call
      had different argument types and generated a different va_list layout.
      One of the calls would be chosen dynamically at runtime. This was fine
      for the 3 arguments that bpf_trace_printk takes, but bpf_seq_printf and
      bpf_snprintf accept up to 12 arguments. Because the amount of code this
      approach generates grows exponentially with the number of arguments, it
      is no longer a viable option.
      
      Because the promotion rules are part of the language and because the
      construction of a va_list is an arch-specific ABI, it's best to just
      avoid variadic arguments and va_lists altogether. Thankfully the
      kernel's snprintf() has an alternative in the form of bstr_printf() that
      accepts arguments in a "binary buffer representation". These binary
      buffers are currently created by vbin_printf and used in the tracing
      subsystem to split the cost of printing into two parts: a fast one that
      only dereferences and remembers values, and a slower one, called later,
      that does the pretty-printing.
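      The two-phase idea can be illustrated in userspace C. This is a simplified
      sketch, not the kernel's vbin_printf()/bstr_printf() encoding: struct
      bin_buf, the bin_put_*() helpers and bin_render() are hypothetical names,
      the layout (each value aligned to its own size) is an assumption, and only
      %u, %llu and %s are handled. The point is that each value is stored and
      read back at its real size, so no argument is ever promoted to u64 on
      its way to the formatter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument buffer: values appended at their real sizes. */
struct bin_buf {
    unsigned char data[128];
    size_t len;
};

/* Round pos up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t pos, size_t align)
{
    return (pos + align - 1) & ~(align - 1);
}

/* Fast phase: only copy values, no formatting. */
static void bin_put(struct bin_buf *b, const void *v, size_t size, size_t align)
{
    size_t off = align_up(b->len, align);

    assert(off + size <= sizeof(b->data));
    memcpy(b->data + off, v, size);
    b->len = off + size;
}

static void bin_put_u32(struct bin_buf *b, uint32_t v) { bin_put(b, &v, sizeof(v), sizeof(v)); }
static void bin_put_u64(struct bin_buf *b, uint64_t v) { bin_put(b, &v, sizeof(v), sizeof(v)); }
static void bin_put_str(struct bin_buf *b, const char *s) { bin_put(b, s, strlen(s) + 1, 1); }

/* Slow phase: pretty-print later, reading each value at its real size.
 * Returns the output length, or -1 for an unsupported specifier or a
 * too-small output buffer. */
static int bin_render(char *out, size_t out_size, const char *fmt,
                      const struct bin_buf *b)
{
    size_t pos = 0, used = 0;

    assert(out_size > 0);
    for (const char *p = fmt; *p; p++) {
        char tmp[24];              /* fits any u64 in decimal */
        const char *piece = tmp;

        if (*p != '%') {
            tmp[0] = *p;
            tmp[1] = '\0';
        } else if (strncmp(p, "%llu", 4) == 0) {
            uint64_t v;
            pos = align_up(pos, sizeof(v));
            assert(pos + sizeof(v) <= b->len);
            memcpy(&v, b->data + pos, sizeof(v));
            pos += sizeof(v);
            snprintf(tmp, sizeof(tmp), "%llu", (unsigned long long)v);
            p += 3;
        } else if (strncmp(p, "%u", 2) == 0) {
            uint32_t v;
            pos = align_up(pos, sizeof(v));
            assert(pos + sizeof(v) <= b->len);
            memcpy(&v, b->data + pos, sizeof(v));
            pos += sizeof(v);
            snprintf(tmp, sizeof(tmp), "%u", (unsigned int)v);
            p += 1;
        } else if (strncmp(p, "%s", 2) == 0) {
            assert(pos < b->len);
            piece = (const char *)b->data + pos;  /* stored with its NUL */
            pos += strlen(piece) + 1;
            p += 1;
        } else {
            return -1;
        }

        size_t n = strlen(piece);
        if (used + n >= out_size)
            return -1;
        memcpy(out + used, piece, n);
        used += n;
    }
    out[used] = '\0';
    return (int)used;
}
```

      A caller would record a u32, a u64 and a string in the fast path and only
      later turn them into text such as "a=7 b=1099511627776 if=eth0".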
      
      This patch refactors bpf_printf_prepare to construct binary buffers of
      arguments consumable by bstr_printf() instead of arrays of arguments and
      modifiers. This gets rid of BPF_CAST_FMT_ARG and greatly simplifies the
      bpf_printf_prepare usage but there are a few gotchas that change how
      bpf_printf_prepare needs to do things.
      
      Currently, bpf_printf_prepare uses a per-cpu temporary buffer as
      generic storage for strings and IP addresses. With this refactoring, the
      temporary buffer holds all the arguments in a structured binary
      format.
      
      To comply with the format expected by bstr_printf, certain format
      specifiers also need to be pre-formatted: %pB and %pi6/%pi4/%pI4/%pI6.
      Because vsnprintf subroutines for these specifiers are hard to expose,
      we pre-format these arguments with calls to snprintf().
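      As an illustration of pre-formatting, an IPv4 address can be turned into
      its dotted-quad text (the form %pI4 prints) with snprintf(), after which
      it can be stored in the argument buffer and treated as a plain string.
      preformat_ipv4() is a hypothetical helper for this sketch, not a kernel
      function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Render a 4-byte address (network byte order) as "a.b.c.d".
 * Returns the text length, or -1 if the destination is too small. */
static int preformat_ipv4(char *dst, size_t dst_size, const uint8_t addr[4])
{
    assert(dst != NULL && addr != NULL);
    int n = snprintf(dst, dst_size, "%u.%u.%u.%u",
                     (unsigned int)addr[0], (unsigned int)addr[1],
                     (unsigned int)addr[2], (unsigned int)addr[3]);
    /* snprintf() returns the length it wanted to write; treat truncation as an error. */
    if (n < 0 || (size_t)n >= dst_size)
        return -1;
    return n;
}
```

      A 16-byte destination is enough for any address ("255.255.255.255" plus
      the terminating NUL).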

      Reported-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Florent Revest <revest@chromium.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210427174313.860948-3-revest@chromium.org
  7. 20 Apr 2021, 3 commits
    • bpf: Add a bpf_snprintf helper · 7b15523a
      Committed by Florent Revest
      The implementation takes inspiration from the existing bpf_trace_printk
      helper but there are a few differences:
      
      To allow for a large number of format-specifiers, parameters are
      provided in an array, like in bpf_seq_printf.
      
      Because BPF helpers take at most five arguments, and the output string
      takes two of them and the array of parameters also takes two, the format
      string needs to fit in one argument. Thankfully, ARG_PTR_TO_CONST_STR is
      guaranteed to point to a zero-terminated string in a read-only map, so
      no format string length argument is needed.
      
      Because the format-string is known at verification time, we also do
      a first pass of format string validation in the verifier logic. This
      makes debugging easier.
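      A first validation pass of this kind can be sketched as follows. This is a
      hypothetical model, not the verifier's code: fmt_matches_data_len() is an
      illustrative name, only %d, %u, %x and %s (optionally with an "ll"
      modifier) are accepted, and it assumes that each conversion consumes one
      u64 slot of the parameter array, so data_len must equal 8 times the number
      of conversions. "%%" is a literal percent sign and consumes nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Check, before the program runs, that a constant format string and the
 * byte length of its u64 parameter array agree. */
static bool fmt_matches_data_len(const char *fmt, uint32_t data_len)
{
    size_t nconv = 0;

    if (data_len % sizeof(uint64_t) != 0)
        return false;                    /* array must hold whole u64s */
    for (const char *p = fmt; *p; p++) {
        if (*p != '%')
            continue;
        p++;
        if (*p == '%')
            continue;                    /* "%%": literal percent sign */
        if (strncmp(p, "ll", 2) == 0)
            p += 2;                      /* skip the length modifier */
        if (*p != 'd' && *p != 'u' && *p != 'x' && *p != 's')
            return false;                /* unsupported or truncated specifier */
        nconv++;
    }
    return nconv * sizeof(uint64_t) == data_len;
}
```

      Rejecting a mismatch at load time gives the program author an error
      message up front instead of a silently wrong string at runtime.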

      Signed-off-by: Florent Revest <revest@chromium.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210419155243.1632274-4-revest@chromium.org
    • bpf: Add a ARG_PTR_TO_CONST_STR argument type · fff13c4b
      Committed by Florent Revest
      This type provides the guarantee that an argument is going to be a const
      pointer to somewhere in a read-only map value. It also checks that this
      pointer is followed by a zero character before the end of the map value.
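      The zero-character condition can be modeled with memchr(). This is a
      hypothetical sketch: is_const_str_in_value() is an illustrative name, and
      the other two conditions described above (the map is read-only and the
      pointer is constant) are assumed rather than modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Accept value + off as a string only if off is inside the map value and a
 * zero byte appears before the end of the value. */
static bool is_const_str_in_value(const char *value, size_t value_size, size_t off)
{
    if (off >= value_size)
        return false;                    /* pointer outside the map value */
    return memchr(value + off, '\0', value_size - off) != NULL;
}
```

      An offset that points into the tail of a value with no terminating zero
      is rejected, so a helper can later treat the pointer as a C string
      without a separate length.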

      Signed-off-by: Florent Revest <revest@chromium.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Link: https://lore.kernel.org/bpf/20210419155243.1632274-3-revest@chromium.org
    • bpf: Factorize bpf_trace_printk and bpf_seq_printf · d9c9e4db
      Committed by Florent Revest
      Two helpers (trace_printk and seq_printf) have very similar
      implementations of format string parsing and a third one is coming
      (snprintf). To avoid code duplication and make the code easier to
      maintain, this moves the operations associated with format string
      parsing (validation and argument sanitization) into one generic
      function.
      
      The implementation of the two existing helpers already drifted quite a
      bit so unifying them entailed a lot of changes:
      
      - bpf_trace_printk always expected fmt[fmt_size] to be the terminating
        NUL character; this is no longer true, the first 0 is terminating.
      - bpf_trace_printk now supports %% (which produces the percent char).
      - bpf_trace_printk now skips width formatting fields.
      - bpf_trace_printk now supports the X modifier (capital hexadecimal).
      - bpf_trace_printk now supports %pK, %px, %pB, %pi4, %pI4, %pi6 and %pI6.
      - argument casting on 32-bit has been simplified into one macro that
        uses an enum instead of obscure int increments.
      
      - bpf_seq_printf now uses bpf_trace_copy_string instead of
        strncpy_from_kernel_nofault and handles the %pks and %pus specifiers.
      - bpf_seq_printf now prints longs correctly on 32-bit architectures.
      
      - both were changed to use a global per-cpu tmp buffer instead of one
        stack buffer for trace_printk and 6 small buffers for seq_printf.
      - to avoid per-cpu buffer usage conflicts, these helpers disable
        preemption while the per-cpu buffer is in use.
      - both helpers now support the %ps and %pS specifiers to print symbols.
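      The enum-based casting mentioned in the list can be sketched like this.
      The enum and function names are hypothetical, not the kernel's
      definitions. Note that cast_arg() still returns a u64: as the bstr_printf
      commit in this log explains, a value truncated this way is promoted back
      to u64 when passed through a variadic call, which is why the approach was
      later replaced by binary buffers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical "real size" of a u64 argument, derived from its specifier. */
enum arg_size {
    ARG_SIZE_INT,       /* no length modifier: int / unsigned int */
    ARG_SIZE_LONG,      /* "l": 32 or 64 bits depending on the architecture */
    ARG_SIZE_LONG_LONG, /* "ll": always 64 bits */
};

/* Map the text after '%' (e.g. "llu", "lx", "d") to an enum value instead of
 * incrementing an int once per 'l'. */
static enum arg_size size_from_modifier(const char *spec)
{
    if (spec[0] == 'l' && spec[1] == 'l')
        return ARG_SIZE_LONG_LONG;
    if (spec[0] == 'l')
        return ARG_SIZE_LONG;
    return ARG_SIZE_INT;
}

/* Truncate a raw u64 argument to the width its specifier implies. */
static uint64_t cast_arg(uint64_t raw, enum arg_size size)
{
    switch (size) {
    case ARG_SIZE_INT:
        return (uint32_t)raw;
    case ARG_SIZE_LONG:
        return (unsigned long)raw;  /* only truncates where long is 32 bits */
    case ARG_SIZE_LONG_LONG:
    default:
        return raw;
    }
}
```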
      
      The implementation is also moved from bpf_trace.c to helpers.c because
      the upcoming bpf_snprintf helper will be made available to all BPF
      programs and will need it.

      Signed-off-by: Florent Revest <revest@chromium.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210419155243.1632274-2-revest@chromium.org
  8. 09 Apr 2021, 1 commit
  9. 03 Apr 2021, 1 commit
  10. 27 Mar 2021, 4 commits
    • bpf: selftests: Add kfunc_call test · 7bd1590d
      Committed by Martin KaFai Lau
      This patch adds a few kernel functions, bpf_kfunc_call_test*(), for the
      selftest's test_run purpose. They will be allowed for tc_cls progs.
      
      The selftest calling the kernel function bpf_kfunc_call_test*()
      is also added in this patch.

      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210325015252.1551395-1-kafai@fb.com
    • bpf: Support bpf program calling kernel function · e6ac2450
      Committed by Martin KaFai Lau
      This patch adds support in the BPF verifier for allowing a bpf program
      to call kernel functions directly.
      
      The use case included in this set is to allow bpf-tcp-cc to directly
      call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
      functions have already been used by some kernel tcp-cc implementations.
      
      This set will also allow the bpf-tcp-cc program to directly call the
      kernel tcp-cc implementation. For example, a bpf_dctcp may only want to
      implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
      from the kernel tcp_dctcp.c instead of reimplementing (or
      copy-and-pasting) them.
      
      The tcp-cc kernel functions mentioned above will be white listed
      for the struct_ops bpf-tcp-cc programs to use in a later patch.
      The white listed functions are not bound to a fixed ABI contract.
      Those functions have already been used by the existing kernel tcp-cc.
      If any of them changes, both in-tree and out-of-tree kernel tcp-cc
      implementations have to be changed. The same goes for the struct_ops
      bpf-tcp-cc programs, which have to be adjusted accordingly.
      
      This patch is to make the required changes in the bpf verifier.
      
      The first change is in btf.c: it adds a case in "btf_check_func_arg_match()".
      When the passed-in "btf->kernel_btf == true", it means matching the
      verifier regs' states with a kernel function. This will handle the
      PTR_TO_BTF_ID reg. It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
      and PTR_TO_TCP_SOCK to their kernel btf_ids.
      
      In the later libbpf patch, the insn calling a kernel function will
      look like:
      
      insn->code == (BPF_JMP | BPF_CALL)
      insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
      insn->imm == func_btf_id /* btf_id of the running kernel */
      
      [ For the future calling function-in-kernel-module support, an array
        of module btf_fds can be passed at the load time and insn->off
        can be used to index into this array. ]
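      For illustration, the encoding above can be written out in C. To stay
      self-contained, this sketch defines its own copy of the instruction layout
      (struct bpf_insn_model, a hypothetical name mirroring the uapi struct
      bpf_insn) and the constants BPF_JMP (0x05), BPF_CALL (0x80) and
      BPF_PSEUDO_KFUNC_CALL (2) with the values from the uapi headers.
      make_kfunc_call() and is_kfunc_call() are illustrative helpers; how the
      BTF id is obtained (e.g. by libbpf) is outside the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the uapi instruction layout. */
struct bpf_insn_model {
    uint8_t code;        /* opcode */
    uint8_t dst_reg : 4; /* destination register */
    uint8_t src_reg : 4; /* source register, reused as a "pseudo" marker */
    int16_t off;         /* offset; 0 here (no module BTF) */
    int32_t imm;         /* immediate; BTF id of the kernel function */
};

#define BPF_JMP               0x05
#define BPF_CALL              0x80
#define BPF_PSEUDO_KFUNC_CALL 2

/* Build a call instruction that targets a kernel function by BTF id. */
static struct bpf_insn_model make_kfunc_call(int32_t func_btf_id)
{
    struct bpf_insn_model insn = {
        .code = BPF_JMP | BPF_CALL,
        .dst_reg = 0,
        .src_reg = BPF_PSEUDO_KFUNC_CALL,
        .off = 0,
        .imm = func_btf_id,
    };
    return insn;
}

/* Distinguish a kernel function call from a helper or subprog call. */
static int is_kfunc_call(const struct bpf_insn_model *insn)
{
    return insn->code == (BPF_JMP | BPF_CALL) &&
           insn->src_reg == BPF_PSEUDO_KFUNC_CALL;
}
```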
      
      At the early stage of verifier, the verifier will collect all kernel
      function calls into "struct bpf_kfunc_desc".  Those
      descriptors are stored in "prog->aux->kfunc_tab" and will
      be available to the JIT.  Since this "add" operation is similar
      to the current "add_subprog()" and looking for the same insn->code,
      they are done together in the new "add_subprog_and_kfunc()".
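      A collection step like this can be sketched with a small table. The commit
      only says that descriptors are collected and stored; keeping them sorted by
      BTF id and looking them up with bsearch() is an assumption made for this
      sketch, and struct kfunc_desc_model, struct kfunc_tab_model, add_kfunc()
      and find_desc() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry per distinct kernel function referenced by the program. */
struct kfunc_desc_model {
    int32_t func_id;   /* BTF id taken from insn->imm */
};

struct kfunc_tab_model {
    struct kfunc_desc_model descs[16];
    size_t nr;
};

static int desc_cmp(const void *a, const void *b)
{
    const struct kfunc_desc_model *x = a, *y = b;

    return (x->func_id > y->func_id) - (x->func_id < y->func_id);
}

static const struct kfunc_desc_model *
find_desc(const struct kfunc_tab_model *tab, int32_t func_id)
{
    struct kfunc_desc_model key = { .func_id = func_id };

    return bsearch(&key, tab->descs, tab->nr, sizeof(key), desc_cmp);
}

/* Record a call target once and keep the table sorted for lookups. */
static int add_kfunc(struct kfunc_tab_model *tab, int32_t func_id)
{
    if (find_desc(tab, func_id))
        return 0;                        /* already collected */
    if (tab->nr == sizeof(tab->descs) / sizeof(tab->descs[0]))
        return -1;                       /* table full */
    tab->descs[tab->nr++].func_id = func_id;
    qsort(tab->descs, tab->nr, sizeof(tab->descs[0]), desc_cmp);
    return 0;
}
```

      Calling the same kernel function from several places adds only one
      descriptor, which keeps later per-call lookups (for example from a JIT)
      cheap.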
      
      In the "do_check()" stage, the new "check_kfunc_call()" is added
      to verify the kernel function call instruction:
      1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
         A new bpf_verifier_ops "check_kfunc_call" is added to do that.
         The bpf-tcp-cc struct_ops program will implement this function in
         a later patch.
      2. Call "btf_check_kfunc_args_match()" to ensure the regs can be
         used as the args of a kernel function.
      3. Mark the regs' type, subreg_def, and zext_dst.
      
      At the later do_misc_fixups() stage, the new fixup_kfunc_call()
      will replace the insn->imm with the function address (relative
      to __bpf_call_base).  If needed, the jit can find the btf_func_model
      by calling the new bpf_jit_find_kfunc_model(prog, insn).
      With the imm set to the function address, "bpftool prog dump xlated"
      will be able to display the kernel function calls the same way as
      it displays other bpf helper calls.
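      The "relative to __bpf_call_base" encoding is plain pointer arithmetic,
      modeled here with integers standing in for kernel addresses.
      encode_call_imm() and decode_call_imm() are hypothetical names, and the
      range check is an assumption of this sketch that follows from imm being a
      signed 32-bit field: a function farther than that from the base cannot be
      encoded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store func_addr as an offset from call_base in a signed 32-bit imm.
 * Returns false if the distance does not fit. */
static bool encode_call_imm(uint64_t func_addr, uint64_t call_base, int32_t *imm)
{
    int64_t delta = (int64_t)(func_addr - call_base);

    if (delta < INT32_MIN || delta > INT32_MAX)
        return false;                    /* out of range for insn->imm */
    *imm = (int32_t)delta;
    return true;
}

/* Recover the absolute address, e.g. for displaying the call target. */
static uint64_t decode_call_imm(int32_t imm, uint64_t call_base)
{
    return call_base + (uint64_t)(int64_t)imm;   /* sign-extend, then add */
}
```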
      
      A gpl_compatible program is required to call kernel functions.
      
      This feature currently requires JIT.
      
      The verifier selftests are adjusted because of the changes in
      the verbose log in add_subprog_and_kfunc().

      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210325015142.1544736-1-kafai@fb.com
    • bpf: Refactor btf_check_func_arg_match · 34747c41
      Committed by Martin KaFai Lau
      This patch moved the subprog specific logic from
      btf_check_func_arg_match() to the new btf_check_subprog_arg_match().
      The core logic is left in btf_check_func_arg_match() which
      will be reused later to check the kernel function call.
      
      The "if (!btf_type_is_ptr(t))" is checked first to improve the
      indentation which will be useful for a later patch.
      
      Some of the "btf_kind_str[]" usages are replaced with the shortcut
      "btf_type_str(t)".

      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210325015136.1544504-1-kafai@fb.com
    • bpf: Take module reference for trampoline in module · 861de02e
      Committed by Jiri Olsa
      Currently, a module can be unloaded even if a trampoline is
      registered in it. This is easily reproduced by running in parallel:
      
        # while :; do ./test_progs -t module_attach; done
        # while :; do rmmod bpf_testmod; sleep 0.5; done
      
      Take a module reference when the trampoline's ip is within the
      module code, and release it when the trampoline's ip is
      unregistered.
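      The get/put pairing can be modeled with a plain counter. This is a
      hypothetical single-threaded sketch: in the kernel the role of the counter
      is played by the module refcount (try_module_get()/module_put()), while
      struct module_model, struct trampoline_model and the functions below are
      illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A module's text range plus a count that blocks unloading while non-zero. */
struct module_model {
    uintptr_t text_start;
    size_t text_size;
    int refcnt;
};

struct trampoline_model {
    uintptr_t ip;
    struct module_model *mod;   /* non-NULL while a reference is held */
};

static bool ip_in_module(const struct module_model *m, uintptr_t ip)
{
    return ip >= m->text_start && ip - m->text_start < m->text_size;
}

/* On register: pin the module only if the trampoline's ip lies inside it. */
static void trampoline_register(struct trampoline_model *tr, struct module_model *m)
{
    if (ip_in_module(m, tr->ip)) {
        m->refcnt++;
        tr->mod = m;
    }
}

/* On unregister: drop the reference taken at register time, if any. */
static void trampoline_unregister(struct trampoline_model *tr)
{
    if (tr->mod) {
        tr->mod->refcnt--;
        tr->mod = NULL;
    }
}

static bool module_can_unload(const struct module_model *m)
{
    return m->refcnt == 0;
}
```

      With this pairing, the rmmod loop above would fail while module_attach
      holds a trampoline in bpf_testmod, instead of unloading code the
      trampoline still points into.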

      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210326105900.151466-1-jolsa@kernel.org
  11. 26 Mar 2021, 2 commits
  12. 18 Mar 2021, 1 commit
    • bpf: Fix fexit trampoline. · e21aa341
      Committed by Alexei Starovoitov
      The fexit/fmod_ret programs can be attached to kernel functions that can sleep.
      synchronize_rcu_tasks() will not wait for such tasks to complete.
      In such a case the trampoline image will be freed, and when the task
      wakes up the return IP will point to freed memory, causing a crash.
      Solve this by adding percpu_ref_get/put for the duration of the trampoline
      and by separating the lifetimes of the trampoline and its image.
      The "half page" optimization has to be removed, since the
      first_half->second_half->first_half transition cannot be guaranteed to
      complete in deterministic time. Every trampoline update becomes a new image.
      The image with fmod_ret or fexit progs will be freed via percpu_ref_kill and
      call_rcu_tasks. Together they will wait for the original function and
      trampoline asm to complete. The trampoline is patched from nop to jmp to skip
      fexit progs. Images are freed independently of the trampoline. The image with
      only fentry progs will be freed via call_rcu_tasks_trace+call_rcu_tasks, which
      will wait for both sleepable and non-sleepable progs to complete.
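      The lifetime rule can be illustrated with a single-threaded reference
      count model. This is a hypothetical sketch, not percpu_ref: struct
      image_model and the image_*() functions are illustrative names, and
      setting "freed" stands in for the release path that eventually frees the
      image (via call_rcu_tasks in the text above).

```c
#include <assert.h>
#include <stdbool.h>

/* An image starts with one reference owned by the trampoline. Each task
 * running through the image holds one more for the duration of the call. */
struct image_model {
    int refs;
    bool killed;
    bool freed;
};

static void image_init(struct image_model *im)
{
    im->refs = 1;
    im->killed = false;
    im->freed = false;
}

/* Task enters the trampoline (percpu_ref_get in the text). */
static void image_get(struct image_model *im)
{
    assert(!im->freed);
    im->refs++;
}

/* Task leaves the trampoline (percpu_ref_put in the text). */
static void image_put(struct image_model *im)
{
    assert(im->refs > 0);
    if (--im->refs == 0)
        im->freed = true;    /* stands in for the deferred free */
}

/* Trampoline switches to a new image: drop the owner's reference. */
static void image_kill(struct image_model *im)
{
    assert(!im->killed);
    im->killed = true;
    image_put(im);
}
```

      A task that entered the old image and then slept inside the traced
      function keeps it alive: the image is only released after that task
      wakes up and returns through it.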
      
      Fixes: fec56f58 ("bpf: Introduce BPF trampoline")
      Reported-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Paul E. McKenney <paulmck@kernel.org>  # for RCU
      Link: https://lore.kernel.org/bpf/20210316210007.38949-1-alexei.starovoitov@gmail.com
  13. 10 Mar 2021, 2 commits
  14. 05 Mar 2021, 1 commit
  15. 27 Feb 2021, 6 commits
  16. 12 Feb 2021, 1 commit
  17. 11 Feb 2021, 4 commits
  18. 28 Jan 2021, 1 commit
  19. 13 Jan 2021, 3 commits
  20. 09 Dec 2020, 1 commit