1. 22 October 2021, 2 commits
  2. 18 September 2021, 2 commits
  3. 16 September 2021, 1 commit
  4. 14 September 2021, 1 commit
  5. 11 September 2021, 1 commit
  6. 26 August 2021, 1 commit
  7. 24 August 2021, 1 commit
    • bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum · 6fc88c35
      Committed by Dave Marchevsky
      Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf
      attach types and a function to map bpf_attach_type values to the new
      enum. Inspired by netns_bpf_attach_type.
      
      Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever
      possible.  Functionality is unchanged as attach_type_to_prog_type
      switches in bpf/syscall.c were preventing non-cgroup programs from
      making use of the invalid cgroup_bpf array slots.
      
      As a result struct cgroup_bpf uses 504 fewer bytes relative to when its
      arrays were sized using MAX_BPF_ATTACH_TYPE.
      
      bpf_cgroup_storage is notably not migrated as struct
      bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type
      member which is not meant to be opaque. Similarly, bpf_cgroup_link
      continues to report its bpf_attach_type member to userspace via fdinfo
      and bpf_link_info.
      
      To ease disambiguation, bpf_attach_type variables are renamed from
      'type' to 'atype' when changed to cgroup_bpf_attach_type.
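
      A rough sketch of the new enum and mapping function, assuming the shape
      described above; the attach-type names shown are illustrative and not the
      full list:

          enum cgroup_bpf_attach_type {
                  CGROUP_BPF_ATTACH_TYPE_INVALID = -1,
                  CGROUP_INET_INGRESS = 0,
                  CGROUP_INET_EGRESS,
                  /* ... remaining valid cgroup attach types ... */
                  MAX_CGROUP_BPF_ATTACH_TYPE
          };

          /* Map a uapi bpf_attach_type to the internal enum; non-cgroup
           * attach types map to CGROUP_BPF_ATTACH_TYPE_INVALID.
           */
          static inline enum cgroup_bpf_attach_type
          to_cgroup_bpf_attach_type(enum bpf_attach_type attach_type)
          {
                  switch (attach_type) {
                  case BPF_CGROUP_INET_INGRESS:
                          return CGROUP_INET_INGRESS;
                  case BPF_CGROUP_INET_EGRESS:
                          return CGROUP_INET_EGRESS;
                  /* ... */
                  default:
                          return CGROUP_BPF_ATTACH_TYPE_INVALID;
                  }
          }
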
      Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210819092420.1984861-2-davemarchevsky@fb.com
  8. 17 August 2021, 3 commits
    • bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value · 7adfc6c9
      Committed by Andrii Nakryiko
      Add new BPF helper, bpf_get_attach_cookie(), which can be used by BPF programs
      to get access to a user-provided bpf_cookie value, specified during BPF
      program attachment (BPF link creation) time.
      
      Naming is hard, though. With the concept being named "BPF cookie", I've
      considered calling the helper:
        - bpf_get_cookie() -- seems too unspecific and easily mistaken for a socket
          cookie;
        - bpf_get_bpf_cookie() -- too much tautology;
        - bpf_get_link_cookie() -- would be ok, but while we create a BPF link to
          attach BPF program to BPF hook, it's still an "attachment" and the
          bpf_cookie is associated with BPF program attachment to a hook, not a BPF
          link itself. Technically, we could support bpf_cookie with old-style
          cgroup programs. So I ultimately rejected it in favor of
          bpf_get_attach_cookie().
      
      Currently all perf_event-backed BPF program types support
      bpf_get_attach_cookie() helper. Follow-up patches will add support for
      fentry/fexit programs as well.
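
      A minimal usage sketch, assuming a kprobe program attached via a BPF link
      that was created with a cookie (the probed function name is hypothetical):

          SEC("kprobe/kernel_clone")              /* hypothetical probe target */
          int BPF_KPROBE(handle_clone)
          {
                  /* Returns the u64 bpf_cookie supplied at link creation time,
                   * or 0 if no cookie was specified.
                   */
                  __u64 cookie = bpf_get_attach_cookie(ctx);

                  bpf_printk("invoked with cookie %llu", cookie);
                  return 0;
          }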
      
      While at it, mark bpf_tracing_func_proto() as static to make it obvious that
      it's only used from within kernel/trace/bpf_trace.c.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-7-andrii@kernel.org
    • bpf: Allow to specify user-provided bpf_cookie for BPF perf links · 82e6b1ee
      Committed by Andrii Nakryiko
      Add ability for users to specify custom u64 value (bpf_cookie) when creating
      BPF link for perf_event-backed BPF programs (kprobe/uprobe, perf_event,
      tracepoints).
      
      This is useful for cases when the same BPF program is used for attaching and
      processing invocation of different tracepoints/kprobes/uprobes in a generic
      fashion, but such that each invocation is distinguished from each other (e.g.,
      BPF program can look up additional information associated with a specific
      kernel function without having to rely on function IP lookups). This enables
      new use cases to be implemented simply and efficiently that previously were
      possible only through code generation (and thus multiple instances of almost
      identical BPF program) or compilation at runtime (BCC-style) on target hosts
      (even more expensive resource-wise). For uprobes it is not even possible in
      some cases to know the function IP beforehand (e.g., when attaching to a shared
      library without PID filtering, in which case base load address is not known
      for a library).
      
      This is done by storing u64 bpf_cookie in struct bpf_prog_array_item,
      corresponding to each attached and run BPF program. Given that cgroup BPF
      programs already use two 8-byte pointers for their needs and don't (yet?)
      support bpf_cookie, reuse that space through a union of cgroup_storage and
      the new bpf_cookie field.
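
      A sketch of the resulting layout, assuming the union described above
      (field names follow the commit description):

          struct bpf_prog_array_item {
                  struct bpf_prog *prog;
                  union {
                          /* used by cgroup BPF programs */
                          struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
                          /* used by perf_event-backed BPF programs */
                          u64 bpf_cookie;
                  };
          };
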
      
      Make it available to kprobe/tracepoint BPF programs through bpf_trace_run_ctx.
      This is set by BPF_PROG_RUN_ARRAY, used by kprobe/uprobe/tracepoint BPF
      program execution code, which luckily is now also split from
      BPF_PROG_RUN_ARRAY_CG. This run context will be utilized by a new BPF helper
      giving access to this user-provided cookie value from inside a BPF program.
      Generic perf_event BPF programs will access this value from perf_event itself
      through passed in BPF program context.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-6-andrii@kernel.org
    • bpf: Implement minimal BPF perf link · b89fbfbb
      Committed by Andrii Nakryiko
      Introduce a new type of BPF link - BPF perf link. This brings perf_event-based
      BPF program attachments (perf_event, tracepoints, kprobes, and uprobes) into
      the common BPF link infrastructure, making it possible to list all active
      perf_event-based attachments, auto-detach the BPF program from the perf_event
      when the link's FD is closed, and get generic BPF link fdinfo/get_info
      functionality.
      
      BPF_LINK_CREATE command expects perf_event's FD as target_fd. No extra flags
      are currently supported.
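
      A rough user-space sketch of creating such a link with the raw syscall,
      assuming an already loaded program FD and an open perf_event FD (error
      handling omitted):

          union bpf_attr attr = {};

          attr.link_create.prog_fd     = prog_fd;        /* FD of loaded BPF program */
          attr.link_create.target_fd   = perf_event_fd;  /* FD from perf_event_open() */
          attr.link_create.attach_type = BPF_PERF_EVENT; /* new attach type */
          attr.link_create.flags       = 0;              /* no extra flags supported */

          int link_fd = syscall(__NR_bpf, BPF_LINK_CREATE, &attr, sizeof(attr));
          /* Closing link_fd auto-detaches the program from the perf_event. */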
      
      Force-detaching and atomic BPF program updates are not yet implemented, but
      with perf_event-based BPF links we now have common framework for this without
      the need to extend ioctl()-based perf_event interface.
      
      One interesting consideration is a new value for bpf_attach_type, which
      BPF_LINK_CREATE command expects. Generally, it's either 1-to-1 mapping from
      bpf_attach_type to bpf_prog_type, or many-to-1 mapping from a subset of
      bpf_attach_types to one bpf_prog_type (e.g., see BPF_PROG_TYPE_SK_SKB or
      BPF_PROG_TYPE_CGROUP_SOCK). In this case, though, we have three different
      program types (KPROBE, TRACEPOINT, PERF_EVENT) using the same perf_event-based
      mechanism, so it's many bpf_prog_types to one bpf_attach_type. I chose to
      define a single BPF_PERF_EVENT attach type for all of them and adjust
      link_create()'s logic for checking correspondence between attach type and
      program type.
      
      The alternative would be to define three new attach types (e.g., BPF_KPROBE,
      BPF_TRACEPOINT, and BPF_PERF_EVENT), but that seemed like overkill, and
      BPF_KPROBE would cause naming conflicts with the BPF_KPROBE() macro defined by
      libbpf. I chose to not do this to avoid unnecessary proliferation of
      bpf_attach_type enum values and not have to deal with naming conflicts.
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Yonghong Song <yhs@fb.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lore.kernel.org/bpf/20210815070609.987780-5-andrii@kernel.org
  9. 16 July 2021, 3 commits
    • bpf: Add bpf_get_func_ip helper for kprobe programs · 9ffd9f3f
      Committed by Jiri Olsa
      Adding bpf_get_func_ip helper for BPF_PROG_TYPE_KPROBE programs,
      so it's now possible to call bpf_get_func_ip from both kprobe and
      kretprobe programs.
      
      The probed function's address is taken from 'struct kprobe::addr', which is
      defined for both kprobe and kretprobe.
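
      A minimal usage sketch for a kretprobe program (the probed function name is
      hypothetical):

          SEC("kretprobe/bpf_fentry_test1")       /* hypothetical probe target */
          int BPF_KRETPROBE(handle_ret)
          {
                  /* Address of the probed function, taken from kprobe::addr. */
                  __u64 ip = bpf_get_func_ip(ctx);

                  bpf_printk("returning from function at 0x%llx", ip);
                  return 0;
          }
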
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Link: https://lore.kernel.org/bpf/20210714094400.396467-5-jolsa@kernel.org
    • bpf: Add bpf_get_func_ip helper for tracing programs · 9b99edca
      Committed by Jiri Olsa
      Adding bpf_get_func_ip helper for BPF_PROG_TYPE_TRACING programs,
      specifically for all trampoline attach types.
      
      The traced function's IP address is stored at (ctx - 8), so there's no
      reason to actually call the helper; instead, the verifier fixes up the call
      instruction to return the [ctx - 8] value directly.
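
      A minimal sketch for a trampoline-attached program (the traced function is
      hypothetical); the verifier inlines the helper as described above:

          SEC("fentry/bpf_fentry_test2")          /* hypothetical traced function */
          int BPF_PROG(handle_fentry, int a, __u64 b)
          {
                  /* Rewritten by the verifier into a direct load of [ctx - 8]. */
                  __u64 ip = bpf_get_func_ip(ctx);

                  bpf_printk("entered function at 0x%llx", ip);
                  return 0;
          }
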
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210714094400.396467-4-jolsa@kernel.org
    • bpf: Introduce bpf timers. · b00628b1
      Committed by Alexei Starovoitov
      Introduce 'struct bpf_timer { __u64 :64; __u64 :64; };' that can be embedded
      in hash/array/lru maps as a regular field and helpers to operate on it:
      
      // Initialize the timer.
      // First 4 bits of 'flags' specify clockid.
      // Only CLOCK_MONOTONIC, CLOCK_REALTIME, CLOCK_BOOTTIME are allowed.
      long bpf_timer_init(struct bpf_timer *timer, struct bpf_map *map, int flags);
      
      // Configure the timer to call 'callback_fn' static function.
      long bpf_timer_set_callback(struct bpf_timer *timer, void *callback_fn);
      
      // Arm the timer to expire 'nsec' nanoseconds from the current time.
      long bpf_timer_start(struct bpf_timer *timer, u64 nsec, u64 flags);
      
      // Cancel the timer and wait for callback_fn to finish if it was running.
      long bpf_timer_cancel(struct bpf_timer *timer);
      
      Here is how a BPF program might look:
      struct map_elem {
          int counter;
          struct bpf_timer timer;
      };
      
      struct {
          __uint(type, BPF_MAP_TYPE_HASH);
          __uint(max_entries, 1000);
          __type(key, int);
          __type(value, struct map_elem);
      } hmap SEC(".maps");
      
      static int timer_cb(void *map, int *key, struct map_elem *val);
      /* val points to particular map element that contains bpf_timer. */
      
      SEC("fentry/bpf_fentry_test1")
      int BPF_PROG(test1, int a)
      {
          struct map_elem *val;
          int key = 0;
      
          val = bpf_map_lookup_elem(&hmap, &key);
          if (val) {
              bpf_timer_init(&val->timer, &hmap, CLOCK_REALTIME);
              bpf_timer_set_callback(&val->timer, timer_cb);
              bpf_timer_start(&val->timer, 1000 /* call timer_cb in 1 usec */, 0);
          }
      }
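
      A hypothetical sketch of what the timer_cb callback might do (not part of
      this patch); it runs when the timer expires and may re-arm the timer:

          static int timer_cb(void *map, int *key, struct map_elem *val)
          {
                  val->counter++;
                  /* Hypothetical: re-arm for another 1 usec with the same clockid
                   * chosen at bpf_timer_init() time.
                   */
                  bpf_timer_start(&val->timer, 1000, 0);
                  return 0;
          }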
      
      This patch adds helper implementations that rely on hrtimers
      to call bpf functions as timers expire.
      The following patches add necessary safety checks.
      
      Only programs with CAP_BPF are allowed to use bpf_timer.
      
      The amount of timers used by the program is constrained by
      the memcg recorded at map creation time.
      
      The bpf_timer_init() helper needs an explicit 'map' argument because inner maps
      are dynamic and not known at load time, while bpf_timer_set_callback()
      receives a hidden 'aux->prog' argument supplied by the verifier.
      
      The prog pointer is needed to do refcnting of bpf program to make sure that
      program doesn't get freed while the timer is armed. This approach relies on
      "user refcnt" scheme used in prog_array that stores bpf programs for
      bpf_tail_call. The bpf_timer_set_callback() will increment the prog refcnt which is
      paired with bpf_timer_cancel() that will drop the prog refcnt. The
      ops->map_release_uref is responsible for cancelling the timers and dropping
      prog refcnt when user space reference to a map reaches zero.
      This uref approach is done to make sure that Ctrl-C of user space process will
      not leave timers running forever unless the user space explicitly pinned a map
      that contained timers in bpffs.
      
      bpf_timer_init() and bpf_timer_set_callback() will return -EPERM if map doesn't
      have user references (is not held by open file descriptor from user space and
      not pinned in bpffs).
      
      The bpf_map_delete_elem() and bpf_map_update_elem() operations cancel
      and free the timer if given map element had it allocated.
      "bpftool map update" command can be used to cancel timers.
      
      The 'struct bpf_timer' is explicitly __attribute__((aligned(8))) because
      a '__u64 :64' anonymous bitfield has 1-byte alignment despite taking 8 bytes of padding.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Andrii Nakryiko <andrii@kernel.org>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Link: https://lore.kernel.org/bpf/20210715005417.78572-4-alexei.starovoitov@gmail.com
  10. 15 July 2021, 2 commits
  11. 16 June 2021, 2 commits
  12. 26 May 2021, 1 commit
    • xdp: Extend xdp_redirect_map with broadcast support · e624d4ed
      Committed by Hangbin Liu
      This patch adds two flags BPF_F_BROADCAST and BPF_F_EXCLUDE_INGRESS to
      extend xdp_redirect_map for broadcast support.
      
      With BPF_F_BROADCAST the packet will be broadcast to all the interfaces
      in the map. With BPF_F_EXCLUDE_INGRESS the ingress interface will be
      excluded when broadcasting.
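
      A minimal XDP sketch using the new flags, assuming a device map populated
      by user space (map name and size are illustrative):

          struct {
                  __uint(type, BPF_MAP_TYPE_DEVMAP_HASH);
                  __uint(max_entries, 4);
                  __type(key, int);
                  __type(value, int);
          } forward_map SEC(".maps");

          SEC("xdp")
          int xdp_broadcast(struct xdp_md *ctx)
          {
                  /* The key is ignored with BPF_F_BROADCAST; the packet is cloned
                   * to every interface in the map except the ingress one.
                   */
                  return bpf_redirect_map(&forward_map, 0,
                                          BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
          }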
      
      When getting the devices in dev hash map via dev_map_hash_get_next_key(),
      there is a possibility that we fall back to the first key when a device
      was removed. This will duplicate packets on some interfaces. So just walk
      all the buckets to avoid this issue. For the dev array map, we also walk the
      whole map to find valid interfaces.
      
      Function bpf_clear_redirect_map() was removed in
      commit ee75aef2 ("bpf, xdp: Restructure redirect actions").
      Add it back as we need to use ri->map again.
      
      With test topology:
        +-------------------+             +-------------------+
        | Host A (i40e 10G) |  ---------- | eno1(i40e 10G)    |
        +-------------------+             |                   |
                                          |   Host B          |
        +-------------------+             |                   |
        | Host C (i40e 10G) |  ---------- | eno2(i40e 10G)    |
        +-------------------+             |                   |
                                          |          +------+ |
                                          | veth0 -- | Peer | |
                                          | veth1 -- |      | |
                                          | veth2 -- |  NS  | |
                                          |          +------+ |
                                          +-------------------+
      
      On Host A:
       # pktgen/pktgen_sample03_burst_single_flow.sh -i eno1 -d $dst_ip -m $dst_mac -s 64
      
      On Host B (Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz, 128G Memory):
      Use xdp_redirect_map and xdp_redirect_map_multi in samples/bpf for testing.
      All the veth peers in the NS have an XDP_DROP program loaded. The
      forward_map max_entries in xdp_redirect_map_multi is modified to 4.
      
      Testing the performance impact on the regular xdp_redirect path with and
      without patch (to check impact of additional check for broadcast mode):
      
      Version          | Test                                | Generic | Native
      5.12 rc4         | redirect_map        i40e->i40e      |    2.0M |  9.7M
      5.12 rc4         | redirect_map        i40e->veth      |    1.7M | 11.8M
      5.12 rc4 + patch | redirect_map        i40e->i40e      |    2.0M |  9.6M
      5.12 rc4 + patch | redirect_map        i40e->veth      |    1.7M | 11.7M
      
      Testing the performance when cloning packets with the redirect_map_multi
      test, using a redirect map size of 4, filled with 1-3 devices:
      
      Version          | Test                                | Generic | Native
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x1) |    1.7M | 11.4M
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x2) |    1.1M |  4.3M
      5.12 rc4 + patch | redirect_map multi  i40e->veth (x3) |    0.8M |  2.6M
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Link: https://lore.kernel.org/bpf/20210519090747.1655268-3-liuhangbin@gmail.com
  13. 25 May 2021, 1 commit
  14. 19 May 2021, 5 commits
  15. 20 April 2021, 1 commit
  16. 14 April 2021, 1 commit
  17. 13 April 2021, 1 commit
  18. 12 April 2021, 1 commit
  19. 02 April 2021, 1 commit
  20. 27 March 2021, 1 commit
    • bpf: Support bpf program calling kernel function · e6ac2450
      Committed by Martin KaFai Lau
      This patch adds support to the BPF verifier to allow a bpf program to call
      kernel functions directly.
      
      The use case included in this set is to allow bpf-tcp-cc to directly
      call some tcp-cc helper functions (e.g. "tcp_cong_avoid_ai()").  Those
      functions have already been used by some kernel tcp-cc implementations.
      
      This set will also allow the bpf-tcp-cc program to directly call the
      kernel tcp-cc implementation. For example, a bpf_dctcp may only want to
      implement its own dctcp_cwnd_event() and reuse other dctcp_*() directly
      from the kernel tcp_dctcp.c instead of reimplementing (or
      copy-and-pasting) them.
      
      The tcp-cc kernel functions mentioned above will be whitelisted
      for the struct_ops bpf-tcp-cc programs to use in a later patch.
      The whitelisted functions are not bound to a fixed ABI contract.
      Those functions are already used by the existing kernel tcp-cc.
      If any of them changes, both in-tree and out-of-tree kernel tcp-cc
      implementations have to be changed.  The same goes for the struct_ops
      bpf-tcp-cc programs which have to be adjusted accordingly.
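
      A rough BPF-side sketch of how a struct_ops bpf-tcp-cc program might call
      such a kernel function once the later libbpf/selftests patches land; the
      struct_ops member name and the exact extern declaration are illustrative
      assumptions:

          /* Declare the kernel function so the program can call it directly. */
          extern void tcp_cong_avoid_ai(struct tcp_sock *tp, __u32 w, __u32 acked) __ksym;

          SEC("struct_ops/bpf_cc_cong_avoid")     /* hypothetical bpf-tcp-cc member */
          void BPF_PROG(bpf_cc_cong_avoid, struct sock *sk, __u32 ack, __u32 acked)
          {
                  struct tcp_sock *tp = (struct tcp_sock *)sk;

                  /* Reuse the in-kernel helper instead of reimplementing it. */
                  tcp_cong_avoid_ai(tp, tp->snd_cwnd, acked);
          }
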
      
      This patch is to make the required changes in the bpf verifier.
      
      The first change is in btf.c: it adds a case in "btf_check_func_arg_match()".
      When the passed-in "btf->kernel_btf == true", it means matching the
      verifier regs' states with a kernel function.  This will handle the
      PTR_TO_BTF_ID reg.  It also maps PTR_TO_SOCK_COMMON, PTR_TO_SOCKET,
      and PTR_TO_TCP_SOCK to its kernel's btf_id.
      
      In the later libbpf patch, the insn calling a kernel function will
      look like:
      
      insn->code == (BPF_JMP | BPF_CALL)
      insn->src_reg == BPF_PSEUDO_KFUNC_CALL /* <- new in this patch */
      insn->imm == func_btf_id /* btf_id of the running kernel */
      
      [ For the future calling function-in-kernel-module support, an array
        of module btf_fds can be passed at the load time and insn->off
        can be used to index into this array. ]
      
      At the early stage of verifier, the verifier will collect all kernel
      function calls into "struct bpf_kfunc_desc".  Those
      descriptors are stored in "prog->aux->kfunc_tab" and will
      be available to the JIT.  Since this "add" operation is similar
      to the current "add_subprog()" and looking for the same insn->code,
      they are done together in the new "add_subprog_and_kfunc()".
      
      In the "do_check()" stage, the new "check_kfunc_call()" is added
      to verify the kernel function call instruction:
      1. Ensure the kernel function can be used by a particular BPF_PROG_TYPE.
         A new bpf_verifier_ops "check_kfunc_call" is added to do that.
         The bpf-tcp-cc struct_ops program will implement this function in
         a later patch.
      2. Call "btf_check_kfunc_args_match()" to ensure the regs can be
         used as the args of a kernel function.
      3. Mark the regs' type, subreg_def, and zext_dst.
      
      At the later do_misc_fixups() stage, the new fixup_kfunc_call()
      will replace the insn->imm with the function address (relative
      to __bpf_call_base).  If needed, the jit can find the btf_func_model
      by calling the new bpf_jit_find_kfunc_model(prog, insn).
      With the imm set to the function address, "bpftool prog dump xlated"
      will be able to display the kernel function calls the same way as
      it displays other bpf helper calls.
      
      A gpl_compatible program is required to call kernel functions.
      
      This feature currently requires JIT.
      
      The verifier selftests are adjusted because of the changes in
      the verbose log in add_subprog_and_kfunc().
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Link: https://lore.kernel.org/bpf/20210325015142.1544736-1-kafai@fb.com
  21. 05 March 2021, 4 commits
  22. 27 February 2021, 2 commits
  23. 25 February 2021, 1 commit
  24. 13 February 2021, 1 commit
    • bpf: Add BPF-helper for MTU checking · 34b2021c
      Committed by Jesper Dangaard Brouer
      This BPF-helper bpf_check_mtu() works for both XDP and TC-BPF programs.
      
      The SKB object is complex and the skb->len value (accessible from the
      BPF-prog) also includes the length of any extra GRO/GSO segments, but
      without taking into account that these GRO/GSO segments get transport (L4)
      and network (L3) headers added before being transmitted. Thus, this
      BPF-helper is created so that the BPF-programmer doesn't need to handle
      these details in the BPF-prog.
      
      The API is designed to help the BPF-programmer who wants to do packet
      context size changes, which involve other helpers. These other helpers
      usually do a delta size adjustment. This helper also supports a delta
      size (len_diff), which allows the BPF-programmer to reuse arguments needed
      by these other helpers and perform the MTU check prior to doing any actual
      size adjustment of the packet context.
      
      It is on purpose that we allow the len adjustment to become a negative
      result that will still pass the MTU check. This might seem weird, but it's
      not this helper's responsibility to "catch" wrong len_diff adjustments.
      Other helpers will take care of these checks, if the BPF-programmer chooses
      to do an actual size adjustment.
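
      A minimal TC-BPF sketch, assuming a planned size increase of 64 bytes (the
      delta and the return handling are illustrative):

          SEC("tc")
          int tc_mtu_guard(struct __sk_buff *ctx)
          {
                  __u32 mtu_len = 0;
                  __s32 len_diff = 64;    /* hypothetical planned size increase */

                  /* ifindex 0 means: use the current netdev for the MTU lookup.
                   * A non-zero return means the adjusted length would exceed the MTU.
                   */
                  if (bpf_check_mtu(ctx, 0, &mtu_len, len_diff, 0))
                          return TC_ACT_SHOT;

                  /* Safe to call the actual size-adjusting helpers with len_diff. */
                  return TC_ACT_OK;
          }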
      
      V14:
       - Improve man-page desc of len_diff.
      
      V13:
       - Enforce flag BPF_MTU_CHK_SEGS cannot use len_diff.
      
      V12:
       - Simplify segment check that calls skb_gso_validate_network_len.
       - Helpers should return long
      
      V9:
      - Use dev->hard_header_len (instead of ETH_HLEN)
      - Annotate with unlikely req from Daniel
      - Fix logic error using skb_gso_validate_network_len from Daniel
      
      V6:
      - Took John's advice and dropped BPF_MTU_CHK_RELAX
      - Returned MTU is kept at L3-level (like fib_lookup)
      
      V4: Lot of changes
       - ifindex 0 now use current netdev for MTU lookup
       - rename helper from bpf_mtu_check to bpf_check_mtu
       - fix bug for GSO pkt length (as skb->len is total len)
       - remove __bpf_len_adj_positive, simply allow negative len adj
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/bpf/161287790461.790810.3429728639563297353.stgit@firesoul