1. 29 Jun, 2018 1 commit
    • D
      bpf: Change bpf_fib_lookup to return lookup status · 4c79579b
      Committed by David Ahern
      For ACLs implemented using either FIB rules or FIB entries, the BPF
      program needs the FIB lookup status to be able to drop the packet.
      Since the bpf_fib_lookup API has not reached a released kernel yet,
      change the return code to contain an encoding of the FIB lookup
      result and return the nexthop device index in the params struct.
      
      In addition, inform the BPF program of any post FIB lookup reason as
      to why the packet needs to go up the stack.
      
      The fib result for unicast routes must have an egress device, so remove
      the check that it is non-NULL.
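How a program might act on the lookup status can be sketched in plain user-space C. This is only an illustration: the enum names, the `lookup_params` struct and the `decide()` function below are hypothetical stand-ins, not the definitions from include/uapi/linux/bpf.h.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the lookup status codes; the real names and
 * values are defined in include/uapi/linux/bpf.h and may differ. */
enum lookup_status {
    LKUP_SUCCESS = 0,   /* route and neighbor found: forward */
    LKUP_BLACKHOLE,     /* FIB asks for the packet to be dropped */
    LKUP_UNREACHABLE,   /* FIB asks for the packet to be dropped */
    LKUP_PROHIBIT,      /* FIB asks for the packet to be dropped */
    LKUP_NOT_FWDED,     /* e.g. local delivery: hand to the stack */
    LKUP_NO_NEIGH       /* neighbor unresolved: hand to the stack */
};

/* Hypothetical verdicts, loosely modeled on XDP actions. */
enum verdict { VERDICT_DROP, VERDICT_PASS, VERDICT_REDIRECT };

/* Minimal stand-in for the params struct: only the egress device index. */
struct lookup_params { uint32_t ifindex; };

/* Map a lookup status to a verdict. On success the egress device index is
 * read from the params struct rather than from the return code. */
static enum verdict decide(enum lookup_status status,
                           const struct lookup_params *params,
                           uint32_t *egress_ifindex)
{
    switch (status) {
    case LKUP_SUCCESS:
        *egress_ifindex = params->ifindex;
        return VERDICT_REDIRECT;
    case LKUP_BLACKHOLE:
    case LKUP_UNREACHABLE:
    case LKUP_PROHIBIT:
        return VERDICT_DROP;    /* ACL implemented via FIB rules or entries */
    default:
        return VERDICT_PASS;    /* any other reason: go up the stack */
    }
}
```

The point of the sketch is the split described above: the status travels in the return value, while the nexthop device index travels in the params struct.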
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      4c79579b
  2. 04 Jun, 2018 2 commits
    • D
      bpf: flowlabel in bpf_fib_lookup should be flowinfo · bd3a08aa
      Committed by David Ahern
      As Michal noted, the flow struct takes both the flow label and priority.
      Update the bpf_fib_lookup API to note that it is flowinfo and not just
      the flow label.
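The distinction between flowinfo and the flow label can be illustrated with the first 32 bits of an IPv6 header (4 bits version, 8 bits traffic class, 20 bits flow label). A minimal sketch in user-space C, assuming the word has already been converted to host byte order; the macro and function names are made up for this example.

```c
#include <stdint.h>

/* Host-byte-order masks over the first 32 bits of an IPv6 header.
 * Names are local to this sketch. */
#define FLOWINFO_MASK  0x0FFFFFFFu  /* traffic class + flow label */
#define FLOWLABEL_MASK 0x000FFFFFu  /* flow label only */

/* Traffic class (priority) and flow label together. */
static uint32_t flowinfo_of(uint32_t first_word)
{
    return first_word & FLOWINFO_MASK;
}

/* The 20-bit flow label alone. */
static uint32_t flowlabel_of(uint32_t first_word)
{
    return first_word & FLOWLABEL_MASK;
}

/* The 8-bit traffic class that flowinfo carries on top of the label. */
static uint8_t tclass_of(uint32_t first_word)
{
    return (uint8_t)((first_word >> 20) & 0xFFu);
}
```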
      
      Cc: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      bd3a08aa
    • Y
      bpf: implement bpf_get_current_cgroup_id() helper · bf6fa2c8
      Committed by Yonghong Song
      bpf has been used extensively for tracing. For example, bcc
      contains an almost full set of bpf-based tools to trace kernel
      and user functions/events. Most tracing tools are currently
      either filtered based on pid or system-wide.
      
      Containers have been used quite extensively in industry and
      cgroup is often used together to provide resource isolation
      and protection. Several processes may run inside the same
      container. It is often desirable to get container-level tracing
      results as well, e.g. syscall count, function count, I/O
      activity, etc.
      
      This patch implements a new helper, bpf_get_current_cgroup_id(),
      which will return cgroup id based on the cgroup within which
      the current task is running.
      
      A later patch will provide an example to show that
      userspace can get the same cgroup id so it could
      configure a filter or policy in the bpf program based on
      task cgroup id.
      
      The helper is currently implemented for tracing. It can
      be added to other program types as well when needed.
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      bf6fa2c8
  3. 03 Jun, 2018 2 commits
  4. 02 Jun, 2018 1 commit
    • D
      bpf: fix uapi hole for 32 bit compat applications · 36f9814a
      Committed by Daniel Borkmann
      In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the
      case of struct bpf_map_info but also struct bpf_prog_info. In net-next,
      commit b85fab0e ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
      added a bitfield to the latter to expose some flags related to programs. Thus,
      add an unnamed __u32 bitfield to both so that the alignment stays the same
      in both the 32 and 64 bit cases, and can be naturally extended from there
      as in b85fab0e.
      
      Before:
      
        # file test.o
        test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
        # pahole test.o
        struct bpf_map_info {
      	__u32                      type;                 /*     0     4 */
      	__u32                      id;                   /*     4     4 */
      	__u32                      key_size;             /*     8     4 */
      	__u32                      value_size;           /*    12     4 */
      	__u32                      max_entries;          /*    16     4 */
      	__u32                      map_flags;            /*    20     4 */
      	char                       name[16];             /*    24    16 */
      	__u32                      ifindex;              /*    40     4 */
      	__u64                      netns_dev;            /*    44     8 */
      	__u64                      netns_ino;            /*    52     8 */
      
      	/* size: 64, cachelines: 1, members: 10 */
      	/* padding: 4 */
        };
      
      After (same as on 64 bit):
      
        # file test.o
        test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
        # pahole test.o
        struct bpf_map_info {
      	__u32                      type;                 /*     0     4 */
      	__u32                      id;                   /*     4     4 */
      	__u32                      key_size;             /*     8     4 */
      	__u32                      value_size;           /*    12     4 */
      	__u32                      max_entries;          /*    16     4 */
      	__u32                      map_flags;            /*    20     4 */
      	char                       name[16];             /*    24    16 */
      	__u32                      ifindex;              /*    40     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	__u64                      netns_dev;            /*    48     8 */
      	__u64                      netns_ino;            /*    56     8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      
      	/* size: 64, cachelines: 1, members: 10 */
      	/* sum members: 60, holes: 1, sum holes: 4 */
        };
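The effect of the unnamed bitfield can be checked at compile time. The struct below is a stand-in that mirrors the member list from the pahole output above, using `<stdint.h>` types instead of the kernel's `__u32`/`__u64`; the name `map_info_padded` is made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring the layout shown above, with the hole made explicit. */
struct map_info_padded {
    uint32_t type;
    uint32_t id;
    uint32_t key_size;
    uint32_t value_size;
    uint32_t max_entries;
    uint32_t map_flags;
    char     name[16];
    uint32_t ifindex;
    uint32_t :32;           /* unnamed bitfield fills bytes 44..47 */
    uint64_t netns_dev;
    uint64_t netns_ino;
};

/* With the explicit padding, these offsets no longer depend on whether the
 * ABI aligns 64-bit members to 4 or 8 bytes. */
_Static_assert(offsetof(struct map_info_padded, ifindex) == 40,
               "ifindex should sit at offset 40");
_Static_assert(offsetof(struct map_info_padded, netns_dev) == 48,
               "netns_dev should sit at offset 48");
_Static_assert(sizeof(struct map_info_padded) == 64,
               "struct should be 64 bytes");
```

Because the padding is an unnamed bitfield, a later change can give part of it a name, for example to carry flags, without moving any other member.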
      Reported-by: Dmitry V. Levin <ldv@altlinux.org>
      Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
      Fixes: 52775b33 ("bpf: offload: report device information about offloaded maps")
      Fixes: 675fc275 ("bpf: offload: report device information for offloaded programs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      36f9814a
  5. 30 May, 2018 3 commits
  6. 28 May, 2018 1 commit
    • A
      bpf: Hooks for sys_sendmsg · 1cedee13
      Committed by Andrey Ignatov
      In addition to already existing BPF hooks for sys_bind and sys_connect,
      the patch provides new hooks for sys_sendmsg.
      
      It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR`
      that provides access to the socket itself (properties like family, type,
      protocol) and user-passed `struct sockaddr *` so that BPF program can
      override destination IP and port for system calls such as sendto(2) or
      sendmsg(2) and/or assign source IP to the socket.
      
      The hooks are implemented as two new attach types:
      `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and
      UDPv6 correspondingly.
      
      UDPv4 and UDPv6 have separate attach types for the same reason as the
      sys_bind and sys_connect hooks, i.e. to prevent reading from / writing to e.g.
      user_ip6 fields when the user passes sockaddr_in, since that would be out of bounds.
      
      The difference from the existing hooks is that the sys_sendmsg hooks are
      implemented only for unconnected UDP.
      
      For TCP it doesn't make sense to change user-provided `struct sockaddr *`
      at sendto(2)/sendmsg(2) time since socket either was already connected
      and has source/destination set or wasn't connected and call to
      sendto(2)/sendmsg(2) would lead to ENOTCONN anyway.
      
      Connected UDP is already handled by sys_connect hooks that can override
      source/destination at connect time and use fast-path later, i.e. these
      hooks don't affect UDP fast-path.
      
      Rewriting source IP is implemented differently than that in sys_connect
      hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to
      just bind socket to desired local IP address since source IP can be set
      on per-packet basis by using ancillary data (cmsg(3)). So no matter if
      socket is bound or not, source IP has to be rewritten on every call to
      sys_sendmsg.
      
      To do so, two new fields are added to UAPI `struct bpf_sock_addr`:
      * `msg_src_ip4` to set source IPv4 for UDPv4;
      * `msg_src_ip6` to set source IPv6 for UDPv6.
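The per-call rewrite described above can be modeled in user-space C. This is a sketch only: `mock_sock_addr` is a trimmed-down, hypothetical stand-in for the UAPI struct, the addresses and ports are invented, and a real program would be compiled for BPF and attached with the `BPF_CGROUP_UDP4_SENDMSG` attach type.

```c
#include <stdint.h>
#include <arpa/inet.h>

/* Hypothetical, trimmed-down stand-in for `struct bpf_sock_addr`.
 * All fields are kept in network byte order. */
struct mock_sock_addr {
    uint32_t user_ip4;     /* destination IPv4 passed by the caller */
    uint32_t user_port;    /* destination port passed by the caller */
    uint32_t msg_src_ip4;  /* source IPv4 to use for this message */
};

/* Rewrite destination and source for one sendmsg call.
 * Returns 1 to let the call proceed (assumed "allow" verdict). */
static int rewrite_udp4_sendmsg(struct mock_sock_addr *ctx)
{
    /* Made-up policy: redirect 10.0.0.1:53 to 10.0.0.2:5353. */
    if (ctx->user_ip4 == htonl(0x0A000001u) &&
        ctx->user_port == htons(53)) {
        ctx->user_ip4 = htonl(0x0A000002u);
        ctx->user_port = htons(5353);
        /* The source is set on every call, since binding the socket
         * is not enough for unconnected UDP. */
        ctx->msg_src_ip4 = htonl(0x0A0000FEu);
    }
    return 1;
}
```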
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      1cedee13
  7. 25 May, 2018 1 commit
    • Y
      bpf: introduce bpf subcommand BPF_TASK_FD_QUERY · 41bdc4b4
      Committed by Yonghong Song
      Currently, suppose a userspace application has loaded a bpf program
      and attached it to a tracepoint/kprobe/uprobe, and a bpf
      introspection tool, e.g., bpftool, wants to show which bpf program
      is attached to which tracepoint/kprobe/uprobe. Such attachment
      information will be really useful to understand the overall bpf
      deployment in the system.
      
      There is a name field (16 bytes) for each program, which could
      be used to encode the attachment point. This approach has some
      drawbacks. First, a bpftool user (e.g., an admin) may not
      really understand the association between the name and the
      attachment point. Second, if one program is attached to multiple
      places, encoding a proper name which can imply all these
      attachments becomes difficult.
      
      This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY.
      Given a pid and fd, if the <pid, fd> is associated with a
      tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return
         . prog_id
         . tracepoint name, or
         . k[ret]probe funcname + offset or kernel addr, or
         . u[ret]probe filename + offset
      to the userspace.
      The user can use "bpftool prog" to find more information about
      bpf program itself with prog_id.
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      41bdc4b4
  8. 24 May, 2018 4 commits
    • M
      ipv6: sr: Add seg6local action End.BPF · 004d4b27
      Committed by Mathieu Xhonneux
      This patch adds the End.BPF action to the LWT seg6local infrastructure.
      This action works like any other seg6local End action, meaning that an IPv6
      header with SRH is needed, whose DA has to be equal to the SID of the
      action. It will also advance the SRH to the next segment, so the BPF program
      does not have to take care of this.
      
      Since the BPF program must not be a source of instability in the kernel, it
      is important to ensure that the integrity of the packet is maintained
      before yielding it back to the IPv6 layer. The hook hence keeps track of
      whether the SRH has been altered through the helpers, and re-validates its
      content if needed with seg6_validate_srh. The state kept for validation is
      stored in a per-CPU buffer. The BPF program is not allowed to directly
      write into the packet, and only some fields of the SRH can be altered
      through the helper bpf_lwt_seg6_store_bytes.
      
      Performance profiling has shown that the SRH re-validation does not induce
      a significant overhead. If the altered SRH is deemed invalid, the packet
      is dropped.
      
      This validation is also done before executing any action through
      bpf_lwt_seg6_action, and will not be performed again if the SRH is not
      modified after calling the action.
      
      The BPF program may return 3 types of return codes:
          - BPF_OK: the End.BPF action will look up the next destination through
                   seg6_lookup_nexthop.
          - BPF_REDIRECT: if an action has been executed through the
                bpf_lwt_seg6_action helper, the BPF program should return this
                value, as the skb's destination is already set and the default
                lookup should not be performed.
          - BPF_DROP : the packet will be dropped.
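The handling of the three return codes, together with the re-validation rule described earlier, can be summarized as a small decision function. This is a user-space sketch with hypothetical names; it does not reproduce the kernel's actual control flow or the numeric values of `BPF_OK`, `BPF_REDIRECT` and `BPF_DROP`.

```c
/* Hypothetical stand-ins for the program's return codes. */
enum prog_ret { RET_OK, RET_REDIRECT, RET_DROP };

/* What the hook does with the packet after the program has run. */
enum next_step {
    STEP_LOOKUP_NEXTHOP,     /* default lookup of the next destination */
    STEP_USE_EXISTING_DST,   /* destination was already set by an action */
    STEP_DROP_PACKET
};

/* srh_modified: non-zero if a helper altered the SRH.
 * srh_valid:    result of re-validating the SRH; only consulted when
 *               the SRH was modified. */
static enum next_step after_program(enum prog_ret ret,
                                    int srh_modified, int srh_valid)
{
    if (ret == RET_DROP)
        return STEP_DROP_PACKET;
    if (srh_modified && !srh_valid)
        return STEP_DROP_PACKET;   /* altered SRH deemed invalid */
    if (ret == RET_REDIRECT)
        return STEP_USE_EXISTING_DST;
    return STEP_LOOKUP_NEXTHOP;
}
```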
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      004d4b27
    • M
      bpf: Add IPv6 Segment Routing helpers · fe94cc29
      Committed by Mathieu Xhonneux
      The BPF seg6local hook should be powerful enough to enable users to
      implement most of the use-cases one could think of. After some thinking,
      we figured out that the following actions should be possible on a SRv6
      packet, requiring 3 specific helpers:
          - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH
          - bpf_lwt_seg6_adjust_srh: Grow or shrink an SRH
                                     (to add/delete TLVs)
          - bpf_lwt_seg6_action: Apply some SRv6 network programming actions
                                 (specifically End.X, End.T, End.B6 and
                                  End.B6.Encap)
      
      The specifications of these helpers are provided in the patch (see
      include/uapi/linux/bpf.h).
      
      The non-sensitive fields of the SRH are the following: flags, tag and
      TLVs. The other fields cannot be modified, to maintain the SRH
      integrity. Flags, tag and TLVs can easily be modified as their validity
      can be checked afterwards via seg6_validate_srh. It is not allowed to
      modify the segments directly. To add segments on the path, one
      should stack a new SRH using the End.B6 action via
      bpf_lwt_seg6_action.
      
      Growing, shrinking or editing TLVs via the helpers will flag the SRH as
      invalid, and it will have to be re-validated before re-entering the IPv6
      layer. This flag is stored in a per-CPU buffer, along with the current
      header length in bytes.
      
      Storing the SRH len in bytes in the control block is mandatory when using
      bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH
      len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes
      boundary). When adding/deleting TLVs within the BPF program, the SRH may
      temporarily be in an invalid state where its length cannot be rounded to 8
      bytes without remainder, hence the need to store the length in bytes
      separately. The caller of the BPF program can then ensure that the SRH's
      final length is valid using this value. Again, a final SRH modified by a
      BPF program which doesn’t respect the 8-bytes boundary will be discarded
      as it will be considered as invalid.
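The length bookkeeping above comes down to simple arithmetic on the byte length. A minimal sketch, assuming the usual IPv6 extension header convention that the Hdr Ext Len field counts 8-octet units excluding the first 8 octets; the function names are made up for this example.

```c
#include <stdint.h>

/* Non-zero if a byte length can be expressed exactly in the
 * Hdr Ext Len field (a multiple of 8, at least 8 bytes). */
static int srh_len_is_valid(uint32_t len_bytes)
{
    return len_bytes >= 8 && len_bytes % 8 == 0;
}

/* Number of padding bytes needed to reach the next 8-byte boundary. */
static uint32_t srh_pad_needed(uint32_t len_bytes)
{
    return (8 - len_bytes % 8) % 8;
}

/* Hdr Ext Len value for a valid byte length: 8-octet units,
 * not counting the first 8 octets. Call only when the length is valid. */
static uint8_t srh_hdrlen_field(uint32_t len_bytes)
{
    return (uint8_t)(len_bytes / 8 - 1);
}
```

For instance, after a 3-byte TLV is appended to a 24-byte SRH, the byte length is 27: it cannot be stored in the header field, which is why the length in bytes is tracked separately until padding brings it back to a multiple of 8.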
      
      Finally, a fourth helper is provided, bpf_lwt_push_encap, which is
      available from the LWT BPF IN hook, but not from the seg6local BPF one.
      This helper encapsulates a Segment Routing Header (either with
      a new outer IPv6 header, or by inlining it directly in the existing IPv6
      header) into a non-SRv6 packet. This helper is required if we want to
      offer the possibility to dynamically encapsulate an SRH for non-SRv6 packets,
      as the BPF seg6local hook only works on traffic already containing a SRH.
      This is the BPF equivalent of the seg6 LWT infrastructure, which achieves
      the same purpose but with a static SRH per route.
      
      These helpers require CONFIG_IPV6=y (and not =m).
      Signed-off-by: Mathieu Xhonneux <m.xhonneux@gmail.com>
      Acked-by: David Lebrun <dlebrun@google.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      fe94cc29
    • S
      bpf: get JITed image lengths of functions via syscall · 815581c1
      Committed by Sandipan Das
      This adds two new fields to struct bpf_prog_info. For
      multi-function programs, these fields can be used to pass
      a list of the JITed image lengths of each function for a
      given program to userspace using the bpf system call with
      the BPF_OBJ_GET_INFO_BY_FD command.
      
      This can be used by userspace applications like bpftool
      to split up the contiguous JITed dump, also obtained via
      the system call, into more relatable chunks corresponding
      to each function.
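Splitting a contiguous dump with a list of per-function lengths could look like the following user-space sketch. The `chunk` struct and `split_image()` are hypothetical; a real tool would first obtain the image and the length list through `BPF_OBJ_GET_INFO_BY_FD`.

```c
#include <stddef.h>
#include <stdint.h>

/* One function's slice of the contiguous JITed image. */
struct chunk {
    const uint8_t *start;
    uint32_t len;
};

/* Fill out[0..nr_funcs-1] with per-function slices of the image.
 * Returns the number of chunks, or -1 if the lengths overrun the image. */
static int split_image(const uint8_t *image, uint32_t image_len,
                       const uint32_t *func_lens, uint32_t nr_funcs,
                       struct chunk *out)
{
    uint32_t off = 0;

    for (uint32_t i = 0; i < nr_funcs; i++) {
        if (func_lens[i] > image_len - off)
            return -1;              /* inconsistent length list */
        out[i].start = image + off;
        out[i].len = func_lens[i];
        off += func_lens[i];
    }
    return (int)nr_funcs;
}
```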
      Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      815581c1
    • S
      bpf: get kernel symbol addresses via syscall · dbecd738
      Committed by Sandipan Das
      This adds two new fields to struct bpf_prog_info. For
      multi-function programs, these fields can be used to pass
      a list of kernel symbol addresses for all functions in a
      given program to userspace using the bpf system call with
      the BPF_OBJ_GET_INFO_BY_FD command.
      
      When bpf_jit_kallsyms is enabled, we can get the address
      of the corresponding kernel symbol for a callee function
      and resolve the symbol's name. The address is determined
      by adding the value of the call instruction's imm field
      to __bpf_call_base. This offset gets assigned to the imm
      field by the verifier.
      
      For some architectures, such as powerpc64, the imm field
      is not large enough to hold this offset.
      
      We resolve this by:
      
      [1] Assigning the subprog id to the imm field of a call
          instruction in the verifier instead of the offset of
          the callee's symbol's address from __bpf_call_base.
      
      [2] Determining the address of a callee's corresponding
          symbol by using the imm field as an index for the
          list of kernel symbol addresses now available from
          the program info.
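The two resolution schemes can be contrasted in a short user-space sketch. Both functions are illustrative only: `offset_fits_imm()` shows why a 32-bit imm field may be too small to hold an offset from a base address, and `callee_addr()` shows the index-based lookup from steps [1] and [2]. The names are made up for this example.

```c
#include <stdint.h>

/* Old scheme: imm holds a signed 32-bit offset from a base address.
 * Returns non-zero if the callee's offset from the base fits. */
static int offset_fits_imm(uint64_t callee, uint64_t base)
{
    int64_t off = (int64_t)(callee - base);

    return off >= INT32_MIN && off <= INT32_MAX;
}

/* New scheme: imm holds the subprog id, used as an index into the list
 * of kernel symbol addresses. Returns 0 if the index is out of range. */
static uint64_t callee_addr(const uint64_t *ksyms, uint32_t nr_ksyms,
                            int32_t imm)
{
    if (imm < 0 || (uint32_t)imm >= nr_ksyms)
        return 0;
    return ksyms[imm];
}
```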
      Suggested-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Sandipan Das <sandipan@linux.vnet.ibm.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      dbecd738
  9. 23 May, 2018 1 commit
  10. 19 May, 2018 1 commit
  11. 16 May, 2018 1 commit
  12. 11 May, 2018 1 commit
    • D
      bpf: Provide helper to do forwarding lookups in kernel FIB table · 87f5fc7e
      Committed by David Ahern
      Provide a helper for doing a FIB and neighbor lookup in the kernel
      tables from an XDP program. The helper provides a fastpath for forwarding
      packets. If the packet is a local delivery or for any reason is not a
      simple lookup and forward, the packet continues up the stack.
      
      If it is to be forwarded, the forwarding can be done directly if the
      neighbor is already known. If the neighbor does not exist, the first
      few packets go up the stack for neighbor resolution. Once resolved, the
      xdp program provides the fast path.
      
      On successful lookup the nexthop dmac, current device smac and egress
      device index are returned.
      
      The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6
      are implemented in this patch. The API includes layer 4 parameters if
      the XDP program chooses to do deep packet inspection, to allow comparison
      against ACLs implemented as FIB rules.
      
      Header rewrite is left to the XDP program.
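What the header rewrite amounts to can be sketched in user-space C, using the nexthop dmac, device smac and egress device index that a successful lookup returns. The two structs are simplified, hypothetical stand-ins rather than the kernel's Ethernet header or lookup params definitions.

```c
#include <stdint.h>
#include <string.h>

/* Simplified Ethernet header (stand-in for this sketch). */
struct eth_hdr {
    uint8_t  dst[6];
    uint8_t  src[6];
    uint16_t proto;
};

/* Simplified view of what a successful lookup returns. */
struct fib_result {
    uint8_t  smac[6];    /* MAC of the egress device */
    uint8_t  dmac[6];    /* MAC of the nexthop */
    uint32_t ifindex;    /* egress device index */
};

/* Rewrite the L2 addresses in place and return the device to send on. */
static uint32_t rewrite_l2(struct eth_hdr *eth, const struct fib_result *res)
{
    memcpy(eth->dst, res->dmac, sizeof(eth->dst));
    memcpy(eth->src, res->smac, sizeof(eth->src));
    return res->ifindex;
}
```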
      
      The lookup takes 2 flags:
      - BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes
        straight to the table associated with the device (expert setting for
        those looking to maximize throughput)
      
      - BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective.
        Default is an ingress lookup.
      
      Initial performance numbers collected by Jesper, forwarded packets/sec:
      
             Full stack    XDP FIB lookup    XDP Direct lookup
      IPv4   1,947,969       7,074,156          7,415,333
      IPv6   1,728,000       6,165,504          7,262,720
      
      These numbers are for single CPU core forwarding on a Broadwell
      E5-1650 v4 @ 3.60GHz.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      87f5fc7e
  13. 09 May, 2018 2 commits
    • M
      bpf: btf: Add struct bpf_btf_info · 62dab84c
      Committed by Martin KaFai Lau
      During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's
      info.info is directly filled with the BTF binary data.  It is
      not extensible.  In this case, we want to add BTF ID.
      
      This patch adds "struct bpf_btf_info" which has the BTF ID as
      one of its members.  The BTF binary data itself is exposed through
      the "btf" and "btf_size" members.
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      62dab84c
    • M
      bpf: btf: Introduce BTF ID · 78958fca
      Committed by Martin KaFai Lau
      This patch gives an ID to each loaded BTF.  The ID is allocated by
      the idr like the existing prog-id and map-id.
      
      The btf_put(map->btf) is moved to __bpf_map_put() so that the
      userspace can stop seeing the BTF ID ASAP when the last BTF
      refcnt is gone.
      
      It also makes BTF accessible from userspace through the
      1. new BPF_BTF_GET_FD_BY_ID command.  It is limited to CAP_SYS_ADMIN
         which is in line with the BPF_BTF_LOAD cmd and the existing
         BPF_[MAP|PROG]_GET_FD_BY_ID cmd.
      2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info"
      
      Once the BTF ID handler is accessible from userspace, freeing a BTF
      object has to go through a rcu period.  The BPF_BTF_GET_FD_BY_ID cmd
      can then be done under a rcu_read_lock() instead of taking
      spin_lock.
      [Note: A similar rcu usage can be done to the existing
             bpf_prog_get_fd_by_id() in a follow up patch]
      
      When processing the BPF_BTF_GET_FD_BY_ID cmd,
      refcount_inc_not_zero() is needed because the BTF object
      could already be in the rcu dead row.  btf_get() is
      removed since its usage is currently limited to btf.c
      alone.  refcount_inc() is used directly instead.
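The need for an "increment unless zero" operation can be illustrated with C11 atomics. This is a user-space sketch of the general technique, not the kernel's `refcount_inc_not_zero()` implementation; the function name is made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Take a reference only if the object is still alive.
 * Returns false when the count has already dropped to zero, i.e. the
 * object is waiting to be freed and must not be handed out again. */
static bool inc_not_zero(atomic_uint *ref)
{
    unsigned int old = atomic_load(ref);

    while (old != 0) {
        /* On failure, old is reloaded with the current value and the
         * loop re-checks it against zero. */
        if (atomic_compare_exchange_weak(ref, &old, old + 1))
            return true;
    }
    return false;
}
```

A plain increment would turn a count of zero back into one and hand out an object that is about to be freed, which is the situation a lookup by ID has to guard against.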
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Acked-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      78958fca
  14. 04 May, 2018 2 commits
  15. 30 Apr, 2018 2 commits
  16. 29 Apr, 2018 2 commits
    • A
      bpf: Fix helpers ctx struct types in uapi doc · a3ef8e9a
      Committed by Andrey Ignatov
      Helpers may operate on two types of ctx structures: user visible ones
      (e.g. `struct bpf_sock_ops`) when used in user programs, and kernel ones
      (e.g. `struct bpf_sock_ops_kern`) in kernel implementation.
      
      UAPI documentation must refer to only user visible structures.
      
      The patch replaces references to `_kern` structures in BPF helpers
      description by corresponding user visible structures.
      Signed-off-by: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      a3ef8e9a
    • Y
      bpf: add bpf_get_stack helper · c195651e
      Committed by Yonghong Song
      Currently, stackmap and bpf_get_stackid helper are provided
      for bpf program to get the stack trace. This approach has
      a limitation though. If two stack traces have the same hash,
      only one will get stored in the stackmap table,
      so some stack traces are missing from user perspective.
      
      This patch implements a new helper, bpf_get_stack, which will
      send stack traces directly to bpf program. The bpf program
      is able to see all stack traces, and then can do in-kernel
      processing or send stack traces to user space through
      shared map or bpf_perf_event_output.
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      c195651e
  17. 27 Apr, 2018 10 commits
    • Q
      bpf: add documentation for eBPF helpers (65-66) · 2d020dd7
      Committed by Quentin Monnet
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helper from Nikita:
      - bpf_xdp_adjust_tail()
      
      Helper from Eyal:
      - bpf_skb_get_xfrm_state()
      
      v4:
      - New patch (helpers did not exist yet for previous versions).
      
      Cc: Nikita V. Shirokov <tehnerd@tehnerd.com>
      Cc: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      2d020dd7
    • Q
      bpf: add documentation for eBPF helpers (58-64) · ab127040
      Committed by Quentin Monnet
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by John:
      
      - bpf_redirect_map()
      - bpf_sk_redirect_map()
      - bpf_sock_map_update()
      - bpf_msg_redirect_map()
      - bpf_msg_apply_bytes()
      - bpf_msg_cork_bytes()
      - bpf_msg_pull_data()
      
      v4:
      - bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED",
        "his" to "this". Also add a paragraph on performance improvement over
        bpf_redirect() helper.
      
      v3:
      - bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag.
      - bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag.
      - bpf_redirect_map(): Fix note on CPU redirection, not fully implemented
        for generic XDP but supported on native XDP.
      - bpf_msg_pull_data(): Clarify comment about invalidated verifier
        checks.
      
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      ab127040
    • Q
      bpf: add documentation for eBPF helpers (51-57) · 7aa79a86
      Committed by Quentin Monnet
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helpers from Lawrence:
      - bpf_setsockopt()
      - bpf_getsockopt()
      - bpf_sock_ops_cb_flags_set()
      
      Helpers from Yonghong:
      - bpf_perf_event_read_value()
      - bpf_perf_prog_read_value()
      
      Helper from Josef:
      - bpf_override_return()
      
      Helper from Andrey:
      - bpf_bind()
      
      v4:
      - bpf_perf_event_read_value(): State that this helper should be
        preferred over bpf_perf_event_read().
      
      v3:
      - bpf_perf_event_read_value(): Fix time of selection for perf event type
        in description. Remove occurrences of "cores" to avoid confusion with
        "CPU".
      - bpf_bind(): Remove last paragraph of description, which was off topic.
      
      Cc: Lawrence Brakmo <brakmo@fb.com>
      Cc: Yonghong Song <yhs@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Andrey Ignatov <rdna@fb.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      [for bpf_perf_event_read_value(), bpf_perf_prog_read_value()]
      Acked-by: Andrey Ignatov <rdna@fb.com>
      [for bpf_bind()]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      7aa79a86
    • Q
      bpf: add documentation for eBPF helpers (42-50) · c6b5fb86
      Committed by Quentin Monnet
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions:
      
      Helper from Kaixu:
      - bpf_perf_event_read()
      
      Helpers from Martin:
      - bpf_skb_under_cgroup()
      - bpf_xdp_adjust_head()
      
      Helpers from Sargun:
      - bpf_probe_write_user()
      - bpf_current_task_under_cgroup()
      
      Helper from Thomas:
      - bpf_skb_change_head()
      
      Helper from Gianluca:
      - bpf_probe_read_str()
      
      Helpers from Chenbo:
      - bpf_get_socket_cookie()
      - bpf_get_socket_uid()
      
      v4:
      - bpf_perf_event_read(): State that bpf_perf_event_read_value() should
        be preferred over this helper.
      - bpf_skb_change_head(): Clarify comment about invalidated verifier
        checks.
      - bpf_xdp_adjust_head(): Clarify comment about invalidated verifier
        checks.
      - bpf_probe_write_user(): Add that dst must be a valid user space
        address.
      - bpf_get_socket_cookie(): Improve description by making clearer that
        the cookie belongs to the socket, and state that it remains stable for
        the life of the socket.
      
      v3:
      - bpf_perf_event_read(): Fix time of selection for perf event type in
        description. Remove occurrences of "cores" to avoid confusion with
        "CPU".
      
      Cc: Martin KaFai Lau <kafai@fb.com>
      Cc: Sargun Dhillon <sargun@sargun.me>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Gianluca Borello <g.borello@gmail.com>
      Cc: Chenbo Feng <fengc@google.com>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      [for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()]
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c6b5fb86
    • Q
      bpf: add documentation for eBPF helpers (33-41) · fa15601a
      Committed by Quentin Monnet
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Daniel:
      
      - bpf_get_hash_recalc()
      - bpf_skb_change_tail()
      - bpf_skb_pull_data()
      - bpf_csum_update()
      - bpf_set_hash_invalid()
      - bpf_get_numa_node_id()
      - bpf_set_hash()
      - bpf_skb_adjust_room()
      - bpf_xdp_adjust_meta()
      
      v4:
      - bpf_skb_change_tail(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_pull_data(): Clarify the motivation for using this helper or
        bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for
        *skb*. Clarify comment about invalidated verifier checks.
      - bpf_csum_update(): Fix description of checksum (entire packet, not IP
        checksum). Fix a typo: "header" instead of "helper".
      - bpf_set_hash_invalid(): Mention bpf_get_hash_recalc().
      - bpf_get_numa_node_id(): State that the helper is not restricted to
        programs attached to sockets.
      - bpf_skb_adjust_room(): Clarify comment about invalidated verifier
        checks.
      - bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier
        checks.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      fa15601a
    • Q
      bpf: add documentation for eBPF helpers (23-32) · 1fdd08be
      Quentin Monnet committed
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Daniel:
      
      - bpf_get_prandom_u32()
      - bpf_get_smp_processor_id()
      - bpf_get_cgroup_classid()
      - bpf_get_route_realm()
      - bpf_skb_load_bytes()
      - bpf_csum_diff()
      - bpf_skb_get_tunnel_opt()
      - bpf_skb_set_tunnel_opt()
      - bpf_skb_change_proto()
      - bpf_skb_change_type()
      
      v4:
      - bpf_get_prandom_u32(): Warn that the prng is not cryptographically
        secure.
      - bpf_get_smp_processor_id(): Fix a typo (case).
      - bpf_get_cgroup_classid(): Clarify description. Add notes on the helper
        being limited to cgroup v1, and to egress path.
      - bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid().
        Add a note about usage with TC and advantage of clsact. Fix a typo in
        return value ("sdb" instead of "skb").
      - bpf_skb_load_bytes(): Make explicit that loading large data loads it
        onto the eBPF stack.
      - bpf_csum_diff(): Add a note on seed that can be cascaded. Link to
        bpf_l3|l4_csum_replace().
      - bpf_skb_get_tunnel_opt(): Add a note about usage with "collect
        metadata" mode, and example of this with Geneve.
      - bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt()
        description.
      - bpf_skb_change_proto(): Mention that the main use case is NAT64.
        Clarify comment about invalidated verifier checks.
      
      v3:
      - bpf_get_prandom_u32(): Fix helper name :(. Add description, including
        a note on the internal random state.
      - bpf_get_smp_processor_id(): Add description, including a note on the
        processor id remaining stable during program run.
      - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is
        required to use the helper. Add a reference to related documentation.
        State that placing a task in net_cls controller disables cgroup-bpf.
      - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is
        required to use this helper.
      - bpf_skb_load_bytes(): Fix comment on current use cases for the helper.
      
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      1fdd08be
    • Q
      bpf: add documentation for eBPF helpers (12-22) · c456dec4
      Quentin Monnet committed
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Alexei:
      
      - bpf_get_current_pid_tgid()
      - bpf_get_current_uid_gid()
      - bpf_get_current_comm()
      - bpf_skb_vlan_push()
      - bpf_skb_vlan_pop()
      - bpf_skb_get_tunnel_key()
      - bpf_skb_set_tunnel_key()
      - bpf_redirect()
      - bpf_perf_event_output()
      - bpf_get_stackid()
      - bpf_get_current_task()
      
      v4:
      - bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add
        note on bpf_redirect_map() providing better performance. Replace "Save
        for" with "Except for".
      - bpf_skb_vlan_push(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_vlan_pop(): Clarify comment about invalidated verifier
        checks.
      - bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata"
        mode, and example tunneling protocols with which it can be used.
      - bpf_skb_set_tunnel_key(): Add a reference to the description of
        bpf_skb_get_tunnel_key().
      - bpf_perf_event_output(): Specify that, and for what purpose, the
        helper can be used with programs attached to TC and XDP.
      
      v3:
      - bpf_skb_get_tunnel_key(): Change and improve description and example.
      - bpf_redirect(): Improve description of BPF_F_INGRESS flag.
      - bpf_perf_event_output(): Fix first sentence of description. Delete
        wrong statement on context being evaluated as a struct pt_reg. Remove
        the long yet incomplete example.
      - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being
        configurable.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      c456dec4
    • Q
      bpf: add documentation for eBPF helpers (01-11) · ad4a5223
      Quentin Monnet committed
      Add documentation for eBPF helper functions to bpf.h user header file.
      This documentation can be parsed with the Python script provided in
      another commit of the patch series, in order to provide a RST document
      that can later be converted into a man page.
      
      The objective is to make the documentation easily understandable and
      accessible to all eBPF developers, including beginners.
      
      This patch contains descriptions for the following helper functions, all
      written by Alexei:
      
      - bpf_map_lookup_elem()
      - bpf_map_update_elem()
      - bpf_map_delete_elem()
      - bpf_probe_read()
      - bpf_ktime_get_ns()
      - bpf_trace_printk()
      - bpf_skb_store_bytes()
      - bpf_l3_csum_replace()
      - bpf_l4_csum_replace()
      - bpf_tail_call()
      - bpf_clone_redirect()
      
      v4:
      - bpf_map_lookup_elem(): Add "const" qualifier for key.
      - bpf_map_update_elem(): Add "const" qualifier for key and value.
      - bpf_map_delete_elem(): Add "const" qualifier for key.
      - bpf_skb_store_bytes(): Clarify comment about invalidated verifier
        checks.
      - bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note
        about bpf_csum_diff().
      - bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a
        note about bpf_csum_diff().
      - bpf_tail_call(): Bring minor edits to description.
      - bpf_clone_redirect(): Add a note about the relation with
        bpf_redirect(). Also clarify comment about invalidated verifier
        checks.
      
      v3:
      - bpf_map_lookup_elem(): Fix description of restrictions for flags
        related to the existence of the entry.
      - bpf_trace_printk(): State that trace_pipe can be configured. Fix
        return value in case an unknown format specifier is met. Add a note on
        kernel log notice when the helper is used. Edit example.
      - bpf_tail_call(): Improve comment on stack inheritance.
      - bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag.
      
      Cc: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      ad4a5223
    • Q
      bpf: add script and prepare bpf.h for new helpers documentation · 56a092c8
      Quentin Monnet committed
      Remove previous "overview" of eBPF helpers from user bpf.h header.
      Replace it by a comment explaining how to process the new documentation
      (to come in following patches) with a Python script to produce RST, then
      man page documentation.
      
      Also add the aforementioned Python script under scripts/. It is used to
      process include/uapi/linux/bpf.h and to extract helper descriptions, to
      turn it into a RST document that can further be processed with rst2man
      to produce a man page. The script takes one "--filename <path/to/file>"
      option. If the script is launched from scripts/ in the kernel root
      directory, it should be able to find the location of the header to
      parse, and "--filename <path/to/file>" is then optional. If it cannot
      find the file, then the option becomes mandatory. RST-formatted
      documentation is printed to standard output.
      
      Typical workflow for producing the final man page would be:
      
          $ ./scripts/bpf_helpers_doc.py \
                  --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst
          $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7
          $ man /tmp/bpf-helpers.7
      
      Note that the tool kernel-doc cannot be used to document eBPF helpers,
      whose signatures are not available directly in the header files
      (pre-processor directives are used to produce them at the beginning of
      the compilation process).
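
      To illustrate the point about pre-processor directives, here is a
      minimal sketch of the list-macro pattern, assuming a hypothetical,
      shortened helper list. All DEMO_* and demo_* names are invented for
      this example and are not kernel identifiers. The sketch generates an
      enum and a name table rather than full signatures; it only shows why
      a tool that reads the header text finds a macro list instead of
      ordinary declarations.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, shortened helper list (names invented for this sketch). */
#define DEMO_FUNC_MAPPER(FN) \
	FN(unspec),          \
	FN(map_lookup_elem), \
	FN(map_update_elem), \
	FN(map_delete_elem),

/* First expansion: turn each list entry into an enum constant. */
#define DEMO_ENUM_FN(x) DEMO_FUNC_##x
enum demo_func_id {
	DEMO_FUNC_MAPPER(DEMO_ENUM_FN)
	DEMO_FUNC_MAX_ID
};

/* Second expansion: turn the same list into a table of names. */
#define DEMO_NAME_FN(x) #x
static const char *const demo_func_names[] = {
	DEMO_FUNC_MAPPER(DEMO_NAME_FN)
};

/* These identifiers exist only after the pre-processor has run. */
static_assert(DEMO_FUNC_map_lookup_elem == 1, "IDs follow list order");
static_assert(DEMO_FUNC_MAX_ID == 4, "one ID per list entry");

/* Look up an ID by helper name; returns DEMO_FUNC_MAX_ID if unknown. */
static enum demo_func_id demo_func_id_by_name(const char *name)
{
	for (int i = 0; i < DEMO_FUNC_MAX_ID; i++)
		if (strcmp(demo_func_names[i], name) == 0)
			return (enum demo_func_id)i;
	return DEMO_FUNC_MAX_ID;
}
```

      Because DEMO_FUNC_map_lookup_elem never appears literally in the
      source text, a comment-driven extractor is a simpler fit here than a
      tool that expects to find each declaration spelled out.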
      
      v4:
      - Also remove overviews for newly added bpf_xdp_adjust_tail() and
        bpf_skb_get_xfrm_state().
      - Remove vague statement about what helpers are restricted to GPL
        programs in "LICENSE" section for man page footer.
      - Replace license boilerplate with SPDX tag for Python script.
      
      v3:
      - Change license for man page.
      - Remove "for safety reasons" from man page header text.
      - Change "packets metadata" to "packets" in man page header text.
      - Move and fix comment on helpers introducing no overhead.
      - Remove "NOTES" section from man page footer.
      - Add "LICENSE" section to man page footer.
      - Edit description of file include/uapi/linux/bpf.h in man page footer.
      Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      56a092c8
    • J
      bpf: Add gpl_compatible flag to struct bpf_prog_info · b85fab0e
      Jiri Olsa committed
      Add a gpl_compatible flag to struct bpf_prog_info
      so it can be dumped via bpf_prog_get_info_by_fd and
      displayed via bpftool progs dump.
      
      Alexei noticed a 4-byte hole in struct bpf_prog_info,
      so we put the u32 flags field in there, and we can
      keep adding bit fields in there without breaking
      user space.
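
      The hole-filling argument can be sketched as follows. This is a
      minimal illustration with hypothetical demo_info_* structs, not the
      real struct bpf_prog_info layout: a 32-bit member followed by an
      8-byte-aligned member leaves 4 bytes of padding, and a bit-field
      placed there changes neither the struct size nor the offsets of
      later members.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout before the change: padding follows ifindex. */
struct demo_info_before {
	uint32_t type;
	uint32_t id;
	uint32_t ifindex;
	/* 4-byte hole: the next member is 8-byte aligned */
	alignas(8) uint64_t netns_dev;
};

/* Hypothetical layout after the change: a bit-field occupies the hole. */
struct demo_info_after {
	uint32_t type;
	uint32_t id;
	uint32_t ifindex;
	unsigned int gpl_compatible:1; /* more 1-bit flags could follow */
	alignas(8) uint64_t netns_dev;
};

/* Filling the hole must not grow the struct... */
static_assert(sizeof(struct demo_info_before) ==
	      sizeof(struct demo_info_after),
	      "struct size is unchanged");
/* ...and must not move any member that comes after it. */
static_assert(offsetof(struct demo_info_before, netns_dev) ==
	      offsetof(struct demo_info_after, netns_dev),
	      "later members keep their offsets");

/* Read the flag from the (hypothetical) info struct. */
static int demo_is_gpl(const struct demo_info_after *info)
{
	return info->gpl_compatible;
}
```

      Since the size and the later offsets are identical, a consumer built
      against the old layout would still read the members it knows about
      at the same positions.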
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      b85fab0e
  18. 25 Apr, 2018 1 commit
    • E
      bpf: add helper for getting xfrm states · 12bed760
      Eyal Birger committed
      This commit introduces a helper which allows fetching xfrm state
      parameters by eBPF programs attached to TC.
      
      Prototype:
      bpf_skb_get_xfrm_state(skb, index, xfrm_state, size, flags)
      
      skb: pointer to skb
      index: the index in the skb xfrm_state secpath array
      xfrm_state: pointer to 'struct bpf_xfrm_state'
      size: size of 'struct bpf_xfrm_state'
      flags: reserved for future extensions
      
      The helper returns 0 on success. It returns non-zero if no xfrm state is
      found at the index, or if none exists at all.
      
      struct bpf_xfrm_state currently includes the SPI, peer IPv4/IPv6
      address and the reqid; it can be further extended by adding elements to
      its end - indicating the populated fields by the 'size' argument -
      keeping backwards compatibility.
      
      Typical usage:
      
      struct bpf_xfrm_state x = {};
      bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0);
      ...
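
      The size-based extension scheme described above can be sketched in
      plain user-space C. This is not the helper's implementation: the
      demo_state_* structs, their members, and the returned values are
      hypothetical. It only shows how a 'size' argument lets an older
      caller keep working after members are appended to the struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical first version of the result struct. */
struct demo_state_v1 {
	uint32_t reqid;
	uint32_t spi;
};

/* Hypothetical later version: new members are appended at the end only. */
struct demo_state_v2 {
	uint32_t reqid;
	uint32_t spi;
	uint32_t family;
};

static_assert(offsetof(struct demo_state_v2, family) ==
	      sizeof(struct demo_state_v1),
	      "v2 extends v1 without moving existing members");

/*
 * Provider built against v2. The caller passes the size of the struct it
 * was compiled with, and only that many bytes are filled in.
 * Returns 0 on success, -1 if the size matches no known version.
 */
static int demo_get_state(void *dst, size_t size)
{
	const struct demo_state_v2 cur = {
		.reqid = 7, .spi = 0x100, .family = 2, /* made-up values */
	};

	if (size != sizeof(struct demo_state_v1) &&
	    size != sizeof(struct demo_state_v2))
		return -1;
	memcpy(dst, &cur, size);
	return 0;
}
```

      A caller compiled against demo_state_v1 passes sizeof its own
      struct and receives only the two original members, while a newer
      caller also receives the appended one.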
      Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      12bed760
  19. 20 Apr, 2018 2 commits
    • M
      bpf: btf: Add pretty print support to the basic arraymap · a26ca7c9
      Martin KaFai Lau committed
      This patch adds pretty print support to the basic arraymap.
      Support for other bpf maps can be added later.
      
      This patch adds new attrs to the BPF_MAP_CREATE command to allow
      specifying the btf_fd, btf_key_id and btf_value_id.  The
      BPF_MAP_CREATE can then associate the BTF with the map if
      the map being created supports BTF.
      
      A BTF supported map needs to implement two new map ops,
      map_seq_show_elem() and map_check_btf().  This patch has
      implemented these new map ops for the basic arraymap.
      
      It also adds file_operations, bpffs_map_fops, to the pinned
      map such that the pinned map can be opened and read.
      After that, the user has an intuitive way to do
      "cat bpffs/pathto/a-pinned-map" instead of getting
      an error.
      
      bpffs_map_fops should not be extended further to support
      other operations.  Other operations (e.g. write/key-lookup...)
      should be realized by the userspace tools (e.g. bpftool) through
      BPF_OBJ_GET_INFO_BY_FD, the map's lookup/update interface, etc.
      Follow up patches will allow the userspace to obtain
      the BTF from a map-fd.
      
      Here is a sample output when reading a pinned arraymap
      with the following map's value:
      
      struct map_value {
      	int count_a;
      	int count_b;
      };
      
      cat /sys/fs/bpf/pinned_array_map:
      
      0: {1,2}
      1: {3,4}
      2: {5,6}
      ...
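
      The sample output format above can be reproduced with a small
      user-space sketch. This is not the kernel's map_seq_show_elem()
      implementation, which is driven by BTF type information;
      demo_format_elem() is a hypothetical function that hard-codes the
      two members of struct map_value to show the "index: {a,b}" shape.

```c
#include <stdio.h>

struct map_value {
	int count_a;
	int count_b;
};

/*
 * Format one array element as "<index>: {<count_a>,<count_b>}\n" into buf.
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if buf is too small.
 */
static int demo_format_elem(char *buf, size_t len, unsigned int index,
			    const struct map_value *v)
{
	int n = snprintf(buf, len, "%u: {%d,%d}\n",
			 index, v->count_a, v->count_b);

	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

      Calling it for index 0 with count_a = 1 and count_b = 2 produces
      the first line of the sample.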
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      a26ca7c9
    • M
      bpf: btf: Add BPF_BTF_LOAD command · f56a653c
      Martin KaFai Lau committed
      This patch adds a BPF_BTF_LOAD command which
      1) loads and verifies the BTF (implemented in earlier patches)
      2) returns a BTF fd to userspace.  In the next patch, the
         BTF fd can be specified during BPF_MAP_CREATE.
      
      It is currently limited to CAP_SYS_ADMIN.
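
      From user space, the command might be invoked roughly as below. This
      is a minimal sketch, assuming a Linux system whose linux/bpf.h
      already defines BPF_BTF_LOAD and the btf and btf_size members of
      union bpf_attr. demo_btf_load() is a hypothetical wrapper name, and
      the verifier log members are left zeroed for brevity.

```c
#include <linux/bpf.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hand a raw BTF blob to the kernel. Returns a BTF fd on success, or -1
 * with errno set (for example when the caller lacks the required
 * capability, or when the blob fails verification).
 */
static int demo_btf_load(const void *btf, uint32_t btf_size)
{
	union bpf_attr attr;

	/* Unused members of the attribute union must be zero. */
	memset(&attr, 0, sizeof(attr));
	attr.btf = (uint64_t)(uintptr_t)btf;
	attr.btf_size = btf_size;

	return (int)syscall(__NR_bpf, BPF_BTF_LOAD, &attr, sizeof(attr));
}
```

      The returned fd is what a later BPF_MAP_CREATE call would reference
      to associate the BTF with a map.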
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Alexei Starovoitov <ast@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      f56a653c