- 11 8月, 2018 1 次提交
-
-
由 Martin KaFai Lau 提交于
This patch introduces a new map type BPF_MAP_TYPE_REUSEPORT_SOCKARRAY. To unleash the full potential of a bpf prog, it is essential for the userspace to be capable of directly setting up a bpf map which can then be consumed by the bpf prog to make decision. In this case, decide which SO_REUSEPORT sk to serve the incoming request. By adding BPF_MAP_TYPE_REUSEPORT_SOCKARRAY, the userspace has total control and visibility on where a SO_REUSEPORT sk should be located in a bpf map. The later patch will introduce BPF_PROG_TYPE_SK_REUSEPORT such that the bpf prog can directly select a sk from the bpf map. That will raise the programmability of the bpf prog attached to a reuseport group (a group of sk serving the same IP:PORT). For example, in UDP, the bpf prog can peek into the payload (e.g. through the "data" pointer introduced in the later patch) to learn the application level's connection information and then decide which sk to pick from a bpf map. The userspace can tightly couple the sk's location in a bpf map with the application logic in generating the UDP payload's connection information. This connection info contact/API stays within the userspace. Also, when used with map-in-map, the userspace can switch the old-server-process's inner map to a new-server-process's inner map in one call "bpf_map_update_elem(outer_map, &index, &new_reuseport_array)". The bpf prog will then direct incoming requests to the new process instead of the old process. The old process can finish draining the pending requests (e.g. by "accept()") before closing the old-fds. [Note that deleting a fd from a bpf map does not necessary mean the fd is closed] During map_update_elem(), Only SO_REUSEPORT sk (i.e. which has already been added to a reuse->socks[]) can be used. That means a SO_REUSEPORT sk that is "bind()" for UDP or "bind()+listen()" for TCP. These conditions are ensured in "reuseport_array_update_check()". A SO_REUSEPORT sk can only be added once to a map (i.e. the same sk cannot be added twice even to the same map). SO_REUSEPORT already allows another sk to be created for the same IP:PORT. There is no need to re-create a similar usage in the BPF side. When a SO_REUSEPORT is deleted from the "reuse->socks[]" (e.g. "close()"), it will notify the bpf map to remove it from the map also. It is done through "bpf_sk_reuseport_detach()" and it will only be called if >=1 of the "reuse->sock[]" has ever been added to a bpf map. The map_update()/map_delete() has to be in-sync with the "reuse->socks[]". Hence, the same "reuseport_lock" used by "reuse->socks[]" has to be used here also. Care has been taken to ensure the lock is only acquired when the adding sk passes some strict tests. and freeing the map does not require the reuseport_lock. The reuseport_array will also support lookup from the syscall side. It will return a sock_gen_cookie(). The sock_gen_cookie() is on-demand (i.e. a sk's cookie is not generated until the very first map_lookup_elem()). The lookup cookie is 64bits but it goes against the logical userspace expectation on 32bits sizeof(fd) (and as other fd based bpf maps do also). It may catch user in surprise if we enforce value_size=8 while userspace still pass a 32bits fd during update. Supporting different value_size between lookup and update seems unintuitive also. We also need to consider what if other existing fd based maps want to return 64bits value from syscall's lookup in the future. Hence, reuseport_array supports both value_size 4 and 8, and assuming user will usually use value_size=4. The syscall's lookup will return ENOSPC on value_size=4. It will will only return 64bits value from sock_gen_cookie() when user consciously choose value_size=8 (as a signal that lookup is desired) which then requires a 64bits value in both lookup and update. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 03 8月, 2018 2 次提交
-
-
由 Roman Gushchin 提交于
The bpf_get_local_storage() helper function is used to get a pointer to the bpf local storage from a bpf program. It takes a pointer to a storage map and flags as arguments. Right now it accepts only cgroup storage maps, and flags argument has to be 0. Further it can be extended to support other types of local storage: e.g. thread local storage etc. Signed-off-by: NRoman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Roman Gushchin 提交于
This commit introduces BPF_MAP_TYPE_CGROUP_STORAGE maps: a special type of maps which are implementing the cgroup storage. >From the userspace point of view it's almost a generic hash map with the (cgroup inode id, attachment type) pair used as a key. The only difference is that some operations are restricted: 1) a user can't create new entries, 2) a user can't remove existing entries. The lookup from userspace is o(log(n)). Signed-off-by: NRoman Gushchin <guro@fb.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 31 7月, 2018 1 次提交
-
-
由 Andrey Ignatov 提交于
bpf_get_socket_cookie() helper can be used to identify skb that correspond to the same socket. Though socket cookie can be useful in many other use-cases where socket is available in program context. Specifically BPF_PROG_TYPE_CGROUP_SOCK_ADDR and BPF_PROG_TYPE_SOCK_OPS programs can benefit from it so that one of them can augment a value in a map prepared earlier by other program for the same socket. The patch adds support to call bpf_get_socket_cookie() from BPF_PROG_TYPE_CGROUP_SOCK_ADDR and BPF_PROG_TYPE_SOCK_OPS. It doesn't introduce new helpers. Instead it reuses same helper name bpf_get_socket_cookie() but adds support to this helper to accept `struct bpf_sock_addr` and `struct bpf_sock_ops`. Documentation in bpf.h is changed in a way that should not break automatic generation of markdown. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NYonghong Song <yhs@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 15 7月, 2018 1 次提交
-
-
由 Andrey Ignatov 提交于
Add new TCP-BPF callback that is called on listen(2) right after socket transition to TCP_LISTEN state. It fills the gap for listening sockets in TCP-BPF. For example BPF program can set BPF_SOCK_OPS_STATE_CB_FLAG when socket becomes listening and track later transition from TCP_LISTEN to TCP_CLOSE with BPF_SOCK_OPS_STATE_CB callback. Before there was no way to do it with TCP-BPF and other options were much harder to work with. E.g. socket state tracking can be done with tracepoints (either raw or regular) but they can't be attached to cgroup and their lifetime has to be managed separately. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 13 7月, 2018 1 次提交
-
-
由 Quentin Monnet 提交于
Minor formatting edits for eBPF helpers documentation, including blank lines removal, fix of item list for return values in bpf_fib_lookup(), and missing prefix on bpf_skb_load_bytes_relative(). Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Reviewed-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 29 6月, 2018 1 次提交
-
-
由 David Ahern 提交于
For ACLs implemented using either FIB rules or FIB entries, the BPF program needs the FIB lookup status to be able to drop the packet. Since the bpf_fib_lookup API has not reached a released kernel yet, change the return code to contain an encoding of the FIB lookup result and return the nexthop device index in the params struct. In addition, inform the BPF program of any post FIB lookup reason as to why the packet needs to go up the stack. The fib result for unicast routes must have an egress device, so remove the check that it is non-NULL. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 04 6月, 2018 2 次提交
-
-
由 David Ahern 提交于
As Michal noted the flow struct takes both the flow label and priority. Update the bpf_fib_lookup API to note that it is flowinfo and not just the flow label. Cc: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Yonghong Song 提交于
bpf has been used extensively for tracing. For example, bcc contains an almost full set of bpf-based tools to trace kernel and user functions/events. Most tracing tools are currently either filtered based on pid or system-wide. Containers have been used quite extensively in industry and cgroup is often used together to provide resource isolation and protection. Several processes may run inside the same container. It is often desirable to get container-level tracing results as well, e.g. syscall count, function count, I/O activity, etc. This patch implements a new helper, bpf_get_current_cgroup_id(), which will return cgroup id based on the cgroup within which the current task is running. The later patch will provide an example to show that userspace can get the same cgroup id so it could configure a filter or policy in the bpf program based on task cgroup id. The helper is currently implemented for tracing. It can be added to other program types as well when needed. Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 03 6月, 2018 2 次提交
-
-
由 Daniel Borkmann 提交于
Since the remaining bits are not filled in struct bpf_tunnel_key resp. struct bpf_xfrm_state and originate from uninitialized stack space, we should make sure to clear them before handing control back to the program. Also add a padding element to struct bpf_xfrm_state for future use similar as we have in struct bpf_tunnel_key and clear it as well. struct bpf_xfrm_state { __u32 reqid; /* 0 4 */ __u32 spi; /* 4 4 */ __u16 family; /* 8 2 */ /* XXX 2 bytes hole, try to pack */ union { __u32 remote_ipv4; /* 4 */ __u32 remote_ipv6[4]; /* 16 */ }; /* 12 16 */ /* size: 28, cachelines: 1, members: 4 */ /* sum members: 26, holes: 1, sum holes: 2 */ /* last cacheline: 28 bytes */ }; Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Daniel Borkmann 提交于
Add a new bpf_skb_cgroup_id() helper that allows to retrieve the cgroup id from the skb's socket. This is useful in particular to enable bpf_get_cgroup_classid()-like behavior for cgroup v1 in cgroup v2 by allowing ID based matching on egress. This can in particular be used in combination with applying policy e.g. from map lookups, and also complements the older bpf_skb_under_cgroup() interface. In user space the cgroup id for a given path can be retrieved through the f_handle as demonstrated in [0] recently. [0] https://lkml.org/lkml/2018/5/22/1190Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 02 6月, 2018 1 次提交
-
-
由 Daniel Borkmann 提交于
In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the case of struct bpf_map_info but also struct bpf_prog_info. In net-next commit b85fab0e ("bpf: Add gpl_compatible flag to struct bpf_prog_info") added a bitfield into it to expose some flags related to programs. Thus, add an unnamed __u32 bitfield for both so that alignment keeps the same in both 32 and 64 bit cases, and can be naturally extended from there as in b85fab0e. Before: # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ __u64 netns_dev; /* 44 8 */ __u64 netns_ino; /* 52 8 */ /* size: 64, cachelines: 1, members: 10 */ /* padding: 4 */ }; After (same as on 64 bit): # file test.o test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped # pahole test.o struct bpf_map_info { __u32 type; /* 0 4 */ __u32 id; /* 4 4 */ __u32 key_size; /* 8 4 */ __u32 value_size; /* 12 4 */ __u32 max_entries; /* 16 4 */ __u32 map_flags; /* 20 4 */ char name[16]; /* 24 16 */ __u32 ifindex; /* 40 4 */ /* XXX 4 bytes hole, try to pack */ __u64 netns_dev; /* 48 8 */ __u64 netns_ino; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ /* size: 64, cachelines: 1, members: 10 */ /* sum members: 60, holes: 1, sum holes: 4 */ }; Reported-by: NDmitry V. Levin <ldv@altlinux.org> Reported-by: NEugene Syromiatnikov <esyr@redhat.com> Fixes: 52775b33 ("bpf: offload: report device information about offloaded maps") Fixes: 675fc275 ("bpf: offload: report device information for offloaded programs") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 30 5月, 2018 3 次提交
-
-
由 Sean Young 提交于
Add support for BPF_PROG_LIRC_MODE2. This type of BPF program can call rc_keydown() to reported decoded IR scancodes, or rc_repeat() to report that the last key should be repeated. The bpf program can be attached to using the bpf(BPF_PROG_ATTACH) syscall; the target_fd must be the /dev/lircN device. Acked-by: NYonghong Song <yhs@fb.com> Signed-off-by: NSean Young <sean@mess.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 David Ahern 提交于
MPLS support will not be submitted this dev cycle, but in working on it I do see a few changes are needed to the API. For now, drop mpls from the API. Since the fields in question are unions, the mpls fields can be added back later without affecting the uapi. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
These are minor edits for the eBPF helpers documentation in include/uapi/linux/bpf.h. The main fix consists in removing "BPF_FIB_LOOKUP_", because it ends with a non-escaped underscore that gets interpreted by rst2man and produces the following message in the resulting manual page: DOCUTILS SYSTEM MESSAGES System Message: ERROR/3 (/tmp/bpf-helpers.rst:, line 1514) Unknown target name: "bpf_fib_lookup". Other edits consist in: - Improving formatting for flag values for "bpf_fib_lookup()" helper. - Emphasising a parameter name in description of the return value for "bpf_get_stack()" helper. - Removing unnecessary blank lines between "Description" and "Return" sections for the few helpers that would use it, for consistency. Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 28 5月, 2018 1 次提交
-
-
由 Andrey Ignatov 提交于
In addition to already existing BPF hooks for sys_bind and sys_connect, the patch provides new hooks for sys_sendmsg. It leverages existing BPF program type `BPF_PROG_TYPE_CGROUP_SOCK_ADDR` that provides access to socket itlself (properties like family, type, protocol) and user-passed `struct sockaddr *` so that BPF program can override destination IP and port for system calls such as sendto(2) or sendmsg(2) and/or assign source IP to the socket. The hooks are implemented as two new attach types: `BPF_CGROUP_UDP4_SENDMSG` and `BPF_CGROUP_UDP6_SENDMSG` for UDPv4 and UDPv6 correspondingly. UDPv4 and UDPv6 separate attach types for same reason as sys_bind and sys_connect hooks, i.e. to prevent reading from / writing to e.g. user_ip6 fields when user passes sockaddr_in since it'd be out-of-bound. The difference with already existing hooks is sys_sendmsg are implemented only for unconnected UDP. For TCP it doesn't make sense to change user-provided `struct sockaddr *` at sendto(2)/sendmsg(2) time since socket either was already connected and has source/destination set or wasn't connected and call to sendto(2)/sendmsg(2) would lead to ENOTCONN anyway. Connected UDP is already handled by sys_connect hooks that can override source/destination at connect time and use fast-path later, i.e. these hooks don't affect UDP fast-path. Rewriting source IP is implemented differently than that in sys_connect hooks. When sys_sendmsg is used with unconnected UDP it doesn't work to just bind socket to desired local IP address since source IP can be set on per-packet basis by using ancillary data (cmsg(3)). So no matter if socket is bound or not, source IP has to be rewritten on every call to sys_sendmsg. To do so two new fields are added to UAPI `struct bpf_sock_addr`; * `msg_src_ip4` to set source IPv4 for UDPv4; * `msg_src_ip6` to set source IPv6 for UDPv6. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 25 5月, 2018 1 次提交
-
-
由 Yonghong Song 提交于
Currently, suppose a userspace application has loaded a bpf program and attached it to a tracepoint/kprobe/uprobe, and a bpf introspection tool, e.g., bpftool, wants to show which bpf program is attached to which tracepoint/kprobe/uprobe. Such attachment information will be really useful to understand the overall bpf deployment in the system. There is a name field (16 bytes) for each program, which could be used to encode the attachment point. There are some drawbacks for this approaches. First, bpftool user (e.g., an admin) may not really understand the association between the name and the attachment point. Second, if one program is attached to multiple places, encoding a proper name which can imply all these attachments becomes difficult. This patch introduces a new bpf subcommand BPF_TASK_FD_QUERY. Given a pid and fd, if the <pid, fd> is associated with a tracepoint/kprobe/uprobe perf event, BPF_TASK_FD_QUERY will return . prog_id . tracepoint name, or . k[ret]probe funcname + offset or kernel addr, or . u[ret]probe filename + offset to the userspace. The user can use "bpftool prog" to find more information about bpf program itself with prog_id. Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 24 5月, 2018 4 次提交
-
-
由 Mathieu Xhonneux 提交于
This patch adds the End.BPF action to the LWT seg6local infrastructure. This action works like any other seg6local End action, meaning that an IPv6 header with SRH is needed, whose DA has to be equal to the SID of the action. It will also advance the SRH to the next segment, the BPF program does not have to take care of this. Since the BPF program may not be a source of instability in the kernel, it is important to ensure that the integrity of the packet is maintained before yielding it back to the IPv6 layer. The hook hence keeps track if the SRH has been altered through the helpers, and re-validates its content if needed with seg6_validate_srh. The state kept for validation is stored in a per-CPU buffer. The BPF program is not allowed to directly write into the packet, and only some fields of the SRH can be altered through the helper bpf_lwt_seg6_store_bytes. Performances profiling has shown that the SRH re-validation does not induce a significant overhead. If the altered SRH is deemed as invalid, the packet is dropped. This validation is also done before executing any action through bpf_lwt_seg6_action, and will not be performed again if the SRH is not modified after calling the action. The BPF program may return 3 types of return codes: - BPF_OK: the End.BPF action will look up the next destination through seg6_lookup_nexthop. - BPF_REDIRECT: if an action has been executed through the bpf_lwt_seg6_action helper, the BPF program should return this value, as the skb's destination is already set and the default lookup should not be performed. - BPF_DROP : the packet will be dropped. Signed-off-by: NMathieu Xhonneux <m.xhonneux@gmail.com> Acked-by: NDavid Lebrun <dlebrun@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Mathieu Xhonneux 提交于
The BPF seg6local hook should be powerful enough to enable users to implement most of the use-cases one could think of. After some thinking, we figured out that the following actions should be possible on a SRv6 packet, requiring 3 specific helpers : - bpf_lwt_seg6_store_bytes: Modify non-sensitive fields of the SRH - bpf_lwt_seg6_adjust_srh: Allow to grow or shrink a SRH (to add/delete TLVs) - bpf_lwt_seg6_action: Apply some SRv6 network programming actions (specifically End.X, End.T, End.B6 and End.B6.Encap) The specifications of these helpers are provided in the patch (see include/uapi/linux/bpf.h). The non-sensitive fields of the SRH are the following : flags, tag and TLVs. The other fields can not be modified, to maintain the SRH integrity. Flags, tag and TLVs can easily be modified as their validity can be checked afterwards via seg6_validate_srh. It is not allowed to modify the segments directly. If one wants to add segments on the path, he should stack a new SRH using the End.B6 action via bpf_lwt_seg6_action. Growing, shrinking or editing TLVs via the helpers will flag the SRH as invalid, and it will have to be re-validated before re-entering the IPv6 layer. This flag is stored in a per-CPU buffer, along with the current header length in bytes. Storing the SRH len in bytes in the control block is mandatory when using bpf_lwt_seg6_adjust_srh. The Header Ext. Length field contains the SRH len rounded to 8 bytes (a padding TLV can be inserted to ensure the 8-bytes boundary). When adding/deleting TLVs within the BPF program, the SRH may temporary be in an invalid state where its length cannot be rounded to 8 bytes without remainder, hence the need to store the length in bytes separately. The caller of the BPF program can then ensure that the SRH's final length is valid using this value. Again, a final SRH modified by a BPF program which doesn’t respect the 8-bytes boundary will be discarded as it will be considered as invalid. Finally, a fourth helper is provided, bpf_lwt_push_encap, which is available from the LWT BPF IN hook, but not from the seg6local BPF one. This helper allows to encapsulate a Segment Routing Header (either with a new outer IPv6 header, or by inlining it directly in the existing IPv6 header) into a non-SRv6 packet. This helper is required if we want to offer the possibility to dynamically encapsulate a SRH for non-SRv6 packet, as the BPF seg6local hook only works on traffic already containing a SRH. This is the BPF equivalent of the seg6 LWT infrastructure, which achieves the same purpose but with a static SRH per route. These helpers require CONFIG_IPV6=y (and not =m). Signed-off-by: NMathieu Xhonneux <m.xhonneux@gmail.com> Acked-by: NDavid Lebrun <dlebrun@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Sandipan Das 提交于
This adds new two new fields to struct bpf_prog_info. For multi-function programs, these fields can be used to pass a list of the JITed image lengths of each function for a given program to userspace using the bpf system call with the BPF_OBJ_GET_INFO_BY_FD command. This can be used by userspace applications like bpftool to split up the contiguous JITed dump, also obtained via the system call, into more relatable chunks corresponding to each function. Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Sandipan Das 提交于
This adds new two new fields to struct bpf_prog_info. For multi-function programs, these fields can be used to pass a list of kernel symbol addresses for all functions in a given program to userspace using the bpf system call with the BPF_OBJ_GET_INFO_BY_FD command. When bpf_jit_kallsyms is enabled, we can get the address of the corresponding kernel symbol for a callee function and resolve the symbol's name. The address is determined by adding the value of the call instruction's imm field to __bpf_call_base. This offset gets assigned to the imm field by the verifier. For some architectures, such as powerpc64, the imm field is not large enough to hold this offset. We resolve this by: [1] Assigning the subprog id to the imm field of a call instruction in the verifier instead of the offset of the callee's symbol's address from __bpf_call_base. [2] Determining the address of a callee's corresponding symbol by using the imm field as an index for the list of kernel symbol addresses now available from the program info. Suggested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NSandipan Das <sandipan@linux.vnet.ibm.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 23 5月, 2018 1 次提交
-
-
由 Martin KaFai Lau 提交于
In "struct bpf_map_info", the name "btf_id", "btf_key_id" and "btf_value_id" could cause confusion because the "id" of "btf_id" means the BPF obj id given to the BTF object while "btf_key_id" and "btf_value_id" means the BTF type id within that BTF object. To make it clear, btf_key_id and btf_value_id are renamed to btf_key_type_id and btf_value_type_id. Suggested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NYonghong Song <yhs@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 19 5月, 2018 1 次提交
-
-
由 John Fastabend 提交于
Currently sk_msg programs only have access to the raw data. However, it is often useful when building policies to have the policies specific to the socket endpoint. This allows using the socket tuple as input into filters, etc. This patch adds ctx access to the sock fields. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 16 5月, 2018 1 次提交
-
-
由 John Fastabend 提交于
Sockmap is currently backed by an array and enforces keys to be four bytes. This works well for many use cases and was originally modeled after devmap which also uses four bytes keys. However, this has become limiting in larger use cases where a hash would be more appropriate. For example users may want to use the 5-tuple of the socket as the lookup key. To support this add hash support. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 11 5月, 2018 1 次提交
-
-
由 David Ahern 提交于
Provide a helper for doing a FIB and neighbor lookup in the kernel tables from an XDP program. The helper provides a fastpath for forwarding packets. If the packet is a local delivery or for any reason is not a simple lookup and forward, the packet continues up the stack. If it is to be forwarded, the forwarding can be done directly if the neighbor is already known. If the neighbor does not exist, the first few packets go up the stack for neighbor resolution. Once resolved, the xdp program provides the fast path. On successful lookup the nexthop dmac, current device smac and egress device index are returned. The API supports IPv4, IPv6 and MPLS protocols, but only IPv4 and IPv6 are implemented in this patch. The API includes layer 4 parameters if the XDP program chooses to do deep packet inspection to allow compare against ACLs implemented as FIB rules. Header rewrite is left to the XDP program. The lookup takes 2 flags: - BPF_FIB_LOOKUP_DIRECT to do a lookup that bypasses FIB rules and goes straight to the table associated with the device (expert setting for those looking to maximize throughput) - BPF_FIB_LOOKUP_OUTPUT to do a lookup from the egress perspective. Default is an ingress lookup. Initial performance numbers collected by Jesper, forwarded packets/sec: Full stack XDP FIB lookup XDP Direct lookup IPv4 1,947,969 7,074,156 7,415,333 IPv6 1,728,000 6,165,504 7,262,720 These number are single CPU core forwarding on a Broadwell E5-1650 v4 @ 3.60GHz. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 09 5月, 2018 2 次提交
-
-
由 Martin KaFai Lau 提交于
During BPF_OBJ_GET_INFO_BY_FD on a btf_fd, the current bpf_attr's info.info is directly filled with the BTF binary data. It is not extensible. In this case, we want to add BTF ID. This patch adds "struct bpf_btf_info" which has the BTF ID as one of its member. The BTF binary data itself is exposed through the "btf" and "btf_size" members. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
This patch gives an ID to each loaded BTF. The ID is allocated by the idr like the existing prog-id and map-id. The bpf_put(map->btf) is moved to __bpf_map_put() so that the userspace can stop seeing the BTF ID ASAP when the last BTF refcnt is gone. It also makes BTF accessible from userspace through the 1. new BPF_BTF_GET_FD_BY_ID command. It is limited to CAP_SYS_ADMIN which is inline with the BPF_BTF_LOAD cmd and the existing BPF_[MAP|PROG]_GET_FD_BY_ID cmd. 2. new btf_id (and btf_key_id + btf_value_id) in "struct bpf_map_info" Once the BTF ID handler is accessible from userspace, freeing a BTF object has to go through a rcu period. The BPF_BTF_GET_FD_BY_ID cmd can then be done under a rcu_read_lock() instead of taking spin_lock. [Note: A similar rcu usage can be done to the existing bpf_prog_get_fd_by_id() in a follow up patch] When processing the BPF_BTF_GET_FD_BY_ID cmd, refcount_inc_not_zero() is needed because the BTF object could be already in the rcu dead row . btf_get() is removed since its usage is currently limited to btf.c alone. refcount_inc() is used directly instead. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 04 5月, 2018 2 次提交
-
-
由 Daniel Borkmann 提交于
This adds a small BPF helper similar to bpf_skb_load_bytes() that is able to load relative to mac/net header offset from the skb's linear data. Compared to bpf_skb_load_bytes(), it takes a fifth argument namely start_header, which is either BPF_HDR_START_MAC or BPF_HDR_START_NET. This allows for a more flexible alternative compared to LD_ABS/LD_IND with negative offset. It's enabled for tc BPF programs as well as sock filter program types where it's mainly useful in reuseport programs to ease access to lower header data. Reference: https://lists.iovisor.org/pipermail/iovisor-dev/2017-March/000698.htmlSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Björn Töpel 提交于
The xskmap is yet another BPF map, very much inspired by dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application adds AF_XDP sockets into the map, and by using the bpf_redirect_map helper, an XDP program can redirect XDP frames to an AF_XDP socket. Note that a socket that is bound to certain ifindex/queue index will *only* accept XDP frames from that netdev/queue index. If an XDP program tries to redirect from a netdev/queue index other than what the socket is bound to, the frame will not be received on the socket. A socket can reside in multiple maps. v3: Fixed race and simplified code. v2: Removed one indirection in map lookup. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 30 4月, 2018 2 次提交
-
-
由 Quentin Monnet 提交于
Fix formatting (indent) for bpf_get_stack() helper documentation, so that the doc is rendered correctly with the Python script. Fixes: c195651e ("bpf: add bpf_get_stack helper") Cc: Yonghong Song <yhs@fb.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Some edits brought to the last iteration of BPF helper functions documentation introduced an error with RST formatting. As a result, most of one paragraph is rendered in bold text when only the name of a helper should be. Fix it, and fix formatting of another function name in the same paragraph. Fixes: c6b5fb86 ("bpf: add documentation for eBPF helpers (42-50)") Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 29 4月, 2018 2 次提交
-
-
由 Andrey Ignatov 提交于
Helpers may operate on two types of ctx structures: user visible ones (e.g. `struct bpf_sock_ops`) when used in user programs, and kernel ones (e.g. `struct bpf_sock_ops_kern`) in kernel implementation. UAPI documentation must refer to only user visible structures. The patch replaces references to `_kern` structures in BPF helpers description by corresponding user visible structures. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Yonghong Song 提交于
Currently, stackmap and bpf_get_stackid helper are provided for bpf program to get the stack trace. This approach has a limitation though. If two stack traces have the same hash, only one will get stored in the stackmap table, so some stack traces are missing from user perspective. This patch implements a new helper, bpf_get_stack, will send stack traces directly to bpf program. The bpf program is able to see all stack traces, and then can do in-kernel processing or send stack traces to user space through shared map or bpf_perf_event_output. Acked-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 27 4月, 2018 7 次提交
-
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helper from Nikita: - bpf_xdp_adjust_tail() Helper from Eyal: - bpf_skb_get_xfrm_state() v4: - New patch (helpers did not exist yet for previous versions). Cc: Nikita V. Shirokov <tehnerd@tehnerd.com> Cc: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by John: - bpf_redirect_map() - bpf_sk_redirect_map() - bpf_sock_map_update() - bpf_msg_redirect_map() - bpf_msg_apply_bytes() - bpf_msg_cork_bytes() - bpf_msg_pull_data() v4: - bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED", "his" to "this". Also add a paragraph on performance improvement over bpf_redirect() helper. v3: - bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag. - bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag. - bpf_redirect_map(): Fix note on CPU redirection, not fully implemented for generic XDP but supported on native XDP. - bpf_msg_pull_data(): Clarify comment about invalidated verifier checks. Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helpers from Lawrence: - bpf_setsockopt() - bpf_getsockopt() - bpf_sock_ops_cb_flags_set() Helpers from Yonghong: - bpf_perf_event_read_value() - bpf_perf_prog_read_value() Helper from Josef: - bpf_override_return() Helper from Andrey: - bpf_bind() v4: - bpf_perf_event_read_value(): State that this helper should be preferred over bpf_perf_event_read(). v3: - bpf_perf_event_read_value(): Fix time of selection for perf event type in description. Remove occurences of "cores" to avoid confusion with "CPU". - bpf_bind(): Remove last paragraph of description, which was off topic. Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Andrey Ignatov <rdna@fb.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NYonghong Song <yhs@fb.com> [for bpf_perf_event_read_value(), bpf_perf_prog_read_value()] Acked-by: NAndrey Ignatov <rdna@fb.com> [for bpf_bind()] Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helper from Kaixu: - bpf_perf_event_read() Helpers from Martin: - bpf_skb_under_cgroup() - bpf_xdp_adjust_head() Helpers from Sargun: - bpf_probe_write_user() - bpf_current_task_under_cgroup() Helper from Thomas: - bpf_skb_change_head() Helper from Gianluca: - bpf_probe_read_str() Helpers from Chenbo: - bpf_get_socket_cookie() - bpf_get_socket_uid() v4: - bpf_perf_event_read(): State that bpf_perf_event_read_value() should be preferred over this helper. - bpf_skb_change_head(): Clarify comment about invalidated verifier checks. - bpf_xdp_adjust_head(): Clarify comment about invalidated verifier checks. - bpf_probe_write_user(): Add that dst must be a valid user space address. - bpf_get_socket_cookie(): Improve description by making clearer that the cockie belongs to the socket, and state that it remains stable for the life of the socket. v3: - bpf_perf_event_read(): Fix time of selection for perf event type in description. Remove occurences of "cores" to avoid confusion with "CPU". Cc: Martin KaFai Lau <kafai@fb.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Thomas Graf <tgraf@suug.ch> Cc: Gianluca Borello <g.borello@gmail.com> Cc: Chenbo Feng <fengc@google.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> [for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()] Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Daniel: - bpf_get_hash_recalc() - bpf_skb_change_tail() - bpf_skb_pull_data() - bpf_csum_update() - bpf_set_hash_invalid() - bpf_get_numa_node_id() - bpf_set_hash() - bpf_skb_adjust_room() - bpf_xdp_adjust_meta() v4: - bpf_skb_change_tail(): Clarify comment about invalidated verifier checks. - bpf_skb_pull_data(): Clarify the motivation for using this helper or bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for *skb*. Clarify comment about invalidated verifier checks. - bpf_csum_update(): Fix description of checksum (entire packet, not IP checksum). Fix a typo: "header" instead of "helper". - bpf_set_hash_invalid(): Mention bpf_get_hash_recalc(). - bpf_get_numa_node_id(): State that the helper is not restricted to programs attached to sockets. - bpf_skb_adjust_room(): Clarify comment about invalidated verifier checks. - bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier checks. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Daniel: - bpf_get_prandom_u32() - bpf_get_smp_processor_id() - bpf_get_cgroup_classid() - bpf_get_route_realm() - bpf_skb_load_bytes() - bpf_csum_diff() - bpf_skb_get_tunnel_opt() - bpf_skb_set_tunnel_opt() - bpf_skb_change_proto() - bpf_skb_change_type() v4: - bpf_get_prandom_u32(): Warn that the prng is not cryptographically secure. - bpf_get_smp_processor_id(): Fix a typo (case). - bpf_get_cgroup_classid(): Clarify description. Add notes on the helper being limited to cgroup v1, and to egress path. - bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid(). Add a note about usage with TC and advantage of clsact. Fix a typo in return value ("sdb" instead of "skb"). - bpf_skb_load_bytes(): Make explicit loading large data loads it to the eBPF stack. - bpf_csum_diff(): Add a note on seed that can be cascaded. Link to bpf_l3|l4_csum_replace(). - bpf_skb_get_tunnel_opt(): Add a note about usage with "collect metadata" mode, and example of this with Geneve. - bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt() description. - bpf_skb_change_proto(): Mention that the main use case is NAT64. Clarify comment about invalidated verifier checks. v3: - bpf_get_prandom_u32(): Fix helper name :(. Add description, including a note on the internal random state. - bpf_get_smp_processor_id(): Add description, including a note on the processor id remaining stable during program run. - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is required to use the helper. Add a reference to related documentation. State that placing a task in net_cls controller disables cgroup-bpf. - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is required to use this helper. - bpf_skb_load_bytes(): Fix comment on current use cases for the helper. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Alexei: - bpf_get_current_pid_tgid() - bpf_get_current_uid_gid() - bpf_get_current_comm() - bpf_skb_vlan_push() - bpf_skb_vlan_pop() - bpf_skb_get_tunnel_key() - bpf_skb_set_tunnel_key() - bpf_redirect() - bpf_perf_event_output() - bpf_get_stackid() - bpf_get_current_task() v4: - bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add note on bpf_redirect_map() providing better performance. Replace "Save for" with "Except for". - bpf_skb_vlan_push(): Clarify comment about invalidated verifier checks. - bpf_skb_vlan_pop(): Clarify comment about invalidated verifier checks. - bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata" mode, and example tunneling protocols with which it can be used. - bpf_skb_set_tunnel_key(): Add a reference to the description of bpf_skb_get_tunnel_key(). - bpf_perf_event_output(): Specify that, and for what purpose, the helper can be used with programs attached to TC and XDP. v3: - bpf_skb_get_tunnel_key(): Change and improve description and example. - bpf_redirect(): Improve description of BPF_F_INGRESS flag. - bpf_perf_event_output(): Fix first sentence of description. Delete wrong statement on context being evaluated as a struct pt_reg. Remove the long yet incomplete example. - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being configurable. Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-