- 04 5月, 2018 5 次提交
-
-
由 Björn Töpel 提交于
The xskmap is yet another BPF map, very much inspired by dev/cpu/sockmap, and is a holder of AF_XDP sockets. A user application adds AF_XDP sockets into the map, and by using the bpf_redirect_map helper, an XDP program can redirect XDP frames to an AF_XDP socket. Note that a socket that is bound to certain ifindex/queue index will *only* accept XDP frames from that netdev/queue index. If an XDP program tries to redirect from a netdev/queue index other than what the socket is bound to, the frame will not be received on the socket. A socket can reside in multiple maps. v3: Fixed race and simplified code. v2: Removed one indirection in map lookup. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Magnus Karlsson 提交于
Here, the bind syscall is added. Binding an AF_XDP socket, means associating the socket to an umem, a netdev and a queue index. This can be done in two ways. The first way, creating a "socket from scratch". Create the umem using the XDP_UMEM_REG setsockopt and an associated fill queue with XDP_UMEM_FILL_QUEUE. Create the Rx queue using the XDP_RX_QUEUE setsockopt. Call bind passing ifindex and queue index ("channel" in ethtool speak). The second way to bind a socket, is simply skipping the umem/netdev/queue index, and passing another already setup AF_XDP socket. The new socket will then have the same umem/netdev/queue index as the parent so it will share the same umem. You must also set the flags field in the socket address to XDP_SHARED_UMEM. v2: Use PTR_ERR instead of passing error variable explicitly. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> -
由 Björn Töpel 提交于
Another setsockopt (XDP_RX_QUEUE) is added to let the process allocate a queue, where the kernel can pass completed Rx frames from the kernel to user process. The mmapping of the queue is done using the XDP_PGOFF_RX_QUEUE offset. Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Magnus Karlsson 提交于
Here, we add another setsockopt for registered user memory (umem) called XDP_UMEM_FILL_QUEUE. Using this socket option, the process can ask the kernel to allocate a queue (ring buffer) and also mmap it (XDP_UMEM_PGOFF_FILL_QUEUE) into the process. The queue is used to explicitly pass ownership of umem frames from the user process to the kernel. These frames will in a later patch be filled in with Rx packet data by the kernel. v2: Fixed potential crash in xsk_mmap. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Björn Töpel 提交于
In this commit the base structure of the AF_XDP address family is set up. Further, we introduce the abilty register a window of user memory to the kernel via the XDP_UMEM_REG setsockopt syscall. The memory window is viewed by an AF_XDP socket as a set of equally large frames. After a user memory registration all frames are "owned" by the user application, and not the kernel. v2: More robust checks on umem creation and unaccount on error. Call set_page_dirty_lock on cleanup. Simplified xdp_umem_reg. Co-authored-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 30 4月, 2018 2 次提交
-
-
由 Quentin Monnet 提交于
Fix formatting (indent) for bpf_get_stack() helper documentation, so that the doc is rendered correctly with the Python script. Fixes: c195651e ("bpf: add bpf_get_stack helper") Cc: Yonghong Song <yhs@fb.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Some edits brought to the last iteration of BPF helper functions documentation introduced an error with RST formatting. As a result, most of one paragraph is rendered in bold text when only the name of a helper should be. Fix it, and fix formatting of another function name in the same paragraph. Fixes: c6b5fb86 ("bpf: add documentation for eBPF helpers (42-50)") Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 29 4月, 2018 2 次提交
-
-
由 Andrey Ignatov 提交于
Helpers may operate on two types of ctx structures: user visible ones (e.g. `struct bpf_sock_ops`) when used in user programs, and kernel ones (e.g. `struct bpf_sock_ops_kern`) in kernel implementation. UAPI documentation must refer to only user visible structures. The patch replaces references to `_kern` structures in BPF helpers description by corresponding user visible structures. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Yonghong Song 提交于
Currently, stackmap and bpf_get_stackid helper are provided for bpf program to get the stack trace. This approach has a limitation though. If two stack traces have the same hash, only one will get stored in the stackmap table, so some stack traces are missing from user perspective. This patch implements a new helper, bpf_get_stack, will send stack traces directly to bpf program. The bpf program is able to see all stack traces, and then can do in-kernel processing or send stack traces to user space through shared map or bpf_perf_event_output. Acked-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NYonghong Song <yhs@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 27 4月, 2018 11 次提交
-
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helper from Nikita: - bpf_xdp_adjust_tail() Helper from Eyal: - bpf_skb_get_xfrm_state() v4: - New patch (helpers did not exist yet for previous versions). Cc: Nikita V. Shirokov <tehnerd@tehnerd.com> Cc: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by John: - bpf_redirect_map() - bpf_sk_redirect_map() - bpf_sock_map_update() - bpf_msg_redirect_map() - bpf_msg_apply_bytes() - bpf_msg_cork_bytes() - bpf_msg_pull_data() v4: - bpf_redirect_map(): Fix typos: "XDP_ABORT" changed to "XDP_ABORTED", "his" to "this". Also add a paragraph on performance improvement over bpf_redirect() helper. v3: - bpf_sk_redirect_map(): Improve description of BPF_F_INGRESS flag. - bpf_msg_redirect_map(): Improve description of BPF_F_INGRESS flag. - bpf_redirect_map(): Fix note on CPU redirection, not fully implemented for generic XDP but supported on native XDP. - bpf_msg_pull_data(): Clarify comment about invalidated verifier checks. Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helpers from Lawrence: - bpf_setsockopt() - bpf_getsockopt() - bpf_sock_ops_cb_flags_set() Helpers from Yonghong: - bpf_perf_event_read_value() - bpf_perf_prog_read_value() Helper from Josef: - bpf_override_return() Helper from Andrey: - bpf_bind() v4: - bpf_perf_event_read_value(): State that this helper should be preferred over bpf_perf_event_read(). v3: - bpf_perf_event_read_value(): Fix time of selection for perf event type in description. Remove occurences of "cores" to avoid confusion with "CPU". - bpf_bind(): Remove last paragraph of description, which was off topic. Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Yonghong Song <yhs@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: Andrey Ignatov <rdna@fb.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NYonghong Song <yhs@fb.com> [for bpf_perf_event_read_value(), bpf_perf_prog_read_value()] Acked-by: NAndrey Ignatov <rdna@fb.com> [for bpf_bind()] Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions: Helper from Kaixu: - bpf_perf_event_read() Helpers from Martin: - bpf_skb_under_cgroup() - bpf_xdp_adjust_head() Helpers from Sargun: - bpf_probe_write_user() - bpf_current_task_under_cgroup() Helper from Thomas: - bpf_skb_change_head() Helper from Gianluca: - bpf_probe_read_str() Helpers from Chenbo: - bpf_get_socket_cookie() - bpf_get_socket_uid() v4: - bpf_perf_event_read(): State that bpf_perf_event_read_value() should be preferred over this helper. - bpf_skb_change_head(): Clarify comment about invalidated verifier checks. - bpf_xdp_adjust_head(): Clarify comment about invalidated verifier checks. - bpf_probe_write_user(): Add that dst must be a valid user space address. - bpf_get_socket_cookie(): Improve description by making clearer that the cockie belongs to the socket, and state that it remains stable for the life of the socket. v3: - bpf_perf_event_read(): Fix time of selection for perf event type in description. Remove occurences of "cores" to avoid confusion with "CPU". Cc: Martin KaFai Lau <kafai@fb.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Thomas Graf <tgraf@suug.ch> Cc: Gianluca Borello <g.borello@gmail.com> Cc: Chenbo Feng <fengc@google.com> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> [for bpf_skb_under_cgroup(), bpf_xdp_adjust_head()] Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Daniel: - bpf_get_hash_recalc() - bpf_skb_change_tail() - bpf_skb_pull_data() - bpf_csum_update() - bpf_set_hash_invalid() - bpf_get_numa_node_id() - bpf_set_hash() - bpf_skb_adjust_room() - bpf_xdp_adjust_meta() v4: - bpf_skb_change_tail(): Clarify comment about invalidated verifier checks. - bpf_skb_pull_data(): Clarify the motivation for using this helper or bpf_skb_load_bytes(), on non-linear buffers. Fix RST formatting for *skb*. Clarify comment about invalidated verifier checks. - bpf_csum_update(): Fix description of checksum (entire packet, not IP checksum). Fix a typo: "header" instead of "helper". - bpf_set_hash_invalid(): Mention bpf_get_hash_recalc(). - bpf_get_numa_node_id(): State that the helper is not restricted to programs attached to sockets. - bpf_skb_adjust_room(): Clarify comment about invalidated verifier checks. - bpf_xdp_adjust_meta(): Clarify comment about invalidated verifier checks. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Daniel: - bpf_get_prandom_u32() - bpf_get_smp_processor_id() - bpf_get_cgroup_classid() - bpf_get_route_realm() - bpf_skb_load_bytes() - bpf_csum_diff() - bpf_skb_get_tunnel_opt() - bpf_skb_set_tunnel_opt() - bpf_skb_change_proto() - bpf_skb_change_type() v4: - bpf_get_prandom_u32(): Warn that the prng is not cryptographically secure. - bpf_get_smp_processor_id(): Fix a typo (case). - bpf_get_cgroup_classid(): Clarify description. Add notes on the helper being limited to cgroup v1, and to egress path. - bpf_get_route_realm(): Add comparison with bpf_get_cgroup_classid(). Add a note about usage with TC and advantage of clsact. Fix a typo in return value ("sdb" instead of "skb"). - bpf_skb_load_bytes(): Make explicit loading large data loads it to the eBPF stack. - bpf_csum_diff(): Add a note on seed that can be cascaded. Link to bpf_l3|l4_csum_replace(). - bpf_skb_get_tunnel_opt(): Add a note about usage with "collect metadata" mode, and example of this with Geneve. - bpf_skb_set_tunnel_opt(): Add a link to bpf_skb_get_tunnel_opt() description. - bpf_skb_change_proto(): Mention that the main use case is NAT64. Clarify comment about invalidated verifier checks. v3: - bpf_get_prandom_u32(): Fix helper name :(. Add description, including a note on the internal random state. - bpf_get_smp_processor_id(): Add description, including a note on the processor id remaining stable during program run. - bpf_get_cgroup_classid(): State that CONFIG_CGROUP_NET_CLASSID is required to use the helper. Add a reference to related documentation. State that placing a task in net_cls controller disables cgroup-bpf. - bpf_get_route_realm(): State that CONFIG_CGROUP_NET_CLASSID is required to use this helper. - bpf_skb_load_bytes(): Fix comment on current use cases for the helper. Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> -
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Alexei: - bpf_get_current_pid_tgid() - bpf_get_current_uid_gid() - bpf_get_current_comm() - bpf_skb_vlan_push() - bpf_skb_vlan_pop() - bpf_skb_get_tunnel_key() - bpf_skb_set_tunnel_key() - bpf_redirect() - bpf_perf_event_output() - bpf_get_stackid() - bpf_get_current_task() v4: - bpf_redirect(): Fix typo: "XDP_ABORT" changed to "XDP_ABORTED". Add note on bpf_redirect_map() providing better performance. Replace "Save for" with "Except for". - bpf_skb_vlan_push(): Clarify comment about invalidated verifier checks. - bpf_skb_vlan_pop(): Clarify comment about invalidated verifier checks. - bpf_skb_get_tunnel_key(): Add notes on tunnel_id, "collect metadata" mode, and example tunneling protocols with which it can be used. - bpf_skb_set_tunnel_key(): Add a reference to the description of bpf_skb_get_tunnel_key(). - bpf_perf_event_output(): Specify that, and for what purpose, the helper can be used with programs attached to TC and XDP. v3: - bpf_skb_get_tunnel_key(): Change and improve description and example. - bpf_redirect(): Improve description of BPF_F_INGRESS flag. - bpf_perf_event_output(): Fix first sentence of description. Delete wrong statement on context being evaluated as a struct pt_reg. Remove the long yet incomplete example. - bpf_get_stackid(): Add a note about PERF_MAX_STACK_DEPTH being configurable. Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Add documentation for eBPF helper functions to bpf.h user header file. This documentation can be parsed with the Python script provided in another commit of the patch series, in order to provide a RST document that can later be converted into a man page. The objective is to make the documentation easily understandable and accessible to all eBPF developers, including beginners. This patch contains descriptions for the following helper functions, all written by Alexei: - bpf_map_lookup_elem() - bpf_map_update_elem() - bpf_map_delete_elem() - bpf_probe_read() - bpf_ktime_get_ns() - bpf_trace_printk() - bpf_skb_store_bytes() - bpf_l3_csum_replace() - bpf_l4_csum_replace() - bpf_tail_call() - bpf_clone_redirect() v4: - bpf_map_lookup_elem(): Add "const" qualifier for key. - bpf_map_update_elem(): Add "const" qualifier for key and value. - bpf_map_lookup_elem(): Add "const" qualifier for key. - bpf_skb_store_bytes(): Clarify comment about invalidated verifier checks. - bpf_l3_csum_replace(): Mention L3 instead of just IP, and add a note about bpf_csum_diff(). - bpf_l4_csum_replace(): Mention L4 instead of just TCP/UDP, and add a note about bpf_csum_diff(). - bpf_tail_call(): Bring minor edits to description. - bpf_clone_redirect(): Add a note about the relation with bpf_redirect(). Also clarify comment about invalidated verifier checks. v3: - bpf_map_lookup_elem(): Fix description of restrictions for flags related to the existence of the entry. - bpf_trace_printk(): State that trace_pipe can be configured. Fix return value in case an unknown format specifier is met. Add a note on kernel log notice when the helper is used. Edit example. - bpf_tail_call(): Improve comment on stack inheritance. - bpf_clone_redirect(): Improve description of BPF_F_INGRESS flag. Cc: Alexei Starovoitov <ast@kernel.org> Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Quentin Monnet 提交于
Remove previous "overview" of eBPF helpers from user bpf.h header. Replace it by a comment explaining how to process the new documentation (to come in following patches) with a Python script to produce RST, then man page documentation. Also add the aforementioned Python script under scripts/. It is used to process include/uapi/linux/bpf.h and to extract helper descriptions, to turn it into a RST document that can further be processed with rst2man to produce a man page. The script takes one "--filename <path/to/file>" option. If the script is launched from scripts/ in the kernel root directory, it should be able to find the location of the header to parse, and "--filename <path/to/file>" is then optional. If it cannot find the file, then the option becomes mandatory. RST-formatted documentation is printed to standard output. Typical workflow for producing the final man page would be: $ ./scripts/bpf_helpers_doc.py \ --filename include/uapi/linux/bpf.h > /tmp/bpf-helpers.rst $ rst2man /tmp/bpf-helpers.rst > /tmp/bpf-helpers.7 $ man /tmp/bpf-helpers.7 Note that the tool kernel-doc cannot be used to document eBPF helpers, whose signatures are not available directly in the header files (pre-processor directives are used to produce them at the beginning of the compilation process). v4: - Also remove overviews for newly added bpf_xdp_adjust_tail() and bpf_skb_get_xfrm_state(). - Remove vague statement about what helpers are restricted to GPL programs in "LICENSE" section for man page footer. - Replace license boilerplate with SPDX tag for Python script. v3: - Change license for man page. - Remove "for safety reasons" from man page header text. - Change "packets metadata" to "packets" in man page header text. - Move and fix comment on helpers introducing no overhead. - Remove "NOTES" section from man page footer. - Add "LICENSE" section to man page footer. - Edit description of file include/uapi/linux/bpf.h in man page footer. Signed-off-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> -
由 Jiri Olsa 提交于
Adding gpl_compatible flag to struct bpf_prog_info so it can be dumped via bpf_prog_get_info_by_fd and displayed via bpftool progs dump. Alexei noticed 4-byte hole in struct bpf_prog_info, so we put the u32 flags field in there, and we can keep adding bit fields in there without breaking user space. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Willem de Bruijn 提交于
Support generic segmentation offload for udp datagrams. Callers can concatenate and send at once the payload of multiple datagrams with the same destination. To set segment size, the caller sets socket option UDP_SEGMENT to the length of each discrete payload. This value must be smaller than or equal to the relevant MTU. A follow-up patch adds cmsg UDP_SEGMENT to specify segment size on a per send call basis. Total byte length may then exceed MTU. If not an exact multiple of segment size, the last segment will be shorter. The implementation adds a gso_size field to the udp socket, ip(v6) cmsg cookie and inet_cork structure to be able to set the value at setsockopt or cmsg time and to work with both lockless and corked paths. Initial benchmark numbers show UDP GSO about as expensive as TCP GSO. tcp tso 3197 MB/s 54232 msg/s 54232 calls/s 6,457,754,262 cycles tcp gso 1765 MB/s 29939 msg/s 29939 calls/s 11,203,021,806 cycles tcp without tso/gso * 739 MB/s 12548 msg/s 12548 calls/s 11,205,483,630 cycles udp 876 MB/s 14873 msg/s 624666 calls/s 11,205,777,429 cycles udp gso 2139 MB/s 36282 msg/s 36282 calls/s 11,204,374,561 cycles [*] after reverting commit 0a6b2a1d ("tcp: switch to GSO being always on") Measured total system cycles ('-a') for one core while pinning both the network receive path and benchmark process to that core: perf stat -a -C 12 -e cycles \ ./udpgso_bench_tx -C 12 -4 -D "$DST" -l 4 Note the reduction in calls/s with GSO. Bytes per syscall drops increases from 1470 to 61818. Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 4月, 2018 1 次提交
-
-
由 Eyal Birger 提交于
This commit introduces a helper which allows fetching xfrm state parameters by eBPF programs attached to TC. Prototype: bpf_skb_get_xfrm_state(skb, index, xfrm_state, size, flags) skb: pointer to skb index: the index in the skb xfrm_state secpath array xfrm_state: pointer to 'struct bpf_xfrm_state' size: size of 'struct bpf_xfrm_state' flags: reserved for future extensions The helper returns 0 on success. Non zero if no xfrm state at the index is found - or non exists at all. struct bpf_xfrm_state currently includes the SPI, peer IPv4/IPv6 address and the reqid; it can be further extended by adding elements to its end - indicating the populated fields by the 'size' argument - keeping backwards compatibility. Typical usage: struct bpf_xfrm_state x = {}; bpf_skb_get_xfrm_state(skb, 0, &x, sizeof(x), 0); ... Signed-off-by: NEyal Birger <eyal.birger@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 23 4月, 2018 1 次提交
-
-
由 Martin KaFai Lau 提交于
This patch cleans up btf.h in uapi: 1) Rename "name" to "name_off" to better reflect it is an offset to the string section instead of a char array. 2) Remove unused value BTF_FLAGS_COMPR and BTF_MAGIC_SWAP Suggested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 20 4月, 2018 6 次提交
-
-
In previous commit, we changed the default emulated MTU for UDP bearers to 14k. This commit adds the functionality to set/change the default value by configuring new MTU for UDP media. UDP bearer(s) have to be disabled and enabled back for the new MTU to take effect. Acked-by: NYing Xue <ying.xue@windriver.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NGhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
Currently, all bearers are configured with MTU value same as the underlying L2 device. However, in case of bearers with media type UDP, higher throughput is possible with a fixed and higher emulated MTU value than adapting to the underlying L2 MTU. In this commit, we introduce a parameter mtu in struct tipc_media and a default value is set for UDP. A default value of 14k was determined by experimentation and found to have a higher throughput than 16k. MTU for UDP bearers are assigned the above set value of media MTU. Acked-by: NYing Xue <ying.xue@windriver.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NGhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Martin KaFai Lau 提交于
This patch adds pretty print support to the basic arraymap. Support for other bpf maps can be added later. This patch adds new attrs to the BPF_MAP_CREATE command to allow specifying the btf_fd, btf_key_id and btf_value_id. The BPF_MAP_CREATE can then associate the btf to the map if the creating map supports BTF. A BTF supported map needs to implement two new map ops, map_seq_show_elem() and map_check_btf(). This patch has implemented these new map ops for the basic arraymap. It also adds file_operations, bpffs_map_fops, to the pinned map such that the pinned map can be opened and read. After that, the user has an intuitive way to do "cat bpffs/pathto/a-pinned-map" instead of getting an error. bpffs_map_fops should not be extended further to support other operations. Other operations (e.g. write/key-lookup...) should be realized by the userspace tools (e.g. bpftool) through the BPF_OBJ_GET_INFO_BY_FD, map's lookup/update interface...etc. Follow up patches will allow the userspace to obtain the BTF from a map-fd. Here is a sample output when reading a pinned arraymap with the following map's value: struct map_value { int count_a; int count_b; }; cat /sys/fs/bpf/pinned_array_map: 0: {1,2} 1: {3,4} 2: {5,6} ... Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> -
由 Martin KaFai Lau 提交于
This patch adds a BPF_BTF_LOAD command which 1) loads and verifies the BTF (implemented in earlier patches) 2) returns a BTF fd to userspace. In the next patch, the BTF fd can be specified during BPF_MAP_CREATE. It currently limits to CAP_SYS_ADMIN. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Martin KaFai Lau 提交于
This patch introduces BPF type Format (BTF). BTF (BPF Type Format) is the meta data format which describes the data types of BPF program/map. Hence, it basically focus on the C programming language which the modern BPF is primary using. The first use case is to provide a generic pretty print capability for a BPF map. BTF has its root from CTF (Compact C-Type format). To simplify the handling of BTF data, BTF removes the differences between small and big type/struct-member. Hence, BTF consistently uses u32 instead of supporting both "one u16" and "two u32 (+padding)" in describing type and struct-member. It also raises the number of types (and functions) limit from 0x7fff to 0x7fffffff. Due to the above changes, the format is not compatible to CTF. Hence, BTF starts with a new BTF_MAGIC and version number. This patch does the first verification pass to the BTF. The first pass checks: 1. meta-data size (e.g. It does not go beyond the total btf's size) 2. name_offset is valid 3. Each BTF_KIND (e.g. int, enum, struct....) does its own check of its meta-data. Some other checks, like checking a struct's member is referring to a valid type, can only be done in the second pass. The second verification pass will be implemented in the next patch. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NAlexei Starovoitov <ast@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Yuchung Cheng 提交于
Export data delivered and delivered with CE marks to 1) SNMP TCPDelivered and TCPDeliveredCE 2) getsockopt(TCP_INFO) 3) Timestamping API SOF_TIMESTAMPING_OPT_STATS Note that for SCM_TSTAMP_ACK, the delivery info in SOF_TIMESTAMPING_OPT_STATS is reported before the info was fully updated on the ACK. These stats help application monitor TCP delivery and ECN status on per host, per connection, even per message level. Signed-off-by: NYuchung Cheng <ycheng@google.com> Reviewed-by: NNeal Cardwell <ncardwell@google.com> Reviewed-by: NSoheil Hassas Yeganeh <soheil@google.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 4月, 2018 1 次提交
-
-
由 Nikita V. Shirokov 提交于
Adding new bpf helper which would allow us to manipulate xdp's data_end pointer, and allow us to reduce packet's size indended use case: to generate ICMP messages from XDP context, where such message would contain truncated original packet. Signed-off-by: NNikita V. Shirokov <tehnerd@tehnerd.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 18 4月, 2018 1 次提交
-
-
由 Hangbin Liu 提交于
Like tos inherit, ttl inherit should also means inherit the inner protocol's ttl values, which actually not implemented in vxlan yet. But we could not treat ttl == 0 as "use the inner TTL", because that would be used also when the "ttl" option is not specified and that would be a behavior change, and breaking real use cases. So add a different attribute IFLA_VXLAN_TTL_INHERIT when "ttl inherit" is specified with ip cmd. Reported-by: NJianlin Shi <jishi@redhat.com> Suggested-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 4月, 2018 2 次提交
-
-
由 Alexey Budankov 提交于
Store preempting context switch out event into Perf trace as a part of PERF_RECORD_SWITCH[_CPU_WIDE] record. Percentage of preempting and non-preempting context switches help understanding the nature of workloads (CPU or IO bound) that are running on a machine; The event is treated as preemption one when task->state value of the thread being switched out is TASK_RUNNING. Event type encoding is implemented using PERF_RECORD_MISC_SWITCH_OUT_PREEMPT bit; Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Tested-by: NArnaldo Carvalho de Melo <acme@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/9ff84e83-a0ca-dd82-a6d0-cb951689be74@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
-
由 Heiner Kallweit 提交于
This patch adds missing values for the max read request size. E.g. network driver r8169 uses a value of 4K. Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 4月, 2018 1 次提交
-
-
由 Theodore Ts'o 提交于
Add a new ioctl which forces the the crng to be reseeded. Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
-
- 12 4月, 2018 6 次提交
-
-
由 Masahiro Yamada 提交于
Minor cleanups available by _UL and _ULL. Link: http://lkml.kernel.org/r/1519301715-31798-5-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Russell King <rmk+kernel@armlinux.org.uk> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masahiro Yamada 提交于
ARM, ARM64 and UniCore32 duplicate the definition of UL(): #define UL(x) _AC(x, UL) This is not actually arch-specific, so it will be useful to move it to a common header. Currently, we only have the uapi variant for linux/const.h, so I am creating include/linux/const.h. I also added _UL(), _ULL() and ULL() because _AC() is mostly used in the form either _AC(..., UL) or _AC(..., ULL). I expect they will be replaced in follow-up cleanups. The underscore-prefixed ones should be used for exported headers. Link: http://lkml.kernel.org/r/1519301715-31798-4-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NGuan Xuetao <gxt@mprc.pku.edu.cn> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: NRussell King <rmk+kernel@armlinux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masahiro Yamada 提交于
Patch series "linux/const.h: cleanups of macros such as UL(), _BITUL(), BIT() etc", v3. ARM, ARM64, UniCore32 define UL() as a shorthand of _AC(..., UL). More architectures may introduce it in the future. UL() is arch-agnostic, and useful. So let's move it to include/linux/const.h Currently, <asm/memory.h> must be included to use UL(). It pulls in more bloats just for defining some bit macros. I posted V2 one year ago. The previous posts are: https://patchwork.kernel.org/patch/9498273/ https://patchwork.kernel.org/patch/9498275/ https://patchwork.kernel.org/patch/9498269/ https://patchwork.kernel.org/patch/9498271/ At that time, what blocked this series was a comment from David Howells: You need to be very careful doing this. Some userspace stuff depends on the guard macro names on the kernel header files. (https://patchwork.kernel.org/patch/9498275/) Looking at the code closer, I noticed this is not a problem. See the following line. https://github.com/torvalds/linux/blob/v4.16-rc2/scripts/headers_install.sh#L40 scripts/headers_install.sh rips off _UAPI prefix from guard macro names. I ran "make headers_install" and confirmed the result is what I expect. So, we can prefix the include guard of include/uapi/linux/const.h, and add a new include/linux/const.h. This patch (of 4): I am going to add include/linux/const.h for the kernel space. Add _UAPI to the include guard of include/uapi/linux/const.h to prepare for that. Please notice the guard name of the exported one will be kept as-is. So, this commit has no impact to the userspace even if some userspace stuff depends on the guard macro names. scripts/headers_install.sh processes exported headers by SED, and rips off "_UAPI" from guard macro names. #ifndef _UAPI_LINUX_CONST_H #define _UAPI_LINUX_CONST_H will be turned into #ifndef _LINUX_CONST_H #define _LINUX_CONST_H Link: http://lkml.kernel.org/r/1519301715-31798-2-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Cc: David Howells <dhowells@redhat.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Guan Xuetao <gxt@mprc.pku.edu.cn> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
There is a permission discrepancy when consulting msq ipc object metadata between /proc/sysvipc/msg (0444) and the MSG_STAT shmctl command. The later does permission checks for the object vs S_IRUGO. As such there can be cases where EACCESS is returned via syscall but the info is displayed anyways in the procfs files. While this might have security implications via info leaking (albeit no writing to the msq metadata), this behavior goes way back and showing all the objects regardless of the permissions was most likely an overlook - so we are stuck with it. Furthermore, modifying either the syscall or the procfs file can cause userspace programs to break (ie ipcs). Some applications require getting the procfs info (without root privileges) and can be rather slow in comparison with a syscall -- up to 500x in some reported cases for shm. This patch introduces a new MSG_STAT_ANY command such that the msq ipc object permissions are ignored, and only audited instead. In addition, I've left the lsm security hook checks in place, as if some policy can block the call, then the user has no other choice than just parsing the procfs file. Link: http://lkml.kernel.org/r/20180215162458.10059-4-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Reported-by: NRobert Kettler <robert.kettler@outlook.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
There is a permission discrepancy when consulting shm ipc object metadata between /proc/sysvipc/sem (0444) and the SEM_STAT semctl command. The later does permission checks for the object vs S_IRUGO. As such there can be cases where EACCESS is returned via syscall but the info is displayed anyways in the procfs files. While this might have security implications via info leaking (albeit no writing to the sma metadata), this behavior goes way back and showing all the objects regardless of the permissions was most likely an overlook - so we are stuck with it. Furthermore, modifying either the syscall or the procfs file can cause userspace programs to break (ie ipcs). Some applications require getting the procfs info (without root privileges) and can be rather slow in comparison with a syscall -- up to 500x in some reported cases for shm. This patch introduces a new SEM_STAT_ANY command such that the sem ipc object permissions are ignored, and only audited instead. In addition, I've left the lsm security hook checks in place, as if some policy can block the call, then the user has no other choice than just parsing the procfs file. Link: http://lkml.kernel.org/r/20180215162458.10059-3-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Reported-by: NRobert Kettler <robert.kettler@outlook.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Davidlohr Bueso 提交于
Patch series "sysvipc: introduce STAT_ANY commands", v2. The following patches adds the discussed (see [1]) new command for shm as well as for sems and msq as they are subject to the same discrepancies for ipc object permission checks between the syscall and via procfs. These new commands are justified in that (1) we are stuck with this semantics as changing syscall and procfs can break userland; and (2) some users can benefit from performance (for large amounts of shm segments, for example) from not having to parse the procfs interface. Once merged, I will submit the necesary manpage updates. But I'm thinking something like: : diff --git a/man2/shmctl.2 b/man2/shmctl.2 : index 7bb503999941..bb00bbe21a57 100644 : --- a/man2/shmctl.2 : +++ b/man2/shmctl.2 : @@ -41,6 +41,7 @@ : .\" 2005-04-25, mtk -- noted aberrant Linux behavior w.r.t. new : .\" attaches to a segment that has already been marked for deletion. : .\" 2005-08-02, mtk: Added IPC_INFO, SHM_INFO, SHM_STAT descriptions. : +.\" 2018-02-13, dbueso: Added SHM_STAT_ANY description. : .\" : .TH SHMCTL 2 2017-09-15 "Linux" "Linux Programmer's Manual" : .SH NAME : @@ -242,6 +243,18 @@ However, the : argument is not a segment identifier, but instead an index into : the kernel's internal array that maintains information about : all shared memory segments on the system. : +.TP : +.BR SHM_STAT_ANY " (Linux-specific)" : +Return a : +.I shmid_ds : +structure as for : +.BR SHM_STAT . : +However, the : +.I shm_perm.mode : +is not checked for read access for : +.IR shmid , : +resembing the behaviour of : +/proc/sysvipc/shm. : .PP : The caller can prevent or allow swapping of a shared : memory segment with the following \fIcmd\fP values: : @@ -287,7 +300,7 @@ operation returns the index of the highest used entry in the : kernel's internal array recording information about all : shared memory segments. : (This information can be used with repeated : -.B SHM_STAT : +.B SHM_STAT/SHM_STAT_ANY : operations to obtain information about all shared memory segments : on the system.) : A successful : @@ -328,7 +341,7 @@ isn't accessible. : \fIshmid\fP is not a valid identifier, or \fIcmd\fP : is not a valid command. : Or: for a : -.B SHM_STAT : +.B SHM_STAT/SHM_STAT_ANY : operation, the index value specified in : .I shmid : referred to an array slot that is currently unused. This patch (of 3): There is a permission discrepancy when consulting shm ipc object metadata between /proc/sysvipc/shm (0444) and the SHM_STAT shmctl command. The later does permission checks for the object vs S_IRUGO. As such there can be cases where EACCESS is returned via syscall but the info is displayed anyways in the procfs files. While this might have security implications via info leaking (albeit no writing to the shm metadata), this behavior goes way back and showing all the objects regardless of the permissions was most likely an overlook - so we are stuck with it. Furthermore, modifying either the syscall or the procfs file can cause userspace programs to break (ie ipcs). Some applications require getting the procfs info (without root privileges) and can be rather slow in comparison with a syscall -- up to 500x in some reported cases. This patch introduces a new SHM_STAT_ANY command such that the shm ipc object permissions are ignored, and only audited instead. In addition, I've left the lsm security hook checks in place, as if some policy can block the call, then the user has no other choice than just parsing the procfs file. [1] https://lkml.org/lkml/2017/12/19/220 Link: http://lkml.kernel.org/r/20180215162458.10059-2-dave@stgolabs.netSigned-off-by: NDavidlohr Bueso <dbueso@suse.de> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Manfred Spraul <manfred@colorfullife.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Robert Kettler <robert.kettler@outlook.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 4月, 2018 1 次提交
-
-
由 Jonathan Helman 提交于
Export the number of successful and failed hugetlb page allocations via the virtio balloon driver. These 2 counts come directly from the vm_events HTLB_BUDDY_PGALLOC and HTLB_BUDDY_PGALLOC_FAIL. Signed-off-by: NJonathan Helman <jonathan.helman@oracle.com> Signed-off-by: NMichael S. Tsirkin <mst@redhat.com> Reviewed-by: NJason Wang <jasowang@redhat.com>
-