- 20 1月, 2019 1 次提交
-
-
由 Willem de Bruijn 提交于
Syzkaller was able to construct a packet of negative length by redirecting from bpf_prog_test_run_skb with BPF_PROG_TYPE_LWT_XMIT: BUG: KASAN: slab-out-of-bounds in memcpy include/linux/string.h:345 [inline] BUG: KASAN: slab-out-of-bounds in skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline] BUG: KASAN: slab-out-of-bounds in __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395 Read of size 4294967282 at addr ffff8801d798009c by task syz-executor2/12942 kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412 check_memory_region_inline mm/kasan/kasan.c:260 [inline] check_memory_region+0x13e/0x1b0 mm/kasan/kasan.c:267 memcpy+0x23/0x50 mm/kasan/kasan.c:302 memcpy include/linux/string.h:345 [inline] skb_copy_from_linear_data include/linux/skbuff.h:3421 [inline] __pskb_copy_fclone+0x2dd/0xeb0 net/core/skbuff.c:1395 __pskb_copy include/linux/skbuff.h:1053 [inline] pskb_copy include/linux/skbuff.h:2904 [inline] skb_realloc_headroom+0xe7/0x120 net/core/skbuff.c:1539 ipip6_tunnel_xmit net/ipv6/sit.c:965 [inline] sit_tunnel_xmit+0xe1b/0x30d0 net/ipv6/sit.c:1029 __netdev_start_xmit include/linux/netdevice.h:4325 [inline] netdev_start_xmit include/linux/netdevice.h:4334 [inline] xmit_one net/core/dev.c:3219 [inline] dev_hard_start_xmit+0x295/0xc90 net/core/dev.c:3235 __dev_queue_xmit+0x2f0d/0x3950 net/core/dev.c:3805 dev_queue_xmit+0x17/0x20 net/core/dev.c:3838 __bpf_tx_skb net/core/filter.c:2016 [inline] __bpf_redirect_common net/core/filter.c:2054 [inline] __bpf_redirect+0x5cf/0xb20 net/core/filter.c:2061 ____bpf_clone_redirect net/core/filter.c:2094 [inline] bpf_clone_redirect+0x2f6/0x490 net/core/filter.c:2066 bpf_prog_41f2bcae09cd4ac3+0xb25/0x1000 The generated test constructs a packet with mac header, network header, skb->data pointing to network header and skb->len 0. Redirecting to a sit0 through __bpf_redirect_no_mac pulls the mac length, even though skb->data already is at skb->network_header. bpf_prog_test_run_skb has already pulled it as LWT_XMIT !is_l2. Update the offset calculation to pull only if skb->data differs from skb->network_header, which is not true in this case. The test itself can be run only from commit 1cf1cae9 ("bpf: introduce BPF_PROG_TEST_RUN command"), but the same type of packets with skb at network header could already be built from lwt xmit hooks, so this fix is more relevant to that commit. Also set the mac header on redirect from LWT_XMIT, as even after this change to __bpf_redirect_no_mac that field is expected to be set, but is not yet in ip_finish_output2. Fixes: 3a0af8fd ("bpf: BPF for lightweight tunnel infrastructure") Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 18 1月, 2019 2 次提交
-
-
由 Yuchung Cheng 提交于
If sch_fq packet scheduler is not used, TCP can fallback to internal pacing, but this requires sk_pacing_status to be properly set. Fixes: 8c4b4c7e ("bpf: Add setsockopt helper function to bpf") Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Lawrence Brakmo <brakmo@fb.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Peter Oskolkov 提交于
In sock_setsockopt() (net/core/sock.h), when SO_MARK option is used to change sk_mark, sk_dst_reset(sk) is called. The same should be done in bpf_setsockopt(). Fixes: 8c4b4c7e ("bpf: Add setsockopt helper function to bpf") Reported-by: NMaciej Żenczykowski <maze@google.com> Signed-off-by: NPeter Oskolkov <posk@google.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Reviewed-by: NMaciej Żenczykowski <maze@google.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 17 1月, 2019 1 次提交
-
-
由 Mathieu Malaterre 提交于
There is a plan to build the kernel with -Wimplicit-fallthrough and this place in the code produced a warnings (W=1). To preserve as much of the existing comment only change a ‘:’ into a ‘,’. This is enough change, to match the regular expression expected by GCC. This commit removes the following warning: net/core/filter.c:5310:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Signed-off-by: NMathieu Malaterre <malat@debian.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 10 1月, 2019 1 次提交
-
-
由 Yuchung Cheng 提交于
The existing BPF TCP initial congestion window (TCP_BPF_IW) does not to work on (active) Fast Open sender. This is because it changes the (initial) window only if data_segs_out is zero -- but data_segs_out is also incremented on SYN-data. This patch fixes the issue by proerly accounting for SYN-data additionally. Fixes: fc747810 ("bpf: Adds support for setting initial cwnd") Signed-off-by: NYuchung Cheng <ycheng@google.com> Reviewed-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NLawrence Brakmo <brakmo@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 24 12月, 2018 1 次提交
-
-
由 David S. Miller 提交于
This reverts: 50d52586 ("net: core: Fix Spectre v1 vulnerability") d686026b ("phonet: af_phonet: Fix Spectre v1 vulnerability") a95386f0 ("nfc: af_nfc: Fix Spectre v1 vulnerability") a3ac5817 ("can: af_can: Fix Spectre v1 vulnerability") After some discussion with Alexei Starovoitov these all seem to be completely unnecessary. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 23 12月, 2018 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
flen is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: net/core/filter.c:1101 bpf_check_classic() warn: potential spectre issue 'filter' [w] Fix this by sanitizing flen before using it to index filter at line 1101: switch (filter[flen - 1].code) { and through pc at line 1040: const struct sock_filter *ftest = &filter[pc]; Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 12月, 2018 2 次提交
-
-
由 John Fastabend 提交于
Enforce comment on structure layout dependency with a BUILD_BUG_ON to ensure the condition is maintained. Suggested-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 John Fastabend 提交于
The check for max offset in sk_msg_is_valid_access uses sizeof() which is incorrect because it would allow accessing possibly past the end of the struct in the padded case. Further, it doesn't preclude accessing any padding that may be added in the middle of a struct. All told this makes it fragile to rely on. To fix this explicitly check offsets with fields using the bpf_ctx_range() and bpf_ctx_range_till() macros. For reference the current structure layout looks as follows (reported by pahole) struct sk_msg_md { union { void * data; /* 8 */ }; /* 0 8 */ union { void * data_end; /* 8 */ }; /* 8 8 */ __u32 family; /* 16 4 */ __u32 remote_ip4; /* 20 4 */ __u32 local_ip4; /* 24 4 */ __u32 remote_ip6[4]; /* 28 16 */ __u32 local_ip6[4]; /* 44 16 */ __u32 remote_port; /* 60 4 */ /* --- cacheline 1 boundary (64 bytes) --- */ __u32 local_port; /* 64 4 */ __u32 size; /* 68 4 */ /* size: 72, cachelines: 2, members: 10 */ /* last cacheline: 8 bytes */ }; So there should be no padding at the moment but fixing this now prevents future errors. Reported-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 19 12月, 2018 1 次提交
-
-
由 John Fastabend 提交于
This adds metadata to sk_msg_md for BPF programs to read the sk_msg size. When the SK_MSG program is running under an application that is using sendfile the data is not copied into sk_msg buffers by default. Rather the BPF program uses sk_msg_pull_data to read the bytes in. This avoids doing the costly memcopy instructions when they are not in fact needed. However, if we don't know the size of the sk_msg we have to guess if needed bytes are available by doing a pull request which may fail. By including the size of the sk_msg BPF programs can check the size before issuing sk_msg_pull_data requests. Additionally, the same applies for sendmsg calls when the application provides multiple iovs. Here the BPF program needs to pull in data to update data pointers but its not clear where the data ends without a size parameter. In many cases "guessing" is not easy to do and results in multiple calls to pull and without bounded loops everything gets fairly tricky. Clean this up by including a u32 size field. Note, all writes into sk_msg_md are rejected already from sk_msg_is_valid_access so nothing additional is needed there. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 04 12月, 2018 1 次提交
-
-
由 Petar Penkov 提交于
The pkt_len field in qdisc_skb_cb stores the skb length as it will appear on the wire after segmentation. For byte accounting, this value is more accurate than skb->len. It is computed on entry to the TC layer, so only valid there. Allow read access to this field from BPF tc classifier and action programs. The implementation is analogous to tc_classid, aside from restricting to read access. To distinguish it from skb->len and self-describe export as wire_len. Changes v1->v2 - Rename pkt_len to wire_len Signed-off-by: NPetar Penkov <ppenkov@google.com> Signed-off-by: NVlad Dumitrescu <vladum@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 01 12月, 2018 2 次提交
-
-
由 Joe Stringer 提交于
David Ahern and Nicolas Dichtel report that the handling of the netns id 0 is incorrect for the BPF socket lookup helpers: rather than finding the netns with id 0, it is resolving to the current netns. This renders the netns_id 0 inaccessible. To fix this, adjust the API for the netns to treat all negative s32 values as a lookup in the current netns (including u64 values which when truncated to s32 become negative), while any values with a positive value in the signed 32-bit integer space would result in a lookup for a socket in the netns corresponding to that id. As before, if the netns with that ID does not exist, no socket will be found. Any netns outside of these ranges will fail to find a corresponding socket, as those values are reserved for future usage. Signed-off-by: NJoe Stringer <joe@wand.net.nz> Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Acked-by: NJoey Pabalinas <joeypabalinas@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Daniel Borkmann 提交于
Currently, pointer offsets in three BPF context structures are broken in two scenarios: i) 32 bit compiled applications running on 64 bit kernels, and ii) LLVM compiled BPF programs running on 32 bit kernels. The latter is due to BPF target machine being strictly 64 bit. So in each of the cases the offsets will mismatch in verifier when checking / rewriting context access. Fix this by providing a helper macro __bpf_md_ptr() that will enforce padding up to 64 bit and proper alignment, and for context access a macro bpf_ctx_range_ptr() which will cover full 64 bit member range on 32 bit archs. For flow_keys, we additionally need to force the size check to sizeof(__u64) as with other pointer types. Fixes: d58e468b ("flow_dissector: implements flow dissector BPF hook") Fixes: 4f738adb ("bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data") Fixes: 2dbb9b9e ("bpf: Introduce BPF_PROG_TYPE_SK_REUSEPORT") Reported-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NDavid S. Miller <davem@davemloft.net> Tested-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 29 11月, 2018 1 次提交
-
-
由 John Fastabend 提交于
This adds a BPF SK_MSG program helper so that we can pop data from a msg. We use this to pop metadata from a previous push data call. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 27 11月, 2018 1 次提交
-
-
由 David Miller 提交于
'offset' is constant and if it is zero, no need to subtract it from BPF_REG_TMP. Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 23 11月, 2018 1 次提交
-
-
由 Vlad Dumitrescu 提交于
This could be used to rate limit egress traffic in concert with a qdisc which supports Earliest Departure Time, such as FQ. Write access from cg skb progs only with CAP_SYS_ADMIN, since the value will be used by downstream qdiscs. It might make sense to relax this. Changes v1 -> v2: - allow access from cg skb, write only with CAP_SYS_ADMIN Signed-off-by: NVlad Dumitrescu <vladum@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 17 11月, 2018 4 次提交
-
-
由 Michał Mirosław 提交于
Replace VLAN_TAG_PRESENT with single bit flag and free up VLAN.CFI overload. Now VLAN.CFI is visible in networking stack and can be passed around intact. Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Andrey Ignatov 提交于
Make bpf_sk_lookup_tcp, bpf_sk_lookup_udp and bpf_sk_release helpers available in programs of type BPF_PROG_TYPE_CGROUP_SOCK_ADDR. Such programs operate on sockets and have access to socket and struct sockaddr passed by user to system calls such as sys_bind, sys_connect, sys_sendmsg. It's useful to be able to lookup other sockets from these programs. E.g. sys_connect may lookup IP:port endpoint and if there is a server socket bound to that endpoint ("server" can be defined by saddr & sport being zero), redirect client connection to it by rewriting IP:port in sockaddr passed to sys_connect. Signed-off-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Andrey Ignatov 提交于
Lookup functions in sk_lookup have different expectations about byte order of provided arguments. Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup expect dport to be in network byte order and do ntohs(dport) internally. At the same time __inet6_lookup expects dport to be in host byte order and correspondingly name the argument hnum. sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and __inet6_lookup with regard to dport. But in __udp6_lib_lookup case it uses host instead of expected network byte order. It makes result returned by bpf_sk_lookup_udp for IPv6 incorrect. The patch fixes byte order of dport passed to __udp6_lib_lookup. Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84 fixes TCPv6 but breaks UDPv6. Fixes: 5ef0ae84 ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup") Signed-off-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NJoe Stringer <joe@wand.net.nz> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 09 11月, 2018 3 次提交
-
-
由 Nitin Hande 提交于
This patch proposes to extend the sk_lookup() BPF API to the XDP hookpoint. The sk_lookup() helper supports a lookup on incoming packet to find the corresponding socket that will receive this packet. Current support for this BPF API is at the tc hookpoint. This patch will extend this API at XDP hookpoint. A XDP program can map the incoming packet to the 5-tuple parameter and invoke the API to find the corresponding socket structure. Signed-off-by: NNitin Hande <Nitin.Hande@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Sowmini Varadhan 提交于
This patch allows eBPF programs that use sock_ops to send perf based event notifications using bpf_perf_event_output(). Our main use case for this is the following: We would like to monitor some subset of TCP sockets in user-space, (the monitoring application would define 4-tuples it wants to monitor) using TCP_INFO stats to analyze reported problems. The idea is to use those stats to see where the bottlenecks are likely to be ("is it application-limited?" or "is there evidence of BufferBloat in the path?" etc). Today we can do this by periodically polling for tcp_info, but this could be made more efficient if the kernel would asynchronously notify the application via tcp_info when some "interesting" thresholds (e.g., "RTT variance > X", or "total_retrans > Y" etc) are reached. And to make this effective, it is better if we could apply the threshold check *before* constructing the tcp_info netlink notification, so that we don't waste resources constructing notifications that will be discarded by the filter. This work solves the problem by adding perf event based notification support for sock_ops. The eBPF program can thus be designed to apply any desired filters to the bpf_sock_ops and trigger a perf event notification based on the evaluation from the filter. The user space component can use these perf event notifications to either read any state managed by the eBPF program, or issue a TCP_INFO netlink call if desired. Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Co-developed-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Andrey Ignatov 提交于
Lookup functions in sk_lookup have different expectations about byte order of provided arguments. Specifically __inet_lookup, __udp4_lib_lookup and __udp6_lib_lookup expect dport to be in network byte order and do ntohs(dport) internally. At the same time __inet6_lookup expects dport to be in host byte order and correspondingly name the argument hnum. sk_lookup works correctly with __inet_lookup, __udp4_lib_lookup and __inet6_lookup with regard to dport. But in __udp6_lib_lookup case it uses host instead of expected network byte order. It makes result returned by bpf_sk_lookup_udp for IPv6 incorrect. The patch fixes byte order of dport passed to __udp6_lib_lookup. Originally sk_lookup properly handled UDPv6, but not TCPv6. 5ef0ae84 fixes TCPv6 but breaks UDPv6. Fixes: 5ef0ae84 ("bpf: Fix IPv6 dport byte-order in bpf_sk_lookup") Signed-off-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NJoe Stringer <joe@wand.net.nz> Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 27 10月, 2018 1 次提交
-
-
由 Daniel Borkmann 提交于
Commit cd339431 ("bpf: introduce the bpf_get_local_storage() helper function") enabled the bpf_get_local_storage() helper also for BPF program types where it does not make sense to use them. They have been added both in sk_skb_func_proto() and sk_msg_func_proto() even though both program types are not invoked in combination with cgroups, and neither through BPF_PROG_RUN_ARRAY(). In the latter the bpf_cgroup_storage_set() is set shortly before BPF program invocation. Later, the helper bpf_get_local_storage() retrieves this prior set up per-cpu pointer and hands the buffer to the BPF program. The map argument in there solely retrieves the enum bpf_cgroup_storage_type from a local storage map associated with the program and based on the type returns either the global or per-cpu storage. However, there is no specific association between the program's map and the actual content in bpf_cgroup_storage[]. Meaning, any BPF program that would have been properly run from the cgroup side through BPF_PROG_RUN_ARRAY() where bpf_cgroup_storage_set() was performed, and that is later unloaded such that prog / maps are teared down will cause a use after free if that pointer is retrieved from programs that are not run through BPF_PROG_RUN_ARRAY() but have the cgroup local storage helper enabled in their func proto. Lets just remove it from the two sock_map program types to fix it. Auditing through the types where this helper is enabled, it appears that these are the only ones where it was mistakenly allowed. Fixes: cd339431 ("bpf: introduce the bpf_get_local_storage() helper function") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Cc: Roman Gushchin <guro@fb.com> Acked-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NRoman Gushchin <guro@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 26 10月, 2018 2 次提交
-
-
由 Daniel Borkmann 提交于
Given this seems to be quite fragile and can easily slip through the cracks, lets make direct packet write more robust by requiring that future program types which allow for such write must provide a prologue callback. In case of XDP and sk_msg it's noop, thus add a generic noop handler there. The latter starts out with NULL data/data_end unconditionally when sg pages are shared. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Daniel Borkmann 提交于
Commit b39b5f41 ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB") added support for returning pkt pointers for direct packet access. Given this program type is allowed for both unprivileged and privileged users, we shouldn't allow unprivileged ones to use it, e.g. besides others one reason would be to avoid any potential speculation on the packet test itself, thus guard this for root only. Fixes: b39b5f41 ("bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 21 10月, 2018 1 次提交
-
-
由 John Fastabend 提交于
This allows user to push data into a msg using sk_msg program types. The format is as follows, bpf_msg_push_data(msg, offset, len, flags) this will insert 'len' bytes at offset 'offset'. For example to prepend 10 bytes at the front of the message the user can, bpf_msg_push_data(msg, 0, 10, 0); This will invalidate data bounds so BPF user will have to then recheck data bounds after calling this. After this the msg size will have been updated and the user is free to write into the added bytes. We allow any offset/len as long as it is within the (data, data_end) range. However, a copy will be required if the ring is full and its possible for the helper to fail with ENOMEM or EINVAL errors which need to be handled by the BPF program. This can be used similar to XDP metadata to pass data between sk_msg layer and lower layers. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 20 10月, 2018 2 次提交
-
-
由 Song Liu 提交于
BPF programs of BPF_PROG_TYPE_CGROUP_SKB need to access headers in the skb. This patch enables direct access of skb for these programs. Two helper functions bpf_compute_and_save_data_end() and bpf_restore_data_end() are introduced. There are used in __cgroup_bpf_run_filter_skb(), to compute proper data_end for the BPF program, and restore original data afterwards. Signed-off-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Mauricio Vasquez B 提交于
Queue/stack maps implement a FIFO/LIFO data storage for ebpf programs. These maps support peek, pop and push operations that are exposed to eBPF programs through the new bpf_map[peek/pop/push] helpers. Those operations are exposed to userspace applications through the already existing syscalls in the following way: BPF_MAP_LOOKUP_ELEM -> peek BPF_MAP_LOOKUP_AND_DELETE_ELEM -> pop BPF_MAP_UPDATE_ELEM -> push Queue/stack maps are implemented using a buffer, tail and head indexes, hence BPF_F_NO_PREALLOC is not supported. As opposite to other maps, queue and stack do not use RCU for protecting maps values, the bpf_map[peek/pop] have a ARG_PTR_TO_UNINIT_MAP_VALUE argument that is a pointer to a memory zone where to save the value of a map. Basically the same as ARG_PTR_TO_UNINIT_MEM, but the size has not be passed as an extra argument. Our main motivation for implementing queue/stack maps was to keep track of a pool of elements, like network ports in a SNAT, however we forsee other use cases, like for exampling saving last N kernel events in a map and then analysing from userspace. Signed-off-by: NMauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 16 10月, 2018 4 次提交
-
-
由 Eric Dumazet 提交于
sk_pacing_rate has beed introduced as a u32 field in 2013, effectively limiting per flow pacing to 34Gbit. We believe it is time to allow TCP to pace high speed flows on 64bit hosts, as we now can reach 100Gbit on one TCP flow. This patch adds no cost for 32bit kernels. The tcpi_pacing_rate and tcpi_max_pacing_rate were already exported as 64bit, so iproute2/ss command require no changes. Unfortunately the SO_MAX_PACING_RATE socket option will stay 32bit and we will need to add a new option to let applications control high pacing rates. State Recv-Q Send-Q Local Address:Port Peer Address:Port ESTAB 0 1787144 10.246.9.76:49992 10.246.9.77:36741 timer:(on,003ms,0) ino:91863 sk:2 <-> skmem:(r0,rb540000,t66440,tb2363904,f605944,w1822984,o0,bl0,d0) ts sack bbr wscale:8,8 rto:201 rtt:0.057/0.006 mss:1448 rcvmss:536 advmss:1448 cwnd:138 ssthresh:178 bytes_acked:256699822585 segs_out:177279177 segs_in:3916318 data_segs_out:177279175 bbr:(bw:31276.8Mbps,mrtt:0,pacing_gain:1.25,cwnd_gain:2) send 28045.5Mbps lastrcv:73333 pacing_rate 38705.0Mbps delivery_rate 22997.6Mbps busy:73333ms unacked:135 retrans:0/157 rcv_space:14480 notsent:2085120 minrtt:0.013 Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Stringer 提交于
Commit 6acc9b43 ("bpf: Add helper to retrieve socket in BPF") mistakenly passed the destination port in network byte-order to the IPv6 TCP/UDP socket lookup functions, which meant that BPF writers would need to either manually swap the byte-order of this field or otherwise IPv6 sockets could not be located via this helper. Fix the issue by swapping the byte-order appropriately in the helper. This also makes the API more consistent with the IPv4 version. Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: NJoe Stringer <joe@wand.net.nz> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Joe Stringer 提交于
This is a more complete fix than d71019b5 ("net: core: Fix build with CONFIG_IPV6=m"), so that IPv6 sockets may be looked up if the IPv6 module is loaded (not just if it's compiled in). Signed-off-by: NJoe Stringer <joe@wand.net.nz> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Daniel Borkmann 提交于
Add a generic sk_msg layer, and convert current sockmap and later kTLS over to make use of it. While sk_buff handles network packet representation from netdevice up to socket, sk_msg handles data representation from application to socket layer. This means that sk_msg framework spans across ULP users in the kernel, and enables features such as introspection or filtering of data with the help of BPF programs that operate on this data structure. Latter becomes in particular useful for kTLS where data encryption is deferred into the kernel, and as such enabling the kernel to perform L7 introspection and policy based on BPF for TLS connections where the record is being encrypted after BPF has run and came to a verdict. In order to get there, first step is to transform open coding of scatter-gather list handling into a common core framework that subsystems can use. The code itself has been split and refactored into three bigger pieces: i) the generic sk_msg API which deals with managing the scatter gather ring, providing helpers for walking and mangling, transferring application data from user space into it, and preparing it for BPF pre/post-processing, ii) the plain sock map itself where sockets can be attached to or detached from; these bits are independent of i) which can now be used also without sock map, and iii) the integration with plain TCP as one protocol to be used for processing L7 application data (later this could e.g. also be extended to other protocols like UDP). The semantics are the same with the old sock map code and therefore no change of user facing behavior or APIs. While pursuing this work it also helped finding a number of bugs in the old sockmap code that we've fixed already in earlier commits. The test_sockmap kselftest suite passes through fine as well. Joint work with John. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 14 10月, 2018 1 次提交
-
-
由 Joe Stringer 提交于
Dan Carpenter reports: The patch 6acc9b43: "bpf: Add helper to retrieve socket in BPF" from Oct 2, 2018, leads to the following Smatch complaint: net/core/filter.c:4893 bpf_sk_lookup() error: we previously assumed 'skb->dev' could be null (see line 4885) Fix this issue by checking skb->dev before using it. Signed-off-by: NJoe Stringer <joe@wand.net.nz> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 09 10月, 2018 1 次提交
-
-
由 Arnd Bergmann 提交于
The newly added TCP and UDP handling fails to link when CONFIG_INET is disabled: net/core/filter.o: In function `sk_lookup': filter.c:(.text+0x7ff8): undefined reference to `tcp_hashinfo' filter.c:(.text+0x7ffc): undefined reference to `tcp_hashinfo' filter.c:(.text+0x8020): undefined reference to `__inet_lookup_established' filter.c:(.text+0x8058): undefined reference to `__inet_lookup_listener' filter.c:(.text+0x8068): undefined reference to `udp_table' filter.c:(.text+0x8070): undefined reference to `udp_table' filter.c:(.text+0x808c): undefined reference to `__udp4_lib_lookup' net/core/filter.o: In function `bpf_sk_release': filter.c:(.text+0x82e8): undefined reference to `sock_gen_put' Wrap the related sections of code in #ifdefs for the config option. Furthermore, sk_lookup() should always have been marked 'static', this also avoids a warning about a missing prototype when building with 'make W=1'. Fixes: 6acc9b43 ("bpf: Add helper to retrieve socket in BPF") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NJoe Stringer <joe@wand.net.nz> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 04 10月, 2018 1 次提交
-
-
由 Joe Stringer 提交于
Stephen Rothwell reports the following link failure with IPv6 as module: x86_64-linux-gnu-ld: net/core/filter.o: in function `sk_lookup': (.text+0x19219): undefined reference to `__udp6_lib_lookup' Fix the build by only enabling the IPv6 socket lookup if IPv6 support is compiled into the kernel. Signed-off-by: NJoe Stringer <joe@wand.net.nz> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 03 10月, 2018 2 次提交
-
-
由 Joe Stringer 提交于
This patch adds new BPF helper functions, bpf_sk_lookup_tcp() and bpf_sk_lookup_udp() which allows BPF programs to find out if there is a socket listening on this host, and returns a socket pointer which the BPF program can then access to determine, for instance, whether to forward or drop traffic. bpf_sk_lookup_xxx() may take a reference on the socket, so when a BPF program makes use of this function, it must subsequently pass the returned pointer into the newly added sk_release() to return the reference. By way of example, the following pseudocode would filter inbound connections at XDP if there is no corresponding service listening for the traffic: struct bpf_sock_tuple tuple; struct bpf_sock_ops *sk; populate_tuple(ctx, &tuple); // Extract the 5tuple from the packet sk = bpf_sk_lookup_tcp(ctx, &tuple, sizeof tuple, netns, 0); if (!sk) { // Couldn't find a socket listening for this traffic. Drop. return TC_ACT_SHOT; } bpf_sk_release(sk, 0); return TC_ACT_OK; Signed-off-by: NJoe Stringer <joe@wand.net.nz> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Joe Stringer 提交于
Teach the verifier a little bit about a new type of pointer, a PTR_TO_SOCKET. This pointer type is accessed from BPF through the 'struct bpf_sock' structure. Signed-off-by: NJoe Stringer <joe@wand.net.nz> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 15 9月, 2018 1 次提交
-
-
由 Petar Penkov 提交于
Adds a hook for programs of type BPF_PROG_TYPE_FLOW_DISSECTOR and attach type BPF_FLOW_DISSECTOR that is executed in the flow dissector path. The BPF program is per-network namespace. Signed-off-by: NPetar Penkov <ppenkov@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 13 9月, 2018 1 次提交
-
-
由 Tushar Dave 提交于
Helper bpg_msg_pull_data() can allocate multiple pages while linearizing multiple scatterlist elements into one shared page. However, if the shared page has size > PAGE_SIZE, using copy_page_to_iter() causes below warning. e.g. [ 6367.019832] WARNING: CPU: 2 PID: 7410 at lib/iov_iter.c:825 page_copy_sane.part.8+0x0/0x8 To avoid above warning, use __GFP_COMP while allocating multiple contiguous pages. Fixes: 015632bb ("bpf: sk_msg program helper bpf_sk_msg_pull_data") Signed-off-by: NTushar Dave <tushar.n.dave@oracle.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-