- 18 7月, 2023 2 次提交
-
-
由 JofDiamonds 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I7LTRR CVE: NA Reference: https://gitee.com/openeuler/kernel/commit/97aeb284efece2a8af5bb424d4e980905927f7bb -------------------------------- Add new optname(BPF_SO_ORIGINAL_DST 800, BPF_SO_REPLY_SRC 801) to get origdst/reply src for bpf progs. Now only support IPv4. Signed-off-by: NWang Yufen <wangyufen@huawei.com> Signed-off-by: NLiu Jian <liujian56@huawei.com> Signed-off-by: NJofDiamonds <kwb0523@163.com> Reviewed-by: Nwuchangye <wuchangye@huawei.com>
-
由 JofDiamonds 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I7LTRR CVE: NA Reference: https://gitee.com/openeuler/kernel/commit/9d4b4a05ae00d7e5b2f8a33fdbdf974df182ccb7 -------------------------------- Add the function for bpf sock_ops hook to get sock's uid and gid. Signed-off-by: NLiu Jian <liujian56@huawei.com> Conflicts: include/uapi/linux/bpf.h net/core/filter.c tools/include/uapi/linux/bpf.h Signed-off-by: NJofDiamonds <kwb0523@163.com> Reviewed-by: Nwuchangye <wuchangye@huawei.com>
-
- 16 7月, 2023 1 次提交
-
-
由 zhang-mingyi66 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I7LE1H ---------------------------------------------------- Since commit 2585cd62 ("bpf: Only reply field should be writeable"), sockops is not allowd to modify the replylong field except replylong[0]. The reason is that the replylong[1] to replylong[3] field is not used at that time. But in actual use, we can call `BPF_CGROUP_RUN_PROG_SOCK_OPS` in the kernel modules and expect sockops to return some useful data. The design comment about bpf_sock_ops::replylong in include/uapi/linux/bpf.h is described as follows: ``` struct bpf_sock_ops { __u32 op; union { __u32 args[4]; /* Optionally passed to bpf program */ __u32 reply; /* Returned by bpf program */ __u32 replylong[4]; /* Optioznally returned by bpf prog */ }; ... ``` It seems to contradict the purpose for which the field was originally designed. Let's remove this restriction. Fixes: 2585cd62 ("bpf: Only reply field should be writeable") Signed-off-by: Nzhang-mingyi66 <zhangmingyi5@huawei.com>
-
- 15 7月, 2023 1 次提交
-
-
由 zhang-mingyi66 提交于
sockets during setopt hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I7LE1H ------------------------------------------------------ Currently, the ebpf program can distinguish sockets according to the address accessed by the client, and use the ULP framework to modify the matched sockets to delay link establishment. Signed-off-by: Nzhang-mingyi66 <zhangmingyi5@huawei.com>
-
- 22 4月, 2023 1 次提交
-
-
由 Florian Westphal 提交于
This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs that will be invoked via the NF_HOOK() points in the ip stack. Invocation incurs an indirect call. This is not a necessity: Its possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the program invocation with the same method already done for xdp progs. This isn't done here to keep the size of this chunk down. Verifier restricts verdicts to either DROP or ACCEPT. Signed-off-by: NFlorian Westphal <fw@strlen.de> Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.deSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 18 4月, 2023 1 次提交
-
-
由 Daniel Borkmann 提交于
There are some use-cases where it is desirable to use bpf_redirect() in combination with ifb device, which currently is not supported, for example, around filtering inbound traffic with BPF to then push it to ifb which holds the qdisc for shaping in contrast to doing that on the egress device. Toke mentions the following case related to OpenWrt: Because there's not always a single egress on the other side. These are mainly home routers, which tend to have one or more WiFi devices bridged to one or more ethernet ports on the LAN side, and a single upstream WAN port. And the objective is to control the total amount of traffic going over the WAN link (in both directions), to deal with bufferbloat in the ISP network (which is sadly still all too prevalent). In this setup, the traffic can be split arbitrarily between the links on the LAN side, and the only "single bottleneck" is the WAN link. So we install both egress and ingress shapers on this, configured to something like 95-98% of the true link bandwidth, thus moving the queues into the qdisc layer in the router. It's usually necessary to set the ingress bandwidth shaper a bit lower than the egress due to being "downstream" of the bottleneck link, but it does work surprisingly well. We usually use something like a matchall filter to put all ingress traffic on the ifb, so doing the redirect from BPF has not been an immediate requirement thus far. However, it does seem a bit odd that this is not possible, and we do have a BPF-based filter that layers on top of this kind of setup, which currently uses u32 as the ingress filter and so it could presumably be improved to use BPF instead if that was available. Reported-by: NToke Høiland-Jørgensen <toke@redhat.com> Reported-by: NYafang Shao <laoar.shao@gmail.com> Reported-by: NTonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NYafang Shao <laoar.shao@gmail.com> Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Link: https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README Link: https://lore.kernel.org/bpf/875y9yzbuy.fsf@toke.dk Link: https://lore.kernel.org/r/8cebc8b2b6e967e10cbafe2ffd6795050e74accd.1681739137.git.daniel@iogearbox.netSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 05 4月, 2023 3 次提交
-
-
由 Alexei Starovoitov 提交于
bpf_[sk|inode|task|cgrp]_storage_[get|delete]() and bpf_get_socket_cookie() helpers perform run-time check that sk|inode|task|cgrp pointer != NULL. Teach verifier about this fact and allow bpf programs to pass PTR_TO_BTF_ID | PTR_MAYBE_NULL into such helpers. It will be used in the subsequent patch that will do bpf_sk_storage_get(.., skb->sk, ...); Even when 'skb' pointer is trusted the 'sk' pointer may be NULL. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NDavid Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230404045029.82870-5-alexei.starovoitov@gmail.com
-
由 Alexei Starovoitov 提交于
Remove unused arguments from btf_struct_access() callback. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NDavid Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230404045029.82870-3-alexei.starovoitov@gmail.com
-
由 Alexei Starovoitov 提交于
Remove duplicated if (atype == BPF_READ) btf_struct_access() from btf_struct_access() callback and invoke it only for writes. This is possible to do because currently btf_struct_access() custom callback always delegates to generic btf_struct_access() helper for BPF_READ accesses. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NDavid Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20230404045029.82870-2-alexei.starovoitov@gmail.com
-
- 22 3月, 2023 1 次提交
-
-
由 Eric Dumazet 提交于
rcu_bh is no longer a win, especially for objects freed with standard call_rcu(). Switch neighbour code to no longer disable BH when not necessary. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 21 3月, 2023 1 次提交
-
-
由 Jakub Kicinski 提交于
vlan_present is gone since commit 354259fa ("net: remove skb->vlan_present") rename the offset field to what BPF is currently looking for in this byte - mono_delivery_time and tc_at_ingress. Signed-off-by: NJakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230321014115.997841-2-kuba@kernel.orgSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org>
-
- 15 3月, 2023 1 次提交
-
-
由 Eric Dumazet 提交于
We have many lockless accesses to n->nud_state. Before adding another one in the following patch, add annotations to readers and writers. Signed-off-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Reviewed-by: NMartin KaFai Lau <martin.lau@kernel.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 04 3月, 2023 1 次提交
-
-
由 Eduard Zingerman 提交于
Lift verifier restriction to use BPF_ST_MEM instructions to write to context data structures. This requires the following changes: - verifier.c:do_check() for BPF_ST updated to: - no longer forbid writes to registers of type PTR_TO_CTX; - track dst_reg type in the env->insn_aux_data[...].ptr_type field (same way it is done for BPF_STX and BPF_LDX instructions). - verifier.c:convert_ctx_access() and various callbacks invoked by it are updated to handled BPF_ST instruction alongside BPF_STX. Signed-off-by: NEduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 03 3月, 2023 1 次提交
-
-
由 Tejun Heo 提交于
These helpers are safe to call from any context and there's no reason to restrict access to them. Remove them from bpf_trace and filter lists and add to bpf_base_func_proto() under perfmon_capable(). v2: After consulting with Andrii, relocated in bpf_base_func_proto() so that they require bpf_capable() but not perfomon_capable() as it doesn't read from or affect others on the system. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/ZAD8QyoszMZiTzBY@slm.duckdns.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 02 3月, 2023 3 次提交
-
-
由 Joanne Koong 提交于
Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr. The user must pass in a buffer to store the contents of the data slice if a direct pointer to the data cannot be obtained. For skb and xdp type dynptrs, these two APIs are the only way to obtain a data slice. However, for other types of dynptrs, there is no difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data. For skb type dynptrs, the data is copied into the user provided buffer if any of the data is not in the linear portion of the skb. For xdp type dynptrs, the data is copied into the user provided buffer if the data is between xdp frags. If the skb is cloned and a call to bpf_dynptr_data_rdwr is made, then the skb will be uncloned (see bpf_unclone_prologue()). Please note that any bpf_dynptr_write() automatically invalidates any prior data slices of the skb dynptr. This is because the skb may be cloned or may need to pull its paged buffer into the head. As such, any bpf_dynptr_write() will automatically have its prior data slices invalidated, even if the write is to data in the skb head of an uncloned skb. Please note as well that any other helper calls that change the underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data slices of the skb dynptr as well, for the same reasons. Signed-off-by: NJoanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Joanne Koong 提交于
Add xdp dynptrs, which are dynptrs whose underlying pointer points to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of xdp->data and xdp->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For reads and writes on the dynptr, this includes reading/writing from/to and across fragments. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() should be used. For examples of how xdp dynptrs can be used, please see the attached selftests. Signed-off-by: NJoanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Joanne Koong 提交于
Add skb dynptrs, which are dynptrs whose underlying pointer points to a skb. The dynptr acts on skb data. skb dynptrs have two main benefits. One is that they allow operations on sizes that are not statically known at compile-time (eg variable-sized accesses). Another is that parsing the packet data through dynptrs (instead of through direct access of skb->data and skb->data_end) can be more ergonomic and less brittle (eg does not need manual if checking for being within bounds of data_end). For bpf prog types that don't support writes on skb data, the dynptr is read-only (bpf_dynptr_write() will return an error) For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write() interfaces, reading and writing from/to data in the head as well as from/to non-linear paged buffers is supported. Data slices through the bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used. For examples of how skb dynptrs can be used, please see the attached selftests. Signed-off-by: NJoanne Koong <joannelkoong@gmail.com> Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 18 2月, 2023 1 次提交
-
-
由 Martin KaFai Lau 提交于
The bpf_fib_lookup() also looks up the neigh table. This was done before bpf_redirect_neigh() was added. In the use case that does not manage the neigh table and requires bpf_fib_lookup() to lookup a fib to decide if it needs to redirect or not, the bpf prog can depend only on using bpf_redirect_neigh() to lookup the neigh. It also keeps the neigh entries fresh and connected. This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid the double neigh lookup when the bpf prog always call bpf_redirect_neigh() to do the neigh lookup. The params->smac output is skipped together when SKIP_NEIGH is set because bpf_redirect_neigh() will figure out the smac also. Signed-off-by: NMartin KaFai Lau <martin.lau@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev
-
- 17 2月, 2023 1 次提交
-
-
由 Martin KaFai Lau 提交于
The bpf_fib_lookup() helper does not only look up the fib (ie. route) but it also looks up the neigh. Before returning the neigh, the helper does not check for NUD_VALID. When a neigh state (neigh->nud_state) is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper still returns SUCCESS instead of NO_NEIGH in this case. Because of the SUCCESS return value, the bpf prog directly uses the returned dmac and ends up filling all zero in the eth header. This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is not valid. Signed-off-by: NMartin KaFai Lau <martin.lau@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev
-
- 03 2月, 2023 1 次提交
-
-
由 Lorenzo Bianconi 提交于
Check if the destination device implements ndo_xdp_xmit callback relying on NETDEV_XDP_ACT_NDO_XMIT flags. Moreover, check if the destination device supports XDP non-linear frame in __xdp_enqueue and is_valid_dst routines. This patch allows to perform XDP_REDIRECT on non-linear XDP buffers. Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Co-developed-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: NLorenzo Bianconi <lorenzo@kernel.org> Link: https://lore.kernel.org/r/26a94c33520c0bfba021b3fbb2cb8c1e69bf53b8.1675245258.git.lorenzo@kernel.orgSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 29 1月, 2023 1 次提交
-
-
由 Ilya Leoshkevich 提交于
These functions already check that th_len < sizeof(*th), and propagating the lower bound (th_len > 0) may be challenging in complex code, e.g. as is the case with xdp_synproxy test on s390x [1]. Switch to ARG_CONST_SIZE_OR_ZERO in order to make the verifier accept code where it cannot prove that th_len > 0. [1] https://lore.kernel.org/bpf/CAEf4Bzb3uiSHtUbgVWmkWuJ5Sw1UZd4c_iuS4QXtUkXmTTtXuQ@mail.gmail.com/Signed-off-by: NIlya Leoshkevich <iii@linux.ibm.com> Link: https://lore.kernel.org/r/20230128000650.1516334-2-iii@linux.ibm.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 26 1月, 2023 1 次提交
-
-
由 Kui-Feng Lee 提交于
Resolve an issue when calling sol_tcp_sockopt() on a socket with ktls enabled. Prior to this patch, sol_tcp_sockopt() would only allow calls if the function pointer of setsockopt of the socket was set to tcp_setsockopt(). However, any socket with ktls enabled would have its function pointer set to tls_setsockopt(). To resolve this issue, the patch adds a check of the protocol of the linux socket and allows bpf_setsockopt() to be called if ktls is initialized on the linux socket. This ensures that calls to sol_tcp_sockopt() will succeed on sockets with ktls enabled. Signed-off-by: NKui-Feng Lee <kuifeng@meta.com> Link: https://lore.kernel.org/r/20230125201608.908230-2-kuifeng@meta.comSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org>
-
- 24 1月, 2023 1 次提交
-
-
由 Stanislav Fomichev 提交于
BPF offloading infra will be reused to implement bound-but-not-offloaded bpf programs. Rename existing helpers for clarity. No functional changes. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Reviewed-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-3-sdf@google.comSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org>
-
- 18 1月, 2023 1 次提交
-
-
由 Magnus Karlsson 提交于
Document in the XDP_REDIRECT manual section that drivers must call xdp_do_flush() before napi_complete_done(). The two reasons behind this can be found following the links below. Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/Acked-by: NToke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 1月, 2023 1 次提交
-
-
由 Ziyang Xuan 提交于
Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). Main use case is for using cls_bpf on ingress hook to decapsulate IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the new IP header version after decapsulating the outer IP header. Suggested-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: NWillem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.comSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org>
-
- 11 1月, 2023 1 次提交
-
-
由 Haiyue Wang 提交于
The variable 'insn' is initialized to 'insn_buf' without being changed, only some helper macros are defined, so the insn buffer comparison is unnecessary. Just remove it. This missed removal back in 2377b81d ("bpf: split shared bpf_tcp_sock and bpf_sock_ops implementation"). Signed-off-by: NHaiyue Wang <haiyue.wang@intel.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20230108151258.96570-1-haiyue.wang@intel.com
-
- 21 12月, 2022 1 次提交
-
-
由 Jakub Kicinski 提交于
Anand hit a BUG() when pulling off headers on egress to a SW tunnel. We get to skb_checksum_help() with an invalid checksum offset (commit d7ea0d9d ("net: remove two BUG() from skb_checksum_help()") converted those BUGs to WARN_ONs()). He points out oddness in how skb_postpull_rcsum() gets used. Indeed looks like we should pull before "postpull", otherwise the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not be able to do its job: if (skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_start_offset(skb) < 0) skb->ip_summed = CHECKSUM_NONE; Reported-by: NAnand Parthasarathy <anpartha@meta.com> Fixes: 6578171a ("bpf: add bpf_skb_change_proto helper") Signed-off-by: NJakub Kicinski <kuba@kernel.org> Acked-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.orgSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org>
-
- 20 12月, 2022 1 次提交
-
-
由 Christian Ehrig 提交于
This patch allows to remove TUNNEL_KEY from the tunnel flags bitmap when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY flag. On egress, the resulting tunnel header will not contain a tunnel key if the protocol and implementation supports it. At the moment bpf_tunnel_key wants a user to specify a numeric tunnel key. This will wrap the inner packet into a tunnel header with the key bit and value set accordingly. This is problematic when using a tunnel protocol that supports optional tunnel keys and a receiving tunnel device that is not expecting packets with the key bit set. The receiver won't decapsulate and drop the packet. RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is useful. It allows for generating packets, that can be decapsulated by a GRE tunnel device not operating in collect metadata mode or not expecting the key bit set. Signed-off-by: NChristian Ehrig <cehrig@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NJakub Sitnicki <jakub@cloudflare.com> Acked-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/bpf/20221218051734.31411-1-cehrig@cloudflare.com
-
- 06 12月, 2022 1 次提交
-
-
由 Eyal Birger 提交于
This change adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs. This object is built based on the availability of BTF debug info. When setting the xfrm metadata, percpu metadata dsts are used in order to avoid allocating a metadata dst per packet. In order to guarantee safe module unload, the percpu dsts are allocated on first use and never freed. The percpu pointer is stored in net/core/filter.c so that it can be reused on module reload. The metadata percpu dsts take ownership of the original skb dsts so that they may be used as part of the xfrm transmission logic - e.g. for MTU calculations. Signed-off-by: NEyal Birger <eyal.birger@gmail.com> Link: https://lore.kernel.org/r/20221203084659.1837829-3-eyal.birger@gmail.comSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org>
-
- 05 12月, 2022 1 次提交
-
-
由 Toke Høiland-Jørgensen 提交于
The bpf_ct_set_nat_info() kfunc is defined in the nf_nat.ko module, and takes as a parameter the nf_conn___init struct, which is allocated through the bpf_xdp_ct_alloc() helper defined in the nf_conntrack.ko module. However, because kernel modules can't deduplicate BTF types between each other, and the nf_conn___init struct is not referenced anywhere in vmlinux BTF, this leads to two distinct BTF IDs for the same type (one in each module). This confuses the verifier, as described here: https://lore.kernel.org/all/87leoh372s.fsf@toke.dk/ As a workaround, add an explicit BTF_TYPE_EMIT for the type in net/filter.c, so the type definition gets included in vmlinux BTF. This way, both modules can refer to the same type ID (as they both build on top of vmlinux BTF), and the verifier is no longer confused. v2: - Use BTF_TYPE_EMIT (which is a statement so it has to be inside a function definition; use xdp_func_proto() for this, since this is mostly xdp-related). Fixes: 820dc052 ("net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c") Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221201123939.696558-1-toke@redhat.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 22 11月, 2022 1 次提交
-
-
由 Stanislav Fomichev 提交于
To avoid potentially breaking existing users. Both mac/no-mac cases have to be amended; mac_header >= network_header is not enough (verified with a new test, see next patch). Fixes: fd189422 ("bpf: Don't redirect packets with invalid pkt_len") Signed-off-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221121180340.1983627-1-sdf@google.comSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org>
-
- 19 11月, 2022 1 次提交
-
-
由 Maryam Tahhan 提交于
Add documentation for BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH including kernel version introduced, usage and examples. Add documentation that describes XDP_REDIRECT. Signed-off-by: NMaryam Tahhan <mtahhan@redhat.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Reviewed-by: NToke Høiland-Jørgensen <toke@redhat.com> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221115144921.165483-1-mtahhan@redhat.com
-
- 16 11月, 2022 2 次提交
-
-
由 Kuniyuki Iwashima 提交于
We will soon introduce an optional per-netns hash table for UDP. This means we cannot use udp_table directly in most places. Instead, access it via net->ipv4.udp_table. The access will be valid only while initialising udp_table itself and creating/destroying each netns. Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Toke Høiland-Jørgensen 提交于
For queueing packets in XDP we want to add a new redirect map type with support for 64-bit indexes. To prepare fore this, expand the width of the 'key' argument to the bpf_redirect_map() helper. Since BPF registers are always 64-bit, this should be safe to do after the fact. Acked-by: NSong Liu <song@kernel.org> Reviewed-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/r/20221108140601.149971-3-toke@redhat.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 15 11月, 2022 1 次提交
-
-
由 Kumar Kartikeya Dwivedi 提交于
Instead of having to pass multiple arguments that describe the register, pass the bpf_reg_state into the btf_struct_access callback. Currently, all call sites simply reuse the btf and btf_id of the reg they want to check the access of. The only exception to this pattern is the callsite in check_ptr_to_map_access, hence for that case create a dummy reg to simulate PTR_TO_BTF_ID access. Signed-off-by: NKumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20221114191547.1694267-8-memxor@gmail.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 12 11月, 2022 2 次提交
-
-
由 Eric Dumazet 提交于
skb->vlan_present seems redundant. We can instead derive it from this boolean expression: vlan_present = skb->vlan_proto != 0 || skb->vlan_tci != 0 Add a new union, to access both fields in a single load/store when possible. union { u32 vlan_all; struct { __be16 vlan_proto; __u16 vlan_tci; }; }; This allows following patch to remove a conditional test in GRO stack. Note: We move remcsum_offload to keep TC_AT_INGRESS_MASK and SKB_MONO_DELIVERY_TIME_MASK unchanged. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NYonghong Song <yhs@fb.com> Acked-by: NMartin KaFai Lau <martin.lau@kernel.org> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
由 Martin KaFai Lau 提交于
The bpf-tc prog has already been able to access the skb_hwtstamps(skb)->hwtstamp. This patch extends the same hwtstamp access to the sockops prog. In sockops, the skb is also available to the bpf prog during the BPF_SOCK_OPS_PARSE_HDR_OPT_CB event. There is a use case that the hwtstamp will be useful to the sockops prog to better measure the one-way-delay when the sender has put the tx timestamp in the tcp header option. Signed-off-by: NMartin KaFai Lau <martin.lau@kernel.org> Signed-off-by: NAndrii Nakryiko <andrii@kernel.org> Acked-by: NYonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev
-
- 04 11月, 2022 1 次提交
-
-
由 Stanislav Fomichev 提交于
syzkaller managed to trigger another case where skb->len == 0 when we enter __dev_queue_xmit: WARNING: CPU: 0 PID: 2470 at include/linux/skbuff.h:2576 skb_assert_len include/linux/skbuff.h:2576 [inline] WARNING: CPU: 0 PID: 2470 at include/linux/skbuff.h:2576 __dev_queue_xmit+0x2069/0x35e0 net/core/dev.c:4295 Call Trace: dev_queue_xmit+0x17/0x20 net/core/dev.c:4406 __bpf_tx_skb net/core/filter.c:2115 [inline] __bpf_redirect_no_mac net/core/filter.c:2140 [inline] __bpf_redirect+0x5fb/0xda0 net/core/filter.c:2163 ____bpf_clone_redirect net/core/filter.c:2447 [inline] bpf_clone_redirect+0x247/0x390 net/core/filter.c:2419 bpf_prog_48159a89cb4a9a16+0x59/0x5e bpf_dispatcher_nop_func include/linux/bpf.h:897 [inline] __bpf_prog_run include/linux/filter.h:596 [inline] bpf_prog_run include/linux/filter.h:603 [inline] bpf_test_run+0x46c/0x890 net/bpf/test_run.c:402 bpf_prog_test_run_skb+0xbdc/0x14c0 net/bpf/test_run.c:1170 bpf_prog_test_run+0x345/0x3c0 kernel/bpf/syscall.c:3648 __sys_bpf+0x43a/0x6c0 kernel/bpf/syscall.c:5005 __do_sys_bpf kernel/bpf/syscall.c:5091 [inline] __se_sys_bpf kernel/bpf/syscall.c:5089 [inline] __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5089 do_syscall_64+0x54/0x70 arch/x86/entry/common.c:48 entry_SYSCALL_64_after_hwframe+0x61/0xc6 The reproducer doesn't really reproduce outside of syzkaller environment, so I'm taking a guess here. It looks like we do generate correct ETH_HLEN-sized packet, but we redirect the packet to the tunneling device. Before we do so, we __skb_pull l2 header and arrive again at skb->len == 0. Doesn't seem like we can do anything better than having an explicit check after __skb_pull? Cc: Eric Dumazet <edumazet@google.com> Reported-by: syzbot+f635e86ec3fa0a37e019@syzkaller.appspotmail.com Signed-off-by: NStanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20221027225537.353077-1-sdf@google.comSigned-off-by: NMartin KaFai Lau <martin.lau@kernel.org> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 30 9月, 2022 2 次提交
-
-
由 Martin KaFai Lau 提交于
When a bad bpf prog '.init' calls bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop: .init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ... ... => .init => bpf_setsockopt(tcp_cc). It was prevented by the prog->active counter before but the prog->active detection cannot be used in struct_ops as explained in the earlier patch of the set. In this patch, the second bpf_setsockopt(tcp_cc) is not allowed in order to break the loop. This is done by using a bit of an existing 1 byte hole in tcp_sock to check if there is on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock. Note that this essentially limits only the first '.init' can call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer does not support ECN) and the second '.init' cannot fallback to another cc. This applies even the second bpf_setsockopt(TCP_CONGESTION) will not cause a loop. Signed-off-by: NMartin KaFai Lau <martin.lau@kernel.org> Reviewed-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.devSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Martin KaFai Lau 提交于
This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into another function. The next patch will add extra logic to avoid recursion and this will make the latter patch easier to follow. Signed-off-by: NMartin KaFai Lau <martin.lau@kernel.org> Link: https://lore.kernel.org/r/20220929070407.965581-4-martin.lau@linux.devSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
-