Commits · a8f4a42f066e02783e960a065fc1241abc82dd45 · openeuler / Kernel

18 Jul, 2023 2 commits

bpf: Add new bpf helper to get SO_ORIGINAL_DST/REPLY_SRC · 50d52727

Committed by JofDiamonds on Jul 17, 2023

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7LTRR
CVE: NA

Reference: https://gitee.com/openeuler/kernel/commit/97aeb284efece2a8af5bb424d4e980905927f7bb

--------------------------------

Add new optnames (BPF_SO_ORIGINAL_DST 800, BPF_SO_REPLY_SRC 801)
to get the original dst/reply src for bpf progs.
Currently only IPv4 is supported.

Signed-off-by: Wang Yufen <wangyufen@huawei.com>
Signed-off-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: JofDiamonds <kwb0523@163.com>
Reviewed-by: wuchangye <wuchangye@huawei.com>

50d52727

bpf: Add bpf_get_sockops_uid_gid helper function · 35377bf0

Committed by JofDiamonds on Jul 17, 2023

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7LTRR
CVE: NA

Reference: https://gitee.com/openeuler/kernel/commit/9d4b4a05ae00d7e5b2f8a33fdbdf974df182ccb7

--------------------------------

Add the function for bpf sock_ops hook to get sock's uid and gid.

Signed-off-by: Liu Jian <liujian56@huawei.com>
Conflicts:
	include/uapi/linux/bpf.h
	net/core/filter.c
	tools/include/uapi/linux/bpf.h
Signed-off-by: JofDiamonds <kwb0523@163.com>
Reviewed-by: wuchangye <wuchangye@huawei.com>

35377bf0

16 Jul, 2023 1 commit

bpf, sockops: Enhance the return capability of sockops · 01d81b16

Committed by zhang-mingyi66 on Jul 15, 2023

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7LE1H

----------------------------------------------------

Since commit 2585cd62 ("bpf: Only reply field should be writeable"),
sockops is not allowed to modify the replylong field except replylong[0].
The reason is that the replylong[1] to replylong[3] field is not used
at that time.

But in actual use, we can call `BPF_CGROUP_RUN_PROG_SOCK_OPS` in the
kernel modules and expect sockops to return some useful data.

The design comment about bpf_sock_ops::replylong in
include/uapi/linux/bpf.h is described as follows:

```
  struct bpf_sock_ops {
        __u32 op;
        union {
                __u32 args[4];          /* Optionally passed to bpf program */
                __u32 reply;            /* Returned by bpf program          */
                __u32 replylong[4];     /* Optionally returned by bpf prog  */
        };
  ...
```

It seems to contradict the purpose for which the field was originally
designed. Let's remove this restriction.
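
The effect of lifting the restriction can be sketched in user space. This is only a model under stated assumptions: `struct sock_ops_model` and both `write_ok_*` helpers are hypothetical names, not the verifier code in net/core/filter.c. It compares a rule that accepts writes to `reply` only with a rule that accepts any 4-byte slot of `replylong[0..3]`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the start of struct bpf_sock_ops (model only). */
struct sock_ops_model {
	uint32_t op;
	union {
		uint32_t args[4];
		uint32_t reply;
		uint32_t replylong[4];
	};
};

/* reply and replylong[0] share the same storage inside the union. */
static_assert(offsetof(struct sock_ops_model, reply) ==
	      offsetof(struct sock_ops_model, replylong),
	      "reply must alias replylong[0]");

/* Restrictive rule: only the 4 bytes of reply are writable. */
static bool write_ok_reply_only(size_t off, size_t size)
{
	return size == sizeof(uint32_t) &&
	       off == offsetof(struct sock_ops_model, reply);
}

/* Relaxed rule: any aligned 4-byte slot of replylong[0..3] is writable. */
static bool write_ok_replylong(size_t off, size_t size)
{
	size_t first = offsetof(struct sock_ops_model, replylong);
	size_t end = first + sizeof(((struct sock_ops_model *)0)->replylong);

	return size == sizeof(uint32_t) &&
	       off >= first && off + size <= end &&
	       (off - first) % sizeof(uint32_t) == 0;
}
```

Under the relaxed rule a write to `replylong[3]` is accepted, while writes to `op` or past the end of the union are still rejected.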

Fixes: 2585cd62 ("bpf: Only reply field should be writeable")
Signed-off-by: zhang-mingyi66 <zhangmingyi5@huawei.com>

01d81b16

15 Jul, 2023 1 commit

ipv4, bpf: Introduced to support the ULP to modify sockets during setopt · eab6965b

Committed by zhang-mingyi66 on Jul 15, 2023

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I7LE1H

------------------------------------------------------

Currently, the ebpf program can distinguish sockets according to
the address accessed by the client, and use the ULP framework to
modify the matched sockets to delay link establishment.

Signed-off-by: zhang-mingyi66 <zhangmingyi5@huawei.com>

eab6965b

22 Apr, 2023 1 commit

bpf: minimal support for programs hooked into netfilter framework · fd9c663b

Committed by Florian Westphal on Apr 21, 2023

This adds minimal support for BPF_PROG_TYPE_NETFILTER bpf programs
that will be invoked via the NF_HOOK() points in the ip stack.

Invocation incurs an indirect call. This is not a necessity: It's
possible to add 'DEFINE_BPF_DISPATCHER(nf_progs)' and handle the
program invocation with the same method already done for xdp progs.

This isn't done here to keep the size of this chunk down.

Verifier restricts verdicts to either DROP or ACCEPT.

Signed-off-by: Florian Westphal <fw@strlen.de>
Link: https://lore.kernel.org/r/20230421170300.24115-3-fw@strlen.de
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

fd9c663b

18 Apr, 2023 1 commit

bpf: Set skb redirect and from_ingress info in __bpf_tx_skb · 59e498a3

Committed by Daniel Borkmann on Apr 17, 2023

There are some use-cases where it is desirable to use bpf_redirect()
in combination with ifb device, which currently is not supported, for
example, around filtering inbound traffic with BPF to then push it to
ifb which holds the qdisc for shaping in contrast to doing that on the
egress device.

Toke mentions the following case related to OpenWrt:

Because there's not always a single egress on the other side. These are
mainly home routers, which tend to have one or more WiFi devices bridged
to one or more ethernet ports on the LAN side, and a single upstream WAN
port. And the objective is to control the total amount of traffic going
over the WAN link (in both directions), to deal with bufferbloat in the
ISP network (which is sadly still all too prevalent).

In this setup, the traffic can be split arbitrarily between the links
on the LAN side, and the only "single bottleneck" is the WAN link. So we
install both egress and ingress shapers on this, configured to something
like 95-98% of the true link bandwidth, thus moving the queues into the
qdisc layer in the router. It's usually necessary to set the ingress
bandwidth shaper a bit lower than the egress due to being "downstream"
of the bottleneck link, but it does work surprisingly well.

We usually use something like a matchall filter to put all ingress
traffic on the ifb, so doing the redirect from BPF has not been an
immediate requirement thus far. However, it does seem a bit odd that
this is not possible, and we do have a BPF-based filter that layers on
top of this kind of setup, which currently uses u32 as the ingress
filter and so it could presumably be improved to use BPF instead if
that was available.

Reported-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reported-by: Yafang Shao <laoar.shao@gmail.com>
Reported-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://git.openwrt.org/?p=project/qosify.git;a=blob;f=README
Link: https://lore.kernel.org/bpf/875y9yzbuy.fsf@toke.dk
Link: https://lore.kernel.org/r/8cebc8b2b6e967e10cbafe2ffd6795050e74accd.1681739137.git.daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

59e498a3

05 Apr, 2023 3 commits

bpf: Teach verifier that certain helpers accept NULL pointer. · 91571a51

Committed by Alexei Starovoitov on Apr 03, 2023

bpf_[sk|inode|task|cgrp]_storage_[get|delete]() and bpf_get_socket_cookie() helpers
perform run-time check that sk|inode|task|cgrp pointer != NULL.
Teach verifier about this fact and allow bpf programs to pass
PTR_TO_BTF_ID | PTR_MAYBE_NULL into such helpers.
It will be used in the subsequent patch that will do
bpf_sk_storage_get(.., skb->sk, ...);
Even when 'skb' pointer is trusted the 'sk' pointer may be NULL.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230404045029.82870-5-alexei.starovoitov@gmail.com

91571a51

bpf: Remove unused arguments from btf_struct_access(). · b7e852a9

Committed by Alexei Starovoitov on Apr 03, 2023

Remove unused arguments from btf_struct_access() callback.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230404045029.82870-3-alexei.starovoitov@gmail.com

b7e852a9

bpf: Invoke btf_struct_access() callback only for writes. · 7d64c513

Committed by Alexei Starovoitov on Apr 03, 2023

Remove duplicated if (atype == BPF_READ) btf_struct_access() from
btf_struct_access() callback and invoke it only for writes. This is
possible to do because currently btf_struct_access() custom callback
always delegates to generic btf_struct_access() helper for BPF_READ
accesses.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/bpf/20230404045029.82870-2-alexei.starovoitov@gmail.com

7d64c513

22 Mar, 2023 1 commit

neighbour: switch to standard rcu, instead of rcu_bh · 09eed119

Committed by Eric Dumazet on Mar 21, 2023

rcu_bh is no longer a win, especially for objects freed
with standard call_rcu().

Switch neighbour code to no longer disable BH when not necessary.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

09eed119

21 Mar, 2023 1 commit

net: skbuff: rename __pkt_vlan_present_offset to __mono_tc_offset · 04aae213

Committed by Jakub Kicinski on Mar 20, 2023

vlan_present is gone since
commit 354259fa ("net: remove skb->vlan_present").
Rename the offset field to what BPF is currently looking
for in this byte: mono_delivery_time and tc_at_ingress.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/r/20230321014115.997841-2-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

04aae213

15 Mar, 2023 1 commit

neighbour: annotate lockless accesses to n->nud_state · b071af52

Committed by Eric Dumazet on Mar 13, 2023

We have many lockless accesses to n->nud_state.

Before adding another one in the following patch,
add annotations to readers and writers.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

b071af52

04 Mar, 2023 1 commit

bpf: allow ctx writes using BPF_ST_MEM instruction · 0d80a619

Committed by Eduard Zingerman on Mar 04, 2023

Lift verifier restriction to use BPF_ST_MEM instructions to write to
context data structures. This requires the following changes:
 - verifier.c:do_check() for BPF_ST updated to:
   - no longer forbid writes to registers of type PTR_TO_CTX;
   - track dst_reg type in the env->insn_aux_data[...].ptr_type field
     (same way it is done for BPF_STX and BPF_LDX instructions).
 - verifier.c:convert_ctx_access() and various callbacks invoked by
   it are updated to handle BPF_ST instructions alongside BPF_STX.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20230304011247.566040-2-eddyz87@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

0d80a619

03 Mar, 2023 1 commit

bpf: Make bpf_get_current_[ancestor_]cgroup_id() available for all program types · c501bf55

Committed by Tejun Heo on Mar 02, 2023

These helpers are safe to call from any context and there's no reason to
restrict access to them. Remove them from bpf_trace and filter lists and add
to bpf_base_func_proto() under perfmon_capable().

v2: After consulting with Andrii, relocated in bpf_base_func_proto() so that
they require bpf_capable() but not perfmon_capable() as it doesn't read
from or affect others on the system.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/ZAD8QyoszMZiTzBY@slm.duckdns.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

c501bf55

02 Mar, 2023 3 commits

bpf: Add bpf_dynptr_slice and bpf_dynptr_slice_rdwr · 66e3a13e

Committed by Joanne Koong on Mar 01, 2023

Two new kfuncs are added, bpf_dynptr_slice and bpf_dynptr_slice_rdwr.
The user must pass in a buffer to store the contents of the data slice
if a direct pointer to the data cannot be obtained.

For skb and xdp type dynptrs, these two APIs are the only way to obtain
a data slice. However, for other types of dynptrs, there is no
difference between bpf_dynptr_slice(_rdwr) and bpf_dynptr_data.

For skb type dynptrs, the data is copied into the user provided buffer
if any of the data is not in the linear portion of the skb. For xdp type
dynptrs, the data is copied into the user provided buffer if the data is
between xdp frags.

If the skb is cloned and a call to bpf_dynptr_slice_rdwr is made, then
the skb will be uncloned (see bpf_unclone_prologue()).

Please note that any bpf_dynptr_write() automatically invalidates any prior
data slices of the skb dynptr. This is because the skb may be cloned or
may need to pull its paged buffer into the head. As such, any
bpf_dynptr_write() will automatically have its prior data slices
invalidated, even if the write is to data in the skb head of an uncloned
skb. Please note as well that any other helper calls that change the
underlying packet buffer (eg bpf_skb_pull_data()) invalidates any data
slices of the skb dynptr as well, for the same reasons.
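
The "direct pointer if possible, otherwise copy into the caller's buffer" behaviour described above for skb dynptrs can be modelled in user space. This is a sketch under stated assumptions, not the kfunc implementation: `struct pkt_model` and `slice_model()` are hypothetical, and the packet is reduced to one linear head plus one fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy packet: a linear head plus one paged fragment (model only). */
struct pkt_model {
	const unsigned char *head;
	size_t head_len;
	const unsigned char *frag;
	size_t frag_len;
};

/*
 * Return a pointer to len bytes starting at off:
 * - range fully inside the head: direct pointer, buf is not touched
 * - range touches the fragment: bytes are gathered into buf, buf returned
 * - range out of bounds, or buf missing/too small: NULL
 */
static const void *slice_model(const struct pkt_model *p, size_t off,
			       size_t len, void *buf, size_t buf_len)
{
	size_t total, from_head;

	assert(p != NULL);
	total = p->head_len + p->frag_len;
	if (off > total || len > total - off)
		return NULL;
	if (len <= p->head_len && off <= p->head_len - len)
		return p->head + off;	/* contiguous: no copy needed */
	if (buf == NULL || buf_len < len)
		return NULL;

	/* Copy the part that lives in the head, then the rest from the frag. */
	from_head = off < p->head_len ? p->head_len - off : 0;
	if (from_head)
		memcpy(buf, p->head + off, from_head);
	memcpy((unsigned char *)buf + from_head,
	       p->frag + (off + from_head - p->head_len), len - from_head);
	return buf;
}
```

A caller therefore has to use the returned pointer rather than assume the data landed in its own buffer, which is the same contract the commit describes for the real slice kfuncs.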

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-10-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

66e3a13e

bpf: Add xdp dynptrs · 05421aec

Committed by Joanne Koong on Mar 01, 2023

Add xdp dynptrs, which are dynptrs whose underlying pointer points
to a xdp_buff. The dynptr acts on xdp data. xdp dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of xdp->data and xdp->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For reads and writes on the dynptr, this includes reading/writing
from/to and across fragments. Data slices through the bpf_dynptr_data
API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() should be used.

For examples of how xdp dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-9-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

05421aec

bpf: Add skb dynptrs · b5964b96

Committed by Joanne Koong on Mar 01, 2023

Add skb dynptrs, which are dynptrs whose underlying pointer points
to a skb. The dynptr acts on skb data. skb dynptrs have two main
benefits. One is that they allow operations on sizes that are not
statically known at compile-time (eg variable-sized accesses).
Another is that parsing the packet data through dynptrs (instead of
through direct access of skb->data and skb->data_end) can be more
ergonomic and less brittle (eg does not need manual if checking for
being within bounds of data_end).

For bpf prog types that don't support writes on skb data, the dynptr is
read-only (bpf_dynptr_write() will return an error).

For reads and writes through the bpf_dynptr_read() and bpf_dynptr_write()
interfaces, reading and writing from/to data in the head as well as from/to
non-linear paged buffers is supported. Data slices through the
bpf_dynptr_data API are not supported; instead bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr() (added in subsequent commit) should be used.

For examples of how skb dynptrs can be used, please see the attached
selftests.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Link: https://lore.kernel.org/r/20230301154953.641654-8-joannelkoong@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b5964b96

18 Feb, 2023 1 commit

bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup · 31de4105

Committed by Martin KaFai Lau on Feb 17, 2023

The bpf_fib_lookup() also looks up the neigh table.
This was done before bpf_redirect_neigh() was added.

In the use case that does not manage the neigh table
and requires bpf_fib_lookup() to lookup a fib to
decide if it needs to redirect or not, the bpf prog can
depend only on using bpf_redirect_neigh() to lookup the
neigh. It also keeps the neigh entries fresh and connected.

This patch adds a bpf_fib_lookup flag, SKIP_NEIGH, to avoid
the double neigh lookup when the bpf prog always calls
bpf_redirect_neigh() to do the neigh lookup. The params->smac
output is skipped together when SKIP_NEIGH is set because
bpf_redirect_neigh() will figure out the smac also.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217205515.3583372-1-martin.lau@linux.dev

31de4105

17 Feb, 2023 1 commit

bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state · 1fe4850b

Committed by Martin KaFai Lau on Feb 16, 2023

The bpf_fib_lookup() helper does not only look up the fib (ie. route)
but it also looks up the neigh. Before returning the neigh, the helper
does not check for NUD_VALID. When a neigh state (neigh->nud_state)
is in NUD_FAILED, its dmac (neigh->ha) could be all zeros. The helper
still returns SUCCESS instead of NO_NEIGH in this case. Because of the
SUCCESS return value, the bpf prog directly uses the returned dmac
and ends up filling all zero in the eth header.

This patch checks for NUD_VALID and returns NO_NEIGH if the neigh is
not valid.
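
The check can be illustrated with a small user-space model. The NUD_* values mirror the Linux headers, while `struct neigh_model`, `enum lookup_ret` and `fill_dmac()` are hypothetical stand-ins rather than the helper's real code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Neighbour state bits, with the same values as the Linux headers. */
#define NUD_INCOMPLETE	0x01
#define NUD_REACHABLE	0x02
#define NUD_STALE	0x04
#define NUD_DELAY	0x08
#define NUD_PROBE	0x10
#define NUD_FAILED	0x20
#define NUD_NOARP	0x40
#define NUD_PERMANENT	0x80
#define NUD_VALID	(NUD_PERMANENT | NUD_NOARP | NUD_REACHABLE | \
			 NUD_PROBE | NUD_STALE | NUD_DELAY)

/* A failed neighbour is never part of the valid set. */
static_assert((NUD_FAILED & NUD_VALID) == 0, "NUD_FAILED is not valid");

/* Model-only return codes and neighbour entry. */
enum lookup_ret { LOOKUP_SUCCESS, LOOKUP_NO_NEIGH };

struct neigh_model {
	uint8_t nud_state;
	uint8_t ha[6];	/* link-layer address; may be all zeros when failed */
};

/* Copy the dmac only when the neighbour state is in NUD_VALID. */
static enum lookup_ret fill_dmac(const struct neigh_model *n, uint8_t dmac[6])
{
	if (n == NULL || !(n->nud_state & NUD_VALID))
		return LOOKUP_NO_NEIGH;
	memcpy(dmac, n->ha, 6);
	return LOOKUP_SUCCESS;
}
```

With the state test in place, a caller that sees the success code can rely on the dmac having been taken from a usable neighbour entry.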

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230217004150.2980689-3-martin.lau@linux.dev

1fe4850b

03 Feb, 2023 1 commit

bpf: devmap: check XDP features in __xdp_enqueue routine · b9d460c9

Committed by Lorenzo Bianconi on Feb 01, 2023

Check if the destination device implements ndo_xdp_xmit callback relying
on NETDEV_XDP_ACT_NDO_XMIT flags. Moreover, check if the destination device
supports XDP non-linear frame in __xdp_enqueue and is_valid_dst routines.
This patch makes it possible to perform XDP_REDIRECT on non-linear XDP buffers.

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Co-developed-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://lore.kernel.org/r/26a94c33520c0bfba021b3fbb2cb8c1e69bf53b8.1675245258.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

b9d460c9

29 Jan, 2023 1 commit

bpf: Use ARG_CONST_SIZE_OR_ZERO for 3rd argument of bpf_tcp_raw_gen_syncookie_ipv{4,6}() · bf384975

Committed by Ilya Leoshkevich on Jan 28, 2023

These functions already reject th_len < sizeof(*th) at run time, and
propagating the lower bound (th_len > 0) may be challenging
in complex code, e.g. as is the case with xdp_synproxy test on
s390x [1]. Switch to ARG_CONST_SIZE_OR_ZERO in order to make the
verifier accept code where it cannot prove that th_len > 0.

[1] https://lore.kernel.org/bpf/CAEf4Bzb3uiSHtUbgVWmkWuJ5Sw1UZd4c_iuS4QXtUkXmTTtXuQ@mail.gmail.com/

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20230128000650.1516334-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bf384975

26 Jan, 2023 1 commit

bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt(). · 2ab42c7b

Committed by Kui-Feng Lee on Jan 25, 2023

Resolve an issue when calling sol_tcp_sockopt() on a socket with ktls
enabled. Prior to this patch, sol_tcp_sockopt() would only allow calls
if the function pointer of setsockopt of the socket was set to
tcp_setsockopt(). However, any socket with ktls enabled would have its
function pointer set to tls_setsockopt(). To resolve this issue, the
patch adds a check of the protocol of the linux socket and allows
bpf_setsockopt() to be called if ktls is initialized on the linux
socket. This ensures that calls to sol_tcp_sockopt() will succeed on
sockets with ktls enabled.
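
Why a handler-pointer comparison breaks under kTLS, and why a protocol check does not, can be shown with a user-space model. Everything here is a hypothetical stand-in (`struct sock_model`, the `*_model` handlers, both `allowed_by_*` predicates); the actual kernel check may be expressed differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sock_model;
typedef int (*setsockopt_fn)(struct sock_model *sk);

/* Toy socket: type/protocol facts plus the installed setsockopt handler. */
struct sock_model {
	bool is_stream;		/* models sk_type == SOCK_STREAM */
	bool is_tcp_proto;	/* models sk_protocol == IPPROTO_TCP */
	setsockopt_fn setsockopt;
};

static int tcp_setsockopt_model(struct sock_model *sk) { (void)sk; return 0; }
static int tls_setsockopt_model(struct sock_model *sk) { (void)sk; return 0; }

/* Old-style rule: compare the handler pointer, which kTLS replaces. */
static bool allowed_by_handler(const struct sock_model *sk)
{
	assert(sk != NULL);
	return sk->setsockopt == tcp_setsockopt_model;
}

/* New-style rule: look at type and protocol, which kTLS leaves unchanged. */
static bool allowed_by_protocol(const struct sock_model *sk)
{
	assert(sk != NULL);
	return sk->is_stream && sk->is_tcp_proto;
}
```

A TCP socket whose handler was swapped to the TLS one fails the first predicate but still passes the second, which is the behaviour the commit message describes.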

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Link: https://lore.kernel.org/r/20230125201608.908230-2-kuifeng@meta.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

2ab42c7b

24 Jan, 2023 1 commit

bpf: Rename bpf_{prog,map}_is_dev_bound to is_offloaded · 9d03ebc7

Committed by Stanislav Fomichev on Jan 19, 2023

BPF offloading infra will be reused to implement
bound-but-not-offloaded bpf programs. Rename existing
helpers for clarity. No functional changes.

Cc: John Fastabend <john.fastabend@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Cc: Willem de Bruijn <willemb@google.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Anatoly Burakov <anatoly.burakov@intel.com>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@gmail.com>
Cc: Maryam Tahhan <mtahhan@redhat.com>
Cc: xdp-hints@xdp-project.net
Cc: netdev@vger.kernel.org
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230119221536.3349901-3-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

9d03ebc7

18 Jan, 2023 1 commit

xdp: document xdp_do_flush() before napi_complete_done() · 68e5b6aa

Committed by Magnus Karlsson on Jan 17, 2023

Document in the XDP_REDIRECT manual section that drivers must call
xdp_do_flush() before napi_complete_done(). The two reasons behind
this can be found following the links below.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com
Link: https://lore.kernel.org/all/20210624160609.292325-1-toke@redhat.com/
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

68e5b6aa

16 Jan, 2023 1 commit

bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room() · d219df60

Committed by Ziyang Xuan on Jan 13, 2023

Add ipip6 and ip6ip decap support for bpf_skb_adjust_room().
Main use case is for using cls_bpf on ingress hook to decapsulate
IPv4 over IPv6 and IPv6 over IPv4 tunnel packets.

Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the
new IP header version after decapsulating the outer IP header.

Suggested-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

d219df60

11 Jan, 2023 1 commit

bpf: Remove the unnecessary insn buffer comparison · 66cf99b5

Committed by Haiyue Wang on Jan 08, 2023

The variable 'insn' is initialized to 'insn_buf' without being changed, only
some helper macros are defined, so the insn buffer comparison is unnecessary.
Just remove it. This removal was missed back in 2377b81d ("bpf: split shared
bpf_tcp_sock and bpf_sock_ops implementation").

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20230108151258.96570-1-haiyue.wang@intel.com

66cf99b5

21 Dec, 2022 1 commit

bpf: pull before calling skb_postpull_rcsum() · 54c3f1a8

Committed by Jakub Kicinski on Dec 19, 2022

Anand hit a BUG() when pulling off headers on egress to a SW tunnel.
We get to skb_checksum_help() with an invalid checksum offset
(commit d7ea0d9d ("net: remove two BUG() from skb_checksum_help()")
converted those BUGs to WARN_ONs()).
He points out oddness in how skb_postpull_rcsum() gets used.
Indeed looks like we should pull before "postpull", otherwise
the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not
be able to do its job:

	if (skb->ip_summed == CHECKSUM_PARTIAL &&
	    skb_checksum_start_offset(skb) < 0)
		skb->ip_summed = CHECKSUM_NONE;

Reported-by: Anand Parthasarathy <anpartha@meta.com>
Fixes: 6578171a ("bpf: add bpf_skb_change_proto helper")
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

54c3f1a8

20 Dec, 2022 1 commit

bpf: Add flag BPF_F_NO_TUNNEL_KEY to bpf_skb_set_tunnel_key() · e26aa600

Committed by Christian Ehrig on Dec 18, 2022

This patch allows removing TUNNEL_KEY from the tunnel flags bitmap
when using bpf_skb_set_tunnel_key by providing a BPF_F_NO_TUNNEL_KEY
flag. On egress, the resulting tunnel header will not contain a tunnel
key if the protocol and implementation supports it.

At the moment bpf_tunnel_key wants a user to specify a numeric tunnel
key. This will wrap the inner packet into a tunnel header with the key
bit and value set accordingly. This is problematic when using a tunnel
protocol that supports optional tunnel keys and a receiving tunnel
device that is not expecting packets with the key bit set. The receiver
will not decapsulate the packet and will drop it.

RFC 2890 and RFC 2784 GRE tunnels are examples where this flag is
useful. It allows for generating packets, that can be decapsulated by
a GRE tunnel device not operating in collect metadata mode or not
expecting the key bit set.

Signed-off-by: Christian Ehrig <cehrig@cloudflare.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/20221218051734.31411-1-cehrig@cloudflare.com

e26aa600

06 Dec, 2022 1 commit

xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from TC-BPF · 94151f5a

Committed by Eyal Birger on Dec 03, 2022

This change adds xfrm metadata helpers using the unstable kfunc call
interface for the TC-BPF hooks. This allows steering traffic towards
different IPsec connections based on logic implemented in bpf programs.

This object is built based on the availability of BTF debug info.

When setting the xfrm metadata, percpu metadata dsts are used in order
to avoid allocating a metadata dst per packet.

In order to guarantee safe module unload, the percpu dsts are allocated
on first use and never freed. The percpu pointer is stored in
net/core/filter.c so that it can be reused on module reload.

The metadata percpu dsts take ownership of the original skb dsts so
that they may be used as part of the xfrm transmission logic - e.g.
for MTU calculations.

Signed-off-by: Eyal Birger <eyal.birger@gmail.com>
Link: https://lore.kernel.org/r/20221203084659.1837829-3-eyal.birger@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

94151f5a

05 Dec, 2022 1 commit

bpf: Add dummy type reference to nf_conn___init to fix type deduplication · 578ce69f

Committed by Toke Høiland-Jørgensen on Dec 01, 2022

The bpf_ct_set_nat_info() kfunc is defined in the nf_nat.ko module, and
takes as a parameter the nf_conn___init struct, which is allocated through
the bpf_xdp_ct_alloc() helper defined in the nf_conntrack.ko module.
However, because kernel modules can't deduplicate BTF types between each
other, and the nf_conn___init struct is not referenced anywhere in vmlinux
BTF, this leads to two distinct BTF IDs for the same type (one in each
module). This confuses the verifier, as described here:

https://lore.kernel.org/all/87leoh372s.fsf@toke.dk/

As a workaround, add an explicit BTF_TYPE_EMIT for the type in
net/core/filter.c, so the type definition gets included in vmlinux BTF. This
way, both modules can refer to the same type ID (as they both build on top
of vmlinux BTF), and the verifier is no longer confused.

v2:

- Use BTF_TYPE_EMIT (which is a statement so it has to be inside a function
  definition; use xdp_func_proto() for this, since this is mostly
  xdp-related).

Fixes: 820dc052 ("net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221201123939.696558-1-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

578ce69f

22 Nov, 2022 1 commit

bpf: Move skb->len == 0 checks into __bpf_redirect · 114039b3

Committed by Stanislav Fomichev on Nov 21, 2022

To avoid potentially breaking existing users.

Both mac/no-mac cases have to be amended; mac_header >= network_header
is not enough (verified with a new test, see next patch).

Fixes: fd189422 ("bpf: Don't redirect packets with invalid pkt_len")
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221121180340.1983627-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

114039b3

19 Nov, 2022 1 commit

bpf, docs: DEVMAPs and XDP_REDIRECT · d1e91173

Committed by Maryam Tahhan on Nov 15, 2022

Add documentation for BPF_MAP_TYPE_DEVMAP and BPF_MAP_TYPE_DEVMAP_HASH
including kernel version introduced, usage and examples.

Add documentation that describes XDP_REDIRECT.

Signed-off-by: Maryam Tahhan <mtahhan@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221115144921.165483-1-mtahhan@redhat.com

d1e91173

16 Nov, 2022 2 commits

udp: Access &udp_table via net. · ba6aac15

Committed by Kuniyuki Iwashima on Nov 14, 2022

We will soon introduce an optional per-netns hash table
for UDP.

This means we cannot use udp_table directly in most places.

Instead, access it via net->ipv4.udp_table.

The access will be valid only while initialising udp_table
itself and creating/destroying each netns.

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ba6aac15

bpf: Expand map key argument of bpf_redirect_map to u64 · 32637e33

Committed by Toke Høiland-Jørgensen on Nov 08, 2022

For queueing packets in XDP we want to add a new redirect map type with
support for 64-bit indexes. To prepare for this, expand the width of the
'key' argument to the bpf_redirect_map() helper. Since BPF registers are
always 64-bit, this should be safe to do after the fact.

Acked-by: Song Liu <song@kernel.org>
Reviewed-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20221108140601.149971-3-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

32637e33

15 Nov, 2022 1 commit

bpf: Refactor btf_struct_access · 6728aea7

Committed by Kumar Kartikeya Dwivedi on Nov 15, 2022

Instead of having to pass multiple arguments that describe the register,
pass the bpf_reg_state into the btf_struct_access callback. Currently,
all call sites simply reuse the btf and btf_id of the reg they want to
check the access of. The only exception to this pattern is the callsite
in check_ptr_to_map_access, hence for that case create a dummy reg to
simulate PTR_TO_BTF_ID access.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221114191547.1694267-8-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

6728aea7

12 Nov, 2022 2 commits

net: remove skb->vlan_present · 354259fa

Committed by Eric Dumazet on Nov 09, 2022

skb->vlan_present seems redundant.

We can instead derive it from this boolean expression:

vlan_present = skb->vlan_proto != 0 || skb->vlan_tci != 0

Add a new union, to access both fields in a single load/store
when possible.

	union {
		u32	vlan_all;
		struct {
		__be16	vlan_proto;
		__u16	vlan_tci;
		};
	};

This allows following patch to remove a conditional test in GRO stack.

Note:
  We move remcsum_offload to keep TC_AT_INGRESS_MASK
  and SKB_MONO_DELIVERY_TIME_MASK unchanged.
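
A minimal user-space sketch of the idea, assuming simplified stand-in types: `struct vlan_model`, `vlan_present()` and `vlan_clear()` are hypothetical names, not the sk_buff helpers. It shows how one 32-bit load or store covers both 16-bit fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the union added to the skb: one 32-bit view over two 16-bit fields. */
struct vlan_model {
	union {
		uint32_t vlan_all;
		struct {
			uint16_t vlan_proto;	/* only zero vs non-zero matters here */
			uint16_t vlan_tci;
		};
	};
};

/* The 32-bit view must cover exactly the two 16-bit fields. */
static_assert(sizeof(struct vlan_model) == sizeof(uint32_t),
	      "vlan_proto and vlan_tci must alias vlan_all");

/* Equivalent to: vlan_proto != 0 || vlan_tci != 0, using a single load. */
static bool vlan_present(const struct vlan_model *v)
{
	return v->vlan_all != 0;
}

/* Clear both fields with a single store. */
static void vlan_clear(struct vlan_model *v)
{
	v->vlan_all = 0;
}
```

Because only a zero test is involved, the result does not depend on byte order.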

Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

354259fa

bpf: Add hwtstamp field for the sockops prog · 9bb05349

Committed by Martin KaFai Lau on Nov 07, 2022

The bpf-tc prog has already been able to access the
skb_hwtstamps(skb)->hwtstamp.  This patch extends the same hwtstamp
access to the sockops prog.

In sockops, the skb is also available to the bpf prog during
the BPF_SOCK_OPS_PARSE_HDR_OPT_CB event.  There is a use case
that the hwtstamp will be useful to the sockops prog to better
measure the one-way-delay when the sender has put the tx
timestamp in the tcp header option.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20221107230420.4192307-2-martin.lau@linux.dev

9bb05349

04 Nov, 2022 1 commit

bpf: make sure skb->len != 0 when redirecting to a tunneling device · 07ec7b50

Committed by Stanislav Fomichev on Oct 27, 2022

syzkaller managed to trigger another case where skb->len == 0
when we enter __dev_queue_xmit:

WARNING: CPU: 0 PID: 2470 at include/linux/skbuff.h:2576 skb_assert_len include/linux/skbuff.h:2576 [inline]
WARNING: CPU: 0 PID: 2470 at include/linux/skbuff.h:2576 __dev_queue_xmit+0x2069/0x35e0 net/core/dev.c:4295

Call Trace:
 dev_queue_xmit+0x17/0x20 net/core/dev.c:4406
 __bpf_tx_skb net/core/filter.c:2115 [inline]
 __bpf_redirect_no_mac net/core/filter.c:2140 [inline]
 __bpf_redirect+0x5fb/0xda0 net/core/filter.c:2163
 ____bpf_clone_redirect net/core/filter.c:2447 [inline]
 bpf_clone_redirect+0x247/0x390 net/core/filter.c:2419
 bpf_prog_48159a89cb4a9a16+0x59/0x5e
 bpf_dispatcher_nop_func include/linux/bpf.h:897 [inline]
 __bpf_prog_run include/linux/filter.h:596 [inline]
 bpf_prog_run include/linux/filter.h:603 [inline]
 bpf_test_run+0x46c/0x890 net/bpf/test_run.c:402
 bpf_prog_test_run_skb+0xbdc/0x14c0 net/bpf/test_run.c:1170
 bpf_prog_test_run+0x345/0x3c0 kernel/bpf/syscall.c:3648
 __sys_bpf+0x43a/0x6c0 kernel/bpf/syscall.c:5005
 __do_sys_bpf kernel/bpf/syscall.c:5091 [inline]
 __se_sys_bpf kernel/bpf/syscall.c:5089 [inline]
 __x64_sys_bpf+0x7c/0x90 kernel/bpf/syscall.c:5089
 do_syscall_64+0x54/0x70 arch/x86/entry/common.c:48
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

The reproducer doesn't really reproduce outside of syzkaller
environment, so I'm taking a guess here. It looks like we
do generate correct ETH_HLEN-sized packet, but we redirect
the packet to the tunneling device. Before we do so, we
__skb_pull l2 header and arrive again at skb->len == 0.
Doesn't seem like we can do anything better than having
an explicit check after __skb_pull?

Cc: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot+f635e86ec3fa0a37e019@syzkaller.appspotmail.com
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20221027225537.353077-1-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

07ec7b50

30 Sep, 2022 2 commits

bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself · 061ff040

Committed by Martin KaFai Lau on Sep 29, 2022

When a bad bpf prog '.init' calls
bpf_setsockopt(TCP_CONGESTION, "itself"), it will trigger this loop:

.init => bpf_setsockopt(tcp_cc) => .init => bpf_setsockopt(tcp_cc) ...
... => .init => bpf_setsockopt(tcp_cc).

It was prevented by the prog->active counter before but the prog->active
detection cannot be used in struct_ops as explained in the earlier
patch of the set.

In this patch, the second bpf_setsockopt(tcp_cc) is not allowed
in order to break the loop.  This is done by using a bit of
an existing 1 byte hole in tcp_sock to check if there is
on-going bpf_setsockopt(TCP_CONGESTION) in this tcp_sock.
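
The guard-bit approach can be sketched in user space. This is a model under assumptions, not the kernel code: `struct tcp_sock_model`, its field names, and the choice of `-EBUSY` for the rejected nested call are illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy tcp_sock: one guard bit plus bookkeeping for the demonstration. */
struct tcp_sock_model {
	unsigned int chg_cc_inprogress : 1;	/* set while a cc switch is running */
	int init_calls;				/* how many times .init ran */
	int nested_ret;				/* result of the nested setsockopt */
};

static int set_congestion_model(struct tcp_sock_model *tp);

/* A "bad" .init that tries to select itself again. */
static void cc_init_model(struct tcp_sock_model *tp)
{
	tp->init_calls++;
	tp->nested_ret = set_congestion_model(tp);
}

/* Models bpf_setsockopt(TCP_CONGESTION): refuse to re-enter for the same socket. */
static int set_congestion_model(struct tcp_sock_model *tp)
{
	assert(tp != NULL);
	if (tp->chg_cc_inprogress)
		return -EBUSY;		/* second call: break the loop */
	tp->chg_cc_inprogress = 1;
	cc_init_model(tp);		/* switching cc runs the new .init */
	tp->chg_cc_inprogress = 0;
	return 0;
}
```

The outer call succeeds and `.init` runs exactly once, while the nested call made from inside `.init` is rejected instead of recursing.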

Note that this essentially means only the first '.init' can
call bpf_setsockopt(TCP_CONGESTION) to pick a fallback cc (eg. peer
does not support ECN) and the second '.init' cannot fall back to
another cc.  This applies even if the second
bpf_setsockopt(TCP_CONGESTION) would not cause a loop.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220929070407.965581-5-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

061ff040

bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function · 1e7d217f

Committed by Martin KaFai Lau on Sep 29, 2022

This patch moves the bpf_setsockopt(TCP_CONGESTION) logic into
another function. The next patch will add extra logic to avoid
recursion and this will make the latter patch easier to follow.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20220929070407.965581-4-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

1e7d217f
