- 10 6月, 2021 1 次提交
-
-
由 Paolo Abeni 提交于
Kaustubh reported and diagnosed a panic in udp_lib_lookup(). The root cause is udp_abort() racing with close(). Both racing functions acquire the socket lock, but udp{v6}_destroy_sock() release it before performing destructive actions. We can't easily extend the socket lock scope to avoid the race, instead use the SOCK_DEAD flag to prevent udp_abort from doing any action when the critical race happens. Diagnosed-and-tested-by: NKaustubh Pandey <kapandey@codeaurora.org> Fixes: 5d77dca8 ("net: diag: support SOCK_DESTROY for UDP sockets") Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 4月, 2021 1 次提交
-
-
由 Cong Wang 提交于
Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX. Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET. Signed-off-by: NCong Wang <cong.wang@bytedance.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210331023237.41094-11-xiyou.wangcong@gmail.com
-
- 31 3月, 2021 1 次提交
-
-
由 Paolo Abeni 提交于
When UDP packets generated locally by a socket with UDP_SEGMENT traverse the following path: UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx) -> UDP socket (no UDP_GRO) ip_summed will be set to CHECKSUM_PARTIAL at creation time and such checksum mode will be preserved in the above path up to the UDP tunnel receive code where we have: __iptunnel_pull_header() -> skb_pull_rcsum() -> skb_postpull_rcsum() -> __skb_postpull_rcsum() The latter will convert the skb to CHECKSUM_NONE. The UDP GSO packet will be later segmented as part of the rx socket receive operation, and will present a CHECKSUM_NONE after segmentation. Additionally the segmented packets UDP CB still refers to the original GSO packet len. Overall that causes unexpected/wrong csum validation errors later in the UDP receive path. We could possibly address the issue with some additional checks and csum mangling in the UDP tunnel code. Since the issue affects only this UDP receive slow path, let's set a suitable csum status there. Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking an UDP encapsulation present a valid checksum when landing to udp_queue_rcv_skb(), as the UDP checksum has been validated by the GRO engine. v2 -> v3: - even more verbose commit message and comments v1 -> v2: - restrict the csum update to the packets strictly needing them - hopefully clarify the commit message and code comments Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Reviewed-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 2月, 2021 1 次提交
-
-
由 Xin Long 提交于
When enabling encap for a ipv6 socket without udp_encap_needed_key increased, UDP GRO won't work for v4 mapped v6 address packets as sk will be NULL in udp4_gro_receive(). This patch is to enable it by increasing udp_encap_needed_key for v6 sockets in udp_tunnel_encap_enable(), and correspondingly decrease udp_encap_needed_key in udpv6_destroy_sock(). v1->v2: - add udp_encap_disable() and export it. v2->v3: - add the change for rxrpc and bareudp into one patch, as Alex suggested. v3->v4: - move rxrpc part to another patch. Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 21 1月, 2021 1 次提交
-
-
由 Stanislav Fomichev 提交于
When we attach any cgroup hook, the rest (even if unused/unattached) start to contribute small overhead. In particular, the one we want to avoid is __cgroup_bpf_run_filter_skb which does two redirections to get to the cgroup and pushes/pulls skb. Let's split cgroup_bpf_enabled to be per-attach to make sure only used attach types trigger. I've dropped some existing high-level cgroup_bpf_enabled in some places because BPF_PROG_CGROUP_XXX_RUN macros usually have another cgroup_bpf_enabled check. I also had to copy-paste BPF_CGROUP_RUN_SA_PROG_LOCK for GETPEERNAME/GETSOCKNAME because type for cgroup_bpf_enabled[type] has to be constant and known at compile time. Signed-off-by: NStanislav Fomichev <sdf@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NSong Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210115163501.805133-4-sdf@google.com
-
- 24 11月, 2020 1 次提交
-
-
由 Paul Moore 提交于
As pointed out by Herbert in a recent related patch, the LSM hooks do not have the necessary address family information to use the flowi struct safely. As none of the LSMs currently use any of the protocol specific flowi information, replace the flowi pointers with pointers to the address family independent flowi_common struct. Reported-by: NHerbert Xu <herbert@gondor.apana.org.au> Acked-by: NJames Morris <jamorris@linux.microsoft.com> Signed-off-by: NPaul Moore <paul@paul-moore.com>
-
- 15 11月, 2020 1 次提交
-
-
由 Eric Dumazet 提交于
These functions do not need to be exported. Signed-off-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20201113113553.3411756-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 11 11月, 2020 1 次提交
-
-
由 Eric Dumazet 提交于
The skb is needed only to fetch the keys for the lookup. Both functions are used from GRO stack, we do not want accidental modification of the skb. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NAlexander Lobakin <alobakin@pm.me> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 10 11月, 2020 1 次提交
-
-
由 Menglong Dong 提交于
When udp_memory_allocated is at the limit, __udp_enqueue_schedule_skb will return a -ENOBUFS, and skb will be dropped in __udp_queue_rcv_skb without any counters being done. It's hard to find out what happened once this happen. So we introduce a UDP_MIB_MEMERRORS to do this job. Well, this change looks friendly to the existing users, such as netstat: $ netstat -u -s Udp: 0 packets received 639 packets to unknown port received. 158689 packet receive errors 180022 packets sent RcvbufErrors: 20930 MemErrors: 137759 UdpLite: IpExt: InOctets: 257426235 OutOctets: 257460598 InNoECTPkts: 181177 v2: - Fix some alignment problems Signed-off-by: NMenglong Dong <dong.menglong@zte.com.cn> Link: https://lore.kernel.org/r/1604627354-43207-1-git-send-email-dong.menglong@zte.com.cnSigned-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 31 10月, 2020 1 次提交
-
-
由 Xin Long 提交于
There is a chance that __udp4/6_lib_lookup() returns a udp encap sock in __udp_lib_err(), like the udp encap listening sock may use the same port as remote encap port, in which case it should go to __udp4/6_lib_err_encap() for more validation before processing the icmp packet. This patch is to check encap_type in __udp_lib_err() for the further validation for a encap sock. Signed-off-by: NXin Long <lucien.xin@gmail.com> Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NJakub Kicinski <kuba@kernel.org>
-
- 31 7月, 2020 1 次提交
-
-
由 Jakub Sitnicki 提交于
When BPF sk lookup invokes reuseport handling for the selected socket, it should ignore the fact that reuseport group can contain connected UDP sockets. With BPF sk lookup this is not relevant as we are not scoring sockets to find the best match, which might be a connected UDP socket. Fix it by unconditionally accepting the socket selected by reuseport. This fixes the following two failures reported by test_progs. # ./test_progs -t sk_lookup ... #73/14 UDP IPv4 redir and reuseport with conns:FAIL ... #73/20 UDP IPv6 redir and reuseport with conns:FAIL ... Fixes: a57066b1 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") Reported-by: NAlexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20200726120228.1414348-1-jakub@cloudflare.com
-
- 26 7月, 2020 1 次提交
-
-
由 Jakub Sitnicki 提交于
When BPF socket lookup prog selects a socket that belongs to a reuseport group, and the reuseport group has connected sockets in it, the socket selected by reuseport will be discarded, and socket returned by BPF socket lookup will be used instead. Modify this behavior so that the socket selected by reuseport running after BPF socket lookup always gets used. Ignore the fact that the reuseport group might have connections because it is only relevant when scoring sockets during regular hashtable-based lookup. Fixes: 72f7e944 ("udp: Run SK_LOOKUP BPF program on socket lookup") Fixes: 6d4201b1 ("udp6: Run SK_LOOKUP BPF program on socket lookup") Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20200722161720.940831-2-jakub@cloudflare.com
-
- 25 7月, 2020 2 次提交
-
-
由 Christoph Hellwig 提交于
Rework the remaining setsockopt code to pass a sockptr_t instead of a plain user pointer. This removes the last remaining set_fs(KERNEL_DS) outside of architecture specific code. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> [ieee802154] Acked-by: NMatthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christoph Hellwig 提交于
Pass a sockptr_t to prepare for set_fs-less handling of the kernel pointer from bpf-cgroup. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 7月, 2020 2 次提交
-
-
由 Miaohe Lin 提交于
We can't use IS_UDPLITE to replace udp_sk->pcflag when UDPLITE_RECV_CC is checked. Fixes: b2bf1e26 ("[UDP]: Clean up for IS_UDPLITE macro") Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kuniyuki Iwashima 提交于
Currently, SO_REUSEPORT does not work well if connected sockets are in a UDP reuseport group. Then reuseport_has_conns() returns true and the result of reuseport_select_sock() is discarded. Also, unconnected sockets have the same score, hence only does the first unconnected socket in udp_hslot always receive all packets sent to unconnected sockets. So, the result of reuseport_select_sock() should be used for load balancing. The noteworthy point is that the unconnected sockets placed after connected sockets in sock_reuseport.socks will receive more packets than others because of the algorithm in reuseport_select_sock(). index | connected | reciprocal_scale | result --------------------------------------------- 0 | no | 20% | 40% 1 | no | 20% | 20% 2 | yes | 20% | 0% 3 | no | 20% | 40% 4 | yes | 20% | 0% If most of the sockets are connected, this can be a problem, but it still works better than now. Fixes: acdcecc6 ("udp: correct reuseport selection with connected sockets") CC: Willem de Bruijn <willemb@google.com> Reviewed-by: NBenjamin Herrenschmidt <benh@amazon.com> Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.co.jp> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 7月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Handle the few cases that need special treatment in-line using in_compat_syscall(). This also removes all the now unused compat_{get,set}sockopt methods. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 7月, 2020 2 次提交
-
-
由 Jakub Sitnicki 提交于
Same as for udp4, let BPF program override the socket lookup result, by selecting a receiving socket of its choice or failing the lookup, if no connected UDP socket matched packet 4-tuple. Suggested-by: NMarek Majkowski <marek@cloudflare.com> Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-11-jakub@cloudflare.com
-
由 Jakub Sitnicki 提交于
Prepare for calling into reuseport from __udp6_lib_lookup as well. Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NAndrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/bpf/20200717103536.397595-10-jakub@cloudflare.com
-
- 14 7月, 2020 1 次提交
-
-
由 Andrew Lunn 提交于
Simple fixes which require no deep knowledge of the code. Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Signed-off-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 3月, 2020 1 次提交
-
-
由 Joe Stringer 提交于
Refactor the UDP/TCP handlers slightly to allow skb_steal_sock() to make the determination of whether the socket is reference counted in the case where it is prefetched by earlier logic such as early_demux. Signed-off-by: NJoe Stringer <joe@wand.net.nz> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Acked-by: NMartin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20200329225342.16317-3-joe@wand.net.nz
-
- 15 1月, 2020 1 次提交
-
-
由 Jason A. Donenfeld 提交于
This is a straight-forward conversion case for the new function, iterating over the return value from udp_rcv_segment, which actually is a wrapper around skb_gso_segment. Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 10月, 2019 1 次提交
-
-
由 Eric Dumazet 提交于
This socket field can be read and written by concurrent cpus. Use READ_ONCE() and WRITE_ONCE() annotations to document this, and avoid some compiler 'optimizations'. KCSAN reported : BUG: KCSAN: data-race in tcp_v4_rcv / tcp_v4_rcv write to 0xffff88812220763c of 4 bytes by interrupt on cpu 0: sk_incoming_cpu_update include/net/sock.h:953 [inline] tcp_v4_rcv+0x1b3c/0x1bb0 net/ipv4/tcp_ipv4.c:1934 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 napi_poll net/core/dev.c:6392 [inline] net_rx_action+0x3ae/0xa90 net/core/dev.c:6460 __do_softirq+0x115/0x33f kernel/softirq.c:292 do_softirq_own_stack+0x2a/0x40 arch/x86/entry/entry_64.S:1082 do_softirq.part.0+0x6b/0x80 kernel/softirq.c:337 do_softirq kernel/softirq.c:329 [inline] __local_bh_enable_ip+0x76/0x80 kernel/softirq.c:189 read to 0xffff88812220763c of 4 bytes by interrupt on cpu 1: sk_incoming_cpu_update include/net/sock.h:952 [inline] tcp_v4_rcv+0x181a/0x1bb0 net/ipv4/tcp_ipv4.c:1934 ip_protocol_deliver_rcu+0x4d/0x420 net/ipv4/ip_input.c:204 ip_local_deliver_finish+0x110/0x140 net/ipv4/ip_input.c:231 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_local_deliver+0x133/0x210 net/ipv4/ip_input.c:252 dst_input include/net/dst.h:442 [inline] ip_rcv_finish+0x121/0x160 net/ipv4/ip_input.c:413 NF_HOOK include/linux/netfilter.h:305 [inline] NF_HOOK include/linux/netfilter.h:299 [inline] ip_rcv+0x18f/0x1a0 net/ipv4/ip_input.c:523 __netif_receive_skb_one_core+0xa7/0xe0 net/core/dev.c:5010 __netif_receive_skb+0x37/0xf0 net/core/dev.c:5124 process_backlog+0x1d3/0x420 net/core/dev.c:5955 napi_poll net/core/dev.c:6392 [inline] net_rx_action+0x3ae/0xa90 net/core/dev.c:6460 __do_softirq+0x115/0x33f kernel/softirq.c:292 run_ksoftirqd+0x46/0x60 kernel/softirq.c:603 smpboot_thread_fn+0x37d/0x4a0 kernel/smpboot.c:165 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 16 Comm: ksoftirqd/1 Not tainted 5.4.0-rc3+ #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 10月, 2019 2 次提交
-
-
由 Josh Hunt 提交于
Prior to this change an application sending <= 1MSS worth of data and enabling UDP GSO would fail if the system had SW GSO enabled, but the same send would succeed if HW GSO offload is enabled. In addition to this inconsistency the error in the SW GSO case does not get back to the application if sending out of a real device so the user is unaware of this failure. With this change we only perform GSO if the # of segments is > 1 even if the application has enabled segmentation. I've also updated the relevant udpgso selftests. Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT") Signed-off-by: NJosh Hunt <johunt@akamai.com> Reviewed-by: NWillem de Bruijn <willemb@google.com> Reviewed-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Josh Hunt 提交于
Commit dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload") added gso_segs calculation, but incorrectly got sizeof() the pointer and not the underlying data type. In addition let's fix the v6 case. Fixes: bec1f6f6 ("udp: generate gso with UDP_SEGMENT") Fixes: dfec0ee2 ("udp: Record gso_segs when supporting UDP segmentation offload") Signed-off-by: NJosh Hunt <johunt@akamai.com> Reviewed-by: NAlexander Duyck <alexander.h.duyck@linux.intel.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 9月, 2019 1 次提交
-
-
由 Willem de Bruijn 提交于
UDP reuseport groups can hold a mix unconnected and connected sockets. Ensure that connections only receive all traffic to their 4-tuple. Fast reuseport returns on the first reuseport match on the assumption that all matches are equal. Only if connections are present, return to the previous behavior of scoring all sockets. Record if connections are present and if so (1) treat such connected sockets as an independent match from the group, (2) only return 2-tuple matches from reuseport and (3) do not return on the first 2-tuple reuseport match to allow for a higher scoring match later. New field has_conns is set without locks. No other fields in the bitmap are modified at runtime and the field is only ever set unconditionally, so an RMW cannot miss a change. Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection") Link: http://lkml.kernel.org/r/CA+FuTSfRP09aJNYRt04SS6qj22ViiOEWaWmLAwX0psk8-PGNxw@mail.gmail.comSigned-off-by: NWillem de Bruijn <willemb@google.com> Acked-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NCraig Gallek <kraig@google.com> Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 9月, 2019 1 次提交
-
-
由 Willem de Bruijn 提交于
Enable setting skb->mark for UDP and RAW sockets using cmsg. This is analogous to existing support for TOS, TTL, txtime, etc. Packet sockets already support this as of commit c7d39e32 ("packet: support per-packet fwmark for af_packet sendmsg"). Similar to other fields, implement by 1. initialize the sockcm_cookie.mark from socket option sk_mark 2. optionally overwrite this in ip_cmsg_send/ip6_datagram_send_ctl 3. initialize inet_cork.mark from sockcm_cookie.mark 4. initialize each (usually just one) skb->mark from inet_cork.mark Step 1 is handled in one location for most protocols by ipcm_init_sk as of commit 35178206 ("ipv4: ipcm_cookie initializers"). Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 7月, 2019 1 次提交
-
-
由 Willem de Bruijn 提交于
Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO. If not set, by default an autogenerated flowlabel is selected. Explicit flowlabels require a control operation per label plus a datapath check on every connection (every datagram if unconnected). This is particularly expensive on unconnected sockets multiplexing many flows, such as QUIC. In the common case, where no lease is exclusive, the check can be safely elided, as both lease request and check trivially succeed. Indeed, autoflowlabel does the same even with exclusive leases. Elide the check if no process has requested an exclusive lease. fl6_sock_lookup previously returns either a reference to a lease or NULL to denote failure. Modify to return a real error and update all callers. On return NULL, they can use the label and will elide the atomic_dec in fl6_sock_release. This is an optimization. Robust applications still have to revert to requesting leases if the fast path fails due to an exclusive lease. Changes RFC->v1: - use static_key_false_deferred to rate limit jump label operations - call static_key_deferred_flush to stop timers on exit - move decrement out of RCU context - defer optimization also if opt data is associated with a lease - updated all fp6_sock_lookup callers, not just udp Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 7月, 2019 1 次提交
-
-
由 Li RongQing 提交于
the check parameter is never used Signed-off-by: NLi RongQing <lirongqing@baidu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 6月, 2019 2 次提交
-
-
由 Tim Beale 提交于
This was originally passed through to the VRF logic in compute_score(). But that logic has now been replaced by udp_sk_bound_dev_eq() and so this code is no longer used or needed. Signed-off-by: NTim Beale <timbeale@catalyst.net.nz> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tim Beale 提交于
Originally this was used by the VRF logic in compute_score(), but that was later replaced by udp_sk_bound_dev_eq() and the parameter became unused. Note this change adds an 'unused variable' compiler warning that will be removed in the next patch (I've split the removal in two to make review slightly easier). Signed-off-by: NTim Beale <timbeale@catalyst.net.nz> Reviewed-by: NDavid Ahern <dsahern@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 6月, 2019 1 次提交
-
-
由 Daniel Borkmann 提交于
Intention of cgroup bind/connect/sendmsg BPF hooks is to act transparently to applications as also stated in original motivation in 7828f20e ("Merge branch 'bpf-cgroup-bind-connect'"). When recently integrating the latter two hooks into Cilium to enable host based load-balancing with Kubernetes, I ran into the issue that pods couldn't start up as DNS got broken. Kubernetes typically sets up DNS as a service and is thus subject to load-balancing. Upon further debugging, it turns out that the cgroupv2 sendmsg BPF hooks API is currently insufficient and thus not usable as-is for standard applications shipped with most distros. To break down the issue we ran into with a simple example: # cat /etc/resolv.conf nameserver 147.75.207.207 nameserver 147.75.207.208 For the purpose of a simple test, we set up above IPs as service IPs and transparently redirect traffic to a different DNS backend server for that node: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 The attached BPF program is basically selecting one of the backends if the service IP/port matches on the cgroup hook. DNS breaks here, because the hooks are not transparent enough to applications which have built-in msg_name address checks: # nslookup 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ;; connection timed out; no servers could be reached # dig 1.1.1.1 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.208#53 ;; reply from unexpected source: 8.8.8.8#53, expected 147.75.207.207#53 [...] ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; connection timed out; no servers could be reached For comparison, if none of the service IPs is used, and we tell nslookup to use 8.8.8.8 directly it works just fine, of course: # nslookup 1.1.1.1 8.8.8.8 1.1.1.1.in-addr.arpa name = one.one.one.one. In order to fix this and thus act more transparent to the application, this needs reverse translation on recvmsg() side. A minimal fix for this API is to add similar recvmsg() hooks behind the BPF cgroups static key such that the program can track state and replace the current sockaddr_in{,6} with the original service IP. From BPF side, this basically tracks the service tuple plus socket cookie in an LRU map where the reverse NAT can then be retrieved via map value as one example. Side-note: the BPF cgroups static key should be converted to a per-hook static key in future. Same example after this fix: # cilium service list ID Frontend Backend 1 147.75.207.207:53 1 => 8.8.8.8:53 2 147.75.207.208:53 1 => 8.8.8.8:53 Lookups work fine now: # nslookup 1.1.1.1 1.1.1.1.in-addr.arpa name = one.one.one.one. Authoritative answers can be found from: # dig 1.1.1.1 ; <<>> DiG 9.11.3-1ubuntu1.7-Ubuntu <<>> 1.1.1.1 ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51550 ;; flags: qr rd ra ad; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1 ;; OPT PSEUDOSECTION: ; EDNS: version: 0, flags:; udp: 512 ;; QUESTION SECTION: ;1.1.1.1. IN A ;; AUTHORITY SECTION: . 23426 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019052001 1800 900 604800 86400 ;; Query time: 17 msec ;; SERVER: 147.75.207.207#53(147.75.207.207) ;; WHEN: Tue May 21 12:59:38 UTC 2019 ;; MSG SIZE rcvd: 111 And from an actual packet level it shows that we're using the back end server when talking via 147.75.207.20{7,8} front end: # tcpdump -i any udp [...] 12:59:52.698732 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.698735 IP foo.42011 > google-public-dns-a.google.com.domain: 18803+ PTR? 1.1.1.1.in-addr.arpa. (38) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) 12:59:52.701208 IP google-public-dns-a.google.com.domain > foo.42011: 18803 1/0/0 PTR one.one.one.one. (67) [...] In order to be flexible and to have same semantics as in sendmsg BPF programs, we only allow return codes in [1,1] range. In the sendmsg case the program is called if msg->msg_name is present which can be the case in both, connected and unconnected UDP. The former only relies on the sockaddr_in{,6} passed via connect(2) if passed msg->msg_name was NULL. Therefore, on recvmsg side, we act in similar way to call into the BPF program whenever a non-NULL msg->msg_name was passed independent of sk->sk_state being TCP_ESTABLISHED or not. Note that for TCP case, the msg->msg_name is ignored in the regular recvmsg path and therefore not relevant. For the case of ip{,v6}_recv_error() paths, picked up via MSG_ERRQUEUE, the hook is not called. This is intentional as it aligns with the same semantics as in case of TCP cgroup BPF hooks right now. This might be better addressed in future through a different bpf_attach_type such that this case can be distinguished from the regular recvmsg paths, for example. Fixes: 1cedee13 ("bpf: Hooks for sys_sendmsg") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAndrey Ignatov <rdna@fb.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NMartynas Pumputis <m@lambda.lt> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 06 6月, 2019 1 次提交
-
-
由 Enrico Weigelt 提交于
IS_ERR() already calls unlikely(), so this extra unlikely() call around IS_ERR() is not needed. Signed-off-by: NEnrico Weigelt <info@metux.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 6月, 2019 2 次提交
-
-
由 Martin KaFai Lau 提交于
When the commit a6024562 ("udp: Add GRO functions to UDP socket") added udp[46]_lib_lookup_skb to the udp_gro code path, it broke the reuseport_select_sock() assumption that skb->data is pointing to the transport header. This patch follows an earlier __udp6_lib_err() fix by passing a NULL skb to avoid calling the reuseport's bpf_prog. Fixes: a6024562 ("udp: Add GRO functions to UDP socket") Cc: Tom Herbert <tom@herbertland.com> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Martin KaFai Lau 提交于
__udp6_lib_err() may be called when handling icmpv6 message. For example, the icmpv6 toobig(type=2). __udp6_lib_lookup() is then called which may call reuseport_select_sock(). reuseport_select_sock() will call into a bpf_prog (if there is one). reuseport_select_sock() is expecting the skb->data pointing to the transport header (udphdr in this case). For example, run_bpf_filter() is pulling the transport header. However, in the __udp6_lib_err() path, the skb->data is pointing to the ipv6hdr instead of the udphdr. One option is to pull and push the ipv6hdr in __udp6_lib_err(). Instead of doing this, this patch follows how the original commit 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") was done in IPv4, which has passed a NULL skb pointer to reuseport_select_sock(). Fixes: 538950a1 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF") Cc: Craig Gallek <kraig@google.com> Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Acked-by: NSong Liu <songliubraving@fb.com> Acked-by: NCraig Gallek <kraig@google.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 31 5月, 2019 1 次提交
-
-
由 Thomas Gleixner 提交于
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 of the license or at your option any later version extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 3029 file(s). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NAllison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190527070032.746973796@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 5月, 2019 2 次提交
-
-
由 Paolo Abeni 提交于
So that we avoid another indirect call per RX packet, if early demux is enabled. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paolo Abeni 提交于
So that we avoid another indirect call per RX packet in the common case. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 4月, 2019 1 次提交
-
-
由 Tetsuo Handa 提交于
KMSAN will complain if valid address length passed to udpv6_pre_connect() is shorter than sizeof("struct sockaddr"->sa_family) bytes. (This patch is bogus if it is guaranteed that udpv6_pre_connect() is always called after checking "struct sockaddr"->sa_family. In that case, we want a comment why we don't need to check valid address length here.) Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Acked-by: NSong Liu <songliubraving@fb.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 4月, 2019 1 次提交
-
-
由 Paolo Abeni 提交于
After commit a297569f ("net/udp: do not touch skb->peeked unless really needed") the 'peeked' argument of __skb_try_recv_datagram() and friends is always equal to !!'flags & MSG_PEEK'. Since such argument is really a boolean info, and the callers have already 'flags & MSG_PEEK' handy, we can remove it and clean-up the code a bit. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-