- 22 Apr, 2022 2 commits
-
-
Committed by Kuniyuki Iwashima
Since commit 9fe516ba ("inet: move ipv6only in sock_common"), ipv6_only_sock() and __ipv6_only_sock() are the same macro. Let's remove one of them.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Arun Ajith S
Fix a mistake in the original patch, where limits were specified but the handler did not enforce them.
Signed-off-by: Arun Ajith S <aajith@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 17 Apr, 2022 1 commit
-
-
Committed by Arun Ajith S
Add a new neighbour cache entry in STALE state for routers on receiving an unsolicited (gratuitous) neighbour advertisement with the target link-layer-address option specified. This is similar to the arp_accept configuration for IPv4. A new sysctl endpoint is created to turn on this behaviour: /proc/sys/net/ipv6/conf/interface/accept_unsolicited_na.
Signed-off-by: Arun Ajith S <aajith@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 16 Apr, 2022 2 commits
-
-
Committed by Eric Dumazet
Reads and writes to ip6_rt_gc_expire have always been racy, as syzbot reported lately [1]. There is a possible risk of underflow, leading to an unexpectedly high value being passed to fib6_run_gc(), although I have not observed this in the field. Hosts hitting ip6_dst_gc() very hard are in a pretty bad state anyway.

[1]
BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc

read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
dst_alloc+0x9b/0x160 net/core/dst.c:86
ip6_dst_alloc net/ipv6/route.c:344 [inline]
icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
mld_send_cr net/ipv6/mcast.c:2119 [inline]
mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30

read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
dst_alloc+0x9b/0x160 net/core/dst.c:86
ip6_dst_alloc net/ipv6/route.c:344 [inline]
icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
mld_send_cr net/ipv6/mcast.c:2119 [inline]
mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
worker_thread+0x618/0xa70 kernel/workqueue.c:2436
kthread+0x1a9/0x1e0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30

value changed: 0x00000bb3 -> 0x00000ba9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
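The underflow risk described in this commit message comes from doing arithmetic on a shared value that another CPU can change between two reads. A minimal userspace sketch of one way to avoid that follows: read the value once, compute on the local copy, and publish the result with a single store. The names gc_expire and gc_pass are hypothetical, C11 relaxed atomics stand in for the kernel's own annotations, and a 32-bit unsigned int is assumed. This is an illustration of the idea, not the code from the patch.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical shared counter standing in for ip6_rt_gc_expire. */
static _Atomic unsigned int gc_expire = 30;

/*
 * One GC pass: snapshot the shared value once, do all arithmetic on the
 * local copy, and publish the result with a single store.
 * Returns the value that would be handed to the GC routine.
 */
static unsigned int gc_pass(unsigned int elasticity)
{
    unsigned int val, used;

    assert(elasticity < 32); /* shifting by >= the type width is undefined */

    val = atomic_load_explicit(&gc_expire, memory_order_relaxed);
    used = ++val;
    /* Decay step: val >> elasticity is computed from the same local value,
     * so it can never exceed val and the subtraction cannot wrap. */
    val -= val >> elasticity;
    atomic_store_explicit(&gc_expire, val, memory_order_relaxed);
    return used;
}
```

Concurrent passes in this sketch can still overwrite each other's result, but each pass works from one consistent snapshot, so the subtraction itself cannot underflow.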
-
Committed by Eric Dumazet
idev can be NULL, as the surrounding code suggests.
Fixes: 4daf841a ("net: ipv6: add skb drop reasons to ip6_rcv_core()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Cc: Jiang Biao <benbjiang@tencent.com>
Cc: Hao Peng <flyingpeng@tencent.com>
Link: https://lore.kernel.org/r/20220413205653.1178458-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 15 Apr, 2022 2 commits
-
-
Committed by Peilin Ye
Feng reported an skb_under_panic BUG triggered by running test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:

[ 82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
<...>
[ 82.607380] Call Trace:
[ 82.609389] <TASK>
[ 82.611136] skb_push.cold.109+0x10/0x10
[ 82.614289] __gre6_xmit+0x41e/0x590
[ 82.617169] ip6gre_tunnel_xmit+0x344/0x3f0
[ 82.620526] dev_hard_start_xmit+0xf1/0x330
[ 82.623882] sch_direct_xmit+0xe4/0x250
[ 82.626961] __dev_queue_xmit+0x720/0xfe0
<...>
[ 82.633431] packet_sendmsg+0x96a/0x1cb0
[ 82.636568] sock_sendmsg+0x30/0x40
<...>

The following sequence of events caused the BUG:

1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is calculated based on old flags (see ip6gre_calc_hlen());
2. packet_snd() reserves header room for skb A, assuming tunnel->tun_hlen is 4;
3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
4. __gre6_xmit() detects the new tunnel key, and recalculates "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and TUNNEL_SEQ);
5. gre_build_header() calls skb_push() with insufficient reserved header room, triggering the BUG.

As suggested by Cong, fix it by moving the call to skb_cow_head() after the recalculation of tun_hlen.

Reproducer:

OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o
ip netns add at_ns0
ip link add veth0 type veth peer name veth1
ip link set veth0 netns at_ns0
ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip link set dev veth1 up mtu 1500
ip addr add dev veth1 172.16.1.200/24
ip netns exec at_ns0 ip addr add ::11/96 dev veth0
ip netns exec at_ns0 ip link set dev veth0 up
ip addr add dev veth1 ::22/96
ip link set dev veth1 up
ip netns exec at_ns0 \
ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
local ::11 remote ::22
ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
ip netns exec at_ns0 ip link set dev ip6gretap00 up
ip link add dev ip6gretap11 type ip6gretap external
ip addr add dev ip6gretap11 10.1.1.200/24
ip addr add dev ip6gretap11 fc80::200/24
ip link set dev ip6gretap11 up
tc qdisc add dev ip6gretap11 clsact
tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel
ping6 -c 3 -w 10 -q ::11

Fixes: 6712abc1 ("ip6_gre: add ip6 gre and gretap collect_md mode")
Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
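The ordering problem in the sequence above can be illustrated with a toy model. The sketch below is a userspace approximation with hypothetical names (struct pkt, pkt_ensure_headroom, pkt_push). It tracks only the free bytes in front of the packet data and is not the kernel's skb API. It shows why checking headroom against the stale header length fails once the length has been recalculated, using the 4 and 12 byte figures from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy packet: only tracks how many bytes are free in front of the data. */
struct pkt {
    size_t headroom;
};

/* Stand-in for growing the headroom (the kernel would reallocate). */
static void pkt_ensure_headroom(struct pkt *p, size_t need)
{
    if (p->headroom < need)
        p->headroom = need;
    assert(p->headroom >= need);
}

/* Prepend len bytes; returns false where the kernel would panic. */
static bool pkt_push(struct pkt *p, size_t len)
{
    if (len > p->headroom)
        return false;
    p->headroom -= len;
    return true;
}

/* Ensure headroom for the stale length, then push the recalculated one. */
static bool xmit_check_before_recalc(struct pkt p, size_t old_hlen,
                                     size_t new_hlen)
{
    pkt_ensure_headroom(&p, old_hlen);
    return pkt_push(&p, new_hlen);
}

/* Recalculate first, then ensure headroom for the length actually pushed. */
static bool xmit_check_after_recalc(struct pkt p, size_t new_hlen)
{
    pkt_ensure_headroom(&p, new_hlen);
    return pkt_push(&p, new_hlen);
}
```

With 4 bytes reserved and a recalculated header of 12 bytes, the first ordering fails the push while the second succeeds.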
-
Committed by Peilin Ye
Do not update tunnel->tun_hlen in data plane code. Use a local variable instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit().
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 Apr, 2022 9 commits
-
-
Committed by Menglong Dong
Replace kfree_skb() used in ip6_protocol_deliver_rcu() with kfree_skb_reason(). No new reasons are added. Some paths are ignored, as they are not common, such as encapsulation on a non-final protocol.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Menglong Dong
Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason(). No new drop reasons are added. It seems we now use 'SKB_DROP_REASON_IP_INHDR' for too many cases during IPv6 header parsing and checking, just as 'IPSTATS_MIB_INHDRERRORS' does. Could it be too general, making it hard to know what happened?
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Menglong Dong
Replace kfree_skb() used in TLV encoded option header parsing with kfree_skb_reason(). The following functions are involved:

ip6_parse_tlv()
ipv6_hop_ra()
ipv6_hop_ioam()
ipv6_hop_jumbo()
ipv6_hop_calipso()
ipv6_dest_hao()

Most skb drops during this process are regarded as 'InHdrErrors', as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails, which makes us use 'SKB_DROP_REASON_IP_INHDR' correspondingly. However, 'IP_INHDR' is a relatively general reason. Therefore, we can use other reasons with higher priority in some cases. For example, 'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Menglong Dong
There are two call chains for ipv6_hop_jumbo(). The first one is:

ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()

On this call chain, the drop statistics are updated in ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false. The second call chain is:

ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()

Here the drop statistics are also updated, in ip6_rcv_core(), with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false. Therefore, the statistics in ipv6_hop_jumbo() are redundant, which means the drop is counted twice. The statistics in ipv6_hop_jumbo() are almost the same as those outside it, except for 'IPSTATS_MIB_INTRUNCATEDPKTS', which it seems we have to ignore.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Menglong Dong
In order to add skb drop reason support to icmpv6_param_prob(), introduce the function icmpv6_param_prob_reason() and make icmpv6_param_prob() an inline call to it. This new function will be used in the following patches.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Menglong Dong
Replace kfree_skb() used in ip6_forward() and ip_forward() with kfree_skb_reason(). The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for the case where the length of the packet exceeds the MTU and the packet can't be fragmented.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Menglong Dong
Replace kfree_skb() used in ip6_pkt_drop() with kfree_skb_reason(). No new reason is added.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Guo Zhengkui
Address the following coccicheck warning:

net/ipv6/exthdrs.c:620:44-45: WARNING opportunity for swap()

by using swap() to swap the variable values and dropping the tmp (`addr`) variable, which is no longer needed.
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Sabrina Dubroca
Commit ebe48d36 ("esp: Fix possible buffer overflow in ESP transformation") tried to fix skb_page_frag_refill usage in ESP by capping allocsize to 32k, but that doesn't completely solve the issue, as skb_page_frag_refill may return a single page. If that happens, we will write out of bounds, despite the check introduced in the previous patch. This patch forces COW in cases where we would end up calling skb_page_frag_refill with a size larger than a page (first in esp_output_head with tailen, then in esp_output_tail with skb->data_len).
Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 12 Apr, 2022 1 commit
-
-
Committed by Oliver Hartkopp
The internal recvmsg() functions have two parameters 'flags' and 'noblock' that were merged inside skb_recv_datagram(). As a follow-up patch to commit f4b41f06 ("net: remove noblock parameter from skb_recv_datagram()"), this patch removes the separate 'noblock' parameter for recvmsg(). Analogous to the referenced patch for skb_recv_datagram(), the 'flags' and 'noblock' parameters are unnecessarily split up, e.g. with

err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len);

or in

err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg, sk, msg, size, flags & MSG_DONTWAIT, flags & ~MSG_DONTWAIT, &addr_len);

instead of simply using only flags all the time and checking for MSG_DONTWAIT where needed (to preserve the formerly separate no(n)block condition).
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
- 11 Apr, 2022 3 commits
-
-
Committed by Nicolas Dichtel
kongweibin reported a kernel panic in ip6_forward() when the input interface has no in6 dev associated. The following tc commands were used to reproduce this panic:

tc qdisc del dev vxlan100 root
tc qdisc add dev vxlan100 root netem corrupt 5%

CC: stable@vger.kernel.org
Fixes: ccd27f05 ("ipv6: fix 'disable_policy' for fwd packets")
Reported-by: kongweibin <kongweibin2@huawei.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Pablo Neira Ayuso
If policy-based routing using the iif selector is used, then the fib expression fails to look up the reverse path from the prerouting hook because the input interface cannot be inferred. In order to support this scenario, extend the fib expression so that it can be used after the route lookup, from the forward hook. This patch also adds support for the input hook for usability reasons. Since the prerouting hook cannot be used for the scenario described above, users need two rules: one for the forward chain and another for the input chain, to perform the reverse path check for locally targeted traffic.
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Menglong Dong
Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with kfree_skb_reason(). In order to get the reasons for skb drops after ICMP message handling, we change the return type of 'handler()' in 'struct icmp_control' from 'bool' to 'enum skb_drop_reason'. This changes its original meaning: 'false' used to mean failure, but 'SKB_NOT_DROPPED_YET' now means success. Therefore, all 'handler' functions and their call sites need to be updated. The following 'handler' functions are involved:

icmp_unreach()
icmp_redirect()
icmp_echo()
icmp_timestamp()
icmp_discard()

And the following new drop reasons are added:

SKB_DROP_REASON_ICMP_CSUM
SKB_DROP_REASON_INVALID_PROTO

The reason 'INVALID_PROTO' is introduced for the case where the packet doesn't follow RFC 1122 and is dropped. This is not a common case, and I believe we can locate the problem from the data in the packet. For now, 'INVALID_PROTO' is used for ICMP broadcasts with wrong types. Maybe there should be a document file for these reasons, for example listing all the cases that cause the 'UNHANDLED_PROTO' and 'INVALID_PROTO' drop reasons, so that users can locate their problems according to the document.
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 Apr, 2022 1 commit
-
-
Committed by Jeffrey Ji
Increment the rx_otherhost_dropped counter when a packet is dropped due to a mismatched destination MAC address. An example of when this drop can occur is when manually crafting raw packets that will be consumed by a user space application via a tap device. For testing purposes, local traffic was generated using trafgen for the client and netcat to start a server.

Tested: Created 2 netns, sent 1 packet using trafgen from one to the other with "{eth(daddr=$INCORRECT_MAC...}", and verified that iproute2 showed the counter was incremented. (Also had to modify iproute2 to show the stat; an additional patch for that is coming next.)
Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 07 Apr, 2022 2 commits
-
-
Committed by Niels Dossche
idev->addr_list needs to be protected by idev->lock. However, it is not always possible to do so while iterating and performing actions on inet6_ifaddr instances. For example, multiple functions (like addrconf_{join,leave}_anycast) eventually call down to other functions that acquire idev->lock. The current code temporarily unlocks idev->lock during the loops, which can cause race conditions. Moving the locks up is also not an appropriate solution, as the ordering of lock acquisition would be inconsistent with, for example, mc_lock.

This solution adds an additional field to inet6_ifaddr that is used to temporarily add the instances to a temporary list while holding idev->lock. The temporary list can then be traversed without holding idev->lock. This change was made in two places. In addrconf_ifdown, the list_for_each_entry_safe variant of the list loop is also no longer necessary, as there is no deletion within that specific loop.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
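The temporary-list approach described in this commit message can be sketched in userspace as follows. Everything here is hypothetical: struct addr, struct dev, the tmp_next field, and a boolean flag standing in for a real lock. The sketch also assumes that entries stay alive during the second loop, which the real code has to guarantee by other means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct addr {
    int id;
    struct addr *next;     /* main list link, protected by the lock */
    struct addr *tmp_next; /* extra link used only for the temporary list */
};

struct dev {
    bool locked;           /* toy stand-in for a real lock */
    struct addr *addr_list;
};

static void dev_lock(struct dev *d)   { assert(!d->locked); d->locked = true; }
static void dev_unlock(struct dev *d) { assert(d->locked);  d->locked = false; }

/* Per-entry action that takes the lock itself, so the caller must not
 * hold it (the toy lock asserts on recursive acquisition). */
static void act_on(struct dev *d, struct addr *a, int *sum)
{
    dev_lock(d);
    *sum += a->id;
    dev_unlock(d);
}

static int visit_all(struct dev *d)
{
    struct addr *tmp = NULL, *a;
    int sum = 0;

    /* Phase 1: under the lock, thread every entry onto a private list. */
    dev_lock(d);
    for (a = d->addr_list; a; a = a->next) {
        a->tmp_next = tmp;
        tmp = a;
    }
    dev_unlock(d);

    /* Phase 2: walk the private list without the lock held. */
    for (a = tmp; a; a = a->tmp_next)
        act_on(d, a, &sum);

    return sum;
}
```

The main list is only read while the lock is held, and the per-entry action is free to take the lock because the second loop runs unlocked.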
-
Committed by Eric Dumazet
We have had various bugs over the years with code breaking the assumption that tp->snd_cwnd is greater than zero. Lately, syzbot reported that the WARN_ON_ONCE(!tp->prior_cwnd) added in commit 8b8a321f ("tcp: fix zero cwnd in tcp_cwnd_reduction") can trigger, and without a repro we would have to spend considerable time finding the bug. Instead of complaining too late, we want to catch where and when tp->snd_cwnd is set to an illegal value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
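One common way to catch an illegal value where it is written, rather than where it is later used, is to route all writes through a single setter that checks the value. The sketch below illustrates that idea in userspace. The names struct conn, conn_set_cwnd, conn_cwnd, and bad_cwnd_sets are hypothetical, and the specific check is an assumption, not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical connection state with a congestion window field. */
struct conn {
    unsigned int snd_cwnd;
};

/* Counts how many times an illegal value was written. */
static unsigned int bad_cwnd_sets;

/* Single write path: complain at the point where a bad value is stored. */
static void conn_set_cwnd(struct conn *c, unsigned int val)
{
    if ((int)val <= 0) { /* zero, or large enough to look negative */
        bad_cwnd_sets++;
        fprintf(stderr, "illegal cwnd value %u\n", val);
    }
    c->snd_cwnd = val;
}

/* Single read path, so no caller touches the field directly. */
static unsigned int conn_cwnd(const struct conn *c)
{
    assert(c != NULL);
    return c->snd_cwnd;
}
```

Because every write goes through one function, the report identifies the offending caller at the moment the bad value is stored.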
-
- 06 Apr, 2022 3 commits
-
-
Committed by Hongbin Wang
The same action is already performed when the variable is initialized.
Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Florian Westphal
net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole'

Move it to the CONFIG_IPV6_PIMSM_V2 scope where it is used.
Fixes: 4b340a5a ("net: ip6mr: add support for passing full packet on wrong mif")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Oliver Hartkopp
skb_recv_datagram() has two parameters 'flags' and 'noblock' that are merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'.

As 'flags' may contain MSG_DONTWAIT as a value, most callers split 'flags' into 'flags' and 'noblock' with ultimately obsolete bit operations like this:

skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);

And this is not even done consistently with the 'flags' parameter. This patch removes the obsolete and costly splitting into two parameters and only performs bit operations on the caller side when really needed.

One missing conversion was thankfully reported by the kernel test robot. I forgot to enable kunit tests to build the mctp code.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
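The claim that the split is obsolete can be checked with plain bit arithmetic: splitting 'flags' on MSG_DONTWAIT and merging the halves again with the expression quoted above always gives back the original value. The sketch below shows this in userspace. The helper names merge_flags, split_then_merge, and check_roundtrip are hypothetical, and MSG_DONTWAIT and MSG_PEEK are assumed to come from <sys/socket.h> as on Linux.

```c
#include <assert.h>
#include <sys/socket.h>

/* The merge quoted in the commit message. */
static int merge_flags(int flags, int noblock)
{
    return flags | (noblock ? MSG_DONTWAIT : 0);
}

/* What a typical caller did: split on MSG_DONTWAIT, then let the callee
 * merge the two halves back together. */
static int split_then_merge(int flags)
{
    return merge_flags(flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT);
}

/* The round trip is the identity, so passing 'flags' alone is enough. */
static void check_roundtrip(int flags)
{
    assert(split_then_merge(flags) == flags);
}
```

Calling check_roundtrip() with any combination of flag bits passes, whether or not MSG_DONTWAIT is set.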
-
- 05 Apr, 2022 1 commit
-
-
Committed by David Ahern
VRF devices are the loopbacks for VRFs, and a loopback cannot be assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should be '||' not '&&'.
Fixes: 1d3fd8a1 ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach")
Reported-by: Pudak, Filip <Filip.Pudak@windriver.com>
Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-
- 20 Mar, 2022 2 commits
-
-
Committed by Florian Westphal
The fib expression stores to a register, so we can't add an empty stub. Check that the register that is being written is in fact redundant. In most cases, this is expected to cancel tracking, as re-use is unlikely.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
Committed by Pablo Neira Ayuso
Skip register tracking for expressions that perform read-only operations on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to avoid defining stubs for these expressions. This patch re-enables register tracking, which was disabled in ed5f85d4 ("netfilter: nf_tables: disable register tracking"). Follow-up patches add the remaining register tracking for existing expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 16 Mar, 2022 1 commit
-
-
Committed by David Ahern
The fundamental premise of VRF and l3mdev core code is binding a socket to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope. Legacy code resets flowi_oif to the l3mdev, losing any original port device binding. Ben (among others) has demonstrated use cases where the original port device binding is important and needs to be retained. This patch handles that by adding a new entry to the common flow struct that can indicate the l3mdev index for later rule and table matching, avoiding the need to reset flowi_oif.

In addition to allowing more use cases that require port device binds, this patch brings a few datapath simplifications:

1. l3mdev_fib_rule_match is only called when walking fib rules and always after l3mdev_update_flow. That allows an optimization to bail early for non-VRF type use cases when flowi_l3mdev is not set. Also, only that index needs to be checked for the FIB table id.
2. l3mdev_update_flow can be called with flowi_oif set to an l3mdev (e.g., VRF) device. By resetting flowi_oif only for this case, the FLOWI_FLAG_SKIP_NH_OIF flag is no longer needed and can be removed, removing several checks in the datapath. The flowi_iif path can be simplified to only be called if it is not loopback (loopback cannot be assigned to an L3 domain) and the l3mdev index is not already set.
3. Avoid another device lookup in the output path when the fib lookup returns a reject failure.

Note: 2 functional tests for local traffic with reject fib rules are updated to reflect the new direct failure at FIB lookup time for ping rather than the failure on the packet path. The current code fails like this:

HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: Warning: source address might be selected on device other than: eth1
PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.
--- 172.16.3.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

where the test now directly fails:

HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: connect: No route to host

Signed-off-by: David Ahern <dsahern@kernel.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 14 Mar, 2022 1 commit
-
-
Committed by Sabrina Dubroca
Commit 5f9c55c8 ("ipv6: check return value of ipv6_skip_exthdr") introduced an incorrect check, which leads to all ESP packets over either TCPv6 or UDPv6 encapsulation being dropped. In this particular case, offset is negative, since skb->data points to the ESP header in the following chain of headers, while skb->network_header points to the IPv6 header:

IPv6 | ext | ... | ext | UDP | ESP | ...

That doesn't seem to be a problem, especially considering that if we reach esp6_input_done2, we're guaranteed to have a full set of headers available (otherwise the packet would have been dropped earlier in the stack). However, it means that the return value will (intentionally) be negative. We can make the test more specific, as the expected return value of ipv6_skip_exthdr will be the (negated) size of either a UDP header, or a TCP header with possible options.

In the future, we should probably either make ipv6_skip_exthdr explicitly accept negative offsets (and adjust its return value for error cases), or make ipv6_skip_exthdr only take non-negative offsets (and audit all callers).
Fixes: 5f9c55c8 ("ipv6: check return value of ipv6_skip_exthdr")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 12 Mar, 2022 1 commit
-
-
Committed by Tadeusz Struk
Syzbot found a kernel bug in the ipv6 stack:
LINK: https://syzkaller.appspot.com/bug?id=205d6f11d72329ab8d62a610c44c5e7e25415580

The reproducer triggers it by sending a crafted message via a sendmmsg() call, which triggers skb_over_panic and crashes the kernel:

skbuff: skb_over_panic: text:ffffffff84647fb4 len:65575 put:65575 head:ffff888109ff0000 data:ffff888109ff0088 tail:0x100af end:0xfec0 dev:<NULL>

Update the check that prevents an invalid packet with MTU equal to the fragment header size from eating up all the space for payload.

The reproducer can be found here:
LINK: https://syzkaller.appspot.com/text?tag=ReproC&x=1648c83fb00000
Reported-by: syzbot+e223cf47ec8ae183f2a0@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20220310232538.1044947-1-tadeusz.struk@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 09 Mar, 2022 1 commit
-
-
Committed by Jakub Kicinski
We have a number of cases where a function returns a drop/no drop decision as a boolean. Now that we want to report the reason code as well, we have to pass extra output arguments. We can make the reason code evaluate correctly as a bool. I believe we're good to reorder the reasons, as they are reported to user space as strings.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
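The idea of a reason code that also works as a boolean can be sketched with a small enum whose "not dropped" value is zero. The enum and function names below are hypothetical and only model the pattern. They are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reason codes; the zero value means "not dropped". */
enum drop_reason {
    DROP_NOT_DROPPED_YET = 0,
    DROP_NOT_SPECIFIED,
    DROP_BAD_CSUM,
};

/* Returns a reason instead of a bool; zero still reads as "keep". */
static enum drop_reason check_packet(bool csum_ok)
{
    return csum_ok ? DROP_NOT_DROPPED_YET : DROP_BAD_CSUM;
}

/* Callers can keep testing the result as a plain condition. */
static size_t count_drops(const bool *csum_ok, size_t n)
{
    size_t i, drops = 0;

    assert(DROP_NOT_DROPPED_YET == 0); /* the property the pattern relies on */
    for (i = 0; i < n; i++)
        if (check_packet(csum_ok[i]))
            drops++;
    return drops;
}
```

No extra output argument is needed: the same return value answers both "was it dropped?" and "why?".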
-
- 07 Mar, 2022 3 commits
-
-
Committed by Steffen Klassert
The esp tunnel GSO handlers use skb_mac_gso_segment to push the inner packet to the segmentation handlers. However, skb_mac_gso_segment takes the Ethernet Protocol ID from 'skb->protocol', which is wrong for inter address family tunnels. We fix this by introducing a new skb_eth_gso_segment function. This function can be used if it is necessary to pass the Ethernet Protocol ID directly to the segmentation handler. The first users of this function will be the esp4 and esp6 tunnel segmentation handlers.
Fixes: c35fe410 ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
Committed by Steffen Klassert
The xfrm{4,6}_beet_gso_segment() functions did not correctly set the SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 gso types for the address family tunneling case. Fix this by setting these gso types.
Fixes: 384a46ea ("esp4: add gso_segment for esp4 beet mode")
Fixes: 7f9e40eb ("esp6: add gso_segment for esp6 beet mode")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
Committed by Steffen Klassert
The maximum message size that can be sent is bigger than the maximum size that skb_page_frag_refill can allocate. So it is possible to write beyond the allocated buffer. Fix this by doing a fallback to COW in that case.

v2: Avoid get_order() costs as suggested by Linus Torvalds.
Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
Reported-by: valis <sec@valis.email>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
-
- 04 Mar, 2022 1 commit
-
-
Committed by Eric Dumazet
While investigating why a synchronize_net() has been added recently in ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report() might drop skbs in some cases. Discussion about removing synchronize_net() from ipv6_mc_down() will happen in a different thread.
Fixes: f185de28 ("mld: add new workqueues for process mld events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
- 03 Mar, 2022 3 commits
-
-
Committed by Martin KaFai Lau
The previous patches handled the delivery_time in the ingress path before the routing decision is made. This patch postpones clearing the delivery_time in an skb until it is known to be delivered locally, and also sets the (rcv) timestamp if needed. This patch moves the skb_clear_delivery_time() from dev.c to ip_local_deliver_finish() and ip6_input_finish().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Martin KaFai Lau
IOAM is a hop-by-hop option with a temporary IANA allocation (49). Since it is hop-by-hop, it is done before the input routing decision. One of the traced data fields is the (rcv) timestamp. When a locally generated skb is looping from egress to ingress over a virtual interface (e.g. veth, loopback...), skb->tstamp may hold the delivery time before it is known that the skb will be delivered locally and received by another sk. Like the handling of network tapping (tcpdump) in the earlier patch, this patch gets the timestamp if needed without overwriting the delivery_time in skb->tstamp. skb_tstamp_cond() is added to do the ktime_get_real() with an extra cond arg to check on top of the netstamp_needed_key static key. skb_tstamp_cond() will also be used in a later patch, and it needs the netstamp_needed_key check.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Martin KaFai Lau
A later patch will postpone the delivery_time clearing until the stack knows the skb is being delivered locally (i.e. calling skb_clear_delivery_time() at ip_local_deliver_finish() for IPv4 and at ip6_input_finish() for IPv6). That will allow other kernel forwarding paths (e.g. ip[6]_forward) to keep the delivery_time as well.

Very similar IPv6 defrag code has been duplicated in multiple places: regular IPv6, nf_conntrack, and 6lowpan. Unlike the IPv4 defrag, which is done before ip_local_deliver_finish(), the regular IPv6 defrag is done after ip6_input_finish(). Thus, no change should be needed in the regular IPv6 defrag logic, because skb_clear_delivery_time() should already have been called. 6lowpan also does not need special handling of delivery_time because it is a non-inet packet_type.

However, nf_conntrack has a case in NF_INET_PRE_ROUTING that needs to do the IPv6 defrag earlier. Thus, it needs to save the mono_delivery_time bit in the inet_frag_queue, which is similar to how it is handled in the previous patch for the IPv4 defrag. This patch chooses to do it consistently and stores the mono_delivery_time in the inet_frag_queue for all cases, so that future refactoring of the IPv6 reasm code will be easier.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-