Commits · 89e9c7280075f6733b22dd0740daeddeb1256ebf · openeuler / Kernel

22 Apr, 2022 2 commits

ipv6: Remove __ipv6_only_sock(). · 89e9c728

Authored by Kuniyuki Iwashima on Apr 20, 2022

Since commit 9fe516ba ("inet: move ipv6only in sock_common"),
ipv6_only_sock() and __ipv6_only_sock() are the same macro.  Let's
remove the redundant one, __ipv6_only_sock().
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


net/ipv6: Enforce limits for accept_unsolicited_na sysctl · d09d3ec0

Authored by Arun Ajith S on Apr 19, 2022

Fix mistake in the original patch where limits were specified but the
handler didn't take care of the limits.
Signed-off-by: Arun Ajith S <aajith@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

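The fix above amounts to making the write handler honour the limits that were declared next to the sysctl. A minimal user-space sketch of that idea follows. The struct, the function name `bounded_int_store()` and the 0..1 range used in the usage note are illustrative assumptions, not the kernel's handler.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a sysctl entry with declared limits. */
struct bounded_int {
	int value;
	int min;
	int max;
};

/*
 * Store new_value only if it lies within [min, max].
 * Returns 0 on success, or -EINVAL (leaving the old value) otherwise.
 */
int bounded_int_store(struct bounded_int *knob, long new_value)
{
	assert(knob != NULL);
	if (new_value < knob->min || new_value > knob->max)
		return -EINVAL;
	knob->value = (int)new_value;
	return 0;
}
```

With a knob declared as `{ 0, 0, 1 }`, storing 1 succeeds and storing 2 is rejected without touching the stored value.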

17 Apr, 2022 1 commit

net/ipv6: Introduce accept_unsolicited_na knob to implement router-side changes for RFC9131 · f9a2fb73

Authored by Arun Ajith S on Apr 15, 2022

Add a new neighbour cache entry in STALE state for routers on receiving
an unsolicited (gratuitous) neighbour advertisement with
target link-layer-address option specified.
This is similar to the arp_accept configuration for IPv4.
A new sysctl endpoint is created to turn on this behaviour:
/proc/sys/net/ipv6/conf/interface/accept_unsolicited_na.
Signed-off-by: Arun Ajith S <aajith@arista.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


16 Apr, 2022 2 commits

ipv6: make ip6_rt_gc_expire an atomic_t · 9cb7c013

Authored by Eric Dumazet on Apr 13, 2022

Reads and writes to ip6_rt_gc_expire have always been racy,
as syzbot reported recently [1].

There is a possible risk of underflow, leading
to an unexpectedly high value being passed to fib6_run_gc(),
although I have not observed this in the field.

Hosts hitting ip6_dst_gc() very hard are in a pretty bad
state anyway.

[1]
BUG: KCSAN: data-race in ip6_dst_gc / ip6_dst_gc

read-write to 0xffff888102110744 of 4 bytes by task 13165 on cpu 1:
 ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
 dst_alloc+0x9b/0x160 net/core/dst.c:86
 ip6_dst_alloc net/ipv6/route.c:344 [inline]
 icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
 mld_send_cr net/ipv6/mcast.c:2119 [inline]
 mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
 worker_thread+0x618/0xa70 kernel/workqueue.c:2436
 kthread+0x1a9/0x1e0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30

read-write to 0xffff888102110744 of 4 bytes by task 11607 on cpu 0:
 ip6_dst_gc+0x1f3/0x220 net/ipv6/route.c:3311
 dst_alloc+0x9b/0x160 net/core/dst.c:86
 ip6_dst_alloc net/ipv6/route.c:344 [inline]
 icmp6_dst_alloc+0xb2/0x360 net/ipv6/route.c:3261
 mld_sendpack+0x2b9/0x580 net/ipv6/mcast.c:1807
 mld_send_cr net/ipv6/mcast.c:2119 [inline]
 mld_ifc_work+0x576/0x800 net/ipv6/mcast.c:2651
 process_one_work+0x3d3/0x720 kernel/workqueue.c:2289
 worker_thread+0x618/0xa70 kernel/workqueue.c:2436
 kthread+0x1a9/0x1e0 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30

value changed: 0x00000bb3 -> 0x00000ba9

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11607 Comm: kworker/0:21 Not tainted 5.18.0-rc1-syzkaller-00037-g42e7a03d-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: mld mld_ifc_work

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220413181333.649424-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

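To illustrate what turning a plain integer into an atomic counter buys, here is a minimal user-space sketch using C11 `<stdatomic.h>` instead of the kernel's atomic_t. The variable `gc_expire` and the three helper names are assumptions made for this example, and the decay formula is only a plausible shape for such a counter, not a copy of the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a shared GC expiry counter. */
static atomic_int gc_expire;

/* Set the counter to a starting value (illustrative helper). */
void gc_expire_init(int start)
{
	atomic_store(&gc_expire, start);
}

/* Atomically increment the counter and return the new value. */
int gc_expire_bump(void)
{
	return atomic_fetch_add(&gc_expire, 1) + 1;
}

/*
 * Decay the counter by val >> elasticity. The load and the store are each
 * atomic (no torn accesses), but together they are not a single
 * read-modify-write, so a concurrent bump in between can be lost.
 */
int gc_expire_decay(unsigned int elasticity)
{
	int val = atomic_load(&gc_expire);

	assert(elasticity < 31);
	atomic_store(&gc_expire, val - (val >> elasticity));
	return atomic_load(&gc_expire);
}
```

Starting from 3000, one bump yields 3001, and a decay with elasticity 9 subtracts 3001 >> 9 = 5, leaving 2996.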

ipv6: fix NULL deref in ip6_rcv_core() · 0339d25a

Authored by Eric Dumazet on Apr 13, 2022

idev can be NULL, as the surrounding code suggests.

Fixes: 4daf841a ("net: ipv6: add skb drop reasons to ip6_rcv_core()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Menglong Dong <imagedong@tencent.com>
Cc: Jiang Biao <benbjiang@tencent.com>
Cc: Hao Peng <flyingpeng@tencent.com>
Link: https://lore.kernel.org/r/20220413205653.1178458-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


15 Apr, 2022 2 commits

ip6_gre: Fix skb_under_panic in __gre6_xmit() · ab198e1d

Authored by Peilin Ye on Apr 14, 2022

Feng reported an skb_under_panic BUG triggered by running
test_ip6gretap() in tools/testing/selftests/bpf/test_tunnel.sh:

[   82.492551] skbuff: skb_under_panic: text:ffffffffb268bb8e len:403 put:12 head:ffff9997c5480000 data:ffff9997c547fff8 tail:0x18b end:0x2c0 dev:ip6gretap11
<...>
[   82.607380] Call Trace:
[   82.609389]  <TASK>
[   82.611136]  skb_push.cold.109+0x10/0x10
[   82.614289]  __gre6_xmit+0x41e/0x590
[   82.617169]  ip6gre_tunnel_xmit+0x344/0x3f0
[   82.620526]  dev_hard_start_xmit+0xf1/0x330
[   82.623882]  sch_direct_xmit+0xe4/0x250
[   82.626961]  __dev_queue_xmit+0x720/0xfe0
<...>
[   82.633431]  packet_sendmsg+0x96a/0x1cb0
[   82.636568]  sock_sendmsg+0x30/0x40
<...>

The following sequence of events caused the BUG:

1. During ip6gretap device initialization, tunnel->tun_hlen (e.g. 4) is
   calculated based on old flags (see ip6gre_calc_hlen());
2. packet_snd() reserves header room for skb A, assuming
   tunnel->tun_hlen is 4;
3. Later (in clsact Qdisc), the eBPF program sets a new tunnel key for
   skb A using bpf_skb_set_tunnel_key() (see _ip6gretap_set_tunnel());
4. __gre6_xmit() detects the new tunnel key, and recalculates
   "tun_hlen" (e.g. 12) based on new flags (e.g. TUNNEL_KEY and
   TUNNEL_SEQ);
5. gre_build_header() calls skb_push() with insufficient reserved header
   room, triggering the BUG.

As suggested by Cong, fix it by moving the call to skb_cow_head() after
the recalculation of tun_hlen.

Reproducer:

  OBJ=$LINUX/tools/testing/selftests/bpf/test_tunnel_kern.o

  ip netns add at_ns0
  ip link add veth0 type veth peer name veth1
  ip link set veth0 netns at_ns0
  ip netns exec at_ns0 ip addr add 172.16.1.100/24 dev veth0
  ip netns exec at_ns0 ip link set dev veth0 up
  ip link set dev veth1 up mtu 1500
  ip addr add dev veth1 172.16.1.200/24

  ip netns exec at_ns0 ip addr add ::11/96 dev veth0
  ip netns exec at_ns0 ip link set dev veth0 up
  ip addr add dev veth1 ::22/96
  ip link set dev veth1 up

  ip netns exec at_ns0 \
  	ip link add dev ip6gretap00 type ip6gretap seq flowlabel 0xbcdef key 2 \
  	local ::11 remote ::22

  ip netns exec at_ns0 ip addr add dev ip6gretap00 10.1.1.100/24
  ip netns exec at_ns0 ip addr add dev ip6gretap00 fc80::100/96
  ip netns exec at_ns0 ip link set dev ip6gretap00 up

  ip link add dev ip6gretap11 type ip6gretap external
  ip addr add dev ip6gretap11 10.1.1.200/24
  ip addr add dev ip6gretap11 fc80::200/24
  ip link set dev ip6gretap11 up

  tc qdisc add dev ip6gretap11 clsact
  tc filter add dev ip6gretap11 egress bpf da obj $OBJ sec ip6gretap_set_tunnel
  tc filter add dev ip6gretap11 ingress bpf da obj $OBJ sec ip6gretap_get_tunnel

  ping6 -c 3 -w 10 -q ::11

Fixes: 6712abc1 ("ip6_gre: add ip6 gre and gretap collect_md mode")
Reported-by: Feng Zhou <zhoufeng.zf@bytedance.com>
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

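The ordering problem described in the commit above can be shown with a toy model. This is a sketch under stated assumptions: `struct toy_buf`, the `FLAG_*` bits and every function name are invented for illustration, and the buffer only tracks how much header room is left. The header length follows the GRE layout of a 4-byte base header plus 4 bytes per optional field, which matches the 4 and 12 byte figures quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative flag bits; names and values are assumptions. */
#define FLAG_CSUM 0x1u
#define FLAG_KEY  0x2u
#define FLAG_SEQ  0x4u

/* Toy packet buffer that only tracks the remaining header room. */
struct toy_buf {
	size_t headroom;
};

/* 4-byte base header, plus 4 bytes for each optional field present. */
size_t gre_hlen(unsigned int flags)
{
	size_t len = 4;

	if (flags & FLAG_CSUM)
		len += 4;
	if (flags & FLAG_KEY)
		len += 4;
	if (flags & FLAG_SEQ)
		len += 4;
	return len;
}

/* Grow the header room if needed (stands in for a reallocation). */
void ensure_headroom(struct toy_buf *b, size_t need)
{
	assert(b != NULL);
	if (b->headroom < need)
		b->headroom = need;
}

/* Consume header room; returns -1 where real code would hit the BUG. */
int push_header(struct toy_buf *b, size_t len)
{
	assert(b != NULL);
	if (b->headroom < len)
		return -1;
	b->headroom -= len;
	return 0;
}

/* Fixed ordering: compute the length from per-packet flags, then reserve. */
int xmit(struct toy_buf *b, unsigned int pkt_flags)
{
	size_t hlen = gre_hlen(pkt_flags); /* local value, no shared state */

	ensure_headroom(b, hlen);
	return push_header(b, hlen);
}

/* Buggy ordering: reserve with a stale length, then recompute and push. */
int xmit_stale(struct toy_buf *b, size_t stale_hlen, unsigned int pkt_flags)
{
	ensure_headroom(b, stale_hlen);
	return push_header(b, gre_hlen(pkt_flags));
}
```

With 4 bytes of initial header room and a packet that needs key and sequence fields, `xmit()` succeeds while `xmit_stale()` with a stale length of 4 fails.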

ip6_gre: Avoid updating tunnel->tun_hlen in __gre6_xmit() · f40c064e

Authored by Peilin Ye on Apr 14, 2022

Do not update tunnel->tun_hlen in data plane code.  Use a local variable
instead, just like "tunnel_hlen" in net/ipv4/ip_gre.c:gre_fb_xmit().
Co-developed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


13 Apr, 2022 9 commits

net: ipv6: add skb drop reasons to ip6_protocol_deliver_rcu() · eeab7e7f

Authored by Menglong Dong on Apr 13, 2022

Replace kfree_skb() used in ip6_protocol_deliver_rcu() with
kfree_skb_reason().

No new reasons are added.

Some paths are ignored, as they are not common, such as encapsulation
on non-final protocol.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ipv6: add skb drop reasons to ip6_rcv_core() · 4daf841a

Authored by Menglong Dong on Apr 13, 2022

Replace kfree_skb() used in ip6_rcv_core() with kfree_skb_reason().
No new drop reasons are added.

It seems we now use 'SKB_DROP_REASON_IP_INHDR' for too many cases during
IPv6 header parsing and checking, just as 'IPSTATS_MIB_INHDRERRORS'
does. Is it too general to tell what actually happened?
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ipv6: add skb drop reasons to TLV parse · 7d9dbdfb

Authored by Menglong Dong on Apr 13, 2022

Replace kfree_skb() used in TLV encoded option header parsing with
kfree_skb_reason(). The following functions are involved:

ip6_parse_tlv()
ipv6_hop_ra()
ipv6_hop_ioam()
ipv6_hop_jumbo()
ipv6_hop_calipso()
ipv6_dest_hao()

Most skb drops during this process are regarded as 'InHdrErrors',
as 'IPSTATS_MIB_INHDRERRORS' is used when ip6_parse_tlv() fails,
so we use 'SKB_DROP_REASON_IP_INHDR' correspondingly.

However, 'IP_INHDR' is a relatively general reason. Therefore, we
can use other reasons with higher priority in some cases. For example,
'SKB_DROP_REASON_UNHANDLED_PROTO' is used for unknown TLV options.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ipv6: remove redundant statistics in ipv6_hop_jumbo() · bba98083

Authored by Menglong Dong on Apr 13, 2022

There are two call chains for ipv6_hop_jumbo(). The first one is:

ipv6_destopt_rcv() -> ip6_parse_tlv() -> ipv6_hop_jumbo()

On this call chain, the drop statistics will be done in
ipv6_destopt_rcv() with 'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo()
returns false.

The second call chain is:

ip6_rcv_core() -> ipv6_parse_hopopts() -> ip6_parse_tlv()

And the drop statistics will also be done in ip6_rcv_core() with
'IPSTATS_MIB_INHDRERRORS' if ipv6_hop_jumbo() returns false.

Therefore, the statistics in ipv6_hop_jumbo() are redundant, which
means the drop is counted twice. The statistics in ipv6_hop_jumbo()
are almost the same as those in the callers, except for
'IPSTATS_MIB_INTRUNCATEDPKTS', which it seems we have to ignore.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: icmp: introduce function icmpv6_param_prob_reason() · 1ad6d548

Authored by Menglong Dong on Apr 13, 2022

In order to add the skb drop reasons support to icmpv6_param_prob(),
introduce the function icmpv6_param_prob_reason() and make
icmpv6_param_prob() an inline call to it. This new function will be
used in the following patches.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

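The pattern used in the commit above, adding a more specific function and keeping the old name as a thin inline wrapper, can be sketched as follows. All names here (`toy_drop_reason`, `param_prob_reason()`, `param_prob()`, `last_reported_reason()`) are illustrative assumptions and do not reproduce the kernel's signatures.

```c
#include <assert.h>

/* Illustrative reasons; names and values are assumptions. */
enum toy_drop_reason {
	TOY_NOT_SPECIFIED = 1,
	TOY_BAD_HEADER,
};

/* Remembers the last reported reason so the sketch is observable. */
static enum toy_drop_reason last_reason;

/* New entry point: the caller supplies the reason explicitly. */
void param_prob_reason(int code, int pos, enum toy_drop_reason reason)
{
	(void)code;
	(void)pos;
	last_reason = reason;
}

/* Old entry point, kept as an inline wrapper with a default reason. */
static inline void param_prob(int code, int pos)
{
	param_prob_reason(code, pos, TOY_NOT_SPECIFIED);
}

/* Accessor used only to inspect the sketch's state. */
enum toy_drop_reason last_reported_reason(void)
{
	return last_reason;
}
```

Existing callers keep compiling unchanged, while new callers can pass a precise reason.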

net: ip: add skb drop reasons to ip forwarding · 2edc1a38

Authored by Menglong Dong on Apr 13, 2022

Replace kfree_skb() which is used in ip6_forward() and ip_forward()
with kfree_skb_reason().

The new drop reason 'SKB_DROP_REASON_PKT_TOO_BIG' is introduced for
the case where the length of the packet exceeds the MTU and the packet
can't be fragmented.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ipv6: add skb drop reasons to ip6_pkt_drop() · 3ae42cc8

Authored by Menglong Dong on Apr 13, 2022

Replace kfree_skb() used in ip6_pkt_drop() with kfree_skb_reason().
No new reason is added.
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


ipv6: exthdrs: use swap() instead of open coding it · 5ee6ad1d

Authored by Guo Zhengkui on Apr 12, 2022

Address the following coccicheck warning:
net/ipv6/exthdrs.c:620:44-45: WARNING opportunity for swap()

by using swap() for the swapping of variable values and drop
the tmp (`addr`) variable that is not needed any more.
Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

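For readers unfamiliar with the idiom, a type-generic swap can be sketched in user space as below. This assumes a compiler that supports the GNU `__typeof__` extension (GCC or Clang). The macro name `SWAP`, the `struct toy_addr` type and `swap_addrs()` are illustrative and not the kernel's definitions.

```c
#include <assert.h>

/* Type-generic swap; relies on the GNU __typeof__ extension. */
#define SWAP(a, b) \
	do { __typeof__(a) tmp_ = (a); (a) = (b); (b) = tmp_; } while (0)

/* Hypothetical 128-bit address type for the demo. */
struct toy_addr {
	unsigned char bytes[16];
};

/* Exchange two addresses without a temporary at the call site. */
void swap_addrs(struct toy_addr *x, struct toy_addr *y)
{
	assert(x && y);
	SWAP(*x, *y);
}
```

The temporary lives inside the macro, so the caller no longer needs a dedicated local variable just for the exchange.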

esp: limit skb_page_frag_refill use to a single page · 5bd8baab

Authored by Sabrina Dubroca on Apr 13, 2022

Commit ebe48d36 ("esp: Fix possible buffer overflow in ESP
transformation") tried to fix skb_page_frag_refill usage in ESP by
capping allocsize to 32k, but that doesn't completely solve the issue,
as skb_page_frag_refill may return a single page. If that happens, we
will write out of bounds, despite the check introduced in the previous
patch.

This patch forces COW in cases where we would end up calling
skb_page_frag_refill with a size larger than a page (first in
esp_output_head with tailen, then in esp_output_tail with
skb->data_len).

Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


12 Apr, 2022 1 commit

net: remove noblock parameter from recvmsg() entities · ec095263

Authored by Oliver Hartkopp on Apr 11, 2022

The internal recvmsg() functions have two parameters 'flags' and 'noblock'
that were merged inside skb_recv_datagram(). As a follow up patch to commit
f4b41f06 ("net: remove noblock parameter from skb_recv_datagram()")
this patch removes the separate 'noblock' parameter for recvmsg().

Analogous to the referenced patch for skb_recv_datagram(), the 'flags' and
'noblock' parameters are unnecessarily split up with e.g.

err = sk->sk_prot->recvmsg(sk, msg, size, flags & MSG_DONTWAIT,
                           flags & ~MSG_DONTWAIT, &addr_len);

or in

err = INDIRECT_CALL_2(sk->sk_prot->recvmsg, tcp_recvmsg, udp_recvmsg,
                      sk, msg, size, flags & MSG_DONTWAIT,
                      flags & ~MSG_DONTWAIT, &addr_len);

instead of simply using only flags all the time and checking for MSG_DONTWAIT
where needed (to preserve the formerly separate no(n)block condition).
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Link: https://lore.kernel.org/r/20220411124955.154876-1-socketcan@hartkopp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>


11 Apr, 2022 3 commits

ipv6: fix panic when forwarding a pkt with no in6 dev · e3fa461d

Authored by Nicolas Dichtel on Apr 08, 2022

kongweibin reported a kernel panic in ip6_forward() when the input interface
has no in6 dev associated.

The following tc commands were used to reproduce this panic:
tc qdisc del dev vxlan100 root
tc qdisc add dev vxlan100 root netem corrupt 5%

CC: stable@vger.kernel.org
Fixes: ccd27f05 ("ipv6: fix 'disable_policy' for fwd packets")
Reported-by: kongweibin <kongweibin2@huawei.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>


netfilter: nft_fib: reverse path filter for policy-based routing on iif · be8be04e

Authored by Pablo Neira Ayuso on Mar 31, 2022

If policy-based routing using the iif selector is used, then the fib
expression fails to look up the reverse path from the prerouting
hook because the input interface cannot be inferred. In order to support
this scenario, extend the fib expression so that it can be used after the
route lookup, from the forward hook.

This patch also adds support for the input hook for usability reasons.
Since the prerouting hook cannot be used for the scenario described
above, users need two rules: one for the forward chain and another
for the input chain to perform the reverse path check for locally
targeted traffic.
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


net: icmp: add skb drop reasons to icmp protocol · b384c95a

Authored by Menglong Dong on Apr 07, 2022

Replace kfree_skb() used in icmp_rcv() and icmpv6_rcv() with
kfree_skb_reason().

In order to get the reasons for skb drops after ICMP message handling,
we change the return type of 'handler()' in 'struct icmp_control' from
'bool' to 'enum skb_drop_reason'. This inverts the original
meaning: 'false' used to mean failure, but 'SKB_NOT_DROPPED_YET' now means
success. Therefore, all 'handler' functions and their callers need to be
updated. The following 'handler' functions are involved:

icmp_unreach()
icmp_redirect()
icmp_echo()
icmp_timestamp()
icmp_discard()

And the following new drop reasons are added:

SKB_DROP_REASON_ICMP_CSUM
SKB_DROP_REASON_INVALID_PROTO

The reason 'INVALID_PROTO' is introduced for the case where the packet
doesn't follow RFC 1122 and is dropped. This is not a common case, and
I believe we can locate the problem from the data in the packet. For now,
'INVALID_PROTO' is used for ICMP broadcasts with wrong types.

Maybe there should be a document for these reasons that lists, for
example, all the cases that cause the 'UNHANDLED_PROTO' and 'INVALID_PROTO'
drop reasons, so that users can locate their problems according to the
document.
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
Reviewed-by: Jiang Biao <benbjiang@tencent.com>
Signed-off-by: Menglong Dong <imagedong@tencent.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

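The handler-table change described above can be sketched in a few lines. This is a toy model: the enum, `struct toy_control`, the two handlers and `toy_rcv()` are assumptions made for illustration, and only the idea (handlers return a reason, where zero means "not dropped") is taken from the commit message.

```c
#include <assert.h>

enum toy_drop_reason {
	TOY_NOT_DROPPED_YET = 0, /* message was handled */
	TOY_NOT_SPECIFIED,
	TOY_INVALID_PROTO,
};

/* Per-type control entry whose handler returns a reason, not a bool. */
struct toy_control {
	enum toy_drop_reason (*handler)(int payload);
};

/* Handler that accepts everything. */
static enum toy_drop_reason toy_accept(int payload)
{
	(void)payload;
	return TOY_NOT_DROPPED_YET;
}

/* Handler that rejects negative payloads with a specific reason. */
static enum toy_drop_reason toy_reject_negative(int payload)
{
	return payload < 0 ? TOY_INVALID_PROTO : TOY_NOT_DROPPED_YET;
}

static const struct toy_control toy_pointers[] = {
	{ toy_accept },
	{ toy_reject_negative },
};

#define TOY_NR_TYPES (sizeof(toy_pointers) / sizeof(toy_pointers[0]))

/* Dispatch by type; a non-zero result is the reason to drop with. */
enum toy_drop_reason toy_rcv(unsigned int type, int payload)
{
	assert(TOY_NR_TYPES > 0);
	if (type >= TOY_NR_TYPES)
		return TOY_NOT_SPECIFIED;
	return toy_pointers[type].handler(payload);
}
```

A caller can then write `reason = toy_rcv(type, payload); if (reason) ...` and pass the reason on to whatever frees the packet.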

08 Apr, 2022 1 commit

net-core: rx_otherhost_dropped to core_stats · 794c24e9

Authored by Jeffrey Ji on Apr 06, 2022

Increment the rx_otherhost_dropped counter when a packet is dropped due to
a mismatched destination MAC address.

An example of when this drop can occur is when manually crafting raw
packets that will be consumed by a user space application via a tap
device. For testing purposes, local traffic was generated using trafgen
for the client and netcat to start a server.

Tested: Created 2 netns, sent 1 packet using trafgen from 1 to the other
with "{eth(daddr=$INCORRECT_MAC...}", verified that iproute2 showed the
counter was incremented. (Also had to modify iproute2 to show the stat,
additional patch for that coming next.)
Signed-off-by: Jeffrey Ji <jeffreyji@google.com>
Reviewed-by: Brian Vazquez <brianvv@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220406172600.1141083-1-jeffreyjilinux@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


07 Apr, 2022 2 commits

ipv6: fix locking issues with loops over idev->addr_list · 51454ea4

Authored by Niels Dossche on Apr 04, 2022

idev->addr_list needs to be protected by idev->lock. However, it is not
always possible to do so while iterating and performing actions on
inet6_ifaddr instances. For example, multiple functions (like
addrconf_{join,leave}_anycast) eventually call down to other functions
that acquire the idev->lock. The current code temporarily unlocked the
idev->lock during the loops, which can cause race conditions. Moving the
locks up is also not an appropriate solution as the ordering of lock
acquisition will be inconsistent with for example mc_lock.

This solution adds an additional field to inet6_ifaddr that is used
to temporarily add the instances to a temporary list while holding
idev->lock. The temporary list can then be traversed without holding
idev->lock. This change was done in two places. In addrconf_ifdown, the
list_for_each_entry_safe variant of the list loop is also no longer
necessary as there is no deletion within that specific loop.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Niels Dossche <dossche.niels@gmail.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220403231523.45843-1-dossche.niels@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

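The two-phase approach described above (collect entries on a temporary list under the lock, then walk that list unlocked) can be sketched in user space with a POSIX mutex. Everything here is an assumption made for illustration: the struct layouts, the `tmp_next` field name, and the `per_addr_work()` helper that stands in for callees which take the lock themselves. The sketch also ignores object lifetime, which real code has to guarantee separately.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct toy_ifaddr {
	int id;
	struct toy_ifaddr *next;     /* link in the lock-protected list */
	struct toy_ifaddr *tmp_next; /* extra link for the temporary list */
};

struct toy_idev {
	pthread_mutex_t lock;
	struct toy_ifaddr *addr_list;
};

/* Work that takes dev->lock itself, so it must be called unlocked. */
static void per_addr_work(struct toy_idev *dev, struct toy_ifaddr *ifa,
			  int *sum)
{
	pthread_mutex_lock(&dev->lock);
	*sum += ifa->id;
	pthread_mutex_unlock(&dev->lock);
}

/* Visit every address; returns the sum of ids as an observable result. */
int visit_all(struct toy_idev *dev)
{
	struct toy_ifaddr *tmp = NULL, *ifa;
	int sum = 0;

	assert(dev != NULL);

	/* Phase 1: snapshot the membership while holding the lock. */
	pthread_mutex_lock(&dev->lock);
	for (ifa = dev->addr_list; ifa; ifa = ifa->next) {
		ifa->tmp_next = tmp;
		tmp = ifa;
	}
	pthread_mutex_unlock(&dev->lock);

	/* Phase 2: walk the snapshot without holding the lock. */
	for (ifa = tmp; ifa; ifa = ifa->tmp_next)
		per_addr_work(dev, ifa, &sum);

	return sum;
}
```

The lock is never dropped and re-taken in the middle of a traversal of the protected list, which is the property the commit is after.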

tcp: add accessors to read/set tp->snd_cwnd · 40570375

Authored by Eric Dumazet on Apr 05, 2022

We had various bugs over the years with code
breaking the assumption that tp->snd_cwnd is greater
than zero.

Lately, syzbot reported the WARN_ON_ONCE(!tp->prior_cwnd) added
in commit 8b8a321f ("tcp: fix zero cwnd in tcp_cwnd_reduction")
can trigger, and without a repro we would have to spend
considerable time finding the bug.

Instead of complaining too late, we want to catch where
and when tp->snd_cwnd is set to an illegal value.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Link: https://lore.kernel.org/r/20220405233538.947344-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

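Routing every access through accessors gives one place to catch an illegal value at the moment it is written. A minimal user-space sketch of that idea follows. The struct, the function names and the counter are illustrative assumptions, and the sketch reports to stderr where kernel code would use its own warning machinery.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the socket state holding the window. */
struct toy_tcp_sock {
	unsigned int snd_cwnd;
};

/* Number of suspicious writes seen so far. */
static unsigned int bad_cwnd_writes;

/* Read accessor. */
unsigned int toy_snd_cwnd(const struct toy_tcp_sock *tp)
{
	assert(tp);
	return tp->snd_cwnd;
}

/* Write accessor: flags a zero value at the point where it is set. */
void toy_snd_cwnd_set(struct toy_tcp_sock *tp, unsigned int val)
{
	assert(tp);
	if (val == 0) {
		bad_cwnd_writes++;
		fprintf(stderr, "suspicious cwnd write: %u\n", val);
	}
	tp->snd_cwnd = val;
}

/* Accessor used only to inspect the sketch's state. */
unsigned int toy_bad_cwnd_writes(void)
{
	return bad_cwnd_writes;
}
```

The offending call site shows up immediately, instead of a later consumer tripping over the zero value.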

06 Apr, 2022 3 commits

ip6_tunnel: Remove duplicate assignments · 487dc3ca

Authored by Hongbin Wang on Apr 05, 2022

The same assignment is already done when the variable is initialized.
Signed-off-by: Hongbin Wang <wh_bin@126.com>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: ipv6mr: fix unused variable warning with CONFIG_IPV6_PIMSM_V2=n · a3ebe92a

Authored by Florian Westphal on Apr 06, 2022

net/ipv6/ip6mr.c:1656:14: warning: unused variable 'do_wrmifwhole'

Move it to the CONFIG_IPV6_PIMSM_V2 scope where it's used.

Fixes: 4b340a5a ("net: ip6mr: add support for passing full packet on wrong mif")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>


net: remove noblock parameter from skb_recv_datagram() · f4b41f06

Authored by Oliver Hartkopp on Apr 04, 2022

skb_recv_datagram() has two parameters 'flags' and 'noblock' that are
merged inside skb_recv_datagram() by 'flags | (noblock ? MSG_DONTWAIT : 0)'

As 'flags' may already contain MSG_DONTWAIT, most callers split 'flags'
into 'flags' and 'noblock' with ultimately pointless bit operations like this:

skb_recv_datagram(sk, flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT, &rc);

And this is not even done consistently with the 'flags' parameter.

This patch removes the obsolete and costly splitting into two parameters
and only performs bit operations when really needed on the caller side.

One missing conversion was thankfully reported by the kernel test robot. I
forgot to enable the kunit tests needed to build the mctp code.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

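The redundancy described above is easy to check in isolation: splitting 'flags' on MSG_DONTWAIT and merging the two halves again yields the original value. The sketch below uses the real `MSG_DONTWAIT` and `MSG_PEEK` constants from `<sys/socket.h>`. The two function names are assumptions made for this example.

```c
#include <assert.h>
#include <sys/socket.h>

/* What the callee used to do with its two parameters. */
int merge_flags(int flags, int noblock)
{
	return flags | (noblock ? MSG_DONTWAIT : 0);
}

/* What callers used to do: split the flags, only for them to be re-merged. */
int split_then_merge(int flags)
{
	return merge_flags(flags & ~MSG_DONTWAIT, flags & MSG_DONTWAIT);
}
```

Since the round trip is an identity, passing 'flags' through unchanged is equivalent and saves the bit operations on both sides of the call.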

05 Apr, 2022 1 commit

ipv6: Fix stats accounting in ip6_pkt_drop · 1158f79f

Authored by David Ahern on Apr 04, 2022

VRF devices are the loopbacks for VRFs, and a loopback can not be
assigned to a VRF. Accordingly, the condition in ip6_pkt_drop should
be '||' not '&&'.

Fixes: 1d3fd8a1 ("vrf: Use orig netdev to count Ip6InNoRoutes and a fresh route lookup when sending dest unreach")
Reported-by: Pudak, Filip <Filip.Pudak@windriver.com>
Reported-by: Xiao, Jiguang <Jiguang.Xiao@windriver.com>
Signed-off-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220404150908.2937-1-dsahern@kernel.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

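The reasoning in the commit above can be made concrete with a small truth-table sketch. It assumes that the two operands of the condition are "the device is a VRF (L3 master) device" and "the device is the loopback device". The commit message does not spell the operands out, so the struct and both function names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device description; a device is never both kinds at once. */
struct toy_dev {
	bool is_vrf_master;
	bool is_loopback;
};

/* '&&' form: can never be true, because no device is both. */
bool special_case_and(const struct toy_dev *d)
{
	assert(d);
	return d->is_vrf_master && d->is_loopback;
}

/* '||' form: true for either kind of device. */
bool special_case_or(const struct toy_dev *d)
{
	assert(d);
	return d->is_vrf_master || d->is_loopback;
}
```

Under these assumptions the '&&' form silently disables the special case for every device, while the '||' form applies it to both VRF and loopback devices and to nothing else.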

20 Mar, 2022 2 commits

netfilter: nft_fib: add reduce support · 3c1eb413

Authored by Florian Westphal on Mar 14, 2022

The fib expression stores to a register, so we can't add an empty stub.
Check that the register that is being written is in fact redundant.

In most cases, this is expected to cancel tracking as re-use is
unlikely.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


netfilter: nf_tables: do not reduce read-only expressions · b2d30654

Authored by Pablo Neira Ayuso on Mar 14, 2022

Skip register tracking for expressions that perform read-only operations
on the registers. Define and use a cookie pointer NFT_REDUCE_READONLY to
avoid defining stubs for these expressions.

This patch re-enables register tracking which was disabled in ed5f85d4
("netfilter: nf_tables: disable register tracking"). Follow up patches
add remaining register tracking for existing expressions.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>


16 Mar, 2022 1 commit

net: Add l3mdev index to flow struct and avoid oif reset for port devices · 40867d74

Authored by David Ahern on Mar 14, 2022

The fundamental premise of VRF and l3mdev core code is binding a socket
to a device (l3mdev or netdev with an L3 domain) to indicate L3 scope.
Legacy code resets flowi_oif to the l3mdev losing any original port
device binding. Ben (among others) has demonstrated use cases where the
original port device binding is important and needs to be retained.
This patch handles that by adding a new entry to the common flow struct
that can indicate the l3mdev index for later rule and table matching
avoiding the need to reset flowi_oif.

In addition to allowing more use cases that require port device binds,
this patch brings a few datapath simplifications:

1. l3mdev_fib_rule_match is only called when walking fib rules and
always after l3mdev_update_flow. That allows an optimization to bail
early for non-VRF type use cases when flowi_l3mdev is not set. Also,
only that index needs to be checked for the FIB table id.

2. l3mdev_update_flow can be called with flowi_oif set to a l3mdev
(e.g., VRF) device. By resetting flowi_oif only for this case the
FLOWI_FLAG_SKIP_NH_OIF flag is no longer needed and can be removed,
removing several checks in the datapath. The flowi_iif path can be
simplified to only be called if it is not loopback (loopback can
not be assigned to an L3 domain) and the l3mdev index is not already
set.

3. Avoid another device lookup in the output path when the fib lookup
returns a reject failure.

Note: 2 functional tests for local traffic with reject fib rules are
updated to reflect the new direct failure at FIB lookup time for ping
rather than the failure on packet path. The current code fails like this:

HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: Warning: source address might be selected on device other than: eth1
PING 172.16.3.1 (172.16.3.1) from 172.16.3.1 eth1: 56(84) bytes of data.

--- 172.16.3.1 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

where the test now directly fails:

HINT: Fails since address on vrf device is out of device scope
COMMAND: ip netns exec ns-A ping -c1 -w1 -I eth1 172.16.3.1
ping: connect: No route to host
Signed-off-by: David Ahern <dsahern@kernel.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Link: https://lore.kernel.org/r/20220314204551.16369-1-dsahern@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>


14 Mar, 2022 1 commit

esp6: fix check on ipv6_skip_exthdr's return value · 4db4075f

Authored by Sabrina Dubroca on Mar 10, 2022

Commit 5f9c55c8 ("ipv6: check return value of ipv6_skip_exthdr")
introduced an incorrect check, which leads to all ESP packets over
either TCPv6 or UDPv6 encapsulation being dropped. In this particular
case, offset is negative, since skb->data points to the ESP header in
the following chain of headers, while skb->network_header points to
the IPv6 header:

    IPv6 | ext | ... | ext | UDP | ESP | ...

That doesn't seem to be a problem, especially considering that if we
reach esp6_input_done2, we're guaranteed to have a full set of headers
available (otherwise the packet would have been dropped earlier in the
stack). However, it means that the return value will (intentionally)
be negative. We can make the test more specific, as the expected
return value of ipv6_skip_exthdr will be the (negated) size of either
a UDP header, or a TCP header with possible options.

In the future, we should probably either make ipv6_skip_exthdr
explicitly accept negative offsets (and adjust its return value for
error cases), or make ipv6_skip_exthdr only take non-negative
offsets (and audit all callers).

Fixes: 5f9c55c8 ("ipv6: check return value of ipv6_skip_exthdr")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


12 Mar, 2022 1 commit

net: ipv6: fix skb_over_panic in __ip6_append_data · 5e34af41

Authored by Tadeusz Struk on Mar 10, 2022

Syzbot found a kernel bug in the ipv6 stack:
LINK: https://syzkaller.appspot.com/bug?id=205d6f11d72329ab8d62a610c44c5e7e25415580
The reproducer triggers it by sending a crafted message via sendmmsg()
call, which triggers skb_over_panic, and crashes the kernel:

skbuff: skb_over_panic: text:ffffffff84647fb4 len:65575 put:65575
head:ffff888109ff0000 data:ffff888109ff0088 tail:0x100af end:0xfec0
dev:<NULL>

Update the check that prevents an invalid packet with MTU equal
to the fragment header size from eating up all the space for payload.

The reproducer can be found here:
LINK: https://syzkaller.appspot.com/text?tag=ReproC&x=1648c83fb00000

Reported-by: syzbot+e223cf47ec8ae183f2a0@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Acked-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20220310232538.1044947-1-tadeusz.struk@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

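The boundary case mentioned above (MTU equal to the per-fragment header size) can be illustrated with a small helper. This is a sketch under stated assumptions and not the kernel's check: the function name and parameters are invented, and the only protocol fact relied on is that IPv6 fragment payloads are sized in multiples of 8 bytes.

```c
#include <assert.h>

/*
 * Payload bytes that fit into one fragment, rounded down to a multiple
 * of 8. Returns -1 when no payload fits, which includes mtu == hdr_len.
 */
int frag_payload_room(unsigned int mtu, unsigned int hdr_len)
{
	unsigned int room;

	if (mtu <= hdr_len)
		return -1;
	room = (mtu - hdr_len) & ~7u;
	assert(room % 8 == 0);
	return room ? (int)room : -1;
}
```

A check written with `<` instead of `<=` would let the `mtu == hdr_len` case through with zero bytes of room, which is the kind of off-by-one the commit title points at.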

09 Mar, 2022 1 commit

skb: make drop reason booleanable · 1330b6ef

Authored by Jakub Kicinski on Mar 07, 2022

We have a number of cases where a function returns the drop/no drop
decision as a boolean. Now that we want to report the reason
code as well we have to pass extra output arguments.

We can make the reason code evaluate correctly as bool.

I believe we're good to reorder the reasons as they are
reported to user space as strings.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

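A short sketch of what "booleanable" means in practice: if the "not dropped" value of the enum is zero, the reason itself can serve as the verdict and the extra output argument goes away. The enum and both function names below are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_drop_reason {
	TOY_NOT_DROPPED_YET = 0, /* evaluates as false */
	TOY_NOT_SPECIFIED,       /* any real reason evaluates as true */
	TOY_NO_SOCKET,
};

/* Old style: boolean verdict plus an extra output argument. */
bool check_old(int len, enum toy_drop_reason *reason)
{
	assert(reason);
	if (len <= 0) {
		*reason = TOY_NOT_SPECIFIED;
		return true;
	}
	*reason = TOY_NOT_DROPPED_YET;
	return false;
}

/* New style: the returned reason doubles as the drop/no drop verdict. */
enum toy_drop_reason check_new(int len)
{
	return len <= 0 ? TOY_NOT_SPECIFIED : TOY_NOT_DROPPED_YET;
}
```

Callers can keep writing `if (check_new(len))` exactly as they did with a bool, and still have the reason at hand when they need to report it.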

07 Mar, 2022 3 commits

net: Fix esp GSO on inter address family tunnels. · 23c7f8d7

Authored by Steffen Klassert on Mar 07, 2022

The esp tunnel GSO handlers use skb_mac_gso_segment to
push the inner packet to the segmentation handlers.
However, skb_mac_gso_segment takes the Ethernet Protocol
ID from 'skb->protocol' which is wrong for inter address
family tunnels. We fix this by introducing a new
skb_eth_gso_segment function.

This function can be used if it is necessary to pass the
Ethernet Protocol ID directly to the segmentation handler.
First users of this function will be the esp4 and esp6
tunnel segmentation handlers.

Fixes: c35fe410 ("xfrm: Add mode handlers for IPsec on layer 2")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


esp: Fix BEET mode inter address family tunneling on GSO · 053c8fdf

Authored by Steffen Klassert on Mar 07, 2022

The xfrm{4,6}_beet_gso_segment() functions did not correctly set the
SKB_GSO_IPXIP4 and SKB_GSO_IPXIP6 gso types for the address family
tunneling case. Fix this by setting these gso types.

Fixes: 384a46ea ("esp4: add gso_segment for esp4 beet mode")
Fixes: 7f9e40eb ("esp6: add gso_segment for esp6 beet mode")
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>


esp: Fix possible buffer overflow in ESP transformation · ebe48d36

Authored by Steffen Klassert on Mar 07, 2022

The maximum message size that can be sent is bigger than
the maximum size that skb_page_frag_refill can allocate.
So it is possible to write beyond the allocated buffer.

Fix this by doing a fallback to COW in that case.

v2:

Avoid get_order() costs as suggested by Linus Torvalds.

Fixes: cac2661c ("esp4: Avoid skb_cow_data whenever possible")
Fixes: 03e2a30f ("esp6: Avoid skb_cow_data whenever possible")
Reported-by: valis <sec@valis.email>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

ebe48d36
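
The fallback described in this commit amounts to a size check before the fast path is taken. Below is a minimal user-space sketch of such a check, not the kernel patch itself: `MOCK_FRAG_MAX_SIZE`, `MOCK_CACHE_LINE`, `align_up()` and `can_use_frag_path()` are hypothetical names, and the limit values are placeholders rather than the kernel's actual numbers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder for the largest buffer the page-fragment allocator can
 * provide; the real limit is defined by the kernel, not by this sketch. */
#define MOCK_FRAG_MAX_SIZE ((size_t)32768)

/* Placeholder cache-line size used to round lengths up. */
#define MOCK_CACHE_LINE ((size_t)64)

/* Round n up to the next multiple of a (a must be non-zero). */
static size_t align_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Return true when both lengths fit in one page fragment, so the fast
 * path is safe. When it returns false the caller should fall back to
 * the slower copy-on-write (COW) path instead of writing past the buffer. */
static bool can_use_frag_path(size_t tail_len, size_t data_len)
{
    return align_up(tail_len, MOCK_CACHE_LINE) <= MOCK_FRAG_MAX_SIZE &&
           align_up(data_len, MOCK_CACHE_LINE) <= MOCK_FRAG_MAX_SIZE;
}
```

Comparing against a fixed upper bound keeps the check to two comparisons, which matches the v2 note about avoiding extra cost on this path.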

04 Mar, 2022 1 commit

ipv6: fix skb drops in igmp6_event_query() and igmp6_event_report() · 2d3916f3

Authored by Eric Dumazet on Mar 03, 2022

While investigating why a synchronize_net() was recently added
to ipv6_mc_down(), I found that igmp6_event_query() and igmp6_event_report()
might drop skbs in some cases.

Discussion about removing synchronize_net() from ipv6_mc_down()
will happen in a different thread.

Fixes: f185de28 ("mld: add new workqueues for process mld events")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Taehee Yoo <ap420073@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20220303173728.937869-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

2d3916f3

03 Mar, 2022 3 commits

net: Postpone skb_clear_delivery_time() until knowing the skb is delivered locally · cd14e9b7

Authored by Martin KaFai Lau on Mar 02, 2022

The previous patches handled the delivery_time in the ingress path
before the routing decision is made.  This patch can postpone clearing
delivery_time in a skb until knowing it is delivered locally and also
set the (rcv) timestamp if needed.  This patch moves the
skb_clear_delivery_time() from dev.c to ip_local_deliver_finish()
and ip6_input_finish().
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cd14e9b7

net: ipv6: Get rcv timestamp if needed when handling hop-by-hop IOAM option · b6561f84

Authored by Martin KaFai Lau on Mar 02, 2022

IOAM is a hop-by-hop option with a temporary IANA allocation (49).
Since it is hop-by-hop, it is done before the input routing decision.
One of the traced data fields is the (rcv) timestamp.

When the locally generated skb is looping from egress to ingress over
a virtual interface (e.g. veth, loopback...), skb->tstamp may have the
delivery time before it is known that it will be delivered locally
and received by another sk.

Like handling the network tapping (tcpdump) in the earlier patch,
this patch gets the timestamp if needed without over-writing the
delivery_time in the skb->tstamp.  skb_tstamp_cond() is added to do the
ktime_get_real() with an extra cond arg to check on top of the
netstamp_needed_key static key.  skb_tstamp_cond() will also be used in
a later patch and it needs the netstamp_needed_key check.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b6561f84
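
The conditional timestamp helper described in this commit can be approximated in user space as follows. This is a sketch under stated assumptions, not the kernel implementation: `mock_skb`, `mock_netstamp_needed`, `mock_real_time_ns()` and `mock_tstamp_cond()` are hypothetical names, a plain boolean stands in for the netstamp_needed_key static key, and reusing an already recorded receive timestamp is an assumption about the helper's behaviour rather than something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical packet: 'tstamp' holds either a receive timestamp or a
 * delivery time; 'mono_delivery_time' marks the latter case. */
struct mock_skb {
    int64_t tstamp;
    bool mono_delivery_time;
};

/* Plain flag standing in for the netstamp_needed_key static key. */
static bool mock_netstamp_needed;

/* Wall-clock time in nanoseconds (stand-in for ktime_get_real()). */
static int64_t mock_real_time_ns(void)
{
    struct timespec ts;

    timespec_get(&ts, TIME_UTC);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Return a receive timestamp without modifying skb->tstamp, so a stored
 * delivery time is never overwritten. Returns 0 when nobody needs one. */
static int64_t mock_tstamp_cond(const struct mock_skb *skb, bool cond)
{
    /* Assumption: an existing non-delivery timestamp can be reused. */
    if (!skb->mono_delivery_time && skb->tstamp)
        return skb->tstamp;

    /* Take a fresh timestamp if the global key or the caller asks. */
    if (mock_netstamp_needed || cond)
        return mock_real_time_ns();

    return 0;
}
```

Because the helper takes the packet as `const` and returns the value instead of storing it, the delivery time kept in the packet survives until the stack decides the packet is delivered locally.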

net: ipv6: Handle delivery_time in ipv6 defrag · 335c8cf3

Authored by Martin KaFai Lau on Mar 02, 2022

A later patch will postpone the delivery_time clearing until the stack
knows the skb is being delivered locally (i.e. calling
skb_clear_delivery_time() at ip_local_deliver_finish() for IPv4
and at ip6_input_finish() for IPv6).  That will allow other kernel
forwarding paths (e.g. ip[6]_forward) to keep the delivery_time also.

Very similar IPv6 defrag code has been duplicated in
multiple places: regular IPv6, nf_conntrack, and 6lowpan.

Unlike the IPv4 defrag which is done before ip_local_deliver_finish(),
the regular IPv6 defrag is done after ip6_input_finish().
Thus, no change should be needed in the regular IPv6 defrag
logic because skb_clear_delivery_time() should have been called.

6lowpan also does not need special handling on delivery_time
because it is a non-inet packet_type.

However, nf_conntrack has a case in NF_INET_PRE_ROUTING that needs
to do the IPv6 defrag earlier.  Thus, it needs to save the
mono_delivery_time bit in the inet_frag_queue which is similar
to how it is handled in the previous patch for the IPv4 defrag.

This patch chooses to do it consistently and stores the mono_delivery_time
in the inet_frag_queue for all cases such that it will be easier
for the future refactoring effort on the IPv6 reasm code.
Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

335c8cf3
