提交 · e6d7bbfbacd59c9564d2af69692b5878ebb268db · openeuler / Kernel

13 12月, 2022 1 次提交

net: Fix data-races around weight_p and dev_weight_[rt]x_bias. · e6d7bbfb

由 Kuniyuki Iwashima 提交于 12月 02, 2022

stable inclusion
from stable-v5.10.140
commit b498a1b0171e7152ce5a837c7f74b4c1a296586f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I63FTT

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b498a1b0171e7152ce5a837c7f74b4c1a296586f

--------------------------------

[ Upstream commit bf955b5a ]

While reading weight_p, it can be changed concurrently.  Thus, we need
to add READ_ONCE() to its reader.

Also, dev_[rt]x_weight can be read/written at the same time.  So, we
need to use READ_ONCE() and WRITE_ONCE() for its access.  Moreover, to
use the same weight_p while changing dev_[rt]x_weight, we add a mutex
in proc_do_dev_weight().

Fixes: 3d48b53f ("net: dev_weight: TX/RX orthogonality")
Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e6d7bbfb

10 11月, 2022 4 次提交

net: inline rollback_registered_many() · 60a225c1

由 Jakub Kicinski 提交于 11月 10, 2022

stable inclusion
from stable-v5.10.134
commit 4f900c37f13eef18e56f541b525d1e743948e01c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4f900c37f13eef18e56f541b525d1e743948e01c

--------------------------------

commit 0cbe1e57 upstream.

Similar to the change for rollback_registered() -
rollback_registered_many() was a part of unregister_netdevice_many()
minus the net_set_todo(), which is no longer needed.

Functionally this patch moves the list_empty() check back after:

	BUG_ON(dev_boot_phase);
	ASSERT_RTNL();

but I can't find any reason why that would be an issue.
Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NFedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

60a225c1

net: move rollback_registered_many() · 20bf2e9a

由 Jakub Kicinski 提交于 11月 10, 2022

stable inclusion
from stable-v5.10.134
commit bf2f3d1970c03f14c84515738de5c0cb28c8ebce
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bf2f3d1970c03f14c84515738de5c0cb28c8ebce

--------------------------------

commit bcfe2f1a upstream.

Move rollback_registered_many() and add a temporary
forward declaration to make merging the code into
unregister_netdevice_many() easier to review.

No functional changes.
Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NFedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

 Conflicts:
	net/core/dev.c
[zzk: commit ("net/ns: put workqueue of cleanup_net sleep for a while when notify")
adds cond_resched() into function which is being moved]
Reviewed-by: NWei Li <liwei391@huawei.com>

20bf2e9a

net: inline rollback_registered() · e7978ed4

由 Jakub Kicinski 提交于 11月 10, 2022

stable inclusion
from stable-v5.10.134
commit 672fac0a4372b7cc053bc7458c536eaea450a572
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=672fac0a4372b7cc053bc7458c536eaea450a572

--------------------------------

commit 037e56bd upstream.

rollback_registered() is a local helper, it's common for driver
code to call unregister_netdevice_queue(dev, NULL) when they
want to unregister netdevices under rtnl_lock. Inline
rollback_registered() and adjust the only remaining caller.
Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NFedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

e7978ed4

net: move net_set_todo inside rollback_registered() · 38cd659f

由 Jakub Kicinski 提交于 11月 10, 2022

stable inclusion
from stable-v5.10.134
commit b1158677d46bd67e32d6606d1f8c01d8ba9e6d22
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZVR7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b1158677d46bd67e32d6606d1f8c01d8ba9e6d22

--------------------------------

commit 2014beea upstream.

Commit 93ee31f1 ("[NET]: Fix free_netdev on register_netdev
failure.") moved net_set_todo() outside of rollback_registered()
so that rollback_registered() can be used in the failure path of
register_netdevice() but without risking a double free.

Since commit cf124db5 ("net: Fix inconsistent teardown and
release of private netdev state."), however, we have a better
way of handling that condition, since destructors don't call
free_netdev() directly.

After the change in commit c269a24c ("net: make free_netdev()
more lenient with unregistering devices") we can now move
net_set_todo() back.
Reviewed-by: NEdwin Peer <edwin.peer@broadcom.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NFedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

38cd659f

27 9月, 2022 1 次提交

net: remove two BUG() from skb_checksum_help() · eaadd14b

由 Eric Dumazet 提交于 9月 27, 2022

stable inclusion
from stable-v5.10.121
commit 312c43e98ed190bd8fd7a71a0addf9539d5b8ab1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=312c43e98ed190bd8fd7a71a0addf9539d5b8ab1

--------------------------------

[ Upstream commit d7ea0d9d ]

I have a syzbot report that managed to get a crash in skb_checksum_help()

If syzbot can trigger these BUG(), it makes sense to replace
them with more friendly WARN_ON_ONCE() since skb_checksum_help()
can instead return an error code.

Note that syzbot will still crash there, until real bug is fixed.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

eaadd14b

26 7月, 2022 1 次提交

bpf: Don't redirect packets with invalid pkt_len · 5375ef0e

由 Zhengchao Shao 提交于 7月 26, 2022

mainline inclusion
from mainline-v5.19-rc6
commit fd189422
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5HWKR
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=fd1894224407

--------------------------------

Syzbot found an issue [1]: fq_codel_drop() try to drop a flow whitout any
skbs, that is, the flow->head is null.
The root cause, as the [2] says, is because that bpf_prog_test_run_skb()
run a bpf prog which redirects empty skbs.
So we should determine whether the length of the packet modified by bpf
prog or others like bpf_prog_test is valid before forwarding it directly.

LINK: [1] https://syzkaller.appspot.com/bug?id=0b84da80c2917757915afa89f7738a9d16ec96c5
LINK: [2] https://www.spinics.net/lists/netdev/msg777503.html

Reported-by: syzbot+7a12909485b94426aceb@syzkaller.appspotmail.com
Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: NStanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20220715115559.139691-1-shaozhengchao@huawei.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5375ef0e

21 6月, 2022 1 次提交

net/ns: put workqueue of cleanup_net sleep for a while when notify. · 758c9c45

由 Zhengchao Shao 提交于 6月 21, 2022

hulk inclusion
category: bugfix
bugzilla: 186807 https://gitee.com/openeuler/kernel/issues/I5ATLD
CVE: NA

--------------------------------

When we clean up namespace, we have to notify every netdevice that
dev is down. If device that registered too many, the notify time will
take too many CPU time, It will course CPU soft-lockup issue. The
reprocedure is followed:
NIFS=50
for ((i=0; i<$NIFS; i++))
do
        ip netns add dummy-ns$i
        ip netns exec dummy-ns$i ip link set lo up
done

for ((j=0; j<$NIFS; j++))
do
        for ((i=0; i<1000; i++))
        do
                if=eth$j$i
                ip netns exec dummy-ns$j ip link add $if type dummy
                ip netns exec dummy-ns$j ip link set $if up
                done
done

for ((i=0; i<$NIFS; i++))
do
        ip netns del dummy-ns$i
done
The test will result in the following stack. So clean up work must
sleep for a while when notify device down/change.

watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [kworker/u8:5:288]
Modules linked in:
CPU: 0 PID: 288 Comm: kworker/u8:5 Tainted: G    B             5.10.0+ #5
Hardware name: linux,dummy-virt (DT)
Workqueue: netns cleanup_net
pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
pc : atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
pc : __alloc_skb+0x268/0x450 net/core/skbuff.c:241
lr : atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
lr : __alloc_skb+0x268/0x450 net/core/skbuff.c:241
sp : ffff000015607610
x29: ffff000015607610 x28: 00000000ffffffff
x27: 0000000000000001 x26: ffff0000cc9400e0
x25: ffff0000c745c1be x24: 1fffe00002ac0ed0
x23: 0000000000000000 x22: ffff0000cc9400c0
x21: ffff0000c745c234 x20: ffff0000cc940000
x19: ffff0000c745c140 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 1fffe00002ac0f00
x13: 0000000000000000 x12: ffff80001992801d
x11: 1fffe0001992801c x10: ffff80001992801c
x9 : dfffa00000000000 x8 : ffff0000cc9400e3
x7 : 0000000000000001 x6 : ffff80001992801c
x5 : ffff0000cc9400e0 x4 : dfffa00000000000
x3 : ffffa00011529a78 x2 : 0000000000000003
x1 : 0000000000000000 x0 : ffff0000cc9400e0
Call trace:
 atomic_set include/asm-generic/atomic-instrumented.h:46 [inline]
 __alloc_skb+0x268/0x450 net/core/skbuff.c:241
 alloc_skb include/linux/skbuff.h:1107 [inline]
 nlmsg_new include/net/netlink.h:958 [inline]
 rtmsg_ifa+0xf4/0x1e0 net/ipv4/devinet.c:1900
 __inet_del_ifa+0x328/0x650 net/ipv4/devinet.c:427
 inet_del_ifa net/ipv4/devinet.c:465 [inline]
 inetdev_destroy net/ipv4/devinet.c:318 [inline]
 inetdev_event+0x2ac/0xac0 net/ipv4/devinet.c:1599
 notifier_call_chain kernel/notifier.c:83 [inline]
 raw_notifier_call_chain+0x94/0xd0 kernel/notifier.c:410
 call_netdevice_notifiers_info+0x9c/0x14c net/core/dev.c:2047
 call_netdevice_notifiers_extack net/core/dev.c:2059 [inline]
 call_netdevice_notifiers net/core/dev.c:2073 [inline]
 rollback_registered_many+0x3d0/0x7dc net/core/dev.c:9558
 unregister_netdevice_many+0x40/0x1b0 net/core/dev.c:10779
 default_device_exit_batch+0x24c/0x2a0 net/core/dev.c:11262
 ops_exit_list+0xb4/0xd0 net/core/net_namespace.c:192
 cleanup_net+0x2b8/0x540 net/core/net_namespace.c:608
 process_one_work+0x3ec/0xa40 kernel/workqueue.c:2279
 worker_thread+0x110/0x8b0 kernel/workqueue.c:2425
 kthread+0x1ac/0x1fc kernel/kthread.c:313
 ret_from_fork+0x10/0x18 arch/arm64/kernel/entry.S:1034
Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

758c9c45

31 5月, 2022 1 次提交

net, xdp: Update pkt_type if generic XDP changes unicast MAC · 5dbf395c

由 Martin Willi 提交于 5月 31, 2022

mainline inclusion
from mainline-v5.13-rc1
commit 22b60343
category: bugfix
bugzilla: 186880 https://gitee.com/openeuler/kernel/issues/I5A7W4

--------------------------------

If a generic XDP program changes the destination MAC address from/to
multicast/broadcast, the skb->pkt_type is updated to properly handle
the packet when passed up the stack. When changing the MAC from/to
the NICs MAC, PACKET_HOST/OTHERHOST is not updated, though, making
the behavior different from that of native XDP.

Remember the PACKET_HOST/OTHERHOST state before calling the program
in generic XDP, and update pkt_type accordingly if the destination
MAC address has changed. As eth_type_trans() assumes a default
pkt_type of PACKET_HOST, restore that before calling it.

The use case for this is when a XDP program wants to push received
packets up the stack by rewriting the MAC to the NICs MAC, for
example by cluster nodes sharing MAC addresses.

Fixes: 29724956 ("net: fix generic XDP to handle if eth header was mangled")
Signed-off-by: NMartin Willi <martin@strongswan.org>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NToke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/20210419141559.8611-1-martin@strongswan.orgSigned-off-by: NZhengchao Shao <shaozhengchao@huawei.com>
Conflict:
	net/core/dev.c
Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

5dbf395c

27 4月, 2022 1 次提交

xdp: check prog type before updating BPF link · 05f2806d

由 Toke Høiland-Jørgensen 提交于 4月 27, 2022

stable inclusion
from stable-v5.10.94
commit 58fa3e900255d61684ea5dfce0302b4b59d39c24
bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=58fa3e900255d61684ea5dfce0302b4b59d39c24

--------------------------------

commit 382778ed upstream.

The bpf_xdp_link_update() function didn't check the program type before
updating the program, which made it possible to install any program type as
an XDP program, which is obviously not good. Syzbot managed to trigger this
by swapping in an LWT program on the XDP hook which would crash in a helper
call.

Fix this by adding a check and bailing out if the types don't match.

Fixes: 026a4c28 ("bpf, xdp: Implement LINK_UPDATE for BPF XDP link")
Reported-by: syzbot+983941aa85af6ded1fd9@syzkaller.appspotmail.com
Acked-by: NAndrii Nakryiko <andrii@kernel.org>
Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/r/20220107221115.326171-1-toke@redhat.comSigned-off-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

05f2806d

08 3月, 2022 3 次提交

net: avoid quadratic behavior in netdev_wait_allrefs_any() · c051cdf1

由 Eric Dumazet 提交于 3月 08, 2022

net-next inclusion
from net-next-v5.17-rc5
commit 86213f80
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZN0?from=project-issue

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=86213f80da1b1d007721cc22e04b5f5d0da33127

--------------------------------

If the list of devices has N elements, netdev_wait_allrefs_any()
is called N times, and linkwatch_forget_dev() is called N*(N-1)/2 times.

Fix this by calling linkwatch_forget_dev() only once per device.

Fixes: faab39f6 ("net: allow out-of-order netdev unregistration")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20220218065430.2613262-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

c051cdf1

net: allow out-of-order netdev unregistration · 8755374c

由 Jakub Kicinski 提交于 3月 08, 2022

net-next inclusion
from net-next-v5.17-rc5
commit faab39f6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZN0?from=project-issue

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=faab39f63c1fc4bcdf135690f03bd596b578c67e

--------------------------------

Sprinkle for each loops to allow netdevices to be unregistered
out of order, as their refs are released.

This prevents problems caused by dependencies between netdevs
which want to release references in their ->priv_destructor.
See commit d6ff94af ("vlan: move dev_put into vlan_dev_uninit")
for example.

Eric has removed the only known ordering requirement in
commit c002496b ("Merge branch 'ipv6-loopback'")
so let's try this and see if anything explodes...
Reviewed-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NXin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20220215225310.3679266-2-kuba@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Conflicts:
	net/core/dev.c
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

8755374c

net: transition netdev reg state earlier in run_todo · b2a7df99

由 Jakub Kicinski 提交于 3月 08, 2022

net-next inclusion
from net-next-v5.17-rc5
commit ae68db14
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4VZN0?from=project-issue

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=ae68db14b6164ce46beffaf35eb7c9bb2f92fee3

--------------------------------

In prep for unregistering netdevs out of order move the netdev
state validation and change outside of the loop.

While at it modernize this code and use WARN() instead of
pr_err() + dump_stack().
Reviewed-by: NEric Dumazet <edumazet@google.com>
Reviewed-by: NXin Long <lucien.xin@gmail.com>
Link: https://lore.kernel.org/r/20220215225310.3679266-1-kuba@kernel.orgSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NWei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b2a7df99

14 1月, 2022 1 次提交

net: annotate data-races on txq->xmit_lock_owner · d7764824

由 Eric Dumazet 提交于 1月 14, 2022

stable inclusion
from stable-v5.10.84
commit fa973bf5fd0fda6f0bf9a5d3d403078824dc27ac
bugzilla: 186030 https://gitee.com/openeuler/kernel/issues/I4QV2F

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fa973bf5fd0fda6f0bf9a5d3d403078824dc27ac

--------------------------------

commit 7a10d8c8 upstream.

syzbot found that __dev_queue_xmit() is reading txq->xmit_lock_owner
without annotations.

No serious issue there, let's document what is happening there.

BUG: KCSAN: data-race in __dev_queue_xmit / __dev_queue_xmit

write to 0xffff888139d09484 of 4 bytes by interrupt on cpu 0:
 __netif_tx_unlock include/linux/netdevice.h:4437 [inline]
 __dev_queue_xmit+0x948/0xf70 net/core/dev.c:4229
 dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
 macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
 macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
 __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
 netdev_start_xmit include/linux/netdevice.h:5001 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3590
 dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
 sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
 __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
 __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
 neigh_hh_output include/net/neighbour.h:511 [inline]
 neigh_output include/net/neighbour.h:525 [inline]
 ip6_finish_output2+0x995/0xbb0 net/ipv6/ip6_output.c:126
 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
 ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
 dst_output include/net/dst.h:450 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
 ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
 addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
 call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
 expire_timers+0x116/0x240 kernel/time/timer.c:1466
 __run_timers+0x368/0x410 kernel/time/timer.c:1734
 run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
 __do_softirq+0x158/0x2de kernel/softirq.c:558
 __irq_exit_rcu kernel/softirq.c:636 [inline]
 irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
 sysvec_apic_timer_interrupt+0x3e/0xb0 arch/x86/kernel/apic/apic.c:1097
 asm_sysvec_apic_timer_interrupt+0x12/0x20

read to 0xffff888139d09484 of 4 bytes by interrupt on cpu 1:
 __dev_queue_xmit+0x5e3/0xf70 net/core/dev.c:4213
 dev_queue_xmit_accel+0x19/0x20 net/core/dev.c:4265
 macvlan_queue_xmit drivers/net/macvlan.c:543 [inline]
 macvlan_start_xmit+0x2b3/0x3d0 drivers/net/macvlan.c:567
 __netdev_start_xmit include/linux/netdevice.h:4987 [inline]
 netdev_start_xmit include/linux/netdevice.h:5001 [inline]
 xmit_one+0x105/0x2f0 net/core/dev.c:3590
 dev_hard_start_xmit+0x72/0x120 net/core/dev.c:3606
 sch_direct_xmit+0x1b2/0x7c0 net/sched/sch_generic.c:342
 __dev_xmit_skb+0x83d/0x1370 net/core/dev.c:3817
 __dev_queue_xmit+0x590/0xf70 net/core/dev.c:4194
 dev_queue_xmit+0x13/0x20 net/core/dev.c:4259
 neigh_resolve_output+0x3db/0x410 net/core/neighbour.c:1523
 neigh_output include/net/neighbour.h:527 [inline]
 ip6_finish_output2+0x9be/0xbb0 net/ipv6/ip6_output.c:126
 __ip6_finish_output net/ipv6/ip6_output.c:191 [inline]
 ip6_finish_output+0x444/0x4c0 net/ipv6/ip6_output.c:201
 NF_HOOK_COND include/linux/netfilter.h:296 [inline]
 ip6_output+0x10e/0x210 net/ipv6/ip6_output.c:224
 dst_output include/net/dst.h:450 [inline]
 NF_HOOK include/linux/netfilter.h:307 [inline]
 ndisc_send_skb+0x486/0x610 net/ipv6/ndisc.c:508
 ndisc_send_rs+0x3b0/0x3e0 net/ipv6/ndisc.c:702
 addrconf_rs_timer+0x370/0x540 net/ipv6/addrconf.c:3898
 call_timer_fn+0x2e/0x240 kernel/time/timer.c:1421
 expire_timers+0x116/0x240 kernel/time/timer.c:1466
 __run_timers+0x368/0x410 kernel/time/timer.c:1734
 run_timer_softirq+0x2e/0x60 kernel/time/timer.c:1747
 __do_softirq+0x158/0x2de kernel/softirq.c:558
 __irq_exit_rcu kernel/softirq.c:636 [inline]
 irq_exit_rcu+0x37/0x70 kernel/softirq.c:648
 sysvec_apic_timer_interrupt+0x8d/0xb0 arch/x86/kernel/apic/apic.c:1097
 asm_sysvec_apic_timer_interrupt+0x12/0x20
 kcsan_setup_watchpoint+0x94/0x420 kernel/kcsan/core.c:443
 folio_test_anon include/linux/page-flags.h:581 [inline]
 PageAnon include/linux/page-flags.h:586 [inline]
 zap_pte_range+0x5ac/0x10e0 mm/memory.c:1347
 zap_pmd_range mm/memory.c:1467 [inline]
 zap_pud_range mm/memory.c:1496 [inline]
 zap_p4d_range mm/memory.c:1517 [inline]
 unmap_page_range+0x2dc/0x3d0 mm/memory.c:1538
 unmap_single_vma+0x157/0x210 mm/memory.c:1583
 unmap_vmas+0xd0/0x180 mm/memory.c:1615
 exit_mmap+0x23d/0x470 mm/mmap.c:3170
 __mmput+0x27/0x1b0 kernel/fork.c:1113
 mmput+0x3d/0x50 kernel/fork.c:1134
 exit_mm+0xdb/0x170 kernel/exit.c:507
 do_exit+0x608/0x17a0 kernel/exit.c:819
 do_group_exit+0xce/0x180 kernel/exit.c:929
 get_signal+0xfc3/0x1550 kernel/signal.c:2852
 arch_do_signal_or_restart+0x8c/0x2e0 arch/x86/kernel/signal.c:868
 handle_signal_work kernel/entry/common.c:148 [inline]
 exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
 exit_to_user_mode_prepare+0x113/0x190 kernel/entry/common.c:207
 __syscall_exit_to_user_mode_work kernel/entry/common.c:289 [inline]
 syscall_exit_to_user_mode+0x20/0x40 kernel/entry/common.c:300
 do_syscall_64+0x50/0xd0 arch/x86/entry/common.c:86
 entry_SYSCALL_64_after_hwframe+0x44/0xae

value changed: 0x00000000 -> 0xffffffff

Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 28712 Comm: syz-executor.0 Tainted: G        W         5.16.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Link: https://lore.kernel.org/r/20211130170155.2331929-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

d7764824

06 12月, 2021 2 次提交

net: sched: update default qdisc visibility after Tx queue cnt changes · e4c35a3c

由 Jakub Kicinski 提交于 12月 06, 2021

stable inclusion
from stable-5.10.80
commit 8d433ab5c8c26fa33ef9013b375a74d30918ac03
bugzilla: 185821 https://gitee.com/openeuler/kernel/issues/I4L7CG

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8d433ab5c8c26fa33ef9013b375a74d30918ac03

--------------------------------

[ Upstream commit 1e080f17 ]

mq / mqprio make the default child qdiscs visible. They only do
so for the qdiscs which are within real_num_tx_queues when the
device is registered. Depending on order of calls in the driver,
or if user space changes config via ethtool -L the number of
qdiscs visible under tc qdisc show will differ from the number
of queues. This is confusing to users and potentially to system
configuration scripts which try to make sure qdiscs have the
right parameters.

Add a new Qdisc_ops callback and make relevant qdiscs TTRT.

Note that this uncovers the "shortcut" created by
commit 1f27cde3 ("net: sched: use pfifo_fast for non real queues")
The default child qdiscs beyond initial real_num_tx are always
pfifo_fast, no matter what the sysfs setting is. Fixing this
gets a little tricky because we'd need to keep a reference
on whatever the default qdisc was at the time of creation.
In practice this is likely an non-issue the qdiscs likely have
to be configured to non-default settings, so whatever user space
is doing such configuration can replace the pfifos... now that
it will see them.
Reported-by: NMatthew Massey <matthewmassey@fb.com>
Reviewed-by: NDave Taht <dave.taht@gmail.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

e4c35a3c

net: multicast: calculate csum of looped-back and forwarded packets · 6e05c8c9

由 Cyril Strejc 提交于 12月 06, 2021

stable inclusion
from stable-5.10.80
commit b8cb3f4ffa3ad0b4f0d62334aa6cda28e7f93881
bugzilla: 185821 https://gitee.com/openeuler/kernel/issues/I4L7CG

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b8cb3f4ffa3ad0b4f0d62334aa6cda28e7f93881

--------------------------------

[ Upstream commit 9122a70a ]

During a testing of an user-space application which transmits UDP
multicast datagrams and utilizes multicast routing to send the UDP
datagrams out of defined network interfaces, I've found a multicast
router does not fill-in UDP checksum into locally produced, looped-back
and forwarded UDP datagrams, if an original output NIC the datagrams
are sent to has UDP TX checksum offload enabled.

The datagrams are sent malformed out of the NIC the datagrams have been
forwarded to.

It is because:

1. If TX checksum offload is enabled on the output NIC, UDP checksum
is not calculated by kernel and is not filled into skb data.

2. dev_loopback_xmit(), which is called solely by
ip_mc_finish_output(), sets skb->ip_summed = CHECKSUM_UNNECESSARY
unconditionally.

3. Since 35fc92a9 ("[NET]: Allow forwarding of ip_summed except
CHECKSUM_COMPLETE"), the ip_summed value is preserved during
forwarding.

4. If ip_summed != CHECKSUM_PARTIAL, checksum is not calculated during
a packet egress.

The minimum fix in dev_loopback_xmit():

1. Preserves skb->ip_summed CHECKSUM_PARTIAL. This is the
case when the original output NIC has TX checksum offload enabled.
The effects are:

a) If the forwarding destination interface supports TX checksum
offloading, the NIC driver is responsible to fill-in the
checksum.

b) If the forwarding destination interface does NOT support TX
checksum offloading, checksums are filled-in by kernel before
skb is submitted to the NIC driver.

c) For local delivery, checksum validation is skipped as in the
case of CHECKSUM_UNNECESSARY, thanks to skb_csum_unnecessary().

2. Translates ip_summed CHECKSUM_NONE to CHECKSUM_UNNECESSARY. It
means, for CHECKSUM_NONE, the behavior is unmodified and is there
to skip a looped-back packet local delivery checksum validation.
Signed-off-by: NCyril Strejc <cyril.strejc@skoda.cz>
Reviewed-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

6e05c8c9

15 11月, 2021 2 次提交

net: Prevent infinite while loop in skb_tx_hash() · 2dc5039f

由 Michael Chan 提交于 11月 15, 2021

stable inclusion
from stable-5.10.77
commit 2b7c5eed19d3009dd6cca36853b3a22d7eff9209
bugzilla: 185677 https://gitee.com/openeuler/kernel/issues/I4IAP7

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2b7c5eed19d3009dd6cca36853b3a22d7eff9209

--------------------------------

commit 0c57eeec upstream.

Drivers call netdev_set_num_tc() and then netdev_set_tc_queue()
to set the queue count and offset for each TC.  So the queue count
and offset for the TCs may be zero for a short period after dev->num_tc
has been set.  If a TX packet is being transmitted at this time in the
code path netdev_pick_tx() -> skb_tx_hash(), skb_tx_hash() may see
nonzero dev->num_tc but zero qcount for the TC.  The while loop that
keeps looping while hash >= qcount will not end.

Fix it by checking the TC's qcount to be nonzero before using it.

Fixes: eadec877 ("net: Add support for subordinate traffic classes to netdev_pick_tx")
Reviewed-by: NAndy Gospodarek <gospo@broadcom.com>
Signed-off-by: NMichael Chan <michael.chan@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2dc5039f

net: make free_netdev() more lenient with unregistering devices · 47c846eb

由 Jakub Kicinski 提交于 11月 15, 2021

mainline inclusion
from mainline-v5.11-rc4
commit c269a24c
category: bugfix
bugzilla: 181892 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c269a24ce057abfc31130960e96ab197ef6ab196

-------------------------------------------------

There are two flavors of handling netdev registration:
 - ones called without holding rtnl_lock: register_netdev() and
   unregister_netdev(); and
 - those called with rtnl_lock held: register_netdevice() and
   unregister_netdevice().

While the semantics of the former are pretty clear, the same can't
be said about the latter. The netdev_todo mechanism is utilized to
perform some of the device unregistering tasks and it hooks into
rtnl_unlock() so the locked variants can't actually finish the work.
In general free_netdev() does not mix well with locked calls. Most
drivers operating under rtnl_lock set dev->needs_free_netdev to true
and expect core to make the free_netdev() call some time later.

The part where this becomes most problematic is error paths. There is
no way to unwind the state cleanly after a call to register_netdevice(),
since unreg can't be performed fully without dropping locks.

Make free_netdev() more lenient, and defer the freeing if device
is being unregistered. This allows error paths to simply call
free_netdev() both after register_netdevice() failed, and after
a call to unregister_netdevice() but before dropping rtnl_lock.

Simplify the error paths which are currently doing gymnastics
around free_netdev() handling.
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NXu Jia <xujia39@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

47c846eb

15 10月, 2021 4 次提交

skbuff: Fix build with SKB extensions disabled · a41e4108

由 Florian Fainelli 提交于 10月 14, 2021

stable inclusion
from stable-5.10.54
commit c9f8e17990e05b1c848a28566e3a31f7fef8ea2c
bugzilla: 175586 https://gitee.com/openeuler/kernel/issues/I4DVDU

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c9f8e17990e05b1c848a28566e3a31f7fef8ea2c

--------------------------------

commit 9615fe36 upstream.

We will fail to build with CONFIG_SKB_EXTENSIONS disabled after
8550ff8d ("skbuff: Release nfct refcount on napi stolen or re-used
skbs") since there is an unconditionally use of skb_ext_find() without
an appropriate stub. Simply build the code conditionally and properly
guard against both COFNIG_SKB_EXTENSIONS as well as
CONFIG_NET_TC_SKB_EXT being disabled.

Fixes: Fixes: 8550ff8d ("skbuff: Release nfct refcount on napi stolen or re-used skbs")
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Reviewed-by: NRoi Dayan <roid@nvidia.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a41e4108

skbuff: Release nfct refcount on napi stolen or re-used skbs · b3d48838

由 Paul Blakey 提交于 10月 14, 2021

stable inclusion
from stable-5.10.54
commit 570341f10eccfe5c339363544a0339a8550944b7
bugzilla: 175586 https://gitee.com/openeuler/kernel/issues/I4DVDU

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=570341f10eccfe5c339363544a0339a8550944b7

--------------------------------

commit 8550ff8d upstream.

When multiple SKBs are merged to a new skb under napi GRO,
or SKB is re-used by napi, if nfct was set for them in the
driver, it will not be released while freeing their stolen
head state or on re-use.

Release nfct on napi's stolen or re-used SKBs, and
in gro_list_prepare, check conntrack metadata diff.

Fixes: 5c6b9460 ("net/mlx5e: CT: Handle misses after executing CT action")
Reviewed-by: NRoi Dayan <roid@nvidia.com>
Signed-off-by: NPaul Blakey <paulb@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b3d48838

xdp, net: Fix use-after-free in bpf_xdp_link_release · 2f3d77b9

由 Xuan Zhuo 提交于 10月 14, 2021

stable inclusion
from stable-5.10.54
commit ca9ba1de8f09976b45ccc8e655c51c6201992139
bugzilla: 175586 https://gitee.com/openeuler/kernel/issues/I4DVDU

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ca9ba1de8f09976b45ccc8e655c51c6201992139

--------------------------------

[ Upstream commit 5acc7d3e ]

The problem occurs between dev_get_by_index() and dev_xdp_attach_link().
At this point, dev_xdp_uninstall() is called. Then xdp link will not be
detached automatically when dev is released. But link->dev already
points to dev, when xdp link is released, dev will still be accessed,
but dev has been released.

dev_get_by_index()        |
link->dev = dev           |
                          |      rtnl_lock()
                          |      unregister_netdevice_many()
                          |          dev_xdp_uninstall()
                          |      rtnl_unlock()
rtnl_lock();              |
dev_xdp_attach_link()     |
rtnl_unlock();            |
                          |      netdev_run_todo() // dev released
bpf_xdp_link_release()    |
    /* access dev.        |
       use-after-free */  |

[   45.966867] BUG: KASAN: use-after-free in bpf_xdp_link_release+0x3b8/0x3d0
[   45.967619] Read of size 8 at addr ffff00000f9980c8 by task a.out/732
[   45.968297]
[   45.968502] CPU: 1 PID: 732 Comm: a.out Not tainted 5.13.0+ #22
[   45.969222] Hardware name: linux,dummy-virt (DT)
[   45.969795] Call trace:
[   45.970106]  dump_backtrace+0x0/0x4c8
[   45.970564]  show_stack+0x30/0x40
[   45.970981]  dump_stack_lvl+0x120/0x18c
[   45.971470]  print_address_description.constprop.0+0x74/0x30c
[   45.972182]  kasan_report+0x1e8/0x200
[   45.972659]  __asan_report_load8_noabort+0x2c/0x50
[   45.973273]  bpf_xdp_link_release+0x3b8/0x3d0
[   45.973834]  bpf_link_free+0xd0/0x188
[   45.974315]  bpf_link_put+0x1d0/0x218
[   45.974790]  bpf_link_release+0x3c/0x58
[   45.975291]  __fput+0x20c/0x7e8
[   45.975706]  ____fput+0x24/0x30
[   45.976117]  task_work_run+0x104/0x258
[   45.976609]  do_notify_resume+0x894/0xaf8
[   45.977121]  work_pending+0xc/0x328
[   45.977575]
[   45.977775] The buggy address belongs to the page:
[   45.978369] page:fffffc00003e6600 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4f998
[   45.979522] flags: 0x7fffe0000000000(node=0|zone=0|lastcpupid=0x3ffff)
[   45.980349] raw: 07fffe0000000000 fffffc00003e6708 ffff0000dac3c010 0000000000000000
[   45.981309] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[   45.982259] page dumped because: kasan: bad access detected
[   45.982948]
[   45.983153] Memory state around the buggy address:
[   45.983753]  ffff00000f997f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[   45.984645]  ffff00000f998000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.985533] >ffff00000f998080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.986419]                                               ^
[   45.987112]  ffff00000f998100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.988006]  ffff00000f998180: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[   45.988895] ==================================================================
[   45.989773] Disabling lock debugging due to kernel taint
[   45.990552] Kernel panic - not syncing: panic_on_warn set ...
[   45.991166] CPU: 1 PID: 732 Comm: a.out Tainted: G    B             5.13.0+ #22
[   45.991929] Hardware name: linux,dummy-virt (DT)
[   45.992448] Call trace:
[   45.992753]  dump_backtrace+0x0/0x4c8
[   45.993208]  show_stack+0x30/0x40
[   45.993627]  dump_stack_lvl+0x120/0x18c
[   45.994113]  dump_stack+0x1c/0x34
[   45.994530]  panic+0x3a4/0x7d8
[   45.994930]  end_report+0x194/0x198
[   45.995380]  kasan_report+0x134/0x200
[   45.995850]  __asan_report_load8_noabort+0x2c/0x50
[   45.996453]  bpf_xdp_link_release+0x3b8/0x3d0
[   45.997007]  bpf_link_free+0xd0/0x188
[   45.997474]  bpf_link_put+0x1d0/0x218
[   45.997942]  bpf_link_release+0x3c/0x58
[   45.998429]  __fput+0x20c/0x7e8
[   45.998833]  ____fput+0x24/0x30
[   45.999247]  task_work_run+0x104/0x258
[   45.999731]  do_notify_resume+0x894/0xaf8
[   46.000236]  work_pending+0xc/0x328
[   46.000697] SMP: stopping secondary CPUs
[   46.001226] Dumping ftrace buffer:
[   46.001663]    (ftrace buffer empty)
[   46.002110] Kernel Offset: disabled
[   46.002545] CPU features: 0x00000001,23202c00
[   46.003080] Memory Limit: none

Fixes: aa8d3a71 ("bpf, xdp: Add bpf_link-based XDP attachment API")
Reported-by: NAbaci <abaci@linux.alibaba.com>
Signed-off-by: NXuan Zhuo <xuanzhuo@linux.alibaba.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Reviewed-by: NDust Li <dust.li@linux.alibaba.com>
Acked-by: NAndrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210710031635.41649-1-xuanzhuo@linux.alibaba.comSigned-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2f3d77b9

net: Treat __napi_schedule_irqoff() as __napi_schedule() on PREEMPT_RT · 948e7c95

由 Sebastian Andrzej Siewior 提交于 10月 14, 2021

stable inclusion
from stable-5.10.51
commit 3393405257ed27f5b13b75faf1842e1b083c6f7b
bugzilla: 175263 https://gitee.com/openeuler/kernel/issues/I4DT6F

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3393405257ed27f5b13b75faf1842e1b083c6f7b

--------------------------------

[ Upstream commit 8380c81d ]

__napi_schedule_irqoff() is an optimized version of __napi_schedule()
which can be used where it is known that interrupts are disabled,
e.g. in interrupt-handlers, spin_lock_irq() sections or hrtimer
callbacks.

On PREEMPT_RT enabled kernels this assumptions is not true. Force-
threaded interrupt handlers and spinlocks are not disabling interrupts
and the NAPI hrtimer callback is forced into softirq context which runs
with interrupts enabled as well.

Chasing all usage sites of __napi_schedule_irqoff() is a whack-a-mole
game so make __napi_schedule_irqoff() invoke __napi_schedule() for
PREEMPT_RT kernels.

The callers of ____napi_schedule() in the networking core have been
audited and are correct on PREEMPT_RT kernels as well.
Reported-by: NJuri Lelli <juri.lelli@redhat.com>
Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NJuri Lelli <juri.lelli@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

948e7c95

12 10月, 2021 1 次提交

net: make sure devices go through netdev_wait_all_refs · 2bab5c83

由 Jakub Kicinski 提交于 10月 12, 2021

mainline inclusion
from mainline-v5.11-rc4
commit 766b0515
category: bugfix
bugzilla: 108062 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=766b0515d5bec4b780750773ed3009b148df8c0a

-----------------------------------------------

If register_netdevice() fails at the very last stage - the
notifier call - some subsystems may have already seen it and
grabbed a reference. struct net_device can't be freed right
away without calling netdev_wait_all_refs().

Now that we have a clean interface in form of dev->needs_free_netdev
and lenient free_netdev() we can undo what commit 93ee31f1 ("[NET]:
Fix free_netdev on register_netdev failure.") has done and complete
the unregistration path by bringing the net_set_todo() call back.

After registration fails user is still expected to explicitly
free the net_device, so make sure ->needs_free_netdev is cleared,
otherwise rolling back the registration will cause the old double
free for callers who release rtnl_lock before the free.

This also solves the problem of priv_destructor not being called
on notifier error.

net_set_todo() will be moved back into unregister_netdevice_queue()
in a follow up.
Reported-by: NHulk Robot <hulkci@huawei.com>
Reported-by: NYang Yingliang <yangyingliang@huawei.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NHuang Guobin <huangguobin4@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2bab5c83

17 7月, 2021 1 次提交

net: add inline function skb_csum_is_sctp · 607f00b3

由 Xin Long 提交于 7月 16, 2021

mainline inclusion
from mainline-v5.12-rc1-dontuse
commit fa821170
category: feature
bugzilla: 173966
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fa82117010430aff2ce86400f7328f55a31b48a6

----------------------------------------------------------------------

This patch is to define a inline function skb_csum_is_sctp(), and
also replace all places where it checks if it's a SCTP CSUM skb.
This function would be used later in many networking drivers in
the following patches.
Suggested-by: NAlexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: NXin Long <lucien.xin@gmail.com>
Reviewed-by: NAlexander Duyck <alexanderduyck@fb.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Reviewed-by: NYongxin Li <liyongxin1@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

607f00b3

14 7月, 2021 1 次提交

smp: Cleanup smp_call_function*() · 4f9ed36f

由 Peter Zijlstra 提交于 7月 09, 2021

mainline inclusion
from mainline-v5.11-rc1
commit 545b8c8d
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I3ZV2C
CVE: NA

-------------------------------------------------

Get rid of the __call_single_node union and cleanup the API a little
to avoid external code relying on the structure layout as much.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NFrederic Weisbecker <frederic@kernel.org>

conflict:
	kernel/debug/debug_core.c
	kernel/sched/core.c
	kernel/smp.c: fix csd_lock_wait_getcpu() csd->node.dst
Signed-off-by: NTong Tiangen <tongtiangen@huawei.com>
Reviewed-by: NChen Wandun <chenwandun@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

4f9ed36f

15 6月, 2021 2 次提交

net: sched: fix tx action reschedule issue with stopped queue · fc1bd7af

由 Yunsheng Lin 提交于 6月 07, 2021

stable inclusion
from stable-5.10.42
commit f9fc21e2b11eb861a903aec8009dc03d9202933a
bugzilla: 55093
CVE: NA

--------------------------------

[ Upstream commit dcad9ee9 ]

The netdev qeueue might be stopped when byte queue limit has
reached or tx hw ring is full, net_tx_action() may still be
rescheduled if STATE_MISSED is set, which consumes unnecessary
cpu without dequeuing and transmiting any skb because the
netdev queue is stopped, see qdisc_run_end().

This patch fixes it by checking the netdev queue state before
calling qdisc_run() and clearing STATE_MISSED if netdev queue is
stopped during qdisc_run(), the net_tx_action() is rescheduled
again when netdev qeueue is restarted, see netif_tx_wake_queue().

As there is time window between netif_xmit_frozen_or_stopped()
checking and STATE_MISSED clearing, between which STATE_MISSED
may set by net_tx_action() scheduled by netif_tx_wake_queue(),
so set the STATE_MISSED again if netdev queue is restarted.

Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
Reported-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fc1bd7af

net: sched: fix tx action rescheduling issue during deactivation · 006771a8

由 Yunsheng Lin 提交于 6月 07, 2021

stable inclusion
from stable-5.10.42
commit 2f23d5bcd9f89c239da83abd6270f5f0d9dd95bc
bugzilla: 55093
CVE: NA

--------------------------------

[ Upstream commit 102b55ee ]

Currently qdisc_run() checks the STATE_DEACTIVATED of lockless
qdisc before calling __qdisc_run(), which ultimately clear the
STATE_MISSED when all the skb is dequeued. If STATE_DEACTIVATED
is set before clearing STATE_MISSED, there may be rescheduling
of net_tx_action() at the end of qdisc_run_end(), see below:

CPU0(net_tx_atcion)  CPU1(__dev_xmit_skb)  CPU2(dev_deactivate)
          .                   .                     .
          .            set STATE_MISSED             .
          .           __netif_schedule()            .
          .                   .           set STATE_DEACTIVATED
          .                   .                qdisc_reset()
          .                   .                     .
          .<---------------   .              synchronize_net()
clear __QDISC_STATE_SCHED  |  .                     .
          .                |  .                     .
          .                |  .            some_qdisc_is_busy()
          .                |  .               return *false*
          .                |  .                     .
  test STATE_DEACTIVATED   |  .                     .
__qdisc_run() *not* called |  .                     .
          .                |  .                     .
   test STATE_MISS         |  .                     .
 __netif_schedule()--------|  .                     .
          .                   .                     .
          .                   .                     .

__qdisc_run() is not called by net_tx_atcion() in CPU0 because
CPU2 has set STATE_DEACTIVATED flag during dev_deactivate(), and
STATE_MISSED is only cleared in __qdisc_run(), __netif_schedule
is called at the end of qdisc_run_end(), causing tx action
rescheduling problem.

qdisc_run() called by net_tx_action() runs in the softirq context,
which should has the same semantic as the qdisc_run() called by
__dev_xmit_skb() protected by rcu_read_lock_bh(). And there is a
synchronize_net() between STATE_DEACTIVATED flag being set and
qdisc_reset()/some_qdisc_is_busy in dev_deactivate(), we can safely
bail out for the deactived lockless qdisc in net_tx_action(), and
qdisc_reset() will reset all skb not dequeued yet.

So add the rcu_read_lock() explicitly to protect the qdisc_run()
and do the STATE_DEACTIVATED checking in net_tx_action() before
calling qdisc_run_begin(). Another option is to do the checking in
the qdisc_run_end(), but it will add unnecessary overhead for
non-tx_action case, because __dev_queue_xmit() will not see qdisc
with STATE_DEACTIVATED after synchronize_net(), the qdisc with
STATE_DEACTIVATED can only be seen by net_tx_action() because of
__netif_schedule().

The STATE_DEACTIVATED checking in qdisc_run() is to avoid race
between net_tx_action() and qdisc_reset(), see:
commit d518d2ed ("net/sched: fix race between deactivation
and dequeue for NOLOCK qdisc"). As the bailout added above for
deactived lockless qdisc in net_tx_action() provides better
protection for the race without calling qdisc_run() at all, so
remove the STATE_DEACTIVATED checking in qdisc_run().

After qdisc_reset(), there is no skb in qdisc to be dequeued, so
clear the STATE_MISSED in dev_reset_queue() too.

Fixes: 6b3ba914 ("net: sched: allow qdiscs to handle locking")
Acked-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NYunsheng Lin <linyunsheng@huawei.com>
V8: Clearing STATE_MISSED before calling __netif_schedule() has
    avoid the endless rescheduling problem, but there may still
    be a unnecessary rescheduling, so adjust the commit log.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

006771a8

03 6月, 2021 1 次提交

gro: fix napi_gro_frags() Fast GRO breakage due to IP alignment check · 48c61836

由 Alexander Lobakin 提交于 5月 24, 2021

stable inclusion
from stable-5.10.37
commit da54cc2549399072b95926dbe9dc44546c297e75
bugzilla: 51868
CVE: NA

--------------------------------

[ Upstream commit 7ad18ff6 ]

Commit 38ec4944 ("gro: ensure frag0 meets IP header alignment")
did the right thing, but missed the fact that napi_gro_frags() logics
calls for skb_gro_reset_offset() *before* pulling Ethernet header
to the skb linear space.
That said, the introduced check for frag0 address being aligned to 4
always fails for it as Ethernet header is obviously 14 bytes long,
and in case with NET_IP_ALIGN its start is not aligned to 4.

Fix this by adding @nhoff argument to skb_gro_reset_offset() which
tells if an IP header is placed right at the start of frag0 or not.
This restores Fast GRO for napi_gro_frags() that became very slow
after the mentioned commit, and preserves the introduced check to
avoid silent unaligned accesses.

From v1 [0]:
 - inline tiny skb_gro_reset_offset() to let the code be optimized
   more efficively (esp. for the !NET_IP_ALIGN case) (Eric);
 - pull in Reviewed-by from Eric.

[0] https://lore.kernel.org/netdev/20210418114200.5839-1-alobakin@pm.me

Fixes: 38ec4944 ("gro: ensure frag0 meets IP header alignment")
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NAlexander Lobakin <alobakin@pm.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

48c61836

26 4月, 2021 1 次提交

gro: ensure frag0 meets IP header alignment · bb92a8a6

由 Eric Dumazet 提交于 4月 23, 2021

stable inclusion
from stable-5.10.32
commit 9143158a6bd35839ddd0cc723b1576aa8b3c632d
bugzilla: 51796

--------------------------------

commit 38ec4944 upstream.

After commit 0f6925b3 ("virtio_net: Do not pull payload in skb->head")
Guenter Roeck reported one failure in his tests using sh architecture.

After much debugging, we have been able to spot silent unaligned accesses
in inet_gro_receive()

The issue at hand is that upper networking stacks assume their header
is word-aligned. Low level drivers are supposed to reserve NET_IP_ALIGN
bytes before the Ethernet header to make that happen.

This patch hardens skb_gro_reset_offset() to not allow frag0 fast-path
if the fragment is not properly aligned.

Some arches like x86, arm64 and powerpc do not care and define NET_IP_ALIGN
as 0, this extra check will be a NOP for them.

Note that if frag0 is not used, GRO will call pskb_may_pull()
as many times as needed to pull network and transport headers.

Fixes: 0f6925b3 ("virtio_net: Do not pull payload in skb->head")
Fixes: 78a478d0 ("gro: Inline skb_gro_header and cache frag0 virtual address")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NGuenter Roeck <linux@roeck-us.net>
Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Tested-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

bb92a8a6

19 4月, 2021 2 次提交

can: dev: Move device back to init netns on owning netns delete · b04a8995

由 Martin Willi 提交于 4月 07, 2021

stable inclusion
from stable-5.10.27
commit 8dc08a2962c855f4a88923017445799474ff6446
bugzilla: 51493

--------------------------------

commit 3a5ca857 upstream.

When a non-initial netns is destroyed, the usual policy is to delete
all virtual network interfaces contained, but move physical interfaces
back to the initial netns. This keeps the physical interface visible
on the system.

CAN devices are somewhat special, as they define rtnl_link_ops even
if they are physical devices. If a CAN interface is moved into a
non-initial netns, destroying that netns lets the interface vanish
instead of moving it back to the initial netns. default_device_exit()
skips CAN interfaces due to having rtnl_link_ops set. Reproducer:

  ip netns add foo
  ip link set can0 netns foo
  ip netns delete foo

WARNING: CPU: 1 PID: 84 at net/core/dev.c:11030 ops_exit_list+0x38/0x60
CPU: 1 PID: 84 Comm: kworker/u4:2 Not tainted 5.10.19 #1
Workqueue: netns cleanup_net
[<c010e700>] (unwind_backtrace) from [<c010a1d8>] (show_stack+0x10/0x14)
[<c010a1d8>] (show_stack) from [<c086dc10>] (dump_stack+0x94/0xa8)
[<c086dc10>] (dump_stack) from [<c086b938>] (__warn+0xb8/0x114)
[<c086b938>] (__warn) from [<c086ba10>] (warn_slowpath_fmt+0x7c/0xac)
[<c086ba10>] (warn_slowpath_fmt) from [<c0629f20>] (ops_exit_list+0x38/0x60)
[<c0629f20>] (ops_exit_list) from [<c062a5c4>] (cleanup_net+0x230/0x380)
[<c062a5c4>] (cleanup_net) from [<c0142c20>] (process_one_work+0x1d8/0x438)
[<c0142c20>] (process_one_work) from [<c0142ee4>] (worker_thread+0x64/0x5a8)
[<c0142ee4>] (worker_thread) from [<c0148a98>] (kthread+0x148/0x14c)
[<c0148a98>] (kthread) from [<c0100148>] (ret_from_fork+0x14/0x2c)

To properly restore physical CAN devices to the initial netns on owning
netns exit, introduce a flag on rtnl_link_ops that can be set by drivers.
For CAN devices setting this flag, default_device_exit() considers them
non-virtual, applying the usual namespace move.

The issue was introduced in the commit mentioned below, as at that time
CAN devices did not have a dellink() operation.

Fixes: e008b5fc ("net: Simplfy default_device_exit and improve batching.")
Link: https://lore.kernel.org/r/20210302122423.872326-1-martin@strongswan.orgSigned-off-by: NMartin Willi <martin@strongswan.org>
Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: N  Weilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

b04a8995

net: check all name nodes in __dev_alloc_name · a11f1d2a

由 Jiri Bohac 提交于 4月 07, 2021

stable inclusion
from stable-5.10.27
commit 943e1583bf8a5cbcedfc4a00d92d8aac9e7e436d
bugzilla: 51493

--------------------------------

[ Upstream commit 6c015a22 ]

__dev_alloc_name(), when supplied with a name containing '%d',
will search for the first available device number to generate a
unique device name.

Since commit ff927412 ("net:
introduce name_node struct to be used in hashlist") network
devices may have alternate names.  __dev_alloc_name() does take
these alternate names into account, possibly generating a name
that is already taken and failing with -ENFILE as a result.

This demonstrates the bug:

    # rmmod dummy 2>/dev/null
    # ip link property add dev lo altname dummy0
    # modprobe dummy numdummies=1
    modprobe: ERROR: could not insert 'dummy': Too many open files in system

Instead of creating a device named dummy1, modprobe fails.

Fix this by checking all the names in the d->name_node list, not just d->name.
Signed-off-by: NJiri Bohac <jbohac@suse.cz>
Fixes: ff927412 ("net: introduce name_node struct to be used in hashlist")
Reviewed-by: NJiri Pirko <jiri@nvidia.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: N  Weilong Chen <chenweilong@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

a11f1d2a

09 4月, 2021 1 次提交

net: fix dev_ifsioc_locked() race condition · fb49a7bf

由 Cong Wang 提交于 3月 18, 2021

stable inclusion
from stable-5.10.21
commit 1fc205d9e400f069ebf30d3faa6ec2bab2cbd7b4
bugzilla: 50609

--------------------------------

commit 3b23a32a upstream.

dev_ifsioc_locked() is called with only RCU read lock, so when
there is a parallel writer changing the mac address, it could
get a partially updated mac address, as shown below:

Thread 1			Thread 2
// eth_commit_mac_addr_change()
memcpy(dev->dev_addr, addr->sa_data, ETH_ALEN);
				// dev_ifsioc_locked()
				memcpy(ifr->ifr_hwaddr.sa_data,
					dev->dev_addr,...);

Close this race condition by guarding them with a RW semaphore,
like netdev_get_name(). We can not use seqlock here as it does not
allow blocking. The writers already take RTNL anyway, so this does
not affect the slow path. To avoid bothering existing
dev_set_mac_address() callers in drivers, introduce a new wrapper
just for user-facing callers on ioctl and rtnetlink paths.

Note, bonding also changes slave mac addresses but that requires
a separate patch due to the complexity of bonding code.

Fixes: 3710becf ("net: RCU locking for simple ioctl()")
Reported-by: N"Gong, Sishuai" <sishuai@purdue.edu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: NCong Wang <cong.wang@bytedance.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

fb49a7bf

09 3月, 2021 2 次提交

net/sched: fix miss init the mru in qdisc_skb_cb · 2de3d820

由 wenxu 提交于 3月 01, 2021

stable inclusion
from stable-5.10.18
commit 496ef46dbf6dcc432597f53af7be92c6a41dabec
bugzilla: 50148

--------------------------------

[ Upstream commit aadaca9e ]

The mru in the qdisc_skb_cb should be init as 0. Only defrag packets in the
act_ct will set the value.

Fixes: 038ebb1a ("net/sched: act_ct: fix miss set mru for ovs after defrag in act_ct")
Signed-off-by: Nwenxu <wenxu@ucloud.cn>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NChen Jun <chenjun102@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

2de3d820

net: gro: do not keep too many GRO packets in napi->rx_list · b670ad4e

由 Eric Dumazet 提交于 2月 23, 2021

stable inclusion
from stable-5.10.17
commit 9e6ce473e96ba0ad63820325892373d24c1aa8aa
bugzilla: 48169

--------------------------------

commit 8dc1c444 upstream.

Commit c8079432 ("net: Fix packet reordering caused by GRO and
listified RX cooperation") had the unfortunate effect of adding
latencies in common workloads.

Before the patch, GRO packets were immediately passed to
upper stacks.

After the patch, we can accumulate quite a lot of GRO
packets (depdending on NAPI budget).

My fix is counting in napi->rx_count number of segments
instead of number of logical packets.

Fixes: c8079432 ("net: Fix packet reordering caused by GRO and listified RX cooperation")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Bisected-by: NJohn Sperbeck <jsperbeck@google.com>
Tested-by: NJian Yang <jianyang@google.com>
Cc: Maxim Mikityanskiy <maximmi@mellanox.com>
Reviewed-by: NSaeed Mahameed <saeedm@nvidia.com>
Reviewed-by: NEdward Cree <ecree.xilinx@gmail.com>
Reviewed-by: NAlexander Lobakin <alobakin@pm.me>
Link: https://lore.kernel.org/r/20210204213146.4192368-1-eric.dumazet@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

b670ad4e

08 2月, 2021 1 次提交

net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabled · 6f904b52

由 Tariq Toukan 提交于 1月 28, 2021

stable inclusion
from stable-5.10.11
commit f0f3d3e6e938d72c371e9838dad0014b1a788dbd
bugzilla: 47621

--------------------------------

commit a3eb4e9d upstream.

With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be
logically done when RXCSUM offload is off.

Fixes: 14136564 ("net: Add TLS RX offload feature")
Signed-off-by: NTariq Toukan <tariqt@nvidia.com>
Reviewed-by: NBoris Pismenny <borisp@nvidia.com>
Link: https://lore.kernel.org/r/20210117151538.9411-1-tariqt@nvidia.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Acked-by: NXie XiuQi <xiexiuqi@huawei.com>

6f904b52

09 12月, 2020 1 次提交

xdp: Remove the xdp_attachment_flags_ok() callback · 998f1729

由 Toke Høiland-Jørgensen 提交于 12月 09, 2020

Since commit 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF
programs in net_device"), the XDP program attachment info is now maintained
in the core code. This interacts badly with the xdp_attachment_flags_ok()
check that prevents unloading an XDP program with different load flags than
it was loaded with. In practice, two kinds of failures are seen:

- An XDP program loaded without specifying a mode (and which then ends up
  in driver mode) cannot be unloaded if the program mode is specified on
  unload.

- The dev_xdp_uninstall() hook always calls the driver callback with the
  mode set to the type of the program but an empty flags argument, which
  means the flags_ok() check prevents the program from being removed,
  leading to bpf prog reference leaks.

The original reason this check was added was to avoid ambiguity when
multiple programs were loaded. With the way the checks are done in the core
now, this is quite simple to enforce in the core code, so let's add a check
there and get rid of the xdp_attachment_flags_ok() callback entirely.

Fixes: 7f0a8382 ("bpf, xdp: Maintain info on attached XDP BPF programs in net_device")
Signed-off-by: NToke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/bpf/160752225751.110217.10267659521308669050.stgit@toke.dk

998f1729

25 11月, 2020 1 次提交

net, xsk: Avoid taking multiple skbuff references · 36ccdf85

由 Björn Töpel 提交于 11月 23, 2020

Commit 642e450b ("xsk: Do not discard packet when NETDEV_TX_BUSY")
addressed the problem that packets were discarded from the Tx AF_XDP
ring, when the driver returned NETDEV_TX_BUSY. Part of the fix was
bumping the skbuff reference count, so that the buffer would not be
freed by dev_direct_xmit(). A reference count larger than one means
that the skbuff is "shared", which is not the case.

If the "shared" skbuff is sent to the generic XDP receive path,
netif_receive_generic_xdp(), and pskb_expand_head() is entered the
BUG_ON(skb_shared(skb)) will trigger.

This patch adds a variant to dev_direct_xmit(), __dev_direct_xmit(),
where a user can select the skbuff free policy. This allows AF_XDP to
avoid bumping the reference count, but still keep the NETDEV_TX_BUSY
behavior.

Fixes: 642e450b ("xsk: Do not discard packet when NETDEV_TX_BUSY")
Reported-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NBjörn Töpel <bjorn.topel@intel.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20201123175600.146255-1-bjorn.topel@gmail.com

36ccdf85

25 10月, 2020 1 次提交

random32: add noise from network and scheduling activity · 3744741a

由 Willy Tarreau 提交于 8月 10, 2020

With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.

This patch restores the spirit of commit f227e3ec ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.

The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: NSedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: NWilly Tarreau <w@1wt.eu>

3744741a

19 10月, 2020 1 次提交

net: core: use list_del_init() instead of list_del() in netdev_run_todo() · 0e8b8d6a

由 Taehee Yoo 提交于 10月 15, 2020

dev->unlink_list is reused unless dev is deleted.
So, list_del() should not be used.
Due to using list_del(), dev->unlink_list can't be reused so that
dev->nested_level update logic doesn't work.
In order to fix this bug, list_del_init() should be used instead
of list_del().

Test commands:
    ip link add bond0 type bond
    ip link add bond1 type bond
    ip link set bond0 master bond1
    ip link set bond0 nomaster
    ip link set bond1 master bond0
    ip link set bond1 nomaster

Splat looks like:
[  255.750458][ T1030] ============================================
[  255.751967][ T1030] WARNING: possible recursive locking detected
[  255.753435][ T1030] 5.9.0-rc8+ #772 Not tainted
[  255.754553][ T1030] --------------------------------------------
[  255.756047][ T1030] ip/1030 is trying to acquire lock:
[  255.757304][ T1030] ffff88811782a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: dev_mc_sync_multiple+0xc2/0x150
[  255.760056][ T1030]
[  255.760056][ T1030] but task is already holding lock:
[  255.761862][ T1030] ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding]
[  255.764581][ T1030]
[  255.764581][ T1030] other info that might help us debug this:
[  255.766645][ T1030]  Possible unsafe locking scenario:
[  255.766645][ T1030]
[  255.768566][ T1030]        CPU0
[  255.769415][ T1030]        ----
[  255.770259][ T1030]   lock(&dev_addr_list_lock_key/1);
[  255.771629][ T1030]   lock(&dev_addr_list_lock_key/1);
[  255.772994][ T1030]
[  255.772994][ T1030]  *** DEADLOCK ***
[  255.772994][ T1030]
[  255.775091][ T1030]  May be due to missing lock nesting notation
[  255.775091][ T1030]
[  255.777182][ T1030] 2 locks held by ip/1030:
[  255.778299][ T1030]  #0: ffffffffb1f63250 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x2e4/0x8b0
[  255.780600][ T1030]  #1: ffff88811130a280 (&dev_addr_list_lock_key/1){+...}-{2:2}, at: bond_enslave+0x3d4d/0x43e0 [bonding]
[  255.783411][ T1030]
[  255.783411][ T1030] stack backtrace:
[  255.784874][ T1030] CPU: 7 PID: 1030 Comm: ip Not tainted 5.9.0-rc8+ #772
[  255.786595][ T1030] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  255.789030][ T1030] Call Trace:
[  255.789850][ T1030]  dump_stack+0x99/0xd0
[  255.790882][ T1030]  __lock_acquire.cold.71+0x166/0x3cc
[  255.792285][ T1030]  ? register_lock_class+0x1a30/0x1a30
[  255.793619][ T1030]  ? rcu_read_lock_sched_held+0x91/0xc0
[  255.794963][ T1030]  ? rcu_read_lock_bh_held+0xa0/0xa0
[  255.796246][ T1030]  lock_acquire+0x1b8/0x850
[  255.797332][ T1030]  ? dev_mc_sync_multiple+0xc2/0x150
[  255.798624][ T1030]  ? bond_enslave+0x3d4d/0x43e0 [bonding]
[  255.800039][ T1030]  ? check_flags+0x50/0x50
[  255.801143][ T1030]  ? lock_contended+0xd80/0xd80
[  255.802341][ T1030]  _raw_spin_lock_nested+0x2e/0x70
[  255.803592][ T1030]  ? dev_mc_sync_multiple+0xc2/0x150
[  255.804897][ T1030]  dev_mc_sync_multiple+0xc2/0x150
[  255.806168][ T1030]  bond_enslave+0x3d58/0x43e0 [bonding]
[  255.807542][ T1030]  ? __lock_acquire+0xe53/0x51b0
[  255.808824][ T1030]  ? bond_update_slave_arr+0xdc0/0xdc0 [bonding]
[  255.810451][ T1030]  ? check_chain_key+0x236/0x5e0
[  255.811742][ T1030]  ? mutex_is_locked+0x13/0x50
[  255.812910][ T1030]  ? rtnl_is_locked+0x11/0x20
[  255.814061][ T1030]  ? netdev_master_upper_dev_get+0xf/0x120
[  255.815553][ T1030]  do_setlink+0x94c/0x3040
[ ... ]

Reported-by: syzbot+4a0f7bc34e3997a6c7df@syzkaller.appspotmail.com
Fixes: 1fc70edb ("net: core: add nested_level variable in net_device")
Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
Link: https://lore.kernel.org/r/20201015162606.9377-1-ap420073@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

0e8b8d6a

14 10月, 2020 1 次提交

net: add function dev_fetch_sw_netstats for fetching pcpu_sw_netstats · 44fa32f0

由 Heiner Kallweit 提交于 10月 12, 2020

In several places the same code is used to populate rtnl_link_stats64
fields with data from pcpu_sw_netstats. Therefore factor out this code
to a new function dev_fetch_sw_netstats().

v2:
- constify argument netstats
- don't ignore netstats being NULL or an ERRPTR
- switch to EXPORT_SYMBOL_GPL
Signed-off-by: NHeiner Kallweit <hkallweit1@gmail.com>
Link: https://lore.kernel.org/r/6d16a338-52f5-df69-0020-6bc771a7d498@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>

44fa32f0

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功