- 17 12月, 2022 11 次提交
-
-
由 Jay Zhou 提交于
mainline inclusion from mainline-v5.10 commit: 3c9bd400 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=3c9bd4006bfc2dccda1823db61b3f470ef91cfaa -------------------------------- It could take kvm->mmu_lock for an extended period of time when enabling dirty log for the first time. The main cost is to clear all the D-bits of last level SPTEs. This situation can benefit from manual dirty log protect as well, which can reduce the mmu_lock time taken. The sequence is like this: 1. Initialize all the bits of the dirty bitmap to 1 when enabling dirty log for the first time 2. Only write protect the huge pages 3. KVM_GET_DIRTY_LOG returns the dirty bitmap info 4. KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level SPTEs gradually in small chunks Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment, I did some tests with a 128G windows VM and counted the time taken of memory_global_dirty_log_start, here is the numbers: VM Size Before After optimization 128G 460ms 10ms Signed-off-by: NJay Zhou <jianjay.zhou@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Remove test code because the test code is too different.] Signed-off-by: zhengchuan<zhengchuan@huawei.com>
-
由 Peter Xu 提交于
mainline inclusion from mainline-v5.10 commit: d7547c55 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=d7547c55cbe7471255ca51f14bcd4699f5eaabe5 -------------------------------- The previous KVM_CAP_MANUAL_DIRTY_LOG_PROTECT has some problem which blocks the correct usage from userspace. Obsolete the old one and introduce a new capability bit for it. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Remove test code because the test code is too different.] Signed-off-by: zhengchuan<zhengchuan@huawei.com>
-
由 Peter Xu 提交于
mainline inclusion from mainline-v5.10 commit: 53eac7a8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=53eac7a8f8cf3d7dc5ecac1946f31442f5eee5f3 -------------------------------- Just imaging the case where num_pages < BITS_PER_LONG, then the loop will be skipped while it shouldn't. Signed-off-by: NPeter Xu <peterx@redhat.com> Fixes: 2a31b9dbSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peter Xu 提交于
mainline inclusion from mainline-v5.10 commit: 4ddc9204 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=4ddc9204572c33f2eb91fbdb1d99d8078388b67d -------------------------------- kvm_dirty_bitmap_bytes() will return the size of the dirty bitmap of the memslot rather than the size of bitmap passed over from the ioctl. Here for KVM_CLEAR_DIRTY_LOG we should only copy exactly the size of bitmap that covers kvm_clear_dirty_log.num_pages. Signed-off-by: NPeter Xu <peterx@redhat.com> Cc: stable@vger.kernel.org Fixes: 2a31b9dbSigned-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jiang Biao 提交于
mainline inclusion from mainline-v5.10 commit: b8b00220 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=b8b002209c061273fd1ef7bb3c3c32301623a282 -------------------------------- is_dirty has been renamed to flush, but the comment for it is outdated. And the description about @flush parameter for kvm_clear_dirty_log_protect() is missing, add it in this patch as well. Signed-off-by: NJiang Biao <benbjiang@tencent.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.1 commit: 76d58e0f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=76d58e0f07ec203bbdfcaabd9a9fc10a5a3ed5ea -------------------------------- If a memory slot's size is not a multiple of 64 pages (256K), then the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages either requires the requested page range to go beyond memslot->npages, or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect requires log->num_pages to be both in range and aligned. To allow this case, allow log->num_pages not to be a multiple of 64 if it ends exactly on the last page of the slot. Reported-by: NPeter Xu <peterx@redhat.com> Fixes: 98938aa8 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02) Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Remove test code because the test code is too different.] Signed-off-by: zhengchuan<zhengchuan@huawei.com>
-
由 Lan Tianyu 提交于
mainline inclusion from mainline-v5.1 commit: a67794ca category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=a67794cafbc4594debf53dbe4e2a7708426f492e -------------------------------- The value of "dirty_bitmap[i]" is already check before setting its value to mask. The following check of "mask" is redundant. The check of "mask" was introduced by commit 58d2930f ("KVM: Eliminate extra function calls in kvm_get_dirty_log_protect()"), revert it. Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tomas Bortoli 提交于
mainline inclusion from mainline-v5.0 commit: 98938aa8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=98938aa8edd66dc95024d7c936a4bc315f6615ff -------------------------------- The function at issue does not fully validate the content of the structure pointed by the log parameter, though its content has just been copied from userspace and lacks validation. Fix that. Moreover, change the type of n to unsigned long as that is the type returned by kvm_dirty_bitmap_bytes(). Signed-off-by: NTomas Bortoli <tomasbortoli@gmail.com> Reported-by: syzbot+028366e52c9ace67deb3@syzkaller.appspotmail.com [Squashed the fix from Paolo. - Radim.] Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.0 commit: 2a31b9db category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=2a31b9db153530df4aa02dac8c32837bf5f47019 -------------------------------- There are two problems with KVM_GET_DIRTY_LOG. First, and less important, it can take kvm->mmu_lock for an extended period of time. Second, its user can actually see many false positives in some cases. The latter is due to a benign race like this: 1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects them. 2. The guest modifies the pages, causing them to be marked ditry. 3. Userspace actually copies the pages. 4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though they were not written to since (3). This is especially a problem for large guests, where the time between (1) and (3) can be substantial. This patch introduces a new capability which, when enabled, makes KVM_GET_DIRTY_LOG not write-protect the pages it returns. Instead, userspace has to explicitly clear the dirty log bits just before using the content of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a 64-page granularity rather than requiring to sync a full memslot; this way, the mmu_lock is taken for small amounts of time, and only a small amount of time will pass between write protection of pages and the sending of their content. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Remove test code because the test code is too different.] Signed-off-by: zhengchuan<zhengchuan@huawei.com>
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.0 commit: 8fe65a82 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=8fe65a8299f9e1f40cb95308ab7b3c4ad80bf801 -------------------------------- When manual dirty log reprotect will be enabled, kvm_get_dirty_log_protect's pointer argument will always be false on exit, because no TLB flush is needed until the manual re-protection operation. Rename it from "is_dirty" to "flush", which more accurately tells the caller what they have to do with it. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.0 commit: e5d83c74 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I66COX CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=e5d83c74a5800c2a1fa3ba982c1c4b2b39ae6db2 -------------------------------- The first such capability to be handled in virt/kvm/ will be manual dirty page reprotection. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 12月, 2022 26 次提交
-
-
由 Chen Jun 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I64Y5Y CVE: NA ------------------------------- If local_group_add_task fails in init_local_group. ida free the same id twice. init_local_group local_group_add_task // failed goto free_spg free_spg: free_sp_group_locked free_sp_group_id // free spg->id free_spg_id: free_new_spg_id // double free spg->id To fix it, return before calling free_new_spg_id. Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: Nchenweilong <chenweilong@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Baisong Zhong 提交于
stable inclusion from stable-v4.19.267 commit 730fb1ef974a13915bc7651364d8b3318891cd70 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit d3fd203f upstream. We got a syzkaller problem because of aarch64 alignment fault if KFENCE enabled. When the size from user bpf program is an odd number, like 399, 407, etc, it will cause the struct skb_shared_info's unaligned access. As seen below: BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032 Use-after-free read at 0xffff6254fffac077 (in kfence-#213): __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline] arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline] arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline] atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline] __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032 skb_clone+0xf4/0x214 net/core/skbuff.c:1481 ____bpf_clone_redirect net/core/filter.c:2433 [inline] bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420 bpf_prog_d3839dd9068ceb51+0x80/0x330 bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline] bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53 bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline] __do_sys_bpf kernel/bpf/syscall.c:4441 [inline] __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381 kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407, cache=kmalloc-512 allocated by task 15074 on cpu 0 at 1342.585390s: kmalloc include/linux/slab.h:568 [inline] kzalloc include/linux/slab.h:675 [inline] bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191 bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512 bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline] __do_sys_bpf kernel/bpf/syscall.c:4441 [inline] __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381 __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381 To fix the problem, we adjust @size so that (@size + @hearoom) is a multiple of SMP_CACHE_BYTES. So we make sure the struct skb_shared_info is aligned to a cache line. Fixes: 1cf1cae9 ("bpf: introduce BPF_PROG_TEST_RUN command") Signed-off-by: NBaisong Zhong <zhongbaisong@huawei.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Cc: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/bpf/20221102081620.1465154-1-zhongbaisong@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Eric Dumazet 提交于
stable inclusion from stable-v4.19.267 commit 650137a7c0b2892df2e5b0bc112d7b09a78c93c8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit b64085b0 upstream. macvlan should enforce a minimal mtu of 68, even at link creation. This patch avoids the current behavior (which could lead to crashes in ipv6 stack if the link is brought up) $ ip link add macvlan1 link eno1 mtu 8 type macvlan # This should fail ! $ ip link sh dev macvlan1 5: macvlan1@eno1: <BROADCAST,MULTICAST> mtu 8 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether 02:47:6c:24:74:82 brd ff:ff:ff:ff:ff:ff $ ip link set macvlan1 mtu 67 Error: mtu less than device minimum. $ ip link set macvlan1 mtu 68 $ ip link set macvlan1 mtu 8 Error: mtu less than device minimum. Fixes: 91572088 ("net: use core MTU range checking in core net infra") Reported-by: Nsyzbot <syzkaller@googlegroups.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Chuang Wang 提交于
stable inclusion from stable-v4.19.267 commit a81b44d1df1f07f00c0dcc0a0b3d2fa24a46289e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit 23569b56 ] kmemleak reports memory leaks in macvlan_common_newlink, as follows: ip link add link eth0 name .. type macvlan mode source macaddr add <MAC-ADDR> kmemleak reports: unreferenced object 0xffff8880109bb140 (size 64): comm "ip", pid 284, jiffies 4294986150 (age 430.108s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 b8 aa 5a 12 80 88 ff ff ..........Z..... 80 1b fa 0d 80 88 ff ff 1e ff ac af c7 c1 6b 6b ..............kk backtrace: [<ffffffff813e06a7>] kmem_cache_alloc_trace+0x1c7/0x300 [<ffffffff81b66025>] macvlan_hash_add_source+0x45/0xc0 [<ffffffff81b66a67>] macvlan_changelink_sources+0xd7/0x170 [<ffffffff81b6775c>] macvlan_common_newlink+0x38c/0x5a0 [<ffffffff81b6797e>] macvlan_newlink+0xe/0x20 [<ffffffff81d97f8f>] __rtnl_newlink+0x7af/0xa50 [<ffffffff81d98278>] rtnl_newlink+0x48/0x70 ... In the scenario where the macvlan mode is configured as 'source', macvlan_changelink_sources() will be execured to reconfigure list of remote source mac addresses, at the same time, if register_netdevice() return an error, the resource generated by macvlan_changelink_sources() is not cleaned up. Using this patch, in the case of an error, it will execute macvlan_flush_sources() to ensure that the resource is cleaned up. Fixes: aa5fd0fb ("driver: macvlan: Destroy new macvlan port if macvlan_common_newlink failed.") Signed-off-by: NChuang Wang <nashuiliang@gmail.com> Link: https://lore.kernel.org/r/20221109090735.690500-1-nashuiliang@gmail.comSigned-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Alexander Potapenko 提交于
stable inclusion from stable-v4.19.267 commit 6d26d0587abccb9835382a0b53faa7b9b1cd83e3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit c23fb2c8 ] When copying a `struct ifaddrlblmsg` to the network, __ifal_reserved remained uninitialized, resulting in a 1-byte infoleak: BUG: KMSAN: kernel-network-infoleak in __netdev_start_xmit ./include/linux/netdevice.h:4841 __netdev_start_xmit ./include/linux/netdevice.h:4841 netdev_start_xmit ./include/linux/netdevice.h:4857 xmit_one net/core/dev.c:3590 dev_hard_start_xmit+0x1dc/0x800 net/core/dev.c:3606 __dev_queue_xmit+0x17e8/0x4350 net/core/dev.c:4256 dev_queue_xmit ./include/linux/netdevice.h:3009 __netlink_deliver_tap_skb net/netlink/af_netlink.c:307 __netlink_deliver_tap+0x728/0xad0 net/netlink/af_netlink.c:325 netlink_deliver_tap net/netlink/af_netlink.c:338 __netlink_sendskb net/netlink/af_netlink.c:1263 netlink_sendskb+0x1d9/0x200 net/netlink/af_netlink.c:1272 netlink_unicast+0x56d/0xf50 net/netlink/af_netlink.c:1360 nlmsg_unicast ./include/net/netlink.h:1061 rtnl_unicast+0x5a/0x80 net/core/rtnetlink.c:758 ip6addrlbl_get+0xfad/0x10f0 net/ipv6/addrlabel.c:628 rtnetlink_rcv_msg+0xb33/0x1570 net/core/rtnetlink.c:6082 ... Uninit was created at: slab_post_alloc_hook+0x118/0xb00 mm/slab.h:742 slab_alloc_node mm/slub.c:3398 __kmem_cache_alloc_node+0x4f2/0x930 mm/slub.c:3437 __do_kmalloc_node mm/slab_common.c:954 __kmalloc_node_track_caller+0x117/0x3d0 mm/slab_common.c:975 kmalloc_reserve net/core/skbuff.c:437 __alloc_skb+0x27a/0xab0 net/core/skbuff.c:509 alloc_skb ./include/linux/skbuff.h:1267 nlmsg_new ./include/net/netlink.h:964 ip6addrlbl_get+0x490/0x10f0 net/ipv6/addrlabel.c:608 rtnetlink_rcv_msg+0xb33/0x1570 net/core/rtnetlink.c:6082 netlink_rcv_skb+0x299/0x550 net/netlink/af_netlink.c:2540 rtnetlink_rcv+0x26/0x30 net/core/rtnetlink.c:6109 netlink_unicast_kernel net/netlink/af_netlink.c:1319 netlink_unicast+0x9ab/0xf50 net/netlink/af_netlink.c:1345 netlink_sendmsg+0xebc/0x10f0 net/netlink/af_netlink.c:1921 ... This patch ensures that the reserved field is always initialized. Reported-by: syzbot+3553517af6020c4f2813f1003fe76ef3cbffe98d@syzkaller.appspotmail.com Fixes: 2a8cc6c8 ("[IPV6] ADDRCONF: Support RFC3484 configurable address selection policy table.") Signed-off-by: NAlexander Potapenko <glider@google.com> Reviewed-by: NDavid Ahern <dsahern@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Jiri Benc 提交于
stable inclusion from stable-v4.19.267 commit bd5362e58721e4d0d1a37796593bd6e51536ce7a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit 9e4b7a99 ] Since commit 3dcbdb13 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list"), it is allowed to change gso_size of a GRO packet. However, that commit assumes that "checking the first list_skb member suffices; i.e if either of the list_skb members have non head_frag head, then the first one has too". It turns out this assumption does not hold. We've seen BUG_ON being hit in skb_segment when skbs on the frag_list had differing head_frag with the vmxnet3 driver. This happens because __netdev_alloc_skb and __napi_alloc_skb can return a skb that is page backed or kmalloced depending on the requested size. As the result, the last small skb in the GRO packet can be kmalloced. There are three different locations where this can be fixed: (1) We could check head_frag in GRO and not allow GROing skbs with different head_frag. However, that would lead to performance regression on normal forward paths with unmodified gso_size, where !head_frag in the last packet is not a problem. (2) Set a flag in bpf_skb_net_grow and bpf_skb_net_shrink indicating that NETIF_F_SG is undesirable. That would need to eat a bit in sk_buff. Furthermore, that flag can be unset when all skbs on the frag_list are page backed. To retain good performance, bpf_skb_net_grow/shrink would have to walk the frag_list. (3) Walk the frag_list in skb_segment when determining whether NETIF_F_SG should be cleared. This of course slows things down. This patch implements (3). To limit the performance impact in skb_segment, the list is walked only for skbs with SKB_GSO_DODGY set that have gso_size changed. Normal paths thus will not hit it. We could check only the last skb but since we need to walk the whole list anyway, let's stay on the safe side. Fixes: 3dcbdb13 ("net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list") Signed-off-by: NJiri Benc <jbenc@redhat.com> Reviewed-by: NWillem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/e04426a6a91baf4d1081e1b478c82b5de25fdf21.1667407944.git.jbenc@redhat.comSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Kuniyuki Iwashima 提交于
stable inclusion from stable-v4.19.265 commit 7162f05f1f0f2b9554ea207842a1389c067cc1db category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 11052589 upstream. Commit e21145a9 ("ipv4: namespacify ip_early_demux sysctl knob") made it possible to enable/disable early_demux on a per-netns basis. Then, we introduced two knobs, tcp_early_demux and udp_early_demux, to switch it for TCP/UDP in commit dddb64bc ("net: Add sysctl to toggle early demux for tcp and udp"). However, the .proc_handler() was wrong and actually disabled us from changing the behaviour in each netns. We can execute early_demux if net.ipv4.ip_early_demux is on and each proto .early_demux() handler is not NULL. When we toggle (tcp|udp)_early_demux, the change itself is saved in each netns variable, but the .early_demux() handler is a global variable, so the handler is switched based on the init_net's sysctl variable. Thus, netns (tcp|udp)_early_demux knobs have nothing to do with the logic. Whether we CAN execute proto .early_demux() is always decided by init_net's sysctl knob, and whether we DO it or not is by each netns ip_early_demux knob. This patch namespacifies (tcp|udp)_early_demux again. For now, the users of the .early_demux() handler are TCP and UDP only, and they are called directly to avoid retpoline. So, we can remove the .early_demux() handler from inet6?_protos and need not dereference them in ip6?_rcv_finish_core(). If another proto needs .early_demux(), we can restore it at that time. Fixes: dddb64bc ("net: Add sysctl to toggle early demux for tcp and udp") Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com> Link: https://lore.kernel.org/r/20220713175207.7727-1-kuniyu@amazon.comSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Zhengchao Shao 提交于
stable inclusion from stable-v4.19.265 commit 83fbf246ced54dadd7b9adc2a16efeff30ba944d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit 768b3c74 ] During the initialization of ip6_route_net_init_late(), if file ipv6_route or rt6_stats fails to be created, the initialization is successful by default. Therefore, the ipv6_route or rt6_stats file doesn't be found during the remove in ip6_route_net_exit_late(). It will cause WRNING. The following is the stack information: name 'rt6_stats' WARNING: CPU: 0 PID: 9 at fs/proc/generic.c:712 remove_proc_entry+0x389/0x460 Modules linked in: Workqueue: netns cleanup_net RIP: 0010:remove_proc_entry+0x389/0x460 PKRU: 55555554 Call Trace: <TASK> ops_exit_list+0xb0/0x170 cleanup_net+0x4ea/0xb00 process_one_work+0x9bf/0x1710 worker_thread+0x665/0x1080 kthread+0x2e4/0x3a0 ret_from_fork+0x1f/0x30 </TASK> Fixes: cdb18761 ("[NETNS][IPV6] route6 - create route6 proc files for the namespace") Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20221102020610.351330-1-shaozhengchao@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Chen Zhongjin 提交于
stable inclusion from stable-v4.19.265 commit b736592de2aa53aee2d48d6b129bc0c892007bbe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit f8017317 ] When IPv6 module gets initialized but hits an error in the middle, kenel panic with: KASAN: null-ptr-deref in range [0x0000000000000598-0x000000000000059f] CPU: 1 PID: 361 Comm: insmod Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) RIP: 0010:__neigh_ifdown.isra.0+0x24b/0x370 RSP: 0018:ffff888012677908 EFLAGS: 00000202 ... Call Trace: <TASK> neigh_table_clear+0x94/0x2d0 ndisc_cleanup+0x27/0x40 [ipv6] inet6_init+0x21c/0x2cb [ipv6] do_one_initcall+0xd3/0x4d0 do_init_module+0x1ae/0x670 ... Kernel panic - not syncing: Fatal exception When ipv6 initialization fails, it will try to cleanup and calls: neigh_table_clear() neigh_ifdown(tbl, NULL) pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev == NULL)) # dev_net(NULL) triggers null-ptr-deref. Fix it by passing NULL to pneigh_queue_purge() in neigh_ifdown() if dev is NULL, to make kernel not panic immediately. Fixes: 66ba215c ("neigh: fix possible DoS due to net iface start/stop loop") Signed-off-by: NChen Zhongjin <chenzhongjin@huawei.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Reviewed-by: NDenis V. Lunev <den@openvz.org> Link: https://lore.kernel.org/r/20221101121552.21890-1-chenzhongjin@huawei.comSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Neal Cardwell 提交于
stable inclusion from stable-v4.19.264 commit 633da7b30b240000b1d9b690e43848406a0d060f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit 3d2af9cc ] This commit fixes a bug that can cause a TCP data sender to repeatedly defer RTOs when encountering SACK reneging. The bug is that when we're in fast recovery in a scenario with SACK reneging, every time we get an ACK we call tcp_check_sack_reneging() and it can note the apparent SACK reneging and rearm the RTO timer for srtt/2 into the future. In some SACK reneging scenarios that can happen repeatedly until the receive window fills up, at which point the sender can't send any more, the ACKs stop arriving, and the RTO fires at srtt/2 after the last ACK. But that can take far too long (O(10 secs)), since the connection is stuck in fast recovery with a low cwnd that cannot grow beyond ssthresh, even if more bandwidth is available. This fix changes the logic in tcp_check_sack_reneging() to only rearm the RTO timer if data is cumulatively ACKed, indicating forward progress. This avoids this kind of nearly infinite loop of RTO timer re-arming. In addition, this meets the goals of tcp_check_sack_reneging() in handling Windows TCP behavior that looks temporarily like SACK reneging but is not really. Many thanks to Jakub Kicinski and Neil Spring, who reported this issue and provided critical packet traces that enabled root-causing this issue. Also, many thanks to Jakub Kicinski for testing this fix. Fixes: 5ae344c9 ("tcp: reduce spurious retransmits due to transient SACK reneging") Reported-by: NJakub Kicinski <kuba@kernel.org> Reported-by: NNeil Spring <ntspring@fb.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Tested-by: NJakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20221021170821.1093930-1-ncardwell.kernel@gmail.comSigned-off-by: NJakub Kicinski <kuba@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Zhengchao Shao 提交于
stable inclusion from stable-v4.19.264 commit 5a2ea549be94924364f6911227d99be86e8cf34a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit d266935a ] When the ops_init() interface is invoked to initialize the net, but ops->init() fails, data is released. However, the ptr pointer in net->gen is invalid. In this case, when nfqnl_nf_hook_drop() is invoked to release the net, invalid address access occurs. The process is as follows: setup_net() ops_init() data = kzalloc(...) ---> alloc "data" net_assign_generic() ---> assign "date" to ptr in net->gen ... ops->init() ---> failed ... kfree(data); ---> ptr in net->gen is invalid ... ops_exit_list() ... nfqnl_nf_hook_drop() *q = nfnl_queue_pernet(net) ---> q is invalid The following is the Call Trace information: BUG: KASAN: use-after-free in nfqnl_nf_hook_drop+0x264/0x280 Read of size 8 at addr ffff88810396b240 by task ip/15855 Call Trace: <TASK> dump_stack_lvl+0x8e/0xd1 print_report+0x155/0x454 kasan_report+0xba/0x1f0 nfqnl_nf_hook_drop+0x264/0x280 nf_queue_nf_hook_drop+0x8b/0x1b0 __nf_unregister_net_hook+0x1ae/0x5a0 nf_unregister_net_hooks+0xde/0x130 ops_exit_list+0xb0/0x170 setup_net+0x7ac/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 </TASK> Allocated by task 15855: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0xa1/0xb0 __kmalloc+0x49/0xb0 ops_init+0xe7/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 15855: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 ____kasan_slab_free+0x155/0x1b0 slab_free_freelist_hook+0x11b/0x220 __kmem_cache_free+0xa4/0x360 ops_init+0xb9/0x410 setup_net+0x5aa/0xbd0 copy_net_ns+0x2e6/0x6b0 create_new_namespaces+0x382/0xa50 unshare_nsproxy_namespaces+0xa6/0x1c0 ksys_unshare+0x3a4/0x7e0 __x64_sys_unshare+0x2d/0x40 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: f875bae0 ("net: Automatically allocate per namespace data.") Signed-off-by: NZhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Ilpo Järvinen 提交于
stable inclusion from stable-v4.19.267 commit 40f5fa26c11bca5c947d218cc4fe6e0c64932070 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 1980860e upstream. Returning true from handle_rx_dma() without flushing DMA first creates a data ordering hazard. If DMA Rx has handled any character at the point when RLSI occurs, the non-DMA path handles any pending characters jumping them ahead of those characters that are pending under DMA. Fixes: 75df022b ("serial: 8250_dma: Fix RX handling") Cc: <stable@vger.kernel.org> Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221108121952.5497-5-ilpo.jarvinen@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Ilpo Järvinen 提交于
stable inclusion from stable-v4.19.267 commit 62cda857457c7de0922852d54d69b140bd6eeb7e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit a931237c upstream. DW UART sometimes triggers IIR_RDI during DMA Rx when IIR_RX_TIMEOUT should have been triggered instead. Since IIR_RDI has higher priority than IIR_RX_TIMEOUT, this causes the Rx to hang into interrupt loop. The problem seems to occur at least with some combinations of small-sized transfers (I've reproduced the problem on Elkhart Lake PSE UARTs). If there's already an on-going Rx DMA and IIR_RDI triggers, fall graciously back to non-DMA Rx. That is, behave as if IIR_RX_TIMEOUT had occurred. 8250_omap already considers IIR_RDI similar to this change so its nothing unheard of. Fixes: 75df022b ("serial: 8250_dma: Fix RX handling") Cc: <stable@vger.kernel.org> Co-developed-by: NSrikanth Thokala <srikanth.thokala@intel.com> Signed-off-by: NSrikanth Thokala <srikanth.thokala@intel.com> Co-developed-by: NAman Kumar <aman.kumar@intel.com> Signed-off-by: NAman Kumar <aman.kumar@intel.com> Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@linux.intel.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221108121952.5497-2-ilpo.jarvinen@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Gaosheng Cui 提交于
stable inclusion from stable-v4.19.265 commit 90577bcc01c4188416a47269f8433f70502abe98 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 8cf0a1bc upstream. In cap_inode_getsecurity(), we will use vfs_getxattr_alloc() to complete the memory allocation of tmpbuf, if we have completed the memory allocation of tmpbuf, but failed to call handler->get(...), there will be a memleak in below logic: |-- ret = (int)vfs_getxattr_alloc(mnt_userns, ...) | /* ^^^ alloc for tmpbuf */ |-- value = krealloc(*xattr_value, error + 1, flags) | /* ^^^ alloc memory */ |-- error = handler->get(handler, ...) | /* error! */ |-- *xattr_value = value | /* xattr_value is &tmpbuf (memory leak!) */ So we will try to free(tmpbuf) after vfs_getxattr_alloc() fails to fix it. Cc: stable@vger.kernel.org Fixes: 8db6c34f ("Introduce v3 namespaced file capabilities") Signed-off-by: NGaosheng Cui <cuigaosheng1@huawei.com> Acked-by: NSerge Hallyn <serge@hallyn.com> [PM: subject line and backtrace tweaks] Signed-off-by: NPaul Moore <paul@paul-moore.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Arnd Bergmann 提交于
stable inclusion from stable-v4.19.191 commit 2f34dd12fd7a28888286924d74c0313532bc52d8 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 82e5d8cc upstream. gcc-11 introdces a harmless warning for cap_inode_getsecurity: security/commoncap.c: In function ‘cap_inode_getsecurity’: security/commoncap.c:440:33: error: ‘memcpy’ reading 16 bytes from a region of size 0 [-Werror=stringop-overread] 440 | memcpy(&nscap->data, &cap->data, sizeof(__le32) * 2 * VFS_CAP_U32); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The problem here is that tmpbuf is initialized to NULL, so gcc assumes it is not accessible unless it gets set by vfs_getxattr_alloc(). This is a legitimate warning as far as I can tell, but the code is correct since it correctly handles the error when that function fails. Add a separate NULL check to tell gcc about it as well. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NChristian Brauner <christian.brauner@ubuntu.com> Signed-off-by: NJames Morris <jamorris@linux.microsoft.com> Cc: Andrey Zhizhikin <andrey.z@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Daniil Tatianin 提交于
stable inclusion from stable-v4.19.267 commit 455ea324770205525cbc13f155806a5346794339 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 56f4ca0a upstream. rb_head_page_deactivate() expects cpu_buffer to contain a valid list of ->pages, so verify that the list is actually present before calling it. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Link: https://lkml.kernel.org/r/20221114143129.3534443-1-d-tatianin@yandex-team.ru Cc: stable@vger.kernel.org Fixes: 77ae365e ("ring-buffer: make lockless") Signed-off-by: NDaniil Tatianin <d-tatianin@yandex-team.ru> Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Xiu Jianfeng 提交于
stable inclusion from stable-v4.19.267 commit b5bfc61f541d3f092b13dedcfe000d86eb8e133c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 19ba6c8a upstream. The @ftrace_mod is allocated by kzalloc(), so both the members {prev,next} of @ftrace_mode->list are NULL, it's not a valid state to call list_del(). If kstrdup() for @ftrace_mod->{func|module} fails, it goes to @out_free tag and calls free_ftrace_mod() to destroy @ftrace_mod, then list_del() will write prev->next and next->prev, where null pointer dereference happens. BUG: kernel NULL pointer dereference, address: 0000000000000008 Oops: 0002 [#1] PREEMPT SMP NOPTI Call Trace: <TASK> ftrace_mod_callback+0x20d/0x220 ? do_filp_open+0xd9/0x140 ftrace_process_regex.isra.51+0xbf/0x130 ftrace_regex_write.isra.52.part.53+0x6e/0x90 vfs_write+0xee/0x3a0 ? __audit_filter_op+0xb1/0x100 ? auditd_test_task+0x38/0x50 ksys_write+0xa5/0xe0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd Kernel panic - not syncing: Fatal exception So call INIT_LIST_HEAD() to initialize the list member to fix this issue. Link: https://lkml.kernel.org/r/20221116015207.30858-1-xiujianfeng@huawei.com Cc: stable@vger.kernel.org Fixes: 673feb9d ("ftrace: Add :mod: caching infrastructure to trace_array") Signed-off-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Wang Wensheng 提交于
stable inclusion from stable-v4.19.267 commit d110bb57a7e9831465aa3abb6c0d1cc658b05fbe category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit bcea02b0 upstream. If we can't allocate this size, try something smaller with half of the size. Its order should be decreased by one instead of divided by two. Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com Cc: <mhiramat@kernel.org> Cc: <mark.rutland@arm.com> Cc: stable@vger.kernel.org Fixes: a7900875 ("ftrace: Allocate the mcount record pages as groups") Signed-off-by: NWang Wensheng <wangwensheng4@huawei.com> Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Li Qiang 提交于
stable inclusion from stable-v4.19.265 commit d608ed66abfaccc233404be2583ab89c37e560fc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 4a6f316d upstream. In aggregate kprobe case, when arm_kprobe failed, we need set the kp->flags with KPROBE_FLAG_DISABLED again. If not, the 'kp' kprobe will been considered as enabled but it actually not enabled. Link: https://lore.kernel.org/all/20220902155820.34755-1-liq3ea@163.com/ Fixes: 12310e34 ("kprobes: Propagate error from arm_kprobe_ftrace()") Cc: stable@vger.kernel.org Signed-off-by: NLi Qiang <liq3ea@163.com> Acked-by: NMasami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: NMasami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Alexander Potapenko 提交于
stable inclusion from stable-v4.19.267 commit 8a5be2948f350d34b1f6acb9ca3be4c89359a057 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 1468c6f4 upstream. Functions implementing the a_ops->write_end() interface accept the `void *fsdata` parameter that is supposed to be initialized by the corresponding a_ops->write_begin() (which accepts `void **fsdata`). However not all a_ops->write_begin() implementations initialize `fsdata` unconditionally, so it may get passed uninitialized to a_ops->write_end(), resulting in undefined behavior. Fix this by initializing fsdata with NULL before the call to write_begin(), rather than doing so in all possible a_ops implementations. This patch covers only the following cases found by running x86 KMSAN under syzkaller: - generic_perform_write() - cont_expand_zero() and generic_cont_expand_simple() - page_symlink() Other cases of passing uninitialized fsdata may persist in the codebase. Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.comSigned-off-by: NAlexander Potapenko <glider@google.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Christoph Hellwig <hch@lst.de> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Eric Biggers <ebiggers@kernel.org> Cc: Eric Dumazet <edumazet@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Ilya Leoshkevich <iii@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michael S. Tsirkin <mst@redhat.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Petr Mladek <pmladek@suse.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Zhang Xiaoxu 提交于
stable inclusion from stable-v4.19.265 commit 86ce0e93cf6fb4d0c447323ac66577c642628b9d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit 7e843672 ] If one of the slot allocate failed, should cleanup all the other allocated slots, otherwise, the allocated slots will leak: unreferenced object 0xffff8881115aa100 (size 64): comm ""mount.nfs"", pid 679, jiffies 4294744957 (age 115.037s) hex dump (first 32 bytes): 00 cc 19 73 81 88 ff ff 00 a0 5a 11 81 88 ff ff ...s......Z..... 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000007a4c434a>] nfs4_find_or_create_slot+0x8e/0x130 [<000000005472a39c>] nfs4_realloc_slot_table+0x23f/0x270 [<00000000cd8ca0eb>] nfs40_init_client+0x4a/0x90 [<00000000128486db>] nfs4_init_client+0xce/0x270 [<000000008d2cacad>] nfs4_set_client+0x1a2/0x2b0 [<000000000e593b52>] nfs4_create_server+0x300/0x5f0 [<00000000e4425dd2>] nfs4_try_get_tree+0x65/0x110 [<00000000d3a6176f>] vfs_get_tree+0x41/0xf0 [<0000000016b5ad4c>] path_mount+0x9b3/0xdd0 [<00000000494cae71>] __x64_sys_mount+0x190/0x1d0 [<000000005d56bdec>] do_syscall_64+0x35/0x80 [<00000000687c9ae4>] entry_SYSCALL_64_after_hwframe+0x46/0xb0 Fixes: abf79bb3 ("NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking") Signed-off-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com> Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Christian A. Ehrhardt 提交于
stable inclusion from stable-v4.19.264 commit 028cf780743eea79abffa7206b9dcfc080ad3546 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 4abc9965 upstream. Syzkaller managed to trigger concurrent calls to kernfs_remove_by_name_ns() for the same file resulting in a KASAN detected use-after-free. The race occurs when the root node is freed during kernfs_drain(). To prevent this acquire an additional reference for the root of the tree that is removed before calling __kernfs_remove(). Found by syzkaller with the following reproducer (slab_nomerge is required): syz_mount_image$ext4(0x0, &(0x7f0000000100)='./file0\x00', 0x100000, 0x0, 0x0, 0x0, 0x0) r0 = openat(0xffffffffffffff9c, &(0x7f0000000080)='/proc/self/exe\x00', 0x0, 0x0) close(r0) pipe2(&(0x7f0000000140)={0xffffffffffffffff, <r1=>0xffffffffffffffff}, 0x800) mount$9p_fd(0x0, &(0x7f0000000040)='./file0\x00', &(0x7f00000000c0), 0x408, &(0x7f0000000280)={'trans=fd,', {'rfdno', 0x3d, r0}, 0x2c, {'wfdno', 0x3d, r1}, 0x2c, {[{@cache_loose}, {@mmap}, {@loose}, {@loose}, {@mmap}], [{@mask={'mask', 0x3d, '^MAY_EXEC'}}, {@fsmagic={'fsmagic', 0x3d, 0x10001}}, {@dont_hash}]}}) Sample report: ================================================================== BUG: KASAN: use-after-free in kernfs_type include/linux/kernfs.h:335 [inline] BUG: KASAN: use-after-free in kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline] BUG: KASAN: use-after-free in __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369 Read of size 2 at addr ffff8880088807f0 by task syz-executor.2/857 CPU: 0 PID: 857 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x6e/0x91 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x5e/0x5e5 mm/kasan/report.c:433 kasan_report+0xa3/0x130 mm/kasan/report.c:495 kernfs_type include/linux/kernfs.h:335 [inline] kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline] __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369 __kernfs_remove fs/kernfs/dir.c:1356 [inline] kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589 sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f725f983aed Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f725f0f7028 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 RAX: ffffffffffffffda RBX: 00007f725faa3f80 RCX: 00007f725f983aed RDX: 00000000200000c0 RSI: 0000000020000040 RDI: 0000000000000000 RBP: 00007f725f9f419c R08: 0000000020000280 R09: 0000000000000000 R10: 0000000000000408 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000006 R14: 00007f725faa3f80 R15: 00007f725f0d7000 </TASK> Allocated by task 855: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track mm/kasan/common.c:45 [inline] set_alloc_info mm/kasan/common.c:437 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:470 kasan_slab_alloc include/linux/kasan.h:224 [inline] slab_post_alloc_hook mm/slab.h:727 [inline] slab_alloc_node mm/slub.c:3243 [inline] slab_alloc mm/slub.c:3251 [inline] __kmem_cache_alloc_lru mm/slub.c:3258 [inline] kmem_cache_alloc+0xbf/0x200 mm/slub.c:3268 kmem_cache_zalloc include/linux/slab.h:723 [inline] __kernfs_new_node+0xd4/0x680 fs/kernfs/dir.c:593 kernfs_new_node fs/kernfs/dir.c:655 [inline] kernfs_create_dir_ns+0x9c/0x220 fs/kernfs/dir.c:1010 sysfs_create_dir_ns+0x127/0x290 fs/sysfs/dir.c:59 create_dir lib/kobject.c:63 [inline] kobject_add_internal+0x24a/0x8d0 lib/kobject.c:223 kobject_add_varg lib/kobject.c:358 [inline] kobject_init_and_add+0x101/0x160 lib/kobject.c:441 sysfs_slab_add+0x156/0x1e0 mm/slub.c:5954 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 857: kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38 kasan_set_track+0x21/0x30 mm/kasan/common.c:45 kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:370 ____kasan_slab_free mm/kasan/common.c:367 [inline] ____kasan_slab_free mm/kasan/common.c:329 [inline] __kasan_slab_free+0x108/0x190 mm/kasan/common.c:375 kasan_slab_free include/linux/kasan.h:200 [inline] slab_free_hook mm/slub.c:1754 [inline] slab_free_freelist_hook mm/slub.c:1780 [inline] slab_free mm/slub.c:3534 [inline] kmem_cache_free+0x9c/0x340 mm/slub.c:3551 kernfs_put.part.0+0x2b2/0x520 fs/kernfs/dir.c:547 kernfs_put+0x42/0x50 fs/kernfs/dir.c:521 __kernfs_remove.part.0+0x72d/0x960 fs/kernfs/dir.c:1407 __kernfs_remove fs/kernfs/dir.c:1356 [inline] kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589 sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943 __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899 create_cache mm/slab_common.c:229 [inline] kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335 p9_client_create+0xd4d/0x1190 net/9p/client.c:993 v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408 v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126 legacy_get_tree+0xf1/0x200 fs/fs_context.c:610 vfs_get_tree+0x85/0x2e0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x675/0x1d00 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x282/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888008880780 which belongs to the cache kernfs_node_cache of size 128 The buggy address is located 112 bytes inside of 128-byte region [ffff888008880780, ffff888008880800) The buggy address belongs to the physical page: page:00000000732833f8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8880 flags: 0x100000000000200(slab|node=0|zone=1) raw: 0100000000000200 0000000000000000 dead000000000122 ffff888001147280 raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888008880680: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888008880700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc >ffff888008880780: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888008880800: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb ffff888008880880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ================================================================== Acked-by: NTejun Heo <tj@kernel.org> Cc: stable <stable@kernel.org> # -rc3 Signed-off-by: NChristian A. Ehrhardt <lk@c--e.de> Link: https://lore.kernel.org/r/20220913121723.691454-1-lk@c--e.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Rik van Riel 提交于
stable inclusion from stable-v4.19.264 commit 2b35432d324898ec41beb27031d2a1a864a4d40e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit 12df140f upstream. The h->*_huge_pages counters are protected by the hugetlb_lock, but alloc_huge_page has a corner case where it can decrement the counter outside of the lock. This could lead to a corrupted value of h->resv_huge_pages, which we have observed on our systems. Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a potential race. Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com Fixes: a88c7695 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count") Signed-off-by: NRik van Riel <riel@surriel.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Glen McCready <gkmccready@meta.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Muchun Song <songmuchun@bytedance.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Seth Jenkins 提交于
stable inclusion from stable-v4.19.264 commit dbe863bce7679c7f5ec0e993d834fe16c5e687b5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- Commit 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") introduced a null-deref if there are no vma's in the task in show_smaps_rollup. Fixes: 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value seq_file") Signed-off-by: NSeth Jenkins <sethjenkins@google.com> Reviewed-by: NAlexey Dobriyan <adobriyan@gmail.com> Tested-by: NAlexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Xia Fukun 提交于
stable inclusion from stable-v4.19.267 commit 93d9cef55f8fe463e3b9f6c73c7a32619222c657 category: bugfix bugzilla: 187828, https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- [ Upstream commit a382f8fe ] These are indeed "should not happen" situations, but it turns out recent changes made the 'task_is_stopped_or_trace()' case trigger (fix for that exists, is pending more testing), and the BUG_ON() makes it unnecessarily hard to actually debug for no good reason. It's been that way for a long time, but let's make it clear: BUG_ON() is not good for debugging, and should never be used in situations where you could just say "this shouldn't happen, but we can continue". Use WARN_ON_ONCE() instead to make sure it gets logged, and then just continue running. Instead of making the system basically unusuable because you crashed the machine while potentially holding some very core locks (eg this function is commonly called while holding 'tasklist_lock' for writing). Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NXia Fukun <xiafukun@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Xia Fukun 提交于
stable inclusion from stable-v4.19.267 commit 33d2f83e3f2c1fdabb365d25bed3aa630041cbc0 category: bugfix bugzilla: 188002, https://gitee.com/openeuler/kernel/issues/I63UEU CVE: NA -------------------------------- commit fc82bbf4 upstream. This is another old BUG_ON() that just shouldn't exist (see also commit a382f8fe: "signal handling: don't use BUG_ON() for debugging"). In fact, as Matthew Wilcox points out, this condition shouldn't really even result in a warning, since a negative id allocation result is just a normal allocation failure: "I wonder if we should even warn here -- sure, the caller is trying to free something that wasn't allocated, but we don't warn for kfree(NULL)" and goes on to point out how that current error check is only causing people to unnecessarily do their own index range checking before freeing it. This was noted by Itay Iellin, because the bluetooth HCI socket cookie code does *not* do that range checking, and ends up just freeing the error case too, triggering the BUG_ON(). The HCI code requires CAP_NET_RAW, and seems to just result in an ugly splat, but there really is no reason to BUG_ON() here, and we have generally striven for allocation models where it's always ok to just do free(alloc()); even if the allocation were to fail for some random reason (usually obviously that "random" reason being some resource limit). Fixes: 88eca020 ("ida: simplified functions for id allocation") Reported-by: NItay Iellin <ieitayie@gmail.com> Suggested-by: NMatthew Wilcox <willy@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NXia Fukun <xiafukun@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
- 06 12月, 2022 1 次提交
-
-
由 openeuler-ci-bot 提交于
Merge Pull Request from: @leoliu-oc When the processor is idle,low-power idle states (C-states) can be used to save power. For Zhaoxin processors,there are two methods to enter idle states. One is HLT instruction and legacy method of I/O reads from the CPI-defined register (known as P_LVLx),the other one is MWAIT instruction with idle states hints. Default for legacy operating system,HLT and P_LVLx I/O reads are used for Zhaoxin Processors to enter idle states, but we have checked on some Zhaoxin platform that MWAIT instruction is more efficient than P_LVLx I/O reads and HLT, so we add MWAIT Cx support for Zhaoxin Processors. ### Issue https://gitee.com/openeuler/kernel/issues/I62TOM ### Test N/A ### Known Issue N/A ### Default config change N/A Link:https://gitee.com/openeuler/kernel/pulls/272 Reviewed-by: Laibin Qiu <qiulaibin@huawei.com> Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
-
- 05 12月, 2022 2 次提交
-
-
由 Sungwoo Kim 提交于
maillist inclusion category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I63D3E CVE: CVE-2022-45934 -------------------------------- By keep sending L2CAP_CONF_REQ packets, chan->num_conf_rsp increases multiple times and eventually it will wrap around the maximum number (i.e., 255). This patch prevents this by adding a boundary check with L2CAP_MAX_CONF_RSP Btmon log: Bluetooth monitor ver 5.64 = Note: Linux version 6.1.0-rc2 (x86_64) 0.264594 = Note: Bluetooth subsystem version 2.22 0.264636 @ MGMT Open: btmon (privileged) version 1.22 {0x0001} 0.272191 = New Index: 00:00:00:00:00:00 (Primary,Virtual,hci0) [hci0] 13.877604 @ RAW Open: 9496 (privileged) version 2.22 {0x0002} 13.890741 = Open Index: 00:00:00:00:00:00 [hci0] 13.900426 (...) > ACL Data RX: Handle 200 flags 0x00 dlen 1033 #32 [hci0] 14.273106 invalid packet size (12 != 1033) 08 00 01 00 02 01 04 00 01 10 ff ff ............ > ACL Data RX: Handle 200 flags 0x00 dlen 1547 #33 [hci0] 14.273561 invalid packet size (14 != 1547) 0a 00 01 00 04 01 06 00 40 00 00 00 00 00 ........@..... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 #34 [hci0] 14.274390 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 00 00 00 04 ........@....... > ACL Data RX: Handle 200 flags 0x00 dlen 2061 #35 [hci0] 14.274932 invalid packet size (16 != 2061) 0c 00 01 00 04 01 08 00 40 00 00 00 07 00 03 00 ........@....... = bluetoothd: Bluetooth daemon 5.43 14.401828 > ACL Data RX: Handle 200 flags 0x00 dlen 1033 #36 [hci0] 14.275753 invalid packet size (12 != 1033) 08 00 01 00 04 01 04 00 40 00 00 00 ........@... Signed-off-by: NSungwoo Kim <iam@sung-woo.kim> Signed-off-by: NLuiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: NBaisong Zhong <zhongbaisong@huawei.com> Reviewed-by: NLiu Jian <liujian56@huawei.com> Reviewed-by: NYue Haibing <yuehaibing@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-
由 Jakub Sitnicki 提交于
mainline inclusion from mainline-v6.1-rc7 commit af295e85 category: bugfix bugzilla: 188056, https://gitee.com/src-openeuler/kernel/issues/I62RNU CVE: CVE-2022-4129 -------------------------------- When holding a reader-writer spin lock we cannot sleep. Calling setup_udp_tunnel_sock() with write lock held violates this rule, because we end up calling percpu_down_read(), which might sleep, as syzbot reports [1]: __might_resched.cold+0x222/0x26b kernel/sched/core.c:9890 percpu_down_read include/linux/percpu-rwsem.h:49 [inline] cpus_read_lock+0x1b/0x140 kernel/cpu.c:310 static_key_slow_inc+0x12/0x20 kernel/jump_label.c:158 udp_tunnel_encap_enable include/net/udp_tunnel.h:187 [inline] setup_udp_tunnel_sock+0x43d/0x550 net/ipv4/udp_tunnel_core.c:81 l2tp_tunnel_register+0xc51/0x1210 net/l2tp/l2tp_core.c:1509 pppol2tp_connect+0xcdc/0x1a10 net/l2tp/l2tp_ppp.c:723 Trim the writer-side critical section for sk_callback_lock down to the minimum, so that it covers only operations on sk_user_data. Also, when grabbing the sk_callback_lock, we always need to disable BH, as Eric points out. Failing to do so leads to deadlocks because we acquire sk_callback_lock in softirq context, which can get stuck waiting on us if: 1) it runs on the same CPU, or CPU0 ---- lock(clock-AF_INET6); <Interrupt> lock(clock-AF_INET6); 2) lock ordering leads to priority inversion CPU0 CPU1 ---- ---- lock(clock-AF_INET6); local_irq_disable(); lock(&tcp_hashinfo.bhash[i].lock); lock(clock-AF_INET6); <Interrupt> lock(&tcp_hashinfo.bhash[i].lock); ... as syzbot reports [2,3]. Use the _bh variants for write_(un)lock. [1] https://lore.kernel.org/netdev/0000000000004e78ec05eda79749@google.com/ [2] https://lore.kernel.org/netdev/000000000000e38b6605eda76f98@google.com/ [3] https://lore.kernel.org/netdev/000000000000dfa31e05eda76f75@google.com/ v2: - Check and set sk_user_data while holding sk_callback_lock for both L2TP encapsulation types (IP and UDP) (Tetsuo) Cc: Tom Parkin <tparkin@katalix.com> Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Fixes: b68777d5 ("l2tp: Serialize access to sk_user_data with sk_callback_lock") Reported-by: NEric Dumazet <edumazet@google.com> Reported-by: syzbot+703d9e154b3b58277261@syzkaller.appspotmail.com Reported-by: syzbot+50680ced9e98a61f7698@syzkaller.appspotmail.com Reported-by: syzbot+de987172bb74a381879b@syzkaller.appspotmail.com Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NLu Wei <luwei32@huawei.com> Reviewed-by: NLiu Jian <liujian56@huawei.com> Reviewed-by: NYue Haibing <yuehaibing@huawei.com> Reviewed-by: NYue Haibing <yuehaibing@huawei.com> Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com> Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com>
-