1. 08 Dec, 2022 9 commits
    • ftrace: Optimize the allocation for mcount entries · 3d193fb1
      Wang Wensheng committed
      stable inclusion
      from stable-v4.19.267
      commit d110bb57a7e9831465aa3abb6c0d1cc658b05fbe
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit bcea02b0 upstream.
      
      If we can't allocate this size, try something smaller with half of the
      size. Its order should be decreased by one instead of divided by two.
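The difference between the two retry steps is easy to see with numbers. The following is a minimal user-space sketch, not the kernel code: `SKETCH_PAGE_SIZE` and the helper names are made up for illustration, and a 4 KiB page is assumed. An allocation of order `n` covers `PAGE_SIZE << n` bytes, so halving the request means subtracting one from the order, while shifting the order itself shrinks the request much more than intended.

```c
/* Hypothetical page size used only for this illustration. */
#define SKETCH_PAGE_SIZE 4096UL

/* Bytes covered by an allocation of the given page order. */
static unsigned long order_to_bytes(unsigned int order)
{
    return SKETCH_PAGE_SIZE << order;
}

/* Retry step that halves the request: lower the order by one. */
static unsigned int next_order_decrement(unsigned int order)
{
    return order - 1;
}

/* Retry step that divides the order itself by two. */
static unsigned int next_order_shift(unsigned int order)
{
    return order >> 1;
}
```

Starting from order 6 (256 KiB under the assumed page size), the decrement gives order 5 (128 KiB, half the size), whereas the shift gives order 3 (32 KiB).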
      
      Link: https://lkml.kernel.org/r/20221109094434.84046-3-wangwensheng4@huawei.com
      
      Cc: <mhiramat@kernel.org>
      Cc: <mark.rutland@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: a7900875 ("ftrace: Allocate the mcount record pages as groups")
      Signed-off-by: Wang Wensheng <wangwensheng4@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • kprobe: reverse kp->flags when arm_kprobe failed · 04150923
      Li Qiang committed
      stable inclusion
      from stable-v4.19.265
      commit d608ed66abfaccc233404be2583ab89c37e560fc
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 4a6f316d upstream.
      
      In the aggregate kprobe case, when arm_kprobe() fails,
      we need to set KPROBE_FLAG_DISABLED in kp->flags again.
      Otherwise the 'kp' kprobe will be considered enabled
      even though it is not actually enabled.
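The rollback described above can be sketched in user space as follows. This is an illustration under assumptions, not the kernel function: `sketch_probe`, `sketch_enable()` and the arm hooks are hypothetical names, and `SKETCH_FLAG_DISABLED` merely stands in for KPROBE_FLAG_DISABLED.

```c
#include <errno.h>

/* Stand-in for KPROBE_FLAG_DISABLED; the value is arbitrary here. */
#define SKETCH_FLAG_DISABLED 0x2u

struct sketch_probe {
    unsigned int flags;
};

/* Hypothetical arming hook: returns 0 on success or a negative errno. */
typedef int (*sketch_arm_fn)(struct sketch_probe *p);

static int sketch_enable(struct sketch_probe *p, sketch_arm_fn arm)
{
    int ret;

    p->flags &= ~SKETCH_FLAG_DISABLED;      /* optimistically mark enabled */
    ret = arm(p);
    if (ret)
        p->flags |= SKETCH_FLAG_DISABLED;   /* arming failed: restore the flag */
    return ret;
}

/* Two trivial hooks so that both paths can be exercised. */
static int sketch_arm_fail(struct sketch_probe *p) { (void)p; return -EINVAL; }
static int sketch_arm_ok(struct sketch_probe *p)   { (void)p; return 0; }
```

Without the restore step, a failed `sketch_enable()` would leave the flag cleared, which is the inconsistent state the commit message describes.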
      
      Link: https://lore.kernel.org/all/20220902155820.34755-1-liq3ea@163.com/
      
      Fixes: 12310e34 ("kprobes: Propagate error from arm_kprobe_ftrace()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Li Qiang <liq3ea@163.com>
      Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • mm: fs: initialize fsdata passed to write_begin/write_end interface · c01f46a9
      Alexander Potapenko committed
      stable inclusion
      from stable-v4.19.267
      commit 8a5be2948f350d34b1f6acb9ca3be4c89359a057
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 1468c6f4 upstream.
      
      Functions implementing the a_ops->write_end() interface accept the `void
      *fsdata` parameter that is supposed to be initialized by the corresponding
      a_ops->write_begin() (which accepts `void **fsdata`).
      
      However not all a_ops->write_begin() implementations initialize `fsdata`
      unconditionally, so it may get passed uninitialized to a_ops->write_end(),
      resulting in undefined behavior.
      
      Fix this by initializing fsdata with NULL before the call to
      write_begin(), rather than doing so in all possible a_ops implementations.
      
      This patch covers only the following cases found by running x86 KMSAN
      under syzkaller:
      
       - generic_perform_write()
       - cont_expand_zero() and generic_cont_expand_simple()
       - page_symlink()
      
      Other cases of passing uninitialized fsdata may persist in the codebase.
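The caller-side initialization can be illustrated with a small user-space sketch. All names here (`sketch_begin_fn`, `sketch_perform_write()` and the two hooks) are hypothetical and only mimic the shape of the write_begin()/write_end() pair described above; this is not the kernel interface.

```c
#include <stddef.h>

/* Hypothetical hook types mirroring the `void **` / `void *` fsdata split. */
typedef int (*sketch_begin_fn)(void **fsdata);
typedef int (*sketch_end_fn)(void *fsdata);

/* A begin hook that never writes to *fsdata, as some implementations do. */
static int sketch_begin_noop(void **fsdata)
{
    (void)fsdata;
    return 0;
}

/* An end hook that reports whether it was handed NULL. */
static int sketch_end_is_null(void *fsdata)
{
    return fsdata == NULL;
}

static int sketch_perform_write(sketch_begin_fn begin, sketch_end_fn end)
{
    void *fsdata = NULL;    /* initialized by the caller, not left to begin() */
    int ret = begin(&fsdata);

    if (ret)
        return ret;
    return end(fsdata);     /* sees NULL if begin() did not set it */
}
```

Because the caller sets the pointer once, every begin hook can stay unchanged, which matches the reasoning in the commit message for not touching each implementation.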
      
      Link: https://lkml.kernel.org/r/20220915150417.722975-43-glider@google.com
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Andrey Konovalov <andreyknvl@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Biggers <ebiggers@google.com>
      Cc: Eric Biggers <ebiggers@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Marco Elver <elver@google.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • nfs4: Fix kmemleak when allocate slot failed · 920f74ac
      Zhang Xiaoxu committed
      stable inclusion
      from stable-v4.19.265
      commit 86ce0e93cf6fb4d0c447323ac66577c642628b9d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 7e843672 ]
      
      If one of the slot allocations fails, all the other allocated slots
      should be cleaned up; otherwise, the allocated slots will leak:
      
        unreferenced object 0xffff8881115aa100 (size 64):
          comm "mount.nfs", pid 679, jiffies 4294744957 (age 115.037s)
          hex dump (first 32 bytes):
            00 cc 19 73 81 88 ff ff 00 a0 5a 11 81 88 ff ff  ...s......Z.....
            00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          backtrace:
            [<000000007a4c434a>] nfs4_find_or_create_slot+0x8e/0x130
            [<000000005472a39c>] nfs4_realloc_slot_table+0x23f/0x270
            [<00000000cd8ca0eb>] nfs40_init_client+0x4a/0x90
            [<00000000128486db>] nfs4_init_client+0xce/0x270
            [<000000008d2cacad>] nfs4_set_client+0x1a2/0x2b0
            [<000000000e593b52>] nfs4_create_server+0x300/0x5f0
            [<00000000e4425dd2>] nfs4_try_get_tree+0x65/0x110
            [<00000000d3a6176f>] vfs_get_tree+0x41/0xf0
            [<0000000016b5ad4c>] path_mount+0x9b3/0xdd0
            [<00000000494cae71>] __x64_sys_mount+0x190/0x1d0
            [<000000005d56bdec>] do_syscall_64+0x35/0x80
            [<00000000687c9ae4>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
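The cleanup-on-failure pattern behind this fix can be sketched in user space as follows. The list type, the function names and the `fail_at` knob are all hypothetical and exist only to make the failure path reproducible; the sketch does not claim to mirror the NFS slot table code.

```c
#include <stdlib.h>

struct sketch_slot {
    struct sketch_slot *next;
};

/* Frees every slot on the list and resets the head to NULL. */
static void sketch_free_slots(struct sketch_slot **head)
{
    while (*head) {
        struct sketch_slot *s = *head;

        *head = s->next;
        free(s);
    }
}

/*
 * Allocates `count` slots onto the list. `fail_at` is a test knob
 * (hypothetical): the allocation with that index is treated as failed.
 * Pass -1 to let every allocation go through malloc() normally.
 */
static int sketch_alloc_slots(struct sketch_slot **head, int count, int fail_at)
{
    for (int i = 0; i < count; i++) {
        struct sketch_slot *s = (i == fail_at) ? NULL : malloc(sizeof(*s));

        if (!s) {
            sketch_free_slots(head);    /* undo the partially built table */
            return -1;
        }
        s->next = *head;
        *head = s;
    }
    return 0;
}
```

If the cleanup call were missing, the slots allocated before the failing one would stay reachable only through a table the caller is about to abandon, which is the kind of leak kmemleak reports above.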
      
      Fixes: abf79bb3 ("NFS: Add a slot table to struct nfs_client for NFSv4.0 transport blocking")
      Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • kernfs: fix use-after-free in __kernfs_remove · a1a691b4
      Christian A. Ehrhardt committed
      stable inclusion
      from stable-v4.19.264
      commit 028cf780743eea79abffa7206b9dcfc080ad3546
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 4abc9965 upstream.
      
      Syzkaller managed to trigger concurrent calls to
      kernfs_remove_by_name_ns() for the same file resulting in
      a KASAN detected use-after-free. The race occurs when the root
      node is freed during kernfs_drain().
      
      To prevent this, acquire an additional reference for the root
      of the tree that is removed before calling __kernfs_remove().
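The idea of pinning an object across a call that may drop its last reference can be shown with a toy reference counter. This is a simplified sketch under assumptions: the node type and functions are hypothetical, and a `freed` flag stands in for the actual release of memory so that the example stays safe to run.

```c
struct sketch_node {
    int refcount;
    int freed;      /* set instead of really freeing, to keep the sketch safe */
};

static void sketch_get(struct sketch_node *n)
{
    n->refcount++;
}

static void sketch_put(struct sketch_node *n)
{
    if (--n->refcount == 0)
        n->freed = 1;
}

/*
 * Stand-in for the removal routine: it drops the reference held by the
 * tree and then still looks at the node. The return value tells whether
 * that later access would have touched a released object.
 */
static int sketch_remove(struct sketch_node *n)
{
    sketch_put(n);
    return n->freed;
}

/* Caller-side fix: hold an extra reference for the duration of the call. */
static int sketch_remove_pinned(struct sketch_node *n)
{
    int touched_freed;

    sketch_get(n);
    touched_freed = sketch_remove(n);
    sketch_put(n);      /* the node is released only here */
    return touched_freed;
}
```

With a starting count of one, `sketch_remove()` alone reports an access to a released node, while `sketch_remove_pinned()` does not and still releases the node at the end.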
      
      Found by syzkaller with the following reproducer (slab_nomerge is
      required):
      
      syz_mount_image$ext4(0x0, &(0x7f0000000100)='./file0\x00', 0x100000, 0x0, 0x0, 0x0, 0x0)
      r0 = openat(0xffffffffffffff9c, &(0x7f0000000080)='/proc/self/exe\x00', 0x0, 0x0)
      close(r0)
      pipe2(&(0x7f0000000140)={0xffffffffffffffff, <r1=>0xffffffffffffffff}, 0x800)
      mount$9p_fd(0x0, &(0x7f0000000040)='./file0\x00', &(0x7f00000000c0), 0x408, &(0x7f0000000280)={'trans=fd,', {'rfdno', 0x3d, r0}, 0x2c, {'wfdno', 0x3d, r1}, 0x2c, {[{@cache_loose}, {@mmap}, {@loose}, {@loose}, {@mmap}], [{@mask={'mask', 0x3d, '^MAY_EXEC'}}, {@fsmagic={'fsmagic', 0x3d, 0x10001}}, {@dont_hash}]}})
      
      Sample report:
      
      ==================================================================
      BUG: KASAN: use-after-free in kernfs_type include/linux/kernfs.h:335 [inline]
      BUG: KASAN: use-after-free in kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline]
      BUG: KASAN: use-after-free in __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369
      Read of size 2 at addr ffff8880088807f0 by task syz-executor.2/857
      
      CPU: 0 PID: 857 Comm: syz-executor.2 Not tainted 6.0.0-rc3-00363-g7726d4c3 #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x6e/0x91 lib/dump_stack.c:106
       print_address_description mm/kasan/report.c:317 [inline]
       print_report.cold+0x5e/0x5e5 mm/kasan/report.c:433
       kasan_report+0xa3/0x130 mm/kasan/report.c:495
       kernfs_type include/linux/kernfs.h:335 [inline]
       kernfs_leftmost_descendant fs/kernfs/dir.c:1261 [inline]
       __kernfs_remove.part.0+0x843/0x960 fs/kernfs/dir.c:1369
       __kernfs_remove fs/kernfs/dir.c:1356 [inline]
       kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589
       sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943
       __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
       create_cache mm/slab_common.c:229 [inline]
       kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
       p9_client_create+0xd4d/0x1190 net/9p/client.c:993
       v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
       v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
       legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
       vfs_get_tree+0x85/0x2e0 fs/super.c:1530
       do_new_mount fs/namespace.c:3040 [inline]
       path_mount+0x675/0x1d00 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x282/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      RIP: 0033:0x7f725f983aed
      Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f725f0f7028 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
      RAX: ffffffffffffffda RBX: 00007f725faa3f80 RCX: 00007f725f983aed
      RDX: 00000000200000c0 RSI: 0000000020000040 RDI: 0000000000000000
      RBP: 00007f725f9f419c R08: 0000000020000280 R09: 0000000000000000
      R10: 0000000000000408 R11: 0000000000000246 R12: 0000000000000000
      R13: 0000000000000006 R14: 00007f725faa3f80 R15: 00007f725f0d7000
       </TASK>
      
      Allocated by task 855:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:45 [inline]
       set_alloc_info mm/kasan/common.c:437 [inline]
       __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:470
       kasan_slab_alloc include/linux/kasan.h:224 [inline]
       slab_post_alloc_hook mm/slab.h:727 [inline]
       slab_alloc_node mm/slub.c:3243 [inline]
       slab_alloc mm/slub.c:3251 [inline]
       __kmem_cache_alloc_lru mm/slub.c:3258 [inline]
       kmem_cache_alloc+0xbf/0x200 mm/slub.c:3268
       kmem_cache_zalloc include/linux/slab.h:723 [inline]
       __kernfs_new_node+0xd4/0x680 fs/kernfs/dir.c:593
       kernfs_new_node fs/kernfs/dir.c:655 [inline]
       kernfs_create_dir_ns+0x9c/0x220 fs/kernfs/dir.c:1010
       sysfs_create_dir_ns+0x127/0x290 fs/sysfs/dir.c:59
       create_dir lib/kobject.c:63 [inline]
       kobject_add_internal+0x24a/0x8d0 lib/kobject.c:223
       kobject_add_varg lib/kobject.c:358 [inline]
       kobject_init_and_add+0x101/0x160 lib/kobject.c:441
       sysfs_slab_add+0x156/0x1e0 mm/slub.c:5954
       __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
       create_cache mm/slab_common.c:229 [inline]
       kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
       p9_client_create+0xd4d/0x1190 net/9p/client.c:993
       v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
       v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
       legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
       vfs_get_tree+0x85/0x2e0 fs/super.c:1530
       do_new_mount fs/namespace.c:3040 [inline]
       path_mount+0x675/0x1d00 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x282/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      Freed by task 857:
       kasan_save_stack+0x1e/0x40 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:45
       kasan_set_free_info+0x20/0x40 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:367 [inline]
       ____kasan_slab_free mm/kasan/common.c:329 [inline]
       __kasan_slab_free+0x108/0x190 mm/kasan/common.c:375
       kasan_slab_free include/linux/kasan.h:200 [inline]
       slab_free_hook mm/slub.c:1754 [inline]
       slab_free_freelist_hook mm/slub.c:1780 [inline]
       slab_free mm/slub.c:3534 [inline]
       kmem_cache_free+0x9c/0x340 mm/slub.c:3551
       kernfs_put.part.0+0x2b2/0x520 fs/kernfs/dir.c:547
       kernfs_put+0x42/0x50 fs/kernfs/dir.c:521
       __kernfs_remove.part.0+0x72d/0x960 fs/kernfs/dir.c:1407
       __kernfs_remove fs/kernfs/dir.c:1356 [inline]
       kernfs_remove_by_name_ns+0x108/0x190 fs/kernfs/dir.c:1589
       sysfs_slab_add+0x133/0x1e0 mm/slub.c:5943
       __kmem_cache_create+0x3e0/0x550 mm/slub.c:4899
       create_cache mm/slab_common.c:229 [inline]
       kmem_cache_create_usercopy+0x167/0x2a0 mm/slab_common.c:335
       p9_client_create+0xd4d/0x1190 net/9p/client.c:993
       v9fs_session_init+0x1e6/0x13c0 fs/9p/v9fs.c:408
       v9fs_mount+0xb9/0xbd0 fs/9p/vfs_super.c:126
       legacy_get_tree+0xf1/0x200 fs/fs_context.c:610
       vfs_get_tree+0x85/0x2e0 fs/super.c:1530
       do_new_mount fs/namespace.c:3040 [inline]
       path_mount+0x675/0x1d00 fs/namespace.c:3370
       do_mount fs/namespace.c:3383 [inline]
       __do_sys_mount fs/namespace.c:3591 [inline]
       __se_sys_mount fs/namespace.c:3568 [inline]
       __x64_sys_mount+0x282/0x300 fs/namespace.c:3568
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x38/0x90 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
      
      The buggy address belongs to the object at ffff888008880780
       which belongs to the cache kernfs_node_cache of size 128
      The buggy address is located 112 bytes inside of
       128-byte region [ffff888008880780, ffff888008880800)
      
      The buggy address belongs to the physical page:
      page:00000000732833f8 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x8880
      flags: 0x100000000000200(slab|node=0|zone=1)
      raw: 0100000000000200 0000000000000000 dead000000000122 ffff888001147280
      raw: 0000000000000000 0000000000150015 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff888008880680: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
       ffff888008880700: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      >ffff888008880780: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                   ^
       ffff888008880800: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
       ffff888008880880: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      ==================================================================
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: stable <stable@kernel.org> # -rc3
      Signed-off-by: Christian A. Ehrhardt <lk@c--e.de>
      Link: https://lore.kernel.org/r/20220913121723.691454-1-lk@c--e.de
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • mm,hugetlb: take hugetlb_lock before decrementing h->resv_huge_pages · 7a5b0955
      Rik van Riel committed
      stable inclusion
      from stable-v4.19.264
      commit 2b35432d324898ec41beb27031d2a1a864a4d40e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit 12df140f upstream.
      
      The h->*_huge_pages counters are protected by the hugetlb_lock, but
      alloc_huge_page has a corner case where it can decrement the counter
      outside of the lock.
      
      This could lead to a corrupted value of h->resv_huge_pages, which we have
      observed on our systems.
      
      Take the hugetlb_lock before decrementing h->resv_huge_pages to avoid a
      potential race.
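A lock-protected counter update looks like this in a user-space sketch. A pthread mutex stands in for hugetlb_lock, and `sketch_hstate` and `sketch_unreserve_one()` are hypothetical names; the kernel uses its own locking primitives.

```c
#include <pthread.h>

struct sketch_hstate {
    pthread_mutex_t lock;              /* stands in for hugetlb_lock */
    unsigned long resv_huge_pages;
};

/*
 * Decrements the reservation counter with the lock held, so that the
 * read-modify-write cannot interleave with another updater.
 */
static void sketch_unreserve_one(struct sketch_hstate *h)
{
    pthread_mutex_lock(&h->lock);
    h->resv_huge_pages--;
    pthread_mutex_unlock(&h->lock);
}
```

An unlocked `h->resv_huge_pages--` compiles to a load, a subtract and a store; two threads running it at once can both load the same value and lose one update, which is one way a counter ends up corrupted.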
      
      Link: https://lkml.kernel.org/r/20221017202505.0e6a4fcd@imladris.surriel.com
      Fixes: a88c7695 ("mm: hugetlb: fix hugepage memory leak caused by wrong reserve count")
      Signed-off-by: Rik van Riel <riel@surriel.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Glen McCready <gkmccready@meta.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • mm: /proc/pid/smaps_rollup: fix no vma's null-deref · 33213b46
      Seth Jenkins committed
      stable inclusion
      from stable-v4.19.264
      commit dbe863bce7679c7f5ec0e993d834fe16c5e687b5
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      Commit 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value
      seq_file") introduced a null-deref if there are no vma's in the task in
      show_smaps_rollup.
      
      Fixes: 258f669e ("mm: /proc/pid/smaps_rollup: convert to single value seq_file")
      Signed-off-by: Seth Jenkins <sethjenkins@google.com>
      Reviewed-by: Alexey Dobriyan <adobriyan@gmail.com>
      Tested-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • signal handling: don't use BUG_ON() for debugging · a2f88993
      Xia Fukun committed
      stable inclusion
      from stable-v4.19.267
      commit 93d9cef55f8fe463e3b9f6c73c7a32619222c657
      category: bugfix
      bugzilla: 187828, https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit a382f8fe ]
      
      These are indeed "should not happen" situations, but it turns out recent
      changes made the 'task_is_stopped_or_trace()' case trigger (fix for that
      exists, is pending more testing), and the BUG_ON() makes it
      unnecessarily hard to actually debug for no good reason.
      
      It's been that way for a long time, but let's make it clear: BUG_ON() is
      not good for debugging, and should never be used in situations where you
      could just say "this shouldn't happen, but we can continue".
      
      Use WARN_ON_ONCE() instead to make sure it gets logged, and then just
      continue running.  Instead of making the system basically unusable
      because you crashed the machine while potentially holding some very core
      locks (eg this function is commonly called while holding 'tasklist_lock'
      for writing).
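The "log once and keep going" behaviour can be approximated in user space as below. This is a sketch of the idea only: `sketch_warn_on_once()` is a hypothetical helper, not the kernel's WARN_ON_ONCE() macro, and the caller passes its own one-shot flag explicitly.

```c
#include <stdio.h>

/* Counts emitted warnings so the one-shot behaviour can be observed. */
static int sketch_warn_count;

/*
 * Returns the truth value of `cond`. The message is printed only the
 * first time the condition is hit for a given `warned` flag.
 */
static int sketch_warn_on_once(int cond, int *warned, const char *what)
{
    if (cond && !*warned) {
        *warned = 1;
        sketch_warn_count++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* A "should not happen" check that logs and bails out instead of aborting. */
static int sketch_handle_state(int state_is_valid)
{
    static int warned;

    if (sketch_warn_on_once(!state_is_valid, &warned, "unexpected task state"))
        return -1;      /* give up on this request, keep the program running */
    return 0;
}
```

Replacing the early return with `abort()` would be the user-space analogue of BUG_ON(): the first unexpected state would take the whole process down, along with any locks it holds.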
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Xia Fukun <xiafukun@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • ida: don't use BUG_ON() for debugging · eaab6483
      Xia Fukun committed
      stable inclusion
      from stable-v4.19.267
      commit 33d2f83e3f2c1fdabb365d25bed3aa630041cbc0
      category: bugfix
      bugzilla: 188002, https://gitee.com/openeuler/kernel/issues/I63UEU
      CVE: NA
      
      --------------------------------
      
      commit fc82bbf4 upstream.
      
      This is another old BUG_ON() that just shouldn't exist (see also commit
      a382f8fe: "signal handling: don't use BUG_ON() for debugging").
      
      In fact, as Matthew Wilcox points out, this condition shouldn't really
      even result in a warning, since a negative id allocation result is just
      a normal allocation failure:
      
        "I wonder if we should even warn here -- sure, the caller is trying to
         free something that wasn't allocated, but we don't warn for
         kfree(NULL)"
      
      and goes on to point out how that current error check is only causing
      people to unnecessarily do their own index range checking before freeing
      it.
      
      This was noted by Itay Iellin, because the bluetooth HCI socket cookie
      code does *not* do that range checking, and ends up just freeing the
      error case too, triggering the BUG_ON().
      
      The HCI code requires CAP_NET_RAW, and seems to just result in an ugly
      splat, but there really is no reason to BUG_ON() here, and we have
      generally striven for allocation models where it's always ok to just do
      
          free(alloc());
      
      even if the allocation were to fail for some random reason (usually
      obviously that "random" reason being some resource limit).
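The `free(alloc())` model can be sketched with a tiny ID allocator whose release function quietly ignores a failed allocation result. The type, the functions and the four-entry limit are hypothetical and chosen for illustration; this is not the kernel's IDA implementation.

```c
#include <errno.h>

#define SKETCH_MAX_IDS 4    /* artificial resource limit for the sketch */

struct sketch_ida {
    unsigned char used[SKETCH_MAX_IDS];
};

/* Returns the lowest free id, or -ENOSPC when the limit is reached. */
static int sketch_ida_alloc(struct sketch_ida *ida)
{
    for (int id = 0; id < SKETCH_MAX_IDS; id++) {
        if (!ida->used[id]) {
            ida->used[id] = 1;
            return id;
        }
    }
    return -ENOSPC;
}

/* Releases an id. An out-of-range value, such as an error code, is a no-op. */
static void sketch_ida_free(struct sketch_ida *ida, int id)
{
    if (id < 0 || id >= SKETCH_MAX_IDS)
        return;     /* nothing was allocated, so nothing to release */
    ida->used[id] = 0;
}
```

With this shape a caller can write `sketch_ida_free(&ida, sketch_ida_alloc(&ida));` without checking the result first, in the same way that `free(NULL)` is harmless.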
      
      Fixes: 88eca020 ("ida: simplified functions for id allocation")
      Reported-by: Itay Iellin <ieitayie@gmail.com>
      Suggested-by: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Xia Fukun <xiafukun@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
  2. 06 Dec, 2022 1 commit
    • !272 [openEuler-1.0-LTS] Add MWAIT Cx support for Zhaoxin CPUs. · 75ea48ac
      openeuler-ci-bot committed
      Merge Pull Request from: @leoliu-oc 
       
      When the processor is idle, low-power idle states (C-states) can be used to save power. For Zhaoxin processors, there are two methods of entering idle states. One is the HLT instruction together with the legacy method of I/O reads from the ACPI-defined register (known as P_LVLx); the other is the MWAIT instruction with idle-state hints.
      
      By default, legacy operating systems use HLT and P_LVLx I/O reads to enter idle states on Zhaoxin processors. However, we have verified on some Zhaoxin platforms that the MWAIT instruction is more efficient than P_LVLx I/O reads and HLT, so we add MWAIT Cx support for Zhaoxin processors.
      
      ### Issue
      https://gitee.com/openeuler/kernel/issues/I62TOM
      
      ### Test
      N/A
      
      ### Known Issue
      N/A
      
      ### Default config change
      N/A
      
       
       
      Link:https://gitee.com/openeuler/kernel/pulls/272 
      Reviewed-by: Laibin Qiu <qiulaibin@huawei.com> 
      Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com> 
  3. 05 Dec, 2022 3 commits
  4. 29 Nov, 2022 5 commits
  5. 27 Nov, 2022 1 commit
    • x86/tsc: use topology_max_packages() in tsc watchdog check · 4f283abb
      Feng Tang committed
      hulk inclusion
      category: bugfix
      bugzilla: 187942, https://gitee.com/openeuler/kernel/issues/I5U037
      CVE: NA
      
      -------------------------------
      
      Commit b50db709 ("x86/tsc: Disable clocksource watchdog for TSC
      on qualified platorms") was introduced to solve the problem that
      the TSC clocksource is sometimes wrongly judged as unstable by
      watchdogs such as 'jiffies', HPET, etc.
      
      In it, the hardware socket number is a key factor in deciding
      whether to disable the watchdog for TSC, and 'nr_online_nodes' was
      chosen as an estimate because the value is needed in the early boot
      phase, before the 'tsc-early' clocksource is registered, when the
      non-boot CPUs have not been brought up yet.
      
      In a recent patch review, Dave Hansen pointed out that there are
      many cases where 'nr_online_nodes' can be wrong, such as:
      * numa emulation (numa=fake=4 etc.)
      * numa=off
      * platforms with CPU+DRAM nodes, CPU-less HBM nodes, CPU-less
        persistent memory nodes.
      
      Peter Zijlstra suggested using logical package ids, but they are
      only usable after smp_init(), once all CPUs are initialized.
      
      One solution is to skip the watchdog for the 'tsc-early' clocksource
      and move the check after smp_init() but before the 'tsc'
      clocksource is registered, where topology_max_packages() can
      be used as a much more accurate socket number.
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      
      Conflict:
      	arch/x86/kernel/tsc.c
      Signed-off-by: Yu Liao <liaoyu15@huawei.com>
      Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
  6. 26 Nov, 2022 2 commits
    • scsi: hisi_sas: Set iptt aborted flag when receiving an abnormal CQ · 4cccc16a
      Xingui Yang committed
      driver inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I62ZXO
      CVE: NA
      
      ------------------------------------------------
      
      During write I/O, when the SAS PHY switch is tested, the hardware
      may report two CQs for one I/O. The first CQ indicates an invalid port
      during DPH scheduling; the second CQ indicates that the response frame
      has been written to memory but the I/O ended abnormally due to I/O data
      underload. So set the iptt aborted flag when receiving an abnormal CQ, so
      that the host discards the IPTT frame received from the SAS hard disk.
      Signed-off-by: Xingui Yang <yangxingui@huawei.com>
      Reviewed-by: kang fenglong <kangfenglong@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
    • ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 · bc9ebdce
      Luís Henriques committed
      mainline inclusion
      from mainline-v6.0-rc7
      commit 29a5b8a1
      category: bugfix
      bugzilla: 187444, https://gitee.com/openeuler/kernel/issues/I6261Z
      CVE: NA
      
      --------------------------------
      
      When walking through an inode's extents, the ext4_ext_binsearch_idx() function
      assumes that the extent header has been previously validated.  However, there
      are no checks that verify that the number of entries (eh->eh_entries) is
      non-zero when depth is > 0.  This leads to problems because
      EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:
      
      [  135.245946] ------------[ cut here ]------------
      [  135.247579] kernel BUG at fs/ext4/extents.c:2258!
      [  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
      [  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
      [  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
      [  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
      [  135.256475] Code:
      [  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
      [  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
      [  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
      [  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
      [  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
      [  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
      [  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
      [  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
      [  135.277952] Call Trace:
      [  135.278635]  <TASK>
      [  135.279247]  ? preempt_count_add+0x6d/0xa0
      [  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
      [  135.281612]  ? _raw_read_unlock+0x18/0x30
      [  135.282704]  ext4_map_blocks+0x294/0x5a0
      [  135.283745]  ? xa_load+0x6f/0xa0
      [  135.284562]  ext4_mpage_readpages+0x3d6/0x770
      [  135.285646]  read_pages+0x67/0x1d0
      [  135.286492]  ? folio_add_lru+0x51/0x80
      [  135.287441]  page_cache_ra_unbounded+0x124/0x170
      [  135.288510]  filemap_get_pages+0x23d/0x5a0
      [  135.289457]  ? path_openat+0xa72/0xdd0
      [  135.290332]  filemap_read+0xbf/0x300
      [  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
      [  135.292192]  new_sync_read+0x103/0x170
      [  135.293014]  vfs_read+0x15d/0x180
      [  135.293745]  ksys_read+0xa1/0xe0
      [  135.294461]  do_syscall_64+0x3c/0x80
      [  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This patch simply adds an extra check in __ext4_ext_check(), verifying that
      eh_entries is not 0 when eh_depth is > 0.
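The added validation amounts to rejecting one inconsistent combination of header fields. Below is a minimal user-space sketch of such a check; the struct is a reduced, hypothetical stand-in that keeps only the two fields named in the commit message, and `sketch_check_header()` is not the kernel function.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduced stand-in for an extent header: only the two relevant fields. */
struct sketch_extent_header {
    uint16_t eh_entries;    /* number of valid entries in this node */
    uint16_t eh_depth;      /* 0 for a leaf, > 0 for an index node */
};

/*
 * Returns NULL when the header passes this check, or a short
 * description of the problem otherwise.
 */
static const char *sketch_check_header(const struct sketch_extent_header *eh)
{
    /* An index node must point at something; an empty leaf is fine. */
    if (eh->eh_entries == 0 && eh->eh_depth > 0)
        return "eh_entries is 0 but eh_depth is > 0";
    return NULL;
}
```

Rejecting the header up front means later code that searches the index entries never runs on a node with no entries.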
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
      Cc: Baokun Li <libaokun1@huawei.com>
      Cc: stable@kernel.org
      Signed-off-by: Luís Henriques <lhenriques@suse.de>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Baokun Li <libaokun1@huawei.com>
      Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.de
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
  7. 24 Nov, 2022 1 commit
    • Add MWAIT Cx support for Zhaoxin CPUs. · e1b6487f
      leoliu committed
      zhaoxin inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I62TOM
      CVE: NA
      
      ----------------------------------------------------------------
      
      When the processor is idle, low-power idle states (C-states) can be used
      to save power. For Zhaoxin processors, there are two methods of entering
      idle states. One is the HLT instruction together with the legacy method of
      I/O reads from the ACPI-defined register (known as P_LVLx); the other is
      the MWAIT instruction with idle-state hints.
      
      By default, legacy operating systems use HLT and P_LVLx I/O reads to
      enter idle states on Zhaoxin processors. However, we have verified on some
      Zhaoxin platforms that the MWAIT instruction is more efficient than P_LVLx
      I/O reads and HLT, so we add MWAIT Cx support for Zhaoxin processors.
      Signed-off-by: leoliu <leoliu@zhaoxin.com>
      e1b6487f
  8. 21 Nov, 2022 1 commit
  9. 19 Nov, 2022 4 commits
  10. 15 Nov, 2022 1 commit
  11. 14 Nov, 2022 2 commits
  12. 08 Nov, 2022 10 commits
    • R
      init/main.c: return 1 from handled __setup() functions · d484e833
      Randy Dunlap committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit f9a40b08 ]
      
      initcall_blacklist() should return 1 to indicate that it handled its
      cmdline arguments.
      
      set_debug_rodata() should return 1 to indicate that it handled its
      cmdline arguments.  Print a warning if the option string is invalid.
      
      This prevents these strings from being added to the 'init' program's
      environment as they are not init arguments/parameters.
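
The effect of the return value can be modeled in plain C. This is a user-space sketch under stated assumptions, not kernel code: `setup_entry`, `param_is_consumed()` and both handlers are hypothetical names, and the dispatcher only imitates the idea that a parameter no handler claims ends up being passed on to init.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one __setup() table entry. */
struct setup_entry {
	const char *prefix;			/* e.g. "rodata=" */
	int (*handler)(const char *val);	/* 1 = handled, 0 = not handled */
};

static int rodata_enabled = 1;

/* Handler following the fixed convention: always claim the option. */
static int set_debug_rodata_sketch(const char *val)
{
	if (strcmp(val, "on") == 0)
		rodata_enabled = 1;
	else if (strcmp(val, "off") == 0)
		rodata_enabled = 0;
	else
		fprintf(stderr, "invalid rodata option: %s\n", val);
	return 1;
}

/* Handler that returns 0, so the option is treated as unhandled. */
static int handler_returning_zero(const char *val)
{
	(void)val;
	return 0;
}

/*
 * Return 1 if some handler claimed the parameter, 0 if it would fall
 * through and be handed to the init program.
 */
static int param_is_consumed(const char *param,
			     const struct setup_entry *tbl, size_t n)
{
	assert(param != NULL && tbl != NULL);
	for (size_t i = 0; i < n; i++) {
		size_t len = strlen(tbl[i].prefix);

		if (strncmp(param, tbl[i].prefix, len) == 0 &&
		    tbl[i].handler(param + len))
			return 1;
	}
	return 0;
}
```

With the first handler registered for "rodata=", even an invalid value such as "rodata=bogus" is consumed (after a warning); with the second handler, the same string would leak through to init.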
      
      Link: https://lkml.kernel.org/r/20220221050901.23985-1-rdunlap@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      d484e833
    • P
      x86/pm: Save the MSR validity status at context setup · 47477ca7
      Pawan Gupta committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit 73924ec4 upstream.
      
      The mechanism to save/restore MSRs during S3 suspend/resume checks for
      the MSR validity during suspend, and only restores the MSR if it is a
      valid MSR.  This is not optimal, as an invalid MSR will unnecessarily
      throw an exception for every suspend cycle.  The more invalid MSRs there
      are, the higher the impact will be.
      
      Check and save the MSR validity at setup.  This ensures that only valid
      MSRs that are guaranteed to not throw an exception will be attempted
      during suspend.
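
The idea of probing once at setup and skipping invalid MSRs afterwards can be sketched as a small user-space model. Everything here is hypothetical: `saved_msr_sketch`, `read_msr_sketch()` and the rule that only even MSR numbers exist are assumptions made purely so the example can run; no real MSR is accessed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one MSR tracked across suspend/resume. */
struct saved_msr_sketch {
	uint32_t msr_no;
	uint64_t value;
	bool valid;	/* decided once, at context setup */
};

/* Counts simulated faults, i.e. reads of MSRs that do not exist. */
static unsigned int fault_count;

/*
 * Stand-in for a fault-tolerant MSR read. As an arbitrary assumption of
 * this sketch, only even MSR numbers exist. Returns true on success.
 */
static bool read_msr_sketch(uint32_t msr_no, uint64_t *val)
{
	if (msr_no % 2 != 0) {
		fault_count++;
		return false;
	}
	*val = 0x1000u + msr_no;
	return true;
}

/* Setup: probe every MSR once and remember whether it is valid. */
static void msr_context_setup(struct saved_msr_sketch *msrs, size_t n)
{
	assert(msrs != NULL);
	for (size_t i = 0; i < n; i++)
		msrs[i].valid = read_msr_sketch(msrs[i].msr_no, &msrs[i].value);
}

/* Suspend: touch only MSRs already known to be valid, so no faults. */
static void msr_context_save(struct saved_msr_sketch *msrs, size_t n)
{
	assert(msrs != NULL);
	for (size_t i = 0; i < n; i++)
		if (msrs[i].valid)
			read_msr_sketch(msrs[i].msr_no, &msrs[i].value);
}
```

In this model the fault counter grows only during `msr_context_setup()`; repeated calls to `msr_context_save()` leave it unchanged, which is the property the commit message describes.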
      
      Fixes: 7a9c2dd0 ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume")
      Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      47477ca7
    • P
      x86/speculation: Restore speculation related MSRs during S3 resume · 2b91784d
      Pawan Gupta committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit e2a1256b upstream.
      
      After resuming from suspend-to-RAM, the MSRs that control CPU's
      speculative execution behavior are not being restored on the boot CPU.
      
      These MSRs are used to mitigate speculative execution vulnerabilities.
      Not restoring them correctly may leave the CPU vulnerable.  Secondary
      CPU's MSRs are correctly being restored at S3 resume by
      identify_secondary_cpu().
      
      During S3 resume, restore these MSRs for boot CPU when restoring its
      processor state.
      
      Fixes: 77243971 ("x86/bugs/intel: Set proper CPU features and setup RDS")
      Reported-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Tested-by: Neelima Krishnan <neelima.krishnan@intel.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      2b91784d
    • B
      x86/cpu: Load microcode during restore_processor_state() · 27dd57ae
      Borislav Petkov committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit f9e14dbb upstream.
      
      When resuming from system sleep state, restore_processor_state()
      restores the boot CPU MSRs. These MSRs could be emulated by microcode.
      If microcode is not loaded yet, writing to emulated MSRs leads to
      unchecked MSR access error:
      
        ...
        PM: Calling lapic_suspend+0x0/0x210
        unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0...0) at rIP: ... (native_write_msr)
        Call Trace:
          <TASK>
          ? restore_processor_state
          x86_acpi_suspend_lowlevel
          acpi_suspend_enter
          suspend_devices_and_enter
          pm_suspend.cold
          state_store
          kobj_attr_store
          sysfs_kf_write
          kernfs_fop_write_iter
          new_sync_write
          vfs_write
          ksys_write
          __x64_sys_write
          do_syscall_64
          entry_SYSCALL_64_after_hwframe
         RIP: 0033:0x7fda13c260a7
      
      To ensure microcode emulated MSRs are available for restoration, load
      the microcode on the boot CPU before restoring these MSRs.
      
        [ Pawan: write commit message and productize it. ]
      
      Fixes: e2a1256b ("x86/speculation: Restore speculation related MSRs during S3 resume")
      Reported-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
      Tested-by: Kyle D. Pelton <kyle.d.pelton@intel.com>
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=215841
      Link: https://lore.kernel.org/r/4350dfbf785cd482d3fafa72b2b49c83102df3ce.1650386317.git.pawan.kumar.gupta@linux.intel.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      27dd57ae
    • T
      genirq: Synchronize interrupt thread startup · 950faec0
      Thomas Pfaff committed
      stable inclusion
      from stable-4.19.238
      commit c7daf1b4ad809692d5c26f33c02ed8a031066548
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5X41F
      CVE: NA
      
      --------------------------------
      
      commit 8707898e upstream.
      
      A kernel hang can be observed when running setserial in a loop on a kernel
      with force threaded interrupts. The sequence of events is:
      
         setserial
           open("/dev/ttyXXX")
             request_irq()
           do_stuff()
            -> serial interrupt
               -> wake(irq_thread)
                    desc->threads_active++;
           close()
             free_irq()
               kthread_stop(irq_thread)
           synchronize_irq() <- hangs because desc->threads_active != 0
      
      The thread is created in request_irq() and woken up, but does not get on a
      CPU to reach the actual thread function, which would handle the pending
      wake-up. kthread_stop() sets the should stop condition which makes the
      thread immediately exit, which in turn leaves the stale threads_active
      count around.
      
      This problem was introduced with commit 519cc865, which addressed an
      interrupt sharing issue in the PCIe code.
      
      Before that commit free_irq() invoked synchronize_irq(), which waits for
      the hard interrupt handler and also for associated threads to complete.
      
      To address the PCIe issue synchronize_irq() was replaced with
      __synchronize_hardirq(), which only waits for the hard interrupt handler to
      complete, but not for threaded handlers.
      
      This was done under the assumption, that the interrupt thread already
      reached the thread function and waits for a wake-up, which is guaranteed to
      be handled before acting on the stop condition. The problematic case, that
      the thread would not reach the thread function, was obviously overlooked.
      
      Make sure that the interrupt thread is really started and reaches
      thread_fn() before returning from __setup_irq().
      
      This utilizes the existing wait queue in the interrupt descriptor. The
      wait queue is unused for non-shared interrupts. For shared interrupts the
      usage might cause a spurious wake-up of a waiter in synchronize_irq() or the
      completion of a threaded handler might cause a spurious wake-up of the
      waiter for the ready flag. Both are harmless and have no functional impact.
      
      [ tglx: Amended changelog ]
      
      Fixes: 519cc865 ("genirq: Synchronize only with single thread on free_irq()")
      Signed-off-by: Thomas Pfaff <tpfaff@pcs.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Marc Zyngier <maz@kernel.org>
      Cc: stable@vger.kernel.org
      Link: https://lore.kernel.org/r/552fe7b4-9224-b183-bb87-a8f36d335690@pcs.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Lin Yujun <linyujun809@huawei.com>
      Reviewed-by: Zhang Jianhua <chris.zjh@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      950faec0
    • M
      nvme: Fix IOC_PR_CLEAR and IOC_PR_RELEASE ioctls for nvme devices · 3b23e85f
      Michael Kelley committed
      stable inclusion
      from stable-v4.19.261
      commit 5f7fd71e5bebf337769f20dd125822ce63266e4d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit c292a337 ]
      
      The IOC_PR_CLEAR and IOC_PR_RELEASE ioctls are
      non-functional on NVMe devices because the nvme_pr_clear()
      and nvme_pr_release() functions set the IEKEY field incorrectly.
      The IEKEY field should be set only when the key is zero (i.e.,
      not specified).  The current code does it backwards.
      
      Furthermore, the NVMe spec describes the persistent
      reservation "clear" function as an option on the reservation
      release command. The current implementation of nvme_pr_clear()
      erroneously uses the reservation register command.
      
      Fix these errors. Note that NVMe version 1.3 and later specify
      that setting the IEKEY field will return an error of Invalid
      Field in Command.  The fix will set IEKEY when the key is zero,
      which is appropriate as these ioctls consider a zero key to
      be "unspecified", and the intention of the spec change is
      to require a valid key.
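
The corrected IEKEY rule can be illustrated with a small helper that builds a command dword for the reservation release command. This is a sketch, not the driver code: the function name, the macro names, and the bit positions (action in bits 2:0, IEKEY in bit 3, reservation type in bits 15:8) are assumptions based on a reading of the NVMe specification and should be checked against it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed encodings for the reservation release action field. */
#define RRELA_RELEASE	0u
#define RRELA_CLEAR	1u
/* Assumed position of the "ignore existing key" flag. */
#define IEKEY_BIT	(1u << 3)

/*
 * Build the command dword for a reservation release command.
 * IEKEY is set only when no key was supplied (key == 0).
 */
static uint32_t resv_release_cdw10(uint32_t action, uint32_t rtype,
				   uint64_t key)
{
	assert(action <= 7u && rtype <= 0xffu);
	return action | (key ? 0u : IEKEY_BIT) | (rtype << 8);
}
```

Under these assumptions, a "clear" with a zero key yields action 1 plus the IEKEY bit, while any non-zero key leaves IEKEY unset, which is the opposite of the backwards behavior the commit describes.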
      
      Tested on a version 1.4 PCI NVMe device in an Azure VM.
      
      Fixes: 1673f1f0 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
      Fixes: 1d277a63 ("NVMe: Add persistent reservation ops")
      Signed-off-by: Michael Kelley <mikelley@microsoft.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Conflicts:
      	drivers/nvme/host/core.c
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3b23e85f
    • E
      once: add DO_ONCE_SLOW() for sleepable contexts · 3743e9b5
      Eric Dumazet committed
      stable inclusion
      from stable-v4.19.262
      commit f5686a03b138f6330eeda082ee4f96c8109f56f3
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit 62c07983 ]
      
      Christophe Leroy reported a ~80ms latency spike
      happening at first TCP connect() time.
      
      This is because __inet_hash_connect() uses get_random_once()
      to populate a perturbation table which became quite big
      after commit 4c2c8f03 ("tcp: increase source port perturb table to 2^16")
      
      get_random_once() uses DO_ONCE(), which blocks hard irqs for the duration
      of the operation.
      
      This patch adds DO_ONCE_SLOW() which uses a mutex instead of a spinlock
      for operations where we prefer to stay in process context.
      
      Then __inet_hash_connect() can use get_random_slow_once()
      to populate its perturbation table.
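
The pattern of guarding a one-time, possibly slow initialization with a mutex can be sketched in user space. This is an analogy only: `do_once_slow_sketch()` and `count_calls()` are hypothetical names, a POSIX mutex and a C11 atomic flag stand in for the kernel primitives, and the kernel's actual macro is implemented differently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* A sleeping lock: waiters block instead of spinning with irqs disabled. */
static pthread_mutex_t once_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Run fn(arg) at most once per 'done' flag. Returns true only for the
 * caller that actually ran the initializer.
 */
static bool do_once_slow_sketch(atomic_bool *done, void (*fn)(void *),
				void *arg)
{
	bool ran = false;

	assert(done != NULL && fn != NULL);

	/* Fast path: already initialized, no lock needed. */
	if (atomic_load_explicit(done, memory_order_acquire))
		return false;

	pthread_mutex_lock(&once_mutex);
	/* Re-check under the lock in case another caller won the race. */
	if (!atomic_load_explicit(done, memory_order_relaxed)) {
		fn(arg);
		atomic_store_explicit(done, true, memory_order_release);
		ran = true;
	}
	pthread_mutex_unlock(&once_mutex);
	return ran;
}

/* Example initializer: counts how often it runs. */
static void count_calls(void *arg)
{
	(*(int *)arg)++;
}
```

Because the lock may sleep, a helper like this is only appropriate in process context, which matches the restriction stated above.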
      
      Fixes: 4c2c8f03 ("tcp: increase source port perturb table to 2^16")
      Fixes: 190cc824 ("tcp: change source port randomizarion at connect() time")
      Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Link: https://lore.kernel.org/netdev/CANn89iLAEYBaoYajy0Y9UmGFff5GPxDUoG-ErVB2jDdRNQ5Tug@mail.gmail.com/T/#t
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Willy Tarreau <w@1wt.eu>
      Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      
      One conflict occurs because the commit 4c2c8f03 ("tcp: increase
      source port perturb table to 2^16") is integrated but the commit
      e9261476 ("tcp: dynamically allocate the perturb table used by
      source port") is not integrated.
      One conflict occurs because the commit 1027b96e ("once: Fix panic
      when module unload") is not integrated.
      
      Conflicts:
      	net/ipv4/inet_hashtables.c
      	lib/once.c
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3743e9b5
    • E
      inet: fully convert sk->sk_rx_dst to RCU rules · 4c4298bf
      Eric Dumazet committed
      stable inclusion
      from stable-v4.19.262
      commit 75a578000ae5e511e5d0e8433c94a14d9c99c412
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit 8f905c0e upstream.
      
      syzbot reported various issues around early demux,
      one being included in this changelog [1]
      
      sk->sk_rx_dst is using RCU protection without clearly
      documenting it.
      
      And following sequences in tcp_v4_do_rcv()/tcp_v6_do_rcv()
      are not following standard RCU rules.
      
      [a]    dst_release(dst);
      [b]    sk->sk_rx_dst = NULL;
      
      They look wrong because a delete operation of RCU protected
      pointer is supposed to clear the pointer before
      the call_rcu()/synchronize_rcu() guarding actual memory freeing.
      
      In some cases indeed, dst could be freed before [b] is done.
      
      We could cheat by clearing sk_rx_dst before calling
      dst_release(), but this seems the right time to stick
      to standard RCU annotations and debugging facilities.
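
The ordering problem between [a] and [b] can be illustrated with a user-space analogy. This is not RCU and not the kernel fix: `dst_sketch`, `sock_sketch` and `sock_reset_rx_dst()` are hypothetical names, and C11 atomics stand in for the kernel's pointer-publication helpers. The sketch only shows the general rule of unpublishing a shared pointer before dropping the reference it held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical reference-counted object cached by a socket. */
struct dst_sketch {
	atomic_int refcnt;
};

/* Hypothetical socket holding one reference through rx_dst. */
struct sock_sketch {
	_Atomic(struct dst_sketch *) rx_dst;
};

/* Drop one reference; a real implementation would free at zero. */
static void dst_release_sketch(struct dst_sketch *dst)
{
	if (dst != NULL)
		atomic_fetch_sub(&dst->refcnt, 1);
}

/*
 * Clear the published pointer first, then release the old object, so
 * that no new reader can pick up a pointer whose reference is gone.
 */
static void sock_reset_rx_dst(struct sock_sketch *sk)
{
	struct dst_sketch *old;

	assert(sk != NULL);
	old = atomic_exchange(&sk->rx_dst, (struct dst_sketch *)NULL);
	dst_release_sketch(old);
}
```

Calling `sock_reset_rx_dst()` a second time is harmless in this sketch, because the exchange returns NULL and nothing is released twice.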
      
      [1]
      BUG: KASAN: use-after-free in dst_check include/net/dst.h:470 [inline]
      BUG: KASAN: use-after-free in tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
      Read of size 2 at addr ffff88807f1cb73a by task syz-executor.5/9204
      
      CPU: 0 PID: 9204 Comm: syz-executor.5 Not tainted 5.16.0-rc5-syzkaller #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106
       print_address_description.constprop.0.cold+0x8d/0x320 mm/kasan/report.c:247
       __kasan_report mm/kasan/report.c:433 [inline]
       kasan_report.cold+0x83/0xdf mm/kasan/report.c:450
       dst_check include/net/dst.h:470 [inline]
       tcp_v4_early_demux+0x95b/0x960 net/ipv4/tcp_ipv4.c:1792
       ip_rcv_finish_core.constprop.0+0x15de/0x1e80 net/ipv4/ip_input.c:340
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
       invoke_softirq kernel/softirq.c:432 [inline]
       __irq_exit_rcu+0x123/0x180 kernel/softirq.c:637
       irq_exit_rcu+0x5/0x20 kernel/softirq.c:649
       common_interrupt+0x52/0xc0 arch/x86/kernel/irq.c:240
       asm_common_interrupt+0x1e/0x40 arch/x86/include/asm/idtentry.h:629
      RIP: 0033:0x7f5e972bfd57
      Code: 39 d1 73 14 0f 1f 80 00 00 00 00 48 8b 50 f8 48 83 e8 08 48 39 ca 77 f3 48 39 c3 73 3e 48 89 13 48 8b 50 f8 48 89 38 49 8b 0e <48> 8b 3e 48 83 c3 08 48 83 c6 08 eb bc 48 39 d1 72 9e 48 39 d0 73
      RSP: 002b:00007fff8a413210 EFLAGS: 00000283
      RAX: 00007f5e97108990 RBX: 00007f5e97108338 RCX: ffffffff81d3aa45
      RDX: ffffffff81d3aa45 RSI: 00007f5e97108340 RDI: ffffffff81d3aa45
      RBP: 00007f5e97107eb8 R08: 00007f5e97108d88 R09: 0000000093c2e8d9
      R10: 0000000000000000 R11: 0000000000000000 R12: 00007f5e97107eb0
      R13: 00007f5e97108338 R14: 00007f5e97107ea8 R15: 0000000000000019
       </TASK>
      
      Allocated by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x90/0xc0 mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x202/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       ip_route_input_slow+0x1817/0x3a20 net/ipv4/route.c:2340
       ip_route_input_rcu net/ipv4/route.c:2470 [inline]
       ip_route_input_noref+0x116/0x2a0 net/ipv4/route.c:2415
       ip_rcv_finish_core.constprop.0+0x288/0x1e80 net/ipv4/ip_input.c:354
       ip_list_rcv_finish.constprop.0+0x1b2/0x6e0 net/ipv4/ip_input.c:583
       ip_sublist_rcv net/ipv4/ip_input.c:609 [inline]
       ip_list_rcv+0x34e/0x490 net/ipv4/ip_input.c:644
       __netif_receive_skb_list_ptype net/core/dev.c:5508 [inline]
       __netif_receive_skb_list_core+0x549/0x8e0 net/core/dev.c:5556
       __netif_receive_skb_list net/core/dev.c:5608 [inline]
       netif_receive_skb_list_internal+0x75e/0xd80 net/core/dev.c:5699
       gro_normal_list net/core/dev.c:5853 [inline]
       gro_normal_list net/core/dev.c:5849 [inline]
       napi_complete_done+0x1f1/0x880 net/core/dev.c:6590
       virtqueue_napi_complete drivers/net/virtio_net.c:339 [inline]
       virtnet_poll+0xca2/0x11b0 drivers/net/virtio_net.c:1557
       __napi_poll+0xaf/0x440 net/core/dev.c:7023
       napi_poll net/core/dev.c:7090 [inline]
       net_rx_action+0x801/0xb40 net/core/dev.c:7177
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Freed by task 13:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       kasan_set_track+0x21/0x30 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:370
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xff/0x130 mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:235 [inline]
       slab_free_hook mm/slub.c:1723 [inline]
       slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1749
       slab_free mm/slub.c:3513 [inline]
       kmem_cache_free+0xbd/0x5d0 mm/slub.c:3530
       dst_destroy+0x2d6/0x3f0 net/core/dst.c:127
       rcu_do_batch kernel/rcu/tree.c:2506 [inline]
       rcu_core+0x7ab/0x1470 kernel/rcu/tree.c:2741
       __do_softirq+0x29b/0x9c2 kernel/softirq.c:558
      
      Last potentially related work creation:
       kasan_save_stack+0x1e/0x50 mm/kasan/common.c:38
       __kasan_record_aux_stack+0xf5/0x120 mm/kasan/generic.c:348
       __call_rcu kernel/rcu/tree.c:2985 [inline]
       call_rcu+0xb1/0x740 kernel/rcu/tree.c:3065
       dst_release net/core/dst.c:177 [inline]
       dst_release+0x79/0xe0 net/core/dst.c:167
       tcp_v4_do_rcv+0x612/0x8d0 net/ipv4/tcp_ipv4.c:1712
       sk_backlog_rcv include/net/sock.h:1030 [inline]
       __release_sock+0x134/0x3b0 net/core/sock.c:2768
       release_sock+0x54/0x1b0 net/core/sock.c:3300
       tcp_sendmsg+0x36/0x40 net/ipv4/tcp.c:1441
       inet_sendmsg+0x99/0xe0 net/ipv4/af_inet.c:819
       sock_sendmsg_nosec net/socket.c:704 [inline]
       sock_sendmsg+0xcf/0x120 net/socket.c:724
       sock_write_iter+0x289/0x3c0 net/socket.c:1057
       call_write_iter include/linux/fs.h:2162 [inline]
       new_sync_write+0x429/0x660 fs/read_write.c:503
       vfs_write+0x7cd/0xae0 fs/read_write.c:590
       ksys_write+0x1ee/0x250 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
      
      The buggy address belongs to the object at ffff88807f1cb700
       which belongs to the cache ip_dst_cache of size 176
      The buggy address is located 58 bytes inside of
       176-byte region [ffff88807f1cb700, ffff88807f1cb7b0)
      The buggy address belongs to the page:
      page:ffffea0001fc72c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x7f1cb
      flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
      raw: 00fff00000000200 dead000000000100 dead000000000122 ffff8881413bb780
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      page_owner tracks the page as allocated
      page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 5, ts 108466983062, free_ts 108048976062
       prep_new_page mm/page_alloc.c:2418 [inline]
       get_page_from_freelist+0xa72/0x2f50 mm/page_alloc.c:4149
       __alloc_pages+0x1b2/0x500 mm/page_alloc.c:5369
       alloc_pages+0x1a7/0x300 mm/mempolicy.c:2191
       alloc_slab_page mm/slub.c:1793 [inline]
       allocate_slab mm/slub.c:1930 [inline]
       new_slab+0x32d/0x4a0 mm/slub.c:1993
       ___slab_alloc+0x918/0xfe0 mm/slub.c:3022
       __slab_alloc.constprop.0+0x4d/0xa0 mm/slub.c:3109
       slab_alloc_node mm/slub.c:3200 [inline]
       slab_alloc mm/slub.c:3242 [inline]
       kmem_cache_alloc+0x35c/0x3a0 mm/slub.c:3247
       dst_alloc+0x146/0x1f0 net/core/dst.c:92
       rt_dst_alloc+0x73/0x430 net/ipv4/route.c:1613
       __mkroute_output net/ipv4/route.c:2564 [inline]
       ip_route_output_key_hash_rcu+0x921/0x2d00 net/ipv4/route.c:2791
       ip_route_output_key_hash+0x18b/0x300 net/ipv4/route.c:2619
       __ip_route_output_key include/net/route.h:126 [inline]
       ip_route_output_flow+0x23/0x150 net/ipv4/route.c:2850
       ip_route_output_key include/net/route.h:142 [inline]
       geneve_get_v4_rt+0x3a6/0x830 drivers/net/geneve.c:809
       geneve_xmit_skb drivers/net/geneve.c:899 [inline]
       geneve_xmit+0xc4a/0x3540 drivers/net/geneve.c:1082
       __netdev_start_xmit include/linux/netdevice.h:4994 [inline]
       netdev_start_xmit include/linux/netdevice.h:5008 [inline]
       xmit_one net/core/dev.c:3590 [inline]
       dev_hard_start_xmit+0x1eb/0x920 net/core/dev.c:3606
       __dev_queue_xmit+0x299a/0x3650 net/core/dev.c:4229
      page last free stack trace:
       reset_page_owner include/linux/page_owner.h:24 [inline]
       free_pages_prepare mm/page_alloc.c:1338 [inline]
       free_pcp_prepare+0x374/0x870 mm/page_alloc.c:1389
       free_unref_page_prepare mm/page_alloc.c:3309 [inline]
       free_unref_page+0x19/0x690 mm/page_alloc.c:3388
       qlink_free mm/kasan/quarantine.c:146 [inline]
       qlist_free_all+0x5a/0xc0 mm/kasan/quarantine.c:165
       kasan_quarantine_reduce+0x180/0x200 mm/kasan/quarantine.c:272
       __kasan_slab_alloc+0xa2/0xc0 mm/kasan/common.c:444
       kasan_slab_alloc include/linux/kasan.h:259 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3234 [inline]
       kmem_cache_alloc_node+0x255/0x3f0 mm/slub.c:3270
       __alloc_skb+0x215/0x340 net/core/skbuff.c:414
       alloc_skb include/linux/skbuff.h:1126 [inline]
       alloc_skb_with_frags+0x93/0x620 net/core/skbuff.c:6078
       sock_alloc_send_pskb+0x783/0x910 net/core/sock.c:2575
       mld_newpack+0x1df/0x770 net/ipv6/mcast.c:1754
       add_grhead+0x265/0x330 net/ipv6/mcast.c:1857
       add_grec+0x1053/0x14e0 net/ipv6/mcast.c:1995
       mld_send_initial_cr.part.0+0xf6/0x230 net/ipv6/mcast.c:2242
       mld_send_initial_cr net/ipv6/mcast.c:1232 [inline]
       mld_dad_work+0x1d3/0x690 net/ipv6/mcast.c:2268
       process_one_work+0x9b2/0x1690 kernel/workqueue.c:2298
       worker_thread+0x658/0x11f0 kernel/workqueue.c:2445
      
      Memory state around the buggy address:
       ffff88807f1cb600: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff88807f1cb680: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
      >ffff88807f1cb700: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
       ffff88807f1cb780: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
       ffff88807f1cb800: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fixes: 41063e9d ("ipv4: Early TCP socket demux.")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Link: https://lore.kernel.org/r/20211220143330.680945-1-eric.dumazet@gmail.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      [cmllamas: fixed trivial merge conflict]
      Signed-off-by: Carlos Llamas <cmllamas@google.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      Conflicts:
      	net/ipv4/af_inet.c
      Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      4c4298bf
    • J
      ext4: continue to expand file system when the target size doesn't reach · 3b8b0d3f
      Jerry Lee 李修賢 committed
      stable inclusion
      from stable-v4.19.262
      commit f2180ad6a43501597d20eacad0c6f146c51d4bbd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      commit df3cb754 upstream.
      
      When expanding a file system from (16TiB-2MiB) to 18TiB, the operation
      exits early, which leaves resize2fs and the ext4 kernel driver reporting
      inconsistent results.
      
      === before ===
      ○ → resize2fs /dev/mapper/thin
      resize2fs 1.45.5 (07-Jan-2020)
      Filesystem at /dev/mapper/thin is mounted on /mnt/test; on-line resizing required
      old_desc_blocks = 2048, new_desc_blocks = 2304
      The filesystem on /dev/mapper/thin is now 4831837696 (4k) blocks long.
      
      [  865.186308] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
      [  912.091502] dm-4: detected capacity change from 34359738368 to 38654705664
      [  970.030550] dm-5: detected capacity change from 34359734272 to 38654701568
      [ 1000.012751] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
      [ 1000.012878] EXT4-fs (dm-5): resized filesystem to 4294967296
      
      === after ===
      [  129.104898] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
      [  143.773630] dm-4: detected capacity change from 34359738368 to 38654705664
      [  198.203246] dm-5: detected capacity change from 34359734272 to 38654701568
      [  207.918603] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
      [  207.918754] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
      [  207.918758] EXT4-fs (dm-5): Converting file system to meta_bg
      [  207.918790] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
      [  221.454050] EXT4-fs (dm-5): resized to 4658298880 blocks
      [  227.634613] EXT4-fs (dm-5): resized filesystem to 4831837696
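
The block counts in the logs above follow directly from the sizes in the description when the block size is 4 KiB. A small arithmetic sketch (the helper name `bytes_to_blocks()` and the constants are hypothetical, introduced only for this illustration):

```c
#include <assert.h>
#include <stdint.h>

/* 4 KiB blocks, as reported by resize2fs in the log above. */
#define SKETCH_BLOCK_BYTES	4096ULL
#define SKETCH_MIB		(1ULL << 20)
#define SKETCH_TIB		(1ULL << 40)

/* Convert a size in bytes to a count of 4 KiB blocks. */
static uint64_t bytes_to_blocks(uint64_t bytes)
{
	assert(bytes % SKETCH_BLOCK_BYTES == 0);
	return bytes / SKETCH_BLOCK_BYTES;
}
```

With these definitions, 16TiB-2MiB corresponds to 4294966784 blocks, 16TiB to 4294967296 blocks (2^32, where the "before" run stopped), and 18TiB-2MiB to 4831837696 blocks (the requested target).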

      Signed-off-by: Jerry Lee <jerrylee@qnap.com>
      Link: https://lore.kernel.org/r/PU1PR04MB22635E739BD21150DC182AC6A18C9@PU1PR04MB2263.apcprd04.prod.outlook.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      3b8b0d3f
    • K
      nvme: copy firmware_rev on each init · 8f5816c2
      Keith Busch committed
      stable inclusion
      from stable-v4.19.262
      commit 366a2b3110c69f919fb3277acc1a0bb8cd8a8dbd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5ZXGL
      CVE: NA
      
      --------------------------------
      
      [ Upstream commit a8eb6c1b ]
      
      The firmware revision can change after a reset so copy the most
      recent info each time instead of just the first time, otherwise the
      sysfs firmware_rev entry may contain stale data.

      Reported-by: Jeff Lien <jeff.lien@wdc.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
      Reviewed-by: Chao Leng <lengchao@huawei.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
      8f5816c2