1. 12 Apr, 2023 32 commits
    • dccp: Call inet6_destroy_sock() via sk->sk_destruct(). · fb3defd5
      Committed by Kuniyuki Iwashima
      mainline inclusion
      from mainline-v6.2-rc1
      commit 1651951e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=1651951ebea54970e0bda60c638fc2eee7a6218f
      
      --------------------------------
      
      After commit d38afeec ("tcp/udp: Call inet6_destroy_sock()
      in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
      sk->sk_destruct() by setting inet6_sock_destruct() to it to make
      sure we do not leak inet6-specific resources.
      
      DCCP sets its own sk->sk_destruct() in the dccp_init_sock(), and
      DCCPv6 socket shares it by calling the same init function via
      dccp_v6_init_sock().
      
      To call inet6_sock_destruct() from DCCPv6 sk->sk_destruct(), we
      export it and set dccp_v6_sk_destruct() in the init function.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      fb3defd5
    • net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues(). · 0e6c199a
      Committed by Kuniyuki Iwashima
      stable inclusion
      from stable-v5.10.171
      commit 3e4bbd1f38a8d35bd2d3aaffdb5f6ada546b669a
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3e4bbd1f38a8d35bd2d3aaffdb5f6ada546b669a
      
      --------------------------------
      
      commit 62ec33b4 upstream.
      
      Christoph Paasch reported that commit b5fc2923 ("inet6: Remove
      inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering
      WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues().  [0 - 2]
      Also, we can reproduce it by a program in [3].
      
      In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy()
      to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in
      inet_csk_destroy_sock().
      
      The same check has been in inet_sock_destruct() since at least v2.6,
      so we can simply remove the WARN_ON_ONCE().  However, among the users
      of sk_stream_kill_queues(), only CAIF does not call inet_sock_destruct().
      Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor().
      
      [0]: https://lore.kernel.org/netdev/39725AB4-88F1-41B3-B07F-949C5CAEFF4F@icloud.com/
      [1]: https://github.com/multipath-tcp/mptcp_net-next/issues/341
      [2]:
      WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0
      Modules linked in:
      CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 #2
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
      RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0
      Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9
      RSP: 0018:ffff88810570fc68 EFLAGS: 00010293
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005
      RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000
      R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488
      R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458
      FS:  00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0
      Call Trace:
       <TASK>
       inet_csk_destroy_sock+0x1a1/0x320
       __tcp_close+0xab6/0xe90
       tcp_close+0x30/0xc0
       inet_release+0xe9/0x1f0
       inet6_release+0x4c/0x70
       __sock_release+0xd2/0x280
       sock_close+0x15/0x20
       __fput+0x252/0xa20
       task_work_run+0x169/0x250
       exit_to_user_mode_prepare+0x113/0x120
       syscall_exit_to_user_mode+0x1d/0x40
       do_syscall_64+0x48/0x90
       entry_SYSCALL_64_after_hwframe+0x72/0xdc
      RIP: 0033:0x7f7fdf7ae28d
      Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
      RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
      RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d
      RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003
      RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f
      R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000
      R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8
       </TASK>
      
      [3]: https://lore.kernel.org/netdev/20230208004245.83497-1-kuniyu@amazon.com/
      
      Fixes: b5fc2923 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().")
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reported-by: Christoph Paasch <christophpaasch@icloud.com>
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      0e6c199a
    • inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). · 0c0b5ebd
      Committed by Kuniyuki Iwashima
      mainline inclusion
      from mainline-v6.2-rc1
      commit b5fc2923
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=b5fc29233d28be7a3322848ebe73ac327559cdb9
      
      --------------------------------
      
      After commit d38afeec ("tcp/udp: Call inet6_destroy_sock()
      in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
      sk->sk_destruct() by setting inet6_sock_destruct() to it to make
      sure we do not leak inet6-specific resources.
      
      Now we can remove unnecessary inet6_destroy_sock() calls in
      sk->sk_prot->destroy().
      
      DCCP and SCTP have their own sk->sk_destruct() function, so we
      change them separately in the following patches.
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Conflicts:
      	net/ipv6/ping.c
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      0c0b5ebd
    • tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). · da9581d8
      Committed by Kuniyuki Iwashima
      mainline inclusion
      from mainline-v6.1-rc1
      commit d38afeec
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=d38afeec26ed4739c640bf286c270559aab2ba5f
      
      --------------------------------
      
      Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
      able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
      IPv4 conversion by IPV6_ADDRFORM.  However, commit 03485f2a ("udpv6:
      Add lockless sendmsg() support") added a lockless memory allocation path,
      which could cause a memory leak:
      
      setsockopt(IPV6_ADDRFORM)                 sendmsg()
      +-----------------------+                 +-------+
      - do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
        - sockopt_lock_sock(sk)                   ^._ called via udpv6_prot
          - lock_sock(sk)                             before WRITE_ONCE()
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)
        - inet6_destroy_sock()                    - if (!corkreq)
        - sockopt_release_sock(sk)                  - ip6_make_skb(sk, ...)
          - release_sock(sk)                          ^._ lockless fast path for
                                                          the non-corking case
      
                                                      - __ip6_append_data(sk, ...)
                                                        - ipv6_local_rxpmtu(sk, ...)
                                                          - xchg(&np->rxpmtu, skb)
                                                            ^._ rxpmtu is never freed.
      
                                                      - goto out_no_dst;
      
                                                  - lock_sock(sk)
      
      For now, rxpmtu is the only such case, but to avoid missing future
      changes and bugs similar to the one fixed in commit e2732600 ("net:
      ping6: Fix memleak in ipv6_renew_options()."), let's set a new function
      as the IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there.
      Since the conversion does not change sk->sk_destruct(), we can
      guarantee that IPv6 resources are eventually cleaned up.
      
      We can now remove all inet6_destroy_sock() calls from IPv6 protocol
      specific ->destroy() functions, but such changes are invasive to
      backport.  So they can be posted as a follow-up later for net-next.
      
      Fixes: 03485f2a ("udpv6: Add lockless sendmsg() support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      da9581d8
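The leak described in the commit message above, and why hooking sk->sk_destruct() closes it, can be modelled in user space. This is only a sketch under stated assumptions: `struct sock`, `struct proto`, and every `model_*` name here are simplified stand-ins invented for illustration, not the kernel's definitions, and the race is replayed sequentially rather than with real threads.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct sock;
struct proto {
    void (*destroy)(struct sock *sk);
};

struct sock {
    const struct proto *sk_prot;          /* swapped by the IPV6_ADDRFORM model */
    void (*sk_destruct)(struct sock *sk); /* not touched by the conversion */
    void *rxpmtu;                         /* models inet6_sk(sk)->rxpmtu */
};

int model_live_allocs; /* outstanding rxpmtu allocations */

/* Models the IPv6 cleanup: free rxpmtu if it is set. */
void model_inet6_cleanup(struct sock *sk)
{
    if (sk->rxpmtu) {
        free(sk->rxpmtu);
        sk->rxpmtu = NULL;
        model_live_allocs--;
    }
}

void model_v6_destroy(struct sock *sk) { model_inet6_cleanup(sk); }
void model_v4_destroy(struct sock *sk) { (void)sk; }

/* Destructor before the fix: knows nothing about IPv6 resources. */
void model_plain_destruct(struct sock *sk) { (void)sk; }
/* Destructor after the fix: always runs the IPv6 cleanup. */
void model_v6_destruct(struct sock *sk) { model_inet6_cleanup(sk); }

const struct proto model_udpv6_prot = { model_v6_destroy };
const struct proto model_udp_prot = { model_v4_destroy };

/* Replays the interleaving and returns the number of leaked allocations. */
int model_run(void (*destruct)(struct sock *sk))
{
    struct sock sk = { &model_udpv6_prot, destruct, NULL };
    int leaked;

    model_live_allocs = 0;

    /* setsockopt(IPV6_ADDRFORM): clean up, then switch to the IPv4 proto. */
    model_inet6_cleanup(&sk);
    sk.sk_prot = &model_udp_prot;

    /* A sendmsg() that entered via the IPv6 proto stores rxpmtu late. */
    sk.rxpmtu = malloc(16);
    if (sk.rxpmtu)
        model_live_allocs++;

    /* Socket teardown: protocol destroy hook, then the destructor. */
    sk.sk_prot->destroy(&sk);
    sk.sk_destruct(&sk);

    leaked = model_live_allocs;
    free(sk.rxpmtu); /* keep the demo itself leak-free */
    return leaked;
}
```

Calling `model_run(model_plain_destruct)` reports one leaked allocation, while `model_run(model_v6_destruct)` reports none, because the destructor pointer survives the protocol switch.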
    • udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM). · 9e53e70d
      Committed by Kuniyuki Iwashima
      mainline inclusion
      from mainline-v6.1-rc1
      commit 21985f43
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=21985f43376cee092702d6cb963ff97a9d2ede68
      
      --------------------------------
      
      Commit 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support") forgot
      to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
      socket into IPv4 with IPV6_ADDRFORM.  After conversion, sk_prot is
      changed to udp_prot and ->destroy() never cleans it up, resulting in
      a memory leak.
      
      This is due to the discrepancy between inet6_destroy_sock() and
      IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
      to remove the difference.
      
      However, this is not enough for now because rxpmtu can be changed
      without lock_sock() after commit 03485f2a ("udpv6: Add lockless
      sendmsg() support").  We will fix this case in the following patch.
      
      Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
      remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
      in the future.
      
      Fixes: 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      9e53e70d
    • 9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition · e0c18fc3
      Committed by Zheng Wang
      maillist inclusion
      category: bugfix
      bugzilla: 188655, https://gitee.com/src-openeuler/kernel/issues/I6T36H
      CVE: CVE-2023-1859
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ea4f1009408efb4989a0f139b70fb338e7f687d0
      
      ----------------------------------------
      
      In xen_9pfs_front_probe, xen_9pfs_front_alloc_dataring is called
      to initialize priv->rings and bind &ring->work to p9_xen_response.
      
      When xen_9pfs_front_event_handler runs to handle IRQ requests,
      it eventually calls schedule_work to start the work.
      
      When xen_9pfs_front_remove is called to remove the driver, the
      following sequence may occur:
      
      CPU0                  CPU1
      
                           |p9_xen_response
      xen_9pfs_front_remove|
        xen_9pfs_front_free|
      kfree(priv)          |
      //free priv          |
                           |p9_tag_lookup
                           |//use priv->client
      
      Fix it by finishing the work before cleanup in xen_9pfs_front_free.
      
      Note that this bug was found by static analysis, so it might be a
      false positive.
      
      Fixes: 71ebd719 ("xen/9pfs: connect to the backend")
      Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
      Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@kernel.org>
      Signed-off-by: Lu Wei <luwei32@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      e0c18fc3
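The ordering rule in the commit message above (finish outstanding work before freeing the object it uses) can be illustrated with a single-threaded model. This is a sketch only: `struct model_priv` and all `model_*` functions are hypothetical names, and "finishing" the work is modelled by running it synchronously, which is an assumption about how the real fix behaves rather than a description of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the driver's private data. */
struct model_priv {
    bool work_pending; /* models a scheduled &ring->work */
    int responses;     /* state the work handler touches */
};

/* Models the response handler: it dereferences priv. */
void model_response(struct model_priv *priv)
{
    priv->responses++;
}

/* Models "finish the work": run anything still queued, then clear it. */
void model_finish_work(struct model_priv *priv)
{
    if (priv->work_pending) {
        model_response(priv);
        priv->work_pending = false;
    }
}

/* Models probe followed by an IRQ that schedules the work. */
struct model_priv *model_probe_with_irq(void)
{
    struct model_priv *priv = calloc(1, sizeof(*priv));

    if (priv)
        priv->work_pending = true;
    return priv;
}

/*
 * Models removal. Returns true if work was still queued when priv was
 * freed, i.e. a later run of the handler would touch freed memory.
 */
bool model_remove(struct model_priv *priv, bool finish_first)
{
    bool dangling;

    if (!priv)
        return false;
    if (finish_first)
        model_finish_work(priv);
    dangling = priv->work_pending;
    free(priv);
    return dangling;
}
```

With `finish_first` set to false the model reports dangling work at free time; with it set to true the work has already run, so nothing can touch `priv` afterwards.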
    • ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size · 2893797b
      Committed by Zhihao Cheng
      maillist inclusion
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6U6XK
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=1e020e1b96afdecd20680b5b5be2a6ffc3d27628
      
      --------------------------------
      
      The following process makes UBI fail to attach since commit
      1b42b1a3 ("ubi: ensure that VID header offset ... size"):
      
      ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
      modprobe nandsim id_bytes=$ID
      flash_eraseall /dev/mtd0
      modprobe ubi mtd="0,2048"  # set vid_hdr offset as 2048 (one page)
      (dmesg):
        ubi0 error: ubi_attach_mtd_dev [ubi]: VID header offset 2048 too large.
        UBI error: cannot attach mtd0
        UBI error: cannot initialize UBI, error -22
      
      Rework the original solution. The key point is making sure that
      'vid_hdr_shift + UBI_VID_HDR_SIZE < ubi->vid_hdr_alsize',
      so we should check vid_hdr_shift rather than vid_hdr_offset.
      Then UBI still supports a (sub)page-aligned VID header offset.
      
      Fixes: 1b42b1a3 ("ubi: ensure that VID header offset ... size")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Tested-by: Nicolas Schichan <nschichan@freebox.fr>
      Tested-by: Miquel Raynal <miquel.raynal@bootlin.com> # v5.10, v4.19
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
      Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      2893797b
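The difference between checking vid_hdr_offset and checking vid_hdr_shift can be worked through numerically. The sketch below rests on several assumptions that are not stated in the commit message: a 64-byte VID header (`MODEL_VID_HDR_SIZE`), a power-of-two minimum I/O size, the aligned-offset, shift, and aligned-size derivation shown in `model_layout`, and an inclusive bound (`<=`, following the title of the earlier commit 1b42b1a3). All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_VID_HDR_SIZE 64u /* assumed size of the VID header in bytes */

struct model_vid_layout {
    uint32_t aloffset; /* offset rounded down to the min I/O unit */
    uint32_t shift;    /* position of the header inside that unit */
    uint32_t alsize;   /* header size rounded up to the min I/O unit */
};

/* min_io must be a power of two (e.g. the NAND page or sub-page size). */
struct model_vid_layout model_layout(uint32_t vid_hdr_offset, uint32_t min_io)
{
    struct model_vid_layout l;

    l.aloffset = vid_hdr_offset & ~(min_io - 1);
    l.shift = vid_hdr_offset - l.aloffset;
    l.alsize = (MODEL_VID_HDR_SIZE + min_io - 1) & ~(min_io - 1);
    return l;
}

/* Check based on the absolute offset: rejects a page-aligned offset. */
bool model_check_by_offset(uint32_t vid_hdr_offset, uint32_t min_io)
{
    struct model_vid_layout l = model_layout(vid_hdr_offset, min_io);

    return vid_hdr_offset + MODEL_VID_HDR_SIZE <= l.alsize;
}

/* Check based on the shift inside the aligned buffer. */
bool model_check_by_shift(uint32_t vid_hdr_offset, uint32_t min_io)
{
    struct model_vid_layout l = model_layout(vid_hdr_offset, min_io);

    return l.shift + MODEL_VID_HDR_SIZE <= l.alsize;
}
```

For the reproducer's values (offset 2048, page size 2048) the shift is 0, so the shift-based check passes while the offset-based check computes 2112 > 2048 and rejects. An offset of 4048 with the same page size gives a shift of 2000, and 2064 > 2048 is still rejected, so the out-of-bounds case remains covered.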
    • ubi: ensure that VID header offset + VID header size <= alloc, size · 9441f997
      Committed by George Kennedy
      stable inclusion
      from stable-v5.10.173
      commit 846bfba34175c23b13cc2023c2d67b96e8c14c43
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6PMQI
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=846bfba34175c23b13cc2023c2d67b96e8c14c43
      
      --------------------------------
      
      [ Upstream commit 1b42b1a3 ]
      
      Ensure that the VID header offset + VID header size does not exceed
      the allocated area to avoid slab OOB.
      
      BUG: KASAN: slab-out-of-bounds in crc32_body lib/crc32.c:111 [inline]
      BUG: KASAN: slab-out-of-bounds in crc32_le_generic lib/crc32.c:179 [inline]
      BUG: KASAN: slab-out-of-bounds in crc32_le_base+0x58c/0x626 lib/crc32.c:197
      Read of size 4 at addr ffff88802bb36f00 by task syz-executor136/1555
      
      CPU: 2 PID: 1555 Comm: syz-executor136 Tainted: G        W
      6.0.0-1868 #1
      Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
      04/01/2014
      Call Trace:
        <TASK>
        __dump_stack lib/dump_stack.c:88 [inline]
        dump_stack_lvl+0x85/0xad lib/dump_stack.c:106
        print_address_description mm/kasan/report.c:317 [inline]
        print_report.cold.13+0xb6/0x6bb mm/kasan/report.c:433
        kasan_report+0xa7/0x11b mm/kasan/report.c:495
        crc32_body lib/crc32.c:111 [inline]
        crc32_le_generic lib/crc32.c:179 [inline]
        crc32_le_base+0x58c/0x626 lib/crc32.c:197
        ubi_io_write_vid_hdr+0x1b7/0x472 drivers/mtd/ubi/io.c:1067
        create_vtbl+0x4d5/0x9c4 drivers/mtd/ubi/vtbl.c:317
        create_empty_lvol drivers/mtd/ubi/vtbl.c:500 [inline]
        ubi_read_volume_table+0x67b/0x288a drivers/mtd/ubi/vtbl.c:812
        ubi_attach+0xf34/0x1603 drivers/mtd/ubi/attach.c:1601
        ubi_attach_mtd_dev+0x6f3/0x185e drivers/mtd/ubi/build.c:965
        ctrl_cdev_ioctl+0x2db/0x347 drivers/mtd/ubi/cdev.c:1043
        vfs_ioctl fs/ioctl.c:51 [inline]
        __do_sys_ioctl fs/ioctl.c:870 [inline]
        __se_sys_ioctl fs/ioctl.c:856 [inline]
        __x64_sys_ioctl+0x193/0x213 fs/ioctl.c:856
        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
        do_syscall_64+0x3e/0x86 arch/x86/entry/common.c:80
        entry_SYSCALL_64_after_hwframe+0x63/0x0
      RIP: 0033:0x7f96d5cf753d
      Code:
      RSP: 002b:00007fffd72206f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f96d5cf753d
      RDX: 0000000020000080 RSI: 0000000040186f40 RDI: 0000000000000003
      RBP: 0000000000400cd0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400be0
      R13: 00007fffd72207e0 R14: 0000000000000000 R15: 0000000000000000
        </TASK>
      
      Allocated by task 1555:
        kasan_save_stack+0x20/0x3d mm/kasan/common.c:38
        kasan_set_track mm/kasan/common.c:45 [inline]
        set_alloc_info mm/kasan/common.c:437 [inline]
        ____kasan_kmalloc mm/kasan/common.c:516 [inline]
        __kasan_kmalloc+0x88/0xa3 mm/kasan/common.c:525
        kasan_kmalloc include/linux/kasan.h:234 [inline]
        __kmalloc+0x138/0x257 mm/slub.c:4429
        kmalloc include/linux/slab.h:605 [inline]
        ubi_alloc_vid_buf drivers/mtd/ubi/ubi.h:1093 [inline]
        create_vtbl+0xcc/0x9c4 drivers/mtd/ubi/vtbl.c:295
        create_empty_lvol drivers/mtd/ubi/vtbl.c:500 [inline]
        ubi_read_volume_table+0x67b/0x288a drivers/mtd/ubi/vtbl.c:812
        ubi_attach+0xf34/0x1603 drivers/mtd/ubi/attach.c:1601
        ubi_attach_mtd_dev+0x6f3/0x185e drivers/mtd/ubi/build.c:965
        ctrl_cdev_ioctl+0x2db/0x347 drivers/mtd/ubi/cdev.c:1043
        vfs_ioctl fs/ioctl.c:51 [inline]
        __do_sys_ioctl fs/ioctl.c:870 [inline]
        __se_sys_ioctl fs/ioctl.c:856 [inline]
        __x64_sys_ioctl+0x193/0x213 fs/ioctl.c:856
        do_syscall_x64 arch/x86/entry/common.c:50 [inline]
        do_syscall_64+0x3e/0x86 arch/x86/entry/common.c:80
        entry_SYSCALL_64_after_hwframe+0x63/0x0
      
      The buggy address belongs to the object at ffff88802bb36e00
        which belongs to the cache kmalloc-256 of size 256
      The buggy address is located 0 bytes to the right of
        256-byte region [ffff88802bb36e00, ffff88802bb36f00)
      
      The buggy address belongs to the physical page:
      page:00000000ea4d1263 refcount:1 mapcount:0 mapping:0000000000000000
      index:0x0 pfn:0x2bb36
      head:00000000ea4d1263 order:1 compound_mapcount:0 compound_pincount:0
      flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 000fffffc0010200 ffffea000066c300 dead000000000003 ffff888100042b40
      raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
        ffff88802bb36e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        ffff88802bb36e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      >ffff88802bb36f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                          ^
        ffff88802bb36f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
        ffff88802bb37000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      
      Fixes: 801c135c ("UBI: Unsorted Block Images")
      Reported-by: syzkaller <syzkaller@googlegroups.com>
      Signed-off-by: George Kennedy <george.kennedy@oracle.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Wang Hai <wanghai38@huawei.com>
      Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
      Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      9441f997
    • ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct() · 6afc8591
      Committed by Zheng Yejian
      mainline inclusion
      from mainline-v6.3-rc6
      commit 2a2d8c51
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6TQ89
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a2d8c51defb446e8d89a83f42f8e5cd529111e9
      
      --------------------------------
      
      Syzkaller reported a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().
      
      The root cause is that 'direct->addr' was changed from 'old_addr' to
      'new_addr' but not restored if an error occurred when calling
      ftrace_modify_direct_caller(). After that, 'direct' can no longer be
      found by 'old_addr'.
      
      To fix it, restore 'direct->addr' to 'old_addr' explicitly in the error path.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyejian1@huawei.com
      
      Cc: stable@vger.kernel.org
      Cc: <mhiramat@kernel.org>
      Cc: <mark.rutland@arm.com>
      Cc: <ast@kernel.org>
      Cc: <daniel@iogearbox.net>
      Fixes: 77ab7785 ("ftrace: Fix modify_ftrace_direct.")
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Reviewed-by: Xu Kuohai <xukuohai@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      6afc8591
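The restore-on-error pattern described in the commit message above can be shown in a few lines. This is a minimal sketch with hypothetical names (`struct model_direct`, `model_modify_caller`, `model_modify_direct`) and made-up return codes; it is not the kernel function, only the shape of the rollback.

```c
#include <assert.h>

/* Hypothetical stand-in for the tracked direct-call entry. */
struct model_direct {
    unsigned long addr;
};

/* Models the step that may fail; 'fail' forces the error path. */
int model_modify_caller(int fail)
{
    return fail ? -1 : 0;
}

/*
 * Switches the entry from old_addr to new_addr. The entry is looked up by
 * old_addr, so a failed attempt must leave addr equal to old_addr, or the
 * next lookup cannot find the entry.
 */
int model_modify_direct(struct model_direct *d, unsigned long old_addr,
                        unsigned long new_addr, int fail)
{
    int err;

    if (d->addr != old_addr)
        return -2; /* lookup by old address failed */

    d->addr = new_addr;
    err = model_modify_caller(fail);
    if (err)
        d->addr = old_addr; /* roll back so a retry can find the entry */
    return err;
}
```

Without the rollback line, a retry after a failed call would hit the `-2` branch, which corresponds to the lookup failure the message describes.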
    • perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output · 0eb9f75b
      Committed by Yang Jihong
      mainline inclusion
      from mainline-v6.3-rc3
      commit eb81a2ed
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6ODHQ
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eb81a2ed4f52be831c9fb879752d89645a312c13
      
      --------------------------------
      
      syzkaller reports a KASAN stack-out-of-bounds issue.
      The call trace is as follows:
        dump_stack+0x9c/0xd3
        print_address_description.constprop.0+0x19/0x170
        __kasan_report.cold+0x6c/0x84
        kasan_report+0x3a/0x50
        __perf_event_header__init_id+0x34/0x290
        perf_event_header__init_id+0x48/0x60
        perf_output_begin+0x4a4/0x560
        perf_event_bpf_output+0x161/0x1e0
        perf_iterate_sb_cpu+0x29e/0x340
        perf_iterate_sb+0x4c/0xc0
        perf_event_bpf_event+0x194/0x2c0
        __bpf_prog_put.constprop.0+0x55/0xf0
        __cls_bpf_delete_prog+0xea/0x120 [cls_bpf]
        cls_bpf_delete_prog_work+0x1c/0x30 [cls_bpf]
        process_one_work+0x3c2/0x730
        worker_thread+0x93/0x650
        kthread+0x1b8/0x210
        ret_from_fork+0x1f/0x30
      
      Commit 267fb273 ("perf: Reduce stack usage of perf_output_begin()")
      made perf_output_begin() use the on-stack struct perf_sample_data of
      the caller function.
      
      However, perf_event_bpf_output passes an incorrect parameter, so that
      small-sized data (struct perf_bpf_event) is treated as large-sized data
      (struct perf_sample_data), which causes a memory overwrite in
      __perf_event_header__init_id.
      
      Fixes: 267fb273 ("perf: Reduce stack usage of perf_output_begin()")
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/20230314044735.56551-1-yangjihong1@huawei.com
      Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
      Reviewed-by: Xu Kuohai <xukuohai@huawei.com>
      Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      0eb9f75b
    • xirc2ps_cs: Fix use after free bug in xirc2ps_detach · 3c3ce918
      Committed by Zheng Wang
      stable inclusion
      from stable-v5.10.176
      commit bfeeb3aaad4ee8eaaefe5d9edd9b2ccb5d9b7505
      category: bugfix
      bugzilla: 188641, https://gitee.com/src-openeuler/kernel/issues/I6R4MM
      CVE: CVE-2023-1670
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bfeeb3aaad4ee8eaaefe5d9edd9b2ccb5d9b7505
      
      --------------------------------
      
      [ Upstream commit e8d20c3d ]
      
      In xirc2ps_probe, local->tx_timeout_task is bound to
      xirc2ps_tx_timeout_task. When a timeout occurs,
      xirc_tx_timeout calls schedule_work to start the
      work.
      
      When xirc2ps_detach is called to remove the driver, the
      following sequence may occur:
      
      CPU0                  CPU1
      
                          |xirc2ps_tx_timeout_task
      xirc2ps_detach      |
        free_netdev       |
          kfree(dev);     |
                          |
                          | do_reset
                          |   //use dev
      
      Stop responding to timeout tasks and complete scheduled
      tasks before cleanup in xirc2ps_detach, which fixes
      the problem.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      3c3ce918
    • ring-buffer: Fix race while reader and writer are on the same page · 88d65db3
      Committed by Zheng Yejian
      mainline inclusion
      from mainline-v6.3-rc6
      commit 6455b616
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6TJ97
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6455b6163d8c680366663cdb8c679514d55fc30c
      
      --------------------------------
      
      When a user reads the file 'trace_pipe', the kernel keeps printing the
      following logs, which warn at "cpu_buffer->reader_page->read >
      rb_page_size(reader)" in rb_get_reader_page(). It looks like there is an
      infinite loop in tracing_read_pipe(). This problem has occurred several
      times on the arm64 platform when testing v5.10 and below.
      
        Call trace:
         rb_get_reader_page+0x248/0x1300
         rb_buffer_peek+0x34/0x160
         ring_buffer_peek+0xbc/0x224
         peek_next_entry+0x98/0xbc
         __find_next_entry+0xc4/0x1c0
         trace_find_next_entry_inc+0x30/0x94
         tracing_read_pipe+0x198/0x304
         vfs_read+0xb4/0x1e0
         ksys_read+0x74/0x100
         __arm64_sys_read+0x24/0x30
         el0_svc_common.constprop.0+0x7c/0x1bc
         do_el0_svc+0x2c/0x94
         el0_svc+0x20/0x30
         el0_sync_handler+0xb0/0xb4
         el0_sync+0x160/0x180
      
      I then dumped the vmcore and looked into the problematic per_cpu
      ring_buffer. I found that tail_page/commit_page/reader_page are the same
      page, while reader_page->read is obviously abnormal:
        tail_page == commit_page == reader_page == {
          .write = 0x100d20,
          .read = 0x8f9f4805,  // Far greater than 0xd20, obviously abnormal!!!
          .entries = 0x10004c,
          .real_end = 0x0,
          .page = {
            .time_stamp = 0x857257416af0,
            .commit = 0xd20,  // This page has not been completely filled.
            // .data[0...0xd20] seems normal.
          }
        }
      
      The root cause is most likely a race in which the reader and writer are
      on the same page and the reader sees an event that has not been fully
      committed by the writer.
      
      To fix this, add memory barriers to make sure the reader can see the
      content of what is committed. Since commit a0fcaaed ("ring-buffer: Fix
      race between reset page and reading page") has added the read barrier in
      rb_get_reader_page(), here we just need to add the write barrier.
      
      Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com
      
      Cc: stable@vger.kernel.org
      Fixes: 77ae365e ("ring-buffer: make lockless")
      Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
      Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
      Reviewed-by: Yang Jihong <yangjihong1@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      88d65db3
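The write-barrier and read-barrier pairing mentioned in the commit message above has a close user-space analogue in C11 release/acquire atomics. The sketch below swaps in `memory_order_release` and `memory_order_acquire` for the kernel's barriers, which is a substitution made for illustration; `struct model_page` and the `model_*` functions are hypothetical and far simpler than the real ring buffer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 64

/* Hypothetical single page shared by one writer and one reader. */
struct model_page {
    char data[MODEL_PAGE_SIZE];
    atomic_size_t commit; /* number of bytes the writer has published */
};

void model_page_init(struct model_page *p)
{
    memset(p->data, 0, sizeof(p->data));
    atomic_init(&p->commit, 0);
}

/* Writer: fill the payload first, then publish the new commit index. */
int model_write(struct model_page *p, const char *buf, size_t len)
{
    size_t pos = atomic_load_explicit(&p->commit, memory_order_relaxed);

    if (len > MODEL_PAGE_SIZE - pos)
        return -1; /* page full */
    memcpy(p->data + pos, buf, len);
    /* Release: the payload stores above are visible before 'commit' moves. */
    atomic_store_explicit(&p->commit, pos + len, memory_order_release);
    return 0;
}

/* Reader: load the commit index first, then read only up to it. */
size_t model_read(struct model_page *p, size_t read_pos, char *out, size_t cap)
{
    /* Acquire: pairs with the writer's release store. */
    size_t commit = atomic_load_explicit(&p->commit, memory_order_acquire);
    size_t n;

    if (read_pos >= commit)
        return 0; /* nothing committed beyond read_pos */
    n = commit - read_pos;
    if (n > cap)
        n = cap;
    memcpy(out, p->data + read_pos, n);
    return n;
}
```

The design point is the ordering on both sides: the reader never consumes bytes beyond an index whose payload it is guaranteed to see, which is the property the commit restores by adding the missing write-side barrier.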
    • loop: Add parm check in loop_control_ioctl · 4e3149e0
      Committed by Zhong Jinghua
      hulk inclusion
      category: bugfix
      bugzilla: 188586, https://gitee.com/openeuler/kernel/issues/I6TFPJ
      CVE: NA
      
      ----------------------------------------
      
      We found that a kernel panic can easily be triggered in loop_control_ioctl:
      
      1. syscall(__NR_ioctl, r[1], 0x4c80, 0x80000200000ul);
      Create a loop device with 0x80000200000ul.
      In the code, this value is used as the first_minor number, and the
      resulting first_minor number is 0.
      So the created loop device number is 7:0.
      
      2. syscall(__NR_ioctl, r[2], 0x4c80, 0ul);
      Create a loop device with 0x0ul.
      Since the 7:0 device was already created in step 1, add_disk fails
      because the major and first_minor numbers are identical.
      
      3. syscall(__NR_ioctl, r[5], 0x4c81, 0ul);
      Delete the device that failed to be created; the kernel panics.
      
      The panic looks like this:
      BUG: KASAN: null-ptr-deref in device_del+0xb3/0x840 drivers/base/core.c:3107
      Call Trace:
       kill_device drivers/base/core.c:3079 [inline]
       device_del+0xb3/0x840 drivers/base/core.c:3107
       del_gendisk+0x463/0x5f0 block/genhd.c:971
       loop_remove drivers/block/loop.c:2190 [inline]
       loop_control_ioctl drivers/block/loop.c:2289 [inline]
      
      The call paths are as follows:
      Create loop device:
      loop_control_ioctl
        loop_add
          add_disk
            device_add_disk
              bdi_register
                bdi_register_va
                  device_create
                    device_create_groups_vargs
                      device_add
                        kfree(dev->p);
                          dev->p = NULL;
      
      Remove loop device:
      loop_control_ioctl
        loop_remove
          del_gendisk
            device_del
              kill_device
                if (dev->p->dead) // p is null
      
      Fix it by adding a validity check on parm.
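
      The collapse of the oversized index to minor 0 can be modelled in user
      space. This is only a sketch under stated assumptions, not the patch
      itself: the names sketch_minor_from_parm() and sketch_parm_valid() are
      hypothetical, the 20-bit minor width is assumed, and the narrowing to a
      32-bit index is an assumed mechanism that the message does not spell out.

```c
#include <stdbool.h>

/* Assumption: minor numbers are 20 bits wide. */
#define SKETCH_MINORBITS 20
#define SKETCH_MINORMASK ((1ULL << SKETCH_MINORBITS) - 1)

/*
 * Model of how a user-supplied index can end up as minor 0:
 * the value is narrowed to 32 bits (assumed), and only the low
 * minor bits survive in the device number.
 */
static unsigned int sketch_minor_from_parm(unsigned long long parm)
{
	unsigned int idx = (unsigned int)parm;	/* high bits are dropped */

	return idx & (unsigned int)SKETCH_MINORMASK;
}

/* Hypothetical check: reject any index that cannot be a valid minor. */
static bool sketch_parm_valid(unsigned long long parm)
{
	return parm <= SKETCH_MINORMASK;
}
```

      With such a check in front of the add path, the request in step 1 would
      be rejected instead of silently aliasing device 7:0.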
      
      Fixes: 770fe30a ("loop: add management interface for on-demand device allocation")
      Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      4e3149e0
    • Z
      ext4: Fix i_disksize exceeding i_size problem in partially written case · ae4b9933
      Zhihao Cheng committed
      maillist inclusion
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6SMBI
      CVE: NA
      
      Reference: https://www.spinics.net/lists/linux-ext4/msg88386.html
      
      --------------------------------
      
      The following sequence makes i_disksize exceed i_size:
      
      generic_perform_write
       copied = iov_iter_copy_from_user_atomic(len) // copied < len
       ext4_da_write_end
       | ext4_update_i_disksize
       |  new_i_size = pos + copied;
       |  WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize
       | generic_write_end
       |  copied = block_write_end(copied, len) // copied = 0
       |   if (unlikely(copied < len))
       |    if (!PageUptodate(page))
       |     copied = 0;
       |  if (pos + copied > inode->i_size) // return false
       if (unlikely(copied == 0))
        goto again;
       if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
        status = -EFAULT;
        break;
       }
      
      i_disksize is now greater than i_size, which can trigger the WARN_ON_ONCE
      check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' during direct I/O:
      
      ext4_dio_write_iter
       iomap_dio_rw
        __iomap_dio_rw // return err, length is not aligned to 512
       ext4_handle_inode_extension
        WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops
      
       WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319
       CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2
       RIP: 0010:ext4_file_write_iter+0xbc7
       Call Trace:
        vfs_write+0x3b1
        ksys_write+0x77
        do_syscall_64+0x39
      
      Fix it by updating the 'copied' value before updating i_disksize, just
      as ext4_write_inline_data_end() does.
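
      The ordering problem can be reproduced with a small user-space model.
      This is a sketch only: struct sketch_inode and the sketch_* helpers are
      hypothetical stand-ins for the kernel structures and functions named in
      the trace above, reduced to the two size fields.

```c
#include <stddef.h>

struct sketch_inode {
	long long i_size;	/* in-core file size */
	long long i_disksize;	/* size to be recorded on disk */
};

/* Models block_write_end(): a short copy into a page that is not
 * up to date is discarded entirely. */
static size_t sketch_block_write_end(size_t len, size_t copied, int uptodate)
{
	if (copied < len && !uptodate)
		return 0;
	return copied;
}

static void sketch_update_disksize(struct sketch_inode *inode, long long end)
{
	if (end > inode->i_disksize)
		inode->i_disksize = end;
}

static void sketch_update_size(struct sketch_inode *inode, long long end)
{
	if (end > inode->i_size)
		inode->i_size = end;
}

/* Old order: i_disksize advances using the untrimmed 'copied'. */
static size_t sketch_write_end_old(struct sketch_inode *inode, long long pos,
				   size_t len, size_t copied, int uptodate)
{
	sketch_update_disksize(inode, pos + (long long)copied);
	copied = sketch_block_write_end(len, copied, uptodate);
	sketch_update_size(inode, pos + (long long)copied);
	return copied;
}

/* New order: trim 'copied' first, then update both sizes from it. */
static size_t sketch_write_end_new(struct sketch_inode *inode, long long pos,
				   size_t len, size_t copied, int uptodate)
{
	copied = sketch_block_write_end(len, copied, uptodate);
	sketch_update_disksize(inode, pos + (long long)copied);
	sketch_update_size(inode, pos + (long long)copied);
	return copied;
}
```

      In the old order a short copy into a non-uptodate page leaves i_disksize
      ahead of i_size; in the new order both stay in step.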
      
      A reproducer is available at [Link].
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209
      Fixes: 64769240 ("ext4: Add delayed allocation support in data=writeback mode")
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      ae4b9933
    • Z
      ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev' · 2db2f17b
      Zhihao Cheng committed
      maillist inclusion
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6MMUV
      CVE: NA
      
      Reference: https://www.spinics.net/lists/linux-ext4/msg88237.html
      
      --------------------------------
      
      As discussed in [1], 'sbi->s_journal_bdev != sb->s_bdev' is always
      true when sbi->s_journal_bdev exists. The filesystem block device and
      the journal block device are both opened in 'FMODE_EXCL' mode, so the
      two cannot be the same device. The check
      'sbi->s_journal_bdev != sb->s_bdev' is therefore redundant when
      'sbi->s_journal_bdev' exists, and can be removed.
      
      [1] https://lore.kernel.org/lkml/f86584f6-3877-ff18-47a1-2efaa12d18b2@huawei.com/

      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      2db2f17b
    • Z
      ext4: Fix reusing stale buffer heads from last failed mounting · 1f6736e6
      Zhihao Cheng committed
      maillist inclusion
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6MMUV
      CVE: NA
      
      Reference: https://www.spinics.net/lists/linux-ext4/msg88237.html
      
      --------------------------------
      
      The following sequence makes ext4 load stale buffer heads, left over
      from a previous failed mount, during a new mount:
      mount_bdev
       ext4_fill_super
       | ext4_load_and_init_journal
       |  ext4_load_journal
       |   jbd2_journal_load
       |    load_superblock
       |     journal_get_superblock
       |      set_buffer_verified(bh) // buffer head is verified
       |   jbd2_journal_recover // failed caused by EIO
       | goto failed_mount3a // skip 'sb->s_root' initialization
       deactivate_locked_super
        kill_block_super
         generic_shutdown_super
          if (sb->s_root)
          // false, skip ext4_put_super->invalidate_bdev->
          // invalidate_mapping_pages->mapping_evict_folio->
          // filemap_release_folio->try_to_free_buffers, which
          // cannot drop buffer head.
         blkdev_put
          blkdev_put_whole
           if (atomic_dec_and_test(&bdev->bd_openers))
           // false, systemd-udev happens to open the device. Then
           // blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
           // truncate_inode_folio->truncate_cleanup_folio->
           // folio_invalidate->block_invalidate_folio->
           // filemap_release_folio->try_to_free_buffers will be skipped,
           // dropping buffer head is missed again.
      
      Second mount:
      ext4_fill_super
       ext4_load_and_init_journal
        ext4_load_journal
         ext4_get_journal
          jbd2_journal_init_inode
           journal_init_common
            bh = getblk_unmovable
             bh = __find_get_block // Found stale bh in last failed mounting
            journal->j_sb_buffer = bh
         jbd2_journal_load
          load_superblock
           journal_get_superblock
            if (buffer_verified(bh))
            // true, skip journal->j_format_version = 2, value is 0
          jbd2_journal_recover
           do_one_pass
            next_log_block += count_tags(journal, bh)
            // According to journal_tag_bytes(), the 'tag_bytes' calculation
            // depends on jbd2_has_feature_csum3(), which returns false
            // because 'j->j_format_version >= 2' is not true, so we get a
            // wrong next_log_block. do_one_pass may then exit early when it
            // encounters a non-JBD2_MAGIC_NUMBER block at 'next_log_block'.
      
      The filesystem is now corrupted: the journal is only partially replayed,
      and the new journal sequence number was already used by the previous
      mount.
      
      invalidate_bdev() can drop all buffer heads even when racing with a
      plain reader of the block device (e.g. systemd-udev), so fix this by
      invalidating the bdev in the error handling path of __ext4_fill_super().
      
      A reproducer is available at [Link].
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
      Fixes: 25ed6e8a ("jbd2: enable journal clients to enable v2 checksumming")
      Cc: stable@vger.kernel.org # v3.5
      Conflicts:
      	fs/ext4/super.c
      	[ a7a79c29 ("ext4: unify the ext4 super block loading
      	  operation") is not applied.
      	  7edfd85b ("ext4: Completely separate options parsing and sb
      	  setup") is not applied. ]
      Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      1f6736e6
    • F
      btrfs: fix race between quota disable and quota assign ioctls · bbd9649a
      Filipe Manana committed
      mainline inclusion
      from mainline-v6.2-rc8
      commit 2f1a6be1
      category: bugfix
      bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6PQCT
      CVE: CVE-2023-1611
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f1a6be12ab6c8470d5776e68644726c94257c54
      
      --------------------------------
      
      The quota assign ioctl can currently run in parallel with a quota disable
      ioctl call. The assign ioctl uses the quota root, while the disable ioctl
      frees that root, and therefore we can have a use-after-free triggered in
      the assign ioctl, leading to a trace like the following when KASAN is
      enabled:
      
        [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0
        [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736
        [672.724][T736]
        [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37
        [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
        [672.727][T736] Call Trace:
        [672.728][T736]  <TASK>
        [672.728][T736]  dump_stack_lvl+0xd9/0x150
        [672.725][T736]  print_report+0xc1/0x5e0
        [672.720][T736]  ? __virt_addr_valid+0x61/0x2e0
        [672.727][T736]  ? __phys_addr+0xc9/0x150
        [672.725][T736]  ? btrfs_search_slot+0x2962/0x2db0
        [672.722][T736]  kasan_report+0xc0/0xf0
        [672.729][T736]  ? btrfs_search_slot+0x2962/0x2db0
        [672.724][T736]  btrfs_search_slot+0x2962/0x2db0
        [672.723][T736]  ? fs_reclaim_acquire+0xba/0x160
        [672.722][T736]  ? split_leaf+0x13d0/0x13d0
        [672.726][T736]  ? rcu_is_watching+0x12/0xb0
        [672.723][T736]  ? kmem_cache_alloc+0x338/0x3c0
        [672.722][T736]  update_qgroup_status_item+0xf7/0x320
        [672.724][T736]  ? add_qgroup_rb+0x3d0/0x3d0
        [672.739][T736]  ? do_raw_spin_lock+0x12d/0x2b0
        [672.730][T736]  ? spin_bug+0x1d0/0x1d0
        [672.737][T736]  btrfs_run_qgroups+0x5de/0x840
        [672.730][T736]  ? btrfs_qgroup_rescan_worker+0xa70/0xa70
        [672.738][T736]  ? __del_qgroup_relation+0x4ba/0xe00
        [672.738][T736]  btrfs_ioctl+0x3d58/0x5d80
        [672.735][T736]  ? tomoyo_path_number_perm+0x16a/0x550
        [672.737][T736]  ? tomoyo_execute_permission+0x4a0/0x4a0
        [672.731][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
        [672.737][T736]  ? __sanitizer_cov_trace_switch+0x54/0x90
        [672.734][T736]  ? do_vfs_ioctl+0x132/0x1660
        [672.730][T736]  ? vfs_fileattr_set+0xc40/0xc40
        [672.730][T736]  ? _raw_spin_unlock_irq+0x2e/0x50
        [672.732][T736]  ? sigprocmask+0xf2/0x340
        [672.737][T736]  ? __fget_files+0x26a/0x480
        [672.732][T736]  ? bpf_lsm_file_ioctl+0x9/0x10
        [672.738][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
        [672.736][T736]  __x64_sys_ioctl+0x198/0x210
        [672.736][T736]  do_syscall_64+0x39/0xb0
        [672.731][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.739][T736] RIP: 0033:0x4556ad
        [672.742][T736]  </TASK>
        [672.743][T736]
        [672.748][T736] Allocated by task 27677:
        [672.743][T736]  kasan_save_stack+0x22/0x40
        [672.741][T736]  kasan_set_track+0x25/0x30
        [672.741][T736]  __kasan_kmalloc+0xa4/0xb0
        [672.749][T736]  btrfs_alloc_root+0x48/0x90
        [672.746][T736]  btrfs_create_tree+0x146/0xa20
        [672.744][T736]  btrfs_quota_enable+0x461/0x1d20
        [672.743][T736]  btrfs_ioctl+0x4a1c/0x5d80
        [672.747][T736]  __x64_sys_ioctl+0x198/0x210
        [672.749][T736]  do_syscall_64+0x39/0xb0
        [672.744][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.756][T736]
        [672.757][T736] Freed by task 27677:
        [672.759][T736]  kasan_save_stack+0x22/0x40
        [672.759][T736]  kasan_set_track+0x25/0x30
        [672.756][T736]  kasan_save_free_info+0x2e/0x50
        [672.751][T736]  ____kasan_slab_free+0x162/0x1c0
        [672.758][T736]  slab_free_freelist_hook+0x89/0x1c0
        [672.752][T736]  __kmem_cache_free+0xaf/0x2e0
        [672.752][T736]  btrfs_put_root+0x1ff/0x2b0
        [672.759][T736]  btrfs_quota_disable+0x80a/0xbc0
        [672.752][T736]  btrfs_ioctl+0x3e5f/0x5d80
        [672.756][T736]  __x64_sys_ioctl+0x198/0x210
        [672.753][T736]  do_syscall_64+0x39/0xb0
        [672.765][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.769][T736]
        [672.768][T736] The buggy address belongs to the object at ffff888022ec0000
        [672.768][T736]  which belongs to the cache kmalloc-4k of size 4096
        [672.769][T736] The buggy address is located 520 bytes inside of
        [672.769][T736]  freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
        [672.760][T736]
        [672.764][T736] The buggy address belongs to the physical page:
        [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
        [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
        [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
        [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
        [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
        [672.771][T736] page dumped because: kasan: bad access detected
        [672.778][T736] page_owner tracks the page as allocated
        [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
        [672.779][T736]  get_page_from_freelist+0x119c/0x2d50
        [672.779][T736]  __alloc_pages+0x1cb/0x4a0
        [672.776][T736]  alloc_pages+0x1aa/0x270
        [672.773][T736]  allocate_slab+0x260/0x390
        [672.771][T736]  ___slab_alloc+0xa9a/0x13e0
        [672.778][T736]  __slab_alloc.constprop.0+0x56/0xb0
        [672.771][T736]  __kmem_cache_alloc_node+0x136/0x320
        [672.789][T736]  __kmalloc+0x4e/0x1a0
        [672.783][T736]  tomoyo_realpath_from_path+0xc3/0x600
        [672.781][T736]  tomoyo_path_perm+0x22f/0x420
        [672.782][T736]  tomoyo_path_unlink+0x92/0xd0
        [672.780][T736]  security_path_unlink+0xdb/0x150
        [672.788][T736]  do_unlinkat+0x377/0x680
        [672.788][T736]  __x64_sys_unlink+0xca/0x110
        [672.789][T736]  do_syscall_64+0x39/0xb0
        [672.783][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.784][T736] page last free stack trace:
        [672.787][T736]  free_pcp_prepare+0x4e5/0x920
        [672.787][T736]  free_unref_page+0x1d/0x4e0
        [672.784][T736]  __unfreeze_partials+0x17c/0x1a0
        [672.797][T736]  qlist_free_all+0x6a/0x180
        [672.796][T736]  kasan_quarantine_reduce+0x189/0x1d0
        [672.797][T736]  __kasan_slab_alloc+0x64/0x90
        [672.793][T736]  kmem_cache_alloc+0x17c/0x3c0
        [672.799][T736]  getname_flags.part.0+0x50/0x4e0
        [672.799][T736]  getname_flags+0x9e/0xe0
        [672.792][T736]  vfs_fstatat+0x77/0xb0
        [672.791][T736]  __do_sys_newlstat+0x84/0x100
        [672.798][T736]  do_syscall_64+0x39/0xb0
        [672.796][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
        [672.790][T736]
        [672.791][T736] Memory state around the buggy address:
        [672.799][T736]  ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.805][T736]  ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.809][T736]                       ^
        [672.809][T736]  ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
        [672.809][T736]  ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      
      Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex
      before calling btrfs_run_qgroups(), which is what all qgroup ioctls should
      call.
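
      The locking rule described above can be sketched in user space with a
      pthread mutex standing in for the qgroup ioctl mutex. All names here
      (struct sketch_fs, sketch_quota_disable(), sketch_qgroup_assign()) are
      hypothetical, and the NULL check on the quota root is an assumption made
      for the model rather than something taken from the patch.

```c
#include <pthread.h>
#include <stddef.h>

struct sketch_fs {
	pthread_mutex_t qgroup_ioctl_lock;	/* serializes qgroup ioctls */
	int *quota_root;			/* NULL once quotas are disabled */
};

/* Disable path: drops the quota root while holding the ioctl lock. */
static void sketch_quota_disable(struct sketch_fs *fs)
{
	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	fs->quota_root = NULL;	/* the real code would also free the root */
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);
}

/*
 * Assign path: takes the same lock before touching the quota root, so it
 * either sees a live root or sees that quotas are already disabled.
 */
static int sketch_qgroup_assign(struct sketch_fs *fs)
{
	int ret = 0;

	pthread_mutex_lock(&fs->qgroup_ioctl_lock);
	if (!fs->quota_root)
		ret = -1;		/* quotas disabled, nothing to update */
	else
		(*fs->quota_root)++;	/* stands in for btrfs_run_qgroups() */
	pthread_mutex_unlock(&fs->qgroup_ioctl_lock);

	return ret;
}
```

      Because both paths hold the same mutex, the assign path can no longer
      observe the root in the middle of being torn down.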

      Reported-by: butt3rflyh4ck <butterflyhuangxx@gmail.com>
      Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/
      CC: stable@vger.kernel.org # 5.10+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      bbd9649a
    • M
      dm crypt: add cond_resched() to dmcrypt_write() · d370383e
      Mikulas Patocka committed
      mainline inclusion
      from mainline-v6.3-rc4
      commit fb294b1c
      category: bugfix
      bugzilla: 188393, https://gitee.com/openeuler/kernel/issues/I6JPSH
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fb294b1c0ba982144ca467a75e7d01ff26304e2b
      
      ----------------------------------------
      
      The loop in dmcrypt_write may run for an unbounded amount of time,
      thus we need cond_resched() in it.
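
      As a rough user-space analogy only, a long-running worker loop can offer
      the CPU to other tasks on every iteration. The sketch below uses POSIX
      sched_yield() as a stand-in for cond_resched(); the two are not
      equivalent (cond_resched() reschedules only when needed), and
      sketch_drain_queue() is a hypothetical name.

```c
#include <sched.h>
#include <stddef.h>

/*
 * Process every queued item, giving other runnable tasks a chance to run
 * after each one so that a long queue cannot monopolize the CPU.
 */
static size_t sketch_drain_queue(const int *items, size_t n, long *sum)
{
	size_t done = 0;

	for (size_t i = 0; i < n; i++) {
		*sum += items[i];	/* stands in for submitting one bio */
		done++;
		sched_yield();		/* user-space stand-in for cond_resched() */
	}

	return done;
}
```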
      
      This commit fixes the following warning:
      
      [ 3391.153255][   C12] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [dmcrypt_write/2:2897]
      ...
      [ 3391.387210][   C12] Call trace:
      [ 3391.390338][   C12]  blk_attempt_bio_merge.part.6+0x38/0x158
      [ 3391.395970][   C12]  blk_attempt_plug_merge+0xc0/0x1b0
      [ 3391.401085][   C12]  blk_mq_submit_bio+0x398/0x550
      [ 3391.405856][   C12]  submit_bio_noacct+0x308/0x380
      [ 3391.410630][   C12]  dmcrypt_write+0x1e4/0x208 [dm_crypt]
      [ 3391.416005][   C12]  kthread+0x130/0x138
      [ 3391.419911][   C12]  ret_from_fork+0x10/0x18

      Reported-by: yangerkun <yangerkun@huawei.com>
      Fixes: dc267621 ("dm crypt: offload writes to thread")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@kernel.org>
      Signed-off-by: yangerkun <yangerkun@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      d370383e
    • S
      driver core: Fix lockdep warning on wfs_lock · de1cc2ef
      Saravana Kannan committed
      mainline inclusion
      from mainline-v5.11-rc1
      commit 7008e58c
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6LM81
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7008e58c63bc8468e8d16154e25d780198b3ecfc
      
      ----------------------------------------------
      
      There's a potential deadlock with the following cycle:
      wfs_lock --> device_links_lock --> kn->count
      
      Fix this by simply dropping the lock around a list_empty() check that's
      just exported to a sysfs file. The sysfs file output is an instantaneous
      check anyway and the lock doesn't really add any protection.
      
      Lockdep log:
      
      [   48.808132]
      [   48.808132] the existing dependency chain (in reverse order) is:
      [   48.809069]
      [   48.809069] -> #2 (kn->count){++++}:
      [   48.809707]        __kernfs_remove.llvm.7860393000964815146+0x2d4/0x460
      [   48.810537]        kernfs_remove_by_name_ns+0x54/0x9c
      [   48.811171]        sysfs_remove_file_ns+0x18/0x24
      [   48.811762]        device_del+0x2b8/0x5a8
      [   48.812269]        __device_link_del+0x98/0xb8
      [   48.812829]        device_links_driver_bound+0x210/0x2d8
      [   48.813496]        driver_bound+0x44/0xf8
      [   48.814000]        really_probe+0x340/0x6e0
      [   48.814526]        driver_probe_device+0xb8/0x100
      [   48.815117]        device_driver_attach+0x78/0xb8
      [   48.815708]        __driver_attach+0xe0/0x194
      [   48.816255]        bus_for_each_dev+0xa8/0x11c
      [   48.816816]        driver_attach+0x24/0x30
      [   48.817331]        bus_add_driver+0x100/0x1e0
      [   48.817880]        driver_register+0x78/0x114
      [   48.818427]        __platform_driver_register+0x44/0x50
      [   48.819089]        0xffffffdbb3227038
      [   48.819551]        do_one_initcall+0xd8/0x1e0
      [   48.820099]        do_init_module+0xd8/0x298
      [   48.820636]        load_module+0x3afc/0x44c8
      [   48.821173]        __arm64_sys_finit_module+0xbc/0xf0
      [   48.821807]        el0_svc_common+0xbc/0x1d0
      [   48.822344]        el0_svc_handler+0x74/0x98
      [   48.822882]        el0_svc+0x8/0xc
      [   48.823310]
      [   48.823310] -> #1 (device_links_lock){+.+.}:
      [   48.824036]        __mutex_lock_common+0xe0/0xe44
      [   48.824626]        mutex_lock_nested+0x28/0x34
      [   48.825185]        device_link_add+0xd4/0x4ec
      [   48.825734]        of_link_to_suppliers+0x158/0x204
      [   48.826347]        of_fwnode_add_links+0x50/0x64
      [   48.826928]        device_link_add_missing_supplier_links+0x90/0x11c
      [   48.827725]        fw_devlink_resume+0x58/0x130
      [   48.828296]        of_platform_default_populate_init+0xb4/0xd0
      [   48.829030]        do_one_initcall+0xd8/0x1e0
      [   48.829578]        do_initcall_level+0xb8/0xcc
      [   48.830137]        do_basic_setup+0x60/0x7c
      [   48.830662]        kernel_init_freeable+0x128/0x1ac
      [   48.831275]        kernel_init+0x18/0x29c
      [   48.831781]        ret_from_fork+0x10/0x18
      [   48.832297]
      [   48.832297] -> #0 (wfs_lock){+.+.}:
      [   48.832922]        __lock_acquire+0xe04/0x2e20
      [   48.833480]        lock_acquire+0xbc/0xec
      [   48.833984]        __mutex_lock_common+0xe0/0xe44
      [   48.834577]        mutex_lock_nested+0x28/0x34
      [   48.835136]        waiting_for_supplier_show+0x3c/0x98
      [   48.835781]        dev_attr_show+0x48/0xb4
      [   48.836295]        sysfs_kf_seq_show+0xe8/0x184
      [   48.836864]        kernfs_seq_show+0x48/0x8c
      [   48.837401]        seq_read+0x1c8/0x600
      [   48.837884]        kernfs_fop_read+0x68/0x204
      [   48.838431]        __vfs_read+0x60/0x214
      [   48.838925]        vfs_read+0xbc/0x15c
      [   48.839397]        ksys_read+0x78/0xe4
      [   48.839869]        __arm64_sys_read+0x1c/0x28
      [   48.840416]        el0_svc_common+0xbc/0x1d0
      [   48.840953]        el0_svc_handler+0x74/0x98
      [   48.841490]        el0_svc+0x8/0xc
      [   48.841917]
      [   48.841917] other info that might help us debug this:
      [   48.841917]
      [   48.842920] Chain exists of:
      [   48.842920]   wfs_lock --> device_links_lock --> kn->count
      [   48.842920]
      [   48.844152]  Possible unsafe locking scenario:
      [   48.844152]
      [   48.844895]        CPU0                    CPU1
      [   48.845463]        ----                    ----
      [   48.846032]   lock(kn->count);
      [   48.846417]                                lock(device_links_lock);
      [   48.847203]                                lock(kn->count);
      [   48.847902]   lock(wfs_lock);
      [   48.848276]
      [   48.848276]  *** DEADLOCK ***
      
      Reported-by: Cheng-Jui.Wang@mediatek.com
      Signed-off-by: Saravana Kannan <saravanak@google.com>
      Link: https://lore.kernel.org/r/20201104205431.3795207-1-saravanak@google.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
      Reviewed-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      de1cc2ef
    • J
      driver core: platform: Add extra error check in devm_platform_get_irqs_affinity() · 73fa0aa6
      John Garry committed
      mainline inclusion
      from mainline-v5.11-rc5
      commit e1dc2099
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I6LM81
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1dc20995cb9fa04b46e8f37113a7203c906d2bf
      
      ---------------------------------------------
      
      The current check of nvec < minvec for nvec returned from
      platform_irq_count() will not detect a negative error code in nvec.
      
      This is because minvec is unsigned, and, as such, nvec is promoted to
      unsigned in that check, which will make it a huge number (if it contained
      -EPROBE_DEFER).
      
      In practice, an error should not occur in nvec for the only in-tree
      user, but add a check anyway.
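
      The promotion pitfall is easy to demonstrate in isolation. The sketch
      below is not the kernel function: sketch_too_few_old() and
      sketch_check_new() are hypothetical names, the value 517 for
      EPROBE_DEFER is stated from the kernel headers, and returning -ENOSPC
      for "too few vectors" is an assumption.

```c
#include <errno.h>
#include <stdbool.h>

#define SKETCH_EPROBE_DEFER 517	/* kernel-internal errno, not in <errno.h> */

/* Old check: the cast spells out what 'nvec < minvec' does implicitly
 * when minvec is unsigned. A negative nvec becomes a huge value. */
static bool sketch_too_few_old(int nvec, unsigned int minvec)
{
	return (unsigned int)nvec < minvec;
}

/* New check: handle the negative error code before comparing counts. */
static int sketch_check_new(int nvec, unsigned int minvec)
{
	if (nvec < 0)
		return nvec;		/* propagate the error code */
	if ((unsigned int)nvec < minvec)
		return -ENOSPC;		/* assumed error for too few vectors */
	return 0;
}
```

      With nvec set to -SKETCH_EPROBE_DEFER and minvec set to 1, the old check
      reports that there are enough vectors, while the new check returns the
      error code to the caller.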
      
      Fixes: e15f2fa9 ("driver core: platform: Add devm_platform_get_irqs_affinity()")
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: John Garry <john.garry@huawei.com>
      Signed-off-by: Marc Zyngier <maz@kernel.org>
      Link: https://lore.kernel.org/r/1608561055-231244-1-git-send-email-john.garry@huawei.com
      Signed-off-by: Zhang Zekun <zhangzekun11@huawei.com>
      Reviewed-by: Weilong Chen <chenweilong@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      73fa0aa6
    • D
      xfs: don't leak memory when attr fork loading fails · 99054b56
      Darrick J. Wong committed
      mainline inclusion
      from mainline-v5.19-rc5
      commit c78c2d09
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c78c2d0903183a41beb90c56a923e30f90fa91b9
      
      --------------------------------
      
      I observed the following evidence of a memory leak while running xfs/399
      from the xfs fsck test suite (edited for brevity):
      
      XFS (sde): Metadata corruption detected at xfs_attr_shortform_verify_struct.part.0+0x7b/0xb0 [xfs], inode 0x1172 attr fork
      XFS: Assertion failed: ip->i_af.if_u1.if_data == NULL, file: fs/xfs/libxfs/xfs_inode_fork.c, line: 315
      ------------[ cut here ]------------
      WARNING: CPU: 2 PID: 91635 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
      CPU: 2 PID: 91635 Comm: xfs_scrub Tainted: G        W         5.19.0-rc7-xfsx #rc7 6e6475eb29fd9dda3181f81b7ca7ff961d277a40
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
      RIP: 0010:assfail+0x46/0x4a [xfs]
      Call Trace:
       <TASK>
       xfs_ifork_zap_attr+0x7c/0xb0
       xfs_iformat_attr_fork+0x86/0x110
       xfs_inode_from_disk+0x41d/0x480
       xfs_iget+0x389/0xd70
       xfs_bulkstat_one_int+0x5b/0x540
       xfs_bulkstat_iwalk+0x1e/0x30
       xfs_iwalk_ag_recs+0xd1/0x160
       xfs_iwalk_run_callbacks+0xb9/0x180
       xfs_iwalk_ag+0x1d8/0x2e0
       xfs_iwalk+0x141/0x220
       xfs_bulkstat+0x105/0x180
       xfs_ioc_bulkstat.constprop.0.isra.0+0xc5/0x130
       xfs_file_ioctl+0xa5f/0xef0
       __x64_sys_ioctl+0x82/0xa0
       do_syscall_64+0x2b/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      This newly-added assertion checks that there aren't any incore data
      structures hanging off the incore fork when we're trying to reset its
      contents.  From the call trace, it is evident that iget was trying to
      construct an incore inode from the ondisk inode, but the attr fork
      verifier failed and we were trying to undo all the memory allocations
      that we had done earlier.
      
      The three assertions in xfs_ifork_zap_attr check that the caller has
      already called xfs_idestroy_fork, which clearly has not been done here.
      As the zap function then zeroes the pointers, we've effectively leaked
      the memory.
      
      The shortest change would have been to insert an extra call to
      xfs_idestroy_fork, but it makes more sense to bundle the _idestroy_fork
      call into _zap_attr, since all other callsites call _idestroy_fork
      immediately prior to calling _zap_attr.  IOWs, it eliminates one way to
      fail.
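
      The idea of folding the destroy step into the zap step can be shown with
      a toy structure. This is a sketch under stated assumptions: struct
      sketch_fork and the sketch_* functions are hypothetical, and the free
      counter exists only so the behaviour can be observed.

```c
#include <stdlib.h>

struct sketch_fork {
	void *if_data;		/* incore data hanging off the fork */
};

static int sketch_frees;	/* number of buffers actually freed */

/* Release whatever hangs off the fork; safe to call repeatedly. */
static void sketch_destroy_fork(struct sketch_fork *f)
{
	if (f->if_data) {
		free(f->if_data);
		sketch_frees++;
	}
	f->if_data = NULL;
}

/*
 * Reset the fork. Because teardown happens here, zeroing the pointer can
 * no longer orphan memory when a caller forgets to destroy first.
 */
static void sketch_zap_attr(struct sketch_fork *f)
{
	sketch_destroy_fork(f);
	f->if_data = NULL;
}
```

      Callers then need only the single zap call, which removes the ordering
      requirement that the failed verifier path had missed.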
      
      Note: This change only applies cleanly to 2ed5b09b, since we just
      reworked the attr fork lifetime.  However, I think this memory leak has
      existed since 0f45a1b2, since the chain xfs_iformat_attr_fork ->
      xfs_iformat_local -> xfs_init_local_fork will allocate
      ifp->if_u1.if_data, but if xfs_ifork_verify_local_attr fails,
      xfs_iformat_attr_fork will free i_afp without freeing any of the stuff
      hanging off i_afp.  The solution for older kernels I think is to add the
      missing call to xfs_idestroy_fork just prior to calling kmem_cache_free.
      
      Found by fuzzing a.sfattr.hdr.totsize = lastbit in xfs/399.
      
      Fixes: 2ed5b09b ("xfs: make inode attribute forks a permanent part of struct xfs_inode")
      Probably-Fixes: 0f45a1b2 ("xfs: improve local fork verification")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_attr_leaf.c
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      99054b56
    • D
      xfs: delete unnecessary NULL checks · c462f069
      Dan Carpenter committed
      mainline inclusion
      from mainline-v5.19-rc5
      commit 3f52e016
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3f52e016af600982989b5dee958d313c52483c92
      
      --------------------------------
      
      These NULL checks are no longer needed after commit 2ed5b09b ("xfs:
      make inode attribute forks a permanent part of struct xfs_inode").

      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      c462f069
    • D
      xfs: replace inode fork size macros with functions · e4af4f66
      Darrick J. Wong committed
      mainline inclusion
      from mainline-v5.19-rc5
      commit c01147d9
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c01147d929899f02a0a8b15e406d12784768ca72
      
      --------------------------------
      
      Replace the shouty macros here with typechecked helper functions.
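
      The difference between the two styles can be illustrated as follows.
      This is a generic sketch, not the XFS code: struct sketch_inode,
      SKETCH_FORK_BOFF() and sketch_inode_fork_boff() are hypothetical names,
      and the shift by 3 is chosen only for the example.

```c
struct sketch_inode {
	unsigned char forkoff;	/* fork offset in 8-byte units */
};

/* Macro style: accepts any pointer to anything with a 'forkoff' member,
 * so passing the wrong structure type is not diagnosed. */
#define SKETCH_FORK_BOFF(ip)	((int)((ip)->forkoff << 3))

/* Function style: the compiler checks that the argument really is a
 * struct sketch_inode pointer. */
static inline int sketch_inode_fork_boff(const struct sketch_inode *ip)
{
	return ip->forkoff << 3;
}
```

      Both forms compute the same value; the function form adds argument type
      checking and a name that reads like ordinary C.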

      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_attr_leaf.c
      	fs/xfs/libxfs/xfs_bmap.c
      	fs/xfs/libxfs/xfs_bmap_btree.c
      	fs/xfs/libxfs/xfs_dir2.c
      	fs/xfs/libxfs/xfs_inode_fork.h
      	fs/xfs/scrub/symlink.c
      	fs/xfs/xfs_itable.c
      	fs/xfs/xfs_symlink.c
      	fs/xfs/xfs_trace.h
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      e4af4f66
    • D
      xfs: replace XFS_IFORK_Q with a proper predicate function · b81094f0
      Darrick J. Wong committed
      mainline inclusion
      from mainline-v5.19-rc5
      commit 932b42c6
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=932b42c66cb5d0ca9800b128415b4ad6b1952b3e
      
      --------------------------------
      
      Replace this shouty macro with a real C function that has a more
      descriptive name.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_attr.h
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/scrub/btree.c
      	fs/xfs/xfs_inode.c
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      b81094f0
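The macro-to-predicate conversion described in the commit above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the XFS code: `toy_inode`, `TOY_IFORK_Q`, and `toy_inode_has_attr_fork` are hypothetical stand-ins that carry only the one field the predicate reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified inode: only the field the predicate reads. */
struct toy_inode {
	uint8_t forkoff;	/* attr fork offset; 0 means no attr fork */
};

/* Old style: a macro expands for any argument that has a ->forkoff member. */
#define TOY_IFORK_Q(ip)	((ip)->forkoff != 0)

/* New style: the argument type is checked and the name states the question. */
static inline bool toy_inode_has_attr_fork(const struct toy_inode *ip)
{
	assert(ip != NULL);
	return ip->forkoff != 0;
}
```

Both forms give the same answer for a valid inode pointer; the function form additionally rejects an argument of the wrong type at compile time.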
    • D
      xfs: use XFS_IFORK_Q to determine the presence of an xattr fork · a0edb32c
      Darrick J. Wong committed
      mainline inclusion
      from mainline-v5.19-rc5
      commit e45d7cb2
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e45d7cb2356e6b59fe64da28324025cc6fcd3fbd
      
      --------------------------------
      
      Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the
      attribute fork but i_forkoff is zero.  This eliminates the ambiguity
      between i_forkoff and i_af.if_present, which should make it easier to
      understand the lifetime of attr forks.
      
      While we're at it, remove the if_present checks around calls to
      xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr
      forks that have already been torn down.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_attr.h
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/libxfs/xfs_inode_fork.h
      	fs/xfs/xfs_icache.c
      	fs/xfs/xfs_inode.c
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      a0edb32c
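The lookup behaviour described in the commit above (return NULL for the attribute fork whenever the fork offset is zero) might look roughly like the sketch below. All `toy_` names and the struct layout are assumptions made for illustration; the real helper and structures live in the XFS sources and differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_whichfork { TOY_DATA_FORK, TOY_ATTR_FORK };

/* Hypothetical fork descriptor. */
struct toy_ifork {
	int nextents;
};

/* Hypothetical inode with both forks embedded. */
struct toy_inode {
	struct toy_ifork df;	/* data fork, always present */
	struct toy_ifork af;	/* attr fork storage, embedded in the inode */
	uint8_t forkoff;	/* 0 means the attr fork is not in use */
};

/* Return the requested fork, or NULL if the attr fork is not in use. */
static inline struct toy_ifork *
toy_ifork_ptr(struct toy_inode *ip, enum toy_whichfork which)
{
	assert(ip != NULL);
	switch (which) {
	case TOY_DATA_FORK:
		return &ip->df;
	case TOY_ATTR_FORK:
		/* forkoff is the single source of truth for presence. */
		if (ip->forkoff == 0)
			return NULL;
		return &ip->af;
	}
	return NULL;
}
```

Keying presence off one field is what removes the ambiguity the message mentions: callers no longer have to reconcile a separate "present" flag with the fork offset.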
    • D
      xfs: make inode attribute forks a permanent part of struct xfs_inode · 732df364
      Darrick J. Wong committed
      mainline inclusion
      from mainline-v5.19-rc5
      commit 2ed5b09b
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ed5b09b3e8fc274ae8fecd6ab7c5106a364bed1
      
      --------------------------------
      
      Syzkaller reported a UAF bug a while back:
      
      ==================================================================
      BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
      Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958
      
      CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted
      5.15.0-0.30.3-20220406_1406 #3
      Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
      04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106
       print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459
       xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
       xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159
       xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36
       __vfs_getxattr+0xdf/0x13d fs/xattr.c:399
       cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300
       security_inode_need_killpriv+0x4c/0x97 security/security.c:1408
       dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912
       dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908
       do_truncate+0xc3/0x1e0 fs/open.c:56
       handle_truncate fs/namei.c:3084 [inline]
       do_open fs/namei.c:3432 [inline]
       path_openat+0x30ab/0x396d fs/namei.c:3561
       do_filp_open+0x1c4/0x290 fs/namei.c:3588
       do_sys_openat2+0x60d/0x98c fs/open.c:1212
       do_sys_open+0xcf/0x13c fs/open.c:1228
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      RIP: 0033:0x7f7ef4bb753d
      Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
      89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
      01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
      RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d
      RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0
      RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
      R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0
       </TASK>
      
      Allocated by task 2953:
       kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:254 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3213 [inline]
       slab_alloc mm/slub.c:3221 [inline]
       kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226
       kmem_cache_zalloc include/linux/slab.h:711 [inline]
       xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287
       xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098
       xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746
       xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
       __vfs_setxattr+0x11b/0x177 fs/xattr.c:180
       __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214
       __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275
       vfs_setxattr+0x154/0x33d fs/xattr.c:301
       setxattr+0x216/0x29f fs/xattr.c:575
       __do_sys_fsetxattr fs/xattr.c:632 [inline]
       __se_sys_fsetxattr fs/xattr.c:621 [inline]
       __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      
      Freed by task 2949:
       kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
       kasan_set_track+0x1c/0x21 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:230 [inline]
       slab_free_hook mm/slub.c:1700 [inline]
       slab_free_freelist_hook mm/slub.c:1726 [inline]
       slab_free mm/slub.c:3492 [inline]
       kmem_cache_free+0xdc/0x3ce mm/slub.c:3508
       xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773
       xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822
       xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413
       xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684
       xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802
       xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
       __vfs_removexattr+0x106/0x16a fs/xattr.c:468
       cap_inode_killpriv+0x24/0x47 security/commoncap.c:324
       security_inode_killpriv+0x54/0xa1 security/security.c:1414
       setattr_prepare+0x1a6/0x897 fs/attr.c:146
       xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682
       xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065
       xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093
       notify_change+0xae5/0x10a1 fs/attr.c:410
       do_truncate+0x134/0x1e0 fs/open.c:64
       handle_truncate fs/namei.c:3084 [inline]
       do_open fs/namei.c:3432 [inline]
       path_openat+0x30ab/0x396d fs/namei.c:3561
       do_filp_open+0x1c4/0x290 fs/namei.c:3588
       do_sys_openat2+0x60d/0x98c fs/open.c:1212
       do_sys_open+0xcf/0x13c fs/open.c:1228
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      
      The buggy address belongs to the object at ffff88802cec9188
       which belongs to the cache xfs_ifork of size 40
      The buggy address is located 20 bytes inside of
       40-byte region [ffff88802cec9188, ffff88802cec91b0)
      The buggy address belongs to the page:
      page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000
      index:0x0 pfn:0x2cec9
      flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
      raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
       ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
      >ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
                                  ^
       ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
       ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
      ==================================================================
      
      The root cause of this bug is the unlocked access to xfs_inode.i_afp
      from the getxattr code paths while trying to determine which ILOCK mode
      to use to stabilize the xattr data.  Unfortunately, the VFS does not
      acquire i_rwsem when vfs_getxattr (or listxattr) call into the
      filesystem, which means that getxattr can race with a removexattr that's
      tearing down the attr fork and crash:
      
      xfs_attr_set:                          xfs_attr_get:
      xfs_attr_fork_remove:                  xfs_ilock_attr_map_shared:
      
      xfs_idestroy_fork(ip->i_afp);
      kmem_cache_free(xfs_ifork_cache, ip->i_afp);
      
                                             if (ip->i_afp &&
      
      ip->i_afp = NULL;
      
                                                 xfs_need_iread_extents(ip->i_afp))
                                             <KABOOM>
      
      ip->i_forkoff = 0;
      
      Regrettably, the VFS is much more lax about i_rwsem and getxattr than
      is immediately obvious -- not only does it not guarantee that we hold
      i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
      The getxattr system call won't acquire the lock before calling XFS, but
      the file capabilities code calls getxattr with and without i_rwsem held
      to determine if the "security.capabilities" xattr is set on the file.
      
      Fixing the VFS locking requires a treewide investigation into every code
      path that could touch an xattr and what i_rwsem state it expects or sets
      up.  That could take years or even prove impossible; fortunately, we
      can fix this UAF problem inside XFS.
      
      An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
      ensure that i_forkoff is always zeroed before i_afp is set to null and
      changed the read paths to use smp_rmb before accessing i_forkoff and
      i_afp, which avoided these UAF problems.  However, the patch author was
      too busy dealing with other problems in the meantime, and by the time he
      came back to this issue, the situation had changed a bit.
      
      On a modern system with selinux, each inode will always have at least
      one xattr for the selinux label, so it doesn't make much sense to keep
      incurring the extra pointer dereference.  Furthermore, Allison's
      upcoming parent pointer patchset will also cause nearly every inode in
      the filesystem to have extended attributes.  Therefore, make the inode
      attribute fork structure part of struct xfs_inode, at a cost of 40 more
      bytes.
      
      This patch adds a clunky if_present field where necessary to maintain
      the existing logic of xattr fork null pointer testing in the existing
      codebase.  The next patch switches the logic over to XFS_IFORK_Q and it
      all goes away.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_attr.c
      	fs/xfs/libxfs/xfs_attr.h
      	fs/xfs/libxfs/xfs_attr_leaf.c
      	fs/xfs/libxfs/xfs_bmap.c
      	fs/xfs/libxfs/xfs_inode_buf.c
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/libxfs/xfs_inode_fork.h
      	fs/xfs/xfs_attr_inactive.c
      	fs/xfs/xfs_attr_list.c
      	fs/xfs/xfs_icache.c
      	fs/xfs/xfs_inode.c
      	fs/xfs/xfs_inode.h
      	fs/xfs/xfs_inode_item.c
      	fs/xfs/xfs_itable.c
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      732df364
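The lifetime argument in the commit above can be illustrated with a toy model: once the attribute fork is embedded in the inode, tearing it down resets fields instead of freeing memory, so a racing reader that already holds a pointer to the fork still reads valid storage. The struct and function names below are hypothetical, and the sketch deliberately ignores locking and memory ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical fork descriptor. */
struct toy_ifork {
	int nextents;
	int format;
};

/* Hypothetical inode: the attr fork is a member, not a separate allocation. */
struct toy_inode {
	struct toy_ifork af;	/* lives exactly as long as the inode */
	bool af_present;	/* stand-in for the if_present field */
	unsigned char forkoff;	/* 0 means no attr fork on disk */
};

/*
 * Tear down the attr fork. Nothing is freed here, so a pointer to
 * ip->af taken before this call still refers to live memory.
 */
static void toy_attr_fork_remove(struct toy_inode *ip)
{
	assert(ip != NULL);
	memset(&ip->af, 0, sizeof(ip->af));
	ip->af_present = false;
	ip->forkoff = 0;
}
```

The trade-off named in the message is memory: every inode carries the fork structure whether or not it has extended attributes.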
    • D
      xfs: convert XFS_IFORK_PTR to a static inline helper · 47998bc8
      Darrick J. Wong committed
      mainline inclusion
      from mainline-v5.19-rc5
      commit 732436ef
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=732436ef916b4f338d672ea56accfdb11e8d0732
      
      --------------------------------
      
      We're about to make this logic do a bit more, so convert the macro to a
      static inline function for better typechecking and fewer shouty macros.
      No functional changes here.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_bmap.c
      	fs/xfs/libxfs/xfs_bmap_btree.c
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/libxfs/xfs_inode_fork.h
      	fs/xfs/scrub/bmap.c
      	fs/xfs/scrub/symlink.c
      	fs/xfs/xfs_inode.c
      	fs/xfs/xfs_ioctl.c
      	fs/xfs/xfs_qm.c
      	fs/xfs/xfs_reflink.c
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      47998bc8
    • B
      xfs: don't reuse busy extents on extent trim · bbfe8670
      Brian Foster committed
      mainline inclusion
      from mainline-v5.11-rc4
      commit 06058bc4
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=06058bc40534530e617e5623775c53bb24f032cb
      
      --------------------------------
      
      Freed extents are marked busy from the point the freeing transaction
      commits until the associated CIL context is checkpointed to the log.
      This prevents reuse and overwrite of recently freed blocks before
      the changes are committed to disk, which can lead to corruption
      after a crash. The exception to this rule is that metadata
      allocation is allowed to reuse busy extents because metadata changes
      are also logged.
      
      As of commit 97d3ac75 ("xfs: exact busy extent tracking"), XFS
      has allowed modification or complete invalidation of outstanding
      busy extents for metadata allocations. This implementation assumes
      that use of the associated extent is imminent, which is not always
      the case. For example, the trimmed extent might not satisfy the
      minimum length of the allocation request, or the allocation
      algorithm might be involved in a search for the optimal result based
      on locality.
      
      generic/019 reproduces a corruption caused by this scenario. First,
      a metadata block (usually a bmbt or symlink block) is freed from an
      inode. A subsequent bmbt split on an unrelated inode attempts a near
      mode allocation request that invalidates the busy block during the
      search, but does not ultimately allocate it. Due to the busy state
      invalidation, the block is no longer considered busy to subsequent
      allocation. A direct I/O write request immediately allocates the
      block and writes to it. Finally, the filesystem crashes while in a
      state where the initial metadata block free had not committed to the
      on-disk log. After recovery, the original metadata block is in its
      original location as expected, but has been corrupted by the
      aforementioned dio.
      
      This demonstrates that it is fundamentally unsafe to modify busy
      extent state for extents that are not guaranteed to be allocated.
      This applies to pretty much all of the code paths that currently
      trim busy extents for one reason or another. Therefore to address
      this problem, drop the reuse mechanism from the busy extent trim
      path. This code already knows how to return partial non-busy ranges
      of the targeted free extent and higher level code tracks the busy
      state of the allocation attempt. If a block allocation fails where
      one or more candidate extents is busy, we force the log and retry
      the allocation.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      bbfe8670
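The idea of returning a partial non-busy range without touching busy state, as the commit above describes, can be sketched as follows. This is a simplified model under loud assumptions: it handles a single busy extent rather than a tree of them, keeps the larger remaining side when the busy range splits the free extent, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical extent: a run of blocks [start, start + len). */
struct toy_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Trim *fext so that it no longer overlaps *busy. The busy extent is
 * const: it is never modified or invalidated. Returns true if an
 * overlap was found, so the caller can remember that the attempt hit
 * busy space and decide to force the log and retry.
 */
static bool toy_trim_busy(struct toy_extent *fext, const struct toy_extent *busy)
{
	uint64_t fend = fext->start + fext->len;
	uint64_t bend = busy->start + busy->len;
	uint64_t left, right;

	if (fext->len == 0 || busy->len == 0)
		return false;
	if (bend <= fext->start || busy->start >= fend)
		return false;	/* no overlap, nothing to trim */

	/* Non-busy space before and after the busy range. */
	left = busy->start > fext->start ? busy->start - fext->start : 0;
	right = fend > bend ? fend - bend : 0;

	if (left >= right) {
		fext->len = left;
	} else {
		fext->start = bend;
		fext->len = right;
	}
	return true;
}
```

A trimmed length of zero means the whole candidate was busy, which corresponds to the case where the allocation fails and the log is forced before retrying.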
    • Z
      fs/xfs: convert comma to semicolon · eaff78ee
      Zheng Yongjun committed
      mainline inclusion
      from mainline-v5.10-rc5
      commit 1189686e
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1189686e5440041057f8cc21a7c1d13bb6642cb9
      
      --------------------------------
      
      Replace a comma between expression statements by a semicolon.
      Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      eaff78ee
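Why a stray comma between expression statements is worth cleaning up can be shown with a small, hypothetical example (it is not taken from the XFS change itself). A comma joins two expressions into one statement, so the second assignment silently becomes part of whatever controls the first.

```c
#include <assert.h>

/* The comma makes both assignments one statement, so both belong to the if. */
static void toy_with_comma(int cond, int *a, int *b)
{
	if (cond)
		*a = 1,
		*b = 2;	/* skipped when cond is false */
}

/* With a semicolon the second assignment is its own statement. */
static void toy_with_semicolon(int cond, int *a, int *b)
{
	if (cond)
		*a = 1;
	*b = 2;	/* always runs */
}
```

In straight-line code the two spellings behave the same, which is why such commas go unnoticed until someone wraps the first statement in a condition.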
    • D
      xfs: xfs_ail_push_all_sync() stalls when racing with updates · 6efc0ef9
      Dave Chinner committed
      mainline inclusion
      from mainline-v5.17-rc6
      commit 941fbdfd
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=941fbdfd6dd0f1d7961c28123b5460912f678cb5
      
      --------------------------------
      
      xfs_ail_push_all_sync() has a loop like this:
      
      while max_ail_lsn {
      	prepare_to_wait(ail_empty)
      	target = max_ail_lsn
      	wake_up(ail_task);
      	schedule()
      }
      
      Which is designed to sleep until the AIL is emptied. When
      xfs_ail_update_finish() moves the tail of the log, it does:
      
      	if (list_empty(&ailp->ail_head))
      		wake_up_all(&ailp->ail_empty);
      
      So it will only wake up the sync push waiter when the AIL goes
      empty. If, by the time the push waiter has woken, the AIL has more
      in it, it will reset the target, wake the push task and go back to
      sleep.
      
      The problem here is that if the AIL is having items added to it
      when xfs_ail_push_all_sync() is called, then they may get inserted
      into the AIL at a LSN higher than the target LSN. At this point,
      xfsaild_push() will see that the target is X, the item LSNs are
      (X+N) and skip over them, hence never pushing them out.
      
      The result of this is that the AIL will not get emptied by the AIL push
      thread, hence xfs_ail_update_finish() will never see the AIL being
      empty even if it moves the tail. Hence xfs_ail_push_all_sync() never
      gets woken and hence cannot update the push target to capture the
      items beyond the current target on the LSN.
      
      This is a TOCTOU type of issue so the way to avoid it is to not
      use the push target at all for sync pushes. We know that a sync push
      is being requested by the fact the ail_empty wait queue is active,
      hence the xfsaild can just set the target to max_ail_lsn on every
      push that we see the wait queue active. Hence we no longer will
      leave items on the AIL that are beyond the LSN sampled at the start
      of a sync push.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      6efc0ef9
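The stall described in the commit above can be reproduced in a toy, single-threaded model. The sketch assumes a hypothetical `toy_ail` that stores item LSNs in a small array, a `target` sampled once by the waiter, and a `sync_waiter` flag standing in for an active ail_empty wait queue; none of these names come from XFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_AIL_MAX 8

/* Hypothetical AIL: unordered array of the LSNs still waiting for writeback. */
struct toy_ail {
	uint64_t lsn[TOY_AIL_MAX];
	size_t count;
	uint64_t target;	/* push target sampled by the sync waiter */
	bool sync_waiter;	/* models an active ail_empty wait queue */
};

static uint64_t toy_ail_max_lsn(const struct toy_ail *ail)
{
	uint64_t max = 0;

	for (size_t i = 0; i < ail->count; i++)
		if (ail->lsn[i] > max)
			max = ail->lsn[i];
	return max;
}

static void toy_ail_insert(struct toy_ail *ail, uint64_t lsn)
{
	assert(ail->count < TOY_AIL_MAX);
	ail->lsn[ail->count++] = lsn;
}

/*
 * One pass of the push thread: remove every item at or below the target.
 * With retarget_for_sync set, an active sync waiter overrides the sampled
 * target with the current maximum LSN, which is the approach the commit
 * message describes.
 */
static void toy_ail_push(struct toy_ail *ail, bool retarget_for_sync)
{
	uint64_t target = ail->target;
	size_t kept = 0;

	if (retarget_for_sync && ail->sync_waiter)
		target = toy_ail_max_lsn(ail);

	for (size_t i = 0; i < ail->count; i++)
		if (ail->lsn[i] > target)
			ail->lsn[kept++] = ail->lsn[i];
	ail->count = kept;
}
```

If an item is inserted above the sampled target after the waiter has gone to sleep, a push that honours only the stale target leaves that item behind, so the AIL never becomes empty and the waiter is never woken. Recomputing the target on each push while a waiter exists drains it.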
    • D
      xfs: check buffer pin state after locking in delwri_submit · ceb8e84c
      Dave Chinner committed
      mainline inclusion
      from mainline-v5.17-rc6
      commit dbd0f529
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dbd0f5299302f8506637592e2373891a748c6990
      
      --------------------------------
      
      AIL flushing can get stuck here:
      
      [316649.005769] INFO: task xfsaild/pmem1:324525 blocked for more than 123 seconds.
      [316649.007807]       Not tainted 5.17.0-rc6-dgc+ #975
      [316649.009186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [316649.011720] task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
      [316649.014112] Call Trace:
      [316649.014841]  <TASK>
      [316649.015492]  __schedule+0x30d/0x9e0
      [316649.017745]  schedule+0x55/0xd0
      [316649.018681]  io_schedule+0x4b/0x80
      [316649.019683]  xfs_buf_wait_unpin+0x9e/0xf0
      [316649.021850]  __xfs_buf_submit+0x14a/0x230
      [316649.023033]  xfs_buf_delwri_submit_buffers+0x107/0x280
      [316649.024511]  xfs_buf_delwri_submit_nowait+0x10/0x20
      [316649.025931]  xfsaild+0x27e/0x9d0
      [316649.028283]  kthread+0xf6/0x120
      [316649.030602]  ret_from_fork+0x1f/0x30
      
      in the situation where flushing gets preempted between the unpin
      check and the buffer trylock under nowait conditions:
      
      	blk_start_plug(&plug);
      	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
      		if (!wait_list) {
      			if (xfs_buf_ispinned(bp)) {
      				pinned++;
      				continue;
      			}
      Here >>>>>>
      			if (!xfs_buf_trylock(bp))
      				continue;
      
      This means submission is stuck until something else triggers a log
      force to unpin the buffer.
      
      To get onto the delwri list to begin with, the buffer pin state has
      already been checked, and hence it's relatively rare we get a race
      between flushing and encountering a pinned buffer in delwri
      submission to begin with. Further, to increase the pin count the
      buffer has to be locked, so the only way we can hit this race
      without failing the trylock is to be preempted between the pincount
      check seeing zero and the trylock being run.
      
      Hence to avoid this problem, just invert the order of trylock vs
      pin check. We shouldn't hit that many pinned buffers here, so
      optimising away the trylock for pinned buffers should not matter for
      performance at all.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      ceb8e84c
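The reordering proposed in the commit above (take the trylock first, then check the pin count) might be sketched as below. The `toy_buf` type and its helpers are hypothetical single-threaded stand-ins; they only show the order of the checks, not real locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer: a lock flag and a pin count. */
struct toy_buf {
	bool locked;
	int pin_count;
};

static bool toy_buf_trylock(struct toy_buf *bp)
{
	if (bp->locked)
		return false;
	bp->locked = true;
	return true;
}

static void toy_buf_unlock(struct toy_buf *bp)
{
	bp->locked = false;
}

/*
 * Decide whether a buffer can be submitted in nowait mode. Returns true
 * with the buffer locked if it can; returns false if it was skipped.
 * Because the pin count can only rise while the buffer is locked, checking
 * it after the trylock succeeds means it cannot change underneath us.
 */
static bool toy_delwri_try_submit(struct toy_buf *bp, int *pinned)
{
	assert(bp != NULL && pinned != NULL);
	if (!toy_buf_trylock(bp))
		return false;
	if (bp->pin_count > 0) {
		toy_buf_unlock(bp);
		(*pinned)++;
		return false;
	}
	return true;
}
```

The cost is one extra lock and unlock for buffers that turn out to be pinned, which the message argues is rare enough not to matter.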
    • D
      xfs: log worker needs to start before intent/unlink recovery · 080ca40e
      Dave Chinner committed
      mainline inclusion
      from mainline-v5.17-rc6
      commit a9a4bc8c
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a9a4bc8c76d747aa40b30e2dfc176c781f353a08
      
      --------------------------------
      
      After 963 iterations of generic/530, it deadlocked during recovery
      on a pinned inode cluster buffer like so:
      
      XFS (pmem1): Starting recovery (logdev: internal)
      INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
            Not tainted 5.17.0-rc6-dgc+ #975
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      task:kworker/8:0     state:D stack:13024 pid:306037 ppid:     2 flags:0x00004000
      Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       schedule_timeout+0x114/0x160
       __down+0x99/0xf0
       down+0x5e/0x70
       xfs_buf_lock+0x36/0xf0
       xfs_buf_find+0x418/0x850
       xfs_buf_get_map+0x47/0x380
       xfs_buf_read_map+0x54/0x240
       xfs_trans_read_buf_map+0x1bd/0x490
       xfs_imap_to_bp+0x4f/0x70
       xfs_iunlink_map_ino+0x66/0xd0
       xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
       xfs_iunlink_remove_inode+0xf2/0x1d0
       xfs_inactive_ifree+0x1a3/0x900
       xfs_inode_unlink+0xcc/0x210
       xfs_inodegc_worker+0x1ac/0x2f0
       process_one_work+0x1ac/0x390
       worker_thread+0x56/0x3c0
       kthread+0xf6/0x120
       ret_from_fork+0x1f/0x30
       </TASK>
      task:mount           state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       schedule_timeout+0x114/0x160
       __down+0x99/0xf0
       down+0x5e/0x70
       xfs_buf_lock+0x36/0xf0
       xfs_buf_find+0x418/0x850
       xfs_buf_get_map+0x47/0x380
       xfs_buf_read_map+0x54/0x240
       xfs_trans_read_buf_map+0x1bd/0x490
       xfs_imap_to_bp+0x4f/0x70
       xfs_iget+0x300/0xb40
       xlog_recover_process_one_iunlink+0x4c/0x170
       xlog_recover_process_iunlinks.isra.0+0xee/0x130
       xlog_recover_finish+0x57/0x110
       xfs_log_mount_finish+0xfc/0x1e0
       xfs_mountfs+0x540/0x910
       xfs_fs_fill_super+0x495/0x850
       get_tree_bdev+0x171/0x270
       xfs_fs_get_tree+0x15/0x20
       vfs_get_tree+0x24/0xc0
       path_mount+0x304/0xba0
       __x64_sys_mount+0x108/0x140
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x44/0xae
       </TASK>
      task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
      Call Trace:
       <TASK>
       __schedule+0x30d/0x9e0
       schedule+0x55/0xd0
       io_schedule+0x4b/0x80
       xfs_buf_wait_unpin+0x9e/0xf0
       __xfs_buf_submit+0x14a/0x230
       xfs_buf_delwri_submit_buffers+0x107/0x280
       xfs_buf_delwri_submit_nowait+0x10/0x20
       xfsaild+0x27e/0x9d0
       kthread+0xf6/0x120
       ret_from_fork+0x1f/0x30
      
      We have the mount process waiting on an inode cluster buffer read,
      inodegc doing unlink waiting on the same inode cluster buffer, and
      the AIL push thread blocked in writeback waiting for the inode
      cluster buffer to become unpinned.
      
      What has happened here is that the AIL push thread has raced with
      the inodegc process modifying, committing and pinning the inode
      cluster buffer in xfs_buf_delwri_submit_buffers() here:
      
      	blk_start_plug(&plug);
      	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
      		if (!wait_list) {
      			if (xfs_buf_ispinned(bp)) {
      				pinned++;
      				continue;
      			}
      Here >>>>>>
      			if (!xfs_buf_trylock(bp))
      				continue;
      
      Basically, the AIL has found the buffer wasn't pinned and got the
      lock without blocking, but then the buffer was pinned. This implies
      the processing here was pre-empted between the pin check and the
      lock, because the pin count can only be increased while holding the
      buffer locked. Hence when it has gone to submit the IO, it has
      blocked waiting for the buffer to be unpinned.
      
      With all executing threads now waiting on the buffer to be unpinned,
      we normally get out of situations like this via the background log
      worker issuing a log force which will unpin stuck buffers like
      this. But at this point in recovery, we haven't started the log
      worker. In fact, the first thing we do after processing intents and
      unlinked inodes is *start the log worker*. IOWs, we start it too
      late to have it break deadlocks like this.
      
      Avoid this and any other similar deadlock vectors in intent and
      unlinked inode recovery by starting the log worker before we recover
      intents and unlinked inodes. This part of recovery runs as though
      the filesystem is fully active, so we really should have the same
      infrastructure running as we normally do at runtime.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Long Li <leo.lilong@huawei.com>
      Reviewed-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
      080ca40e
  2. 10 Apr, 2023 1 commit
    • O
      !256 sched: Support dynamic affinity in scheduler · 23c0e711
      openeuler-ci-bot committed
      Merge Pull Request from: @zhangjian210 
       
      This patchset supports the dynamic affinity feature.
      
      Dynamic affinity sets preferred CPUs for a task. When the utilization of
      the taskgroup's preferred CPUs is low, the task runs only on its preferred
      CPUs, which enhances CPU resource locality and reduces interference between
      task cgroups. Otherwise the task can burst beyond its preferred CPUs and
      use other CPUs within its allowed CPUs.
       
      Link:https://gitee.com/openeuler/kernel/pulls/256 
      
      Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com> 
      Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
      Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com> 
      23c0e711
  3. 08 Apr, 2023 6 commits
  4. 07 Apr, 2023 1 commit
    • O
      !323 [OLK-5.10] sched: Introduce priority load balance for CFS · 80748ad9
      openeuler-ci-bot committed
      Merge Pull Request from: @zhangsong234 
       
      euleros inclusion
      category: feature
      bugzilla: https://gitee.com/openeuler/kernel/issues/I5HF3M
      CVE: NA
      
      Add new sysctl interface:
      `/proc/sys/kernel/sched_prio_load_balance_enabled`
      
       0: default behavior
       1: enable priority load balance for qos scheduler
      
      For task co-location with the qos scheduler, when CFS does load balancing,
      it is reasonable to prefer migrating online (latency-sensitive) tasks.
      The CFS load balance can therefore be changed as follows:
      
      1) `cfs_tasks` list is owned by online tasks.
      2) Add new `cfs_offline_tasks` list which is owned by offline tasks.
      3) Prefer to migrate the online tasks of `cfs_tasks` list to dst rq.
      
      In the hyperthread interference scenario, suppose the smt expeller feature
      is enabled and CPU A and CPU B are two hyperthreads on one physical core.
      CPU A runs online tasks while CPU B has only offline tasks. The offline
      tasks on CPU B are expelled by the online tasks on CPU A and cannot be
      scheduled. However, when load balance is triggered, the load on the two
      CPUs is already balanced before CPU B can migrate any online tasks from
      CPU A. As a result, CPU B cannot run online tasks, and online tasks
      cannot be evenly distributed among different CPUs.
       
      Link:https://gitee.com/openeuler/kernel/pulls/323 
      
      Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com> 
      Reviewed-by: Liu Chao <liuchao173@huawei.com> 
      Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com> 
      80748ad9
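The three-step list in the pull request above (online tasks on `cfs_tasks`, offline tasks on `cfs_offline_tasks`, online tasks preferred for migration) can be modelled with a very small sketch. The `toy_rq` struct, the counters, and the fallback to the offline list when no online task is available are assumptions made for illustration; the description only says that online tasks are preferred.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical run queue: only the lengths of the two task lists. */
struct toy_rq {
	size_t nr_cfs_tasks;		/* online (latency-sensitive) tasks */
	size_t nr_cfs_offline_tasks;	/* offline tasks */
};

enum toy_list { TOY_LIST_NONE, TOY_LIST_ONLINE, TOY_LIST_OFFLINE };

/*
 * Choose which list of the source run queue to take a migration
 * candidate from. Online tasks come first; falling back to offline
 * tasks is an assumption of this sketch.
 */
static enum toy_list toy_pick_migration_list(const struct toy_rq *src)
{
	assert(src != NULL);
	if (src->nr_cfs_tasks > 0)
		return TOY_LIST_ONLINE;
	if (src->nr_cfs_offline_tasks > 0)
		return TOY_LIST_OFFLINE;
	return TOY_LIST_NONE;
}
```

Keeping the two classes on separate lists is what makes the preference cheap: the balancer does not have to scan past offline tasks to find an online one.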