提交 · 600130a3b3f73d76dbfa38d63f68d1c2520e582d · openeuler / Kernel

25 4月, 2023 2 次提交

kabi: Fix kabi breakage without build warning. · 600130a3

由 Xie Haocheng 提交于 4月 25, 2023

amd inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6XNL2
CVE: NA

-------------------------------------------------
Error report detail:
*** ERROR - ABI BREAKAGE WAS DETECTED ***

The following symbols have been changed (this will cause an ABI breakage):
new kabi:
0x65d25289	__SCK__tp_func_xdp_exception	vmlinux	EXPORT_SYMBOL_GPL
0x5e9265ee	__tracepoint_xdp_exception	vmlinux	EXPORT_SYMBOL_GPL
old kabi:
0x5e0fbbff	__SCK__tp_func_xdp_exception	vmlinux	EXPORT_SYMBOL_GPL
0x017cc464	__tracepoint_xdp_exception	vmlinux	EXPORT_SYMBOL_GPL
Signed-off-by: NXie Haocheng <haocheng.xie@amd.com>

600130a3

Revert "kabi: Fix kabi breakage caused by commit ." · fb465f68

由 Xie Haocheng 提交于 4月 25, 2023

amd inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6XNL2
CVE: NA

-------------------------------------------------

This reverts commit a9cbff64.

This patch could introduce build warnings, should be reverted.
The build warning messages:
WARNING: modpost: EXPORT symbol "__SCT__perf_lopwr_cb" [vmlinux] version generation failed, symbol will not be versioned.
    WARNING: modpost: EXPORT symbol "__SCT__perf_lopwr_cb" [vmlinux] version generation failed, symbol will not be versioned.
Signed-off-by: NXie Haocheng <haocheng.xie@amd.com>

fb465f68

19 4月, 2023 1 次提交

!581 Add support for SVE Direct WQE for hns · f7ecc8ac

由 openeuler-ci-bot 提交于 4月 19, 2023

Merge Pull Request from: @stinft 
 
Support SVE DWQE to increase the performance of SoC.
#I6VLLM 
 
Link:https://gitee.com/openeuler/kernel/pulls/581 

Reviewed-by: Chengchang Tang <tangchengchang@huawei.com> 
Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

f7ecc8ac

18 4月, 2023 2 次提交

!585 x86/speculation: Allow enabling STIBP with legacy IBRS · e77c0a41

由 openeuler-ci-bot 提交于 4月 18, 2023

Merge Pull Request from: @ci-robot 
 
PR sync from:  Wei Li <liwei391@huawei.com>
 https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/thread/UNWNKYXBCMVT2QIU4QPROMO5UMGXKQS7/ 
 
 
Link:https://gitee.com/openeuler/kernel/pulls/585 

Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

e77c0a41

x86/speculation: Allow enabling STIBP with legacy IBRS · 811a507c

由 KP Singh 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.173
commit abfed855f05863d292de2d0ebab4656791bab9c8
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6V7TU
CVE: CVE-2023-1998

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=abfed855f05863d292de2d0ebab4656791bab9c8

--------------------------------

commit 6921ed90 upstream.

When plain IBRS is enabled (not enhanced IBRS), the logic in
spectre_v2_user_select_mitigation() determines that STIBP is not needed.

The IBRS bit implicitly protects against cross-thread branch target
injection. However, with legacy IBRS, the IBRS bit is cleared on
returning to userspace for performance reasons which leaves userspace
threads vulnerable to cross-thread branch target injection against which
STIBP protects.

Exclude IBRS from the spectre_v2_in_ibrs_mode() check to allow for
enabling STIBP (through seccomp/prctl() by default or always-on, if
selected by spectre_v2_user kernel cmdline parameter).

  [ bp: Massage. ]

Fixes: 7c693f54 ("x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRS")
Reported-by: NJosé Oliveira <joseloliveira11@gmail.com>
Reported-by: NRodrigo Branco <rodrigo@kernelhacking.com>
Signed-off-by: NKP Singh <kpsingh@kernel.org>
Signed-off-by: NBorislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230220120127.1975241-1-kpsingh@kernel.org
Link: https://lore.kernel.org/r/20230221184908.2349578-1-kpsingh@kernel.orgSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NWei Li <liwei391@huawei.com>

811a507c

17 4月, 2023 3 次提交

RDMA/hns: Add SVE DIRECT WQE flag to support libhns · 2c61dcac

由 Yixing Liu 提交于 4月 17, 2023

driver inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/rdma-core/issues/I6VLLM

---------------------------------------------------------------

Added SVE DWQE flag to control libhns SVE DWQE function.
Signed-off-by: NYixing Liu <liuyixing1@huawei.com>
Reviewed-by: NYangyang Li <liyangyang20@huawei.com>

2c61dcac

!575 Backport CVEs and bugfixes · 693a53a5

由 openeuler-ci-bot 提交于 4月 17, 2023

Merge Pull Request from: @zhangjialin11 
 
Pull new CVEs:
CVE-2023-1859
CVE-2023-1670
CVE-2023-1611

net bugfixes from Ziyang Xuan
ubi bugfixes from ZhaoLong Wang
ftrace and ring-buffer bugfixes from Zheng Yejian
perf bugfix from Yang Jihong
loop bugfix from Zhong Jinghua
ext4 bugfixes from Zhihao Cheng
dm crypt bugfix from yangerkun
driver core bugfixes from Zhang Zekun
xfs bugfixes from Long Li 
 
Link:https://gitee.com/openeuler/kernel/pulls/575 

Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

693a53a5

!576 Support congestion control algorithm configuration · 622c640a

由 openeuler-ci-bot 提交于 4月 17, 2023

Merge Pull Request from: @stinft 
 
```bash
Feature information:
1.RDMA/hns: Modify congestion abbreviation
The currently used abbreviation of cong cannot clearly
indicate the meaning, so the full name congest is used instead.

2.RDMA/hns: Support congestion control algorithm configuration at QP granularity
This patch supports to configure congestion control algorithm
based on QP granulariy. The configuration will be sent to
driver from user space. And then driver configures the selected
algorithm into QPC.
The current XRC type QP cannot deliver the configured
algorithm to kernel space, so the driver will set the default
algorithm for XRC type QP. And the default algorithm type is
controlled by the firmware.
```
bugzilla：#I6N1G4
 
 
Link:https://gitee.com/openeuler/kernel/pulls/576 

Reviewed-by: Chengchang Tang <tangchengchang@huawei.com> 
Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

622c640a

13 4月, 2023 3 次提交

!424 [OLK-5.10] openeuer/MAINTAINER: Add maintainers for Kunpeng SoC. · 6ca12438

由 openeuler-ci-bot 提交于 4月 13, 2023

Merge Pull Request from: @hellotcc 
 
Add Kunpeng SoC maintainers to openEuler/MAINTAINERS. 
 
Link:https://gitee.com/openeuler/kernel/pulls/424 

Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

6ca12438

RDMA/hns: Support congestion control algorithm configuration at QP granularity · 09f1b7cb

由 Yixing Liu 提交于 4月 12, 2023

driver inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I6N1G4

---------------------------------------------------------------

This patch supports to configure congestion control algorithm
based on QP granulariy. The configuration will be sent to
driver from user space. And then driver configures the selected
algorithm into QPC.

The current XRC type QP cannot deliver the configured
algorithm to kernel space, so the driver will set the default
algorithm for XRC type QP. And the default algorithm type is
controlled by the firmware.
Signed-off-by: NYixing Liu <liuyixing1@huawei.com>
Reviewed-by: NYangyang Li <liyangyang20@huawei.com>

09f1b7cb

RDMA/hns: Modify congestion abbreviation · 87d0ab38

由 Yixing Liu 提交于 4月 12, 2023

driver inclusion
category: cleanup
bugzilla: https://gitee.com/openeuler/kernel/issues/I6N1G4

---------------------------------------------------------------

The currently used abbreviation of cong cannot clearly
indicate the meaning, so the full name congest is used instead.
Signed-off-by: NYixing Liu <liuyixing1@huawei.com>
Reviewed-by: NYangyang Li <liyangyang20@huawei.com>

87d0ab38

12 4月, 2023 29 次提交

sctp: Call inet6_destroy_sock() via sk->sk_destruct(). · c1065432

由 Kuniyuki Iwashima 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.2-rc1
commit 6431b0f6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=6431b0f6ff1633ae598667e4cdd93830074a03e8

--------------------------------

After commit d38afeec ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.

SCTP sets its own sk->sk_destruct() in the sctp_init_sock(), and
SCTPv6 socket reuses it as the init function.

To call inet6_sock_destruct() from SCTPv6 sk->sk_destruct(), we
set sctp_v6_destruct_sock() in a new init function.
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

c1065432

dccp: Call inet6_destroy_sock() via sk->sk_destruct(). · fb3defd5

由 Kuniyuki Iwashima 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.2-rc1
commit 1651951e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=1651951ebea54970e0bda60c638fc2eee7a6218f

--------------------------------

After commit d38afeec ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.

DCCP sets its own sk->sk_destruct() in the dccp_init_sock(), and
DCCPv6 socket shares it by calling the same init function via
dccp_v6_init_sock().

To call inet6_sock_destruct() from DCCPv6 sk->sk_destruct(), we
export it and set dccp_v6_sk_destruct() in the init function.
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

fb3defd5

net: Remove WARN_ON_ONCE(sk->sk_forward_alloc) from sk_stream_kill_queues(). · 0e6c199a

由 Kuniyuki Iwashima 提交于 4月 12, 2023

stable inclusion
from stable-v5.10.171
commit 3e4bbd1f38a8d35bd2d3aaffdb5f6ada546b669a
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3e4bbd1f38a8d35bd2d3aaffdb5f6ada546b669a

--------------------------------

commit 62ec33b4 upstream.

Christoph Paasch reported that commit b5fc2923 ("inet6: Remove
inet6_destroy_sock() in sk->sk_prot->destroy().") started triggering
WARN_ON_ONCE(sk->sk_forward_alloc) in sk_stream_kill_queues().  [0 - 2]
Also, we can reproduce it by a program in [3].

In the commit, we delay freeing ipv6_pinfo.pktoptions from sk->destroy()
to sk->sk_destruct(), so sk->sk_forward_alloc is no longer zero in
inet_csk_destroy_sock().

The same check has been in inet_sock_destruct() from at least v2.6,
we can just remove the WARN_ON_ONCE().  However, among the users of
sk_stream_kill_queues(), only CAIF is not calling inet_sock_destruct().
Thus, we add the same WARN_ON_ONCE() to caif_sock_destructor().

[0]: https://lore.kernel.org/netdev/39725AB4-88F1-41B3-B07F-949C5CAEFF4F@icloud.com/
[1]: https://github.com/multipath-tcp/mptcp_net-next/issues/341
[2]:
WARNING: CPU: 0 PID: 3232 at net/core/stream.c:212 sk_stream_kill_queues+0x2f9/0x3e0
Modules linked in:
CPU: 0 PID: 3232 Comm: syz-executor.0 Not tainted 6.2.0-rc5ab24eb4698afbe147b424149c529e2a43ec24eb5 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:sk_stream_kill_queues+0x2f9/0x3e0
Code: 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e ec 00 00 00 8b ab 08 01 00 00 e9 60 ff ff ff e8 d0 5f b6 fe 0f 0b eb 97 e8 c7 5f b6 fe <0f> 0b eb a0 e8 be 5f b6 fe 0f 0b e9 6a fe ff ff e8 02 07 e3 fe e9
RSP: 0018:ffff88810570fc68 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff888101f38f40 RSI: ffffffff8285e529 RDI: 0000000000000005
RBP: 0000000000000ce0 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000ce0 R11: 0000000000000001 R12: ffff8881009e9488
R13: ffffffff84af2cc0 R14: 0000000000000000 R15: ffff8881009e9458
FS:  00007f7fdfbd5800(0000) GS:ffff88811b600000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000001b32923000 CR3: 00000001062fc006 CR4: 0000000000170ef0
Call Trace:
 <TASK>
 inet_csk_destroy_sock+0x1a1/0x320
 __tcp_close+0xab6/0xe90
 tcp_close+0x30/0xc0
 inet_release+0xe9/0x1f0
 inet6_release+0x4c/0x70
 __sock_release+0xd2/0x280
 sock_close+0x15/0x20
 __fput+0x252/0xa20
 task_work_run+0x169/0x250
 exit_to_user_mode_prepare+0x113/0x120
 syscall_exit_to_user_mode+0x1d/0x40
 do_syscall_64+0x48/0x90
 entry_SYSCALL_64_after_hwframe+0x72/0xdc
RIP: 0033:0x7f7fdf7ae28d
Code: c1 20 00 00 75 10 b8 03 00 00 00 0f 05 48 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ee fb ff ff 48 89 04 24 b8 03 00 00 00 0f 05 <48> 8b 3c 24 48 89 c2 e8 37 fc ff ff 48 89 d0 48 83 c4 08 48 3d 01
RSP: 002b:00000000007dfbb0 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000004 RCX: 00007f7fdf7ae28d
RDX: 0000000000000000 RSI: ffffffffffffffff RDI: 0000000000000003
RBP: 0000000000000000 R08: 000000007f338e0f R09: 0000000000000e0f
R10: 000000007f338e13 R11: 0000000000000293 R12: 00007f7fdefff000
R13: 00007f7fdefffcd8 R14: 00007f7fdefffce0 R15: 00007f7fdefffcd8
 </TASK>

[3]: https://lore.kernel.org/netdev/20230208004245.83497-1-kuniyu@amazon.com/

Fixes: b5fc2923 ("inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().")
Reported-by: Nsyzbot <syzkaller@googlegroups.com>
Reported-by: NChristoph Paasch <christophpaasch@icloud.com>
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

0e6c199a

inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy(). · 0c0b5ebd

由 Kuniyuki Iwashima 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.2-rc1
commit b5fc2923
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=b5fc29233d28be7a3322848ebe73ac327559cdb9

--------------------------------

After commit d38afeec ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.

Now we can remove unnecessary inet6_destroy_sock() calls in
sk->sk_prot->destroy().

DCCP and SCTP have their own sk->sk_destruct() function, so we
change them separately in the following patches.
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: NMatthieu Baerts <matthieu.baerts@tessares.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Conflicts:
	net/ipv6/ping.c
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

0c0b5ebd

tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). · da9581d8

由 Kuniyuki Iwashima 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.1-rc1
commit d38afeec
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=d38afeec26ed4739c640bf286c270559aab2ba5f

--------------------------------

Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
IPv4 conversion by IPV6_ADDRFORM.  However, commit 03485f2a ("udpv6:
Add lockless sendmsg() support") added a lockless memory allocation path,
which could cause a memory leak:

setsockopt(IPV6_ADDRFORM)                 sendmsg()
+-----------------------+                 +-------+
- do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
  - sockopt_lock_sock(sk)                   ^._ called via udpv6_prot
    - lock_sock(sk)                             before WRITE_ONCE()
  - WRITE_ONCE(sk->sk_prot, &tcp_prot)
  - inet6_destroy_sock()                    - if (!corkreq)
  - sockopt_release_sock(sk)                  - ip6_make_skb(sk, ...)
    - release_sock(sk)                          ^._ lockless fast path for
                                                    the non-corking case

                                                - __ip6_append_data(sk, ...)
                                                  - ipv6_local_rxpmtu(sk, ...)
                                                    - xchg(&np->rxpmtu, skb)
                                                      ^._ rxpmtu is never freed.

                                                - goto out_no_dst;

                                            - lock_sock(sk)

For now, rxpmtu is only the case, but not to miss the future change
and a similar bug fixed in commit e2732600 ("net: ping6: Fix
memleak in ipv6_renew_options()."), let's set a new function to IPv6
sk->sk_destruct() and call inet6_cleanup_sock() there.  Since the
conversion does not change sk->sk_destruct(), we can guarantee that
we can clean up IPv6 resources finally.

We can now remove all inet6_destroy_sock() calls from IPv6 protocol
specific ->destroy() functions, but such changes are invasive to
backport.  So they can be posted as a follow-up later for net-next.

Fixes: 03485f2a ("udpv6: Add lockless sendmsg() support")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

da9581d8

udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM). · 9e53e70d

由 Kuniyuki Iwashima 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.1-rc1
commit 21985f43
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TPN9
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.3-rc2&id=21985f43376cee092702d6cb963ff97a9d2ede68

--------------------------------

Commit 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support") forgot
to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
socket into IPv4 with IPV6_ADDRFORM.  After conversion, sk_prot is
changed to udp_prot and ->destroy() never cleans it up, resulting in
a memory leak.

This is due to the discrepancy between inet6_destroy_sock() and
IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
to remove the difference.

However, this is not enough for now because rxpmtu can be changed
without lock_sock() after commit 03485f2a ("udpv6: Add lockless
sendmsg() support").  We will fix this case in the following patch.

Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
in the future.

Fixes: 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support")
Signed-off-by: NKuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: NJakub Kicinski <kuba@kernel.org>
Signed-off-by: NZiyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

9e53e70d

9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition · e0c18fc3

由 Zheng Wang 提交于 4月 12, 2023

maillist inclusion
category: bugfix
bugzilla: 188655, https://gitee.com/src-openeuler/kernel/issues/I6T36H
CVE: CVE-2023-1859

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ea4f1009408efb4989a0f139b70fb338e7f687d0

----------------------------------------

In xen_9pfs_front_probe, it calls xen_9pfs_front_alloc_dataring
to init priv->rings and bound &ring->work with p9_xen_response.

When it calls xen_9pfs_front_event_handler to handle IRQ requests,
it will finally call schedule_work to start the work.

When we call xen_9pfs_front_remove to remove the driver, there
may be a sequence as follows:

Fix it by finishing the work before cleanup in xen_9pfs_front_free.

Note that, this bug is found by static analysis, which might be
false positive.

CPU0                  CPU1

                     |p9_xen_response
xen_9pfs_front_remove|
  xen_9pfs_front_free|
kfree(priv)          |
//free priv          |
                     |p9_tag_lookup
                     |//use priv->client

Fixes: 71ebd719 ("xen/9pfs: connect to the backend")
Signed-off-by: NZheng Wang <zyytlz.wz@163.com>
Reviewed-by: NMichal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: NEric Van Hensbergen <ericvh@kernel.org>
Signed-off-by: NLu Wei <luwei32@huawei.com>
Reviewed-by: NYue Haibing <yuehaibing@huawei.com>
Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

e0c18fc3

ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size · 2893797b

由 Zhihao Cheng 提交于 4月 12, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6U6XK

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit?id=1e020e1b96afdecd20680b5b5be2a6ffc3d27628

--------------------------------

Following process will make ubi attaching failed since commit
1b42b1a3 ("ubi: ensure that VID header offset ... size"):

ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB
modprobe nandsim id_bytes=$ID
flash_eraseall /dev/mtd0
modprobe ubi mtd="0,2048"  # set vid_hdr offset as 2048 (one page)
(dmesg):
  ubi0 error: ubi_attach_mtd_dev [ubi]: VID header offset 2048 too large.
  UBI error: cannot attach mtd0
  UBI error: cannot initialize UBI, error -22

Rework original solution, the key point is making sure
'vid_hdr_shift + UBI_VID_HDR_SIZE < ubi->vid_hdr_alsize',
so we should check vid_hdr_shift rather not vid_hdr_offset.
Then, ubi still support (sub)page aligined VID header offset.

Fixes: 1b42b1a3 ("ubi: ensure that VID header offset ... size")
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Tested-by: NNicolas Schichan <nschichan@freebox.fr>
Tested-by: Miquel Raynal <miquel.raynal@bootlin.com> # v5.10, v4.19
Signed-off-by: NRichard Weinberger <richard@nod.at>
Signed-off-by: NZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

2893797b

ubi: ensure that VID header offset + VID header size <= alloc, size · 9441f997

由 George Kennedy 提交于 4月 12, 2023

stable inclusion
from stable-v5.10.173
commit 846bfba34175c23b13cc2023c2d67b96e8c14c43
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6PMQI

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=846bfba34175c23b13cc2023c2d67b96e8c14c43

--------------------------------

[ Upstream commit 1b42b1a3 ]

Ensure that the VID header offset + VID header size does not exceed
the allocated area to avoid slab OOB.

BUG: KASAN: slab-out-of-bounds in crc32_body lib/crc32.c:111 [inline]
BUG: KASAN: slab-out-of-bounds in crc32_le_generic lib/crc32.c:179 [inline]
BUG: KASAN: slab-out-of-bounds in crc32_le_base+0x58c/0x626 lib/crc32.c:197
Read of size 4 at addr ffff88802bb36f00 by task syz-executor136/1555

CPU: 2 PID: 1555 Comm: syz-executor136 Tainted: G        W
6.0.0-1868 #1
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
  <TASK>
  __dump_stack lib/dump_stack.c:88 [inline]
  dump_stack_lvl+0x85/0xad lib/dump_stack.c:106
  print_address_description mm/kasan/report.c:317 [inline]
  print_report.cold.13+0xb6/0x6bb mm/kasan/report.c:433
  kasan_report+0xa7/0x11b mm/kasan/report.c:495
  crc32_body lib/crc32.c:111 [inline]
  crc32_le_generic lib/crc32.c:179 [inline]
  crc32_le_base+0x58c/0x626 lib/crc32.c:197
  ubi_io_write_vid_hdr+0x1b7/0x472 drivers/mtd/ubi/io.c:1067
  create_vtbl+0x4d5/0x9c4 drivers/mtd/ubi/vtbl.c:317
  create_empty_lvol drivers/mtd/ubi/vtbl.c:500 [inline]
  ubi_read_volume_table+0x67b/0x288a drivers/mtd/ubi/vtbl.c:812
  ubi_attach+0xf34/0x1603 drivers/mtd/ubi/attach.c:1601
  ubi_attach_mtd_dev+0x6f3/0x185e drivers/mtd/ubi/build.c:965
  ctrl_cdev_ioctl+0x2db/0x347 drivers/mtd/ubi/cdev.c:1043
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:870 [inline]
  __se_sys_ioctl fs/ioctl.c:856 [inline]
  __x64_sys_ioctl+0x193/0x213 fs/ioctl.c:856
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x3e/0x86 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x63/0x0
RIP: 0033:0x7f96d5cf753d
Code:
RSP: 002b:00007fffd72206f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f96d5cf753d
RDX: 0000000020000080 RSI: 0000000040186f40 RDI: 0000000000000003
RBP: 0000000000400cd0 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000400be0
R13: 00007fffd72207e0 R14: 0000000000000000 R15: 0000000000000000
  </TASK>

Allocated by task 1555:
  kasan_save_stack+0x20/0x3d mm/kasan/common.c:38
  kasan_set_track mm/kasan/common.c:45 [inline]
  set_alloc_info mm/kasan/common.c:437 [inline]
  ____kasan_kmalloc mm/kasan/common.c:516 [inline]
  __kasan_kmalloc+0x88/0xa3 mm/kasan/common.c:525
  kasan_kmalloc include/linux/kasan.h:234 [inline]
  __kmalloc+0x138/0x257 mm/slub.c:4429
  kmalloc include/linux/slab.h:605 [inline]
  ubi_alloc_vid_buf drivers/mtd/ubi/ubi.h:1093 [inline]
  create_vtbl+0xcc/0x9c4 drivers/mtd/ubi/vtbl.c:295
  create_empty_lvol drivers/mtd/ubi/vtbl.c:500 [inline]
  ubi_read_volume_table+0x67b/0x288a drivers/mtd/ubi/vtbl.c:812
  ubi_attach+0xf34/0x1603 drivers/mtd/ubi/attach.c:1601
  ubi_attach_mtd_dev+0x6f3/0x185e drivers/mtd/ubi/build.c:965
  ctrl_cdev_ioctl+0x2db/0x347 drivers/mtd/ubi/cdev.c:1043
  vfs_ioctl fs/ioctl.c:51 [inline]
  __do_sys_ioctl fs/ioctl.c:870 [inline]
  __se_sys_ioctl fs/ioctl.c:856 [inline]
  __x64_sys_ioctl+0x193/0x213 fs/ioctl.c:856
  do_syscall_x64 arch/x86/entry/common.c:50 [inline]
  do_syscall_64+0x3e/0x86 arch/x86/entry/common.c:80
  entry_SYSCALL_64_after_hwframe+0x63/0x0

The buggy address belongs to the object at ffff88802bb36e00
  which belongs to the cache kmalloc-256 of size 256
The buggy address is located 0 bytes to the right of
  256-byte region [ffff88802bb36e00, ffff88802bb36f00)

The buggy address belongs to the physical page:
page:00000000ea4d1263 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x2bb36
head:00000000ea4d1263 order:1 compound_mapcount:0 compound_pincount:0
flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0010200 ffffea000066c300 dead000000000003 ffff888100042b40
raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
  ffff88802bb36e00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  ffff88802bb36e80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88802bb36f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
                    ^
  ffff88802bb36f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff88802bb37000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Fixes: 801c135c ("UBI: Unsorted Block Images")
Reported-by: Nsyzkaller <syzkaller@googlegroups.com>
Signed-off-by: NGeorge Kennedy <george.kennedy@oracle.com>
Signed-off-by: NRichard Weinberger <richard@nod.at>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NWang Hai <wanghai38@huawei.com>
Signed-off-by: NZhaoLong Wang <wangzhaolong1@huawei.com>
Reviewed-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

9441f997

ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct() · 6afc8591

由 Zheng Yejian 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.3-rc6
commit 2a2d8c51
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TQ89
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2a2d8c51defb446e8d89a83f42f8e5cd529111e9

--------------------------------

Syzkaller report a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().

Root cause is 'direct->addr' was changed from 'old_addr' to 'new_addr' but
not restored if error happened on calling ftrace_modify_direct_caller().
Then it can no longer find 'direct' by that 'old_addr'.

To fix it, restore 'direct->addr' to 'old_addr' explicitly in error path.

Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <ast@kernel.org>
Cc: <daniel@iogearbox.net>
Fixes: 77ab7785 ("ftrace: Fix modify_ftrace_direct.")
Signed-off-by: NZheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: NZheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: NXu Kuohai <xukuohai@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

6afc8591

perf/core: Fix perf_output_begin parameter is incorrectly invoked in perf_event_bpf_output · 0eb9f75b

由 Yang Jihong 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.3-rc3
commit eb81a2ed
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6ODHQ
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eb81a2ed4f52be831c9fb879752d89645a312c13

--------------------------------

syzkaller reportes a KASAN issue with stack-out-of-bounds.
The call trace is as follows:
  dump_stack+0x9c/0xd3
  print_address_description.constprop.0+0x19/0x170
  __kasan_report.cold+0x6c/0x84
  kasan_report+0x3a/0x50
  __perf_event_header__init_id+0x34/0x290
  perf_event_header__init_id+0x48/0x60
  perf_output_begin+0x4a4/0x560
  perf_event_bpf_output+0x161/0x1e0
  perf_iterate_sb_cpu+0x29e/0x340
  perf_iterate_sb+0x4c/0xc0
  perf_event_bpf_event+0x194/0x2c0
  __bpf_prog_put.constprop.0+0x55/0xf0
  __cls_bpf_delete_prog+0xea/0x120 [cls_bpf]
  cls_bpf_delete_prog_work+0x1c/0x30 [cls_bpf]
  process_one_work+0x3c2/0x730
  worker_thread+0x93/0x650
  kthread+0x1b8/0x210
  ret_from_fork+0x1f/0x30

commit 267fb273 ("perf: Reduce stack usage of perf_output_begin()")
use on-stack struct perf_sample_data of the caller function.

However, perf_event_bpf_output uses incorrect parameter to convert
small-sized data (struct perf_bpf_event) into large-sized data
(struct perf_sample_data), which causes memory overwriting occurs in
__perf_event_header__init_id.

Fixes: 267fb273 ("perf: Reduce stack usage of perf_output_begin()")
Signed-off-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230314044735.56551-1-yangjihong1@huawei.comSigned-off-by: NYang Jihong <yangjihong1@huawei.com>
Reviewed-by: NXu Kuohai <xukuohai@huawei.com>
Reviewed-by: NZheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

0eb9f75b

xirc2ps_cs: Fix use after free bug in xirc2ps_detach · 3c3ce918

由 Zheng Wang 提交于 4月 12, 2023

stable inclusion
from stable-v5.10.176
commit bfeeb3aaad4ee8eaaefe5d9edd9b2ccb5d9b7505
category: bugfix
bugzilla: 188641, https://gitee.com/src-openeuler/kernel/issues/I6R4MM
CVE: CVE-2023-1670

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bfeeb3aaad4ee8eaaefe5d9edd9b2ccb5d9b7505

--------------------------------

[ Upstream commit e8d20c3d ]

In xirc2ps_probe, the local->tx_timeout_task was bounded
with xirc2ps_tx_timeout_task. When timeout occurs,
it will call xirc_tx_timeout->schedule_work to start the
work.

When we call xirc2ps_detach to remove the driver, there
may be a sequence as follows:

Stop responding to timeout tasks and complete scheduled
tasks before cleanup in xirc2ps_detach, which will fix
the problem.

CPU0                  CPU1

                    |xirc2ps_tx_timeout_task
xirc2ps_detach      |
  free_netdev       |
    kfree(dev);     |
                    |
                    | do_reset
                    |   //use dev

Fixes: 1da177e4 ("Linux-2.6.12-rc2")
Signed-off-by: NZheng Wang <zyytlz.wz@163.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NDong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: NLiu Jian <liujian56@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

3c3ce918

ring-buffer: Fix race while reader and writer are on the same page · 88d65db3

由 Zheng Yejian 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.3-rc6
commit 6455b616
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6TJ97

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6455b6163d8c680366663cdb8c679514d55fc30c

--------------------------------

When user reads file 'trace_pipe', kernel keeps printing following logs
that warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in
rb_get_reader_page(). It just looks like there's an infinite loop in
tracing_read_pipe(). This problem occurs several times on arm64 platform
when testing v5.10 and below.

  Call trace:
   rb_get_reader_page+0x248/0x1300
   rb_buffer_peek+0x34/0x160
   ring_buffer_peek+0xbc/0x224
   peek_next_entry+0x98/0xbc
   __find_next_entry+0xc4/0x1c0
   trace_find_next_entry_inc+0x30/0x94
   tracing_read_pipe+0x198/0x304
   vfs_read+0xb4/0x1e0
   ksys_read+0x74/0x100
   __arm64_sys_read+0x24/0x30
   el0_svc_common.constprop.0+0x7c/0x1bc
   do_el0_svc+0x2c/0x94
   el0_svc+0x20/0x30
   el0_sync_handler+0xb0/0xb4
   el0_sync+0x160/0x180

Then I dump the vmcore and look into the problematic per_cpu ring_buffer,
I found that tail_page/commit_page/reader_page are on the same page while
reader_page->read is obviously abnormal:
  tail_page == commit_page == reader_page == {
    .write = 0x100d20,
    .read = 0x8f9f4805,  // Far greater than 0xd20, obviously abnormal!!!
    .entries = 0x10004c,
    .real_end = 0x0,
    .page = {
      .time_stamp = 0x857257416af0,
      .commit = 0xd20,  // This page hasn't been full filled.
      // .data[0...0xd20] seems normal.
    }
 }

The root cause is most likely the race that reader and writer are on the
same page while reader saw an event that not fully committed by writer.

To fix this, add memory barriers to make sure the reader can see the
content of what is committed. Since commit a0fcaaed ("ring-buffer: Fix
race between reset page and reading page") has added the read barrier in
rb_get_reader_page(), here we just need to add the write barrier.

Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 77ae365e ("ring-buffer: make lockless")
Suggested-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: NZheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: NSteven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: NZheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: NYang Jihong <yangjihong1@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

88d65db3

loop: Add parm check in loop_control_ioctl · 4e3149e0

由 Zhong Jinghua 提交于 4月 12, 2023

hulk inclusion
category: bugfix
bugzilla: 188586, https://gitee.com/openeuler/kernel/issues/I6TFPJ
CVE: NA

----------------------------------------

We found that in loop_control_ioctl, the kernel panic can be easily caused:

1. syscall(__NR_ioctl, r[1], 0x4c80, 0x80000200000ul);
Create a loop device 0x80000200000ul.
In fact, in the code, it is used as the first_minor number, and the
first_minor number is 0.
So the created loop device number is 7:0.

2. syscall(__NR_ioctl, r[2], 0x4c80, 0ul);
Create a loop device 0x0ul.
Since the 7:0 device has been created in 1, add_disk will fail because
the major and first_minor numbers are consistent.

3. syscall(__NR_ioctl, r[5], 0x4c81, 0ul);
Delete the device that failed to create, the kernel panics.

Panic like below:
BUG: KASAN: null-ptr-deref in device_del+0xb3/0x840 drivers/base/core.c:3107
Call Trace:
 kill_device drivers/base/core.c:3079 [inline]
 device_del+0xb3/0x840 drivers/base/core.c:3107
 del_gendisk+0x463/0x5f0 block/genhd.c:971
 loop_remove drivers/block/loop.c:2190 [inline]
 loop_control_ioctl drivers/block/loop.c:2289 [inline]

The stack like below:
Create loop device:
loop_control_ioctl
  loop_add
    add_disk
      device_add_disk
        bdi_register
          bdi_register_va
            device_create
              device_create_groups_vargs
                device_add
                  kfree(dev->p);
                    dev->p = NULL;

Remove loop device:
loop_control_ioctl
  loop_remove
    del_gendisk
      device_del
        kill_device
          if (dev->p->dead) // p is null

Fix it by adding a check for parm.

Fixes: 770fe30a ("loop: add management interface for on-demand device allocation")
Signed-off-by: NZhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

4e3149e0

ext4: Fix i_disksize exceeding i_size problem in paritally written case · ae4b9933

由 Zhihao Cheng 提交于 4月 12, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6SMBI
CVE: NA

Reference: https://www.spinics.net/lists/linux-ext4/msg88386.html

--------------------------------

Following process makes i_disksize exceed i_size:

generic_perform_write
 copied = iov_iter_copy_from_user_atomic(len) // copied < len
 ext4_da_write_end
 | ext4_update_i_disksize
 |  new_i_size = pos + copied;
 |  WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize
 | generic_write_end
 |  copied = block_write_end(copied, len) // copied = 0
 |   if (unlikely(copied < len))
 |    if (!PageUptodate(page))
 |     copied = 0;
 |  if (pos + copied > inode->i_size) // return false
 if (unlikely(copied == 0))
  goto again;
 if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
  status = -EFAULT;
  break;
 }

We get i_disksize greater than i_size here, which could trigger WARNING
check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' while doing dio:

ext4_dio_write_iter
 iomap_dio_rw
  __iomap_dio_rw // return err, length is not aligned to 512
 ext4_handle_inode_extension
  WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops

 WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319
 CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2
 RIP: 0010:ext4_file_write_iter+0xbc7
 Call Trace:
  vfs_write+0x3b1
  ksys_write+0x77
  do_syscall_64+0x39

Fix it by updating 'copied' value before updating i_disksize just like
ext4_write_inline_data_end() does.

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209
Fixes: 64769240 ("ext4: Add delayed allocation support in data=writeback mode")
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

ae4b9933

ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev' · 2db2f17b

由 Zhihao Cheng 提交于 4月 12, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6MMUV
CVE: NA

Reference: https://www.spinics.net/lists/linux-ext4/msg88237.html

--------------------------------

As discussed in [1], 'sbi->s_journal_bdev != sb->s_bdev' will always
become true if sbi->s_journal_bdev exists. Filesystem block device and
journal block device are both opened with 'FMODE_EXCL' mode, so these
two devices can't be same one. Then we can remove the redundant checking
'sbi->s_journal_bdev != sb->s_bdev' if 'sbi->s_journal_bdev' exists.

[1] https://lore.kernel.org/lkml/f86584f6-3877-ff18-47a1-2efaa12d18b2@huawei.com/Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

2db2f17b

ext4: Fix reusing stale buffer heads from last failed mounting · 1f6736e6

由 Zhihao Cheng 提交于 4月 12, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6MMUV
CVE: NA

Reference: https://www.spinics.net/lists/linux-ext4/msg88237.html

--------------------------------

Following process makes ext4 load stale buffer heads from last failed
mounting in a new mounting operation:
mount_bdev
 ext4_fill_super
 | ext4_load_and_init_journal
 |  ext4_load_journal
 |   jbd2_journal_load
 |    load_superblock
 |     journal_get_superblock
 |      set_buffer_verified(bh) // buffer head is verified
 |   jbd2_journal_recover // failed caused by EIO
 | goto failed_mount3a // skip 'sb->s_root' initialization
 deactivate_locked_super
  kill_block_super
   generic_shutdown_super
    if (sb->s_root)
    // false, skip ext4_put_super->invalidate_bdev->
    // invalidate_mapping_pages->mapping_evict_folio->
    // filemap_release_folio->try_to_free_buffers, which
    // cannot drop buffer head.
   blkdev_put
    blkdev_put_whole
     if (atomic_dec_and_test(&bdev->bd_openers))
     // false, systemd-udev happens to open the device. Then
     // blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
     // truncate_inode_folio->truncate_cleanup_folio->
     // folio_invalidate->block_invalidate_folio->
     // filemap_release_folio->try_to_free_buffers will be skipped,
     // dropping buffer head is missed again.

Second mount:
ext4_fill_super
 ext4_load_and_init_journal
  ext4_load_journal
   ext4_get_journal
    jbd2_journal_init_inode
     journal_init_common
      bh = getblk_unmovable
       bh = __find_get_block // Found stale bh in last failed mounting
      journal->j_sb_buffer = bh
   jbd2_journal_load
    load_superblock
     journal_get_superblock
      if (buffer_verified(bh))
      // true, skip journal->j_format_version = 2, value is 0
    jbd2_journal_recover
     do_one_pass
      next_log_block += count_tags(journal, bh)
      // According to journal_tag_bytes(), 'tag_bytes' calculating is
      // affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3()
      // returns false because 'j->j_format_version >= 2' is not true,
      // then we get wrong next_log_block. The do_one_pass may exit
      // early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'.

The filesystem is corrupted here, journal is partially replayed, and
new journal sequence number actually is already used by last mounting.

The invalidate_bdev() can drop all buffer heads even racing with bare
reading block device(eg. systemd-udev), so we can fix it by invalidating
bdev in error handling path in __ext4_fill_super().

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
Fixes: 25ed6e8a ("jbd2: enable journal clients to enable v2 checksumming")
Cc: stable@vger.kernel.org # v3.5
Conflicts:
	fs/ext4/super.c
	[ a7a79c29("ext4: unify the ext4 super block loading
	  operation") is not applied.
	  7edfd85b("ext4: Completely separate options parsing and sb
	  setup") is not applied. ]
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

1f6736e6

btrfs: fix race between quota disable and quota assign ioctls · bbd9649a

由 Filipe Manana 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.2-rc8
commit 2f1a6be1
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6PQCT
CVE: CVE-2023-1611

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2f1a6be12ab6c8470d5776e68644726c94257c54

--------------------------------

The quota assign ioctl can currently run in parallel with a quota disable
ioctl call. The assign ioctl uses the quota root, while the disable ioctl
frees that root, and therefore we can have a use-after-free triggered in
the assign ioctl, leading to a trace like the following when KASAN is
enabled:

  [672.723][T736] BUG: KASAN: slab-use-after-free in btrfs_search_slot+0x2962/0x2db0
  [672.723][T736] Read of size 8 at addr ffff888022ec0208 by task btrfs_search_sl/27736
  [672.724][T736]
  [672.725][T736] CPU: 1 PID: 27736 Comm: btrfs_search_sl Not tainted 6.3.0-rc3 #37
  [672.723][T736] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
  [672.727][T736] Call Trace:
  [672.728][T736]  <TASK>
  [672.728][T736]  dump_stack_lvl+0xd9/0x150
  [672.725][T736]  print_report+0xc1/0x5e0
  [672.720][T736]  ? __virt_addr_valid+0x61/0x2e0
  [672.727][T736]  ? __phys_addr+0xc9/0x150
  [672.725][T736]  ? btrfs_search_slot+0x2962/0x2db0
  [672.722][T736]  kasan_report+0xc0/0xf0
  [672.729][T736]  ? btrfs_search_slot+0x2962/0x2db0
  [672.724][T736]  btrfs_search_slot+0x2962/0x2db0
  [672.723][T736]  ? fs_reclaim_acquire+0xba/0x160
  [672.722][T736]  ? split_leaf+0x13d0/0x13d0
  [672.726][T736]  ? rcu_is_watching+0x12/0xb0
  [672.723][T736]  ? kmem_cache_alloc+0x338/0x3c0
  [672.722][T736]  update_qgroup_status_item+0xf7/0x320
  [672.724][T736]  ? add_qgroup_rb+0x3d0/0x3d0
  [672.739][T736]  ? do_raw_spin_lock+0x12d/0x2b0
  [672.730][T736]  ? spin_bug+0x1d0/0x1d0
  [672.737][T736]  btrfs_run_qgroups+0x5de/0x840
  [672.730][T736]  ? btrfs_qgroup_rescan_worker+0xa70/0xa70
  [672.738][T736]  ? __del_qgroup_relation+0x4ba/0xe00
  [672.738][T736]  btrfs_ioctl+0x3d58/0x5d80
  [672.735][T736]  ? tomoyo_path_number_perm+0x16a/0x550
  [672.737][T736]  ? tomoyo_execute_permission+0x4a0/0x4a0
  [672.731][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
  [672.737][T736]  ? __sanitizer_cov_trace_switch+0x54/0x90
  [672.734][T736]  ? do_vfs_ioctl+0x132/0x1660
  [672.730][T736]  ? vfs_fileattr_set+0xc40/0xc40
  [672.730][T736]  ? _raw_spin_unlock_irq+0x2e/0x50
  [672.732][T736]  ? sigprocmask+0xf2/0x340
  [672.737][T736]  ? __fget_files+0x26a/0x480
  [672.732][T736]  ? bpf_lsm_file_ioctl+0x9/0x10
  [672.738][T736]  ? btrfs_ioctl_get_supported_features+0x50/0x50
  [672.736][T736]  __x64_sys_ioctl+0x198/0x210
  [672.736][T736]  do_syscall_64+0x39/0xb0
  [672.731][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.739][T736] RIP: 0033:0x4556ad
  [672.742][T736]  </TASK>
  [672.743][T736]
  [672.748][T736] Allocated by task 27677:
  [672.743][T736]  kasan_save_stack+0x22/0x40
  [672.741][T736]  kasan_set_track+0x25/0x30
  [672.741][T736]  __kasan_kmalloc+0xa4/0xb0
  [672.749][T736]  btrfs_alloc_root+0x48/0x90
  [672.746][T736]  btrfs_create_tree+0x146/0xa20
  [672.744][T736]  btrfs_quota_enable+0x461/0x1d20
  [672.743][T736]  btrfs_ioctl+0x4a1c/0x5d80
  [672.747][T736]  __x64_sys_ioctl+0x198/0x210
  [672.749][T736]  do_syscall_64+0x39/0xb0
  [672.744][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.756][T736]
  [672.757][T736] Freed by task 27677:
  [672.759][T736]  kasan_save_stack+0x22/0x40
  [672.759][T736]  kasan_set_track+0x25/0x30
  [672.756][T736]  kasan_save_free_info+0x2e/0x50
  [672.751][T736]  ____kasan_slab_free+0x162/0x1c0
  [672.758][T736]  slab_free_freelist_hook+0x89/0x1c0
  [672.752][T736]  __kmem_cache_free+0xaf/0x2e0
  [672.752][T736]  btrfs_put_root+0x1ff/0x2b0
  [672.759][T736]  btrfs_quota_disable+0x80a/0xbc0
  [672.752][T736]  btrfs_ioctl+0x3e5f/0x5d80
  [672.756][T736]  __x64_sys_ioctl+0x198/0x210
  [672.753][T736]  do_syscall_64+0x39/0xb0
  [672.765][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.769][T736]
  [672.768][T736] The buggy address belongs to the object at ffff888022ec0000
  [672.768][T736]  which belongs to the cache kmalloc-4k of size 4096
  [672.769][T736] The buggy address is located 520 bytes inside of
  [672.769][T736]  freed 4096-byte region [ffff888022ec0000, ffff888022ec1000)
  [672.760][T736]
  [672.764][T736] The buggy address belongs to the physical page:
  [672.761][T736] page:ffffea00008bb000 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x22ec0
  [672.766][T736] head:ffffea00008bb000 order:3 entire_mapcount:0 nr_pages_mapped:0 pincount:0
  [672.779][T736] flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff)
  [672.770][T736] raw: 00fff00000010200 ffff888012842140 ffffea000054ba00 dead000000000002
  [672.770][T736] raw: 0000000000000000 0000000000040004 00000001ffffffff 0000000000000000
  [672.771][T736] page dumped because: kasan: bad access detected
  [672.778][T736] page_owner tracks the page as allocated
  [672.777][T736] page last allocated via order 3, migratetype Unmovable, gfp_mask 0xd2040(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC), pid 88
  [672.779][T736]  get_page_from_freelist+0x119c/0x2d50
  [672.779][T736]  __alloc_pages+0x1cb/0x4a0
  [672.776][T736]  alloc_pages+0x1aa/0x270
  [672.773][T736]  allocate_slab+0x260/0x390
  [672.771][T736]  ___slab_alloc+0xa9a/0x13e0
  [672.778][T736]  __slab_alloc.constprop.0+0x56/0xb0
  [672.771][T736]  __kmem_cache_alloc_node+0x136/0x320
  [672.789][T736]  __kmalloc+0x4e/0x1a0
  [672.783][T736]  tomoyo_realpath_from_path+0xc3/0x600
  [672.781][T736]  tomoyo_path_perm+0x22f/0x420
  [672.782][T736]  tomoyo_path_unlink+0x92/0xd0
  [672.780][T736]  security_path_unlink+0xdb/0x150
  [672.788][T736]  do_unlinkat+0x377/0x680
  [672.788][T736]  __x64_sys_unlink+0xca/0x110
  [672.789][T736]  do_syscall_64+0x39/0xb0
  [672.783][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.784][T736] page last free stack trace:
  [672.787][T736]  free_pcp_prepare+0x4e5/0x920
  [672.787][T736]  free_unref_page+0x1d/0x4e0
  [672.784][T736]  __unfreeze_partials+0x17c/0x1a0
  [672.797][T736]  qlist_free_all+0x6a/0x180
  [672.796][T736]  kasan_quarantine_reduce+0x189/0x1d0
  [672.797][T736]  __kasan_slab_alloc+0x64/0x90
  [672.793][T736]  kmem_cache_alloc+0x17c/0x3c0
  [672.799][T736]  getname_flags.part.0+0x50/0x4e0
  [672.799][T736]  getname_flags+0x9e/0xe0
  [672.792][T736]  vfs_fstatat+0x77/0xb0
  [672.791][T736]  __do_sys_newlstat+0x84/0x100
  [672.798][T736]  do_syscall_64+0x39/0xb0
  [672.796][T736]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  [672.790][T736]
  [672.791][T736] Memory state around the buggy address:
  [672.799][T736]  ffff888022ec0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.805][T736]  ffff888022ec0180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.802][T736] >ffff888022ec0200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.809][T736]                       ^
  [672.809][T736]  ffff888022ec0280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  [672.809][T736]  ffff888022ec0300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb

Fix this by having the qgroup assign ioctl take the qgroup ioctl mutex
before calling btrfs_run_qgroups(), which is what all qgroup ioctls should
call.
Reported-by: Nbutt3rflyh4ck <butterflyhuangxx@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAFcO6XN3VD8ogmHwqRk4kbiwtpUSNySu2VAxN8waEPciCHJvMA@mail.gmail.com/
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: NQu Wenruo <wqu@suse.com>
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

bbd9649a

dm crypt: add cond_resched() to dmcrypt_write() · d370383e

由 Mikulas Patocka 提交于 4月 12, 2023

mainline inclusion
from mainline-v6.3-rc4
commit fb294b1c
category: bugfix
bugzilla: 188393, https://gitee.com/openeuler/kernel/issues/I6JPSH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fb294b1c0ba982144ca467a75e7d01ff26304e2b

----------------------------------------

The loop in dmcrypt_write may be running for unbounded amount of time,
thus we need cond_resched() in it.

This commit fixes the following warning:

[ 3391.153255][   C12] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [dmcrypt_write/2:2897]
...
[ 3391.387210][   C12] Call trace:
[ 3391.390338][   C12]  blk_attempt_bio_merge.part.6+0x38/0x158
[ 3391.395970][   C12]  blk_attempt_plug_merge+0xc0/0x1b0
[ 3391.401085][   C12]  blk_mq_submit_bio+0x398/0x550
[ 3391.405856][   C12]  submit_bio_noacct+0x308/0x380
[ 3391.410630][   C12]  dmcrypt_write+0x1e4/0x208 [dm_crypt]
[ 3391.416005][   C12]  kthread+0x130/0x138
[ 3391.419911][   C12]  ret_from_fork+0x10/0x18
Reported-by: Nyangerkun <yangerkun@huawei.com>
Fixes: dc267621 ("dm crypt: offload writes to thread")
Cc: stable@vger.kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@kernel.org>
Signed-off-by: Nyangerkun <yangerkun@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Reviewed-by: NYu Kuai <yukuai3@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

d370383e

driver core: Fix lockdep warning on wfs_lock · de1cc2ef

由 Saravana Kannan 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.11-rc1
commit 7008e58c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6LM81
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7008e58c63bc8468e8d16154e25d780198b3ecfc

----------------------------------------------

There's a potential deadlock with the following cycle:
wfs_lock --> device_links_lock --> kn->count

Fix this by simply dropping the lock around a list_empty() check that's
just exported to a sysfs file. The sysfs file output is an instantaneous
check anyway and the lock doesn't really add any protection.

Lockdep log:

[   48.808132]
[   48.808132] the existing dependency chain (in reverse order) is:
[   48.809069]
[   48.809069] -> #2 (kn->count){++++}:
[   48.809707]        __kernfs_remove.llvm.7860393000964815146+0x2d4/0x460
[   48.810537]        kernfs_remove_by_name_ns+0x54/0x9c
[   48.811171]        sysfs_remove_file_ns+0x18/0x24
[   48.811762]        device_del+0x2b8/0x5a8
[   48.812269]        __device_link_del+0x98/0xb8
[   48.812829]        device_links_driver_bound+0x210/0x2d8
[   48.813496]        driver_bound+0x44/0xf8
[   48.814000]        really_probe+0x340/0x6e0
[   48.814526]        driver_probe_device+0xb8/0x100
[   48.815117]        device_driver_attach+0x78/0xb8
[   48.815708]        __driver_attach+0xe0/0x194
[   48.816255]        bus_for_each_dev+0xa8/0x11c
[   48.816816]        driver_attach+0x24/0x30
[   48.817331]        bus_add_driver+0x100/0x1e0
[   48.817880]        driver_register+0x78/0x114
[   48.818427]        __platform_driver_register+0x44/0x50
[   48.819089]        0xffffffdbb3227038
[   48.819551]        do_one_initcall+0xd8/0x1e0
[   48.820099]        do_init_module+0xd8/0x298
[   48.820636]        load_module+0x3afc/0x44c8
[   48.821173]        __arm64_sys_finit_module+0xbc/0xf0
[   48.821807]        el0_svc_common+0xbc/0x1d0
[   48.822344]        el0_svc_handler+0x74/0x98
[   48.822882]        el0_svc+0x8/0xc
[   48.823310]
[   48.823310] -> #1 (device_links_lock){+.+.}:
[   48.824036]        __mutex_lock_common+0xe0/0xe44
[   48.824626]        mutex_lock_nested+0x28/0x34
[   48.825185]        device_link_add+0xd4/0x4ec
[   48.825734]        of_link_to_suppliers+0x158/0x204
[   48.826347]        of_fwnode_add_links+0x50/0x64
[   48.826928]        device_link_add_missing_supplier_links+0x90/0x11c
[   48.827725]        fw_devlink_resume+0x58/0x130
[   48.828296]        of_platform_default_populate_init+0xb4/0xd0
[   48.829030]        do_one_initcall+0xd8/0x1e0
[   48.829578]        do_initcall_level+0xb8/0xcc
[   48.830137]        do_basic_setup+0x60/0x7c
[   48.830662]        kernel_init_freeable+0x128/0x1ac
[   48.831275]        kernel_init+0x18/0x29c
[   48.831781]        ret_from_fork+0x10/0x18
[   48.832297]
[   48.832297] -> #0 (wfs_lock){+.+.}:
[   48.832922]        __lock_acquire+0xe04/0x2e20
[   48.833480]        lock_acquire+0xbc/0xec
[   48.833984]        __mutex_lock_common+0xe0/0xe44
[   48.834577]        mutex_lock_nested+0x28/0x34
[   48.835136]        waiting_for_supplier_show+0x3c/0x98
[   48.835781]        dev_attr_show+0x48/0xb4
[   48.836295]        sysfs_kf_seq_show+0xe8/0x184
[   48.836864]        kernfs_seq_show+0x48/0x8c
[   48.837401]        seq_read+0x1c8/0x600
[   48.837884]        kernfs_fop_read+0x68/0x204
[   48.838431]        __vfs_read+0x60/0x214
[   48.838925]        vfs_read+0xbc/0x15c
[   48.839397]        ksys_read+0x78/0xe4
[   48.839869]        __arm64_sys_read+0x1c/0x28
[   48.840416]        el0_svc_common+0xbc/0x1d0
[   48.840953]        el0_svc_handler+0x74/0x98
[   48.841490]        el0_svc+0x8/0xc
[   48.841917]
[   48.841917] other info that might help us debug this:
[   48.841917]
[   48.842920] Chain exists of:
[   48.842920]   wfs_lock --> device_links_lock --> kn->count
[   48.842920]
[   48.844152]  Possible unsafe locking scenario:
[   48.844152]
[   48.844895]        CPU0                    CPU1
[   48.845463]        ----                    ----
[   48.846032]   lock(kn->count);
[   48.846417]                                lock(device_links_lock);
[   48.847203]                                lock(kn->count);
[   48.847902]   lock(wfs_lock);
[   48.848276]
[   48.848276]  *** DEADLOCK ***

Reported-by: Cheng-Jui.Wang@mediatek.com
Signed-off-by: NSaravana Kannan <saravanak@google.com>
Link: https://lore.kernel.org/r/20201104205431.3795207-1-saravanak@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

de1cc2ef

driver core: platform: Add extra error check in devm_platform_get_irqs_affinity() · 73fa0aa6

由 John Garry 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.11-rc5
commit e1dc2099
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6LM81
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e1dc20995cb9fa04b46e8f37113a7203c906d2bf

---------------------------------------------

The current check of nvec < minvec for nvec returned from
platform_irq_count() will not detect a negative error code in nvec.

This is because minvec is unsigned, and, as such, nvec is promoted to
unsigned in that check, which will make it a huge number (if it contained
-EPROBE_DEFER).

In practice, an error should not occur in nvec for the only in-tree
user, but add a check anyway.

Fixes: e15f2fa9 ("driver core: platform: Add devm_platform_get_irqs_affinity()")
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/1608561055-231244-1-git-send-email-john.garry@huawei.comSigned-off-by: NZhang Zekun <zhangzekun11@huawei.com>
Reviewed-by: NWeilong Chen <chenweilong@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

73fa0aa6

xfs: don't leak memory when attr fork loading fails · 99054b56

由 Darrick J. Wong 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.19-rc5
commit c78c2d09
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c78c2d0903183a41beb90c56a923e30f90fa91b9

--------------------------------

I observed the following evidence of a memory leak while running xfs/399
from the xfs fsck test suite (edited for brevity):

XFS (sde): Metadata corruption detected at xfs_attr_shortform_verify_struct.part.0+0x7b/0xb0 [xfs], inode 0x1172 attr fork
XFS: Assertion failed: ip->i_af.if_u1.if_data == NULL, file: fs/xfs/libxfs/xfs_inode_fork.c, line: 315
------------[ cut here ]------------
WARNING: CPU: 2 PID: 91635 at fs/xfs/xfs_message.c:104 assfail+0x46/0x4a [xfs]
CPU: 2 PID: 91635 Comm: xfs_scrub Tainted: G        W         5.19.0-rc7-xfsx #rc7 6e6475eb29fd9dda3181f81b7ca7ff961d277a40
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:assfail+0x46/0x4a [xfs]
Call Trace:
 <TASK>
 xfs_ifork_zap_attr+0x7c/0xb0
 xfs_iformat_attr_fork+0x86/0x110
 xfs_inode_from_disk+0x41d/0x480
 xfs_iget+0x389/0xd70
 xfs_bulkstat_one_int+0x5b/0x540
 xfs_bulkstat_iwalk+0x1e/0x30
 xfs_iwalk_ag_recs+0xd1/0x160
 xfs_iwalk_run_callbacks+0xb9/0x180
 xfs_iwalk_ag+0x1d8/0x2e0
 xfs_iwalk+0x141/0x220
 xfs_bulkstat+0x105/0x180
 xfs_ioc_bulkstat.constprop.0.isra.0+0xc5/0x130
 xfs_file_ioctl+0xa5f/0xef0
 __x64_sys_ioctl+0x82/0xa0
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0

This newly-added assertion checks that there aren't any incore data
structures hanging off the incore fork when we're trying to reset its
contents.  From the call trace, it is evident that iget was trying to
construct an incore inode from the ondisk inode, but the attr fork
verifier failed and we were trying to undo all the memory allocations
that we had done earlier.

The three assertions in xfs_ifork_zap_attr check that the caller has
already called xfs_idestroy_fork, which clearly has not been done here.
As the zap function then zeroes the pointers, we've effectively leaked
the memory.

The shortest change would have been to insert an extra call to
xfs_idestroy_fork, but it makes more sense to bundle the _idestroy_fork
call into _zap_attr, since all other callsites call _idestroy_fork
immediately prior to calling _zap_attr.  IOWs, it eliminates one way to
fail.

Note: This change only applies cleanly to 2ed5b09b, since we just
reworked the attr fork lifetime.  However, I think this memory leak has
existed since 0f45a1b2, since the chain xfs_iformat_attr_fork ->
xfs_iformat_local -> xfs_init_local_fork will allocate
ifp->if_u1.if_data, but if xfs_ifork_verify_local_attr fails,
xfs_iformat_attr_fork will free i_afp without freeing any of the stuff
hanging off i_afp.  The solution for older kernels I think is to add the
missing call to xfs_idestroy_fork just prior to calling kmem_cache_free.

Found by fuzzing a.sfattr.hdr.totsize = lastbit in xfs/399.

Fixes: 2ed5b09b ("xfs: make inode attribute forks a permanent part of struct xfs_inode")
Probably-Fixes: 0f45a1b2 ("xfs: improve local fork verification")
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

conflicts:
	fs/xfs/libxfs/xfs_attr_leaf.c
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

99054b56

xfs: delete unnecessary NULL checks · c462f069

由 Dan Carpenter 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.19-rc5
commit 3f52e016
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3f52e016af600982989b5dee958d313c52483c92

--------------------------------

These NULL check are no long needed after commit 2ed5b09b ("xfs:
make inode attribute forks a permanent part of struct xfs_inode").
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

c462f069

xfs: replace inode fork size macros with functions · e4af4f66

由 Darrick J. Wong 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.19-rc5
commit c01147d9
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c01147d929899f02a0a8b15e406d12784768ca72

--------------------------------

Replace the shouty macros here with typechecked helper functions.
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

conflicts:
	fs/xfs/libxfs/xfs_attr_leaf.c
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/libxfs/xfs_bmap_btree.c
	fs/xfs/libxfs/xfs_dir2.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/scrub/symlink.c
	fs/xfs/xfs_itable.c
	fs/xfs/xfs_symlink.c
	fs/xfs/xfs_trace.h
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

e4af4f66

xfs: replace XFS_IFORK_Q with a proper predicate function · b81094f0

由 Darrick J. Wong 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.19-rc5
commit 932b42c6
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=932b42c66cb5d0ca9800b128415b4ad6b1952b3e

--------------------------------

Replace this shouty macro with a real C function that has a more
descriptive name.
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

conflicts:
	fs/xfs/libxfs/xfs_attr.h
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/scrub/btree.c
	fs/xfs/xfs_inode.c
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

b81094f0

xfs: use XFS_IFORK_Q to determine the presence of an xattr fork · a0edb32c

由 Darrick J. Wong 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.19-rc5
commit e45d7cb2
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e45d7cb2356e6b59fe64da28324025cc6fcd3fbd

--------------------------------

Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the
attribute fork but i_forkoff is zero.  This eliminates the ambiguity
between i_forkoff and i_af.if_present, which should make it easier to
understand the lifetime of attr forks.

While we're at it, remove the if_present checks around calls to
xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr
forks that have already been torn down.
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

conflicts:
	fs/xfs/libxfs/xfs_attr.h
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/xfs_icache.c
	fs/xfs/xfs_inode.c
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

a0edb32c

xfs: make inode attribute forks a permanent part of struct xfs_inode · 732df364

由 Darrick J. Wong 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.19-rc5
commit 2ed5b09b
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ed5b09b3e8fc274ae8fecd6ab7c5106a364bed1

--------------------------------

Syzkaller reported a UAF bug a while back:

==================================================================
BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958

CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted
5.15.0-0.30.3-20220406_1406 #3
Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106
 print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256
 __kasan_report mm/kasan/report.c:442 [inline]
 kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459
 xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
 xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159
 xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36
 __vfs_getxattr+0xdf/0x13d fs/xattr.c:399
 cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300
 security_inode_need_killpriv+0x4c/0x97 security/security.c:1408
 dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912
 dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908
 do_truncate+0xc3/0x1e0 fs/open.c:56
 handle_truncate fs/namei.c:3084 [inline]
 do_open fs/namei.c:3432 [inline]
 path_openat+0x30ab/0x396d fs/namei.c:3561
 do_filp_open+0x1c4/0x290 fs/namei.c:3588
 do_sys_openat2+0x60d/0x98c fs/open.c:1212
 do_sys_open+0xcf/0x13c fs/open.c:1228
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0x0
RIP: 0033:0x7f7ef4bb753d
Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48
RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d
RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0
RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0
 </TASK>

Allocated by task 2953:
 kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:434 [inline]
 __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467
 kasan_slab_alloc include/linux/kasan.h:254 [inline]
 slab_post_alloc_hook mm/slab.h:519 [inline]
 slab_alloc_node mm/slub.c:3213 [inline]
 slab_alloc mm/slub.c:3221 [inline]
 kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226
 kmem_cache_zalloc include/linux/slab.h:711 [inline]
 xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287
 xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098
 xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746
 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
 __vfs_setxattr+0x11b/0x177 fs/xattr.c:180
 __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214
 __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275
 vfs_setxattr+0x154/0x33d fs/xattr.c:301
 setxattr+0x216/0x29f fs/xattr.c:575
 __do_sys_fsetxattr fs/xattr.c:632 [inline]
 __se_sys_fsetxattr fs/xattr.c:621 [inline]
 __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0x0

Freed by task 2949:
 kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
 kasan_set_track+0x1c/0x21 mm/kasan/common.c:46
 kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
 ____kasan_slab_free mm/kasan/common.c:366 [inline]
 ____kasan_slab_free mm/kasan/common.c:328 [inline]
 __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374
 kasan_slab_free include/linux/kasan.h:230 [inline]
 slab_free_hook mm/slub.c:1700 [inline]
 slab_free_freelist_hook mm/slub.c:1726 [inline]
 slab_free mm/slub.c:3492 [inline]
 kmem_cache_free+0xdc/0x3ce mm/slub.c:3508
 xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773
 xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822
 xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413
 xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684
 xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802
 xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
 __vfs_removexattr+0x106/0x16a fs/xattr.c:468
 cap_inode_killpriv+0x24/0x47 security/commoncap.c:324
 security_inode_killpriv+0x54/0xa1 security/security.c:1414
 setattr_prepare+0x1a6/0x897 fs/attr.c:146
 xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682
 xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065
 xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093
 notify_change+0xae5/0x10a1 fs/attr.c:410
 do_truncate+0x134/0x1e0 fs/open.c:64
 handle_truncate fs/namei.c:3084 [inline]
 do_open fs/namei.c:3432 [inline]
 path_openat+0x30ab/0x396d fs/namei.c:3561
 do_filp_open+0x1c4/0x290 fs/namei.c:3588
 do_sys_openat2+0x60d/0x98c fs/open.c:1212
 do_sys_open+0xcf/0x13c fs/open.c:1228
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x44/0x0

The buggy address belongs to the object at ffff88802cec9188
 which belongs to the cache xfs_ifork of size 40
The buggy address is located 20 bytes inside of
 40-byte region [ffff88802cec9188, ffff88802cec91b0)
The buggy address belongs to the page:
page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000
index:0x0 pfn:0x2cec9
flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
 ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
>ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
                            ^
 ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
 ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
==================================================================

The root cause of this bug is the unlocked access to xfs_inode.i_afp
from the getxattr code paths while trying to determine which ILOCK mode
to use to stabilize the xattr data.  Unfortunately, the VFS does not
acquire i_rwsem when vfs_getxattr (or listxattr) call into the
filesystem, which means that getxattr can race with a removexattr that's
tearing down the attr fork and crash:

xfs_attr_set:                          xfs_attr_get:
xfs_attr_fork_remove:                  xfs_ilock_attr_map_shared:

xfs_idestroy_fork(ip->i_afp);
kmem_cache_free(xfs_ifork_cache, ip->i_afp);

                                       if (ip->i_afp &&

ip->i_afp = NULL;

                                           xfs_need_iread_extents(ip->i_afp))
                                       <KABOOM>

ip->i_forkoff = 0;

Regrettably, the VFS is much more lax about i_rwsem and getxattr than
is immediately obvious -- not only does it not guarantee that we hold
i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
The getxattr system call won't acquire the lock before calling XFS, but
the file capabilities code calls getxattr with and without i_rwsem held
to determine if the "security.capabilities" xattr is set on the file.

Fixing the VFS locking requires a treewide investigation into every code
path that could touch an xattr and what i_rwsem state it expects or sets
up.  That could take years or even prove impossible; fortunately, we
can fix this UAF problem inside XFS.

An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
ensure that i_forkoff is always zeroed before i_afp is set to null and
changed the read paths to use smp_rmb before accessing i_forkoff and
i_afp, which avoided these UAF problems.  However, the patch author was
too busy dealing with other problems in the meantime, and by the time he
came back to this issue, the situation had changed a bit.

On a modern system with selinux, each inode will always have at least
one xattr for the selinux label, so it doesn't make much sense to keep
incurring the extra pointer dereference.  Furthermore, Allison's
upcoming parent pointer patchset will also cause nearly every inode in
the filesystem to have extended attributes.  Therefore, make the inode
attribute fork structure part of struct xfs_inode, at a cost of 40 more
bytes.

This patch adds a clunky if_present field where necessary to maintain
the existing logic of xattr fork null pointer testing in the existing
codebase.  The next patch switches the logic over to XFS_IFORK_Q and it
all goes away.
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

conflicts:
	fs/xfs/libxfs/xfs_attr.c
	fs/xfs/libxfs/xfs_attr.h
	fs/xfs/libxfs/xfs_attr_leaf.c
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/libxfs/xfs_inode_buf.c
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/xfs_attr_inactive.c
	fs/xfs/xfs_attr_list.c
	fs/xfs/xfs_icache.c
	fs/xfs/xfs_inode.c
	fs/xfs/xfs_inode.h
	fs/xfs/xfs_inode_item.c
	fs/xfs/xfs_itable.c
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

732df364

xfs: convert XFS_IFORK_PTR to a static inline helper · 47998bc8

由 Darrick J. Wong 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.19-rc5
commit 732436ef
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=732436ef916b4f338d672ea56accfdb11e8d0732

--------------------------------

We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

conflicts:
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/libxfs/xfs_bmap_btree.c
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/scrub/bmap.c
	fs/xfs/scrub/symlink.c
	fs/xfs/xfs_inode.c
	fs/xfs/xfs_ioctl.c
	fs/xfs/xfs_qm.c
	fs/xfs/xfs_reflink.c
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

47998bc8

xfs: don't reuse busy extents on extent trim · bbfe8670

由 Brian Foster 提交于 4月 12, 2023

mainline inclusion
from mainline-v5.11-rc4
commit 06058bc4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=06058bc40534530e617e5623775c53bb24f032cb

--------------------------------

Freed extents are marked busy from the point the freeing transaction
commits until the associated CIL context is checkpointed to the log.
This prevents reuse and overwrite of recently freed blocks before
the changes are committed to disk, which can lead to corruption
after a crash. The exception to this rule is that metadata
allocation is allowed to reuse busy extents because metadata changes
are also logged.

As of commit 97d3ac75 ("xfs: exact busy extent tracking"), XFS
has allowed modification or complete invalidation of outstanding
busy extents for metadata allocations. This implementation assumes
that use of the associated extent is imminent, which is not always
the case. For example, the trimmed extent might not satisfy the
minimum length of the allocation request, or the allocation
algorithm might be involved in a search for the optimal result based
on locality.

generic/019 reproduces a corruption caused by this scenario. First,
a metadata block (usually a bmbt or symlink block) is freed from an
inode. A subsequent bmbt split on an unrelated inode attempts a near
mode allocation request that invalidates the busy block during the
search, but does not ultimately allocate it. Due to the busy state
invalidation, the block is no longer considered busy to subsequent
allocation. A direct I/O write request immediately allocates the
block and writes to it. Finally, the filesystem crashes while in a
state where the initial metadata block free had not committed to the
on-disk log. After recovery, the original metadata block is in its
original location as expected, but has been corrupted by the
aforementioned dio.

This demonstrates that it is fundamentally unsafe to modify busy
extent state for extents that are not guaranteed to be allocated.
This applies to pretty much all of the code paths that currently
trim busy extents for one reason or another. Therefore to address
this problem, drop the reuse mechanism from the busy extent trim
path. This code already knows how to return partial non-busy ranges
of the targeted free extent and higher level code tracks the busy
state of the allocation attempt. If a block allocation fails where
one or more candidate extents is busy, we force the log and retry
the allocation.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NLong Li <leo.lilong@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

bbfe8670

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功