Commits · 34ba4153c10e9dd14369ec2c6bec22836018767f · openeuler / Kernel

12 Apr, 2023 6 commits

xfs: convert XFS_IFORK_PTR to a static inline helper · 34ba4153

Committed by Darrick J. Wong on Apr 12, 2023

mainline inclusion
from mainline-v5.19-rc5
commit 732436ef
category: bugfix
bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=732436ef916b4f338d672ea56accfdb11e8d0732

--------------------------------

We're about to make this logic do a bit more, so convert the macro to a
static inline function for better typechecking and fewer shouty macros.
No functional changes here.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

conflicts:
	fs/xfs/libxfs/xfs_bmap.c
	fs/xfs/libxfs/xfs_bmap_btree.c
	fs/xfs/libxfs/xfs_inode_fork.c
	fs/xfs/libxfs/xfs_inode_fork.h
	fs/xfs/scrub/bmap.c
	fs/xfs/scrub/symlink.c
	fs/xfs/xfs_inode.c
	fs/xfs/xfs_ioctl.c
	fs/xfs/xfs_qm.c
	fs/xfs/xfs_reflink.c
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
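
As an illustration of the macro-to-inline conversion described in this commit, here is a minimal userspace sketch. The `demo_` types and names are hypothetical stand-ins, not the real XFS definitions; the point is only that the function form lets the compiler check the inode pointer type, while the macro pastes its arguments textually.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real XFS definitions. */
enum demo_whichfork { DEMO_DATA_FORK, DEMO_ATTR_FORK, DEMO_COW_FORK };

struct demo_ifork {
	int nextents;
};

struct demo_inode {
	struct demo_ifork df;      /* data fork, embedded in the inode */
	struct demo_ifork *afp;    /* attr fork, NULL when absent */
	struct demo_ifork *cowfp;  /* CoW fork, NULL when absent */
};

/* Macro form: arguments are pasted textually and not type checked. */
#define DEMO_IFORK_PTR(ip, w) \
	((w) == DEMO_DATA_FORK ? &(ip)->df : \
	 ((w) == DEMO_ATTR_FORK ? (ip)->afp : (ip)->cowfp))

/* Inline form: same result, but the compiler checks the type of ip. */
static inline struct demo_ifork *
demo_ifork_ptr(struct demo_inode *ip, enum demo_whichfork whichfork)
{
	assert(ip != NULL);
	switch (whichfork) {
	case DEMO_DATA_FORK:
		return &ip->df;
	case DEMO_ATTR_FORK:
		return ip->afp;
	case DEMO_COW_FORK:
		return ip->cowfp;
	}
	return NULL;
}
```

Both forms return the same pointer for the same inputs, which matches the "no functional changes" claim of the commit.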

xfs: don't reuse busy extents on extent trim · 63c2dc44

Committed by Brian Foster on Apr 12, 2023

mainline inclusion
from mainline-v5.11-rc4
commit 06058bc4
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=06058bc40534530e617e5623775c53bb24f032cb

--------------------------------

Freed extents are marked busy from the point the freeing transaction
commits until the associated CIL context is checkpointed to the log.
This prevents reuse and overwrite of recently freed blocks before
the changes are committed to disk, which can lead to corruption
after a crash. The exception to this rule is that metadata
allocation is allowed to reuse busy extents because metadata changes
are also logged.

As of commit 97d3ac75 ("xfs: exact busy extent tracking"), XFS
has allowed modification or complete invalidation of outstanding
busy extents for metadata allocations. This implementation assumes
that use of the associated extent is imminent, which is not always
the case. For example, the trimmed extent might not satisfy the
minimum length of the allocation request, or the allocation
algorithm might be involved in a search for the optimal result based
on locality.

generic/019 reproduces a corruption caused by this scenario. First,
a metadata block (usually a bmbt or symlink block) is freed from an
inode. A subsequent bmbt split on an unrelated inode attempts a near
mode allocation request that invalidates the busy block during the
search, but does not ultimately allocate it. Due to the busy state
invalidation, the block is no longer considered busy to subsequent
allocation. A direct I/O write request immediately allocates the
block and writes to it. Finally, the filesystem crashes while in a
state where the initial metadata block free had not committed to the
on-disk log. After recovery, the original metadata block is in its
original location as expected, but has been corrupted by the
aforementioned dio.

This demonstrates that it is fundamentally unsafe to modify busy
extent state for extents that are not guaranteed to be allocated.
This applies to pretty much all of the code paths that currently
trim busy extents for one reason or another. Therefore to address
this problem, drop the reuse mechanism from the busy extent trim
path. This code already knows how to return partial non-busy ranges
of the targeted free extent and higher level code tracks the busy
state of the allocation attempt. If a block allocation fails where
one or more candidate extents is busy, we force the log and retry
the allocation.
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

fs/xfs: convert comma to semicolon · b53af989

Committed by Zheng Yongjun on Apr 12, 2023

mainline inclusion
from mainline-v5.10-rc5
commit 1189686e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1189686e5440041057f8cc21a7c1d13bb6642cb9

--------------------------------

Replace a comma between expression statements by a semicolon.
Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
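
The commit above is a pure cleanup, but a small sketch shows why a comma between expression statements is worth avoiding in general. The two hypothetical functions below are not taken from the XFS code; they only demonstrate that a comma keeps both assignments inside one statement, which matters as soon as the statement sits under an unbraced `if`.

```c
#include <assert.h>

/* With a comma, both assignments form one statement guarded by the if. */
static int demo_with_comma(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1, b = 2;
	return a + b;
}

/* With a semicolon, only the first assignment is guarded by the if. */
static int demo_with_semicolon(int cond)
{
	int a = 0, b = 0;

	if (cond)
		a = 1;
	b = 2;
	return a + b;
}
```

With `cond == 0` the first function returns 0 and the second returns 2, so the two spellings are only interchangeable when no control flow depends on the statement boundary.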

xfs: xfs_ail_push_all_sync() stalls when racing with updates · 6d1cae97

Committed by Dave Chinner on Apr 12, 2023

mainline inclusion
from mainline-v5.17-rc6
commit 941fbdfd
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=941fbdfd6dd0f1d7961c28123b5460912f678cb5

--------------------------------

xfs_ail_push_all_sync() has a loop like this:

while max_ail_lsn {
	prepare_to_wait(ail_empty)
	target = max_ail_lsn
	wake_up(ail_task);
	schedule()
}

Which is designed to sleep until the AIL is emptied. When
xfs_ail_update_finish() moves the tail of the log, it does:

	if (list_empty(&ailp->ail_head))
		wake_up_all(&ailp->ail_empty);

So it will only wake up the sync push waiter when the AIL goes
empty. If, by the time the push waiter has woken, the AIL has more
in it, it will reset the target, wake the push task and go back to
sleep.

The problem here is that if the AIL is having items added to it
when xfs_ail_push_all_sync() is called, then they may get inserted
into the AIL at an LSN higher than the target LSN. At this point,
xfsaild_push() will see that the target is X, the item LSNs are
(X+N) and skip over them, hence never pushing them out.

The result of this is that the AIL will not get emptied by the AIL
push thread, hence xfs_ail_update_finish() will never see the AIL being
empty even if it moves the tail. Hence xfs_ail_push_all_sync() never
gets woken and hence cannot update the push target to capture the
items beyond the current target on the LSN.

This is a TOCTOU type of issue so the way to avoid it is to not
use the push target at all for sync pushes. We know that a sync push
is being requested by the fact the ail_empty wait queue is active,
hence the xfsaild can just set the target to max_ail_lsn on every
push that we see the wait queue active. Hence we no longer will
leave items on the AIL that are beyond the LSN sampled at the start
of a sync push.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
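
The target selection described in this commit can be modelled with a small single-threaded sketch. Everything here is a hypothetical simplification: `demo_ail` stands in for the AIL state, `sync_waiter_active` models "the ail_empty wait queue is active", and LSNs are plain integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, single-threaded model of the AIL push state. */
struct demo_ail {
	uint64_t target;          /* push target sampled earlier */
	uint64_t max_lsn;         /* highest LSN currently in the AIL */
	bool sync_waiter_active;  /* models "ail_empty wait queue is active" */
};

/* While a sync push waiter exists, always push up to the current maximum. */
static uint64_t demo_push_target(const struct demo_ail *ail)
{
	assert(ail != NULL);
	return ail->sync_waiter_active ? ail->max_lsn : ail->target;
}

/* Count the items a push with the given target would not skip over. */
static size_t
demo_count_pushable(const uint64_t *lsns, size_t n, uint64_t target)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (lsns[i] <= target)
			count++;
	return count;
}
```

With items at LSNs 5, 12 and 15 and a stale target of 10, only one item is pushable until the sync waiter flag is set, after which all three are.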

xfs: check buffer pin state after locking in delwri_submit · 9781b974

Committed by Dave Chinner on Apr 12, 2023

mainline inclusion
from mainline-v5.17-rc6
commit dbd0f529
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dbd0f5299302f8506637592e2373891a748c6990

--------------------------------

AIL flushing can get stuck here:

[316649.005769] INFO: task xfsaild/pmem1:324525 blocked for more than 123 seconds.
[316649.007807]       Not tainted 5.17.0-rc6-dgc+ #975
[316649.009186] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[316649.011720] task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
[316649.014112] Call Trace:
[316649.014841]  <TASK>
[316649.015492]  __schedule+0x30d/0x9e0
[316649.017745]  schedule+0x55/0xd0
[316649.018681]  io_schedule+0x4b/0x80
[316649.019683]  xfs_buf_wait_unpin+0x9e/0xf0
[316649.021850]  __xfs_buf_submit+0x14a/0x230
[316649.023033]  xfs_buf_delwri_submit_buffers+0x107/0x280
[316649.024511]  xfs_buf_delwri_submit_nowait+0x10/0x20
[316649.025931]  xfsaild+0x27e/0x9d0
[316649.028283]  kthread+0xf6/0x120
[316649.030602]  ret_from_fork+0x1f/0x30

in the situation where flushing gets preempted between the unpin
check and the buffer trylock under nowait conditions:

	blk_start_plug(&plug);
	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
		if (!wait_list) {
			if (xfs_buf_ispinned(bp)) {
				pinned++;
				continue;
			}
Here >>>>>>
			if (!xfs_buf_trylock(bp))
				continue;

This means submission is stuck until something else triggers a log
force to unpin the buffer.

To get onto the delwri list to begin with, the buffer pin state has
already been checked, and hence it's relatively rare we get a race
between flushing and encountering a pinned buffer in delwri
submission to begin with. Further, to increase the pin count the
buffer has to be locked, so the only way we can hit this race
without failing the trylock is to be preempted between the pincount
check seeing zero and the trylock being run.

Hence to avoid this problem, just invert the order of trylock vs
pin check. We shouldn't hit that many pinned buffers here, so
optimising away the trylock for pinned buffers should not matter for
performance at all.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
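
The inverted ordering described in this commit (trylock first, then check the pin count while holding the lock) might look roughly like the sketch below. It is a userspace model with hypothetical names: `demo_buf` replaces the real buffer, and the lock is a plain flag rather than a semaphore.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer with a lock and a pin count. */
struct demo_buf {
	bool locked;
	int pin_count;
};

static bool demo_trylock(struct demo_buf *bp)
{
	if (bp->locked)
		return false;
	bp->locked = true;
	return true;
}

static void demo_unlock(struct demo_buf *bp)
{
	assert(bp->locked);
	bp->locked = false;
}

/*
 * Nowait submission check: take the lock first, then look at the pin count.
 * Since pinning requires the lock, the count cannot rise after this check.
 * Returns true with the lock held if the buffer can be submitted.
 */
static bool demo_can_submit_nowait(struct demo_buf *bp, int *pinned)
{
	assert(bp != NULL && pinned != NULL);
	if (!demo_trylock(bp))
		return false;
	if (bp->pin_count > 0) {
		demo_unlock(bp);
		(*pinned)++;
		return false;
	}
	return true;
}
```

The cost is one extra trylock and unlock for a pinned buffer, which the commit message argues is rare enough not to matter.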

xfs: log worker needs to start before intent/unlink recovery · 74d73186

Committed by Dave Chinner on Apr 12, 2023

mainline inclusion
from mainline-v5.17-rc6
commit a9a4bc8c
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a9a4bc8c76d747aa40b30e2dfc176c781f353a08

--------------------------------

After 963 iterations of generic/530, it deadlocked during recovery
on a pinned inode cluster buffer like so:

XFS (pmem1): Starting recovery (logdev: internal)
INFO: task kworker/8:0:306037 blocked for more than 122 seconds.
      Not tainted 5.17.0-rc6-dgc+ #975
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:kworker/8:0     state:D stack:13024 pid:306037 ppid:     2 flags:0x00004000
Workqueue: xfs-inodegc/pmem1 xfs_inodegc_worker
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 schedule_timeout+0x114/0x160
 __down+0x99/0xf0
 down+0x5e/0x70
 xfs_buf_lock+0x36/0xf0
 xfs_buf_find+0x418/0x850
 xfs_buf_get_map+0x47/0x380
 xfs_buf_read_map+0x54/0x240
 xfs_trans_read_buf_map+0x1bd/0x490
 xfs_imap_to_bp+0x4f/0x70
 xfs_iunlink_map_ino+0x66/0xd0
 xfs_iunlink_map_prev.constprop.0+0x148/0x2f0
 xfs_iunlink_remove_inode+0xf2/0x1d0
 xfs_inactive_ifree+0x1a3/0x900
 xfs_inode_unlink+0xcc/0x210
 xfs_inodegc_worker+0x1ac/0x2f0
 process_one_work+0x1ac/0x390
 worker_thread+0x56/0x3c0
 kthread+0xf6/0x120
 ret_from_fork+0x1f/0x30
 </TASK>
task:mount           state:D stack:13248 pid:324509 ppid:324233 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 schedule_timeout+0x114/0x160
 __down+0x99/0xf0
 down+0x5e/0x70
 xfs_buf_lock+0x36/0xf0
 xfs_buf_find+0x418/0x850
 xfs_buf_get_map+0x47/0x380
 xfs_buf_read_map+0x54/0x240
 xfs_trans_read_buf_map+0x1bd/0x490
 xfs_imap_to_bp+0x4f/0x70
 xfs_iget+0x300/0xb40
 xlog_recover_process_one_iunlink+0x4c/0x170
 xlog_recover_process_iunlinks.isra.0+0xee/0x130
 xlog_recover_finish+0x57/0x110
 xfs_log_mount_finish+0xfc/0x1e0
 xfs_mountfs+0x540/0x910
 xfs_fs_fill_super+0x495/0x850
 get_tree_bdev+0x171/0x270
 xfs_fs_get_tree+0x15/0x20
 vfs_get_tree+0x24/0xc0
 path_mount+0x304/0xba0
 __x64_sys_mount+0x108/0x140
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
 </TASK>
task:xfsaild/pmem1   state:D stack:14544 pid:324525 ppid:     2 flags:0x00004000
Call Trace:
 <TASK>
 __schedule+0x30d/0x9e0
 schedule+0x55/0xd0
 io_schedule+0x4b/0x80
 xfs_buf_wait_unpin+0x9e/0xf0
 __xfs_buf_submit+0x14a/0x230
 xfs_buf_delwri_submit_buffers+0x107/0x280
 xfs_buf_delwri_submit_nowait+0x10/0x20
 xfsaild+0x27e/0x9d0
 kthread+0xf6/0x120
 ret_from_fork+0x1f/0x30

We have the mount process waiting on an inode cluster buffer read,
inodegc doing unlink waiting on the same inode cluster buffer, and
the AIL push thread blocked in writeback waiting for the inode
cluster buffer to become unpinned.

What has happened here is that the AIL push thread has raced with
the inodegc process modifying, committing and pinning the inode
cluster buffer in xfs_buf_delwri_submit_buffers() here:

	blk_start_plug(&plug);
	list_for_each_entry_safe(bp, n, buffer_list, b_list) {
		if (!wait_list) {
			if (xfs_buf_ispinned(bp)) {
				pinned++;
				continue;
			}
Here >>>>>>
			if (!xfs_buf_trylock(bp))
				continue;

Basically, the AIL has found the buffer wasn't pinned and got the
lock without blocking, but then the buffer was pinned. This implies
the processing here was pre-empted between the pin check and the
lock, because the pin count can only be increased while holding the
buffer locked. Hence when it has gone to submit the IO, it has
blocked waiting for the buffer to be unpinned.

With all executing threads now waiting on the buffer to be unpinned,
we normally get out of situations like this via the background log
worker issuing a log force which will unpin stuck buffers like
this. But at this point in recovery, we haven't started the log
worker. In fact, the first thing we do after processing intents and
unlinked inodes is *start the log worker*. IOWs, we start it too
late to have it break deadlocks like this.

Avoid this and any other similar deadlock vectors in intent and
unlinked inode recovery by starting the log worker before we recover
intents and unlinked inodes. This part of recovery runs as though
the filesystem is fully active, so we really should have the same
infrastructure running as we normally do at runtime.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

04 Apr, 2023 19 commits

!550 anolis: bond: broadcast ARP or ND messages to all slaves · 07cc5878

Committed by openeuler-ci-bot on Apr 04, 2023

Merge Pull Request from: @wang-yufen316

This is achieved by broadcasting ARP or ND packets to all of its slave devices on transmit side. The switch will take further actions based on proper configuration.
A new sysctl knob "net.bonding.broadcast_arp_or_nd" is introduced which controls the behaviour of broadcasting.

Link:https://gitee.com/openeuler/kernel/pulls/550

Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

!561 Backport CVEs and bugfixes · b7df3633

Committed by openeuler-ci-bot on Apr 04, 2023

Merge Pull Request from: @zhangjialin11 
 
Pull new CVEs:
CVE-2023-1513
CVE-2022-4269

net bugfixes from Zhengchao Shao
ext4 bugfixes from Zhihao Cheng and Baokun Li
timer bugfix from Yu Liao
nvme bugfix from Li Lingfeng
xfs bugfixes from Zhihao Cheng
scsi bugfix from Yu Kuai 
 
Link:https://gitee.com/openeuler/kernel/pulls/561 

Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

!560 [sync] PR-539: LoongArch: fix compile warnning of drm/loongson driver · 89d651f2

Committed by openeuler-ci-bot on Apr 04, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/539 
 
Fix the compile warning by removing an unused function and variable.
 
Link:https://gitee.com/openeuler/kernel/pulls/560 

Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

net: sched: Use struct_size() helper in kvmalloc() · 8bfbbf60

Committed by Gustavo A. R. Silva on Apr 04, 2023

mainline inclusion
from mainline-v5.16-rc1
commit 12929198
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6SFHJ
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=129291980f4901ae68ee3ba4344bdc38cf5f800d

--------------------------------

Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows
that, in the worst scenario, could lead to heap overflows.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20210929201718.GA342296@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
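
To show what the struct_size() conversion above guards against, here is a userspace analogue. It is a sketch, not the kernel macro: `demo_struct_size` and `demo_table` are hypothetical, and the saturate-to-SIZE_MAX behaviour is modelled by hand so that an overflowing size computation fails the allocation instead of wrapping around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical structure ending in a flexible array member. */
struct demo_table {
	size_t count;
	uint32_t entries[];
};

/* base + elem_size * count, saturating at SIZE_MAX instead of wrapping. */
static size_t demo_struct_size(size_t base, size_t elem_size, size_t count)
{
	if (elem_size != 0 && count > (SIZE_MAX - base) / elem_size)
		return SIZE_MAX;
	return base + elem_size * count;
}

static struct demo_table *demo_table_alloc(size_t count)
{
	size_t bytes = demo_struct_size(sizeof(struct demo_table),
					sizeof(uint32_t), count);
	struct demo_table *t;

	if (bytes == SIZE_MAX)
		return NULL;	/* overflow detected: refuse to allocate */
	t = calloc(1, bytes);
	if (t)
		t->count = count;
	return t;
}
```

An open-coded `sizeof(*t) + count * sizeof(uint32_t)` would silently wrap for a huge `count` and allocate a buffer that is too small, which is the heap overflow risk the commit message refers to.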

net_sched: Use struct_size() and flex_array_size() helpers · ef17d368

Committed by Gustavo A. R. Silva on Apr 04, 2023

mainline inclusion
from mainline-v5.16-rc1
commit 69508d43
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6SFHJ
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=69508d43334e3b09c344f662272bcf24a5b508ed

--------------------------------

Make use of the struct_size() and flex_array_size() helpers instead of
an open-coded version, in order to avoid any potential type mistakes
or integer overflows that, in the worst scenario, could lead to heap
overflows.

Link: https://github.com/KSPP/linux/issues/160
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20210928193107.GA262595@embeddedor
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Conflicts:
	net/sched/sch_api.c
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

ext4: dio take shared inode lock when overwriting preallocated blocks · 4cc68388

Committed by Zhang Yi on Apr 04, 2023

mainline inclusion
from mainline-v6.3-rc1
commit 240930fb
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I6S63P
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=240930fb7e6b52229bdee5b1423bfeab0002fed2

--------------------------------

In the dio write path, we only take the shared inode lock for the case
of aligned overwriting of initialized blocks inside EOF. But overwriting
preallocated blocks may only need to split unwritten extents. That
procedure is already protected by the i_data_sem lock, so it is safe to
release the exclusive inode lock and take the shared inode lock.

This could give a significant speed up for multi-threaded writes. Tested
on an Intel Xeon Gold 6140 and an NVMe SSD with the fio parameters below.

 direct=1
 ioengine=libaio
 iodepth=10
 numjobs=10
 runtime=60
 rw=randwrite
 size=100G

The test results are:
Before:
 bs=4k       IOPS=11.1k, BW=43.2MiB/s
 bs=16k      IOPS=11.1k, BW=173MiB/s
 bs=64k      IOPS=11.2k, BW=697MiB/s

After:
 bs=4k       IOPS=41.4k, BW=162MiB/s
 bs=16k      IOPS=41.3k, BW=646MiB/s
 bs=64k      IOPS=13.5k, BW=843MiB/s
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221226062015.3479416-1-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Conflicts:
	fs/ext4/file.c
	[ 2f632965("iomap: pass a flags argument to iomap_dio_rw")
	  is not applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

ext4: fix race between writepages and remount · 6b6f7836

Committed by Baokun Li on Apr 04, 2023

hulk inclusion
category: bugfix
bugzilla: 188500, https://gitee.com/openeuler/kernel/issues/I6RJ0V
CVE: NA

--------------------------------

We got a WARNING in ext4_add_complete_io:
==================================================================
 WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
 CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 #85
 RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
 [...]
 Call Trace:
  <TASK>
  ext4_end_bio+0xa8/0x240 [ext4]
  bio_endio+0x195/0x310
  blk_update_request+0x184/0x770
  scsi_end_request+0x2f/0x240
  scsi_io_completion+0x75/0x450
  scsi_finish_command+0xef/0x160
  scsi_complete+0xa3/0x180
  blk_complete_reqs+0x60/0x80
  blk_done_softirq+0x25/0x40
  __do_softirq+0x119/0x4c8
  run_ksoftirqd+0x42/0x70
  smpboot_thread_fn+0x136/0x3c0
  kthread+0x140/0x1a0
  ret_from_fork+0x2c/0x50
==================================================================

Above issue may happen as follows:

            cpu1                        cpu2
----------------------------|----------------------------
mount -o dioread_lock
ext4_writepages
 ext4_do_writepages
  *if (ext4_should_dioread_nolock(inode))*
    // rsv_blocks is not assigned here
                                 mount -o remount,dioread_nolock
  ext4_journal_start_with_reserve
   __ext4_journal_start
    __ext4_journal_start_sb
     jbd2__journal_start
      *if (rsv_blocks)*
        // h_rsv_handle is not initialized here
  mpage_map_and_submit_extent
    mpage_map_one_extent
      dioread_nolock = ext4_should_dioread_nolock(inode)
      if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
        mpd->io_submit.io_end->handle = handle->h_rsv_handle
        ext4_set_io_unwritten_flag
          io_end->flag |= EXT4_IO_END_UNWRITTEN
      // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag

scsi_finish_command
 scsi_io_completion
  scsi_io_completion_action
   scsi_end_request
    blk_update_request
     req_bio_endio
      bio_endio
       bio->bi_end_io  > ext4_end_bio
        ext4_put_io_end_defer
	 ext4_add_complete_io
	  // trigger WARN_ON(!io_end->handle && sbi->s_journal);

The immediate cause of this problem is that the
ext4_should_dioread_nolock() function returns inconsistent values in
ext4_do_writepages() and mpage_map_one_extent(). There are four
conditions in this function that
can be changed at mount time to cause this problem. These four conditions
can be divided into two categories:

    (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
    (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount

The two in the first category have been fixed by commit c8585c6f
("ext4: fix races between changing inode journal mode and ext4_writepages")
and commit cb85f4d2 ("ext4: fix race between writepages and enabling
EXT4_EXTENTS_FL") respectively.

The two cases in the other category have not yet been fixed, and the
above issue is caused by this situation. Following the fix for the first
category, we grab s_writepages_rwsem when applying options during
remount, so that remount cannot race with writepages operations and
trigger this problem.

Fixes: 6b523df4 ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable@vger.kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
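
The locking rule in this fix (writepages holds the read side, remount takes the write side before changing the option) can be modelled with a small single-threaded sketch. All names are hypothetical, and the toy `demo_rwsem` only fails a trylock where a real rwsem would sleep; it is meant to show the exclusion, not to be a usable lock.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a reader/writer semaphore. */
struct demo_rwsem {
	int readers;
	bool writer;
};

static bool demo_down_read_trylock(struct demo_rwsem *s)
{
	if (s->writer)
		return false;
	s->readers++;
	return true;
}

static void demo_up_read(struct demo_rwsem *s)
{
	assert(s->readers > 0);
	s->readers--;
}

static bool demo_down_write_trylock(struct demo_rwsem *s)
{
	if (s->writer || s->readers > 0)
		return false;
	s->writer = true;
	return true;
}

static void demo_up_write(struct demo_rwsem *s)
{
	assert(s->writer);
	s->writer = false;
}

/* Hypothetical superblock state: the lock plus one remountable option. */
struct demo_sb {
	struct demo_rwsem writepages_rwsem;
	bool dioread_nolock;
};

/* Remount path: change the option only with the write side held. */
static bool demo_remount_set_nolock(struct demo_sb *sb, bool value)
{
	if (!demo_down_write_trylock(&sb->writepages_rwsem))
		return false;	/* a real rwsem would sleep here instead */
	sb->dioread_nolock = value;
	demo_up_write(&sb->writepages_rwsem);
	return true;
}
```

While a writepages-style reader holds the lock, the option cannot change, so two reads of it within that window are guaranteed to agree.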

clocksource/drivers/arm_arch_timer: Fix CNTPCT_LO and CNTVCT_LO value · 5c7451d7

Committed by Yang Guo on Apr 04, 2023

mainline inclusion
from mainline-v6.1-rc1
commit af246cc6
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6RQVI

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=af246cc6d0ed11318223606128bb0b09866c4c08

--------------------------------

CNTPCT_LO and CNTVCT_LO are defined by mistake in commit '8b82c4f8',
so fix them according to the Arm ARM DDI 0487I.a, Table I2-4
"CNTBaseN memory map" as follows:

Offset    Register      Type Description
0x000     CNTPCT[31:0]  RO   Physical Count register.
0x004     CNTPCT[63:32] RO
0x008     CNTVCT[31:0]  RO   Virtual Count register.
0x00C     CNTVCT[63:32] RO

Fixes: 8b82c4f8 ("clocksource/drivers/arm_arch_timer: Move MMIO timer programming over to CVAL")
Cc: stable@vger.kernel.org
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Acked-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Yang Guo <guoyang2@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
Link: https://lore.kernel.org/r/20220927033221.49589-1-zhangshaokun@hisilicon.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
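
The register offsets in the table above can be exercised with a small sketch that reads a 64-bit counter through two 32-bit accesses. This is an assumption-laden model: the `DEMO_` macros simply mirror the table, the "MMIO frame" is a plain array, and the high-low-high retry loop is the usual way to read a split counter, not code taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets mirroring the CNTBaseN table quoted in the commit message. */
#define DEMO_CNTPCT_LO	0x000
#define DEMO_CNTPCT_HI	0x004
#define DEMO_CNTVCT_LO	0x008
#define DEMO_CNTVCT_HI	0x00c

/* Read one 32-bit register at a byte offset within the frame. */
static uint32_t demo_readl(const volatile uint32_t *frame, unsigned int offset)
{
	return frame[offset / 4];
}

/*
 * Read a 64-bit counter whose low word lives at lo_offset and whose high
 * word lives 4 bytes above it. Retry if the high word changed in between.
 */
static uint64_t
demo_read_counter(const volatile uint32_t *frame, unsigned int lo_offset)
{
	uint32_t hi, lo, hi2;

	assert(lo_offset % 8 == 0);
	do {
		hi = demo_readl(frame, lo_offset + 4);
		lo = demo_readl(frame, lo_offset);
		hi2 = demo_readl(frame, lo_offset + 4);
	} while (hi != hi2);
	return ((uint64_t)hi << 32) | lo;
}
```

With a wrong low-word offset the reader would combine halves of two different registers, which is why the definitions need to match the table exactly.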

kvm: initialize all of the kvm_debugregs structure before sending it to userspace · b2923141

Committed by Greg Kroah-Hartman on Apr 04, 2023

stable inclusion
from stable-v5.10.169
commit 6416c2108ba54d569e4c98d3b62ac78cb12e7107
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6OOP3
CVE: CVE-2023-1513

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6416c2108ba54d569e4c98d3b62ac78cb12e7107

--------------------------------

commit 2c10b614 upstream.

When calling the KVM_GET_DEBUGREGS ioctl, on some configurations, there
might be some uninitialized portions of the kvm_debugregs structure that
could be copied to userspace.  Prevent this as is done in the other kvm
ioctls, by setting the whole structure to 0 before copying anything into
it.

Bonus is that this reduces the lines of code as the explicit flag
setting and reserved space zeroing out can be removed.

Cc: Sean Christopherson <seanjc@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <x86@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: stable <stable@kernel.org>
Reported-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Message-Id: <20230214103304.3689213-1-gregkh@linuxfoundation.org>
Tested-by: Xingyuan Mo <hdthky0@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
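
The "zero the whole structure first" pattern from this fix can be sketched in userspace as below. The `demo_debugregs` layout and the function name are hypothetical and only loosely modelled on the description above; the point is that clearing the full structure up front leaves no field, reserved area or padding carrying stale memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout with explicit flag and reserved fields. */
struct demo_debugregs {
	uint64_t db[4];
	uint64_t dr6;
	uint64_t dr7;
	uint64_t flags;
	uint64_t reserved[9];
};

/* Fill *out for copying to an untrusted consumer. */
static void
demo_get_debugregs(struct demo_debugregs *out, const uint64_t db[4],
		   uint64_t dr6, uint64_t dr7)
{
	assert(out != NULL);
	/* Clear everything first so nothing uninitialized can leak. */
	memset(out, 0, sizeof(*out));
	memcpy(out->db, db, sizeof(out->db));
	out->dr6 = dr6;
	out->dr7 = dr7;
}
```

As the commit message notes, this also removes the need to set the flag field and zero the reserved area one by one.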

nvme: use nvme_cid to generate command_id in trace event · 1c42e205

Committed by Li Lingfeng on Apr 04, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6QMTU

--------------------------------

Recently, a null-ptr-deref problem occurred when submitting IO to an nvme disk:

[34432.226539] ==========================================================
[34432.226579] BUG: KASAN: null-ptr-deref in
trace_event_raw_event_nvme_complete_rq+0x13c/0x270 [nvme_core]
[34432.226584] Read of size 2 at addr 0000000000000002 by task loop0/32
[34432.226586]
[34432.226594] CPU: 0 PID: 3242729 Comm: loop0 Kdump: loaded Tainted:
[34432.226598] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD
[34432.226602] Call trace:
[34432.226610]  dump_backtrace+0x0/0x2fc
[34432.226615]  show_stack+0x20/0x30
[34432.226623]  dump_stack+0x104/0x17c
[34432.226630]  __kasan_report+0x138/0x140
[34432.226634]  kasan_report+0x44/0xdc
[34432.226639]  __asan_load2+0x90/0xd0
[34432.226662]  trace_event_raw_event_nvme_complete_rq+0x13c/0x270
[34432.226684]  nvme_complete_rq+0x228/0x480 [nvme_core]
[34432.226698]  nvme_pci_complete_rq+0x184/0x1b4 [nvme]
[34432.226706]  nvme_irq+0x270/0x500 [nvme]
[34432.226714]  __handle_irq_event_percpu+0x8c/0x324
[34432.226719]  handle_irq_event_percpu+0x88/0x11c
[34432.226724]  handle_irq_event+0x110/0x2b0
[34432.226729]  handle_fasteoi_irq+0x1e4/0x3f4
[34432.226734]  __handle_domain_irq+0xbc/0x130
[34432.226739]  gic_handle_irq+0x78/0x460
[34432.226743]  el1_irq+0xb8/0x140
[34432.226750]  __slab_alloc+0x38/0x70
[34432.226756]  kmem_cache_alloc+0x6b8/0x904
[34432.226762]  mempool_alloc_slab+0x3c/0x60
[34432.226766]  mempool_alloc+0xf0/0x440
[34432.226772]  bio_alloc_bioset+0x208/0x2f0
[34432.226899]  io_submit_init_bio+0x3c/0x190 [ext4]
[34432.226991]  ext4_bio_write_page+0x540/0xbd0 [ext4]
[34432.227082]  mpage_submit_page+0xb0/0x120 [ext4]
[34432.227173]  mpage_process_page_bufs+0x25c/0x2b4 [ext4]
[34432.227265]  mpage_prepare_extent_to_map+0x3b8/0x75c [ext4]
[34432.227356]  ext4_writepages+0x454/0xcb4 [ext4]
[34432.227361]  do_writepages+0xc4/0x1c0
...

This can be reproduced by the following steps:
1) modprobe nvme
2) echo nvme:* > /sys/kernel/debug/tracing/set_event
3) dd if=/dev/random of=/dev/nvmexxx bs=1M count=1024

Generating command_id with nvme_cid() in the trace event, instead of
reading nvme_req(req)->cmd->common.command_id, fixes it because
nvme_req(req)->cmd can sometimes be NULL.

Fixes: eae0bc99 ("nvme: use command_id instead of req->tag in trace_nvme_complete_rq()")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
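
The idea behind this fix, deriving the command ID from fields of the request itself rather than through a pointer that may be NULL, can be sketched as follows. The structures are hypothetical, and the bit layout (a 4-bit generation counter above a 12-bit tag) is an assumption made for illustration rather than a statement about how nvme_cid() is implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical command and request; cmd may legitimately be NULL. */
struct demo_cmd {
	uint16_t command_id;
};

struct demo_req {
	uint16_t tag;          /* assumed to fit in 12 bits */
	uint8_t genctr;        /* generation counter, low 4 bits used */
	struct demo_cmd *cmd;  /* not safe to dereference unconditionally */
};

/* Build the ID from request fields only; never touches rq->cmd. */
static uint16_t demo_cid(const struct demo_req *rq)
{
	assert(rq != NULL);
	return (uint16_t)(((rq->genctr & 0xf) << 12) | (rq->tag & 0xfff));
}
```

A trace hook that calls a helper like this stays safe even when the request has no command attached at completion time.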

xfs: don't report reserved bnobt space as available · 6026f819

Committed by Darrick J. Wong on Apr 04, 2023

mainline inclusion
from mainline-v5.18-rc1
commit 85bcfa26
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=85bcfa26f9a3782be37d4feafd49668b98b8bdbe

--------------------------------

On a modern filesystem, we don't allow userspace to allocate blocks for
data storage from the per-AG space reservations, the user-controlled
reservation pool that prevents ENOSPC in the middle of internal
operations, or the internal per-AG set-aside that prevents unwanted
filesystem shutdowns due to ENOSPC during a bmap btree split.

Since we now consider freespace btree blocks as unavailable for
allocation for data storage, we shouldn't report those blocks via statfs
either.  This makes the numbers that we return via the statfs f_bavail
and f_bfree fields a more conservative estimate of actual free space.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

xfs: don't include bnobt blocks when reserving free block pool · c5472995

Committed by Darrick J. Wong on Apr 04, 2023

mainline inclusion
from mainline-v5.18-rc1
commit c8c56825
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c8c568259772751a14e969b7230990508de73d9d

--------------------------------

xfs_reserve_blocks controls the size of the user-visible free space
reserve pool.  Given the difference between the current and requested
pool sizes, it will try to reserve free space from fdblocks.  However,
the amount requested from fdblocks is also constrained by the amount of
space that we think xfs_mod_fdblocks will give us.  If we forget to
subtract m_allocbt_blks before calling xfs_mod_fdblocks, it will
return ENOSPC and we'll hang the kernel at mount due to the infinite
loop.

In commit fd43cf60, we decided that xfs_mod_fdblocks should not hand
out the "free space" used by the free space btrees, because some portion
of the free space btrees hold in reserve space for future btree
expansion.  Unfortunately, xfs_reserve_blocks' estimation of the number
of blocks that it could request from xfs_mod_fdblocks was not updated to
include m_allocbt_blks, so if space is extremely low, the caller hangs.

Fix this by creating a function to estimate the number of blocks that
can be reserved from fdblocks, which needs to exclude the set-aside and
m_allocbt_blks.
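
The estimate described above can be pictured with a small sketch. This is not the kernel's code: the struct, its fields, and both function names are hypothetical stand-ins for the in-core counters, and clamping the estimate at zero is an assumption made to keep the example self-contained.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for the in-core mount counters. */
struct fs_counters {
	int64_t fdblocks;      /* global free block counter */
	int64_t set_aside;     /* blocks held back for internal operations */
	int64_t allocbt_blks;  /* blocks in use by the free space btrees */
};

/* Estimate how many blocks a reservation may take from fdblocks. */
static int64_t reservable_blocks(const struct fs_counters *c)
{
	int64_t avail = c->fdblocks - c->set_aside - c->allocbt_blks;

	/* Assumption: never report a negative amount to the caller. */
	return avail > 0 ? avail : 0;
}

/* Clamp a requested pool increase to what can actually be reserved. */
static int64_t clamp_reserve_request(const struct fs_counters *c,
				     int64_t requested)
{
	int64_t avail = reservable_blocks(c);

	return requested < avail ? requested : avail;
}
```

With the request clamped this way, a caller that retries until its request succeeds asks only for what the reservation path is prepared to hand out, which is the property the commit message says was missing.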

Found by running xfs/306 (which formats a single-AG 20MB filesystem)
with an fstests configuration that specifies a 1k blocksize and a
specially crafted log size that will consume 7/8 of the space (17920
blocks, specifically) in that AG.

Cc: Brian Foster <bfoster@redhat.com>
Fixes: fd43cf60 ("xfs: set aside allocation btree blocks from block reservation")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Conflicts:
	fs/xfs/xfs_mount.h
	[ 15f04fdc("xfs: remove infinite loop when reserving
	  free block pool") applied earlier. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

c5472995

xfs: set aside allocation btree blocks from block reservation · 2ce4ef8b

Committed by Brian Foster on Apr 04, 2023

mainline inclusion
from mainline-v5.13-rc1
commit fd43cf60
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd43cf600cf61c66ae0a1021aca2f636115c7fcb

--------------------------------

The blocks used for allocation btrees (bnobt and countbt) are
technically considered free space. This is because as free space is
used, allocbt blocks are removed and naturally become available for
traditional allocation. However, this means that a significant
portion of free space may consist of in-use btree blocks if free
space is severely fragmented.

On large filesystems with large perag reservations, this can lead to
a rare but nasty condition where a significant amount of physical
free space is available, but the majority of actual usable blocks
consist of in-use allocbt blocks. We have a record of a (~12TB, 32
AG) filesystem with multiple AGs in a state with ~2.5GB or so free
blocks tracked across ~300 total allocbt blocks, but effectively at
100% full because the free space is entirely consumed by
refcountbt perag reservation.

Such a large perag reservation is by design on large filesystems.
The problem is that because the free space is so fragmented, this AG
contributes the 300 or so allocbt blocks to the global counters as
free space. If this pattern repeats across enough AGs, the
filesystem lands in a state where global block reservation can
outrun physical block availability. For example, a streaming
buffered write on the affected filesystem continues to allow delayed
allocation beyond the point where writeback starts to fail due to
physical block allocation failures. The expected behavior is for the
delalloc block reservation to fail gracefully with -ENOSPC before
physical block allocation failure is a possibility.

To address this problem, set aside in-use allocbt blocks at
reservation time and thus ensure they cannot be reserved until truly
available for physical allocation. This allows alloc btree metadata
to continue to reside in free space, but dynamically adjusts
reservation availability based on internal state. Note that the
logic requires that the allocbt counter is fully populated at
reservation time before it is fully effective. We currently rely on
the mount time AGF scan in the perag reservation initialization code
for this dependency on filesystems where it's most important (i.e.
with active perag reservations).
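
A rough model of the reservation-time check may help. The sketch below is an illustration only: `struct fs_state` and `try_reserve()` are hypothetical names, and the real reservation path uses per-CPU counters and batching that are left out here.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical, simplified view of the global block accounting. */
struct fs_state {
	int64_t fdblocks;      /* free blocks, including in-use allocbt blocks */
	int64_t set_aside;     /* fixed set-aside for internal operations */
	int64_t allocbt_blks;  /* in-use allocation btree blocks */
};

/*
 * Reserve nblocks for a delayed allocation. In-use allocbt blocks are
 * counted in fdblocks but are treated as unavailable, so the request
 * fails with -ENOSPC before physical allocation could fail later.
 */
static int try_reserve(struct fs_state *fs, int64_t nblocks)
{
	int64_t unavailable = fs->set_aside + fs->allocbt_blks;

	if (fs->fdblocks - nblocks < unavailable)
		return -ENOSPC;

	fs->fdblocks -= nblocks;
	return 0;
}
```

Because `allocbt_blks` is read at reservation time, the amount withheld shrinks on its own as the btrees shrink, which matches the "dynamically adjusts" behaviour described above.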

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

2ce4ef8b

xfs: introduce in-core global counter of allocbt blocks · dccd68f7

Committed by Brian Foster on Apr 04, 2023

mainline inclusion
from mainline-v5.13-rc1
commit 16eaab83
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=16eaab839a9273ed156ebfccbd40c15d1e72f3d8

--------------------------------

Introduce an in-core counter to track the sum of all allocbt blocks
used by the filesystem. This value is currently tracked per-ag via
the ->agf_btreeblks field in the AGF, which also happens to include
rmapbt blocks. A global, in-core count of allocbt blocks is required
to identify the subset of global ->m_fdblocks that consists of
unavailable blocks currently used for allocation btrees. To support
this calculation at block reservation time, construct a similar
global counter for allocbt blocks, populate it on first read of each
AGF and update it as allocbt blocks are used and released.
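
The life cycle of such a counter can be sketched in user-space C11. Everything here is a hypothetical model: the type and function names are invented for illustration, and the per-AG count passed in is assumed to already exclude rmapbt blocks.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical global state: one counter for the whole filesystem. */
struct mount_counters {
	atomic_int_fast64_t allocbt_blks;
};

/* Hypothetical per-AG state: remembers whether its AGF was read. */
struct ag_info {
	bool agf_seen;
};

/* Populate the global counter on the first read of each AGF only. */
static void agf_first_read(struct mount_counters *mp, struct ag_info *ag,
			   int64_t ag_allocbt_blks)
{
	if (ag->agf_seen)
		return;
	ag->agf_seen = true;
	atomic_fetch_add(&mp->allocbt_blks, ag_allocbt_blks);
}

/* Keep the counter current as allocbt blocks are used and released. */
static void allocbt_block_used(struct mount_counters *mp)
{
	atomic_fetch_add(&mp->allocbt_blks, 1);
}

static void allocbt_block_released(struct mount_counters *mp)
{
	atomic_fetch_sub(&mp->allocbt_blks, 1);
}

static int64_t allocbt_blocks(struct mount_counters *mp)
{
	return atomic_load(&mp->allocbt_blks);
}
```

The `agf_seen` flag is what keeps a second read of the same AGF from counting its blocks twice.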

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

dccd68f7

act_mirred: use the backlog for nested calls to mirred ingress · 8ca10d76

Committed by Davide Caratti on Apr 04, 2023

mainline inclusion
from mainline-v6.3-rc1
commit ca22da2f
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I64END
CVE: CVE-2022-4269

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.2-rc7&id=ca22da2fbd693b54dc8e3b7b54ccc9f7e9ba3640

--------------------------------

William reports kernel soft-lockups on some OVS topologies when TC mirred
egress->ingress action is hit by local TCP traffic [1].
The same can also be reproduced with SCTP (thanks Xin for verifying), when
client and server reach themselves through mirred egress to ingress, and
one of the two peers sends a "heartbeat" packet (from within a timer).

Enqueueing to backlog proved to fix this soft lockup; however, as Cong
noticed [2], we should preserve - when possible - the current mirred
behavior that counts as "overlimits" any eventual packet drop subsequent to
the mirred forwarding action [3]. A compromise solution might use the
backlog only when tcf_mirred_act() has a nest level greater than one:
change tcf_mirred_forward() accordingly.
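
The compromise amounts to a single decision on the nest level, sketched below. The enum and function names are hypothetical and only model the choice; which kernel receive functions sit behind each path is not stated in this message and is left out.

```c
/* Hypothetical labels for the two ways of delivering to ingress. */
enum mirred_rx_path {
	RX_DIRECT,   /* deliver immediately, so later drops can be counted */
	RX_BACKLOG,  /* enqueue to the backlog and return */
};

/*
 * Use the backlog only for nested calls (nest level greater than one).
 * A first-level call keeps the existing behaviour, so a drop after the
 * forward can still be accounted as "overlimits".
 */
static enum mirred_rx_path pick_ingress_path(unsigned int nest_level)
{
	return nest_level > 1 ? RX_BACKLOG : RX_DIRECT;
}
```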

Also, add a kselftest that can reproduce the lockup and verifies TC mirred
ability to account for further packet drops after TC mirred egress->ingress
(when the nest level is 1).

 [1] https://lore.kernel.org/netdev/33dc43f587ec1388ba456b4915c75f02a8aae226.1663945716.git.dcaratti@redhat.com/
 [2] https://lore.kernel.org/netdev/Y0w%2FWWY60gqrtGLp@pop-os.localdomain/
 [3] such behavior is not guaranteed: for example, if RPS or skb RX
     timestamping is enabled on the mirred target device, the kernel
     can defer receiving the skb and return NET_RX_SUCCESS inside
     tcf_mirred_forward().

Reported-by: William Zhao <wizhao@redhat.com>
CC: Xin Long <lucien.xin@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Conflicts:
	net/sched/act_mirred.c
	tools/testing/selftests/net/forwarding/tc_actions.sh
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

8ca10d76

net/sched: act_mirred: better wording on protection against excessive stack growth · bcb80550

Committed by Davide Caratti on Apr 04, 2023

mainline inclusion
from mainline-v6.3-rc1
commit 78dcdffe
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I64END
CVE: CVE-2022-4269

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.2-rc7&id=78dcdffe0418ac8f3f057f26fe71ccf4d8ed851f

--------------------------------

With commit e2ca070f ("net: sched: protect against stack overflow in
TC act_mirred"), act_mirred protected itself against excessive stack growth
using a per_cpu counter of nested calls to tcf_mirred_act(), and capping it
to MIRRED_RECURSION_LIMIT. However, such protection does not detect
recursion/loops in case the packet is enqueued to the backlog (for example,
when the mirred target device has RPS or skb timestamping enabled). Change
the wording from "recursion" to "nesting" to make it more clear to readers.

CC: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

bcb80550

net/sched: act_mirred: refactor the handle of xmit · 8ef0878f

Committed by wenxu on Apr 04, 2023

mainline inclusion
from mainline-v5.11-rc1
commit fa6d6399
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I64END
CVE: CVE-2022-4269

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.2-rc7&id=fa6d639930ee5cd3f932cc314f3407f07a06582d

--------------------------------

This patch prepares for the next one.

Signed-off-by: wenxu <wenxu@ucloud.cn>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Conflicts:
	include/net/sch_generic.h
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
Reviewed-by: Liu Jian <liujian56@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

8ef0878f

scsi: scsi_dh_alua: fix memleak for 'qdata' in alua_activate() · 394dcb96

Committed by Yu Kuai on Apr 04, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6NBAZ
CVE: NA

--------------------------------

If alua_rtpg_queue() fails when called from alua_activate(), 'qdata' is not
freed, which causes the following memleak:

unreferenced object 0xffff88810b2c6980 (size 32):
  comm "kworker/u16:2", pid 635322, jiffies 4355801099 (age 1216426.076s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    40 39 24 c1 ff ff ff ff 00 f8 ea 0a 81 88 ff ff  @9$.............
  backtrace:
    [<0000000098f3a26d>] alua_activate+0xb0/0x320
    [<000000003b529641>] scsi_dh_activate+0xb2/0x140
    [<000000007b296db3>] activate_path_work+0xc6/0xe0 [dm_multipath]
    [<000000007adc9ace>] process_one_work+0x3c5/0x730
    [<00000000c457a985>] worker_thread+0x93/0x650
    [<00000000cb80e628>] kthread+0x1ba/0x210
    [<00000000a1e61077>] ret_from_fork+0x22/0x30

Fix the problem by freeing 'qdata' in the error path.
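
The shape of such a fix can be shown with a user-space sketch. This is not the driver code: `queue_work()`, `activate()`, and `worker_done()` are hypothetical stand-ins, the `-1` return value is a placeholder, and `live_allocations` exists only so the leak is observable.

```c
#include <stdbool.h>
#include <stdlib.h>

struct queue_data {
	int payload;
};

static struct queue_data *pending;  /* item currently owned by the worker */
static int live_allocations;        /* bookkeeping for the illustration */

/* Hypothetical queueing step: takes ownership of q only on success. */
static bool queue_work(struct queue_data *q, bool can_queue)
{
	if (!can_queue)
		return false;
	pending = q;
	return true;
}

static int activate(bool can_queue)
{
	struct queue_data *qdata = calloc(1, sizeof(*qdata));

	if (!qdata)
		return -1;
	live_allocations++;

	if (!queue_work(qdata, can_queue)) {
		/* Error path: nobody else owns qdata, so free it here. */
		free(qdata);
		live_allocations--;
		return -1;
	}
	return 0;
}

/* The worker frees the item once it has finished with it. */
static void worker_done(void)
{
	if (pending) {
		free(pending);
		pending = NULL;
		live_allocations--;
	}
}
```

The point is the ownership rule: the allocation belongs to the caller until queueing succeeds, so the failure branch is the only place that can release it.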

Fixes: 625fe857 ("scsi: scsi_dh_alua: Check scsi_device_get() return value")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

394dcb96

anolis: bond: broadcast ARP or ND messages to all slaves · 16dce2a8

Committed by Tony Lu on Jan 10, 2023

anolis inclusion
from devel-5.10-v5.10.134-12
commit b90e28f7170e1ae40c572f9f80a50bbdc8f8b99f
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I697AN
CVE: NA

Reference: https://gitee.com/anolis/cloud-kernel/commit/b90e28f7170e1ae40c572f9f80a50bbdc8f8b99f

---------------------------

OpenAnolis Bug Tracker:0000282

This is achieved by broadcasting ARP or ND packets to all of its slave
devices on transmit side. The switch will take further actions based on
proper configuration.

A new sysctl knob "net.bonding.broadcast_arp_or_nd" is introduced which
controls the behaviour of broadcasting.

Signed-off-by: Tony Lu <tonylu@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Qiao Ma <mqaio@linux.alibaba.com>
Reviewed-by: Shile Zhang <shile.zhang@linux.alibaba.com>
Acked-by: Dust Li <dust.li@linux.alibaba.com>
Signed-off-by: Wang Yufen <wangyufen@huawei.com>

16dce2a8

03 Apr, 2023 3 commits

!558 net: hns3: modify reset delay time to avoid configuration timeout · c46c9551

Committed by openeuler-ci-bot on Apr 03, 2023

Merge Pull Request from: @svishen

Currently the VF function reset needs to delay 5000ms for stack recovery.
This is too long for product configurations and causes configuration failures. According to tests, a 500ms delay is enough for the reset process except for PF FLR. So this patch adapts the delay in these scenarios.

issue:
https://gitee.com/openeuler/kernel/issues/I6SLBO

Link:https://gitee.com/openeuler/kernel/pulls/558

Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

c46c9551

drm/loongson: fix compile warnning · 1cdcd126

Committed by Sui Jingfeng on Mar 31, 2023

LoongArch inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6RRGJ

--------------------------------

Fix compile warning by removing an unused function and variable.

Signed-off-by: Sui Jingfeng <15330273260@189.cn>
Change-Id: I45a9adabcadd4f02e69fab44207aef4883e3fdb9
(cherry picked from commit c24ddfb3)

1cdcd126

net: hns3: modify reset delay time to avoid configuration timeout · ceabd1fd

Committed by Jie Wang on Apr 03, 2023

driver inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6SLBO
CVE: NA

----------------------------------------------------------------------

Currently the VF function reset needs to delay 5000ms for stack recovery.
This is too long for product configurations and causes configuration
failures. According to tests, a 500ms delay is enough for the reset process
except for PF FLR. So this patch adapts the delay in these scenarios.

Signed-off-by: Jie Wang <wangjie125@huawei.com>

ceabd1fd

01 Apr, 2023 1 commit

!541 fix CVE-2023-0266 · 0a481a3e

Committed by openeuler-ci-bot on Mar 31, 2023

Merge Pull Request from: @barry19901226 
 
fix CVE-2023-0266 
 
Link:https://gitee.com/openeuler/kernel/pulls/541 

Reviewed-by: Zucheng Zheng <zhengzucheng@huawei.com> 
Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

0a481a3e

31 Mar, 2023 1 commit

ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF · 7fae8c8b

Committed by Clement Lecigne on Mar 31, 2023

stable inclusion
from stable-v5.10.162
commit df02234e6b87d2a9a82acd3198e44bdeff8488c6
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6AOWP
CVE: CVE-2023-0266

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=df02234e6b87d2a9a82acd3198e44bdeff8488c6

--------------------------------

[ Note: this is a fix that works around the bug equivalently as the
  two upstream commits:
   1fa4445f ("ALSA: control - introduce snd_ctl_notify_one() helper")
   56b88b50 ("ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF")
  but in a simpler way to fit with older stable trees -- tiwai ]

Add missing locking in ctl_elem_read_user/ctl_elem_write_user which can be
easily triggered and turned into a use-after-free.

Example code paths with SNDRV_CTL_IOCTL_ELEM_READ:

64-bits:
snd_ctl_ioctl
  snd_ctl_elem_read_user
    [takes controls_rwsem]
    snd_ctl_elem_read [lock properly held, all good]
    [drops controls_rwsem]

32-bits (compat):
snd_ctl_ioctl_compat
  snd_ctl_elem_write_read_compat
    ctl_elem_write_read
      snd_ctl_elem_read [missing lock, not good]

CVE-2023-0266 was assigned for this issue.

Signed-off-by: Clement Lecigne <clecigne@google.com>
Cc: stable@kernel.org # 5.12 and older
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Reviewed-by: Jaroslav Kysela <perex@perex.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Hui Tang <tanghui20@huawei.com>

7fae8c8b

29 Mar, 2023 10 commits

!529 Backport CVEs and bugfixes · 358a5fb9

Committed by openeuler-ci-bot on Mar 29, 2023

Merge Pull Request from: @zhangjialin11 
 
Pull new CVEs:
CVE-2023-1281
CVE-2022-48423
CVE-2023-1249
CVE-2022-48425
CVE-2022-48424
CVE-2023-28327
CVE-2023-28466
CVE-2023-1380

block and md/raid6 bugfixes from Zhong Jinghua
fs bugfixes from Zhihao Cheng and Baokun Li
tty bugfix from Yi Yang
mm bugfixes from ZhangPeng and Ze Zuo
bpf bugfixes from Pu Lehui and Liu Jian
ima bugfix from GUO Zihua
softirq and arch bugfixes from Lin Yujun 
 
Link:https://gitee.com/openeuler/kernel/pulls/529 

Reviewed-by: Zheng Zengkai <zhengzengkai@huawei.com> 
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

358a5fb9

block: fix use-after-free of q->q_usage_counter · 3a651d39

Committed by Ming Lei on Mar 29, 2023

mainline inclusion
from mainline-v6.2-rc1
commit d36a9ea5
category: bugfix
bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I5N162

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d36a9ea5e7766961e753ee38d4c331bbe6ef659b

----------------------------------------

For blk-mq, queue release handler is usually called after
blk_mq_freeze_queue_wait() returns. However, the
q_usage_counter->release() handler may not be run yet at that time, so
this can cause a use-after-free.

Fix the issue by moving percpu_ref_exit() into blk_free_queue_rcu().
Since ->release() is called with rcu read lock held, it is agreed that
the race should be covered in caller per discussion from the two links.

Reported-by: Zhang Wensheng <zhangwensheng@huaweicloud.com>
Reported-by: Zhong Jinghua <zhongjinghua@huawei.com>
Link: https://lore.kernel.org/linux-block/Y5prfOjyyjQKUrtH@T590/T/#u
Link: https://lore.kernel.org/lkml/Y4%2FmzMd4evRg9yDi@fedora/
Cc: Hillf Danton <hdanton@sina.com>
Cc: Yu Kuai <yukuai3@huawei.com>
Cc: Dennis Zhou <dennis@kernel.org>
Fixes: 2b0d3d3e ("percpu_ref: reduce memory footprint of percpu_ref in fast path")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20221215021629.74870-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

conflicts:
 block/blk-sysfs.c
 block/blk-core.c
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

3a651d39

block: move q_usage_counter release into blk_queue_release · cde762ef

Committed by Ming Lei on Mar 29, 2023

mainline inclusion
from mainline-v5.18-rc1
commit ba3e8456
category: bugfix
bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I5N162

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ba3e845665fbbb0252336f27200cd5cf288a3573

----------------------------------------

After blk_cleanup_queue() returns, the disk may not have been released yet,
so a bio may still be submitted and ->q_usage_counter may still be touched.
So far this seems safe, but it is not good from the API's point of view.

Move the release of q_usage_counter into blk_queue_release().

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220308055200.735835-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

conflicts:
 block/blk-core.c
 block/blk-sysfs.c
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

cde762ef

Revert "block: fix null-deref in percpu_ref_put" · 60469540

Committed by Zhong Jinghua on Mar 29, 2023

hulk inclusion
category: bugfix
bugzilla: 187268, https://gitee.com/openeuler/kernel/issues/I5N162

----------------------------------------

This reverts commit 51e35e67.

Mainline now has a new fix for this problem, so revert this patch and
return to the mainline solution.

mainline patch:
d36a9ea5 ("block: fix use-after-free of q->q_usage_counter")

Fixes: 51e35e67 ("block: fix null-deref in percpu_ref_put")
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

60469540

mm: compaction: avoid possible NULL pointer dereference in kcompactd_cpu_online · 4928d2df

Committed by Miaohe Lin on Mar 29, 2023

mainline inclusion
from mainline-v6.3-rc1
commit 3109de30
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6POXN
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3109de308987ceae413ee015038d51e2a86c7806

--------------------------------

It's possible that kcompactd_run could fail to run kcompactd for a hot
added node and leave pgdat->kcompactd as NULL.  So pgdat->kcompactd should
be checked here to avoid possible NULL pointer dereference.

Link: https://lkml.kernel.org/r/20220418141253.24298-10-linmiaohe@huawei.com
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Charan Teja Kalla <charante@codeaurora.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Pintu Kumar <pintu@codeaurora.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ze Zuo <zuoze1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

4928d2df

md/raid6: Fix the problem of repeatedly applying for memory in raid5_read_one_chunk · 5c00a00f

Committed by Zhong Jinghua on Mar 29, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6OSXU

----------------------------------------

Commit "md/raid6: refactor raid5_read_one_chunk" merged the code
incorrectly.
Allocating the memory a second time leads to a memory leak.

Fix it by removing the redundant memory allocation code.

Fixes: c13c2cd2 ("md/raid6: refactor raid5_read_one_chunk")
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

5c00a00f

xfs, iomap: limit individual ioend chain lengths in writeback · c5883137

Committed by Dave Chinner on Mar 29, 2023

mainline inclusion
from mainline-v5.17-rc3
commit ebb7fb15
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebb7fb1557b1d03b906b668aa2164b51e6b7d19a

--------------------------------

Trond Myklebust reported soft lockups in XFS IO completion such as
this:

watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106]
CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1
Workqueue: xfs-conv/md127 xfs_end_io [xfs]
RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20
Call Trace:
wake_up_page_bit+0x8a/0x110
iomap_finish_ioend+0xd7/0x1c0
iomap_finish_ioends+0x7f/0xb0
xfs_end_ioend+0x6b/0x100 [xfs]
xfs_end_io+0xb9/0xe0 [xfs]
process_one_work+0x1a7/0x360
worker_thread+0x1fa/0x390
kthread+0x116/0x130
ret_from_fork+0x35/0x40

Ioends are processed as an atomic completion unit when all the
chained bios in the ioend have completed their IO. Logically
contiguous ioends can also be merged and completed as a single,
larger unit. Both of these things can be problematic as the
bio chains per ioend and the size of the merged ioends processed as
a single completion are both unbounded.

If we have a large sequential dirty region in the page cache,
write_cache_pages() will keep feeding us sequential pages and we
will keep mapping them into ioends and bios until we get a dirty
page at a non-sequential file offset. These large sequential runs
can result in bio and ioend chaining to optimise the io
patterns. The pages under writeback are pinned within these chains
until the submission chaining is broken, allowing the entire chain
to be completed. This can result in huge chains being processed
in IO completion context.

We get deep bio chaining if we have large contiguous physical
extents. We will keep adding pages to the current bio until it is
full, then we'll chain a new bio to keep adding pages for writeback.
Hence we can build bio chains that map millions of pages and tens of
gigabytes of RAM if the page cache contains big enough contiguous
dirty file regions. This long bio chain pins those pages until the
final bio in the chain completes and the ioend can iterate all the
chained bios and complete them.

OTOH, if we have a physically fragmented file, we end up submitting
one ioend per physical fragment that each have a small bio or bio
chain attached to them. We do not chain these at IO submission time,
but instead we chain them at completion time based on file
offset via iomap_ioend_try_merge(). Hence we can end up with unbound
ioend chains being built via completion merging.

XFS can then do COW remapping or unwritten extent conversion on that
merged chain, which involves walking an extent fragment at a time
and running a transaction to modify the physical extent information.
IOWs, we merge all the discontiguous ioends together into a
contiguous file range, only to then process them individually as
discontiguous extents.

This extent manipulation is computationally expensive and can run in
a tight loop, so merging logically contiguous but physically
discontiguous ioends gains us nothing except for hiding the fact
that we broke the ioends up into individual physical extents at
submission and then need to loop over those individual physical
extents at completion.

Hence we need to have mechanisms to limit ioend sizes and
to break up completion processing of large merged ioend chains:

1. bio chains per ioend need to be bound in length. Pure overwrites
go straight to iomap_finish_ioend() in softirq context with the
exact bio chain attached to the ioend by submission. Hence the only
way to prevent long holdoffs here is to bound ioend submission
sizes because we can't reschedule in softirq context.

2. iomap_finish_ioends() has to handle unbound merged ioend chains
correctly. This relies on any one call to iomap_finish_ioend() being
bound in runtime so that cond_resched() can be issued regularly as
the long ioend chain is processed. i.e. this relies on mechanism #1
to limit individual ioend sizes to work correctly.

3. filesystems have to loop over the merged ioends to process
physical extent manipulations. This means they can loop internally,
and so we break merging at physical extent boundaries so the
filesystem can easily insert reschedule points between individual
extent manipulations.
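
Mechanism 1 can be pictured as a cap that is checked before each page is added to the current ioend. The sketch below is a simplified model with hypothetical names, and the limit of 4 pages is deliberately tiny for illustration; the value used by the actual patch is not stated in this message.

```c
#include <stddef.h>

/* Hypothetical cap on pages per ioend, kept tiny for illustration. */
#define IOEND_PAGE_LIMIT 4

struct ioend {
	size_t pages;  /* pages mapped into this ioend so far */
};

/* A page may join the current ioend only while it is under the cap. */
static int can_add_page(const struct ioend *io)
{
	return io->pages < IOEND_PAGE_LIMIT;
}

/*
 * Map a run of sequential dirty pages and return how many ioends were
 * needed. Without the cap, the whole run would land in one ioend.
 */
static size_t map_sequential_pages(size_t npages)
{
	struct ioend cur = { 0 };
	size_t nr_ioends = 0;

	for (size_t i = 0; i < npages; i++) {
		if (nr_ioends == 0 || !can_add_page(&cur)) {
			cur.pages = 0;  /* start a new, bounded ioend */
			nr_ioends++;
		}
		cur.pages++;
	}
	return nr_ioends;
}
```

Bounding each ioend bounds the work done by any single completion call, which is what mechanism 2 relies on to place reschedule points between ioends.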

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reported-and-tested-by: Trond Myklebust <trondmy@hammerspace.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Conflicts:
include/linux/iomap.h
fs/iomap/buffered-io.c
fs/xfs/xfs_aops.c

[ 6e552494 ("iomap: remove unused private field from ioend")
is not applied.
95c4cd05 ("iomap: Convert to_iomap_page to take a folio") is
not applied.
8ffd74e9 ("iomap: Convert bio completions to use folios") is
not applied.
044c6449 ("xfs: drop unused ioend private merge and
setfilesize code") is not applied. ]
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

c5883137

net/sched: tcindex: search key must be 16 bits · 01359ed3

Committed by Pedro Tammela on Mar 29, 2023

stable inclusion
from stable-v5.10.168
commit 4fe9950815e19051b7b8268b4d4c3ac286a741bf
category: bugfix
bugzilla: 188576, https://gitee.com/src-openeuler/kernel/issues/I6OP9S
CVE: CVE-2023-1281

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4fe9950815e19051b7b8268b4d4c3ac286a741bf

---------------------------

[ Upstream commit 42018a32 ]

Syzkaller found an issue where a handle greater than 16 bits would trigger
a null-ptr-deref in the imperfect hash area update.

general protection fault, probably for non-canonical address
0xdffffc0000000015: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x00000000000000a8-0x00000000000000af]
CPU: 0 PID: 5070 Comm: syz-executor456 Not tainted
6.2.0-rc7-syzkaller-00112-gc68f345b #0
Hardware name: Google Google Compute Engine/Google Compute Engine,
BIOS Google 01/21/2023
RIP: 0010:tcindex_set_parms+0x1a6a/0x2990 net/sched/cls_tcindex.c:509
Code: 01 e9 e9 fe ff ff 4c 8b bd 28 fe ff ff e8 0e 57 7d f9 48 8d bb
a8 00 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <80> 3c
02 00 0f 85 94 0c 00 00 48 8b 85 f8 fd ff ff 48 8b 9b a8 00
RSP: 0018:ffffc90003d3ef88 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000015 RSI: ffffffff8803a102 RDI: 00000000000000a8
RBP: ffffc90003d3f1d8 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff88801e2b10a8
R13: dffffc0000000000 R14: 0000000000030000 R15: ffff888017b3be00
FS: 00005555569af300(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056041c6d2000 CR3: 000000002bfca000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
tcindex_change+0x1ea/0x320 net/sched/cls_tcindex.c:572
tc_new_tfilter+0x96e/0x2220 net/sched/cls_api.c:2155
rtnetlink_rcv_msg+0x959/0xca0 net/core/rtnetlink.c:6132
netlink_rcv_skb+0x165/0x440 net/netlink/af_netlink.c:2574
netlink_unicast_kernel net/netlink/af_netlink.c:1339 [inline]
netlink_unicast+0x547/0x7f0 net/netlink/af_netlink.c:1365
netlink_sendmsg+0x91b/0xe10 net/netlink/af_netlink.c:1942
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
____sys_sendmsg+0x334/0x8c0 net/socket.c:2476
___sys_sendmsg+0x110/0x1b0 net/socket.c:2530
__sys_sendmmsg+0x18f/0x460 net/socket.c:2616
__do_sys_sendmmsg net/socket.c:2645 [inline]
__se_sys_sendmmsg net/socket.c:2642 [inline]
__x64_sys_sendmmsg+0x9d/0x100 net/socket.c:2642
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80

Fixes: ee059170 ("net/sched: tcindex: update imperfect hash filters respecting rcu")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

01359ed3

net/sched: tcindex: update imperfect hash filters respecting rcu · c5eaa264

Committed by Pedro Tammela on Mar 29, 2023

stable inclusion
from stable-v5.10.168
commit eb8e9d8572d1d9df17272783ad8a84843ce559d4
category: bugfix
bugzilla: 188576, https://gitee.com/src-openeuler/kernel/issues/I6OP9S
CVE: CVE-2023-1281

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb8e9d8572d1d9df17272783ad8a84843ce559d4

---------------------------

commit ee059170 upstream.

The imperfect hash area can be updated while packets are traversing,
which will cause a use-after-free when 'tcf_exts_exec()' is called
with the destroyed tcf_ext.

CPU 0:               CPU 1:
tcindex_set_parms    tcindex_classify
tcindex_lookup
                     tcindex_lookup
tcf_exts_change
                     tcf_exts_exec [UAF]

Stop operating on the shared area directly by working on a local copy,
and update the filter with 'rcu_replace_pointer()'. Delete the old
filter version only after an RCU grace period has elapsed.
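
The copy, publish, then reclaim sequence described above can be
sketched in userspace C. This is only an analogy, not the kernel code:
C11 atomics stand in for 'rcu_replace_pointer()', and every demo_* name
is hypothetical. demo_wait_for_readers() is an empty placeholder for
waiting out an RCU grace period, so the sketch is only safe when no
concurrent readers exist.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a filter entry; not the kernel's struct. */
struct demo_filter {
	int classid;
};

/* Shared slot that readers dereference; starts out NULL. */
static _Atomic(struct demo_filter *) demo_slot;

/*
 * Placeholder for an RCU grace period. A real implementation must block
 * until every reader that could still hold the old pointer has finished.
 * This sketch has no concurrent readers, so it does nothing.
 */
static void demo_wait_for_readers(void)
{
}

/* Update a private copy, publish it atomically, then free the old one. */
static int demo_update(int new_classid)
{
	struct demo_filter *old = atomic_load(&demo_slot);
	struct demo_filter *copy = malloc(sizeof(*copy));

	if (!copy)
		return -1;
	if (old)
		*copy = *old;		/* start from the current contents */
	copy->classid = new_classid;	/* modify only the private copy */

	old = atomic_exchange(&demo_slot, copy);	/* publish new version */
	demo_wait_for_readers();	/* old version may still be in use */
	free(old);			/* reclaim only after the wait */
	return 0;
}

/* Reader side: returns the current classid, or -1 if the slot is empty. */
static int demo_read(void)
{
	struct demo_filter *f = atomic_load(&demo_slot);

	return f ? f->classid : -1;
}
```

The point of the ordering is that a reader sees either the complete old
version or the complete new one, never an entry that is being modified
or has already been freed.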

Fixes: 9b0d4446 ("net: sched: avoid atomic swap in tcf_exts_change")
Reported-by: valis <sec@valis.email>
Suggested-by: valis <sec@valis.email>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Pedro Tammela <pctammela@mojatatu.com>
Link: https://lore.kernel.org/r/20230209143739.279867-1-pctammela@mojatatu.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Dong Chenchen <dongchenchen2@huawei.com>
Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

c5eaa264

tty: fix out-of-bounds access in tty_driver_lookup_tty() · 03106cb1

Committed by Sven Schnelle on Mar 29, 2023

stable inclusion
from stable-v5.10.173
commit 84ea44dc3e4ecb2632586238014bf6722aa5843b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6Q4F0
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=84ea44dc3e4ecb2632586238014bf6722aa5843b

--------------------------------

[ Upstream commit db4df8e9 ]

When specifying an invalid console= device like console=tty3270,
tty_driver_lookup_tty() returns the tty struct without checking
whether index is a valid number.
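
A minimal userspace sketch of the missing check, assuming a driver that
keeps a fixed-size table of tty pointers. The demo_* types and
demo_lookup() are hypothetical illustrations, not the kernel's
structures or the actual patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's tty and driver structures. */
struct demo_tty {
	int index;
};

struct demo_driver {
	unsigned int num;		/* number of slots in ttys[] */
	struct demo_tty **ttys;		/* table indexed by device number */
};

/*
 * Return the tty for idx, or NULL when idx falls outside the table.
 * Without the range check, an out-of-range idx reads past the end of
 * ttys[] and hands back whatever happens to be stored there.
 */
static struct demo_tty *demo_lookup(const struct demo_driver *drv, int idx)
{
	if (idx < 0 || (unsigned int)idx >= drv->num)
		return NULL;
	return drv->ttys[idx];
}
```

Rejecting the index before touching the table means an invalid device
number yields a clean failure instead of a pointer read from beyond the
array.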

To reproduce:

qemu-system-x86_64 -enable-kvm -nographic -serial mon:stdio \
-kernel ../linux-build-x86/arch/x86/boot/bzImage \
-append "console=ttyS0 console=tty3270"

This crashes with:

[    0.770599] BUG: kernel NULL pointer dereference, address: 00000000000000ef
[    0.771265] #PF: supervisor read access in kernel mode
[    0.771773] #PF: error_code(0x0000) - not-present page
[    0.772609] Oops: 0000 [#1] PREEMPT SMP PTI
[    0.774878] RIP: 0010:tty_open+0x268/0x6f0
[    0.784013]  chrdev_open+0xbd/0x230
[    0.784444]  ? cdev_device_add+0x80/0x80
[    0.784920]  do_dentry_open+0x1e0/0x410
[    0.785389]  path_openat+0xca9/0x1050
[    0.785813]  do_filp_open+0xaa/0x150
[    0.786240]  file_open_name+0x133/0x1b0
[    0.786746]  filp_open+0x27/0x50
[    0.787244]  console_on_rootfs+0x14/0x4d
[    0.787800]  kernel_init_freeable+0x1e4/0x20d
[    0.788383]  ? rest_init+0xc0/0xc0
[    0.788881]  kernel_init+0x11/0x120
[    0.789356]  ret_from_fork+0x22/0x30
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221209112737.3222509-2-svens@linux.ibm.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yi Yang <yiyang13@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Reviewed-by: Cai Xinchen <caixinchen1@huawei.com>
Reviewed-by: Wang Weiyang <wangweiyang2@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

03106cb1

openeuler / Kernel: synced successfully more than 1 year ago