提交 · dfaabffc48de87d58cc5691036d3f8c67149abbc · openeuler / Kernel

14 7月, 2023 2 次提交

ext4: Add debug message to notify user space is out of free · dfaabffc

由 Zhihao Cheng 提交于 7月 14, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS
CVE: NA

--------------------------------

Add debug message to notify user that ext4_writepages is stuck in loop
caused by ENOSPC.
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 4ae7e703)

dfaabffc

Revert "ext4: Stop trying writing pages if no free blocks generated" · 1dd1a93d

由 Zhihao Cheng 提交于 7月 14, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS
CVE: NA

--------------------------------

This reverts commit 07a8109d.

When ext4 runs out of space, there could be a potential data lost in
ext4_writepages:
If there are many preallocated blocks for some files, e4b bitmap is
different from block bitmap, and there are more free blocks accounted
by block bitmap.

    ext4_writepages                         P2
ext4_mb_new_blocks                  ext4_map_blocks
 ext4_mb_regular_allocator // No free bits in e4b bitmap
 ext4_mb_discard_preallocations_should_retry
  ext4_mb_discard_preallocations
   ext4_mb_discard_group_preallocations
    ext4_mb_release_inode_pa // updates e4b bitmap by pa->pa_free
     mb_free_blocks
                                     ext4_mb_new_blocks
                                      ext4_mb_regular_allocator
                                      // Got e4b bitmap's free bits
 ext4_mb_regular_allocator  // After 3 times retrying, ret ENOSPC

ext4_writepages
 mpage_map_and_submit_extent
  mpage_map_one_extent // ret ENOSPC
  if (err == -ENOSPC && EXT4_SB(sb)->s_mb_free_pending)
  // s_mb_free_pending is 0
  *give_up_on_write = true  // Abandon writeback, data lost!

Fixes: 07a8109d ("ext4: Stop trying writing pages if no free ...")
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 5f142164)

1dd1a93d

07 7月, 2023 1 次提交

ext4: turning quotas off if mount failed after enable quotas · 21acb3f9

由 Baokun Li 提交于 7月 01, 2023

mainline inclusion
from mainline-v6.5
commit d13f99632748462c32fc95d729f5e754bab06064
category: bugfix
bugzilla: 188906, https://gitee.com/openeuler/kernel/issues/I7E9M5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d13f99632748462c32fc95d729f5e754bab06064

--------------------------------

Yi found during a review of the patch "ext4: don't BUG on inconsistent
journal feature" that when ext4_mark_recovery_complete() returns an error
value, the error handling path does not turn off the enabled quotas,
which triggers the following kmemleak:

================================================================
unreferenced object 0xffff8cf68678e7c0 (size 64):
comm "mount", pid 746, jiffies 4294871231 (age 11.540s)
hex dump (first 32 bytes):
00 90 ef 82 f6 8c ff ff 00 00 00 00 41 01 00 00  ............A...
c7 00 00 00 bd 00 00 00 0a 00 00 00 48 00 00 00  ............H...
backtrace:
[<00000000c561ef24>] __kmem_cache_alloc_node+0x4d4/0x880
[<00000000d4e621d7>] kmalloc_trace+0x39/0x140
[<00000000837eee74>] v2_read_file_info+0x18a/0x3a0
[<0000000088f6c877>] dquot_load_quota_sb+0x2ed/0x770
[<00000000340a4782>] dquot_load_quota_inode+0xc6/0x1c0
[<0000000089a18bd5>] ext4_enable_quotas+0x17e/0x3a0 [ext4]
[<000000003a0268fa>] __ext4_fill_super+0x3448/0x3910 [ext4]
[<00000000b0f2a8a8>] ext4_fill_super+0x13d/0x340 [ext4]
[<000000004a9489c4>] get_tree_bdev+0x1dc/0x370
[<000000006e723bf1>] ext4_get_tree+0x1d/0x30 [ext4]
[<00000000c7cb663d>] vfs_get_tree+0x31/0x160
[<00000000320e1bed>] do_new_mount+0x1d5/0x480
[<00000000c074654c>] path_mount+0x22e/0xbe0
[<0000000003e97a8e>] do_mount+0x95/0xc0
[<000000002f3d3736>] __x64_sys_mount+0xc4/0x160
[<0000000027d2140c>] do_syscall_64+0x3f/0x90
================================================================

To solve this problem, we add a "failed_mount10" tag, and call
ext4_quota_off_umount() in this tag to release the enabled qoutas.

Fixes: 11215630 ("ext4: don't BUG on inconsistent journal feature")
Cc: stable@kernel.org
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230327141630.156875-2-libaokun1@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

Conflicts:
	fs/ext4/super.c
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
(cherry picked from commit e980e714)

21acb3f9

06 7月, 2023 1 次提交

ext4: Stop trying writing pages if no free blocks generated · ef111117

由 Zhihao Cheng 提交于 7月 05, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS

--------------------------------

Folllowing steps could make ext4_wripages trap into a dead loop:

1. Consume free_clusters until free_clusters > 2 * sbi->s_resv_clusters,
   and free_clusters > EXT4_FREECLUSTERS_WATERMARK.
   // eg. free_clusters = 1422, sbi->s_resv_clusters = 512
   // nr_cpus = 4, EXT4_FREECLUSTERS_WATERMARK = 512
2. umount && mount.  // dirty_clusters = 0
3. Run free_clusters tasks concurrently to write different files, many
   tasks write(appendant) 4K data by da_write method. And each inode will
   consume one data block and one extent block in map_block.
   // There are (free_clusters - EXT4_FREECLUSTERS_WATERMARK = 910)
   // tasks choosing da_write method, left 512 tasks choose write_begin
   // method. If tasks which chooses da_write path run first.
   // dirty_clusters = 910, free_clusters = 1422
   // Tasks which choose write_begin path will get ENOSPC:
   //  free_clusters < (nclusters + dirty_clusters + resv_clusters)
   //  1422 < (1 + 910 + 512)
4. After certain number of map_block iterations in ext4_writepages.
   // free_clusters = 0,
   // dirty_clusters = 910 - (1422 / 2) = 199
5. Delete one 4K file.  // free_clusters = 1
6. ext4_writepages traps into dead loop:
    mpage_map_and_submit_extent
     mpage_map_one_extent // ret = ENOSPC
       ext4_map_blocks -> ext4_ext_map_blocks -> ext4_mb_new_blocks ->
       ext4_claim_free_clusters:
         if (free_clusters >= (nclusters + dirty_clusters)) // false
     if (err == -ENOSPC && ext4_count_free_clusters(sb)) // true
       return err
     *give_up_on_write = true // won't be executed

Fix it by terminating ext4_writepages if no free blocks generated.
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 07a8109d)

ef111117

05 6月, 2023 1 次提交

Revert "ext4: dio take shared inode lock when overwriting preallocated blocks" · dab7ae63

由 Baokun Li 提交于 6月 05, 2023

hulk inclusion
category: perf
bugzilla: 188836, https://gitee.com/openeuler/kernel/issues/I7AYNZ

--------------------------------

This reverts commit 5193a88e.
This commit may cause performance degradation, so it is being reverted
temporarily.
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
(cherry picked from commit 6ff958e1)

dab7ae63

02 6月, 2023 1 次提交

ext4: mitigatin cacheline false sharing in struct ext4_inode_info · b18ddb66

由 Yang Yingliang 提交于 6月 02, 2023

hulk inclusion
category: performance
bugzilla: https://gitee.com/openeuler/kernel/issues/I7ADMY

--------------------------------

In the test of execl, shell1 and shell8 of UnixBench, L3 false sharing
occurs between rwsem_try_write_lock_unqueued() and filemap_map_pages().

The offset between address_space.host and address_space.i_mmap_rwsem
is 48. It may occur L3 false sharing. Their offsets in struct ext4_inode_info
is 696 and 744, so when the address of ext4_inode_info after L3 aligned,
it may occur L3 false sharing in the following condition:

[0x00 ~ 0x10] false sharing
[0x18 ~ 0x40] no false sharing
[0x48 ~ 0x80] false sharing

Change the offset of 'vfs_inode' from 320 to 360 in ext4_inode_info and
make the address of ext4_inode_info L3 aligned, so the offset of host
and i_mmap_rwsem in ext4_inode_info is changed to 736 and 784, it can
make them in different L3 to avoid false sharing.

./Run -c 96 -i 3 execl

Before this patch:
System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Execl Throughput                                 43.0      24238.0   5636.8
                                                                   ========
System Benchmarks Index Score (Partial Only)                         5636.8

After this patch:
System Benchmarks Partial Index              BASELINE       RESULT    INDEX
Execl Throughput                                 43.0      29363.7   6828.8
                                                                   ========
System Benchmarks Index Score (Partial Only)                         6828.8
Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>

b18ddb66

01 6月, 2023 1 次提交

ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum · 2794f826

由 Tudor Ambarus 提交于 5月 31, 2023

stable inclusion
from stable-v5.10.180
commit 0dde3141c527b09b96bef1e7eeb18b8127810ce9
category: bugfix
bugzilla: 188791,https://gitee.com/openeuler/kernel/issues/I76XUJ

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0dde3141c527b09b96bef1e7eeb18b8127810ce9

--------------------------------

commit 4f043518 upstream.

When modifying the block device while it is mounted by the filesystem,
syzbot reported the following:

BUG: KASAN: slab-out-of-bounds in crc16+0x206/0x280 lib/crc16.c:58
Read of size 1 at addr ffff888075f5c0a8 by task syz-executor.2/15586

CPU: 1 PID: 15586 Comm: syz-executor.2 Not tainted 6.2.0-rc5-syzkaller-00205-gc9661827 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/12/2023
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x1b1/0x290 lib/dump_stack.c:106
 print_address_description+0x74/0x340 mm/kasan/report.c:306
 print_report+0x107/0x1f0 mm/kasan/report.c:417
 kasan_report+0xcd/0x100 mm/kasan/report.c:517
 crc16+0x206/0x280 lib/crc16.c:58
 ext4_group_desc_csum+0x81b/0xb20 fs/ext4/super.c:3187
 ext4_group_desc_csum_set+0x195/0x230 fs/ext4/super.c:3210
 ext4_mb_clear_bb fs/ext4/mballoc.c:6027 [inline]
 ext4_free_blocks+0x191a/0x2810 fs/ext4/mballoc.c:6173
 ext4_remove_blocks fs/ext4/extents.c:2527 [inline]
 ext4_ext_rm_leaf fs/ext4/extents.c:2710 [inline]
 ext4_ext_remove_space+0x24ef/0x46a0 fs/ext4/extents.c:2958
 ext4_ext_truncate+0x177/0x220 fs/ext4/extents.c:4416
 ext4_truncate+0xa6a/0xea0 fs/ext4/inode.c:4342
 ext4_setattr+0x10c8/0x1930 fs/ext4/inode.c:5622
 notify_change+0xe50/0x1100 fs/attr.c:482
 do_truncate+0x200/0x2f0 fs/open.c:65
 handle_truncate fs/namei.c:3216 [inline]
 do_open fs/namei.c:3561 [inline]
 path_openat+0x272b/0x2dd0 fs/namei.c:3714
 do_filp_open+0x264/0x4f0 fs/namei.c:3741
 do_sys_openat2+0x124/0x4e0 fs/open.c:1310
 do_sys_open fs/open.c:1326 [inline]
 __do_sys_creat fs/open.c:1402 [inline]
 __se_sys_creat fs/open.c:1396 [inline]
 __x64_sys_creat+0x11f/0x160 fs/open.c:1396
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7f72f8a8c0c9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f72f97e3168 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
RAX: ffffffffffffffda RBX: 00007f72f8bac050 RCX: 00007f72f8a8c0c9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000280
RBP: 00007f72f8ae7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ffd165348bf R14: 00007f72f97e3300 R15: 0000000000022000

Replace
	le16_to_cpu(sbi->s_es->s_desc_size)
with
	sbi->s_desc_size

It reduces ext4's compiled text size, and makes the code more efficient
(we remove an extra indirect reference and a potential byte
swap on big endian systems), and there is no downside. It also avoids the
potential KASAN / syzkaller failure, as a bonus.

Reported-by: syzbot+fc51227e7100c9294894@syzkaller.appspotmail.com
Reported-by: syzbot+8785e41224a3afd04321@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=70d28d11ab14bd7938f3e088365252aa923cff42
Link: https://syzkaller.appspot.com/bug?id=b85721b38583ecc6b5e72ff524c67302abbc30f3
Link: https://lore.kernel.org/all/000000000000ece18705f3b20934@google.com/
Fixes: 717d50e4 ("Ext4: Uninitialized Block Groups")
Cc: stable@vger.kernel.org
Signed-off-by: NTudor Ambarus <tudor.ambarus@linaro.org>
Link: https://lore.kernel.org/r/20230504121525.3275886-1-tudor.ambarus@linaro.orgSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 0d4053b9)

2794f826

10 5月, 2023 2 次提交

ext4: check iomap type only if ext4_iomap_begin() does not fail · e9ab135b

由 Baokun Li 提交于 5月 09, 2023

maillist inclusion
category: bugfix
bugzilla: 188724, https://gitee.com/openeuler/kernel/issues/I70Q22

Reference: https://www.spinics.net/lists/kernel/msg4779681.html

----------------------------------------

When ext4_iomap_overwrite_begin() calls ext4_iomap_begin() map blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
later because "iomap->type ! = IOMAP_MAPPED" triggers WARN_ON(). When ext4
iomap_begin() returns an error, it is normal that the type of iomap->type
may not match the expectation. Therefore, we only determine if iomap->type
is as expected when ext4_iomap_begin() is executed successfully.

Reported-by: syzbot+08106c4b7d60702dbc14@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.comReviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

e9ab135b

ext4: only update i_reserved_data_blocks on successful block allocation · 9ecf9e08

由 Baokun Li 提交于 5月 09, 2023

maillist inclusion
category: bugfix
bugzilla: 188499, https://gitee.com/openeuler/kernel/issues/I6TNVT
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/patch/20230412124126.2286716-2-libaokun1@huawei.com/

----------------------------------------

In our fault injection test, we create an ext4 file, migrate it to
non-extent based file, then punch a hole and finally trigger a WARN_ON
in the ext4_da_update_reserve_space():

EXT4-fs warning (device sda): ext4_da_update_reserve_space:369:
ino 14, used 11 with only 10 reserved data blocks

When writing back a non-extent based file, if we enable delalloc, the
number of reserved blocks will be subtracted from the number of blocks
mapped by ext4_ind_map_blocks(), and the extent status tree will be
updated. We update the extent status tree by first removing the old
extent_status and then inserting the new extent_status. If the block range
we remove happens to be in an extent, then we need to allocate another
extent_status with ext4_es_alloc_extent().

       use old    to remove   to add new
    |----------|------------|------------|
              old extent_status

The problem is that the allocation of a new extent_status failed due to a
fault injection, and __es_shrink() did not get free memory, resulting in
a return of -ENOMEM. Then do_writepages() retries after receiving -ENOMEM,
we map to the same extent again, and the number of reserved blocks is again
subtracted from the number of blocks in that extent. Since the blocks in
the same extent are subtracted twice, we end up triggering WARN_ON at
ext4_da_update_reserve_space() because used > ei->i_reserved_data_blocks.

For non-extent based file, we update the number of reserved blocks after
ext4_ind_map_blocks() is executed, which causes a problem that when we call
ext4_ind_map_blocks() to create a block, it doesn't always create a block,
but we always reduce the number of reserved blocks. So we move the logic
for updating reserved blocks to ext4_ind_map_blocks() to ensure that the
number of reserved blocks is updated only after we do succeed in allocating
some new blocks.

Fixes: 5f634d06 ("ext4: Fix quota accounting error with fallocate")
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

9ecf9e08

19 4月, 2023 9 次提交

ext4: continue to expand file system when the target size doesn't reach · 5d5daa93

由 Jerry Lee 李修賢提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit 67de22cb0b6c6a6a71a472dbe8f0b3f3eb60565d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=67de22cb0b6c6a6a71a472dbe8f0b3f3eb60565d

--------------------------------

commit df3cb754 upstream.

When expanding a file system from (16TiB-2MiB) to 18TiB, the operation
exits early which leads to result inconsistency between resize2fs and
Ext4 kernel driver.

=== before ===
○ → resize2fs /dev/mapper/thin
resize2fs 1.45.5 (07-Jan-2020)
Filesystem at /dev/mapper/thin is mounted on /mnt/test; on-line resizing required
old_desc_blocks = 2048, new_desc_blocks = 2304
The filesystem on /dev/mapper/thin is now 4831837696 (4k) blocks long.

[  865.186308] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[  912.091502] dm-4: detected capacity change from 34359738368 to 38654705664
[  970.030550] dm-5: detected capacity change from 34359734272 to 38654701568
[ 1000.012751] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
[ 1000.012878] EXT4-fs (dm-5): resized filesystem to 4294967296

=== after ===
[  129.104898] EXT4-fs (dm-5): mounted filesystem with ordered data mode. Opts: (null). Quota mode: none.
[  143.773630] dm-4: detected capacity change from 34359738368 to 38654705664
[  198.203246] dm-5: detected capacity change from 34359734272 to 38654701568
[  207.918603] EXT4-fs (dm-5): resizing filesystem from 4294966784 to 4831837696 blocks
[  207.918754] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
[  207.918758] EXT4-fs (dm-5): Converting file system to meta_bg
[  207.918790] EXT4-fs (dm-5): resizing filesystem from 4294967296 to 4831837696 blocks
[  221.454050] EXT4-fs (dm-5): resized to 4658298880 blocks
[  227.634613] EXT4-fs (dm-5): resized filesystem to 4831837696
Signed-off-by: NJerry Lee <jerrylee@qnap.com>
Link: https://lore.kernel.org/r/PU1PR04MB22635E739BD21150DC182AC6A18C9@PU1PR04MB2263.apcprd04.prod.outlook.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

5d5daa93

ext4: update 'state->fc_regions_size' after successful memory allocation · 003c2297

由 Ye Bin 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit 2189756eabbb5f8366b4302183dfa3e98bfed196
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2189756eabbb5f8366b4302183dfa3e98bfed196

--------------------------------

commit 27cd4978 upstream.

To avoid to 'state->fc_regions_size' mismatch with 'state->fc_regions'
when fail to reallocate 'fc_reqions',only update 'state->fc_regions_size'
after 'state->fc_regions' is allocated successfully.

Cc: stable@kernel.org
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-4-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

003c2297

ext4: fix potential memory leak in ext4_fc_record_regions() · c3f730c9

由 Ye Bin 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit 2cfb769d60a2a57eb3566765428b6131cd16dcfe
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2cfb769d60a2a57eb3566765428b6131cd16dcfe

--------------------------------

commit 7069d105 upstream.

As krealloc may return NULL, in this case 'state->fc_regions' may not be
freed by krealloc, but 'state->fc_regions' already set NULL. Then will
lead to 'state->fc_regions' memory leak.

Cc: stable@kernel.org
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-3-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

c3f730c9

ext4: fix potential memory leak in ext4_fc_record_modified_inode() · 19725b16

由 Ye Bin 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit c9ce7766dc4e88e624c62a68221a3bbe8f06e856
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c9ce7766dc4e88e624c62a68221a3bbe8f06e856

--------------------------------

commit 9305721a upstream.

As krealloc may return NULL, in this case 'state->fc_modified_inodes'
may not be freed by krealloc, but 'state->fc_modified_inodes' already
set NULL. Then will lead to 'state->fc_modified_inodes' memory leak.

Cc: stable@kernel.org
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220921064040.3693255-2-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

19725b16

ext4: fix miss release buffer head in ext4_fc_write_inode · d60ef330

由 Ye Bin 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit d575fb52c46686f169774e409382af4b5990b215
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d575fb52c46686f169774e409382af4b5990b215

--------------------------------

commit ccbf8eeb upstream.

In 'ext4_fc_write_inode' function first call 'ext4_get_inode_loc' get 'iloc',
after use it miss release 'iloc.bh'.
So just release 'iloc.bh' before 'ext4_fc_write_inode' return.

Cc: stable@kernel.org
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220914100859.1415196-1-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

d60ef330

ext4: place buffer head allocation before handle start · 92d2a3a2

由 Jinke Han 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit 74d2a398d2d8c54d6468bc1e9da60ed9f3c4739f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=74d2a398d2d8c54d6468bc1e9da60ed9f3c4739f

--------------------------------

commit d1052d23 upstream.

In our product environment, we encounter some jbd hung waiting handles to
stop while several writters were doing memory reclaim for buffer head
allocation in delay alloc write path. Ext4 do buffer head allocation with
holding transaction handle which may be blocked too long if the reclaim
works not so smooth. According to our bcc trace, the reclaim time in
buffer head allocation can reach 258s and the jbd transaction commit also
take almost the same time meanwhile. Except for these extreme cases,
we often see several seconds delays for cgroup memory reclaim on our
servers. This is more likely to happen considering docker environment.

One thing to note, the allocation of buffer heads is as often as page
allocation or more often when blocksize less than page size. Just like
page cache allocation, we should also place the buffer head allocation
before startting the handle.

Cc: stable@kernel.org
Signed-off-by: NJinke Han <hanjinke.666@bytedance.com>
Link: https://lore.kernel.org/r/20220903012429.22555-1-hanjinke.666@bytedance.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

92d2a3a2

ext4: don't increase iversion counter for ea_inodes · 726645ce

由 Lukas Czerner 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit 0e1764ad71abca735418fd596a82067074b59687
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0e1764ad71abca735418fd596a82067074b59687

--------------------------------

commit 50f094a5 upstream.

ea_inodes are using i_version for storing part of the reference count so
we really need to leave it alone.

The problem can be reproduced by xfstest ext4/026 when iversion is
enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
inodes in ext4_mark_iloc_dirty().

Cc: stable@kernel.org
Signed-off-by: NLukas Czerner <lczerner@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
Reviewed-by: NChristian Brauner (Microsoft) <brauner@kernel.org>
Link: https://lore.kernel.org/r/20220824160349.39664-1-lczerner@redhat.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

726645ce

ext4: make ext4_lazyinit_thread freezable · 10c2364e

由 Lalith Rajendran 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit ac66db1a436504159463f811b9e9864c1d2b03f1
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ac66db1a436504159463f811b9e9864c1d2b03f1

--------------------------------

commit 3b575495 upstream.

ext4_lazyinit_thread is not set freezable. Hence when the thread calls
try_to_freeze it doesn't freeze during suspend and continues to send
requests to the storage during suspend, resulting in suspend failures.

Cc: stable@kernel.org
Signed-off-by: NLalith Rajendran <lalithkraj@google.com>
Link: https://lore.kernel.org/r/20220818214049.1519544-1-lalithkraj@google.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

10c2364e

ext4: avoid crash when inline data creation follows DIO write · 667e2790

由 Jan Kara 提交于 4月 18, 2023

stable inclusion
from stable-v5.10.150
commit fb98cb61efff3b2a1964939465ccaaf906af1d4f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fb98cb61efff3b2a1964939465ccaaf906af1d4f

--------------------------------

commit 4bb26f28 upstream.

When inode is created and written to using direct IO, there is nothing
to clear the EXT4_STATE_MAY_INLINE_DATA flag. Thus when inode gets
truncated later to say 1 byte and written using normal write, we will
try to store the data as inline data. This confuses the code later
because the inode now has both normal block and inline data allocated
and the confusion manifests for example as:

kernel BUG at fs/ext4/inode.c:2721!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 0 PID: 359 Comm: repro Not tainted 5.19.0-rc8-00001-g31ba1e3b8305-dirty #15
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
RIP: 0010:ext4_writepages+0x363d/0x3660
RSP: 0018:ffffc90000ccf260 EFLAGS: 00010293
RAX: ffffffff81e1abcd RBX: 0000008000000000 RCX: ffff88810842a180
RDX: 0000000000000000 RSI: 0000008000000000 RDI: 0000000000000000
RBP: ffffc90000ccf650 R08: ffffffff81e17d58 R09: ffffed10222c680b
R10: dfffe910222c680c R11: 1ffff110222c680a R12: ffff888111634128
R13: ffffc90000ccf880 R14: 0000008410000000 R15: 0000000000000001
FS:  00007f72635d2640(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000565243379180 CR3: 000000010aa74000 CR4: 0000000000150eb0
Call Trace:
 <TASK>
 do_writepages+0x397/0x640
 filemap_fdatawrite_wbc+0x151/0x1b0
 file_write_and_wait_range+0x1c9/0x2b0
 ext4_sync_file+0x19e/0xa00
 vfs_fsync_range+0x17b/0x190
 ext4_buffered_write_iter+0x488/0x530
 ext4_file_write_iter+0x449/0x1b90
 vfs_write+0xbcd/0xf40
 ksys_write+0x198/0x2c0
 __x64_sys_write+0x7b/0x90
 do_syscall_64+0x3d/0x90
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
 </TASK>

Fix the problem by clearing EXT4_STATE_MAY_INLINE_DATA when we are doing
direct IO write to a file.

Cc: stable@kernel.org
Reported-by: NTadeusz Struk <tadeusz.struk@linaro.org>
Reported-by: syzbot+bd13648a53ed6933ca49@syzkaller.appspotmail.com
Link: https://syzkaller.appspot.com/bug?id=a1e89d09bbbcbd5c4cb45db230ee28c822953984Signed-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NLukas Czerner <lczerner@redhat.com>
Tested-by: Tadeusz Struk<tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220727155753.13969-1-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

667e2790

12 4月, 2023 3 次提交

ext4: Fix i_disksize exceeding i_size problem in paritally written case · ae4b9933

由 Zhihao Cheng 提交于 4月 12, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6SMBI
CVE: NA

Reference: https://www.spinics.net/lists/linux-ext4/msg88386.html

--------------------------------

Following process makes i_disksize exceed i_size:

generic_perform_write
 copied = iov_iter_copy_from_user_atomic(len) // copied < len
 ext4_da_write_end
 | ext4_update_i_disksize
 |  new_i_size = pos + copied;
 |  WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize
 | generic_write_end
 |  copied = block_write_end(copied, len) // copied = 0
 |   if (unlikely(copied < len))
 |    if (!PageUptodate(page))
 |     copied = 0;
 |  if (pos + copied > inode->i_size) // return false
 if (unlikely(copied == 0))
  goto again;
 if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
  status = -EFAULT;
  break;
 }

We get i_disksize greater than i_size here, which could trigger WARNING
check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' while doing dio:

ext4_dio_write_iter
 iomap_dio_rw
  __iomap_dio_rw // return err, length is not aligned to 512
 ext4_handle_inode_extension
  WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops

 WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319
 CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2
 RIP: 0010:ext4_file_write_iter+0xbc7
 Call Trace:
  vfs_write+0x3b1
  ksys_write+0x77
  do_syscall_64+0x39

Fix it by updating 'copied' value before updating i_disksize just like
ext4_write_inline_data_end() does.

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209
Fixes: 64769240 ("ext4: Add delayed allocation support in data=writeback mode")
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

ae4b9933

ext4: ext4_put_super: Remove redundant checking for 'sbi->s_journal_bdev' · 2db2f17b

由 Zhihao Cheng 提交于 4月 12, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6MMUV
CVE: NA

Reference: https://www.spinics.net/lists/linux-ext4/msg88237.html

--------------------------------

As discussed in [1], 'sbi->s_journal_bdev != sb->s_bdev' will always
become true if sbi->s_journal_bdev exists. Filesystem block device and
journal block device are both opened with 'FMODE_EXCL' mode, so these
two devices can't be same one. Then we can remove the redundant checking
'sbi->s_journal_bdev != sb->s_bdev' if 'sbi->s_journal_bdev' exists.

[1] https://lore.kernel.org/lkml/f86584f6-3877-ff18-47a1-2efaa12d18b2@huawei.com/Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

2db2f17b

ext4: Fix reusing stale buffer heads from last failed mounting · 1f6736e6

由 Zhihao Cheng 提交于 4月 12, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6MMUV
CVE: NA

Reference: https://www.spinics.net/lists/linux-ext4/msg88237.html

--------------------------------

Following process makes ext4 load stale buffer heads from last failed
mounting in a new mounting operation:
mount_bdev
 ext4_fill_super
 | ext4_load_and_init_journal
 |  ext4_load_journal
 |   jbd2_journal_load
 |    load_superblock
 |     journal_get_superblock
 |      set_buffer_verified(bh) // buffer head is verified
 |   jbd2_journal_recover // failed caused by EIO
 | goto failed_mount3a // skip 'sb->s_root' initialization
 deactivate_locked_super
  kill_block_super
   generic_shutdown_super
    if (sb->s_root)
    // false, skip ext4_put_super->invalidate_bdev->
    // invalidate_mapping_pages->mapping_evict_folio->
    // filemap_release_folio->try_to_free_buffers, which
    // cannot drop buffer head.
   blkdev_put
    blkdev_put_whole
     if (atomic_dec_and_test(&bdev->bd_openers))
     // false, systemd-udev happens to open the device. Then
     // blkdev_flush_mapping->kill_bdev->truncate_inode_pages->
     // truncate_inode_folio->truncate_cleanup_folio->
     // folio_invalidate->block_invalidate_folio->
     // filemap_release_folio->try_to_free_buffers will be skipped,
     // dropping buffer head is missed again.

Second mount:
ext4_fill_super
 ext4_load_and_init_journal
  ext4_load_journal
   ext4_get_journal
    jbd2_journal_init_inode
     journal_init_common
      bh = getblk_unmovable
       bh = __find_get_block // Found stale bh in last failed mounting
      journal->j_sb_buffer = bh
   jbd2_journal_load
    load_superblock
     journal_get_superblock
      if (buffer_verified(bh))
      // true, skip journal->j_format_version = 2, value is 0
    jbd2_journal_recover
     do_one_pass
      next_log_block += count_tags(journal, bh)
      // According to journal_tag_bytes(), 'tag_bytes' calculating is
      // affected by jbd2_has_feature_csum3(), jbd2_has_feature_csum3()
      // returns false because 'j->j_format_version >= 2' is not true,
      // then we get wrong next_log_block. The do_one_pass may exit
      // early whenoccuring non JBD2_MAGIC_NUMBER in 'next_log_block'.

The filesystem is corrupted here, journal is partially replayed, and
new journal sequence number actually is already used by last mounting.

The invalidate_bdev() can drop all buffer heads even racing with bare
reading block device(eg. systemd-udev), so we can fix it by invalidating
bdev in error handling path in __ext4_fill_super().

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217171
Fixes: 25ed6e8a ("jbd2: enable journal clients to enable v2 checksumming")
Cc: stable@vger.kernel.org # v3.5
Conflicts:
	fs/ext4/super.c
	[ a7a79c29("ext4: unify the ext4 super block loading
	  operation") is not applied.
	  7edfd85b("ext4: Completely separate options parsing and sb
	  setup") is not applied. ]
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

1f6736e6

04 4月, 2023 2 次提交

ext4: dio take shared inode lock when overwriting preallocated blocks · 5193a88e

由 Zhang Yi 提交于 4月 04, 2023

mainline inclusion
from mainline-v6.3-rc1
commit 240930fb
category: perf
bugzilla: https://gitee.com/openeuler/kernel/issues/I6S63P
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=240930fb7e6b52229bdee5b1423bfeab0002fed2

--------------------------------

In the dio write path, we only take shared inode lock for the case of
aligned overwriting initialized blocks inside EOF. But for overwriting
preallocated blocks, it may only need to split unwritten extents, this
procedure has been protected under i_data_sem lock, it's safe to
release the exclusive inode lock and take shared inode lock.

This could give a significant speed up for multi-threaded writes. Test
on Intel Xeon Gold 6140 and nvme SSD with below fio parameters.

 direct=1
 ioengine=libaio
 iodepth=10
 numjobs=10
 runtime=60
 rw=randwrite
 size=100G

And the test result are:
Before:
 bs=4k       IOPS=11.1k, BW=43.2MiB/s
 bs=16k      IOPS=11.1k, BW=173MiB/s
 bs=64k      IOPS=11.2k, BW=697MiB/s

After:
 bs=4k       IOPS=41.4k, BW=162MiB/s
 bs=16k      IOPS=41.3k, BW=646MiB/s
 bs=64k      IOPS=13.5k, BW=843MiB/s
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221226062015.3479416-1-yi.zhang@huaweicloud.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Conflicts:
	fs/ext4/file.c
	[ 2f632965("iomap: pass a flags argument to iomap_dio_rw")
	  is not applied. ]
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

5193a88e

ext4: fix race between writepages and remount · b66c23ec

由 Baokun Li 提交于 4月 04, 2023

hulk inclusion
category: bugfix
bugzilla: 188500, https://gitee.com/openeuler/kernel/issues/I6RJ0V
CVE: NA

--------------------------------

We got a WARNING in ext4_add_complete_io:
==================================================================
 WARNING: at fs/ext4/page-io.c:231 ext4_put_io_end_defer+0x182/0x250
 CPU: 10 PID: 77 Comm: ksoftirqd/10 Tainted: 6.3.0-rc2 #85
 RIP: 0010:ext4_put_io_end_defer+0x182/0x250 [ext4]
 [...]
 Call Trace:
  <TASK>
  ext4_end_bio+0xa8/0x240 [ext4]
  bio_endio+0x195/0x310
  blk_update_request+0x184/0x770
  scsi_end_request+0x2f/0x240
  scsi_io_completion+0x75/0x450
  scsi_finish_command+0xef/0x160
  scsi_complete+0xa3/0x180
  blk_complete_reqs+0x60/0x80
  blk_done_softirq+0x25/0x40
  __do_softirq+0x119/0x4c8
  run_ksoftirqd+0x42/0x70
  smpboot_thread_fn+0x136/0x3c0
  kthread+0x140/0x1a0
  ret_from_fork+0x2c/0x50
==================================================================

Above issue may happen as follows:

            cpu1                        cpu2
----------------------------|----------------------------
mount -o dioread_lock
ext4_writepages
 ext4_do_writepages
  *if (ext4_should_dioread_nolock(inode))*
    // rsv_blocks is not assigned here
                                 mount -o remount,dioread_nolock
  ext4_journal_start_with_reserve
   __ext4_journal_start
    __ext4_journal_start_sb
     jbd2__journal_start
      *if (rsv_blocks)*
        // h_rsv_handle is not initialized here
  mpage_map_and_submit_extent
    mpage_map_one_extent
      dioread_nolock = ext4_should_dioread_nolock(inode)
      if (dioread_nolock && (map->m_flags & EXT4_MAP_UNWRITTEN))
        mpd->io_submit.io_end->handle = handle->h_rsv_handle
        ext4_set_io_unwritten_flag
          io_end->flag |= EXT4_IO_END_UNWRITTEN
      // now io_end->handle is NULL but has EXT4_IO_END_UNWRITTEN flag

scsi_finish_command
 scsi_io_completion
  scsi_io_completion_action
   scsi_end_request
    blk_update_request
     req_bio_endio
      bio_endio
       bio->bi_end_io  > ext4_end_bio
        ext4_put_io_end_defer
	 ext4_add_complete_io
	  // trigger WARN_ON(!io_end->handle && sbi->s_journal);

The immediate cause of this problem is that ext4_should_dioread_nolock()
function returns inconsistent values in the ext4_do_writepages() and
mpage_map_one_extent(). There are four conditions in this function that
can be changed at mount time to cause this problem. These four conditions
can be divided into two categories:

    (1) journal_data and EXT4_EXTENTS_FL, which can be changed by ioctl
    (2) DELALLOC and DIOREAD_NOLOCK, which can be changed by remount

The two in the first category have been fixed by commit c8585c6f
("ext4: fix races between changing inode journal mode and ext4_writepages")
and commit cb85f4d2 ("ext4: fix race between writepages and enabling
EXT4_EXTENTS_FL") respectively.

Two cases in the other category have not yet been fixed, and the above
issue is caused by this situation. We refer to the fix for the first
category, when applying options during remount, we grab s_writepages_rwsem
to avoid racing with writepages ops to trigger this problem.

Fixes: 6b523df4 ("ext4: use transaction reservation for extent conversion in ext4_end_io")
Cc: stable@vger.kernel.org
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

b66c23ec

29 3月, 2023 2 次提交

ext4: make sure fs error flag setted before clear journal error · fb18d820

由 Ye Bin 提交于 3月 29, 2023

mainline inclusion
from mainline-v6.3-rc2
commit f57886ca
category: bugfix
bugzilla: 188471,https://gitee.com/openeuler/kernel/issues/I6MR1V
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f57886ca1606ba74cc4ec4eb5cbf073934ffa559

--------------------------------

Now, jounral error number maybe cleared even though ext4_commit_super()
failed. This may lead to error flag miss, then fsck will miss to check
file system deeply.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-3-yebin@huaweicloud.comSigned-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

fb18d820

ext4: commit super block if fs record error when journal record without error · 2d6e8d57

由 Ye Bin 提交于 3月 29, 2023

mainline inclusion
from mainline-v6.3-rc2
commit eee00237
category: bugfix
bugzilla: 188471,https://gitee.com/openeuler/kernel/issues/I6MR1V
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eee00237fa5ec8f704f7323b54e48cc34e2d9168

--------------------------------

Now, 'es->s_state' maybe covered by recover journal. And journal errno
maybe not recorded in journal sb as IO error. ext4_update_super() only
update error information when 'sbi->s_add_error_count' large than zero.
Then 'EXT4_ERROR_FS' flag maybe lost.
To solve above issue just recover 'es->s_state' error flag after journal
replay like error info.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230307061703.245965-2-yebin@huaweicloud.comSigned-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

2d6e8d57

22 3月, 2023 1 次提交

ext4: fix another off-by-one fsmap error on 1k block filesystems · 2ef9d264

由 Darrick J. Wong 提交于 3月 22, 2023

mainline inclusion
from mainline-v6.3-rc2
commit c993799b
category: bugfix
bugzilla: 188522,https://gitee.com/openeuler/kernel/issues/I6N7ZP
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c993799baf9c5861f8df91beb80e1611b12efcbd

--------------------------------

Apparently syzbot figured out that issuing this FSMAP call:

struct fsmap_head cmd = {
	.fmh_count	= ...;
	.fmh_keys	= {
		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
		{ .fmr_device = /* ext4 dev */, .fmr_physical = 0, },
	},
...
};
ret = ioctl(fd, FS_IOC_GETFSMAP, &cmd);

Produces this crash if the underlying filesystem is a 1k-block ext4
filesystem:

kernel BUG at fs/ext4/ext4.h:3331!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 3 PID: 3227965 Comm: xfs_io Tainted: G        W  O       6.2.0-rc8-achx
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
RIP: 0010:ext4_mb_load_buddy_gfp+0x47c/0x570 [ext4]
RSP: 0018:ffffc90007c03998 EFLAGS: 00010246
RAX: ffff888004978000 RBX: ffffc90007c03a20 RCX: ffff888041618000
RDX: 0000000000000000 RSI: 00000000000005a4 RDI: ffffffffa0c99b11
RBP: ffff888012330000 R08: ffffffffa0c2b7d0 R09: 0000000000000400
R10: ffffc90007c03950 R11: 0000000000000000 R12: 0000000000000001
R13: 00000000ffffffff R14: 0000000000000c40 R15: ffff88802678c398
FS:  00007fdf2020c880(0000) GS:ffff88807e100000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffd318a5fe8 CR3: 000000007f80f001 CR4: 00000000001706e0
Call Trace:
 <TASK>
 ext4_mballoc_query_range+0x4b/0x210 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_getfsmap_datadev+0x713/0x890 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_getfsmap+0x2b7/0x330 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 ext4_ioc_getfsmap+0x153/0x2b0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 __ext4_ioctl+0x2a7/0x17e0 [ext4 dfa189daddffe8fecd3cdfd00564e0f265a8ab80]
 __x64_sys_ioctl+0x82/0xa0
 do_syscall_64+0x2b/0x80
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fdf20558aff
RSP: 002b:00007ffd318a9e30 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000000200c0 RCX: 00007fdf20558aff
RDX: 00007fdf1feb2010 RSI: 00000000c0c0583b RDI: 0000000000000003
RBP: 00005625c0634be0 R08: 00005625c0634c40 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 00007fdf1feb2010
R13: 00005625be70d994 R14: 0000000000000800 R15: 0000000000000000

For GETFSMAP calls, the caller selects a physical block device by
writing its block number into fsmap_head.fmh_keys[01].fmr_device.
To query mappings for a subrange of the device, the starting byte of the
range is written to fsmap_head.fmh_keys[0].fmr_physical and the last
byte of the range goes in fsmap_head.fmh_keys[1].fmr_physical.

IOWs, to query what mappings overlap with bytes 3-14 of /dev/sda, you'd
set the inputs as follows:

	fmh_keys[0] = { .fmr_device = major(8, 0), .fmr_physical = 3},
	fmh_keys[1] = { .fmr_device = major(8, 0), .fmr_physical = 14},

Which would return you whatever is mapped in the 12 bytes starting at
physical offset 3.

The crash is due to insufficient range validation of keys[1] in
ext4_getfsmap_datadev.  On 1k-block filesystems, block 0 is not part of
the filesystem, which means that s_first_data_block is nonzero.
ext4_get_group_no_and_offset subtracts this quantity from the blocknr
argument before cracking it into a group number and a block number
within a group.  IOWs, block group 0 spans blocks 1-8192 (1-based)
instead of 0-8191 (0-based) like what happens with larger blocksizes.

The net result of this encoding is that blocknr < s_first_data_block is
not a valid input to this function.  The end_fsb variable is set from
the keys that are copied from userspace, which means that in the above
example, its value is zero.  That leads to an underflow here:

	blocknr = blocknr - le32_to_cpu(es->s_first_data_block);

The division then operates on -1:

	offset = do_div(blocknr, EXT4_BLOCKS_PER_GROUP(sb)) >>
		EXT4_SB(sb)->s_cluster_bits;

Leaving an impossibly large group number (2^32-1) in blocknr.
ext4_getfsmap_check_keys checked that keys[0].fmr_physical and
keys[1].fmr_physical are in increasing order, but
ext4_getfsmap_datadev adjusts keys[0].fmr_physical to be at least
s_first_data_block.  This implies that we have to check it again after
the adjustment, which is the piece that I forgot.

Reported-by: syzbot+6be2b977c89f79b6b153@syzkaller.appspotmail.com
Fixes: 4a495624 ("ext4: fix off-by-one fsmap error on 1k block filesystems")
Link: https://syzkaller.appspot.com/bug?id=79d5768e9bfe362911ac1a5057a36fc6b5c30002
Cc: stable@vger.kernel.org
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/Y+58NPTH7VNGgzdd@magnoliaSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

2ef9d264

15 3月, 2023 2 次提交

ext4: fix WARNING in mb_find_extent · 0f0441bf

由 Ye Bin 提交于 3月 15, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6K53I

Reference: https://patchwork.ozlabs.org/project/linux-ext4/patch/20230116020015.1506120-1-yebin@huaweicloud.com/

--------------------------------

Syzbot found the following issue:

EXT4-fs: Warning: mounting with data=journal disables delayed allocation, dioread_nolock, O_DIRECT and fast_commit support!
EXT4-fs (loop0): orphan cleanup on readonly fs
------------[ cut here ]------------
WARNING: CPU: 1 PID: 5067 at fs/ext4/mballoc.c:1869 mb_find_extent+0x8a1/0xe30
Modules linked in:
CPU: 1 PID: 5067 Comm: syz-executor307 Not tainted 6.2.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:mb_find_extent+0x8a1/0xe30 fs/ext4/mballoc.c:1869
RSP: 0018:ffffc90003c9e098 EFLAGS: 00010293
RAX: ffffffff82405731 RBX: 0000000000000041 RCX: ffff8880783457c0
RDX: 0000000000000000 RSI: 0000000000000041 RDI: 0000000000000040
RBP: 0000000000000040 R08: ffffffff82405723 R09: ffffed10053c9402
R10: ffffed10053c9402 R11: 1ffff110053c9401 R12: 0000000000000000
R13: ffffc90003c9e538 R14: dffffc0000000000 R15: ffffc90003c9e2cc
FS:  0000555556665300(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000056312f6796f8 CR3: 0000000022437000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ext4_mb_complex_scan_group+0x353/0x1100 fs/ext4/mballoc.c:2307
 ext4_mb_regular_allocator+0x1533/0x3860 fs/ext4/mballoc.c:2735
 ext4_mb_new_blocks+0xddf/0x3db0 fs/ext4/mballoc.c:5605
 ext4_ext_map_blocks+0x1868/0x6880 fs/ext4/extents.c:4286
 ext4_map_blocks+0xa49/0x1cc0 fs/ext4/inode.c:651
 ext4_getblk+0x1b9/0x770 fs/ext4/inode.c:864
 ext4_bread+0x2a/0x170 fs/ext4/inode.c:920
 ext4_quota_write+0x225/0x570 fs/ext4/super.c:7105
 write_blk fs/quota/quota_tree.c:64 [inline]
 get_free_dqblk+0x34a/0x6d0 fs/quota/quota_tree.c:130
 do_insert_tree+0x26b/0x1aa0 fs/quota/quota_tree.c:340
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 do_insert_tree+0x722/0x1aa0 fs/quota/quota_tree.c:375
 dq_insert_tree fs/quota/quota_tree.c:401 [inline]
 qtree_write_dquot+0x3b6/0x530 fs/quota/quota_tree.c:420
 v2_write_dquot+0x11b/0x190 fs/quota/quota_v2.c:358
 dquot_acquire+0x348/0x670 fs/quota/dquot.c:444
 ext4_acquire_dquot+0x2dc/0x400 fs/ext4/super.c:6740
 dqget+0x999/0xdc0 fs/quota/dquot.c:914
 __dquot_initialize+0x3d0/0xcf0 fs/quota/dquot.c:1492
 ext4_process_orphan+0x57/0x2d0 fs/ext4/orphan.c:329
 ext4_orphan_cleanup+0xb60/0x1340 fs/ext4/orphan.c:474
 __ext4_fill_super fs/ext4/super.c:5516 [inline]
 ext4_fill_super+0x81cd/0x8700 fs/ext4/super.c:5644
 get_tree_bdev+0x400/0x620 fs/super.c:1282
 vfs_get_tree+0x88/0x270 fs/super.c:1489
 do_new_mount+0x289/0xad0 fs/namespace.c:3145
 do_mount fs/namespace.c:3488 [inline]
 __do_sys_mount fs/namespace.c:3697 [inline]
 __se_sys_mount+0x2d3/0x3c0 fs/namespace.c:3674
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Add some debug information:
mb_find_extent: mb_find_extent block=41, order=0 needed=64 next=0 ex=0/41/1@3735929054 64 64 7
block_bitmap: ff 3f 0c 00 fc 01 00 00 d2 3d 00 00 00 00 00 00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

Acctually, blocks per group is 64, but block bitmap indicate at least has
128 blocks. Now, ext4_validate_block_bitmap() didn't check invalid block's
bitmap if set.
To resolve above issue, add check like fsck "Padding at end of block bitmap is
not set".

Reported-by: syzbot+68223fe9f6c95ad43bed@syzkaller.appspotmail.com
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NYang Erkun <yangerkun@huawei.com>
Reviewed-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

0f0441bf

ext4: fix incorrect options show of original mount_opt and extend mount_opt2 · c8ca1b9a

由 Zhang Yi 提交于 3月 15, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D5XF

Reference: https://lore.kernel.org/linux-ext4/20230130111138.76tp6pij3yhh4brh@quack3/T/#t

--------------------------------

Current _ext4_show_options() do not distinguish MOPT_2 flag, so it mixed
extend sbi->s_mount_opt2 options with sbi->s_mount_opt, it could lead to
show incorrect options, e.g. show fc_debug_force if we mount with
errors=continue mode and miss it if we set.

  $ mkfs.ext4 /dev/pmem0
  $ mount -o errors=remount-ro /dev/pmem0 /mnt
  $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
    #empty
  $ mount -o remount,errors=continue /mnt
  $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
    fc_debug_force
  $ mount -o remount,errors=remount-ro,fc_debug_force /mnt
  $ cat /proc/fs/ext4/pmem0/options | grep fc_debug_force
    #empty

Fixes: 995a3ed6 ("ext4: add fast_commit feature and handling for extended mount options")
Signed-off-by: NZhang Yi <yi.zhang@huawei.com>

Conflict:
  fs/ext4/super.c
Reviewed-by: NZhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: NZhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>

c8ca1b9a

17 2月, 2023 2 次提交

ext4: make directory inode spreading reflect flexbg size · 6b1268f7

由 Jan Kara 提交于 2月 15, 2023

stable inclusion
from stable-v5.10.146
commit c18383218c3102f8f9ba56fd9abed86ac80a69c3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0VX

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c18383218c3102f8f9ba56fd9abed86ac80a69c3

--------------------------------

commit 613c5a85 upstream.

Currently the Orlov inode allocator searches for free inodes for a
directory only in flex block groups with at most inodes_per_group/16
more directory inodes than average per flex block group. However with
growing size of flex block group this becomes unnecessarily strict.
Scale allowed difference from average directory count per flex block
group with flex block group size as we do with other metrics.
Tested-by: NStefan Wahren <stefan.wahren@i2se.com>
Tested-by: NOjaswin Mujoo <ojaswin@linux.ibm.com>
Cc: stable@kernel.org
Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/Signed-off-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220908092136.11770-3-jack@suse.czSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: NZheng Zengkai <zhengzengkai@huawei.com>

6b1268f7

ext4: limit the number of retries after discarding preallocations blocks · 9c394e2e

由 Theodore Ts'o 提交于 2月 15, 2023

stable inclusion
from stable-v5.10.146
commit a968542d7e2471f3c75ed9380110049354e1de44
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0VX

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a968542d7e2471f3c75ed9380110049354e1de44

--------------------------------

commit 80fa46d6 upstream.

This patch avoids threads live-locking for hours when a large number
threads are competing over the last few free extents as they blocks
getting added and removed from preallocation pools.  From our bug
reporter:

   A reliable way for triggering this has multiple writers
   continuously write() to files when the filesystem is full, while
   small amounts of space are freed (e.g. by truncating a large file
   -1MiB at a time). In the local filesystem, this can be done by
   simply not checking the return code of write (0) and/or the error
   (ENOSPACE) that is set. Over NFS with an async mount, even clients
   with proper error checking will behave this way since the linux NFS
   client implementation will not propagate the server errors [the
   write syscalls immediately return success] until the file handle is
   closed. This leads to a situation where NFS clients send a
   continuous stream of WRITE rpcs which result in ERRNOSPACE -- but
   since the client isn't seeing this, the stream of writes continues
   at maximum network speed.

   When some space does appear, multiple writers will all attempt to
   claim it for their current write. For NFS, we may see dozens to
   hundreds of threads that do this.

   The real-world scenario of this is database backup tooling (in
   particular, github.com/mdkent/percona-xtrabackup) which may write
   large files (>1TiB) to NFS for safe keeping. Some temporary files
   are written, rewound, and read back -- all before closing the file
   handle (the temp file is actually unlinked, to trigger automatic
   deletion on close/crash.) An application like this operating on an
   async NFS mount will not see an error code until TiB have been
   written/read.

   The lockup was observed when running this database backup on large
   filesystems (64 TiB in this case) with a high number of block
   groups and no free space. Fragmentation is generally not a factor
   in this filesystem (~thousands of large files, mostly contiguous
   except for the parts written while the filesystem is at capacity.)
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
Reviewed-by: NZheng Zengkai <zhengzengkai@huawei.com>

9c394e2e

09 2月, 2023 2 次提交

ext4: fix use-after-free in ext4_orphan_cleanup · 579f436a

由 Baokun Li 提交于 2月 07, 2023

stable inclusion
from stable-v5.10.163
commit 7223d5e75f26352354ea2c0ccf8b579821b52adf
category: bugfix
bugzilla: 187904,https://gitee.com/openeuler/kernel/issues/I6BJAR
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7223d5e75f26352354ea2c0ccf8b579821b52adf

--------------------------------

commit a71248b1 upstream.

I caught a issue as follows:

==================================================================
 BUG: KASAN: use-after-free in __list_add_valid+0x28/0x1a0
 Read of size 8 at addr ffff88814b13f378 by task mount/710

 CPU: 1 PID: 710 Comm: mount Not tainted 6.1.0-rc3-next #370
 Call Trace:
  <TASK>
  dump_stack_lvl+0x73/0x9f
  print_report+0x25d/0x759
  kasan_report+0xc0/0x120
  __asan_load8+0x99/0x140
  __list_add_valid+0x28/0x1a0
  ext4_orphan_cleanup+0x564/0x9d0 [ext4]
  __ext4_fill_super+0x48e2/0x5300 [ext4]
  ext4_fill_super+0x19f/0x3a0 [ext4]
  get_tree_bdev+0x27b/0x450
  ext4_get_tree+0x19/0x30 [ext4]
  vfs_get_tree+0x49/0x150
  path_mount+0xaae/0x1350
  do_mount+0xe2/0x110
  __x64_sys_mount+0xf0/0x190
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  </TASK>
 [...]
==================================================================

Above issue may happen as follows:
-------------------------------------
ext4_fill_super
  ext4_orphan_cleanup
   --- loop1: assume last_orphan is 12 ---
    list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan)
    ext4_truncate --> return 0
      ext4_inode_attach_jinode --> return -ENOMEM
    iput(inode) --> free inode<12>
   --- loop2: last_orphan is still 12 ---
    list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan);
    // use inode<12> and trigger UAF

To solve this issue, we need to propagate the return value of
ext4_inode_attach_jinode() appropriately.
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221102080633.1630225-1-libaokun1@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 21851e4c)

579f436a

ext4: fix null-ptr-deref in ext4_write_info · 45d1d7ec

由 Baokun Li 提交于 2月 07, 2023

stable inclusion
from stable-v5.10.150
commit f34ab95162763cd7352f46df169296eec28b688d
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6BJ5F
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f34ab95162763cd7352f46df169296eec28b688d

--------------------------------

commit f9c1f248 upstream.

I caught a null-ptr-deref bug as follows:

==================================================================
KASAN: null-ptr-deref in range [0x0000000000000068-0x000000000000006f]
CPU: 1 PID: 1589 Comm: umount Not tainted 5.10.0-02219-dirty #339
RIP: 0010:ext4_write_info+0x53/0x1b0
[...]
Call Trace:
 dquot_writeback_dquots+0x341/0x9a0
 ext4_sync_fs+0x19e/0x800
 __sync_filesystem+0x83/0x100
 sync_filesystem+0x89/0xf0
 generic_shutdown_super+0x79/0x3e0
 kill_block_super+0xa1/0x110
 deactivate_locked_super+0xac/0x130
 deactivate_super+0xb6/0xd0
 cleanup_mnt+0x289/0x400
 __cleanup_mnt+0x16/0x20
 task_work_run+0x11c/0x1c0
 exit_to_user_mode_prepare+0x203/0x210
 syscall_exit_to_user_mode+0x5b/0x3a0
 do_syscall_64+0x59/0x70
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
 ==================================================================

Above issue may happen as follows:
-------------------------------------
exit_to_user_mode_prepare
 task_work_run
  __cleanup_mnt
   cleanup_mnt
    deactivate_super
     deactivate_locked_super
      kill_block_super
       generic_shutdown_super
        shrink_dcache_for_umount
         dentry = sb->s_root
         sb->s_root = NULL              <--- Here set NULL
        sync_filesystem
         __sync_filesystem
          sb->s_op->sync_fs > ext4_sync_fs
           dquot_writeback_dquots
            sb->dq_op->write_info > ext4_write_info
             ext4_journal_start(d_inode(sb->s_root), EXT4_HT_QUOTA, 2)
              d_inode(sb->s_root)
               s_root->d_inode          <--- Null pointer dereference

To solve this problem, we use ext4_journal_start_sb directly
to avoid s_root being used.

Cc: stable@kernel.org
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220805123947.565152-1-libaokun1@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 7c7ce1ec)

45d1d7ec

17 1月, 2023 1 次提交

ext4: fix bad checksum after online resize · 376a398b

由 Baokun Li 提交于 1月 11, 2023

mainline inclusion
from mainline-v6.2-rc1
commit a408f33e
category: bugfix
bugzilla: 187935,https://gitee.com/src-openeuler/kernel/issues/I5ZR8Z
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=a408f33e895e455f16cf964cb5cd4979b658db7b

--------------------------------

When online resizing is performed twice consecutively, the error message
"Superblock checksum does not match superblock" is displayed for the
second time. Here's the reproducer:

	mkfs.ext4 -F /dev/sdb 100M
	mount /dev/sdb /tmp/test
	resize2fs /dev/sdb 5G
	resize2fs /dev/sdb 6G

To solve this issue, we moved the update of the checksum after the
es->s_overhead_clusters is updated.

Fixes: 026d0d27 ("ext4: reduce computation of overhead during resize")
Fixes: de394a86 ("ext4: update s_overhead_clusters in the superblock during an on-line resize")
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: stable@kernel.org
Link: https://lore.kernel.org/r/20221117040341.1380702-2-libaokun1@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>

Conflict:
	fs/ext4/resize.c
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
(cherry picked from commit 44942478)

376a398b

30 11月, 2022 1 次提交

ext4: fix bug in extents parsing when eh_entries == 0 and eh_depth > 0 · 0514826e

由 Luís Henriques 提交于 11月 30, 2022

stable inclusion
from stable-v5.10.146
commit 958b0ee23f5ac106e7cc11472b71aa2ea9a033bc
category: bugfix
bugzilla: 187444, https://gitee.com/openeuler/kernel/issues/I6261Z
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=958b0ee23f5ac106e7cc11472b71aa2ea9a033bc

--------------------------------

commit 29a5b8a1 upstream.

When walking through an inode extents, the ext4_ext_binsearch_idx() function
assumes that the extent header has been previously validated.  However, there
are no checks that verify that the number of entries (eh->eh_entries) is
non-zero when depth is > 0.  And this will lead to problems because the
EXT_FIRST_INDEX() and EXT_LAST_INDEX() will return garbage and result in this:

[  135.245946] ------------[ cut here ]------------
[  135.247579] kernel BUG at fs/ext4/extents.c:2258!
[  135.249045] invalid opcode: 0000 [#1] PREEMPT SMP
[  135.250320] CPU: 2 PID: 238 Comm: tmp118 Not tainted 5.19.0-rc8+ #4
[  135.252067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[  135.255065] RIP: 0010:ext4_ext_map_blocks+0xc20/0xcb0
[  135.256475] Code:
[  135.261433] RSP: 0018:ffffc900005939f8 EFLAGS: 00010246
[  135.262847] RAX: 0000000000000024 RBX: ffffc90000593b70 RCX: 0000000000000023
[  135.264765] RDX: ffff8880038e5f10 RSI: 0000000000000003 RDI: ffff8880046e922c
[  135.266670] RBP: ffff8880046e9348 R08: 0000000000000001 R09: ffff888002ca580c
[  135.268576] R10: 0000000000002602 R11: 0000000000000000 R12: 0000000000000024
[  135.270477] R13: 0000000000000000 R14: 0000000000000024 R15: 0000000000000000
[  135.272394] FS:  00007fdabdc56740(0000) GS:ffff88807dd00000(0000) knlGS:0000000000000000
[  135.274510] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  135.276075] CR2: 00007ffc26bd4f00 CR3: 0000000006261004 CR4: 0000000000170ea0
[  135.277952] Call Trace:
[  135.278635]  <TASK>
[  135.279247]  ? preempt_count_add+0x6d/0xa0
[  135.280358]  ? percpu_counter_add_batch+0x55/0xb0
[  135.281612]  ? _raw_read_unlock+0x18/0x30
[  135.282704]  ext4_map_blocks+0x294/0x5a0
[  135.283745]  ? xa_load+0x6f/0xa0
[  135.284562]  ext4_mpage_readpages+0x3d6/0x770
[  135.285646]  read_pages+0x67/0x1d0
[  135.286492]  ? folio_add_lru+0x51/0x80
[  135.287441]  page_cache_ra_unbounded+0x124/0x170
[  135.288510]  filemap_get_pages+0x23d/0x5a0
[  135.289457]  ? path_openat+0xa72/0xdd0
[  135.290332]  filemap_read+0xbf/0x300
[  135.291158]  ? _raw_spin_lock_irqsave+0x17/0x40
[  135.292192]  new_sync_read+0x103/0x170
[  135.293014]  vfs_read+0x15d/0x180
[  135.293745]  ksys_read+0xa1/0xe0
[  135.294461]  do_syscall_64+0x3c/0x80
[  135.295284]  entry_SYSCALL_64_after_hwframe+0x46/0xb0

This patch simply adds an extra check in __ext4_ext_check(), verifying that
eh_entries is not 0 when eh_depth is > 0.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=215941
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216283
Cc: Baokun Li <libaokun1@huawei.com>
Cc: stable@kernel.org
Signed-off-by: NLuís Henriques <lhenriques@suse.de>
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NBaokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/20220822094235.2690-1-lhenriques@suse.deSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

0514826e

21 11月, 2022 3 次提交

ext4: fix super block checksum incorrect after mount · 2c739341

由 Ye Bin 提交于 11月 21, 2022

mainline inclusion
from mainline-v5.19-rc3
commit 9b6641dd
category: bugfix
bugzilla: 186927, https://gitee.com/src-openeuler/kernel/issues/I5YIY6
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9b6641dd95a0c441b277dd72ba22fed8d61f76ad

--------------------------------

We got issue as follows:
[home]# mount  /dev/sda  test
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
[home]# dmesg
EXT4-fs (sda): warning: mounting fs with errors, running e2fsck is recommended
EXT4-fs (sda): Errors on filesystem, clearing orphan list.
EXT4-fs (sda): recovery complete
EXT4-fs (sda): mounted filesystem with ordered data mode. Quota mode: none.
[home]# debugfs /dev/sda
debugfs 1.46.5 (30-Dec-2021)
Checksum errors in superblock!  Retrying...

Reason is ext4_orphan_cleanup will reset ‘s_last_orphan’ but not update
super block checksum.

To solve above issue, defer update super block checksum after
ext4_orphan_cleanup.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Cc: stable@kernel.org
Reviewed-by: NJan Kara <jack@suse.cz>
Reviewed-by: NRitesh Harjani <ritesh.list@gmail.com>
Link: https://lore.kernel.org/r/20220525012904.1604737-1-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NBaokun Li <libaokun1@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>

2c739341

ext4: avoid resizing to a partial cluster size · 59507441

由 Kiselev, Oleg 提交于 11月 21, 2022

stable inclusion
from stable-v5.10.138
commit 80288883294c5b4ed18bae0d8bd9c4a12f297074
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60QFD

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=80288883294c5b4ed18bae0d8bd9c4a12f297074

--------------------------------

[ Upstream commit 69cb8e9d ]

This patch avoids an attempt to resize the filesystem to an
unaligned cluster boundary.  An online resize to a size that is not
integral to cluster size results in the last iteration attempting to
grow the fs by a negative amount, which trips a BUG_ON and leaves the fs
with a corrupted in-memory superblock.
Signed-off-by: NOleg Kiselev <okiselev@amazon.com>
Link: https://lore.kernel.org/r/0E92A0AB-4F16-4F1A-94B7-702CC6504FDE@amazon.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

59507441

ext4: avoid remove directory when directory is corrupted · 70443094

由 Ye Bin 提交于 11月 21, 2022

stable inclusion
from stable-v5.10.138
commit 285447b81925ff785533c677d552c97818081181
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60QFD

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=285447b81925ff785533c677d552c97818081181

--------------------------------

[ Upstream commit b24e77ef ]

Now if check directoy entry is corrupted, ext4_empty_dir may return true
then directory will be removed when file system mounted with "errors=continue".
In order not to make things worse just return false when directory is corrupted.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20220622090223.682234-1-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

70443094

18 11月, 2022 3 次提交

ext4: fix extent status tree race in writeback error recovery path · 311a6951

由 Eric Whitney 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit e8c747496f23e2cf152899e35de2f25ce647d72b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e8c747496f23e2cf152899e35de2f25ce647d72b

--------------------------------

commit 7f0d8e1d upstream.

A race can occur in the unlikely event ext4 is unable to allocate a
physical cluster for a delayed allocation in a bigalloc file system
during writeback.  Failure to allocate a cluster forces error recovery
that includes a call to mpage_release_unused_pages().  That function
removes any corresponding delayed allocated blocks from the extent
status tree.  If a new delayed write is in progress on the same cluster
simultaneously, resulting in the addition of an new extent containing
one or more blocks in that cluster to the extent status tree, delayed
block accounting can be thrown off if that delayed write then encounters
a similar cluster allocation failure during future writeback.

Write lock the i_data_sem in mpage_release_unused_pages() to fix this
problem.  Ext4's block/cluster accounting code for bigalloc relies on
i_data_sem for mutual exclusion, as is found in the delayed write path,
and the locking in mpage_release_unused_pages() is missing.

Cc: stable@kernel.org
Reported-by: NYe Bin <yebin10@huawei.com>
Signed-off-by: NEric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20220615160530.1928801-1-enwlinux@gmail.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

311a6951

ext4: update s_overhead_clusters in the superblock during an on-line resize · cc815892

由 Theodore Ts'o 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit ac8cc061145a54ff8d4e0f17f19f0200aabc21ff
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ac8cc061145a54ff8d4e0f17f19f0200aabc21ff

--------------------------------

commit de394a86 upstream.

When doing an online resize, the on-disk superblock on-disk wasn't
updated.  This means that when the file system is unmounted and
remounted, and the on-disk overhead value is non-zero, this would
result in the results of statfs(2) to be incorrect.

This was partially fixed by Commits 10b01ee9 ("ext4: fix overhead
calculation to account for the reserved gdt blocks"), 85d825db
("ext4: force overhead calculation if the s_overhead_cluster makes no
sense"), and eb705421 ("ext4: update the cached overhead value in
the superblock").

However, since it was too expensive to forcibly recalculate the
overhead for bigalloc file systems at every mount, this didn't fix the
problem for bigalloc file systems.  This commit should address the
problem when resizing file systems with the bigalloc feature enabled.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Reviewed-by: NAndreas Dilger <adilger@dilger.ca>
Link: https://lore.kernel.org/r/20220629040026.112371-1-tytso@mit.eduSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

cc815892

ext4: fix warning in ext4_iomap_begin as race between bmap and write · 4f862919

由 Ye Bin 提交于 11月 18, 2022

stable inclusion
from stable-v5.10.137
commit e1682c7171a6c0ff576fe8116b8cba5b8f538b94
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e1682c7171a6c0ff576fe8116b8cba5b8f538b94

--------------------------------

commit 51ae846c upstream.

We got issue as follows:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0
RIP: 0010:ext4_iomap_begin+0x182/0x5d0
RSP: 0018:ffff88812460fa08 EFLAGS: 00010293
RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f
R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001
R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee
FS:  00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 iomap_apply+0x119/0x570
 iomap_bmap+0x124/0x150
 ext4_bmap+0x14f/0x250
 bmap+0x55/0x80
 do_vfs_ioctl+0x952/0xbd0
 __x64_sys_ioctl+0xc6/0x170
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Above issue may happen as follows:
          bmap                    write
bmap
  ext4_bmap
    iomap_bmap
      ext4_iomap_begin
                            ext4_file_write_iter
			      ext4_buffered_write_iter
			        generic_perform_write
				  ext4_da_write_begin
				    ext4_da_write_inline_data_begin
				      ext4_prepare_inline_data
				        ext4_create_inline_data
					  ext4_set_inode_flag(inode,
						EXT4_INODE_INLINE_DATA);
      if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on

To solved above issue hold inode lock in ext4_bamp.
Signed-off-by: NYe Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220617013935.397596-1-yebin10@huawei.comSigned-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: NWei Li <liwei391@huawei.com>

4f862919

openeuler / Kernel 大约 2 年 前同步成功

openeuler / Kernel
大约 2 年前同步成功