Commits · ad36cedd2b0de845b55f88c15282f4796c71018f · openeuler / Kernel

14 Jul, 2023 · 2 commits

ext4: Add debug message to notify user space is out of free · ad36cedd

Committed by Zhihao Cheng on Jul 14, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS
CVE: NA

--------------------------------

Add a debug message to notify the user that ext4_writepages is stuck in a
loop caused by ENOSPC.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 4ae7e703)

Revert "ext4: Stop trying writing pages if no free blocks generated" · b42d3e12

Committed by Zhihao Cheng on Jul 14, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS
CVE: NA

--------------------------------

This reverts commit 07a8109d.

When ext4 runs out of space, there is a potential data loss in
ext4_writepages:
If there are many preallocated blocks for some files, the e4b bitmap
differs from the block bitmap, and the block bitmap accounts for more
free blocks.

    ext4_writepages                         P2
ext4_mb_new_blocks                  ext4_map_blocks
 ext4_mb_regular_allocator // No free bits in e4b bitmap
 ext4_mb_discard_preallocations_should_retry
  ext4_mb_discard_preallocations
   ext4_mb_discard_group_preallocations
    ext4_mb_release_inode_pa // updates e4b bitmap by pa->pa_free
     mb_free_blocks
                                     ext4_mb_new_blocks
                                      ext4_mb_regular_allocator
                                      // Got e4b bitmap's free bits
 ext4_mb_regular_allocator  // After 3 times retrying, ret ENOSPC

ext4_writepages
 mpage_map_and_submit_extent
  mpage_map_one_extent // ret ENOSPC
  if (err == -ENOSPC && EXT4_SB(sb)->s_mb_free_pending)
  // s_mb_free_pending is 0
  *give_up_on_write = true  // Abandon writeback, data lost!

Fixes: 07a8109d ("ext4: Stop trying writing pages if no free ...")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 5f142164)

06 Jul, 2023 · 1 commit

ext4: Stop trying writing pages if no free blocks generated · 77d99dff

Committed by Zhihao Cheng on Jul 05, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS

--------------------------------

The following steps could make ext4_writepages fall into a dead loop:

1. Consume free_clusters until free_clusters > 2 * sbi->s_resv_clusters,
   and free_clusters > EXT4_FREECLUSTERS_WATERMARK.
   // eg. free_clusters = 1422, sbi->s_resv_clusters = 512
   // nr_cpus = 4, EXT4_FREECLUSTERS_WATERMARK = 512
2. umount && mount.  // dirty_clusters = 0
3. Run free_clusters tasks concurrently to write different files; many
   tasks append 4K of data via the da_write method. Each inode will
   consume one data block and one extent block in map_block.
   // (free_clusters - EXT4_FREECLUSTERS_WATERMARK = 910) tasks choose
   // the da_write method; the remaining 512 tasks choose the write_begin
   // method. Suppose the tasks that choose the da_write path run first:
   // dirty_clusters = 910, free_clusters = 1422
   // Tasks which choose the write_begin path will get ENOSPC:
   //  free_clusters < (nclusters + dirty_clusters + resv_clusters)
   //  1422 < (1 + 910 + 512)
4. After a certain number of map_block iterations in ext4_writepages:
   // free_clusters = 0,
   // dirty_clusters = 910 - (1422 / 2) = 199
5. Delete one 4K file.  // free_clusters = 1
6. ext4_writepages falls into a dead loop:
    mpage_map_and_submit_extent
     mpage_map_one_extent // ret = ENOSPC
       ext4_map_blocks -> ext4_ext_map_blocks -> ext4_mb_new_blocks ->
       ext4_claim_free_clusters:
         if (free_clusters >= (nclusters + dirty_clusters)) // false
     if (err == -ENOSPC && ext4_count_free_clusters(sb)) // true
       return err
     *give_up_on_write = true // won't be executed

Fix it by terminating ext4_writepages if no free blocks are generated.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 07a8109d)

10 May, 2023 · 2 commits

ext4: check iomap type only if ext4_iomap_begin() does not fail · a090a59f

Committed by Baokun Li on May 09, 2023

maillist inclusion
category: bugfix
bugzilla: 188724, https://gitee.com/openeuler/kernel/issues/I70Q22

Reference: https://www.spinics.net/lists/kernel/msg4779681.html

----------------------------------------

When ext4_iomap_overwrite_begin() calls ext4_iomap_begin(), mapping blocks may
fail for some reason (e.g. memory allocation failure, bare disk write), and
the later check "iomap->type != IOMAP_MAPPED" then triggers WARN_ON(). When
ext4_iomap_begin() returns an error, it is normal that iomap->type may not
match the expectation. Therefore, we only check whether iomap->type is as
expected when ext4_iomap_begin() has executed successfully.
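A minimal user-space sketch of that idea follows. The types and function names (struct mapping, map_begin, overwrite_begin) are hypothetical stand-ins, and the `warned` flag only marks the point where the kernel would reach its WARN_ON check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum map_type { MAP_HOLE, MAP_MAPPED };

struct mapping {
    enum map_type type;
};

/* Stand-in for ext4_iomap_begin(): on failure it returns a negative
 * errno and leaves *map untouched. */
int map_begin(struct mapping *map, bool inject_failure)
{
    if (inject_failure)
        return -ENOMEM;
    map->type = MAP_MAPPED;
    return 0;
}

/* Stand-in for ext4_iomap_overwrite_begin(). */
int overwrite_begin(struct mapping *map, bool inject_failure, bool *warned)
{
    int ret;

    assert(map != NULL && warned != NULL);
    ret = map_begin(map, inject_failure);

    /* Only inspect the mapping type when the lookup succeeded. */
    *warned = (ret == 0 && map->type != MAP_MAPPED);
    return ret;
}
```

When the lookup fails, the error is returned unchanged and the type check is skipped, so a stale or unset type cannot raise a warning.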

Reported-by: syzbot+08106c4b7d60702dbc14@syzkaller.appspotmail.com
Link: https://lore.kernel.org/all/00000000000015760b05f9b4eee9@google.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

ext4: only update i_reserved_data_blocks on successful block allocation · f0af88ce

Committed by Baokun Li on May 09, 2023

maillist inclusion
category: bugfix
bugzilla: 188499, https://gitee.com/openeuler/kernel/issues/I6TNVT
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/patch/20230412124126.2286716-2-libaokun1@huawei.com/

----------------------------------------

In our fault injection test, we create an ext4 file, migrate it to
non-extent based file, then punch a hole and finally trigger a WARN_ON
in the ext4_da_update_reserve_space():

EXT4-fs warning (device sda): ext4_da_update_reserve_space:369:
ino 14, used 11 with only 10 reserved data blocks

When writing back a non-extent based file, if we enable delalloc, the
number of reserved blocks will be subtracted from the number of blocks
mapped by ext4_ind_map_blocks(), and the extent status tree will be
updated. We update the extent status tree by first removing the old
extent_status and then inserting the new extent_status. If the block range
we remove happens to be in an extent, then we need to allocate another
extent_status with ext4_es_alloc_extent().

       use old    to remove   to add new
    |----------|------------|------------|
              old extent_status

The problem is that the allocation of a new extent_status failed due to a
fault injection, and __es_shrink() did not get free memory, resulting in
a return of -ENOMEM. Then do_writepages() retries after receiving -ENOMEM,
we map to the same extent again, and the number of reserved blocks is again
subtracted from the number of blocks in that extent. Since the blocks in
the same extent are subtracted twice, we end up triggering WARN_ON at
ext4_da_update_reserve_space() because used > ei->i_reserved_data_blocks.

For non-extent based file, we update the number of reserved blocks after
ext4_ind_map_blocks() is executed, which causes a problem that when we call
ext4_ind_map_blocks() to create a block, it doesn't always create a block,
but we always reduce the number of reserved blocks. So we move the logic
for updating reserved blocks to ext4_ind_map_blocks() to ensure that the
number of reserved blocks is updated only after we do succeed in allocating
some new blocks.
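The effect of tying the accounting to a successful allocation can be modelled in a few lines of user-space C. This is a simplified sketch under stated assumptions: struct inode_model and ind_map_blocks() are hypothetical, and the `allocated` flag stands for "the blocks of this extent already exist on disk".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the delalloc accounting. */
struct inode_model {
    unsigned int reserved_data_blocks;
    bool allocated;   /* true once the extent's blocks exist on disk */
};

/* Stand-in for ext4_ind_map_blocks() with the create flag set. It returns
 * the number of blocks mapped. The reservation is consumed here, and only
 * when new blocks are actually allocated. */
unsigned int ind_map_blocks(struct inode_model *inode, unsigned int len)
{
    assert(inode != NULL);
    if (!inode->allocated) {
        assert(len <= inode->reserved_data_blocks);
        inode->reserved_data_blocks -= len;
        inode->allocated = true;
    }
    return len;
}
```

If writeback maps the extent, fails later with -ENOMEM, and then retries, the second call finds the blocks already allocated and leaves the reserved count alone, so the same extent is not subtracted twice.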

Fixes: 5f634d06 ("ext4: Fix quota accounting error with fallocate")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

13 Apr, 2023 · 2 commits

ext4: place buffer head allocation before handle start · 29d721e0

Committed by Jinke Han on Apr 13, 2023

stable inclusion
from stable-v5.10.150
commit 74d2a398d2d8c54d6468bc1e9da60ed9f3c4739f
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=74d2a398d2d8c54d6468bc1e9da60ed9f3c4739f

--------------------------------

commit d1052d23 upstream.

In our production environment, we encountered jbd hangs waiting for handles
to stop while several writers were doing memory reclaim for buffer head
allocation in the delayed allocation write path. Ext4 allocates buffer heads
while holding a transaction handle, which may be blocked for too long if
reclaim does not run smoothly. According to our bcc trace, the reclaim time
in buffer head allocation can reach 258s, and the jbd transaction commit
takes almost the same time. Apart from these extreme cases, we often see
delays of several seconds for cgroup memory reclaim on our servers. This is
more likely to happen in a docker environment.

One thing to note: buffer heads are allocated as often as pages, or more
often when the block size is less than the page size. Just as with page
cache allocation, we should place the buffer head allocation before
starting the handle.

Cc: stable@kernel.org
Signed-off-by: Jinke Han <hanjinke.666@bytedance.com>
Link: https://lore.kernel.org/r/20220903012429.22555-1-hanjinke.666@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

ext4: don't increase iversion counter for ea_inodes · 18254b7f

Committed by Lukas Czerner on Apr 13, 2023

stable inclusion
from stable-v5.10.150
commit 0e1764ad71abca735418fd596a82067074b59687
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I6D0XA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0e1764ad71abca735418fd596a82067074b59687

--------------------------------

commit 50f094a5 upstream.

ea_inodes are using i_version for storing part of the reference count so
we really need to leave it alone.

The problem can be reproduced by xfstest ext4/026 when iversion is
enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
inodes in ext4_mark_iloc_dirty().
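The guard described above can be sketched as follows. The struct, the flag value, and mark_dirty() are hypothetical stand-ins for the inode, EXT4_EA_INODE_FL, and the version bump in ext4_mark_iloc_dirty().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit standing in for EXT4_EA_INODE_FL. */
#define EA_INODE_FL 0x1u

struct inode_model {
    uint32_t flags;
    uint64_t i_version;   /* for EA inodes this stores part of a refcount */
};

/* Stand-in for the version bump done when an inode is marked dirty:
 * EA inodes are skipped so their reference count is left alone. */
void mark_dirty(struct inode_model *inode)
{
    assert(inode != NULL);
    if (!(inode->flags & EA_INODE_FL))
        inode->i_version++;
}
```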

Cc: stable@kernel.org
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Link: https://lore.kernel.org/r/20220824160349.39664-1-lczerner@redhat.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

12 Apr, 2023 · 1 commit

ext4: Fix i_disksize exceeding i_size problem in paritally written case · 1be2adf6

Committed by Zhihao Cheng on Apr 12, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I6SMBI
CVE: NA

Reference: https://www.spinics.net/lists/linux-ext4/msg88386.html

--------------------------------

The following process makes i_disksize exceed i_size:

generic_perform_write
 copied = iov_iter_copy_from_user_atomic(len) // copied < len
 ext4_da_write_end
 | ext4_update_i_disksize
 |  new_i_size = pos + copied;
 |  WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize) // update i_disksize
 | generic_write_end
 |  copied = block_write_end(copied, len) // copied = 0
 |   if (unlikely(copied < len))
 |    if (!PageUptodate(page))
 |     copied = 0;
 |  if (pos + copied > inode->i_size) // return false
 if (unlikely(copied == 0))
  goto again;
 if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
  status = -EFAULT;
  break;
 }

We get i_disksize greater than i_size here, which could trigger WARNING
check 'i_size_read(inode) < EXT4_I(inode)->i_disksize' while doing dio:

ext4_dio_write_iter
 iomap_dio_rw
  __iomap_dio_rw // return err, length is not aligned to 512
 ext4_handle_inode_extension
  WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize) // Oops

 WARNING: CPU: 2 PID: 2609 at fs/ext4/file.c:319
 CPU: 2 PID: 2609 Comm: aa Not tainted 6.3.0-rc2
 RIP: 0010:ext4_file_write_iter+0xbc7
 Call Trace:
  vfs_write+0x3b1
  ksys_write+0x77
  do_syscall_64+0x39

Fix it by updating 'copied' value before updating i_disksize just like
ext4_write_inline_data_end() does.
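The ordering that the fix relies on can be sketched in user-space C. Everything here is a simplified assumption: struct inode_model, effective_copied(), and da_write_end() are hypothetical, and the two size updates are collapsed into one function for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct inode_model {
    int64_t i_size;
    int64_t i_disksize;
};

/* Models the rule shown in the trace above: a short copy into a page
 * that is not up to date counts as zero bytes copied. */
unsigned int effective_copied(unsigned int copied, unsigned int len,
                              bool page_uptodate)
{
    assert(copied <= len);
    if (copied < len && !page_uptodate)
        return 0;
    return copied;
}

/* Settle the final 'copied' value first, then derive both sizes from it,
 * so i_disksize can never run ahead of i_size. */
void da_write_end(struct inode_model *inode, int64_t pos,
                  unsigned int copied, unsigned int len, bool page_uptodate)
{
    int64_t new_size;

    copied = effective_copied(copied, len, page_uptodate);
    new_size = pos + (int64_t)copied;
    if (new_size > inode->i_disksize)
        inode->i_disksize = new_size;
    if (new_size > inode->i_size)
        inode->i_size = new_size;
}
```

A partial copy of 100 out of 4096 bytes into a page that is not up to date leaves both sizes unchanged, so the invariant i_disksize <= i_size holds before the write is retried.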

Fetch a reproducer in [Link].

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217209
Fixes: 64769240 ("ext4: Add delayed allocation support in data=writeback mode")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

07 Feb, 2023 · 1 commit

ext4: fix use-after-free in ext4_orphan_cleanup · 21851e4c

Committed by Baokun Li on Feb 07, 2023

stable inclusion
from stable-v5.10.163
commit 7223d5e75f26352354ea2c0ccf8b579821b52adf
category: bugfix
bugzilla: 187904,https://gitee.com/openeuler/kernel/issues/I6BJAR
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7223d5e75f26352354ea2c0ccf8b579821b52adf

--------------------------------

commit a71248b1 upstream.

I caught an issue as follows:

==================================================================
 BUG: KASAN: use-after-free in __list_add_valid+0x28/0x1a0
 Read of size 8 at addr ffff88814b13f378 by task mount/710

 CPU: 1 PID: 710 Comm: mount Not tainted 6.1.0-rc3-next #370
 Call Trace:
  <TASK>
  dump_stack_lvl+0x73/0x9f
  print_report+0x25d/0x759
  kasan_report+0xc0/0x120
  __asan_load8+0x99/0x140
  __list_add_valid+0x28/0x1a0
  ext4_orphan_cleanup+0x564/0x9d0 [ext4]
  __ext4_fill_super+0x48e2/0x5300 [ext4]
  ext4_fill_super+0x19f/0x3a0 [ext4]
  get_tree_bdev+0x27b/0x450
  ext4_get_tree+0x19/0x30 [ext4]
  vfs_get_tree+0x49/0x150
  path_mount+0xaae/0x1350
  do_mount+0xe2/0x110
  __x64_sys_mount+0xf0/0x190
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x63/0xcd
  </TASK>
 [...]
==================================================================

Above issue may happen as follows:
-------------------------------------
ext4_fill_super
  ext4_orphan_cleanup
   --- loop1: assume last_orphan is 12 ---
    list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan)
    ext4_truncate --> return 0
      ext4_inode_attach_jinode --> return -ENOMEM
    iput(inode) --> free inode<12>
   --- loop2: last_orphan is still 12 ---
    list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan);
    // use inode<12> and trigger UAF

To solve this issue, we need to propagate the return value of
ext4_inode_attach_jinode() appropriately.
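A minimal sketch of propagating the error, assuming hypothetical stand-ins attach_jinode() and truncate_model() for ext4_inode_attach_jinode() and ext4_truncate():

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for ext4_inode_attach_jinode(); 'fail' injects -ENOMEM. */
int attach_jinode(bool fail)
{
    return fail ? -ENOMEM : 0;
}

/* Stand-in for ext4_truncate(). In the trace above the attach failure
 * was swallowed and 0 was returned; here it is passed to the caller. */
int truncate_model(bool attach_fails)
{
    int err = attach_jinode(attach_fails);

    if (err)
        return err;     /* the caller can now see that truncate failed */
    /* ... the actual truncate work would follow here ... */
    return 0;
}
```

Once the caller sees a non-zero return, it can handle the failure instead of treating the orphan as processed and looping over the same inode again.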
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20221102080633.1630225-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

18 Nov, 2022 · 2 commits

ext4: fix extent status tree race in writeback error recovery path · 311a6951

Committed by Eric Whitney on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit e8c747496f23e2cf152899e35de2f25ce647d72b
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e8c747496f23e2cf152899e35de2f25ce647d72b

--------------------------------

commit 7f0d8e1d upstream.

A race can occur in the unlikely event ext4 is unable to allocate a
physical cluster for a delayed allocation in a bigalloc file system
during writeback.  Failure to allocate a cluster forces error recovery
that includes a call to mpage_release_unused_pages().  That function
removes any corresponding delayed allocated blocks from the extent
status tree.  If a new delayed write is in progress on the same cluster
simultaneously, resulting in the addition of a new extent containing
one or more blocks in that cluster to the extent status tree, delayed
block accounting can be thrown off if that delayed write then encounters
a similar cluster allocation failure during future writeback.

Write lock the i_data_sem in mpage_release_unused_pages() to fix this
problem.  Ext4's block/cluster accounting code for bigalloc relies on
i_data_sem for mutual exclusion, as is found in the delayed write path,
and the locking in mpage_release_unused_pages() is missing.

Cc: stable@kernel.org
Reported-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Link: https://lore.kernel.org/r/20220615160530.1928801-1-enwlinux@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

ext4: fix warning in ext4_iomap_begin as race between bmap and write · 4f862919

Committed by Ye Bin on Nov 18, 2022

stable inclusion
from stable-v5.10.137
commit e1682c7171a6c0ff576fe8116b8cba5b8f538b94
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e1682c7171a6c0ff576fe8116b8cba5b8f538b94

--------------------------------

commit 51ae846c upstream.

We got issue as follows:
------------[ cut here ]------------
WARNING: CPU: 3 PID: 9310 at fs/ext4/inode.c:3441 ext4_iomap_begin+0x182/0x5d0
RIP: 0010:ext4_iomap_begin+0x182/0x5d0
RSP: 0018:ffff88812460fa08 EFLAGS: 00010293
RAX: ffff88811f168000 RBX: 0000000000000000 RCX: ffffffff97793c12
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000003
RBP: ffff88812c669160 R08: ffff88811f168000 R09: ffffed10258cd20f
R10: ffff88812c669077 R11: ffffed10258cd20e R12: 0000000000000001
R13: 00000000000000a4 R14: 000000000000000c R15: ffff88812c6691ee
FS:  00007fd0d6ff3740(0000) GS:ffff8883af180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fd0d6dda290 CR3: 0000000104a62000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 iomap_apply+0x119/0x570
 iomap_bmap+0x124/0x150
 ext4_bmap+0x14f/0x250
 bmap+0x55/0x80
 do_vfs_ioctl+0x952/0xbd0
 __x64_sys_ioctl+0xc6/0x170
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Above issue may happen as follows:
          bmap                    write
bmap
  ext4_bmap
    iomap_bmap
      ext4_iomap_begin
                            ext4_file_write_iter
			      ext4_buffered_write_iter
			        generic_perform_write
				  ext4_da_write_begin
				    ext4_da_write_inline_data_begin
				      ext4_prepare_inline_data
				        ext4_create_inline_data
					  ext4_set_inode_flag(inode,
						EXT4_INODE_INLINE_DATA);
      if (WARN_ON_ONCE(ext4_has_inline_data(inode))) ->trigger bug_on

To solve the above issue, hold the inode lock in ext4_bmap().

Signed-off-by: Ye Bin <yebin10@huawei.com>
Link: https://lore.kernel.org/r/20220617013935.397596-1-yebin10@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>

03 Nov, 2022 · 1 commit

ext4: add EXT4_IGET_BAD flag to prevent unexpected bad inode · dcc87dad

Committed by Baokun Li on Nov 03, 2022

hulk inclusion
category: bugfix
bugzilla: 187821,https://gitee.com/openeuler/kernel/issues/I5X9U0
CVE: NA

--------------------------------

There are many places that will get unhappy (and crash) when ext4_iget()
returns a bad inode. However, when the boot loader inode is fetched, a bad
inode is allowed to be returned, because that inode may not be initialized.
This mechanism can be used to bypass some checks and cause a panic. To solve
this problem, we add a special iget flag, EXT4_IGET_BAD. Only with this flag
do we return a bad inode from ext4_iget(); otherwise we always return an
error code if the inode is bad (suggested by Jan Kara).
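The flag check can be sketched like this. The flag values, struct inode_model, and iget_model() are hypothetical, and -EIO is chosen only for illustration; the source does not state which error code is returned.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values standing in for the ext4_iget() flags. */
#define IGET_NORMAL 0x0u
#define IGET_BAD    0x1u   /* caller explicitly accepts a bad inode */

struct inode_model {
    bool bad;
};

/* Stand-in for the tail of ext4_iget(): a bad inode is handed back only
 * when the caller asked for it; otherwise an error is returned. */
int iget_model(const struct inode_model *inode, unsigned int flags)
{
    if (inode->bad && !(flags & IGET_BAD))
        return -EIO;    /* illustrative error code, an assumption */
    return 0;
}
```

Under this scheme only the boot loader inode path would pass the flag, so every other caller gets an error rather than a bad inode it is not prepared to handle.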
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

04 Aug, 2022 · 1 commit

ext4: Remove EA inode entry from mbcache on inode eviction · bd9cf9ac

Committed by Jan Kara on Aug 04, 2022

hulk inclusion
category: bugfix
bugzilla: 186975, https://gitee.com/openeuler/kernel/issues/I5HT6F
CVE: NA

Reference: https://patchwork.ozlabs.org/project/linux-ext4/list/?series=309169

--------------------------------

Currently we remove EA inode from mbcache as soon as its xattr refcount
drops to zero. However there can be pending attempts to reuse the inode
and thus refcount handling code has to handle the situation when
refcount increases from zero anyway. So save some work and just keep EA
inode in mbcache until it is getting evicted. At that moment we are sure
following iget() of EA inode will fail anyway (or wait for eviction to
finish and load things from the disk again) and so removing mbcache
entry at that moment is fine and simplifies the code a bit.

CC: stable@vger.kernel.org
Fixes: 82939d79 ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

26 Jul, 2022 · 2 commits

ext4: limit length to bitmap_maxbytes - blocksize in punch_hole · aaf9e2fa

Committed by Tadeusz Struk on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit 22c450d39f8922ae26de459cf4f83b2b294f207e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=22c450d39f8922ae26de459cf4f83b2b294f207e

--------------------------------

commit 2da37622 upstream.

Syzbot found an issue [1] in ext4_fallocate().
The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul,
and offset 0x1000000ul, which, when added together exceed the
bitmap_maxbytes for the inode. This triggers a BUG in
ext4_ind_remove_space(). According to the comments in this function
the 'end' parameter needs to be one block after the last block to be
removed. In the case when the BUG is triggered it points to the last
block. Modify the ext4_punch_hole() function and add a constraint that
caps the length to satisfy the one-before-last-block requirement.
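The cap named in the commit title (bitmap_maxbytes - blocksize) can be written as a small helper. This is a sketch, not the kernel code: cap_punch_length() is a hypothetical name, and the values used in the note below are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Caps 'length' so that offset + length never exceeds
 * bitmap_maxbytes - blocksize. */
int64_t cap_punch_length(int64_t offset, int64_t length,
                         int64_t bitmap_maxbytes, int64_t blocksize)
{
    int64_t max_length = bitmap_maxbytes - blocksize;

    assert(offset >= 0 && length >= 0 && offset <= max_length);
    /* Compare against the remaining room to avoid overflowing the sum. */
    if (length > max_length - offset)
        length = max_length - offset;
    return length;
}
```

For example, with a made-up limit of 65536 bytes and a 4096-byte block, a request at offset 4096 for 100000 bytes is cut to 57344 bytes, so the range ends one block short of the limit.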

LINK: [1] https://syzkaller.appspot.com/bug?id=b80bd9cf348aac724a4f4dff251800106d721331
LINK: [2] https://syzkaller.appspot.com/text?tag=ReproC&x=14ba0238700000

Fixes: a4bb6b64 ("ext4: enable "punch hole" functionality")
Reported-by: syzbot+7a806094edd5d07ba029@syzkaller.appspotmail.com
Signed-off-by: Tadeusz Struk <tadeusz.struk@linaro.org>
Link: https://lore.kernel.org/r/20220331200515.153214-1-tadeusz.struk@linaro.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

ext4: fix fallocate to use file_modified to update permissions consistently · 97cada6b

Committed by Darrick J. Wong on Jul 26, 2022

stable inclusion
from stable-v5.10.113
commit f6038d43b25bba1cd50d2a77e207f6550aee9954
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f6038d43b25bba1cd50d2a77e207f6550aee9954

--------------------------------

commit ad5cd4f4 upstream.

Since the initial introduction of (posix) fallocate back at the turn of
the century, it has been possible to use this syscall to change the
user-visible contents of files.  This can happen by extending the file
size during a preallocation, or through any of the newer modes (punch,
zero, collapse, insert range).  Because the call can be used to change
file contents, we should treat it like we do any other modification to a
file -- update the mtime, and drop set[ug]id privileges/capabilities.

The VFS function file_modified() does all this for us if we pass it a
locked inode, so let's make fallocate drop permissions correctly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220308185043.GA117678@magnolia
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

06 Jul, 2022 · 2 commits

ext4: correct the misjudgment in ext4_iget_extra_inode · b5f5d3d0

Committed by Baokun Li on Jul 06, 2022

hulk inclusion
category: bugfix
bugzilla: 186866, https://gitee.com/openeuler/kernel/issues/I5DTBL
CVE: NA

--------------------------------

Use the EXT4_INODE_HAS_XATTR_SPACE macro to more accurately
determine whether the inode has xattr space.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

ext4: don't BUG if someone dirty pages without asking ext4 first · e80175ba

Committed by Theodore Ts'o on Jul 06, 2022

stable inclusion
from stable-v5.10.110
commit 330d0e44fc5a47c27df958ecdd4693a3cb1d8b81
bugzilla: https://gitee.com/openeuler/kernel/issues/I574AL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=330d0e44fc5a47c27df958ecdd4693a3cb1d8b81

--------------------------------

[ Upstream commit cc509574 ]

[un]pin_user_pages_remote is dirtying pages without properly warning
the file system in advance.  A related race was noted by Jan Kara in
2018[1]; however, more recently instead of it being a very hard-to-hit
race, it could be reliably triggered by process_vm_writev(2) which was
discovered by Syzbot[2].

This is technically a bug in mm/gup.c, but arguably ext4 is fragile in
that if some other kernel subsystem dirties pages without properly
notifying the file system using page_mkwrite(), ext4 will BUG, while
other file systems will not BUG (although data will still be lost).

So instead of crashing with a BUG, issue a warning (since there may be
potential data loss) and just mark the page as clean to avoid
unprivileged denial of service attacks until the problem can be
properly fixed.  More discussion and background can be found in the
thread starting at [2].

[1] https://lore.kernel.org/linux-mm/20180103100430.GE4911@quack2.suse.cz
[2] https://lore.kernel.org/r/Yg0m6IjcNmfaSokM@google.com

Reported-by: syzbot+d59332e2db681cf18f0318a06e994ebbb529a8db@syzkaller.appspotmail.com
Reported-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/YiDS9wVfq4mM2jGK@mit.edu
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yu Liao <liaoyu15@huawei.com>
Reviewed-by: Wei Li <liwei391@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

23 May, 2022 · 1 commit

ext4: Fix warning in ext4_da_release_space · 782a6ba7

Committed by Ye Bin on May 23, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I58KLD
CVE: NA

---------------------------

We got issue as follows:
WARNING: CPU: 2 PID: 1936 at fs/ext4/inode.c:1511 ext4_da_release_space+0x1b9/0x266
Modules linked in:
CPU: 2 PID: 1936 Comm: dd Not tainted 5.10.0+ #344
RIP: 0010:ext4_da_release_space+0x1b9/0x266
RSP: 0018:ffff888127307848 EFLAGS: 00010292
RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff843f67cc
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffed1024e60ed9
RBP: ffff888124dc8140 R08: 0000000000000083 R09: ffffed1075da6d23
R10: ffff8883aed36917 R11: ffffed1075da6d22 R12: ffff888124dc83f0
R13: ffff888124dc844c R14: ffff888124dc8168 R15: 000000000000000c
FS:  00007f6b7247d740(0000) GS:ffff8883aed00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ffc1a0b7dd8 CR3: 00000001065ce000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 ext4_es_remove_extent+0x187/0x230
 mpage_release_unused_pages+0x3af/0x470
 ext4_writepages+0xb9b/0x1160
 do_writepages+0xbb/0x1e0
 __filemap_fdatawrite_range+0x1b1/0x1f0
 file_write_and_wait_range+0x80/0xe0
 ext4_sync_file+0x13d/0x800
 vfs_fsync_range+0x75/0x140
 do_fsync+0x4d/0x90
 __x64_sys_fsync+0x1d/0x30
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

The above issue may happen as follows:
	process1                        process2
ext4_da_write_begin
  ext4_da_reserve_space
    ext4_es_insert_delayed_block[1/1]
                                    ext4_da_write_begin
				      ext4_es_insert_delayed_block[0/1]
ext4_writepages
  ****Delayed block allocation failed****
  mpage_release_unused_pages
    ext4_es_remove_extent[1/1]
      ext4_da_release_space [reserved 0]

ext4_da_write_begin
  ext4_es_scan_clu(inode, &ext4_es_is_delonly, lblk)
   ->As there exist [0, 1] extent, so will return true
                                   ext4_writepages
				   ****Delayed block allocation failed****
                                     mpage_release_unused_pages
				       ext4_es_remove_extent[0/1]
				         ext4_da_release_space [reserved 1]
					   ei->i_reserved_data_blocks [1->0]

  ext4_es_insert_delayed_block[1/1]

ext4_writepages
  ****Delayed block allocation failed****
  mpage_release_unused_pages
  ext4_es_remove_extent[1/1]
   ext4_da_release_space [reserved 1]
    ei->i_reserved_data_blocks[0, -1]
    ->As ei->i_reserved_data_blocks already is zero but to_free is 1,
    will trigger warning.

To solve the above issue, introduce i_clu_lock to protect the insertion
and removal of delayed blocks in cluster delayed allocation mode.

Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

21 May, 2022 · 2 commits

ext4: fix warning in ext4_handle_inode_extension · d3b4c686

Committed by Ye Bin on May 21, 2022

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I58A7W?from=project-issue
CVE: N/A

---------------------------

We got issue as follows:
EXT4-fs error (device loop0) in ext4_reserve_inode_write:5741: Out of memory
EXT4-fs error (device loop0): ext4_setattr:5462: inode #13: comm syz-executor.0: mark_inode_dirty error
EXT4-fs error (device loop0) in ext4_setattr:5519: Out of memory
EXT4-fs error (device loop0): ext4_ind_map_blocks:595: inode #13: comm syz-executor.0: Can't allocate blocks for non-extent mapped inodes with bigalloc
------------[ cut here ]------------
WARNING: CPU: 1 PID: 4361 at fs/ext4/file.c:301 ext4_file_write_iter+0x11c9/0x1220
Modules linked in:
CPU: 1 PID: 4361 Comm: syz-executor.0 Not tainted 5.10.0+ #1
RIP: 0010:ext4_file_write_iter+0x11c9/0x1220
RSP: 0018:ffff924d80b27c00 EFLAGS: 00010282
RAX: ffffffff815a3379 RBX: 0000000000000000 RCX: 000000003b000000
RDX: ffff924d81601000 RSI: 00000000000009cc RDI: 00000000000009cd
RBP: 000000000000000d R08: ffffffffbc5a2c6b R09: 0000902e0e52a96f
R10: ffff902e2b7c1b40 R11: ffff902e2b7c1b40 R12: 000000000000000a
R13: 0000000000000001 R14: ffff902e0e52aa10 R15: ffffffffffffff8b
FS:  00007f81a7f65700(0000) GS:ffff902e3bc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600400 CR3: 000000012db88001 CR4: 00000000003706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 do_iter_readv_writev+0x2e5/0x360
 do_iter_write+0x112/0x4c0
 do_pwritev+0x1e5/0x390
 __x64_sys_pwritev2+0x7e/0xa0
 do_syscall_64+0x37/0x50
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Above issue may happen as follows:
Assume
inode.i_size=4096
EXT4_I(inode)->i_disksize=4096

step 1: set inode->i_size = 8192
ext4_setattr
  if (attr->ia_size != inode->i_size)
    EXT4_I(inode)->i_disksize = attr->ia_size;
    rc = ext4_mark_inode_dirty
       ext4_reserve_inode_write
          ext4_get_inode_loc
            __ext4_get_inode_loc
              sb_getblk --> return -ENOMEM
   ...
   if (!error)  ->will not update i_size
     i_size_write(inode, attr->ia_size);
Now:
inode.i_size=4096
EXT4_I(inode)->i_disksize=8192

step 2: Direct write 4096 bytes
ext4_file_write_iter
 ext4_dio_write_iter
   iomap_dio_rw ->return error
 if (extend)
   ext4_handle_inode_extension
     WARN_ON_ONCE(i_size_read(inode) < EXT4_I(inode)->i_disksize);
-> Then the warning is triggered.

To solve the above issue, if marking the inode dirty fails in ext4_setattr,
restore 'EXT4_I(inode)->i_disksize' to its old value.
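
The rollback described above can be sketched in user-space C. This is a
minimal model under stated assumptions, not the kernel patch: struct
demo_inode, demo_mark_inode_dirty() and demo_setattr_size() are hypothetical
stand-ins for the inode, ext4_mark_inode_dirty() and the size-changing branch
of ext4_setattr().

```c
#include <errno.h>

/* Hypothetical stand-in for the two sizes ext4 tracks per inode. */
struct demo_inode {
    long long i_size;      /* in-memory size, as in inode->i_size */
    long long i_disksize;  /* on-disk size, as in EXT4_I(inode)->i_disksize */
};

/* Hypothetical stand-in for ext4_mark_inode_dirty(); fails on request. */
static int demo_mark_inode_dirty(struct demo_inode *inode, int fail)
{
    (void)inode;
    return fail ? -ENOMEM : 0;
}

/* Model of the size-changing branch of ext4_setattr(). */
static int demo_setattr_size(struct demo_inode *inode, long long new_size,
                             int fail)
{
    long long old_disksize = inode->i_disksize;
    int rc;

    inode->i_disksize = new_size;
    rc = demo_mark_inode_dirty(inode, fail);
    if (rc) {
        /* Roll back so that i_disksize never exceeds i_size. */
        inode->i_disksize = old_disksize;
        return rc;
    }
    inode->i_size = new_size;
    return 0;
}
```

With the rollback in place, a failed demo_setattr_size() leaves both sizes at
4096, so the i_size < i_disksize condition checked by the WARN_ON_ONCE cannot
arise from this path.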
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Li Nan <linan122@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d3b4c686

ext4: fix race condition between ext4_write and ext4_convert_inline_data · 5e347d13

Committed by Baokun Li on May 21, 2022

hulk inclusion
category: bugfix
bugzilla: 186638, https://gitee.com/openeuler/kernel/issues/I57PM8
CVE: NA

--------------------------------

Hulk Robot reported a BUG_ON:
 ==================================================================
 EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
 block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
 kernel BUG at fs/ext4/ext4_jbd2.c:53!
 invalid opcode: 0000 [#1] SMP KASAN PTI
 CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
 RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
 RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
 [...]
 Call Trace:
  ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
  generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
  ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
  ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
  do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
  do_iter_write+0x107/0x430 fs/read_write.c:861
  vfs_writev fs/read_write.c:934 [inline]
  do_pwritev+0x1e5/0x380 fs/read_write.c:1031
 [...]
 ==================================================================

The above issue may happen as follows:
           cpu1                     cpu2
__________________________|__________________________
do_pwritev
 vfs_writev
  do_iter_write
   ext4_file_write_iter
    ext4_buffered_write_iter
     generic_perform_write
      ext4_da_write_begin
                           vfs_fallocate
                            ext4_fallocate
                             ext4_convert_inline_data
                              ext4_convert_inline_data_nolock
                               ext4_destroy_inline_data_nolock
                                clear EXT4_STATE_MAY_INLINE_DATA
                               ext4_map_blocks
                                ext4_ext_map_blocks
                                 ext4_mb_new_blocks
                                  ext4_mb_regular_allocator
                                   ext4_mb_good_group_nolock
                                    ext4_mb_init_group
                                     ext4_mb_init_cache
                                      ext4_mb_generate_buddy  --> error
       ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
                                ext4_restore_inline_data
                                 set EXT4_STATE_MAY_INLINE_DATA
       ext4_block_write_begin
      ext4_da_write_end
       ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
       ext4_write_inline_data_end
        handle=NULL
        ext4_journal_stop(handle)
         __ext4_journal_stop
          ext4_put_nojournal(handle)
           ref_cnt = (unsigned long)handle
           BUG_ON(ref_cnt == 0)  ---> BUG_ON

ext4_convert_inline_data holds xattr_sem, while generic_perform_write
holds i_rwsem. The two paths therefore take different locks and can run
concurrently.

To solve the above issue, we add inode_lock() for ext4_convert_inline_data().
At the same time, move ext4_convert_inline_data() in front of
ext4_punch_hole() and remove the similar handling from ext4_punch_hole().

Fixes: 0c8d414f ("ext4: let fallocate handle inline data correctly")
Cc: stable@vger.kernel.org
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

5e347d13

27 Apr, 2022 3 commits

ext4: fast commit may miss tracking unwritten range during ftruncate · c98bcf2d

Committed by Xin Yin on Apr 27, 2022

stable inclusion
from stable-v5.10.94
commit f26b24b4c115f9c8fe8defd2c158420d30b7af0f
bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f26b24b4c115f9c8fe8defd2c158420d30b7af0f

--------------------------------

commit 9725958b upstream.

If FALLOC_FL_KEEP_SIZE is used to allocate an unwritten range at the end of
a file, inode->i_size will not include the unwritten range. When ftruncate
is called with fast commit enabled, it will fail to track the unwritten
range.

Change to track the full range during ftruncate.
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211223032337.5198-3-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

c98bcf2d

ext4: fix fast commit may miss tracking range for FALLOC_FL_ZERO_RANGE · cf74d435

Committed by Xin Yin on Apr 27, 2022

stable inclusion
from stable-v5.10.94
commit e4221629d5e1479db400d8a4cbf865c65a457630
bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e4221629d5e1479db400d8a4cbf865c65a457630

--------------------------------

commit 5e4d0eba upstream.

When fallocate is called with FALLOC_FL_ZERO_RANGE to set an already
initialized range to unwritten, fast commit will not track the range for
this change if the range is aligned to the block size.

Also track the range for unwritten ranges in ext4_map_blocks().
Signed-off-by: Xin Yin <yinxin.x@bytedance.com>
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211221022839.374606-1-yinxin.x@bytedance.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

cf74d435

ext4: initialize err_blk before calling __ext4_get_inode_loc · 21c4a627

Committed by Harshad Shirwadkar on Apr 27, 2022

stable inclusion
from stable-v5.10.94
commit 720508dd118d04035875823f44bcd27388ff39b2
bugzilla: https://gitee.com/openeuler/kernel/issues/I531X9

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=720508dd118d04035875823f44bcd27388ff39b2

--------------------------------

commit c27c29c6 upstream.

It is not guaranteed that __ext4_get_inode_loc will definitely set
err_blk pointer when it returns EIO. To avoid using uninitialized
variables, let's first set err_blk to 0.
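
The pattern can be sketched in plain C as follows. This is an illustration
under assumptions rather than the actual patch: demo_get_inode_loc() is a
hypothetical callee that may return -EIO without writing its out-parameter,
and demo_report_block() is a hypothetical caller.

```c
#include <errno.h>

typedef unsigned long long demo_blk_t;

/*
 * Hypothetical lookup that may fail with -EIO without writing *err_blk,
 * which is the situation the commit message describes.
 */
static int demo_get_inode_loc(int fail_early, demo_blk_t *err_blk)
{
    if (fail_early)
        return -EIO;        /* *err_blk is left untouched */
    *err_blk = 1234;        /* arbitrary block number for the sketch */
    return -EIO;
}

/* Returns the block to report in the error message, 0 meaning "unknown". */
static demo_blk_t demo_report_block(int fail_early)
{
    demo_blk_t err_blk = 0; /* initialize before the call */
    int ret = demo_get_inode_loc(fail_early, &err_blk);

    return ret == -EIO ? err_blk : 0;
}
```

Because err_blk starts at 0, the caller reads a defined value on both failure
paths instead of stack garbage.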
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20211201163421.2631661-1-harshads@google.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Acked-by: Xie XiuQi <xiexiuqi@huawei.com>

21c4a627

22 Jan, 2022 1 commit

ext4: fix an use-after-free issue about data=journal writeback mode · e0f43056

Committed by Zhang Yi on Jan 22, 2022

mainline inclusion
from mainline-5.17-rc1
commit 5c48a7df
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4RN96
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5c48a7df91499e371ef725895b2e2d21a126e227

---------------------------

Our syzkaller reported a use-after-free issue: the freed buffer_head on
the writeback page is accessed in __ext4_journalled_writepage(). The
problem is that if a truncate races with the data=journalled writeback
procedure, the writeback length can become zero and bget_one() refuses
to take the buffer_head's refcount. The truncate procedure then releases
the buffer once we drop the page lock, and finally the last
ext4_walk_page_buffers() triggers the use-after-free.

sync                               truncate
ext4_sync_file()
 file_write_and_wait_range()
                                   ext4_setattr(0)
                                    inode->i_size = 0
  ext4_writepage()
   len = 0
   __ext4_journalled_writepage()
    page_bufs = page_buffers(page)
    ext4_walk_page_buffers(bget_one) <- does not get refcount
                                    do_invalidatepage()
                                      free_buffer_head()
    ext4_walk_page_buffers(page_bufs) <- trigger use-after-free

After commit bdf96838 ("ext4: fix race between truncate and
__ext4_journalled_writepage()"), we already handle the racing
case, so bget_one() and bput_one() are not needed. This patch
simply removes these hunks and rechecks i_size to make it safe.

Fixes: bdf96838 ("ext4: fix race between truncate and __ext4_journalled_writepage()")
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211225090937.712867-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflict:
	fs/ext4/inode.c
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Ye bin <yebin10@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e0f43056

03 Dec, 2021 1 commit

ext4: stop IO for page without buffer_head · d7e879dd

Committed by yangerkun on Dec 03, 2021

hulk inclusion
category: bugfix
bugzilla: 185810, https://gitee.com/openeuler/kernel/issues/I4JX1G
CVE: NA

---------------------------

dio_bio_complete will set the page dirty without considering whether a
valid buffer_head is still attached to the page. This causes problems when
ext4 tries to write back this page. For ext4, we fix it by skipping
writeback of a page without a buffer_head.

[1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
[2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
[3] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fc1d8e7cca2daa18
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

Conflicts:
	fs/ext4/inode.c
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d7e879dd

15 Nov, 2021 5 commits

ext4: fix reserved space counter leakage · e9bfcc13

Committed by Jeffle Xu on Nov 15, 2021

stable inclusion
from stable-5.10.71
commit 9ccf35492b084ecbc916761a5c6f42599450f013
bugzilla: 182981 https://gitee.com/openeuler/kernel/issues/I4I3KD

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9ccf35492b084ecbc916761a5c6f42599450f013

--------------------------------

commit 6fed8395 upstream.

When ext4_insert_delayed_block() receives and recovers from an error from
ext4_es_insert_delayed_block(), e.g., ENOMEM, it does not release the
space it has reserved for that block insertion as it should. One effect
of this bug is that s_dirtyclusters_counter is not decremented and
remains incorrectly elevated until the file system has been unmounted.
This can result in premature ENOSPC returns and apparent loss of free
space.

Another effect of this bug is that
/sys/fs/ext4/<dev>/delayed_allocation_blocks can remain non-zero even
after syncfs has been executed on the filesystem.

Besides, add check for s_dirtyclusters_counter when inode is going to be
evicted and freed. s_dirtyclusters_counter can still keep non-zero until
inode is written back in .evict_inode(), and thus the check is delayed
to .destroy_inode().
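
The leak and its fix can be modelled with a plain counter. The sketch below
is an assumption-laden illustration, not kernel code: struct demo_sb and the
demo_* helpers are hypothetical, with dirty_clusters standing in for
s_dirtyclusters_counter and demo_es_insert() for
ext4_es_insert_delayed_block().

```c
#include <errno.h>

/* Hypothetical counter standing in for s_dirtyclusters_counter. */
struct demo_sb {
    long dirty_clusters;
};

static int demo_reserve(struct demo_sb *sb)
{
    sb->dirty_clusters++;
    return 0;
}

static void demo_release(struct demo_sb *sb)
{
    sb->dirty_clusters--;
}

/* Hypothetical stand-in for ext4_es_insert_delayed_block(). */
static int demo_es_insert(int fail)
{
    return fail ? -ENOMEM : 0;
}

/* Reserve space, try the insertion, and undo the reservation on failure. */
static int demo_insert_delayed_block(struct demo_sb *sb, int fail)
{
    int ret = demo_reserve(sb);

    if (ret)
        return ret;
    ret = demo_es_insert(fail);
    if (ret)
        demo_release(sb);   /* without this the counter stays elevated */
    return ret;
}
```

A failed insertion leaves dirty_clusters unchanged, so repeated failures no
longer eat into the apparent free space.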

Fixes: 51865fda ("ext4: let ext4 maintain extent status tree")
Cc: stable@kernel.org
Suggested-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210823061358.84473-1-jefflexu@linux.alibaba.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e9bfcc13

ext4: drop unnecessary journal handle in delalloc write · e86ccd92

Committed by Zhang Yi on Nov 15, 2021

mainline inclusion
from mainline-5.15-rc4
commit cc883236
category: perf
bugzilla: 182881 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc883236b79297f6266ca6f4e7f24f3fd3c736c1

---------------------------

After we factor out the inline data write procedure from
ext4_da_write_end(), we don't need to start a journal handle for either
buffer overwrite or append-write. If we need to update i_disksize,
mark_inode_dirty() does start a handle and update the inode buffer.
So we can just remove all the journal handle code in the delalloc
write procedure.

After this patch, we get a significant performance improvement. Below
is the Unixbench comparison data tested on my machine with an 'Intel Xeon
Gold 5120' CPU and an NVMe SSD backend.

Test cmd:

  ./Run -c 56 -i 3 fstime fsbuffer fsdisk

Before this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     422965.0   1068.1
  File Copy 256 bufsize 500 maxblocks         1655.0     105077.0    634.9
  File Copy 4096 bufsize 8000 maxblocks       5800.0    1429092.0   2464.0
                                                                    ======
  System Benchmarks Index Score (Partial Only)                      1186.6

After this patch:

  System Benchmarks Partial Index           BASELINE       RESULT   INDEX
  File Copy 1024 bufsize 2000 maxblocks       3960.0     732716.0   1850.3
  File Copy 256 bufsize 500 maxblocks         1655.0     184940.0   1117.5
  File Copy 4096 bufsize 8000 maxblocks       5800.0    2427152.0   4184.7
                                                                    ======
  System Benchmarks Index Score (Partial Only)                      2053.0
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-5-yi.zhang@huawei.com
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

e86ccd92

ext4: factor out write end code of inline file · 245d0ae1

Committed by Zhang Yi on Nov 15, 2021

mainline inclusion
from mainline-5.15-rc4
commit 6984aef5
category: perf
bugzilla: 182881 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6984aef59814fb5c47b0e30c56e101186b5ebf8c

---------------------------

Now that the inline_data file write end procedure is mixed into the
common write end functions, the code is unclear. Factor it out and do
some cleanup. This patch also drops ext4_da_write_inline_data_end()
and switches to ext4_write_inline_data_end() instead, because we also
need to do the same error processing if we fail to write data into the
inline entry.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-4-yi.zhang@huawei.com

Conflicts:
	fs/ext4/inline.c
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

245d0ae1

ext4: correct the error path of ext4_write_inline_data_end() · 3e296ed2

Committed by Zhang Yi on Nov 15, 2021

mainline inclusion
from mainline-5.15-rc4
commit 55ce2f64
category: perf
bugzilla: 182881 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=55ce2f649b9e88111270333a8127e23f4f8f42d7

---------------------------

The current error path of ext4_write_inline_data_end() is not correct.

Firstly, it should pass out the error value if ext4_get_inode_loc()
fails, or else it could trigger an infinite loop if we inject an error
here. Secondly, it is better to add the inode to the orphan list if
ext4_journal_stop() fails, otherwise we could not restore the inline xattr
entry after a power failure. Finally, we need to reset the 'ret' value if
ext4_write_inline_data_end() returns success in ext4_write_end() and
ext4_journalled_write_end(), otherwise we could not get the error return
value of ext4_journal_stop().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-3-yi.zhang@huawei.com
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

3e296ed2

ext4: check and update i_disksize properly · 3608fd2c

Committed by Zhang Yi on Nov 15, 2021

mainline inclusion
from mainline-5.15-rc4
commit 4df031ff
category: perf
bugzilla: 182881 https://gitee.com/openeuler/kernel/issues/I4DDEL

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4df031ff5876d94b48dd9ee486ba5522382a06b2

---------------------------

After commit 3da40c7b ("ext4: only call ext4_truncate when size <=
isize"), i_disksize can always be updated to i_size in ext4_setattr(),
and we can be sure that i_disksize <= i_size since we hold the inode lock,
and if i_disksize < i_size there are delalloc writes pending in the range
up to i_size. If the end of the current write is <= i_size, there's no
need to touch i_disksize since writeback will push i_disksize up to
i_size eventually. So we can switch to checking i_size instead of
i_disksize in ext4_da_write_end() when writing to the end of the file.
We can also remove ext4_mark_inode_dirty(), because we defer
inode dirtying to generic_write_end() or ext4_da_write_inline_data_end().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-2-yi.zhang@huawei.com
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

3608fd2c

19 Oct, 2021 5 commits

ext4: prevent getting empty inode buffer · a3e3ff2d

Committed by Zhang Yi on Oct 19, 2021

hulk inclusion
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------

In ext4_get_inode_loc(), we may skip I/O and get a zeroed but uptodate
inode buffer when the inode monopolizes an inode block, for performance
reasons. In most cases, ext4_mark_iloc_dirty() will fill the inode
buffer and make it fine, but we could miss this call if something bad
happens. Finally, __ext4_get_inode_loc_noinmem() may get an
empty inode buffer and trigger an ext4 error.

For example, if we remove a nonexistent xattr on inode A,
ext4_xattr_set_handle() will return ENODATA before invoking
ext4_mark_iloc_dirty(), leaving an uptodate but zeroed buffer. We
will get a checksum error message in ext4_iget() when getting the inode again.

  EXT4-fs error (device sda): ext4_lookup:1784: inode #131074: comm cat: iget: checksum invalid

Even worse, if we allocate another inode B in the same inode block, it
will corrupt inode A on disk when inode B is written back.

So this patch initializes the inode buffer by filling in the in-mem inode
contents if we skip the read I/O, ensuring that the buffer is really uptodate.
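
The idea can be sketched with a byte buffer. This is a simplified model under
assumptions, not the ext4 implementation: struct demo_buf,
DEMO_INODE_SIZE and demo_get_inode_buf() are hypothetical, and the "read I/O"
is modelled as a copy from an on_disk array.

```c
#include <string.h>

#define DEMO_INODE_SIZE 8

/* Hypothetical inode-table buffer with an "uptodate" flag. */
struct demo_buf {
    unsigned char data[DEMO_INODE_SIZE];
    int uptodate;
};

/*
 * Make the buffer uptodate. When the read can be skipped, fill the buffer
 * from the in-memory inode rather than marking a zeroed buffer uptodate.
 */
static void demo_get_inode_buf(struct demo_buf *buf,
                               const unsigned char *in_mem,
                               const unsigned char *on_disk,
                               int can_skip_read)
{
    if (buf->uptodate)
        return;
    if (can_skip_read)
        memcpy(buf->data, in_mem, DEMO_INODE_SIZE);   /* no zeroed buffer */
    else
        memcpy(buf->data, on_disk, DEMO_INODE_SIZE);  /* models the read I/O */
    buf->uptodate = 1;
}
```

After the call the buffer holds real inode contents on both paths, so a later
reader never sees an uptodate buffer full of zeros.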
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

a3e3ff2d

ext4: move ext4_fill_raw_inode() related functions · fce5c640

Committed by Zhang Yi on Oct 19, 2021

hulk inclusion
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------

In preparation for calling ext4_fill_raw_inode() in
__ext4_get_inode_loc(), move three related functions before
__ext4_get_inode_loc(), no logical change.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

fce5c640

ext4: factor out ext4_fill_raw_inode() · 40b43297

Committed by Zhang Yi on Oct 19, 2021

hulk inclusion
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
---------------------------

Factor out ext4_fill_raw_inode() from ext4_do_update_inode(), which is
used to fill the in-mem inode contents into the inode table buffer, in
preparation for initializing the exclusive inode buffer without reading
the block in __ext4_get_inode_loc().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

40b43297

ext4: make the updating inode data procedure atomic · 77c57232

Committed by Zhang Yi on Oct 19, 2021

mainline inclusion
from mainline-5.15-rc1
commit baaae979
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=baaae979b112642a41b71c71c599d875c067d257

---------------------------

Currently ext4_do_update_inode() returns an error before filling the whole
inode data if we fail to set inode blocks in ext4_inode_blocks_set().
This error should never happen in theory since sb->s_maxbytes should not
have allowed this; we have already initialized sb->s_maxbytes according to
this feature in ext4_fill_super(). So even though that could only happen due
to filesystem corruption, we'd better return after we finish
updating the inode, because otherwise it may leave an uninitialized buffer
that we could read later in "errors=continue" mode.

This patch makes the inode data update procedure atomic, calls
EXT4_ERROR_INODE() after dropping i_raw_lock if something bad
happened, makes sure that the inode is complete, and also drops a BUG_ON
and does some small cleanups.
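
The "fill everything, report the error afterwards" shape can be sketched as
follows. This is a minimal sketch under assumptions, not the kernel function:
struct demo_raw_inode, demo_blocks_set() and demo_fill_raw_inode() are
hypothetical, and locking is omitted.

```c
#include <errno.h>

/* Hypothetical, heavily reduced on-disk inode. */
struct demo_raw_inode {
    unsigned int blocks;
    unsigned int size;
    unsigned int mode;
};

/* Hypothetical check standing in for ext4_inode_blocks_set() failing. */
static int demo_blocks_set(struct demo_raw_inode *raw,
                           unsigned long long blocks)
{
    if (blocks > 0xffffffffULL)
        return -EFBIG;
    raw->blocks = (unsigned int)blocks;
    return 0;
}

/* Fill every field first; remember the error and return it only at the end. */
static int demo_fill_raw_inode(struct demo_raw_inode *raw,
                               unsigned long long blocks,
                               unsigned int size, unsigned int mode)
{
    int err = demo_blocks_set(raw, blocks);

    raw->size = size;       /* still filled when err != 0 */
    raw->mode = mode;
    return err;
}
```

Even when demo_blocks_set() fails, the remaining fields are written, so the
buffer is never left half-initialized for a later reader.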
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210826130412.3921207-4-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

77c57232

ext4: move inode eio simulation behind io completeion · cc1cdb17

Committed by Zhang Yi on Oct 19, 2021

mainline inclusion
from mainline-5.15-rc1
commit 0904c9ae
category: bugfix
bugzilla: 174653 https://gitee.com/openeuler/kernel/issues/I4DDEL
Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0904c9ae3465c7acc066a564a76b75c0af83e6c7

---------------------------

No EIO simulation is required if the buffer is uptodate, so move the
simulation behind read bio completion, just like the inode/block bitmap
simulation does.
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210826130412.3921207-2-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

cc1cdb17

13 Oct, 2021 1 commit

ext4: fix overflow in ext4_iomap_alloc() · d0770d8c

Committed by Jan Kara on Oct 13, 2021

stable inclusion
from stable-5.10.50
commit b368b0375e776b21c3cc42a1a4680f3ca6823224
bugzilla: 174522 https://gitee.com/openeuler/kernel/issues/I4DNFY

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b368b0375e776b21c3cc42a1a4680f3ca6823224

--------------------------------

commit d0b040f5 upstream.

Code in iomap alloc may overflow the block number when converting it to a
byte offset. Luckily this is mostly harmless, as we will just use the more
expensive method of writing using unwritten extents even though we are
writing beyond i_size.
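
The overflow can be illustrated outside the kernel. The sketch below assumes
a 32-bit logical block number and a block size of 2^blkbits bytes;
demo_offset_32() and demo_offset_64() are hypothetical helpers, not functions
from fs/ext4.

```c
#include <stdint.h>

/* Byte offset computed in 32 bits: wraps for large block numbers. */
static int64_t demo_offset_32(uint32_t lblk, unsigned int blkbits)
{
    uint32_t off = lblk << blkbits;  /* high bits are discarded */

    return (int64_t)off;
}

/* Byte offset computed after widening to 64 bits: no wraparound. */
static int64_t demo_offset_64(uint32_t lblk, unsigned int blkbits)
{
    return (int64_t)lblk << blkbits;
}
```

With blkbits = 12, block 1 << 20 corresponds to byte offset 2^32. The 32-bit
version wraps to 0 and so appears to lie below i_size, while the widened
version keeps the full value.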

Cc: stable@kernel.org
Fixes: 378f32ba ("ext4: introduce direct I/O write using iomap infrastructure")
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210412102333.2676-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

d0770d8c

03 Jul, 2021 1 commit

ext4: stop return ENOSPC from ext4_issue_zeroout · 8119d09e

Committed by yangerkun on Jun 28, 2021

hulk inclusion
category: bugfix
bugzilla: 167373
CVE: NA

---------------------------

Our testcase (briefly: fsstress on dm thin-provisioning, where ext4 sees a
100G volume but the actual size is 10G) triggers a hung-task bug because
ext4_writepages falls into an infinite loop:

static int ext4_writepages(xxx)
{
    ...
    while (!done && mpd.first_page <= mpd.last_page) {
        ...
        ret = mpage_prepare_extent_to_map(&mpd);
        if (!ret) {
            ...
            ret = mpage_map_and_submit_extent(handle, &mpd,
                                              &give_up_on_write);
            /* <----- will return -ENOSPC */
            ...
        }
        ...
        if (ret == -ENOSPC && sbi->s_journal) {
            /* <------ we cannot break since we will get ENOSPC forever */
            jbd2_journal_force_commit_nested(sbi->s_journal);
            ret = 0;
            continue;
        }
        ...
    }
}

We got ENOSPC with the following stack:
...
ext4_ext_map_blocks
  ext4_ext_convert_to_initialized
    ext4_ext_zeroout
      ext4_issue_zeroout
        ...
        submit_bio_wait <-- bio to thinpool will return ENOSPC

Actually, the ENOSPC from thin-provisioning amounts to an EIO from the block
device. We need to convert the error to EIO to stop confusing ext4.
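
The conversion itself is small and can be sketched like this. The helper
names are hypothetical and the block-layer call is modelled as a function
that simply returns the device's error; this is an illustration, not the
patched ext4_issue_zeroout().

```c
#include <errno.h>

/* Hypothetical stand-in for the block-layer call; returns the device error. */
static int demo_submit_zeroout(int device_err)
{
    return device_err;
}

/* Translate a device-level -ENOSPC into -EIO; pass other values through. */
static int demo_issue_zeroout(int device_err)
{
    int ret = demo_submit_zeroout(device_err);

    if (ret == -ENOSPC)
        ret = -EIO;
    return ret;
}
```

Since the caller no longer sees -ENOSPC from this path, the retry branch in
the writepages loop shown above is not taken for a thin-pool failure.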
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

8119d09e

22 Apr, 2021 1 commit

ext4: fix bh ref count on error paths · 195301d6

Committed by Zhaolong Zhang on Apr 19, 2021

stable inclusion
from stable-5.10.28
commit e178f362f0957f4c95f614671945d89b0bba97c8
bugzilla: 51779

--------------------------------

[ Upstream commit c915fb80 ]

__ext4_journalled_writepage should drop bhs' ref count on error paths
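
A common way to guarantee this is a single cleanup label shared by the
success and error paths. The sketch below is a generic illustration under
assumptions: struct demo_bh and demo_writepage() are hypothetical and only
loosely modelled on buffer_head reference counting.

```c
#include <errno.h>

/* Hypothetical buffer with a reference count. */
struct demo_bh {
    int refcount;
};

static void demo_get(struct demo_bh *bh) { bh->refcount++; }
static void demo_put(struct demo_bh *bh) { bh->refcount--; }

/* Take a reference on every buffer, do the work, and release on all paths. */
static int demo_writepage(struct demo_bh *bhs, int n, int fail)
{
    int ret = 0;
    int i;

    for (i = 0; i < n; i++)
        demo_get(&bhs[i]);

    if (fail) {
        ret = -EIO;
        goto out;           /* error path shares the cleanup below */
    }
    /* ... the journalling work would happen here ... */
out:
    for (i = 0; i < n; i++)
        demo_put(&bhs[i]);
    return ret;
}
```

Whether or not the work fails, every reference taken at the top is dropped
before returning, so the counts are balanced.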
Signed-off-by: Zhaolong Zhang <zhangzl2013@126.com>
Link: https://lore.kernel.org/r/1614678151-70481-1-git-send-email-zhangzl2013@126.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

195301d6

13 Apr, 2021 2 commits

ext4: fix potential error in ext4_do_update_inode · 2a1a95b4

Committed by Shijie Luo on Mar 31, 2021

stable inclusion
from stable-5.10.26
commit e8fa569465e5d45e322ce61759d06b4629384bda
bugzilla: 51363

--------------------------------

commit 7d8bd3c7 upstream.

If set_large_file = 1 and errors occur in ext4_handle_dirty_metadata(),
the error code will be overridden; go to out_brelse to avoid this
situation.
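
The overwritten-error pattern can be reproduced in a few lines. This is a
sketch under assumptions, with hypothetical helpers that return whatever
error the caller passes in; it mirrors the shape described above rather than
the real ext4_do_update_inode().

```c
#include <errno.h>

/* Hypothetical helpers: each returns 0 or a negative errno chosen by the caller. */
static int demo_dirty_metadata(int err) { return err; }
static int demo_update_super(int err)   { return err; }

/* Buggy shape: the second call overwrites the first error. */
static int demo_update_buggy(int first_err, int second_err, int set_large_file)
{
    int err = demo_dirty_metadata(first_err);

    if (set_large_file)
        err = demo_update_super(second_err);
    return err;
}

/* Fixed shape: bail out to the cleanup label as soon as an error is seen. */
static int demo_update_fixed(int first_err, int second_err, int set_large_file)
{
    int err = demo_dirty_metadata(first_err);

    if (err)
        goto out;
    if (set_large_file)
        err = demo_update_super(second_err);
out:
    return err;
}
```

In the buggy shape a failure in the first step followed by a successful
second step returns 0; the fixed shape returns the first error.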
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Link: https://lore.kernel.org/r/20210312065051.36314-1-luoshijie1@huawei.com
Cc: stable@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

2a1a95b4

ext4: stop inode update before return · b4019d34

Committed by Pan Bian on Mar 31, 2021

stable inclusion
from stable-5.10.26
commit d130b802f98a80c43c13607003911a7bb03b0cc7
bugzilla: 51363

--------------------------------

commit 512c15ef upstream.

The inode update should be stopped before returning the error code.
Signed-off-by: Pan Bian <bianpan2016@163.com>
Link: https://lore.kernel.org/r/20210117085732.93788-1-bianpan2016@163.com
Fixes: 8016e29f ("ext4: fast commit recovery path")
Cc: stable@kernel.org
Reviewed-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Chen Jun <chenjun102@huawei.com>
Acked-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>

b4019d34
