Commits · ad36cedd2b0de845b55f88c15282f4796c71018f · openeuler / Kernel

Jul 14, 2023 · 4 commits

ext4: Add debug message to notify user space is out of free · ad36cedd

Committed by Zhihao Cheng on Jul 14, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS
CVE: NA

--------------------------------

Add a debug message to notify the user that ext4_writepages is stuck in a
loop caused by ENOSPC.

Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 4ae7e703)

Revert "ext4: Stop trying writing pages if no free blocks generated" · b42d3e12

Committed by Zhihao Cheng on Jul 14, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS
CVE: NA

--------------------------------

This reverts commit 07a8109d.

When ext4 runs out of space, ext4_writepages can lose data:
if many blocks are preallocated for some files, the e4b bitmap differs
from the block bitmap, and the block bitmap accounts for more free
blocks.

    ext4_writepages                         P2
ext4_mb_new_blocks                  ext4_map_blocks
 ext4_mb_regular_allocator // No free bits in e4b bitmap
 ext4_mb_discard_preallocations_should_retry
  ext4_mb_discard_preallocations
   ext4_mb_discard_group_preallocations
    ext4_mb_release_inode_pa // updates e4b bitmap by pa->pa_free
     mb_free_blocks
                                     ext4_mb_new_blocks
                                      ext4_mb_regular_allocator
                                      // Got e4b bitmap's free bits
 ext4_mb_regular_allocator  // After 3 times retrying, ret ENOSPC

ext4_writepages
 mpage_map_and_submit_extent
  mpage_map_one_extent // ret ENOSPC
  if (err == -ENOSPC && EXT4_SB(sb)->s_mb_free_pending)
  // s_mb_free_pending is 0
  *give_up_on_write = true  // Abandon writeback, data lost!
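
The race above can be replayed in a small user-space model. This is only a sketch under stated assumptions: `ToyAllocator`, its fields, `writeback_alloc` and `demo` are hypothetical stand-ins rather than ext4 code, and only the give-up condition mirrors the check quoted above (the real path also handles other errors).

```python
import errno


class ToyAllocator:
    """Hypothetical stand-in for the mballoc state discussed above."""

    def __init__(self, e4b_free, pa_free):
        self.e4b_free = e4b_free   # free bits visible in the e4b bitmap
        self.pa_free = pa_free     # free blocks still held by preallocations
        self.free_pending = 0      # models s_mb_free_pending

    def alloc(self, count):
        """Allocate from the e4b bitmap; return 0 or -ENOSPC."""
        if self.e4b_free >= count:
            self.e4b_free -= count
            return 0
        return -errno.ENOSPC

    def discard_preallocations(self):
        """Return preallocated blocks to the e4b bitmap; report how many."""
        freed, self.pa_free = self.pa_free, 0
        self.e4b_free += freed
        return freed


def writeback_alloc(allocator, count, racer=None, retries=3):
    """Try to allocate, discarding preallocations between attempts."""
    for _ in range(retries):
        if allocator.alloc(count) == 0:
            return 0
        if allocator.discard_preallocations() == 0:
            break
        if racer is not None:
            racer(allocator)       # P2 grabs the bits that were just freed
    return -errno.ENOSPC


def gives_up_on_write(err, allocator):
    """Reverted check: retry only on ENOSPC while frees are still pending."""
    retry = err == -errno.ENOSPC and allocator.free_pending > 0
    return not retry


def demo():
    allocator = ToyAllocator(e4b_free=0, pa_free=8)

    def steal_all(a):
        a.alloc(a.e4b_free)        # P2 takes every freed block

    err = writeback_alloc(allocator, 4, racer=steal_all)
    return err, gives_up_on_write(err, allocator)
```

In this model `demo()` ends with ENOSPC and a give-up decision even though eight blocks were released along the way, which is the data-loss window the revert closes. Without the racing task the same call succeeds on its second attempt.
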

Fixes: 07a8109d ("ext4: Stop trying writing pages if no free ...")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 5f142164)

Merge branch 'openEuler-22.03-LTS-SP1' of https://gitee.com/openeuler/kernel... · 3fa6953f

Committed by openeuler-sync-bot on Jul 14, 2023
```
Merge branch 'openEuler-22.03-LTS-SP1' of https://gitee.com/openeuler/kernel into openEuler-22.03-LTS-SP1
```

!759 【kernel-openEuler-22.03-LTS-SP1】kernel：fix a type error with 5.10 kernel... · f4841ef1

Committed by openeuler-ci-bot on Jul 14, 2023

!759 【kernel-openEuler-22.03-LTS-SP1】kernel：fix a type error with 5.10 kernel on openEuler 22.03 LTS SP1 system

Merge Pull Request from: @zhujun3

This PR adapts the 5.10 kernel to the BC-Linux for Euler V22.10 U1 OS; the first step is to compile the kernel.

Kernel Issue: https://gitee.com/openeuler/kernel/issues/I7E2XC?from=project-issue

Link:https://gitee.com/openeuler/kernel/pulls/759

Reviewed-by: sanglipeng <sanglipeng1@jd.com> 
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com> 
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>

Jul 13, 2023 · 6 commits

Merge branch 'openEuler-22.03-LTS-SP1' of https://gitee.com/openeuler/kernel... · 52b0d429

Committed by openeuler-sync-bot on Jul 13, 2023
```
Merge branch 'openEuler-22.03-LTS-SP1' of https://gitee.com/openeuler/kernel into openEuler-22.03-LTS-SP1
```

ubifs: Fix memory leak in do_rename · 12d98636

Committed by Mårten Lindahl on Jul 08, 2023

mainline inclusion
from mainline-v6.4-rc1
commit 3a36d20e
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JO0G
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3a36d20e012903f45714df2731261fdefac900cb

--------------------------------

If renaming a file in an encrypted directory, function
fscrypt_setup_filename allocates memory for a file name. This name is
never used, and before returning to the caller the memory for it is not
freed.

When running kmemleak on it we see that it is registered as a leak. The
report below is triggered by a simple program 'rename' that renames a
file in an encrypted directory:

  unreferenced object 0xffff888101502840 (size 32):
    comm "rename", pid 9404, jiffies 4302582475 (age 435.735s)
    backtrace:
      __kmem_cache_alloc_node
      __kmalloc
      fscrypt_setup_filename
      do_rename
      ubifs_rename
      vfs_rename
      do_renameat2

To fix this we can remove the call to fscrypt_setup_filename as it's not
needed.

Fixes: 278d9a24 ("ubifs: Rename whiteout atomically")
Reported-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Mårten Lindahl <marten.lindahl@axis.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
(cherry picked from commit 6bc63230)

ubifs: Free memory for tmpfile name · 939a5822

Committed by Mårten Lindahl on Jul 08, 2023

mainline inclusion
from mainline-v6.4-rc1
commit 1fb815b3
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JO0G
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1fb815b38bb31d6af9bd0540b8652a0d6fe6cfd3

--------------------------------

When opening a ubifs tmpfile on an encrypted directory, function
fscrypt_setup_filename allocates memory for the name that is to be
stored in the directory entry, but after the name has been copied to the
directory entry inode, the memory is not freed.

When running kmemleak on it we see that it is registered as a leak. The
report below is triggered by a simple program 'tmpfile' just opening a
tmpfile:

  unreferenced object 0xffff88810178f380 (size 32):
    comm "tmpfile", pid 509, jiffies 4294934744 (age 1524.742s)
    backtrace:
      __kmem_cache_alloc_node
      __kmalloc
      fscrypt_setup_filename
      ubifs_tmpfile
      vfs_tmpfile
      path_openat

Free this memory after it has been copied to the inode.

Signed-off-by: Mårten Lindahl <marten.lindahl@axis.com>
Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
(cherry picked from commit 3c594ca7)

!1389 [sync] PR-1312: quota: fix race condition between dqput() and dquot_mark_dquot_dirty() · c3ef7795

Committed by openeuler-ci-bot on Jul 13, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1312 
 
PR sync from: Baokun Li <libaokun1@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/7ATD3RNUBURBEYA34VGOOZB53J377OZQ/ 
Baokun Li (5):
  quota: factor out dquot_write_dquot()
  quota: rename dquot_active() to inode_quota_active()
  quota: add new helper dquot_active()
  quota: fix dqput() to follow the guarantees dquot_srcu should provide
  quota: simplify drop_dquot_ref()


Link:https://gitee.com/openeuler/kernel/pulls/1389 

Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

!1392 [sync] PR-1376: jbd2: Check 'jh->b_transaction' before remove it from checkpoint · 564bbed3

Committed by openeuler-ci-bot on Jul 13, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1376 
 
PR sync from: Zhihao Cheng <chengzhihao1@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/XNJZFYFNQIMIIQRPICSJB7KUZJDPS27T/ 
 
 
Link:https://gitee.com/openeuler/kernel/pulls/1392 

Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

!1308 [sync] PR-1280: cgroup: always put cset in cgroup_css_set_put_fork · 2c5ad3ab

Committed by openeuler-ci-bot on Jul 13, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1280 
 
    A successful call to cgroup_css_set_fork() will always have taken
    a ref on kargs->cset (regardless of CLONE_INTO_CGROUP), so always
    do a corresponding put in cgroup_css_set_put_fork().

    Without this, a cset and its contained css structures will be
    leaked for some fork failures.  The following script reproduces
    the leak for a fork failure due to exceeding pids.max in the
    pids controller.  A similar thing can happen if we jump to the
    bad_fork_cancel_cgroup label in copy_process().

    [ -z "$1" ] && echo "Usage $0 pids-root" && exit 1
    PID_ROOT=$1
    CGROUP=$PID_ROOT/foo

    [ -e $CGROUP ] && rmdir $CGROUP
    mkdir $CGROUP
    echo 5 > $CGROUP/pids.max
    echo $$ > $CGROUP/cgroup.procs

    fork_bomb()
    {
            set -e
            for i in $(seq 10); do
                    /bin/sleep 3600 &
            done
    }

    (fork_bomb) &
    wait
    echo $$ > $PID_ROOT/cgroup.procs
    kill $(cat $CGROUP/cgroup.procs)
    rmdir $CGROUP 
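
The reference pairing described above can be illustrated with a toy refcount model. This is a hedged sketch: `CssSet`, `css_set_fork`, `css_set_put_fork` and `failed_fork` are hypothetical Python stand-ins, and the `always_put` flag simply switches between the behaviour before and after the fix as the description states it.

```python
class CssSet:
    """Hypothetical refcounted object standing in for a css_set."""

    def __init__(self):
        self.refcount = 1          # reference held by the parent task


def css_set_fork(cset):
    """A successful setup always takes one reference for the child."""
    cset.refcount += 1
    return cset


def css_set_put_fork(cset, into_cgroup, always_put):
    """Undo css_set_fork() when the fork fails later on."""
    # Before the fix the put only happened on the CLONE_INTO_CGROUP path.
    if always_put or into_cgroup:
        cset.refcount -= 1


def failed_fork(into_cgroup, always_put):
    """Return how many references a failed fork leaves behind."""
    cset = CssSet()
    css_set_fork(cset)
    css_set_put_fork(cset, into_cgroup, always_put)
    return cset.refcount - 1
```

With `always_put=False` and no CLONE_INTO_CGROUP, every failed fork in the model leaves one extra reference, which corresponds to the leak the script above provokes by exceeding pids.max.
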
 
Link:https://gitee.com/openeuler/kernel/pulls/1308 

Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

Jul 12, 2023 · 10 commits

jbd2: Check 'jh->b_transaction' before remove it from checkpoint · 663a92d7

Committed by Zhihao Cheng on Jul 11, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I70WHL
CVE: NA

--------------------------------

The following process will corrupt an ext4 image:
Step 1:
jbd2_journal_commit_transaction
 __jbd2_journal_insert_checkpoint(jh, commit_transaction)
 // Put jh into trans1->t_checkpoint_list
 journal->j_checkpoint_transactions = commit_transaction
 // Put trans1 into journal->j_checkpoint_transactions

Step 2:
do_get_write_access
 test_clear_buffer_dirty(bh) // clear buffer dirty, set jbd dirty
 __jbd2_journal_file_buffer(jh, transaction) // jh belongs to trans2

Step 3:
drop_cache
 journal_shrink_one_cp_list
  jbd2_journal_try_remove_checkpoint
   if (!trylock_buffer(bh))  // lock bh, true
   if (buffer_dirty(bh))     // buffer is not dirty
   __jbd2_journal_remove_checkpoint(jh)
   // remove jh from trans1->t_checkpoint_list

Step 4:
jbd2_log_do_checkpoint
 trans1 = journal->j_checkpoint_transactions
 // jh is not in trans1->t_checkpoint_list
 jbd2_cleanup_journal_tail(journal)  // trans1 is done

Step 5: Power cut. trans2 is not committed, so jh is lost on the next mount.

Fix it by checking 'jh->b_transaction' before remove it from checkpoint.
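
Steps 1 to 4 can be replayed with a simplified model. This is a sketch only: `JournalHead`, `try_remove_checkpoint` and `replay` are hypothetical Python stand-ins, buffer locking is omitted, and the `check_transaction` flag toggles the extra `b_transaction` test the fix describes.

```python
class JournalHead:
    """Hypothetical, heavily simplified journal head."""

    def __init__(self):
        self.b_transaction = None      # transaction currently owning jh
        self.b_cp_transaction = None   # transaction whose checkpoint list holds jh
        self.buffer_dirty = False      # plain buffer dirty bit
        self.jbd_dirty = False         # jbd dirty bit


def try_remove_checkpoint(jh, check_transaction):
    """Return True if jh was dropped from its checkpoint list."""
    if jh.buffer_dirty:
        return False                   # still needs writeback
    if check_transaction and jh.b_transaction is not None:
        return False                   # jh was re-filed into a newer transaction
    jh.b_cp_transaction = None
    return True


def replay(check_transaction):
    jh = JournalHead()
    # Step 1: commit puts jh on trans1's checkpoint list.
    jh.b_cp_transaction = "trans1"
    jh.buffer_dirty = True
    # Step 2: write access clears buffer dirty, sets jbd dirty, files jh in trans2.
    jh.buffer_dirty = False
    jh.jbd_dirty = True
    jh.b_transaction = "trans2"
    # Step 3: the shrinker tries to remove jh from the checkpoint list.
    removed = try_remove_checkpoint(jh, check_transaction)
    # Step 4: trans1 looks finished once its checkpoint list no longer holds jh.
    trans1_done = jh.b_cp_transaction is None
    return removed, trans1_done
```

Without the check, the model drops jh and lets trans1 be treated as done while the only copy of the update sits in the uncommitted trans2. With the check, jh stays on the checkpoint list.
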

Fixes: 80079353 ("jbd2: fix a race when checking checkpoint ...")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 7723e91d)

quota: simplify drop_dquot_ref() · 8d16ece8

Committed by Baokun Li on Jul 05, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

As Honza said, remove_inode_dquot_ref() currently does not release the
last dquot reference but instead adds the dquot to tofree_head list. This
is because dqput() can sleep while dropping of the last dquot reference
(writing back the dquot and calling ->release_dquot()) and that must not
happen under dq_list_lock. Now that dqput() queues the final dquot cleanup
into a workqueue, remove_inode_dquot_ref() can call dqput() unconditionally
and we can significantly simplify it.

Here we open code the simplified code of remove_inode_dquot_ref() into
remove_dquot_ref() and remove the function put_dquot_list() which is no
longer used.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
(cherry picked from commit a13fcef3)

quota: fix dqput() to follow the guarantees dquot_srcu should provide · 50a9c1dc

Committed by Baokun Li on Jul 05, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

The dquot_mark_dquot_dirty() using dquot references from the inode
should be protected by dquot_srcu. quota_off code takes care to call
synchronize_srcu(&dquot_srcu) to not drop dquot references while they
are used by other users. But dquot_transfer() breaks this assumption.
We call dquot_transfer() to drop the last reference of dquot and add
it to free_dquots, but there may still be other users using the dquot
at this time, as shown in the function graph below:

       cpu1              cpu2
_________________|_________________
wb_do_writeback         CHOWN(1)
 ...
  ext4_da_update_reserve_space
   dquot_claim_block
    ...
     dquot_mark_dquot_dirty // try to dirty old quota
      test_bit(DQ_ACTIVE_B, &dquot->dq_flags) // still ACTIVE
      if (test_bit(DQ_MOD_B, &dquot->dq_flags))
      // test no dirty, wait dq_list_lock
                    ...
                     dquot_transfer
                      __dquot_transfer
                      dqput_all(transfer_from) // rls old dquot
                       dqput // last dqput
                        dquot_release
                         clear_bit(DQ_ACTIVE_B, &dquot->dq_flags)
                        atomic_dec(&dquot->dq_count)
                        put_dquot_last(dquot)
                         list_add_tail(&dquot->dq_free, &free_dquots)
                         // add the dquot to free_dquots
      if (!test_and_set_bit(DQ_MOD_B, &dquot->dq_flags))
        add dqi_dirty_list // add released dquot to dirty_list

This can cause various issues, such as dquot being destroyed by
dqcache_shrink_scan() after being added to free_dquots, which can trigger
a UAF in dquot_mark_dquot_dirty(); or after dquot is added to free_dquots
and then to dirty_list, it is added to free_dquots again after
dquot_writeback_dquots() is executed, which causes the free_dquots list to
be corrupted and triggers a UAF when dqcache_shrink_scan() is called for
freeing dquot twice.

As Honza said, we need to fix dquot_transfer() to follow the guarantees
dquot_srcu should provide. But calling synchronize_srcu() directly from
dquot_transfer() is too expensive (and mostly unnecessary). So we add
dquot whose last reference should be dropped to the new global dquot
list releasing_dquots, and then queue work item which would call
synchronize_srcu() and after that perform the final cleanup of all the
dquots on releasing_dquots.
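
The deferred-release scheme in the last paragraph can be sketched as follows. All names here (`Dquot`, `QuotaCache`, `release_workfn`, the `readers` counter) are hypothetical Python stand-ins. In particular, the real synchronize_srcu() blocks until readers finish, whereas this model just reports that the work has to run again.

```python
class Dquot:
    """Hypothetical, simplified dquot."""

    def __init__(self, name):
        self.name = name
        self.count = 1       # reference count
        self.active = True   # models DQ_ACTIVE_B


class QuotaCache:
    def __init__(self):
        self.releasing_dquots = []   # dquots waiting for final cleanup
        self.free_dquots = []        # dquots that may be reclaimed
        self.readers = 0             # stand-in for SRCU read-side sections

    def dqput(self, dquot):
        """Drop a reference; defer cleanup when it is the last one."""
        if dquot.count > 1:
            dquot.count -= 1
            return
        # Last reference: queue the dquot instead of releasing it inline.
        self.releasing_dquots.append(dquot)

    def release_workfn(self):
        """Deferred work: clean up only once no reader is active."""
        if self.readers:
            return False             # models synchronize_srcu() still waiting
        while self.releasing_dquots:
            dquot = self.releasing_dquots.pop(0)
            dquot.active = False
            dquot.count -= 1
            self.free_dquots.append(dquot)
        return True
```

While a reader such as the dquot_mark_dquot_dirty() path is still inside its read-side section, the dquot in this model stays active and off the free list, so it cannot be reclaimed underneath that reader.
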

Fixes: 4580b30e ("quota: Do not dirty bad dquots")
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
(cherry picked from commit d82ddaab)

quota: add new helper dquot_active() · 364aa369

Committed by Baokun Li on Jul 05, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

Add new helper function dquot_active() to make the code more concise.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
(cherry picked from commit 3fb7aa3a)

quota: rename dquot_active() to inode_quota_active() · 2dc40f74

Committed by Baokun Li on Jul 05, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

Now we have a helper function dquot_dirty() to determine if dquot has
DQ_MOD_B bit. dquot_active() can easily be misunderstood as a helper
function to determine if dquot has DQ_ACTIVE_B bit. So we avoid this by
renaming it to inode_quota_active() and later on we will add the helper
function dquot_active() to determine if dquot has DQ_ACTIVE_B bit.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
(cherry picked from commit 329a1eb4)

quota: factor out dquot_write_dquot() · 42d3a2de

Committed by Baokun Li on Jul 05, 2023

maillist inclusion
category: bugfix
bugzilla: 188812,https://gitee.com/openeuler/kernel/issues/I7E0YR

Reference: https://www.spinics.net/lists/kernel/msg4844759.html

----------------------------------------

Factor out dquot_write_dquot() to reduce duplicate code.

Signed-off-by: Baokun Li <libaokun1@huawei.com>
(cherry picked from commit 0a3781ae)

Merge branch 'openEuler-22.03-LTS-SP1' of https://gitee.com/openeuler/kernel... · 593f244e

Committed by openeuler-sync-bot on Jul 12, 2023
```
Merge branch 'openEuler-22.03-LTS-SP1' of https://gitee.com/openeuler/kernel into openEuler-22.03-LTS-SP1
```

!1329 [sync] PR-1325: jbd2: fix several checkpoint · 6c44b563

Committed by openeuler-ci-bot on Jul 12, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1325 
 
PR sync from: Zhihao Cheng <chengzhihao1@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/QARA5X5OQUKRFUIORG2YVB6YE3V5CGQB/ 
Zhang Yi (4):
  jbd2: remove journal_clean_one_cp_list()
  jbd2: fix a race when checking checkpoint buffer busy
  jbd2: remove __journal_try_to_free_buffer()
  jbd2: fix checkpoint cleanup performance regression

Zhihao Cheng (1):
  jbd2: Fix wrongly judgement for buffer head removing while doing
    checkpoint


Link:https://gitee.com/openeuler/kernel/pulls/1329 

Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

!1332 [sync] PR-1314: ext4: Stop trying writing pages if no free blocks generated · 3bb5ef86

Committed by openeuler-ci-bot on Jul 12, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1314 
 
PR sync from: Zhihao Cheng <chengzhihao1@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/ALOJ633HB2KNGCGZVSSVUI34JMM2MTRP/ 
 
 
Link:https://gitee.com/openeuler/kernel/pulls/1332 

Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

dm thin: fix deadlock when swapping to thin device · 73c633e6

Committed by Coly Li on Jul 08, 2023

mainline inclusion
from mainline-v6.3-rc4
commit 9bbf5fee
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7JLUM
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v6.4&id=9bbf5feecc7eab2c370496c1c161bbfe62084028

----------------------------------------

It is a known issue that a dm-thin volume cannot be used as swap;
otherwise a deadlock may happen when dm-thin's internal memory
demand triggers swap I/O on the dm-thin volume itself.

But thanks to commit a666e5c0 ("dm: fix deadlock when swapping to
encrypted device"), the limit_swap_bios target flag can also be used
for dm-thin to avoid the recursive I/O when it is used as swap.

The fix is simply to set ti->limit_swap_bios to true in both pool_ctr()
and thin_ctr().

In my test, I create a dm-thin volume /dev/vg/swap and use it as swap
device. Then I run fio on another dm-thin volume /dev/vg/main and use
large --blocksize to trigger swap I/O onto /dev/vg/swap.

The following fio command line is used in my test:
  fio --name recursive-swap-io --lockmem 1 --iodepth 128 \
      --ioengine libaio --filename /dev/vg/main --rw randrw \
      --blocksize 1M --numjobs 32 --time_based --runtime=12h

Without this fix, the whole system can be locked up within 15 seconds.

With this fix, no deadlock or hung task is observed after
2 hours of running fio.

Furthermore, if blocksize is changed from 1M to 128M, after around 30
seconds fio has no visible I/O, and the out-of-memory killer message
shows up in kernel message. After around 20 minutes all fio processes
are killed and the whole system is back to being alive.

This is exactly what is expected when recursive I/O happens on dm-thin
volume when it is used as swap.

Depends-on: a666e5c0 ("dm: fix deadlock when swapping to encrypted device")
Cc: stable@vger.kernel.org
Signed-off-by: Coly Li <colyli@suse.de>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>

Conflict:
  drivers/md/dm-thin.c
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
(cherry picked from commit 6283fa7e)

Jul 11, 2023 · 6 commits

!1340 [sync] PR-1286: ext4: turning quotas off if mount failed after enable quotas · 36fbed46

Committed by openeuler-ci-bot on Jul 11, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1286 
 
PR sync from: Baokun Li <libaokun1@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/X3ZSP2AARUKCTNGQH7V2EC4D2KQ67AMO/ 
 
 
Link:https://gitee.com/openeuler/kernel/pulls/1340 

Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

!1367 [sync] PR-1324: io_uring: hold uring mutex around poll removal · 617c037e

Committed by openeuler-ci-bot on Jul 11, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1324 
 
PR sync from: Zhong Jinghua <zhongjinghua@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/2P2KGVU22TWAYJ5N3JDYWA7EXWJOL2OS/ 
 
 
Link:https://gitee.com/openeuler/kernel/pulls/1367 

Reviewed-by: zhangyi (F) <yi.zhang@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

!1363 [sync] PR-1287: ipvlan:Fix out-of-bounds caused by unclear skb->cb · 492a8f90

Committed by openeuler-ci-bot on Jul 11, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1287 
 
PR sync from: Zhengchao Shao <shaozhengchao@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/AM4RDLF2OSU74VL45PDNQCRW7E3VXA63/ 
 
 
Link:https://gitee.com/openeuler/kernel/pulls/1363 

Reviewed-by: Yue Haibing <yuehaibing@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

io_uring: hold uring mutex around poll removal · 1f614ed5

Committed by Jens Axboe on Jul 05, 2023

stable inclusion
from stable-v5.10.185
commit 4716c73b188566865bdd79c3a6709696a224ac04
category: bugfix
bugzilla: 188954, https://gitee.com/src-openeuler/kernel/issues/I7GVI5?from=project-issue
CVE: CVE-2023-3389

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4716c73b188566865bdd79c3a6709696a224ac04

----------------------------------------

Snipped from commit 9ca9fb24 upstream.

While reworking the poll hashing in the v6.0 kernel, we ended up
grabbing the ctx->uring_lock in poll update/removal. This also fixed
a bug with linked timeouts racing with timeout expiry and poll
removal.

Bring back just the locking fix for that.

Reported-and-tested-by: Querijn Voet <querijnqyn@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Zhong Jinghua <zhongjinghua@huawei.com>
(cherry picked from commit 43a7aef4)

ipvlan:Fix out-of-bounds caused by unclear skb->cb · 16bcf782

Committed by t.feng on Jul 03, 2023

stable inclusion
from stable-v5.10.181
commit f4a371d3f5a7a71dff1ab48b3122c5cf23cc7ad5
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I7GVI1
CVE: CVE-2023-3090

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f4a371d3f5a7a71dff1ab48b3122c5cf23cc7ad5

--------------------------------

[ Upstream commit 90cbed52 ]

If the skb is enqueued to the qdisc, fq_skb_cb(skb)->time_to_send, which
actually lives in skb->cb, is modified, and IPCB(skb_in)->opt is later used
in __ip_options_echo. The memcpy there can then run out of bounds and
overflow the stack.
We should clear skb->cb before ip_local_out or ip6_local_out.
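
The aliasing problem can be shown with a byte buffer standing in for skb->cb. This is a sketch under stated assumptions: the two offsets below are illustrative rather than taken from kernel headers, and `qdisc_enqueue` and `optlen_seen_by_ip` are hypothetical helpers.

```python
import struct

CB_SIZE = 48            # size of the skb->cb scratch area
TIME_TO_SEND_OFF = 8    # assumed offset of the qdisc's private u64
OPTLEN_OFF = 12         # assumed offset of the IP options length byte
MAX_OPT_LEN = 40        # IPv4 options can be at most 40 bytes


def qdisc_enqueue(cb, time_to_send):
    """The qdisc stores its timestamp in the shared scratch area."""
    struct.pack_into("<Q", cb, TIME_TO_SEND_OFF, time_to_send)


def optlen_seen_by_ip(clear_cb, time_to_send=0xC8 << 32):
    """Return the options length the IP layer would read from cb."""
    cb = bytearray(CB_SIZE)
    qdisc_enqueue(cb, time_to_send)
    if clear_cb:
        cb[:] = bytes(CB_SIZE)   # zero the scratch area before local out
    return cb[OPTLEN_OFF]
```

With the stale timestamp left in place, the model reads an options length far above `MAX_OPT_LEN`, which is the kind of value that would drive a fixed-size copy out of bounds. Zeroing the buffer first yields a length of zero.
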

v2:
1. clean the stack info
2. use IPCB/IP6CB instead of skb->cb

Crash on stable-5.10 (reproduced on a KASAN kernel).
Stack info:
[ 2203.651571] BUG: KASAN: stack-out-of-bounds in
__ip_options_echo+0x589/0x800
[ 2203.653327] Write of size 4 at addr ffff88811a388f27 by task
swapper/3/0
[ 2203.655460] CPU: 3 PID: 0 Comm: swapper/3 Kdump: loaded Not tainted
5.10.0-60.18.0.50.h856.kasan.eulerosv2r11.x86_64 #1
[ 2203.655466] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-20181220_000000-szxrtosci10000 04/01/2014
[ 2203.655475] Call Trace:
[ 2203.655481]  <IRQ>
[ 2203.655501]  dump_stack+0x9c/0xd3
[ 2203.655514]  print_address_description.constprop.0+0x19/0x170
[ 2203.655530]  __kasan_report.cold+0x6c/0x84
[ 2203.655586]  kasan_report+0x3a/0x50
[ 2203.655594]  check_memory_region+0xfd/0x1f0
[ 2203.655601]  memcpy+0x39/0x60
[ 2203.655608]  __ip_options_echo+0x589/0x800
[ 2203.655654]  __icmp_send+0x59a/0x960
[ 2203.655755]  nf_send_unreach+0x129/0x3d0 [nf_reject_ipv4]
[ 2203.655763]  reject_tg+0x77/0x1bf [ipt_REJECT]
[ 2203.655772]  ipt_do_table+0x691/0xa40 [ip_tables]
[ 2203.655821]  nf_hook_slow+0x69/0x100
[ 2203.655828]  __ip_local_out+0x21e/0x2b0
[ 2203.655857]  ip_local_out+0x28/0x90
[ 2203.655868]  ipvlan_process_v4_outbound+0x21e/0x260 [ipvlan]
[ 2203.655931]  ipvlan_xmit_mode_l3+0x3bd/0x400 [ipvlan]
[ 2203.655967]  ipvlan_queue_xmit+0xb3/0x190 [ipvlan]
[ 2203.655977]  ipvlan_start_xmit+0x2e/0xb0 [ipvlan]
[ 2203.655984]  xmit_one.constprop.0+0xe1/0x280
[ 2203.655992]  dev_hard_start_xmit+0x62/0x100
[ 2203.656000]  sch_direct_xmit+0x215/0x640
[ 2203.656028]  __qdisc_run+0x153/0x1f0
[ 2203.656069]  __dev_queue_xmit+0x77f/0x1030
[ 2203.656173]  ip_finish_output2+0x59b/0xc20
[ 2203.656244]  __ip_finish_output.part.0+0x318/0x3d0
[ 2203.656312]  ip_finish_output+0x168/0x190
[ 2203.656320]  ip_output+0x12d/0x220
[ 2203.656357]  __ip_queue_xmit+0x392/0x880
[ 2203.656380]  __tcp_transmit_skb+0x1088/0x11c0
[ 2203.656436]  __tcp_retransmit_skb+0x475/0xa30
[ 2203.656505]  tcp_retransmit_skb+0x2d/0x190
[ 2203.656512]  tcp_retransmit_timer+0x3af/0x9a0
[ 2203.656519]  tcp_write_timer_handler+0x3ba/0x510
[ 2203.656529]  tcp_write_timer+0x55/0x180
[ 2203.656542]  call_timer_fn+0x3f/0x1d0
[ 2203.656555]  expire_timers+0x160/0x200
[ 2203.656562]  run_timer_softirq+0x1f4/0x480
[ 2203.656606]  __do_softirq+0xfd/0x402
[ 2203.656613]  asm_call_irq_on_stack+0x12/0x20
[ 2203.656617]  </IRQ>
[ 2203.656623]  do_softirq_own_stack+0x37/0x50
[ 2203.656631]  irq_exit_rcu+0x134/0x1a0
[ 2203.656639]  sysvec_apic_timer_interrupt+0x36/0x80
[ 2203.656646]  asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 2203.656654] RIP: 0010:default_idle+0x13/0x20
[ 2203.656663] Code: 89 f0 5d 41 5c 41 5d 41 5e c3 cc cc cc cc cc cc cc
cc cc cc cc cc cc 0f 1f 44 00 00 0f 1f 44 00 00 0f 00 2d 9f 32 57 00 fb
f4 <c3> cc cc cc cc 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 be 08
[ 2203.656668] RSP: 0018:ffff88810036fe78 EFLAGS: 00000256
[ 2203.656676] RAX: ffffffffaf2a87f0 RBX: ffff888100360000 RCX:
ffffffffaf290191
[ 2203.656681] RDX: 0000000000098b5e RSI: 0000000000000004 RDI:
ffff88811a3c4f60
[ 2203.656686] RBP: 0000000000000000 R08: 0000000000000001 R09:
ffff88811a3c4f63
[ 2203.656690] R10: ffffed10234789ec R11: 0000000000000001 R12:
0000000000000003
[ 2203.656695] R13: ffff888100360000 R14: 0000000000000000 R15:
0000000000000000
[ 2203.656729]  default_idle_call+0x5a/0x150
[ 2203.656735]  cpuidle_idle_call+0x1c6/0x220
[ 2203.656780]  do_idle+0xab/0x100
[ 2203.656786]  cpu_startup_entry+0x19/0x20
[ 2203.656793]  secondary_startup_64_no_verify+0xc2/0xcb

[ 2203.657409] The buggy address belongs to the page:
[ 2203.658648] page:0000000027a9842f refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x11a388
[ 2203.658665] flags:
0x17ffffc0001000(reserved|node=0|zone=2|lastcpupid=0x1fffff)
[ 2203.658675] raw: 0017ffffc0001000 ffffea000468e208 ffffea000468e208
0000000000000000
[ 2203.658682] raw: 0000000000000000 0000000000000000 00000001ffffffff
0000000000000000
[ 2203.658686] page dumped because: kasan: bad access detected

To reproduce (ipvlan with IPVLAN_MODE_L3):
Env setting:
=======================================================
modprobe ipvlan ipvlan_default_mode=1
sysctl net.ipv4.conf.eth0.forwarding=1
iptables -t nat -A POSTROUTING -s 20.0.0.0/255.255.255.0 -o eth0 -j MASQUERADE
ip link add gw link eth0 type ipvlan
ip -4 addr add 20.0.0.254/24 dev gw
ip netns add net1
ip link add ipv1 link eth0 type ipvlan
ip link set ipv1 netns net1
ip netns exec net1 ip link set ipv1 up
ip netns exec net1 ip -4 addr add 20.0.0.4/24 dev ipv1
ip netns exec net1 route add default gw 20.0.0.254
ip netns exec net1 tc qdisc add dev ipv1 root netem loss 10%
ifconfig gw up
iptables -t filter -A OUTPUT -p tcp --dport 8888 -j REJECT --reject-with icmp-port-unreachable
=======================================================
Then execute this shell loop (curl any address that eth0 can reach):

for((i=1;i<=100000;i++))
do
        ip netns exec net1 curl x.x.x.x:8888
done
=======================================================

Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.")
Signed-off-by: "t.feng" <fengtao40@huawei.com>
Suggested-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com>
(cherry picked from commit 2572b83c)

!1343 [sync] PR-1272: xfs: fix some problems recently · b0421760

Committed by openeuler-ci-bot on Jul 11, 2023

Merge Pull Request from: @openeuler-sync-bot 
 

Origin pull request: 
https://gitee.com/openeuler/kernel/pulls/1272 
 
PR sync from: Long Li <leo.lilong@huawei.com>
https://mailweb.openeuler.org/hyperkitty/list/kernel@openeuler.org/message/W6KN2XSLJE5HZR2Y5D2OTDQ2GTLDGC5O/ 
Patches 1-6 fix several recent problems.
Patches 7-8 are backported from mainline.

Darrick J. Wong (1):
  xfs: fix uninitialized variable access

Dave Chinner (1):
  xfs: set XFS_FEAT_NLINK correctly

Long Li (4):
  xfs: factor out xfs_defer_pending_abort
  xfs: don't leak intent item when recovery intents fail
  xfs: factor out xfs_destroy_perag()
  xfs: don't leak perag when growfs fails

Ye Bin (1):
  xfs: fix warning in xfs_vm_writepages()

yangerkun (1):
  xfs: fix mounting failed caused by sequencing problem in the log
    records


Link:https://gitee.com/openeuler/kernel/pulls/1343 

Reviewed-by: Jialin Zhang <zhangjialin11@huawei.com> 
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>

Jul 07, 2023 · 9 commits

xfs: fix uninitialized variable access · e157b904

Committed by Darrick J. Wong on Jun 29, 2023

mainline inclusion
from mainline-v6.2-rc6
commit 60b730a4
category: bugfix
bugzilla: 188220, https://gitee.com/openeuler/kernel/issues/I4KIAO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=60b730a40c43fbcc034970d3e77eb0f25b8cc1cf

--------------------------------

If the end position of a GETFSMAP query overlaps an allocated space and
we're using the free space info to generate fsmap info, the akeys
information gets fed into the fsmap formatter with bad results.
Zero-init the space.

Reported-by: syzbot+090ae72d552e6bd93cfe@syzkaller.appspotmail.com
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Long Li <leo.lilong@huawei.com>
(cherry picked from commit aebc38d3)

xfs: set XFS_FEAT_NLINK correctly · 163b4f9f

Committed by Dave Chinner on Jun 29, 2023

mainline inclusion
from mainline-v5.18-rc2
commit dd0d2f97
category: bugfix
bugzilla: 188220, https://gitee.com/openeuler/kernel/issues/I4KIAO

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=dd0d2f9755191690541b09e6385d0f8cd8bc9d8f

--------------------------------

While xfs_has_nlink() is not used in kernel, it is used in userspace
(e.g. by xfs_db) so we need to set the XFS_FEAT_NLINK flag correctly
in xfs_sb_version_to_features().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
(cherry picked from commit f2096cec)

xfs: don't leak perag when growfs fails · ebe9ed75

Committed by Long Li on Jun 29, 2023

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 188878, https://gitee.com/openeuler/kernel/issues/I76JSK

--------------------------------

During growfs, the new AGs are initialized in memory before sb_agcount
is updated. If an error occurs in that window, the new AGs are leaked as
shown below: they will not be freed during umount because sb_agcount has
not been updated.

unreferenced object 0xffff88810751b000 (size 1024):
  comm "xfs_growfs", pid 123624, jiffies 4300733989 (age 124294.081s)
  hex dump (first 32 bytes):
    00 a0 38 16 81 88 ff ff 05 00 00 00 00 00 00 00  ..8.............
    00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000725c8ae4>] kmem_alloc+0x92/0x1d0 [xfs]
    [<000000005c32d74e>] xfs_initialize_perag+0x8d/0x3b0 [xfs]
    [<00000000830354cf>] xfs_growfs_data_private.isra.0+0x2af/0x610 [xfs]
    [<0000000038a29cb1>] xfs_growfs_data+0x228/0x300 [xfs]
    [<0000000004937dd2>] xfs_file_ioctl+0x8f3/0x10d0 [xfs]
    [<000000001a5d29a8>] __se_sys_ioctl+0xeb/0x120
    [<00000000cf30385a>] do_syscall_64+0x30/0x40
    [<00000000e4a6fd2f>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

When growfs fails, use xfs_destroy_perag() to destroy the newly
initialized AGs in the error handling path.
Signed-off-by: Long Li <leo.lilong@huawei.com>
(cherry picked from commit 670cd2c8)
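
The leak and its fix can be sketched in userspace C. This is a simplified model under stated assumptions, not the XFS code: `demo_mount`, `demo_init_perag()`, `demo_destroy_perag()` and `demo_growfs()` are hypothetical names, a plain array stands in for the perag radix tree, and `agcount` plays the role of sb_agcount.

```c
#include <assert.h>
#include <stdlib.h>

#define DEMO_MAX_AG 16

/* Hypothetical mount structure: an array replaces the perag radix tree. */
struct demo_mount {
    void *perag[DEMO_MAX_AG];
    unsigned int agcount;   /* plays the role of sb_agcount */
};

/* Free the per-AG objects in [from, to). */
void demo_destroy_perag(struct demo_mount *mp, unsigned int from,
                        unsigned int to)
{
    for (unsigned int i = from; i < to; i++) {
        free(mp->perag[i]);
        mp->perag[i] = NULL;
    }
}

/* Allocate per-AG objects for [from, to); undo partial work on failure. */
int demo_init_perag(struct demo_mount *mp, unsigned int from,
                    unsigned int to)
{
    assert(from <= to && to <= DEMO_MAX_AG);
    for (unsigned int i = from; i < to; i++) {
        mp->perag[i] = calloc(1, 64);
        if (!mp->perag[i]) {
            demo_destroy_perag(mp, from, i);
            return -1;
        }
    }
    return 0;
}

/*
 * Grow to new_count AGs. `later_step_fails` simulates an error that
 * happens after the new AGs exist but before agcount is updated.
 */
int demo_growfs(struct demo_mount *mp, unsigned int new_count,
                int later_step_fails)
{
    unsigned int old_count = mp->agcount;

    if (new_count <= old_count || new_count > DEMO_MAX_AG)
        return -1;
    if (demo_init_perag(mp, old_count, new_count))
        return -1;
    if (later_step_fails) {
        /* agcount is still old_count, so teardown at unmount would
         * never reach the new entries; release them here. */
        demo_destroy_perag(mp, old_count, new_count);
        return -1;
    }
    mp->agcount = new_count;
    return 0;
}
```

Dropping the `demo_destroy_perag()` call in the failure branch reproduces the shape of the reported leak: objects exist beyond `agcount` and nothing ever frees them.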

ebe9ed75

xfs: factor out xfs_destroy_perag() · ffbfbe96

Committed by Long Li on Jun 29, 2023

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 188878, https://gitee.com/openeuler/kernel/issues/I76JSK

--------------------------------

Factor out xfs_destroy_perag() from xfs_initialize_perag() for error
handling. Deleting a perag from the radix tree requires lock protection,
just like any other place where the perag tree is modified.
Signed-off-by: Long Li <leo.lilong@huawei.com>
(cherry picked from commit 42297cd9)

ffbfbe96

xfs: fix warning in xfs_vm_writepages() · 11a04e90

Committed by Ye Bin on Jun 29, 2023

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 188782, https://gitee.com/openeuler/kernel/issues/I76JSK

-----------------------------------------------

Running a BULKSTAT test triggered the following warning:
WARNING: CPU: 3 PID: 8425 at fs/xfs/xfs_aops.c:509 xfs_vm_writepages+0x184/0x1c0
Modules linked in:
CPU: 3 PID: 8425 Comm: xfs_bulkstat Not tainted 6.3.0-next-20230505-00003-gf3329adf5424-dirty #456
RIP: 0010:xfs_vm_writepages+0x184/0x1c0
RSP: 0018:ffffc90014bb7088 EFLAGS: 00010246
RAX: 0000000000000000 RBX: 1ffff92002976e11 RCX: ffff88817aef8000
RDX: 0000000000000000 RSI: ffff88817aef8000 RDI: 0000000000000002
RBP: ffff888267dd2ad8 R08: ffffffff8313f414 R09: ffffed1022377c18
R10: ffff888111bbe0bb R11: ffffed1022377c17 R12: ffff88817aef8000
R13: ffffc90014bb7358 R14: dffffc0000000000 R15: ffffffff8313f290
FS:  00007f9568bb0440(0000) GS:ffff88882fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d7a008 CR3: 000000024e11f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 do_writepages+0x1a8/0x630
 __writeback_single_inode+0x126/0xe00
 writeback_single_inode+0x2ae/0x530
 write_inode_now+0x16e/0x1e0
 iput.part.0+0x46c/0x730
 iput+0x60/0x80
 xfs_bulkstat_one_int+0xd87/0x1580
 xfs_bulkstat_iwalk+0x6e/0xd0
 xfs_iwalk_ag_recs+0x449/0x770
 xfs_iwalk_run_callbacks+0x305/0x630
 xfs_iwalk_ag+0x819/0xae0
 xfs_iwalk+0x2d5/0x4e0
 xfs_bulkstat+0x358/0x520
 xfs_ioc_bulkstat.isra.0+0x242/0x340
 xfs_file_ioctl+0x1d6/0x1ba0
 __x64_sys_ioctl+0x197/0x210
 do_syscall_64+0x39/0xb0
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

The above issue may happen as follows:
Process1          Process2           process3         process4
xfs_bulkstat
 xfs_trans_alloc_empty
 xfs_bulkstat_one_int
   xfs_iget(XFS_IGET_DONTCACHE)
   ->Get inode from disk and mark
     inode with I_DONTCACHE

                                                     xfs_lookup
                                                       xfs_iget
                                                       ->Hold inode refcount
   xfs_irele

                 xfs_file_write_iter
                 ->Write file made some dirty pages
                 close file

                                    xfs_bulkstat
                                      xfs_trans_alloc_empty
                                      xfs_bulkstat_one_int
                                        xfs_iget(XFS_IGET_DONTCACHE)

                                                       -> process4 close file

        ******Trigger dentry reclaim, inode refcount is 1******
                                        xfs_irele
                                          iput ->Put the last refcount
                                            iput_final
                                              write_inode_now
                                                xfs_vm_writepages
                                                  WARN_ON_ONCE(current->journal_info)
                                                  ->Trigger warning

Commit a6343e4d made BULKSTAT grab an empty transaction. If the last
reference to the inode is dropped while that transaction is held,
writepages triggers the warning, which may also lead to data loss.
To solve the above issue, just clear the inode's I_DONTCACHE flag in
xfs_iget_cache_hit().

Fixes: a6343e4d ("xfs: avoid buffer deadlocks when walking fs inodes")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
(cherry picked from commit 28ce0ae2)

11a04e90

xfs: don't leak intent item when recovery intents fail · c1df2e81

Committed by Long Li on Jun 29, 2023

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 188865, https://gitee.com/openeuler/kernel/issues/I76JSK

--------------------------------

When recovering intents, recovery may capture some deferred ops and
commit new intent items. If intent recovery then fails, no done item
will drop the reference to the new intent item. This leads to a memory
leak as follows:

unreferenced object 0xffff888016719108 (size 432):
  comm "mount", pid 529, jiffies 4294706839 (age 144.463s)
  hex dump (first 32 bytes):
    08 91 71 16 80 88 ff ff 08 91 71 16 80 88 ff ff  ..q.......q.....
    18 91 71 16 80 88 ff ff 18 91 71 16 80 88 ff ff  ..q.......q.....
  backtrace:
    [<ffffffff8230c68f>] xfs_efi_init+0x18f/0x1d0
    [<ffffffff8230c720>] xfs_extent_free_create_intent+0x50/0x150
    [<ffffffff821b671a>] xfs_defer_create_intents+0x16a/0x340
    [<ffffffff821bac3e>] xfs_defer_ops_capture_and_commit+0x8e/0xad0
    [<ffffffff82322bb9>] xfs_cui_item_recover+0x819/0x980
    [<ffffffff823289b6>] xlog_recover_process_intents+0x246/0xb70
    [<ffffffff8233249a>] xlog_recover_finish+0x8a/0x9a0
    [<ffffffff822eeafb>] xfs_log_mount_finish+0x2bb/0x4a0
    [<ffffffff822c0f4f>] xfs_mountfs+0x14bf/0x1e70
    [<ffffffff822d1f80>] xfs_fs_fill_super+0x10d0/0x1b20
    [<ffffffff81a21fa2>] get_tree_bdev+0x3d2/0x6d0
    [<ffffffff81a1ee09>] vfs_get_tree+0x89/0x2c0
    [<ffffffff81a9f35f>] path_mount+0xecf/0x1800
    [<ffffffff81a9fd83>] do_mount+0xf3/0x110
    [<ffffffff81aa00e4>] __x64_sys_mount+0x154/0x1f0
    [<ffffffff83968739>] do_syscall_64+0x39/0x80

Fix it by aborting the intent items in the capture list that do not have
a done item when intent recovery fails. If the commit of a transaction
that has deferred ops fails in xfs_defer_ops_capture_and_commit(), the
defer capture is not added to the capture list, so it needs to be aborted
as well.
Signed-off-by: Long Li <leo.lilong@huawei.com>
(cherry picked from commit c1b08a41)

c1df2e81

xfs: factor out xfs_defer_pending_abort · 4ef24aa2

Committed by Long Li on Jun 29, 2023

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 188865, https://gitee.com/openeuler/kernel/issues/I76JSK

--------------------------------

Factor out xfs_defer_pending_abort() from xfs_defer_trans_abort(). It
does not use the transaction parameter, so it can be used after the
transaction life cycle has ended.
Signed-off-by: Long Li <leo.lilong@huawei.com>
(cherry picked from commit 9bd2b3bd)

4ef24aa2

xfs: fix mounting failed caused by sequencing problem in the log records · d9083881

Committed by yangerkun on Jun 29, 2023

Offering: HULK
hulk inclusion
category: bugfix
bugzilla: 188870, https://gitee.com/openeuler/kernel/issues/I76JSK

--------------------------------

During the test of growfs + power-off, we encountered a mounting failure
issue.
The specific call stack is as follows:

[584505.210179] XFS (loop0): xfs_buf_find: daddr 0x6d6002 out of range,
EOFS 0x6d6000
...
[584505.210739] Call Trace:
[584505.210776]  xfs_buf_get_map+0x44/0x230 [xfs]
[584505.210780]  ? trace_event_buffer_commit+0x57/0x140
[584505.210818]  xfs_buf_read_map+0x54/0x280 [xfs]
[584505.210858]  ? xlog_recover_items_pass2+0x53/0xb0 [xfs]
[584505.210899]  xlog_recover_buf_commit_pass2+0x112/0x440 [xfs]
[584505.210939]  ? xlog_recover_items_pass2+0x53/0xb0 [xfs]
[584505.210980]  xlog_recover_items_pass2+0x53/0xb0 [xfs]
[584505.211020]  xlog_recover_commit_trans+0x2ca/0x320 [xfs]
[584505.211061]  xlog_recovery_process_trans+0xc6/0xf0 [xfs]
[584505.211101]  xlog_recover_process_data+0x9e/0x110 [xfs]
[584505.211141]  xlog_do_recovery_pass+0x3b4/0x5c0 [xfs]
[584505.211181]  xlog_do_log_recovery+0x5e/0x80 [xfs]
[584505.211223]  xlog_do_recover+0x33/0x1a0 [xfs]
[584505.211262]  xlog_recover+0xd7/0x170 [xfs]
[584505.211303]  xfs_log_mount+0x217/0x2b0 [xfs]
[584505.211341]  xfs_mountfs+0x3da/0x870 [xfs]
[584505.211384]  xfs_fc_fill_super+0x3fa/0x7a0 [xfs]
[584505.211428]  ? xfs_setup_devices+0x80/0x80 [xfs]
[584505.211432]  get_tree_bdev+0x16f/0x260
[584505.211434]  vfs_get_tree+0x25/0xc0
[584505.211436]  do_new_mount+0x156/0x1b0
[584505.211438]  __se_sys_mount+0x165/0x1d0
[584505.211440]  do_syscall_64+0x33/0x40
[584505.211442]  entry_SYSCALL_64_after_hwframe+0x61/0xc6

After analyzing the log records, we have discovered the following
content:

============================================================================
cycle: 173  version: 2    lsn: 173,2742 tail_lsn: 173,1243
length of Log Record: 25600 prev offset: 2702   num ops: 258
uuid: fb958458-48a3-4c76-ae23-7a1cf3053065   format: little endian linux
h_size: 32768
----------------------------------------------------------------------------
...
----------------------------------------------------------------------------
Oper (100): tid: 1c010724  len: 24  clientid: TRANS  flags: none
BUF:  #regs: 2   start blkno: 7168002 (0x6d6002)  len: 1  bmap size: 1
flags: 0x3800
Oper (101): tid: 1c010724  len: 128  clientid: TRANS  flags: none
AGI Buffer: XAGI
ver: 1  seq#: 28  len: 2048  cnt: 0  root: 3
level: 1  free#: 0x0  newino: 0x140
bucket[0 - 3]: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
bucket[4 - 7]: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
bucket[8 - 11]: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
bucket[12 - 15]: 0xffffffff 0xffffffff 0xffffffff 0xffffffff
bucket[16 - 19]: 0xffffffff
----------------------------------------------------------------------------
...
----------------------------------------------------------------------------
Oper (108): tid: 1c010724  len: 24  clientid: TRANS  flags: none
BUF:  #regs: 2   start blkno: 0 (0x0)  len: 1  bmap size: 1  flags:
0x9000
Oper (109): tid: 1c010724  len: 384  clientid: TRANS  flags: none
SUPER BLOCK Buffer:
icount: 6360863066640355328  ifree: 898048  fdblks: 0  frext: 0
----------------------------------------------------------------------------
...

We found that in the log records, the transaction that modifies the
expanded block comes before the growfs transaction, which leads to a
verification failure during log replay.

We need to ensure that when replaying logs, transactions related to the
superblock are replayed first.
Signed-off-by: Wu Guanghao <wuguanghao3@huawei.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Long Li <leo.lilong@huawei.com>
(cherry picked from commit dba19fb8)
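
The ordering problem can be modeled with a small sketch. It is not the XFS recovery code and does not claim to show how the patch reorders items: `demo_item` and `demo_replay()` are hypothetical, the range check is reduced to a comparison against the end of the filesystem (EOFS), and the grown EOFS value 0x800000 is made up for illustration. Only the block number 0x6d6002 and the old EOFS 0x6d6000 come from the log above.

```c
#include <assert.h>
#include <stddef.h>

enum demo_item_type {
    DEMO_ITEM_BUF,      /* buffer update; value is the disk address */
    DEMO_ITEM_SB_GROW   /* superblock update; value is the new EOFS */
};

struct demo_item {
    enum demo_item_type type;
    unsigned long long value;
};

/*
 * Replay items in order against a filesystem whose current end is
 * `eofs`. Returns the index of the first item that fails the range
 * check, or -1 if every item replays cleanly.
 */
int demo_replay(const struct demo_item *items, size_t n,
                unsigned long long eofs)
{
    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (items[i].type == DEMO_ITEM_SB_GROW)
            eofs = items[i].value;      /* filesystem is now larger */
        else if (items[i].value >= eofs)
            return (int)i;              /* address beyond EOFS */
    }
    return -1;
}
```

With the buffer item first, replay fails at index 0 because the address lies beyond the old EOFS; with the superblock item first, the same two items replay cleanly.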

d9083881

ext4: turning quotas off if mount failed after enable quotas · 8c14141a

Committed by Baokun Li on Jul 01, 2023

mainline inclusion
from mainline-v6.5
commit d13f99632748462c32fc95d729f5e754bab06064
category: bugfix
bugzilla: 188906, https://gitee.com/openeuler/kernel/issues/I7E9M5
CVE: NA

Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d13f99632748462c32fc95d729f5e754bab06064

--------------------------------

Yi found during a review of the patch "ext4: don't BUG on inconsistent
journal feature" that when ext4_mark_recovery_complete() returns an error
value, the error handling path does not turn off the enabled quotas,
which triggers the following kmemleak:

================================================================
unreferenced object 0xffff8cf68678e7c0 (size 64):
comm "mount", pid 746, jiffies 4294871231 (age 11.540s)
hex dump (first 32 bytes):
00 90 ef 82 f6 8c ff ff 00 00 00 00 41 01 00 00  ............A...
c7 00 00 00 bd 00 00 00 0a 00 00 00 48 00 00 00  ............H...
backtrace:
[<00000000c561ef24>] __kmem_cache_alloc_node+0x4d4/0x880
[<00000000d4e621d7>] kmalloc_trace+0x39/0x140
[<00000000837eee74>] v2_read_file_info+0x18a/0x3a0
[<0000000088f6c877>] dquot_load_quota_sb+0x2ed/0x770
[<00000000340a4782>] dquot_load_quota_inode+0xc6/0x1c0
[<0000000089a18bd5>] ext4_enable_quotas+0x17e/0x3a0 [ext4]
[<000000003a0268fa>] __ext4_fill_super+0x3448/0x3910 [ext4]
[<00000000b0f2a8a8>] ext4_fill_super+0x13d/0x340 [ext4]
[<000000004a9489c4>] get_tree_bdev+0x1dc/0x370
[<000000006e723bf1>] ext4_get_tree+0x1d/0x30 [ext4]
[<00000000c7cb663d>] vfs_get_tree+0x31/0x160
[<00000000320e1bed>] do_new_mount+0x1d5/0x480
[<00000000c074654c>] path_mount+0x22e/0xbe0
[<0000000003e97a8e>] do_mount+0x95/0xc0
[<000000002f3d3736>] __x64_sys_mount+0xc4/0x160
[<0000000027d2140c>] do_syscall_64+0x3f/0x90
================================================================

To solve this problem, we add a "failed_mount10" tag, and call
ext4_quota_off_umount() in this tag to release the enabled quotas.

Fixes: 11215630 ("ext4: don't BUG on inconsistent journal feature")
Cc: stable@kernel.org
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230327141630.156875-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflicts:
	fs/ext4/super.c
Signed-off-by: Baokun Li <libaokun1@huawei.com>
(cherry picked from commit e980e714)
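
The goto-based unwinding that this fix extends can be sketched as follows. This is a minimal illustration, not the ext4 mount path: `demo_sb`, `demo_fill_super()` and the label names are hypothetical, and integer flags stand in for the real resources. The point is that a step which can fail after quotas are enabled needs its own label that undoes the quota setup before falling through to the earlier cleanup.

```c
#include <assert.h>

/* Hypothetical mount state; each flag stands in for a held resource. */
struct demo_sb {
    int journal_loaded;
    int quotas_enabled;
};

/*
 * Set up resources in order and unwind in reverse order on failure.
 * `recovery_fails` simulates an error in a step that runs after the
 * quotas have already been enabled.
 */
int demo_fill_super(struct demo_sb *sb, int journal_fails,
                    int recovery_fails)
{
    assert(sb != 0);
    sb->journal_loaded = 1;
    if (journal_fails)
        goto failed_journal;

    sb->quotas_enabled = 1;
    if (recovery_fails)
        goto failed_quota;      /* the label the older code lacked */

    return 0;

failed_quota:
    sb->quotas_enabled = 0;     /* release what the quota step took */
failed_journal:
    sb->journal_loaded = 0;
    return -1;
}
```

Jumping straight to `failed_journal` from the late failure would leave `quotas_enabled` set, which is the shape of the leak reported above.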

8c14141a

06 Jul, 2023 5 commits

ext4: Stop trying writing pages if no free blocks generated · 77d99dff

Committed by Zhihao Cheng on Jul 05, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7CBCS

--------------------------------

The following steps could trap ext4_writepages in a dead loop:

1. Consume free_clusters until free_clusters > 2 * sbi->s_resv_clusters,
   and free_clusters > EXT4_FREECLUSTERS_WATERMARK.
   // eg. free_clusters = 1422, sbi->s_resv_clusters = 512
   // nr_cpus = 4, EXT4_FREECLUSTERS_WATERMARK = 512
2. umount && mount.  // dirty_clusters = 0
3. Run free_clusters tasks concurrently to write different files; many
   of the tasks append 4K of data through the da_write method. Each inode
   will consume one data block and one extent block in map_block.
   // (free_clusters - EXT4_FREECLUSTERS_WATERMARK = 910) tasks choose
   // the da_write method; the remaining 512 tasks choose the write_begin
   // method. Suppose the tasks that choose the da_write path run first:
   // dirty_clusters = 910, free_clusters = 1422
   // Tasks which choose write_begin path will get ENOSPC:
   //  free_clusters < (nclusters + dirty_clusters + resv_clusters)
   //  1422 < (1 + 910 + 512)
4. After a certain number of map_block iterations in ext4_writepages:
   // free_clusters = 0,
   // dirty_clusters = 910 - (1422 / 2) = 199
5. Delete one 4K file.  // free_clusters = 1
6. ext4_writepages traps into dead loop:
    mpage_map_and_submit_extent
     mpage_map_one_extent // ret = ENOSPC
       ext4_map_blocks -> ext4_ext_map_blocks -> ext4_mb_new_blocks ->
       ext4_claim_free_clusters:
         if (free_clusters >= (nclusters + dirty_clusters)) // false
     if (err == -ENOSPC && ext4_count_free_clusters(sb)) // true
       return err
     *give_up_on_write = true // won't be executed

Fix it by terminating ext4_writepages if no free blocks are generated.
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 07a8109d)
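
The numbers in the steps above can be checked with a small sketch. The helpers are hypothetical and only model the comparisons quoted in the commit message; they are not the ext4 functions. The halving in `demo_dirty_after_writeback()` follows the statement that each inode consumes one data block and one extent block.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified form of the space check quoted in steps 3 and 6: the
 * claim succeeds only if the free clusters cover the request plus the
 * dirty (reserved-for-delalloc) clusters plus the reserved clusters.
 */
bool demo_has_free_clusters(long free_clusters, long nclusters,
                            long dirty_clusters, long resv_clusters)
{
    assert(nclusters > 0);
    return free_clusters >= nclusters + dirty_clusters + resv_clusters;
}

/*
 * Dirty clusters left once writeback has used up every free cluster,
 * with each written inode taking two clusters (data + extent block).
 */
long demo_dirty_after_writeback(long dirty_clusters, long free_clusters)
{
    return dirty_clusters - free_clusters / 2;
}
```

Plugging in the example values: step 3 fails the check because 1422 < 1 + 910 + 512, step 4 leaves 910 - 711 = 199 dirty clusters, and in step 6 a single freed cluster still cannot satisfy 1 >= 1 + 199, so the allocation keeps returning ENOSPC while the free-cluster count stays non-zero.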

77d99dff

jbd2: fix checkpoint cleanup performance regression · 4e584046

Committed by Zhang Yi on Jul 05, 2023

hulk inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I7IO1D
CVE: NA

--------------------------------

journal_clean_one_cp_list() has been merged into
journal_shrink_one_cp_list(), but checkpoint buffer cleanup from the
committing process is only a best effort: it should stop scanning once
it meets a busy buffer, or else it will cause a lot of useless buffer
scans and checks. We caught a performance regression when running the
fs_mark test below.

Test cmd:
 ./fs_mark  -d  scratch  -s  1024  -n  10000  -t  1  -D  100  -N  100

Before merging checkpoint buffer cleanup:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       8304.9            49033

After merging checkpoint buffer cleanup:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       7649.0            50012
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       2107.1            50871

After merging checkpoint buffer cleanup, the total loop count in
journal_shrink_one_cp_list() could be up to 6,261,600+ (50,000+ ~
100,000+ in general), and most of the iterations are useless. This patch
fixes it by passing 'shrink_type' into journal_shrink_one_cp_list() and
adding a new 'SHRINK_BUSY_STOP' type to indicate that it should stop once
it meets a busy buffer. After the fix, the loop count drops back to 10,000+.

After this fix:
 FSUse%        Count         Size    Files/sec     App Overhead
     95        10000         1024       8558.4            49109
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 30b833d5)
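
The effect of stopping at the first busy buffer can be sketched with a simple scan over an array. This is an illustration only, not the jbd2 code: the `demo_` names are hypothetical, `DEMO_SHRINK_BUSY_SKIP` is an assumed counterpart to the 'SHRINK_BUSY_STOP' type named above, and a bool array stands in for the checkpoint list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical scan modes modeled on the 'shrink_type' parameter. */
enum demo_shrink_type {
    DEMO_SHRINK_BUSY_SKIP,  /* skip busy buffers and keep scanning */
    DEMO_SHRINK_BUSY_STOP   /* give up at the first busy buffer */
};

struct demo_scan_result {
    size_t visited;     /* loop iterations performed */
    size_t released;    /* non-busy buffers released */
};

/* Walk the list once; busy[i] marks buffer i as not yet releasable. */
struct demo_scan_result demo_shrink_list(const bool *busy, size_t n,
                                         enum demo_shrink_type type)
{
    struct demo_scan_result r = { 0, 0 };

    assert(busy != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        r.visited++;
        if (busy[i]) {
            if (type == DEMO_SHRINK_BUSY_STOP)
                break;
            continue;
        }
        r.released++;
    }
    return r;
}
```

On a list where most entries are busy, the stop mode visits only the entries up to the first busy one, while the skip mode walks the whole list each time it is called, which is where the extra loop iterations come from.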

4e584046

jbd2: remove __journal_try_to_free_buffer() · 3616625a

Committed by Zhang Yi on Jul 05, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I70WHL
CVE: NA

Reference: https://lore.kernel.org/linux-ext4/20230606135928.434610-1-yi.zhang@huaweicloud.com/T/#t

--------------------------------

__journal_try_to_free_buffer() has only one caller and its logic is
much simpler now, so just remove it and open code it in
jbd2_journal_try_to_free_buffers().
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit b177d4d4)

3616625a

jbd2: fix a race when checking checkpoint buffer busy · 1b4d87d1

Committed by Zhang Yi on Jul 05, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I70WHL
CVE: NA

Reference: https://lore.kernel.org/linux-ext4/20230606135928.434610-1-yi.zhang@huaweicloud.com/T/#t

--------------------------------

Before removing a checkpoint buffer from the t_checkpoint_list, we have
to check both the BH_Dirty and BH_Lock bits together to distinguish
buffers that have not been written back from buffers that are being
written back. But __cp_buffer_busy() checks them separately: it first
checks the lock state and then checks dirty. The window between these
two checks can be raced by the writeback procedure, which locks the
buffer and clears buffer dirty before I/O completes. So it cannot
guarantee that checkpointing buffers have been written back to disk if
some error happens later. Finally, it may clean checkpoint transactions
and lead to an inconsistent filesystem.

jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escapes this issue since it runs
under the buffer lock), so fix them by introducing a new helper that
tries to hold the buffer lock and removes only really clean buffers.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable@vger.kernel.org
Suggested-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 80079353)
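
The "check dirty while holding the lock" idea can be sketched as below. This is a single-threaded model with hypothetical names, not the jbd2 helper: plain bool fields stand in for the BH_Lock and BH_Dirty bits, which the kernel manipulates with atomic bit operations. The sketch only shows the decision order: try the lock first, and test dirtiness only once the lock is held.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical buffer state; bools stand in for BH_Lock and BH_Dirty. */
struct demo_buf {
    bool locked;
    bool dirty;
    bool on_checkpoint_list;
};

/* Take the lock if it is free; never blocks. */
static bool demo_trylock(struct demo_buf *b)
{
    if (b->locked)
        return false;
    b->locked = true;
    return true;
}

/*
 * Remove the buffer from the checkpoint list only if it is really
 * clean. A buffer under writeback is locked (and may already look
 * clean), so a failed trylock is treated as busy.
 */
bool demo_try_remove_checkpoint(struct demo_buf *b)
{
    bool removed = false;

    assert(b != 0);
    if (!demo_trylock(b))
        return false;           /* writeback in flight: busy */
    if (!b->dirty) {
        b->on_checkpoint_list = false;
        removed = true;
    }
    b->locked = false;          /* unlock */
    return removed;
}
```

A buffer that writeback has locked and marked clean but not yet written out is left on the list, which is the case the separate lock-then-dirty checks could miss.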

1b4d87d1

jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint · c2c33b5f

Committed by Zhihao Cheng on Jul 05, 2023

maillist inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I70WHL
CVE: NA

Reference: https://lore.kernel.org/linux-ext4/20230606135928.434610-1-yi.zhang@huaweicloud.com/T/#t

--------------------------------

The following process,

jbd2_journal_commit_transaction
// there are several dirty buffer heads in transaction->t_checkpoint_list
          P1                   wb_workfn
jbd2_log_do_checkpoint
 if (buffer_locked(bh)) // false
                            __block_write_full_page
                             trylock_buffer(bh)
                             test_clear_buffer_dirty(bh)
 if (!buffer_dirty(bh))
  __jbd2_journal_remove_checkpoint(jh)
   if (buffer_write_io_error(bh)) // false
                             >> bh IO error occurs <<
 jbd2_cleanup_journal_tail
  __jbd2_update_log_tail
   jbd2_write_superblock
   // The bh won't be replayed in next mount.
could corrupt the ext4 image. A reproducer is available at [Link].

Since the writeback process clears buffer dirty after locking the buffer
head, we can fix it by trying to lock the buffer and checking dirtiness
while the buffer is locked. The buffer head can be removed if it is
neither dirty nor locked.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Fixes: 470decc6 ("[PATCH] jbd2: initial copy of files from jbd")
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
(cherry picked from commit 782635a8)

c2c33b5f
