Commits · 8b9ea9010139596e71aae4be5264c3dd901b13cb · openeuler / raspberrypi-kernel

17 Mar, 2020 1 commit

pagecache: support percpu refcount to imporve performance · 8b9ea901

Committed by Yunfeng Ye on Mar 17, 2020

euleros inclusion
category: feature
feature: pagecache percpu refcount
bugzilla: 31398
CVE: NA

-------------------------------------------------

The page cache manages the physical pages that back files, and the life
cycle of each page is managed by an atomic reference count. As the number
of CPU cores grows, the cost of atomic counting becomes very high when the
file page cache is read with high concurrency.

For example, when running an nginx HTTP workload, the biggest hotspot is
the atomic operation in find_get_entry():

 11.94% [kernel] [k] find_get_entry
  7.45% [kernel] [k] do_tcp_sendpages
  6.12% [kernel] [k] generic_file_buffered_read

So we use the percpu refcount mechanism to fix this problem. The test
results show that nginx HTTP read performance improves by more than
100%:

  worker   original(requests/sec)   percpu(requests/sec)   improvement
  64       759656.87                1627088.95             114.2%

Note: we use page->lru to save the percpu information, so pages with the
percpu attribute will not be reclaimed by the memory reclaim process. We
should avoid letting the file size grow without limit.
Signed-off-by: Yunfeng Ye <yeyunfeng@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
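
The idea of replacing one shared atomic counter with per-CPU counters can be sketched in user space as follows. This is only an illustration under stated assumptions, not the code of this patch: the names ref_slot, percpu_ref_sketch, ref_get, ref_put and ref_sum, the slot count of 4 and the 64-byte cache line are all hypothetical, and the model is single-threaded so it shows the bookkeeping rather than the locking.

```c
#include <assert.h>
#include <stddef.h>

#define NR_SLOTS 4        /* hypothetical CPU count */
#define CACHELINE 64      /* assumed cache line size in bytes */

/* One counter per CPU, padded so that two CPUs never share a cache line. */
struct ref_slot {
    long count;
    char pad[CACHELINE - sizeof(long)];
};

struct percpu_ref_sketch {
    struct ref_slot slots[NR_SLOTS];
};

/* A get only touches the slot of the CPU it runs on. */
static void ref_get(struct percpu_ref_sketch *ref, unsigned int cpu)
{
    ref->slots[cpu % NR_SLOTS].count++;
}

/* A put may run on a different CPU, so a single slot can go negative. */
static void ref_put(struct percpu_ref_sketch *ref, unsigned int cpu)
{
    ref->slots[cpu % NR_SLOTS].count--;
}

/* Only the sum over all slots is the real reference count. */
static long ref_sum(const struct percpu_ref_sketch *ref)
{
    long total = 0;
    size_t i;

    for (i = 0; i < NR_SLOTS; i++)
        total += ref->slots[i].count;
    return total;
}
```

The hot path writes only CPU-local memory; the price is that reading the true count requires walking every slot, which is why such schemes usually keep the summing step off the fast path.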

8b9ea901

12 Mar, 2020 2 commits

btrfs: tree-checker: Remove comprehensive root owner check · 217d3ab7

Committed by Qu Wenruo on Mar 12, 2020

mainline inclusion
from mainline-5.2-rc1
commit ff2ac107
category: bugfix
bugzilla: NA
CVE: CVE-2019-19036
---------------------------

Commit 1ba98d08 ("Btrfs: detect corruption when non-root leaf has
zero item") introduced a comprehensive root owner checker.

However, locating the owner root is a pretty expensive tree search,
especially when it gets reused by the mandatory read-time and write-time
tree-checker.

This patch removes that check and relies completely on the owner-based
empty leaf check, which is much faster and still works fine for most
cases.

And since we skip the old root owner check, the write-time tree check
can now be merged with btrfs_check_leaf_full().
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Conflict:
	fs/btrfs/tree-checker.c
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

217d3ab7

xfs: add agf freeblocks verify in xfs_agf_verify · 095cb0e6

Committed by Zheng Bin on Mar 12, 2020

mainline inclusion
from mainline-v5.6
commit d0c7feaf
category: bugfix
bugzilla: 30215
CVE: NA

---------------------------

We recently used a fuzzer (hydra) to test XFS and automatically generated
tmp.img (XFS v5 format, but some metadata is wrong).

xfs_repair information(just one AG):
agf_freeblks 0, counted 3224 in ag 0
agf_longest 536874136, counted 3224 in ag 0
sb_fdblocks 613, counted 3228

Test as follows:
mount tmp.img tmpdir
cp file1M tmpdir
sync

In 4.19-stable, sync gets stuck. The reason is:
xfs_mountfs
  xfs_check_summary_counts
    if ((!xfs_sb_version_haslazysbcount(&mp->m_sb) ||
       XFS_LAST_UNMOUNT_WAS_CLEAN(mp)) &&
       !xfs_fs_has_sickness(mp, XFS_SICK_FS_COUNTERS))
	return 0;  -->just return, incore sb_fdblocks still be 613
    xfs_initialize_perag_data

cp file1M tmpdir -->ok(write file to pagecache)
sync -->stuck(write pagecache to disk)
xfs_map_blocks
  xfs_iomap_write_allocate
    while (count_fsb != 0) {
      nimaps = 0;
      while (nimaps == 0) { --> endless loop
         nimaps = 1;
         xfs_bmapi_write(..., &nimaps) --> nimaps becomes 0 again
xfs_bmapi_write
  xfs_bmap_alloc
    xfs_bmap_btalloc
      xfs_alloc_vextent
        xfs_alloc_fix_freelist
          xfs_alloc_space_available -->fail(agf_freeblks is 0)

In linux-next, sync does not get stuck, because commit c2b31643 ("xfs:
use the latest extent at writeback delalloc conversion time") removed
the while loop above. dmesg is as follows:
[   55.250114] XFS (loop0): page discard on page ffffea0008bc7380, inode 0x1b0c, offset 0.

Users do not know why this page is discarded. The better solutions are:
1. Like xfs_repair, make sure sb_fdblocks is equal to the counted value
(xfs_initialize_perag_data does this, but it is not called at this mount)
2. Add an AGF verifier check; if it fails, tell users to repair

This patch uses the second solution.
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Ren Xudong <renxudong1@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
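
The kind of sanity check the second solution calls for can be sketched as below. The commit message does not show the exact conditions the patch adds, so the two invariants here are assumptions derived from what the counters mean: the longest free extent cannot exceed the free block count, and the free block count cannot exceed the AG length. The struct agf_sketch, the function name and the AG length of 4096 blocks are hypothetical; the freeblks and longest values come from the xfs_repair output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, host-endian stand-in for the on-disk AGF counters. */
struct agf_sketch {
    uint32_t length;    /* size of the allocation group in blocks */
    uint32_t freeblks;  /* total free blocks in the group */
    uint32_t longest;   /* length of the longest free extent */
};

/* Return false when the counters contradict each other. */
static bool agf_counters_plausible(const struct agf_sketch *agf)
{
    if (agf->freeblks < agf->longest)
        return false;   /* one extent cannot be larger than all free space */
    if (agf->freeblks > agf->length)
        return false;   /* free space cannot exceed the group itself */
    return true;
}
```

With the fuzzed image, agf_freeblks is 0 while agf_longest is 536874136, so the first comparison already rejects the header and the user can be told to run xfs_repair instead of hitting the hang or the page discard later.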

095cb0e6

05 Mar, 2020 37 commits

ext4: add cond_resched() to __ext4_find_entry() · ef91f0f3

Committed by Shijie Luo via Kernel on Feb 29, 2020

mainline inclusion
from mainline-v5.6-rc3
commit 9424ef56
category: bugfix
bugzilla: 31127
CVE: NA

-------------------------------------------------
We hit a soft lockup problem in linux 4.19 which could also
be found in linux 5.x.

When a dir inode takes up a large number of blocks, and if the
directory is growing while we are searching, it's possible that the
restart branch is taken many times, and the do-while loop
could hold the cpu for a long time.

Here is the call trace in linux 4.19.

[  473.756186] Call trace:
[  473.756196]  dump_backtrace+0x0/0x198
[  473.756199]  show_stack+0x24/0x30
[  473.756205]  dump_stack+0xa4/0xcc
[  473.756210]  watchdog_timer_fn+0x300/0x3e8
[  473.756215]  __hrtimer_run_queues+0x114/0x358
[  473.756217]  hrtimer_interrupt+0x104/0x2d8
[  473.756222]  arch_timer_handler_virt+0x38/0x58
[  473.756226]  handle_percpu_devid_irq+0x90/0x248
[  473.756231]  generic_handle_irq+0x34/0x50
[  473.756234]  __handle_domain_irq+0x68/0xc0
[  473.756236]  gic_handle_irq+0x6c/0x150
[  473.756238]  el1_irq+0xb8/0x140
[  473.756286]  ext4_es_lookup_extent+0xdc/0x258 [ext4]
[  473.756310]  ext4_map_blocks+0x64/0x5c0 [ext4]
[  473.756333]  ext4_getblk+0x6c/0x1d0 [ext4]
[  473.756356]  ext4_bread_batch+0x7c/0x1f8 [ext4]
[  473.756379]  ext4_find_entry+0x124/0x3f8 [ext4]
[  473.756402]  ext4_lookup+0x8c/0x258 [ext4]
[  473.756407]  __lookup_hash+0x8c/0xe8
[  473.756411]  filename_create+0xa0/0x170
[  473.756413]  do_mkdirat+0x6c/0x140
[  473.756415]  __arm64_sys_mkdirat+0x28/0x38
[  473.756419]  el0_svc_common+0x78/0x130
[  473.756421]  el0_svc_handler+0x38/0x78
[  473.756423]  el0_svc+0x8/0xc
[  485.755156] watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [tmp:5149]

Add cond_resched() to avoid the soft lockup and to provide better
system responsiveness.

Link: https://lore.kernel.org/r/20200215080206.13293-1-luoshijie1@huawei.com
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
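
The pattern of giving up the CPU inside a long search loop can be sketched in user space as below. This is an analogy, not the ext4 code: sched_yield() from POSIX stands in for the kernel's cond_resched(), and the function scan_blocks, the constant YIELD_EVERY and the yield counter are hypothetical. In the kernel, cond_resched() can simply be called on each pass because it only reschedules when needed; the interval here just keeps the user-space model cheap.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define YIELD_EVERY 4   /* hypothetical interval between voluntary yields */

/*
 * Linear search that periodically lets other tasks run.
 * Returns the index of the match, or nblocks if nothing matches;
 * *yields reports how often the CPU was offered to other tasks.
 */
static size_t scan_blocks(const int *blocks, size_t nblocks, int wanted,
                          size_t *yields)
{
    size_t i;

    *yields = 0;
    for (i = 0; i < nblocks; i++) {
        if (blocks[i] == wanted)
            return i;
        if ((i + 1) % YIELD_EVERY == 0) {
            sched_yield();   /* user-space stand-in for cond_resched() */
            (*yields)++;
        }
    }
    return nblocks;
}
```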

ef91f0f3

files_cgroup: Fix soft lockup when refcnt overflow. · 22f98d8e

Committed by Zhang Xiaoxu on Feb 26, 2020

hulk inclusion
category: bugfix
bugzilla: 31087
CVE: NA

---------------------

There is a soft lockup call trace as below:
  CPU: 0 PID: 1360 Comm: imapsvcd Kdump: loaded Tainted: G           OE
  task: ffff8a7296e1eeb0 ti: ffff8a7296aa0000 task.ti: ffff8a7296aa0000
  RIP: 0010:[<ffffffffb691ecb4>]  [<ffffffffb691ecb4>]
  __css_tryget+0x24/0x50
  RSP: 0018:ffff8a7296aa3db8  EFLAGS: 00000a87
  RAX: 0000000080000000 RBX: ffff8a7296aa3df8 RCX: ffff8a72820d9a08
  RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8a72820d9a00
  RBP: ffff8a7296aa3db8 R08: 000000000001c360 R09: ffffffffb6a478f4
  R10: ffffffffb6935e83 R11: ffffffffffffffd0 R12: 0000000057d35cd8
  R13: 000000d000000002 R14: ffffffffb6892fbe R15: 000000d000000002
  FS:  0000000000000000(0000) GS:ffff8a72fec00000(0063)
  knlGS:00000000c6e65b40
  CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
  CR2: 0000000057d35cd8 CR3: 00000007e8008000 CR4: 00000000003607f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   [<ffffffffb6a93578>] files_cgroup_assign+0x48/0x60
   [<ffffffffb6a47972>] dup_fd+0xb2/0x2f0
   [<ffffffffb6935e83>] ? audit_alloc+0xe3/0x180
   [<ffffffffb6893a03>] copy_process+0xbd3/0x1a40
   [<ffffffffb6894a21>] do_fork+0x91/0x320
   [<ffffffffb6f329e6>] ? trace_do_page_fault+0x56/0x150
   [<ffffffffb6894d36>] SyS_clone+0x16/0x20
   [<ffffffffb6f3bf8c>] ia32_ptregs_common+0x4c/0xfc
   code: 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8d 4f 08 48 89 e5 8b
         47 08 8d 90 00 00 00 80 85 c0 0f 49 d0 8d 72 01 89 d0 f0 0f b1

When the child process exits, we don't decrement the refcnt, so the refcnt
may overflow. Then 'task_get_css' loops forever: 'css_refcnt' returns an
unbiased refcnt, and if the refcnt is negative, '__css_tryget' always
returns false, so 'task_get_css' never terminates.

The child process always calls 'close_files' when it exits, so decrement
the refcnt there.
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
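
Why an overflowed counter makes every tryget fail can be modelled in a few lines. This is a single-threaded sketch under assumptions, not the kernel source: the names css_sketch, css_refcnt_sketch, css_tryget_sketch and get_css_bounded are hypothetical, the bias is assumed to be the sign bit of a 32-bit counter, and a plain comparison stands in for the atomic compare-and-swap. The retry loop is bounded here so the model terminates, whereas the message describes a caller that retries forever.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEACT_BIAS INT32_MIN   /* assumed bias: the sign bit marks a deactivated counter */

struct css_sketch {
    int32_t refcnt;            /* stands in for the atomic counter */
};

/* Return the counter with the bias removed, as the message describes. */
static int32_t css_refcnt_sketch(const struct css_sketch *css)
{
    int32_t v = css->refcnt;

    return v >= 0 ? v : v - DEACT_BIAS;
}

/* One compare-and-swap attempt, modelled single-threaded. */
static bool css_tryget_sketch(struct css_sketch *css)
{
    int32_t expected = css_refcnt_sketch(css);

    if (css->refcnt != expected)
        return false;          /* raw value is negative: the swap can never match */
    /* Wrap like a 32-bit hardware counter instead of invoking signed overflow. */
    css->refcnt = (int32_t)((uint32_t)expected + 1u);
    return true;
}

/* Bounded stand-in for the caller's retry loop; returns -1 if it never succeeds. */
static int get_css_bounded(struct css_sketch *css, int max_tries)
{
    int tries;

    for (tries = 1; tries <= max_tries; tries++)
        if (css_tryget_sketch(css))
            return tries;
    return -1;
}
```

Once leaked references push the counter past its maximum, the raw value is negative, the unbiased value no longer equals it, and no number of retries can succeed. Dropping the reference when the child exits keeps the counter from ever reaching that state.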

22f98d8e

jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer · 9e486a76

Committed by zhangyi (F) on Feb 24, 2020

[ Upstream commit c96dceea ]

Commit 904cdbd4 ("jbd2: clear dirty flag when revoking a buffer from
an older transaction") set the BH_Freed flag when forgetting a metadata
buffer which belongs to the committing transaction. The flag indicates
that the committing process should clear the dirty bits when it is done
with the buffer. But it also clears the BH_Mapped flag at the same time,
which may trigger the NULL pointer oops below when block_size < PAGE_SIZE.

rmdir 1             kjournald2                 mkdir 2
                    jbd2_journal_commit_transaction
		    commit transaction N
jbd2_journal_forget
set_buffer_freed(bh1)
                    jbd2_journal_commit_transaction
                     commit transaction N+1
                     ...
                     clear_buffer_mapped(bh1)
                                               ext4_getblk(bh2 ummapped)
                                               ...
                                               grow_dev_page
                                                init_page_buffers
                                                 bh1->b_private=NULL
                                                 bh2->b_private=NULL
                     jbd2_journal_put_journal_head(jh1)
                      __journal_remove_journal_head(hb1)
		       jh1 is NULL and trigger oops

*) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has
   already been unmapped.

For the metadata buffer we are forgetting, we should always keep the
mapped flag; clearing the dirty flags is enough. So this patch picks out
these buffers and keeps their BH_Mapped flag.

Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com
Fixes: 904cdbd4 ("jbd2: clear dirty flag when revoking a buffer from an older transaction")
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

9e486a76

jbd2: move the clearing of b_modified flag to the journal_unmap_buffer() · c61ee205

Committed by zhangyi (F) on Feb 24, 2020

[ Upstream commit 6a66a7ded12baa6ebbb2e3e82f8cb91382814839 ]

There is no need to delay the clearing of b_modified flag to the
transaction committing time when unmapping the journalled buffer, so
just move it to the journal_unmap_buffer().

Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c61ee205

Revert "debugfs: fix kabi for function debugfs_remove_recursive" · acd24e6d

Committed by Yang Yingliang on Feb 24, 2020

hulk inclusion
category: bugfix
bugzilla: 30939
CVE: NA

---------------------------

The kabi can be broken before official release.

This reverts commit ce620c1a6783b2341a376ef948484b5314ed064e.
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-By: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

acd24e6d

files_cgroup: fix error pointer when kvm_vm_worker_thread · 97b7da5b

Committed by Zhang Xiaoxu on Feb 20, 2020

hulk inclusion
category: bugfix
bugzilla: NA
CVE: NA

---------------------------

Since the fix for CVE-2018-12207, kvm_vm_worker_thread attaches to all
cgroup subsystems. But the files cgroup doesn't support kernel threads.

Because init_files doesn't initialize the files cgroup, when the kernel
thread 'kvm_vm_worker_thread' attaches to the files cgroup, the
files_cgroup obtained from 'init_files' is an error pointer. This leads
to the kernel panic below:
  [  724.842302]  page_counter_uncharge+0x1d/0x30
  [  724.842431]  files_cgroup_attach+0x7c/0x130
  [  724.842564]  ? css_set_move_task+0x12e/0x230
  [  724.842694]  cgroup_migrate_execute+0x2f9/0x3b0
  [  724.842833]  cgroup_attach_task+0x156/0x200
  [  724.843010]  ? kvm_mmu_pte_write+0x490/0x490 [kvm]
  [  724.843153]  cgroup_attach_task_all+0x81/0xd0
  [  724.843289]  ? __schedule+0x294/0x910
  [  724.843419]  kvm_vm_worker_thread+0x4a/0xc0 [kvm]
  [  724.843579]  ? kvm_exit+0x80/0x80 [kvm]
  [  724.843690]  kthread+0x112/0x130
  [  724.843792]  ? kthread_create_worker_on_cpu+0x70/0x70
  [  724.843948]  ret_from_fork+0x35/0x40

So we add a check: if the task is a kernel thread (files is
'init_files'), we don't perform any further operations on the
files cgroup.

Fixes: baa10bc24e1e ("kvm: Add helper function for creating VM ...")
Signed-off-by: Zhang Xiaoxu <zhangxiaoxu5@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

97b7da5b

bdi: get device name under rcu protect · de1b854e

Committed by Yufen Yu on Feb 20, 2020

hulk inclusion
category: bugfix
bugzilla: 30109
CVE: NA
---------------------------

bdi->dev may be set to NULL or freed by bdi_unregister().
To avoid a NULL pointer dereference or use-after-free
in its users, we add a common function bdi_get_dev_name(), in which
dev is protected by the RCU lock. Then the caller can get the device
name safely.

Fixes: 5ca4579ae59b ("bdi: fix use-after-free for the bdi device")
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Reviewed-by: Hou Tao <houao1@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

de1b854e

debugfs: fix kabi for function debugfs_remove_recursive · 27d247b2

Committed by yu kuai on Feb 20, 2020

hulk inclusion
category: bugfix
bugzilla: 24454
CVE: NA

---------------------------

debugfs_remove_recursive was changed from a function to an alias to
debugfs_remove in patch "simple_recursive_removal(): kernel-side rm -rf
for ramfs-style filesystems". Change it back to a function.
Signed-off-by: yu kuai <yukuai3@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

27d247b2

simple_recursive_removal(): kernel-side rm -rf for ramfs-style filesystems · 3d1b056c

Committed by yu kuai on Feb 20, 2020

mainline inclusion
from mainline-5.6-rc1
commit a3d1e7eb5abe3aa1095bc75d1a6760d3809bd672
category: bugfix
bugzilla: 24454
CVE: NA

---------------------------

two requirements: no file creations in IS_DEADDIR and no cross-directory
renames whatsoever.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Conflicts:
 fs/debugfs/inode.c
 fs/libfs.c
 fs/tracefs/inode.c
 include/linux/debugfs.h
 include/linux/fs.h
 include/linux/tracefs.h
 kernel/trace/trace.c
functional changes:
 replace current_time() with current_fs_time()
 remove call to fsnotify_rmdir() and fsnotify_unlink()
Signed-off-by: yu kuai <yukuai3@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

3d1b056c

debugfs: simplify __debugfs_remove_file() · a58de7a2

Committed by Amir Goldstein on Feb 20, 2020

mainline inclusion
from mainline-5.3-rc1
commit 823e545c027795997f29ec5c255aff605cf39e85
category: bugfix
bugzilla: 24454
CVE: NA

---------------------------

Move simple_unlink()+d_delete() from __debugfs_remove_file() into
caller __debugfs_remove() and rename helper for post remove file to
__debugfs_file_removed().

This will simplify adding fsnotify_unlink() hook.

Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: yu kuai <yukuai3@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a58de7a2

ext4: add cond_resched() to ext4_protect_reserved_inode · d097c009

Committed by Shijie Luo on Feb 20, 2020

mainline inclusion
from mainline-v5.6-rc2
commit af133ade9a40794a37104ecbcc2827c0ea373a3c
category: bugfix
bugzilla: 13690
CVE: CVE-2020-8992

-------------------------------------------------

When journal size is set too big by "mkfs.ext4 -J size=", or when
we mount a crafted image to make journal inode->i_size too big,
the loop, "while (i < num)", holds cpu too long. This could cause
soft lockup.

[  529.357541] Call trace:
[  529.357551]  dump_backtrace+0x0/0x198
[  529.357555]  show_stack+0x24/0x30
[  529.357562]  dump_stack+0xa4/0xcc
[  529.357568]  watchdog_timer_fn+0x300/0x3e8
[  529.357574]  __hrtimer_run_queues+0x114/0x358
[  529.357576]  hrtimer_interrupt+0x104/0x2d8
[  529.357580]  arch_timer_handler_virt+0x38/0x58
[  529.357584]  handle_percpu_devid_irq+0x90/0x248
[  529.357588]  generic_handle_irq+0x34/0x50
[  529.357590]  __handle_domain_irq+0x68/0xc0
[  529.357593]  gic_handle_irq+0x6c/0x150
[  529.357595]  el1_irq+0xb8/0x140
[  529.357599]  __ll_sc_atomic_add_return_acquire+0x14/0x20
[  529.357668]  ext4_map_blocks+0x64/0x5c0 [ext4]
[  529.357693]  ext4_setup_system_zone+0x330/0x458 [ext4]
[  529.357717]  ext4_fill_super+0x2170/0x2ba8 [ext4]
[  529.357722]  mount_bdev+0x1a8/0x1e8
[  529.357746]  ext4_mount+0x44/0x58 [ext4]
[  529.357748]  mount_fs+0x50/0x170
[  529.357752]  vfs_kern_mount.part.9+0x54/0x188
[  529.357755]  do_mount+0x5ac/0xd78
[  529.357758]  ksys_mount+0x9c/0x118
[  529.357760]  __arm64_sys_mount+0x28/0x38
[  529.357764]  el0_svc_common+0x78/0x130
[  529.357766]  el0_svc_handler+0x38/0x78
[  529.357769]  el0_svc+0x8/0xc
[  541.356516] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mount:18674]

Link: https://lore.kernel.org/r/20200211011752.29242-1-luoshijie1@huawei.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d097c009

vfs: fix do_last() regression · 06008ebc

Committed by Al Viro on Feb 20, 2020

commit 6404674acd596de41fd3ad5f267b4525494a891a upstream.

Brown paperbag time: fetching ->i_uid/->i_mode really should've been
done from nd->inode.  I even suggested that, but the reason for that has
slipped through the cracks and I went for dir->d_inode instead - made
for more "obvious" patch.

Analysis:

 - at the entry into do_last() and all the way to step_into(): dir (aka
   nd->path.dentry) is known not to have been freed; so's nd->inode and
   it's equal to dir->d_inode unless we are already doomed to -ECHILD.
   inode of the file to get opened is not known.

 - after step_into(): inode of the file to get opened is known; dir
   might be pointing to freed memory/be negative/etc.

 - at the call of may_create_in_sticky(): guaranteed to be out of RCU
   mode; inode of the file to get opened is known and pinned; dir might
   be garbage.

The last was the reason for the original patch.  Except that at the
do_last() entry we can be in RCU mode and it is possible that
nd->path.dentry->d_inode has already changed under us.

In that case we are going to fail with -ECHILD, but we need to be
careful; nd->inode is pointing to valid struct inode and it's the same
as nd->path.dentry->d_inode in "won't fail with -ECHILD" case, so we
should use that.
Reported-by: "Rantala, Tommi T. (Nokia - FI/Espoo)" <tommi.t.rantala@nokia.com>
Reported-by: syzbot+190005201ced78a74ad6@syzkaller.appspotmail.com
Wearing-brown-paperbag: Al Viro <viro@zeniv.linux.org.uk>
Cc: stable@kernel.org
Fixes: d0cb50185ae9 ("do_last(): fetch directory ->i_mode and ->i_uid before it's too late")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

06008ebc

do_last(): fetch directory ->i_mode and ->i_uid before it's too late · f90b8e38

Committed by Al Viro on Feb 20, 2020

commit d0cb50185ae942b03c4327be322055d622dc79f6 upstream.

may_create_in_sticky() call is done when we already have dropped the
reference to dir.

Fixes: 30aba665 (namei: allow restricted O_CREAT of FIFOs and regular files)
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

f90b8e38

ext4: reserve revoke credits in __ext4_new_inode · e8a1b6b3

Committed by yangerkun on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc3
commit a70fd5ac2ea787cafe07b69dadd16b3648ad64ac
category: bugfix
bugzilla: 25031
CVE: NA
-----------------------------------

It's possible that __ext4_new_inode will release the xattr block, so
it will trigger a warning, since the revoke credits will be 0 if
handle == NULL. The scripts below can reproduce it easily.

------------[ cut here ]------------
WARNING: CPU: 0 PID: 3861 at fs/jbd2/revoke.c:374 jbd2_journal_revoke+0x30e/0x540 fs/jbd2/revoke.c:374
...
__ext4_forget+0x1d7/0x800 fs/ext4/ext4_jbd2.c:248
ext4_free_blocks+0x213/0x1d60 fs/ext4/mballoc.c:4743
ext4_xattr_release_block+0x55b/0x780 fs/ext4/xattr.c:1254
ext4_xattr_block_set+0x1c2c/0x2c40 fs/ext4/xattr.c:2112
ext4_xattr_set_handle+0xa7e/0x1090 fs/ext4/xattr.c:2384
__ext4_set_acl+0x54d/0x6c0 fs/ext4/acl.c:214
ext4_init_acl+0x218/0x2e0 fs/ext4/acl.c:293
__ext4_new_inode+0x352a/0x42b0 fs/ext4/ialloc.c:1151
ext4_mkdir+0x2e9/0xbd0 fs/ext4/namei.c:2774
vfs_mkdir+0x386/0x5f0 fs/namei.c:3811
do_mkdirat+0x11c/0x210 fs/namei.c:3834
do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:294
...
-------------------------------------

scripts:
mkfs.ext4 /dev/vdb
mount /dev/vdb /mnt
cd /mnt && mkdir dir && for i in {1..8}; do setfacl -dm "u:user_"$i":rx" dir; done
mkdir dir/dir1 && mv dir/dir1 ./
sh repro.sh && add some user

[root@localhost ~]# cat repro.sh
while [ 1 -eq 1 ]; do
    rm -rf dir
    rm -rf dir1/dir1
    mkdir dir
    for i in {1..8}; do  setfacl -dm "u:test"$i":rx" dir; done
    setfacl -m "u:user_9:rx" dir &
    mkdir dir1/dir1 &
done

Before executing repro.sh, dir1 has inherited the default acl from dir,
and the xattr blocks of dir and dir1 are not the same, so the h_refcount
of each of these two dirs' xattr blocks will be 1. Then repro.sh can
trigger the warning in the situation shown below. The last h_refcount
can be cleared by mkdir, and __ext4_new_inode has not reserved revoke
credits, so the warning happens. Fix it by reserving revoke credits in
__ext4_new_inode.

Thread 1                        Thread 2
mkdir dir
set default acl(will create
a xattr block blk1 and the
refcount of ext4_xattr_header
will be 1)
				...
                                mkdir dir1/dir1
				->....->ext4_init_acl
				->__ext4_set_acl(set default acl,
			          will reuse blk1, and h_refcount
				  will be 2)

setfacl->ext4_set_acl->...
->ext4_xattr_block_set(will create
new block blk2 to store xattr)

				->__ext4_set_acl(set access acl, since
				  h_refcount of blk1 is 2, will create
				  blk3 to store xattr)

  ->ext4_xattr_release_block(dec
  h_refcount of blk1 to 1)
				  ->ext4_xattr_release_block(dec
				    h_refcount and since it is 0,
				    will release the block and trigger
				    the warning)

Link: https://lore.kernel.org/r/20191213014900.47228-1-yangerkun@huawei.com
Reported-by: Hulk Robot <hulkci@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e8a1b6b3

jbd2: Fine tune estimate of necessary descriptor blocks · cf55eb91

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 19014d697147c6aea3a34eea00a2844e698d070f
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Currently we reserve j_max_transaction_buffers / 32 for transaction
descriptor blocks. Now that revoke descriptors are accounted for
separately this estimate is unnecessarily high and we can actually
compute much tighter estimate. In the common case of 32k journal blocks
and 4k blocksize this actually reduces the amount of reserved descriptor
blocks from 256 to ~25 which allows us to fit more real data into a
transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-25-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
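
The drop from 256 to roughly 25 descriptor blocks can be reproduced with a little arithmetic. The sketch below is an illustration under assumptions and not the jbd2 implementation: the function names are hypothetical, the maximum transaction size is assumed to be a quarter of the journal (32768 / 4 = 8192 buffers, which matches the 256 quoted above), and the byte sizes of the block header, UUID, checksum tail, end-of-block slack and per-buffer tag are assumed values.

```c
#include <assert.h>

/* Assumed layout of one descriptor block, in bytes. */
#define HEADER_BYTES 12   /* journal block header */
#define UUID_BYTES   16   /* UUID carried in the descriptor block */
#define TAIL_BYTES    4   /* checksum tail */
#define SLACK_BYTES  16   /* slack left at the end of the block */

/* Old rule from the message: a fixed 1/32 of the transaction size. */
static int descriptor_blocks_old(int max_buffers)
{
    return max_buffers / 32;
}

/*
 * Tighter rule: one tag per journalled buffer, so count how many tags
 * fit in a block, round up, and add one block for the commit record.
 */
static int descriptor_blocks_tight(int max_buffers, int blocksize, int tag_bytes)
{
    int tag_space = blocksize - HEADER_BYTES - UUID_BYTES - TAIL_BYTES - SLACK_BYTES;
    int tags_per_block = tag_space / tag_bytes;

    return 1 + (max_buffers + tags_per_block - 1) / tags_per_block;
}
```

With 4k blocks and an assumed 12-byte tag, 337 tags fit in one descriptor block, so 8192 buffers need 25 descriptor blocks plus the commit block. That is in line with the "~25" in the message, and the freed blocks become available for real data.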

cf55eb91

jbd2: Provide trace event for handle restarts · 92a8f9f1

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 0094f981bbaca3ae707c95c5e5977429d29c2dd0
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Provide trace event for handle restarts to ease debugging.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-24-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

92a8f9f1

ext4: Reserve revoke credits for freed blocks · a9e4f54d

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 83448bdfb59731c2f54784ed3f4a93ff95be6e7e
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

So far we have reserved only relatively high fixed amount of revoke
credits for each transaction. We over-reserved by large amount for most
cases but when freeing large directories or files with data journalling,
the fixed amount is not enough. In fact the worst case estimate is
inconveniently large (maximum extent size) for freeing of one extent.

We fix this by doing proper estimate of the amount of blocks that need
to be revoked when removing blocks from the inode due to truncate or
hole punching and otherwise reserve just a small amount of revoke
credits for each transaction to accommodate freeing of xattrs block or
so.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-23-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a9e4f54d

jbd2: Make credit checking more strict · 31b0cfea

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit d090707edab59cb07047d6d7e138ffcc3bdc42be
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Make checking of available credits in jbd2_journal_dirty_metadata() more
strict. There should be always enough credits in the handle to write all
potential revoke descriptors. Also we warn in case there are not enough
credits since this is a bug in the filesystem.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-22-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

31b0cfea

jbd2: Rename h_buffer_credits to h_total_credits · c38a5168

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 933f1c1e0b75bbc29730eef07c9e196c6dfd37e5
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

The credit counter now contains both buffer and revoke descriptor block
credits. Rename the counter to h_total_credits to reflect that. No
functional change.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-21-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflict:
  fs/jbd2/transaction.c
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c38a5168

jbd2: Reserve space for revoke descriptor blocks · 38d8c053

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit fdc3ef882a5d59c1709a13b5486ae2b1632e12b6
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Extend functions for starting, extending, and restarting transaction
handles to take number of revoke records handle must be able to
accommodate. These functions then make sure transaction has enough
credits to be able to store resulting revoke descriptor blocks. Also
revoke code tracks number of revoke records created by a handle to catch
situation where some place didn't reserve enough space for revoke
records. Similarly to standard transaction credits, space for unused
reserved revoke records is released when the handle is stopped.

On the ext4 side we currently take a simplistic approach of reserving
space for 1024 revoke records for any transaction. This grows amount of
credits reserved for each handle only by a few and is enough for any
normal workload so that we don't hit warnings in jbd2. We will refine
the logic in following commits.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-20-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflict:
  include/linux/jbd2.h
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
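
The claim that reserving space for 1024 revoke records costs each handle "only a few" credits can be checked with a rough calculation. The sketch below rests on assumptions and is not the jbd2 code: the function names are hypothetical, the revoke block header is assumed to take 16 bytes and the checksum tail 4 bytes, and a revoke record is assumed to be a 4-byte or 8-byte block number depending on whether the journal uses 64-bit block numbers.

```c
#include <assert.h>

/* Assumed layout of one revoke descriptor block, in bytes. */
#define REVOKE_HEADER_BYTES 16
#define TAIL_BYTES           4

/* How many revoke records fit into one descriptor block. */
static int revoke_records_per_block(int blocksize, int record_bytes)
{
    return (blocksize - REVOKE_HEADER_BYTES - TAIL_BYTES) / record_bytes;
}

/* Descriptor blocks (and therefore credits) needed for a record count. */
static int revoke_descriptor_blocks(int records, int blocksize, int record_bytes)
{
    int per_block = revoke_records_per_block(blocksize, record_bytes);

    return (records + per_block - 1) / per_block;
}
```

Under these assumptions a 4k block holds 509 eight-byte records or 1019 four-byte records, so 1024 records fit in 3 or 2 descriptor blocks respectively, which is a small addition to the credits a handle already reserves.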

38d8c053

jbd2: Drop jbd2_space_needed() · a5140ec0

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 77444ac4f9537bc4211f928959d5231445e30c6e
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

The function is now just a trivial wrapper returning
journal->j_max_transaction_buffers. Drop it.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-19-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

a5140ec0

jbd2: remove repeated assignments in __jbd2_log_wait_for_space() · 49a4aa06

Committed by Liu Song on Feb 20, 2020

mainline inclusion
from mainline-5.2-rc1
commit fb203751099eecf145317685ee480a51e5b246de
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

nblocks is already assigned at the beginning. There is no need
to repeat the assignment in the while loop, so remove it.
Signed-off-by: Liu Song <liu.song11@zte.com.cn>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

49a4aa06

jbd2: Account descriptor blocks into t_outstanding_credits · ad68953d

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 9f356e5a4f12008fa0df8b6385fc0ab830416e72
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Currently, journal descriptor blocks are not accounted in
transaction->t_outstanding_credits; we just leave some slack space in
the journal for them (in jbd2_log_space_left() and
jbd2_space_needed()). This makes proper accounting (and the reservation
we want to add) of descriptor blocks difficult, so switch to accounting
descriptor blocks in transaction->t_outstanding_credits and reserve the
same amount of credits in t_outstanding_credits for journal descriptor
blocks when creating a transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-18-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflict:
  include/linux/jbd2.h
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ad68953d

jbd2: Factor out common parts of stopping and restarting a handle · 24b3a1a1

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit ec8b6f600e49dc87a8564807fec4193bf93ee2b5
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

jbd2__journal_restart() has quite some code that is common with
jbd2_journal_stop(). Factor this functionality into stop_this_handle()
helper and use it from both functions. Note that this also drops
t_handle_lock protection from jbd2__journal_restart() as
jbd2_journal_stop() does the same thing without it.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-17-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

24b3a1a1

jbd2: Drop pointless wakeup from jbd2_journal_stop() · 5a8e7a6e

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 5559b2d81b51de75cb7864bb1fbb82982f7e8fff
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

When we drop last handle from a transaction and journal->j_barrier_count
> 0, jbd2_journal_stop() wakes up journal->j_wait_transaction_locked
wait queue. This looks pointless - wait for outstanding handles always
happens on journal->j_wait_updates waitqueue.
journal->j_wait_transaction_locked is used to wait for transaction state
changes and by start_this_handle() for waiting until
journal->j_barrier_count drops to 0. The first case is clearly
irrelevant here since only jbd2 thread changes transaction state. The
second case looks related but jbd2_journal_unlock_updates() is
responsible for the wakeup in this case. So just drop the wakeup.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-16-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

5a8e7a6e

jbd2: Drop pointless check from jbd2_journal_stop() · 6d31bb53

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 150549ed2fcf4be9bf3efedd99b72924dff26166
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

If a transaction is larger than journal->j_max_transaction_buffers, that
is a bug and not a trigger for transaction commit. Also the very next
attempt to start new handle will start transaction commit anyway. So
just remove the pointless check. Arguably, we could start transaction
commit whenever the transaction size is *close* to
journal->j_max_transaction_buffers. This has a potential to reduce
latency of the next jbd2_journal_start() at the cost of somewhat smaller
transactions. However for this to have any effect, it would mean that
there isn't someone already waiting in jbd2_journal_start() which means
metadata load for the fs is pretty light anyway so probably this
optimization is not worth it.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-15-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

6d31bb53

jbd2: Reorganize jbd2_journal_stop() · 8503a897

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit dfaf5ffda227be3e867fee7c0f6a66749392fbd0
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Move code in jbd2_journal_stop() around a bit. It removes some
unnecessary code duplication and will make factoring out parts common
with jbd2__journal_restart() easier.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-14-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

8503a897

ocfs2: Use accessor function for h_buffer_credits · c75a7e3b

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 9797a902480521dc8e7a478e38f0c896ffff8784
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Use the jbd2 accessor function for h_buffer_credits.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-12-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

c75a7e3b

ext4, jbd2: Provide accessor function for handle credits · 02b7342a

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit a9a8344ee1714f835ba394077e8c13d751e2f148
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Provide accessor function to get number of credits available in a handle
and use it from ext4. Later, computation of available credits won't be
so straightforward.
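
A minimal user-space sketch of the accessor idea follows. The `model_handle` struct and both function names are hypothetical stand-ins, not the jbd2 definitions. The sketch only shows that callers go through a function, so the way available credits are computed can change later without touching them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a journal handle; not the jbd2 structure. */
struct model_handle {
    int buffer_credits; /* credits still available to this handle */
};

/*
 * Hypothetical accessor. Callers use this instead of reading the field,
 * so a later change to how available credits are computed stays local.
 */
static int model_handle_credits(const struct model_handle *handle)
{
    assert(handle != NULL);
    return handle->buffer_credits;
}

/* Example caller: does the handle have at least `needed` credits? */
static int model_has_credits(const struct model_handle *handle, int needed)
{
    return model_handle_credits(handle) >= needed;
}
```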
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-11-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

02b7342a

ext4: Provide function to handle transaction restarts · e86e70d2

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit a413036791d040e33badcc634453a4d0c0705499
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Provide the ext4_journal_ensure_credits_fn() function to ensure a
transaction has a given amount of credits and to call a helper function
to prepare for restarting a transaction. This allows removing some
boilerplate code from various places, adds proper error handling for the
case where transaction extension or restart fails, and reduces the
following changes needed for proper revoke record reservation tracking.
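
The control flow described above (check, try to extend, otherwise prepare and restart) can be sketched as a small user-space model. Every name here is hypothetical and none of it is the ext4 or jbd2 API. The return convention (0 when no restart was needed, 1 after a restart, negative on failure) is also an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are ext4/jbd2 APIs. */
struct model_txn {
    int capacity;     /* credits a fresh transaction can hand out */
    int free_credits; /* credits the running transaction can still hand out */
};

struct model_handle {
    struct model_txn *txn;
    int credits;      /* credits currently held by the handle */
};

/* Try to grow the handle inside the running transaction. */
static int model_extend(struct model_handle *h, int extra)
{
    if (h->txn->free_credits < extra)
        return -1;
    h->txn->free_credits -= extra;
    h->credits += extra;
    return 0;
}

/* Model a restart: the handle moves to a fresh transaction. */
static int model_restart(struct model_handle *h, int want)
{
    if (want > h->txn->capacity)
        return -1;
    h->txn->free_credits = h->txn->capacity - want;
    h->credits = want;
    return 0;
}

/*
 * Ensure the handle holds at least `check` credits. `prepare` (optional)
 * runs before a restart so the caller can reach a consistent state.
 */
static int model_ensure_credits(struct model_handle *h, int check, int extend,
                                int restart_credits,
                                int (*prepare)(void *), void *arg)
{
    int err;

    assert(h != NULL && h->txn != NULL);
    if (h->credits >= check)
        return 0;                 /* already enough */
    if (model_extend(h, extend) == 0)
        return 0;                 /* extended in place */
    if (prepare != NULL) {
        err = prepare(arg);
        if (err < 0)
            return err;           /* caller could not prepare for restart */
    }
    err = model_restart(h, restart_credits);
    return err < 0 ? err : 1;     /* 1 tells the caller a restart happened */
}
```

Returning a distinct value after a restart lets a caller know that state cached under the old transaction may need to be re-validated.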
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-10-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Conflict:
  fs/ext4/ext4.h
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

e86e70d2

ext4: Avoid unnecessary revokes in ext4_alloc_branch() · d6cf3853

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit f2890730f8292831b7741d89a65b9c6834d85ee6
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Error cleanup path in ext4_alloc_branch() calls ext4_forget() on freshly
allocated indirect blocks with 'metadata' set to 1. This results in
generating revoke records for these blocks. However this is unnecessary
as the freed blocks are only allocated in the current transaction and
thus they will never be journalled. Make this cleanup path similar to
e.g. cleanup in ext4_splice_branch() and use ext4_free_blocks() to
handle block forgetting by passing EXT4_FREE_BLOCKS_FORGET and not
EXT4_FREE_BLOCKS_METADATA to ext4_free_blocks(). This also allows
allocating transaction not to reserve any credits for revoke records.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-9-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d6cf3853

ext4: Use ext4_journal_extend() instead of jbd2_journal_extend() · 1a5d37a8

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 6cb367c2d1f8875043aa2d238eca9a2602dc1f72
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Use the ext4 helper ext4_journal_extend() instead of open-coding it in
ext4_try_to_expand_extra_isize().
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-8-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

1a5d37a8

ext4: Fix ext4_should_journal_data() for EA inodes · d8f6dedd

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 321238fbfb49003c66caecb1eefb5238dce27b61
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Similarly to directories, EA inodes do only journalled modifications to
their data. Change ext4_should_journal_data() to return true for them so
that we don't have to special-case them during truncate.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-7-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d8f6dedd

ext4: Do not iput inode under running transaction · d5684257

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 9b88f9fb0d2fc8f7e71e75a42c5a064bc6cfffd2
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

When ext4_mkdir(), ext4_symlink(), ext4_create(), or ext4_mknod() fail
to add entry into directory, it ends up dropping freshly created inode
under the running transaction and thus inode truncation happens under
that transaction. That breaks assumptions that evict() does not get
called from a transaction context and at least in ext4_symlink() case it
can result in inode eviction deadlocking in inode_wait_for_writeback()
when flush worker finds symlink inode, starts to write it back and
blocks on starting a transaction. So change the code in ext4_mkdir() and
ext4_add_nondir() to drop inode reference only after the transaction is
stopped. We also have to add inode to the orphan list in that case as
otherwise the inode would get leaked in case we crash before inode
deletion is committed.

CC: stable@vger.kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

d5684257

ext4: Move marking of handle as sync to ext4_add_nondir() · ddce62f1

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit a9e26328adfa82b1f3c941bc6e3daea47631abce
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

Every caller of ext4_add_nondir() marks the handle as sync if the
directory has DIRSYNC set. Move this marking into ext4_add_nondir() to
reduce some duplication.
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-4-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

ddce62f1

jbd2: Completely fill journal descriptor blocks · 489862d3

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit b90bfdf581194a0fa5f6c26fef1e522f15f6212e
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

With 32-bit block numbers, we don't allocate the array for journal
buffer heads large enough for corresponding descriptor tags to fill the
descriptor block. Thus we end up writing out half-full descriptor blocks
to the journal unnecessarily growing the transaction. Fix the logic to
allocate the array large enough.
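
The sizing problem can be illustrated with a little arithmetic. The tag sizes below are assumptions for this sketch (12 bytes with 64-bit block numbers, 8 bytes with 32-bit block numbers), the descriptor header and tail are ignored, and the function names are hypothetical. An array sized by the larger tag has fewer entries than a descriptor block of smaller tags can hold.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed tag sizes for illustration; not stated in the commit text. */
#define TAG_BYTES_64BIT 12u /* assumed tag size with 64-bit block numbers */
#define TAG_BYTES_32BIT  8u /* assumed tag size with 32-bit block numbers */

/* Upper bound on tags per descriptor block, ignoring header and tail. */
static size_t tags_per_block(size_t block_bytes, size_t tag_bytes)
{
    assert(tag_bytes > 0);
    return block_bytes / tag_bytes;
}

/*
 * Hypothetical sizing of the buffer-head array. Sizing by the largest tag
 * undersizes the array when tags are small; sizing by the smallest tag
 * lets a descriptor block be filled in either case.
 */
static size_t array_entries_by_largest_tag(size_t block_bytes)
{
    return tags_per_block(block_bytes, TAG_BYTES_64BIT);
}

static size_t array_entries_by_smallest_tag(size_t block_bytes)
{
    return tags_per_block(block_bytes, TAG_BYTES_32BIT);
}
```

With 4096-byte blocks and these assumed sizes, the two sizings give 341 and 512 entries respectively.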
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-3-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

489862d3

jbd2: Fixup stale comment in commit code · 84cbc05b

Committed by Jan Kara on Feb 20, 2020

mainline inclusion
from mainline-5.5-rc1
commit 0db45889453644bb5d3e3c6044f4d81b910d41ef
category: bugfix
bugzilla: 25031
CVE: NA
---------------------------

jbd2_journal_next_log_block() does not look at
transaction->t_outstanding_credits. Remove the misleading comment.
Reviewed-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20191105164437.32602-2-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Yang Erkun <yangerkun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>

84cbc05b