- 29 3月, 2023 1 次提交
-
-
由 Dave Chinner 提交于
mainline inclusion from mainline-v5.17-rc3 commit ebb7fb15 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ebb7fb1557b1d03b906b668aa2164b51e6b7d19a -------------------------------- Trond Myklebust reported soft lockups in XFS IO completion such as this: watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [kworker/12:1:3106] CPU: 12 PID: 3106 Comm: kworker/12:1 Not tainted 4.18.0-305.10.2.el8_4.x86_64 #1 Workqueue: xfs-conv/md127 xfs_end_io [xfs] RIP: 0010:_raw_spin_unlock_irqrestore+0x11/0x20 Call Trace: wake_up_page_bit+0x8a/0x110 iomap_finish_ioend+0xd7/0x1c0 iomap_finish_ioends+0x7f/0xb0 xfs_end_ioend+0x6b/0x100 [xfs] xfs_end_io+0xb9/0xe0 [xfs] process_one_work+0x1a7/0x360 worker_thread+0x1fa/0x390 kthread+0x116/0x130 ret_from_fork+0x35/0x40 Ioends are processed as an atomic completion unit when all the chained bios in the ioend have completed their IO. Logically contiguous ioends can also be merged and completed as a single, larger unit. Both of these things can be problematic as both the bio chains per ioend and the size of the merged ioends processed as a single completion are both unbound. If we have a large sequential dirty region in the page cache, write_cache_pages() will keep feeding us sequential pages and we will keep mapping them into ioends and bios until we get a dirty page at a non-sequential file offset. These large sequential runs can will result in bio and ioend chaining to optimise the io patterns. The pages iunder writeback are pinned within these chains until the submission chaining is broken, allowing the entire chain to be completed. This can result in huge chains being processed in IO completion context. We get deep bio chaining if we have large contiguous physical extents. We will keep adding pages to the current bio until it is full, then we'll chain a new bio to keep adding pages for writeback. Hence we can build bio chains that map millions of pages and tens of gigabytes of RAM if the page cache contains big enough contiguous dirty file regions. This long bio chain pins those pages until the final bio in the chain completes and the ioend can iterate all the chained bios and complete them. OTOH, if we have a physically fragmented file, we end up submitting one ioend per physical fragment that each have a small bio or bio chain attached to them. We do not chain these at IO submission time, but instead we chain them at completion time based on file offset via iomap_ioend_try_merge(). Hence we can end up with unbound ioend chains being built via completion merging. XFS can then do COW remapping or unwritten extent conversion on that merged chain, which involves walking an extent fragment at a time and running a transaction to modify the physical extent information. IOWs, we merge all the discontiguous ioends together into a contiguous file range, only to then process them individually as discontiguous extents. This extent manipulation is computationally expensive and can run in a tight loop, so merging logically contiguous but physically discontigous ioends gains us nothing except for hiding the fact the fact we broke the ioends up into individual physical extents at submission and then need to loop over those individual physical extents at completion. Hence we need to have mechanisms to limit ioend sizes and to break up completion processing of large merged ioend chains: 1. bio chains per ioend need to be bound in length. Pure overwrites go straight to iomap_finish_ioend() in softirq context with the exact bio chain attached to the ioend by submission. Hence the only way to prevent long holdoffs here is to bound ioend submission sizes because we can't reschedule in softirq context. 2. iomap_finish_ioends() has to handle unbound merged ioend chains correctly. This relies on any one call to iomap_finish_ioend() being bound in runtime so that cond_resched() can be issued regularly as the long ioend chain is processed. i.e. this relies on mechanism #1 to limit individual ioend sizes to work correctly. 3. filesystems have to loop over the merged ioends to process physical extent manipulations. This means they can loop internally, and so we break merging at physical extent boundaries so the filesystem can easily insert reschedule points between individual extent manipulations. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reported-and-tested-by: NTrond Myklebust <trondmy@hammerspace.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Conflicts: include/linux/iomap.h fs/iomap/buffered-io.c fs/xfs/xfs_aops.c [ 6e552494 ("iomap: remove unused private field from ioend") is not applied. 95c4cd05 ("iomap: Convert to_iomap_page to take a folio") is not applied. 8ffd74e9 ("iomap: Convert bio completions to use folios") is not applied. 044c6449 ("xfs: drop unused ioend private merge and setfilesize code") is not applied. ] Signed-off-by: NZhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 15 3月, 2023 1 次提交
-
-
由 Dave Chinner 提交于
mainline inclusion from mainline-v5.17-rc6 commit d2d7c047 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d2d7c0473586d2f22e85d615275f34cf19f94447 -------------------------------- Most buffer io list operations are run with the bp->b_lock held, but xfs_iflush_abort() can be called without the buffer lock being held resulting in inodes being removed from the buffer list while other list operations are occurring. This causes problems with corrupted bp->b_io_list inode lists during filesystem shutdown, leading to traversals that never end, double removals from the AIL, etc. Fix this by passing the buffer to xfs_iflush_abort() if we have it locked. If the inode is attached to the buffer, we're going to have to remove it from the buffer list and we'd have to get the buffer off the inode log item to do that anyway. If we don't have a buffer passed in (e.g. from xfs_reclaim_inode()) then we can determine if the inode has a log item and if it is attached to a buffer before we do anything else. If it does have an attached buffer, we can lock it safely (because the inode has a reference to it) and then perform the inode abort. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> conflicts: fs/xfs/xfs_icache.c Signed-off-by: NLong Li <leo.lilong@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 08 2月, 2023 1 次提交
-
-
由 Dave Chinner 提交于
mainline inclusion from mainline-v5.19-rc1 commit c230a4a8 category: bugfix bugzilla: 187372, https://gitee.com/openeuler/kernel/issues/I5K0OM CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c230a4a85bcdbfc1a7415deec6caf04e8fca1301 -------------------------------- Ever since we added shadown format buffers to the log items, log items need to handle the item being released with shadow buffers attached. Due to the fact this requirement was added at the same time we added new rmap/reflink intents, we missed the cleanup of those items. In theory, this means shadow buffers can be leaked in a very small window when a shutdown is initiated. Testing with KASAN shows this leak does not happen in practice - we haven't identified a single leak in several years of shutdown testing since ~v4.8 kernels. However, the intent whiteout cleanup mechanism results in every cancelled intent in exactly the same state as this tiny race window creates and so if intents down clean up shadow buffers on final release we will leak the shadow buffer for just about every intent we create. Hence we start with this patch to close this condition off and ensure that when whiteouts start to be used we don't leak lots of memory. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com> conflicts: fs/xfs/xfs_bmap_item.c fs/xfs/xfs_icreate_item.c fs/xfs/xfs_refcount_item.c fs/xfs/xfs_rmap_item.c Signed-off-by: NLi Nan <linan122@huawei.com> Reviewed-by: NYang Erkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 31 1月, 2023 1 次提交
-
-
由 Wu Guanghao 提交于
mainline inclusion from mainline-v6.2-rc1 commit 4da11251 category: bugfix bugzilla: 187874,https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4da112513c01d7d0acf1025b8764349d46e177d6 -------------------------------- We are doing a test about deleting a large number of files when memory is low. A deadlock problem was found. [ 1240.279183] -> #1 (fs_reclaim){+.+.}-{0:0}: [ 1240.280450] lock_acquire+0x197/0x460 [ 1240.281548] fs_reclaim_acquire.part.0+0x20/0x30 [ 1240.282625] kmem_cache_alloc+0x2b/0x940 [ 1240.283816] xfs_trans_alloc+0x8a/0x8b0 [ 1240.284757] xfs_inactive_ifree+0xe4/0x4e0 [ 1240.285935] xfs_inactive+0x4e9/0x8a0 [ 1240.286836] xfs_inodegc_worker+0x160/0x5e0 [ 1240.287969] process_one_work+0xa19/0x16b0 [ 1240.289030] worker_thread+0x9e/0x1050 [ 1240.290131] kthread+0x34f/0x460 [ 1240.290999] ret_from_fork+0x22/0x30 [ 1240.291905] [ 1240.291905] -> #0 ((work_completion)(&gc->work)){+.+.}-{0:0}: [ 1240.293569] check_prev_add+0x160/0x2490 [ 1240.294473] __lock_acquire+0x2c4d/0x5160 [ 1240.295544] lock_acquire+0x197/0x460 [ 1240.296403] __flush_work+0x6bc/0xa20 [ 1240.297522] xfs_inode_mark_reclaimable+0x6f0/0xdc0 [ 1240.298649] destroy_inode+0xc6/0x1b0 [ 1240.299677] dispose_list+0xe1/0x1d0 [ 1240.300567] prune_icache_sb+0xec/0x150 [ 1240.301794] super_cache_scan+0x2c9/0x480 [ 1240.302776] do_shrink_slab+0x3f0/0xaa0 [ 1240.303671] shrink_slab+0x170/0x660 [ 1240.304601] shrink_node+0x7f7/0x1df0 [ 1240.305515] balance_pgdat+0x766/0xf50 [ 1240.306657] kswapd+0x5bd/0xd20 [ 1240.307551] kthread+0x34f/0x460 [ 1240.308346] ret_from_fork+0x22/0x30 [ 1240.309247] [ 1240.309247] other info that might help us debug this: [ 1240.309247] [ 1240.310944] Possible unsafe locking scenario: [ 1240.310944] [ 1240.312379] CPU0 CPU1 [ 1240.313363] ---- ---- [ 1240.314433] lock(fs_reclaim); [ 1240.315107] lock((work_completion)(&gc->work)); [ 1240.316828] lock(fs_reclaim); [ 1240.318088] lock((work_completion)(&gc->work)); [ 1240.319203] [ 1240.319203] *** DEADLOCK *** ... [ 2438.431081] Workqueue: xfs-inodegc/sda xfs_inodegc_worker [ 2438.432089] Call Trace: [ 2438.432562] __schedule+0xa94/0x1d20 [ 2438.435787] schedule+0xbf/0x270 [ 2438.436397] schedule_timeout+0x6f8/0x8b0 [ 2438.445126] wait_for_completion+0x163/0x260 [ 2438.448610] __flush_work+0x4c4/0xa40 [ 2438.455011] xfs_inode_mark_reclaimable+0x6ef/0xda0 [ 2438.456695] destroy_inode+0xc6/0x1b0 [ 2438.457375] dispose_list+0xe1/0x1d0 [ 2438.458834] prune_icache_sb+0xe8/0x150 [ 2438.461181] super_cache_scan+0x2b3/0x470 [ 2438.461950] do_shrink_slab+0x3cf/0xa50 [ 2438.462687] shrink_slab+0x17d/0x660 [ 2438.466392] shrink_node+0x87e/0x1d40 [ 2438.467894] do_try_to_free_pages+0x364/0x1300 [ 2438.471188] try_to_free_pages+0x26c/0x5b0 [ 2438.473567] __alloc_pages_slowpath.constprop.136+0x7aa/0x2100 [ 2438.482577] __alloc_pages+0x5db/0x710 [ 2438.485231] alloc_pages+0x100/0x200 [ 2438.485923] allocate_slab+0x2c0/0x380 [ 2438.486623] ___slab_alloc+0x41f/0x690 [ 2438.490254] __slab_alloc+0x54/0x70 [ 2438.491692] kmem_cache_alloc+0x23e/0x270 [ 2438.492437] xfs_trans_alloc+0x88/0x880 [ 2438.493168] xfs_inactive_ifree+0xe2/0x4e0 [ 2438.496419] xfs_inactive+0x4eb/0x8b0 [ 2438.497123] xfs_inodegc_worker+0x16b/0x5e0 [ 2438.497918] process_one_work+0xbf7/0x1a20 [ 2438.500316] worker_thread+0x8c/0x1060 [ 2438.504938] ret_from_fork+0x22/0x30 When the memory is insufficient, xfs_inonodegc_worker will trigger memory reclamation when memory is allocated, then flush_work() may be called to wait for the work to complete. This causes a deadlock. So use memalloc_nofs_save() to avoid triggering memory reclamation in xfs_inodegc_worker. Signed-off-by: NWu Guanghao <wuguanghao3@huawei.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 18 1月, 2023 4 次提交
-
-
由 Darrick J. Wong 提交于
stable inclusion from stable-v5.10.141 commit f168801da95fe62c6751235665c27edf5ca2458a category: bugfix bugzilla: 188251,https://gitee.com/openeuler/kernel/issues/I685FC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=f168801da95fe62c6751235665c27edf5ca2458a -------------------------------- commit 82be38bc upstream. Due to cycling of m_sb_lock, it's possible for multiple callers of xfs_reserve_blocks to race at changing the pool size, subtracting blocks from fdblocks, and actually putting it in the pool. The result of all this is that we can overfill the reserve pool to hilarious levels. xfs_mod_fdblocks, when called with a positive value, already knows how to take freed blocks and either fill the reserve until it's full, or put them in fdblocks. Use that instead of setting m_resblks_avail directly. Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Acked-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Darrick J. Wong 提交于
stable inclusion from stable-v5.10.141 commit 72a259bdd50dd6646a88e29fc769e50377e06d57 category: bugfix bugzilla: 188251,https://gitee.com/openeuler/kernel/issues/I685FC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=72a259bdd50dd6646a88e29fc769e50377e06d57 -------------------------------- commit 0baa2657 upstream. Nowadays, xfs_mod_fdblocks will always choose to fill the reserve pool with freed blocks before adding to fdblocks. Therefore, we can change the behavior of xfs_reserve_blocks slightly -- setting the target size of the pool should always succeed, since a deficiency will eventually be made up as blocks get freed. Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Acked-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Darrick J. Wong 提交于
stable inclusion from stable-v5.10.141 commit cb41f22df3ec7b0b4f7cd9a730a538645e324f2f category: bugfix bugzilla: 188251,https://gitee.com/openeuler/kernel/issues/I685FC CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=cb41f22df3ec7b0b4f7cd9a730a538645e324f2f -------------------------------- commit 15f04fdc upstream. [Added wrapper xfs_fdblocks_unavailable() for 5.10.y backport] Infinite loops in kernel code are scary. Calls to xfs_reserve_blocks should be rare (people should just use the defaults!) so we really don't need to try so hard. Simplify the logic here by removing the infinite loop. Cc: Brian Foster <bfoster@redhat.com> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Acked-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: fs/xfs/xfs_mount.h Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
由 Dave Chinner 提交于
stable inclusion from stable-v5.10.129 commit b261cd005ab980c4018634a849f77e036bfd4f80 category: bugfix bugzilla: 188251,https://gitee.com/openeuler/kernel/issues/I5YNDQ CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b261cd005ab980c4018634a849f77e036bfd4f80 -------------------------------- commit 756b1c34 upstream. Because the iomap code using PF_MEMALLOC_NOFS to detect transaction recursion in XFS is just wrong. Remove it from the iomap code and replace it with XFS specific internal checks using current->journal_info instead. [djwong: This change also realigns the lifetime of NOFS flag changes to match the incore transaction, instead of the inconsistent scheme we have now.] Fixes: 9070733b ("xfs: abstract PF_FSTRANS to PF_MEMALLOC_NOFS") Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Acked-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: fs/xfs/xfs_aops.c fs/xfs/xfs_trans.c fs/xfs/xfs_trans.h Signed-off-by: Nyangerkun <yangerkun@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 11 1月, 2023 1 次提交
-
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.19-rc5 commit 95ff0363 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=95ff0363f3f6ae70c21a0f2b0603e54438e5988b -------------------------------- The kernel build robot reported a UAF error while running xfs/433 (edited somewhat for brevity): BUG: KASAN: use-after-free in xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs Read of size 4 at addr ffff88820ac2bd44 by task kworker/0:2/139 CPU: 0 PID: 139 Comm: kworker/0:2 Tainted: G S 5.19.0-rc2-00004-g7cf2b0f9 #1 Hardware name: Hewlett-Packard p6-1451cx/2ADA, BIOS 8.15 02/05/2013 Workqueue: xfs-inodegc/sdb4 xfs_inodegc_worker [xfs] Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) print_address_description+0x1f/0x200 print_report.cold (mm/kasan/report.c:430) kasan_report (mm/kasan/report.c:162 mm/kasan/report.c:493) xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:214) xfs xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs process_one_work worker_thread kthread ret_from_fork </TASK> Allocated by task 139: kasan_save_stack (mm/kasan/common.c:39) __kasan_slab_alloc (mm/kasan/common.c:45 mm/kasan/common.c:436 mm/kasan/common.c:469) kmem_cache_alloc (mm/slab.h:750 mm/slub.c:3214 mm/slub.c:3222 mm/slub.c:3229 mm/slub.c:3239) _xfs_buf_alloc (include/linux/instrumented.h:86 include/linux/atomic/atomic-instrumented.h:41 fs/xfs/xfs_buf.c:232) xfs xfs_buf_get_map (fs/xfs/xfs_buf.c:660) xfs xfs_buf_read_map (fs/xfs/xfs_buf.c:777) xfs xfs_trans_read_buf_map (fs/xfs/xfs_trans_buf.c:289) xfs xfs_da_read_buf (fs/xfs/libxfs/xfs_da_btree.c:2652) xfs xfs_da3_node_read (fs/xfs/libxfs/xfs_da_btree.c:392) xfs xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:272) xfs xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs process_one_work worker_thread kthread ret_from_fork Freed by task 139: kasan_save_stack (mm/kasan/common.c:39) kasan_set_track (mm/kasan/common.c:45) kasan_set_free_info (mm/kasan/generic.c:372) __kasan_slab_free (mm/kasan/common.c:368 mm/kasan/common.c:328 mm/kasan/common.c:374) kmem_cache_free (mm/slub.c:1753 mm/slub.c:3507 mm/slub.c:3524) xfs_buf_rele (fs/xfs/xfs_buf.c:1040) xfs xfs_attr3_node_inactive (fs/xfs/xfs_attr_inactive.c:210) xfs xfs_attr3_root_inactive (fs/xfs/xfs_attr_inactive.c:296) xfs xfs_attr_inactive (fs/xfs/xfs_attr_inactive.c:371) xfs xfs_inactive (fs/xfs/xfs_inode.c:1781) xfs xfs_inodegc_worker (fs/xfs/xfs_icache.c:1837 fs/xfs/xfs_icache.c:1860) xfs process_one_work worker_thread kthread ret_from_fork I reproduced this for my own satisfaction, and got the same report, along with an extra morsel: The buggy address belongs to the object at ffff88802103a800 which belongs to the cache xfs_buf of size 432 The buggy address is located 396 bytes inside of 432-byte region [ffff88802103a800, ffff88802103a9b0) I tracked this code down to: error = xfs_trans_get_buf(*trans, mp->m_ddev_targp, child_blkno, XFS_FSB_TO_BB(mp, mp->m_attr_geo->fsbcount), 0, &child_bp); if (error) return error; error = bp->b_error; That doesn't look right -- I think this should be dereferencing child_bp, not bp. Looking through the codebase history, I think this was added by commit 2911edb6 ("xfs: remove the mappedbno argument to xfs_da_get_buf"), which replaced a call to xfs_da_get_buf with the current call to xfs_trans_get_buf. Not sure why we trans_brelse'd @bp earlier in the function, but I'm guessing it's to avoid pinning too many buffers in memory while we inactivate the bottom of the attr tree. Hence we now have to get the buffer back. I /think/ this was supposed to check child_bp->b_error and fail the rest of the invalidation if child_bp had experienced any kind of IO or corruption error. I bet the xfs_da3_node_read earlier in the loop will catch most cases of incoming on-disk corruption which makes this check mostly moot unless someone corrupts the buffer and the AIL pushes it out to disk while the buffer's unlocked. In the first case we'll never get to the bad check, and in the second case the AIL will shut down the log, at which point there's no reason to check b_error. Remove the check, and null out @bp to avoid this problem in the future. Cc: hch@lst.de Reported-by: Nkernel test robot <oliver.sang@intel.com> Fixes: 2911edb6 ("xfs: remove the mappedbno argument to xfs_da_get_buf") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NLong Li <leo.lilong@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
-
- 06 1月, 2023 3 次提交
-
-
由 Guo Xuenan 提交于
mainline inclusion from mainline-v6.1-rc4 commit 575689fc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=575689fc0ffa6c4bb4e72fd18e31a6525a6124e0 -------------------------------- xfs log io error will trigger xlog shut down, and end_io worker call xlog_state_shutdown_callbacks to unpin and release the buf log item. The race condition is that when there are some thread doing transaction commit and happened not to be intercepted by xlog_is_shutdown, then, these log item will be insert into CIL, when unpin and release these buf log item, UAF will occur. BTW, add delay before `xlog_cil_commit` can increase recurrence probability. The following call graph actually encountered this bad situation. fsstress io end worker kworker/0:1H-216 xlog_ioend_work ->xlog_force_shutdown ->xlog_state_shutdown_callbacks ->xlog_cil_process_committed ->xlog_cil_committed ->xfs_trans_committed_bulk ->xfs_trans_apply_sb_deltas ->li_ops->iop_unpin(lip, 1); ->xfs_trans_getsb ->_xfs_trans_bjoin ->xfs_buf_item_init ->if (bip) { return 0;} //relog ->xlog_cil_commit ->xlog_cil_insert_items //insert into CIL ->xfs_buf_ioend_fail(bp); ->xfs_buf_ioend ->xfs_buf_item_done ->xfs_buf_item_relse ->xfs_buf_item_free when cil push worker gather percpu cil and insert super block buf log item into ctx->log_items then uaf occurs. ================================================================== BUG: KASAN: use-after-free in xlog_cil_push_work+0x1c8f/0x22f0 Write of size 8 at addr ffff88801800f3f0 by task kworker/u4:4/105 CPU: 0 PID: 105 Comm: kworker/u4:4 Tainted: G W 6.1.0-rc1-00001-g274115149b42 #136 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: xfs-cil/sda xlog_cil_push_work Call Trace: <TASK> dump_stack_lvl+0x4d/0x66 print_report+0x171/0x4a6 kasan_report+0xb3/0x130 xlog_cil_push_work+0x1c8f/0x22f0 process_one_work+0x6f9/0xf70 worker_thread+0x578/0xf30 kthread+0x28c/0x330 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 2145: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_slab_alloc+0x54/0x60 kmem_cache_alloc+0x14a/0x510 xfs_buf_item_init+0x160/0x6d0 _xfs_trans_bjoin+0x7f/0x2e0 xfs_trans_getsb+0xb6/0x3f0 xfs_trans_apply_sb_deltas+0x1f/0x8c0 __xfs_trans_commit+0xa25/0xe10 xfs_symlink+0xe23/0x1660 xfs_vn_symlink+0x157/0x280 vfs_symlink+0x491/0x790 do_symlinkat+0x128/0x220 __x64_sys_symlink+0x7a/0x90 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 216: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 __kasan_slab_free+0x105/0x1a0 kmem_cache_free+0xb6/0x460 xfs_buf_ioend+0x1e9/0x11f0 xfs_buf_item_unpin+0x3d6/0x840 xfs_trans_committed_bulk+0x4c2/0x7c0 xlog_cil_committed+0xab6/0xfb0 xlog_cil_process_committed+0x117/0x1e0 xlog_state_shutdown_callbacks+0x208/0x440 xlog_force_shutdown+0x1b3/0x3a0 xlog_ioend_work+0xef/0x1d0 process_one_work+0x6f9/0xf70 worker_thread+0x578/0xf30 kthread+0x28c/0x330 ret_from_fork+0x1f/0x30 The buggy address belongs to the object at ffff88801800f388 which belongs to the cache xfs_buf_item of size 272 The buggy address is located 104 bytes inside of 272-byte region [ffff88801800f388, ffff88801800f498) The buggy address belongs to the physical page: page:ffffea0000600380 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff88801800f208 pfn:0x1800e head:ffffea0000600380 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) raw: 001fffff80010200 ffffea0000699788 ffff88801319db50 ffff88800fb50640 raw: ffff88801800f208 000000000015000a 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88801800f280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88801800f300: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88801800f380: fc fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff88801800f400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff88801800f480: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Guo Xuenan 提交于
mainline inclusion from mainline-v6.1-rc4 commit 1eb52a6a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1eb52a6a71981b80f9acbd915acd6a05a5037196 -------------------------------- Fix uaf in xfs_trans_ail_delete during xlog force shutdown. In commit cd6f79d1 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") changed the order of running callbacks and wait for iclog completion to avoid unmount path untimely destroy AIL. But which seems not enough to ensue this, adding mdelay in `xfs_buf_item_unpin` can prove that. The reproduction is as follows. To ensure destroy AIL safely, we should wait all xlog ioend workers done and sync the AIL. ================================================================== BUG: KASAN: use-after-free in xfs_trans_ail_delete+0x240/0x2a0 Read of size 8 at addr ffff888023169400 by task kworker/1:1H/43 CPU: 1 PID: 43 Comm: kworker/1:1H Tainted: G W 6.1.0-rc1-00002-gc28266863c4a #137 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Workqueue: xfs-log/sda xlog_ioend_work Call Trace: <TASK> dump_stack_lvl+0x4d/0x66 print_report+0x171/0x4a6 kasan_report+0xb3/0x130 xfs_trans_ail_delete+0x240/0x2a0 xfs_buf_item_done+0x7b/0xa0 xfs_buf_ioend+0x1e9/0x11f0 xfs_buf_item_unpin+0x4c8/0x860 xfs_trans_committed_bulk+0x4c2/0x7c0 xlog_cil_committed+0xab6/0xfb0 xlog_cil_process_committed+0x117/0x1e0 xlog_state_shutdown_callbacks+0x208/0x440 xlog_force_shutdown+0x1b3/0x3a0 xlog_ioend_work+0xef/0x1d0 process_one_work+0x6f9/0xf70 worker_thread+0x578/0xf30 kthread+0x28c/0x330 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 9606: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 __kasan_kmalloc+0x7a/0x90 __kmalloc+0x59/0x140 kmem_alloc+0xb2/0x2f0 xfs_trans_ail_init+0x20/0x320 xfs_log_mount+0x37e/0x690 xfs_mountfs+0xe36/0x1b40 xfs_fs_fill_super+0xc5c/0x1a70 get_tree_bdev+0x3c5/0x6c0 vfs_get_tree+0x85/0x250 path_mount+0xec3/0x1830 do_mount+0xef/0x110 __x64_sys_mount+0x150/0x1f0 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Freed by task 9662: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_save_free_info+0x2a/0x40 __kasan_slab_free+0x105/0x1a0 __kmem_cache_free+0x99/0x2d0 kvfree+0x3a/0x40 xfs_log_unmount+0x60/0xf0 xfs_unmountfs+0xf3/0x1d0 xfs_fs_put_super+0x78/0x300 generic_shutdown_super+0x151/0x400 kill_block_super+0x9a/0xe0 deactivate_locked_super+0x82/0xe0 deactivate_super+0x91/0xb0 cleanup_mnt+0x32a/0x4a0 task_work_run+0x15f/0x240 exit_to_user_mode_prepare+0x188/0x190 syscall_exit_to_user_mode+0x12/0x30 do_syscall_64+0x42/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd The buggy address belongs to the object at ffff888023169400 which belongs to the cache kmalloc-128 of size 128 The buggy address is located 0 bytes inside of 128-byte region [ffff888023169400, ffff888023169480) The buggy address belongs to the physical page: page:ffffea00008c5a00 refcount:1 mapcount:0 mapping:0000000000000000 index:0xffff888023168f80 pfn:0x23168 head:ffffea00008c5a00 order:1 compound_mapcount:0 compound_pincount:0 flags: 0x1fffff80010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) raw: 001fffff80010200 ffffea00006b3988 ffffea0000577a88 ffff88800f842ac0 raw: ffff888023168f80 0000000000150007 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888023169300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888023169380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888023169400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888023169480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888023169500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ================================================================== Disabling lock debugging due to kernel taint Fixes: cd6f79d1 ("xfs: run callbacks before waking waiters in xlog_state_shutdown_callbacks") Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Guo Xuenan 提交于
mainline inclusion from mainline-v6.1-rc4 commit 8c25febf category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8c25febf23963431686f04874b96321288504127 -------------------------------- xfs_btree_check_block contains debugging knobs. With XFS_DEBUG setting up, turn on the debugging knob can trigger the assert of xfs_btree_islastblock, test script as follows: while true do mount $disk $mountpoint fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null echo 1 > /sys/fs/xfs/sda/errortag/btree_chk_sblk sleep 10 umount $mountpoint done Kick off fsstress and only *then* turn on the debugging knob. If it happens that the knob gets turned on after the cntbt lookup succeeds but before the call to xfs_btree_islastblock, then we *can* end up in the situation where a previously checked btree block suddenly starts returning EFSCORRUPTED from xfs_btree_check_block. Kaboom. Darrick give a very detailed explanation as follows: Looking back at commit 27d9ee57, I think the point of all this was to make sure that the cursor has actually performed a lookup, and that the btree block at whatever level we're asking about is ok. If the caller hasn't ever done a lookup, the bc_levels array will be empty, so cur->bc_levels[level].bp pointer will be NULL. The call to xfs_btree_get_block will crash anyway, so the "ASSERT(block);" part is pointless. If the caller did a lookup but the lookup failed due to block corruption, the corresponding cur->bc_levels[level].bp pointer will also be NULL, and we'll still crash. The "ASSERT(xfs_btree_check_block);" logic is also unnecessary. If the cursor level points to an inode root, the block buffer will be incore, so it had better always be consistent. If the caller ignores a failed lookup after a successful one and calls this function, the cursor state is garbage and the assert wouldn't have tripped anyway. So get rid of the assert. Fixes: 27d9ee57 ("xfs: actually check xfs_btree_check_block return in xfs_btree_islastblock") Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 07 12月, 2022 28 次提交
-
-
由 Li Zetao 提交于
mainline inclusion from mainline-v6.1-rc1 commit d08af403 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d08af40340cad0e025d643c3982781a8f99d5032 -------------------------------- kmemleak reported a sequence of memory leaks, and one of them indicated we failed to free a pointer: comm "mount", pid 19610, jiffies 4297086464 (age 60.635s) hex dump (first 8 bytes): 73 64 61 00 81 88 ff ff sda..... backtrace: [<00000000d77f3e04>] kstrdup_const+0x46/0x70 [<00000000e51fa804>] kobject_set_name_vargs+0x2f/0xb0 [<00000000247cd595>] kobject_init_and_add+0xb0/0x120 [<00000000f9139aaf>] xfs_mountfs+0x367/0xfc0 [<00000000250d3caf>] xfs_fs_fill_super+0xa16/0xdc0 [<000000008d873d38>] get_tree_bdev+0x256/0x390 [<000000004881f3fa>] vfs_get_tree+0x41/0xf0 [<000000008291ab52>] path_mount+0x9b3/0xdd0 [<0000000022ba8f2d>] __x64_sys_mount+0x190/0x1d0 As mentioned in kobject_init_and_add() comment, if this function returns an error, kobject_put() must be called to properly clean up the memory associated with the object. Apparently, xfs_sysfs_init() does not follow such a requirement. When kobject_init_and_add() returns an error, the space of kobj->kobject.name alloced by kstrdup_const() is unfree, which will cause the above stack. Fix it by adding kobject_put() when kobject_init_and_add returns an error. Fixes: a31b1d3d ("xfs: add xfs_mount sysfs kobject") Signed-off-by: NLi Zetao <lizetao1@huawei.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Zeng Heng 提交于
mainline inclusion from mainline-v6.1-rc1 commit cf4f4c12 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cf4f4c12dea7a977a143c8fe5af1740b7f9876f8 -------------------------------- When `xfs_sysfs_init` returns failed, `mp->m_errortag` needs to free. Otherwise kmemleak would report memory leak after mounting xfs image: unreferenced object 0xffff888101364900 (size 192): comm "mount", pid 13099, jiffies 4294915218 (age 335.207s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f08ad25c>] __kmalloc+0x41/0x1b0 [<00000000dca9aeb6>] kmem_alloc+0xfd/0x430 [<0000000040361882>] xfs_errortag_init+0x20/0x110 [<00000000b384a0f6>] xfs_mountfs+0x6ea/0x1a30 [<000000003774395d>] xfs_fs_fill_super+0xe10/0x1a80 [<000000009cf07b6c>] get_tree_bdev+0x3e7/0x700 [<00000000046b5426>] vfs_get_tree+0x8e/0x2e0 [<00000000952ec082>] path_mount+0xf8c/0x1990 [<00000000beb1f838>] do_mount+0xee/0x110 [<000000000e9c41bb>] __x64_sys_mount+0x14b/0x1f0 [<00000000f7bb938e>] do_syscall_64+0x3b/0x90 [<000000003fcd67a9>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Fixes: c6840101 ("xfs: expose errortag knobs via sysfs") Signed-off-by: NZeng Heng <zengheng4@huawei.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Long Li 提交于
Offering: HULK hulk inclusion category: bugfix bugzilla: 186982,https://gitee.com/openeuler/kernel/issues/I4KIAO -------------------------------- When lazysbcount is enabled, fsstress and loop mount/unmount test report the following problems: XFS (loop0): SB summary counter sanity check failed XFS (loop0): Metadata corruption detected at xfs_sb_write_verify+0x13b/0x460, xfs_sb block 0x0 XFS (loop0): Unmount and run xfs_repair XFS (loop0): First 128 bytes of corrupted metadata buffer: 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 28 00 00 XFSB.........(.. 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000020: 69 fb 7c cd 5f dc 44 af 85 74 e0 cc d4 e3 34 5a i.|._.D..t....4Z 00000030: 00 00 00 00 00 20 00 06 00 00 00 00 00 00 00 80 ..... .......... 00000040: 00 00 00 00 00 00 00 81 00 00 00 00 00 00 00 82 ................ 00000050: 00 00 00 01 00 0a 00 00 00 00 00 04 00 00 00 00 ................ 00000060: 00 00 0a 00 b4 b5 02 00 02 00 00 08 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 0c 09 09 03 14 00 00 19 ................ XFS (loop0): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply +0xe1e/0x10e0 (fs/xfs/xfs_buf.c:1580). Shutting down filesystem. XFS (loop0): Please unmount the filesystem and rectify the problem(s) XFS (loop0): log mount/recovery failed: error -117 XFS (loop0): log mount failed This corruption will shutdown the file system and the file system will no longer be mountable. The following script can reproduce the problem, but it may take a long time. #!/bin/bash device=/dev/sda testdir=/mnt/test round=0 function fail() { echo "$*" exit 1 } mkdir -p $testdir while [ $round -lt 10000 ] do echo "******* round $round ********" mkfs.xfs -f $device mount $device $testdir || fail "mount failed!" fsstress -d $testdir -l 0 -n 10000 -p 4 >/dev/null & sleep 4 killall -w fsstress umount $testdir xfs_repair -e $device > /dev/null if [ $? -eq 2 ];then echo "ERR CODE 2: Dirty log exception during repair." exit 1 fi round=$(($round+1)) done With lazysbcount is enabled, There is no additional lock protection for reading m_ifree and m_icount in xfs_log_sb(), if other cpu modifies the m_ifree, this will make the m_ifree greater than m_icount. For example, consider the following sequence and ifreedelta is postive: CPU0 CPU1 xfs_log_sb xfs_trans_unreserve_and_mod_sb ---------- ------------------------------ percpu_counter_sum(&mp->m_icount) percpu_counter_add_batch(&mp->m_icount, idelta, XFS_ICOUNT_BATCH) percpu_counter_add(&mp->m_ifree, ifreedelta); percpu_counter_sum(&mp->m_ifree) After this, incorrect inode count (sb_ifree > sb_icount) will be writen to the log. In the subsequent writing of sb, incorrect inode count (sb_ifree > sb_icount) will fail to pass the boundary check in xfs_validate_sb_write() that cause the file system shutdown. When lazysbcount is enabled, we don't need to guarantee that Lazy sb counters are completely correct, but we do need to guarantee that sb_ifree <= sb_icount. On the other hand, the constraint that m_ifree <= m_icount must be satisfied any time that there /cannot/ be other threads allocating or freeing inode chunks. If the constraint is violated under these circumstances, sb_i{count,free} (the ondisk superblock inode counters) maybe incorrect and need to be marked sick at unmount, the count will be rebuilt on the next mount. Fixes: 8756a5af ("libxfs: add more bounds checking to sb sanity checks") Signed-off-by: NLong Li <leo.lilong@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
stable inclusion from stable-v5.10.140 commit 1b9b4139d794cf0ae51ba3dd91f009c77fab16d0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5W3GQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=1b9b4139d794cf0ae51ba3dd91f009c77fab16d0 -------------------------------- commit 29d650f7 upstream. Syzbot tripped over the following complaint from the kernel: WARNING: CPU: 2 PID: 15402 at mm/util.c:597 kvmalloc_node+0x11e/0x125 mm/util.c:597 While trying to run XFS_IOC_GETBMAP against the following structure: struct getbmap fubar = { .bmv_count = 0x22dae649, }; Obviously, this is a crazy huge value since the next thing that the ioctl would do is allocate 37GB of memory. This is enough to make kvmalloc mad, but isn't large enough to trip the validation functions. In other words, I'm fussing with checks that were **already sufficient** because that's easier than dealing with 644 internal bug reports. Yes, that's right, six hundred and forty-four. Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NCatherine Hoang <catherine.hoang@oracle.com> Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Acked-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NWang Hai <wanghai38@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Dan Carpenter 提交于
stable inclusion from stable-v5.10.140 commit 6a564bad3a6474a5247491d2b48637ec69d429dd category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5W3GQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6a564bad3a6474a5247491d2b48637ec69d429dd -------------------------------- commit 6ed6356b upstream. The "bufsize" comes from the root user. If "bufsize" is negative then, because of type promotion, neither of the validation checks at the start of the function are able to catch it: if (bufsize < sizeof(struct xfs_attrlist) || bufsize > XFS_XATTR_LIST_MAX) return -EINVAL; This means "bufsize" will trigger (WARN_ON_ONCE(size > INT_MAX)) in kvmalloc_node(). Fix this by changing the type from int to size_t. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Acked-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NWang Hai <wanghai38@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.16-rc3 commit 7561cea5 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7561cea5dbb97fecb952548a0fb74fb105bf4664 -------------------------------- KASAN reported the following use after free bug when running generic/475: XFS (dm-0): Mounting V5 Filesystem XFS (dm-0): Starting recovery (logdev: internal) XFS (dm-0): Ending recovery (logdev: internal) Buffer I/O error on dev dm-0, logical block 20639616, async page read Buffer I/O error on dev dm-0, logical block 20639617, async page read XFS (dm-0): log I/O error -5 XFS (dm-0): Filesystem has been shut down due to log error (0x2). XFS (dm-0): Unmounting Filesystem XFS (dm-0): Please unmount the filesystem and rectify the problem(s). ================================================================== BUG: KASAN: use-after-free in do_raw_spin_lock+0x246/0x270 Read of size 4 at addr ffff888109dd84c4 by task 3:1H/136 CPU: 3 PID: 136 Comm: 3:1H Not tainted 5.19.0-rc4-xfsx #rc4 8e53ab5ad0fddeb31cee5e7063ff9c361915a9c4 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 Workqueue: xfs-log/dm-0 xlog_ioend_work [xfs] Call Trace: <TASK> dump_stack_lvl+0x34/0x44 print_report.cold+0x2b8/0x661 ? do_raw_spin_lock+0x246/0x270 kasan_report+0xab/0x120 ? do_raw_spin_lock+0x246/0x270 do_raw_spin_lock+0x246/0x270 ? rwlock_bug.part.0+0x90/0x90 xlog_force_shutdown+0xf6/0x370 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] xlog_ioend_work+0x100/0x190 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318] process_one_work+0x672/0x1040 worker_thread+0x59b/0xec0 ? __kthread_parkme+0xc6/0x1f0 ? process_one_work+0x1040/0x1040 ? process_one_work+0x1040/0x1040 kthread+0x29e/0x340 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Allocated by task 154099: kasan_save_stack+0x1e/0x40 __kasan_kmalloc+0x81/0xa0 kmem_alloc+0x8d/0x2e0 [xfs] xlog_cil_init+0x1f/0x540 [xfs] xlog_alloc_log+0xd1e/0x1260 [xfs] xfs_log_mount+0xba/0x640 [xfs] xfs_mountfs+0xf2b/0x1d00 [xfs] xfs_fs_fill_super+0x10af/0x1910 [xfs] get_tree_bdev+0x383/0x670 vfs_get_tree+0x7d/0x240 path_mount+0xdb7/0x1890 __x64_sys_mount+0x1fa/0x270 do_syscall_64+0x2b/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 Freed by task 154151: kasan_save_stack+0x1e/0x40 kasan_set_track+0x21/0x30 kasan_set_free_info+0x20/0x30 ____kasan_slab_free+0x110/0x190 slab_free_freelist_hook+0xab/0x180 kfree+0xbc/0x310 xlog_dealloc_log+0x1b/0x2b0 [xfs] xfs_unmountfs+0x119/0x200 [xfs] xfs_fs_put_super+0x6e/0x2e0 [xfs] generic_shutdown_super+0x12b/0x3a0 kill_block_super+0x95/0xd0 deactivate_locked_super+0x80/0x130 cleanup_mnt+0x329/0x4d0 task_work_run+0xc5/0x160 exit_to_user_mode_prepare+0xd4/0xe0 syscall_exit_to_user_mode+0x1d/0x40 entry_SYSCALL_64_after_hwframe+0x46/0xb0 This appears to be a race between the unmount process, which frees the CIL and waits for in-flight iclog IO; and the iclog IO completion. When generic/475 runs, it starts fsstress in the background, waits a few seconds, and substitutes a dm-error device to simulate a disk falling out of a machine. If the fsstress encounters EIO on a pure data write, it will exit but the filesystem will still be online. The next thing the test does is unmount the filesystem, which tries to clean the log, free the CIL, and wait for iclog IO completion. If an iclog was being written when the dm-error switch occurred, it can race with log unmounting as follows: Thread 1 Thread 2 xfs_log_unmount xfs_log_clean xfs_log_quiesce xlog_ioend_work <observe error> xlog_force_shutdown test_and_set_bit(XLOG_IOERROR) xfs_log_force <log is shut down, nop> xfs_log_umount_write <log is shut down, nop> xlog_dealloc_log xlog_cil_destroy <wait for iclogs> spin_lock(&log->l_cilp->xc_push_lock) <KABOOM> Therefore, free the CIL after waiting for the iclogs to complete. I /think/ this race has existed for quite a few years now, though I don't remember the ~2014 era logging code well enough to know if it was a real threat then or if the actual race was exposed only more recently. Fixes: ac983517 ("xfs: don't sleep in xlog_cil_force_lsn on shutdown") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.16-rc3 commit 86d40f1e category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=86d40f1e49e9a909d25c35ba01bea80dbcd758cb -------------------------------- xfs/434 and xfs/436 have been reporting occasional memory leaks of xfs_dquot objects. These tests themselves were the messenger, not the culprit, since they unload the xfs module, which trips the slub debugging code while tearing down all the xfs slab caches: Reviewed-by: NZhang Yi <yi.zhang@huawei.com> ============================================================================= BUG xfs_dquot (Tainted: G W ): Objects remaining in xfs_dquot on __kmem_cache_shutdown() ----------------------------------------------------------------------------- Slab 0xffffea000606de00 objects=30 used=5 fp=0xffff888181b78a78 flags=0x17ff80000010200(slab|head|node=0|zone=2|lastcpupid=0xfff) CPU: 0 PID: 3953166 Comm: modprobe Tainted: G W 5.18.0-rc6-djwx #rc6 d5824be9e46a2393677bda868f9b154d917ca6a7 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014 Since we don't generally rmmod the xfs module between fstests, this means that xfs/434 is really just the canary in the coal mine -- something leaked a dquot, but we don't know who. After days of pounding on fstests with kmemleak enabled, I finally got it to spit this out: unreferenced object 0xffff8880465654c0 (size 536): comm "u10:4", pid 88, jiffies 4294935810 (age 29.512s) hex dump (first 32 bytes): 60 4a 56 46 80 88 ff ff 58 ea e4 5c 80 88 ff ff `JVF....X..\.... 00 e0 52 49 80 88 ff ff 01 00 01 00 00 00 00 00 ..RI............ backtrace: [<ffffffffa0740f6c>] xfs_dquot_alloc+0x2c/0x530 [xfs] [<ffffffffa07443df>] xfs_qm_dqread+0x6f/0x330 [xfs] [<ffffffffa07462a2>] xfs_qm_dqget+0x132/0x4e0 [xfs] [<ffffffffa0756bb0>] xfs_qm_quotacheck_dqadjust+0xa0/0x3e0 [xfs] [<ffffffffa075724d>] xfs_qm_dqusage_adjust+0x35d/0x4f0 [xfs] [<ffffffffa06c9068>] xfs_iwalk_ag_recs+0x348/0x5d0 [xfs] [<ffffffffa06c95d3>] xfs_iwalk_run_callbacks+0x273/0x540 [xfs] [<ffffffffa06c9e8d>] xfs_iwalk_ag+0x5ed/0x890 [xfs] [<ffffffffa06ca22f>] xfs_iwalk_ag_work+0xff/0x170 [xfs] [<ffffffffa06d22c9>] xfs_pwork_work+0x79/0x130 [xfs] [<ffffffff81170bb2>] process_one_work+0x672/0x1040 [<ffffffff81171b1b>] worker_thread+0x59b/0xec0 [<ffffffff8118711e>] kthread+0x29e/0x340 [<ffffffff810032bf>] ret_from_fork+0x1f/0x30 Now we know that quotacheck is at fault, but even this report was canaryish -- it was triggered by xfs/494, which doesn't actually mount any filesystems. (kmemleak can be a little slow to notice leaks, even with fstests repeatedly whacking it to look for them.) Looking at the *previous* fstest, however, showed that the test run before xfs/494 was xfs/117. The tipoff to the problem is in this excerpt from dmesg: XFS (sda4): Quotacheck needed: Please wait. XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0xdb/0x7b0 [xfs], inode 0x119 dinode XFS (sda4): Unmount and run xfs_repair XFS (sda4): First 128 bytes of corrupted metadata buffer: 00000000: 49 4e 81 a4 03 02 00 00 00 00 00 00 00 00 00 00 IN.............. 00000010: 00 00 00 01 00 00 00 00 00 90 57 54 54 1a 4c 68 ..........WTT.Lh 00000020: 81 f9 7d e1 6d ee 16 00 34 bd 7d e1 6d ee 16 00 ..}.m...4.}.m... 00000030: 34 bd 7d e1 6d ee 16 00 00 00 00 00 00 00 00 00 4.}.m........... 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000050: 00 00 00 02 00 00 00 00 00 00 00 00 96 80 f3 ab ................ 00000060: ff ff ff ff da 57 7b 11 00 00 00 00 00 00 00 03 .....W{......... 00000070: 00 00 00 01 00 00 00 10 00 00 00 00 00 00 00 08 ................ XFS (sda4): Quotacheck: Unsuccessful (Error -117): Disabling quotas. The dinode verifier decided that the inode was corrupt, which causes iget to return with EFSCORRUPTED. Since this happened during quotacheck, it is obvious that the kernel aborted the inode walk on account of the corruption error and disabled quotas. Unfortunately, we neglect to purge the dquot cache before doing that, which is how the dquots leaked. The problems started 10 years ago in commit b84a3a, when the dquot lists were converted to a radix tree, but the error handling behavior was not correctly preserved -- in that commit, if the bulkstat failed and usrquota was enabled, the bulkstat failure code would be overwritten by the result of flushing all the dquots to disk. As long as that succeeds, we'd continue the quota mount as if everything were ok, but instead we're now operating with a corrupt inode and incorrect quota usage counts. I didn't notice this bug in 2019 when I wrote commit ebd126a6, which changed quotacheck to skip the dqflush when the scan doesn't complete due to inode walk failures. Introduced-by: b84a3a96 ("xfs: remove the per-filesystem list of dquots") Fixes: ebd126a6 ("xfs: convert quotacheck to use the new iwalk functions") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Eric Sandeen 提交于
mainline inclusion from mainline-v5.16-rc3 commit bc37e4fb category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bc37e4fb5cac2925b2e286b1f1d4fc2b519f7d92 -------------------------------- This reverts commit 4b8628d5. XFS quota has had the concept of a "quota warning limit" since the earliest Irix implementation, but a mechanism for incrementing the warning counter was never implemented, as documented in the xfs_quota(8) man page. We do know from the historical archive that it was never incremented at runtime during quota reservation operations. With this commit, the warning counter quickly increments for every allocation attempt after the user has crossed a quote soft limit threshold, and this in turn transitions the user to hard quota failures, rendering soft quota thresholds and timers useless. This was reported as a regression by users. Because the intended behavior of this warning counter has never been understood or documented, and the result of this change is a regression in soft quota functionality, revert this commit to make soft quota limits and timers operable again. Fixes: 4b8628d5 ("xfs: actually bump warning counts when we send warnings) Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Dave Chinner 提交于
mainline inclusion from mainline-v5.16-rc3 commit cd6f79d1 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cd6f79d1fb324968a3bae92f82eeb7d28ca1fd22 -------------------------------- Brian reported a null pointer dereference failure during unmount in xfs/006. He tracked the problem down to the AIL being torn down before a log shutdown had completed and removed all the items from the AIL. The failure occurred in this path while unmount was proceeding in another task: xfs_trans_ail_delete+0x102/0x130 [xfs] xfs_buf_item_done+0x22/0x30 [xfs] xfs_buf_ioend+0x73/0x4d0 [xfs] xfs_trans_committed_bulk+0x17e/0x2f0 [xfs] xlog_cil_committed+0x2a9/0x300 [xfs] xlog_cil_process_committed+0x69/0x80 [xfs] xlog_state_shutdown_callbacks+0xce/0xf0 [xfs] xlog_force_shutdown+0xdf/0x150 [xfs] xfs_do_force_shutdown+0x5f/0x150 [xfs] xlog_ioend_work+0x71/0x80 [xfs] process_one_work+0x1c5/0x390 worker_thread+0x30/0x350 kthread+0xd7/0x100 ret_from_fork+0x1f/0x30 This is processing an EIO error to a log write, and it's triggering a force shutdown. This causes the log to be shut down, and then it is running attached iclog callbacks from the shutdown context. That means the fs and log has already been marked as xfs_is_shutdown/xlog_is_shutdown and so high level code will abort (e.g. xfs_trans_commit(), xfs_log_force(), etc) with an error because of shutdown. The umount would have been blocked waiting for a log force completion inside xfs_log_cover() -> xfs_sync_sb(). The first thing for this situation to occur is for xfs_sync_sb() to exit without waiting for the iclog buffer to be comitted to disk. The above trace is the completion routine for the iclog buffer, and it is shutting down the filesystem. xlog_state_shutdown_callbacks() does this: { struct xlog_in_core *iclog; LIST_HEAD(cb_list); spin_lock(&log->l_icloglock); iclog = log->l_iclog; do { if (atomic_read(&iclog->ic_refcnt)) { /* Reference holder will re-run iclog callbacks. */ continue; } list_splice_init(&iclog->ic_callbacks, &cb_list); >>>>>> wake_up_all(&iclog->ic_write_wait); >>>>>> wake_up_all(&iclog->ic_force_wait); } while ((iclog = iclog->ic_next) != log->l_iclog); wake_up_all(&log->l_flush_wait); spin_unlock(&log->l_icloglock); >>>>>> xlog_cil_process_committed(&cb_list); } This wakes any thread waiting on IO completion of the iclog (in this case the umount log force) before shutdown processes all the pending callbacks. That means the xfs_sync_sb() waiting on a sync transaction in xfs_log_force() on iclog->ic_force_wait will get woken before the callbacks attached to that iclog are run. This results in xfs_sync_sb() returning an error, and so unmount unblocks and continues to run whilst the log shutdown is still in progress. Normally this is just fine because the force waiter has nothing to do with AIL operations. But in the case of this unmount path, the log force waiter goes on to tear down the AIL because the log is now shut down and so nothing ever blocks it again from the wait point in xfs_log_cover(). Hence it's a race to see who gets to the AIL first - the unmount code or xlog_cil_process_committed() killing the superblock buffer. To fix this, we just have to change the order of processing in xlog_state_shutdown_callbacks() to run the callbacks before it wakes any task waiting on completion of the iclog. Reported-by: NBrian Foster <bfoster@redhat.com> Fixes: aad7272a ("xfs: separate out log shutdown callback processing") Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Dave Chinner 提交于
mainline inclusion from mainline-v5.16-rc3 commit 70447e0a category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=70447e0ad9781f84e60e0990888bd8c84987f44e -------------------------------- When the AIL tries to flush the CIL, it relies on the CIL push ending up on stable storage without having to wait for and manipulate iclog state directly. However, if there is already a pending CIL push when the AIL tries to flush the CIL, it won't set the cil->xc_push_commit_stable flag and so the CIL push will not actively flush the commit record iclog. generic/530 when run on a single CPU test VM can trigger this fairly reliably. This test exercises unlinked inode recovery, and can result in inodes being pinned in memory by ongoing modifications to the inode cluster buffer to record unlinked list modifications. As a result, the first inode unlinked in a buffer can pin the tail of the log whilst the inode cluster buffer is pinned by the current checkpoint that has been pushed but isn't on stable storage because because the cil->xc_push_commit_stable was not set. This results in the log/AIL effectively deadlocking until something triggers the commit record iclog to be pushed to stable storage (i.e. the periodic log worker calling xfs_log_force()). The fix is two-fold - first we should always set the cil->xc_push_commit_stable when xlog_cil_flush() is called, regardless of whether there is already a pending push or not. Second, if the CIL is empty, we should trigger an iclog flush to ensure that the iclogs of the last checkpoint have actually been submitted to disk as that checkpoint may not have been run under stable completion constraints. Reported-and-tested-by: NMatthew Wilcox <willy@infradead.org> Fixes: 0020a190 ("xfs: AIL needs asynchronous CIL forcing") Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.16-rc3 commit eba0549b category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=eba0549bc7d100691c13384b774346b8aa9cf9a9 -------------------------------- There are a few places where we test the current process' capability set to decide if we're going to be more or less generous with resource acquisition for a system call. If the process doesn't have the capability, we can continue the call, albeit in a degraded mode. These are /not/ the actual security decisions, so it's not proper to use capable(), which (in certain selinux setups) causes audit messages to get logged. Switch them to has_capability_noaudit. Fixes: 7317a03d ("xfs: refactor inode ownership change transaction/inode/quota allocation idiom") Fixes: ea9a46e1 ("xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMIN") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Reviewed-by: NOndrej Mosnacek <omosnace@redhat.com> Acked-by: NSerge Hallyn <serge@hallyn.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Conflicts: fs/xfs/xfs_fsmap.c Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.16-rc3 commit b97cca3b category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b97cca3ba9098522e5a1c3388764ead42640c1a5 -------------------------------- In commit 02b9984d, we pushed a sync_filesystem() call from the VFS into xfs_fs_remount. The only time that we ever need to push dirty file data or metadata to disk for a remount is if we're remounting the filesystem read only, so this really could be moved to xfs_remount_ro. Once we've moved the call site, actually check the return value from sync_filesystem. Fixes: 02b9984d ("fs: push sync_filesystem() down to the file system's remount_fs()") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Christoph Hellwig 提交于
mainline inclusion from mainline-v5.16-rc2 commit 1090427b category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1090427bf18f9835b3ccbd36edf43f2509444e27 -------------------------------- With the remove of xfs_dqrele_all_inodes, xfs_inew_wait and all the infrastructure used to wake the XFS_INEW bit waitqueue is unused. Reported-by: Nkernel test robot <lkp@intel.com> Fixes: 777eb1fa ("xfs: remove xfs_dqrele_all_inodes") Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Conflicts: fs/xfs/xfs_inode.h Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.12-rc4 commit 9de4b514 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9de4b514494a3b49fa708186c0dc4611f1fe549c -------------------------------- If scrub observes cross-referencing errors while scanning a data structure, mark the data structure sick. There's /something/ inconsistent, even if we can't really tell what it is. Fixes: 4860a05d ("xfs: scrub/repair should update filesystem metadata health") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.11-rc4 commit 89e0eb8c category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=89e0eb8c13bb842e224b27d7e071262cd84717cb -------------------------------- In commit 9669f51d I tried to get rid of the undocumented cow gc lifetime knob. The knob's function was never documented and it now doesn't really have a function since eof and cow gc have been consolidated. Regrettably, xfs/231 relies on it and regresses on for-next. I did not succeed at getting far enough through fstests patch review for the fixup to land in time. Restore the sysctl knob, document what it did (does?), put it on the deprecation schedule, and rip out a redundant function. Fixes: 9669f51d ("xfs: consolidate the eofblocks and cowblocks workers") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Christoph Hellwig 提交于
mainline inclusion from mainline-v5.11-rc4 commit 354be7e3 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=354be7e3b2baf32e63c0599cc131d393591ba299 -------------------------------- Ensure we don't block on the iolock, or waiting for I/O in xfs_file_aio_write_checks if the caller asked to avoid that. Fixes: 29a5d29e ("xfs: nowait aio support") Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.10-rc5 commit 4b80ac64 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4b80ac64450f169bae364df631d233d66070a06e -------------------------------- It's possible that xfs_iget can return EINVAL for inodes that the inobt thinks are free, or ENOENT for inodes that look free. If this is the case, mark the directory corrupt immediately when we check ftype. Note that we already check the ftype of the '.' and '..' entries, so we can skip the iget part since we already know the inode type for '.' and we have a separate parent pointer scrubber for '..'. Fixes: a5c46e5e ("xfs: scrub directory metadata") Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Christoph Hellwig 提交于
mainline inclusion from mainline-v5.11-rc4 commit f50b8f47 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f50b8f475a2c70ae8309c16b6d4ecb305a4aa9d6 -------------------------------- Add a helper to factor out the nowait locking logical for the read/write helpers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Conflicts: fs/xfs/xfs_file.c Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Darrick J. Wong 提交于
mainline inclusion from mainline-v5.10-rc5 commit da531cc4 category: bugfix bugzilla: 187526,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da531cc46ef16301b1bc5bc74acbaacc628904f5 -------------------------------- xfs_iget can return -ENOENT for a file that the inobt thinks is allocated but has zeroed mode. This currently causes scrub to exit with an operational error instead of flagging this as a corruption. The end result is that scrub mistakenly reports the ENOENT to the user instead of "directory parent pointer corrupt" like we do for EINVAL. Fixes: 5927268f ("xfs: flag inode corruption if parent ptr doesn't get us a real inode") Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Guo Xuenan 提交于
Offering: HULK hulk inclusion category: bugfix bugzilla: 186943,https://gitee.com/openeuler/kernel/issues/I4KIAO -------------------------------- For leaf dir, In most cases, there should be as many bestfree slots as the dir data blocks that can fit under i_size (except for [1]). Root cause is we don't examin the number bestfree slots, when the slots number less than dir data blocks, if we need to allocate new dir data block and update the bestfree array, we will use the dir block number as index to assign bestfree array, while we did not check the leaf buf boundary which may cause UAF or other memory access problem. This issue can also triggered with test cases xfs/473 from fstests. Considering the special case [1] , only add check bestfree array boundary, to avoid UAF or slab-out-of bound. [1] https://lore.kernel.org/all/163961697197.3129691.1911552605195534271.stgit@magnolia/ Simplify the testcase xfs/473 with commands below: DEV=/dev/sdb MP=/mnt/sdb WORKDIR=/mnt/sdb/341 #1. mkfs create new xfs image mkfs.xfs -f ${DEV} mount ${DEV} ${MP} mkdir -p ${WORKDIR} for i in `seq 1 341` #2. create leaf dir with 341 entries file name len 8 do touch ${WORKDIR}/$(printf "%08d" ${i}) done inode=$(ls -i ${MP} | cut -d' ' -f1) umount ${MP} #3. xfs_db set bestcount to 0 xfs_db -x ${DEV} -c "inode ${inode}" -c "dblock 8388608" \ -c "write ltail.bestcount 0" mount ${DEV} ${MP} touch ${WORKDIR}/{1..100}.txt #4. touch new file, reproduce The error log is shown as follows: Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> ================================================================== BUG: KASAN: use-after-free in xfs_dir2_leaf_addname+0x1995/0x1ac0 Write of size 2 at addr ffff88810168b000 by task touch/1552 CPU: 5 PID: 1552 Comm: touch Not tainted 6.0.0-rc3+ #101 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4d/0x66 print_report.cold+0xf6/0x691 kasan_report+0xa8/0x120 xfs_dir2_leaf_addname+0x1995/0x1ac0 xfs_dir_createname+0x58c/0x7f0 xfs_create+0x7af/0x1010 xfs_generic_create+0x270/0x5e0 path_openat+0x270b/0x3450 do_filp_open+0x1cf/0x2b0 do_sys_openat2+0x46b/0x7a0 do_sys_open+0xb7/0x130 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fe4d9e9312b Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25 RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b RDX: 0000000000000941 RSI: 00007ffda4c17f33 RDI: 00000000ffffff9c RBP: 00007ffda4c17f33 R08: 0000000000000000 R09: 0000000000000000 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941 R13: 00007fe4d9f631a4 R14: 00007ffda4c17f33 R15: 0000000000000000 </TASK> The buggy address belongs to the physical page: page:ffffea000405a2c0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10168b flags: 0x2fffff80000000(node=0|zone=2|lastcpupid=0x1fffff) raw: 002fffff80000000 ffffea0004057788 ffffea000402dbc8 0000000000000000 raw: 0000000000000000 0000000000170000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88810168af00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff88810168af80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffff88810168b000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88810168b080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88810168b100: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== Disabling lock debugging due to kernel taint 00000000: 58 44 44 33 5b 53 35 c2 00 00 00 00 00 00 00 78 XDD3[S5........x XFS (sdb): Internal error xfs_dir2_data_use_free at line 1200 of file fs/xfs/libxfs/xfs_dir2_data.c. Caller xfs_dir2_data_use_free+0x28a/0xeb0 CPU: 5 PID: 1552 Comm: touch Tainted: G B 6.0.0-rc3+ Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x4d/0x66 xfs_corruption_error+0x132/0x150 xfs_dir2_data_use_free+0x198/0xeb0 xfs_dir2_leaf_addname+0xa59/0x1ac0 xfs_dir_createname+0x58c/0x7f0 xfs_create+0x7af/0x1010 xfs_generic_create+0x270/0x5e0 path_openat+0x270b/0x3450 do_filp_open+0x1cf/0x2b0 do_sys_openat2+0x46b/0x7a0 do_sys_open+0xb7/0x130 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fe4d9e9312b Code: 25 00 00 41 00 3d 00 00 41 00 74 4b 64 8b 04 25 18 00 00 00 85 c0 75 67 44 89 e2 48 89 ee bf 9c ff ff ff b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 0f 87 91 00 00 00 48 8b 4c 24 28 64 48 33 0c 25 RSP: 002b:00007ffda4c16c20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101 RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fe4d9e9312b RDX: 0000000000000941 RSI: 00007ffda4c17f46 RDI: 00000000ffffff9c RBP: 00007ffda4c17f46 R08: 0000000000000000 R09: 0000000000000001 R10: 00000000000001b6 R11: 0000000000000246 R12: 0000000000000941 R13: 00007fe4d9f631a4 R14: 00007ffda4c17f46 R15: 0000000000000000 </TASK> XFS (sdb): Corruption detected. Unmount and run xfs_repair Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chandan Babu R 提交于
mainline inclusion from mainline-v5.12-rc1 commit 5147ef30 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5147ef30f2cd128c9eedf7a697e8cb2ce2767989 -------------------------------- With dax enabled filesystems, a direct write operation into an existing unwritten extent results in xfs_iomap_write_direct() zero-ing and converting the extent into a normal extent before the actual data is copied from the userspace buffer. The inode extent count can increase by 2 if the extent range being written to maps to the middle of the existing unwritten extent range. Hence this commit uses XFS_IEXT_WRITE_UNWRITTEN_CNT as the extent count delta when such a write operation is being performed. Fixes: 727e1acd ("xfs: Check for extent overflow when trivally adding a new extent") Reported-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chandan Babu R 提交于
mainline inclusion from mainline-v5.12-rc1 commit bcc561f2 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=bcc561f21f115437a010307420fc43d91be91c66 -------------------------------- Removing an initial range of source/donor file's extent and adding a new extent (from donor/source file) in its place will cause extent count to increase by 1. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chandan Babu R 提交于
mainline inclusion from mainline-v5.12-rc1 commit ee898d78 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ee898d78c3540b44270a5fdffe208d7bbb219d93 -------------------------------- Remapping an extent involves unmapping the existing extent and mapping in the new extent. When unmapping, an extent containing the entire unmap range can be split into two extents, i.e. | Old extent | hole | Old extent | Hence extent count increases by 1. Mapping in the new extent into the destination file can increase the extent count by 1. Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chandan Babu R 提交于
mainline inclusion from mainline-v5.12-rc1 commit 5f1d5bbf category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f1d5bbfb2e674052a9fe542f53678978af20770 -------------------------------- Moving an extent to data fork can cause a sub-interval of an existing extent to be unmapped. This will increase extent count by 1. Mapping in the new extent can increase the extent count by 1 again i.e. | Old extent | New extent | Old extent | Hence number of extents increases by 2. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chandan Babu R 提交于
mainline inclusion from mainline-v5.12-rc1 commit c442f308 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c442f3086d5a108b7ff086c8ade1923a8f389db5 -------------------------------- A write to a sub-interval of an existing unwritten extent causes the original extent to be split into 3 extents i.e. | Unwritten | Real | Unwritten | Hence extent count can increase by 2. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Conflict: commit 3a1af6c3 ("xfs: refactor common transaction/inode/quota allocation idiom") is backported, which introduce some conflicts on code context. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chandan Babu R 提交于
mainline inclusion from mainline-v5.12-rc1 commit 3a19bb14 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3a19bb147c72d2e9b77137bf5130b9cfb50a5eef -------------------------------- Adding/removing an xattr can cause XFS_DA_NODE_MAXDEPTH extents to be added. One extra extent for dabtree in case a local attr is large enough to cause a double split. It can also cause extent count to increase proportional to the size of a remote xattr's value. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Conflict: commit 3a1af6c3 ("xfs: refactor common transaction/inode/quota allocation idiom") is backported, which introduce some conflicts on code context. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chandan Babu R 提交于
mainline inclusion from mainline-v5.12-rc1 commit 02092a2f category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=02092a2f034fdeabab524ae39c2de86ba9ffa15a -------------------------------- A rename operation is essentially a directory entry remove operation from the perspective of parent directory (i.e. src_dp) of rename's source. Hence the only place where we check for extent count overflow for src_dp is in xfs_bmap_del_extent_real(). xfs_bmap_del_extent_real() returns -ENOSPC when it detects a possible extent count overflow and in response, the higher layers of directory handling code do the following: 1. Data/Free blocks: XFS lets these blocks linger until a future remove operation removes them. 2. Dabtree blocks: XFS swaps the blocks with the last block in the Leaf space and unmaps the last block. For target_dp, there are two cases depending on whether the destination directory entry exists or not. When destination directory entry does not exist (i.e. target_ip == NULL), extent count overflow check is performed only when transaction has a non-zero sized space reservation associated with it. With a zero-sized space reservation, XFS allows a rename operation to continue only when the directory has sufficient free space in its data/leaf/free space blocks to hold the new entry. When destination directory entry exists (i.e. target_ip != NULL), all we need to do is change the inode number associated with the already existing entry. Hence there is no need to perform an extent count overflow check. Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Chandan Babu R 提交于
mainline inclusion from mainline-v5.12-rc1 commit 0dbc5cb1 category: bugfix bugzilla: 187510,https://gitee.com/openeuler/kernel/issues/I4KIAO Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0dbc5cb1a91cc8c44b1c75429f5b9351837114fd -------------------------------- Directory entry removal must always succeed; Hence XFS does the following during low disk space scenario: 1. Data/Free blocks linger until a future remove operation. 2. Dabtree blocks would be swapped with the last block in the leaf space and then the new last block will be unmapped. This facility is reused during low inode extent count scenario i.e. this commit causes xfs_bmap_del_extent_real() to return -ENOSPC error code so that the above mentioned behaviour is exercised causing no change to the directory's extent count. Signed-off-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-