- 09 3月, 2022 1 次提交
-
-
由 Dave Chinner 提交于
mainline-inclusion from mainline-v5.14-rc4 commit b36d4651 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4V7IK CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b36d4651e1650082d27fa477318183c4a7210e30 ------------------------------------------------- The running of a forced shutdown is a bit of a mess. It does racy checks for XFS_MOUNT_SHUTDOWN in xfs_do_force_shutdown(), then does more racy checks in xfs_log_force_unmount() before finally setting XFS_MOUNT_SHUTDOWN and XLOG_IO_ERROR under the log->icloglock. Move the checking and setting of XFS_MOUNT_SHUTDOWN into xfs_do_force_shutdown() so we only process a shutdown once and once only. Serialise this with the mp->m_sb_lock spinlock so that the state change is atomic and won't race. Move all the mount specific shutdown state changes from xfs_log_force_unmount() to xfs_do_force_shutdown() so they are done atomically with setting XFS_MOUNT_SHUTDOWN. Then get rid of the racy xlog_is_shutdown() check from xlog_force_shutdown(), and gate the log shutdown on the test_and_set_bit(XLOG_IO_ERROR) test under the icloglock. This means that the log is shutdown once and once only, and code that needs to prevent races with shutdown can do so by holding the icloglock and checking the return value of xlog_is_shutdown(). This results in a predictable shutdown execution process - we set the shutdown flags once and process the shutdown once rather than the current "as many concurrent shutdowns as can race to the flag setting" situation we have now. Also, now that shutdown is atomic, alway emit a stack trace when the error level for the filesystem is high enough. This means that we always get a stack trace when trying to diagnose the cause of shutdowns in the field, rather than just for SHUTDOWN_CORRUPT_INCORE cases. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NLihong Kou <koulihong@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 27 12月, 2021 1 次提交
-
-
由 Darrick J. Wong 提交于
mainline-inclusion from mainline-v5.12-rc4 commit 1aec7c3d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1aec7c3d05670b92b7339b19999009a93808efb9 ------------------------------------------------- In commit f8f2835a we changed the behavior of XFS to use EFIs to remove blocks from an overfilled AGFL because there were complaints about transaction overruns that stemmed from trying to free multiple blocks in a single transaction. Unfortunately, that commit missed a subtlety in the debug-mode transaction accounting when a realtime volume is attached. If a realtime file undergoes a data fork mapping change such that realtime extents are allocated (or freed) in the same transaction that a data device block is also allocated (or freed), we can trip a debugging assertion. This can happen (for example) if a realtime extent is allocated and it is necessary to reshape the bmbt to hold the new mapping. When we go to allocate a bmbt block from an AG, the first thing the data device block allocator does is ensure that the freelist is the proper length. If the freelist is too long, it will trim the freelist to the proper length. In debug mode, trimming the freelist calls xfs_trans_agflist_delta() to record the decrement in the AG free list count. Prior to f8f28 we would put the free block back in the free space btrees in the same transaction, which calls xfs_trans_agblocks_delta() to record the increment in the AG free block count. Since AGFL blocks are included in the global free block count (fdblocks), there is no corresponding fdblocks update, so the AGFL free satisfies the following condition in xfs_trans_apply_sb_deltas: /* * Check that superblock mods match the mods made to AGF counters. */ ASSERT((tp->t_fdblocks_delta + tp->t_res_fdblocks_delta) == (tp->t_ag_freeblks_delta + tp->t_ag_flist_delta + tp->t_ag_btree_delta)); The comparison here used to be: (X + 0) == ((X+1) + -1 + 0), where X is the number blocks that were allocated. After commit f8f28 we defer the block freeing to the next chained transaction, which means that the calls to xfs_trans_agflist_delta and xfs_trans_agblocks_delta occur in separate transactions. The (first) transaction that shortens the free list trips on the comparison, which has now become: (X + 0) == ((X) + -1 + 0) because we haven't freed the AGFL block yet; we've only logged an intention to free it. When the second transaction (the deferred free) commits, it will evaluate the expression as: (0 + 0) == (1 + 0 + 0) and trip over that in turn. At this point, the astute reader may note that the two commits tagged by this patch have been in the kernel for a long time but haven't generated any bug reports. How is it that the author became aware of this bug? This originally surfaced as an intermittent failure when I was testing realtime rmap, but a different bug report by Zorro Lang reveals the same assertion occuring on !lazysbcount filesystems. The common factor to both reports (and why this problem wasn't previously reported) becomes apparent if we consider when xfs_trans_apply_sb_deltas is called by __xfs_trans_commit(): if (tp->t_flags & XFS_TRANS_SB_DIRTY) xfs_trans_apply_sb_deltas(tp); With a modern lazysbcount filesystem, transactions update only the percpu counters, so they don't need to set XFS_TRANS_SB_DIRTY, hence xfs_trans_apply_sb_deltas is rarely called. However, updates to the count of free realtime extents are not part of lazysbcount, so XFS_TRANS_SB_DIRTY will be set on transactions adding or removing data fork mappings to realtime files; similarly, XFS_TRANS_SB_DIRTY is always set on !lazysbcount filesystems. Dave mentioned in response to an earlier version of this patch: "IIUC, what you are saying is that this debug code is simply not exercised in normal testing and hasn't been for the past decade? And it still won't be exercised on anything other than realtime device testing? "...it was debugging code from 1994 that was largely turned into dead code when lazysbcounters were introduced in 2007. Hence I'm not sure it holds any value anymore." This debugging code isn't especially helpful - you can modify the flcount on one AG and the freeblks of another AG, and it won't trigger. Add the fact that nobody noticed for a decade, and let's just get rid of it (and start testing realtime :P). This bug was found by running generic/051 on either a V4 filesystem lacking lazysbcount; or a V5 filesystem with a realtime volume. Cc: bfoster@redhat.com, zlang@redhat.com Fixes: f8f2835a ("xfs: defer agfl block frees when dfops is available") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com> Reviewed-by: NLihong Kou <koulihong@huawei.com> Reviewed-by: NZhang Yi <yi.zhang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 07 5月, 2020 1 次提交
-
-
由 Brian Foster 提交于
Both types control shutdown messaging and neither is used in the current codebase. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Collins <allison.henderson@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 29 6月, 2019 1 次提交
-
-
由 Eric Sandeen 提交于
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 12 6月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
Separate the inode geometry information into a distinct structure. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 02 5月, 2019 1 次提交
-
-
由 Eric Sandeen 提交于
There are several functions which have no opportunity to return an error, and don't contain any ASSERTs which could be argued to be better constructed as error cases. So, make them voids to simplify the callers. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 15 2月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
Log recovery frees all the inodes stored in the unlinked list, which can cause expansion of the free inode btree. The ifree code skips block reservations if it thinks there's a per-AG space reservation, but we don't set up the reservation until after log recovery, which means that a finobt expansion blows up in xfs_trans_mod_sb when we exceed the transaction's block reservation. To fix this, we set the "no finobt reservation" flag to true when we create the xfs_mount and only set it to false if we confirm that every AG had enough free space to put aside for the finobt. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 30 12月, 2018 1 次提交
-
-
由 Julia Lawall 提交于
Drop LIST_HEAD where the variable it declares is never used. Commit 0410c3bb ("xfs: factor ag btree root block initialisation") stopped using buffer_list and started using a buffer list in an aghdr_init_data structure, but the declaration of buffer_list was not removed. The semantic patch that fixes this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @@ identifier x; @@ - LIST_HEAD(x); ... when != x // </smpl> Fixes: 0410c3bb ("xfs: factor ag btree root block initialisation") Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 13 12月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Add new helpers to convert units of fs blocks into inodes, and AG blocks into AG inodes, respectively. Convert all the open-coded conversions and XFS_OFFBNO_TO_AGINO(, , 0) calls to use them, as appropriate. The OFFBNO_TO_AGINO macro is retained for xfs_repair. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 18 10月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
The kernel only issues a log message that it's been shut down when the filesystem triggers a shutdown itself. Hence there is no trace in the log when a shutdown is triggered manually from userspace. This can make it hard to see sequence of events in the log when things go wrong, so make sure we always log a message when a shutdown is run. While there, clean up the logic flow so we don't have to continually check if the shutdown trigger was user initiated before logging shutdown messages. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 30 7月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Pass a tranaction pointer through to all helpers that calculate the per-AG block reservation. Online repair will use this to reinitialize per-ag reservations while it still holds all the AG headers locked to the repair transaction. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 25 6月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
If we somehow end up with a filesystem that has fewer free blocks than the blocks set aside to avoid ENOSPC deadlocks, it's possible that the free space calculation in xfs_reserve_blocks will spit out a negative number (because percpu_counter_sum returns s64). We fail to notice this negative number and set fdblks_delta to it. Now we increment fdblocks(!) and the unsigned type of m_resblks means that we end up setting a ridiculously huge m_resblks reservation. Avoid this comedy of errors by detecting the negative free space and returning -ENOSPC. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 07 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 16 5月, 2018 9 次提交
-
-
由 Dave Chinner 提交于
Growfs currently manually codes the extension of the last AG in a filesytem during the growfs process. Factor that out of the growfs code and move it into libxfs along with teh rest of the AG header modification code. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
So it can be shared with userspace (e.g. mkfs) easily. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
Right now we wait until we've committed changes to the primary superblock before we initialise any of the new secondary superblocks. This means that if we have any write errors for new secondary superblocks we end up with garbage in place rather than zeros or even an "in progress" superblock to indicate a grow operation is being done. To ensure we can write the secondary superblocks, initialise them earlier in the same loop that initialises the AG headers. We stamp the new secondary superblocks here with the old geometry, but set the "sb_inprogress" field to indicate that updates are being done to the superblock so they cannot be used. This will result in the secondary superblock fields being updated or triggering errors that will abort the grow before we commit any permanent changes. This also means we can change the update mechanism of the secondary superblocks. We know that we are going to wholly overwrite the information in the struct xfs_sb in the buffer, so there's no point reading it from disk. Just allocate an uncached buffer, zero it in memory, stamp the new superblock structure in it and write it out. If we fail to write it out, then we'll leave the existing sb (old or new w/ inprogress) on disk for repair to deal with later. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
This happens after all the transactions to update the superblock occur, and errors need to be handled slightly differently. Seperate out the code into it's own function, and clean up the error goto stack in the core growfs code as it is now much simpler. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
When growfs changes the imaxpct value of the filesystem, it runs through all the "change size" growfs code, whether it needs to or not. Separate out changing imaxpct into it's own function and transaction to simplify the rest of the growfs code. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
There's still more cookie cutter code in setting up each AG header. Separate all the variables into a simple structure and iterate a table of header definitions to initialise everything. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
Cookie cutter code, easily factored. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
We currently write all new AG headers synchronously, which can be slow for large grow operations. All we really need to do is ensure all the headers are on disk before we run the growfs transaction, so convert this to a buffer list and a delayed write operation. We block waiting for the delayed write buffer submission to complete, so this will fulfill the requirement to have all the buffers written correctly before proceeding. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
The intialisation of new AG headers is mostly common with the userspace mkfs code and growfs in the kernel, so start factoring it out so we can move it to libxfs and use it in both places. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 12 3月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
The AGFL size calculation is about to get more complex, so lets turn the macro into a function first and remove the macro. Signed-off-by: NDave Chinner <dchinner@redhat.com> [darrick: forward port to newer kernel, simplify the helper] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 13 1月, 2018 2 次提交
-
-
由 Darrick J. Wong 提交于
Starting with commit 57e73442 ("vsprintf: refactor %pK code out of pointer"), the behavior of the raw '%p' printk format specifier was changed to print a 32-bit hash of the pointer value to avoid leaking kernel pointers into dmesg. For most situations that's good. This is /undesirable/ behavior when we're trying to debug XFS, however, so define a PTR_FMT that prints the actual pointer when we're in debug mode. Note that %p for tracepoints still prints the raw pointer, so in the long run we could consider rewriting some of these messages as tracepoints. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Since %p prepends "0x" to the outputted string, we can drop the prefix. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 09 1月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Move xfs_fs_geometry to libxfs so that we can clean up the fs geometry reporting in xfsprogs. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 22 12月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
Calling xfs_rmap_free with an unknown owner is supposed to remove any rmaps covering that range regardless of owner. This is used by the EFI recovery code to say "we're freeing this, it mustn't be owned by anything anymore", but for whatever reason xfs_free_ag_extent filters them out. Therefore, remove the filter and make xfs_rmap_unmap actually treat it as a wildcard owner -- free anything that's already there, and if there's no owner at all then that's fine too. There are two existing callers of bmap_add_free that take care the rmap deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap cleanup; convert these to use OWN_NULL (via helpers), and now we really require that an RUI (if any) gets added to the defer ops before any EFI. Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests, growfs will have to consult directly with the rmap to ensure that there aren't any rmaps in the grown region. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 20 6月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 31 1月, 2017 2 次提交
-
-
由 Eric Sandeen 提交于
Now that xfs_btree_init_block_int is able to determine crc status from the passed-in mp, we can determine the proper magic as well if we are given a btree number, rather than an explicit magic value. Change xfs_btree_init_block[_int] callers to pass in the btree number, and let xfs_btree_init_block_int use the xfs_magics array via the xfs_btree_magic macro to determine which magic value is needed. This makes all of the if (crc) / else stanzas identical, and the if/else can be removed, leading to a single, common init_block call. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Eric Sandeen 提交于
xfs_btree_init_block_int() can determine whether crcs are in effect without the passed-in XFS_BTREE_CRC_BLOCKS flag; the mp argument allows us to determine this from the superblock. Remove the flag from callers, and use xfs_sb_version_hascrc(&mp->m_sb) internally instead. This removes one difference between the if & else cases in the callers. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 04 1月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
We need to use the actual AG length when making per-AG reservations, since we could otherwise end up reserving more blocks out of the last AG than there are actual blocks. Complained-about-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 06 10月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
To gracefully handle the situation where a CoW operation turns a single refcount extent into a lot of tiny ones and then run out of space when a tree split has to happen, use the per-AG reserved block pool to pre-allocate all the space we'll ever need for a maximal btree. For a 4K block size, this only costs an overhead of 0.3% of available disk space. When reflink is enabled, we have an unfortunate problem with rmap -- since we can share a block billions of times, this means that the reverse mapping btree can expand basically infinitely. When an AG is so full that there are no free blocks with which to expand the rmapbt, the filesystem will shut down hard. This is rather annoying to the user, so use the AG reservation code to reserve a "reasonable" amount of space for rmap. We'll prevent reflinks and CoW operations if we think we're getting close to exhausting an AG's free space rather than shutting down, but this permanent reservation should be enough for "most" users. Hopefully. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> [hch@lst.de: ensure that we invalidate the freed btree buffer] Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 05 10月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
Report the reflink feature in the XFS geometry so that xfs_info and friends know the filesystem has this feature. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 04 10月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
Modify the growfs code to initialize new refcount btree blocks. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 19 9月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
One unfortunate quirk of the reference count and reverse mapping btrees -- they can expand in size when blocks are written to *other* allocation groups if, say, one large extent becomes a lot of tiny extents. Since we don't want to start throwing errors in the middle of CoWing, we need to reserve some blocks to handle future expansion. The transaction block reservation counters aren't sufficient here because we have to have a reserve of blocks in every AG, not just somewhere in the filesystem. Therefore, create two per-AG block reservation pools. One feeds the AGFL so that rmapbt expansion always succeeds, and the other feeds all other metadata so that refcountbt expansion never fails. Use the count of how many reserved blocks we need to have on hand to create a virtual reservation in the AG. Through selective clamping of the maximum length of allocation requests and of the length of the longest free extent, we can make it look like there's less free space in the AG unless the reservation owner is asking for blocks. In other words, play some accounting tricks in-core to make sure that we always have blocks available. On the plus side, there's nothing to clean up if we crash, which is contrast to the strategy that the rough draft used (actually removing extents from the freespace btrees). Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 17 8月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
Track the number of blocks used for the rmapbt in the AGF. When we get to the AG reservation code we need this counter to quickly make our reservation during mount. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 03 8月, 2016 4 次提交
-
-
由 Darrick J. Wong 提交于
Originally-From: Dave Chinner <dchinner@redhat.com> So xfs_info and other userspace utilities know the filesystem is using this feature. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Originally-From: Dave Chinner <dchinner@redhat.com> The rmap btree is allocated from the AGFL, which means we have to ensure ENOSPC is reported to userspace before we run out of free space in each AG. The last allocation in an AG can cause a full height rmap btree split, and that means we have to reserve at least this many blocks *in each AG* to be placed on the AGFL at ENOSPC. Update the various space calculation functions to handle this. Also, because the macros are now executing conditional code and are called quite frequently, convert them to functions that initialise variables in the struct xfs_mount, use the new variables everywhere and document the calculations better. [darrick.wong@oracle.com: don't reserve blocks if !rmap] [dchinner@redhat.com: update m_ag_max_usable after growfs] Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Originally-From: Dave Chinner <dchinner@redhat.com> Now we can read and write rmap btree blocks, we can add support to the growfs code to initialise new rmap btree blocks. [darrick.wong@oracle.com: fill out the rmap offset fields] Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
For the rmap btree to work, we have to feed the extent owner information to the the allocation and freeing functions. This information is what will end up in the rmap btree that tracks allocated extents. While we technically don't need the owner information when freeing extents, passing it allows us to validate that the extent we are removing from the rmap btree actually belonged to the owner we expected it to belong to. We also define a special set of owner values for internal metadata that would otherwise have no owner. This allows us to tell the difference between metadata owned by different per-ag btrees, as well as static fs metadata (e.g. AG headers) and internal journal blocks. There are also a couple of special cases we need to take care of - during EFI recovery, we don't actually know who the original owner was, so we need to pass a wildcard to indicate that we aren't checking the owner for validity. We also need special handling in growfs, as we "free" the space in the last AG when extending it, but because it's new space it has no actual owner... While touching the xfs_bmap_add_free() function, re-order the parameters to put the struct xfs_mount first. Extend the owner field to include both the owner type and some sort of index within the owner. The index field will be used to support reverse mappings when reflink is enabled. When we're freeing extents from an EFI, we don't have the owner information available (rmap updates have their own redo items). xfs_free_extent therefore doesn't need to do an rmap update. Make sure that the log replay code signals this correctly. This is based upon a patch originally from Dave Chinner. It has been extended to add more owner information with the intent of helping recovery operations when things go wrong (e.g. offset of user data block in a file). [dchinner: de-shout the xfs_rmap_*_owner helpers] [darrick: minor style fixes suggested by Christoph Hellwig] Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-