- 12 6月, 2019 3 次提交
-
-
由 Darrick J. Wong 提交于
inode_cluster_size is supposed to represent the size (in bytes) of an inode cluster buffer. We avoid having to handle multiple clusters per filesystem block on filesystems with large blocks by openly rounding this value up to 1 FSB when necessary. However, we never reset inode_cluster_size to reflect this new rounded value, which adds to the potential for mistakes in calculating geometries. Fix this by setting inode_cluster_size to reflect the rounded-up size if needed, and special-case the few places in the sparse inodes code where we actually need the smaller value to validate on-disk metadata. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Migrate all of the inode geometry setup code from xfs_mount.c into a single libxfs function that we can share with xfsprogs. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Separate the inode geometry information into a distinct structure. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 27 4月, 2019 2 次提交
-
-
由 Darrick J. Wong 提交于
"reclaim" is used throughout the icache code to mean reclamation of incore inode structures. It's also used for two helper functions that toggle background deletion of speculative preallocations. Separate the second of the two uses to make things less confusing. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Darrick J. Wong 提交于
Add a percpu counter to track the number of blocks directly reserved for delayed allocations on the data device. This counter (in contrast to i_delayed_blks) does not track allocated CoW staging extents or anything going on with the realtime device. It will be used in the upcoming summary counter scrub function to check the free block counts without having to freeze the filesystem or walk all the inodes to find the delayed allocations. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 15 4月, 2019 3 次提交
-
-
由 Darrick J. Wong 提交于
If we know the filesystem metadata isn't healthy during unmount, we want to encourage the administrator to run xfs_repair right away. We can't do this if BAD_SUMMARY will cause an unclean log unmount to force summary recalculation, so turn it off if the fs is bad. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Replace the BAD_SUMMARY mount flag with calls to the equivalent health tracking code. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Add the necessary in-core metadata fields to keep track of which parts of the filesystem have been observed and which parts were observed to be unhealthy, and print a warning at unmount time if we have unfixed problems. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 12 2月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
Use a rhashtable to cache the unlinked list incore. This should speed up unlinked processing considerably when there are a lot of inodes on the unlinked list because iunlink_remove no longer has to traverse an entire bucket list to find which inode points to the one being removed. The incore list structure records "X.next_unlinked = Y" relations, with the rhashtable using Y to index the records. This makes finding the inode X that points to a inode Y very quick. If our cache fails to find anything we can always fall back on the old method. FWIW this drastically reduces the amount of time it takes to remove inodes from the unlinked list. I wrote a program to open a lot of O_TMPFILE files and then close them in the same order, which takes a very long time if we have to traverse the unlinked lists. With the ptach, I see: + /d/t/tmpfile/tmpfile Opened 193531 files in 6.33s. Closed 193531 files in 5.86s real 0m12.192s user 0m0.064s sys 0m11.619s + cd / + umount /mnt real 0m0.050s user 0m0.004s sys 0m0.030s And without the patch: + /d/t/tmpfile/tmpfile Opened 193588 files in 6.35s. Closed 193588 files in 751.61s real 12m38.853s user 0m0.084s sys 12m34.470s + cd / + umount /mnt real 0m0.086s user 0m0.000s sys 0m0.060s Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 13 12月, 2018 2 次提交
-
-
由 Darrick J. Wong 提交于
Store the inode cluster alignment information in units of inodes and blocks in the mount data so that we don't have to keep recalculating them. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Store the number of inodes and blocks per inode cluster in the mount data so that we don't have to keep recalculating them. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 13 8月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Since the sb write verifier trips on bad icounts, we should also force a mount time recalculation of the summary counters if the icount is bad. This helps us avoid blowing up at freeze/unmount time when the bad counter gets written back out. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
-
- 01 8月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Move the per-AG busy extent tree initialization to the per-ag structure initialization since we don't want online repair to leak the old tree. We only deconstruct the tree at unmount time, so this should be safe. This also enables us to eliminate the commented out initialization in the xfsprogs libxfs. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 27 7月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Replace the IRELE macro with a proper function so that we can do proper typechecking and so that we can stop open-coding iput in scrub, which means that we'll be able to ftrace inode lifetimes going through scrub correctly. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 24 7月, 2018 2 次提交
-
-
由 Darrick J. Wong 提交于
Use the "bad summary count" mount flag from the previous patch to skip writing the unmount record to force log recovery at the next mount, which will recalculate the summary counters for us. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Filippo Giunchedi complained that xfs doesn't even perform basic sanity checks of the fs summary counters at mount time. Therefore, recalculate the summary counters from the AGFs after log recovery if the counts were bad (or we had to recover the fs). Enhance the recalculation routine to fail the mount entirely if the new values are also obviously incorrect. We use a mount state flag to record the "bad summary count" state so that the (subsequent) online fsck patches can detect subtlely incorrect counts and set the flag; clear it userspace asks for a repair; or force a recalculation at the next mount if nobody fixes it by unmount time. Reported-by: NFilippo Giunchedi <fgiunchedi@wikimedia.org> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 07 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 06 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
When looking up the root inode at mount time, we don't actually do any verification to check that the inode is allocated and accounted for correctly in the INOBT. Make the checks on the root inode more robust by making it an untrusted lookup. This forces the inode lookup to use the inode btree to verify the inode is allocated and mapped correctly to disk. This will also have the effect of catching a significant number of AGI/INOBT related corruptions in AG 0 at mount time. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 16 5月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Rebuilding the reverse-mapping tree requires us to quiesce all inodes in the filesystem, so we must stop background reclamation of post-EOF and CoW prealloc blocks. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 26 3月, 2018 1 次提交
-
-
由 Brian Foster 提交于
Most of the generic data structures embedded in xfs_mount are dynamically initialized immediately after mp is allocated. A few fields are left out and initialized during the xfs_mountfs() sequence, after mp has been attached to the superblock. To clean this up and help prevent premature access of associated fields, refactor xfs_mount allocation and all dependent init calls into a new helper. This self-documents that all low level data structures (i.e., locks, trees, etc.) should be initialized before xfs_mount is attached to the superblock. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 12 3月, 2018 1 次提交
-
-
由 Eric Sandeen 提交于
The dmevmask structure member is a dmapi leftover; it's set here and there but never actually used. Remove it. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 13 1月, 2018 1 次提交
-
-
由 Xiongwei Song 提交于
The mutex pag_ici_reclaim_lock of xfs_perag_t structure is initialized in xfs_initialize_perag. If happen errors in xfs_initialize_perag, or free resources in xfs_free_perag, wo need to destroy the mutex before free perag. Signed-off-by: NXiongwei Song <sxwjean@me.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 10 11月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
When mounting fails, we must force-reclaim inodes (and disable delayed reclaim) /after/ the realtime and quota control have let go of the realtime and quota inodes. Without this, we corrupt the timer list and cause other weird problems. Found by xfs/376 fuzzing u3.bmbt[0].lastoff on an rmap filesystem to force a bogus post-eof extent reclaim that causes the fs to go down. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 12 10月, 2017 1 次提交
-
-
由 Thomas Meyer 提交于
Bool initializations should use true and false. Bool tests don't need comparisons. Signed-off-by: NThomas Meyer <thomas@m3y3r.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 18 8月, 2017 2 次提交
-
-
由 Darrick J. Wong 提交于
If we fail a mount on account of cow recovery errors, it's possible that a previous quotacheck left some dquots in memory. The bailout clause of xfs_mountfs forgets to purge these, and so we leak them. Fix that. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Way back when we established inode block-map redo log items, it was discovered that we needed to prevent the VFS from evicting inodes during log recovery because any given inode might be have bmap redo items to replay even if the inode has no link count and is ultimately deleted, and any eviction of an unlinked inode causes the inode to be truncated and freed too early. To make this possible, we set MS_ACTIVE so that inodes would not be torn down immediately upon release. Unfortunately, this also results in the quota inodes not being released at all if a later part of the mount process should fail, because we never reclaim the inodes. So, set MS_ACTIVE right before we do the last part of log recovery and clear it immediately after we finish the log recovery so that everything will be torn down properly if we abort the mount. Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped") Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 28 6月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
Remove the xfs_etest structure in favor of a per-mountpoint structure. This will give us the flexibility to set as many error injection points as we want, and later enable us to set up sysfs knobs to set the trigger frequency as we wish. This comes at a cost of higher memory use, but unti we hit 1024 injection points (we're at 29) or a lot of mounts this shouldn't be a huge issue. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
-
- 21 6月, 2017 1 次提交
-
-
由 Nikolay Borisov 提交于
Currently, percpu_counter_add is a wrapper around __percpu_counter_add which is preempt safe due to explicit calls to preempt_disable. Given how __ prefix is used in percpu related interfaces, the naming unfortunately creates the false sense that __percpu_counter_add is less safe than percpu_counter_add. In terms of context-safety, they're equivalent. The only difference is that the __ version takes a batch parameter. Make this a bit more explicit by just renaming __percpu_counter_add to percpu_counter_add_batch. This patch doesn't cause any functional changes. tj: Minor updates to patch description for clarity. Cosmetic indentation updates. Signed-off-by: NNikolay Borisov <nborisov@suse.com> Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Josef Bacik <jbacik@fb.com> Cc: David Sterba <dsterba@suse.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Jan Kara <jack@suse.com> Cc: Jens Axboe <axboe@fb.com> Cc: linux-mm@kvack.org Cc: "David S. Miller" <davem@davemloft.net>
-
- 20 6月, 2017 1 次提交
-
-
由 Darrick J. Wong 提交于
This is a purely mechanical patch that removes the private __{u,}int{8,16,32,64}_t typedefs in favor of using the system {u,}int{8,16,32,64}_t typedefs. This is the sed script used to perform the transformation and fix the resulting whitespace and indentation errors: s/typedef\t__uint8_t/typedef __uint8_t\t/g s/typedef\t__uint/typedef __uint/g s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g s/__uint8_t\t/__uint8_t\t\t/g s/__uint/uint/g s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g s/__int/int/g /^typedef.*int[0-9]*_t;$/d Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 05 6月, 2017 3 次提交
-
-
由 Christoph Hellwig 提交于
For some file systems we still memcpy into it, but in various places this already allows us to use the proper uuid helpers. More to come.. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Acked-by: Mimi Zohar <zohar@linux.vnet.ibm.com> (Changes to IMA/EVM) Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
由 Amir Goldstein 提交于
Use the common helper uuid_is_null() and remove the xfs specific helper uuid_is_nil(). The common helper does not check for the NULL pointer value as xfs helper did, but xfs code never calls the helper with a pointer that can be NULL. Conform comments and warning strings to use the term 'null uuid' instead of 'nil uuid', because this is the terminology used by lib/uuid.c and its users. It is also the terminology used in userspace by libuuid and xfsprogs. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> [hch: remove now unused uuid.[ch]] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com>
-
由 Christoph Hellwig 提交于
Opencode uuid_getnodeuniq in the only caller, and directly decode the uuid_t representation instead of using a structure cast for it. Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 28 4月, 2017 1 次提交
-
-
由 Amir Goldstein 提交于
Copy the uuid of the filesystem to struct super_block s_uuid field, as several other filesystems already do. Copy regardless of the nouuid mount option, because other filesystems also do not guaranty uniqueness of the s_uuid field in super_block struct. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 08 3月, 2017 1 次提交
-
-
由 Chandan Rajendra 提交于
When block size is larger than inode cluster size, the call to XFS_B_TO_FSBT(mp, mp->m_inode_cluster_size) returns 0. Also, mkfs.xfs would have set xfs_sb->sb_inoalignmt to 0. Hence in xfs_set_inoalignment(), xfs_mount->m_inoalign_mask gets initialized to -1 instead of 0. However, xfs_mount->m_sinoalign would get correctly intialized to 0 because for every positive value of xfs_mount->m_dalign, the condition "!(mp->m_dalign & mp->m_inoalign_mask)" would evaluate to false. Also, xfs_imap() worked fine even with xfs_mount->m_inoalign_mask having -1 as the value because blks_per_cluster variable would have the value 1 and hence we would never have a need to use xfs_mount->m_inoalign_mask to compute the inode chunk's agbno and offset within the chunk. Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 10 2月, 2017 3 次提交
-
-
由 Christoph Hellwig 提交于
Instead we submit the discard requests and use another workqueue to release the extents from the extent busy list. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Currently we force the log and simply try again if we hit a busy extent, but especially with online discard enabled it might take a while after the log force for the busy extents to disappear, and we might have already completed our second pass. So instead we add a new waitqueue and a generation counter to the pag structure so that we can do wakeups once we've removed busy extents, and we replace the single retry with an unconditional one - after all we hold the AGF buffer lock, so no other allocations or frees can be racing with us in this AG. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Bill O'Donnell 提交于
If pag cannot be allocated, the current error exit path will trip a null pointer deference error when calling xfs_buf_hash_destroy with a null pag. Fix this by adding a new error exit labels and jumping to those accordingly, avoiding the hash destroy and unnecessary kmem_free on pag. Up to three things need to be properly unwound: 1) pag memory allocation 2) xfs_buf_hash_init 3) radix_tree_insert For any given iteration through the loop, any of the above which succeed must be unwound for /this/ pag, and then all prior initialized pags must be unwound. Addresses-Coverity-Id: 1397628 ("Dereference after null check") Reported-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 07 12月, 2016 1 次提交
-
-
由 Lucas Stach 提交于
On filesystems with a lot of metadata and in metadata intensive workloads xfs_buf_find() is showing up at the top of the CPU cycles trace. Most of the CPU time is spent on CPU cache misses while traversing the rbtree. As the buffer cache does not need any kind of ordering, but fast lookups a hashtable is the natural data structure to use. The rhashtable infrastructure provides a self-scaling hashtable implementation and allows lookups to proceed while the table is going through a resize operation. This reduces the CPU-time spent for the lookups to 1/3 even for small filesystems with a relatively small number of cached buffers, with possibly much larger gains on higher loaded filesystems. [dchinner: reduce minimum hash size to an acceptable size for large filesystems with many AGs with no active use.] [dchinner: remove stale rbtree asserts.] [dchinner: use xfs_buf_map for compare function argument.] [dchinner: make functions static.] [dchinner: remove redundant comments.] Signed-off-by: NLucas Stach <dev@lynxeye.de> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 20 10月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
As part of the inode block map intent log item recovery process, we had to set the IRECOVERY flag to prevent an unlinked inode from being truncated during the first iput call. This required us to set MS_ACTIVE so that iput puts the inode on the lru instead of immediately evicting the inode. Unfortunately, if the mount fails later on, the inodes that have been loaded (root dir and realtime) actually need to be evicted since we're aborting the mount. If we don't clear MS_ACTIVE in the failure step, those inodes are not evicted and therefore leak. The leak was found by running xfs/130 and rmmoding xfs immediately after the test. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 10月, 2016 1 次提交
-
-
由 Darrick J. Wong 提交于
Trim CoW reservations made on behalf of a cowextsz hint if they get too old or we run low on quota, so long as we don't have dirty data awaiting writeback or directio operations in progress. Garbage collection of the cowextsize extents are kept separate from prealloc extent reaping because setting the CoW prealloc lifetime to a (much) higher value than the regular prealloc extent lifetime has been useful for combatting CoW fragmentation on VM hosts where the VMs experience bursty write behaviors and we can keep the utilization ratios low enough that we don't start to run out of space. IOWs, it benefits us to keep the CoW fork reservations around for as long as we can unless we run out of blocks or hit inode reclaim. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-