- 31 10月, 2013 2 次提交
-
-
由 Dave Chinner 提交于
The remaining non-vectorised code for the directory structure is the node format blocks. This is shared with the attribute tree, and so is slightly more complex to vectorise. Introduce a "non-directory" directory ops structure that is attached to all non-directory inodes so that attribute operations can be vectorised for all inodes. Once we do this, we can vectorise all the da btree operations. Because this patch adds more infrastructure than it removes the binary size does not decrease: text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 792350 96802 1096 890248 d9588 fs/xfs/xfs.o.p2 789293 96802 1096 887191 d8997 fs/xfs/xfs.o.p3 789005 96802 1096 886903 d8997 fs/xfs/xfs.o.p4 789061 96802 1096 886959 d88af fs/xfs/xfs.o.p5 789733 96802 1096 887631 d8b4f fs/xfs/xfs.o.p6 Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Lots of the dir code now goes through switches to determine what is the correct on-disk format to parse. It generally involves a "xfs_sbversion_hasfoo" check, deferencing the superblock version and feature fields and hence touching several cache lines per operation in the process. Some operations do multiple checks because they nest conditional operations and they don't pass the information in a direct fashion between each other. Hence, add an ops vector to the xfs_inode structure that is configured when the inode is initialised to point to all the correct decode and encoding operations. This will significantly reduce the branchiness and cacheline footprint of the directory object decoding and encoding. This is the first patch in a series of conversion patches. It will introduce the ops structure, the setup of it and add the first operation to the vector. Subsequent patches will convert directory ops one at a time to keep the changes simple and obvious. Just this patch shows the benefit of such an approach on code size. Just converting the two shortform dir operations as this patch does decreases the built binary size by ~1500 bytes: $ size fs/xfs/xfs.o.orig fs/xfs/xfs.o.p1 text data bss dec hex filename 794490 96802 1096 892388 d9de4 fs/xfs/xfs.o.orig 792986 96802 1096 890884 d9804 fs/xfs/xfs.o.p1 $ That's a significant decrease in the instruction cache footprint of the directory code for such a simple change, and indicates that this approach is definitely worth pursuing further. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 13 8月, 2013 4 次提交
-
-
由 Jie Liu 提交于
Introduce a new structure xfs_trans_res to hold transaction reservation item info per log ticket. We also need to improve xfs_trans_resv_calc() by initializing the log count as well as log flags for permanent log reservation. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The struct xfs_perag has many kernel-only definitions in it, requiring a __KERNEL__ guard so userspace can use it to. Move it to xfs_mount.h so that it it kernel-only, and let userspace redefine it's own version of the structure containing only what it needs. This gets rid of another __KERNEL__ check in the XFS header files. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
xfs_mount.c is shared with userspace, but the only functions that are shared are to do with physical superblock manipulations. This means that less than 25% of the xfs_mount.c code is actually shared with userspace. Move all the superblock functions to xfs_sb.c and share that instead with libxfs. Note that this will leave all the in-core transaction related superblock counter modifications in xfs_mount.c as none of that is shared with userspace. With a few more small changes, xfs_mount.h won't need to be shared with userspace anymore, either. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The transaction reservation size calculations is used by both kernel and userspace, but most of the transaction code in xfs_trans.c is kernel specific. Split all the transaction reservation code out into it's own files to make sharing with userspace simpler. This just leaves kernel-only definitions in xfs_trans.h, so it doesn't need to be shared with userspace anymore, either. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 20 6月, 2013 1 次提交
-
-
由 Jie Liu 提交于
XFS_MOUNT_RETERR is going to be set at xfs_parseargs() if mp->m_dalign is enabled, so any time we enter "if (mp->m_dalign)" branch in xfs_update_alignment(), XFS_MOUNT_RETERR is set and so we always be emitting a warning and returning an error. Hence, we can remove it and get rid of a couple of redundant check up against it at xfs_upate_alignment(). Thanks Dave Chinner for the suggestions of simplify the code in xfs_parseargs(). Signed-off-by: NJie Liu <jeff.liu@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Cc: Mark Tinguely <tinguely@sgi.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 18 6月, 2013 1 次提交
-
-
由 Jeff Liu 提交于
Remove struct xfs_chash from struct xfs_mount as there is no user of it nowadays. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 28 4月, 2013 1 次提交
-
-
由 Dave Chinner 提交于
With the addition of CRCs, there is such a wide and varied change to the on disk format that it makes sense to bump the superblock version number rather than try to use feature bits for all the new functionality. This commit introduces all the new superblock fields needed for all the new functionality: feature masks similar to ext4, separate project quota inodes, a LSN field for recovery and the CRC field. This commit does not bump the superblock version number, however. That will be done as a separate commit at the end of the series after all the new functionality is present so we switch it all on in one commit. This means that we can slowly introduce the changes without them being active and hence maintain bisectability of the tree. This patch is based on a patch originally written by myself back from SGI days, which was subsequently modified by Christoph Hellwig. There is relatively little of that patch remaining, but the history of the patch still should be acknowledged here. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 15 3月, 2013 1 次提交
-
-
由 Jeff Liu 提交于
Looks the old m_inode_shrink is obsoleted as we perform inodes reclaim per AG via m_reclaim_workqueue, this patch remove it from the xfs_mount structure if so. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Cc: Dave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 02 2月, 2013 7 次提交
-
-
由 Jeff Liu 提交于
Currently, we calculate the attribute set transaction log space reservation at runtime in two parts: 1) XFS_ATTRSET_LOG_RES() which is calcuated out at mount time. 2) ((ext * (mp)->m_sb.sb_sectsize) + \ (ext * XFS_FSB_TO_B((mp), XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK))) + \ (128 * (ext + (ext * XFS_BM_MAXLEVELS(mp, XFS_ATTR_FORK)))))) which is calculated out at runtime since it depend on the given extent length in blocks. This patch renamed XFS_ATTRSET_LOG_RES(mp) to XFS_ATTRSETM_LOG_RES(mp) to indicate that it is figured out at mount time. Introduce XFS_ATTRSETRT_LOG_RES(mp) which would be used to calculate out the unit of the log space reservation for one block. In this way, the total runtime space for the given extent length can be figured out by: XFS_ATTRSETM_LOG_RES(mp) + XFS_ATTRSETRT_LOG_RES(mp) * ext Signed-off-by: NJie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Jeff Liu 提交于
Introduce a new transaction space reservation XFS_SB_LOG_RES() for those transactions that need to modify the superblock on disk. Signed-off-by: NJie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Jeff Liu 提交于
Convert the calculation for end of quotaoff log space reservation from runtime to mount time. Signed-off-by: NJie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Jeff Liu 提交于
Convert the calculation of quota off transaction log space reservation from runtime to mount time. Signed-off-by: NJie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Jeff Liu 提交于
The disk quota allocation log space reservation is calcuated at runtime, this patch does it at mount time. Signed-off-by: NJie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Jeff Liu 提交于
For adjusting quota limits transactions, we calculate out the log space reservation at runtime, this patch does it at mount time. Signed-off-by: NJie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Jeff Liu 提交于
The transaction log space for clearing/reseting the quota flags is calculated out at runtime, this patch can figure it out at mount time. Signed-off-by: NJie Liu <jeff.liu@oracle.com> CC: Dave Chinner <david@fromorbit.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 16 11月, 2012 3 次提交
-
-
由 Dave Chinner 提交于
To separate the verifiers from iodone functions and associate read and write verifiers at the same time, introduce a buffer verifier operations structure to the xfs_buf. This avoids the need for assigning the write verifier, clearing the iodone function and re-running ioend processing in the read verifier, and gets rid of the nasty "b_pre_io" name for the write verifier function pointer. If we ever need to, it will also be easier to add further content specific callbacks to a buffer with an ops structure in place. We also avoid needing to export verifier functions, instead we can simply export the ops structures for those that are needed outside the function they are defined in. This patch also fixes a directory block readahead verifier issue it exposed. This patch also adds ops callbacks to the inode/alloc btree blocks initialised by growfs. These will need more work before they will work with CRCs. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NPhil White <pwhite@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Metadata buffers that are read from disk have write verifiers already attached to them, but newly allocated buffers do not. Add appropriate write verifiers to all new metadata buffers. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Add a superblock verify callback function and pass it into the buffer read functions. Remove the now redundant verification code that is currently in use. Adding verification shows that secondary superblocks never have their "sb_inprogress" flag cleared by mkfs.xfs, so when validating the secondary superblocks during a grow operation we have to avoid checking this field. Even if we fix mkfs, we will still have to ignore this field for verification purposes unless a version of mkfs that does not have this bug was used. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NPhil White <pwhite@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 09 11月, 2012 1 次提交
-
-
由 Brian Foster 提交于
Create a new mount workqueue and delayed_work to enable background scanning and freeing of eofblocks inodes. The scanner kicks in once speculative preallocation occurs and stops requeueing itself when no eofblocks inodes exist. The scan interval is based on the new 'speculative_prealloc_lifetime' tunable (default to 5m). The background scanner performs unfiltered, best effort scans (which skips inodes under lock contention or with a dirty cache mapping). Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 18 10月, 2012 4 次提交
-
-
由 Dave Chinner 提交于
xfs_sync.c now only contains inode reclaim functions and inode cache iteration functions. It is not related to sync operations anymore. Rename to xfs_icache.c to reflect it's contents and prepare for consolidation with the other inode cache file that exists (xfs_iget.c). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
With the syncd functions moved to the log and/or removed, the syncd workqueue is the only remaining bit left. It is used by the log covering/ail pushing work, as well as by the inode reclaim work. Given how cheap workqueues are these days, give the log and inode reclaim work their own work queues and kill the syncd work queue. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
We don't do any data writeback from XFS any more - the VFS is completely responsible for that, including for freeze. We can replace the remaining caller with a VFS level function that achieves the same thing, but without conflicting with current writeback work. This means we can remove the flush_work and xfs_flush_inodes() - the VFS functionality completely replaces the internal flush queue for doing this writeback work in a separate context to avoid stack overruns. This does have one complication - it cannot be called with page locks held. Hence move the flushing of delalloc space when ENOSPC occurs back up into xfs_file_aio_buffered_write when we don't hold any locks that will stall writeback. Unfortunately, writeback_inodes_sb_if_idle() is not sufficient to trigger delalloc conversion fast enough to prevent spurious ENOSPC whent here are hundreds of writers, thousands of small files and GBs of free RAM. Hence we need to use sync_sb_inodes() to block callers while we wait for writeback like the previous xfs_flush_inodes implementation did. That means we have to hold the s_umount lock here, but because this call can nest inside i_mutex (the parent directory in the create case, held by the VFS), we have to use down_read_trylock() to avoid potential deadlocks. In practice, this trylock will succeed on almost every attempt as unmount/remount type operations are exceedingly rare. Note: we always need to pass a count of zero to generic_file_buffered_write() as the previously written byte count. We only do this by accident before this patch by the virtue of ret always being zero when there are no errors. Make this explicit rather than needing to specifically zero ret in the ENOSPC retry case. Signed-off-by: NDave Chinner <dchinner@redhat.com> Tested-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The only thing the periodic sync work does now is flush the AIL and idle the log. These are really functions of the log code, so move the work to xfs_log.c and rename it appropriately. The only wart that this leaves behind is the xfssyncd_centisecs sysctl, otherwise the xfssyncd is dead. Clean up any comments that related to xfssyncd to reflect it's passing. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 17 8月, 2012 1 次提交
-
-
由 Alex Elder 提交于
I noticed that "struct xfs_mount_args" was still declared in "fs/xfs/xfs_mount.h". That struct doesn't even exist any more (and is obviously not referenced elsewhere in that header file). While in there, delete four other unneeded struct declarations in that file. Doing so highlights that "fs/xfs/xfs_trace.h" was relying indirectly on "xfs_mount.h" to be #included in order to declare "struct xfs_bmbt_irec", so add that declaration to resolve that issue. Signed-off-by: NAlex Elder <elder@inktank.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 31 7月, 2012 1 次提交
-
-
由 Jan Kara 提交于
Generic code now blocks all writers from standard write paths. So we add blocking of all writers coming from ioctl (we get a protection of ioctl against racing remount read-only as a bonus) and convert xfs_file_aio_write() to a non-racy freeze protection. We also keep freeze protection on transaction start to block internal filesystem writes such as removal of preallocated blocks. CC: Ben Myers <bpm@sgi.com> CC: Alex Elder <elder@kernel.org> CC: xfs@oss.sgi.com Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 6月, 2012 2 次提交
-
-
由 Mark Tinguely 提交于
Rename the XFS log structure to xlog to help crash distinquish it from the other logs in Linux. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Mark Tinguely 提交于
Rename the XFS log structure to xlog to help crash distinquish it from the other logs in Linux. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 15 6月, 2012 2 次提交
-
-
由 Dave Chinner 提交于
XFS_MAXIOFFSET() is just a simple macro that resolves to mp->m_maxioffset. It doesn't need to exist, and it just makes the code unnecessarily loud and shouty. Make it quiet and easy to read. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The m_maxioffset field in the struct xfs_mount contains the same value as the superblock s_maxbytes field. There is no need to carry two copies of this limit around, so use the VFS superblock version. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 15 5月, 2012 2 次提交
-
-
由 Dave Chinner 提交于
Doing background CIL flushes adds significant latency to whatever async transaction that triggers it. To avoid blocking async transactions on things like waiting for log buffer IO to complete, move the CIL push off into a workqueue. By moving the push work into a workqueue, we remove all the latency that the commit adds from the foreground transaction commit path. This also means that single threaded workloads won't do the CIL push procssing, leaving them more CPU to do more async transactions. To do this, we need to keep track of the sequence number we have pushed work for. This avoids having many transaction commits attempting to schedule work for the same sequence, and ensures that we only ever have one push (background or forced) in progress at a time. It also means that we don't need to take the CIL lock in write mode to check for potential background push races, which reduces lock contention. To avoid potential issues with "smart" IO schedulers, don't use the workqueue for log force triggered flushes. Instead, do them directly so that the log IO is done directly by the process issuing the log force and so doesn't get stuck on IO elevator queue idling incorrectly delaying the log IO from the workqueue. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Christoph Hellwig 提交于
Now that we write back all metadata either synchronously or through the AIL we can simply implement metadata freezing in terms of emptying the AIL. The implementation for this is fairly simply and straight-forward: A new routine is added that asks the xfsaild to push the AIL to the end and waits for it to complete and send a wakeup. The routine will then loop if the AIL is not actually empty, and continue to do so until the AIL is compeltely empty. We keep an inode reclaim pass in the freeze process to avoid having memory pressure have to reclaim inodes that require dirtying the filesystem to be reclaimed after the freeze has completed. This means we can also treat unmount in the exact same way as freeze. As an upside we can now remove the radix tree based inode writeback and xfs_unmountfs_writesb. [ Dave Chinner: - Cleaned up commit message. - Added inode reclaim passes back into freeze. - Cleaned up wakeup mechanism to avoid the use of a new sleep counter variable. ] Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 06 3月, 2012 1 次提交
-
-
由 Christoph Hellwig 提交于
The new concurrency managed workqueues are cheap enough that we can create per-filesystem instead of global workqueues. This allows us to remove the trylock or defer scheme on the ilock, which is not helpful once we have outstanding log reservations until finishing a size update. Also allow the default concurrency on this workqueues so that I/O completions blocking on the ilock for one inode do not block process for another inode. Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 04 2月, 2012 1 次提交
-
-
由 Chandra Seetharaman 提交于
Change xfs_sb_from_disk() interface to take a mount pointer instead of a superblock pointer. This is to print mount point specific error messages in future fixes. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 09 12月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
The delaylog mode has been the default for a long time, and the nodelaylog option has been scheduled for removal in Linux 3.3. Remove it and code only used by it now that we have opened the 3.3 window. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 21 7月, 2011 1 次提交
-
-
由 Chandra Seetharaman 提交于
Remove the second parameter to xfs_sb_count() since all callers of the function set them. Also, fix the header comment regarding it being called periodically. Signed-off-by: NChandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 25 5月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
Now that we have reliably tracking of deleted extents in a transaction we can easily implement "online" discard support which calls blkdev_issue_discard once a transaction commits. The actual discard is a two stage operation as we first have to mark the busy extent as not available for reuse before we can start the actual discard. Note that we don't bother supporting discard for the non-delaylog mode. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 08 4月, 2011 2 次提交
-
-
由 Dave Chinner 提交于
Background inode reclaim needs to run more frequently that the XFS syncd work is run as 30s is too long between optimal reclaim runs. Add a new periodic work item to the xfs syncd workqueue to run a fast, non-blocking inode reclaim scan. Background inode reclaim is kicked by the act of marking inodes for reclaim. When an AG is first marked as having reclaimable inodes, the background reclaim work is kicked. It will continue to run periodically untill it detects that there are no more reclaimable inodes. It will be kicked again when the first inode is queued for reclaim. To ensure shrinker based inode reclaim throttles to the inode cleaning and reclaim rate but still reclaim inodes efficiently, make it kick the background inode reclaim so that when we are low on memory we are trying to reclaim inodes as efficiently as possible. This kick shoul d not be necessary, but it will protect against failures to kick the background reclaim when inodes are first dirtied. To provide the rate throttling, make the shrinker pass do synchronous inode reclaim so that it blocks on inodes under IO. This means that the shrinker will reclaim inodes rather than just skipping over them, but it does not adversely affect the rate of reclaim because most dirty inodes are already under IO due to the background reclaim work the shrinker kicked. These two modifications solve one of the two OOM killer invocations Chris Mason reported recently when running a stress testing script. The particular workload trigger for the OOM killer invocation is where there are more threads than CPUs all unlinking files in an extremely memory constrained environment. Unlike other solutions, this one does not have a performance impact on performance when memory is not constrained or the number of concurrent threads operating is <= to the number of CPUs. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
On of the problems with the current inode flush at ENOSPC is that we queue a flush per ENOSPC event, regardless of how many are already queued. Thi can result in hundreds of queued flushes, most of which simply burn CPU scanned and do no real work. This simply slows down allocation at ENOSPC. We really only need one active flush at a time, and we can easily implement that via the new xfs_syncd_wq. All we need to do is queue a flush if one is not already active, then block waiting for the currently active flush to complete. The result is that we only ever have a single ENOSPC inode flush active at a time and this greatly reduces the overhead of ENOSPC processing. On my 2p test machine, this results in tests exercising ENOSPC conditions running significantly faster - 042 halves execution time, 083 drops from 60s to 5s, etc - while not introducing test regressions. This allows us to remove the old xfssyncd threads and infrastructure as they are no longer used. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com>
-