- 02 10月, 2013 1 次提交
-
-
由 Jie Liu 提交于
Get rid of function variable count from xfs_iomap_write_allocate() as it is unused. Additionally, checkpatch warn me of the following for this change: WARNING: extern prototypes should be avoided in .h files +extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t, So this patch also remove all extern function prototypes at xfs_iomap.h to suppress it to make this code style in consistent manner in this file. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 13 8月, 2013 4 次提交
-
-
由 Jie Liu 提交于
With the new xfs_trans_res structure has been introduced, the log reservation size, log count as well as log flags are pre-initialized at mount time. So it's time to refine xfs_trans_reserve() interface to be more neat. Also, introduce a new helper M_RES() to return a pointer to the mp->m_resv structure to simplify the input. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
There are a few small helper functions in xfs_util, all related to xfs_inode modifications. Move them all to xfs_inode.c so all xfs_inode operations are consiolidated in the one place. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
There is a bunch of code in xfs_bmap.c that is kernel specific and not shared with userspace. To minimise the difference between the kernel and userspace code, shift this unshared code to xfs_bmap_util.c, and the declarations to xfs_bmap_util.h. The biggest issue here is xfs_bmap_finish() - userspace has it's own definition of this function, and so we need to move it out of xfs_bmap.[ch]. This means several other files need to include xfs_bmap_util.h as well. It also introduces and interesting dance for the stack switching code in xfs_bmapi_allocate(). The stack switching/workqueue code is actually moved to xfs_bmap_util.c, so that userspace can simply use a #define in a header file to connect the dots without needing to know about the stack switch code at all. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The on disk format definitions of the on-disk dquot, log formats and quota off log formats are all intertwined with other definitions for quotas. Separate them out into their own header file so they can easily be shared with userspace. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 28 6月, 2013 1 次提交
-
-
由 Dave Chinner 提交于
Dedicated small file workloads have been seeing significant free space fragmentation causing premature inode allocation failure when large inode sizes are in use. A particular test case showed that a workload that runs to a real ENOSPC on 256 byte inodes would fail inode allocation with ENOSPC about about 80% full with 512 byte inodes, and at about 50% full with 1024 byte inodes. The same workload, when run with -o allocsize=4096 on 1024 byte inodes would run to being 100% full before giving ENOSPC. That is, no freespace fragmentation at all. The issue was caused by the specific IO pattern the application had - the framework it was using did not support direct IO, and so it was emulating it by using fadvise(DONT_NEED). The result was that the data was getting written back before the speculative prealloc had been trimmed from memory by the close(), and so small single block files were being allocated with 2 blocks, and then having one truncated away. The result was lots of small 4k free space extents, and hence each new 8k allocation would take another 8k from contiguous free space and turn it into 4k of allocated space and 4k of free space. Hence inode allocation, which requires contiguous, aligned allocation of 16k (256 byte inodes), 32k (512 byte inodes) or 64k (1024 byte inodes) can fail to find sufficiently large freespace and hence fail while there is still lots of free space available. There's a simple fix for this, and one that has precendence in the allocator code already - don't do speculative allocation unless the size of the file is larger than a certain size. In this case, that size is the minimum default preallocation size: mp->m_writeio_blocks. And to keep with the concept of being nice to people when the files are still relatively small, cap the prealloc to mp->m_writeio_blocks until the file goes over a stripe unit is size, at which point we'll fall back to the current behaviour based on the last extent size. This will effectively turn off speculative prealloc for very small files, keep preallocation low for small files, and behave as it currently does for any file larger than a stripe unit. This completely avoids the freespace fragmentation problem this particular IO pattern was causing. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 23 3月, 2013 4 次提交
-
-
由 Brian Foster 提交于
Add a tracepoint to provide some feedback on preallocation size calculation. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Brian Foster 提交于
Introduce the need_throttle() and calc_throttle() functions to independently check whether throttling is required for a particular dquot and if so, calculate the associated throttling metrics based on the state of the quota. We use the same general algorithm to calculate the throttle shift as for global free space with the exception of using three stages rather than five. Update xfs_iomap_prealloc_size() to use the smallest available prealloc size based on each of the constraints and apply the maximum shift to obtain the throttled preallocation size. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Brian Foster 提交于
The round down occurs towards the beginning of the function. Push it down after throttling has occurred. This is to support adding further transformations to 'alloc_blocks' that might not preserve power-of-two alignment (and thus could lead to rounding down multiple times). Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Brian Foster 提交于
The majority of xfs_iomap_prealloc_size() executes within the check for lack of default I/O size. Reverse the logic to remove the extra indentation. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBen Myers <bpm@sgi.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 19 3月, 2013 2 次提交
-
-
由 Mark Tinguely 提交于
Fix the return type of xfs_iomap_eof_prealloc_initial_size() to xfs_fsblock_t to reflect the fact that the return value may be an unsigned 64 bits if XFS_BIG_BLKNOS is defined. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit e8108ced)
-
由 Brian Foster 提交于
If freesp == 0, we could end up in an infinite loop while squashing the preallocation. Break the loop when we've killed the prealloc entirely. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com> (cherry picked from commit e78c420b)
-
- 08 3月, 2013 3 次提交
-
-
由 Mark Tinguely 提交于
Fix the return type of xfs_iomap_eof_prealloc_initial_size() to xfs_fsblock_t to reflect the fact that the return value may be an unsigned 64 bits if XFS_BIG_BLKNOS is defined. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Brian Foster 提交于
The updated speculative preallocation algorithm for handling sparse files can becomes less effective in situations with a high number of concurrent, sequential writers. The number of writers and amount of available RAM affect the writeback bandwidth slicing algorithm, which in turn affects the block allocation pattern of XFS. For example, running 32 sequential writers on a system with 32GB RAM, preallocs become fixed at a value of around 128MB (instead of steadily increasing to the 8GB maximum as sequential writes proceed). Update the speculative prealloc heuristic to base the size of the next prealloc on double the size of the preceding extent. This preserves the original aggressive speculative preallocation behavior and continues to accomodate sparse files at a slight cost of increasing the size of preallocated data regions following holes of sparse files. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Brian Foster 提交于
If freesp == 0, we could end up in an infinite loop while squashing the preallocation. Break the loop when we've killed the prealloc entirely. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 15 2月, 2013 1 次提交
-
-
由 Dave Chinner 提交于
Speculative preallocation based on the current file size works well for contiguous files, but is sub-optimal for sparse files where the EOF preallocation can fill holes and result in large amounts of zeros being written when it is not necessary. The algorithm is modified to prevent EOF speculative preallocation from triggering larger allocations on IO patterns of truncate--to-zero-seek-write-seek-write-.... which results in non-sparse files for large files. This, unfortunately, is the way cp now behaves when copying sparse files and so needs to be fixed. What this code does is that it looks at the existing extent adjacent to the current EOF and if it determines that it is a hole we disable speculative preallocation altogether. To avoid the next write from doing a large prealloc, it takes the size of subsequent preallocations from the current size of the existing EOF extent. IOWs, if you leave a hole in the file, it resets preallocation behaviour to the same as if it was a zero size file. Example new behaviour: $ xfs_io -f -c "pwrite 0 31m" \ -c "pwrite 33m 1m" \ -c "pwrite 128m 1m" \ -c "fiemap -v" /mnt/scratch/blah wrote 32505856/32505856 bytes at offset 0 31 MiB, 7936 ops; 0.0000 sec (1.608 GiB/sec and 421432.7439 ops/sec) wrote 1048576/1048576 bytes at offset 34603008 1 MiB, 256 ops; 0.0000 sec (1.462 GiB/sec and 383233.5329 ops/sec) wrote 1048576/1048576 bytes at offset 134217728 1 MiB, 256 ops; 0.0000 sec (1.719 GiB/sec and 450704.2254 ops/sec) /mnt/scratch/blah: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..65535]: 96..65631 65536 0x0 1: [65536..67583]: hole 2048 2: [67584..69631]: 67680..69727 2048 0x0 3: [69632..262143]: hole 192512 4: [262144..264191]: 262240..264287 2048 0x1 Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 29 1月, 2013 1 次提交
-
-
由 Dave Chinner 提交于
There is a window on small filesytsems where specualtive preallocation can be larger than that ENOSPC throttling thresholds, resulting in specualtive preallocation trying to reserve more space than there is space available. This causes immediate ENOSPC to be triggered, prealloc to be turned off and flushing to occur. One the next write (i.e. next 4k page), we do exactly the same thing, and so effective drive into synchronous 4k writes by triggering ENOSPC flushing on every page while in the window between the prealloc size and the ENOSPC prealloc throttle threshold. Fix this by checking to see if the prealloc size would consume all free space, and throttle it appropriately to avoid premature ENOSPC... Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 25 1月, 2013 1 次提交
-
-
由 Dave Chinner 提交于
There is a window on small filesytsems where specualtive preallocation can be larger than that ENOSPC throttling thresholds, resulting in specualtive preallocation trying to reserve more space than there is space available. This causes immediate ENOSPC to be triggered, prealloc to be turned off and flushing to occur. One the next write (i.e. next 4k page), we do exactly the same thing, and so effective drive into synchronous 4k writes by triggering ENOSPC flushing on every page while in the window between the prealloc size and the ENOSPC prealloc throttle threshold. Fix this by checking to see if the prealloc size would consume all free space, and throttle it appropriately to avoid premature ENOSPC... Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 09 11月, 2012 2 次提交
-
-
由 Brian Foster 提交于
Add the XFS_ICI_EOFBLOCKS_TAG inode tag to identify inodes with speculatively preallocated blocks beyond EOF. An inode is tagged when speculative preallocation occurs and untagged either via truncate down or when post-EOF blocks are freed via release or reclaim. The tag management is intentionally not aggressive to prefer simplicity over the complexity of handling all the corner cases under which post-EOF blocks could be freed (i.e., forward truncation, fallocate, write error conditions, etc.). This means that a tagged inode may or may not have post-EOF blocks after a period of time. The tag is eventually cleared when the inode is released or reclaimed. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Certain allocation paths through xfs_bmapi_write() are in situations where we have limited stack available. These are almost always in the buffered IO writeback path when convertion delayed allocation extents to real extents. The current stack switch occurs for userdata allocations, which means we also do stack switches for preallocation, direct IO and unwritten extent conversion, even those these call chains have never been implicated in a stack overrun. Hence, let's target just the single stack overun offended for stack switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that the caller can pass xfs_bmapi_write() to indicate it should switch stacks if it needs to do allocation. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 19 10月, 2012 1 次提交
-
-
由 Dave Chinner 提交于
Certain allocation paths through xfs_bmapi_write() are in situations where we have limited stack available. These are almost always in the buffered IO writeback path when convertion delayed allocation extents to real extents. The current stack switch occurs for userdata allocations, which means we also do stack switches for preallocation, direct IO and unwritten extent conversion, even those these call chains have never been implicated in a stack overrun. Hence, let's target just the single stack overun offended for stack switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that the caller can pass xfs_bmapi_write() to indicate it should switch stacks if it needs to do allocation. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 18 10月, 2012 1 次提交
-
-
由 Dave Chinner 提交于
We don't do any data writeback from XFS any more - the VFS is completely responsible for that, including for freeze. We can replace the remaining caller with a VFS level function that achieves the same thing, but without conflicting with current writeback work. This means we can remove the flush_work and xfs_flush_inodes() - the VFS functionality completely replaces the internal flush queue for doing this writeback work in a separate context to avoid stack overruns. This does have one complication - it cannot be called with page locks held. Hence move the flushing of delalloc space when ENOSPC occurs back up into xfs_file_aio_buffered_write when we don't hold any locks that will stall writeback. Unfortunately, writeback_inodes_sb_if_idle() is not sufficient to trigger delalloc conversion fast enough to prevent spurious ENOSPC whent here are hundreds of writers, thousands of small files and GBs of free RAM. Hence we need to use sync_sb_inodes() to block callers while we wait for writeback like the previous xfs_flush_inodes implementation did. That means we have to hold the s_umount lock here, but because this call can nest inside i_mutex (the parent directory in the create case, held by the VFS), we have to use down_read_trylock() to avoid potential deadlocks. In practice, this trylock will succeed on almost every attempt as unmount/remount type operations are exceedingly rare. Note: we always need to pass a count of zero to generic_file_buffered_write() as the previously written byte count. We only do this by accident before this patch by the virtue of ret always being zero when there are no errors. Make this explicit rather than needing to specifically zero ret in the ENOSPC retry case. Signed-off-by: NDave Chinner <dchinner@redhat.com> Tested-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 31 7月, 2012 1 次提交
-
-
由 Jan Kara 提交于
Generic code now blocks all writers from standard write paths. So we add blocking of all writers coming from ioctl (we get a protection of ioctl against racing remount read-only as a bonus) and convert xfs_file_aio_write() to a non-racy freeze protection. We also keep freeze protection on transaction start to block internal filesystem writes such as removal of preallocated blocks. CC: Ben Myers <bpm@sgi.com> CC: Alex Elder <elder@kernel.org> CC: xfs@oss.sgi.com Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 6月, 2012 2 次提交
-
-
由 Dave Chinner 提交于
XFS_MAXIOFFSET() is just a simple macro that resolves to mp->m_maxioffset. It doesn't need to exist, and it just makes the code unnecessarily loud and shouty. Make it quiet and easy to read. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The m_maxioffset field in the struct xfs_mount contains the same value as the superblock s_maxbytes field. There is no need to carry two copies of this limit around, so use the VFS superblock version. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 21 5月, 2012 1 次提交
-
-
由 Dave Chinner 提交于
xfstest 270 was causing quota reservations way beyond what was sane (ten to hundreds of TB) for a 4GB filesystem. There's a sign problem in the error handling path of xfs_bmapi_reserve_delalloc() because xfs_trans_unreserve_quota_nblks() simple negates the value passed - which doesn't work for an unsigned variable. This causes reservations of close to 2^32 block instead of removing a reservation of a handful of blocks. Fix the same problem in the other xfs_trans_unreserve_quota_nblks() callers where unsigned integer variables are used, too. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 15 5月, 2012 5 次提交
-
-
由 Dave Chinner 提交于
With the removal of xfs_rw.h and other changes over time, xfs_bit.h is being included in many files that don't actually need it. Clean up the includes as necessary. Also move the only-used-once xfs_ialloc_find_free() static inline function out of a header file that is widely included to reduce the number of needless dependencies on xfs_bit.h. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The only thing left in xfs_rw.h is a function prototype for an inode function. Move that to xfs_inode.h, and kill xfs_rw.h. Also move the function implementing the prototype from xfs_rw.c to xfs_inode.c so we only have one function left in xfs_rw.c Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Untangle the header file includes a bit by moving the definition of xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include xfs_ag.h. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Speculative delayed allocation beyond EOF near the maximum supported file offset can result in creating delalloc extents beyond mp->m_maxioffset (8EB). These can never be trimmed during xfs_free_eof_blocks() because they are beyond mp->m_maxioffset, and that results in assert failures in xfs_fs_destroy_inode() due to delalloc blocks still being present. xfstests 071 exposes this problem. Limit speculative delalloc to mp->m_maxioffset to avoid this problem. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
For the direct IO write path, we only really need the ilock to be taken in exclusive mode during IO submission if we need to do extent allocation instead of all the time. Change the block mapping code to take the ilock in shared mode for the initial block mapping, and only retake it exclusively when we actually have to perform extent allocations. We were already dropping the ilock for the transaction allocation, so this doesn't introduce new race windows. Based on an earlier patch from Dave Chinner. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 06 3月, 2012 1 次提交
-
-
由 Christoph Hellwig 提交于
If we convert and unwritten extent past the current i_size log the size update as part of the extent manipulation transactions instead of doing an unlogged metadata update later. Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 18 1月, 2012 1 次提交
-
-
由 Christoph Hellwig 提交于
There is no fundamental need to keep an in-memory inode size copy in the XFS inode. We already have the on-disk value in the dinode, and the separate in-memory copy that we need for regular files only in the XFS inode. Remove the xfs_inode i_size field and change the XFS_ISIZE macro to use the VFS inode i_size field for regular files. Switch code that was directly accessing the i_size field in the xfs_inode to XFS_ISIZE, or in cases where we are limited to regular files direct access of the VFS inode i_size field. This also allows dropping some fairly complicated code in the write path which dealt with keeping the xfs_inode i_size uptodate with the VFS i_size that is getting updated inside ->write_end. Note that we do not bother resetting the VFS i_size when truncating a file that gets freed to zero as there is no point in doing so because the VFS inode is no longer in use at this point. Just relax the assert in xfs_ifree to only check the on-disk size instead. Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 14 1月, 2012 1 次提交
-
-
由 Christoph Hellwig 提交于
Replace the nasty if, else if, elseif condition with more natural C flow that expressed the logic we want here better. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 12 10月, 2011 4 次提交
-
-
由 Christoph Hellwig 提交于
There is no reason to keep a reference to the inode even if we unlock it during transaction commit because we never drop a reference between the ijoin and commit. Also use this fact to merge xfs_trans_ijoin_ref back into xfs_trans_ijoin - the third argument decides if an unlock is needed now. I'm actually starting to wonder if allowing inodes to be unlocked at transaction commit really is worth the effort. The only real benefit is that they can be unlocked earlier when commiting a synchronous transactions, but that could be solved by doing the log force manually after the unlock, too. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
Now that all the read-only users of xfs_bmapi have been converted to use xfs_bmapi_read(), we can remove all the read-only handling cases from xfs_bmapi(). Once this is done, rename xfs_bmapi to xfs_bmapi_write to reflect the fact it is for allocation only. This enables us to kill the XFS_BMAPI_WRITE flag as well. Also clean up xfs_bmapi_write to the style used in the newly added xfs_bmapi_read/delay functions. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Christoph Hellwig 提交于
Delalloc reservations are much simpler than allocations, so give them a separate bmapi-level interface. Using the previously added xfs_bmapi_reserve_delalloc we get a function that is only minimally more complicated than xfs_bmapi_read, which is far from the complexity in xfs_bmapi. Also remove the XFS_BMAPI_DELAY code after switching over the only user to xfs_bmapi_delay. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
由 Dave Chinner 提交于
xfs_bmapi() currently handles both extent map reading and allocation. As a result, the code is littered with "if (wr)" branches to conditionally do allocation operations if required. This makes the code much harder to follow and causes significant indent issues with the code. Given that read mapping is much simpler than allocation, we can split out read mapping from xfs_bmapi() and reuse the logic that we have already factored out do do all the hard work of handling the extent map manipulations. The results in a much simpler function for the common extent read operations, and will allow the allocation code to be simplified in another commit. Once xfs_bmapi_read() is implemented, convert all the callers of xfs_bmapi() that are only reading extents to use the new function. Signed-off-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 11 7月, 2011 1 次提交
-
-
由 Alex Elder 提交于
This reverts commit 7a249cf8. That commit created a situation that could lead to a filesystem hang. As Dave Chinner pointed out, xfs_trans_alloc() could hold a reference to m_active_trans (i.e., keep it non-zero) and then wait for SB_FREEZE_TRANS to complete. Meanwhile a filesystem freeze request could set SB_FREEZE_TRANS and then wait for m_active_trans to drop to zero. Nobody benefits from this sequence of events... Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 08 7月, 2011 1 次提交
-
-
由 Christoph Hellwig 提交于
As pointed out by Jan xfs_trans_alloc can race with a concurrent filesystem freeze when it sleeps during the memory allocation. Fix this by moving the wait_for_freeze call after the memory allocation. This means moving the freeze into the low-level _xfs_trans_alloc helper, which thus grows a new argument. Also fix up some comments in that area while at it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NAlex Elder <aelder@sgi.com> Reviewed-by: NDave Chinner <david@fromorbit.com>
-