- 26 4月, 2017 5 次提交
-
-
由 Christoph Hellwig 提交于
The main thing that xfs_bmap_remap_alloc does is fixing the AGFL, similar to what we do in the space allocator. But the reflink code doesn't touch the allocation btree unlike the normal space allocator, so we couldn't care less about the state of the AGFL. So remove xfs_bmap_remap_alloc and just handle the di_nblocks update in the caller. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Add a new helper to be used for reflink extent list additions instead of funneling them through xfs_bmapi_write and overloading the firstblock member in struct xfs_bmalloca and struct xfs_alloc_args. With some small changes to xfs_bmap_remap_alloc this also means we do not need a xfs_bmalloca structure for this case at all. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
For the reflink case we'd much rather pass the required arguments than faking up a struct xfs_bmalloca. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
We never do COW operations for the attr fork, so don't pretend we handle them. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
bno should be a xfs_fsblock_t, which is 64-bit wides instead of a xfs_aglock_t, which truncates the value to 32 bits. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 09 3月, 2017 2 次提交
-
-
由 Christoph Hellwig 提交于
When a reflink operation causes the bmap code to allocate a btree block we're currently doing single-AG allocations due to having ->firstblock set and then try any higher AG due a little reflink quirk we've put in when adding the reflink code. But given that we do not have a minleft reservation of any kind in this AG we can still not have any space in the same or higher AG even if the file system has enough free space. To fix this use a XFS_ALLOCTYPE_FIRST_AG allocation in this fall back path instead. [And yes, we need to redo this properly instead of piling hacks over hacks. I'm working on that, but it's not going to be a small series. In the meantime this fixes the customer reported issue] Also add a warning for failing allocations to make it easier to debug. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Brian Foster 提交于
Commit fa7f138a ("xfs: clear delalloc and cache on buffered write failure") fixed one regression in the iomap error handling code and exposed another. The fundamental problem is that if a buffered write is a rewrite of preexisting delalloc blocks and the write fails, the failure handling code can punch out preexisting blocks with valid file data. This was reproduced directly by sub-block writes in the LTP kernel/syscalls/write/write03 test. A first 100 byte write allocates a single block in a file. A subsequent 100 byte write fails and punches out the block, including the data successfully written by the previous write. To address this problem, update the ->iomap_begin() handler to distinguish newly allocated delalloc blocks from preexisting delalloc blocks via the IOMAP_F_NEW flag. Use this flag in the ->iomap_end() handler to decide when a failed or short write should punch out delalloc blocks. This introduces the subtle requirement that ->iomap_begin() should never combine newly allocated delalloc blocks with existing blocks in the resulting iomap descriptor. This can occur when a new delalloc reservation merges with a neighboring extent that is part of the current write, for example. Therefore, drop the post-allocation extent lookup from xfs_bmapi_reserve_delalloc() and just return the record inserted into the fork. This ensures only new blocks are returned and thus that preexisting delalloc blocks are always handled as "found" blocks and not punched out on a failed rewrite. Reported-by: NXiong Zhou <xzhou@redhat.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 17 2月, 2017 3 次提交
-
-
由 Christoph Hellwig 提交于
In various places we currently assert that xfs_bmap_btalloc allocates from the same as the firstblock value passed in, unless it's either NULLAGNO or the dop_low flag is set. But the reflink code does not fully follow this convention as it passes in firstblock purely as a hint for the allocator without actually having previous allocations in the transaction, and without having a minleft check on the current AG, leading to the assert firing on a very full and heavily used file system. As even the reflink code only allocates from equal or higher AGs for now we can simply the check to always allow for equal or higher AGs. Note that we need to eventually split the two meanings of the firstblock value. At that point we can also allow the reflink code to allocate from any AG instead of limiting it in any way. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Brian Foster 提交于
Certain workoads that punch holes into speculative preallocation can cause delalloc indirect reservation splits when the delalloc extent is split in two. If further splits occur, an already short-handed extent can be split into two in a manner that leaves zero indirect blocks for one of the two new extents. This occurs because the shortage is large enough that the xfs_bmap_split_indlen() algorithm completely drains the requested indlen of one of the extents before it honors the existing reservation. This ultimately results in a warning from xfs_bmap_del_extent(). This has been observed during file copies of large, sparse files using 'cp --sparse=always.' To avoid this problem, update xfs_bmap_split_indlen() to explicitly apply the reservation shortage fairly between both extents. This smooths out the overall indlen shortage and defers the situation where we end up with a delalloc extent with zero indlen reservation to extreme circumstances. Reported-by: NPatrick Dung <mpatdung@gmail.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Brian Foster 提交于
When a delalloc extent is created, it can be merged with pre-existing, contiguous, delalloc extents. When this occurs, xfs_bmap_add_extent_hole_delay() merges the extents along with the associated indirect block reservations. The expectation here is that the combined worst case indlen reservation is always less than or equal to the indlen reservation for the individual extents. This is not always the case, however, as existing extents can less than the expected indlen reservation if the extent was previously split due to a hole punch. If a new extent merges with such an extent, the total indlen requirement may be larger than the sum of the indlen reservations held by both extents. xfs_bmap_add_extent_hole_delay() assumes that the worst case indlen reservation is always available and assigns it to the merged extent without consideration for the indlen held by the pre-existing extent. As a result, the subsequent xfs_mod_fdblocks() call can attempt an unintentional allocation rather than a free (indicated by an ASSERT() failure). Further, if the allocation happens to fail in this context, the failure goes unhandled and creates a filesystem wide block accounting inconsistency. Fix xfs_bmap_add_extent_hole_delay() to function as designed. Cap the indlen reservation assigned to the merged extent to the sum of the indlen reservations held by each of the individual extents. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 07 2月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
When we allocate COW fork blocks for direct I/O writes we currently first create a delayed allocation, and then convert it to a real allocation once we've got the delayed one. As there is no good reason for that this patch instead makes use call xfs_bmapi_write from the COW allocation path. The only interesting bits are a few tweaks the low-level allocator to allow for this, most notably the need to remove the call to xfs_bmap_extsize_align for the cowextsize in xfs_bmap_btalloc - for the existing convert case it's a no-op, but for the direct allocation case it would blow up our block reservation way beyond what we reserved for the transaction. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 03 2月, 2017 2 次提交
-
-
由 Darrick J. Wong 提交于
In the data fork, we only allow extents to perform the following state transitions: delay -> real <-> unwritten There's no way to move directly from a delalloc reservation to an /unwritten/ allocated extent. However, for the CoW fork we want to be able to do the following to each extent: delalloc -> unwritten -> written -> remapped to data fork This will help us to avoid a race in the speculative CoW preallocation code between a first thread that is allocating a CoW extent and a second thread that is remapping part of a file after a write. In order to do this, however, we need two things: first, we have to be able to transition from da to unwritten, and second the function that converts between real and unwritten has to be made aware of the cow fork. Do both of those things. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Don't let anybody load an obviously bad btree pointer. Since the values come from disk, we must return an error, not just ASSERT. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com>
-
- 31 1月, 2017 2 次提交
-
-
由 Eric Sandeen 提交于
Now that xfs_btree_init_block_int is able to determine crc status from the passed-in mp, we can determine the proper magic as well if we are given a btree number, rather than an explicit magic value. Change xfs_btree_init_block[_int] callers to pass in the btree number, and let xfs_btree_init_block_int use the xfs_magics array via the xfs_btree_magic macro to determine which magic value is needed. This makes all of the if (crc) / else stanzas identical, and the if/else can be removed, leading to a single, common init_block call. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Eric Sandeen 提交于
xfs_btree_init_block_int() can determine whether crcs are in effect without the passed-in XFS_BTREE_CRC_BLOCKS flag; the mp argument allows us to determine this from the superblock. Remove the flag from callers, and use xfs_sb_version_hascrc(&mp->m_sb) internally instead. This removes one difference between the if & else cases in the callers. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 26 1月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
With COW files they are the hotpath, just like for files with the extent size hint attribute. We really shouldn't micro-manage anything but failure cases with unlikely. Additionally Arnd Bergmann recently reported that one of these two unlikely annotations causes link failures together with an upcoming kernel instrumentation patch, so let's get rid of it ASAP. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reported-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 24 1月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
Due to the way how xfs_iomap_write_allocate tries to convert the whole found extents from delalloc to real space we can run into a race condition with multiple threads doing writes to this same extent. For the non-COW case that is harmless as the only thing that can happen is that we call xfs_bmapi_write on an extent that has already been converted to a real allocation. For COW writes where we move the extent from the COW to the data fork after I/O completion the race is, however, not quite as harmless. In the worst case we are now calling xfs_bmapi_write on a region that contains hole in the COW work, which will trip up an assert in debug builds or lead to file system corruption in non-debug builds. This seems to be reproducible with workloads of small O_DSYNC write, although so far I've not managed to come up with a with an isolated reproducer. The fix for the issue is relatively simple: tell xfs_bmapi_write that we are only asked to convert delayed allocations and skip holes in that case. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 10 1月, 2017 1 次提交
-
-
由 Christoph Hellwig 提交于
We can't just set minleft to 0 when we're low on space - that's exactly what we need minleft for: to protect space in the AG for btree block allocations when we are low on free space. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 05 12月, 2016 4 次提交
-
-
由 Darrick J. Wong 提交于
We shouldn't assert if somehow we end up trying to add an attr fork to an inode that apparently already has attr extents because this is an indication of on-disk corruption. Instead, return an error code to userspace. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
When reading into memory all extents of a btree-format inode fork, complain if the number of extents we find is not the same as the number of extents reported in the inode core. This is needed to stop an IO action from accessing the garbage areas of the in-core fork. [dchinner: removed redundant assert] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
By inspection, xfs_bmap_trace_exlist isn't handling cow forks, and will trace the data fork instead. Fix this by setting state appropriately if whichfork == XFS_COW_FORK. ()___() < @ @ > | | {o_o} (|) Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
When xfs_bmap_trace_exlist called trace_xfs_extlist, it sent in the "whichfork" var instead of the bmap "state" as expected (even though state was already set up for this purpose). As a result, the xfs_bmap_class in tracing code used "whichfork" not state in xfs_iext_state_to_fork(), and got the wrong ifork pointer. It all goes downhill from there, including an ASSERT when ifp_bytes is empty by the time it reaches xfs_iext_get_ext(): XFS: Assertion failed: idx < ifp->if_bytes / sizeof(xfs_bmbt_rec_t) Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 28 11月, 2016 2 次提交
-
-
由 Brian Foster 提交于
Speculative preallocation is currently processed entirely by the callers of xfs_bmapi_reserve_delalloc(). The caller determines how much preallocation to include, adjusts the extent length and passes down the resulting request. While this works fine for post-eof speculative preallocation, it is not as reliable for COW fork preallocation. COW fork preallocation is implemented via the cowextszhint, which aligns the start offset as well as the length of the extent. Further, it is difficult for the caller to accurately identify when preallocation occurs because the returned extent could have been merged with neighboring extents in the fork. To simplify this situation and facilitate further COW fork preallocation enhancements, update xfs_bmapi_reserve_delalloc() to take a separate preallocation parameter to incorporate into the allocation request. The preallocation blocks value is tacked onto the end of the request and adjusted to accommodate neighboring extents and extent size limits. Since xfs_bmapi_reserve_delalloc() now knows precisely how much preallocation was included in the allocation, it can also tag the inodes appropriately to support preallocation reclaim. Note that xfs_bmapi_reserve_delalloc() callers are not yet updated to use the preallocation mechanism. This patch should not change behavior outside of correctly tagging reflink inodes when start offset preallocation occurs (which the caller does not handle correctly). Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
When we're estimating the amount of space it's going to take to satisfy a delalloc reservation, we need to include the space that we might need to grow the rmapbt. This helps us to avoid running out of space later when _iomap_write_allocate needs more space than we reserved. Eryu Guan observed this happening on generic/224 when sunit/swidth were set. Reported-by: NEryu Guan <eguan@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 24 11月, 2016 7 次提交
-
-
由 Christoph Hellwig 提交于
We only ever set a field to this constant for an impossible to reach error case in xfs_bmap_search_extents. That functions has been removed, so we can remove the constant as well. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Now that all users are gone. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
We can easily lookup the previous extent for the cases where we need it, which saves the callers from looking it up for us later in the series. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Rewrite the function using xfs_iext_lookup_extent and xfs_iext_get_extent, and massage the flow into something easily understandable. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 08 11月, 2016 2 次提交
-
-
由 Eric Sandeen 提交于
The open-coded pattern: ifp->if_bytes / (uint)sizeof(xfs_bmbt_rec_t) is all over the xfs code; provide a new helper xfs_iext_count(ifp) to count the number of inline extents in an inode fork. [dchinner: pick up several missed conversions] Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Check the return value of xfs_trans_reserve_quota_nblks for errors. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 20 10月, 2016 5 次提交
-
-
由 Christoph Hellwig 提交于
Since no one uses it anymore. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
Split out two helpers for deleting delayed or real extents from the COW fork. This allows to call them directly from xfs_reflink_cow_end_io once that function is refactored to iterate the extent tree. It will also allow to reuse the delalloc deletion from xfs_bunmapi in the future. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
This helpers allows to trim an extent to a subset of it's original range while making sure the block numbers in it remain valid, In the future xfs_trim_extent and xfs_bmapi_trim_map should probably be merged in some form. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> [hch: split from a previous patch from Darrick, moved around and added support for "raw" delayed extents"] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
The commit: f65306ea xfs: map an inode's offset to an exact physical block added a pointless error0: target; remove it. Addresses-Coverity-Id: 1373865 Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Colin Ian King 提交于
Remove redundant ifp = ifp statement, it does nothing. Found with static analysis by CoverityScan. Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 06 10月, 2016 2 次提交
-
-
由 Darrick J. Wong 提交于
Prior to the introduction of reflink, allocating a block and mapping it into a file was performed in a single transaction with a single block reservation, and the allocator was supposed to find enough blocks to allocate the extent and any BMBT blocks that might be necessary (unless we're low on space). However, due to the way copy on write works, allocation and mapping have been split into two transactions, which means that we must be able to handle the case where we allocate an extent for CoW but that AG runs out of free space before the blocks can be mapped into a file, and the mapping requires a new BMBT block. When this happens, look in one of the other AGs for a BMBT block instead of taking the FS down. The same applies to the functions that convert a data fork to extents and later btree format. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Darrick J. Wong 提交于
Create a per-inode extent size allocator hint for copy-on-write. This hint is separate from the existing extent size hint so that CoW can take advantage of the fragmentation-reducing properties of extent size hints without disabling delalloc for regular writes. The extent size hint that's fed to the allocator during a copy on write operation is the greater of the cowextsize and regular extsize hint. During reflink, if we're sharing the entire source file to the entire destination file and the destination file doesn't already have a cowextsize hint, propagate the source file's cowextsize hint to the destination file. Furthermore, zero the bulkstat buffer prior to setting the fields so that we don't copy kernel memory contents into userspace. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-