- 07 4月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit architectures. These types are required to support block device and/or file sizes larger than 2 TiB, and have generally defaulted to on for a long time. Enabling the option only increases the i386 tinyconfig size by 145 bytes, and many data structures already always use 64-bit values for their in-core and on-disk data structures anyway, so there should not be a large change in dynamic memory usage either. Dropping this option removes a somewhat weird non-default config that has cause various bugs or compiler warnings when actually used. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 26 3月, 2019 1 次提交
-
-
由 Brian Foster 提交于
XFS applies more strict serialization constraints to unaligned direct writes to accommodate things like direct I/O layer zeroing, unwritten extent conversion, etc. Unaligned submissions acquire the exclusive iolock and wait for in-flight dio to complete to ensure multiple submissions do not race on the same block and cause data corruption. This generally works in the case of an aligned dio followed by an unaligned dio, but the serialization is lost if I/Os occur in the opposite order. If an unaligned write is submitted first and immediately followed by an overlapping, aligned write, the latter submits without the typical unaligned serialization barriers because there is no indication of an unaligned dio still in-flight. This can lead to unpredictable results. To provide proper unaligned dio serialization, require that such direct writes are always the only dio allowed in-flight at one time for a particular inode. We already acquire the exclusive iolock and drain pending dio before submitting the unaligned dio. Wait once more after the dio submission to hold the iolock across the I/O and prevent further submissions until the unaligned I/O completes. This is heavy handed, but consistent with the current pre-submission serialization for unaligned direct writes. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 25 3月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
The xfs fstrim implementation uses the free space btrees to find free space that can be discarded. If we haven't recovered the log, the bnobt will be stale and we absolutely *cannot* use stale metadata to zap the underlying storage. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com>
-
- 19 3月, 2019 3 次提交
-
-
由 Darrick J. Wong 提交于
Always init the tp/ip fields of bma in xfs_bmapi_write so that the bmapi_finish at the bottom never trips over null transaction or inode pointers. Coverity-id: 1443964 Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
In xchk_btree_check_owner, we can be passed a null buffer pointer. This should only happen for the root of a root-in-inode btree type, but we should program defensively in case the btree cursor state ever gets screwed up and we get a null buffer anyway. Coverity-id: 1438713 Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Make sure scrub's dabtree iterator function checks that we're not going deeper in the stack than our cursor permits. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 18 3月, 2019 1 次提交
-
-
由 Brian Foster 提交于
We've had rather rare reports of bmap btree block corruption where the bmap root block has a level count of zero. The root cause of the corruption is so far unknown. We do have verifier checks to detect this form of on-disk corruption, but this doesn't cover a memory corruption variant of the problem. The latter is a reasonable possibility because the root block is part of the inode fork and can reside in-core for some time before inode extents are read. If this occurs, it leads to a system crash such as the following: BUG: unable to handle kernel paging request at ffffffff00000221 PF error: [normal kernel read fault] ... RIP: 0010:xfs_trans_brelse+0xf/0x200 [xfs] ... Call Trace: xfs_iread_extents+0x379/0x540 [xfs] xfs_file_iomap_begin_delay+0x11a/0xb40 [xfs] ? xfs_attr_get+0xd1/0x120 [xfs] ? iomap_write_begin.constprop.40+0x2d0/0x2d0 xfs_file_iomap_begin+0x4c4/0x6d0 [xfs] ? __vfs_getxattr+0x53/0x70 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 iomap_apply+0x63/0x130 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 iomap_file_buffered_write+0x62/0x90 ? iomap_write_begin.constprop.40+0x2d0/0x2d0 xfs_file_buffered_aio_write+0xe4/0x3b0 [xfs] __vfs_write+0x150/0x1b0 vfs_write+0xba/0x1c0 ksys_pwrite64+0x64/0xa0 do_syscall_64+0x5a/0x1d0 entry_SYSCALL_64_after_hwframe+0x49/0xbe The crash occurs because xfs_iread_extents() attempts to release an uninitialized buffer pointer as the level == 0 value prevented the buffer from ever being allocated or read. Change the level > 0 assert to an explicit error check in xfs_iread_extents() to avoid crashing the kernel in the event of localized, in-core inode corruption. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 13 3月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
Remove typedefs and consolidate local variable initialization. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com>
-
- 11 3月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
Smatch complains about the following: fs/xfs/libxfs/xfs_dir2_leaf.c:848 xfs_dir2_leaf_addname() error: uninitialized symbol 'lowstale'. fs/xfs/libxfs/xfs_dir2_leaf.c:849 xfs_dir2_leaf_addname() error: uninitialized symbol 'highstale'. I don't think there's any incorrect behavior associated with the uninitialized variable, but as the author of the previous zero-init patch points out, it's best not to be passing around pointers to uninitialized stack areas. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NBill O'Donnell <billodo@redhat.com>
-
- 09 3月, 2019 2 次提交
-
-
由 Darrick J. Wong 提交于
Remove typedefs and consolidate local variable initialization. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
-
由 Nathan Chancellor 提交于
When building with -Wsometimes-uninitialized, Clang warns: fs/xfs/libxfs/xfs_dir2_node.c:481:6: warning: variable 'lowstale' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] fs/xfs/libxfs/xfs_dir2_node.c:481:6: warning: variable 'highstale' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized] While it isn't technically wrong, it isn't a problem in practice because highstale and lowstale are only initialized in xfs_dir2_leafn_add when compact is not zero then they are passed to xfs_dir3_leaf_find_entry, where they are initialized before use when compact is zero. Regardless, it's better not to be passing around uninitialized stack memory so zero initialize these variables, which silences this warning. Link: https://github.com/ClangBuiltLinux/linux/issues/393Signed-off-by: NNathan Chancellor <natechancellor@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 02 3月, 2019 1 次提交
-
-
由 Luis R. Rodriguez 提交于
statx(2) notes that any attribute that is not indicated as supported by stx_attributes_mask has no usable value. Commit 5f955f26 ("xfs: report crtime and attribute flags to statx") added support for informing userspace of extra file attributes but forgot to list these flags as supported making reporting them rather useless for the pedantic userspace author. $ git describe --contains 5f955f26 v4.11-rc6~5^2^2~2 Fixes: 5f955f26 ("xfs: report crtime and attribute flags to statx") Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> [darrick: add a comment reminding people to keep attributes_mask up to date] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 26 2月, 2019 4 次提交
-
-
由 Darrick J. Wong 提交于
Fix a backwards endian conversion of a constant. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
-
由 Darrick J. Wong 提交于
smatch complained about some uninitialized error returns, so fix those. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
-
由 Darrick J. Wong 提交于
Rework the data flow in xfs_file_iomap_begin where we decide if we have to break shared extents. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
由 Darrick J. Wong 提交于
Don't pass raw iomap flags to xfs_reflink_allocate_cow; signal our intention with a boolean argument. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 24 2月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Store the request queue the last bio was submitted to in the iocb private data in addition to the cookie so that we find the right block device. Also refactor the common direct I/O bio submission code into a nice little helper. Signed-off-by: NChristoph Hellwig <hch@lst.de> Modified to use bio_set_polled(). Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 2月, 2019 9 次提交
-
-
由 Colin Ian King 提交于
A previous commit removed the initialization of variable 'error' to zero, and can cause a bogus error return. This occurs when error contains a non-zero garbage value and the call to xchk_should_terminate detects a pending fatal signal and checks for a zero error before setting it to -EAGAIN. Fix the issue by initializing error to zero. Fixes: b9454fe0 ("xfs: clean up the inode cluster checking in the inobt scrub") Signed-off-by: NColin Ian King <colin.king@canonical.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Add a mode where XFS never overwrites existing blocks in place. This is to aid debugging our COW code, and also put infatructure in place for things like possible future support for zoned block devices, which can't support overwrites. This mode is enabled globally by doing a: echo 1 > /sys/fs/xfs/debug/always_cow Note that the parameter is global to allow running all tests in xfstests easily in this mode, which would not easily be possible with a per-fs sysfs file. In always_cow mode persistent preallocations are disabled, and fallocate will fail when called with a 0 mode (with our without FALLOC_FL_KEEP_SIZE), and not create unwritten extent for zeroed space when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE. There are a few interesting xfstests failures when run in always_cow mode: - generic/392 fails because the bytes used in the file used to test hole punch recovery are less after the log replay. This is because the blocks written and then punched out are only freed with a delay due to the logging mechanism. - xfs/170 will fail as the already fragile file streams mechanism doesn't seem to interact well with the COW allocator - xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim the file system is badly fragmented, but there is not much we can do to avoid that when always writing out of place - xfs/205 fails because overwriting a file in always_cow mode will require new space allocation and the assumption in the test thus don't work anymore. - xfs/326 fails to modify the file at all in always_cow mode after injecting the refcount error, leading to an unexpected md5sum after the remount, but that again is expected Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
No user of it in the iomap code at the moment, but we should not actively report wrong information if we can trivially get it right. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
If we have racing buffered and direct I/O COW fork extents under writeback can have been moved to the data fork by the time we call xfs_reflink_convert_cow from xfs_submit_ioend. This would be mostly harmless as the block numbers don't change by this move, except for the fact that xfs_bmapi_write will crash or trigger asserts when not finding existing extents, even despite trying to paper over this with the XFS_BMAPI_CONVERT_ONLY flag. Instead of special casing non-transaction conversions in the already way too complicated xfs_bmapi_write just add a new helper for the much simpler non-transactional COW fork case, which simplify ignores not found extents. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Besides simplifying the code a bit this allows to actually implement the behavior of using COW preallocation for non-COW data mentioned in the current comments. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
This only matters if we want to write data through the COW fork that is not actually an overwrite of existing data. Reasons for that are speculative COW fork allocations using the cowextsize, or a mode where we always write through the COW fork. Currently both can't actually happen, but I plan to enable them. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
While using delalloc for extsize hints is generally a good idea, the current code that does so only for COW doesn't help us much and creates a lot of special cases. Switch it to use real allocations like we do for direct I/O. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
We speculatively allocate extents in the COW fork to reduce fragmentation. But when we write data into such COW fork blocks that do now shadow an allocation in the data fork SEEK_DATA will not correctly report it, as it only looks at the data fork extents. The only reason why that hasn't been an issue so far is because we even use these speculative COW fork preallocations over holes in the data fork at all for buffered writes, and blocks in the COW fork that are written by direct writes are moved into the data fork immediately at I/O completion time. Add a new set of iomap_ops for SEEK_HOLE/SEEK_DATA which looks into both the COW and data fork, and reports all COW extents as unwritten to the iomap layer. While this isn't strictly true for COW fork extents that were already converted to real extents, the practical semantics that you can't read data from them until they are moved into the data fork are very similar, and this will force the iomap layer into probing the extents for actually present data. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Move checking for invalid zero blocks and setting of various iomap flags into this helper. Also make it deal with "raw" delalloc extents to avoid clutter in the callers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 19 2月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
Create a separate magic16 check function so that we don't run afoul of static checkers. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 18 2月, 2019 10 次提交
-
-
由 Christoph Hellwig 提交于
While we can only truncate a block under the page lock for the current page, there is no high-level synchronization for moving extents from the COW to the data fork. This means that for example we can have another thread doing a direct I/O completion that moves extents from the COW to the data fork race with writeback. While this race is very hard to hit the always_cow seems to reproduce it reasonably well, and it also exists without that. Because of that there is a chance that a delalloc conversion for the COW fork might not find any extents to convert. In that case we should retry the whole block lookup and now find the blocks in the data fork. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Now that we properly handle the race with truncate in the delalloc allocator there is no need to short cut this exceptional case earlier on. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
This function is a small wrapper only used by the writeback code, so move it together with the writeback code and simplify it down to the glorified do { } while loop that is now is. A few bits intentionally got lost here: no need to call xfs_qm_dqattach because quotas are always attached when we create the delalloc reservation, and no need for the imap->br_startblock == 0 check given that xfs_bmapi_convert_delalloc already has a WARN_ON_ONCE for exactly that condition. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
This way we can actually count how many bytes got converted and how many calls we need, unlike in the caller which doesn't have the detailed view. Note that this includes a slight change in behavior as the xs_xstrat_quick is now bumped for every allocation instead of just the one covering the requested writeback offset, which makes a lot more sense. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
No need to deal with the transaction and the inode locking in the caller. Note that we also switch to passing whichfork as the second paramter, matching what most related functions do. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Delalloc conversion has traditionally been part of our function to allocate blocks on disk (first xfs_bmapi, then xfs_bmapi_write), but delalloc conversion is a little special as we really do not want to allocate blocks over holes, for which we don't have reservations. Split the delalloc conversions into a separate helper to keep the code simple and structured. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
We want to be able to reuse them for the upcoming dedidcated delalloc convert routine. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Move boilerplate code from the callers into xfs_bmap_btree_to_extents: - exit early without failure if we don't need to convert to the extent format - assert that we have a btree cursor - don't reinitialize the passed in logflags argument Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
We already ensure all data fits into s_maxbytes in the write / fault path. The only reason we have them here is that they were copy and pasted from xfs_bmapi_read when we stopped using that function. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
The io_type field contains what is basically a summary of information from the inode fork and the imap. But we can just as easily use that information directly, simplifying a few bits here and there and improving the trace points. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 15 2月, 2019 3 次提交
-
-
由 Ming Lei 提交于
This patch pulls the trigger for multi-page bvecs. Reviewed-by: NOmar Sandoval <osandov@fb.com> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
This patch introduces one extra iterator variable to bio_for_each_segment_all(), then we can allow bio_for_each_segment_all() to iterate over multi-page bvec. Given it is just one mechannical & simple change on all bio_for_each_segment_all() users, this patch does tree-wide change in one single patch, so that we can avoid to use a temporary helper for this conversion. Reviewed-by: NOmar Sandoval <osandov@fb.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Darrick J. Wong 提交于
When XFS creates an O_TMPFILE file, the inode is created with nlink = 1, put on the unlinked list, and then the VFS sets nlink = 0 in d_tmpfile. If we crash before anything logs the inode (it's dirty incore but the vfs doesn't tell us it's dirty so we never log that change), the iunlink processing part of recovery will then explode with a pile of: XFS: Assertion failed: VFS_I(ip)->i_nlink == 0, file: fs/xfs/xfs_log_recover.c, line: 5072 Worse yet, since nlink is nonzero, the inodes also don't get cleaned up and they just leak until the next xfs_repair run. Therefore, change xfs_iunlink to require that inodes being put on the unlinked list have nlink == 0, change the tmpfile callers to instantiate nodes that way, and set the nlink to 1 just prior to calling d_tmpfile. Fix the comment for xfs_iunlink while we're at it. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-