- 04 12月, 2014 9 次提交
-
-
由 Dave Chinner 提交于
The kernel compile doesn't turn on these checks by default, so it's only when I do a kernel-user sync that I find that there are lots of compiler warnings waiting to be fixed. Fix up these set-but-unused warnings. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
These are currently considered private to libxfs, but they are widely used by the userspace code to decode, walk and check directory structures. Hence they really form part of the external API and as such need to bemoved to xfs_dir2.h. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
These functions are needed in userspace for repair and mkfs to do the right thing. Move them to libxfs so they can be easily shared. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
There's a case in that code where it checks for a buffer match in a transaction where the buffer is not marked done. i.e. trying to catch a buffer we have locked in the transaction but have not completed IO on. The only way we can find a buffer that has not had IO completed on it is if it had readahead issued on it, but we never do readahead on buffers that we have already joined into a transaction. Hence this condition cannot occur, and buffers locked and joined into a transaction should always be marked done and not under IO. Remove this code and re-order xfs_trans_read_buf_map() to remove duplicated IO dispatch and error handling code. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
vn_active only ever gets decremented, so it has a very large negative number. Make it track the inode count we currently have allocated properly so we can easily track the size of the inode cache via tools like PCP. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> xfs_bmse_merge() has a jump label for return that just returns the error value. Convert all the code to just return the error directly and use XFS_WANT_CORRUPTED_RETURN. This also allows the final call to xfs_bmbt_update() to return directly. Noticed while reviewing coccinelle return cleanup patches and wondering why the same return pattern as in xfs_bmse_shift_one() wasn't picked up by the checker pattern... Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_bmse_shift_one() jumps around determining whether to shift or merge, making the code flow difficult to follow. Clean it up and use direct error returns (including XFS_WANT_CORRUPTED_RETURN) to make the code flow better and be easier to read. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
After growing a filesystem, XFS can fail to allocate inodes even though there is a large amount of space available in the filesystem for inodes. The issue is caused by a nearly full allocation group having enough free space in it to be considered for inode allocation, but not enough contiguous free space to actually allocation inodes. This situation results in successful selection of the AG for allocation, then failure of the allocation resulting in ENOSPC being reported to the caller. It is caused by two possible issues. Firstly, we only consider the lognest free extent and whether it would fit an inode chunk. If the extent is not correctly aligned, then we can't allocate an inode chunk in it regardless of the fact that it is large enough. This tends to be a permanent error until space in the AG is freed. The second issue is that we don't actually lock the AGI or AGF when we are doing these checks, and so by the time we get to actually allocating the inode chunk the space we thought we had in the AG may have been allocated. This tends to be a spurious error as it requires a race to trigger. Hence this case is ignored in this patch as the reported problem is for permanent errors. The first issue could be addressed by simply taking into account the alignment when checking the longest extent. This, however, would prevent allocation in AGs that have aligned, exact sized extents free. However, this case should be fairly rare compared to the number of allocations that occur near ENOSPC that would trigger this condition. Hence, when selecting the inode AG, take into account the inode cluster alignment when checking the lognest free extent in the AG. If we can't find any AGs with a contiguous free space large enough to be aligned, drop the alignment addition and just try for an AG that has enough contiguous free space available for an inode chunk. This won't prevent issues from occurring, but should avoid situations where other AGs have lots of free space but the selected AG can't allocate due to alignment constraints. Reported-by: NArkadiusz Miskiewicz <arekm@maven.pl> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Peter Watkins 提交于
If extsize is set and new_last_fsb is larger than 32 bits, the roundup to extsize will overflow the align variable. Instead, combine alignments by rounding stripe size up to extsize. Signed-off-by: NPeter Watkins <treestem@gmail.com> Reviewed-by: NNathaniel W. Turner <nate@houseofnate.net> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 01 12月, 2014 4 次提交
-
-
由 kbuild test robot 提交于
fs/xfs/libxfs/xfs_bmap.c:5591:1-6: WARNING: end returns can be simpified Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci CC: Brian Foster <bfoster@redhat.com> Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
-
由 kbuild test robot 提交于
fs/xfs/xfs_file.c:919:1-6: WARNING: end returns can be simpified and declaration on line 902 can be dropped Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 kbuild test robot 提交于
fs/xfs/libxfs/xfs_ialloc.c:1141:1-6: WARNING: end returns can be simpified Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Markus Elfring 提交于
The functions xfs_blkdev_put() and xfs_qm_dqrele() test whether their argument is NULL and then return immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Signed-off-by: NMarkus Elfring <elfring@users.sourceforge.net> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 28 11月, 2014 5 次提交
-
-
由 Eric Sandeen 提交于
Here blkno is a daddr_t, which is a __s64; it's possible to hold a value which is negative, and thus pass the (blkno >= eofs) test. Then we try to do a xfs_perag_get() for a ridiculous agno via xfs_daddr_to_agno(), and bad things happen when that fails, and returns a null pag which is dereferenced shortly thereafter. Found via a user-supplied fuzzed image... Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
The expectation since the introduction the lazy superblock counters is that the counters are synced and superblock logged appropriately as part of the filesystem freeze sequence. This does not occur, however, due to the logic in xfs_fs_writable() that prevents progress when the fs is in any state other than SB_UNFROZEN. While this is a bug, it has not been exposed to date because the last thing XFS does during freeze is dirty the log. The log recovery process recalculates the counters from AGI/AGF metadata to ensure everything is correct. Therefore should a crash occur while an fs is frozen, the subsequent log recovery puts everything back in order. See the following commit for reference: 92821e2b [XFS] Lazy Superblock Counters We might not always want to rely on dirtying the log on a frozen fs. Modify xfs_log_sbcount() to proceed when the filesystem is freezing but not once the freeze process has completed. Modify xfs_fs_writable() to accept the minimum freeze level for which modifications should be blocked to support various codepaths. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
The error handling in xfs_qm_log_quotaoff() has a couple problems. If xfs_trans_commit() fails, we fall through to the error block and call xfs_trans_cancel(). This is incorrect on commit failure. If xfs_trans_reserve() fails, we jump to the error block, cancel the tp and restore the superblock qflags to oldsbqflag. However, oldsbqflag has been initialized to zero and not yet updated from the original flags so we set the flags to zero. Fix up the error handling in xfs_qm_log_quotaoff() to not restore flags if they haven't been modified and not cancel the tp on commit failure. Remove the flag restore code altogether because commit error is the only failure condition and we don't know whether the transaction made it to disk. Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
There's no need to store a full struct xfs_trans_res on the stack in xfs_create() and copy the fields. Use a pointer to the appropriate structures embedded in the xfs_mount. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
The xfslogd workqueue is a global, single-job workqueue for buffer ioend processing. This means we allow for a single work item at a time for all possible XFS mounts on a system. fsstress testing in loopback XFS over XFS configurations has reproduced xfslogd deadlocks due to the single threaded nature of the queue and dependencies introduced between the separate XFS instances by online discard (-o discard). Discard over a loopback device converts the discard request to a hole punch (fallocate) on the underlying file. Online discard requests are issued synchronously and from xfslogd context in XFS, hence the xfslogd workqueue is blocked in the upper fs waiting on a hole punch request to be servied in the lower fs. If the lower fs issues I/O that depends on xfslogd to complete, both filesystems end up hung indefinitely. This is reproduced reliabily by generic/013 on XFS->loop->XFS test devices with the '-o discard' mount option. Further, docker implementations appear to use this kind of configuration for container instance filesystems by default (container fs->dm-> loop->base fs) and therefore are subject to this deadlock when running on XFS. Replace the global xfslogd workqueue with a per-mount variant. This guarantees each mount access to a single worker and prevents deadlocks due to inter-fs dependencies introduced by discard. Since the queue is only responsible for buffer iodone processing at this point in time, rename xfslogd to xfs-buf. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 13 10月, 2014 1 次提交
-
-
由 Eric Sandeen 提交于
caused a regression in xfs_inumbers, which in turn broke xfsdump, causing incomplete dumps. The loop in xfs_inumbers() needs to fill the user-supplied buffers, and iterates via xfs_btree_increment, reading new ags as needed. But the first time through the loop, if xfs_btree_increment() succeeds, we continue, which triggers the ++agno at the bottom of the loop, and we skip to soon to the next ag - without the proper setup under next_ag to read the next ag. Fix this by removing the agno increment from the loop conditional, and only increment agno if we have actually hit the code under the next_ag: target. Cc: stable@vger.kernel.org Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 03 10月, 2014 1 次提交
-
-
由 Mark Tinguely 提交于
Commit 30136832 ("xfs: remove all the inodes on a buffer from the AIL in bulk") made the xfs inode flush callback more efficient by combining all the inode writes on the buffer and the deletions of the inode log item from AIL. The initial loop in this patch should be looping through all the log items on the buffer to see which items have xfs_iflush_done as their callback function. But currently, only the log item passed to the function has its callback compared to xfs_iflush_done. If the log item pointer passed to the function does have the xfs_iflush_done callback function, then all the log items on the buffer are removed from the li_bio_list on the buffer b_fspriv and could be removed from the AIL even though they may have not been written yet. This problem is masked by the fact that currently all inodes on a buffer will have the same calback function - either xfs_iflush_done or xfs_istale_done - and hence the bug cannot manifest in any way. Still, we need to remove the landmine so that if we add new callbacks in future this doesn't cause us problems. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 02 10月, 2014 20 次提交
-
-
由 Brian Foster 提交于
XFS currently discards delalloc blocks within the target range of a zero range request. Unaligned start and end offsets are zeroed through the page cache and the internal, aligned blocks are converted to unwritten extents. If EOF is page aligned and covered by a delayed allocation extent. The inode size is not updated until I/O completion. If a zero range request discards a delalloc range that covers page aligned EOF as such, the inode size update never occurs. For example: $ rm -f /mnt/file $ xfs_io -fc "pwrite 0 64k" -c "zero 60k 4k" /mnt/file $ stat -c "%s" /mnt/file 65536 $ umount /mnt $ mount <dev> /mnt $ stat -c "%s" /mnt/file 61440 Update xfs_zero_file_space() to flush the range rather than discard delalloc blocks to ensure that inode size updates occur appropriately. [dchinner: Note that this is really a workaround to avoid the underlying problems. More work is needed (and ongoing) to fix those issues so this fix is being added as a temporary stop-gap measure. ] Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
xfs_vm_writepage() walks each buffer_head on the page, maps to the block on disk and attaches to a running ioend structure that represents the I/O submission. A new ioend is created when the type of I/O (unwritten, delayed allocation or overwrite) required for a particular buffer_head differs from the previous. If a buffer_head is a delalloc or unwritten buffer, the associated bits are cleared by xfs_map_at_offset() once the buffer_head is added to the ioend. The process of mapping each buffer_head occurs in xfs_map_blocks() and acquires the ilock in blocking or non-blocking mode, depending on the type of writeback in progress. If the lock cannot be acquired for non-blocking writeback, we cancel the ioend, redirty the page and return. Writeback will revisit the page at some later point. Note that we acquire the ilock for each buffer on the page. Therefore during non-blocking writeback, it is possible to add an unwritten buffer to the ioend, clear the unwritten state, fail to acquire the ilock when mapping a subsequent buffer and cancel the ioend. If this occurs, the unwritten status of the buffer sitting in the ioend has been lost. The page will eventually hit writeback again, but xfs_vm_writepage() submits overwrite I/O instead of unwritten I/O and does not perform unwritten extent conversion at I/O completion. This leads to data corruption because unwritten extents are treated as holes on reads and zeroes are returned instead of reading from disk. Modify xfs_cancel_ioend() to restore the buffer unwritten bit for ioends of type XFS_IO_UNWRITTEN. This ensures that unwritten extent conversion occurs once the page is eventually written back. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
Coverity spotted this. Granted, we *just* checked xfs_inod_dquot() in the caller (by calling xfs_quota_need_throttle). However, this is the only place we don't check the return value but the check is cheap and future-proof so add it. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
I discovered this in userspace, but the same change applies to the kernel. If we xfs_mdrestore an image from a non-crc filesystem, lo and behold the restored image has gained a CRC: # db/xfs_metadump.sh -o /dev/sdc1 - | xfs_mdrestore - test.img # xfs_db -c "sb 0" -c "p crc" /dev/sdc1 crc = 0 (correct) # xfs_db -c "sb 0" -c "p crc" test.img crc = 0xb6f8d6a0 (correct) This is because xfs_sb_from_disk doesn't fill in sb_crc, but xfs_sb_to_disk(XFS_SB_ALL_BITS) does write the in-memory CRC to disk - so we get uninitialized memory on disk. Fix this by always initializing sb_crc to 0 when we read the superblock, and masking out the CRC bit from ALL_BITS when we write it. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
In this case, if bp is NULL, error is set, and we send a NULL bp to xfs_trans_brelse, which will try to dereference it. Test whether we actually have a buffer before we try to free it. Coverity spotted this. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
If we write to the maximum file offset (2^63-2), XFS fails to log the inode size update when the page is flushed. For example: $ xfs_io -fc "pwrite `echo "2^63-1-1" | bc` 1" /mnt/file wrote 1/1 bytes at offset 9223372036854775806 1.000000 bytes, 1 ops; 0.0000 sec (22.711 KiB/sec and 23255.8140 ops/sec) $ stat -c %s /mnt/file 9223372036854775807 $ umount /mnt ; mount <dev> /mnt/ $ stat -c %s /mnt/file 0 This occurs because XFS calculates the new file size as io_offset + io_size, I/O occurs in block sized requests, and the maximum supported file size is not block aligned. Therefore, a write to the max allowable offset on a 4k blocksize fs results in a write of size 4k to offset 2^63-4096 (e.g., equivalent to round_down(2^63-1, 4096), or IOW the offset of the block that contains the max file size). The offset plus size calculation (2^63 - 4096 + 4096 == 2^63) overflows the signed 64-bit variable which goes negative and causes the > comparison to the on-disk inode size to fail. This returns 0 from xfs_new_eof() and results in no change to the inode on-disk. Update xfs_new_eof() to explicitly detect overflow of the local calculation and use the VFS inode size in this scenario. The VFS inode size is capped to the maximum and thus XFS writes the correct inode size to disk. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Currently the extent size hint is set unconditionally in xfs_ioctl_setattr() when the FSX_EXTSIZE flag is set. Hence we can set hints when the inode flags indicating the hint should be used are not set. Hence only set the extent size hint from userspace when the inode has the XFS_DIFLAG_EXTSIZE flag set to indicate that we should have an extent size hint set on the inode. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_set_diflags() allows it to be set on non-directory inodes, and this flags errors in xfs_repair. Further, inode allocation allows the same directory-only flag to be inherited to non-directories. Make sure directory inode flags don't appear on other types of inodes. This fixes several xfstests scratch fileystem corruption reports (e.g. xfs/050) now that xfstests checks scratch filesystems after test completion. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The typedef for timespecs and nanotime() are completely unnecessary, and delay() can be moved to fs/xfs/linux.h, which means this file can go away. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
struct compat_xfs_bstat is missing the di_forkoff field and so does not fully translate the structure correctly. Fix it. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
xfs_zero_remaining_bytes() open codes a log of buffer manupulations to do a read forllowed by a write. It can simply be replaced by an uncached read followed by a xfs_bwrite() call. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_buf_read_uncached() has two failure modes. If can either return NULL or bp->b_error != 0 depending on the type of failure, and not all callers check for both. Fix it so that xfs_buf_read_uncached() always returns the error status, and the buffer is returned as a function parameter. The buffer will only be returned on success. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
There is a lot of cookie-cutter code that looks like: if (shutdown) handle buffer error xfs_buf_iorequest(bp) error = xfs_buf_iowait(bp) if (error) handle buffer error spread through XFS. There's significant complexity now in xfs_buf_iorequest() to specifically handle this sort of synchronous IO pattern, but there's all sorts of nasty surprises in different error handling code dependent on who owns the buffer references and the locks. Pull this pattern into a single helper, where we can hide all the synchronous IO warts and hence make the error handling for all the callers much saner. This removes the need for a special extra reference to protect IO completion processing, as we can now hold a single reference across dispatch and waiting, simplifying the sync IO smeantics and error handling. In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and make it explicitly handle on asynchronous IO. This forces all users to be switched specifically to one interface or the other and removes any ambiguity between how the interfaces are to be used. It also means that xfs_buf_iowait() goes away. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
There is only one caller now - xfs_trans_read_buf_map() - and it has very well defined call semantics - read, synchronous, and b_iodone is NULL. Hence it's pretty clear what error handling is necessary for this case. The bigger problem of untangling xfs_trans_read_buf_map error handling is left to a future patch. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Internal buffer write error handling is a mess due to the unnatural split between xfs_bioerror and xfs_bioerror_relse(). xfs_bwrite() only does sync IO and determines the handler to call based on b_iodone, so for this caller the only difference between xfs_bioerror() and xfs_bioerror_release() is the XBF_DONE flag. We don't care what the XBF_DONE flag state is because we stale the buffer in both paths - the next buffer lookup will clear XBF_DONE because XBF_STALE is set. Hence we can use common error handling for xfs_bwrite(). __xfs_buf_delwri_submit() is a similar - it's only ever called on writes - all sync or async - and again there's no reason to handle them any differently at all. Clean up the nasty error handling and remove xfs_bioerror(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Only has two callers, and is just a shutdown check and error handler around xfs_buf_iorequest. However, the error handling is a mess of read and write semantics, and both internal callers only call it for writes. Hence kill the wrapper, and follow up with a patch to sanitise the error handling. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Currently the report of a bio error from completion immediately marks the buffer with an error. The issue is that this is racy w.r.t. synchronous IO - the submitter can see b_error being set before the IO is complete, and hence we cannot differentiate between submission failures and completion failures. Add an internal b_io_error field protected by the b_lock to catch IO completion errors, and only propagate that to the buffer during final IO completion handling. Hence we can tell in xfs_buf_iorequest if we've had a submission failure bey checking bp->b_error before dropping our b_io_remaining reference - that reference will prevent b_io_error values from being propagated to b_error in the event that completion races with submission. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We do some work in xfs_buf_ioend, and some work in xfs_buf_iodone_work, but much of that functionality is the same. This work can all be done in a single function, leaving xfs_buf_iodone just a wrapper to determine if we should execute it by workqueue or directly. hence rename xfs_buf_iodone_work to xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that need async processing. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When synchronous IO runs IO completion work, it does so without an IO reference or a hold reference on the buffer. The IO "hold reference" is owned by the submitter, and released when the submission is complete. The IO reference is released when both the submitter and the bio end_io processing is run, and so if the io completion work is run from IO completion context, it is run without an IO reference. Hence we can get the situation where the submitter can submit the IO, see an error on the buffer and unlock and free the buffer while there is still IO in progress. This leads to use-after-free and memory corruption. Fix this by taking a "sync IO hold" reference that is owned by the IO and not released until after the buffer completion calls are run to wake up synchronous waiters. This means that the buffer will not be freed in any circumstance until all IO processing is completed. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
For the special case of delwri buffer submission and waiting, we don't need to issue IO synchronously at all. The second pass to call xfs_buf_iowait() can be replaced with blocking on xfs_buf_lock() - the buffer will be unlocked when the async IO is complete. This formalises a sane the method of waiting for async IO - take an extra reference, submit the IO, call xfs_buf_lock() when you want to wait for IO completion. i.e.: bp = xfs_buf_find(); xfs_buf_hold(bp); bp->b_flags |= XBF_ASYNC; xfs_buf_iosubmit(bp); xfs_buf_lock(bp) error = bp->b_error; .... xfs_buf_relse(bp); While this is somewhat racy for gathering IO errors, none of the code that calls xfs_buf_delwri_submit() will race against other users of the buffers being submitted. Even if they do, we don't really care if the error is detected by the delwri code or the user we raced against. Either way, the error will be detected and handled. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-