- 28 11月, 2014 1 次提交
-
-
由 Christoph Hellwig 提交于
More consolidatation for the on-disk format defintions. Note that the XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related to the on disk format, but depends on a CONFIG_ option. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 13 10月, 2014 1 次提交
-
-
由 Eric Sandeen 提交于
caused a regression in xfs_inumbers, which in turn broke xfsdump, causing incomplete dumps. The loop in xfs_inumbers() needs to fill the user-supplied buffers, and iterates via xfs_btree_increment, reading new ags as needed. But the first time through the loop, if xfs_btree_increment() succeeds, we continue, which triggers the ++agno at the bottom of the loop, and we skip to soon to the next ag - without the proper setup under next_ag to read the next ag. Fix this by removing the agno increment from the loop conditional, and only increment agno if we have actually hit the code under the next_ag: target. Cc: stable@vger.kernel.org Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 03 10月, 2014 1 次提交
-
-
由 Mark Tinguely 提交于
Commit 30136832 ("xfs: remove all the inodes on a buffer from the AIL in bulk") made the xfs inode flush callback more efficient by combining all the inode writes on the buffer and the deletions of the inode log item from AIL. The initial loop in this patch should be looping through all the log items on the buffer to see which items have xfs_iflush_done as their callback function. But currently, only the log item passed to the function has its callback compared to xfs_iflush_done. If the log item pointer passed to the function does have the xfs_iflush_done callback function, then all the log items on the buffer are removed from the li_bio_list on the buffer b_fspriv and could be removed from the AIL even though they may have not been written yet. This problem is masked by the fact that currently all inodes on a buffer will have the same calback function - either xfs_iflush_done or xfs_istale_done - and hence the bug cannot manifest in any way. Still, we need to remove the landmine so that if we add new callbacks in future this doesn't cause us problems. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 02 10月, 2014 21 次提交
-
-
由 Brian Foster 提交于
XFS currently discards delalloc blocks within the target range of a zero range request. Unaligned start and end offsets are zeroed through the page cache and the internal, aligned blocks are converted to unwritten extents. If EOF is page aligned and covered by a delayed allocation extent. The inode size is not updated until I/O completion. If a zero range request discards a delalloc range that covers page aligned EOF as such, the inode size update never occurs. For example: $ rm -f /mnt/file $ xfs_io -fc "pwrite 0 64k" -c "zero 60k 4k" /mnt/file $ stat -c "%s" /mnt/file 65536 $ umount /mnt $ mount <dev> /mnt $ stat -c "%s" /mnt/file 61440 Update xfs_zero_file_space() to flush the range rather than discard delalloc blocks to ensure that inode size updates occur appropriately. [dchinner: Note that this is really a workaround to avoid the underlying problems. More work is needed (and ongoing) to fix those issues so this fix is being added as a temporary stop-gap measure. ] Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
xfs_vm_writepage() walks each buffer_head on the page, maps to the block on disk and attaches to a running ioend structure that represents the I/O submission. A new ioend is created when the type of I/O (unwritten, delayed allocation or overwrite) required for a particular buffer_head differs from the previous. If a buffer_head is a delalloc or unwritten buffer, the associated bits are cleared by xfs_map_at_offset() once the buffer_head is added to the ioend. The process of mapping each buffer_head occurs in xfs_map_blocks() and acquires the ilock in blocking or non-blocking mode, depending on the type of writeback in progress. If the lock cannot be acquired for non-blocking writeback, we cancel the ioend, redirty the page and return. Writeback will revisit the page at some later point. Note that we acquire the ilock for each buffer on the page. Therefore during non-blocking writeback, it is possible to add an unwritten buffer to the ioend, clear the unwritten state, fail to acquire the ilock when mapping a subsequent buffer and cancel the ioend. If this occurs, the unwritten status of the buffer sitting in the ioend has been lost. The page will eventually hit writeback again, but xfs_vm_writepage() submits overwrite I/O instead of unwritten I/O and does not perform unwritten extent conversion at I/O completion. This leads to data corruption because unwritten extents are treated as holes on reads and zeroes are returned instead of reading from disk. Modify xfs_cancel_ioend() to restore the buffer unwritten bit for ioends of type XFS_IO_UNWRITTEN. This ensures that unwritten extent conversion occurs once the page is eventually written back. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
Coverity spotted this. Granted, we *just* checked xfs_inod_dquot() in the caller (by calling xfs_quota_need_throttle). However, this is the only place we don't check the return value but the check is cheap and future-proof so add it. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
I discovered this in userspace, but the same change applies to the kernel. If we xfs_mdrestore an image from a non-crc filesystem, lo and behold the restored image has gained a CRC: # db/xfs_metadump.sh -o /dev/sdc1 - | xfs_mdrestore - test.img # xfs_db -c "sb 0" -c "p crc" /dev/sdc1 crc = 0 (correct) # xfs_db -c "sb 0" -c "p crc" test.img crc = 0xb6f8d6a0 (correct) This is because xfs_sb_from_disk doesn't fill in sb_crc, but xfs_sb_to_disk(XFS_SB_ALL_BITS) does write the in-memory CRC to disk - so we get uninitialized memory on disk. Fix this by always initializing sb_crc to 0 when we read the superblock, and masking out the CRC bit from ALL_BITS when we write it. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
In this case, if bp is NULL, error is set, and we send a NULL bp to xfs_trans_brelse, which will try to dereference it. Test whether we actually have a buffer before we try to free it. Coverity spotted this. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Brian Foster 提交于
If we write to the maximum file offset (2^63-2), XFS fails to log the inode size update when the page is flushed. For example: $ xfs_io -fc "pwrite `echo "2^63-1-1" | bc` 1" /mnt/file wrote 1/1 bytes at offset 9223372036854775806 1.000000 bytes, 1 ops; 0.0000 sec (22.711 KiB/sec and 23255.8140 ops/sec) $ stat -c %s /mnt/file 9223372036854775807 $ umount /mnt ; mount <dev> /mnt/ $ stat -c %s /mnt/file 0 This occurs because XFS calculates the new file size as io_offset + io_size, I/O occurs in block sized requests, and the maximum supported file size is not block aligned. Therefore, a write to the max allowable offset on a 4k blocksize fs results in a write of size 4k to offset 2^63-4096 (e.g., equivalent to round_down(2^63-1, 4096), or IOW the offset of the block that contains the max file size). The offset plus size calculation (2^63 - 4096 + 4096 == 2^63) overflows the signed 64-bit variable which goes negative and causes the > comparison to the on-disk inode size to fail. This returns 0 from xfs_new_eof() and results in no change to the inode on-disk. Update xfs_new_eof() to explicitly detect overflow of the local calculation and use the VFS inode size in this scenario. The VFS inode size is capped to the maximum and thus XFS writes the correct inode size to disk. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Currently the extent size hint is set unconditionally in xfs_ioctl_setattr() when the FSX_EXTSIZE flag is set. Hence we can set hints when the inode flags indicating the hint should be used are not set. Hence only set the extent size hint from userspace when the inode has the XFS_DIFLAG_EXTSIZE flag set to indicate that we should have an extent size hint set on the inode. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_set_diflags() allows it to be set on non-directory inodes, and this flags errors in xfs_repair. Further, inode allocation allows the same directory-only flag to be inherited to non-directories. Make sure directory inode flags don't appear on other types of inodes. This fixes several xfstests scratch fileystem corruption reports (e.g. xfs/050) now that xfstests checks scratch filesystems after test completion. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The typedef for timespecs and nanotime() are completely unnecessary, and delay() can be moved to fs/xfs/linux.h, which means this file can go away. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
struct compat_xfs_bstat is missing the di_forkoff field and so does not fully translate the structure correctly. Fix it. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
xfs_zero_remaining_bytes() open codes a log of buffer manupulations to do a read forllowed by a write. It can simply be replaced by an uncached read followed by a xfs_bwrite() call. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_buf_read_uncached() has two failure modes. If can either return NULL or bp->b_error != 0 depending on the type of failure, and not all callers check for both. Fix it so that xfs_buf_read_uncached() always returns the error status, and the buffer is returned as a function parameter. The buffer will only be returned on success. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
There is a lot of cookie-cutter code that looks like: if (shutdown) handle buffer error xfs_buf_iorequest(bp) error = xfs_buf_iowait(bp) if (error) handle buffer error spread through XFS. There's significant complexity now in xfs_buf_iorequest() to specifically handle this sort of synchronous IO pattern, but there's all sorts of nasty surprises in different error handling code dependent on who owns the buffer references and the locks. Pull this pattern into a single helper, where we can hide all the synchronous IO warts and hence make the error handling for all the callers much saner. This removes the need for a special extra reference to protect IO completion processing, as we can now hold a single reference across dispatch and waiting, simplifying the sync IO smeantics and error handling. In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and make it explicitly handle on asynchronous IO. This forces all users to be switched specifically to one interface or the other and removes any ambiguity between how the interfaces are to be used. It also means that xfs_buf_iowait() goes away. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
There is only one caller now - xfs_trans_read_buf_map() - and it has very well defined call semantics - read, synchronous, and b_iodone is NULL. Hence it's pretty clear what error handling is necessary for this case. The bigger problem of untangling xfs_trans_read_buf_map error handling is left to a future patch. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Internal buffer write error handling is a mess due to the unnatural split between xfs_bioerror and xfs_bioerror_relse(). xfs_bwrite() only does sync IO and determines the handler to call based on b_iodone, so for this caller the only difference between xfs_bioerror() and xfs_bioerror_release() is the XBF_DONE flag. We don't care what the XBF_DONE flag state is because we stale the buffer in both paths - the next buffer lookup will clear XBF_DONE because XBF_STALE is set. Hence we can use common error handling for xfs_bwrite(). __xfs_buf_delwri_submit() is a similar - it's only ever called on writes - all sync or async - and again there's no reason to handle them any differently at all. Clean up the nasty error handling and remove xfs_bioerror(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Only has two callers, and is just a shutdown check and error handler around xfs_buf_iorequest. However, the error handling is a mess of read and write semantics, and both internal callers only call it for writes. Hence kill the wrapper, and follow up with a patch to sanitise the error handling. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Currently the report of a bio error from completion immediately marks the buffer with an error. The issue is that this is racy w.r.t. synchronous IO - the submitter can see b_error being set before the IO is complete, and hence we cannot differentiate between submission failures and completion failures. Add an internal b_io_error field protected by the b_lock to catch IO completion errors, and only propagate that to the buffer during final IO completion handling. Hence we can tell in xfs_buf_iorequest if we've had a submission failure bey checking bp->b_error before dropping our b_io_remaining reference - that reference will prevent b_io_error values from being propagated to b_error in the event that completion races with submission. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We do some work in xfs_buf_ioend, and some work in xfs_buf_iodone_work, but much of that functionality is the same. This work can all be done in a single function, leaving xfs_buf_iodone just a wrapper to determine if we should execute it by workqueue or directly. hence rename xfs_buf_iodone_work to xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that need async processing. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When synchronous IO runs IO completion work, it does so without an IO reference or a hold reference on the buffer. The IO "hold reference" is owned by the submitter, and released when the submission is complete. The IO reference is released when both the submitter and the bio end_io processing is run, and so if the io completion work is run from IO completion context, it is run without an IO reference. Hence we can get the situation where the submitter can submit the IO, see an error on the buffer and unlock and free the buffer while there is still IO in progress. This leads to use-after-free and memory corruption. Fix this by taking a "sync IO hold" reference that is owned by the IO and not released until after the buffer completion calls are run to wake up synchronous waiters. This means that the buffer will not be freed in any circumstance until all IO processing is completed. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
For the special case of delwri buffer submission and waiting, we don't need to issue IO synchronously at all. The second pass to call xfs_buf_iowait() can be replaced with blocking on xfs_buf_lock() - the buffer will be unlocked when the async IO is complete. This formalises a sane the method of waiting for async IO - take an extra reference, submit the IO, call xfs_buf_lock() when you want to wait for IO completion. i.e.: bp = xfs_buf_find(); xfs_buf_hold(bp); bp->b_flags |= XBF_ASYNC; xfs_buf_iosubmit(bp); xfs_buf_lock(bp) error = bp->b_error; .... xfs_buf_relse(bp); While this is somewhat racy for gathering IO errors, none of the code that calls xfs_buf_delwri_submit() will race against other users of the buffers being submitted. Even if they do, we don't really care if the error is detected by the delwri code or the user we raced against. Either way, the error will be detected and handled. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When we have marked the filesystem for shutdown, we want to prevent any further buffer IO from being submitted. However, we currently force the log after marking the filesystem as shut down, hence allowing IO to the log *after* we have marked both the filesystem and the log as in an error state. Clean this up by forcing the log before we mark the filesytem with an error. This replaces the pure CIL flush that we currently have which works around this same issue (i.e the CIL can't be flushed once the shutdown flags are set) and hence enables us to clean up the logic substantially. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 29 9月, 2014 9 次提交
-
-
由 Dave Chinner 提交于
Some argument callbacks can contain user buffers, and sparse warns about passing them as void pointers. Cast appropriately to remove the sparse warnings. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
As it is accessed through the struct xfs_mount and can be set up entirely from fs/xfs/xfs_super.c Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
To remove noise from the build. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Sparse warns that we are passing the big-endian valueo f agi_newino to the initial btree lookup function when trying to find a new inode. This is wrong - we need to pass the host order value, not the disk order value. This will adversely affect the next inode allocated, but given that the free inode btree is usually much smaller than the allocated inode btree it is much less likely to be a performance issue if we start the search in the wrong place. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Rework the transaction lookup and allocation code in xlog_recovery_process_ophdr() to fold two related call-once helper functions into a single helper. Then fold in all the XLOG_START_TRANS logic to that helper to clean up the remaining logic in xlog_recovery_process_ophdr(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The code for managing transactions anf the items for recovery is spread across 3 different locations in the file. Move them all together so that it is easy to read the code without needing to jump long distances in the file. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When an error occurs during buffer submission in xlog_recover_commit_trans(), we free the trans structure twice. Fix it by only freeing the structure in the caller regardless of the success or failure of the function. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
The XLOG_UNMOUNT_TRANS case skips the transaction, despite the fact an unmount record is always in a standalone transaction. Hence whenever we come across one of these we need to free the transaction structure associated with it as there is no commit record that follows it. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Clean up xlog_recover_process_data() structure in preparation for fixing the allocation and freeing context of the transaction being recovered. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 23 9月, 2014 7 次提交
-
-
由 Dave Chinner 提交于
On a sub-page sized filesystem, truncating a mapped region down leaves us in a world of hurt. We truncate the pagecache, zeroing the newly unused tail, then punch blocks out from under the page. If we then truncate the file back up immediately, we expose that unmapped hole to a dirty page mapped into the user application, and that's where it all goes wrong. In truncating the page cache, we avoid unmapping the tail page of the cache because it still contains valid data. The problem is that it also contains a hole after the truncate, but nobody told the mm subsystem that. Therefore, if the page is dirty before the truncate, we'll never get a .page_mkwrite callout after we extend the file and the application writes data into the hole on the page. Hence when we come to writing that region of the page, it has no blocks and no delayed allocation reservation and hence we toss the data away. This patch adds code to the truncate up case to solve it, by ensuring the partial page at the old EOF is always cleaned after we do any zeroing and move the EOF upwards. We can't actually serialise the page writeback and truncate against page faults (yes, that problem AGAIN) so this is really just a best effort and assumes it is extremely unlikely that someone is concurrently writing to the page at the EOF while extending the file. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Fix sparse warning introduced by commit 4ef897a2 ("xfs: flush both inodes in xfs_swap_extents"). Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Fix sparse warning introduced by commit ac8809f9 ("xfs: abort metadata writeback on permanent errors"). Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Fengguang Wu 提交于
Fix sparse warning introduced by commit afabfd30 ("xfs: combine xfs_rtmodify_summary and xfs_rtget_summary"). Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Fabian Frederick 提交于
xfs_quota.h was included twice. Signed-off-by: NFabian Frederick <fabf@skynet.be> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
xfs_dir3_data_get_ftype() gets the file type off disk, but ASSERTs if it's invalid: ASSERT(type < XFS_DIR3_FT_MAX); We shouldn't ASSERT on bad values read from disk. V3 dirs are CRC-protected, but V2 dirs + ftype are not. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
When running a tight mount/unmount loop on an older kernel, RedHat QE found that unmount would occasionally hang in xfs_buf_unpin_wait() on the superblock buffer. Tracing and other debug work by Eric Sandeen indicated that it was hanging on the writing of the superblock during unmount immediately after logging the superblock counters in a synchronous transaction. Further debug indicated that the synchronous transaction was not waiting for completion correctly, and we narrowed it down to xlog_cil_force_lsn() returning NULLCOMMITLSN and hence not pushing the transaction in the iclog buffer to disk correctly. While this unmount superblock write code is now very different in mainline kernels, the xlog_cil_force_lsn() code is identical, and it was bisected to the backport of commit f876e446 ("xfs: always do log forces via the workqueue"). This commit made the CIL push asynchronous for log forces and hence exposed a race condition that couldn't occur on a synchronous push. Essentially, the xlog_cil_force_lsn() relied implicitly on the fact that the sequence push would be complete by the time xlog_cil_push_now() returned, resulting in the context being pushed being in the committing list. When it was made asynchronous, it was recognised that there was a race condition in detecting whether an asynchronous push has started or not and code was added to handle it. Unfortunately, the fix was not quite right and left a race condition where it it would detect an empty CIL while a push was in progress before the context had been added to the committing list. This was incorrectly seen as a "nothing to do" condition and so would tell xfs_log_force_lsn() that there is nothing to wait for, and hence it would push the iclogbufs in memory. The fix is simple, but explaining the logic and the race condition is a lot more complex. The fix is to add the context to the committing list before we start emptying the CIL. This allows us to detect the difference between an empty "do nothing" push and a push that has not started by adding a discrete "emptying the CIL" state to avoid the transient, incorrect "empty" condition that the (unchanged) waiting code was seeing. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-