- 22 6月, 2012 5 次提交
-
-
由 Mark Tinguely 提交于
Rename the XFS log structure to xlog to help crash distinquish it from the other logs in Linux. Signed-off-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Ben Myers 提交于
Revert commit 1307bbd2, which uses the s_umount semaphore to provide exclusion between xfs_sync_worker and unmount, in favor of shutting down the sync worker before freeing the log in xfs_log_unmount. This is a cleaner way of resolving the race between xfs_sync_worker and unmount than using s_umount. Signed-off-by: NBen Myers <bpm@sgi.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
由 Jan Kara 提交于
Commit de1cbee4 which removed b_file_offset in favor of b_bn introduced a bug causing xfs_buf_allocate_memory() to overestimate the number of necessary pages. The problem is that xfs_buf_alloc() sets b_bn to -1 and thus effectively every buffer is straddling a page boundary which causes xfs_buf_allocate_memory() to allocate two pages and use vmalloc() for access which is unnecessary. Dave says xfs_buf_alloc() doesn't need to set b_bn to -1 anymore since the buffer is inserted into the cache only after being fully initialized now. So just make xfs_buf_alloc() fill in proper block number from the beginning. CC: David Chinner <dchinner@redhat.com> Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
When we fail to find an matching extent near the requested extent specification during a left-right distance search in xfs_alloc_ag_vextent_near, we fail to free the original cursor that we used to look up the XFS_BTNUM_CNT tree and hence leak it. Reported-by: NChris J Arges <chris.j.arges@canonical.com> Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Brian Foster 提交于
An inode in the AIL can be flush locked and marked stale if a cluster free transaction occurs at the right time. The inode item is then marked as flushing, which causes xfsaild to spin and leaves the filesystem stalled. This is reproduced by running xfstests 273 in a loop for an extended period of time. Check for stale inodes before the flush lock. This marks the inode as pinned, leads to a log flush and allows the filesystem to proceed. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 21 6月, 2012 2 次提交
-
-
由 Jeff Liu 提交于
Fengguang reports: [ 780.529603] XFS (vdd): Ending clean mount [ 781.454590] ODEBUG: object is on stack, but not annotated [ 781.455433] ------------[ cut here ]------------ [ 781.455433] WARNING: at /c/kernel-tests/sound/lib/debugobjects.c:301 __debug_object_init+0x173/0x1f1() [ 781.455433] Hardware name: Bochs [ 781.455433] Modules linked in: [ 781.455433] Pid: 26910, comm: kworker/0:2 Not tainted 3.4.0+ #51 [ 781.455433] Call Trace: [ 781.455433] [<ffffffff8106bc84>] warn_slowpath_common+0x83/0x9b [ 781.455433] [<ffffffff8106bcb6>] warn_slowpath_null+0x1a/0x1c [ 781.455433] [<ffffffff814919a5>] __debug_object_init+0x173/0x1f1 [ 781.455433] [<ffffffff81491c65>] debug_object_init+0x14/0x16 [ 781.455433] [<ffffffff8108842a>] __init_work+0x20/0x22 [ 781.455433] [<ffffffff8134ea56>] xfs_alloc_vextent+0x6c/0xd5 Use INIT_WORK_ONSTACK in xfs_alloc_vextent instead of INIT_WORK. Reported-by: NWu Fengguang <wfg@linux.intel.com> Signed-off-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Alain Renaud 提交于
On filesytems with a block size smaller than PAGE_SIZE we currently have a problem with unwritten extents. If a we have multi-block page for which an unwritten extent has been allocated, and only some of the buffers have been written to, and they are not contiguous, we can expose stale data from disk in the blocks between the writes after extent conversion. Example of a page with unwritten and real data. buffer content 0 empty b_state = 0 1 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten 2 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten 3 empty b_state = 0 4 empty b_state = 0 5 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten 6 DATA b_state = 0x1023 Uptodate,Dirty,Mapped,Unwritten 7 empty b_state = 0 Buffers 1, 2, 5, and 6 have been written to, leaving 0, 3, 4, and 7 empty. Currently buffers 1, 2, 5, and 6 are added to a single ioend, and when IO has completed, extent conversion creates a real extent from block 1 through block 6, leaving 0 and 7 unwritten. However buffers 3 and 4 were not written to disk, so stale data is exposed from those blocks on a subsequent read. Fix this by setting iomap_valid = 0 when we find a buffer that is not Uptodate. This ensures that buffers 5 and 6 are not added to the same ioend as buffers 1 and 2. Later these blocks will be converted into two separate real extents, leaving the blocks in between unwritten. Signed-off-by: NAlain Renaud <arenaud@sgi.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 02 6月, 2012 1 次提交
-
-
由 Josef Bacik 提交于
Btrfs has to make sure we have space to allocate new blocks in order to modify the inode, so updating time can fail. We've gotten around this by having our own file_update_time but this is kind of a pain, and Christoph has indicated he would like to make xfs do something different with atime updates. So introduce ->update_time, where we will deal with i_version an a/m/c time updates and indicate which changes need to be made. The normal version just does what it has always done, updates the time and marks the inode dirty, and then filesystems can choose to do something different. I've gone through all of the users of file_update_time and made them check for errors with the exception of the fault code since it's complicated and I wasn't quite sure what to do there, also Jan is going to be pushing the file time updates into page_mkwrite for those who have it so that should satisfy btrfs and make it not a big deal to check the file_update_time() return code in the generic fault path. Thanks, Signed-off-by: NJosef Bacik <josef@redhat.com>
-
- 30 5月, 2012 2 次提交
-
-
由 Al Viro 提交于
pass inode + parent's inode or NULL instead of dentry + bool saying whether we want the parent or not. NOTE: that needs ceph fix folded in. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 21 5月, 2012 3 次提交
-
-
由 Dave Chinner 提交于
To enable easy tracing of the location of log forces and the frequency of them via perf, add a pair of trace points to the log force functions. This will help debug where excessive log forces are being issued from by simple perf commands like: # ~/perf/perf top -e xfs:xfs_log_force -G -U Which gives this sort of output: Events: 141 xfs:xfs_log_force - 100.00% [kernel] [k] xfs_log_force - xfs_log_force 87.04% xfsaild kthread kernel_thread_helper - 12.87% xfs_buf_lock _xfs_buf_find xfs_buf_get xfs_trans_get_buf xfs_da_do_buf xfs_da_get_buf xfs_dir2_data_init xfs_dir2_leaf_addname xfs_dir_createname xfs_create xfs_vn_mknod xfs_vn_create vfs_create do_last.isra.41 path_openat do_filp_open do_sys_open sys_open system_call_fastpath Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sig.com>
-
由 Peter Watkins 提交于
Note xfs_iget can be called while holding a locked agi buffer. If it goes into memory reclaim then inode teardown may try to lock the same buffer. Prevent the deadlock by calling radix_tree_preload with GFP_NOFS. Signed-off-by: NPeter Watkins <treestem@gmail.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
xfstest 270 was causing quota reservations way beyond what was sane (ten to hundreds of TB) for a 4GB filesystem. There's a sign problem in the error handling path of xfs_bmapi_reserve_delalloc() because xfs_trans_unreserve_quota_nblks() simple negates the value passed - which doesn't work for an unsigned variable. This causes reservations of close to 2^32 block instead of removing a reservation of a handful of blocks. Fix the same problem in the other xfs_trans_unreserve_quota_nblks() callers where unsigned integer variables are used, too. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 16 5月, 2012 1 次提交
-
-
由 Ben Myers 提交于
xfs_sync_worker checks the MS_ACTIVE flag in s_flags to avoid doing work during mount and unmount. This flag can be cleared by unmount after the xfs_sync_worker checks it but before the work is completed. The has caused crashes in the completion handler for the dummy transaction commited by xfs_sync_worker: PID: 27544 TASK: ffff88013544e040 CPU: 3 COMMAND: "kworker/3:0" #0 [ffff88016fdff930] machine_kexec at ffffffff810244e9 #1 [ffff88016fdff9a0] crash_kexec at ffffffff8108d053 #2 [ffff88016fdffa70] oops_end at ffffffff813ad1b8 #3 [ffff88016fdffaa0] no_context at ffffffff8102bd48 #4 [ffff88016fdffaf0] __bad_area_nosemaphore at ffffffff8102c04d #5 [ffff88016fdffb40] bad_area_nosemaphore at ffffffff8102c12e #6 [ffff88016fdffb50] do_page_fault at ffffffff813afaee #7 [ffff88016fdffc60] page_fault at ffffffff813ac635 [exception RIP: xlog_get_lowest_lsn+0x30] RIP: ffffffffa04a9910 RSP: ffff88016fdffd10 RFLAGS: 00010246 RAX: ffffc90014e48000 RBX: ffff88014d879980 RCX: ffff88014d879980 RDX: ffff8802214ee4c0 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff88016fdffd10 R8: ffff88014d879a80 R9: 0000000000000000 R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802214ee400 R13: ffff88014d879980 R14: 0000000000000000 R15: ffff88022fd96605 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #8 [ffff88016fdffd18] xlog_state_do_callback at ffffffffa04aa186 [xfs] #9 [ffff88016fdffd98] xlog_state_done_syncing at ffffffffa04aa568 [xfs] Protect xfs_sync_worker by using the s_umount semaphore at the read level to provide exclusion with unmount while work is progressing. Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
- 15 5月, 2012 26 次提交
-
-
由 Jeff Liu 提交于
This patch adds lseek(2) SEEK_DATA/SEEK_HOLE functionality to xfs. Signed-off-by: NJie Liu <jeff.liu@oracle.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Ben Myers 提交于
Commit e459df5, 'xfs: move busy extent handling to it's own file' moved some code from xfs_alloc.c into xfs_extent_busy.c for convenience in userspace code merges. One of the functions moved is xfs_extent_busy_trim (formerly xfs_alloc_busy_trim) which is defined STATIC. Unfortunately this function is still used in xfs_alloc.c, and this results in an undefined symbol in xfs.ko. Make xfs_extent_busy_trim not static and add its prototype to xfs_extent_busy.h. Signed-off-by: NBen Myers <bpm@sgi.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com>
-
由 Dave Chinner 提交于
Rather than specifying XBF_MAPPED for almost all buffers, introduce XBF_UNMAPPED for the couple of users that use unmapped buffers. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
When we fail to mount the log in xfs_mountfs(), we tear down all the infrastructure we have already allocated. However, the process of mounting the log may have progressed to the point of reading, caching and modifying buffers in memory. Hence before we can free all the infrastructure, we have to flush and remove all the buffers from memory. Problem first reported by Eric Sandeen, later a different incarnation was reported by Ben Myers. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Recent event tracing during a debugging session showed that flags that define the IO type for a buffer are leaking into the flags on the buffer incorrectly. Fix the flag exclusion mask in xfs_buf_alloc() to avoid problems that may be caused by such leakage. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
With the removal of xfs_rw.h and other changes over time, xfs_bit.h is being included in many files that don't actually need it. Clean up the includes as necessary. Also move the only-used-once xfs_ialloc_find_free() static inline function out of a header file that is widely included to reduce the number of needless dependencies on xfs_bit.h. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
xfs_do_force_shutdown now is the only thing in xfs_rw.c. There is no need to keep it in it's own file anymore, so move it to xfs_fsops.c next to xfs_fs_goingdown() and kill xfs_rw.c. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The only thing left in xfs_rw.h is a function prototype for an inode function. Move that to xfs_inode.h, and kill xfs_rw.h. Also move the function implementing the prototype from xfs_rw.c to xfs_inode.c so we only have one function left in xfs_rw.c Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
This is the only remaining useful function in xfs_rw.h, so move it to a header file responsible for block mapping functions that the callers already include. Soon we can get rid of xfs_rw.h. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Now that the busy extent tracking has been moved out of the allocation files, clean up the namespace it uses to "xfs_extent_busy" rather than a mix of "xfs_busy" and "xfs_alloc_busy". Signed-off-by: Dave Chinner<dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
To make it easier to handle userspace code merges, move all the busy extent handling out of the allocation code and into it's own file. The userspace code does not need the busy extent code, so this simplifies the merging of the kernel code into the userspace xfsprogs library. Because the busy extent code has been almost completely rewritten over the past couple of years, also update the copyright on this new file to include the authors that made all those changes. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Untangle the header file includes a bit by moving the definition of xfs_agino_t to xfs_types.h. This removes the dependency that xfs_ag.h has on xfs_inum.h, meaning we don't need to include xfs_inum.h everywhere we include xfs_ag.h. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
fsstress has a particular effective way of stopping debug XFS kernels. We keep seeing assert failures due finding delayed allocation extents where there should be none. This shows up when extracting extent maps and we are holding all the locks we should be to prevent races, so this really makes no sense to see these errors. After checking that fsstress does not use mmap, it occurred to me that fsstress uses something that no sane application uses - the XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces do allocation of blocks beyond EOF without using preallocation, and then call setattr to extend and zero the allocated blocks. THe problem here is this is a buffered write, and hence the allocation is a delayed allocation. Unlike the buffered IO path, the allocation and zeroing are not serialised using the IOLOCK. Hence the ALLOCSP operation can race with operations holding the iolock to prevent buffered IO operations from occurring. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK. This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to avoid recursion through memory reclaim back into the filesystem. All the blocking get calls in growfs occur inside a transaction, even though they are no part of the transaction, so all allocation will be GFP_NOFS due to the task flag PF_TRANS being set. The blocking read calls occur during log recovery, so they will probably be unaffected by converting to GFP_NOFS allocations. Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
xfs_read_buf() is effectively the same as xfs_trans_read_buf() when called outside a transaction context. The error handling is slightly different in that xfs_read_buf stales the errored buffer it gets back, but there is probably good reason for xfs_trans_read_buf() for doing this. Hence update xfs_trans_read_buf() to the same error handling as xfs_read_buf(), and convert all the callers of xfs_read_buf() to use the former function. We can then remove xfs_read_buf(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Buffers are always returned locked from the lookup routines. Hence we don't need to tell the lookup routines to return locked buffers, on to try and lock them. Remove XBF_LOCK from all the callers and from internal buffer cache usage. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
xfs_buf_btoc and friends are simple macros that do basic block to page index conversion and vice versa. These aren't widely used, and we use open coded masking and shifting everywhere else. Hence remove the macros and open code the work they do. Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now incorrect - we are using pages directly and not the page cache, so use PAGE_{SIZE|MASK|SHIFT} instead. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Now that we pass block counts everywhere, and index buffers by block number and length in units of blocks, convert the desired IO size into block counts rather than bytes. Convert the code to use block counts, and those that need byte counts get converted at the time of use. Rename the b_desired_count variable to something closer to it's purpose - b_io_length - as it is only used to specify the length of an IO for a subset of the buffer. The only time this is used is for log IO - both writing iclogs and during log recovery. In all other cases, the b_io_length matches b_length, and hence a lot of code confuses the two. e.g. the buf item code uses the io count exclusively when it should be using the buffer length. Fix these apprpriately as they are found. Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers around the desired IO length. They only serve to make the code shouty loud, don't actually add any real value, and are often used incorrectly. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Now that we pass block counts everywhere, and index buffers by block number, track the length of the buffer in units of blocks rather than bytes. Convert the code to use block counts, and those that need byte counts get converted at the time of use. Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers around the buffer length. They only serve to make the code shouty loud and don't actually add any real value. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Seeing as we pass block numbers around everywhere in the buffer cache now, it makes no sense to index everything by byte offset. Replace all the byte offset indexing with block number based indexing, and replace all uses of the byte offset with direct conversion from the block index. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
The xfs_buf_get/read API is not consistent in the units it uses, and does not use appropriate or consistent units/types for the variables. Convert the API to use disk addresses and block counts for all buffer get and read calls. Use consistent naming for all the functions and their declarations, and convert the internal functions to use disk addresses and block counts to avoid need to convert them from one type to another and back again. Fix all the callers to use disk addresses and block counts. In many cases, this removes an additional conversion from the function call as the callers already have a block count. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
To replace the alloc/memset pair. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
Because we no longer use the page cache for buffering, there is no direct block number to page offset relationship anymore. xfs_buf_get_pages is still setting up b_offset as if there was some relationship, and that is leading to incorrectly setting up *uncached* buffers that don't overwrite b_offset once they've had pages allocated. For cached buffers, the first block of the buffer is always at offset zero into the allocated memory. This is true for sub-page sized buffers, as well as for multiple-page buffers. For uncached buffers, b_offset is only non-zero when we are associating specific memory to the buffers, and that is set correctly by the code setting up the buffer. Hence remove the setting of b_offset in xfs_buf_get_pages, because it is now always the wrong thing to do. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
If we call xfs_buf_iowait() on a buffer that failed dispatch due to an IO error, it will wait forever for an Io that does not exist. This is hndled in xfs_buf_read, but there is other code that calls xfs_buf_iowait directly that doesn't. Rather than make the call sites have to handle checking for dispatch errors and then checking for completion errors, make xfs_buf_iowait() check for dispatch errors on the buffer before waiting. This means we handle both dispatch and completion errors with one set of error handling at the caller sites. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
When memory allocation fails to add the page array or tht epages to a buffer during xfs_buf_get(), the buffer is left in the cache in a partially initialised state. There is enough state left for the next lookup on that buffer to find the buffer, and for the buffer to then be used without finishing the initialisation. As a result, when an attempt to do IO on the buffer occurs, it fails with EIO because there are no pages attached to the buffer. We cannot remove the buffer from the cache immediately and free it, because there may already be a racing lookup that is blocked on the buffer lock. Hence the moment we unlock the buffer to then free it, the other user is woken and we have a use-after-free situation. To avoid this race condition altogether, allocate the pages for the buffer before we insert it into the cache. This then means that we don't have an allocation failure case to deal after the buffer is already present in the cache, and hence avoid the problem altogether. In most cases we won't have racing inserts for the same buffer, and so won't increase the memory pressure allocation before insertion may entail. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-
由 Dave Chinner 提交于
xfstest 229 exposes a problem with buffered IO, delayed allocation and extent size hints. That is when we do delayed allocation during buffered IO, we reserve space for the extent size hint alignment and allocate the physical space to align the extent, but we do not zero the regions of the extent that aren't written by the write(2) syscall. The result is that we expose stale data in unwritten regions of the extent size hints. There are two ways to fix this. The first is to detect that we are doing unaligned writes, check if there is already a mapping or data over the extent size hint range, and if not zero the page cache first before then doing the real write. This can be very expensive for large extent size hints, especially if the subsequent writes fill then entire extent size before the data is written to disk. The second, and simpler way, is simply to turn off delayed allocation when the extent size hint is set and use preallocation instead. This results in unwritten extents being laid down on disk and so only the written portions will be converted. This matches the behaviour for direct IO, and will also work for the real time device. The disadvantage of this approach is that for small extent size hints we can get file fragmentation, but in general extent size hints are fairly large (e.g. stripe width sized) so this isn't a big deal. Implement the second approach as it is simple and effective. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NMark Tinguely <tinguely@sgi.com> Signed-off-by: NBen Myers <bpm@sgi.com>
-