- 16 4月, 2015 2 次提交
-
-
由 Dave Chinner 提交于
When we are doing AIO DIO writes, the IOLOCK only provides an IO submission barrier. When we need to do EOF zeroing, we need to ensure that no other IO is in progress and all pending in-core EOF updates have been completed. This requires us to wait for all outstanding AIO DIO writes to the inode to complete and, if necessary, run their EOF updates. Once all the EOF updates are complete, we can then restart xfs_file_aio_write_checks() while holding the IOLOCK_EXCL, knowing that EOF is up to date and we have exclusive IO access to the file so we can run EOF block zeroing if we need to without interference. This gives EOF zeroing the same exclusivity against other IO as we provide truncate operations. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
xfs_end_io_direct_write() can race with other IO completions when updating the in-core inode size. The IO completion processing is not serialised for direct IO - they are done either under the IOLOCK_SHARED for non-AIO DIO, and without any IOLOCK held at all during AIO DIO completion. Hence the non-atomic test-and-set update of the in-core inode size is racy and can result in the in-core inode size going backwards if the race if hit just right. If the inode size goes backwards, this can trigger the EOF zeroing code to run incorrectly on the next IO, which then will zero data that has successfully been written to disk by a previous DIO. To fix this bug, we need to serialise the test/set updates of the in-core inode size. This first patch introduces locking around the relevant updates and checks in the DIO path. Because we now have an ioend in xfs_end_io_direct_write(), we know exactly then we are doing an IO that requires an in-core EOF update, and we know that they are not running in interrupt context. As such, we do not need to use irqsave() spinlock variants to protect against interrupts while the lock is held. Hence we can use an existing spinlock in the inode to do this serialisation and so not need to grow the struct xfs_inode just to work around this problem. This patch does not address the test/set EOF update in generic_file_write_direct() for various reasons - that will be done as a followup with separate explanation. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 16 2月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
Recall all outstanding pNFS layouts and truncates, writes and similar extent list modifying operations. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 11 2月, 2015 1 次提交
-
-
由 Kirill A. Shutemov 提交于
Nobody uses it anymore. [akpm@linux-foundation.org: fix filemap_xip.c] Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 2月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
This logic is duplicated in xfs_file_fallocate and xfs_ioc_space, and we'll need another copy of it for pNFS block support. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 21 1月, 2015 1 次提交
-
-
由 Christoph Hellwig 提交于
Now that we got rid of the bdi abuse on character devices we can always use sb->s_bdi to get at the backing_dev_info for a file, except for the block device special case. Export inode_to_bdi and replace uses of mapping->backing_dev_info with it to prepare for the removal of mapping->backing_dev_info. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NTejun Heo <tj@kernel.org> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NJens Axboe <axboe@fb.com>
-
- 01 12月, 2014 1 次提交
-
-
由 kbuild test robot 提交于
fs/xfs/xfs_file.c:919:1-6: WARNING: end returns can be simpified and declaration on line 902 can be dropped Simplify a trivial if-return sequence. Possibly combine with a preceding function call. Generated by: scripts/coccinelle/misc/simple_return.cocci Signed-off-by: NFengguang Wu <fengguang.wu@intel.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 28 11月, 2014 3 次提交
-
-
由 Christoph Hellwig 提交于
More on-disk format consolidation. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
More on-disk format consolidation. A few declarations that weren't on-disk format related move into better suitable spots. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Christoph Hellwig 提交于
More consolidatation for the on-disk format defintions. Note that the XFS_IS_REALTIME_INODE moves to xfs_linux.h instead as it is not related to the on disk format, but depends on a CONFIG_ option. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 09 9月, 2014 2 次提交
-
-
由 Eric Sandeen 提交于
For some reason, the older commit: 965c8e59 lseek: the "whence" argument is called "whence" lseek: the "whence" argument is called "whence" But the kernel decided to call it "origin" instead. Fix most of the sites. left out xfs. So fix xfs. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Eric Sandeen 提交于
xfs_seek_hole & xfs_seek_data are remarkably similar; so much so that they can be combined, saving a fair bit of semi-complex code duplication. The following patch passes generic/285 and generic/286, which specifically test seek behavior. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 02 9月, 2014 3 次提交
-
-
由 Dave Chinner 提交于
Now we are not doing silly things with dirtying buffers beyond EOF and using invalidation correctly, we can finally reduce the ranges of writeback and invalidation used by direct IO to match that of the IO being issued. Bring the writeback and invalidation ranges back to match the generic direct IO code - this will greatly reduce the perturbation of cached data when direct IO and buffered IO are mixed, but still provide the same buffered vs direct IO coherency behaviour we currently have. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Similar to direct IO reads, direct IO writes are using truncate_pagecache_range to invalidate the page cache. This is incorrect due to the sub-block zeroing in the page cache that truncate_pagecache_range() triggers. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. cc: stable@vger.kernel.org Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Chris Mason 提交于
xfs is using truncate_pagecache_range to invalidate the page cache during DIO reads. This is different from the other filesystems who only invalidate pages during DIO writes. truncate_pagecache_range is meant to be used when we are freeing the underlying data structs from disk, so it will zero any partial ranges in the page. This means a DIO read can zero out part of the page cache page, and it is possible the page will stay in cache. buffered reads will find an up to date page with zeros instead of the data actually on disk. This patch fixes things by using invalidate_inode_pages2_range instead. It preserves the page cache invalidation, but won't zero any pages. [dchinner: catch error and warn if it fails. Comment.] cc: stable@vger.kernel.org Signed-off-by: NChris Mason <clm@fb.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 04 8月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
Move the IO flag definitions to xfs_inode.h and kill the header file as it is now empty. Removing the xfs_vnode.h file showed up an implicit header include path: xfs_linux.h -> xfs_vnode.h -> xfs_fs.h And so every xfs header file has been inplicitly been including xfs_fs.h where it is needed or not. Hence the removal of xfs_vnode.h causes all sorts of build issues because BBTOB() and friends are no longer automatically included in the build. This also gets fixed. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 24 7月, 2014 1 次提交
-
-
由 Brian Foster 提交于
From: Brian Foster <bfoster@redhat.com> Speculative preallocation and and the associated throttling metrics assume we're working with large files on large filesystems. Users have reported inefficiencies in these mechanisms when we happen to be dealing with large files on smaller filesystems. This can occur because while prealloc throttling is aggressive under low free space conditions, it is not active until we reach 5% free space or less. For example, a 40GB filesystem has enough space for several files large enough to have multi-GB preallocations at any given time. If those files are slow growing, they might reserve preallocation for long periods of time as well as avoid the background scanner due to frequent modification. If a new file is written under these conditions, said file has no access to this already reserved space and premature ENOSPC is imminent. To handle this scenario, modify the buffered write ENOSPC handling and retry sequence to invoke an eofblocks scan. In the smaller filesystem scenario, the eofblocks scan resets the usage of preallocation such that when the 5% free space threshold is met, throttling effectively takes over to provide fair and efficient preallocation until legitimate ENOSPC. The eofblocks scan is selective based on the nature of the failure. For example, an EDQUOT failure in a particular quota will use a filtered scan for that quota. Because we don't know which quota might have caused an allocation failure at any given time, we include each applicable quota determined to be under low free space conditions in the scan. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 25 6月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
Convert all the errors the core XFs code to negative error signs like the rest of the kernel and remove all the sign conversion we do in the interface layers. Errors for conversion (and comparison) found via searches like: $ git grep " E" fs/xfs $ git grep "return E" fs/xfs $ git grep " E[A-Z].*;$" fs/xfs Negation points found via searches like: $ git grep "= -[a-z,A-Z]" fs/xfs $ git grep "return -[a-z,A-D,F-Z]" fs/xfs $ git grep " -[a-z].*;" fs/xfs [ with some bits I missed from Brian Foster ] Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 22 6月, 2014 1 次提交
-
-
由 Eric Sandeen 提交于
XFS_ERROR was designed long ago to trap return values, but it's not runtime configurable, it's not consistently used, and we can do similar error trapping with ftrace scripts and triggers from userspace. Just nuke XFS_ERROR and associated bits. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 12 6月, 2014 1 次提交
-
-
由 Al Viro 提交于
iter_file_splice_write() - a ->splice_write() instance that gathers the pipe buffers, builds a bio_vec-based iov_iter covering those and feeds it to ->write_iter(). A bunch of simple cases coverted to that... [AV: fixed the braino spotted by Cyrill] Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 15 5月, 2014 3 次提交
-
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
And it should be negative. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NJie Liu <jeff.liu@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 07 5月, 2014 7 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
pos is redundant (it's iocb->ki_pos), and iov/nr_segs/count are taken care of by lifting iov_iter into the caller. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Now It Can Be Done(tm) - we don't need to do iov_shorten() in generic_file_direct_write() anymore, now that all ->direct_IO() instances are converted to proper iov_iter methods and honour iter->count and iter->iov_offset properly. Get rid of count/ocount arguments of generic_file_direct_write(), while we are at it. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
For now, just use the same thing we pass to ->direct_IO() - it's all iovec-based at the moment. Pass it explicitly to iov_iter_init() and account for kvec vs. iovec in there, by the same kludge NFS ->direct_IO() uses. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
all callers of ->aio_read() and ->aio_write() have iov/nr_segs already checked - generic_segment_checks() done after that is just an odd way to spell iov_length(). Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 17 4月, 2014 1 次提交
-
-
由 Dave Chinner 提交于
We negate the error value being returned from a generic function incorrectly. The code path that it is running in returned negative errors, so there is no need to negate it to get the correct error signs here. This was uncovered by generic/019. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 14 4月, 2014 1 次提交
-
-
由 Eric Sandeen 提交于
This one hits a few functions as we unravel the unused arg up through the callers. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 12 4月, 2014 1 次提交
-
-
由 Lukas Czerner 提交于
Currently in do_fallocate in collapse range case we're checking whether offset + len is not bigger than i_size. However there is nothing which would prevent i_size from changing so the check is pointless. It should be done in the file system itself and the file system needs to make sure that i_size is not going to change. The i_size check for the other fallocate modes are also done in the filesystems. As it is now we can easily crash the kernel by having two processes doing truncate and fallocate collapse range at the same time. This can be reproduced on ext4 and it is theoretically possible on xfs even though I was not able to trigger it with this simple test. This commit removes the check from do_fallocate and adds it to the file system. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu> Acked-by: NDave Chinner <david@fromorbit.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 08 4月, 2014 1 次提交
-
-
由 Kirill A. Shutemov 提交于
filemap_map_pages() is generic implementation of ->map_pages() for filesystems who uses page cache. It should be safe to use filemap_map_pages() for ->map_pages() if filesystem use filemap_fault() for ->fault(). Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Dave Chinner <david@fromorbit.com> Cc: Ning Qu <quning@gmail.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 4月, 2014 3 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
always equal to &iocb->ki_pos. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
same story - it's &iocb->ki_pos in all cases Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 13 3月, 2014 1 次提交
-
-
由 Lukas Czerner 提交于
Introduce new FALLOC_FL_ZERO_RANGE flag for fallocate. This has the same functionality as xfs ioctl XFS_IOC_ZERO_RANGE. We can also preallocate blocks past EOF in the same was as with fallocate. Flag FALLOC_FL_KEEP_SIZE will cause the inode size to remain the same even if we preallocate blocks past EOF. It uses the same code to zero range as it is used by the XFS_IOC_ZERO_RANGE ioctl. Signed-off-by: NLukas Czerner <lczerner@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 24 2月, 2014 1 次提交
-
-
由 Namjae Jeon 提交于
This patch implements fallocate's FALLOC_FL_COLLAPSE_RANGE for XFS. The semantics of this flag are following: 1) It collapses the range lying between offset and length by removing any data blocks which are present in this range and than updates all the logical offsets of extents beyond "offset + len" to nullify the hole created by removing blocks. In short, it does not leave a hole. 2) It should be used exclusively. No other fallocate flag in combination. 3) Offset and length supplied to fallocate should be fs block size aligned in case of xfs and ext4. 4) Collaspe range does not work beyond i_size. Signed-off-by: NNamjae Jeon <namjae.jeon@samsung.com> Signed-off-by: NAshish Sangwan <a.sangwan@samsung.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 10 2月, 2014 1 次提交
-
-
由 Al Viro 提交于
It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support) when sync_page_range() had been introduced; generic_file_write{,v}() correctly synced pos_after_write - written .. pos_after_write - 1 but generic_file_aio_write() synced pos_before_write .. pos_before_write + written - 1 instead. Which is not the same thing with O_APPEND, obviously. A couple of years later correct variant had been killed off when everything switched to use of generic_file_aio_write(). All users of generic_file_aio_write() are affected, and the same bug has been copied into other instances of ->aio_write(). The fix is trivial; the only subtle point is that generic_write_sync() ought to be inlined to avoid calculations useless for the majority of calls. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-