- 23 11月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Replace the mappedbno argument with the simple flags for xfs_da_reada_buf and xfs_dir3_data_readahead. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 01 11月, 2019 1 次提交
-
-
由 Dave Chinner 提交于
AIO+DIO can extend the file size on IO completion, and it holds no inode locks while the IO is in flight. Therefore, a race condition exists in file size updates if we do something like this: aio-thread fallocate-thread lock inode submit IO beyond inode->i_size unlock inode ..... lock inode break layouts if (off + len > inode->i_size) new_size = off + len ..... inode_dio_wait() <blocks> ..... completes inode->i_size updated inode_dio_done() .... <wakes> <does stuff no long beyond EOF> if (new_size) xfs_vn_setattr(inode, new_size) Yup, that attempt to extend the file size in the fallocate code turns into a truncate - it removes the whatever the aio write allocated and put to disk, and reduced the inode size back down to where the fallocate operation ends. Fundamentally, xfs_file_fallocate() not compatible with racing AIO+DIO completions, so we need to move the inode_dio_wait() call up to where the lock the inode and break the layouts. Secondly, storing the inode size and then using it unchecked without holding the ILOCK is not safe; we can only do such a thing if we've locked out and drained all IO and other modification operations, which we don't do initially in xfs_file_fallocate. It should be noted that some of the fallocate operations are compound operations - they are made up of multiple manipulations that may zero data, and so we may need to flush and invalidate the file multiple times during an operation. However, we only need to lock out IO and other space manipulation operations once, as that lockout is maintained until the entire fallocate operation has been completed. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 29 10月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Remove xfs_zero_file_space and reorganize xfs_file_fallocate so that a single call to xfs_alloc_file_space covers all modes that preallocate blocks. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 28 10月, 2019 2 次提交
-
-
由 Christoph Hellwig 提交于
Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Add a new xfs_inode_buftarg helper that gets the data I/O buftarg for a given inode. Replace the existing xfs_find_bdev_for_inode and xfs_find_daxdev_for_inode helpers with this new general one and cleanup some of the callers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 22 10月, 2019 2 次提交
-
-
由 Christoph Hellwig 提交于
Instead of lots of magic conditionals in the main write_begin handler this make the intent very clear. Thing will become even better once we support delayed allocations for extent size hints and realtime allocations. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Start untangling xfs_file_iomap_begin by splitting out the read-only case into its own set of iomap_ops with a very simply iomap_begin helper. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 15 10月, 2019 2 次提交
-
-
由 Jan Kara 提交于
Use iomap_dio_rw() to wait for unaligned direct IO instead of opencoding the wait. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Jan Kara 提交于
Filesystems do not support doing IO as asynchronous in some cases. For example in case of unaligned writes or in case file size needs to be extended (e.g. for ext4). Instead of forcing filesystem to wait for AIO in such cases, add argument to iomap_dio_rw() which makes the function wait for IO completion. This also results in executing iomap_dio_complete() inline in iomap_dio_rw() providing its return value to the caller as for ordinary sync IO. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 20 9月, 2019 2 次提交
-
-
由 Christoph Hellwig 提交于
Add a new iomap_dio_ops structure that for now just contains the end_io handler. This avoid storing the function pointer in a mutable structure, which is a possible exploit vector for kernel code execution, and prepares for adding a submit_io handler that btrfs needs. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Matthew Bobrowski 提交于
Modify the calling convention for the iomap_dio_rw ->end_io() callback. Rather than passing either dio->error or dio->size as the 'size' argument, instead pass both the dio->error and the dio->size value separately. In the instance that an error occurred during a write, we currently cannot determine whether any blocks have been allocated beyond the current EOF and data has subsequently been written to these blocks within the ->end_io() callback. As a result, we cannot judge whether we should take the truncate failed write path. Having both dio->error and dio->size will allow us to perform such checks within this callback. Signed-off-by: NMatthew Bobrowski <mbobrowski@mbobrowski.org> [hch: minor cleanups] Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
-
- 31 8月, 2019 1 次提交
-
-
由 Jan Kara 提交于
Hole puching currently evicts pages from page cache and then goes on to remove blocks from the inode. This happens under both XFS_IOLOCK_EXCL and XFS_MMAPLOCK_EXCL which provides appropriate serialization with racing reads or page faults. However there is currently nothing that prevents readahead triggered by fadvise() or madvise() from racing with the hole punch and instantiating page cache page after hole punching has evicted page cache in xfs_flush_unmap_range() but before it has removed blocks from the inode. This page cache page will be mapping soon to be freed block and that can lead to returning stale data to userspace or even filesystem corruption. Fix the problem by protecting handling of readahead requests by XFS_IOLOCK_SHARED similarly as we protect reads. CC: stable@vger.kernel.org Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxjQNmxqmtA_VbYW0Su9rKRk2zobJmahcyeaEVOFKVQ5dw@mail.gmail.com/Reported-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 06 7月, 2019 1 次提交
-
-
由 Pankaj Gupta 提交于
Dont support 'MAP_SYNC' with non-DAX files and DAX files with asynchronous dax_device. Virtio pmem provides asynchronous host page cache flush mechanism. We don't support 'MAP_SYNC' with virtio pmem and xfs. Signed-off-by: NPankaj Gupta <pagupta@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 01 7月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Instead of a magic flag for xfs_trans_alloc, just ensure all callers that can't relclaim through the file system use memalloc_nofs_save to set the per-task nofs flag. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 29 6月, 2019 1 次提交
-
-
由 Eric Sandeen 提交于
There are many, many xfs header files which are included but unneeded (or included twice) in the xfs code, so remove them. nb: xfs_linux.h includes about 9 headers for everyone, so those explicit includes get removed by this. I'm not sure what the preference is, but if we wanted explicit includes everywhere, a followup patch could remove those xfs_*.h includes from xfs_linux.h and move them into the files that need them. Or it could be left as-is. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 10 6月, 2019 1 次提交
-
-
由 Amir Goldstein 提交于
Note that by using the helper, the order of calling file_remove_privs() after file_update_mtime() in xfs_file_aio_write_checks() has changed. Signed-off-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 23 4月, 2019 1 次提交
-
-
由 Darrick J. Wong 提交于
Dave Chinner noticed that xfs_file_dio_aio_write returns EAGAIN without dropping the IOLOCK when its deciding not to wait, which means that we leak the IOLOCK there. Since we now make unaligned directio always wait, we have the opportunity to bail out before trying to take the lock, which should reduce the overhead of this never-gonna-work case considerably while also solving the dropped lock problem. Reported-by: NDave Chinner <david@fromorbit.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 26 3月, 2019 1 次提交
-
-
由 Brian Foster 提交于
XFS applies more strict serialization constraints to unaligned direct writes to accommodate things like direct I/O layer zeroing, unwritten extent conversion, etc. Unaligned submissions acquire the exclusive iolock and wait for in-flight dio to complete to ensure multiple submissions do not race on the same block and cause data corruption. This generally works in the case of an aligned dio followed by an unaligned dio, but the serialization is lost if I/Os occur in the opposite order. If an unaligned write is submitted first and immediately followed by an overlapping, aligned write, the latter submits without the typical unaligned serialization barriers because there is no indication of an unaligned dio still in-flight. This can lead to unpredictable results. To provide proper unaligned dio serialization, require that such direct writes are always the only dio allowed in-flight at one time for a particular inode. We already acquire the exclusive iolock and drain pending dio before submitting the unaligned dio. Wait once more after the dio submission to hold the iolock across the I/O and prevent further submissions until the unaligned I/O completes. This is heavy handed, but consistent with the current pre-submission serialization for unaligned direct writes. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 24 2月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Store the request queue the last bio was submitted to in the iocb private data in addition to the cookie so that we find the right block device. Also refactor the common direct I/O bio submission code into a nice little helper. Signed-off-by: NChristoph Hellwig <hch@lst.de> Modified to use bio_set_polled(). Reviewed-by: NHannes Reinecke <hare@suse.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 21 2月, 2019 2 次提交
-
-
由 Christoph Hellwig 提交于
Add a mode where XFS never overwrites existing blocks in place. This is to aid debugging our COW code, and also put infatructure in place for things like possible future support for zoned block devices, which can't support overwrites. This mode is enabled globally by doing a: echo 1 > /sys/fs/xfs/debug/always_cow Note that the parameter is global to allow running all tests in xfstests easily in this mode, which would not easily be possible with a per-fs sysfs file. In always_cow mode persistent preallocations are disabled, and fallocate will fail when called with a 0 mode (with our without FALLOC_FL_KEEP_SIZE), and not create unwritten extent for zeroed space when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE. There are a few interesting xfstests failures when run in always_cow mode: - generic/392 fails because the bytes used in the file used to test hole punch recovery are less after the log replay. This is because the blocks written and then punched out are only freed with a delay due to the logging mechanism. - xfs/170 will fail as the already fragile file streams mechanism doesn't seem to interact well with the COW allocator - xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim the file system is badly fragmented, but there is not much we can do to avoid that when always writing out of place - xfs/205 fails because overwriting a file in always_cow mode will require new space allocation and the assumption in the test thus don't work anymore. - xfs/326 fails to modify the file at all in always_cow mode after injecting the refcount error, leading to an unexpected md5sum after the remount, but that again is expected Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
We speculatively allocate extents in the COW fork to reduce fragmentation. But when we write data into such COW fork blocks that do now shadow an allocation in the data fork SEEK_DATA will not correctly report it, as it only looks at the data fork extents. The only reason why that hasn't been an issue so far is because we even use these speculative COW fork preallocations over holes in the data fork at all for buffered writes, and blocks in the COW fork that are written by direct writes are moved into the data fork immediately at I/O completion time. Add a new set of iomap_ops for SEEK_HOLE/SEEK_DATA which looks into both the COW and data fork, and reports all COW extents as unwritten to the iomap layer. While this isn't strictly true for COW fork extents that were already converted to real extents, the practical semantics that you can't read data from them until they are moved into the data fork are very similar, and this will force the iomap layer into probing the extents for actually present data. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 20 11月, 2018 1 次提交
-
-
由 Eric Biggers 提交于
xfs_file_remap_range() is only used in fs/xfs/xfs_file.c, so make it static. This addresses a gcc warning when -Wmissing-prototypes is enabled. Signed-off-by: NEric Biggers <ebiggers@google.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 30 10月, 2018 5 次提交
-
-
由 Darrick J. Wong 提交于
Since xfs_file_remap_range is a thin wrapper, move the contents of xfs_reflink_remap_range into the shell. This cuts down on the vfs calls being made from internal xfs code. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Back when the XFS reflink code only supported clone_file_range, we were only able to return zero or negative error codes to userspace. However, now that copy_file_range (which returns bytes copied) can use XFS' clone_file_range, we have the opportunity to return partial results. For example, if userspace sends a 1GB clone request and we run out of space halfway through, we at least can tell userspace that we completed 512M of that request like a regular write. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Change the remap_file_range functions to take a number of bytes to operate upon and return the number of bytes they operated on. This is a requirement for allowing fs implementations to return short clone/dedupe results to the user, which will enable us to obey resource limits in a graceful manner. A subsequent patch will enable copy_file_range to signal to the ->clone_file_range implementation that it can handle a short length, which will be returned in the function's return value. For now the short return is not implemented anywhere so the behavior won't change -- either copy_file_range manages to clone the entire range or it tries an alternative. Neither clone ioctl can take advantage of this, alas. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Plumb the remap flags through the filesystem from the vfs function dispatcher all the way to the prep function to prepare for behavior changes in subsequent patches. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Darrick J. Wong 提交于
Combine the clone_file_range and dedupe_file_range operations into a single remap_file_range file operation dispatch since they're fundamentally the same operation. The differences between the two can be made in the prep functions. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NAmir Goldstein <amir73il@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 18 8月, 2018 1 次提交
-
-
由 Dave Jiang 提交于
This patch is reworked from an earlier patch that Dan has posted: https://patchwork.kernel.org/patch/10131727/ VM_MIXEDMAP is used by dax to direct mm paths like vm_normal_page() that the memory page it is dealing with is not typical memory from the linear map. The get_user_pages_fast() path, since it does not resolve the vma, is already using {pte,pmd}_devmap() as a stand-in for VM_MIXEDMAP, so we use that as a VM_MIXEDMAP replacement in some locations. In the cases where there is no pte to consult we fallback to using vma_is_dax() to detect the VM_MIXEDMAP special case. Now that we have explicit driver pfn_t-flag opt-in/opt-out for get_user_pages() support for DAX we can stop setting VM_MIXEDMAP. This also means we no longer need to worry about safely manipulating vm_flags in a future where we support dynamically changing the dax mode of a file. DAX should also now be supported with madvise_behavior(), vma_merge(), and copy_page_range(). This patch has been tested against ndctl unit test. It has also been tested against xfstests commit: 625515d using fake pmem created by memmap and no additional issues have been observed. Link: http://lkml.kernel.org/r/152847720311.55924.16999195879201817653.stgit@djiang5-desk3.ch.intel.comSigned-off-by: NDave Jiang <dave.jiang@intel.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Cc: Jan Kara <jack@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 8月, 2018 1 次提交
-
-
由 Dave Jiang 提交于
This patch is the duplicate of ross's fix for ext4 for xfs. If the refcount of a page is lowered between the time that it is returned by dax_busy_page() and when the refcount is again checked in xfs_break_layouts() => ___wait_var_event(), the waiting function xfs_wait_dax_page() will never be called. This means that xfs_break_layouts() will still have 'retry' set to false, so we'll stop looping and never check the refcount of other pages in this inode. Instead, always continue looping as long as dax_layout_busy_page() gives us a page which it found with an elevated refcount. Signed-off-by: NDave Jiang <dave.jiang@intel.com> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 12 7月, 2018 1 次提交
-
-
由 Eric Sandeen 提交于
Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 07 7月, 2018 2 次提交
-
-
由 Miklos Szeredi 提交于
Clean up f_op->dedupe_file_range() interface. 1) Use loff_t for offsets and length instead of u64 2) Order the arguments the same way as {copy|clone}_file_range(). Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Miklos Szeredi 提交于
Signed-off-by: NMiklos Szeredi <mszeredi@redhat.com>
-
- 07 6月, 2018 1 次提交
-
-
由 Dave Chinner 提交于
Remove the verbose license text from XFS files and replace them with SPDX tags. This does not change the license of any of the code, merely refers to the common, up-to-date license files in LICENSES/ This change was mostly scripted. fs/xfs/Makefile and fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected and modified by the following command: for f in `git grep -l "GNU General" fs/xfs/` ; do echo $f cat $f | awk -f hdr.awk > $f.new mv -f $f.new $f done And the hdr.awk script that did the modification (including detecting the difference between GPL-2.0 and GPL-2.0+ licenses) is as follows: $ cat hdr.awk BEGIN { hdr = 1.0 tag = "GPL-2.0" str = "" } /^ \* This program is free software/ { hdr = 2.0; next } /any later version./ { tag = "GPL-2.0+" next } /^ \*\// { if (hdr > 0.0) { print "// SPDX-License-Identifier: " tag print str print $0 str="" hdr = 0.0 next } print $0 next } /^ \* / { if (hdr > 1.0) next if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 next } /^ \*/ { if (hdr > 0.0) next print $0 next } // { if (hdr > 0.0) { if (str != "") str = str "\n" str = str $0 next } print $0 } END { } $ Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 30 5月, 2018 1 次提交
-
-
由 Souptick Joarder 提交于
Use new return type vm_fault_t for fault handlers. Signed-off-by: NSouptick Joarder <jrdr.linux@gmail.com> Reviewed-by: NMatthew Wilcox <mawilcox@microsoft.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 22 5月, 2018 3 次提交
-
-
由 Dan Williams 提交于
xfs_break_dax_layouts(), similar to xfs_break_leased_layouts(), scans for busy / pinned dax pages and waits for those pages to go idle before any potential extent unmap operation. dax_layout_busy_page() handles synchronizing against new page-busy events (get_user_pages). It invalidates all mappings to trigger the get_user_pages slow path which will eventually block on the xfs inode lock held in XFS_MMAPLOCK_EXCL mode. If dax_layout_busy_page() finds a busy page it returns it for xfs to wait for the page-idle event that will fire when the page reference count reaches 1 (recall ZONE_DEVICE pages are idle at count 1, see generic_dax_pagefree()). While waiting, the XFS_MMAPLOCK_EXCL lock is dropped in order to not deadlock the process that might be trying to elevate the page count of more pages before arranging for any of them to go idle. I.e. the typical case of submitting I/O is that iov_iter_get_pages() elevates the reference count of all pages in the I/O before starting I/O on the first page. The process of elevating the reference count of all pages involved in an I/O may cause faults that need to take XFS_MMAPLOCK_EXCL. Although XFS_MMAPLOCK_EXCL is dropped while waiting, XFS_IOLOCK_EXCL is held while sleeping. We need this to prevent starvation of the truncate path as continuous submission of direct-I/O could starve the truncate path indefinitely if the lock is dropped. Cc: Dave Chinner <david@fromorbit.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: NJan Kara <jack@suse.cz> Reviewed-by: NJan Kara <jack@suse.cz> Reviewed-by: NChristoph Hellwig <hch@lst.de> Acked-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
When xfs is operating as the back-end of a pNFS block server, it prevents collisions between local and remote operations by requiring a lease to be held for remotely accessed blocks. Local filesystem operations break those leases before writing or mutating the extent map of the file. A similar mechanism is needed to prevent operations on pinned dax mappings, like device-DMA, from colliding with extent unmap operations. BREAK_WRITE and BREAK_UNMAP are introduced as two distinct levels of layout breaking. Layouts are broken in the BREAK_WRITE case to ensure that layout-holders do not collide with local writes. Additionally, layouts are broken in the BREAK_UNMAP case to make sure the layout-holder has a consistent view of the file's extent map. While BREAK_WRITE breaks can be satisfied be recalling FL_LAYOUT leases, BREAK_UNMAP breaks additionally require waiting for busy dax-pages to go idle while holding XFS_MMAPLOCK_EXCL. After this refactoring xfs_break_layouts() becomes the entry point for coordinating both types of breaks. Finally, xfs_break_leased_layouts() becomes just the BREAK_WRITE handler. Note that the unlock tracking is needed in a follow on change. That will coordinate retrying either break handler until both successfully test for a lease break while maintaining the lock state. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Reported-by: NDave Chinner <david@fromorbit.com> Reported-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Dan Williams 提交于
In preparation for adding coordination between extent unmap operations and busy dax-pages, update xfs_break_layouts() to permit it to be called with the mmap lock held. This lock scheme will be required for coordinating the break of 'dax layouts' (non-idle dax (ZONE_DEVICE) pages mapped into the file's address space). Breaking dax layouts will be added to xfs_break_layouts() in a future patch, for now this preps the unmap call sites to take and hold XFS_MMAPLOCK_EXCL over the call to xfs_break_layouts(). Cc: "Darrick J. Wong" <darrick.wong@oracle.com> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Dave Chinner <david@fromorbit.com> Suggested-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: N"Darrick J. Wong" <darrick.wong@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 10 5月, 2018 2 次提交
-
-
由 Dave Chinner 提交于
Currently iomap_dio_rw() only handles (data)sync write completions for AIO. This means we can't optimised non-AIO IO to minimise device flushes as we can't tell the caller whether a flush is required or not. To solve this problem and enable further optimisations, make iomap_dio_rw responsible for data sync behaviour for all IO, not just AIO. In doing so, the sync operation is now accounted as part of the DIO IO by inode_dio_end(), hence post-IO data stability updates will no long race against operations that serialise via inode_dio_wait() such as truncate or hole punch. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Dave Chinner 提交于
To prepare for iomap iinfrastructure based DSYNC optimisations. While moving the code araound, move the XFS write bytes metric update for direct IO into xfs_dio_write_end_io callback so that we always capture the amount of data written via AIO+DIO. This fixes the problem where queued AIO+DIO writes are not accounted to this metric. Signed-Off-By: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 03 5月, 2018 1 次提交
-
-
由 Darrick J. Wong 提交于
Since deduplication potentially has to read in all the pages in both files in order to compare the contents, cap the deduplication request length at MAX_RW_COUNT/2 (roughly 1GB) so that we have /some/ upper bound on the request length and can't just lock up the kernel forever. Found by running generic/304 after commit 1ddae54555b62 ("common/rc: add missing 'local' keywords"). Reported-by: matorola@gmail.com Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
-