- 16 Feb 2023, 1 commit

Committed by Christoph Hellwig
No users left now that btrfs takes REQ_OP_WRITE bios from iomap and splits and converts them to REQ_OP_ZONE_APPEND internally.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>

- 15 Nov 2022, 1 commit

Committed by Keith Busch
Don't transform the logical block size to a bit shift only to shift it back to the original block size. Just use the size.

Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

- 09 Aug 2022, 1 commit

Committed by Al Viro
Equivalent of a single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(); otherwise behaves like ITER_IOVEC ones. We are going to expose things like ->write_iter() et al. to those in subsequent commits. A new predicate (user_backed_iter()) is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use it to check whether pages we modify after getting them from iov_iter_get_pages() need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts, and for that the predicate replacement obviously won't suffice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
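A minimal sketch of the new iterator in use, assuming the contemporaneous READ/WRITE direction constants (the demo_* name is hypothetical):

    #include <linux/uio.h>

    /* Prime a single-buffer iterator for a write and test its backing. */
    static bool demo_ubuf_write_iter(void __user *buf, size_t len)
    {
            struct iov_iter iter;

            /* one-segment equivalent of an ITER_IOVEC iterator */
            iov_iter_ubuf(&iter, WRITE, buf, len);

            /* true for both ITER_UBUF and ITER_IOVEC */
            return user_backed_iter(&iter);
    }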

- 15 Jul 2022, 1 commit

Committed by Bart Van Assche
Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for the combination of a request operation and request flags.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-56-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
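A small sketch of the typing this enables (illustrative only; demo_* is hypothetical):

    #include <linux/blk_types.h>

    static blk_opf_t demo_write_opf(bool fua)
    {
            enum req_op op = REQ_OP_WRITE;     /* the operation alone */
            blk_opf_t opf = op | REQ_SYNC;     /* operation + flags */

            if (fua)
                    opf |= REQ_FUA;
            return opf;
    }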

- 27 Jun 2022, 1 commit

Committed by Keith Busch
Use the address alignment requirements from the block_device for direct io instead of requiring addresses be aligned to the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220610195830.3574005-12-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
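Roughly the shape of the relaxed check, as a sketch (bdev_iter_is_aligned() comes from the same series; demo_* is hypothetical):

    #include <linux/blkdev.h>
    #include <linux/uio.h>

    static bool demo_dio_aligned(struct block_device *bdev, loff_t pos,
                                 size_t len, struct iov_iter *iter)
    {
            /* file offsets still honor the logical block size ... */
            if ((pos | len) & (bdev_logical_block_size(bdev) - 1))
                    return false;
            /* ... but user addresses only need the device's DMA alignment */
            return bdev_iter_is_aligned(bdev, iter);
    }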

- 11 Jun 2022, 2 commits

Committed by Al Viro
New helper to be used instead of direct checks for IOCB_DSYNC: iocb_is_dsync(iocb). The checks are converted, which allows us to avoid the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines) in iocb_flags() - it's checked in iocb_is_dsync() instead.

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
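From the description above, the helper is essentially this shape (sketch):

    static bool iocb_is_dsync(const struct kiocb *iocb)
    {
            return (iocb->ki_flags & IOCB_DSYNC) ||
                    IS_SYNC(iocb->ki_filp->f_mapping->host);
    }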

Committed by Al Viro
New flag, equivalent to removal of IOCB_DSYNC from iocb flags. This mimics what btrfs is doing (and that's what btrfs will switch to). However, I'm not at all sure that we want to suppress REQ_FUA for those - all the btrfs hack really cares about is suppression of generic_write_sync(). For now let's keep the existing behaviour, but I really want to hear more detailed arguments pro or contra.

[folded brain fix from willy]

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

- 16 May 2022, 2 commits

Committed by Christoph Hellwig
Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Committed by Christoph Hellwig
Allow the file system to provide a specific bio_set for allocating direct I/O bios. This will allow file systems that use the ->submit_io hook to stash away additional information for file system use. To make use of this additional space for information in the completion path, the file system needs to override the ->bi_end_io callback and then call back into iomap, so export iomap_dio_bio_end_io for that.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
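A sketch of how a filesystem might use this; the demo_* names and pool size are hypothetical, and front_pad is the standard bio_set mechanism for reserving per-bio space:

    #include <linux/bio.h>
    #include <linux/iomap.h>

    struct demo_dio_priv { u64 csum; };     /* per-bio fs state */

    static struct bio_set demo_dio_bioset;

    static const struct iomap_dio_ops demo_dio_ops = {
            .bio_set = &demo_dio_bioset,
    };

    static int __init demo_init(void)
    {
            /* front_pad reserves space in front of each bio for fs use */
            return bioset_init(&demo_dio_bioset, BIO_POOL_SIZE,
                               sizeof(struct demo_dio_priv),
                               BIOSET_NEED_BVECS);
    }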

- 03 May 2022, 1 commit

Committed by Ming Lei
So far a bio is marked REQ_POLLED if RWF_HIPRI/IOCB_HIPRI is passed from the userspace sync io interface, and the block layer then tries to poll until the bio is completed. But the current implementation calls blk_io_schedule() if bio_poll() returns 0, and this easily causes io hangs or timeouts. Yet it looks like no one has reported this kind of issue, which should have been triggered in normal io poll sanity tests or blktests block/007 as observed by Changhui; that means it is very likely that no one uses it or no one cares about it. Also, after io_uring was invented, io poll for sync dio became a legacy interface. So ignore the RWF_HIPRI hint for sync dio.

CC: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Reported-by: Changhui Zhong <czhong@redhat.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220420143110.2679002-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 18 Apr 2022, 1 commit

Committed by Christoph Hellwig
Add a helper to check the FUA flag based on the block_device instead of having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
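One plausible shape for such a helper, assuming the queue-flag representation of FUA support (sketch, not necessarily the exact implementation):

    static inline bool bdev_fua(struct block_device *bdev)
    {
            return test_bit(QUEUE_FLAG_FUA,
                            &bdev_get_queue(bdev)->queue_flags);
    }

Callers can then write bdev_fua(iomap->bdev) instead of reaching into the request_queue themselves.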

- 08 Mar 2022, 1 commit

Committed by Christoph Hellwig
With the NVMe support for this gone, there are no consumers of these hints left, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 09 Feb 2022, 1 commit

Committed by Eric Biggers
Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. Add support for this to the iomap DIO implementation by calling fscrypt_set_bio_crypt_ctx() to set encryption contexts on the bios. Don't check for the rare case where a DUN (crypto data unit number) discontiguity creates a boundary that bios must not cross. Instead, filesystems are expected to handle this in ->iomap_begin() by limiting the length of the mapping so that iomap doesn't have to worry about it.

Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220128233940.79464-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
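A sketch of the pattern: the logical-block calculation is illustrative, since filesystems define the actual data-unit numbering, and demo_* is hypothetical:

    #include <linux/fscrypt.h>

    static void demo_prepare_dio_bio(struct bio *bio, struct inode *inode,
                                     loff_t pos)
    {
            /* no-op unless the inode uses inline (blk-crypto) encryption */
            fscrypt_set_bio_crypt_ctx(bio, inode,
                                      pos >> inode->i_blkbits, GFP_KERNEL);
    }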

- 02 Feb 2022, 1 commit

Committed by Christoph Hellwig
Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
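Before and after, as a sketch (demo_* is hypothetical):

    #include <linux/bio.h>

    static struct bio *demo_alloc_write_bio(struct block_device *bdev,
                                            unsigned short nr_vecs)
    {
            /* old: bio_alloc(GFP_KERNEL, nr_vecs), then bio_set_dev()
             * and a bi_opf assignment by hand */
            return bio_alloc(bdev, nr_vecs, REQ_OP_WRITE | REQ_SYNC,
                             GFP_KERNEL);
    }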

- 04 Dec 2021, 1 commit

Committed by Jens Axboe
No functional changes in this patch, just in preparation for efficiently calling this light function from the block O_DIRECT handling.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 26 Oct 2021, 1 commit

Committed by Jens Axboe
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
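The callback's shape before and after, as a sketch (demo_* is hypothetical):

    #include <linux/fs.h>

    /* before: void (*ki_complete)(struct kiocb *iocb, long ret, long res2); */
    /* after:  void (*ki_complete)(struct kiocb *iocb, long ret);            */
    static void demo_complete(struct kiocb *iocb, long ret)
    {
            /* the aio res2 value is now reported as zero unconditionally */
    }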

- 24 Oct 2021, 3 commits

Committed by Andreas Gruenbacher
Add a done_before argument to iomap_dio_rw that indicates how much of the request has already been transferred. When the request succeeds, we report that done_before additional bytes were transferred. This is useful for finishing a request asynchronously when part of the request has already been completed synchronously. We'll use that to allow iomap_dio_rw to be used with page faults disabled: when a page fault occurs while submitting a request, we synchronously complete the part of the request that has already been submitted. The caller can then take care of the page fault and call iomap_dio_rw again for the rest of the request, passing in the number of bytes already transferred.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
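A sketch of the retry pattern this enables, loosely modeled on the gfs2 usage and written against the newest iomap_dio_rw() signature in this log (the private argument arrived later); locking and other details are elided:

    #include <linux/iomap.h>
    #include <linux/uaccess.h>
    #include <linux/uio.h>

    static ssize_t demo_dio_write(struct kiocb *iocb, struct iov_iter *from,
                                  const struct iomap_ops *ops)
    {
            size_t done = 0;
            ssize_t ret;

    retry:
            pagefault_disable();
            ret = iomap_dio_rw(iocb, from, ops, NULL, IOMAP_DIO_PARTIAL,
                               NULL, done);
            pagefault_enable();
            if (ret > 0)
                    done = ret;     /* total so far, done_before included */

            /* partial result or fault: fault the pages in and resume */
            if ((ret == -EFAULT || (ret > 0 && iov_iter_count(from))) &&
                fault_in_iov_iter_readable(from, iov_iter_count(from)) !=
                iov_iter_count(from))
                    goto retry;
            return ret;
    }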

Committed by Andreas Gruenbacher
In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and return a partial result. This allows the caller to deal with the page fault and retry the remainder of the request.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Committed by Andreas Gruenbacher
When a user copy fails in one of the helpers of iomap_dio_rw, fail with -EFAULT instead of returning 0. This matches what iomap_dio_bio_actor returns when it gets an -EFAULT from bio_iov_iter_get_pages. With these changes, iomap_dio_actor now consistently fails with -EFAULT when a user page cannot be faulted in.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

- 19 Oct 2021, 1 commit

Committed by Jens Axboe
struct io_comp_batch contains a list head and a completion handler, which will allow batches of IO to be completed more efficiently. For now, no functional changes in this patch; we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
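As described, the structure is roughly the following, and the ->iopoll file operation gains it as an argument (sketch, from the description above):

    struct io_comp_batch {
            struct request *req_list;               /* completed requests */
            bool need_ts;                           /* timestamp needed? */
            void (*complete)(struct io_comp_batch *);
    };

    /* int (*iopoll)(struct kiocb *kiocb, struct io_comp_batch *iob,
     *               unsigned int flags); */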

- 18 Oct 2021, 3 commits

Committed by Christoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages:

 - the cookie construction can be made entirely private in blk-mq.c
 - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues
 - keeping the device and the cookie inside the bio trivially supports polling BIOs remapped by stacking drivers
 - a lot of code to propagate the cookie back up the submission path can be removed entirely

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
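A sketch of the resulting wait loop, close in spirit to what iomap's sync dio path does (demo_* is hypothetical):

    #include <linux/bio.h>
    #include <linux/blkdev.h>
    #include <linux/sched.h>

    static void demo_wait_for_bio(struct bio *bio, bool *done)
    {
            for (;;) {
                    set_current_state(TASK_UNINTERRUPTIBLE);
                    if (READ_ONCE(*done))
                            break;
                    /* poll the bio directly; no queue/cookie bookkeeping */
                    if (!bio_poll(bio, NULL, 0))
                            blk_io_schedule();
            }
            __set_current_state(TASK_RUNNING);
    }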

Committed by Christoph Hellwig
Switch the boolean spin argument to blk_poll to passing a set of flags instead. This will allow controlling polling behavior in a more fine-grained way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Christoph Hellwig
If an iocb is split into multiple bios we can't poll for both. So don't bother to even try to poll in that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 17 Aug 2021, 1 commit

Committed by Christoph Hellwig
Switch __iomap_dio_rw to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

- 04 Aug 2021, 1 commit

Committed by Gao Xiang
The existing inline data support only works for cases where the entire file is stored as inline data. For larger files, EROFS stores the initial blocks separately and the remainder of the file ("file tail") adjacent to the inode. Generalise inline data to allow reading the inline file tail. Tails may not cross a page boundary in memory. We currently have no filesystems that support tails and writing, so that case is currently disabled (see iomap_write_begin_inline).

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

- 01 May 2021, 1 commit

Committed by Jens Axboe
For reads, use the better variant of checking for the need to call filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling back to the slow path for IOCB_NOWAIT, if there are no pages to wait for (or write out).

Link: https://lkml.kernel.org/r/20210224164455.1096727-4-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
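A sketch of the check this enables on the IOCB_NOWAIT read path (demo_* is hypothetical):

    #include <linux/fs.h>
    #include <linux/pagemap.h>

    static int demo_dio_flush_range(struct kiocb *iocb, loff_t pos, loff_t end)
    {
            struct address_space *mapping = iocb->ki_filp->f_mapping;

            if (iocb->ki_flags & IOCB_NOWAIT) {
                    /* cheap check: only bail out if pages need writeback */
                    if (filemap_range_needs_writeback(mapping, pos, end))
                            return -EAGAIN;
                    return 0;
            }
            return filemap_write_and_wait_range(mapping, pos, end);
    }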

- 11 Mar 2021, 1 commit

Committed by Christoph Hellwig
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 09 Feb 2021, 1 commit

Committed by Naohiro Aota
A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding max_zone_append_sectors) and must not be split. bio_iov_iter_get_pages builds such a restricted bio using __bio_iov_append_get_pages if bio_op(bio) == REQ_OP_ZONE_APPEND. To utilize it, we need to set the bio_op before calling bio_iov_iter_get_pages(). This commit introduces IOMAP_F_ZONE_APPEND, so that an iomap user can set the flag to indicate they want REQ_OP_ZONE_APPEND and a restricted bio.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
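A zoned filesystem's ->iomap_begin would opt in along these lines (minimal sketch; demo_* is hypothetical):

    #include <linux/iomap.h>

    static void demo_mark_zone_append(struct iomap *iomap)
    {
            /* iomap will then issue REQ_OP_ZONE_APPEND bios, which the
             * block layer must not split */
            iomap->flags |= IOMAP_F_ZONE_APPEND;
    }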

- 25 Jan 2021, 1 commit

Committed by Pavel Begunkov
Add a helper function calculating the number of bvec segments we need to allocate to construct a bio. It doesn't change anything functionally, but will be used to avoid duplicating special cases in the future.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
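The helper is roughly this shape (sketch under the assumption that bvec-backed iterators reuse their own bvec array):

    static inline unsigned short bio_iov_vecs_to_alloc(struct iov_iter *iter,
                                                       int max_segs)
    {
            /* ITER_BVEC iterators supply their own bvecs: nothing to
             * allocate; otherwise size the bio by the pages in the iter */
            return iov_iter_is_bvec(iter) ? 0 : iov_iter_npages(iter, max_segs);
    }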

- 24 Jan 2021, 3 commits

Committed by Christoph Hellwig
Add a flag to signal that only pure overwrites are allowed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Committed by Christoph Hellwig
Pass a set of flags to iomap_dio_rw instead of the boolean wait_for_completion argument. The IOMAP_DIO_FORCE_WAIT flag replaces wait_for_completion, but only needs to be passed when the iocb isn't synchronous to start with, to simplify the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
[djwong: rework xfs_file.c so that we can push iomap changes separately]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
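A sketch of a caller, written against the newest iomap_dio_rw() signature in this log (the private and done_before arguments came later); demo_* is hypothetical:

    #include <linux/iomap.h>

    static ssize_t demo_dio_read(struct kiocb *iocb, struct iov_iter *to,
                                 const struct iomap_ops *ops, bool must_wait)
    {
            unsigned int dio_flags = 0;

            /* only needed when the iocb isn't already synchronous */
            if (must_wait && !is_sync_kiocb(iocb))
                    dio_flags |= IOMAP_DIO_FORCE_WAIT;
            return iomap_dio_rw(iocb, to, ops, NULL, dio_flags, NULL, 0);
    }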

Committed by Christoph Hellwig
Rename flags to iomap_flags to make the usage a little more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

- 28 Sep 2020, 2 commits

Committed by Goldwyn Rodrigues
The iomap completion routine can deadlock with btrfs_fallocate because of the call to generic_write_sync():

    P0                          P1
    inode_lock()                fallocate(FALLOC_FL_ZERO_RANGE)
    __iomap_dio_rw()              inode_lock()
                                  <block>
      <submits IO>
      <completes IO>
    inode_unlock()
                                  <gets inode_lock()>
                                  inode_dio_wait()
    iomap_dio_complete()
      generic_write_sync()
        btrfs_file_fsync()
          inode_lock() <deadlock>

inode_dio_end() is used to notify the end of DIO data in order to synchronize with truncate. Call inode_dio_end() before calling generic_write_sync(), so filesystems can lock i_rwsem during a sync. This matches the way it is done in fs/direct-io.c:dio_complete().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Committed by Christoph Hellwig
This is to avoid the deadlock caused in btrfs because of O_DIRECT | O_DSYNC. Filesystems such as btrfs require i_rwsem while performing a sync on a file. iomap_dio_rw() is called under i_rwsem. This leads to a deadlock because of:

    iomap_dio_complete()
      generic_write_sync()
        btrfs_sync_file()

Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems can call iomap_dio_complete() after unlocking i_rwsem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
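A sketch of the split in use, written against the modern argument list of __iomap_dio_rw() (the private and done_before arguments came later); locking details are elided and demo_* is hypothetical:

    #include <linux/fs.h>
    #include <linux/iomap.h>

    static ssize_t demo_dio_write(struct kiocb *iocb, struct iov_iter *from,
                                  const struct iomap_ops *ops)
    {
            struct inode *inode = file_inode(iocb->ki_filp);
            struct iomap_dio *dio;

            inode_lock(inode);
            dio = __iomap_dio_rw(iocb, from, ops, NULL, 0, NULL, 0);
            inode_unlock(inode);    /* drop i_rwsem before the sync */

            if (IS_ERR_OR_NULL(dio))
                    return PTR_ERR_OR_ZERO(dio);
            return iomap_dio_complete(dio); /* may call generic_write_sync() */
    }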

- 10 Sep 2020, 2 commits

Committed by Andreas Gruenbacher
When a direct I/O write falls back to buffered I/O entirely, dio->size will be 0 in iomap_dio_complete. Function invalidate_inode_pages2_range will try to invalidate the rest of the address space. If there are any dirty pages in that range, the write will fail and a "Page cache invalidation failure on direct I/O" error will be logged. On gfs2, this can be reproduced as follows:

    xfs_io \
      -c "open -ft foo" -c "pwrite 4k 4k" -c "close" \
      -c "open -d foo" -c "pwrite 0 4k"

Fix this by recognizing 0-length writes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Committed by Qian Cai
It is trivial for unprivileged users to trigger a WARN_ON_ONCE(1) in iomap_dio_actor(), which would taint the kernel, or worse - panic if panic_on_warn or panic_on_taint is set. Hence, just convert it to pr_warn_ratelimited() to let users know their workloads are racing. Thanks to Dave Chinner for the initial analysis of the racing reproducers.

Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

- 06 Aug 2020, 2 commits

Committed by Christoph Hellwig
Failing to invalidate the page cache means data is incoherent, which is a very bad state for the system. Always fall back to buffered I/O through the page cache if we can't invalidate mappings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>

Committed by Dave Chinner
The historic requirement for XFS to invalidate cached pages on direct IO reads has been lost in the twisty pages of history - it was inherited from Irix, which implemented page cache invalidation on read as a method of working around problems synchronising page cache state with uncached IO. XFS has carried this ever since. In the initial linux ports it was necessary to get mmap and DIO to play "ok" together and not immediately corrupt data. This was the state of play until the linux kernel had infrastructure to track unwritten extents and synchronise page faults with allocations and unwritten extent conversions (->page_mkwrite infrastructure). IOWs, the page cache invalidation on DIO read was necessary to prevent trivial data corruptions.

This didn't solve all the problems, though. There were performance problems if we didn't invalidate the entire page cache over the file on read - we couldn't easily determine if the cached pages were over the range of the IO, and invalidation required taking a serialising lock (i_mutex) on the inode. This serialising lock was an issue for XFS, as it was the only exclusive lock in the direct IO read path. Hence if there were any cached pages, we'd just invalidate the entire file in one go so that subsequent IOs didn't need to take the serialising lock. This was a problem that prevented ranged invalidation from being particularly useful for avoiding the remaining coherency issues. This was solved with the conversion of i_mutex to i_rwsem and the conversion of the XFS inode IO lock to use i_rwsem. Hence we could now just do ranged invalidation and the performance problem went away.

However, page cache invalidation was still needed to serialise sub-page/sub-block zeroing via direct IO against buffered IO, because bufferhead state attached to the cached page could get out of whack when direct IOs were issued. We've removed bufferheads from the XFS code, and we don't carry any extent state on the cached pages anymore, and so this problem has gone away, too.

IOWs, it would appear that we don't have any good reason to be invalidating the page cache on DIO reads anymore. Hence remove the invalidation on read because it is unnecessary overhead, not needed to maintain coherency between mmap/buffered access and direct IO anymore, and prevents anyone from using direct IO reads from intentionally invalidating the page cache of a file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

- 25 May 2020, 2 commits

Committed by Goldwyn Rodrigues
Filesystems such as btrfs can perform direct I/O without holding the inode->i_rwsem in some cases, such as writing within i_size. So, remove the check for lockdep_assert_held() in iomap_dio_rw().

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Committed by Goldwyn Rodrigues
This helps filesystems perform tasks on the bio while submitting for I/O. This could be post-write operations such as data CRC or data replication for fs-handled RAID.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
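A sketch of a filesystem intercepting dio bio submission, roughly in the hook's shape as introduced (its signature changed in later releases); demo_* names are hypothetical:

    #include <linux/bio.h>
    #include <linux/iomap.h>

    static blk_qc_t demo_submit_io(struct inode *inode, struct iomap *iomap,
                                   struct bio *bio, loff_t file_offset)
    {
            demo_add_checksums(bio);        /* hypothetical post-write CRC */
            return submit_bio(bio);
    }

    static const struct iomap_dio_ops demo_dio_ops = {
            .submit_io = demo_submit_io,
    };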