- 16 Feb 2023, 1 commit

Committed by Christoph Hellwig
No users left now that btrfs takes REQ_OP_WRITE bios from iomap and splits and converts them to REQ_OP_ZONE_APPEND internally.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Sterba <dsterba@suse.com>

- 15 Nov 2022, 1 commit

Committed by Keith Busch
Don't transform the logical block size to a bit shift only to shift it back to the original block size. Just use the size.

Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

- 09 Aug 2022, 1 commit

Committed by Al Viro
Equivalent of a single-segment iovec. Initialized by iov_iter_ubuf(), checked for by iter_is_ubuf(); otherwise behaves like ITER_IOVEC ones. We are going to expose things like ->write_iter() et al. to those in subsequent commits. A new predicate (user_backed_iter()) is true for ITER_IOVEC and ITER_UBUF; places like direct-IO handling should use it to check whether pages we modify after getting them from iov_iter_get_pages() need to be dirtied. DO NOT assume that replacing iter_is_iovec() with user_backed_iter() will solve all problems - there's code that uses iter_is_iovec() to decide how to poke around in iov_iter guts, and for that the predicate replacement obviously won't suffice.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
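A minimal sketch of the new iterator in use, assuming the contemporaneous READ/WRITE direction constants (the demo_* name is hypothetical):

    #include <linux/uio.h>

    /* Prime a single-buffer iterator for a write and test its backing. */
    static bool demo_ubuf_write_iter(void __user *buf, size_t len)
    {
            struct iov_iter iter;

            /* one-segment equivalent of an ITER_IOVEC iterator */
            iov_iter_ubuf(&iter, WRITE, buf, len);

            /* true for both ITER_UBUF and ITER_IOVEC */
            return user_backed_iter(&iter);
    }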

- 15 Jul 2022, 1 commit

Committed by Bart Van Assche
Improve static type checking by using the enum req_op type for variables that represent a request operation and the new blk_opf_t type for the combination of a request operation and request flags.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20220714180729.1065367-56-bvanassche@acm.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
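A small sketch of the typing this enables (illustrative only; demo_* is hypothetical):

    #include <linux/blk_types.h>

    static blk_opf_t demo_write_opf(bool fua)
    {
            enum req_op op = REQ_OP_WRITE;     /* the operation alone */
            blk_opf_t opf = op | REQ_SYNC;     /* operation + flags */

            if (fua)
                    opf |= REQ_FUA;
            return opf;
    }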

- 27 Jun 2022, 1 commit

Committed by Keith Busch
Use the address alignment requirements from the block_device for direct io instead of requiring addresses be aligned to the block size.

Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220610195830.3574005-12-kbusch@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
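Roughly the shape of the relaxed check, as a sketch (bdev_iter_is_aligned() comes from the same series; demo_* is hypothetical):

    #include <linux/blkdev.h>
    #include <linux/uio.h>

    static bool demo_dio_aligned(struct block_device *bdev, loff_t pos,
                                 size_t len, struct iov_iter *iter)
    {
            /* file offsets still honor the logical block size ... */
            if ((pos | len) & (bdev_logical_block_size(bdev) - 1))
                    return false;
            /* ... but user addresses only need the device's DMA alignment */
            return bdev_iter_is_aligned(bdev, iter);
    }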

- 11 Jun 2022, 2 commits

Committed by Al Viro
New helper to be used instead of direct checks for IOCB_DSYNC: iocb_is_dsync(iocb). The checks are converted, which allows us to avoid the IS_SYNC(iocb->ki_filp->f_mapping->host) part (4 cache lines) in iocb_flags() - it's checked in iocb_is_dsync() instead.

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
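From the description above, the helper is essentially this shape (sketch):

    static bool iocb_is_dsync(const struct kiocb *iocb)
    {
            return (iocb->ki_flags & IOCB_DSYNC) ||
                    IS_SYNC(iocb->ki_filp->f_mapping->host);
    }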

Committed by Al Viro
New flag, equivalent to removal of IOCB_DSYNC from iocb flags. This mimics what btrfs is doing (and that's what btrfs will switch to). However, I'm not at all sure that we want to suppress REQ_FUA for those - all the btrfs hack really cares about is suppression of generic_write_sync(). For now let's keep the existing behaviour, but I really want to hear more detailed arguments pro or contra.

[folded brain fix from willy]

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

- 16 May 2022, 2 commits

Committed by Christoph Hellwig
Allow the file system to keep state for all iterations. For now only wire it up for direct I/O as there is an immediate need for it there.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Committed by Christoph Hellwig
Allow the file system to provide a specific bio_set for allocating direct I/O bios. This will allow file systems that use the ->submit_io hook to stash away additional information for file system use. To make use of this additional space for information in the completion path, the file system needs to override the ->bi_end_io callback and then call back into iomap, so export iomap_dio_bio_end_io for that.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
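A sketch of how a filesystem might use this; the demo_* names and pool size are hypothetical, and front_pad is the standard bio_set mechanism for reserving per-bio space:

    #include <linux/bio.h>
    #include <linux/iomap.h>

    struct demo_dio_priv { u64 csum; };     /* per-bio fs state */

    static struct bio_set demo_dio_bioset;

    static const struct iomap_dio_ops demo_dio_ops = {
            .bio_set = &demo_dio_bioset,
    };

    static int __init demo_init(void)
    {
            /* front_pad reserves space in front of each bio for fs use */
            return bioset_init(&demo_dio_bioset, BIO_POOL_SIZE,
                               sizeof(struct demo_dio_priv),
                               BIOSET_NEED_BVECS);
    }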

- 03 May 2022, 1 commit

Committed by Ming Lei
So far a bio is marked REQ_POLLED if RWF_HIPRI/IOCB_HIPRI is passed from the userspace sync io interface, and the block layer then tries to poll until the bio is completed. But the current implementation calls blk_io_schedule() if bio_poll() returns 0, and this easily causes io hangs or timeouts. Yet it looks like no one has reported this kind of issue, which should have been triggered in normal io poll sanity tests or blktests block/007 as observed by Changhui; that means it is very likely that no one uses it or no one cares about it. Also, after io_uring was invented, io poll for sync dio became a legacy interface. So ignore the RWF_HIPRI hint for sync dio.

CC: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Reported-by: Changhui Zhong <czhong@redhat.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220420143110.2679002-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 18 Apr 2022, 1 commit

Committed by Christoph Hellwig
Add a helper to check the FUA flag based on the block_device instead of having to poke into the block layer internal request_queue.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220415045258.199825-14-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
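One plausible shape for such a helper, assuming the queue-flag representation of FUA support (sketch, not necessarily the exact implementation):

    static inline bool bdev_fua(struct block_device *bdev)
    {
            return test_bit(QUEUE_FLAG_FUA,
                            &bdev_get_queue(bdev)->queue_flags);
    }

Callers can then write bdev_fua(iomap->bdev) instead of reaching into the request_queue themselves.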

- 08 Mar 2022, 1 commit

Committed by Christoph Hellwig
With the NVMe support for this gone, there are no consumers of these hints left, so remove them.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220304175556.407719-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 09 Feb 2022, 1 commit

Committed by Eric Biggers
Encrypted files traditionally haven't supported DIO, due to the need to encrypt/decrypt the data. However, when the encryption is implemented using inline encryption (blk-crypto) instead of the traditional filesystem-layer encryption, it is straightforward to support DIO. Add support for this to the iomap DIO implementation by calling fscrypt_set_bio_crypt_ctx() to set encryption contexts on the bios. Don't check for the rare case where a DUN (crypto data unit number) discontiguity creates a boundary that bios must not cross. Instead, filesystems are expected to handle this in ->iomap_begin() by limiting the length of the mapping so that iomap doesn't have to worry about it.

Co-developed-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Satya Tangirala <satyat@google.com>
Acked-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220128233940.79464-3-ebiggers@kernel.org
Signed-off-by: Eric Biggers <ebiggers@google.com>
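A sketch of the pattern: the logical-block calculation is illustrative, since filesystems define the actual data-unit numbering, and demo_* is hypothetical:

    #include <linux/fscrypt.h>

    static void demo_prepare_dio_bio(struct bio *bio, struct inode *inode,
                                     loff_t pos)
    {
            /* no-op unless the inode uses inline (blk-crypto) encryption */
            fscrypt_set_bio_crypt_ctx(bio, inode,
                                      pos >> inode->i_blkbits, GFP_KERNEL);
    }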

- 02 Feb 2022, 1 commit

Committed by Christoph Hellwig
Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument for a much more logical calling convention matching what most of the kernel does.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
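Before and after, as a sketch (demo_* is hypothetical):

    #include <linux/bio.h>

    static struct bio *demo_alloc_write_bio(struct block_device *bdev,
                                            unsigned short nr_vecs)
    {
            /* old: bio_alloc(GFP_KERNEL, nr_vecs), then bio_set_dev()
             * and a bi_opf assignment by hand */
            return bio_alloc(bdev, nr_vecs, REQ_OP_WRITE | REQ_SYNC,
                             GFP_KERNEL);
    }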

- 04 Dec 2021, 1 commit

Committed by Jens Axboe
No functional changes in this patch, just in preparation for efficiently calling this light function from the block O_DIRECT handling.

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 26 Oct 2021, 1 commit

Committed by Jens Axboe
The second argument was only used by the USB gadget code, yet everyone pays the overhead of passing a zero to be passed into aio, where it ends up being part of the aio res2 value. Now that everybody is passing in zero, kill off the extra argument.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
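The callback's shape before and after, as a sketch (demo_* is hypothetical):

    #include <linux/fs.h>

    /* before: void (*ki_complete)(struct kiocb *iocb, long ret, long res2); */
    /* after:  void (*ki_complete)(struct kiocb *iocb, long ret);            */
    static void demo_complete(struct kiocb *iocb, long ret)
    {
            /* the aio res2 value is now reported as zero unconditionally */
    }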

- 24 Oct 2021, 3 commits

Committed by Andreas Gruenbacher
Add a done_before argument to iomap_dio_rw that indicates how much of the request has already been transferred. When the request succeeds, we report that done_before additional bytes were transferred. This is useful for finishing a request asynchronously when part of the request has already been completed synchronously. We'll use that to allow iomap_dio_rw to be used with page faults disabled: when a page fault occurs while submitting a request, we synchronously complete the part of the request that has already been submitted. The caller can then take care of the page fault and call iomap_dio_rw again for the rest of the request, passing in the number of bytes already transferred.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
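A sketch of the retry pattern this enables, loosely modeled on the gfs2 usage and written against the newest iomap_dio_rw() signature in this log (the private argument arrived later); locking and other details are elided:

    #include <linux/iomap.h>
    #include <linux/uaccess.h>
    #include <linux/uio.h>

    static ssize_t demo_dio_write(struct kiocb *iocb, struct iov_iter *from,
                                  const struct iomap_ops *ops)
    {
            size_t done = 0;
            ssize_t ret;

    retry:
            pagefault_disable();
            ret = iomap_dio_rw(iocb, from, ops, NULL, IOMAP_DIO_PARTIAL,
                               NULL, done);
            pagefault_enable();
            if (ret > 0)
                    done = ret;     /* total so far, done_before included */

            /* partial result or fault: fault the pages in and resume */
            if ((ret == -EFAULT || (ret > 0 && iov_iter_count(from))) &&
                fault_in_iov_iter_readable(from, iov_iter_count(from)) !=
                iov_iter_count(from))
                    goto retry;
            return ret;
    }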

Committed by Andreas Gruenbacher
In iomap_dio_rw, when iomap_apply returns an -EFAULT error and the IOMAP_DIO_PARTIAL flag is set, complete the request synchronously and return a partial result. This allows the caller to deal with the page fault and retry the remainder of the request.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Committed by Andreas Gruenbacher
When a user copy fails in one of the helpers of iomap_dio_rw, fail with -EFAULT instead of returning 0. This matches what iomap_dio_bio_actor returns when it gets an -EFAULT from bio_iov_iter_get_pages. With these changes, iomap_dio_actor now consistently fails with -EFAULT when a user page cannot be faulted in.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>

- 19 Oct 2021, 1 commit

Committed by Jens Axboe
struct io_comp_batch contains a list head and a completion handler, which will allow batches of IO to be completed more efficiently. For now, no functional changes in this patch; we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
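As described, the structure is roughly the following, and the ->iopoll file operation gains it as an argument (sketch, from the description above):

    struct io_comp_batch {
            struct request *req_list;               /* completed requests */
            bool need_ts;                           /* timestamp needed? */
            void (*complete)(struct io_comp_batch *);
    };

    /* int (*iopoll)(struct kiocb *kiocb, struct io_comp_batch *iob,
     *               unsigned int flags); */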

- 18 Oct 2021, 3 commits

Committed by Christoph Hellwig
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages:

 - the cookie construction can be made entirely private in blk-mq.c
 - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues
 - keeping the device and the cookie inside the bio trivially supports polling BIOs remapped by stacking drivers
 - a lot of code to propagate the cookie back up the submission path can be removed entirely

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
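A sketch of the resulting wait loop, close in spirit to what iomap's sync dio path does (demo_* is hypothetical):

    #include <linux/bio.h>
    #include <linux/blkdev.h>
    #include <linux/sched.h>

    static void demo_wait_for_bio(struct bio *bio, bool *done)
    {
            for (;;) {
                    set_current_state(TASK_UNINTERRUPTIBLE);
                    if (READ_ONCE(*done))
                            break;
                    /* poll the bio directly; no queue/cookie bookkeeping */
                    if (!bio_poll(bio, NULL, 0))
                            blk_io_schedule();
            }
            __set_current_state(TASK_RUNNING);
    }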

Committed by Christoph Hellwig
Switch the boolean spin argument to blk_poll to passing a set of flags instead. This will allow controlling polling behavior in a more fine-grained way.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Committed by Christoph Hellwig
If an iocb is split into multiple bios we can't poll for both. So don't bother to even try to poll in that case.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 17 Aug 2021, 1 commit

Committed by Christoph Hellwig
Switch __iomap_dio_rw to use iomap_iter.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

- 04 Aug 2021, 1 commit

Committed by Gao Xiang
The existing inline data support only works for cases where the entire file is stored as inline data. For larger files, EROFS stores the initial blocks separately and the remainder of the file ("file tail") adjacent to the inode. Generalise inline data to allow reading the inline file tail. Tails may not cross a page boundary in memory. We currently have no filesystems that support tails and writing, so that case is currently disabled (see iomap_write_begin_inline).

Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

- 01 May 2021, 1 commit

Committed by Jens Axboe
For reads, use the better variant of checking for the need to call filemap_write_and_wait_range() when doing O_DIRECT. This avoids falling back to the slow path for IOCB_NOWAIT, if there are no pages to wait for (or write out).

Link: https://lkml.kernel.org/r/20210224164455.1096727-4-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
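A sketch of the check this enables on the IOCB_NOWAIT read path (demo_* is hypothetical):

    #include <linux/fs.h>
    #include <linux/pagemap.h>

    static int demo_dio_flush_range(struct kiocb *iocb, loff_t pos, loff_t end)
    {
            struct address_space *mapping = iocb->ki_filp->f_mapping;

            if (iocb->ki_flags & IOCB_NOWAIT) {
                    /* cheap check: only bail out if pages need writeback */
                    if (filemap_range_needs_writeback(mapping, pos, end))
                            return -EAGAIN;
                    return 0;
            }
            return filemap_write_and_wait_range(mapping, pos, end);
    }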

- 11 Mar 2021, 1 commit

Committed by Christoph Hellwig
Ever since the addition of multipage bio_vecs BIO_MAX_PAGES has been horribly confusingly misnamed. Rename it to BIO_MAX_VECS to stop confusing users of the bio API.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20210311110137.1132391-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>

- 09 Feb 2021, 1 commit

Committed by Naohiro Aota
A ZONE_APPEND bio must follow hardware restrictions (e.g. not exceeding max_zone_append_sectors) and must not be split. bio_iov_iter_get_pages builds such a restricted bio using __bio_iov_append_get_pages if bio_op(bio) == REQ_OP_ZONE_APPEND. To utilize it, we need to set the bio_op before calling bio_iov_iter_get_pages(). This commit introduces IOMAP_F_ZONE_APPEND, so that an iomap user can set the flag to indicate they want REQ_OP_ZONE_APPEND and a restricted bio.

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
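A zoned filesystem's ->iomap_begin would opt in along these lines (minimal sketch; demo_* is hypothetical):

    #include <linux/iomap.h>

    static void demo_mark_zone_append(struct iomap *iomap)
    {
            /* iomap will then issue REQ_OP_ZONE_APPEND bios, which the
             * block layer must not split */
            iomap->flags |= IOMAP_F_ZONE_APPEND;
    }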

- 25 Jan 2021, 1 commit

Committed by Pavel Begunkov
Add a helper function calculating the number of bvec segments we need to allocate to construct a bio. It doesn't change anything functionally, but will be used to avoid duplicating special cases in the future.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
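The helper is roughly this shape (sketch under the assumption that bvec-backed iterators reuse their own bvec array):

    static inline unsigned short bio_iov_vecs_to_alloc(struct iov_iter *iter,
                                                       int max_segs)
    {
            /* ITER_BVEC iterators supply their own bvecs: nothing to
             * allocate; otherwise size the bio by the pages in the iter */
            return iov_iter_is_bvec(iter) ? 0 : iov_iter_npages(iter, max_segs);
    }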

- 24 Jan 2021, 3 commits

Committed by Christoph Hellwig
Add a flag to signal that only pure overwrites are allowed.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Committed by Christoph Hellwig
Pass a set of flags to iomap_dio_rw instead of the boolean wait_for_completion argument. The IOMAP_DIO_FORCE_WAIT flag replaces wait_for_completion, but only needs to be passed when the iocb isn't synchronous to start with, to simplify the callers.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
[djwong: rework xfs_file.c so that we can push iomap changes separately]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
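A sketch of a caller, written against the newest iomap_dio_rw() signature in this log (the private and done_before arguments came later); demo_* is hypothetical:

    #include <linux/iomap.h>

    static ssize_t demo_dio_read(struct kiocb *iocb, struct iov_iter *to,
                                 const struct iomap_ops *ops, bool must_wait)
    {
            unsigned int dio_flags = 0;

            /* only needed when the iocb isn't already synchronous */
            if (must_wait && !is_sync_kiocb(iocb))
                    dio_flags |= IOMAP_DIO_FORCE_WAIT;
            return iomap_dio_rw(iocb, to, ops, NULL, dio_flags, NULL, 0);
    }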

Committed by Christoph Hellwig
Rename flags to iomap_flags to make the usage a little more clear.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>

- 28 Sep 2020, 2 commits

Committed by Goldwyn Rodrigues
The iomap completion routine can deadlock with btrfs_fallocate because of the call to generic_write_sync():

    P0                          P1
    inode_lock()                fallocate(FALLOC_FL_ZERO_RANGE)
    __iomap_dio_rw()              inode_lock()
                                  <block>
      <submits IO>
      <completes IO>
    inode_unlock()
                                  <gets inode_lock()>
                                  inode_dio_wait()
    iomap_dio_complete()
      generic_write_sync()
        btrfs_file_fsync()
          inode_lock() <deadlock>

inode_dio_end() is used to notify the end of DIO data in order to synchronize with truncate. Call inode_dio_end() before calling generic_write_sync(), so filesystems can lock i_rwsem during a sync. This matches the way it is done in fs/direct-io.c:dio_complete().

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

Committed by Christoph Hellwig
This is to avoid the deadlock caused in btrfs because of O_DIRECT | O_DSYNC. Filesystems such as btrfs require i_rwsem while performing a sync on a file. iomap_dio_rw() is called under i_rwsem. This leads to a deadlock because of:

    iomap_dio_complete()
      generic_write_sync()
        btrfs_sync_file()

Separate out iomap_dio_complete() from iomap_dio_rw(), so filesystems can call iomap_dio_complete() after unlocking i_rwsem.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
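A sketch of the split in use, written against the modern argument list of __iomap_dio_rw() (the private and done_before arguments came later); locking details are elided and demo_* is hypothetical:

    #include <linux/fs.h>
    #include <linux/iomap.h>

    static ssize_t demo_dio_write(struct kiocb *iocb, struct iov_iter *from,
                                  const struct iomap_ops *ops)
    {
            struct inode *inode = file_inode(iocb->ki_filp);
            struct iomap_dio *dio;

            inode_lock(inode);
            dio = __iomap_dio_rw(iocb, from, ops, NULL, 0, NULL, 0);
            inode_unlock(inode);    /* drop i_rwsem before the sync */

            if (IS_ERR_OR_NULL(dio))
                    return PTR_ERR_OR_ZERO(dio);
            return iomap_dio_complete(dio); /* may call generic_write_sync() */
    }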

- 10 Sep 2020, 2 commits

Committed by Andreas Gruenbacher
When a direct I/O write falls back to buffered I/O entirely, dio->size will be 0 in iomap_dio_complete. Function invalidate_inode_pages2_range will try to invalidate the rest of the address space. If there are any dirty pages in that range, the write will fail and a "Page cache invalidation failure on direct I/O" error will be logged. On gfs2, this can be reproduced as follows:

    xfs_io \
      -c "open -ft foo" -c "pwrite 4k 4k" -c "close" \
      -c "open -d foo" -c "pwrite 0 4k"

Fix this by recognizing 0-length writes.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

Committed by Qian Cai
It is trivial for unprivileged users to trigger a WARN_ON_ONCE(1) in iomap_dio_actor(), which would taint the kernel, or worse - panic if panic_on_warn or panic_on_taint is set. Hence, just convert it to pr_warn_ratelimited() to let users know their workloads are racing. Thanks to Dave Chinner for the initial analysis of the racing reproducers.

Signed-off-by: Qian Cai <cai@lca.pw>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

- 06 Aug 2020, 2 commits

Committed by Christoph Hellwig
Failing to invalidate the page cache means data is incoherent, which is a very bad state for the system. Always fall back to buffered I/O through the page cache if we can't invalidate mappings.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Acked-by: Bob Peterson <rpeterso@redhat.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Theodore Ts'o <tytso@mit.edu> # for ext4
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> # for gfs2
Reviewed-by: Ritesh Harjani <riteshh@linux.ibm.com>

Committed by Dave Chinner
The historic requirement for XFS to invalidate cached pages on direct IO reads has been lost in the twisty pages of history - it was inherited from Irix, which implemented page cache invalidation on read as a method of working around problems synchronising page cache state with uncached IO. XFS has carried this ever since. In the initial linux ports it was necessary to get mmap and DIO to play "ok" together and not immediately corrupt data. This was the state of play until the linux kernel had infrastructure to track unwritten extents and synchronise page faults with allocations and unwritten extent conversions (->page_mkwrite infrastructure). IOWs, the page cache invalidation on DIO read was necessary to prevent trivial data corruptions.

This didn't solve all the problems, though. There were performance problems if we didn't invalidate the entire page cache over the file on read - we couldn't easily determine if the cached pages were over the range of the IO, and invalidation required taking a serialising lock (i_mutex) on the inode. This serialising lock was an issue for XFS, as it was the only exclusive lock in the direct IO read path. Hence if there were any cached pages, we'd just invalidate the entire file in one go so that subsequent IOs didn't need to take the serialising lock. This was a problem that prevented ranged invalidation from being particularly useful for avoiding the remaining coherency issues. This was solved with the conversion of i_mutex to i_rwsem and the conversion of the XFS inode IO lock to use i_rwsem. Hence we could now just do ranged invalidation and the performance problem went away.

However, page cache invalidation was still needed to serialise sub-page/sub-block zeroing via direct IO against buffered IO, because bufferhead state attached to the cached page could get out of whack when direct IOs were issued. We've removed bufferheads from the XFS code, and we don't carry any extent state on the cached pages anymore, and so this problem has gone away, too.

IOWs, it would appear that we don't have any good reason to be invalidating the page cache on DIO reads anymore. Hence remove the invalidation on read because it is unnecessary overhead, not needed to maintain coherency between mmap/buffered access and direct IO anymore, and prevents anyone from using direct IO reads from intentionally invalidating the page cache of a file.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

- 25 May 2020, 2 commits

Committed by Goldwyn Rodrigues
Filesystems such as btrfs can perform direct I/O without holding the inode->i_rwsem in some cases, such as writing within i_size. So, remove the check for lockdep_assert_held() in iomap_dio_rw().

Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Committed by Goldwyn Rodrigues
This helps filesystems perform tasks on the bio while submitting for I/O. This could be post-write operations such as data CRC or data replication for fs-handled RAID.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Nikolay Borisov <nborisov@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
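A sketch of a filesystem intercepting dio bio submission, roughly in the hook's shape as introduced (its signature changed in later releases); demo_* names are hypothetical:

    #include <linux/bio.h>
    #include <linux/iomap.h>

    static blk_qc_t demo_submit_io(struct inode *inode, struct iomap *iomap,
                                   struct bio *bio, loff_t file_offset)
    {
            demo_add_checksums(bio);        /* hypothetical post-write CRC */
            return submit_bio(bio);
    }

    static const struct iomap_dio_ops demo_dio_ops = {
            .submit_io = demo_submit_io,
    };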