提交 · 7dd73802f97d2a1602b1cf5c1d6623fb08cb15c5 · openeuler / Kernel

07 11月, 2022 1 次提交

xfs: write page faults in iomap are not buffered writes · 118e021b

由 Dave Chinner 提交于 11月 07, 2022

When we reserve a delalloc region in xfs_buffered_write_iomap_begin,
we mark the iomap as IOMAP_F_NEW so that the the write context
understands that it allocated the delalloc region.

If we then fail that buffered write, xfs_buffered_write_iomap_end()
checks for the IOMAP_F_NEW flag and if it is set, it punches out
the unused delalloc region that was allocated for the write.

The assumption this code makes is that all buffered write operations
that can allocate space are run under an exclusive lock (i_rwsem).
This is an invalid assumption: page faults in mmap()d regions call
through this same function pair to map the file range being faulted
and this runs only holding the inode->i_mapping->invalidate_lock in
shared mode.

IOWs, we can have races between page faults and write() calls that
fail the nested page cache write operation that result in data loss.
That is, the failing iomap_end call will punch out the data that
the other racing iomap iteration brought into the page cache. This
can be reproduced with generic/34[46] if we arbitrarily fail page
cache copy-in operations from write() syscalls.

Code analysis tells us that the iomap_page_mkwrite() function holds
the already instantiated and uptodate folio locked across the iomap
mapping iterations. Hence the folio cannot be removed from memory
whilst we are mapping the range it covers, and as such we do not
care if the mapping changes state underneath the iomap iteration
loop:

1. if the folio is not already dirty, there is no writeback races
   possible.
2. if we allocated the mapping (delalloc or unwritten), the folio
   cannot already be dirty. See #1.
3. If the folio is already dirty, it must be up to date. As we hold
   it locked, it cannot be reclaimed from memory. Hence we always
   have valid data in the page cache while iterating the mapping.
4. Valid data in the page cache can exist when the underlying
   mapping is DELALLOC, UNWRITTEN or WRITTEN. Having the mapping
   change from DELALLOC->UNWRITTEN or UNWRITTEN->WRITTEN does not
   change the data in the page - it only affects actions if we are
   initialising a new page. Hence #3 applies  and we don't care
   about these extent map transitions racing with
   iomap_page_mkwrite().
5. iomap_page_mkwrite() checks for page invalidation races
   (truncate, hole punch, etc) after it locks the folio. We also
   hold the mapping->invalidation_lock here, and hence the mapping
   cannot change due to extent removal operations while we are
   iterating the folio.

As such, filesystems that don't use bufferheads will never fail
the iomap_folio_mkwrite_iter() operation on the current mapping,
regardless of whether the iomap should be considered stale.

Further, the range we are asked to iterate is limited to the range
inside EOF that the folio spans. Hence, for XFS, we will only map
the exact range we are asked for, and we will only do speculative
preallocation with delalloc if we are mapping a hole at the EOF
page. The iterator will consume the entire range of the folio that
is within EOF, and anything beyond the EOF block cannot be accessed.
We never need to truncate this post-EOF speculative prealloc away in
the context of the iomap_page_mkwrite() iterator because if it
remains unused we'll remove it when the last reference to the inode
goes away.

Hence we don't actually need an .iomap_end() cleanup/error handling
path at all for iomap_page_mkwrite() for XFS. This means we can
separate the page fault processing from the complexity of the
.iomap_end() processing in the buffered write path. This also means
that the buffered write path will also be able to take the
mapping->invalidate_lock as necessary.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>

118e021b

31 10月, 2022 1 次提交

xfs: fix incorrect return type for fsdax fault handlers · 47ba8cc7

由 Darrick J. Wong 提交于 10月 24, 2022

The kernel robot complained about this:

>> fs/xfs/xfs_file.c:1266:31: sparse: sparse: incorrect type in return expression (different base types) @@     expected int @@     got restricted vm_fault_t @@
   fs/xfs/xfs_file.c:1266:31: sparse:     expected int
   fs/xfs/xfs_file.c:1266:31: sparse:     got restricted vm_fault_t
   fs/xfs/xfs_file.c:1314:21: sparse: sparse: incorrect type in assignment (different base types) @@     expected restricted vm_fault_t [usertype] ret @@     got int @@
   fs/xfs/xfs_file.c:1314:21: sparse:     expected restricted vm_fault_t [usertype] ret
   fs/xfs/xfs_file.c:1314:21: sparse:     got int

Fix the incorrect return type for these two functions.

While we're at it, make the !fsdax version return VM_FAULT_SIGBUS
because a zero return value will cause some callers to try to lock
vmf->page, which we never set here.

Fixes: ea6c49b7 ("xfs: support CoW in fsdax mode")
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

47ba8cc7

06 8月, 2022 1 次提交

xfs: check return codes when flushing block devices · 7d839e32

由 Darrick J. Wong 提交于 7月 28, 2022

If a blkdev_issue_flush fails, fsync needs to report that to upper
levels.  Modify xfs_file_fsync to capture the errors, while trying to
flush as much data and log updates to disk as possible.

If log writes cannot flush the data device, we need to shut down the log
immediately because we've violated a log invariant.  Modify this code to
check the return value of blkdev_issue_flush as well.

This behavior seems to go back to about 2.6.15 or so, which makes this
fixes tag a bit misleading.

Link: https://elixir.bootlin.com/linux/v2.6.15/source/fs/xfs/xfs_vnodeops.c#L1187
Fixes: b5071ada ("xfs: remove xfs_blkdev_issue_flush")
Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

7d839e32

25 7月, 2022 1 次提交

xfs: Add async buffered write support · 1aa91d9c

由 Stefan Roesch 提交于 6月 23, 2022

This adds the async buffered write support to XFS. For async buffered
write requests, the request will return -EAGAIN if the ilock cannot be
obtained immediately.
Signed-off-by: NStefan Roesch <shr@fb.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Link: https://lore.kernel.org/r/20220623175157.1715274-15-shr@fb.comSigned-off-by: NJens Axboe <axboe@kernel.dk>

1aa91d9c

18 7月, 2022 2 次提交

xfs: add dax dedupe support · 13f9e267

由 Shiyang Ruan 提交于 6月 03, 2022

Introduce xfs_mmaplock_two_inodes_and_break_dax_layout() for dax files who
are going to be deduped.  After that, call compare range function only
when files are both DAX or not.

Link: https://lkml.kernel.org/r/20220603053738.1218681-15-ruansy.fnst@fujitsu.comSigned-off-by: NShiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

13f9e267

xfs: support CoW in fsdax mode · ea6c49b7

由 Shiyang Ruan 提交于 6月 03, 2022

In fsdax mode, WRITE and ZERO on a shared extent need CoW performed. 
After that, new allocated extents needs to be remapped to the file.  So,
add a CoW identification in ->iomap_begin(), and implement ->iomap_end()
to do the remapping work.

[akpm@linux-foundation.org: make xfs_dax_fault() static]
Link: https://lkml.kernel.org/r/20220603053738.1218681-14-ruansy.fnst@fujitsu.comSigned-off-by: NShiyang Ruan <ruansy.fnst@fujitsu.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.wiliams@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.com>
Cc: Goldwyn Rodrigues <rgoldwyn@suse.de>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi@nec.com>
Cc: Ritesh Harjani <riteshh@linux.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

ea6c49b7

22 5月, 2022 1 次提交

xfs: reduce IOCB_NOWAIT judgment for retry exclusive unaligned DIO · 93e6aa43

由 Kaixu Xia 提交于 5月 22, 2022

Retry unaligned DIO with exclusive blocking semantics only when the
IOCB_NOWAIT flag is not set. If we are doing nonblocking user I/O,
propagate the error directly.
Signed-off-by: NKaixu Xia <kaixuxia@tencent.com>
Reviewed-by: NChaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

93e6aa43

16 5月, 2022 1 次提交

iomap: add per-iomap_iter private data · 786f847f

由 Christoph Hellwig 提交于 5月 05, 2022

Allow the file system to keep state for all iterations.  For now only
wire it up for direct I/O as there is an immediate need for it there.
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Reviewed-by: NNikolay Borisov <nborisov@suse.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

786f847f

21 4月, 2022 2 次提交

xfs: convert inode lock flags to unsigned. · a1033753

由 Dave Chinner 提交于 4月 21, 2022

5.18 w/ std=gnu11 compiled with gcc-5 wants flags stored in unsigned
fields to be unsigned.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChandan Babu R <chandan.babu@oracle.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

a1033753

xfs: simplify local variable assignment in file write code · 2d9ac431

由 Kaixu Xia 提交于 4月 21, 2022

Get the struct inode pointer from iocb->ki_filp->f_mapping->host
directly and the other variables are unnecessary, so simplify the
local variables assignment.
Signed-off-by: NKaixu Xia <kaixuxia@tencent.com>
Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
Signed-off-by: NDave Chinner <david@fromorbit.com>

2d9ac431

12 4月, 2022 1 次提交

xfs: Use generic_file_open() · f3bf67c6

由 Matthew Wilcox (Oracle) 提交于 4月 12, 2022

Remove the open-coded check of O_LARGEFILE.  This changes the errno
to be the same as other filesystems; it was changed generically in
2.6.24 but that fix skipped XFS.
Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

f3bf67c6

02 2月, 2022 5 次提交

xfs: ensure log flush at the end of a synchronous fallocate call · cea267c2