提交 · f150b4234397448c6abab8785e58a222bfd9ec00 · openeuler / Kernel

22 10月, 2019 3 次提交

xfs: split the iomap ops for buffered vs direct writes · f150b423

由 Christoph Hellwig 提交于 10月 19, 2019

Instead of lots of magic conditionals in the main write_begin
handler this make the intent very clear.  Thing will become even
better once we support delayed allocations for extent size hints
and realtime allocations.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

f150b423

xfs: pass two imaps to xfs_reflink_allocate_cow · ffb375a8

由 Christoph Hellwig 提交于 10月 19, 2019

xfs_reflink_allocate_cow consumes the source data fork imap, and
potentially returns the COW fork imap.  Split the arguments in two
to clear up the calling conventions and to prepare for returning
a source iomap from ->iomap_begin.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

ffb375a8

xfs: remove xfs_reflink_dirty_extents · dd26b846

由 Christoph Hellwig 提交于 10月 19, 2019

Now that xfs_file_unshare is not completely dumb we can just call it
directly without iterating the extent and reflink btrees ourselves.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

dd26b846

21 10月, 2019 1 次提交

iomap: ignore non-shared or non-data blocks in xfs_file_dirty · 3590c4d8

由 Christoph Hellwig 提交于 10月 18, 2019

xfs_file_dirty is used to unshare reflink blocks.  Rename the function
to xfs_file_unshare to better document that purpose, and skip iomaps
that are not shared and don't need zeroing.  This will allow to simplify
the caller.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

3590c4d8

28 8月, 2019 2 次提交

xfs: remove unnecessary int returns from deferred bmap functions · 3e08f42a

由 Darrick J. Wong 提交于 8月 26, 2019

Remove the return value from the functions that schedule deferred bmap
operations since they never fail and do not return status.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

3e08f42a

xfs: remove unnecessary int returns from deferred refcount functions · 74b4c5d4

由 Darrick J. Wong 提交于 8月 26, 2019

Remove the return value from the functions that schedule deferred
refcount operations since they never fail and do not return status.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

74b4c5d4

19 8月, 2019 1 次提交

xfs: fix reflink source file racing with directio writes · 5d888b48

由 Darrick J. Wong 提交于 8月 14, 2019

While trawling through the dedupe file comparison code trying to fix
page deadlocking problems, Dave Chinner noticed that the reflink code
only takes shared IOLOCK/MMAPLOCKs on the source file. Because
page_mkwrite and directio writes do not take the EXCL versions of those
locks, this means that reflink can race with writer processes.

For pure remapping this can lead to undefined behavior and file
corruption; for dedupe this means that we cannot be sure that the
contents are identical when we decide to go ahead with the remapping.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

5d888b48

01 7月, 2019 1 次提交

xfs: remove XFS_TRANS_NOFS · 73d30d48

由 Christoph Hellwig 提交于 6月 28, 2019

Instead of a magic flag for xfs_trans_alloc, just ensure all callers
that can't relclaim through the file system use memalloc_nofs_save to
set the per-task nofs flag.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

73d30d48

29 6月, 2019 1 次提交

xfs: remove unused header files · 250d4b4c

由 Eric Sandeen 提交于 6月 28, 2019

There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this.  I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

250d4b4c

26 2月, 2019 2 次提交

xfs: fix uninitialized error variables · c1a4447f

由 Darrick J. Wong 提交于 2月 25, 2019

smatch complained about some uninitialized error returns, so fix those.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>

c1a4447f

xfs: don't pass iomap flags to xfs_reflink_allocate_cow · affe250a

由 Darrick J. Wong 提交于 2月 21, 2019

Don't pass raw iomap flags to xfs_reflink_allocate_cow; signal our
intention with a boolean argument.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

affe250a

21 2月, 2019 4 次提交

xfs: introduce an always_cow mode · 66ae56a5

由 Christoph Hellwig 提交于 2月 18, 2019

Add a mode where XFS never overwrites existing blocks in place.  This
is to aid debugging our COW code, and also put infatructure in place
for things like possible future support for zoned block devices, which
can't support overwrites.

This mode is enabled globally by doing a:

    echo 1 > /sys/fs/xfs/debug/always_cow

Note that the parameter is global to allow running all tests in xfstests
easily in this mode, which would not easily be possible with a per-fs
sysfs file.

In always_cow mode persistent preallocations are disabled, and fallocate
will fail when called with a 0 mode (with our without
FALLOC_FL_KEEP_SIZE), and not create unwritten extent for zeroed space
when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE.

There are a few interesting xfstests failures when run in always_cow
mode:

 - generic/392 fails because the bytes used in the file used to test
   hole punch recovery are less after the log replay.  This is
   because the blocks written and then punched out are only freed
   with a delay due to the logging mechanism.
 - xfs/170 will fail as the already fragile file streams mechanism
   doesn't seem to interact well with the COW allocator
 - xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim
   the file system is badly fragmented, but there is not much we
   can do to avoid that when always writing out of place
 - xfs/205 fails because overwriting a file in always_cow mode
   will require new space allocation and the assumption in the
   test thus don't work anymore.
 - xfs/326 fails to modify the file at all in always_cow mode after
   injecting the refcount error, leading to an unexpected md5sum
   after the remount, but that again is expected
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

66ae56a5

xfs: make COW fork unwritten extent conversions more robust · 26b91c72

由 Christoph Hellwig 提交于 2月 18, 2019

If we have racing buffered and direct I/O COW fork extents under
writeback can have been moved to the data fork by the time we call
xfs_reflink_convert_cow from xfs_submit_ioend.  This would be mostly
harmless as the block numbers don't change by this move, except for
the fact that xfs_bmapi_write will crash or trigger asserts when
not finding existing extents, even despite trying to paper over this
with the XFS_BMAPI_CONVERT_ONLY flag.

Instead of special casing non-transaction conversions in the already
way too complicated xfs_bmapi_write just add a new helper for the much
simpler non-transactional COW fork case, which simplify ignores not
found extents.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

26b91c72

xfs: merge COW handling into xfs_file_iomap_begin_delay · db46e604

由 Christoph Hellwig 提交于 2月 18, 2019

Besides simplifying the code a bit this allows to actually implement
the behavior of using COW preallocation for non-COW data mentioned
in the current comments.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

db46e604

xfs: don't use delalloc extents for COW on files with extsize hints · 78f0cc9d

由 Christoph Hellwig 提交于 2月 18, 2019

While using delalloc for extsize hints is generally a good idea, the
current code that does so only for COW doesn't help us much and creates
a lot of special cases.  Switch it to use real allocations like we
do for direct I/O.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

78f0cc9d

18 2月, 2019 1 次提交

xfs: remove the io_type field from the writeback context and ioend · be225fec

由 Christoph Hellwig 提交于 2月 15, 2019

The io_type field contains what is basically a summary of information
from the inode fork and the imap.  But we can just as easily use that
information directly, simplifying a few bits here and there and
improving the trace points.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

be225fec

13 12月, 2018 1 次提交

xfs: split up the xfs_reflink_end_cow work into smaller transactions · d6f215f3

由 Darrick J. Wong 提交于 12月 12, 2018

In xfs_reflink_end_cow, we allocate a single transaction for the entire
end_cow operation and then loop the CoW fork mappings to move them to
the data fork.  This design fails on a heavily fragmented filesystem
where an inode's data fork has exactly one more extent than would fit in
an extents-format fork, because the unmap can collapse the data fork
into extents format (freeing the bmbt block) but the remap can expand
the data fork back into a (newly allocated) bmbt block.  If the number
of extents we end up remapping is large, we can overflow the block
reservation because we reserved blocks assuming that we were adding
mappings into an already-cleared area of the data fork.

Let's say we have 8 extents in the data fork, 8 extents in the CoW fork,
and the data fork can hold at most 7 extents before needing to convert
to btree format; and that blocks A-P are discontiguous single-block
extents:

   0......7
D: ABCDEFGH
C: IJKLMNOP

When a write to file blocks 0-7 completes, we must remap I-P into the
data fork.  We start by removing H from the btree-format data fork.  Now
we have 7 extents, so we convert the fork to extents format, freeing the
bmbt block.   We then move P into the data fork and it now has 8 extents
again.  We must convert the data fork back to btree format, requiring a
block allocation.  If we repeat this sequence for blocks 6-5-4-3-2-1-0,
we'll need a total of 8 block allocations to remap all 8 blocks.  We
reserved only enough blocks to handle one btree split (5 blocks on a 4k
block filesystem), which means we overflow the block reservation.

To fix this issue, create a separate helper function to remap a single
extent, and change _reflink_end_cow to call it in a tight loop over the
entire range we're completing.  As a side effect this also removes the
size restrictions on how many extents we can end_cow at a time, though
nobody ever hit that.  It is not reasonable to reserve N blocks to remap
N blocks.

Note that this can be reproduced after ~320 million fsx ops while
running generic/938 (long soak directio fsx exerciser):

XFS: Assertion failed: tp->t_blk_res >= tp->t_blk_res_used, file: fs/xfs/xfs_trans.c, line: 116
<machine registers snipped>
Call Trace:
 xfs_trans_dup+0x211/0x250 [xfs]
 xfs_trans_roll+0x6d/0x180 [xfs]
 xfs_defer_trans_roll+0x10c/0x3b0 [xfs]
 xfs_defer_finish_noroll+0xdf/0x740 [xfs]
 xfs_defer_finish+0x13/0x70 [xfs]
 xfs_reflink_end_cow+0x2c6/0x680 [xfs]
 xfs_dio_write_end_io+0x115/0x220 [xfs]
 iomap_dio_complete+0x3f/0x130
 iomap_dio_rw+0x3c3/0x420
 xfs_file_dio_aio_write+0x132/0x3c0 [xfs]
 xfs_file_write_iter+0x8b/0xc0 [xfs]
 __vfs_write+0x193/0x1f0
 vfs_write+0xba/0x1c0
 ksys_write+0x52/0xc0
 do_syscall_64+0x50/0x160
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

d6f215f3

22 11月, 2018 1 次提交

xfs: flush removing page cache in xfs_reflink_remap_prep · 2c307174

由 Dave Chinner 提交于 11月 19, 2018

On a sub-page block size filesystem, fsx is failing with a data
corruption after a series of operations involving copying a file
with the destination offset beyond EOF of the destination of the file:

8093(157 mod 256): TRUNCATE DOWN from 0x7a120 to 0x50000 ******WWWW
8094(158 mod 256): INSERT 0x25000 thru 0x25fff (0x1000 bytes)
8095(159 mod 256): COPY 0x18000 thru 0x1afff (0x3000 bytes) to 0x2f400
8096(160 mod 256): WRITE 0x5da00 thru 0x651ff (0x7800 bytes) HOLE
8097(161 mod 256): COPY 0x2000 thru 0x5fff (0x4000 bytes) to 0x6fc00

The second copy here is beyond EOF, and it is to sub-page (4k) but
block aligned (1k) offset. The clone runs the EOF zeroing, landing
in a pre-existing post-eof delalloc extent. This zeroes the post-eof
extents in the page cache just fine, dirtying the pages correctly.

The problem is that xfs_reflink_remap_prep() now truncates the page
cache over the range that it is copying it to, and rounds that down
to cover the entire start page. This removes the dirty page over the
delalloc extent from the page cache without having written it back.
Hence later, when the page cache is flushed, the page at offset
0x6f000 has not been written back and hence exposes stale data,
which fsx trips over less than 10 operations later.

Fix this by changing xfs_reflink_remap_prep() to use
xfs_flush_unmap_range().
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

2c307174

20 11月, 2018 1 次提交

xfs: fix shared extent data corruption due to missing cow reservation · 59e42931

由 Brian Foster 提交于 11月 14, 2018

Page writeback indirectly handles shared extents via the existence
of overlapping COW fork blocks. If COW fork blocks exist, writeback
always performs the associated copy-on-write regardless if the
underlying blocks are actually shared. If the blocks are shared,
then overlapping COW fork blocks must always exist.

fstests shared/010 reproduces a case where a buffered write occurs
over a shared block without performing the requisite COW fork
reservation.  This ultimately causes writeback to the shared extent
and data corruption that is detected across md5 checks of the
filesystem across a mount cycle.

The problem occurs when a buffered write lands over a shared extent
that crosses an extent size hint boundary and that also happens to
have a partial COW reservation that doesn't cover the start and end
blocks of the data fork extent.

For example, a buffered write occurs across the file offset (in FSB
units) range of [29, 57]. A shared extent exists at blocks [29, 35]
and COW reservation already exists at blocks [32, 34]. After
accommodating a COW extent size hint of 32 blocks and the existing
reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
blocks of reservation at offset 0 and returns with COW reservation
across the range of [0, 34]. The associated data fork extent is
still [29, 35], however, which isn't fully covered by the COW
reservation.

This leads to a buffered write at file offset 35 over a shared
extent without associated COW reservation. Writeback eventually
kicks in, performs an overwrite of the underlying shared block and
causes the associated data corruption.

Update xfs_reflink_reserve_cow() to accommodate the fact that a
delalloc allocation request may not fully cover the extent in the
data fork. Trim the data fork extent appropriately, just as is done
for shared extent boundaries and/or existing COW reservations that
happen to overlap the start of the data fork extent. This prevents
shared/010 failures due to data corruption on reflink enabled
filesystems.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

59e42931

30 10月, 2018 12 次提交

xfs: remove [cm]time update from reflink calls · bf4a1fcf

由 Darrick J. Wong 提交于 10月 30, 2018

Now that the vfs remap helper dirties the inode [cm]time for us, xfs no
longer needs to do that on its own.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

bf4a1fcf

xfs: remove xfs_reflink_remap_range · 3fc9f5e4

由 Darrick J. Wong 提交于 10月 30, 2018

Since xfs_file_remap_range is a thin wrapper, move the contents of
xfs_reflink_remap_range into the shell.  This cuts down on the vfs
calls being made from internal xfs code.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

3fc9f5e4

xfs: remove redundant remap partial EOF block checks · 7a6ccf00

由 Darrick J. Wong 提交于 10月 30, 2018

Now that we've moved the partial EOF block checks to the VFS helpers, we
can remove the redundant functionality from XFS.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

7a6ccf00

xfs: support returning partial reflink results · 3f68c1f5

由 Darrick J. Wong 提交于 10月 30, 2018

Back when the XFS reflink code only supported clone_file_range, we were
only able to return zero or negative error codes to userspace. However,
now that copy_file_range (which returns bytes copied) can use XFS'
clone_file_range, we have the opportunity to return partial results.
For example, if userspace sends a 1GB clone request and we run out of
space halfway through, we at least can tell userspace that we completed
512M of that request like a regular write.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

3f68c1f5

xfs: clean up xfs_reflink_remap_blocks call site · 9f04aaff

由 Darrick J. Wong 提交于 10月 30, 2018

Move the offset <-> blocks unit conversions into
xfs_reflink_remap_blocks to make the call site less ugly.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

9f04aaff

xfs: fix pagecache truncation prior to reflink · 4918ef4e

由 Darrick J. Wong 提交于 10月 30, 2018

Prior to remapping blocks, it is necessary to remove pages from the
destination file's page cache.  Unfortunately, the truncation is not
aggressive enough -- if page size > block size, we'll end up zeroing
subpage blocks instead of removing them.  So, round the start offset
down and the end offset up to page boundaries.  We already wrote all
the dirty data so the larger range shouldn't be a problem.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <david@fromorbit.com>

4918ef4e

vfs: clean up generic_remap_file_range_prep return value · 8c5c836b

由 Darrick J. Wong 提交于 10月 30, 2018

Since the remap prep function can update the length of the remap
request, we can change this function to return the usual return status
instead of the odd behavior it has now.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <david@fromorbit.com>

8c5c836b

vfs: make remap_file_range functions take and return bytes completed · 42ec3d4c

由 Darrick J. Wong 提交于 10月 30, 2018

Change the remap_file_range functions to take a number of bytes to
operate upon and return the number of bytes they operated on.  This is a
requirement for allowing fs implementations to return short clone/dedupe
results to the user, which will enable us to obey resource limits in a
graceful manner.

A subsequent patch will enable copy_file_range to signal to the
->clone_file_range implementation that it can handle a short length,
which will be returned in the function's return value.  For now the
short return is not implemented anywhere so the behavior won't change --
either copy_file_range manages to clone the entire range or it tries an
alternative.

Neither clone ioctl can take advantage of this, alas.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

42ec3d4c

vfs: remap helper should update destination inode metadata · 8dde90bc

由 Darrick J. Wong 提交于 10月 30, 2018

Extend generic_remap_file_range_prep to handle inode metadata updates
when remapping into a file.  If the operation can possibly alter the
file contents, we must update the ctime and mtime and remove security
privileges, just like we do for regular file writes.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

8dde90bc

vfs: pass remap flags to generic_remap_file_range_prep · a91ae49b

由 Darrick J. Wong 提交于 10月 30, 2018

Plumb the remap flags through the filesystem from the vfs function
dispatcher all the way to the prep function to prepare for behavior
changes in subsequent patches.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <david@fromorbit.com>

a91ae49b

vfs: rename vfs_clone_file_prep to be more descriptive · a83ab01a

由 Darrick J. Wong 提交于 10月 30, 2018

The vfs_clone_file_prep is a generic function to be called by filesystem
implementations only.  Rename the prefix to generic_ and make it more
clear that it applies to remap operations, not just clones.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

a83ab01a

vfs: check file ranges before cloning files · 1383a7ed

由 Darrick J. Wong 提交于 10月 30, 2018

Move the file range checks from vfs_clone_file_prep into a separate
generic_remap_checks function so that all the checks are collected in a
central location.  This forms the basis for adding more checks from
generic_write_checks that will make cloning's input checking more
consistent with write input checking.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

1383a7ed

18 10月, 2018 3 次提交

xfs: fix fork selection in xfs_find_trim_cow_extent · 032dc923

由 Christoph Hellwig 提交于 10月 18, 2018

We should want to write directly into the data fork for blocks that don't
have an extent in the COW fork covering them yet.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

032dc923

xfs: remove the unused trimmed argument from xfs_reflink_trim_around_shared · d392bc81

由 Christoph Hellwig 提交于 10月 18, 2018

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

d392bc81

xfs: remove the unused shared argument to xfs_reflink_reserve_cow · fc439464

由 Christoph Hellwig 提交于 10月 18, 2018

Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

fc439464

06 10月, 2018 2 次提交

xfs: fix data corruption w/ unaligned reflink ranges · b3998900

由 Dave Chinner 提交于 10月 06, 2018

When reflinking sub-file ranges, a data corruption can occur when
the source file range includes a partial EOF block. This shares the
unknown data beyond EOF into the second file at a position inside
EOF, exposing stale data in the second file.

XFS only supports whole block sharing, but we still need to
support whole file reflink correctly.  Hence if the reflink
request includes the last block of the souce file, only proceed with
the reflink operation if it lands at or past the destination file's
current EOF. If it lands within the destination file EOF, reject the
entire request with -EINVAL and make the caller go the hard way.

This avoids the data corruption vector, but also avoids disruption
of returning EINVAL to userspace for the common case of whole file
cloning.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

b3998900

xfs: fix data corruption w/ unaligned dedupe ranges · dceeb47b

由 Dave Chinner 提交于 10月 06, 2018

A deduplication data corruption is Exposed by fstests generic/505 on
XFS. It is caused by extending the block match range to include the
partial EOF block, but then allowing unknown data beyond EOF to be
considered a "match" to data in the destination file because the
comparison is only made to the end of the source file. This corrupts
the destination file when the source extent is shared with it.

XFS only supports whole block dedupe, but we still need to appear to
support whole file dedupe correctly. Hence if the dedupe request
includes the last block of the souce file, don't include it in the
actual XFS dedupe operation. If the rest of the range dedupes
successfully, then report the partial last block as deduped, too, so
that userspace sees it as a successful dedupe rather than return
EINVAL because we can't dedupe unaligned blocks.
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

dceeb47b

05 10月, 2018 3 次提交

xfs: update ctime and remove suid before cloning files · 7debbf01

由 Darrick J. Wong 提交于 10月 05, 2018

Before cloning into a file, update the ctime and remove sensitive
attributes like suid, just like we'd do for a regular file write.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

7debbf01

xfs: zero posteof blocks when cloning above eof · 410fdc72

由 Darrick J. Wong 提交于 10月 05, 2018

When we're reflinking between two files and the destination file range
is well beyond the destination file's EOF marker, zero any posteof
speculative preallocations in the destination file so that we don't
expose stale disk contents.  The previous strategy of trying to clear
the preallocations does not work if the destination file has the
PREALLOC flag set.

Uncovered by shared/010.
Reported-by: NZorro Lang <zlang@redhat.com>
Bugzilla-id: https://bugzilla.kernel.org/show_bug.cgi?id=201259Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

410fdc72

xfs: refactor clonerange preparation into a separate helper · 0d41e1d2

由 Darrick J. Wong 提交于 10月 05, 2018

Refactor all the reflink preparation steps into a separate helper
that we'll use to land all the upcoming fixes for insufficient input
checks.

This rework also moves the invalidation of the destination range to
the prep function so that it is done before the range is remapped.
This ensures that nobody can access the data in range being remapped
until the remap is complete.

[dgc: fix xfs_reflink_remap_prep() return value and caller check to
handle vfs_clone_file_prep_inodes() returning 0 to mean "nothing to
do". ]

[dgc: make sure length changed by vfs_clone_file_prep_inodes() gets
propagated back to XFS code that does the remapping. ]
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

0d41e1d2

29 9月, 2018 1 次提交

xfs: skip delalloc COW blocks in xfs_reflink_end_cow · f5f3f959

由 Christoph Hellwig 提交于 9月 29, 2018

The iomap direct I/O code issues a single ->end_io call for the whole
I/O request, and if some of the extents cowered needed a COW operation
it will call xfs_reflink_end_cow over the whole range.

When we do AIO writes we drop the iolock after doing the initial setup,
but before the I/O completion.  Between dropping the lock and completing
the I/O we can have a racing buffered write create new delalloc COW fork
extents in the region covered by the outstanding direct I/O write, and
thus see delalloc COW fork extents in xfs_reflink_end_cow.  As
concurrent writes are fundamentally racy and no guarantees are given we
can simply skip those.

This can be easily reproduced with xfstests generic/208 in always_cow
mode.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

f5f3f959

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功