提交 · 168eae803cede459d67ed0ab3ddb19539700a78a · openeuler / Kernel

07 7月, 2020 6 次提交

xfs: reflink can skip remap existing mappings · 168eae80

由 Darrick J. Wong 提交于 6月 29, 2020

If the source and destination map are identical, we can skip the remap
step to save some time.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

168eae80

xfs: only reserve quota blocks if we're mapping into a hole · 94b941fd

由 Darrick J. Wong 提交于 6月 29, 2020

When logging quota block count updates during a reflink operation, we
only log the /delta/ of the block count changes to the dquot. Since we
now know ahead of time the extent type of both dmap and smap (and that
they have the same length), we know that we only need to reserve quota
blocks for dmap's blockcount if we're mapping it into a hole.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

94b941fd

xfs: only reserve quota blocks for bmbt changes if we're changing the data fork · aa5d0ba0

由 Darrick J. Wong 提交于 6月 29, 2020

Now that we've reworked xfs_reflink_remap_extent to remap only one
extent per transaction, we actually know if the extent being removed is
an allocated mapping.  This means that we now know ahead of time if
we're going to be touching the data fork.

Since we only need blocks for a bmbt split if we're going to update the
data fork, we only need to get quota reservation if we know we're going
to touch the data fork.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

aa5d0ba0

xfs: redesign the reflink remap loop to fix blkres depletion crash · 00fd1d56

由 Darrick J. Wong 提交于 6月 29, 2020

The existing reflink remapping loop has some structural problems that
need addressing:

The biggest problem is that we create one transaction for each extent in
the source file without accounting for the number of mappings there are
for the same range in the destination file.  In other words, we don't
know the number of remap operations that will be necessary and we
therefore cannot guess the block reservation required.  On highly
fragmented filesystems (e.g. ones with active dedupe) we guess wrong,
run out of block reservation, and fail.

The second problem is that we don't actually use the bmap intents to
their full potential -- instead of calling bunmapi directly and having
to deal with its backwards operation, we could call the deferred ops
xfs_bmap_unmap_extent and xfs_refcount_decrease_extent instead.  This
makes the frontend loop much simpler.

Solve all of these problems by refactoring the remapping loops so that
we only perform one remapping operation per transaction, and each
operation only tries to remap a single extent from source to dest.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reported-by: NEdwin Török <edwin@etorok.net>
Tested-by: NEdwin Török <edwin@etorok.net>

00fd1d56

xfs: rename xfs_bmap_is_real_extent to is_written_extent · 877f58f5

由 Darrick J. Wong 提交于 6月 29, 2020

The name of this predicate is a little misleading -- it decides if the
extent mapping is allocated and written. Change the name to be more
direct, as we're going to add a new predicate in the next patch.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

877f58f5

xfs: fix reflink quota reservation accounting error · 83895227

由 Darrick J. Wong 提交于 6月 29, 2020

Quota reservations are supposed to account for the blocks that might be
allocated due to a bmap btree split.  Reflink doesn't do this, so fix
this to make the quota accounting more accurate before we start
rearranging things.

Fixes: 862bb360 ("xfs: reflink extents from one file to another")
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

83895227

13 4月, 2020 1 次提交

xfs: fix partially uninitialized structure in xfs_reflink_remap_extent · c142932c

由 Darrick J. Wong 提交于 4月 12, 2020

In the reflink extent remap function, it turns out that uirec (the block
mapping corresponding only to the part of the passed-in mapping that got
unmapped) was not fully initialized.  Specifically, br_state was not
being copied from the passed-in struct to the uirec.  This could lead to
unpredictable results such as the reflinked mapping being marked
unwritten in the destination file.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

c142932c

27 1月, 2020 1 次提交

xfs: remove unnecessary null pointer checks from _read_agf callers · 706b8c5b

由 Darrick J. Wong 提交于 1月 23, 2020

Drop the null buffer pointer checks in all code that calls
xfs_alloc_read_agf and doesn't pass XFS_ALLOC_FLAG_TRYLOCK because
they're no longer necessary.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

706b8c5b

21 1月, 2020 1 次提交

xfs: change return value of xfs_inode_need_cow to int · aa124436

由 zhengbin 提交于 1月 20, 2020

Fixes coccicheck warning:

fs/xfs/xfs_reflink.c:236:9-10: WARNING: return of 0/1 in function 'xfs_inode_need_cow' with return type bool
Reported-by: NHulk Robot <hulkci@huawei.com>
Signed-off-by: Nzhengbin <zhengbin13@huawei.com>
[darrick: rename the function so it doesn't sound like a predicate]
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

aa124436

15 1月, 2020 1 次提交

xfs: introduce XFS_MAX_FILEOFF · a5084865

由 Darrick J. Wong 提交于 1月 02, 2020

Introduce a new #define for the maximum supported file block offset.
We'll use this in the next patch to make it more obvious that we're
doing some operation for all possible inode fork mappings after a given
offset.  We can't use ULLONG_MAX here because bunmapi uses that to
detect when it's done.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

a5084865

24 10月, 2019 1 次提交

xfs: don't set bmapi total block req where minleft is · da781e64

由 Brian Foster 提交于 10月 21, 2019

xfs_bmapi_write() takes a total block requirement parameter that is
passed down to the block allocation code and is used to specify the
total block requirement of the associated transaction. This is used
to try and select an AG that can not only satisfy the requested
extent allocation, but can also accommodate subsequent allocations
that might be required to complete the transaction. For example,
additional bmbt block allocations may be required on insertion of
the resulting extent to an inode data fork.

While it's important for callers to calculate and reserve such extra
blocks in the transaction, it is not necessary to pass the total
value to xfs_bmapi_write() in all cases. The latter automatically
sets minleft to ensure that sufficient free blocks remain after the
allocation attempt to expand the format of the associated inode
(i.e., such as extent to btree conversion, btree splits, etc).
Therefore, any callers that pass a total block requirement of the
bmap mapping length plus worst case bmbt expansion essentially
specify the additional reservation requirement twice. These callers
can pass a total of zero to rely on the bmapi minleft policy.

Beyond being superfluous, the primary motivation for this change is
that the total reservation logic in the bmbt code is dubious in
scenarios where minlen < maxlen and a maxlen extent cannot be
allocated (which is more common for data extent allocations where
contiguity is not required). The total value is based on maxlen in
the xfs_bmapi_write() caller. If the bmbt code falls back to an
allocation between minlen and maxlen, that allocation will not
succeed until total is reset to minlen, which essentially throws
away any additional reservation included in total by the caller. In
addition, the total value is not reset until after alignment is
dropped, which means that such callers drop alignment far too
aggressively than necessary.

Update all callers of xfs_bmapi_write() that pass a total block
value of the mapping length plus bmbt reservation to instead pass
zero and rely on xfs_bmapi_minleft() to enforce the bmbt reservation
requirement. This trades off slightly less conservative AG selection
for the ability to preserve alignment in more scenarios.
xfs_bmapi_write() callers that incorporate unrelated or additional
reservations in total beyond what is already included in minleft
must continue to use the former.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

da781e64

22 10月, 2019 3 次提交

xfs: split the iomap ops for buffered vs direct writes · f150b423

由 Christoph Hellwig 提交于 10月 19, 2019

Instead of lots of magic conditionals in the main write_begin
handler this make the intent very clear.  Thing will become even
better once we support delayed allocations for extent size hints
and realtime allocations.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

f150b423

xfs: pass two imaps to xfs_reflink_allocate_cow · ffb375a8

由 Christoph Hellwig 提交于 10月 19, 2019

xfs_reflink_allocate_cow consumes the source data fork imap, and
potentially returns the COW fork imap.  Split the arguments in two
to clear up the calling conventions and to prepare for returning
a source iomap from ->iomap_begin.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

ffb375a8

xfs: remove xfs_reflink_dirty_extents · dd26b846

由 Christoph Hellwig 提交于 10月 19, 2019

Now that xfs_file_unshare is not completely dumb we can just call it
directly without iterating the extent and reflink btrees ourselves.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

dd26b846

21 10月, 2019 1 次提交

iomap: ignore non-shared or non-data blocks in xfs_file_dirty · 3590c4d8

由 Christoph Hellwig 提交于 10月 18, 2019

xfs_file_dirty is used to unshare reflink blocks.  Rename the function
to xfs_file_unshare to better document that purpose, and skip iomaps
that are not shared and don't need zeroing.  This will allow to simplify
the caller.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

3590c4d8

28 8月, 2019 2 次提交

xfs: remove unnecessary int returns from deferred bmap functions · 3e08f42a

由 Darrick J. Wong 提交于 8月 26, 2019

Remove the return value from the functions that schedule deferred bmap
operations since they never fail and do not return status.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

3e08f42a

xfs: remove unnecessary int returns from deferred refcount functions · 74b4c5d4

由 Darrick J. Wong 提交于 8月 26, 2019

Remove the return value from the functions that schedule deferred
refcount operations since they never fail and do not return status.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

74b4c5d4

19 8月, 2019 1 次提交

xfs: fix reflink source file racing with directio writes · 5d888b48

由 Darrick J. Wong 提交于 8月 14, 2019

While trawling through the dedupe file comparison code trying to fix
page deadlocking problems, Dave Chinner noticed that the reflink code
only takes shared IOLOCK/MMAPLOCKs on the source file. Because
page_mkwrite and directio writes do not take the EXCL versions of those
locks, this means that reflink can race with writer processes.

For pure remapping this can lead to undefined behavior and file
corruption; for dedupe this means that we cannot be sure that the
contents are identical when we decide to go ahead with the remapping.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

5d888b48

01 7月, 2019 1 次提交

xfs: remove XFS_TRANS_NOFS · 73d30d48

由 Christoph Hellwig 提交于 6月 28, 2019

Instead of a magic flag for xfs_trans_alloc, just ensure all callers
that can't relclaim through the file system use memalloc_nofs_save to
set the per-task nofs flag.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

73d30d48

29 6月, 2019 1 次提交

xfs: remove unused header files · 250d4b4c

由 Eric Sandeen 提交于 6月 28, 2019

There are many, many xfs header files which are included but
unneeded (or included twice) in the xfs code, so remove them.

nb: xfs_linux.h includes about 9 headers for everyone, so those
explicit includes get removed by this.  I'm not sure what the
preference is, but if we wanted explicit includes everywhere,
a followup patch could remove those xfs_*.h includes from
xfs_linux.h and move them into the files that need them.
Or it could be left as-is.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

250d4b4c

26 2月, 2019 2 次提交

xfs: fix uninitialized error variables · c1a4447f

由 Darrick J. Wong 提交于 2月 25, 2019

smatch complained about some uninitialized error returns, so fix those.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>

c1a4447f

xfs: don't pass iomap flags to xfs_reflink_allocate_cow · affe250a

由 Darrick J. Wong 提交于 2月 21, 2019

Don't pass raw iomap flags to xfs_reflink_allocate_cow; signal our
intention with a boolean argument.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

affe250a

21 2月, 2019 4 次提交

xfs: introduce an always_cow mode · 66ae56a5

由 Christoph Hellwig 提交于 2月 18, 2019

Add a mode where XFS never overwrites existing blocks in place.  This
is to aid debugging our COW code, and also put infatructure in place
for things like possible future support for zoned block devices, which
can't support overwrites.

This mode is enabled globally by doing a:

    echo 1 > /sys/fs/xfs/debug/always_cow

Note that the parameter is global to allow running all tests in xfstests
easily in this mode, which would not easily be possible with a per-fs
sysfs file.

In always_cow mode persistent preallocations are disabled, and fallocate
will fail when called with a 0 mode (with our without
FALLOC_FL_KEEP_SIZE), and not create unwritten extent for zeroed space
when called with FALLOC_FL_ZERO_RANGE or FALLOC_FL_UNSHARE_RANGE.

There are a few interesting xfstests failures when run in always_cow
mode:

 - generic/392 fails because the bytes used in the file used to test
   hole punch recovery are less after the log replay.  This is
   because the blocks written and then punched out are only freed
   with a delay due to the logging mechanism.
 - xfs/170 will fail as the already fragile file streams mechanism
   doesn't seem to interact well with the COW allocator
 - xfs/180 xfs/182 xfs/192 xfs/198 xfs/204 and xfs/208 will claim
   the file system is badly fragmented, but there is not much we
   can do to avoid that when always writing out of place
 - xfs/205 fails because overwriting a file in always_cow mode
   will require new space allocation and the assumption in the
   test thus don't work anymore.
 - xfs/326 fails to modify the file at all in always_cow mode after
   injecting the refcount error, leading to an unexpected md5sum
   after the remount, but that again is expected
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

66ae56a5

xfs: make COW fork unwritten extent conversions more robust · 26b91c72

由 Christoph Hellwig 提交于 2月 18, 2019

If we have racing buffered and direct I/O COW fork extents under
writeback can have been moved to the data fork by the time we call
xfs_reflink_convert_cow from xfs_submit_ioend.  This would be mostly
harmless as the block numbers don't change by this move, except for
the fact that xfs_bmapi_write will crash or trigger asserts when
not finding existing extents, even despite trying to paper over this
with the XFS_BMAPI_CONVERT_ONLY flag.

Instead of special casing non-transaction conversions in the already
way too complicated xfs_bmapi_write just add a new helper for the much
simpler non-transactional COW fork case, which simplify ignores not
found extents.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

26b91c72

xfs: merge COW handling into xfs_file_iomap_begin_delay · db46e604

由 Christoph Hellwig 提交于 2月 18, 2019

Besides simplifying the code a bit this allows to actually implement
the behavior of using COW preallocation for non-COW data mentioned
in the current comments.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

db46e604

xfs: don't use delalloc extents for COW on files with extsize hints · 78f0cc9d

由 Christoph Hellwig 提交于 2月 18, 2019

While using delalloc for extsize hints is generally a good idea, the
current code that does so only for COW doesn't help us much and creates
a lot of special cases.  Switch it to use real allocations like we
do for direct I/O.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

78f0cc9d

18 2月, 2019 1 次提交

xfs: remove the io_type field from the writeback context and ioend · be225fec

由 Christoph Hellwig 提交于 2月 15, 2019

The io_type field contains what is basically a summary of information
from the inode fork and the imap.  But we can just as easily use that
information directly, simplifying a few bits here and there and
improving the trace points.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

be225fec

13 12月, 2018 1 次提交

xfs: split up the xfs_reflink_end_cow work into smaller transactions · d6f215f3

由 Darrick J. Wong 提交于 12月 12, 2018

In xfs_reflink_end_cow, we allocate a single transaction for the entire
end_cow operation and then loop the CoW fork mappings to move them to
the data fork.  This design fails on a heavily fragmented filesystem
where an inode's data fork has exactly one more extent than would fit in
an extents-format fork, because the unmap can collapse the data fork
into extents format (freeing the bmbt block) but the remap can expand
the data fork back into a (newly allocated) bmbt block.  If the number
of extents we end up remapping is large, we can overflow the block
reservation because we reserved blocks assuming that we were adding
mappings into an already-cleared area of the data fork.

Let's say we have 8 extents in the data fork, 8 extents in the CoW fork,
and the data fork can hold at most 7 extents before needing to convert
to btree format; and that blocks A-P are discontiguous single-block
extents:

   0......7
D: ABCDEFGH
C: IJKLMNOP

When a write to file blocks 0-7 completes, we must remap I-P into the
data fork.  We start by removing H from the btree-format data fork.  Now
we have 7 extents, so we convert the fork to extents format, freeing the
bmbt block.   We then move P into the data fork and it now has 8 extents
again.  We must convert the data fork back to btree format, requiring a
block allocation.  If we repeat this sequence for blocks 6-5-4-3-2-1-0,
we'll need a total of 8 block allocations to remap all 8 blocks.  We
reserved only enough blocks to handle one btree split (5 blocks on a 4k
block filesystem), which means we overflow the block reservation.

To fix this issue, create a separate helper function to remap a single
extent, and change _reflink_end_cow to call it in a tight loop over the
entire range we're completing.  As a side effect this also removes the
size restrictions on how many extents we can end_cow at a time, though
nobody ever hit that.  It is not reasonable to reserve N blocks to remap
N blocks.

Note that this can be reproduced after ~320 million fsx ops while
running generic/938 (long soak directio fsx exerciser):

XFS: Assertion failed: tp->t_blk_res >= tp->t_blk_res_used, file: fs/xfs/xfs_trans.c, line: 116
<machine registers snipped>
Call Trace:
 xfs_trans_dup+0x211/0x250 [xfs]
 xfs_trans_roll+0x6d/0x180 [xfs]
 xfs_defer_trans_roll+0x10c/0x3b0 [xfs]
 xfs_defer_finish_noroll+0xdf/0x740 [xfs]
 xfs_defer_finish+0x13/0x70 [xfs]
 xfs_reflink_end_cow+0x2c6/0x680 [xfs]
 xfs_dio_write_end_io+0x115/0x220 [xfs]
 iomap_dio_complete+0x3f/0x130
 iomap_dio_rw+0x3c3/0x420
 xfs_file_dio_aio_write+0x132/0x3c0 [xfs]
 xfs_file_write_iter+0x8b/0xc0 [xfs]
 __vfs_write+0x193/0x1f0
 vfs_write+0xba/0x1c0
 ksys_write+0x52/0xc0
 do_syscall_64+0x50/0x160
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

d6f215f3

22 11月, 2018 1 次提交

xfs: flush removing page cache in xfs_reflink_remap_prep · 2c307174

由 Dave Chinner 提交于 11月 19, 2018

On a sub-page block size filesystem, fsx is failing with a data
corruption after a series of operations involving copying a file
with the destination offset beyond EOF of the destination of the file:

8093(157 mod 256): TRUNCATE DOWN from 0x7a120 to 0x50000 ******WWWW
8094(158 mod 256): INSERT 0x25000 thru 0x25fff (0x1000 bytes)
8095(159 mod 256): COPY 0x18000 thru 0x1afff (0x3000 bytes) to 0x2f400
8096(160 mod 256): WRITE 0x5da00 thru 0x651ff (0x7800 bytes) HOLE
8097(161 mod 256): COPY 0x2000 thru 0x5fff (0x4000 bytes) to 0x6fc00

The second copy here is beyond EOF, and it is to sub-page (4k) but
block aligned (1k) offset. The clone runs the EOF zeroing, landing
in a pre-existing post-eof delalloc extent. This zeroes the post-eof
extents in the page cache just fine, dirtying the pages correctly.

The problem is that xfs_reflink_remap_prep() now truncates the page
cache over the range that it is copying it to, and rounds that down
to cover the entire start page. This removes the dirty page over the
delalloc extent from the page cache without having written it back.
Hence later, when the page cache is flushed, the page at offset
0x6f000 has not been written back and hence exposes stale data,
which fsx trips over less than 10 operations later.

Fix this by changing xfs_reflink_remap_prep() to use
xfs_flush_unmap_range().
Signed-off-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

2c307174

20 11月, 2018 1 次提交

xfs: fix shared extent data corruption due to missing cow reservation · 59e42931

由 Brian Foster 提交于 11月 14, 2018

Page writeback indirectly handles shared extents via the existence
of overlapping COW fork blocks. If COW fork blocks exist, writeback
always performs the associated copy-on-write regardless if the
underlying blocks are actually shared. If the blocks are shared,
then overlapping COW fork blocks must always exist.

fstests shared/010 reproduces a case where a buffered write occurs
over a shared block without performing the requisite COW fork
reservation.  This ultimately causes writeback to the shared extent
and data corruption that is detected across md5 checks of the
filesystem across a mount cycle.

The problem occurs when a buffered write lands over a shared extent
that crosses an extent size hint boundary and that also happens to
have a partial COW reservation that doesn't cover the start and end
blocks of the data fork extent.

For example, a buffered write occurs across the file offset (in FSB
units) range of [29, 57]. A shared extent exists at blocks [29, 35]
and COW reservation already exists at blocks [32, 34]. After
accommodating a COW extent size hint of 32 blocks and the existing
reservation at offset 32, xfs_reflink_reserve_cow() allocates 32
blocks of reservation at offset 0 and returns with COW reservation
across the range of [0, 34]. The associated data fork extent is
still [29, 35], however, which isn't fully covered by the COW
reservation.

This leads to a buffered write at file offset 35 over a shared
extent without associated COW reservation. Writeback eventually
kicks in, performs an overwrite of the underlying shared block and
causes the associated data corruption.

Update xfs_reflink_reserve_cow() to accommodate the fact that a
delalloc allocation request may not fully cover the extent in the
data fork. Trim the data fork extent appropriately, just as is done
for shared extent boundaries and/or existing COW reservations that
happen to overlap the start of the data fork extent. This prevents
shared/010 failures due to data corruption on reflink enabled
filesystems.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

59e42931

30 10月, 2018 10 次提交

xfs: remove [cm]time update from reflink calls · bf4a1fcf

由 Darrick J. Wong 提交于 10月 30, 2018

Now that the vfs remap helper dirties the inode [cm]time for us, xfs no
longer needs to do that on its own.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

bf4a1fcf

xfs: remove xfs_reflink_remap_range · 3fc9f5e4

由 Darrick J. Wong 提交于 10月 30, 2018

Since xfs_file_remap_range is a thin wrapper, move the contents of
xfs_reflink_remap_range into the shell.  This cuts down on the vfs
calls being made from internal xfs code.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

3fc9f5e4

xfs: remove redundant remap partial EOF block checks · 7a6ccf00

由 Darrick J. Wong 提交于 10月 30, 2018

Now that we've moved the partial EOF block checks to the VFS helpers, we
can remove the redundant functionality from XFS.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

7a6ccf00

xfs: support returning partial reflink results · 3f68c1f5

由 Darrick J. Wong 提交于 10月 30, 2018

Back when the XFS reflink code only supported clone_file_range, we were
only able to return zero or negative error codes to userspace. However,
now that copy_file_range (which returns bytes copied) can use XFS'
clone_file_range, we have the opportunity to return partial results.
For example, if userspace sends a 1GB clone request and we run out of
space halfway through, we at least can tell userspace that we completed
512M of that request like a regular write.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

3f68c1f5

xfs: clean up xfs_reflink_remap_blocks call site · 9f04aaff

由 Darrick J. Wong 提交于 10月 30, 2018

Move the offset <-> blocks unit conversions into
xfs_reflink_remap_blocks to make the call site less ugly.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

9f04aaff

xfs: fix pagecache truncation prior to reflink · 4918ef4e

由 Darrick J. Wong 提交于 10月 30, 2018

Prior to remapping blocks, it is necessary to remove pages from the
destination file's page cache.  Unfortunately, the truncation is not
aggressive enough -- if page size > block size, we'll end up zeroing
subpage blocks instead of removing them.  So, round the start offset
down and the end offset up to page boundaries.  We already wrote all
the dirty data so the larger range shouldn't be a problem.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <david@fromorbit.com>

4918ef4e

vfs: clean up generic_remap_file_range_prep return value · 8c5c836b

由 Darrick J. Wong 提交于 10月 30, 2018

Since the remap prep function can update the length of the remap
request, we can change this function to return the usual return status
instead of the odd behavior it has now.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <david@fromorbit.com>

8c5c836b

vfs: make remap_file_range functions take and return bytes completed · 42ec3d4c

由 Darrick J. Wong 提交于 10月 30, 2018

Change the remap_file_range functions to take a number of bytes to
operate upon and return the number of bytes they operated on.  This is a
requirement for allowing fs implementations to return short clone/dedupe
results to the user, which will enable us to obey resource limits in a
graceful manner.

A subsequent patch will enable copy_file_range to signal to the
->clone_file_range implementation that it can handle a short length,
which will be returned in the function's return value.  For now the
short return is not implemented anywhere so the behavior won't change --
either copy_file_range manages to clone the entire range or it tries an
alternative.

Neither clone ioctl can take advantage of this, alas.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

42ec3d4c

vfs: remap helper should update destination inode metadata · 8dde90bc

由 Darrick J. Wong 提交于 10月 30, 2018

Extend generic_remap_file_range_prep to handle inode metadata updates
when remapping into a file.  If the operation can possibly alter the
file contents, we must update the ctime and mtime and remove security
privileges, just like we do for regular file writes.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Signed-off-by: NDave Chinner <david@fromorbit.com>

8dde90bc

vfs: pass remap flags to generic_remap_file_range_prep · a91ae49b

由 Darrick J. Wong 提交于 10月 30, 2018

Plumb the remap flags through the filesystem from the vfs function
dispatcher all the way to the prep function to prepare for behavior
changes in subsequent patches.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDave Chinner <david@fromorbit.com>

a91ae49b

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功