Commits · 39708c20ab51337c3eb282a824eb0aaff7ebe2e1 · openeuler / Kernel

12 Feb, 2019 · 36 commits

xfs: miscellaneous verifier magic value fixups · 39708c20

Committed by Brian Foster on Feb 07, 2019

Most buffer verifiers have hardcoded magic value checks
conditionalized on the version of the filesystem. The magic value
field of the verifier structure facilitates abstraction of some of
this code. Populate the ->magic field of various verifiers to take
advantage of this abstraction. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: use verifier magic field in dir2 leaf verifiers · 09f42019

Committed by Brian Foster on Feb 07, 2019

The dir2 leaf verifiers share the same underlying structure
verification code, but implement six accessor functions to multiplex
the code across the two verifiers. Further, the magic value isn't
sufficiently abstracted, so the common helper has to manually
fix up the magic from the caller on v5 filesystems.

Use the magic field in the verifier structure to eliminate the
duplicate code and clean this all up. No functional change.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: distinguish between bnobt and cntbt magic values · b8f89801

Committed by Brian Foster on Feb 07, 2019

The allocation btree verifiers share code that is unable to detect
cross-tree magic value corruptions such as a bnobt block with a
cntbt magic value. Populate the b_ops->magic field of the associated
verifier structures such that the structure verifier can check the
magic value against the expected value based on tree type.

The btree level check requires knowledge of the tree type to
determine the appropriate maximum value. This was previously part of
the hardcoded magic value checks. With that code removed, peek at
the first magic value in the verifier to determine the expected tree
type of the current block.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: split up allocation btree verifier · 27df4f50

Committed by Brian Foster on Feb 07, 2019

Similar to the inode btree verifier, the same allocation btree
verifier structure is shared between the by-bno (bnobt) and by-size
(cntbt) btrees. This makes it impossible to distinguish magic
values between them. Separate the verifier into two, one for each
tree, and assign them appropriately. No functional changes.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: distinguish between inobt and finobt magic values · 8473fee3

Committed by Brian Foster on Feb 07, 2019

The inode btree verifier code is shared between the inode btree and
free inode btree because the underlying metadata formats are
essentially equivalent. A side effect of this is that the verifier
cannot determine whether a particular btree block should have an
inobt or finobt magic value.

This logic allows an unfortunate xfs_repair bug to escape detection
where certain level > 0 nodes of the finobt are stamped with inobt
magic by xfs_repair finobt reconstruction. This is fortunately not a
severe problem since the inode btree magic values do not contribute
to any changes in kernel behavior, but we do need a means to detect
and prevent this problem in the future.

Add a field to xfs_buf_ops to store the v4 and v5 superblock magic
values expected by a particular verifier. Add a helper to check an
on-disk magic value against the value expected by the verifier. Call
the helper from the shared [f]inobt verifier code for magic value
verification. This ensures that the inode btree blocks each have the
appropriate magic value based on specific tree type and superblock
version.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
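
The per-verifier magic field and shared helper described in this commit can be modelled in user space roughly as follows. This is a minimal sketch, not the kernel code: `demo_buf_ops`, `demo_verify_magic` and `DEMO_MAGIC` are hypothetical names, the four-character magic spellings are used for illustration only, and on-disk byte-order handling is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Build a 32-bit magic from four ASCII characters (illustration only). */
#define DEMO_MAGIC(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical stand-in for a buffer verifier ops structure. */
struct demo_buf_ops {
	const char *name;
	uint32_t magic[2];	/* [0] = v4 value, [1] = v5 value */
};

static const struct demo_buf_ops demo_inobt_ops = {
	.name = "inobt",
	.magic = { DEMO_MAGIC('I', 'A', 'B', 'T'),
		   DEMO_MAGIC('I', 'A', 'B', '3') },
};

static const struct demo_buf_ops demo_finobt_ops = {
	.name = "finobt",
	.magic = { DEMO_MAGIC('F', 'I', 'B', 'T'),
		   DEMO_MAGIC('F', 'I', 'B', '3') },
};

/* Return true if block_magic is what this verifier expects for the format. */
static bool demo_verify_magic(const struct demo_buf_ops *ops, bool is_v5,
			      uint32_t block_magic)
{
	assert(ops != NULL);
	return block_magic == ops->magic[is_v5 ? 1 : 0];
}
```

With one ops structure per tree, a finobt block stamped with inobt magic fails `demo_verify_magic(&demo_finobt_ops, ...)` even though both trees share the same structural checks.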

xfs: create a separate finobt verifier · 01e68f40

Committed by Brian Foster on Feb 07, 2019

The inobt verifier is reused for the inobt and finobt, which
makes it impossible to distinguish between magic values on a
per-tree basis. Create a separate finobt structure in preparation
for changes to enforce the appropriate magic value for the
associated tree. This patch has no functional change.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: always check magic values in on-disk byte order · e34d3e74

Committed by Brian Foster on Feb 07, 2019

Most verifiers that check on-disk magic values convert the CPU
endian magic value constant to disk endian to facilitate compile
time optimization of the byte swap and reduce the need for runtime
byte swaps in buffer verifiers. Several buffer verifiers do not
follow this pattern. Update those verifiers for consistency.

Also fix up a random typo in the inode readahead verifier name.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
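
The pattern this commit standardizes on, converting the constant to disk byte order so the swap can be folded at compile time, might look like the following user-space sketch. The `DEMO_*` macros and `demo_magic_matches` are hypothetical stand-ins for the kernel's byte-order helpers, and the endianness detection assumes a GCC- or Clang-style `__BYTE_ORDER__` macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Constant-expression byte swap, so the compiler can fold it. */
#define DEMO_SWAB32(x) \
	((((uint32_t)(x) & 0x000000ffu) << 24) | \
	 (((uint32_t)(x) & 0x0000ff00u) << 8) | \
	 (((uint32_t)(x) & 0x00ff0000u) >> 8) | \
	 (((uint32_t)(x) & 0xff000000u) >> 24))

/* Assumption: compilers without __BYTE_ORDER__ target little-endian hosts. */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
#define DEMO_CPU_TO_BE32(x)	((uint32_t)(x))
#else
#define DEMO_CPU_TO_BE32(x)	DEMO_SWAB32(x)
#endif

#define DEMO_SB_MAGIC	0x58465342u	/* ASCII "XFSB" */

/* Compare raw big-endian on-disk bytes with the pre-converted constant. */
static bool demo_magic_matches(const unsigned char disk[4])
{
	uint32_t raw;

	assert(disk != NULL);
	memcpy(&raw, disk, sizeof(raw));	/* raw keeps disk byte order */
	return raw == DEMO_CPU_TO_BE32(DEMO_SB_MAGIC);
}
```

The on-disk value is never swapped at runtime; only the constant is converted, and that conversion is a compile-time expression.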

xfs: clarify documentation for the function to reverify buffers · 75d02303

Committed by Brian Foster on Feb 06, 2019

Improve the documentation around xfs_buf_ensure_ops, which is the
function that is responsible for cleaning up the b_ops state of buffers
that go through xrep_findroot_block but don't match anything.  Rename
the function to xfs_buf_reverify.

[darrick: this started off as bfoster mods of a previous patch of mine,
but the renaming part is now this separate patch.]

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>

xfs: cache unlinked pointers in an rhashtable · 9b247179

Committed by Darrick J. Wong on Feb 07, 2019

Use a rhashtable to cache the unlinked list incore.  This should speed
up unlinked processing considerably when there are a lot of inodes on
the unlinked list because iunlink_remove no longer has to traverse an
entire bucket list to find which inode points to the one being removed.

The incore list structure records "X.next_unlinked = Y" relations, with
the rhashtable using Y to index the records.  This makes finding the
inode X that points to an inode Y very quick.  If our cache fails to find
anything we can always fall back on the old method.

FWIW this drastically reduces the amount of time it takes to remove
inodes from the unlinked list.  I wrote a program to open a lot of
O_TMPFILE files and then close them in the same order, which takes
a very long time if we have to traverse the unlinked lists.  With the
patch, I see:

+ /d/t/tmpfile/tmpfile
Opened 193531 files in 6.33s.
Closed 193531 files in 5.86s

real    0m12.192s
user    0m0.064s
sys     0m11.619s
+ cd /
+ umount /mnt

real    0m0.050s
user    0m0.004s
sys     0m0.030s

And without the patch:

+ /d/t/tmpfile/tmpfile
Opened 193588 files in 6.35s.
Closed 193588 files in 751.61s

real    12m38.853s
user    0m0.084s
sys     12m34.470s
+ cd /
+ umount /mnt

real    0m0.086s
user    0m0.000s
sys     0m0.060s

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
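
The "X.next_unlinked = Y, indexed by Y" cache described above can be illustrated with a small user-space hash table. This sketch uses a plain fixed-size chained table rather than the kernel's rhashtable, and every name in it (`demo_backref`, `demo_backref_lookup`, `DEMO_NULLINO`, and so on) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_NULLINO	UINT32_MAX	/* hypothetical "no inode" sentinel */
#define DEMO_NBUCKETS	64u

/* One "prev_ino.next_unlinked == next_ino" relation, keyed by next_ino. */
struct demo_backref {
	uint32_t next_ino;
	uint32_t prev_ino;
	struct demo_backref *chain;
};

struct demo_backref_cache {
	struct demo_backref *buckets[DEMO_NBUCKETS];
};

static unsigned int demo_hash(uint32_t ino)
{
	return (unsigned int)((ino * 2654435761u) % DEMO_NBUCKETS);
}

/* Record that prev_ino points at next_ino. Returns 0, or -1 if out of memory. */
static int demo_backref_insert(struct demo_backref_cache *cache,
			       uint32_t prev_ino, uint32_t next_ino)
{
	struct demo_backref *br;
	unsigned int b = demo_hash(next_ino);

	assert(cache != NULL);
	br = malloc(sizeof(*br));
	if (!br)
		return -1;
	br->next_ino = next_ino;
	br->prev_ino = prev_ino;
	br->chain = cache->buckets[b];
	cache->buckets[b] = br;
	return 0;
}

/* Find the inode pointing at next_ino, or DEMO_NULLINO on a cache miss. */
static uint32_t demo_backref_lookup(const struct demo_backref_cache *cache,
				    uint32_t next_ino)
{
	const struct demo_backref *br;

	assert(cache != NULL);
	for (br = cache->buckets[demo_hash(next_ino)]; br; br = br->chain)
		if (br->next_ino == next_ino)
			return br->prev_ino;
	return DEMO_NULLINO;
}

/* Drop the record keyed by next_ino, if there is one. */
static void demo_backref_remove(struct demo_backref_cache *cache,
				uint32_t next_ino)
{
	struct demo_backref **pp;

	assert(cache != NULL);
	for (pp = &cache->buckets[demo_hash(next_ino)]; *pp; pp = &(*pp)->chain) {
		if ((*pp)->next_ino == next_ino) {
			struct demo_backref *victim = *pp;

			*pp = victim->chain;
			free(victim);
			return;
		}
	}
}
```

A miss (`DEMO_NULLINO`) corresponds to the case where the commit falls back on walking the on-disk bucket list.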

xfs: add tracepoints for high level iunlink operations · 4664c66c

Committed by Darrick J. Wong on Feb 07, 2019

Add tracepoints so we can associate high level operations with low level
updates.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: refactor inode update in iunlink_remove · b1d2a068

Committed by Darrick J. Wong on Feb 07, 2019

In xfs_iunlink_remove we have two identical calls to
xfs_iunlink_update_inode, so move it out of the if statement to simplify
the code some more.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: refactor unlinked list search and mapping to a separate function · 23ffa52c

Committed by Darrick J. Wong on Feb 07, 2019

There's a loop that searches an unlinked bucket list to find the inode
that points to a given inode.  Hoist this into a separate function;
later we'll use our iunlink backref cache to bypass the slow list
operation.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: refactor inode unlinked pointer update functions · f2fc16a3

Committed by Darrick J. Wong on Feb 07, 2019

Hoist the code that updates an inode's unlinked pointer into
a helper.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: strengthen AGI unlinked inode bucket pointer checks · 86bfd375

Committed by Darrick J. Wong on Feb 07, 2019

Strengthen our checking of the AGI unlinked pointers when we start to
use them for updating the metadata.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: refactor AGI unlinked bucket updates · 9a4a5118

Committed by Darrick J. Wong on Feb 07, 2019

Split the AGI unlinked bucket updates into a separate function.  No
functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: add xfs_verify_agino_or_null helper · 7d36c195

Committed by Darrick J. Wong on Feb 07, 2019

Add a new helper to check that a per-AG inode pointer is either null or
points somewhere valid within that AG.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
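
The "null or valid within the AG" check can be pictured with the sketch below. It is a simplified user-space illustration: `demo_ag_range` and both functions are hypothetical, and the valid range is passed in directly rather than derived from filesystem geometry as the kernel would do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NULLAGINO	UINT32_MAX	/* hypothetical all-ones "null" pointer */

/* Hypothetical description of the valid inode number range in one AG. */
struct demo_ag_range {
	uint32_t first_agino;
	uint32_t last_agino;
};

/* True if agino falls inside the AG's valid inode range. */
static bool demo_verify_agino(const struct demo_ag_range *ag, uint32_t agino)
{
	assert(ag != NULL);
	return agino >= ag->first_agino && agino <= ag->last_agino;
}

/* True if agino is the null pointer or a valid inode number in the AG. */
static bool demo_verify_agino_or_null(const struct demo_ag_range *ag,
				      uint32_t agino)
{
	return agino == DEMO_NULLAGINO || demo_verify_agino(ag, agino);
}
```

The "or null" variant suits list pointers such as unlinked bucket heads, where a null value legitimately means "end of list".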

xfs: clean up iunlink functions · 5837f625

Committed by Darrick J. Wong on Feb 07, 2019

Fix some indentation issues with the iunlink functions and reorganize
the tops of the functions to be identical.  No functional changes.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>

xfs: use the latest extent at writeback delalloc conversion time · c2b31643

Committed by Brian Foster on Feb 01, 2019

The writeback delalloc conversion code is racy with respect to
changes in the currently cached file mapping outside of the current
page. This is because the ilock is cycled between the time the
caller originally looked up the mapping and across each real
allocation of the provided file range. This code has collected
various hacks over the years to help combat the symptoms of these
races (i.e., truncate race detection, allocation into hole
detection, etc.), but none address the fundamental problem that the
imap may not be valid at allocation time.

Rather than continue to use race detection hacks, update writeback
delalloc conversion to a model that explicitly converts the delalloc
extent backing the current file offset being processed. The current
file offset is the only block we can trust to remain once the ilock
is dropped because any operation that can remove the block
(truncate, hole punch, etc.) must flush and discard pagecache pages
first.

Modify xfs_iomap_write_allocate() to use the xfs_bmapi_delalloc()
mechanism to request allocation of the entire delalloc extent
backing the current offset instead of assuming the extent passed by
the caller is unchanged. Record the range specified by the caller
and apply it to the resulting allocated extent so previous checks by
the caller for COW fork overlap are not lost. Finally, overload the
bmapi delalloc flag with the range reval flag behavior since this is
the only use case for both.

This ensures that writeback always picks up the correct
and current extent associated with the page, regardless of races
with other extent modifying operations. If operating on a data fork
and the COW overlap state has changed since the ilock was cycled,
the caller revalidates against the COW fork sequence number before
using the imap for the next block.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: create delalloc bmapi wrapper for full extent allocation · 627209fb

Committed by Brian Foster on Feb 01, 2019

The writeback delalloc conversion code is racy with respect to
changes in the currently cached file mapping. This stems from the
fact that the bmapi allocation code requires a file range to
allocate and the writeback conversion code assumes the range of the
currently cached mapping is still valid with respect to the fork. It
may not be valid, however, because the ilock is cycled (potentially
multiple times) between the time the cached mapping was populated
and the delalloc conversion occurs.

To facilitate a solution to this problem, create a new
xfs_bmapi_delalloc() wrapper to xfs_bmapi_write() that takes a file
(FSB) offset and attempts to allocate whatever delalloc extent backs
the offset. Use a new bmapi flag to cause xfs_bmapi_write() to set
the range based on the extent backing the bno parameter unless bno
lands in a hole. If bno does land in a hole, fall back to the
current behavior (which may result in an error or quietly skipping
holes in the specified range depending on other parameters). This
patch does not change behavior.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
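
The range selection described above, using the extent that backs the offset and falling back to the caller's range in a hole, can be sketched as follows. This is a user-space illustration under simplifying assumptions: the extent list is a plain array, and `demo_extent`, `demo_range` and `demo_pick_alloc_range` are hypothetical names, not part of the bmapi interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory extent record; units are filesystem blocks. */
struct demo_extent {
	uint64_t startoff;
	uint64_t blockcount;
};

struct demo_range {
	uint64_t startoff;
	uint64_t blockcount;
};

/*
 * Choose the range to operate on for file offset bno. If an extent backs
 * bno, use that whole extent; if bno lands in a hole, keep the range the
 * caller asked for.
 */
static struct demo_range demo_pick_alloc_range(const struct demo_extent *ext,
					       size_t nr, uint64_t bno,
					       uint64_t caller_len)
{
	struct demo_range r = { bno, caller_len };
	size_t i;

	assert(ext != NULL || nr == 0);
	for (i = 0; i < nr; i++) {
		if (bno >= ext[i].startoff &&
		    bno - ext[i].startoff < ext[i].blockcount) {
			r.startoff = ext[i].startoff;
			r.blockcount = ext[i].blockcount;
			break;
		}
	}
	return r;
}
```

Looking the extent up at allocation time means the result reflects the fork as it is now, not as it was when the caller cached its mapping.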

xfs: remove superfluous writeback mapping eof trimming · 3b350898

Committed by Brian Foster on Feb 01, 2019

Now that the cached writeback mapping is explicitly invalidated on
data fork changes, the EOF trimming band-aid is no longer necessary.
Remove xfs_trim_extent_eof() as well since it has no other users.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: validate writeback mapping using data fork seq counter · d9252d52

Committed by Brian Foster on Feb 01, 2019

The writeback code caches the current extent mapping across multiple
xfs_do_writepage() calls to avoid repeated lookups for sequential
pages backed by the same extent. This is known to be slightly racy
with extent fork changes in certain difficult-to-reproduce
scenarios. The cached extent is trimmed to within EOF to help avoid
the most common vector for this problem via speculative
preallocation management, but this is a band-aid that does not
address the fundamental problem.

Now that we have an xfs_ifork sequence counter mechanism used to
facilitate COW writeback, we can use the same mechanism to validate
consistency between the data fork and cached writeback mappings. On
its face, this is somewhat of a big hammer approach because any
change to the data fork invalidates any mapping currently cached by
a writeback in progress regardless of whether the data fork change
overlaps with the range under writeback. In practice, however, the
impact of this approach is minimal in most cases.

First, data fork changes (delayed allocations) caused by sustained
sequential buffered writes are amortized across speculative
preallocations. This means that a cached mapping won't be
invalidated by each buffered write of a common file copy workload,
but rather only on less frequent allocation events. Second, the
extent tree is always entirely in-core so an additional lookup of a
usable extent mostly costs a shared ilock cycle and in-memory tree
lookup. This means that a cached mapping reval is relatively cheap
compared to the I/O itself. Third, spurious invalidations don't
impact ioend construction. This means that even if the same extent
is revalidated multiple times across multiple writepage instances,
we still construct and submit the same size ioend (and bio) if the
blocks are physically contiguous.

Update struct xfs_writepage_ctx with a new field to hold the
sequence number of the data fork associated with the currently
cached mapping. Check the wpc seqno against the data fork when the
mapping is validated and reestablish the mapping whenever the fork
has changed since the mapping was cached. This ensures that
writeback always uses a valid extent mapping and thus prevents lost
writebacks and stale delalloc block problems.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
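
The sequence-counter validation described above can be modelled as follows. This is a minimal user-space sketch with hypothetical types (`demo_fork`, `demo_wpc`) and no locking; it only shows how a cached mapping is tied to the fork sequence number it was read under.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fork: every extent change bumps seq. */
struct demo_fork {
	unsigned int seq;
};

/* Hypothetical writeback context: one cached mapping plus its seq. */
struct demo_wpc {
	uint64_t map_start;
	uint64_t map_len;
	unsigned int data_seq;
};

/* Any modification of the fork invalidates mappings cached earlier. */
static void demo_fork_modify(struct demo_fork *fork)
{
	assert(fork != NULL);
	fork->seq++;
}

/* Cache a mapping and remember the fork sequence it was looked up under. */
static void demo_cache_mapping(struct demo_wpc *wpc,
			       const struct demo_fork *fork,
			       uint64_t start, uint64_t len)
{
	wpc->map_start = start;
	wpc->map_len = len;
	wpc->data_seq = fork->seq;
}

/* Valid only if offset is covered and the fork has not changed since. */
static bool demo_mapping_valid(const struct demo_wpc *wpc,
			       const struct demo_fork *fork, uint64_t offset)
{
	if (offset < wpc->map_start || offset - wpc->map_start >= wpc->map_len)
		return false;
	return wpc->data_seq == fork->seq;
}
```

As the commit message notes, this invalidates on any fork change, even one that does not overlap the cached range; the sketch makes that trade-off visible because the check never compares ranges against the modification.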

xfs: update fork seq counter on data fork changes · 9f9bc034

Committed by Brian Foster on Feb 01, 2019

The sequence counter in the xfs_ifork structure is only updated on
COW forks. This is because the counter is currently only used to
optimize out repetitive COW fork checks at writeback time.

Tweak the extent code to update the seq counter regardless of the
fork type in preparation for using this counter on data forks as
well.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: Introduce XFS_PTAG_VERIFIER_ERROR panic mask · d519da41

Committed by Marco Benatto on Feb 01, 2019

Currently we have a few PTAGs in place allowing us to transform a filesystem
error into a BUG() call.  However, we don't have a panic tag for corrupt
metadata, so introduce XFS_PTAG_VERIFIER_ERROR so that the administrator can
use the fs.xfs.panic_mask sysctl knob to convert any error detected by buffer
verifiers into a kernel panic.

Signed-off-by: Marco Benatto <mbenatto@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
[darrick: light editing of commit message]
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
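
How a panic mask of this kind gates escalation can be sketched as below. The tag values and the `demo_should_panic` helper are hypothetical; the kernel's actual XFS_PTAG_* bit assignments are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag bits; the kernel's real XFS_PTAG_* values may differ. */
#define DEMO_PTAG_SHUTDOWN_CORRUPT	0x1u
#define DEMO_PTAG_VERIFIER_ERROR	0x2u

/* True if an error reported under 'tag' should be escalated to a panic. */
static bool demo_should_panic(uint32_t panic_mask, uint32_t tag)
{
	/* A tag is expected to be exactly one bit. */
	assert(tag != 0 && (tag & (tag - 1)) == 0);
	return (panic_mask & tag) != 0;
}
```

An administrator who sets only the verifier bit in the mask gets a panic on verifier failures while other tagged errors keep their normal handling.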

xfs: remove duplicated xfs_defer.h · e88db816

Committed by YueHaibing on Feb 01, 2019

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: check attribute name validity · 65480536

Committed by Darrick J. Wong on Feb 01, 2019

Check extended attribute entry names for invalid characters.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: check directory name validity · e5d7d51b

Committed by Darrick J. Wong on Feb 01, 2019

Check directory entry names for invalid characters.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: fix off-by-one error in rtbitmap cross-reference · 87c9607d

Committed by Darrick J. Wong on Feb 01, 2019

Fix an off-by-one error in the realtime bitmap "is used" cross-reference
helper function if the realtime extent size is a single block.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t · f8c1d702

Committed by Darrick J. Wong on Feb 01, 2019

Teach scrub to flag extent maps that exceed the range that can be mapped
with a xfs_dablk_t.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: abort xattr scrub if fatal signals are pending · 3258cb20

Committed by Darrick J. Wong on Feb 01, 2019

The extended attribute scrubber should abort the "read all attrs" loop
if there's a fatal signal pending on the process.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: consolidate scrub dinode mapping code into a single function · f9e63342

Committed by Darrick J. Wong on Feb 01, 2019

Move all the confusing dinode mapping code that's split between
xchk_iallocbt_check_cluster and xchk_iallocbt_check_cluster_ifree into
the first function so that it's clearer how we find the dinode for a
given inode.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: scrub big block inode btrees correctly · 4539b8a7

Committed by Darrick J. Wong on Feb 01, 2019

Teach scrub how to handle the case that there are one or more inobt
records covering a given inode cluster.  This fixes the operation on big
block filesystems (e.g. 64k blocks, 512 byte inodes).

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: clean up the inode cluster checking in the inobt scrub · b9454fe0

Committed by Darrick J. Wong on Feb 01, 2019

The code to check inobt records against inode clusters is a mess of
poorly named variables and unnecessary parameters.  Clean the
unnecessary inode number parameters out of _check_cluster_freemask in
favor of computing them inside the function instead of making the caller
do it.  In xchk_iallocbt_check_cluster, rename the variables to make it
more obvious just what chunk_ino and cluster_ino represent.

Add a tracepoint to make it easier to track each inode cluster as we
scrub it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: hoist inode cluster checks out of loop · a1954242

Committed by Darrick J. Wong on Feb 01, 2019

Hoist the inode cluster checks out of the inobt record check loop into
a separate function in preparation for refactoring of that loop.  No
functional changes here; that's in the next patch.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: check inobt record alignment on big block filesystems · 22234c62

Committed by Darrick J. Wong on Feb 01, 2019

On a big block filesystem, there may be multiple inobt records covering
a single inode cluster.  These records obviously won't be aligned to
cluster alignment rules, and they must cover the entire cluster.  Teach
scrub to check for these things.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

xfs: check the ir_startino alignment directly · c050fdfe

Committed by Darrick J. Wong on Feb 01, 2019

In xchk_iallocbt_rec, check the alignment of ir_startino by converting
the inode cluster block alignment into units of inodes instead of the
other way around (converting ir_startino to blocks).  This prevents us
from tripping over off-by-one errors in ir_startino which are obscured
by the inode -> block conversion.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
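
A small worked example shows why the direction of the conversion matters. The geometry and function names below are hypothetical and the block-based variant is only an illustration of how integer division can hide a misaligned start, not a copy of the code being replaced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry for the example. */
struct demo_geom {
	uint32_t inodes_per_block;
	uint32_t cluster_align_blocks;
};

/*
 * Block-based check: convert startino to a block number first. The integer
 * division discards the inode's offset within the block.
 */
static bool demo_aligned_via_blocks(const struct demo_geom *g,
				    uint32_t startino)
{
	assert(g != NULL && g->inodes_per_block && g->cluster_align_blocks);
	return (startino / g->inodes_per_block) % g->cluster_align_blocks == 0;
}

/* Inode-based check: convert the alignment to inodes and test directly. */
static bool demo_aligned_via_inodes(const struct demo_geom *g,
				    uint32_t startino)
{
	assert(g != NULL && g->inodes_per_block && g->cluster_align_blocks);
	return startino % (g->cluster_align_blocks * g->inodes_per_block) == 0;
}
```

With 8 inodes per block and a 4-block alignment, a start inode of 33 maps to block 4 and passes the block-based check, while the inode-based check sees 33 % 32 == 1 and rejects it.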

xfs: never try to scrub more than 64 inodes per inobt record · 435dcf07

Committed by Darrick J. Wong on Feb 01, 2019

Make sure we never check more than XFS_INODES_PER_CHUNK inodes for any
given inobt record since there can be more than one inobt record mapped
to an inode cluster.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>

04 Feb, 2019 · 3 commits

xfs: set buffer ops when repair probes for btree type · add46b3b

Committed by Darrick J. Wong on Feb 03, 2019

In xrep_findroot_block, we work out the btree type and correctness of a
given block by calling different btree verifiers on root block
candidates.  However, we leave b_ops set to NULL while ->verify_read
validates the block, which means that if the verifier calls
xfs_buf_verifier_error it'll crash on the null b_ops.  Fix this by
setting b_ops before calling the verifier and unsetting it if the
verifier fails.

Furthermore, improve the documentation around xfs_buf_ensure_ops, which
is the function that is responsible for cleaning up the b_ops state of
buffers that go through xrep_findroot_block but don't match anything.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
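
The fix, attaching the ops before running a verifier and detaching them on failure, can be sketched in user space as follows. All types and functions here (`demo_buf`, `demo_buf_ops`, `demo_probe`) are hypothetical; the verifier dereferences `bp->ops` to stand in for an error-reporting path that needs it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct demo_buf;

/* Hypothetical verifier ops: one expected magic and a read verifier. */
struct demo_buf_ops {
	const char *name;
	uint32_t magic;
	bool (*verify_read)(const struct demo_buf *bp);
};

struct demo_buf {
	uint32_t magic;
	const struct demo_buf_ops *ops;	/* NULL until a type is established */
};

/* The verifier relies on bp->ops, as an error-reporting path might. */
static bool demo_verify_read(const struct demo_buf *bp)
{
	assert(bp != NULL && bp->ops != NULL);
	return bp->magic == bp->ops->magic;
}

/* Try each candidate: attach ops before verifying, detach on failure. */
static const struct demo_buf_ops *
demo_probe(struct demo_buf *bp, const struct demo_buf_ops *const *cands,
	   size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		bp->ops = cands[i];
		if (cands[i]->verify_read(bp))
			return bp->ops;
		bp->ops = NULL;
	}
	return NULL;
}
```

A buffer that matches no candidate leaves the probe with `ops` reset to NULL, which is the state the later cleanup function is described as handling.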

xfs: end sync buffer I/O properly on shutdown error · 465fa17f

Committed by Brian Foster on Feb 03, 2019

As of commit e339dd8d ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.

If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.

We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

xfs: eof trim writeback mapping as soon as it is cached · aa6ee4ab

Committed by Brian Foster on Feb 01, 2019

The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.

For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc.  in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such that the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.

To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.

Fixes: 40214d12 ("xfs: trim writepage mapping to within eof")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

30 Dec, 2018 · 1 commit

xfs: xfs_fsops: drop useless LIST_HEAD · 90be9b86

Committed by Julia Lawall on Dec 23, 2018

Drop LIST_HEAD where the variable it declares is never used.

Commit 0410c3bb ("xfs: factor ag btree root block
initialisation") stopped using buffer_list and started using a
buffer list in an aghdr_init_data structure, but the declaration
of buffer_list was not removed.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 0410c3bb ("xfs: factor ag btree root block initialisation")
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

openeuler / Kernel · last synced successfully almost 2 years ago
