提交 · c2b3164320b51a535d7c7a6acdcee255edbb22cf · openeuler / Kernel

12 2月, 2019 19 次提交

xfs: use the latest extent at writeback delalloc conversion time · c2b31643

由 Brian Foster 提交于 2月 01, 2019

The writeback delalloc conversion code is racy with respect to
changes in the currently cached file mapping outside of the current
page. This is because the ilock is cycled between the time the
caller originally looked up the mapping and across each real
allocation of the provided file range. This code has collected
various hacks over the years to help combat the symptoms of these
races (i.e., truncate race detection, allocation into hole
detection, etc.), but none address the fundamental problem that the
imap may not be valid at allocation time.

Rather than continue to use race detection hacks, update writeback
delalloc conversion to a model that explicitly converts the delalloc
extent backing the current file offset being processed. The current
file offset is the only block we can trust to remain once the ilock
is dropped because any operation that can remove the block
(truncate, hole punch, etc.) must flush and discard pagecache pages
first.

Modify xfs_iomap_write_allocate() to use the xfs_bmapi_delalloc()
mechanism to request allocation of the entire delalloc extent
backing the current offset instead of assuming the extent passed by
the caller is unchanged. Record the range specified by the caller
and apply it to the resulting allocated extent so previous checks by
the caller for COW fork overlap are not lost. Finally, overload the
bmapi delalloc flag with the range reval flag behavior since this is
the only use case for both.

This ensures that writeback always picks up the correct
and current extent associated with the page, regardless of races
with other extent modifying operations. If operating on a data fork
and the COW overlap state has changed since the ilock was cycled,
the caller revalidates against the COW fork sequence number before
using the imap for the next block.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

c2b31643

xfs: create delalloc bmapi wrapper for full extent allocation · 627209fb

由 Brian Foster 提交于 2月 01, 2019

The writeback delalloc conversion code is racy with respect to
changes in the currently cached file mapping. This stems from the
fact that the bmapi allocation code requires a file range to
allocate and the writeback conversion code assumes the range of the
currently cached mapping is still valid with respect to the fork. It
may not be valid, however, because the ilock is cycled (potentially
multiple times) between the time the cached mapping was populated
and the delalloc conversion occurs.

To facilitate a solution to this problem, create a new
xfs_bmapi_delalloc() wrapper to xfs_bmapi_write() that takes a file
(FSB) offset and attempts to allocate whatever delalloc extent backs
the offset. Use a new bmapi flag to cause xfs_bmapi_write() to set
the range based on the extent backing the bno parameter unless bno
lands in a hole. If bno does land in a hole, fall back to the
current behavior (which may result in an error or quietly skipping
holes in the specified range depending on other parameters). This
patch does not change behavior.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

627209fb

xfs: remove superfluous writeback mapping eof trimming · 3b350898

由 Brian Foster 提交于 2月 01, 2019

Now that the cached writeback mapping is explicitly invalidated on
data fork changes, the EOF trimming band-aid is no longer necessary.
Remove xfs_trim_extent_eof() as well since it has no other users.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

3b350898

xfs: validate writeback mapping using data fork seq counter · d9252d52

由 Brian Foster 提交于 2月 01, 2019

The writeback code caches the current extent mapping across multiple
xfs_do_writepage() calls to avoid repeated lookups for sequential
pages backed by the same extent. This is known to be slightly racy
with extent fork changes in certain difficult to reproduce
scenarios. The cached extent is trimmed to within EOF to help avoid
the most common vector for this problem via speculative
preallocation management, but this is a band-aid that does not
address the fundamental problem.

Now that we have an xfs_ifork sequence counter mechanism used to
facilitate COW writeback, we can use the same mechanism to validate
consistency between the data fork and cached writeback mappings. On
its face, this is somewhat of a big hammer approach because any
change to the data fork invalidates any mapping currently cached by
a writeback in progress regardless of whether the data fork change
overlaps with the range under writeback. In practice, however, the
impact of this approach is minimal in most cases.

First, data fork changes (delayed allocations) caused by sustained
sequential buffered writes are amortized across speculative
preallocations. This means that a cached mapping won't be
invalidated by each buffered write of a common file copy workload,
but rather only on less frequent allocation events. Second, the
extent tree is always entirely in-core so an additional lookup of a
usable extent mostly costs a shared ilock cycle and in-memory tree
lookup. This means that a cached mapping reval is relatively cheap
compared to the I/O itself. Third, spurious invalidations don't
impact ioend construction. This means that even if the same extent
is revalidated multiple times across multiple writepage instances,
we still construct and submit the same size ioend (and bio) if the
blocks are physically contiguous.

Update struct xfs_writepage_ctx with a new field to hold the
sequence number of the data fork associated with the currently
cached mapping. Check the wpc seqno against the data fork when the
mapping is validated and reestablish the mapping whenever the fork
has changed since the mapping was cached. This ensures that
writeback always uses a valid extent mapping and thus prevents lost
writebacks and stale delalloc block problems.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

d9252d52

xfs: update fork seq counter on data fork changes · 9f9bc034

由 Brian Foster 提交于 2月 01, 2019

The sequence counter in the xfs_ifork structure is only updated on
COW forks. This is because the counter is currently only used to
optimize out repetitive COW fork checks at writeback time.

Tweak the extent code to update the seq counter regardless of the
fork type in preparation for using this counter on data forks as
well.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

9f9bc034

xfs: Introduce XFS_PTAG_VERIFIER_ERROR panic mask · d519da41

由 Marco Benatto 提交于 2月 01, 2019

Currently we have a few PTAGs in place allowing us to transform a filesystem
error in a BUG() call.  However, we don't have a panic tag for corrupt
metadata, so introduce XFS_PTAG_VERIFIER_ERROR so that the administrator can
use the fs.xfs.panic_mask sysctl knob to convert any error detected by buffer
verifiers into a kernel panic.
Signed-off-by: NMarco Benatto <mbenatto@redhat.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
[darrick: light editing of commit message]
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

d519da41

xfs: remove duplicated xfs_defer.h · e88db816

由 YueHaibing 提交于 2月 01, 2019

Remove duplicated include.
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

e88db816

xfs: check attribute name validity · 65480536

由 Darrick J. Wong 提交于 2月 01, 2019

Check extended attribute entry names for invalid characters.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

65480536

xfs: check directory name validity · e5d7d51b

由 Darrick J. Wong 提交于 2月 01, 2019

Check directory entry names for invalid characters.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

e5d7d51b

xfs: fix off-by-one error in rtbitmap cross-reference · 87c9607d

由 Darrick J. Wong 提交于 2月 01, 2019

Fix an off-by-one error in the realtime bitmap "is used" cross-reference
helper function if the realtime extent size is a single block.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

87c9607d

xfs: scrub should flag dir/attr offsets that aren't mappable with xfs_dablk_t · f8c1d702

由 Darrick J. Wong 提交于 2月 01, 2019

Teach scrub to flag extent maps that exceed the range that can be mapped
with a xfs_dablk_t.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

f8c1d702

xfs: abort xattr scrub if fatal signals are pending · 3258cb20

由 Darrick J. Wong 提交于 2月 01, 2019

The extended attribute scrubber should abort the "read all attrs" loop
if there's a fatal signal pending on the process.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

3258cb20

xfs: consolidate scrub dinode mapping code into a single function · f9e63342

由 Darrick J. Wong 提交于 2月 01, 2019

Move all the confusing dinode mapping code that's split between
xchk_iallocbt_check_cluster and xchk_iallocbt_check_cluster_ifree into
the first function so that it's clearer how we find the dinode for a
given inode.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

f9e63342

xfs: scrub big block inode btrees correctly · 4539b8a7

由 Darrick J. Wong 提交于 2月 01, 2019

Teach scrub how to handle the case that there are one or more inobt
records covering a given inode cluster.  This fixes the operation on big
block filesystems (e.g. 64k blocks, 512 byte inodes).
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

4539b8a7

xfs: clean up the inode cluster checking in the inobt scrub · b9454fe0

由 Darrick J. Wong 提交于 2月 01, 2019

The code to check inobt records against inode clusters is a mess of
poorly named variables and unnecessary parameters.  Clean the
unnecessary inode number parameters out of _check_cluster_freemask in
favor of computing them inside the function instead of making the caller
do it.  In xchk_iallocbt_check_cluster, rename the variables to make it
more obvious just what chunk_ino and cluster_ino represent.

Add a tracepoint to make it easier to track each inode cluster as we
scrub it.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

b9454fe0

xfs: hoist inode cluster checks out of loop · a1954242

由 Darrick J. Wong 提交于 2月 01, 2019

Hoist the inode cluster checks out of the inobt record check loop into
a separate function in preparation for refactoring of that loop.  No
functional changes here; that's in the next patch.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

a1954242

xfs: check inobt record alignment on big block filesystems · 22234c62

由 Darrick J. Wong 提交于 2月 01, 2019

On a big block filesystem, there may be multiple inobt records covering
a single inode cluster.  These records obviously won't be aligned to
cluster alignment rules, and they must cover the entire cluster.  Teach
scrub to check for these things.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

22234c62

xfs: check the ir_startino alignment directly · c050fdfe

由 Darrick J. Wong 提交于 2月 01, 2019

In xchk_iallocbt_rec, check the alignment of ir_startino by converting
the inode cluster block alignment into units of inodes instead of the
other way around (converting ir_startino to blocks).  This prevents us
from tripping over off-by-one errors in ir_startino which are obscured
by the inode -> block conversion.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

c050fdfe

xfs: never try to scrub more than 64 inodes per inobt record · 435dcf07

由 Darrick J. Wong 提交于 2月 01, 2019

Make sure we never check more than XFS_INODES_PER_CHUNK inodes for any
given inobt record since there can be more than one inobt record mapped
to an inode cluster.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

435dcf07

04 2月, 2019 3 次提交

xfs: set buffer ops when repair probes for btree type · add46b3b

由 Darrick J. Wong 提交于 2月 03, 2019

In xrep_findroot_block, we work out the btree type and correctness of a
given block by calling different btree verifiers on root block
candidates.  However, we leave the NULL b_ops while ->verify_read
validates the block, which means that if the verifier calls
xfs_buf_verifier_error it'll crash on the null b_ops.  Fix it to set
b_ops before calling the verifier and unsetting it if the verifier
fails.

Furthermore, improve the documentation around xfs_buf_ensure_ops, which
is the function that is responsible for cleaning up the b_ops state of
buffers that go through xrep_findroot_block but don't match anything.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

add46b3b

xfs: end sync buffer I/O properly on shutdown error · 465fa17f

由 Brian Foster 提交于 2月 03, 2019

As of commit e339dd8d ("xfs: use sync buffer I/O for sync delwri
queue submission"), the delwri submission code uses sync buffer I/O
for sync delwri I/O. Instead of waiting on async I/O to unlock the
buffer, it uses the underlying sync I/O completion mechanism.

If delwri buffer submission fails due to a shutdown scenario, an
error is set on the buffer and buffer completion never occurs. This
can cause xfs_buf_delwri_submit() to deadlock waiting on a
completion event.

We could check the error state before waiting on such buffers, but
that doesn't serialize against the case of an error set via a racing
I/O completion. Instead, invoke I/O completion in the shutdown case
regardless of buffer I/O type.
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

465fa17f

xfs: eof trim writeback mapping as soon as it is cached · aa6ee4ab

由 Brian Foster 提交于 2月 01, 2019

The cached writeback mapping is EOF trimmed to try and avoid races
between post-eof block management and writeback that result in
sending cached data to a stale location. The cached mapping is
currently trimmed on the validation check, which leaves a race
window between the time the mapping is cached and when it is trimmed
against the current inode size.

For example, if a new mapping is cached by delalloc conversion on a
blocksize == page size fs, we could cycle various locks, perform
memory allocations, etc.  in the writeback codepath before the
associated mapping is eventually trimmed to i_size. This leaves
enough time for a post-eof truncate and file append before the
cached mapping is trimmed. The former event essentially invalidates
a range of the cached mapping and the latter bumps the inode size
such the trim on the next writepage event won't trim all of the
invalid blocks. fstest generic/464 reproduces this scenario
occasionally and causes a lost writeback and stale delalloc blocks
warning on inode inactivation.

To work around this problem, trim the cached writeback mapping as
soon as it is cached in addition to on subsequent validation checks.
This is a minor tweak to tighten the race window as much as possible
until a proper invalidation mechanism is available.

Fixes: 40214d12 ("xfs: trim writepage mapping to within eof")
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

aa6ee4ab

30 12月, 2018 2 次提交

xfs: xfs_fsops: drop useless LIST_HEAD · 90be9b86

由 Julia Lawall 提交于 12月 23, 2018

Drop LIST_HEAD where the variable it declares is never used.

Commit 0410c3bb ("xfs: factor ag btree root block
initialisation") stopped using buffer_list and started using a
buffer list in an aghdr_init_data structure, but the declaration
of buffer_list was not removed.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 0410c3bb ("xfs: factor ag btree root block initialisation")
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

90be9b86

xfs: xfs_buf: drop useless LIST_HEAD · 89be677b

由 Julia Lawall 提交于 12月 23, 2018

Drop LIST_HEAD where the variable it declares has never
been used.

The semantic patch that fixes this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier x;
@@
- LIST_HEAD(x);
  ... when != x
// </smpl>

Fixes: 26f1fe85 ("xfs: reduce lock hold times in buffer writeback")
Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

89be677b

22 12月, 2018 1 次提交

xfs: reallocate realtime summary cache on growfs · 65eed012

由 Omar Sandoval 提交于 12月 21, 2018

At mount time, we allocate m_rsum_cache with the number of realtime
bitmap blocks. However, xfs_growfs_rt() can increase the number of
realtime bitmap blocks. Using the cache after this happens may access
out of the bounds of the cache. Fix it by reallocating the cache in this
case.

Fixes: 355e3532 ("xfs: cache minimum realtime summary level")
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

65eed012

20 12月, 2018 6 次提交

xfs: stringify scrub types in ftrace output · 86d163db

由 Darrick J. Wong 提交于 12月 18, 2018

Use __print_symbolic to print the scrub type in ftrace output.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

86d163db

xfs: stringify btree cursor types in ftrace output · c494213f

由 Darrick J. Wong 提交于 12月 18, 2018

Use __print_symbolic to print the btree type in ftrace output.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

c494213f

xfs: move XFS_INODE_FORMAT_STR mappings to libxfs · 0357d21a

由 Darrick J. Wong 提交于 12月 18, 2018

Move XFS_INODE_FORMAT_STR to libxfs so that we don't forget to keep it
updated, and add necessary TRACE_DEFINE_ENUM.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

0357d21a

xfs: move XFS_AG_BTREE_CMP_FORMAT_STR mappings to libxfs · 05c753c4

由 Darrick J. Wong 提交于 12月 18, 2018

Move XFS_AG_BTREE_CMP_FORMAT_STR to libxfs so that we don't forget to
keep it updated, and TRACE_DEFINE_ENUM the values while we're at it.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

05c753c4

xfs: fix symbolic enum printing in ftrace output · 85f8dff0

由 Darrick J. Wong 提交于 12月 18, 2018

ftrace's __print_symbolic() has a (very poorly documented) requirement
that any enum values used in the symbol to string translation table be
wrapped in a TRACE_DEFINE_ENUM so that the enum value can be encoded in
the ftrace ring buffer. Fix this unsatisfied requirement.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

85f8dff0

xfs: fix function pointer type in ftrace format · 7af8150f

由 Darrick J. Wong 提交于 12月 18, 2018

Use %pS instead of %pF in ftrace strings so that we record the actual
function address instead of the function descriptor.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>

7af8150f

19 12月, 2018 3 次提交

xfs: Fix x32 ioctls when cmd numbers differ from ia32. · a9d25bde

由 Nick Bowler 提交于 12月 17, 2018

Several ioctl structs change size between native 32-bit (ia32) and x32
applications, because x32 follows the native 64-bit (amd64) integer
alignment rules and uses 64-bit time_t.  In these instances, the ioctl
number changes so userspace simply gets -ENOTTY.  This scenario can be
handled by simply adding more cases.

Looking at the different ioctls implemented here:

- All the ones marked 'No size or alignment issue on any arch' should
  presumably all be fine.

- All the ones under BROKEN_X86_ALIGNMENT are different under integer
  alignment rules.  Since x32 matches amd64 here, we just need both
  sets of cases handled.

- XFS_IOC_SWAPEXT has both integer alignment differences and time_t
  differences.  Since x32 matches amd64 here, we need to add a case
  which calls the native implementation.

- The remaining ioctls have neither 64-bit integers nor time_t, so
  x32 matches ia32 here and no change is required at this level.  The
  bulkstat ioctl implementations have some pointer chasing which is
  handled separately.
Signed-off-by: NNick Bowler <nbowler@draconx.ca>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

a9d25bde

xfs: Fix bulkstat compat ioctls on x32 userspace. · 7ca860e3

由 Nick Bowler 提交于 12月 17, 2018

The bulkstat family of ioctls are problematic on x32, because there is
a mixup of native 32-bit and 64-bit conventions.  The xfs_fsop_bulkreq
struct contains pointers and 32-bit integers so that matches the native
32-bit layout, and that means the ioctl implementation goes into the
regular compat path on x32.

However, the 'ubuffer' member of that struct in turn refers to either
struct xfs_inogrp or xfs_bstat (or an array of these).  On x32, those
structures match the native 64-bit layout.  The compat implementation
writes out the 32-bit version of these structures.  This is not the
expected format for x32 userspace, causing problems.

Fortunately the functions which actually output these xfs_inogrp and
xfs_bstat structures have an easy way to select which output format
is required, so we just need a little tweak to select the right format
on x32.
Signed-off-by: NNick Bowler <nbowler@draconx.ca>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

7ca860e3

xfs: Align compat attrlist_by_handle with native implementation. · c456d644

由 Nick Bowler 提交于 12月 17, 2018

While inspecting the ioctl implementations, I noticed that the compat
implementation of XFS_IOC_ATTRLIST_BY_HANDLE does not do exactly the
same thing as the native implementation. Specifically, the "cursor"
does not appear to be written out to userspace on the compat path,
like it is on the native path.

This adjusts the compat implementation to copy out the cursor just
like the native implementation does. The attrlist cursor does not
require any special compat handling. This fixes xfstests xfs/269
on both IA-32 and x32 userspace, when running on an amd64 kernel.
Signed-off-by: NNick Bowler <nbowler@draconx.ca>
Fixes: 0facef7f ("xfs: in _attrlist_by_handle, copy the cursor back to userspace")
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

c456d644

14 12月, 2018 1 次提交

xfs: require both realtime inodes to mount · 64bafd2f

由 Darrick J. Wong 提交于 12月 12, 2018

Since mkfs always formats the filesystem with the realtime bitmap and
summary inodes immediately after the root directory, we should expect
that both of them are present and loadable, even if there isn't a
realtime volume attached. There's no reason to skip this if rbmino ==
NULLFSINO; in fact, this causes an immediate crash if the there /is/ a
realtime volume and someone writes to it.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBill O'Donnell <billodo@redhat.com>

64bafd2f

13 12月, 2018 5 次提交

xfs: cache minimum realtime summary level · 355e3532

由 Omar Sandoval 提交于 12月 12, 2018

The realtime summary is a two-dimensional array on disk, effectively:

u32 rsum[log2(number of realtime extents) + 1][number of blocks in the bitmap]

rsum[log][bbno] is the number of extents of size 2**log which start in
bitmap block bbno.

xfs_rtallocate_extent_near() uses xfs_rtany_summary() to check whether
rsum[log][bbno] != 0 for any log level. However, the summary array is
stored in row-major order (i.e., like an array in C), so all of these
entries are not adjacent, but rather spread across the entire summary
file. In the worst case (a full bitmap block), xfs_rtany_summary() has
to check every level.

This means that on a moderately-used realtime device, an allocation will
waste a lot of time finding, reading, and releasing buffers for the
realtime summary. In particular, one of our storage services (which runs
on servers with 8 very slow CPUs and 15 8 TB XFS realtime filesystems)
spends almost 5% of its CPU cycles in xfs_rtbuf_get() and
xfs_trans_brelse() called from xfs_rtany_summary().

One solution would be to also store the summary with the dimensions
swapped. However, this would require a disk format change to a very old
component of XFS.

Instead, we can cache the minimum size which contains any extents. We do
so lazily; rather than guaranteeing that the cache contains the precise
minimum, it always contains a loose lower bound which we tighten when we
read or update a summary block. This only uses a few kilobytes of memory
and is already serialized via the realtime bitmap and summary inode
locks, so the cost is minimal. With this change, the same workload only
spends 0.2% of its CPU cycles in the realtime allocator.
Signed-off-by: NOmar Sandoval <osandov@fb.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

355e3532

xfs: count inode blocks correctly in inobt scrub · 2c2d9d3a

由 Darrick J. Wong 提交于 12月 12, 2018

A big block filesystem might require more than one inobt record to cover
all the inodes in the block. In these cases it is not correct to round
the irec count up to the nearest block because this causes us to
overestimate the number of inode blocks we expect to find.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

2c2d9d3a

xfs: precalculate cluster alignment in inodes and blocks · c1b4a321

由 Darrick J. Wong 提交于 12月 12, 2018

Store the inode cluster alignment information in units of inodes and
blocks in the mount data so that we don't have to keep recalculating
them.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

c1b4a321

xfs: precalculate inodes and blocks per inode cluster · 83dcdb44

由 Darrick J. Wong 提交于 12月 12, 2018

Store the number of inodes and blocks per inode cluster in the mount
data so that we don't have to keep recalculating them.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

83dcdb44

xfs: add a block to inode count converter · 43004b2a

由 Darrick J. Wong 提交于 12月 12, 2018

Add new helpers to convert units of fs blocks into inodes, and AG blocks
into AG inodes, respectively.  Convert all the open-coded conversions
and XFS_OFFBNO_TO_AGINO(, , 0) calls to use them, as appropriate.  The
OFFBNO_TO_AGINO macro is retained for xfs_repair.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

43004b2a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功