1. 16 Sep 2020, 4 commits
  2. 27 Aug 2020, 2 commits
  3. 29 Jul 2020, 4 commits
  4. 27 May 2020, 1 commit
    • xfs: more lockdep whackamole with kmem_alloc* · 6dcde60e
      Committed by Darrick J. Wong
      Dave Airlie reported the following lockdep complaint:
      
      >  ======================================================
      >  WARNING: possible circular locking dependency detected
      >  5.7.0-0.rc5.20200515git1ae7efb3.1.fc33.x86_64 #1 Not tainted
      >  ------------------------------------------------------
      >  kswapd0/159 is trying to acquire lock:
      >  ffff9b38d01a4470 (&xfs_nondir_ilock_class){++++}-{3:3},
      >  at: xfs_ilock+0xde/0x2c0 [xfs]
      >
      >  but task is already holding lock:
      >  ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at:
      >  __fs_reclaim_acquire+0x5/0x30
      >
      >  which lock already depends on the new lock.
      >
      >
      >  the existing dependency chain (in reverse order) is:
      >
      >  -> #1 (fs_reclaim){+.+.}-{0:0}:
      >         fs_reclaim_acquire+0x34/0x40
      >         __kmalloc+0x4f/0x270
      >         kmem_alloc+0x93/0x1d0 [xfs]
      >         kmem_alloc_large+0x4c/0x130 [xfs]
      >         xfs_attr_copy_value+0x74/0xa0 [xfs]
      >         xfs_attr_get+0x9d/0xc0 [xfs]
      >         xfs_get_acl+0xb6/0x200 [xfs]
      >         get_acl+0x81/0x160
      >         posix_acl_xattr_get+0x3f/0xd0
      >         vfs_getxattr+0x148/0x170
      >         getxattr+0xa7/0x240
      >         path_getxattr+0x52/0x80
      >         do_syscall_64+0x5c/0xa0
      >         entry_SYSCALL_64_after_hwframe+0x49/0xb3
      >
      >  -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
      >         __lock_acquire+0x1257/0x20d0
      >         lock_acquire+0xb0/0x310
      >         down_write_nested+0x49/0x120
      >         xfs_ilock+0xde/0x2c0 [xfs]
      >         xfs_reclaim_inode+0x3f/0x400 [xfs]
      >         xfs_reclaim_inodes_ag+0x20b/0x410 [xfs]
      >         xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
      >         super_cache_scan+0x190/0x1e0
      >         do_shrink_slab+0x184/0x420
      >         shrink_slab+0x182/0x290
      >         shrink_node+0x174/0x680
      >         balance_pgdat+0x2d0/0x5f0
      >         kswapd+0x21f/0x510
      >         kthread+0x131/0x150
      >         ret_from_fork+0x3a/0x50
      >
      >  other info that might help us debug this:
      >
      >   Possible unsafe locking scenario:
      >
      >         CPU0                    CPU1
      >         ----                    ----
      >    lock(fs_reclaim);
      >                                 lock(&xfs_nondir_ilock_class);
      >                                 lock(fs_reclaim);
      >    lock(&xfs_nondir_ilock_class);
      >
      >   *** DEADLOCK ***
      >
      >  4 locks held by kswapd0/159:
      >   #0: ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at:
      >  __fs_reclaim_acquire+0x5/0x30
      >   #1: ffffffffbbb7cef8 (shrinker_rwsem){++++}-{3:3}, at:
      >  shrink_slab+0x115/0x290
      >   #2: ffff9b39f07a50e8
      >  (&type->s_umount_key#56){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0
      >   #3: ffff9b39f077f258
      >  (&pag->pag_ici_reclaim_lock){+.+.}-{3:3}, at:
      >  xfs_reclaim_inodes_ag+0x82/0x410 [xfs]
      
      This is a known false positive because inodes cannot simultaneously be
      getting reclaimed and the target of a getxattr operation, but lockdep
      doesn't know that.  We can (selectively) shut up lockdep by applying a
      stupid GFP_NOLOCKDEP bandaid, at least until either it gets smarter or
      we change inode reclaim not to require the ILOCK.
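      
      As a rough sketch of what that bandaid looks like (illustrative, not
      the verbatim patch; the KM_NOLOCKDEP flag value is assumed), the xfs
      kmem flag conversion gains a bit that maps to __GFP_NOLOCKDEP, so only
      these specific allocations are hidden from lockdep:
      
      #include <linux/gfp.h>
      
      #define KM_NOLOCKDEP    (1u << 4)       /* assumed flag value */
      
      static gfp_t
      kmem_flags_convert(unsigned int km_flags)
      {
              gfp_t lflags = GFP_KERNEL | __GFP_NOWARN;
      
              /* selectively silence the known-false-positive report */
              if (km_flags & KM_NOLOCKDEP)
                      lflags |= __GFP_NOLOCKDEP;
              return lflags;
      }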
      Reported-by: Dave Airlie <airlied@gmail.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Tested-by: Dave Airlie <airlied@gmail.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
  5. 20 May 2020, 4 commits
  6. 19 Mar 2020, 1 commit
  7. 12 Mar 2020, 1 commit
  8. 03 Mar 2020, 5 commits
  9. 10 Jan 2020, 1 commit
  10. 23 Nov 2019, 3 commits
  11. 16 Nov 2019, 1 commit
    • xfs: fix attr leaf header freemap.size underflow · 2a2b5932
      Committed by Brian Foster
      The leaf format xattr addition helper xfs_attr3_leaf_add_work()
      adjusts the block freemap in a couple places. The first update drops
      the size of the freemap that the caller had already selected to
      place the xattr name/value data. Before the function returns, it
      also checks whether the entries array has encroached on a freemap
      range by virtue of the new entry addition. This is necessary because
      the entries array grows from the start of the block (but end of the
      block header) towards the end of the block while the name/value data
      grows from the end of the block in the opposite direction. If the
      associated freemap is already empty, however, size is zero and the
      subtraction underflows the field and causes corruption.
      
      This is reproduced rarely by generic/070. The observed behavior is
      that a smaller sized freemap is aligned to the end of the entries
      list, several subsequent xattr additions land in larger freemaps and
      the entries list expands into the smaller freemap until it is fully
      consumed and then underflows. Note that it is not otherwise a
      corruption for the entries array to consume an empty freemap because
      the nameval list (i.e. the firstused pointer in the xattr header)
      starts beyond the end of the corrupted freemap.
      
      Update the freemap size modification to account for the fact that
      the freemap entry can be empty and thus stale.
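      
      Sketched from that description (the names follow the xfs leaf header
      structures, but this is illustrative rather than the literal patch),
      the adjustment clamps the subtraction so an already-empty entry stays
      at zero instead of wrapping the unsigned field:
      
              /* the entries array grew over freemap[i]; shrink the map from
               * the front, clamping so an empty (stale) entry cannot wrap */
              ichdr.freemap[i].base += sizeof(struct xfs_attr_leaf_entry);
              ichdr.freemap[i].size -= min_t(uint16_t, ichdr.freemap[i].size,
                                             sizeof(struct xfs_attr_leaf_entry));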
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  12. 11 Nov 2019, 4 commits
  13. 05 Nov 2019, 1 commit
  14. 30 Oct 2019, 1 commit
  15. 22 Oct 2019, 1 commit
    • xfs: fix inode fork extent count overflow · 3f8a4f1d
      Committed by Dave Chinner
      [commit message is verbose for discussion purposes - will trim it
      down later. Some questions about implementation details at the end.]
      
      Zorro Lang recently ran a new test to stress single inode extent
      counts now that they are no longer limited by memory allocation.
      The test was simply:
      
      # xfs_io -f -c "falloc 0 40t" /mnt/scratch/big-file
      # ~/src/xfstests-dev/punch-alternating /mnt/scratch/big-file
      
      This test uncovered a problem where the hole punching operation
      appeared to finish with no error, but apparently only created 268M
      extents instead of the 10 billion it was supposed to.
      
      Further, trying to punch out extents that should have been present
      resulted in success, but no change in the extent count. It looked
      like a silent failure.
      
      While running the test and observing the behaviour in real time,
      I observed the extent count growing at ~2M extents/minute, and saw
      this after about an hour:
      
      # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next ; \
      > sleep 60 ; \
      > xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
      fsxattr.nextents = 127657993
      fsxattr.nextents = 129683339
      #
      
      And a few minutes later this:
      
      # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
      fsxattr.nextents = 4177861124
      #
      
      Ah, what? Where did those 4 billion extra extents suddenly come from?
      
      Stop the workload, unmount, mount:
      
      # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
      fsxattr.nextents = 166044375
      #
      
      And it's back at the expected number. i.e. the extent count is
      correct on disk, but it's screwed up in memory. I loaded up the
      extent list, and immediately:
      
      # xfs_io -f -c "stat" /mnt/scratch/big-file |grep next
      fsxattr.nextents = 4192576215
      #
      
      It's bad again. So, where does that number come from?
      xfs_fill_fsxattr():
      
                      if (ip->i_df.if_flags & XFS_IFEXTENTS)
                              fa->fsx_nextents = xfs_iext_count(&ip->i_df);
                      else
                              fa->fsx_nextents = ip->i_d.di_nextents;
      
      And that's the behaviour I just saw in a nutshell. The on-disk count
      is correct, but once the tree is loaded into memory, it goes whacky.
      Clearly there's something wrong with xfs_iext_count():
      
      inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp)
      {
              return ifp->if_bytes / sizeof(struct xfs_iext_rec);
      }
      
      Simple enough, but 134M extents is 2**27, and that's right about
      where things went wrong. A struct xfs_iext_rec is 16 bytes in size,
      which means 2**27 * 2**4 = 2**31 and we're right on target for an
      integer overflow. And, sure enough:
      
      struct xfs_ifork {
              int                     if_bytes;       /* bytes in if_u1 */
      ....
      
      Once we get 2**27 extents in a file, we overflow if_bytes and the
      in-core extent count goes wrong. And when we reach 2**28 extents,
      if_bytes wraps back to zero and things really start to go wrong
      there. This is where the silent failure comes from - only the first
      2**28 extents can be looked up directly due to the overflow, all the
      extents above this index wrap back to somewhere in the first 2**28
      extents. Hence with a regular pattern, trying to punch a hole in a
      range that should not have had holes instead mapped to an existing
      hole within the first 2**28 extents, and so "succeeded" without
      changing anything. Hence "silent failure"...
      
      Fix this by converting if_bytes to an int64_t and converting all the
      index variables and size calculations to use int64_t types to avoid
      overflows in future. Signed integers are still used to enable easy
      detection of extent count underflows. This enables scalability of
      extent counts to the limits of the on-disk format - MAXEXTNUM
      (2**31) extents.
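      
      In terms of the two snippets above, the shape of the fix is
      (an abbreviated sketch; the real patch touches many more sites):
      
      struct xfs_ifork {
              int64_t                 if_bytes;       /* bytes in if_u1, was int */
      ....
      
      inline xfs_extnum_t xfs_iext_count(struct xfs_ifork *ifp)
      {
              /* 64-bit arithmetic now: 2**27 16-byte records no longer
               * wrap the byte count at 2**31 */
              return ifp->if_bytes / sizeof(struct xfs_iext_rec);
      }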
      
      Current testing is at over 500M extents and still going:
      
      fsxattr.nextents = 517310478
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  16. 09 Oct 2019, 3 commits
    • xfs: move local to extent inode logging into bmap helper · aeea4b75
      Committed by Brian Foster
      The callers of xfs_bmap_local_to_extents_empty() log the inode
      external to the function, yet this function is where the on-disk
      format value is updated. Push the inode logging down into the
      function itself to help prevent future mistakes.
      
      Note that internal bmap callers track the inode logging flags
      independently and thus may log the inode core twice due to this
      change. This is harmless, so leave this code around for consistency
      with the other attr fork conversion functions.
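      
      A sketch of the resulting shape (the newly threaded transaction
      parameter and the logging call are the point here; the fork-release
      details and the exact format-switch helper are elided/assumed):
      
      void
      xfs_bmap_local_to_extents_empty(
              struct xfs_trans        *tp,            /* newly passed in */
              struct xfs_inode        *ip,
              int                     whichfork)
      {
              /* ... free the local fork data and zero if_bytes ... */
              XFS_IFORK_FMT_SET(ip, whichfork, XFS_DINODE_FMT_EXTENTS);
      
              /* log the on-disk format change right where it is made */
              xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
      }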
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove broken error handling on failed attr sf to leaf change · 603efebd
      Committed by Brian Foster
      xfs_attr_shortform_to_leaf() attempts to put the shortform fork back
      together after a failed attempt to convert from shortform to leaf
      format. While this code reallocates and copies back the shortform
      attr fork data, it never resets the inode format field back to local
      format. Further, now that the inode is properly logged after the
      initial switch from local format, any error that triggers the
      recovery code will eventually abort the transaction and shut down the
      fs. Therefore, remove the broken and unnecessary error handling
      code.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: log the inode on directory sf to block format change · 0b10d8a8
      Committed by Brian Foster
      When a directory changes from shortform (sf) to block format, the sf
      format is copied to a temporary buffer, the inode format is modified
      and the updated format filled with the dentries from the temporary
      buffer. If the inode format is modified and the attempt to grow the
      inode fails (due to an I/O error, for example), it is possible to
      return an error while leaving the directory in an inconsistent state
      and with an otherwise clean transaction. This results in corruption
      of the associated directory and leads to xfs_dabuf_map() errors as
      subsequent lookups cannot accurately determine the format of the
      directory. This problem is reproduced occasionally by generic/475.
      
      The fundamental problem is that xfs_dir2_sf_to_block() changes the
      on-disk inode format without logging the inode. The inode is
      eventually logged by the bmapi layer in the common case, but error
      checking introduces the possibility of failing the high level
      request before this happens.
      
      Update both of the dir2 and attr callers of
      xfs_bmap_local_to_extents_empty() to log the inode core as
      consistent with the bmap local to extent format change codepath.
      This ensures that any subsequent errors after the format has changed
      cause the transaction to abort.
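      
      A minimal sketch of the caller-side fix in the dir2 path (variable
      names are assumed; the attr fork caller gets the same treatment):
      
              /* in xfs_dir2_sf_to_block(), right after the format change:
               * dirty the inode core so any later error aborts the
               * transaction instead of leaving it clean */
              xfs_bmap_local_to_extents_empty(dp, XFS_DATA_FORK);
              xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);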
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  17. 31 Aug 2019, 3 commits
    • xfs: allocate xattr buffer on demand · ddbca70c
      Committed by Dave Chinner
      When doing file lookups and checking for permissions, we end up in
      xfs_get_acl() to see if there are any ACLs on the inode. This
      requires an xattr lookup, and to do that we have to supply a buffer
      large enough to hold a maximum-sized xattr.
      
      On workloads where we are accessing a wide range of cache-cold files
      under memory pressure (e.g. NFS fileservers) we end up spending a
      lot of time allocating the buffer. The buffer is 64k in length, so it
      is a contiguous multi-page allocation, and if that fails we
      fall back to vmalloc(). Hence the allocation here is /expensive/
      when we are looking up hundreds of thousands of files a second.
      
      Initial numbers from a bpf trace show average time in xfs_get_acl()
      is ~32us, with ~19us of that in the memory allocation. Note these
      are average times, so they are going to be affected by the worst
      case allocations more than the common fast case...
      
      To avoid this, we could just do a "null" lookup to see if the ACL
      xattr exists and then only do the allocation if it exists. This,
      however, optimises the path for the "no ACL present" case at the
      expense of the "acl present" case. i.e. we can halve the time in
      xfs_get_acl() for the no acl case (i.e down to ~10-15us), but that
      then increases the ACL case by 30% (i.e. up to 40-45us).
      
      To solve this and speed up both cases, drive the xattr buffer
      allocation into the attribute code once we know what the actual
      xattr length is. For the no-xattr case, we avoid the allocation
      completely, speeding up that case. For the common ACL case, we'll
      end up with a fast heap allocation (because it'll be smaller than a
      page), and only for the rarer "we have a remote xattr" will we have
      a multi-page allocation occur. Hence the common ACL case will be
      much faster, too.
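      
      A sketch of the approach (kmem_alloc_large() and the xfs_da_args
      fields are real xfs names, but the body is illustrative, not the
      literal patch); the value-copy helper this lives in is the one
      consolidated in the next entry below:
      
              /* inside the value-copy helper: allocate only once the actual
               * length is known, instead of a 64k worst case up front */
              if (!args->value) {
                      args->value = kmem_alloc_large(valuelen, KM_NOFS);
                      if (!args->value)
                              return -ENOMEM;
              }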
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: consolidate attribute value copying · 9df243a1
      Committed by Dave Chinner
      The same code is used to do the attribute value copying in three
      different places. Consolidate it into a single function in
      preparation for on-demand buffer allocation.
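      
      The consolidated helper plausibly looks like this (semantics inferred
      from what the three call sites must do; treat the body as a sketch):
      
      static int
      xfs_attr_copy_value(
              struct xfs_da_args      *args,
              unsigned char           *value,
              int                     valuelen)
      {
              /* caller's buffer is too small: report the size it needs */
              if (args->valuelen < valuelen) {
                      args->valuelen = valuelen;
                      return -ERANGE;
              }
              args->valuelen = valuelen;
              memcpy(args->value, value, valuelen);
              return 0;
      }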
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: move remote attr retrieval into xfs_attr3_leaf_getvalue · e3cc4554
      Committed by Dave Chinner
      We repeat exactly the same code to get the remote attribute value
      after both calls to xfs_attr3_leaf_getvalue() when it's a remote
      attr. Just do it in xfs_attr3_leaf_getvalue() so the callers don't
      have to care about it.
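      
      A sketch of the tail of xfs_attr3_leaf_getvalue() after the change
      (xfs_attr_rmtval_get() and args->rmtblkno are real xfs names; the
      surrounding logic is paraphrased):
      
              /* the local value has been copied above; remote attrs are
               * now fetched here too, so neither caller duplicates it */
              if (args->rmtblkno)
                      return xfs_attr_rmtval_get(args);
              return 0;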
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>