- 10 7月, 2022 1 次提交
-
-
由 Darrick J. Wong 提交于
We're about to make this logic do a bit more, so convert the macro to a static inline function for better typechecking and fewer shouty macros. No functional changes here. Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 29 6月, 2022 2 次提交
-
-
由 Darrick J. Wong 提交于
Now that we've established (again!) that empty xattr leaf buffers are ok, we no longer need to bhold them to transactions when we're creating new leaf blocks. Get rid of the entire mechanism, which should simplify the xattr code quite a bit. The original justification for using bhold here was to prevent the AIL from trying to write the empty leaf block into the fs during the brief time that we release the buffer lock. The reason for /that/ was to prevent recovery from tripping over the empty ondisk block. Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Darrick J. Wong 提交于
TLDR: Revert commit 51e6104f ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") because it was wrong. Every now and then we get a corruption report from the kernel or xfs_repair about empty leaf blocks in the extended attribute structure. We've long thought that these shouldn't be possible, but prior to 5.18 one would shake loose in the recoveryloop fstests about once a month. A new addition to the xattr leaf block verifier in 5.19-rc1 makes this happen every 7 minutes on my testing cloud. I added a ton of logging to detect any time we set the header count on an xattr leaf block to zero. This produced the following dmesg output on generic/388: XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_create+0x187/0x230 xfs_attr_shortform_to_leaf+0xd1/0x2f0 xfs_attr_set_iter+0x73e/0xa90 xfs_xattri_finish_update+0x45/0x80 xfs_attr_finish_item+0x1b/0xd0 xfs_defer_finish_noroll+0x19c/0x770 __xfs_trans_commit+0x153/0x3e0 xfs_attr_set+0x36b/0x740 xfs_xattr_set+0x89/0xd0 __vfs_setxattr+0x67/0x80 __vfs_setxattr_noperm+0x6e/0x120 vfs_setxattr+0x97/0x180 setxattr+0x88/0xa0 path_setxattr+0xc3/0xe0 __x64_sys_setxattr+0x27/0x30 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 So now we know that someone is creating empty xattr leaf blocks as part of converting a sf xattr structure into a leaf xattr structure. The conversion routine logs any existing sf attributes in the same transaction that creates the leaf block, so we know this is a setxattr to a file that has no attributes at all. Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log recovery. I also augmented buffer item recovery to call ->verify_struct on any attr leaf blocks and complain if it finds a failure: XFS (sda4): Unmounting Filesystem XFS (sda4): Mounting V5 Filesystem XFS (sda4): Starting recovery (logdev: internal) XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0! Call Trace: <TASK> dump_stack_lvl+0x34/0x44 xfs_attr3_leaf_verify+0x3b8/0x420 xlog_recover_buf_commit_pass2+0x60a/0x6c0 xlog_recover_items_pass2+0x4e/0xc0 xlog_recover_commit_trans+0x33c/0x350 xlog_recovery_process_trans+0xa5/0xe0 xlog_recover_process_data+0x8d/0x140 xlog_do_recovery_pass+0x19b/0x720 xlog_do_log_recovery+0x62/0xc0 xlog_do_recover+0x33/0x1d0 xlog_recover+0xda/0x190 xfs_log_mount+0x14c/0x360 xfs_mountfs+0x517/0xa60 xfs_fs_fill_super+0x6bc/0x950 get_tree_bdev+0x175/0x280 vfs_get_tree+0x1a/0x80 path_mount+0x6f5/0xaa0 __x64_sys_mount+0x103/0x140 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fc61e241eae And a moment later, the _delwri_submit of the recovered buffers trips the same verifier and recovery fails: XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78 XFS (sda4): Unmount and run xfs_repair XFS (sda4): First 128 bytes of corrupted metadata buffer: 00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00 ........;....... 00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00 .....).x........ 00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d ......I......1>. 00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00 ................ 00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00 .P.............. 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518). Shutting down filesystem. XFS (sda4): Please unmount the filesystem and rectify the problem(s) XFS (sda4): log mount/recovery failed: error -117 XFS (sda4): log mount failed I think I see what's going on here -- setxattr is racing with something that shuts down the filesystem: Thread 1 Thread 2 -------- -------- xfs_attr_sf_addname xfs_attr_shortform_to_leaf <create empty leaf> xfs_trans_bhold(leaf) xattri_dela_state = XFS_DAS_LEAF_ADD <roll transaction> <flush log> <shut down filesystem> xfs_trans_bhold_release(leaf) <discover fs is dead, bail> Thread 3 -------- <cycle mount, start recovery> xlog_recover_buf_commit_pass2 xlog_recover_do_reg_buffer <replay empty leaf buffer from recovered buf item> xfs_buf_delwri_queue(leaf) xfs_buf_delwri_submit _xfs_buf_ioapply(leaf) xfs_attr3_leaf_write_verify <trip over empty leaf buffer> <fail recovery> As you can see, the bhold keeps the leaf buffer locked and thus prevents the *AIL* from tripping over the ichdr.count==0 check in the write verifier. Unfortunately, it doesn't prevent the log from getting flushed to disk, which sets up log recovery to fail. So. It's clear that the kernel has always had the ability to persist attr leaf blocks with ichdr.count==0, which means that it's part of the ondisk format now. Unfortunately, this check has been added and removed multiple times throughout history. It first appeared in[1] kernel 3.10 as part of the early V5 format patches. The check was later discovered to break log recovery and hence disabled[2] during log recovery in kernel 4.10. Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to weed out the empty leaf blocks. This was still not correct because log recovery would recover an empty attr leaf block successfully only for regular xattr operations to trip over the empty block during of the block during regular operation. Therefore, the check was removed entirely[4] in kernel 5.7 but removal of the xfs_repair check was forgotten. The continued complaints from xfs_repair lead to us mistakenly re-adding[5] the verifier check for kernel 5.19. Remove it once again. [1] 517c2220 ("xfs: add CRCs to attr leaf blocks") [2] 2e1d2337 ("xfs: ignore leaf attr ichdr.count in verifier during log replay") [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0") [4] f28cef9e ("xfs: don't fail verifier on empty attr3 leaf block") [5] 51e6104f ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Looking at the rest of the xattr code, it seems that files with empty leaf blocks behave as expected -- listxattr reports no attributes; getxattr on any xattr returns nothing as expected; removexattr does nothing; and setxattr can add attributes just fine. Original-bug: 517c2220 ("xfs: add CRCs to attr leaf blocks") Still-not-fixed-by: 2e1d2337 ("xfs: ignore leaf attr ichdr.count in verifier during log replay") Removed-in: f28cef9e ("xfs: don't fail verifier on empty attr3 leaf block") Fixes: 51e6104f ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify") Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 16 6月, 2022 1 次提交
-
-
由 Darrick J. Wong 提交于
I found a race involving the larp control knob, aka the debugging knob that lets developers enable logging of extended attribute updates: Thread 1 Thread 2 echo 0 > /sys/fs/xfs/debug/larp setxattr(REPLACE) xfs_has_larp (returns false) xfs_attr_set echo 1 > /sys/fs/xfs/debug/larp xfs_attr_defer_replace xfs_attr_init_replace_state xfs_has_larp (returns true) xfs_attr_init_remove_state <oops, wrong DAS state!> This isn't a particularly severe problem right now because xattr logging is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know what they're doing. However, the eventual intent is that callers should be able to ask for the assistance of the log in persisting xattr updates. This capability might not be required for /all/ callers, which means that dynamic control must work correctly. Once an xattr update has decided whether or not to use logged xattrs, it needs to stay in that mode until the end of the operation regardless of what subsequent parallel operations might do. Therefore, it is an error to continue sampling xfs_globals.larp once xfs_attr_change has made a decision about larp, and it was not correct for me to have told Allison that ->create_intent functions can sample the global log incompat feature bitfield to decide to elide a log item. Instead, create a new op flag for the xfs_da_args structure, and convert all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within the attr update state machine to look for the operations flag. Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
-
- 12 5月, 2022 3 次提交
-
-
由 Dave Chinner 提交于
xfs_repair flags these as a corruption error, so the verifier should catch software bugs that result in empty leaf blocks being written to disk, too. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We can't use the same algorithm for replacing an existing attribute when logging attributes. The existing algorithm is essentially: 1. create new attr w/ INCOMPLETE 2. atomically flip INCOMPLETE flags between old + new attribute 3. remove old attr which is marked w/ INCOMPLETE This algorithm guarantees that we see either the old or new attribute, and if we fail after the atomic flag flip, we don't have to recover the removal of the old attr because we never see INCOMPLETE attributes in lookups. For logged attributes, however, this does not work. The logged attribute intents do not track the work that has been done as the transaction rolls, and hence the only recovery mechanism we have is "run the replace operation from scratch". This is further exacerbated by the attempt to avoid needing the INCOMPLETE flag to create an atomic swap. This means we can create a second active attribute of the same name before we remove the original. If we fail at any point after the create but before the removal has completed, we end up with duplicate attributes in the attr btree and recovery only tries to replace one of them. There are several other failure modes where we can leave partially allocated remote attributes that expose stale data, partially free remote attributes that enable UAF based stale data exposure, etc. TO fix this, we need a different algorithm for replace operations when LARP is enabled. Luckily, it's not that complex if we take the right first step. That is, the first thing we log is the attri intent with the new name/value pair and mark the old attr as INCOMPLETE in the same transaction. From there, we then remove the old attr and keep relogging the new name/value in the intent, such that we always know that we have to create the new attr in recovery. Once the old attr is removed, we then run a normal ATTR_CREATE operation relogging the intent as we go. If the new attr is local, then it gets created in a single atomic transaction that also logs the final intent done. If the new attr is remote, the we set INCOMPLETE on the new attr while we allocate and set the remote value, and then we clear the INCOMPLETE flag at in the last transaction taht logs the final intent done. If we fail at any point in this algorithm, log recovery will always see the same state on disk: the new name/value in the intent, and either an INCOMPLETE attr or no attr in the attr btree. If we find an INCOMPLETE attr, we run the full replace starting with removing the INCOMPLETE attr. If we don't find it, then we simply create the new attr. Notably, recovery of a failed create that has an INCOMPLETE flag set is now the same - we start with the lookup of the INCOMPLETE attr, and if that exists then we do the full replace recovery process, otherwise we just create the new attr. Hence changing the way we do the replace operation when LARP is enabled allows us to use the same log recovery algorithm for both the ATTR_CREATE and ATTR_REPLACE operations. This is also the same algorithm we use for runtime ATTR_REPLACE operations (except for the step setting up the initial conditions). The result is that: - ATTR_CREATE uses the same algorithm regardless of whether LARP is enabled or not - ATTR_REPLACE with larp=0 is identical to the old algorithm - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm from the larp=0 code and then runs the unmodified ATTR_CREATE code. - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as it uses at runtime. Because the state machine is now quite clean, changing the algorithm is really just a case of changing the initial state and how the states link together for the ATTR_REPLACE case. Hence it's not a huge amount of code for what is a fairly substantial rework of the attr logging and recovery algorithm.... Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
由 Dave Chinner 提交于
We currently store the high level attr operation in args->attr_flags. This field contains what the VFS is telling us to do, but don't necessarily match what we are doing in the low level modification state machine. e.g. XATTR_REPLACE implies both XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME because it is doing both a remove and adding a new attr. However, deep in the individual state machine operations, we check errors against this high level VFS op flags, not the low level XFS_DA_OP flags. Indeed, we don't even have a low level flag for a REMOVE operation, so the only way we know we are doing a remove is the complete absence of XATTR_REPLACE, XATTR_CREATE, XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME. And because there are other flags in these fields, this is a pain to check if we need to. As the XFS_DA_OP flags are only needed once the deferred operations are set up, set these flags appropriately when we set the initial operation state. We also introduce a XFS_DA_OP_REMOVE flag to make it easy to know that we are doing a remove operation. With these, we can remove the use of XATTR_REPLACE and XATTR_CREATE in low level lookup operations, and manipulate the low level flags according to the low level context that is operating. e.g. log recovery does not have a VFS xattr operation state to copy into args->attr_flags, and the low level state machine ops we do for recovery do not match the high level VFS operations that were in progress when the system failed... Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 11 5月, 2022 1 次提交
-
-
由 Allison Henderson 提交于
Add an error tag on xfs_attr3_leaf_to_node to test log attribute recovery and replay. Signed-off-by: NCatherine Hoang <catherine.hoang@oracle.com> Reviewed-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChandan Babu R <chandan.babu@oracle.com> Signed-off-by: NAllison Henderson <allison.henderson@oracle.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 09 5月, 2022 1 次提交
-
-
由 Allison Henderson 提交于
This is a clean up patch that skips the flip flag logic for delayed attr renames. Since the log replay keeps the inode locked, we do not need to worry about race windows with attr lookups. So we can skip over flipping the flag and the extra transaction roll for it Signed-off-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com> Signed-off-by: NDave Chinner <david@fromorbit.com>
-
- 23 10月, 2021 1 次提交
-
-
由 Darrick J. Wong 提交于
Now that we've gotten rid of the kmem_zone_t typedef, rename the variables to _cache since that's what they are. Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChandan Babu R <chandan.babu@oracle.com>
-
- 20 8月, 2021 3 次提交
-
-
由 Dave Chinner 提交于
Stop directly referencing b_bn in code outside the buffer cache, as b_bn is supposed to be used only as an internal cache index. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Dave Chinner 提交于
Replace m_flags feature checks with xfs_has_<feature>() calls and rework the setup code to set flags in m_features. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Dave Chinner 提交于
Convert the xfs_sb_version_hasfoo() to checks against mp->m_features. Checks of the superblock itself during disk operations (e.g. in the read/write verifiers and the to/from disk formatters) are not converted - they operate purely on the superblock state. Everything else should use the mount features. Large parts of this conversion were done with sed with commands like this: for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f done With manual cleanups for things like "xfs_has_extflgbit" and other little inconsistencies in naming. The result is ia lot less typing to check features and an XFS binary size reduced by a bit over 3kB: $ size -t fs/xfs/built-in.a text data bss dec hex filenam before 1130866 311352 484 1442702 16038e (TOTALS) after 1127727 311352 484 1439563 15f74b (TOTALS) Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
- 10 8月, 2021 2 次提交
-
-
由 Dave Chinner 提交于
There is no reason for this wrapper existing anymore. All the places that use KM_NOFS allocation are within transaction contexts and hence covered by memalloc_nofs_save/restore contexts. Hence we don't need any special handling of vmalloc for large IOs anymore and so special casing this code isn't necessary. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Darrick J. Wong 提交于
Fix a few whitespace errors such as spaces at the end of the line, etc. This gets us back to something more closely resembling parity. Signed-off-by: NDarrick J. Wong <djwong@kernel.org> Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 10 6月, 2021 1 次提交
-
-
由 Allison Henderson 提交于
This patch renames the following functions to make the nameing scheme more consistent: xfs_attr_shortform_remove -> xfs_attr_sf_removename xfs_attr_node_remove_name -> xfs_attr_node_removename xfs_attr_set_fmt -> xfs_attr_sf_addname Suggested-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
-
- 02 6月, 2021 2 次提交
-
-
由 Dave Chinner 提交于
They are AG functions, not superblock functions, so move them to the appropriate location. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Allison Henderson 提交于
This patch modifies the attr remove routines to be delay ready. This means they no longer roll or commit transactions, but instead return -EAGAIN to have the calling routine roll and refresh the transaction. In this series, xfs_attr_remove_args is merged with xfs_attr_node_removename become a new function, xfs_attr_remove_iter. This new version uses a sort of state machine like switch to keep track of where it was when EAGAIN was returned. A new version of xfs_attr_remove_args consists of a simple loop to refresh the transaction until the operation is completed. A new XFS_DAC_DEFER_FINISH flag is used to finish the transaction where ever the existing code used to. Calls to xfs_attr_rmtval_remove are replaced with the delay ready version __xfs_attr_rmtval_remove. We will rename __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are done. xfs_attr_rmtval_remove itself is still in use by the set routines (used during a rename). For reasons of preserving existing function, we modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is set. Similar to how xfs_attr_remove_args does here. Once we transition the set routines to be delay ready, xfs_attr_rmtval_remove is no longer used and will be removed. This patch also adds a new struct xfs_delattr_context, which we will use to keep track of the current state of an attribute operation. The new xfs_delattr_state enum is used to track various operations that are in progress so that we know not to repeat them, and resume where we left off before EAGAIN was returned to cycle out the transaction. Other members take the place of local variables that need to retain their values across multiple function calls. See xfs_attr.h for a more detailed diagram of the states. Signed-off-by: NAllison Henderson <allison.henderson@oracle.com> Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
-
- 16 4月, 2021 2 次提交
-
-
由 Christoph Hellwig 提交于
The in-memory XFS_IFEXTENTS is now only used to check if an inode with extents still needs the extents to be read into memory before doing operations that need the extent map. Add a new xfs_need_iread_extents helper that returns true for btree format forks that do not have any entries in the in-memory extent btree, and use that instead of checking the XFS_IFEXTENTS flag. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
由 Christoph Hellwig 提交于
Just check for an inline format fork instead of the using the equivalent in-memory XFS_IFINLINE flag. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
- 08 4月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
In preparation of removing the historic icinode struct, move the forkoff field into the containing xfs_inode structure. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDarrick J. Wong <djwong@kernel.org> Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
-
- 19 11月, 2020 1 次提交
-
-
由 Gao Xiang 提交于
Currently, commit e9e2eae8 dropped a (int) decoration from XFS_LITINO(mp), and since sizeof() expression is also involved, the result of XFS_LITINO(mp) is simply as the size_t type (commonly unsigned long). Considering the expression in xfs_attr_shortform_bytesfit(): offset = (XFS_LITINO(mp) - bytes) >> 3; let "bytes" be (int)340, and "XFS_LITINO(mp)" be (unsigned long)336. on 64-bit platform, the expression is offset = ((unsigned long)336 - (int)340) >> 3 = (int)(0xfffffffffffffffcUL >> 3) = -1 but on 32-bit platform, the expression is offset = ((unsigned long)336 - (int)340) >> 3 = (int)(0xfffffffcUL >> 3) = 0x1fffffff instead. so offset becomes a large positive number on 32-bit platform, and cause xfs_attr_shortform_bytesfit() returns maxforkoff rather than 0. Therefore, one result is "ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));" assertion failure in xfs_idata_realloc(), which was also the root cause of the original bugreport from Dennis, see: https://bugzilla.redhat.com/show_bug.cgi?id=1894177 And it can also be manually triggered with the following commands: $ touch a; $ setfattr -n user.0 -v "`seq 0 80`" a; $ setfattr -n user.1 -v "`seq 0 80`" a on 32-bit platform. Fix the case in xfs_attr_shortform_bytesfit() by bailing out "XFS_LITINO(mp) < bytes" in advance suggested by Eric and a misleading comment together with this bugfix suggested by Darrick. It seems the other users of XFS_LITINO(mp) are not impacted. Fixes: e9e2eae8 ("xfs: only check the superblock version for dinode size calculation") Cc: <stable@vger.kernel.org> # 5.7+ Reported-and-tested-by: NDennis Gilmore <dgilmore@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NGao Xiang <hsiangkao@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 16 9月, 2020 4 次提交
-
-
由 Carlos Maiolino 提交于
xfs_attr_sf_totsize() requires access to xfs_inode structure, so, once xfs_attr_shortform_addname() is its only user, move it to xfs_attr.c instead of playing with more #includes. Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Carlos Maiolino 提交于
nameval is a variable-size array, so, define it as it, and remove all the -1 magic number subtractions Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Carlos Maiolino 提交于
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Carlos Maiolino 提交于
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NCarlos Maiolino <cmaiolino@redhat.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 27 8月, 2020 2 次提交
-
-
由 Darrick J. Wong 提交于
Don't leak kernel memory contents into the shortform attr fork. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Eric Sandeen 提交于
The boundary test for the fixed-offset parts of xfs_attr_sf_entry in xfs_attr_shortform_verify is off by one, because the variable array at the end is defined as nameval[1] not nameval[]. Hence we need to subtract 1 from the calculation. This can be shown by: # touch file # setfattr -n root.a file and verifications will fail when it's written to disk. This only matters for a last attribute which has a single-byte name and no value, otherwise the combination of namelen & valuelen will push endp further out and this test won't fail. Fixes: 1e1bbd8e ("xfs: create structure verifier function for shortform xattrs") Signed-off-by: NEric Sandeen <sandeen@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 29 7月, 2020 4 次提交
-
-
由 Allison Collins 提交于
New delayed allocation routines cannot be handling transactions so pull them out into the calling functions Signed-off-by: NAllison Collins <allison.henderson@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChandan Rajendra <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Acked-by: NDave Chinner <dchinner@redhat.com>
-
由 Allison Collins 提交于
New delayed allocation routines cannot be handling transactions so pull them up into the calling functions Signed-off-by: NAllison Collins <allison.henderson@oracle.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChandan Rajendra <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Acked-by: NDave Chinner <dchinner@redhat.com>
-
由 Allison Collins 提交于
Since delayed operations cannot roll transactions, pull up the transaction handling into the calling function Signed-off-by: NAllison Collins <allison.henderson@oracle.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChandan Rajendra <chandanrlinux@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Acked-by: NDave Chinner <dchinner@redhat.com>
-
由 Allison Collins 提交于
This patch adds a new functions to check for the existence of an attribute. Subroutines are also added to handle the cases of leaf blocks, nodes or shortform. Common code that appears in existing attr add and remove functions have been factored out to help reduce the appearance of duplicated code. We will need these routines later for delayed attributes since delayed operations cannot return error codes. Signed-off-by: NAllison Collins <allison.henderson@oracle.com> Reviewed-by: NChandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> [darrick: fix a leak-on-error bug reported by Dan Carpenter] [darrick: fix unused variable warning reported by 0day] Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Acked-by: NDave Chinner <dchinner@redhat.com> Reported-by: dan.carpenter@oracle.com Reported-by: Nkernel test robot <lkp@intel.com>
-
- 27 5月, 2020 1 次提交
-
-
由 Darrick J. Wong 提交于
Dave Airlie reported the following lockdep complaint: > ====================================================== > WARNING: possible circular locking dependency detected > 5.7.0-0.rc5.20200515git1ae7efb3.1.fc33.x86_64 #1 Not tainted > ------------------------------------------------------ > kswapd0/159 is trying to acquire lock: > ffff9b38d01a4470 (&xfs_nondir_ilock_class){++++}-{3:3}, > at: xfs_ilock+0xde/0x2c0 [xfs] > > but task is already holding lock: > ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at: > __fs_reclaim_acquire+0x5/0x30 > > which lock already depends on the new lock. > > > the existing dependency chain (in reverse order) is: > > -> #1 (fs_reclaim){+.+.}-{0:0}: > fs_reclaim_acquire+0x34/0x40 > __kmalloc+0x4f/0x270 > kmem_alloc+0x93/0x1d0 [xfs] > kmem_alloc_large+0x4c/0x130 [xfs] > xfs_attr_copy_value+0x74/0xa0 [xfs] > xfs_attr_get+0x9d/0xc0 [xfs] > xfs_get_acl+0xb6/0x200 [xfs] > get_acl+0x81/0x160 > posix_acl_xattr_get+0x3f/0xd0 > vfs_getxattr+0x148/0x170 > getxattr+0xa7/0x240 > path_getxattr+0x52/0x80 > do_syscall_64+0x5c/0xa0 > entry_SYSCALL_64_after_hwframe+0x49/0xb3 > > -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}: > __lock_acquire+0x1257/0x20d0 > lock_acquire+0xb0/0x310 > down_write_nested+0x49/0x120 > xfs_ilock+0xde/0x2c0 [xfs] > xfs_reclaim_inode+0x3f/0x400 [xfs] > xfs_reclaim_inodes_ag+0x20b/0x410 [xfs] > xfs_reclaim_inodes_nr+0x31/0x40 [xfs] > super_cache_scan+0x190/0x1e0 > do_shrink_slab+0x184/0x420 > shrink_slab+0x182/0x290 > shrink_node+0x174/0x680 > balance_pgdat+0x2d0/0x5f0 > kswapd+0x21f/0x510 > kthread+0x131/0x150 > ret_from_fork+0x3a/0x50 > > other info that might help us debug this: > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock(fs_reclaim); > lock(&xfs_nondir_ilock_class); > lock(fs_reclaim); > lock(&xfs_nondir_ilock_class); > > *** DEADLOCK *** > > 4 locks held by kswapd0/159: > #0: ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at: > __fs_reclaim_acquire+0x5/0x30 > #1: ffffffffbbb7cef8 (shrinker_rwsem){++++}-{3:3}, at: > shrink_slab+0x115/0x290 > #2: ffff9b39f07a50e8 > (&type->s_umount_key#56){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0 > #3: ffff9b39f077f258 > (&pag->pag_ici_reclaim_lock){+.+.}-{3:3}, at: > xfs_reclaim_inodes_ag+0x82/0x410 [xfs] This is a known false positive because inodes cannot simultaneously be getting reclaimed and the target of a getxattr operation, but lockdep doesn't know that. We can (selectively) shut up lockdep until either it gets smarter or we change inode reclaim not to require the ILOCK by applying a stupid GFP_NOLOCKDEP bandaid. Reported-by: NDave Airlie <airlied@gmail.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Tested-by: NDave Airlie <airlied@gmail.com> Reviewed-by: NBrian Foster <bfoster@redhat.com>
-
- 20 5月, 2020 4 次提交
-
-
由 Christoph Hellwig 提交于
Move freeing the dynamically allocated attr and COW fork, as well as zeroing the pointers where actually needed into the callers, and just pass the xfs_ifork structure to xfs_idestroy_fork. Also simplify the kmem_free calls by not checking for NULL first. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
Both the data and attr fork have a format that is stored in the legacy idinode. Move it into the xfs_ifork structure instead, where it uses up padding. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Christoph Hellwig 提交于
There are there are three extents counters per inode, one for each of the forks. Two are in the legacy icdinode and one is directly in struct xfs_inode. Switch to a single counter in the xfs_ifork structure where it uses up padding at the end of the structure. This simplifies various bits of code that just wants the number of extents counter and can now directly dereference it. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChandan Babu R <chandanrlinux@gmail.com> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
由 Brian Foster 提交于
The attr fork can transition from shortform to leaf format while empty if the first xattr doesn't fit in shortform. While this empty leaf block state is intended to be transient, it is technically not due to the transactional implementation of the xattr set operation. We historically have a couple of bandaids to work around this problem. The first is to hold the buffer after the format conversion to prevent premature writeback of the empty leaf buffer and the second is to bypass the xattr count check in the verifier during recovery. The latter assumes that the xattr set is also in the log and will be recovered into the buffer soon after the empty leaf buffer is reconstructed. This is not guaranteed, however. If the filesystem crashes after the format conversion but before the xattr set that induced it, only the format conversion may exist in the log. When recovered, this creates a latent corrupted state on the inode as any subsequent attempts to read the buffer fail due to verifier failure. This includes further attempts to set xattrs on the inode or attempts to destroy the attr fork, which prevents the inode from ever being removed from the unlinked list. To avoid this condition, accept that an empty attr leaf block is a valid state and remove the count check from the verifier. This means that on rare occasions an attr fork might exist in an unexpected state, but is otherwise consistent and functional. Note that we retain the logic to avoid racing with metadata writeback to reduce the window where this can occur. Signed-off-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 19 3月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
The size of the dinode structure is only dependent on the file system version, so instead of checking the individual inode version just use the newly added xfs_sb_version_has_large_dinode helper, and simplify various calling conventions. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBrian Foster <bfoster@redhat.com> Reviewed-by: NChandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-
- 12 3月, 2020 1 次提交
-
-
由 Darrick J. Wong 提交于
Add a helper function to get rid of buffers that we have decided are corrupt after the verifiers have run. This function is intended to handle metadata checks that can't happen in the verifiers, such as inter-block relationship checking. Note that we now mark the buffer stale so that it will not end up on any LRU and will be purged on release. Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com> Reviewed-by: NDave Chinner <dchinner@redhat.com>
-
- 03 3月, 2020 1 次提交
-
-
由 Christoph Hellwig 提交于
Now that we use the on-disk flags field also for the interface to the lower level attr routines we can use the XFS_ATTR_INCOMPLETE definition from the on-disk format directly instead. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChandan Rajendra <chandanrlinux@gmail.com> Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com> Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
-