1. 13 Jul, 2022: 1 commit
  2. 10 Jul, 2022: 4 commits
    • xfs: use XFS_IFORK_Q to determine the presence of an xattr fork · e45d7cb2
      Darrick J. Wong committed
      Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the
      attribute fork but i_forkoff is zero.  This eliminates the ambiguity
      between i_forkoff and i_af.if_present, which should make it easier to
      understand the lifetime of attr forks.
      
      While we're at it, remove the if_present checks around calls to
      xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr
      forks that have already been torn down.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
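A minimal userspace sketch of the lookup this commit describes follows. It assumes heavily simplified, hypothetical structures. The enum values, the single placeholder field in `struct xfs_ifork`, and the helper `xfs_inode_has_attr_fork_sketch()` (standing in for `XFS_IFORK_Q`) are illustrative, not the kernel's definitions. What it shows is that `i_forkoff` alone decides whether a caller gets an attr fork pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the XFS definitions. */
enum { XFS_DATA_FORK = 0, XFS_ATTR_FORK = 1, XFS_COW_FORK = 2 };

struct xfs_ifork {
	int			if_nextents;	/* placeholder payload */
};

struct xfs_inode {
	struct xfs_ifork	i_df;		/* data fork: always present */
	struct xfs_ifork	i_af;		/* attr fork: embedded in the inode */
	struct xfs_ifork	*i_cowfp;	/* CoW fork: allocated on demand */
	unsigned char		i_forkoff;	/* nonzero iff an attr fork exists */
};

/* Stand-in for XFS_IFORK_Q(ip). */
static inline int
xfs_inode_has_attr_fork_sketch(const struct xfs_inode *ip)
{
	return ip->i_forkoff != 0;
}

static inline struct xfs_ifork *
xfs_ifork_ptr(struct xfs_inode *ip, int whichfork)
{
	switch (whichfork) {
	case XFS_DATA_FORK:
		return &ip->i_df;
	case XFS_ATTR_FORK:
		/* i_forkoff is the single source of truth for attr fork presence. */
		if (!xfs_inode_has_attr_fork_sketch(ip))
			return NULL;
		return &ip->i_af;
	case XFS_COW_FORK:
		return ip->i_cowfp;
	default:
		assert(0 && "invalid fork");
		return NULL;
	}
}
```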
    • xfs: make inode attribute forks a permanent part of struct xfs_inode · 2ed5b09b
      Darrick J. Wong committed
      Syzkaller reported a UAF bug a while back:
      
      ==================================================================
      BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
      Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958
      
      CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted
      5.15.0-0.30.3-20220406_1406 #3
      Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
      04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106
       print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459
       xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
       xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159
       xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36
       __vfs_getxattr+0xdf/0x13d fs/xattr.c:399
       cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300
       security_inode_need_killpriv+0x4c/0x97 security/security.c:1408
       dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912
       dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908
       do_truncate+0xc3/0x1e0 fs/open.c:56
       handle_truncate fs/namei.c:3084 [inline]
       do_open fs/namei.c:3432 [inline]
       path_openat+0x30ab/0x396d fs/namei.c:3561
       do_filp_open+0x1c4/0x290 fs/namei.c:3588
       do_sys_openat2+0x60d/0x98c fs/open.c:1212
       do_sys_open+0xcf/0x13c fs/open.c:1228
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      RIP: 0033:0x7f7ef4bb753d
      Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
      89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
      01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
      RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d
      RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0
      RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
      R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0
       </TASK>
      
      Allocated by task 2953:
       kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:254 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3213 [inline]
       slab_alloc mm/slub.c:3221 [inline]
       kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226
       kmem_cache_zalloc include/linux/slab.h:711 [inline]
       xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287
       xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098
       xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746
       xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
       __vfs_setxattr+0x11b/0x177 fs/xattr.c:180
       __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214
       __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275
       vfs_setxattr+0x154/0x33d fs/xattr.c:301
       setxattr+0x216/0x29f fs/xattr.c:575
       __do_sys_fsetxattr fs/xattr.c:632 [inline]
       __se_sys_fsetxattr fs/xattr.c:621 [inline]
       __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      
      Freed by task 2949:
       kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
       kasan_set_track+0x1c/0x21 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:230 [inline]
       slab_free_hook mm/slub.c:1700 [inline]
       slab_free_freelist_hook mm/slub.c:1726 [inline]
       slab_free mm/slub.c:3492 [inline]
       kmem_cache_free+0xdc/0x3ce mm/slub.c:3508
       xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773
       xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822
       xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413
       xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684
       xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802
       xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
       __vfs_removexattr+0x106/0x16a fs/xattr.c:468
       cap_inode_killpriv+0x24/0x47 security/commoncap.c:324
       security_inode_killpriv+0x54/0xa1 security/security.c:1414
       setattr_prepare+0x1a6/0x897 fs/attr.c:146
       xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682
       xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065
       xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093
       notify_change+0xae5/0x10a1 fs/attr.c:410
       do_truncate+0x134/0x1e0 fs/open.c:64
       handle_truncate fs/namei.c:3084 [inline]
       do_open fs/namei.c:3432 [inline]
       path_openat+0x30ab/0x396d fs/namei.c:3561
       do_filp_open+0x1c4/0x290 fs/namei.c:3588
       do_sys_openat2+0x60d/0x98c fs/open.c:1212
       do_sys_open+0xcf/0x13c fs/open.c:1228
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      
      The buggy address belongs to the object at ffff88802cec9188
       which belongs to the cache xfs_ifork of size 40
      The buggy address is located 20 bytes inside of
       40-byte region [ffff88802cec9188, ffff88802cec91b0)
      The buggy address belongs to the page:
      page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000
      index:0x0 pfn:0x2cec9
      flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
      raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
       ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
      >ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
                                  ^
       ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
       ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
      ==================================================================
      
      The root cause of this bug is the unlocked access to xfs_inode.i_afp
      from the getxattr code paths while trying to determine which ILOCK mode
      to use to stabilize the xattr data.  Unfortunately, the VFS does not
      acquire i_rwsem when vfs_getxattr (or listxattr) calls into the
      filesystem, which means that getxattr can race with a removexattr that's
      tearing down the attr fork and crash:
      
      xfs_attr_set:                          xfs_attr_get:
      xfs_attr_fork_remove:                  xfs_ilock_attr_map_shared:
      
      xfs_idestroy_fork(ip->i_afp);
      kmem_cache_free(xfs_ifork_cache, ip->i_afp);
      
                                             if (ip->i_afp &&
      
      ip->i_afp = NULL;
      
                                                 xfs_need_iread_extents(ip->i_afp))
                                             <KABOOM>
      
      ip->i_forkoff = 0;
      
      Regrettably, the VFS is much more lax about i_rwsem and getxattr than
      is immediately obvious -- not only does it not guarantee that we hold
      i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
      The getxattr system call won't acquire the lock before calling XFS, but
      the file capabilities code calls getxattr with and without i_rwsem held
      to determine if the "security.capability" xattr is set on the file.
      
      Fixing the VFS locking requires a treewide investigation into every code
      path that could touch an xattr and what i_rwsem state it expects or sets
      up.  That could take years or even prove impossible; fortunately, we
      can fix this UAF problem inside XFS.
      
      An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
      ensure that i_forkoff is always zeroed before i_afp is set to null and
      changed the read paths to use smp_rmb before accessing i_forkoff and
      i_afp, which avoided these UAF problems.  However, the patch author was
      too busy dealing with other problems in the meantime, and by the time he
      came back to this issue, the situation had changed a bit.
      
      On a modern system with selinux, each inode will always have at least
      one xattr for the selinux label, so it doesn't make much sense to keep
      incurring the extra pointer dereference.  Furthermore, Allison's
      upcoming parent pointer patchset will also cause nearly every inode in
      the filesystem to have extended attributes.  Therefore, make the inode
      attribute fork structure part of struct xfs_inode, at a cost of 40 more
      bytes.
      
      This patch adds a clunky if_present field where necessary to maintain
      the existing logic of xattr fork null pointer testing in the existing
      codebase.  The next patch switches the logic over to XFS_IFORK_Q and it
      all goes away.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
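The lifetime argument behind embedding the fork can be sketched with hypothetical names (`struct inode_sketch`, `attr_fork_remove_sketch()`); these are not XFS code. Teardown frees only the fork's contents and resets the embedded structure. The structure itself is never returned to a slab cache, so an unlocked reader racing with removal reads fields of a live inode rather than a freed object. This illustrates the lifetime change only and is not a model of XFS locking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical simplified types; not the kernel's definitions. */
struct ifork_sketch {
	int	if_nextents;
	void	*if_data;	/* separately allocated fork contents */
};

struct inode_sketch {
	struct ifork_sketch	i_af;		/* embedded: lives as long as the inode */
	unsigned char		i_forkoff;	/* 0 means "no attr fork" */
};

/*
 * Tear down the attr fork: free what the fork points to and reset the
 * embedded header.  The header itself is never freed, so a racing reader
 * of ip->i_af.if_nextents reads inode memory, not a freed slab object.
 */
static void
attr_fork_remove_sketch(struct inode_sketch *ip)
{
	free(ip->i_af.if_data);
	memset(&ip->i_af, 0, sizeof(ip->i_af));
	ip->i_forkoff = 0;
}
```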
    • xfs: convert XFS_IFORK_PTR to a static inline helper · 732436ef
      Darrick J. Wong committed
      We're about to make this logic do a bit more, so convert the macro to a
      static inline function for better typechecking and fewer shouty macros.
      No functional changes here.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
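Here is a rough before/after sketch of this conversion. The macro approximates the general shape of the old `XFS_IFORK_PTR()`, and the structures are simplified, hypothetical stand-ins. The inline version gives the compiler a typed `struct xfs_inode *` to check and evaluates `whichfork` exactly once. The macro accepts anything with matching member names and may evaluate `w` twice.

```c
#include <assert.h>
#include <stddef.h>

enum { XFS_DATA_FORK = 0, XFS_ATTR_FORK = 1, XFS_COW_FORK = 2 };

struct xfs_ifork { int if_nextents; };

struct xfs_inode {
	struct xfs_ifork	i_df;
	struct xfs_ifork	*i_afp;		/* attr fork: still a pointer at this stage */
	struct xfs_ifork	*i_cowfp;
};

/* Macro form: untyped arguments, and (w) may be evaluated twice. */
#define XFS_IFORK_PTR(ip, w) \
	((w) == XFS_DATA_FORK ? &(ip)->i_df : \
	 ((w) == XFS_ATTR_FORK ? (ip)->i_afp : (ip)->i_cowfp))

/* Static inline form: same result, but ip must really be a struct xfs_inode *. */
static inline struct xfs_ifork *
xfs_ifork_ptr(struct xfs_inode *ip, int whichfork)
{
	switch (whichfork) {
	case XFS_DATA_FORK:
		return &ip->i_df;
	case XFS_ATTR_FORK:
		return ip->i_afp;
	case XFS_COW_FORK:
		return ip->i_cowfp;
	default:
		assert(0 && "invalid fork");
		return NULL;
	}
}
```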
    • xfs: removed useless condition in function xfs_attr_node_get · 0f38063d
      Andrey Strachuk committed
      At line 1561, variable "state" is being compared
      with NULL every loop iteration.
      
      -------------------------------------------------------------------
      1561	for (i = 0; state != NULL && i < state->path.active; i++) {
      1562		xfs_trans_brelse(args->trans, state->path.blk[i].bp);
      1563		state->path.blk[i].bp = NULL;
      1564	}
      -------------------------------------------------------------------
      
      However, it cannot be NULL.
      
      ----------------------------------------
      1546	state = xfs_da_state_alloc(args);
      ----------------------------------------
      
      xfs_da_state_alloc calls kmem_cache_zalloc. kmem_cache_zalloc is
      called with the __GFP_NOFAIL flag and, therefore, it cannot return NULL.
      
      --------------------------------------------------------------------------
      	struct xfs_da_state *
      	xfs_da_state_alloc(
      	struct xfs_da_args	*args)
      	{
      		struct xfs_da_state	*state;
      
      		state = kmem_cache_zalloc(xfs_da_state_cache, GFP_NOFS | __GFP_NOFAIL);
      		state->args = args;
      		state->mp = args->dp->i_mount;
      		return state;
      	}
      --------------------------------------------------------------------------
      
      Found by Linux Verification Center (linuxtesting.org) with SVACE.
      Signed-off-by: Andrey Strachuk <strochuk@ispras.ru>
      
      Fixes: 4d0cdd2b ("xfs: clean up xfs_attr_node_hasname")
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
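Below is a sketch of the loop once the redundant test is dropped. It uses hypothetical simplified types (`struct state_sketch`, `brelse_sketch()`) in place of the real `struct xfs_da_state` and `xfs_trans_brelse()`. Because the allocator shown above uses `__GFP_NOFAIL`, `state` is known to be non-NULL before the loop starts, so only the index bound needs checking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified stand-ins for the da-btree state and buffers. */
#define PATH_MAX_SKETCH	5

struct buf_sketch { int held; };
struct blk_sketch { struct buf_sketch *bp; };

struct path_sketch {
	int			active;
	struct blk_sketch	blk[PATH_MAX_SKETCH];
};

struct state_sketch { struct path_sketch path; };

static void
brelse_sketch(struct buf_sketch *bp)
{
	if (bp)
		bp->held = 0;
}

/*
 * "state" comes from an allocation that cannot fail, so only the index
 * bound is tested on each iteration.
 */
static void
release_path_sketch(struct state_sketch *state)
{
	for (int i = 0; i < state->path.active; i++) {
		brelse_sketch(state->path.blk[i].bp);
		state->path.blk[i].bp = NULL;
	}
}
```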
  3. 07 Jul, 2022: 14 commits
  4. 29 Jun, 2022: 2 commits
    • xfs: don't hold xattr leaf buffers across transaction rolls · e53bcffa
      Darrick J. Wong committed
      Now that we've established (again!) that empty xattr leaf buffers are
      ok, we no longer need to bhold them to transactions when we're creating
      new leaf blocks.  Get rid of the entire mechanism, which should simplify
      the xattr code quite a bit.
      
      The original justification for using bhold here was to prevent the AIL
      from trying to write the empty leaf block into the fs during the brief
      time that we release the buffer lock.  The reason for /that/ was to
      prevent recovery from tripping over the empty ondisk block.
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: empty xattr leaf header blocks are not corruption · 7be3bd88
      Darrick J. Wong committed
      TLDR: Revert commit 51e6104f ("xfs: detect empty attr leaf blocks in
      xfs_attr3_leaf_verify") because it was wrong.
      
      Every now and then we get a corruption report from the kernel or
      xfs_repair about empty leaf blocks in the extended attribute structure.
      We've long thought that these shouldn't be possible, but prior to 5.18
      one would shake loose in the recoveryloop fstests about once a month.
      
      A new addition to the xattr leaf block verifier in 5.19-rc1 makes this
      happen every 7 minutes on my testing cloud.  I added a ton of logging to
      detect any time we set the header count on an xattr leaf block to zero.
      This produced the following dmesg output on generic/388:
      
      XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0!
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       xfs_attr3_leaf_create+0x187/0x230
       xfs_attr_shortform_to_leaf+0xd1/0x2f0
       xfs_attr_set_iter+0x73e/0xa90
       xfs_xattri_finish_update+0x45/0x80
       xfs_attr_finish_item+0x1b/0xd0
       xfs_defer_finish_noroll+0x19c/0x770
       __xfs_trans_commit+0x153/0x3e0
       xfs_attr_set+0x36b/0x740
       xfs_xattr_set+0x89/0xd0
       __vfs_setxattr+0x67/0x80
       __vfs_setxattr_noperm+0x6e/0x120
       vfs_setxattr+0x97/0x180
       setxattr+0x88/0xa0
       path_setxattr+0xc3/0xe0
       __x64_sys_setxattr+0x27/0x30
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      So now we know that someone is creating empty xattr leaf blocks as part
      of converting a sf xattr structure into a leaf xattr structure.  The
      conversion routine logs any existing sf attributes in the same
      transaction that creates the leaf block, so we know this is a setxattr
      to a file that has no attributes at all.
      
      Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log
      recovery.  I also augmented buffer item recovery to call ->verify_struct
      on any attr leaf blocks and complain if it finds a failure:
      
      XFS (sda4): Unmounting Filesystem
      XFS (sda4): Mounting V5 Filesystem
      XFS (sda4): Starting recovery (logdev: internal)
      XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0!
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       xfs_attr3_leaf_verify+0x3b8/0x420
       xlog_recover_buf_commit_pass2+0x60a/0x6c0
       xlog_recover_items_pass2+0x4e/0xc0
       xlog_recover_commit_trans+0x33c/0x350
       xlog_recovery_process_trans+0xa5/0xe0
       xlog_recover_process_data+0x8d/0x140
       xlog_do_recovery_pass+0x19b/0x720
       xlog_do_log_recovery+0x62/0xc0
       xlog_do_recover+0x33/0x1d0
       xlog_recover+0xda/0x190
       xfs_log_mount+0x14c/0x360
       xfs_mountfs+0x517/0xa60
       xfs_fs_fill_super+0x6bc/0x950
       get_tree_bdev+0x175/0x280
       vfs_get_tree+0x1a/0x80
       path_mount+0x6f5/0xaa0
       __x64_sys_mount+0x103/0x140
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fc61e241eae
      
      And a moment later, the _delwri_submit of the recovered buffers trips
      the same verifier and recovery fails:
      
      XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78
      XFS (sda4): Unmount and run xfs_repair
      XFS (sda4): First 128 bytes of corrupted metadata buffer:
      00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00  ........;.......
      00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00  .....).x........
      00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d  ......I......1>.
      00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00  ................
      00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00  .P..............
      00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518).  Shutting down filesystem.
      XFS (sda4): Please unmount the filesystem and rectify the problem(s)
      XFS (sda4): log mount/recovery failed: error -117
      XFS (sda4): log mount failed
      
      I think I see what's going on here -- setxattr is racing with something
      that shuts down the filesystem:
      
      Thread 1				Thread 2
      --------				--------
      xfs_attr_sf_addname
      xfs_attr_shortform_to_leaf
      <create empty leaf>
      xfs_trans_bhold(leaf)
      xattri_dela_state = XFS_DAS_LEAF_ADD
      <roll transaction>
      					<flush log>
      					<shut down filesystem>
      xfs_trans_bhold_release(leaf)
      <discover fs is dead, bail>
      
      Thread 3
      --------
      <cycle mount, start recovery>
      xlog_recover_buf_commit_pass2
      xlog_recover_do_reg_buffer
      <replay empty leaf buffer from recovered buf item>
      xfs_buf_delwri_queue(leaf)
      xfs_buf_delwri_submit
      _xfs_buf_ioapply(leaf)
      xfs_attr3_leaf_write_verify
      <trip over empty leaf buffer>
      <fail recovery>
      
      As you can see, the bhold keeps the leaf buffer locked and thus prevents
      the *AIL* from tripping over the ichdr.count==0 check in the write
      verifier.  Unfortunately, it doesn't prevent the log from getting
      flushed to disk, which sets up log recovery to fail.
      
      So.  It's clear that the kernel has always had the ability to persist
      attr leaf blocks with ichdr.count==0, which means that it's part of the
      ondisk format now.
      
      Unfortunately, this check has been added and removed multiple times
      throughout history.  It first appeared[1] in kernel 3.10 as part of the
      early V5 format patches.  The check was later discovered to break log
      recovery and hence disabled[2] during log recovery in kernel 4.10.
      Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to
      weed out the empty leaf blocks.  This was still not correct because log
      recovery would recover an empty attr leaf block successfully, only for
      regular xattr operations to trip over the empty block during regular
      operation.  Therefore, the check was removed entirely[4] in kernel 5.7,
      but removal of the xfs_repair check was forgotten.  The continued
      complaints from xfs_repair led to us
      mistakenly re-adding[5] the verifier check for kernel 5.19.  Remove it
      once again.
      
      [1] 517c2220 ("xfs: add CRCs to attr leaf blocks")
      [2] 2e1d2337 ("xfs: ignore leaf attr ichdr.count in verifier
                         during log replay")
      [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0")
      [4] f28cef9e ("xfs: don't fail verifier on empty attr3 leaf
                         block")
      [5] 51e6104f ("xfs: detect empty attr leaf blocks in
                         xfs_attr3_leaf_verify")
      
      Looking at the rest of the xattr code, it seems that files with empty
      leaf blocks behave as expected -- listxattr reports no attributes;
      getxattr on any xattr returns nothing as expected; removexattr does
      nothing; and setxattr can add attributes just fine.
      
      Original-bug: 517c2220 ("xfs: add CRCs to attr leaf blocks")
      Still-not-fixed-by: 2e1d2337 ("xfs: ignore leaf attr ichdr.count in verifier during log replay")
      Removed-in: f28cef9e ("xfs: don't fail verifier on empty attr3 leaf block")
      Fixes: 51e6104f ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
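As a worked check on the recovery failure above, the header fields can be decoded from the first 0x48 bytes of the dumped buffer. The offsets assume the V5 attr leaf header layout: a 56-byte `xfs_da3_blkinfo` followed by big-endian `count`, `usedbytes`, `firstused`, `holes`, `pad1`, and the free map. Treat that layout, and every name in the sketch, as an assumption.

Decoded this way, the block has the following properties:
- it carries the attr3 leaf magic (0x3bee);
- its `blkno` matches daddr 0x129bf78, and its owner matches inode 0x21fcbaf;
- `count` is zero;
- the first free-map entry starts at offset 0x50 (the end of the 80-byte header) and runs for 0xfb0 bytes, to the end of a 4096-byte block.

In other words, it is an empty but otherwise self-consistent leaf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 0x48 bytes of the buffer dumped in the recovery failure log. */
static const uint8_t leaf_dump[0x48] = {
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x3b, 0xee, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x00, 0x00, 0x00, 0x00, 0x01, 0x29, 0xbf, 0x78,
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0xa5, 0x1b, 0xd0, 0x02, 0xb2, 0x9a, 0x49, 0xdf,
	0x8e, 0x9c, 0xfb, 0x8d, 0xf8, 0x31, 0x3e, 0x9d,
	0x00, 0x00, 0x00, 0x00, 0x02, 0x1f, 0xcb, 0xaf,
	0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00,
	0x00, 0x50, 0x0f, 0xb0, 0x00, 0x00, 0x00, 0x00,
};

/* Read a big-endian integer of len bytes starting at off. */
static uint64_t
be_at(const uint8_t *p, size_t off, size_t len)
{
	uint64_t v = 0;

	for (size_t i = 0; i < len; i++)
		v = (v << 8) | p[off + i];
	return v;
}

/* Selected header fields; offsets assume the layout described above. */
struct leaf_hdr_sketch {
	uint16_t	magic;		/* 0x08 */
	uint64_t	blkno;		/* 0x10 */
	uint64_t	owner;		/* 0x30 */
	uint16_t	count;		/* 0x38 */
	uint16_t	usedbytes;	/* 0x3a */
	uint16_t	firstused;	/* 0x3c */
	uint16_t	map0_base;	/* 0x40 */
	uint16_t	map0_size;	/* 0x42 */
};

static struct leaf_hdr_sketch
decode_leaf_hdr(const uint8_t *p)
{
	struct leaf_hdr_sketch h = {
		.magic		= (uint16_t)be_at(p, 0x08, 2),
		.blkno		= be_at(p, 0x10, 8),
		.owner		= be_at(p, 0x30, 8),
		.count		= (uint16_t)be_at(p, 0x38, 2),
		.usedbytes	= (uint16_t)be_at(p, 0x3a, 2),
		.firstused	= (uint16_t)be_at(p, 0x3c, 2),
		.map0_base	= (uint16_t)be_at(p, 0x40, 2),
		.map0_size	= (uint16_t)be_at(p, 0x42, 2),
	};
	return h;
}
```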
  5. 16 Jun, 2022: 2 commits
    • xfs: fix variable state usage · 10930b25
      Darrick J. Wong committed
      The variable @args is fed to a tracepoint, and that's the only place
      it's used.  This is fine for the kernel, but for userspace, tracepoints
      are #define'd out of existence, which results in this warning on gcc
      11.2:
      
      xfs_attr.c: In function ‘xfs_attr_node_try_addname’:
      xfs_attr.c:1440:42: warning: unused variable ‘args’ [-Wunused-variable]
       1440 |         struct xfs_da_args              *args = attr->xattri_da_args;
            |                                          ^~~~
      
      Clean this up.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
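The mechanism behind this warning is easy to reproduce. When a tracepoint macro expands to nothing, any local variable that only feeds that tracepoint becomes unused. The macro and types below are hypothetical. The fix shown (reaching the value through `attr` at the call site instead of caching it in a local) is one way to silence the warning and is offered only as an illustration.

```c
#include <assert.h>

/* Hypothetical userspace build: the tracepoint expands to nothing. */
#define trace_node_addname_sketch(state, dp)	do { } while (0)

struct da_args_sketch { int dp; };

struct attr_item_sketch {
	int			dela_state;
	struct da_args_sketch	*da_args;
};

static int
node_try_addname_sketch(struct attr_item_sketch *attr)
{
	/*
	 * A local such as "struct da_args_sketch *args = attr->da_args;"
	 * used only in the tracepoint below would trigger -Wunused-variable
	 * here, because the macro discards its arguments.  Reaching through
	 * attr at the call site needs no local at all.
	 */
	trace_node_addname_sketch(attr->dela_state, attr->da_args->dp);
	return 0;
}
```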
    • xfs: fix TOCTOU race involving the new logged xattrs control knob · f4288f01
      Darrick J. Wong committed
      I found a race involving the larp control knob, aka the debugging knob
      that lets developers enable logging of extended attribute updates:
      
      Thread 1			Thread 2
      
      echo 0 > /sys/fs/xfs/debug/larp
      				setxattr(REPLACE)
      				xfs_has_larp (returns false)
      				xfs_attr_set
      
      echo 1 > /sys/fs/xfs/debug/larp
      
      				xfs_attr_defer_replace
      				xfs_attr_init_replace_state
      				xfs_has_larp (returns true)
      				xfs_attr_init_remove_state
      
      				<oops, wrong DAS state!>
      
      This isn't a particularly severe problem right now because xattr logging
      is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know
      what they're doing.
      
      However, the eventual intent is that callers should be able to ask for
      the assistance of the log in persisting xattr updates.  This capability
      might not be required for /all/ callers, which means that dynamic
      control must work correctly.  Once an xattr update has decided whether
      or not to use logged xattrs, it needs to stay in that mode until the end
      of the operation regardless of what subsequent parallel operations might
      do.
      
      Therefore, it is an error to continue sampling xfs_globals.larp once
      xfs_attr_change has made a decision about larp, and it was not correct
      for me to have told Allison that ->create_intent functions can sample
      the global log incompat feature bitfield to decide to elide a log item.
      
      Instead, create a new op flag for the xfs_da_args structure, and convert
      all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within
      the attr update state machine to look for the operations flag.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
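Here is a minimal single-threaded sketch of the "decide once, then carry the decision" pattern. It uses hypothetical names: `larp_knob` stands in for the sysfs knob, `DA_OP_LOGGED_SKETCH` for the new op flag, and `attr_change_sketch()` for the entry point. The knob is sampled a single time when the operation begins. Every later state-machine decision consults the flag stored in the per-operation arguments, so flipping the knob midway cannot change the operation's mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: a global debug knob and a per-operation flag. */
static bool larp_knob;			/* like /sys/fs/xfs/debug/larp */
#define DA_OP_LOGGED_SKETCH	(1u << 0)

struct da_args_sketch { unsigned int op_flags; };

/* Sample the knob exactly once, when the operation starts. */
static void
attr_change_sketch(struct da_args_sketch *args)
{
	if (larp_knob)
		args->op_flags |= DA_OP_LOGGED_SKETCH;
}

/* Every later decision consults the per-operation flag, never the global. */
static bool
attr_op_is_logged(const struct da_args_sketch *args)
{
	return args->op_flags & DA_OP_LOGGED_SKETCH;
}
```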
  6. 27 May, 2022: 8 commits
    • xfs: move xfs_attr_use_log_assist usage out of libxfs · efc2efeb
      Darrick J. Wong committed
      The LARP patchset added an awkward coupling point between libxfs and
      what would be libxlog, if the XFS log were actually its own library.
      Move the code that sets up logged xattr updates out of libxfs and into
      xfs_xattr.c so that libxfs no longer has to know about xlog_* functions.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: move xfs_attr_use_log_assist out of xfs_log.c · d9c61ccb
      Darrick J. Wong committed
      The LARP patchset added an awkward coupling point between libxfs and
      what would be libxlog, if the XFS log were actually its own library.
      Move the code that enables logged xattr updates out of "lib"xlog and into
      xfs_xattr.c so that it no longer has to know about xlog_* functions.
      
      While we're at it, give xfs_xattr.c its own header file.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: convert buf_cancel_table allocation to kmalloc_array · 910bbdf2
      Darrick J. Wong committed
      While we're messing around with how recovery allocates and frees the
      buffer cancellation table, convert the allocation to use kmalloc_array
      instead of the old kmem_alloc APIs, and make it handle a null return,
      even though that's not likely.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
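This userspace sketch shows the two properties the commit asks for. The first is an array allocation that refuses an overflowing `n * size`, which is what `kmalloc_array()` adds over a bare multiply. The second is a caller that turns a NULL return into `-ENOMEM` instead of assuming success. The table size of 64, the list-head type, and all `_sketch` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace analogue of kmalloc_array(): reject n * size overflow. */
static void *
alloc_array_sketch(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}

struct list_head_sketch { struct list_head_sketch *next, *prev; };

#define BC_TABLE_SIZE_SKETCH	64	/* hypothetical table size */

static int
alloc_buf_cancel_table_sketch(struct list_head_sketch **tablep)
{
	struct list_head_sketch *table;

	table = alloc_array_sketch(BC_TABLE_SIZE_SKETCH, sizeof(*table));
	if (!table)
		return -ENOMEM;	/* handle the unlikely NULL instead of assuming success */
	for (int i = 0; i < BC_TABLE_SIZE_SKETCH; i++)
		table[i].next = table[i].prev = &table[i];	/* empty list heads */
	*tablep = table;
	return 0;
}
```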
    • xfs: refactor buffer cancellation table allocation · 27232349
      Darrick J. Wong committed
      Move the code that allocates and frees the buffer cancellation tables
      used by log recovery into the file that actually uses the tables.  This
      is a precursor to some cleanups and a memory leak fix.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: don't leak btree cursor when insrec fails after a split · a54f78de
      Darrick J. Wong committed
      The recent patch to improve btree cycle checking caused a regression
      when I rebased the in-memory btree branch atop the 5.19 for-next branch,
      because in-memory short-pointer btrees do not have AG numbers.  This
      produced the following complaint from kmemleak:
      
      unreferenced object 0xffff88803d47dde8 (size 264):
        comm "xfs_io", pid 4889, jiffies 4294906764 (age 24.072s)
        hex dump (first 32 bytes):
          90 4d 0b 0f 80 88 ff ff 00 a0 bd 05 80 88 ff ff  .M..............
          e0 44 3a a0 ff ff ff ff 00 df 08 06 80 88 ff ff  .D:.............
        backtrace:
          [<ffffffffa0388059>] xfbtree_dup_cursor+0x49/0xc0 [xfs]
          [<ffffffffa029887b>] xfs_btree_dup_cursor+0x3b/0x200 [xfs]
          [<ffffffffa029af5d>] __xfs_btree_split+0x6ad/0x820 [xfs]
          [<ffffffffa029b130>] xfs_btree_split+0x60/0x110 [xfs]
          [<ffffffffa029f6da>] xfs_btree_make_block_unfull+0x19a/0x1f0 [xfs]
          [<ffffffffa029fada>] xfs_btree_insrec+0x3aa/0x810 [xfs]
          [<ffffffffa029fff3>] xfs_btree_insert+0xb3/0x240 [xfs]
          [<ffffffffa02cb729>] xfs_rmap_insert+0x99/0x200 [xfs]
          [<ffffffffa02cf142>] xfs_rmap_map_shared+0x192/0x5f0 [xfs]
          [<ffffffffa02cf60b>] xfs_rmap_map_raw+0x6b/0x90 [xfs]
          [<ffffffffa0384a85>] xrep_rmap_stash+0xd5/0x1d0 [xfs]
          [<ffffffffa0384dc0>] xrep_rmap_visit_bmbt+0xa0/0xf0 [xfs]
          [<ffffffffa0384fb6>] xrep_rmap_scan_iext+0x56/0xa0 [xfs]
          [<ffffffffa03850d8>] xrep_rmap_scan_ifork+0xd8/0x160 [xfs]
          [<ffffffffa0385195>] xrep_rmap_scan_inode+0x35/0x80 [xfs]
          [<ffffffffa03852ee>] xrep_rmap_find_rmaps+0x10e/0x270 [xfs]
      
      I noticed that xfs_btree_insrec has a bunch of debug code that returns
      out of the function immediately, without freeing the "new" btree cursor
      that can be returned when _make_block_unfull calls xfs_btree_split.  Fix
      the error return in this function to free the btree cursor.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
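The fix follows the usual single-exit cleanup idiom. In this hypothetical sketch, `insrec_sketch()` obtains a new cursor (as a split would). Its error path jumps to a label that frees the cursor instead of returning directly. `live_cursors` is a test-only counter that makes a leak visible. None of these names are XFS functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a btree cursor. */
struct cursor_sketch { int level; };

static int live_cursors;	/* leak counter for the sketch */

static struct cursor_sketch *
dup_cursor_sketch(void)
{
	struct cursor_sketch *c = calloc(1, sizeof(*c));

	if (c)
		live_cursors++;
	return c;
}

static void
del_cursor_sketch(struct cursor_sketch *c)
{
	if (c) {
		free(c);
		live_cursors--;
	}
}

/* A split may hand back a new cursor; every later error exit must free it. */
static int
insrec_sketch(int fail_after_split, struct cursor_sketch **curp)
{
	struct cursor_sketch *ncur = dup_cursor_sketch();	/* "new" cursor */
	int error = 0;

	if (!ncur)
		return -ENOMEM;

	if (fail_after_split) {
		error = -EIO;		/* e.g. a debug check failing */
		goto error0;
	}

	*curp = ncur;			/* success: caller takes ownership */
	return 0;

error0:
	del_cursor_sketch(ncur);
	return error;
}
```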
    • xfs: assert in xfs_btree_del_cursor should take into account error · 56486f30
      Dave Chinner committed
      xfs/538 on a 1kB block filesystem failed with this assert:
      
      XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_ino.allocated == 0 || xfs_is_shutdown(cur->bc_mp), file: fs/xfs/libxfs/xfs_btree.c, line: 448
      
      The problem was that an allocation failed unexpectedly in
      xfs_bmbt_alloc_block() after roughly 150,000 minlen allocation error
      injections, resulting in an EFSCORRUPTED error being returned to
      xfs_bmapi_write(). The error occurred on extent-to-btree format
      conversion allocating the new root block:
      
       RIP: 0010:xfs_bmbt_alloc_block+0x177/0x210
       Call Trace:
        <TASK>
        xfs_btree_new_iroot+0xdf/0x520
        xfs_btree_make_block_unfull+0x10d/0x1c0
        xfs_btree_insrec+0x364/0x790
        xfs_btree_insert+0xaa/0x210
        xfs_bmap_add_extent_hole_real+0x1fe/0x9a0
        xfs_bmapi_allocate+0x34c/0x420
        xfs_bmapi_write+0x53c/0x9c0
        xfs_alloc_file_space+0xee/0x320
        xfs_file_fallocate+0x36b/0x450
        vfs_fallocate+0x148/0x340
        __x64_sys_fallocate+0x3c/0x70
        do_syscall_64+0x35/0x80
        entry_SYSCALL_64_after_hwframe+0x44/0xa
      
      Why the allocation failed at this point is unknown, but it is likely
      that we ran the transaction out of reserved space and filesystem out
      of space with bmbt blocks because of all the minlen allocations
      being done causing worst case fragmentation of a large allocation.
      
      Regardless of the cause, we've then called xfs_bmapi_finish() which
      calls xfs_btree_del_cursor(cur, error) to tear down the cursor.
      
      So we have a failed operation, error != 0, cur->bc_ino.allocated > 0
      and the filesystem is still up. The assert fails to take into
      account that allocation can fail with an error and the transaction
      teardown will shut the filesystem down if necessary. i.e. the
      assert needs to check "|| error != 0" as well, because at this point
      shutdown is pending because the current transaction is dirty....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
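The relaxed invariant can be written as a plain predicate. The sketch below uses simplified, hypothetical fields. It mirrors the condition quoted in the assertion message, plus the `error != 0` escape hatch the commit message argues for.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified cursor fields. */
enum btnum_sketch { BTNUM_BNO, BTNUM_BMAP };

struct bmbt_cursor_sketch {
	enum btnum_sketch	btnum;
	int			allocated;	/* blocks allocated via this cursor */
	bool			fs_shutdown;
};

/*
 * A bmbt cursor that still accounts for allocated blocks is acceptable at
 * teardown only if the fs is already shut down or the caller is tearing
 * down after an error (the dirty transaction will shut the fs down).
 */
static bool
del_cursor_ok(const struct bmbt_cursor_sketch *cur, int error)
{
	return cur->btnum != BTNUM_BMAP || cur->allocated == 0 ||
	       cur->fs_shutdown || error != 0;
}
```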
    • xfs: don't assert fail on perag references on teardown · 5b55cbc2
      Dave Chinner committed
      This is not fatal; the assert is there to catch developer attention.
      I'm seeing it occasionally during recoveryloop testing after a
      shutdown, and I don't want it to stop an overnight recoveryloop
      run, as it currently does.
      
      Convert the ASSERT to an XFS_IS_CORRUPT() check so that it dumps a
      corruption report into the log and causes a test failure that way,
      but doesn't stop the machine dead.
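      The difference between the two checks can be sketched in plain C.
      IS_CORRUPT_SKETCH below is a hypothetical simplification of an
      XFS_IS_CORRUPT()-style macro (the real one reports through the XFS
      corruption-error machinery); perag_sketch and its refcount field are
      likewise illustrative stand-ins for the per-AG reference check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Before: a failed assert stops a debug build dead. */
#define ASSERT_SKETCH(expr) assert(expr)

/* Number of corruption reports, so a test harness can fail the run later. */
static int corruption_reports;

/*
 * After (hypothetical): if expr is true, log a report and evaluate to true;
 * execution always continues.
 */
#define IS_CORRUPT_SKETCH(expr)                                         \
	((expr) ? (fprintf(stderr, "corruption: %s at %s:%d\n", #expr,  \
			   __FILE__, __LINE__),                         \
		   corruption_reports++, true)                          \
		: false)

/* Hypothetical per-AG structure carrying an active reference count. */
struct perag_sketch {
	int refcount;
};

/* Reports leaked references but keeps tearing down; returns true if leaked. */
bool perag_teardown_sketch(struct perag_sketch *pag)
{
	bool leaked = IS_CORRUPT_SKETCH(pag->refcount != 0);

	/* ... free the per-AG structure regardless ... */
	return leaked;
}
```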
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: avoid unnecessary runtime sibling pointer endian conversions · 5672225e
      Dave Chinner committed
      Commit dc04db2a caused a small aim7 regression: a small increase
      in CPU usage in __xfs_btree_check_sblock() as a result of the extra
      checking.
      
      This is likely due to the endian conversion of the sibling pointers
      being unconditional, instead of relying on the compiler to
      endian-convert the NULL pointer constant at compile time and so
      avoid the runtime conversion in this common case.
      
      Rework the checks so that endian conversion of the sibling pointers
      is only done if they are not NULL, as the original code did.
      
      .... and these need to be marked "inline" because the compiler
      completely fails to inline them automatically, as it should.
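      The shape of the reworked check can be sketched in portable C. This is
      a hypothetical illustration, not the kernel helper: swab32_sketch()
      stands in for be32_to_cpu() on a little-endian host (an assumption),
      and the bounds check against agblocks is illustrative. The NULL case
      is compared in on-disk form against a constant the compiler can fold,
      so only non-NULL pointers pay for a runtime conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NULLAGBLOCK ((uint32_t)-1) /* "no sibling" marker */

/* Stand-in for be32_to_cpu() on a little-endian host; folds for constants. */
static inline uint32_t swab32_sketch(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Hypothetical sibling check.  dsibling is the raw on-disk (big-endian)
 * value; agbno is this block's own number; agblocks bounds the AG.
 */
static inline bool sibling_ok_sketch(uint32_t agbno, uint32_t dsibling,
				     uint32_t agblocks)
{
	uint32_t sibling;

	/* Common case: compare against a compile-time constant, no swap. */
	if (dsibling == swab32_sketch(NULLAGBLOCK))
		return true;

	sibling = swab32_sketch(dsibling); /* runtime conversion only here */
	if (sibling == agbno)              /* self-referencing pointer */
		return false;
	return sibling < agblocks;         /* must point inside the AG */
}
```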
      
      $ size fs/xfs/libxfs/xfs_btree.o*
         text	   data	    bss	    dec	    hex	filename
        51874	    240	      0	  52114	   cb92 fs/xfs/libxfs/xfs_btree.o.orig
        51562	    240	      0	  51802	   ca5a fs/xfs/libxfs/xfs_btree.o.inline
      
      Just when you think the tools have advanced far enough that we don't
      have to care about stuff like this anymore, along comes a reminder
      that *our tools still suck*.
      
      Fixes: dc04db2a ("xfs: detect self referencing btree sibling pointers")
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  7. 23 May 2022, 2 commits
    • xfs: share xattr name and value buffers when logging xattr updates · 4183e4f2
      Darrick J. Wong committed
      While running xfs/297 and generic/642, I noticed a crash in
      xfs_attri_item_relog when it tries to copy the attr name to the new
      xattri log item.  I think what happened here was that we called
      ->iop_commit on the old attri item (which nulls out the pointers) as
      part of a log force at the same time that a chained attr operation was
      ongoing.  The system was busy enough that at some later point, the defer
      ops operation decided it was necessary to relog the attri log item, but
      as we've detached the name buffer from the old attri log item, we can't
      copy it to the new one, and kaboom.
      
      I think there's a broader refcounting problem with LARP mode -- the
      setxattr code can return to userspace before the CIL actually formats
      and commits the log item, which results in a UAF bug.  Therefore, the
      xattr log item needs to be able to retain a reference to the name and
      value buffers until the log items have completely cleared the log.
      Furthermore, each time we create an intent log item, we allocate new
      memory and (re)copy the contents; sharing here would be very useful.
      
      Solve the UAF and the unnecessary memory allocations by having the log
      code create a single refcounted buffer to contain the name and value
      contents.  This buffer can be passed from old to new during a relog
      operation, and the logging code can (optionally) attach it to the
      xfs_attr_item for reuse when LARP mode is enabled.
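      A single refcounted name/value buffer can be sketched as follows. All
      names here are hypothetical and only loosely modeled on the idea in
      this commit; the sketch uses a plain counter for brevity, whereas code
      shared across threads (like the kernel's log items) needs an atomic
      reference count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* One allocation holding an xattr name followed by its value. */
struct nameval_sketch {
	unsigned int refcount;
	size_t       name_len;
	size_t       value_len;
	char         data[]; /* name bytes, then value bytes */
};

struct nameval_sketch *nameval_alloc(const char *name, size_t name_len,
				     const char *value, size_t value_len)
{
	struct nameval_sketch *nv = malloc(sizeof(*nv) + name_len + value_len);

	if (!nv)
		return NULL;
	nv->refcount = 1;
	nv->name_len = name_len;
	nv->value_len = value_len;
	if (name_len)
		memcpy(nv->data, name, name_len);
	if (value_len)
		memcpy(nv->data + name_len, value, value_len);
	return nv;
}

/* Take a reference, e.g. when a relog passes the buffer to the new item. */
struct nameval_sketch *nameval_get(struct nameval_sketch *nv)
{
	nv->refcount++;
	return nv;
}

/* Drop a reference; returns true if it was the last one and nv was freed. */
bool nameval_put(struct nameval_sketch *nv)
{
	if (--nv->refcount > 0)
		return false;
	free(nv);
	return true;
}
```

      During a relog, the new item takes a reference instead of recopying
      the name and value, so the old item can drop its own reference
      without leaving the new one pointing at freed memory.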
      
      This also fixes a problem where the xfs_attri_log_item objects weren't
      being freed back to the same cache they came from.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: do not use logged xattr updates on V4 filesystems · 22a68ba7
      Darrick J. Wong committed
      V4 superblocks do not contain the log_incompat feature bit, which means
      that we cannot protect xattr log items against kernels that are too old
      to know how to recover them.  Turn off the log items for such
      filesystems and adjust the "delayed" name to reflect what it's really
      controlling.
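      The gating rule reduces to a simple predicate. The struct and
      function below are hypothetical stand-ins, not the kernel's mount or
      superblock helpers; they only encode that logged xattr updates require
      both the user's request and a V5 superblock, which can carry a
      log_incompat bit to keep older kernels from trying to recover log
      items they don't understand.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified superblock feature view. */
struct sb_features_sketch {
	bool is_v5; /* only V5 superblocks carry log_incompat feature bits */
};

/* Logged xattr updates: requested by the user AND protectable on disk. */
static inline bool use_logged_xattrs_sketch(const struct sb_features_sketch *sb,
					    bool larp_requested)
{
	return larp_requested && sb->is_v5;
}
```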
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  8. 22 May 2022, 7 commits