1. 12 4月, 2023 4 次提交
    • D
      xfs: replace XFS_IFORK_Q with a proper predicate function · ad92c988
      Darrick J. Wong 提交于
      mainline inclusion
      from mainline-v5.19-rc5
      commit 932b42c6
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=932b42c66cb5d0ca9800b128415b4ad6b1952b3e
      
      --------------------------------
      
      Replace this shouty macro with a real C function that has a more
      descriptive name.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_attr.h
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/scrub/btree.c
      	fs/xfs/xfs_inode.c
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      ad92c988
    • D
      xfs: use XFS_IFORK_Q to determine the presence of an xattr fork · d79239ec
      Darrick J. Wong 提交于
      mainline inclusion
      from mainline-v5.19-rc5
      commit e45d7cb2
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e45d7cb2356e6b59fe64da28324025cc6fcd3fbd
      
      --------------------------------
      
      Modify xfs_ifork_ptr to return a NULL pointer if the caller asks for the
      attribute fork but i_forkoff is zero.  This eliminates the ambiguity
      between i_forkoff and i_af.if_present, which should make it easier to
      understand the lifetime of attr forks.
      
      While we're at it, remove the if_present checks around calls to
      xfs_idestroy_fork and xfs_ifork_zap_attr since they can both handle attr
      forks that have already been torn down.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_attr.h
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/libxfs/xfs_inode_fork.h
      	fs/xfs/xfs_icache.c
      	fs/xfs/xfs_inode.c
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      d79239ec
    • D
      xfs: make inode attribute forks a permanent part of struct xfs_inode · 947546ab
      Darrick J. Wong 提交于
      mainline inclusion
      from mainline-v5.19-rc5
      commit 2ed5b09b
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2ed5b09b3e8fc274ae8fecd6ab7c5106a364bed1
      
      --------------------------------
      
      Syzkaller reported a UAF bug a while back:
      
      ==================================================================
      BUG: KASAN: use-after-free in xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
      Read of size 4 at addr ffff88802cec919c by task syz-executor262/2958
      
      CPU: 2 PID: 2958 Comm: syz-executor262 Not tainted
      5.15.0-0.30.3-20220406_1406 #3
      Hardware name: Red Hat KVM, BIOS 1.13.0-2.module+el8.3.0+7860+a7792d29
      04/01/2014
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:88 [inline]
       dump_stack_lvl+0x82/0xa9 lib/dump_stack.c:106
       print_address_description.constprop.9+0x21/0x2d5 mm/kasan/report.c:256
       __kasan_report mm/kasan/report.c:442 [inline]
       kasan_report.cold.14+0x7f/0x11b mm/kasan/report.c:459
       xfs_ilock_attr_map_shared+0xe3/0xf6 fs/xfs/xfs_inode.c:127
       xfs_attr_get+0x378/0x4c2 fs/xfs/libxfs/xfs_attr.c:159
       xfs_xattr_get+0xe3/0x150 fs/xfs/xfs_xattr.c:36
       __vfs_getxattr+0xdf/0x13d fs/xattr.c:399
       cap_inode_need_killpriv+0x41/0x5d security/commoncap.c:300
       security_inode_need_killpriv+0x4c/0x97 security/security.c:1408
       dentry_needs_remove_privs.part.28+0x21/0x63 fs/inode.c:1912
       dentry_needs_remove_privs+0x80/0x9e fs/inode.c:1908
       do_truncate+0xc3/0x1e0 fs/open.c:56
       handle_truncate fs/namei.c:3084 [inline]
       do_open fs/namei.c:3432 [inline]
       path_openat+0x30ab/0x396d fs/namei.c:3561
       do_filp_open+0x1c4/0x290 fs/namei.c:3588
       do_sys_openat2+0x60d/0x98c fs/open.c:1212
       do_sys_open+0xcf/0x13c fs/open.c:1228
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      RIP: 0033:0x7f7ef4bb753d
      Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48
      89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73
      01 c3 48 8b 0d 1b 79 2c 00 f7 d8 64 89 01 48
      RSP: 002b:00007f7ef52c2ed8 EFLAGS: 00000246 ORIG_RAX: 0000000000000055
      RAX: ffffffffffffffda RBX: 0000000000404148 RCX: 00007f7ef4bb753d
      RDX: 00007f7ef4bb753d RSI: 0000000000000000 RDI: 0000000020004fc0
      RBP: 0000000000404140 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 0030656c69662f2e
      R13: 00007ffd794db37f R14: 00007ffd794db470 R15: 00007f7ef52c2fc0
       </TASK>
      
      Allocated by task 2953:
       kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
       kasan_set_track mm/kasan/common.c:46 [inline]
       set_alloc_info mm/kasan/common.c:434 [inline]
       __kasan_slab_alloc+0x68/0x7c mm/kasan/common.c:467
       kasan_slab_alloc include/linux/kasan.h:254 [inline]
       slab_post_alloc_hook mm/slab.h:519 [inline]
       slab_alloc_node mm/slub.c:3213 [inline]
       slab_alloc mm/slub.c:3221 [inline]
       kmem_cache_alloc+0x11b/0x3eb mm/slub.c:3226
       kmem_cache_zalloc include/linux/slab.h:711 [inline]
       xfs_ifork_alloc+0x25/0xa2 fs/xfs/libxfs/xfs_inode_fork.c:287
       xfs_bmap_add_attrfork+0x3f2/0x9b1 fs/xfs/libxfs/xfs_bmap.c:1098
       xfs_attr_set+0xe38/0x12a7 fs/xfs/libxfs/xfs_attr.c:746
       xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
       __vfs_setxattr+0x11b/0x177 fs/xattr.c:180
       __vfs_setxattr_noperm+0x128/0x5e0 fs/xattr.c:214
       __vfs_setxattr_locked+0x1d4/0x258 fs/xattr.c:275
       vfs_setxattr+0x154/0x33d fs/xattr.c:301
       setxattr+0x216/0x29f fs/xattr.c:575
       __do_sys_fsetxattr fs/xattr.c:632 [inline]
       __se_sys_fsetxattr fs/xattr.c:621 [inline]
       __x64_sys_fsetxattr+0x243/0x2fe fs/xattr.c:621
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      
      Freed by task 2949:
       kasan_save_stack+0x19/0x38 mm/kasan/common.c:38
       kasan_set_track+0x1c/0x21 mm/kasan/common.c:46
       kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
       ____kasan_slab_free mm/kasan/common.c:366 [inline]
       ____kasan_slab_free mm/kasan/common.c:328 [inline]
       __kasan_slab_free+0xe2/0x10e mm/kasan/common.c:374
       kasan_slab_free include/linux/kasan.h:230 [inline]
       slab_free_hook mm/slub.c:1700 [inline]
       slab_free_freelist_hook mm/slub.c:1726 [inline]
       slab_free mm/slub.c:3492 [inline]
       kmem_cache_free+0xdc/0x3ce mm/slub.c:3508
       xfs_attr_fork_remove+0x8d/0x132 fs/xfs/libxfs/xfs_attr_leaf.c:773
       xfs_attr_sf_removename+0x5dd/0x6cb fs/xfs/libxfs/xfs_attr_leaf.c:822
       xfs_attr_remove_iter+0x68c/0x805 fs/xfs/libxfs/xfs_attr.c:1413
       xfs_attr_remove_args+0xb1/0x10d fs/xfs/libxfs/xfs_attr.c:684
       xfs_attr_set+0xf1e/0x12a7 fs/xfs/libxfs/xfs_attr.c:802
       xfs_xattr_set+0xeb/0x1a9 fs/xfs/xfs_xattr.c:59
       __vfs_removexattr+0x106/0x16a fs/xattr.c:468
       cap_inode_killpriv+0x24/0x47 security/commoncap.c:324
       security_inode_killpriv+0x54/0xa1 security/security.c:1414
       setattr_prepare+0x1a6/0x897 fs/attr.c:146
       xfs_vn_change_ok+0x111/0x15e fs/xfs/xfs_iops.c:682
       xfs_vn_setattr_size+0x5f/0x15a fs/xfs/xfs_iops.c:1065
       xfs_vn_setattr+0x125/0x2ad fs/xfs/xfs_iops.c:1093
       notify_change+0xae5/0x10a1 fs/attr.c:410
       do_truncate+0x134/0x1e0 fs/open.c:64
       handle_truncate fs/namei.c:3084 [inline]
       do_open fs/namei.c:3432 [inline]
       path_openat+0x30ab/0x396d fs/namei.c:3561
       do_filp_open+0x1c4/0x290 fs/namei.c:3588
       do_sys_openat2+0x60d/0x98c fs/open.c:1212
       do_sys_open+0xcf/0x13c fs/open.c:1228
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x3a/0x7e arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x44/0x0
      
      The buggy address belongs to the object at ffff88802cec9188
       which belongs to the cache xfs_ifork of size 40
      The buggy address is located 20 bytes inside of
       40-byte region [ffff88802cec9188, ffff88802cec91b0)
      The buggy address belongs to the page:
      page:00000000c3af36a1 refcount:1 mapcount:0 mapping:0000000000000000
      index:0x0 pfn:0x2cec9
      flags: 0xfffffc0000200(slab|node=0|zone=1|lastcpupid=0x1fffff)
      raw: 000fffffc0000200 ffffea00009d2580 0000000600000006 ffff88801a9ffc80
      raw: 0000000000000000 0000000080490049 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff88802cec9080: fb fb fb fc fc fa fb fb fb fb fc fc fb fb fb fb
       ffff88802cec9100: fb fc fc fb fb fb fb fb fc fc fb fb fb fb fb fc
      >ffff88802cec9180: fc fa fb fb fb fb fc fc fa fb fb fb fb fc fc fb
                                  ^
       ffff88802cec9200: fb fb fb fb fc fc fb fb fb fb fb fc fc fb fb fb
       ffff88802cec9280: fb fb fc fc fa fb fb fb fb fc fc fa fb fb fb fb
      ==================================================================
      
      The root cause of this bug is the unlocked access to xfs_inode.i_afp
      from the getxattr code paths while trying to determine which ILOCK mode
      to use to stabilize the xattr data.  Unfortunately, the VFS does not
      acquire i_rwsem when vfs_getxattr (or listxattr) call into the
      filesystem, which means that getxattr can race with a removexattr that's
      tearing down the attr fork and crash:
      
      xfs_attr_set:                          xfs_attr_get:
      xfs_attr_fork_remove:                  xfs_ilock_attr_map_shared:
      
      xfs_idestroy_fork(ip->i_afp);
      kmem_cache_free(xfs_ifork_cache, ip->i_afp);
      
                                             if (ip->i_afp &&
      
      ip->i_afp = NULL;
      
                                                 xfs_need_iread_extents(ip->i_afp))
                                             <KABOOM>
      
      ip->i_forkoff = 0;
      
      Regrettably, the VFS is much more lax about i_rwsem and getxattr than
      is immediately obvious -- not only does it not guarantee that we hold
      i_rwsem, it actually doesn't guarantee that we *don't* hold it either.
      The getxattr system call won't acquire the lock before calling XFS, but
      the file capabilities code calls getxattr with and without i_rwsem held
      to determine if the "security.capabilities" xattr is set on the file.
      
      Fixing the VFS locking requires a treewide investigation into every code
      path that could touch an xattr and what i_rwsem state it expects or sets
      up.  That could take years or even prove impossible; fortunately, we
      can fix this UAF problem inside XFS.
      
      An earlier version of this patch used smp_wmb in xfs_attr_fork_remove to
      ensure that i_forkoff is always zeroed before i_afp is set to null and
      changed the read paths to use smp_rmb before accessing i_forkoff and
      i_afp, which avoided these UAF problems.  However, the patch author was
      too busy dealing with other problems in the meantime, and by the time he
      came back to this issue, the situation had changed a bit.
      
      On a modern system with selinux, each inode will always have at least
      one xattr for the selinux label, so it doesn't make much sense to keep
      incurring the extra pointer dereference.  Furthermore, Allison's
      upcoming parent pointer patchset will also cause nearly every inode in
      the filesystem to have extended attributes.  Therefore, make the inode
      attribute fork structure part of struct xfs_inode, at a cost of 40 more
      bytes.
      
      This patch adds a clunky if_present field where necessary to maintain
      the existing logic of xattr fork null pointer testing in the existing
      codebase.  The next patch switches the logic over to XFS_IFORK_Q and it
      all goes away.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_attr.c
      	fs/xfs/libxfs/xfs_attr.h
      	fs/xfs/libxfs/xfs_attr_leaf.c
      	fs/xfs/libxfs/xfs_bmap.c
      	fs/xfs/libxfs/xfs_inode_buf.c
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/libxfs/xfs_inode_fork.h
      	fs/xfs/xfs_attr_inactive.c
      	fs/xfs/xfs_attr_list.c
      	fs/xfs/xfs_icache.c
      	fs/xfs/xfs_inode.c
      	fs/xfs/xfs_inode.h
      	fs/xfs/xfs_inode_item.c
      	fs/xfs/xfs_itable.c
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      947546ab
    • D
      xfs: convert XFS_IFORK_PTR to a static inline helper · 34ba4153
      Darrick J. Wong 提交于
      mainline inclusion
      from mainline-v5.19-rc5
      commit 732436ef
      category: bugfix
      bugzilla: 187164, https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=732436ef916b4f338d672ea56accfdb11e8d0732
      
      --------------------------------
      
      We're about to make this logic do a bit more, so convert the macro to a
      static inline function for better typechecking and fewer shouty macros.
      No functional changes here.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      
      conflicts:
      	fs/xfs/libxfs/xfs_bmap.c
      	fs/xfs/libxfs/xfs_bmap_btree.c
      	fs/xfs/libxfs/xfs_inode_fork.c
      	fs/xfs/libxfs/xfs_inode_fork.h
      	fs/xfs/scrub/bmap.c
      	fs/xfs/scrub/symlink.c
      	fs/xfs/xfs_inode.c
      	fs/xfs/xfs_ioctl.c
      	fs/xfs/xfs_qm.c
      	fs/xfs/xfs_reflink.c
      Signed-off-by: NLong Li <leo.lilong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NJialin Zhang <zhangjialin11@huawei.com>
      34ba4153
  2. 15 3月, 2023 1 次提交
  3. 07 12月, 2022 2 次提交
  4. 02 11月, 2022 1 次提交
  5. 09 3月, 2022 1 次提交
  6. 14 1月, 2022 1 次提交
  7. 07 1月, 2022 5 次提交
  8. 27 12月, 2021 3 次提交
    • D
      xfs: reset child dir '..' entry when unlinking child · 25e7a55e
      Darrick J. Wong 提交于
      mainline-inclusion
      from mainline-v5.14-rc1
      commit 5838d035
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5838d0356bb3c320867c393f12b169c01a870bda
      
      -------------------------------------------------
      
      While running xfs/168, I noticed a second source of post-shrink
      corruption errors causing shutdowns.
      
      Let's say that directory B has a low inode number and is a child of
      directory A, which has a high number.  If B is empty but open, and
      unlinked from A, B's dotdot link continues to point to A.  If A is then
      unlinked and the filesystem shrunk so that A is no longer a valid inode,
      a subsequent AIL push of B will trip the inode verifiers because the
      dotdot entry points outside of the filesystem.
      
      To avoid this problem, reset B's dotdot entry to the root directory when
      unlinking directories, since the root directory cannot be removed.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NGao Xiang <hsiangkao@linux.alibaba.com>
      Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: NLihong Kou <koulihong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      25e7a55e
    • D
      xfs: xfs_log_force_lsn isn't passed a LSN · e100c699
      Dave Chinner 提交于
      mainline-inclusion
      from mainline-v5.13-rc4
      commit 5f9b4b0d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5f9b4b0de8dc2fb8eb655463b438001c111570fe
      
      -------------------------------------------------
      
      In doing an investigation into AIL push stalls, I was looking at the
      log force code to see if an async CIL push could be done instead.
      This lead me to xfs_log_force_lsn() and looking at how it works.
      
      xfs_log_force_lsn() is only called from inode synchronisation
      contexts such as fsync(), and it takes the ip->i_itemp->ili_last_lsn
      value as the LSN to sync the log to. This gets passed to
      xlog_cil_force_lsn() via xfs_log_force_lsn() to flush the CIL to the
      journal, and then used by xfs_log_force_lsn() to flush the iclogs to
      the journal.
      
      The problem is that ip->i_itemp->ili_last_lsn does not store a
      log sequence number. What it stores is passed to it from the
      ->iop_committing method, which is called by xfs_log_commit_cil().
      The value this passes to the iop_committing method is the CIL
      context sequence number that the item was committed to.
      
      As it turns out, xlog_cil_force_lsn() converts the sequence to an
      actual commit LSN for the related context and returns that to
      xfs_log_force_lsn(). xfs_log_force_lsn() overwrites it's "lsn"
      variable that contained a sequence with an actual LSN and then uses
      that to sync the iclogs.
      
      This caused me some confusion for a while, even though I originally
      wrote all this code a decade ago. ->iop_committing is only used by
      a couple of log item types, and only inode items use the sequence
      number it is passed.
      
      Let's clean up the API, CIL structures and inode log item to call it
      a sequence number, and make it clear that the high level code is
      using CIL sequence numbers and not on-disk LSNs for integrity
      synchronisation purposes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: NLihong Kou <koulihong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      e100c699
    • G
      xfs: Fix fall-through warnings for Clang · 1ef907ec
      Gustavo A. R. Silva 提交于
      mainline-inclusion
      from mainline-v5.13-rc2
      commit 53004ee7
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I4KIAO
      CVE: NA
      
      Reference:
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=53004ee78d6273c994534ccf79d993098ac89769
      
      -------------------------------------------------
      
      In preparation to enable -Wimplicit-fallthrough for Clang, fix
      the following warnings by replacing /* fall through */ comments,
      and its variants, with the new pseudo-keyword macro fallthrough:
      
      fs/xfs/libxfs/xfs_alloc.c:3167:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_da_btree.c:286:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_ag_resv.c:346:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/libxfs/xfs_ag_resv.c:388:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_bmap_util.c:246:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_export.c:88:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_export.c:96:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_file.c:867:3: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_ioctl.c:562:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_ioctl.c:1548:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_iomap.c:1040:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_inode.c:852:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_log.c:2627:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/xfs_trans_buf.c:298:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/bmap.c:275:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/btree.c:48:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:85:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:138:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/common.c:698:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/dabtree.c:51:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/repair.c:951:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      fs/xfs/scrub/agheader.c:89:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
      
      Notice that Clang doesn't recognize /* fall through */ comments as
      implicit fall-through markings, so in order to globally enable
      -Wimplicit-fallthrough for Clang, these comments need to be
      replaced with fallthrough; in the whole codebase.
      
      Link: https://github.com/KSPP/linux/issues/115Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
      Signed-off-by: NGuo Xuenan <guoxuenan@huawei.com>
      Reviewed-by: NLihong Kou <koulihong@huawei.com>
      Reviewed-by: NZhang Yi <yi.zhang@huawei.com>
      Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
      1ef907ec
  9. 22 9月, 2020 1 次提交
  10. 16 9月, 2020 4 次提交
  11. 07 9月, 2020 1 次提交
    • D
      xfs: xfs_iflock is no longer a completion · 718ecc50
      Dave Chinner 提交于
      With the recent rework of the inode cluster flushing, we no longer
      ever wait on the the inode flush "lock". It was never a lock in the
      first place, just a completion to allow callers to wait for inode IO
      to complete. We now never wait for flush completion as all inode
      flushing is non-blocking. Hence we can get rid of all the iflock
      infrastructure and instead just set and check a state flag.
      
      Rename the XFS_IFLOCK flag to XFS_IFLUSHING, convert all the
      xfs_iflock_nowait() test-and-set operations on that flag, and
      replace all the xfs_ifunlock() calls to clear operations.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      718ecc50
  12. 05 8月, 2020 1 次提交
  13. 14 7月, 2020 1 次提交
    • G
      xfs: get rid of unnecessary xfs_perag_{get,put} pairs · 92a00544
      Gao Xiang 提交于
      In the course of some operations, we look up the perag from
      the mount multiple times to get or change perag information.
      These are often very short pieces of code, so while the
      lookup cost is generally low, the cost of the lookup is far
      higher than the cost of the operation we are doing on the
      perag.
      
      Since we changed buffers to hold references to the perag
      they are cached in, many modification contexts already hold
      active references to the perag that are held across these
      operations. This is especially true for any operation that
      is serialised by an allocation group header buffer.
      
      In these cases, we can just use the buffer's reference to
      the perag to avoid needing to do lookups to access the
      perag. This means that many operations don't need to do
      perag lookups at all to access the perag because they've
      already looked up objects that own persistent references
      and hence can use that reference instead.
      
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Signed-off-by: NGao Xiang <hsiangkao@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      92a00544
  14. 07 7月, 2020 13 次提交
    • D
      xfs: remove xfs_inobp_check() · e2705b03
      Dave Chinner 提交于
      This debug code is called on every xfs_iflush() call, which then
      checks every inode in the buffer for non-zero unlinked list field.
      Hence it checks every inode in the cluster buffer every time a
      single inode on that cluster it flushed. This is resulting in:
      
      -   38.91%     5.33%  [kernel]  [k] xfs_iflush
         - 17.70% xfs_iflush
            - 9.93% xfs_inobp_check
                 4.36% xfs_buf_offset
      
      10% of the CPU time spent flushing inodes is repeatedly checking
      unlinked fields in the buffer. We don't need to do this.
      
      The other place we call xfs_inobp_check() is
      xfs_iunlink_update_dinode(), and this is after we've done this
      assert for the agino we are about to write into that inode:
      
      	ASSERT(xfs_verify_agino_or_null(mp, agno, next_agino));
      
      which means we've already checked that the agino we are about to
      write is not 0 on debug kernels. The inode buffer verifiers do
      everything else we need, so let's just remove this debug code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e2705b03
    • D
      xfs: rework xfs_iflush_cluster() dirty inode iteration · 5717ea4d
      Dave Chinner 提交于
      Now that we have all the dirty inodes attached to the cluster
      buffer, we don't actually have to do radix tree lookups to find
      them. Sure, the radix tree is efficient, but walking a linked list
      of just the dirty inodes attached to the buffer is much better.
      
      We are also no longer dependent on having a locked inode passed into
      the function to determine where to start the lookup. This means we
      can drop it from the function call and treat all inodes the same.
      
      We also make xfs_iflush_cluster skip inodes marked with
      XFS_IRECLAIM. This we avoid races with inodes that reclaim is
      actively referencing or are being re-initialised by inode lookup. If
      they are actually dirty, they'll get written by a future cluster
      flush....
      
      We also add a shutdown check after obtaining the flush lock so that
      we catch inodes that are dirty in memory and may have inconsistent
      state due to the shutdown in progress. We abort these inodes
      directly and so they remove themselves directly from the buffer list
      and the AIL rather than having to wait for the buffer to be failed
      and callbacks run to be processed correctly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      5717ea4d
    • D
      xfs: rename xfs_iflush_int() · e6187b34
      Dave Chinner 提交于
      with xfs_iflush() gone, we can rename xfs_iflush_int() back to
      xfs_iflush(). Also move it up above xfs_iflush_cluster() so we don't
      need the forward definition any more.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e6187b34
    • D
      xfs: xfs_iflush() is no longer necessary · 90c60e16
      Dave Chinner 提交于
      Now we have a cached buffer on inode log items, we don't need
      to do buffer lookups when flushing inodes anymore - all we need
      to do is lock the buffer and we are ready to go.
      
      This largely gets rid of the need for xfs_iflush(), which is
      essentially just a mechanism to look up the buffer and flush the
      inode to it. Instead, we can just call xfs_iflush_cluster() with a
      few modifications to ensure it also flushes the inode we already
      hold locked.
      
      This allows the AIL inode item pushing to be almost entirely
      non-blocking in XFS - we won't block unless memory allocation
      for the cluster inode lookup blocks or the block device queues are
      full.
      
      Writeback during inode reclaim becomes a little more complex because
      we now have to lock the buffer ourselves, but otherwise this change
      is largely a functional no-op that removes a whole lot of code.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      90c60e16
    • D
      xfs: attach inodes to the cluster buffer when dirtied · 48d55e2a
      Dave Chinner 提交于
      Rather than attach inodes to the cluster buffer just when we are
      doing IO, attach the inodes to the cluster buffer when they are
      dirtied. The means the buffer always carries a list of dirty inodes
      that reference it, and we can use that list to make more fundamental
      changes to inode writeback that aren't otherwise possible.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      48d55e2a
    • D
      xfs: rework stale inodes in xfs_ifree_cluster · 71e3e356
      Dave Chinner 提交于
      Once we have inodes pinning the cluster buffer and attached whenever
      they are dirty, we no longer have a guarantee that the items are
      flush locked when we lock the cluster buffer. Hence we cannot just
      walk the buffer log item list and modify the attached inodes.
      
      If the inode is not flush locked, we have to ILOCK it first and then
      flush lock it to do all the prerequisite checks needed to avoid
      races with other code. This is already handled by
      xfs_ifree_get_one_inode(), so rework the inode iteration loop and
      function to update all inodes in cache whether they are attached to
      the buffer or not.
      
      Note: we also remove the copying of the log item lsn to the
      ili_flush_lsn as xfs_iflush_done() now uses the XFS_ISTALE flag to
      trigger aborts and so flush lsn matching is not needed in IO
      completion for processing freed inodes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      71e3e356
    • D
      xfs: get rid of log item callbacks · 2ef3f7f5
      Dave Chinner 提交于
      They are not used anymore, so remove them from the log item and the
      buffer iodone attachment interfaces.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      2ef3f7f5
    • D
      xfs: make inode IO completion buffer centric · aac855ab
      Dave Chinner 提交于
      Having different io completion callbacks for different inode states
      makes things complex. We can detect if the inode is stale via the
      XFS_ISTALE flag in IO completion, so we don't need a special
      callback just for this.
      
      This means inodes only have a single iodone callback, and inode IO
      completion is entirely buffer centric at this point. Hence we no
      longer need to use a log item callback at all as we can just call
      xfs_iflush_done() directly from the buffer completions and walk the
      buffer log item list to complete the all inodes under IO.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      aac855ab
    • D
      xfs: mark inode buffers in cache · f593bf14
      Dave Chinner 提交于
      Inode buffers always have write IO callbacks, so by marking them
      directly we can avoid needing to attach ->b_iodone functions to
      them. This avoids an indirect call, and makes future modifications
      much simpler.
      
      While this is largely a refactor of existing functionality, we
      broaden the scope of the flag to beyond where inodes are explicitly
      attached because future changes need to know what type of log items
      are attached to the buffer. Adding this buffer flag may invoke the
      inode iodone callback in cases where it wouldn't have been
      previously, but this is not a functional change because the callback
      is identical to the normal buffer write iodone callback when inodes
      are not attached.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f593bf14
    • D
      xfs: add an inode item lock · 1319ebef
      Dave Chinner 提交于
      The inode log item is kind of special in that it can be aggregating
      new changes in memory at the same time time existing changes are
      being written back to disk. This means there are fields in the log
      item that are accessed concurrently from contexts that don't share
      any locking at all.
      
      e.g. updating ili_last_fields occurs at flush time under the
      ILOCK_EXCL and flush lock at flush time, under the flush lock at IO
      completion time, and is read under the ILOCK_EXCL when the inode is
      logged.  Hence there is no actual serialisation between reading the
      field during logging of the inode in transactions vs clearing the
      field in IO completion.
      
      We currently get away with this by the fact that we are only
      clearing fields in IO completion, and nothing bad happens if we
      accidentally log more of the inode than we actually modify. Worst
      case is we consume a tiny bit more memory and log bandwidth.
      
      However, if we want to do more complex state manipulations on the
      log item that requires updates at all three of these potential
      locations, we need to have some mechanism of serialising those
      operations. To do this, introduce a spinlock into the log item to
      serialise internal state.
      
      This could be done via the xfs_inode i_flags_lock, but this then
      leads to potential lock inversion issues where inode flag updates
      need to occur inside locks that best nest inside the inode log item
      locks (e.g. marking inodes stale during inode cluster freeing).
      Using a separate spinlock avoids these sorts of problems and
      simplifies future code.
      
      This does not touch the use of ili_fields in the item formatting
      code - that is entirely protected by the ILOCK_EXCL at this point in
      time, so it remains untouched.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      1319ebef
    • D
      xfs: remove logged flag from inode log item · 1dfde687
      Dave Chinner 提交于
      This was used to track if the item had logged fields being flushed
      to disk. We log everything in the inode these days, so this logic is
      no longer needed. Remove it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      1dfde687
    • D
      xfs: Don't allow logging of XFS_ISTALE inodes · 96355d5a
      Dave Chinner 提交于
      In tracking down a problem in this patchset, I discovered we are
      reclaiming dirty stale inodes. This wasn't discovered until inodes
      were always attached to the cluster buffer and then the rcu callback
      that freed inodes was assert failing because the inode still had an
      active pointer to the cluster buffer after it had been reclaimed.
      
      Debugging the issue indicated that this was a pre-existing issue
      resulting from the way the inodes are handled in xfs_inactive_ifree.
      When we free a cluster buffer from xfs_ifree_cluster, all the inodes
      in cache are marked XFS_ISTALE. Those that are clean have nothing
      else done to them and so eventually get cleaned up by background
      reclaim. i.e. it is assumed we'll never dirty/relog an inode marked
      XFS_ISTALE.
      
      On journal commit dirty stale inodes as are handled by both
      buffer and inode log items to run though xfs_istale_done() and
      removed from the AIL (buffer log item commit) or the log item will
      simply unpin it because the buffer log item will clean it. What happens
      to any specific inode is entirely dependent on which log item wins
      the commit race, but the result is the same - stale inodes are
      clean, not attached to the cluster buffer, and not in the AIL. Hence
      inode reclaim can just free these inodes without further care.
      
      However, if the stale inode is relogged, it gets dirtied again and
      relogged into the CIL. Most of the time this isn't an issue, because
      relogging simply changes the inode's location in the current
      checkpoint. Problems arise, however, when the CIL checkpoints
      between two transactions in the xfs_inactive_ifree() deferops
      processing. This results in the XFS_ISTALE inode being redirtied
      and inserted into the CIL without any of the other stale cluster
      buffer infrastructure being in place.
      
      Hence on journal commit, it simply gets unpinned, so it remains
      dirty in memory. Everything in inode writeback avoids XFS_ISTALE
      inodes so it can't be written back, and it is not tracked in the AIL
      so there's not even a trigger to attempt to clean the inode. Hence
      the inode just sits dirty in memory until inode reclaim comes along,
      sees that it is XFS_ISTALE, and goes to reclaim it. This reclaiming
      of a dirty inode caused use after free, list corruptions and other
      nasty issues later in this patchset.
      
      Hence this patch addresses a violation of the "never log XFS_ISTALE
      inodes" caused by the deferops processing rolling a transaction
      and relogging a stale inode in xfs_inactive_free. It also adds a
      bunch of asserts to catch this problem in debug kernels so that
      we don't reintroduce this problem in future.
      
      Reproducer for this issue was generic/558 on a v4 filesystem.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      96355d5a
    • D
      xfs: move helpers that lock and unlock two inodes against userspace IO · e2aaee9c
      Darrick J. Wong 提交于
      Move the double-inode locking helpers to xfs_inode.c since they're not
      specific to reflink.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      e2aaee9c
  15. 10 6月, 2020 1 次提交