1. 10 Jul, 2022 1 commit
  2. 29 Jun, 2022 2 commits
    • D
      xfs: don't hold xattr leaf buffers across transaction rolls · e53bcffa
      Darrick J. Wong committed
      Now that we've established (again!) that empty xattr leaf buffers are
      ok, we no longer need to bhold them to transactions when we're creating
      new leaf blocks.  Get rid of the entire mechanism, which should simplify
      the xattr code quite a bit.
      
      The original justification for using bhold here was to prevent the AIL
      from trying to write the empty leaf block into the fs during the brief
      time that we release the buffer lock.  The reason for /that/ was to
      prevent recovery from tripping over the empty ondisk block.
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      e53bcffa
    • D
      xfs: empty xattr leaf header blocks are not corruption · 7be3bd88
      Darrick J. Wong committed
      TLDR: Revert commit 51e6104f ("xfs: detect empty attr leaf blocks in
      xfs_attr3_leaf_verify") because it was wrong.
      
      Every now and then we get a corruption report from the kernel or
      xfs_repair about empty leaf blocks in the extended attribute structure.
      We've long thought that these shouldn't be possible, but prior to 5.18
      one would shake loose in the recoveryloop fstests about once a month.
      
      A new addition to the xattr leaf block verifier in 5.19-rc1 makes this
      happen every 7 minutes on my testing cloud.  I added a ton of logging to
      detect any time we set the header count on an xattr leaf block to zero.
      This produced the following dmesg output on generic/388:
      
      XFS (sda4): ino 0x21fcbaf leaf 0x129bf78 hdcount==0!
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       xfs_attr3_leaf_create+0x187/0x230
       xfs_attr_shortform_to_leaf+0xd1/0x2f0
       xfs_attr_set_iter+0x73e/0xa90
       xfs_xattri_finish_update+0x45/0x80
       xfs_attr_finish_item+0x1b/0xd0
       xfs_defer_finish_noroll+0x19c/0x770
       __xfs_trans_commit+0x153/0x3e0
       xfs_attr_set+0x36b/0x740
       xfs_xattr_set+0x89/0xd0
       __vfs_setxattr+0x67/0x80
       __vfs_setxattr_noperm+0x6e/0x120
       vfs_setxattr+0x97/0x180
       setxattr+0x88/0xa0
       path_setxattr+0xc3/0xe0
       __x64_sys_setxattr+0x27/0x30
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      
      So now we know that someone is creating empty xattr leaf blocks as part
      of converting a sf xattr structure into a leaf xattr structure.  The
      conversion routine logs any existing sf attributes in the same
      transaction that creates the leaf block, so we know this is a setxattr
      to a file that has no attributes at all.
      
      Next, g/388 calls the shutdown ioctl and cycles the mount to trigger log
      recovery.  I also augmented buffer item recovery to call ->verify_struct
      on any attr leaf blocks and complain if it finds a failure:
      
      XFS (sda4): Unmounting Filesystem
      XFS (sda4): Mounting V5 Filesystem
      XFS (sda4): Starting recovery (logdev: internal)
      XFS (sda4): xattr leaf daddr 0x129bf78 hdrcount == 0!
      Call Trace:
       <TASK>
       dump_stack_lvl+0x34/0x44
       xfs_attr3_leaf_verify+0x3b8/0x420
       xlog_recover_buf_commit_pass2+0x60a/0x6c0
       xlog_recover_items_pass2+0x4e/0xc0
       xlog_recover_commit_trans+0x33c/0x350
       xlog_recovery_process_trans+0xa5/0xe0
       xlog_recover_process_data+0x8d/0x140
       xlog_do_recovery_pass+0x19b/0x720
       xlog_do_log_recovery+0x62/0xc0
       xlog_do_recover+0x33/0x1d0
       xlog_recover+0xda/0x190
       xfs_log_mount+0x14c/0x360
       xfs_mountfs+0x517/0xa60
       xfs_fs_fill_super+0x6bc/0x950
       get_tree_bdev+0x175/0x280
       vfs_get_tree+0x1a/0x80
       path_mount+0x6f5/0xaa0
       __x64_sys_mount+0x103/0x140
       do_syscall_64+0x35/0x80
       entry_SYSCALL_64_after_hwframe+0x46/0xb0
      RIP: 0033:0x7fc61e241eae
      
      And a moment later, the _delwri_submit of the recovered buffers trips
      the same verifier and recovery fails:
      
      XFS (sda4): Metadata corruption detected at xfs_attr3_leaf_verify+0x393/0x420 [xfs], xfs_attr3_leaf block 0x129bf78
      XFS (sda4): Unmount and run xfs_repair
      XFS (sda4): First 128 bytes of corrupted metadata buffer:
      00000000: 00 00 00 00 00 00 00 00 3b ee 00 00 00 00 00 00  ........;.......
      00000010: 00 00 00 00 01 29 bf 78 00 00 00 00 00 00 00 00  .....).x........
      00000020: a5 1b d0 02 b2 9a 49 df 8e 9c fb 8d f8 31 3e 9d  ......I......1>.
      00000030: 00 00 00 00 02 1f cb af 00 00 00 00 10 00 00 00  ................
      00000040: 00 50 0f b0 00 00 00 00 00 00 00 00 00 00 00 00  .P..............
      00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      XFS (sda4): Corruption of in-memory data (0x8) detected at _xfs_buf_ioapply+0x37f/0x3b0 [xfs] (fs/xfs/xfs_buf.c:1518).  Shutting down filesystem.
      XFS (sda4): Please unmount the filesystem and rectify the problem(s)
      XFS (sda4): log mount/recovery failed: error -117
      XFS (sda4): log mount failed
      
      I think I see what's going on here -- setxattr is racing with something
      that shuts down the filesystem:
      
      Thread 1				Thread 2
      --------				--------
      xfs_attr_sf_addname
      xfs_attr_shortform_to_leaf
      <create empty leaf>
      xfs_trans_bhold(leaf)
      xattri_dela_state = XFS_DAS_LEAF_ADD
      <roll transaction>
      					<flush log>
      					<shut down filesystem>
      xfs_trans_bhold_release(leaf)
      <discover fs is dead, bail>
      
      Thread 3
      --------
      <cycle mount, start recovery>
      xlog_recover_buf_commit_pass2
      xlog_recover_do_reg_buffer
      <replay empty leaf buffer from recovered buf item>
      xfs_buf_delwri_queue(leaf)
      xfs_buf_delwri_submit
      _xfs_buf_ioapply(leaf)
      xfs_attr3_leaf_write_verify
      <trip over empty leaf buffer>
      <fail recovery>
      
      As you can see, the bhold keeps the leaf buffer locked and thus prevents
      the *AIL* from tripping over the ichdr.count==0 check in the write
      verifier.  Unfortunately, it doesn't prevent the log from getting
      flushed to disk, which sets up log recovery to fail.
      
      So.  It's clear that the kernel has always had the ability to persist
      attr leaf blocks with ichdr.count==0, which means that it's part of the
      ondisk format now.
      
      Unfortunately, this check has been added and removed multiple times
      throughout history.  It first appeared in[1] kernel 3.10 as part of the
      early V5 format patches.  The check was later discovered to break log
      recovery and hence disabled[2] during log recovery in kernel 4.10.
      Simultaneously, the check was added[3] to xfs_repair 4.9.0 to try to
      weed out the empty leaf blocks.  This was still not correct because log
      recovery would recover an empty attr leaf block successfully, only for
      regular xattr operations to trip over the empty block afterwards.
      Therefore, the check was removed entirely[4] in kernel 5.7, but removal
      of the xfs_repair check was forgotten.  The continued complaints from
      xfs_repair led to us mistakenly re-adding[5] the verifier check for
      kernel 5.19.  Remove it
      once again.
      
      [1] 517c2220 ("xfs: add CRCs to attr leaf blocks")
      [2] 2e1d2337 ("xfs: ignore leaf attr ichdr.count in verifier
                         during log replay")
      [3] f7140161 ("xfs_repair: junk leaf attribute if count == 0")
      [4] f28cef9e ("xfs: don't fail verifier on empty attr3 leaf
                         block")
      [5] 51e6104f ("xfs: detect empty attr leaf blocks in
                         xfs_attr3_leaf_verify")
      
      Looking at the rest of the xattr code, it seems that files with empty
      leaf blocks behave as expected -- listxattr reports no attributes;
      getxattr on any xattr returns nothing as expected; removexattr does
      nothing; and setxattr can add attributes just fine.
      
      Original-bug: 517c2220 ("xfs: add CRCs to attr leaf blocks")
      Still-not-fixed-by: 2e1d2337 ("xfs: ignore leaf attr ichdr.count in verifier during log replay")
      Removed-in: f28cef9e ("xfs: don't fail verifier on empty attr3 leaf block")
      Fixes: 51e6104f ("xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      7be3bd88
  3. 16 Jun, 2022 1 commit
    • D
      xfs: fix TOCTOU race involving the new logged xattrs control knob · f4288f01
      Darrick J. Wong committed
      I found a race involving the larp control knob, aka the debugging knob
      that lets developers enable logging of extended attribute updates:
      
      Thread 1			Thread 2
      
      echo 0 > /sys/fs/xfs/debug/larp
      				setxattr(REPLACE)
      				xfs_has_larp (returns false)
      				xfs_attr_set
      
      echo 1 > /sys/fs/xfs/debug/larp
      
      				xfs_attr_defer_replace
      				xfs_attr_init_replace_state
      				xfs_has_larp (returns true)
      				xfs_attr_init_remove_state
      
      				<oops, wrong DAS state!>
      
      This isn't a particularly severe problem right now because xattr logging
      is only enabled when CONFIG_XFS_DEBUG=y, and developers *should* know
      what they're doing.
      
      However, the eventual intent is that callers should be able to ask for
      the assistance of the log in persisting xattr updates.  This capability
      might not be required for /all/ callers, which means that dynamic
      control must work correctly.  Once an xattr update has decided whether
      or not to use logged xattrs, it needs to stay in that mode until the end
      of the operation regardless of what subsequent parallel operations might
      do.
      
      Therefore, it is an error to continue sampling xfs_globals.larp once
      xfs_attr_change has made a decision about larp, and it was not correct
      for me to have told Allison that ->create_intent functions can sample
      the global log incompat feature bitfield to decide to elide a log item.
      
      Instead, create a new op flag for the xfs_da_args structure, and convert
      all other callers of xfs_has_larp and xfs_sb_version_haslogxattrs within
      the attr update state machine to look for the operations flag.
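
The "decide once, then carry the decision with the operation" pattern described above can be sketched in plain C. This is a minimal illustration and not the kernel code: `demo_larp_knob`, `DEMO_OP_LOGGED`, `struct demo_args`, and both helper functions are hypothetical names invented for this example.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the global debug knob that userspace can flip. */
static volatile int demo_larp_knob;

/* Hypothetical per-operation flag recording the decision made at start. */
#define DEMO_OP_LOGGED (1u << 0)

struct demo_args {
	unsigned int op_flags;
};

/* Sample the global knob exactly once, when the operation begins. */
static void demo_attr_begin(struct demo_args *args)
{
	args->op_flags = 0;
	if (demo_larp_knob)
		args->op_flags |= DEMO_OP_LOGGED;
}

/* Every later step consults the per-operation flag, never the global. */
static bool demo_attr_is_logged(const struct demo_args *args)
{
	return (args->op_flags & DEMO_OP_LOGGED) != 0;
}
```

With this shape, a knob change that lands between `demo_attr_begin()` and a later `demo_attr_is_logged()` call cannot change the mode of an operation that is already in flight.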
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      f4288f01
  4. 12 May, 2022 3 commits
    • D
      xfs: detect empty attr leaf blocks in xfs_attr3_leaf_verify · 51e6104f
      Dave Chinner committed
      xfs_repair flags these as a corruption error, so the verifier should
      catch software bugs that result in empty leaf blocks being written
      to disk, too.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      51e6104f
    • D
      xfs: ATTR_REPLACE algorithm with LARP enabled needs rework · fdaf1bb3
      Dave Chinner committed
      We can't use the same algorithm for replacing an existing attribute
      when logging attributes. The existing algorithm is essentially:
      
      1. create new attr w/ INCOMPLETE
      2. atomically flip INCOMPLETE flags between old + new attribute
      3. remove old attr which is marked w/ INCOMPLETE
      
      This algorithm guarantees that we see either the old or new
      attribute, and if we fail after the atomic flag flip, we don't have
      to recover the removal of the old attr because we never see
      INCOMPLETE attributes in lookups.
      
      For logged attributes, however, this does not work. The logged
      attribute intents do not track the work that has been done as the
      transaction rolls, and hence the only recovery mechanism we have is
      "run the replace operation from scratch".
      
      This is further exacerbated by the attempt to avoid needing the
      INCOMPLETE flag to create an atomic swap. This means we can create
      a second active attribute of the same name before we remove the
      original. If we fail at any point after the create but before the
      removal has completed, we end up with duplicate attributes in
      the attr btree and recovery only tries to replace one of them.
      
      There are several other failure modes where we can leave partially
      allocated remote attributes that expose stale data, partially free
      remote attributes that enable UAF based stale data exposure, etc.
      
      To fix this, we need a different algorithm for replace operations
      when LARP is enabled. Luckily, it's not that complex if we take the
      right first step. That is, the first thing we log is the attri
      intent with the new name/value pair and mark the old attr as
      INCOMPLETE in the same transaction.
      
      From there, we then remove the old attr and keep relogging the
      new name/value in the intent, such that we always know that we have
      to create the new attr in recovery. Once the old attr is removed,
      we then run a normal ATTR_CREATE operation relogging the intent as
      we go. If the new attr is local, then it gets created in a single
      atomic transaction that also logs the final intent done. If the new
      attr is remote, then we set INCOMPLETE on the new attr while we
      allocate and set the remote value, and then we clear the INCOMPLETE
      flag in the last transaction that logs the final intent done.
      
      If we fail at any point in this algorithm, log recovery will always
      see the same state on disk: the new name/value in the intent, and
      either an INCOMPLETE attr or no attr in the attr btree. If we find
      an INCOMPLETE attr, we run the full replace starting with removing
      the INCOMPLETE attr. If we don't find it, then we simply create the
      new attr.
      
      Notably, recovery of a failed create that has an INCOMPLETE flag set
      is now the same - we start with the lookup of the INCOMPLETE attr,
      and if that exists then we do the full replace recovery process,
      otherwise we just create the new attr.
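
The recovery decision described above reduces to a single branch on what the name lookup finds. The following is a rough sketch under that reading; the enum and function names are hypothetical and do not correspond to identifiers in the XFS source.

```c
/* Hypothetical: what log recovery finds when it looks up the attr name. */
enum demo_lookup {
	DEMO_FOUND_NOTHING,     /* no attr of that name in the attr btree */
	DEMO_FOUND_INCOMPLETE,  /* an attr marked INCOMPLETE is present */
};

/* Hypothetical: the sequence of work recovery then performs. */
enum demo_plan {
	DEMO_PLAN_CREATE,         /* create the new attr from the intent */
	DEMO_PLAN_REMOVE_CREATE,  /* remove the INCOMPLETE attr, then create */
};

/* The same decision serves both failed creates and failed replaces. */
static enum demo_plan demo_recovery_plan(enum demo_lookup found)
{
	if (found == DEMO_FOUND_INCOMPLETE)
		return DEMO_PLAN_REMOVE_CREATE;
	return DEMO_PLAN_CREATE;
}
```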
      
      Hence changing the way we do the replace operation when LARP is
      enabled allows us to use the same log recovery algorithm for both
      the ATTR_CREATE and ATTR_REPLACE operations. This is also the same
      algorithm we use for runtime ATTR_REPLACE operations (except for the
      step setting up the initial conditions).
      
      The result is that:
      
      - ATTR_CREATE uses the same algorithm regardless of whether LARP is
        enabled or not
      - ATTR_REPLACE with larp=0 is identical to the old algorithm
      - ATTR_REPLACE with larp=1 runs an unmodified attr removal algorithm
        from the larp=0 code and then runs the unmodified ATTR_CREATE
        code.
      - log recovery when larp=1 runs the same ATTR_REPLACE algorithm as
        it uses at runtime.
      
      Because the state machine is now quite clean, changing the algorithm
      is really just a case of changing the initial state and how the
      states link together for the ATTR_REPLACE case. Hence it's not a
      huge amount of code for what is a fairly substantial rework
      of the attr logging and recovery algorithm....
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      fdaf1bb3
    • D
      xfs: use XFS_DA_OP flags in deferred attr ops · e7f358de
      Dave Chinner committed
      We currently store the high level attr operation in
      args->attr_flags. This field contains what the VFS is telling us to
      do, but doesn't necessarily match what we are doing in the low level
      modification state machine. e.g. XATTR_REPLACE implies both
      XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME because it is doing both a
      remove and adding a new attr.
      
      However, deep in the individual state machine operations, we check
      errors against these high level VFS op flags, not the low level
      XFS_DA_OP flags. Indeed, we don't even have a low level flag for
      a REMOVE operation, so the only way we know we are doing a remove
      is the complete absence of XATTR_REPLACE, XATTR_CREATE,
      XFS_DA_OP_ADDNAME and XFS_DA_OP_RENAME. And because there are other
      flags in these fields, this is a pain to check if we need to.
      
      As the XFS_DA_OP flags are only needed once the deferred operations
      are set up, set these flags appropriately when we set the initial
      operation state. We also introduce a XFS_DA_OP_REMOVE flag to make
      it easy to know that we are doing a remove operation.
      
      With these, we can remove the use of XATTR_REPLACE and XATTR_CREATE
      in low level lookup operations, and manipulate the low level flags
      according to the low level context that is operating. e.g. log
      recovery does not have a VFS xattr operation state to copy into
      args->attr_flags, and the low level state machine ops we do for
      recovery do not match the high level VFS operations that were in
      progress when the system failed...
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      e7f358de
  5. 11 May, 2022 1 commit
  6. 09 May, 2022 1 commit
  7. 23 Oct, 2021 1 commit
  8. 20 Aug, 2021 3 commits
  9. 10 Aug, 2021 2 commits
  10. 10 Jun, 2021 1 commit
  11. 02 Jun, 2021 2 commits
    • D
      xfs: move xfs_perag_get/put to xfs_ag.[ch] · 9bbafc71
      Dave Chinner committed
      They are AG functions, not superblock functions, so move them to the
      appropriate location.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      9bbafc71
    • A
      xfs: Add delay ready attr remove routines · 2b74b03c
      Allison Henderson committed
      This patch modifies the attr remove routines to be delay ready. This
      means they no longer roll or commit transactions, but instead return
      -EAGAIN to have the calling routine roll and refresh the transaction. In
      this series, xfs_attr_remove_args is merged with
      xfs_attr_node_removename to become a new function, xfs_attr_remove_iter.
      This new version uses a sort of state machine like switch to keep track
      of where it was when EAGAIN was returned. A new version of
      xfs_attr_remove_args consists of a simple loop to refresh the
      transaction until the operation is completed. A new XFS_DAC_DEFER_FINISH
      flag is used to finish the transaction where ever the existing code used
      to.
      
      Calls to xfs_attr_rmtval_remove are replaced with the delay ready
      version __xfs_attr_rmtval_remove. We will rename
      __xfs_attr_rmtval_remove back to xfs_attr_rmtval_remove when we are
      done.
      
      xfs_attr_rmtval_remove itself is still in use by the set routines (used
      during a rename).  For reasons of preserving existing function, we
      modify xfs_attr_rmtval_remove to call xfs_defer_finish when the flag is
      set, similar to what xfs_attr_remove_args does here.  Once we transition
      the set routines to be delay ready, xfs_attr_rmtval_remove is no longer
      used and will be removed.
      
      This patch also adds a new struct xfs_delattr_context, which we will use
      to keep track of the current state of an attribute operation. The new
      xfs_delattr_state enum is used to track various operations that are in
      progress so that we know not to repeat them, and resume where we left
      off before EAGAIN was returned to cycle out the transaction. Other
      members take the place of local variables that need to retain their
      values across multiple function calls.  See xfs_attr.h for a more
      detailed diagram of the states.
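
The general shape of a "delay ready" routine, as described above, can be sketched as a resumable step function plus a driver loop. This is a simplified illustration only: `struct demo_ctx`, the `DEMO_*` states, and both functions are hypothetical, and the counter merely stands in for rolling the transaction.

```c
#include <errno.h>

/* Hypothetical states recording how far the operation has progressed. */
enum demo_state { DEMO_START, DEMO_STEP_TWO, DEMO_DONE };

/* Hypothetical context that survives across calls to the step function. */
struct demo_ctx {
	enum demo_state state;
	int rolls;  /* stand-in for "transaction was rolled" */
};

/* Do one unit of work, then ask the caller to roll by returning -EAGAIN. */
static int demo_remove_iter(struct demo_ctx *ctx)
{
	switch (ctx->state) {
	case DEMO_START:
		ctx->state = DEMO_STEP_TWO;
		return -EAGAIN;
	case DEMO_STEP_TWO:
		ctx->state = DEMO_DONE;
		return -EAGAIN;
	case DEMO_DONE:
		return 0;
	}
	return -EINVAL;
}

/* Driver: keep refreshing the "transaction" until the step function finishes. */
static int demo_remove_args(struct demo_ctx *ctx)
{
	int error;

	do {
		error = demo_remove_iter(ctx);
		if (error == -EAGAIN)
			ctx->rolls++;
	} while (error == -EAGAIN);

	return error;
}
```

Because the saved state lives in the context rather than in local variables, each re-entry resumes where the previous call left off instead of repeating completed work.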
      Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      2b74b03c
  12. 16 Apr, 2021 2 commits
  13. 08 Apr, 2021 1 commit
  14. 19 Nov, 2020 1 commit
    • G
      xfs: fix forkoff miscalculation related to XFS_LITINO(mp) · ada49d64
      Gao Xiang committed
      Commit e9e2eae8 dropped an (int) cast from XFS_LITINO(mp), and
      since a sizeof() expression is also involved, the result of
      XFS_LITINO(mp) now has the size_t type (commonly unsigned long).
      
      Considering the expression in xfs_attr_shortform_bytesfit():
        offset = (XFS_LITINO(mp) - bytes) >> 3;
      let "bytes" be (int)340, and
          "XFS_LITINO(mp)" be (unsigned long)336.
      
      on 64-bit platform, the expression is
        offset = ((unsigned long)336 - (int)340) >> 3 =
                 (int)(0xfffffffffffffffcUL >> 3) = -1
      
      but on 32-bit platform, the expression is
        offset = ((unsigned long)336 - (int)340) >> 3 =
                 (int)(0xfffffffcUL >> 3) = 0x1fffffff
      instead.
      
      So offset becomes a large positive number on a 32-bit platform, which
      causes xfs_attr_shortform_bytesfit() to return maxforkoff rather than 0.
      
      Therefore, one result is
        "ASSERT(new_size <= XFS_IFORK_SIZE(ip, whichfork));"
      
      assertion failure in xfs_idata_realloc(), which was also the root
      cause of the original bugreport from Dennis, see:
         https://bugzilla.redhat.com/show_bug.cgi?id=1894177
      
      And it can also be manually triggered with the following commands:
        $ touch a;
        $ setfattr -n user.0 -v "`seq 0 80`" a;
        $ setfattr -n user.1 -v "`seq 0 80`" a
      
      on 32-bit platform.
      
      Fix the case in xfs_attr_shortform_bytesfit() by bailing out early
      when "XFS_LITINO(mp) < bytes", as suggested by Eric, and fix a
      misleading comment together with this bugfix, as suggested by Darrick.
      It seems the other users of XFS_LITINO(mp) are not impacted.
      
      Fixes: e9e2eae8 ("xfs: only check the superblock version for dinode size calculation")
      Cc: <stable@vger.kernel.org> # 5.7+
      Reported-and-tested-by: Dennis Gilmore <dgilmore@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      ada49d64
  15. 16 Sep, 2020 4 commits
  16. 27 Aug, 2020 2 commits
  17. 29 Jul, 2020 4 commits
  18. 27 May, 2020 1 commit
    • D
      xfs: more lockdep whackamole with kmem_alloc* · 6dcde60e
      Darrick J. Wong committed
      Dave Airlie reported the following lockdep complaint:
      
      >  ======================================================
      >  WARNING: possible circular locking dependency detected
      >  5.7.0-0.rc5.20200515git1ae7efb3.1.fc33.x86_64 #1 Not tainted
      >  ------------------------------------------------------
      >  kswapd0/159 is trying to acquire lock:
      >  ffff9b38d01a4470 (&xfs_nondir_ilock_class){++++}-{3:3},
      >  at: xfs_ilock+0xde/0x2c0 [xfs]
      >
      >  but task is already holding lock:
      >  ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at:
      >  __fs_reclaim_acquire+0x5/0x30
      >
      >  which lock already depends on the new lock.
      >
      >
      >  the existing dependency chain (in reverse order) is:
      >
      >  -> #1 (fs_reclaim){+.+.}-{0:0}:
      >         fs_reclaim_acquire+0x34/0x40
      >         __kmalloc+0x4f/0x270
      >         kmem_alloc+0x93/0x1d0 [xfs]
      >         kmem_alloc_large+0x4c/0x130 [xfs]
      >         xfs_attr_copy_value+0x74/0xa0 [xfs]
      >         xfs_attr_get+0x9d/0xc0 [xfs]
      >         xfs_get_acl+0xb6/0x200 [xfs]
      >         get_acl+0x81/0x160
      >         posix_acl_xattr_get+0x3f/0xd0
      >         vfs_getxattr+0x148/0x170
      >         getxattr+0xa7/0x240
      >         path_getxattr+0x52/0x80
      >         do_syscall_64+0x5c/0xa0
      >         entry_SYSCALL_64_after_hwframe+0x49/0xb3
      >
      >  -> #0 (&xfs_nondir_ilock_class){++++}-{3:3}:
      >         __lock_acquire+0x1257/0x20d0
      >         lock_acquire+0xb0/0x310
      >         down_write_nested+0x49/0x120
      >         xfs_ilock+0xde/0x2c0 [xfs]
      >         xfs_reclaim_inode+0x3f/0x400 [xfs]
      >         xfs_reclaim_inodes_ag+0x20b/0x410 [xfs]
      >         xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
      >         super_cache_scan+0x190/0x1e0
      >         do_shrink_slab+0x184/0x420
      >         shrink_slab+0x182/0x290
      >         shrink_node+0x174/0x680
      >         balance_pgdat+0x2d0/0x5f0
      >         kswapd+0x21f/0x510
      >         kthread+0x131/0x150
      >         ret_from_fork+0x3a/0x50
      >
      >  other info that might help us debug this:
      >
      >   Possible unsafe locking scenario:
      >
      >         CPU0                    CPU1
      >         ----                    ----
      >    lock(fs_reclaim);
      >                                 lock(&xfs_nondir_ilock_class);
      >                                 lock(fs_reclaim);
      >    lock(&xfs_nondir_ilock_class);
      >
      >   *** DEADLOCK ***
      >
      >  4 locks held by kswapd0/159:
      >   #0: ffffffffbbb8bd00 (fs_reclaim){+.+.}-{0:0}, at:
      >  __fs_reclaim_acquire+0x5/0x30
      >   #1: ffffffffbbb7cef8 (shrinker_rwsem){++++}-{3:3}, at:
      >  shrink_slab+0x115/0x290
      >   #2: ffff9b39f07a50e8
      >  (&type->s_umount_key#56){++++}-{3:3}, at: super_cache_scan+0x38/0x1e0
      >   #3: ffff9b39f077f258
      >  (&pag->pag_ici_reclaim_lock){+.+.}-{3:3}, at:
      >  xfs_reclaim_inodes_ag+0x82/0x410 [xfs]
      
      This is a known false positive because inodes cannot simultaneously be
      getting reclaimed and the target of a getxattr operation, but lockdep
      doesn't know that.  We can (selectively) shut up lockdep until either
      it gets smarter or we change inode reclaim not to require the ILOCK by
      applying a stupid GFP_NOLOCKDEP bandaid.
      Reported-by: Dave Airlie <airlied@gmail.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Tested-by: Dave Airlie <airlied@gmail.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      6dcde60e
  19. 20 May, 2020 4 commits
  20. 19 Mar, 2020 1 commit
  21. 12 Mar, 2020 1 commit
  22. 03 Mar, 2020 1 commit