1. 02 Jul, 2022 1 commit
    • xfs: introduce per-cpu CIL tracking structure · af1c2146
      Dave Chinner committed
      The CIL push lock is highly contended on larger machines, becoming a
      hard bottleneck at about 700,000 transaction commits/s on >16p
      machines. To address this, start moving the CIL tracking
      infrastructure to utilise per-CPU structures.
      
      We need to track the space used, the amount of log reservation space
      reserved to write the CIL, the log items in the CIL and the busy
      extents that need to be completed by the CIL commit.  This requires
      a couple of per-cpu counters, an unordered per-cpu list and a
      globally ordered per-cpu list.
      
      Create a per-cpu structure to hold these and all the management
      interfaces needed, as well as the hooks to handle hotplug CPUs.
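As a rough userspace illustration of the structure described above, the sketch below keeps the two counters and the two list heads in one per-CPU record and folds the counters together on demand. All names (`sketch_cil_pcp`, `SKETCH_NR_CPUS`, the helpers) are hypothetical and are not taken from the kernel source; a plain array stands in for real per-CPU storage and hotplug handling is left out.

```c
#include <stdint.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical fixed CPU count for the sketch */

/* Hypothetical list node; kernel code would use its own list type. */
struct sketch_node {
	struct sketch_node *next;
};

/* One instance per CPU, so commits on different CPUs touch different memory. */
struct sketch_cil_pcp {
	int32_t			space_used;	/* CIL space consumed via this CPU */
	uint32_t		space_reserved;	/* log reservation held for the CIL */
	struct sketch_node	*busy_extents;	/* unordered list of busy extents */
	struct sketch_node	*log_items;	/* log items, which need a global order */
};

/* Record a commit against the structure belonging to the given CPU. */
static void sketch_cil_account(struct sketch_cil_pcp *pcp, int cpu,
			       int32_t used, uint32_t reserved)
{
	pcp[cpu].space_used += used;
	pcp[cpu].space_reserved += reserved;
}

/* Fold every CPU's space_used into one total and reset the per-CPU values. */
static int64_t sketch_cil_aggregate_used(struct sketch_cil_pcp *pcp)
{
	int64_t total = 0;

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		total += pcp[cpu].space_used;
		pcp[cpu].space_used = 0;
	}
	return total;
}
```

The point of the layout is that the hot path only touches the local CPU's record; the cost of summing is paid only when a global view is needed.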
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      
      af1c2146
  2. 24 Jun, 2022 2 commits
    • xfs: introduce xfs_inodegc_push() · 5e672cd6
      Dave Chinner committed
      The current blocking mechanism for pushing the inodegc queue out to
      disk can result in systems becoming unusable when there is a long
      running inodegc operation. This is because the statfs()
      implementation currently issues a blocking flush of the inodegc
      queue and a significant number of common system utilities will call
      statfs() to discover something about the underlying filesystem.
      
      This can result in userspace operations getting stuck on inodegc
      progress, and when trying to remove a heavily reflinked file on slow
      storage with a full journal, this can result in delays measuring in
      hours.
      
      Avoid this problem by adding a "push" function that expedites the
      flushing of the inodegc queue, but doesn't wait for it to complete.
      
      Convert xfs_fs_statfs() and xfs_qm_scall_getquota() to use this
      mechanism so they don't block but still ensure that queued
      operations are expedited.
      
      Fixes: ab23a776 ("xfs: per-cpu deferred inode inactivation queues")
      Reported-by: Chris Dunlop <chris@onthe.net.au>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      [djwong: fix _getquota_next to use _inodegc_push too]
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      5e672cd6
    • xfs: bound maximum wait time for inodegc work · 7cf2b0f9
      Dave Chinner committed
      Currently inodegc work can sit queued on the per-cpu queue until
      the workqueue is either flushed or the queue reaches a depth that
      triggers work queuing (and later throttling). This means that we
      could queue work that waits for a long time for some other event to
      trigger flushing.
      
      Hence instead of just queueing work at a specific depth, use a
      delayed work that queues the work at a bounded time. We can still
      schedule the work immediately at a given depth, but we no longer need
      to worry about leaving a number of items on the list that won't get
      processed until external events prevail.
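The policy described above can be sketched as a single decision: run immediately once the queue is deep enough, otherwise run after a bounded delay. This is only an illustration; the threshold, the delay value and the function name are assumptions made for the sketch, not values read from the patch.

```c
/* Hypothetical values chosen only for illustration. */
#define SKETCH_QUEUE_DEPTH_TRIGGER	32	/* depth at which work runs at once */
#define SKETCH_MAX_DELAY_TICKS		1	/* bound on how long work may sit */

/*
 * Pick the delay for the delayed work: zero once the queue is deep enough,
 * otherwise the bounded maximum, so queued items never depend on some
 * unrelated event to get processed.
 */
static unsigned long sketch_inodegc_delay(unsigned int queue_depth)
{
	if (queue_depth >= SKETCH_QUEUE_DEPTH_TRIGGER)
		return 0;
	return SKETCH_MAX_DELAY_TICKS;
}
```

A caller would pass the returned value as the delay when (re)arming the delayed work item for the current CPU.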
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      7cf2b0f9
  3. 27 May, 2022 1 commit
  4. 22 May, 2022 1 commit
  5. 18 Apr, 2022 1 commit
  6. 13 Apr, 2022 1 commit
  7. 12 Apr, 2022 1 commit
    • xfs: use a separate frextents counter for rt extent reservations · 2229276c
      Darrick J. Wong committed
      As mentioned in the previous commit, the kernel misuses sb_frextents in
      the incore mount to reflect both incore reservations made by running
      transactions as well as the actual count of free rt extents on disk.
      This results in the superblock being written to the log with an
      underestimate of the number of rt extents that are marked free in the
      rtbitmap.
      
      Teaching XFS to recompute frextents after log recovery avoids
      operational problems in the current mount, but it doesn't solve the
      problem of us writing undercounted frextents which are then recovered by
      an older kernel that doesn't have that fix.
      
      Create an incore percpu counter to mirror the ondisk frextents.  This
      new counter will track transaction reservations and the only time we
      will touch the incore super counter (i.e. the one that gets logged) is
      when those transactions commit updates to the rt bitmap.  This is in
      contrast to the lazysbcount counters (e.g. fdblocks), where we know that
      log recovery will always fix any incorrect counter that we log.
      As a bonus, we only take m_sb_lock at transaction commit time.
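A minimal sketch of the two-counter split described above, assuming plain integers in place of the percpu counter and hypothetical names throughout (`sketch_mount`, `sketch_reserve`, `sketch_commit`). Reservations touch only the incore counter; the logged counter moves only when a commit actually changes the rt bitmap.

```c
#include <stdbool.h>
#include <stdint.h>

struct sketch_mount {
	uint64_t sb_frextents;	/* logged counter: free rt extents per the bitmap */
	int64_t  m_frextents;	/* incore counter: free extents minus reservations */
};

/* Reserve rt extents for a transaction; only the incore counter changes. */
static bool sketch_reserve(struct sketch_mount *mp, int64_t count)
{
	if (mp->m_frextents < count)
		return false;
	mp->m_frextents -= count;
	return true;
}

/* Commit: 'used' extents were allocated in the rt bitmap, the rest return. */
static void sketch_commit(struct sketch_mount *mp, int64_t reserved,
			  int64_t used)
{
	mp->sb_frextents -= (uint64_t)used;	/* logged value follows the bitmap */
	mp->m_frextents += reserved - used;	/* give back the unused reservation */
}
```

With this split, a superblock logged while reservations are outstanding still carries the bitmap's real free count rather than an undercount.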
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      2229276c
  8. 28 Mar, 2022 1 commit
    • xfs: don't report reserved bnobt space as available · 85bcfa26
      Darrick J. Wong committed
      On a modern filesystem, we don't allow userspace to allocate blocks for
      data storage from the per-AG space reservations, the user-controlled
      reservation pool that prevents ENOSPC in the middle of internal
      operations, or the internal per-AG set-aside that prevents unwanted
      filesystem shutdowns due to ENOSPC during a bmap btree split.
      
      Since we now consider freespace btree blocks as unavailable for
      allocation for data storage, we shouldn't report those blocks via statfs
      either.  This makes the numbers that we return via the statfs f_bavail
      and f_bfree fields a more conservative estimate of actual free space.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      85bcfa26
  9. 10 Feb, 2022 1 commit
  10. 31 Jan, 2022 1 commit
  11. 22 Dec, 2021 1 commit
    • xfs: only run COW extent recovery when there are no live extents · 7993f1a4
      Darrick J. Wong committed
      As part of multiple customer escalations due to file data corruption
      after copy on write operations, I wrote some fstests that use fsstress
      to hammer on COW to shake things loose.  Regrettably, I caught some
      filesystem shutdowns due to incorrect rmap operations with the following
      loop:
      
      mount <filesystem>				# (0)
      fsstress <run only readonly ops> &		# (1)
      while true; do
      	fsstress <run all ops>
      	mount -o remount,ro			# (2)
      	fsstress <run only readonly ops>
      	mount -o remount,rw			# (3)
      done
      
      When (2) happens, notice that (1) is still running.  xfs_remount_ro will
      call xfs_blockgc_stop to walk the inode cache to free all the COW
      extents, but the blockgc mechanism races with (1)'s reader threads to
      take IOLOCKs and loses, which means that it doesn't clean them all out.
      Call such a file (A).
      
      When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
      walks the ondisk refcount btree and frees any COW extent that it finds.
      This function does not check the inode cache, which means that incore
      COW forks of inode (A) are now inconsistent with the ondisk metadata.  If
      one of those former COW extents is allocated and mapped into another
      file (B) and someone triggers a COW to the stale reservation in (A), A's
      dirty data will be written into (B) and once that's done, those blocks
      will be transferred to (A)'s data fork without bumping the refcount.
      
      The results are catastrophic -- file (B) and the refcount btree are now
      corrupt.  In the first patch, we fixed the race condition in (2) so that
      (A) will always flush the COW fork.  In this second patch, we move the
      _recover_cow call to the initial mount call in (0) for safety.
      
      As mentioned previously, xfs_reflink_recover_cow walks the refcount
      btree looking for COW staging extents, and frees them.  This was
      intended to be run at mount time (when we know there are no live inodes)
      to clean up any leftover staging extents that may have been left behind
      during an unclean shutdown.  As a time "optimization" for readonly
      mounts, we deferred this to the ro->rw transition, not realizing that
      any failure to clean all COW forks during a rw->ro transition would
      result in catastrophic corruption.
      
      Therefore, remove this optimization and only run the recovery routine
      when we're guaranteed not to have any COW staging extents anywhere,
      which means we always run this at mount time.  While we're at it, move
      the callsite to xfs_log_mount_finish because any refcount btree
      expansion (however unlikely given that we're removing records from the
      right side of the index) must be fed by a per-AG reservation, which
      doesn't exist in its current location.
      
      Fixes: 174edb0e ("xfs: store in-progress CoW allocations in the refcount btree")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      7993f1a4
  12. 08 Dec, 2021 1 commit
    • xfs: remove all COW fork extents when remounting readonly · 089558bc
      Darrick J. Wong committed
      As part of multiple customer escalations due to file data corruption
      after copy on write operations, I wrote some fstests that use fsstress
      to hammer on COW to shake things loose.  Regrettably, I caught some
      filesystem shutdowns due to incorrect rmap operations with the following
      loop:
      
      mount <filesystem>				# (0)
      fsstress <run only readonly ops> &		# (1)
      while true; do
      	fsstress <run all ops>
      	mount -o remount,ro			# (2)
      	fsstress <run only readonly ops>
      	mount -o remount,rw			# (3)
      done
      
      When (2) happens, notice that (1) is still running.  xfs_remount_ro will
      call xfs_blockgc_stop to walk the inode cache to free all the COW
      extents, but the blockgc mechanism races with (1)'s reader threads to
      take IOLOCKs and loses, which means that it doesn't clean them all out.
      Call such a file (A).
      
      When (3) happens, xfs_remount_rw calls xfs_reflink_recover_cow, which
      walks the ondisk refcount btree and frees any COW extent that it finds.
      This function does not check the inode cache, which means that incore
      COW forks of inode (A) are now inconsistent with the ondisk metadata.  If
      one of those former COW extents is allocated and mapped into another
      file (B) and someone triggers a COW to the stale reservation in (A), A's
      dirty data will be written into (B) and once that's done, those blocks
      will be transferred to (A)'s data fork without bumping the refcount.
      
      The results are catastrophic -- file (B) and the refcount btree are now
      corrupt.  Solve this race by forcing the xfs_blockgc_free_space to run
      synchronously, which causes xfs_icwalk to return to inodes that were
      skipped because the blockgc code couldn't take the IOLOCK.  This is safe
      to do here because the VFS has already prohibited new writer threads.
      
      Fixes: 10ddf64e ("xfs: remove leftover CoW reservations when remounting ro")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Chandan Babu R <chandan.babu@oracle.com>
      089558bc
  13. 05 Dec, 2021 3 commits
  14. 31 Oct, 2021 1 commit
  15. 23 Oct, 2021 3 commits
  16. 20 Oct, 2021 3 commits
  17. 27 Aug, 2021 2 commits
  18. 20 Aug, 2021 7 commits
    • xfs: introduce xfs_sb_is_v5 helper · d6837c1a
      Dave Chinner committed
      Rather than open coding XFS_SB_VERSION_NUM(sbp) == XFS_SB_VERSION_5
      checks everywhere, add a simple wrapper to encapsulate this and make
      the code easier to read.
      
      This allows us to remove the xfs_sb_version_has_v3inode() wrapper
      which is only used in xfs_format.h now and is just a version number
      check.
      
      There are a couple of places where we should be checking the mount
      feature bits rather than the superblock version (e.g. remount), so
      those are converted to use xfs_has_crc(mp) instead.
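For illustration, a wrapper of the kind described above might look like the following userspace sketch. The struct, the mask and the constant are simplified stand-ins with hypothetical `sketch_` names, and the assumption that the version number occupies the low four bits of the field is stated here rather than taken from this message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-ins; the real definitions live in the XFS headers. */
#define SKETCH_SB_VERSION_NUMBITS	0x000f	/* assumed: version in low 4 bits */
#define SKETCH_SB_VERSION_5		5

struct sketch_sb {
	uint16_t sb_versionnum;	/* version number plus feature flag bits */
};

/* Wrapper replacing open-coded "version number == 5" comparisons. */
static inline bool sketch_sb_is_v5(const struct sketch_sb *sbp)
{
	return (sbp->sb_versionnum & SKETCH_SB_VERSION_NUMBITS) ==
		SKETCH_SB_VERSION_5;
}
```

Call sites then read as `if (sketch_sb_is_v5(sbp))` instead of repeating the mask-and-compare expression.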
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      d6837c1a
    • xfs: convert xfs_sb_version_has checks to use mount features · ebd9027d
      Dave Chinner committed
      This is a conversion of the remaining xfs_sb_version_has..(sbp)
      checks to use xfs_has_..(mp) feature checks.
      
      This was largely done with a vim replacement macro that did:
      
      :0,$s/xfs_sb_version_has\(.*\)&\(.*\)->m_sb/xfs_has_\1\2/g<CR>
      
      A couple of other variants were also used, and the rest touched up
      by hand.
      
      $ size -t fs/xfs/built-in.a
      	   text    data     bss     dec     hex filename
      before	1127533  311352     484 1439369  15f689 (TOTALS)
      after	1125360  311352     484 1437196  15ee0c (TOTALS)
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      ebd9027d
    • xfs: replace XFS_FORCED_SHUTDOWN with xfs_is_shutdown · 75c8c50f
      Dave Chinner committed
      Remove the shouty macro and instead use the inline function that
      matches other state/feature check wrapper naming. This conversion
      was done with sed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      75c8c50f
    • xfs: convert remaining mount flags to state flags · 2e973b2c
      Dave Chinner committed
      The remaining mount flags kept in m_flags are actually runtime state
      flags. These change dynamically, so they really should be updated
      atomically so we don't potentially lose an update due to racing
      modifications.
      
      Convert these remaining flags to be stored in m_opstate and use
      atomic bitops to set and clear the flags. This also adds a couple of
      simple wrappers for common state checks - read only and shutdown.
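The idea of atomically updated state bits with small check wrappers can be sketched in userspace with C11 atomics standing in for the kernel's bitops. The bit numbers, struct and function names below are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers for two runtime states. */
#define SKETCH_OPSTATE_READONLY	0
#define SKETCH_OPSTATE_SHUTDOWN	1

struct sketch_mount_state {
	atomic_ulong m_opstate;	/* runtime state bits, only updated atomically */
};

/* Atomic read-modify-write, so concurrent updates cannot lose a bit. */
static void sketch_set_state(struct sketch_mount_state *mp, int bit)
{
	atomic_fetch_or(&mp->m_opstate, 1UL << bit);
}

static void sketch_clear_state(struct sketch_mount_state *mp, int bit)
{
	atomic_fetch_and(&mp->m_opstate, ~(1UL << bit));
}

/* Simple wrappers for the two common checks. */
static bool sketch_is_readonly(struct sketch_mount_state *mp)
{
	return atomic_load(&mp->m_opstate) & (1UL << SKETCH_OPSTATE_READONLY);
}

static bool sketch_is_shutdown(struct sketch_mount_state *mp)
{
	return atomic_load(&mp->m_opstate) & (1UL << SKETCH_OPSTATE_SHUTDOWN);
}
```

A non-atomic `flags |= bit` on a shared word is a load, modify, store sequence, which is where a racing update could be lost; the atomic operations close that window.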
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      2e973b2c
    • xfs: convert mount flags to features · 0560f31a
      Dave Chinner committed
      Replace m_flags feature checks with xfs_has_<feature>() calls and
      rework the setup code to set flags in m_features.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      0560f31a
    • xfs: replace xfs_sb_version checks with feature flag checks · 38c26bfd
      Dave Chinner committed
      Convert the xfs_sb_version_hasfoo() to checks against
      mp->m_features. Checks of the superblock itself during disk
      operations (e.g. in the read/write verifiers and the to/from disk
      formatters) are not converted - they operate purely on the
      superblock state. Everything else should use the mount features.
      
      Large parts of this conversion were done with sed with commands like
      this:
      
      for f in `git grep -l xfs_sb_version_has fs/xfs/*.c`; do
      	sed -i -e 's/xfs_sb_version_has\(.*\)(&\(.*\)->m_sb)/xfs_has_\1(\2)/' $f
      done
      
      With manual cleanups for things like "xfs_has_extflgbit" and other
      little inconsistencies in naming.
      
      The result is a lot less typing to check features and an XFS binary
      size reduced by a bit over 3kB:
      
      $ size -t fs/xfs/built-in.a
      	   text    data     bss     dec     hex filename
      before	1130866  311352     484 1442702  16038e (TOTALS)
      after	1127727  311352     484 1439563  15f74b (TOTALS)
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      38c26bfd
    • xfs: rework attr2 feature and mount options · e23b55d5
      Dave Chinner committed
      The attr2 feature is somewhat unique in that it has both a superblock
      feature bit to enable it and mount options to enable and disable it.
      
      Back when it was first introduced in 2005, attr2 was disabled unless
      either the attr2 superblock feature bit was set, or the attr2 mount
      option was set. If the superblock feature bit was not set but the
      mount option was set, then when the first attr2 format inode fork
      was created, it would set the superblock feature bit. This is as it
      should be - the superblock feature bit indicated the presence of the
      attr2 on disk format.
      
      The noattr2 mount option, however, did not affect the superblock
      feature bit. If noattr2 was specified, the on-disk superblock
      feature bit was ignored and the code always just created attr1
      format inode forks.  If neither the attr2 nor the noattr2 mount
      option was specified, then the behaviour was determined by the
      superblock feature bit.
      
      This was all pretty sane.
      
      Fast forward 3 years, and we are dealing with fallout from the
      botched sb_features2 addition and having to deal with feature
      mismatches between the sb_features2 and sb_bad_features2 fields. The
      attr2 feature bit was one of these flags. The reconciliation was
      done well after mount option parsing and, unfortunately, the feature
      reconciliation had a bug where it ignored the noattr2 mount option.
      
      For reasons lost to the mists of time, it was decided that resolving
      this issue in commit 7c12f296 ("[XFS] Fix up noattr2 so that it
      will properly update the versionnum and features2 fields.") required
      noattr2 to clear the superblock attr2 feature bit.  This greatly
      complicated the attr2 behaviour and broke rules about feature bits
      needing to be set when those specific features are present in the
      filesystem.
      
      By complicated, I mean that it introduced problems due to feature
      bit interactions with log recovery. All of the superblock feature
      bit checks are done prior to log recovery, but if we crash after
      removing a feature bit, then on the next mount we see the feature
      bit in the unrecovered superblock, only to have it go away after the
      log has been replayed.  This means our mount time feature processing
      could be all wrong.
      
      Hence you can mount with noattr2, crash shortly afterwards, and
      mount again without attr2 or noattr2 and still have attr2 enabled
      because the second mount sees attr2 still enabled in the superblock
      before recovery runs and removes the feature bit. It's just a mess.
      
      Further, this is all legacy code as the v5 format requires attr2 to
      be enabled at all times and it cannot be disabled.  i.e. the noattr2
      mount option returns an error when used on v5 format filesystems.
      
      To straighten this all out, this patch reverts the attr2/noattr2
      mount option behaviour back to the original behaviour. There is no
      reason for disabling attr2 these days, so we will only do this when
      the noattr2 mount option is set. This will not remove the superblock
      feature bit. The superblock bit will provide the default behaviour
      and only track whether attr2 is present on disk or not. The attr2
      mount option will enable the creation of attr2 format inode forks,
      and if the superblock feature bit is not set it will be added when
      the first attr2 inode fork is created.
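The restored behaviour can be summarised as a small decision function. The sketch below is illustrative only: the enum and function names are hypothetical, and it leaves out the v5 case where attr2 is always on and noattr2 is rejected.

```c
#include <stdbool.h>

enum sketch_attr_opt {
	SKETCH_OPT_NONE,	/* neither attr2 nor noattr2 was given */
	SKETCH_OPT_ATTR2,
	SKETCH_OPT_NOATTR2,
};

/* Decide whether newly created inode forks use the attr2 format. */
static bool sketch_use_attr2(bool sb_has_attr2, enum sketch_attr_opt opt)
{
	switch (opt) {
	case SKETCH_OPT_NOATTR2:
		return false;		/* option wins; superblock bit is left alone */
	case SKETCH_OPT_ATTR2:
		return true;		/* bit is added on the first attr2 fork */
	default:
		return sb_has_attr2;	/* superblock bit supplies the default */
	}
}
```

Because the superblock bit is never cleared in this scheme, the decision no longer depends on whether log recovery has run yet.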
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      e23b55d5
  19. 17 Aug, 2021 3 commits
    • xfs: move the CIL workqueue to the CIL · 33c0dd78
      Dave Chinner committed
      We only use the CIL workqueue in the CIL, so it makes no sense to
      hang it off the xfs_mount and have to walk multiple pointers back up
      to the mount when we have the CIL structures right there.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      33c0dd78
    • xfs: CIL work is serialised, not pipelined · 39823d0f
      Dave Chinner committed
      Because we use a single work structure attached to the CIL rather
      than the CIL context, we can only queue a single work item at a
      time. This results in the CIL being single threaded and limits
      performance when it becomes CPU bound.
      
      The design of the CIL is that it is pipelined and multiple commits
      can be running concurrently, but the way the work is currently
      implemented means that it is not pipelining as it was intended. The
      critical work to switch the CIL context can take a few milliseconds
      to run, but the rest of the CIL context flush can take hundreds of
      milliseconds to complete. The context switching is the serialisation
      point of the CIL; once the context has been switched, the rest of the
      context push can run asynchronously with all other context pushes.
      
      Hence we can move the work to the CIL context so that we can run
      multiple CIL pushes at the same time and spread the majority of
      the work out over multiple CPUs. We can keep the per-cpu CIL commit
      state on the CIL rather than the context, because the context is
      pinned to the CIL until the switch is done and we aggregate and
      drain the per-cpu state held on the CIL during the context switch.
      
      However, because we no longer serialise the CIL work, we can have
      effectively unlimited CIL pushes in progress. We don't want to do
      this - not only does it create contention on the iclogs and the
      state machine locks, we can run the log right out of space with
      outstanding pushes. Instead, limit the work concurrency to 4
      concurrent works being processed at a time. This is enough
      concurrency to remove the CIL from being a CPU bound bottleneck but
      not enough to create new contention points or unbound concurrency
      issues.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      39823d0f
    • xfs: convert log flags to an operational state field · e1d06e5f
      Dave Chinner committed
      log->l_flags doesn't actually contain "flags" as such, it contains
      operational state information that can change at runtime. For the
      shutdown state, this at least should be an atomic bit because
      it is read without holding locks in many places and so using atomic
      bitops for the state field modifications makes sense.
      
      This allows us to use things like test_and_set_bit() on state
      changes (e.g. setting XLOG_TAIL_WARN) to avoid races in setting the
      state when we aren't holding locks.
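A userspace approximation of the test-and-set pattern mentioned above: the first caller to set the bit sees the old value as clear and acts, while every later or racing caller sees it already set. C11 `atomic_fetch_or` stands in for the kernel's `test_and_set_bit()`; the bit number and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_TAIL_WARN	0	/* hypothetical bit number */

/* Stand-in for test_and_set_bit(): set the bit, return its old value. */
static bool sketch_test_and_set_bit(int bit, atomic_ulong *state)
{
	unsigned long mask = 1UL << bit;

	return atomic_fetch_or(state, mask) & mask;
}

/* Returns true only for the first caller, so the warning is emitted once. */
static bool sketch_should_warn(atomic_ulong *l_opstate)
{
	return !sketch_test_and_set_bit(SKETCH_TAIL_WARN, l_opstate);
}
```

The check and the update happen in one atomic step, so no lock is needed to guarantee that exactly one caller wins.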
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      e1d06e5f
  20. 10 Aug, 2021 3 commits
  21. 07 Aug, 2021 2 commits
    • xfs: per-cpu deferred inode inactivation queues · ab23a776
      Dave Chinner committed
      Move inode inactivation to background work contexts so that it no
      longer runs in the context that releases the final reference to an
      inode. This will allow process work that ends up blocking on
      inactivation to continue doing work while the filesystem processes
      the inactivation in the background.
      
      A typical demonstration of this is unlinking an inode with lots of
      extents. The extents are removed during inactivation, so this blocks
      the process that unlinked the inode from the directory structure. By
      moving the inactivation to the background process, the userspace
      application can keep working (e.g. unlinking the next inode in the
      directory) while the inactivation work on the previous inode is
      done by a different CPU.
      
      The implementation of the queue is relatively simple. We use a
      per-cpu lockless linked list (llist) to queue inodes for
      inactivation without requiring serialisation mechanisms, and a work
      item to allow the queue to be processed by a CPU bound worker
      thread. We also keep a count of the queue depth so that we can
      trigger work after a number of deferred inactivations have been
      queued.
      
      The use of a bound workqueue with a single work depth allows the
      workqueue to run one work item per CPU. We queue the work item on
      the CPU we are currently running on, and so this essentially gives
      us affine per-cpu worker threads for the per-cpu queues. This
      maintains the effective CPU affinity that occurs within XFS at the
      AG level due to all objects in a directory being local to an AG.
      Hence inactivation work tends to run on the same CPU that last
      accessed all the objects that inactivation accesses and this
      maintains hot CPU caches for unlink workloads.
      
      A depth of 32 inodes was chosen to match the number of inodes in an
      inode cluster buffer. This hopefully allows sequential
      allocation/unlink behaviours to defer inactivation of all the
      inodes in a single cluster buffer at a time, further helping
      maintain hot CPU and buffer cache accesses while running
      inactivations.
      
      A hard per-cpu queue throttle of 256 inodes has been set to avoid
      runaway queuing when inodes that take a long time to inactivate are
      being processed. For example, when unlinking inodes with large
      numbers of extents that can take a lot of processing to free.
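The two thresholds above (32 to trigger work, 256 as the hard throttle) suggest a three-way decision on each queue insertion. The sketch below is an assumption-laden illustration: the names are hypothetical, and whether the real comparisons are inclusive is not stated in the text.

```c
/* Thresholds taken from the text above. */
#define SKETCH_GC_QUEUE_DEPTH	32	/* inodes per inode cluster buffer */
#define SKETCH_GC_THROTTLE	256	/* hard per-cpu queue limit */

enum sketch_gc_action {
	SKETCH_GC_DEFER,		/* leave the inode on the per-cpu list */
	SKETCH_GC_QUEUE_WORK,		/* schedule the worker on this CPU */
	SKETCH_GC_THROTTLE_CALLER,	/* also make the caller wait for the worker */
};

/* Decide what to do after an inode has been added at the given depth. */
static enum sketch_gc_action sketch_gc_decide(unsigned int depth)
{
	if (depth >= SKETCH_GC_THROTTLE)
		return SKETCH_GC_THROTTLE_CALLER;
	if (depth >= SKETCH_GC_QUEUE_DEPTH)
		return SKETCH_GC_QUEUE_WORK;
	return SKETCH_GC_DEFER;
}
```

Checking the throttle first keeps the ordering unambiguous, since any depth past the hard limit is also past the work trigger.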
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      [djwong: tweak comments and tracepoints, convert opflags to state bits]
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      ab23a776
    • xfs: move xfs_inactive call to xfs_inode_mark_reclaimable · c6c2066d
      Darrick J. Wong committed
      Move the xfs_inactive call and all the other debugging checks and stats
      updates into xfs_inode_mark_reclaimable because most of those are
      implementation details about the inode cache.  This is preparation for
      deferred inactivation that is coming up.  We also move it around
      xfs_icache.c in preparation for deferred inactivation.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      c6c2066d