1. 08 Apr 2021, 1 commit
  2. 15 Mar 2021, 1 commit
    • xfs: force log and push AIL to clear pinned inodes when aborting mount · d336f7eb
      By Darrick J. Wong
      If we allocate quota inodes in the process of mounting a filesystem but
      then decide to abort the mount, it's possible that the quota inodes are
      sitting around pinned by the log.  Now that inode reclaim relies on the
      AIL to flush inodes, we have to force the log and push the AIL in
      between releasing the quota inodes and kicking off reclaim to tear down
      all the incore inodes.  Do this by extracting the bits we need from the
      unmount path and reusing them.  As an added bonus, failed writes during
      a failed mount will not retry forever now.
      
      This was originally found during a fuzz test of metadata directories
      (xfs/1546), but the actual symptom was that reclaim hung up on the quota
      inodes.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      d336f7eb
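The ordering the patch establishes can be sketched in user-space C. This is a minimal stand-in, not the kernel code: the helpers are stubs that only record call order, with comments naming the real functions (xfs_log_force, xfs_ail_push_all_sync, xfs_reclaim_inodes) they correspond to.

```c
#include <assert.h>
#include <string.h>

/*
 * Sketch of the failed-mount teardown ordering: release the quota
 * inodes, force the log, push the AIL until everything is unpinned,
 * and only then run inode reclaim.  Stubs record call order.
 */
static char order[5];
static int step;

static void release_quota_inodes(void) { order[step++] = 'q'; }
static void force_log(void)            { order[step++] = 'f'; } /* xfs_log_force(mp, XFS_LOG_SYNC) */
static void push_ail_and_wait(void)    { order[step++] = 'p'; } /* xfs_ail_push_all_sync(mp->m_ail) */
static void reclaim_inodes(void)       { order[step++] = 'r'; } /* xfs_reclaim_inodes(mp) */

/* Abort path: nothing may still be pinned by the log when reclaim runs. */
static void abort_mount(void)
{
    release_quota_inodes();
    force_log();
    push_ail_and_wait();
    reclaim_inodes();
}
```

The point of the ordering is the middle two steps: reclaim now depends on the AIL to flush inodes, so the AIL push must complete before reclaim tears down the incore inodes.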
  3. 04 Feb 2021, 2 commits
  4. 23 Jan 2021, 4 commits
  5. 19 Nov 2020, 1 commit
  6. 16 Sep 2020, 1 commit
  7. 07 Sep 2020, 2 commits
  8. 07 Jul 2020, 2 commits
    • xfs: remove SYNC_WAIT from xfs_reclaim_inodes() · 4d0bab3a
      By Dave Chinner
      Clean up xfs_reclaim_inodes() callers. Most callers want blocking
      behaviour, so just make the existing SYNC_WAIT behaviour the
      default.
      
      For the xfs_reclaim_worker(), just call xfs_reclaim_inodes_ag()
      directly because we just want optimistic clean inode reclaim to be
      done in the background.
      
      For xfs_quiesce_attr() we can just remove the inode reclaim calls as
      they are a historic relic that was required to flush dirty inodes
      that contained unlogged changes. We now log all changes to the
      inodes, so the sync AIL push from xfs_log_quiesce() called by
      xfs_quiesce_attr() will do all the required inode writeback for
      freeze.
      
      Seeing as we now want to loop until all reclaimable inodes have been
      reclaimed, make xfs_reclaim_inodes() loop on the XFS_ICI_RECLAIM_TAG
      tag rather than having xfs_reclaim_inodes_ag() tell it that inodes
      were skipped. This is much more reliable and will always loop until
      all reclaimable inodes are reclaimed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      4d0bab3a
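The looping structure described in the last paragraph can be sketched as a small user-space C model. This is a toy, assuming a flat array of tagged inodes in place of the kernel's radix-tree XFS_ICI_RECLAIM_TAG lookups; the point is that the caller loops while the tag count is nonzero rather than relying on a "skipped" return value from the per-AG walk.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model: reclaim loops on the reclaim tag itself, so inodes
 * skipped on one pass are guaranteed to be picked up by a later one.
 */
#define NINODES 8

static bool reclaim_tagged[NINODES];

static int count_tagged(void)
{
    int n = 0;
    for (int i = 0; i < NINODES; i++)
        if (reclaim_tagged[i])
            n++;
    return n;
}

/* One pass over the "AG": may legitimately skip still-busy inodes. */
static void reclaim_pass(bool skip_busy)
{
    for (int i = 0; i < NINODES; i++) {
        if (!reclaim_tagged[i])
            continue;
        if (skip_busy && (i % 3 == 0))
            continue;               /* simulate a busy, skipped inode */
        reclaim_tagged[i] = false;  /* reclaimed */
    }
}

/* Blocking reclaim: loop on the tag count, not on a "skipped" flag. */
static void reclaim_all(void)
{
    bool first = true;
    while (count_tagged() > 0) {
        reclaim_pass(first);        /* later passes catch earlier skips */
        first = false;
    }
}
```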
    • xfs: allow multiple reclaimers per AG · 0e8e2c63
      By Dave Chinner
      Inode reclaim will still throttle direct reclaim on the per-ag
      reclaim locks. This is no longer necessary as reclaim can run
      non-blocking now. Hence we can remove these locks so that we don't
      arbitrarily block reclaimers just because there are more direct
      reclaimers than there are AGs.
      
      This can result in multiple reclaimers working on the same range of
      an AG, but this doesn't cause any apparent issues. Optimising the
      spread of concurrent reclaimers for best efficiency can be done in a
      future patchset.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      0e8e2c63
  9. 27 May 2020, 1 commit
    • xfs: reduce free inode accounting overhead · f18c9a90
      By Dave Chinner
      Shaokun Zhang reported that XFS was using substantial CPU time in
      percpu_count_sum() when running a single threaded benchmark on
      a high CPU count (128p) machine from xfs_mod_ifree(). The issue
      is that the filesystem is empty when the benchmark runs, so inode
      allocation is running with a very low inode free count.
      
      With the percpu counter batching, this means comparisons when the
      counter is less than 128 * 256 = 32768 use the slow path of adding
      up all the counters across the CPUs, and this is expensive on high
      CPU count machines.
      
      The summing in xfs_mod_ifree() is only used to fire an assert if an
      underrun occurs. The error is ignored by the higher level code.
      Hence this is really just debug code and we don't need to run it
      on production kernels, nor do we need such debug checks to return
      error values just to trigger an assert.
      
      Finally, xfs_mod_icount/xfs_mod_ifree are only called from
      xfs_trans_unreserve_and_mod_sb(), so get rid of them and just
      directly call the percpu_counter_add/percpu_counter_compare
      functions. The compare functions are now run only on debug builds as
      they are internal to ASSERT() checks and so only compiled in when
      ASSERTs are active (CONFIG_XFS_DEBUG=y or CONFIG_XFS_WARN=y).
      Reported-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      f18c9a90
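The batching cost described above can be shown with a simplified user-space model of a percpu counter. This is a sketch, not the kernel implementation: each CPU's uncollapsed delta is bounded by the batch size, so the global count can drift by at most batch * num_cpus, and any comparison closer than that must fall back to summing every per-CPU delta.

```c
#include <assert.h>

/*
 * Simplified percpu counter: the global count is approximate, so a
 * comparison within BATCH * NCPUS of the target must sum all per-CPU
 * deltas.  With BATCH=256 and NCPUS=128 that slow path fires whenever
 * the counter is within 128 * 256 = 32768 of the compared value --
 * which is always the case on a nearly empty filesystem.
 */
#define NCPUS 128
#define BATCH 256

struct pc {
    long count;            /* approximate global value */
    long delta[NCPUS];     /* per-CPU contributions not yet folded in */
};

static long pc_sum(const struct pc *c)      /* slow path: O(NCPUS) */
{
    long s = c->count;
    for (int i = 0; i < NCPUS; i++)
        s += c->delta[i];
    return s;
}

/* Returns -1, 0 or 1; sets *slow when the exact sum was needed. */
static int pc_compare(const struct pc *c, long rhs, int *slow)
{
    long err = (long)BATCH * NCPUS;         /* max drift of c->count */

    *slow = 0;
    if (c->count > rhs + err)
        return 1;
    if (c->count < rhs - err)
        return -1;
    *slow = 1;                              /* too close to call */
    long s = pc_sum(c);
    return (s > rhs) - (s < rhs);
}
```

This is why confining the compare to ASSERT-only debug builds removes the overhead: on production kernels the comparison, and hence the slow-path summing, never runs.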
  10. 05 May 2020, 1 commit
  11. 12 Mar 2020, 1 commit
  12. 19 Dec 2019, 2 commits
    • xfs: don't commit sunit/swidth updates to disk if that would cause repair failures · 13eaec4b
      By Darrick J. Wong
      Alex Lyakas reported[1] that mounting an xfs filesystem with new sunit
      and swidth values could cause xfs_repair to fail loudly.  The problem
      here is that repair calculates where mkfs should have allocated the
      root inode, based on the superblock geometry.  The allocation decisions
      depend on sunit, which means that we really can't go updating sunit if
      it would lead to a subsequent repair failure on an otherwise correct
      filesystem.
      
      Port from xfs_repair some code that computes the location of the root
      inode and teach mount to skip the ondisk update if it would cause
      problems for repair.  Along the way we'll update the documentation,
      provide a function for computing the minimum AGFL size instead of
      open-coding it, and cut down some indenting in the mount code.
      
      Note that we allow the mount to proceed (and new allocations will
      reflect this new geometry) because we've never screened this kind of
      thing before.  We'll have to wait for a new future incompat feature to
      enforce correct behavior, alas.
      
      Note that the geometry reporting always uses the superblock values, not
      the incore ones, so that is what xfs_info and xfs_growfs will report.
      
      [1] https://lore.kernel.org/linux-xfs/20191125130744.GA44777@bfoster/T/#m00f9594b511e076e2fcdd489d78bc30216d72a7d
      Reported-by: Alex Lyakas <alex@zadara.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      13eaec4b
    • xfs: split the sunit parameter update into two parts · 4f5b1b3a
      By Darrick J. Wong
      If the administrator provided a sunit= mount option, we need to validate
      the raw parameter, convert the mount option units (512b blocks) into the
      internal unit (fs blocks), and then validate that the (now cooked)
      parameter doesn't screw anything up on disk.  The incore inode geometry
      computation can depend on the new sunit option, but a subsequent patch
      will make validating the cooked value depend on the computed inode
      geometry, so break the sunit update into two steps.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      4f5b1b3a
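The two-step split can be sketched in C. This is illustrative only: the function names, limits, and the AG-size check are hypothetical stand-ins, not the kernel's exact validation, but the shape matches the commit: step one validates and converts the raw 512-byte-unit option, step two validates the cooked value once the geometry is known.

```c
#include <assert.h>

/*
 * Step 1: validate the raw sunit= mount option (512-byte units) and
 * cook it into filesystem blocks.  Step 2 runs later, once geometry
 * is computed.  Limits here are illustrative.
 */
#define BBSHIFT 9   /* 512-byte basic blocks */

/* Raw option (512b units) -> fs blocks, or -1 if invalid. */
static int sunit_cook(int sunit_bb, int blocklog)
{
    if (sunit_bb < 0)
        return -1;
    long bytes = (long)sunit_bb << BBSHIFT;
    if (bytes % (1L << blocklog))
        return -1;              /* must be whole fs blocks */
    return (int)(bytes >> blocklog);
}

/* Step 2: check the cooked value against the computed geometry. */
static int sunit_check(int sunit_fsb, int agblocks)
{
    if (sunit_fsb >= agblocks)
        return -1;              /* stripe unit cannot exceed an AG */
    return 0;
}
```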
  13. 14 Nov 2019, 1 commit
  14. 06 Nov 2019, 1 commit
  15. 30 Oct 2019, 4 commits
  16. 06 Sep 2019, 1 commit
  17. 27 Aug 2019, 1 commit
  18. 29 Jun 2019, 1 commit
  19. 12 Jun 2019, 4 commits
  20. 27 Apr 2019, 2 commits
  21. 15 Apr 2019, 3 commits
  22. 12 Feb 2019, 1 commit
    • xfs: cache unlinked pointers in an rhashtable · 9b247179
      By Darrick J. Wong
      Use a rhashtable to cache the unlinked list incore.  This should speed
      up unlinked processing considerably when there are a lot of inodes on
      the unlinked list because iunlink_remove no longer has to traverse an
      entire bucket list to find which inode points to the one being removed.
      
      The incore list structure records "X.next_unlinked = Y" relations, with
      the rhashtable using Y to index the records.  This makes finding the
      inode X that points to an inode Y very quick.  If our cache fails to find
      anything we can always fall back on the old method.
      
      FWIW this drastically reduces the amount of time it takes to remove
      inodes from the unlinked list.  I wrote a program to open a lot of
      O_TMPFILE files and then close them in the same order, which takes
      a very long time if we have to traverse the unlinked lists.  With the
      patch, I see:
      
      + /d/t/tmpfile/tmpfile
      Opened 193531 files in 6.33s.
      Closed 193531 files in 5.86s
      
      real    0m12.192s
      user    0m0.064s
      sys     0m11.619s
      + cd /
      + umount /mnt
      
      real    0m0.050s
      user    0m0.004s
      sys     0m0.030s
      
      And without the patch:
      
      + /d/t/tmpfile/tmpfile
      Opened 193588 files in 6.35s.
      Closed 193588 files in 751.61s
      
      real    12m38.853s
      user    0m0.084s
      sys     12m34.470s
      + cd /
      + umount /mnt
      
      real    0m0.086s
      user    0m0.000s
      sys     0m0.060s
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      9b247179
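The backref cache can be modeled in user-space C. The kernel uses an rhashtable; the toy open-addressed table below is a stand-in. The idea is the same: the on-disk unlinked list is singly linked through next_unlinked, so removing inode Y normally means walking the bucket to find the X with X.next_unlinked == Y; caching each "X.next_unlinked = Y" relation keyed by Y makes that predecessor lookup O(1).

```c
#include <assert.h>

/*
 * Toy backref cache: for each cached relation X.next_unlinked == Y,
 * store X keyed by Y, so iunlink_remove can find Y's predecessor
 * without walking the whole bucket list.
 */
#define TABSZ 64
#define NULLINO 0u

static unsigned pred_key[TABSZ];   /* Y (0 = empty slot) */
static unsigned pred_val[TABSZ];   /* X with X.next_unlinked == Y */

static void backref_insert(unsigned x, unsigned y)
{
    unsigned i = y % TABSZ;
    while (pred_key[i])
        i = (i + 1) % TABSZ;       /* linear probe; table kept sparse */
    pred_key[i] = y;
    pred_val[i] = x;
}

/* Find X with X.next_unlinked == y, or NULLINO if not cached. */
static unsigned backref_lookup(unsigned y)
{
    for (unsigned i = y % TABSZ, n = 0; n < TABSZ; i = (i + 1) % TABSZ, n++) {
        if (pred_key[i] == y)
            return pred_val[i];
        if (!pred_key[i])
            break;                 /* hit an empty slot: not present */
    }
    return NULLINO;
}
```

A cache miss is harmless, matching the commit's fallback: if the lookup returns nothing, the caller can still walk the bucket list the old way.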
  23. 13 Dec 2018, 2 commits