  1. 19 August 2021, 1 commit
  2. 10 August 2021, 7 commits
    • xfs: throttle inode inactivation queuing on memory reclaim · 40b1de00
      Committed by Darrick J. Wong
      Now that we defer inode inactivation, we've decoupled the process of
      unlinking or closing an inode from the process of inactivating it.  In
      theory this should lead to better throughput since we now inactivate the
      queued inodes in batches instead of one at a time.
      
      Unfortunately, one of the primary risks with this decoupling is the loss
      of rate control feedback between the frontend and background threads.
      In other words, an rm -rf /* thread can run the system out of memory if
      it can queue inodes for inactivation and jump to a new CPU faster than
      the background threads can actually clear the deferred work.  The
      workers can get scheduled off the CPU if they have to do IO, etc.
      
      To solve this problem, we configure a shrinker so that it will activate
      the /second/ time the shrinkers are called.  The custom shrinker will
      queue all percpu deferred inactivation workers immediately and set a
      flag to force frontend callers who are releasing a vfs inode to wait for
      the inactivation workers.
      
      On my test VM with 560M of RAM and a 2TB filesystem, this seems to solve
      most of the OOMing problem when deleting 10 million inodes.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      40b1de00
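      A minimal user-space sketch of the throttle described above, assuming
      nothing beyond what the commit message says: the callback ignores the
      first invocation, and on the second it kicks every deferred-inactivation
      worker and flags the frontend to wait. It does not model the real
      struct shrinker wiring; all names are illustrative.

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      static atomic_uint shrink_calls;        /* how often reclaim has poked us */
      static atomic_bool throttle_frontend;   /* frontend must wait for workers */

      static void kick_all_inactivation_workers(void)
      {
              /* stand-in for queueing every per-CPU deferred-inactivation worker */
              puts("queueing all deferred inactivation workers");
      }

      /* called from the (simulated) memory-reclaim path */
      static unsigned long inodegc_shrink(void)
      {
              /* ignore the first call; sustained pressure shows up as repeats */
              if (atomic_fetch_add(&shrink_calls, 1) + 1 < 2)
                      return 0;

              atomic_store(&throttle_frontend, true);
              kick_all_inactivation_workers();
              return 1;                       /* report that work was done */
      }

      int main(void)
      {
              inodegc_shrink();               /* first call: no-op */
              inodegc_shrink();               /* second call: throttle kicks in */
              printf("frontend throttled: %d\n", atomic_load(&throttle_frontend));
              return 0;
      }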
    • xfs: use background worker pool when transactions can't get free space · e8d04c2a
      Committed by Darrick J. Wong
      In xfs_trans_alloc, if the block reservation call returns ENOSPC, we
      call xfs_blockgc_free_space with a NULL icwalk structure to try to free
      space.  Each frontend thread that encounters this situation starts its
      own walk of the inode cache to see if it can find anything, which is
      wasteful since we don't have any additional selection criteria.  For
      this one common case, create a function that reschedules all pending
      background work immediately and flushes the workqueue so that the scan
      can run in parallel.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      e8d04c2a
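      The control flow the commit describes is roughly "reserve, and on ENOSPC
      kick the shared background scan once before retrying". A hedged sketch
      in plain C; the helper names are made up and only stand in for the real
      routines called from xfs_trans_alloc.

      #include <errno.h>
      #include <stdbool.h>
      #include <stdio.h>

      static bool reclaim_ran;        /* simulates background work having run */

      static int reserve_blocks(unsigned long nblocks)
      {
              (void)nblocks;
              /* pretend the first attempt fails for lack of free space */
              return reclaim_ran ? 0 : -ENOSPC;
      }

      static void flush_background_space_reclaim(void)
      {
              /* stand-in for rescheduling all pending block-gc work and
               * flushing the shared workqueue so the scans run in parallel */
              puts("flushing background space reclaim");
              reclaim_ran = true;
      }

      static int trans_alloc(unsigned long nblocks)
      {
              int error = reserve_blocks(nblocks);

              if (error == -ENOSPC) {
                      flush_background_space_reclaim();
                      error = reserve_blocks(nblocks);        /* one retry */
              }
              return error;
      }

      int main(void)
      {
              printf("trans_alloc -> %d\n", trans_alloc(128));
              return 0;
      }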
    • xfs: don't run speculative preallocation gc when fs is frozen · 6f649091
      Committed by Darrick J. Wong
      Now that we have the infrastructure to switch background workers on and
      off at will, fix the block gc worker code so that we don't actually run
      the worker when the filesystem is frozen, same as we do for deferred
      inactivation.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      6f649091
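      The change amounts to a guard in the worker. A tiny sketch of that
      shape, with a simulated frozen flag and illustrative names:

      #include <stdbool.h>
      #include <stdio.h>

      static bool fs_frozen;

      static void requeue_blockgc_later(void) { puts("requeue block gc"); }
      static void run_blockgc_scan(void)      { puts("running block gc scan"); }

      static void blockgc_worker(void)
      {
              if (fs_frozen) {        /* frozen: do nothing now, try later */
                      requeue_blockgc_later();
                      return;
              }
              run_blockgc_scan();
      }

      int main(void)
      {
              fs_frozen = true;
              blockgc_worker();       /* only requeues */
              fs_frozen = false;
              blockgc_worker();       /* actually runs the scan */
              return 0;
      }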
    • xfs: inactivate inodes any time we try to free speculative preallocations · 2eb66502
      Committed by Darrick J. Wong
      Other parts of XFS have learned to call xfs_blockgc_free_{space,quota}
      to try to free speculative preallocations when space is tight.  This
      means that file writes, transaction reservation failures, quota limit
      enforcement, and the EOFBLOCKS ioctl all call this function to free
      space when things are tight.
      
      Since inode inactivation is now a background task, this means that the
      filesystem can be hanging on to unlinked but not yet freed space.  Add
      this to the list of things that xfs_blockgc_free_* makes writer threads
      scan for when they cannot reserve space.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      2eb66502
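      A sketch of the resulting shape of the space-scavenging path, assuming
      only what the message states: the existing preallocation trimming gains
      a step that also pushes the deferred-inactivation queue. The helper
      names and block counts are hypothetical.

      #include <stdio.h>

      static unsigned long free_speculative_preallocations(void)
      {
              puts("trimming speculative post-EOF preallocations");
              return 64;              /* pretend this many blocks came back */
      }

      static unsigned long flush_deferred_inactivation(void)
      {
              puts("flushing deferred inode inactivation");
              return 32;              /* pretend this many blocks came back */
      }

      /* rough shape of a "free space because a writer is stuck" scan */
      static unsigned long blockgc_free_space(void)
      {
              unsigned long freed = 0;

              freed += free_speculative_preallocations();
              freed += flush_deferred_inactivation();   /* the step this adds */
              return freed;
      }

      int main(void)
      {
              printf("freed %lu blocks\n", blockgc_free_space());
              return 0;
      }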
    • xfs: queue inactivation immediately when free realtime extents are tight · 65f03d86
      Committed by Darrick J. Wong
      Now that we have made the inactivation of unlinked inodes a background
      task to increase the throughput of file deletions, we need to be a
      little more careful about how long of a delay we can tolerate.
      
      Similar to the patch doing this for free space on the data device, if
      the file being inactivated is a realtime file and the realtime volume is
      running low on free extents, we want to run the worker ASAP so that the
      realtime allocator can make better decisions.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      65f03d86
    • xfs: queue inactivation immediately when quota is nearing enforcement · 108523b8
      Committed by Darrick J. Wong
      Now that we have made the inactivation of unlinked inodes a background
      task to increase the throughput of file deletions, we need to be a
      little more careful about how long of a delay we can tolerate.
      
      Specifically, if the dquots attached to the inode being inactivated are
      nearing any kind of enforcement boundary, we want to queue that
      inactivation work immediately so that users don't get EDQUOT/ENOSPC
      errors even after they deleted a bunch of files to stay within quota.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      108523b8
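      One way to picture the decision is a predicate over the dquot counters
      that selects a zero queueing delay near a limit. A hedged sketch; the
      1/16 margin, the field names, and the delay values are invented for
      illustration and are not taken from XFS.

      #include <stdbool.h>
      #include <stdio.h>

      struct dquot_counts {
              unsigned long long used;
              unsigned long long softlimit;   /* 0 means "no limit" */
      };

      /* true once usage is within ~1/16th of the soft limit (made-up margin) */
      static bool dquot_near_enforcement(const struct dquot_counts *dq)
      {
              if (!dq->softlimit)
                      return false;
              return dq->used + dq->softlimit / 16 >= dq->softlimit;
      }

      /* 0 means "queue the inactivation worker right now" */
      static unsigned int inactivation_delay_ms(const struct dquot_counts *dq)
      {
              return dquot_near_enforcement(dq) ? 0 : 100;
      }

      int main(void)
      {
              struct dquot_counts dq = { .used = 950, .softlimit = 1000 };

              printf("delay = %u ms\n", inactivation_delay_ms(&dq));
              return 0;
      }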
    • xfs: queue inactivation immediately when free space is tight · 7d6f07d2
      Committed by Darrick J. Wong
      Now that we have made the inactivation of unlinked inodes a background
      task to increase the throughput of file deletions, we need to be a
      little more careful about how long of a delay we can tolerate.
      
      On a mostly empty filesystem, the risk of the allocator making poor
      decisions due to fragmentation of the free space on account of a lengthy
      delay in background updates is minimal because there's plenty of space.
      However, if free space is tight, we want to deallocate unlinked inodes
      as quickly as possible to avoid fallocate ENOSPC and to give the
      allocator the best shot at optimal allocations for new writes.
      
      Therefore, queue the percpu worker immediately if the filesystem is more
      than 95% full.  This follows the same principle that XFS becomes less
      aggressive about speculative allocations and lazy cleanup (and more
      precise about accounting) when nearing full.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      7d6f07d2
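      A sketch of the threshold policy as stated: pick an immediate (zero)
      delay once free space drops to roughly 5% of the device. The counters
      and delay values are illustrative, not the real XFS fields.

      #include <stdio.h>

      /* 0 means "run the per-CPU inactivation worker immediately" */
      static unsigned int inodegc_delay_ms(unsigned long long free_blocks,
                                           unsigned long long total_blocks)
      {
              /* free space at or below ~5% of the device: don't wait */
              if (free_blocks * 20 <= total_blocks)
                      return 0;
              return 100;             /* plenty of space: keep batching */
      }

      int main(void)
      {
              printf("%u\n", inodegc_delay_ms(40, 1000));   /* 4%  free -> 0 */
              printf("%u\n", inodegc_delay_ms(300, 1000));  /* 30% free -> 100 */
              return 0;
      }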
  3. 07 August 2021, 4 commits
    • xfs: per-cpu deferred inode inactivation queues · ab23a776
      Committed by Dave Chinner
      Move inode inactivation to background work contexts so that it no
      longer runs in the context that releases the final reference to an
      inode. This allows a process that would otherwise block on
      inactivation to keep doing work while the filesystem processes
      the inactivation in the background.
      
      A typical demonstration of this is unlinking an inode with lots of
      extents. The extents are removed during inactivation, so this blocks
      the process that unlinked the inode from the directory structure. By
      moving the inactivation to the background process, the userspace
      application can keep working (e.g. unlinking the next inode in the
      directory) while the inactivation work on the previous inode is
      done by a different CPU.
      
      The implementation of the queue is relatively simple. We use a
      per-cpu lockless linked list (llist) to queue inodes for
      inactivation without requiring serialisation mechanisms, and a work
      item to allow the queue to be processed by a CPU bound worker
      thread. We also keep a count of the queue depth so that we can
      trigger work after a number of deferred inactivations have been
      queued.
      
      The use of a bound workqueue with a single work depth allows the
      workqueue to run one work item per CPU. We queue the work item on
      the CPU we are currently running on, and so this essentially gives
      us affine per-cpu worker threads for the per-cpu queues. This
      maintains the effective CPU affinity that occurs within XFS at the
      AG level due to all objects in a directory being local to an AG.
      Hence inactivation work tends to run on the same CPU that last
      accessed all the objects that inactivation accesses and this
      maintains hot CPU caches for unlink workloads.
      
      A depth of 32 inodes was chosen to match the number of inodes in an
      inode cluster buffer. This hopefully allows sequential
      allocation/unlink behaviours to defer inactivation of all the
      inodes in a single cluster buffer at a time, further helping
      maintain hot CPU and buffer cache accesses while running
      inactivations.
      
      A hard per-cpu queue throttle of 256 inodes has been set to avoid
      runaway queuing when inodes that take a long time to inactivate are
      being processed. For example, when unlinking inodes with large
      numbers of extents that can take a lot of processing to free.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      [djwong: tweak comments and tracepoints, convert opflags to state bits]
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      ab23a776
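      A user-space sketch of the queueing mechanism the message describes: a
      lock-free push onto a list (the kernel uses a per-cpu llist), a depth
      counter, and a worker kick once a batch of 32 is pending. The 256-inode
      hard throttle and the bound workqueue are left out; all names are
      illustrative.

      #include <stdatomic.h>
      #include <stdio.h>

      #define INODEGC_BATCH 32        /* matches the inodes-per-cluster batch */

      struct deferred_inode {
              struct deferred_inode *next;
              unsigned long ino;
      };

      struct inodegc_queue {
              _Atomic(struct deferred_inode *) head;  /* lockless push list */
              atomic_uint depth;                      /* queued, unprocessed */
      };

      static void queue_work(struct inodegc_queue *q)
      {
              /* stand-in for kicking the CPU-bound worker for this queue */
              printf("worker kicked at depth %u\n", atomic_load(&q->depth));
      }

      /* push from the frontend that just dropped the last inode reference */
      static void defer_inactivation(struct inodegc_queue *q,
                                     struct deferred_inode *di)
      {
              struct deferred_inode *old = atomic_load(&q->head);

              do {
                      di->next = old;         /* retried if the CAS fails */
              } while (!atomic_compare_exchange_weak(&q->head, &old, di));

              /* kick the worker whenever at least a full batch is waiting */
              if (atomic_fetch_add(&q->depth, 1) + 1 >= INODEGC_BATCH)
                      queue_work(q);
      }

      int main(void)
      {
              static struct deferred_inode inodes[40];
              struct inodegc_queue q = { 0 };

              for (unsigned long i = 0; i < 40; i++) {
                      inodes[i].ino = i;
                      defer_inactivation(&q, &inodes[i]);
              }
              return 0;
      }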
    • xfs: detach dquots from inode if we don't need to inactivate it · 62af7d54
      Committed by Darrick J. Wong
      If we don't need to inactivate an inode, we can detach the dquots and
      move on to reclamation.  This isn't strictly required here; it's a
      preparation patch for deferred inactivation per reviewer request[1] to
      move the creation of xfs_inode_needs_inactivation into a separate
      change.  Eventually this !need_inactive chunk will turn into the code
      path for inodes that skip xfs_inactive and go straight to memory
      reclaim.
      
      [1] https://lore.kernel.org/linux-xfs/20210609012838.GW2945738@locust/T/#mca6d958521cb88bbc1bfe1a30767203328d410b5
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      62af7d54
    • xfs: move xfs_inactive call to xfs_inode_mark_reclaimable · c6c2066d
      Committed by Darrick J. Wong
      Move the xfs_inactive call and all the other debugging checks and stats
      updates into xfs_inode_mark_reclaimable, because most of that is an
      implementation detail of the inode cache.  We also move the code around
      within xfs_icache.c.  Both moves are preparation for the deferred
      inactivation that is coming up.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      c6c2066d
    • xfs: remove xfs_dqrele_all_inodes · 777eb1fa
      Committed by Christoph Hellwig
      xfs_dqrele_all_inodes is unused now, remove it and all supporting code.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      777eb1fa
  4. 22 June 2021, 3 commits
  5. 09 June 2021, 4 commits
    • xfs: rename struct xfs_eofblocks to xfs_icwalk · b26b2bf1
      Committed by Darrick J. Wong
      The xfs_eofblocks structure is no longer well-named -- nowadays it
      provides optional filtering criteria to any walk of the incore inode
      cache.  Only one of the cache walk goals has anything to do with
      clearing of speculative post-EOF preallocations, so change the name to
      be more appropriate.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      b26b2bf1
    • xfs: selectively keep sick inodes in memory · 9492750a
      Committed by Darrick J. Wong
      It's important that the filesystem retain its memory of sick inodes for
      a little while after problems are found so that reports can be collected
      about what was wrong.  Don't let inode reclamation free sick inodes
      unless we're unmounting or the fs already went down.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      9492750a
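      The retention rule reduces to a small predicate consulted during
      reclaim. A sketch under the stated assumptions; the flag names are
      invented.

      #include <stdbool.h>
      #include <stdio.h>

      struct inode_state {
              bool sick;              /* corruption recorded on this inode */
              bool fs_unmounting;
              bool fs_shutdown;
      };

      static bool may_reclaim_inode(const struct inode_state *ip)
      {
              /* keep sick inodes around so their health report can be read */
              if (ip->sick && !ip->fs_unmounting && !ip->fs_shutdown)
                      return false;
              return true;
      }

      int main(void)
      {
              struct inode_state ip = { .sick = true };

              printf("reclaim while mounted: %d\n", may_reclaim_inode(&ip));
              ip.fs_unmounting = true;
              printf("reclaim at unmount:    %d\n", may_reclaim_inode(&ip));
              return 0;
      }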
    • xfs: change the prefix of XFS_EOF_FLAGS_* to XFS_ICWALK_FLAG_ · 2d53f66b
      Committed by Darrick J. Wong
      In preparation for renaming struct xfs_eofblocks to struct xfs_icwalk,
      change the prefix of the existing XFS_EOF_FLAGS_* flags to
      XFS_ICWALK_FLAG_ and convert all the existing users.  This adds a degree
      of interface separation between the ioctl definitions and the incore
      parameters.  Since FLAGS_UNION is only used in xfs_icache.c, move it
      there as a private flag.
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      2d53f66b
    • xfs: only reset incore inode health state flags when reclaiming an inode · 255794c7
      Committed by Darrick J. Wong
      While running some fuzz tests on inode metadata, I noticed that the
      filesystem health report (as provided by xfs_spaceman) failed to report
      the file corruption even when spaceman was run immediately after running
      xfs_scrub to detect the corruption.  That isn't the intended behavior;
      one ought to be able to run scrub to detect errors in the ondisk
      metadata and be able to access those reports for some time after the
      scrub.
      
      After running the same sequence through an instrumented kernel, I
      discovered the reason why -- scrub igets the file, scans it, marks it
      sick, and ireleases the inode.  When the VFS lets go of the incore
      inode, it moves to RECLAIMABLE state.  If spaceman igets the incore
      inode before it moves to RECLAIM state, iget reinitializes the VFS
      state, clears the sick and checked masks, and hands back the inode.  At
      this point, the caller has the exact same incore inode, but with all the
      health state erased.
      
      In other words, we're erasing the incore inode's health state flags when
      we've decided NOT to sever the link between the incore inode and the
      ondisk inode.  This is wrong, so we need to remove the lines that zero
      the fields from xfs_iget_cache_hit.
      
      As a precaution, we add the same lines into xfs_reclaim_inode just after
      we sever the link between incore and ondisk inode.  Strictly speaking
      this isn't necessary because once an inode has gone through reclaim it
      must go through xfs_inode_alloc (which also zeroes the state) and
      xfs_iget is careful to check for mismatches between the inode it pulls
      out of the radix tree and the one it wants.
      
      Fixes: 6772c1f1 ("xfs: track metadata health status")
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
      255794c7
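      A sketch of where the clearing moves: health state survives an
      inode-cache hit and is only zeroed when reclaim severs the
      incore/ondisk link. The structure and function bodies are illustrative
      stand-ins for xfs_iget_cache_hit and xfs_reclaim_inode, not the real
      code.

      #include <stdio.h>

      struct incore_inode {
              unsigned int sick;      /* health problems recorded by scrub */
              unsigned int checked;   /* metadata types scrub has examined */
      };

      /* cache hit: reinitialize VFS state but leave health state alone */
      static void iget_cache_hit(struct incore_inode *ip)
      {
              (void)ip;       /* previously: ip->sick = ip->checked = 0; (the bug) */
      }

      /* reclaim: the incore/ondisk link is severed, so zeroing is safe here */
      static void reclaim_inode(struct incore_inode *ip)
      {
              ip->sick = 0;
              ip->checked = 0;
      }

      int main(void)
      {
              struct incore_inode ip = { .sick = 0x4, .checked = 0x4 };

              iget_cache_hit(&ip);
              printf("after cache hit: sick=%#x\n", ip.sick);  /* still set */
              reclaim_inode(&ip);
              printf("after reclaim:   sick=%#x\n", ip.sick);  /* cleared */
              return 0;
      }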
  6. 04 June 2021, 14 commits
  7. 02 June 2021, 2 commits
  8. 08 April 2021, 5 commits