  1. Feb 04, 2021 (14 commits)
  2. Sep 16, 2020 (1 commit)
  3. Sep 07, 2020 (1 commit)
    • xfs: xfs_iflock is no longer a completion · 718ecc50
      Committed by Dave Chinner
      With the recent rework of the inode cluster flushing, we no longer
      ever wait on the inode flush "lock". It was never a lock in the
      first place, just a completion to allow callers to wait for inode IO
      to complete. We now never wait for flush completion as all inode
      flushing is non-blocking. Hence we can get rid of all the iflock
      infrastructure and instead just set and check a state flag.
      
      Rename the XFS_IFLOCK flag to XFS_IFLUSHING, convert all the
      xfs_iflock_nowait() test-and-set operations to operate on that
      flag, and replace all the xfs_ifunlock() calls with clear
      operations.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  4. Jul 29, 2020 (2 commits)
  5. Jul 07, 2020 (9 commits)
    • xfs: attach inodes to the cluster buffer when dirtied · 48d55e2a
      Committed by Dave Chinner
      Rather than attach inodes to the cluster buffer just when we are
      doing IO, attach the inodes to the cluster buffer when they are
      dirtied. This means the buffer always carries a list of dirty inodes
      that reference it, and we can use that list to make more fundamental
      changes to inode writeback that aren't otherwise possible.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: clean up inode reclaim comments · 02511a5a
      Committed by Dave Chinner
      Inode reclaim now works quite differently from the way described
      in various comments, so update all the comments explaining what it
      does and how it works.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove SYNC_WAIT from xfs_reclaim_inodes() · 4d0bab3a
      Committed by Dave Chinner
      Clean up xfs_reclaim_inodes() callers. Most callers want blocking
      behaviour, so just make the existing SYNC_WAIT behaviour the
      default.
      
      For the xfs_reclaim_worker(), just call xfs_reclaim_inodes_ag()
      directly because we just want optimistic clean inode reclaim to be
      done in the background.
      
      For xfs_quiesce_attr() we can just remove the inode reclaim calls as
      they are a historic relic that was required to flush dirty inodes
      that contained unlogged changes. We now log all changes to the
      inodes, so the sync AIL push from xfs_log_quiesce() called by
      xfs_quiesce_attr() will do all the required inode writeback for
      freeze.
      
      Seeing as we now want to loop until all reclaimable inodes have been
      reclaimed, make xfs_reclaim_inodes() loop on the XFS_ICI_RECLAIM_TAG
      tag rather than having xfs_reclaim_inodes_ag() tell it that inodes
      were skipped. This is much more reliable and will always loop until
      all reclaimable inodes are reclaimed.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove SYNC_TRYLOCK from inode reclaim · 50718b8d
      Committed by Dave Chinner
      All background reclaim is SYNC_TRYLOCK already, and even blocking
      reclaim (SYNC_WAIT) can use trylock mechanisms as
      xfs_reclaim_inodes_ag() will keep cycling until there are no more
      reclaimable inodes. Hence we can kill SYNC_TRYLOCK from inode
      reclaim and make everything unconditionally non-blocking.
      
      We remove all the optimistic "avoid blocking on locks" checks done
      in xfs_reclaim_inode_grab() as nothing blocks on locks anymore.
      Further, checking XFS_IFLOCK optimistically can result in detecting
      inodes in the process of being cleaned (i.e. between being removed
      from the AIL and having the flush lock dropped), so for
      xfs_reclaim_inodes() to reliably reclaim all inodes we need to drop
      these checks anyway.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: don't block inode reclaim on the ILOCK · 9552e14d
      Committed by Dave Chinner
      When we attempt to reclaim an inode, the first thing we do is take
      the inode lock. This is blocking right now, so if the inode is
      being accessed by something else (e.g. being flushed to the cluster
      buffer) we will block here.
      
      Change this to a trylock so that we do not block inode reclaim
      unnecessarily here.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: allow multiple reclaimers per AG · 0e8e2c63
      Committed by Dave Chinner
      Inode reclaim will still throttle direct reclaim on the per-ag
      reclaim locks. This is no longer necessary as reclaim can run
      non-blocking now. Hence we can remove these locks so that we don't
      arbitrarily block reclaimers just because there are more direct
      reclaimers than there are AGs.
      
      This can result in multiple reclaimers working on the same range of
      an AG, but this doesn't cause any apparent issues. Optimising the
      spread of concurrent reclaimers for best efficiency can be done in a
      future patchset.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove IO submission from xfs_reclaim_inode() · 617825fe
      Committed by Dave Chinner
      We no longer need to issue IO from shrinker based inode reclaim to
      prevent spurious OOM killer invocation. This leaves only the global
      filesystem management operations such as unmount needing to
      writeback dirty inodes and reclaim them.
      
      Instead of using the reclaim pass to write dirty inodes before
      reclaiming them, use the AIL to push all the dirty inodes before we
      try to reclaim them. This allows us to remove all the conditional
      SYNC_WAIT locking and the writeback code from xfs_reclaim_inode()
      and greatly simplify the checks we need to do to reclaim an inode.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: make inode reclaim almost non-blocking · 993f951f
      Committed by Dave Chinner
      Now that dirty inode writeback doesn't cause read-modify-write
      cycles on the inode cluster buffer under memory pressure, the need
      to throttle memory reclaim to the rate at which we can clean dirty
      inodes goes away. That is due to the fact that we no longer thrash
      inode cluster buffers under memory pressure to clean dirty inodes.
      
      This means inode writeback no longer stalls on memory allocation
      or read IO, and hence can be done asynchronously without generating
      memory pressure. As a result, blocking inode writeback in reclaim is
      no longer necessary to prevent reclaim priority windup as cleaning
      dirty inodes is no longer dependent on having memory reserves
      available for the filesystem to make progress reclaiming inodes.
      
      Hence we can convert inode reclaim to be non-blocking for shrinker
      callouts, both for direct reclaim and kswapd.
      
      On a vanilla kernel, running a 16-way fsmark create workload on a
      4 node/16p/16GB RAM machine, I can reliably pin 14.75GB of RAM via
      userspace mlock(). The OOM killer gets invoked at 15GB of
      pinned RAM.
      
      Without the inode cluster pinning, this non-blocking reclaim patch
      triggers premature OOM killer invocation with the same memory
      pinning, sometimes with as much as 45% of RAM being free.  It's
      trivially easy to trigger the OOM killer when reclaim does not
      block.
      
      With pinning inode clusters in RAM and then adding this patch, I can
      reliably pin 14.5GB of RAM and still have the fsmark workload run to
      completion. The OOM killer gets invoked at 14.75GB of pinned RAM, which
      is only a small amount of memory less than the vanilla kernel. It is
      much more reliable than just with async reclaim alone.
      
      simoops shows that allocation stalls go away when async reclaim is
      used. Vanilla kernel:
      
      Run time: 1924 seconds
      Read latency (p50: 3,305,472) (p95: 3,723,264) (p99: 4,001,792)
      Write latency (p50: 184,064) (p95: 553,984) (p99: 807,936)
      Allocation latency (p50: 2,641,920) (p95: 3,911,680) (p99: 4,464,640)
      work rate = 13.45/sec (avg 13.44/sec) (p50: 13.46) (p95: 13.58) (p99: 13.70)
      alloc stall rate = 3.80/sec (avg: 2.59) (p50: 2.54) (p95: 2.96) (p99: 3.02)
      
      With inode cluster pinning and async reclaim:
      
      Run time: 1924 seconds
      Read latency (p50: 3,305,472) (p95: 3,715,072) (p99: 3,977,216)
      Write latency (p50: 187,648) (p95: 553,984) (p99: 789,504)
      Allocation latency (p50: 2,748,416) (p95: 3,919,872) (p99: 4,448,256)
      work rate = 13.28/sec (avg 13.32/sec) (p50: 13.26) (p95: 13.34) (p99: 13.34)
      alloc stall rate = 0.02/sec (avg: 0.02) (p50: 0.01) (p95: 0.03) (p99: 0.03)
      
      Latencies don't really change much, nor does the work rate. However,
      allocation almost never stalls with these changes, whilst the
      vanilla kernel is sometimes reporting 20 stalls/s over a 60s sample
      period. This difference is due to inode reclaim being largely
      non-blocking now.
      
      IOWs, once we have pinned inode cluster buffers, we can make inode
      reclaim non-blocking without a major risk of premature and/or
      spurious OOM killer invocation, and without any changes to memory
      reclaim infrastructure.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: Don't allow logging of XFS_ISTALE inodes · 96355d5a
      Committed by Dave Chinner
      In tracking down a problem in this patchset, I discovered we are
      reclaiming dirty stale inodes. This wasn't discovered until inodes
      were always attached to the cluster buffer and then the rcu callback
      that freed inodes was assert failing because the inode still had an
      active pointer to the cluster buffer after it had been reclaimed.
      
      Debugging the issue indicated that this was a pre-existing issue
      resulting from the way the inodes are handled in xfs_inactive_ifree.
      When we free a cluster buffer from xfs_ifree_cluster, all the inodes
      in cache are marked XFS_ISTALE. Those that are clean have nothing
      else done to them and so eventually get cleaned up by background
      reclaim. i.e. it is assumed we'll never dirty/relog an inode marked
      XFS_ISTALE.
      
      On journal commit, dirty stale inodes are handled by both the
      buffer and inode log items: either they run through
      xfs_istale_done() and are removed from the AIL (buffer log item
      commit), or the inode log item simply unpins them because the
      buffer log item will clean them. What happens to any specific
      inode is entirely dependent on which log item wins the commit
      race, but the result is the same - stale inodes are clean, not
      attached to the cluster buffer, and not in the AIL. Hence inode
      reclaim can just free these inodes without further care.
      
      However, if the stale inode is relogged, it gets dirtied again and
      relogged into the CIL. Most of the time this isn't an issue, because
      relogging simply changes the inode's location in the current
      checkpoint. Problems arise, however, when the CIL checkpoints
      between two transactions in the xfs_inactive_ifree() deferops
      processing. This results in the XFS_ISTALE inode being redirtied
      and inserted into the CIL without any of the other stale cluster
      buffer infrastructure being in place.
      
      Hence on journal commit, it simply gets unpinned, so it remains
      dirty in memory. Everything in inode writeback avoids XFS_ISTALE
      inodes so it can't be written back, and it is not tracked in the AIL
      so there's not even a trigger to attempt to clean the inode. Hence
      the inode just sits dirty in memory until inode reclaim comes along,
      sees that it is XFS_ISTALE, and goes to reclaim it. This reclaiming
      of a dirty inode caused use after free, list corruptions and other
      nasty issues later in this patchset.
      
      Hence this patch addresses a violation of the "never log XFS_ISTALE
      inodes" rule caused by the deferops processing rolling a transaction
      and relogging a stale inode in xfs_inactive_ifree. It also adds a
      bunch of asserts to catch this problem in debug kernels so that
      we don't reintroduce this problem in future.
      
      Reproducer for this issue was generic/558 on a v4 filesystem.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  6. May 30, 2020 (1 commit)
  7. May 27, 2020 (11 commits)
  8. May 20, 2020 (1 commit)