1. 07 July 2020 (10 commits)
    • xfs: xfs_iflush() is no longer necessary · 90c60e16
      Authored by Dave Chinner
      Now that we have a cached buffer on inode log items, we don't need
      to do buffer lookups when flushing inodes anymore - all we need
      to do is lock the buffer and we are ready to go.
      
      This largely gets rid of the need for xfs_iflush(), which is
      essentially just a mechanism to look up the buffer and flush the
      inode to it. Instead, we can just call xfs_iflush_cluster() with a
      few modifications to ensure it also flushes the inode we already
      hold locked.
      
      This allows the AIL inode item pushing to be almost entirely
      non-blocking in XFS - we won't block unless memory allocation
      for the cluster inode lookup blocks or the block device queues are
      full.
      
      Writeback during inode reclaim becomes a little more complex because
      we now have to lock the buffer ourselves, but otherwise this change
      is largely a functional no-op that removes a whole lot of code.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
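
      A minimal sketch of the push flow this describes, assuming the
      post-series xfs_iflush_cluster() form that takes the cluster buffer;
      pin checks and error handling are elided, so names and return codes
      are illustrative rather than the exact kernel code:

          /*
           * Sketch: the cluster buffer is already cached on the log item,
           * so pushing a dirty inode is a buffer trylock plus a cluster
           * flush - no buffer lookup, and no blocking on the flush lock.
           */
          static unsigned int
          inode_item_push_sketch(struct xfs_log_item *lip,
                                 struct list_head *buffer_list)
          {
                  struct xfs_buf  *bp = lip->li_buf; /* cached cluster buffer */

                  if (!xfs_buf_trylock(bp))
                          return XFS_ITEM_LOCKED;    /* stay non-blocking */

                  /* flush this inode and all other dirty inodes on the buffer */
                  if (xfs_iflush_cluster(bp) == 0)
                          xfs_buf_delwri_queue(bp, buffer_list);

                  xfs_buf_unlock(bp);
                  return XFS_ITEM_FLUSHING;
          }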
    • xfs: attach inodes to the cluster buffer when dirtied · 48d55e2a
      Authored by Dave Chinner
      Rather than attach inodes to the cluster buffer just when we are
      doing IO, attach the inodes to the cluster buffer when they are
      dirtied. This means the buffer always carries a list of dirty inodes
      that reference it, and we can use that list to make more fundamental
      changes to inode writeback that aren't otherwise possible.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
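
      A sketch of the attach-on-dirty step, assuming the inode log item
      field names used in this series (ili_lock, li_buf, li_bio_list);
      the surrounding transaction code is elided:

          /*
           * Sketch: the first time an inode is dirtied in a transaction,
           * link its log item onto the cluster buffer's list so the buffer
           * always knows which dirty inodes reference it.
           */
          static void
          attach_item_to_cluster_buf_sketch(struct xfs_inode_log_item *iip,
                                            struct xfs_buf *bp)
          {
                  spin_lock(&iip->ili_lock);
                  if (!iip->ili_item.li_buf) {
                          xfs_buf_hold(bp);          /* item holds the buffer */
                          iip->ili_item.li_buf = bp; /* cache the cluster buffer */
                          list_add_tail(&iip->ili_item.li_bio_list,
                                        &bp->b_li_list);
                  }
                  spin_unlock(&iip->ili_lock);
          }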
    • xfs: rework stale inodes in xfs_ifree_cluster · 71e3e356
      Authored by Dave Chinner
      Once we have inodes pinning the cluster buffer and attached whenever
      they are dirty, we no longer have a guarantee that the items are
      flush locked when we lock the cluster buffer. Hence we cannot just
      walk the buffer log item list and modify the attached inodes.
      
      If the inode is not flush locked, we have to ILOCK it first and then
      flush lock it to do all the prerequisite checks needed to avoid
      races with other code. This is already handled by
      xfs_ifree_get_one_inode(), so rework the inode iteration loop and
      function to update all inodes in cache whether they are attached to
      the buffer or not.
      
      Note: we also remove the copying of the log item lsn to the
      ili_flush_lsn as xfs_iflush_done() now uses the XFS_ISTALE flag to
      trigger aborts and so flush lsn matching is not needed in IO
      completion for processing freed inodes.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
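
      A simplified sketch of the reworked staling loop; the cache lookup
      helper is hypothetical and the real code's retry logic is reduced to
      a skip, so read this as the lock-ordering pattern only:

          /*
           * Sketch: walk every inode in the cluster, not only the items
           * already attached to the buffer.  Inodes that are not flush
           * locked must be ILOCKed first, then flush locked, before they
           * can be marked stale and attached.
           */
          static void
          mark_cluster_inodes_stale_sketch(struct xfs_mount *mp,
                                           struct xfs_buf *bp,
                                           xfs_ino_t first_ino,
                                           unsigned int inodes_per_cluster)
          {
                  unsigned int    i;

                  for (i = 0; i < inodes_per_cluster; i++) {
                          /* hypothetical stand-in for the in-cache lookup */
                          struct xfs_inode *ip =
                                  lookup_cached_inode_sketch(mp, first_ino + i);

                          if (!ip)
                                  continue;   /* not in cache: nothing to do */

                          /* ILOCK first, then flush lock, to avoid races */
                          if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED))
                                  continue;   /* real code retries; sketch skips */

                          xfs_iflock(ip);
                          xfs_iflags_set(ip, XFS_ISTALE);
                          list_add_tail(&ip->i_itemp->ili_item.li_bio_list,
                                        &bp->b_li_list);
                          xfs_ifunlock(ip);
                          xfs_iunlock(ip, XFS_ILOCK_SHARED);
                  }
          }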
    • xfs: get rid of log item callbacks · 2ef3f7f5
      Authored by Dave Chinner
      They are not used anymore, so remove them from the log item and the
      buffer iodone attachment interfaces.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: make inode IO completion buffer centric · aac855ab
      Authored by Dave Chinner
      Having different io completion callbacks for different inode states
      makes things complex. We can detect if the inode is stale via the
      XFS_ISTALE flag in IO completion, so we don't need a special
      callback just for this.
      
      This means inodes only have a single iodone callback, and inode IO
      completion is entirely buffer centric at this point. Hence we no
      longer need to use a log item callback at all as we can just call
      xfs_iflush_done() directly from the buffer completions and walk the
      buffer log item list to complete all the inodes under IO.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
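
      A sketch of the buffer-centric completion walk; the per-item helpers
      here are hypothetical names standing in for the stale-abort and
      normal-completion paths described above:

          /*
           * Sketch: inode buffer IO completion walks the attached log item
           * list directly, using XFS_ISTALE to pick the abort path instead
           * of a dedicated per-item callback.
           */
          static void
          inode_buf_iodone_sketch(struct xfs_buf *bp)
          {
                  struct xfs_log_item *lip;

                  list_for_each_entry(lip, &bp->b_li_list, li_bio_list) {
                          struct xfs_inode *ip = INODE_ITEM(lip)->ili_inode;

                          if (xfs_iflags_test(ip, XFS_ISTALE))
                                  istale_done_sketch(lip);  /* inode was freed */
                          else
                                  iflush_done_sketch(lip);  /* normal completion */
                  }
          }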
    • xfs: mark inode buffers in cache · f593bf14
      Authored by Dave Chinner
      Inode buffers always have write IO callbacks, so by marking them
      directly we can avoid needing to attach ->b_iodone functions to
      them. This avoids an indirect call, and makes future modifications
      much simpler.
      
      While this is largely a refactor of existing functionality, we
      broaden the scope of the flag to beyond where inodes are explicitly
      attached because future changes need to know what type of log items
      are attached to the buffer. Adding this buffer flag may invoke the
      inode iodone callback in cases where it wouldn't have been
      previously, but this is not a functional change because the callback
      is identical to the normal buffer write iodone callback when inodes
      are not attached.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
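
      A sketch of the flag-based dispatch replacing the indirect call; the
      flag value and helper names are illustrative assumptions, not the
      exact kernel definitions:

          #define _XBF_INODES_SKETCH  (1u << 16)  /* "inode buffer" type flag */

          /*
           * Sketch: IO completion dispatches on a buffer type flag rather
           * than an attached ->b_iodone function pointer, avoiding an
           * indirect call.
           */
          static void
          buf_write_ioend_sketch(struct xfs_buf *bp)
          {
                  if (bp->b_flags & _XBF_INODES_SKETCH)
                          inode_buf_iodone_sketch(bp);  /* inode cluster buffer */
                  else
                          buf_iodone_sketch(bp);        /* plain buffer write */
          }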
    • xfs: add an inode item lock · 1319ebef
      Authored by Dave Chinner
      The inode log item is kind of special in that it can be aggregating
      new changes in memory at the same time that existing changes are
      being written back to disk. This means there are fields in the log
      item that are accessed concurrently from contexts that don't share
      any locking at all.
      
      e.g. updating ili_last_fields occurs under the ILOCK_EXCL and the
      flush lock at flush time, under the flush lock alone at IO completion
      time, and the field is read under the ILOCK_EXCL when the inode is
      logged.  Hence there is no actual serialisation between reading the
      field during logging of the inode in transactions vs clearing the
      field in IO completion.
      
      We currently get away with this by the fact that we are only
      clearing fields in IO completion, and nothing bad happens if we
      accidentally log more of the inode than we actually modify. Worst
      case is we consume a tiny bit more memory and log bandwidth.
      
      However, if we want to do more complex state manipulations on the
      log item that requires updates at all three of these potential
      locations, we need to have some mechanism of serialising those
      operations. To do this, introduce a spinlock into the log item to
      serialise internal state.
      
      This could be done via the xfs_inode i_flags_lock, but this then
      leads to potential lock inversion issues where inode flag updates
      need to occur inside locks that best nest inside the inode log item
      locks (e.g. marking inodes stale during inode cluster freeing).
      Using a separate spinlock avoids these sorts of problems and
      simplifies future code.
      
      This does not touch the use of ili_fields in the item formatting
      code - that is entirely protected by the ILOCK_EXCL at this point in
      time, so it remains untouched.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
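
      A sketch of the serialisation pattern the new spinlock provides,
      using the field names from the description above; the struct layout
      is abbreviated and the helpers are illustrative:

          /* abbreviated log item, showing only the fields under discussion */
          struct inode_log_item_sketch {
                  spinlock_t      ili_lock;        /* serialises fields below */
                  unsigned int    ili_fields;      /* fields dirtied in memory */
                  unsigned int    ili_last_fields; /* fields under writeback */
          };

          /* flush time: move the dirty set to the under-writeback set */
          static void flush_start_sketch(struct inode_log_item_sketch *iip)
          {
                  spin_lock(&iip->ili_lock);
                  iip->ili_last_fields = iip->ili_fields;
                  iip->ili_fields = 0;
                  spin_unlock(&iip->ili_lock);
          }

          /* IO completion time: clear the under-writeback set */
          static void flush_done_sketch(struct inode_log_item_sketch *iip)
          {
                  spin_lock(&iip->ili_lock);
                  iip->ili_last_fields = 0;
                  spin_unlock(&iip->ili_lock);
          }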
    • xfs: remove logged flag from inode log item · 1dfde687
      Authored by Dave Chinner
      This was used to track if the item had logged fields being flushed
      to disk. We log everything in the inode these days, so this logic is
      no longer needed. Remove it.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: Don't allow logging of XFS_ISTALE inodes · 96355d5a
      Authored by Dave Chinner
      In tracking down a problem in this patchset, I discovered we are
      reclaiming dirty stale inodes. This wasn't discovered until inodes
      were always attached to the cluster buffer, at which point the rcu
      callback that freed inodes started assert-failing because the inode
      still had an active pointer to the cluster buffer after it had been
      reclaimed.
      
      Debugging the issue indicated that this was a pre-existing issue
      resulting from the way the inodes are handled in xfs_inactive_ifree.
      When we free a cluster buffer from xfs_ifree_cluster, all the inodes
      in cache are marked XFS_ISTALE. Those that are clean have nothing
      else done to them and so eventually get cleaned up by background
      reclaim. i.e. it is assumed we'll never dirty/relog an inode marked
      XFS_ISTALE.
      
      On journal commit, dirty stale inodes are handled by both the buffer
      and inode log items: either the inode runs through xfs_istale_done()
      and is removed from the AIL (buffer log item commit), or the inode
      log item simply unpins it because the buffer log item will clean it.
      What happens to any specific inode is entirely dependent on which
      log item wins the commit race, but the result is the same - stale
      inodes are clean, not attached to the cluster buffer, and not in the
      AIL. Hence inode reclaim can just free these inodes without further
      care.
      
      However, if the stale inode is relogged, it gets dirtied again and
      relogged into the CIL. Most of the time this isn't an issue, because
      relogging simply changes the inode's location in the current
      checkpoint. Problems arise, however, when the CIL checkpoints
      between two transactions in the xfs_inactive_ifree() deferops
      processing. This results in the XFS_ISTALE inode being redirtied
      and inserted into the CIL without any of the other stale cluster
      buffer infrastructure being in place.
      
      Hence on journal commit, it simply gets unpinned, so it remains
      dirty in memory. Everything in inode writeback avoids XFS_ISTALE
      inodes so it can't be written back, and it is not tracked in the AIL
      so there's not even a trigger to attempt to clean the inode. Hence
      the inode just sits dirty in memory until inode reclaim comes along,
      sees that it is XFS_ISTALE, and goes to reclaim it. This reclaiming
      of a dirty inode caused use after free, list corruptions and other
      nasty issues later in this patchset.
      
      Hence this patch addresses a violation of the "never log XFS_ISTALE
      inodes" rule caused by the deferops processing rolling a transaction
      and relogging a stale inode in xfs_inactive_ifree. It also adds a
      bunch of asserts to catch this problem in debug kernels so that we
      don't reintroduce this problem in the future.
      
      Reproducer for this issue was generic/558 on a v4 filesystem.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
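
      A sketch of the kind of debug assert this adds; xfs_iflags_test()
      and XFS_ISTALE are real XFS names, while the exact placement inside
      the inode logging path is an illustrative assumption:

          /*
           * Sketch: enforce the "never log XFS_ISTALE inodes" rule at the
           * point the inode is dirtied in a transaction, so debug kernels
           * catch any relogging of stale inodes immediately.
           */
          void
          trans_log_inode_sketch(struct xfs_trans *tp, struct xfs_inode *ip,
                                 unsigned int flags)
          {
                  ASSERT(!xfs_iflags_test(ip, XFS_ISTALE));
                  /* normal inode logging continues here */
          }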
    • xfs: move helpers that lock and unlock two inodes against userspace IO · e2aaee9c
      Authored by Darrick J. Wong
      Move the double-inode locking helpers to xfs_inode.c since they're not
      specific to reflink.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
  2. 10 June 2020 (1 commit)
  3. 09 June 2020 (1 commit)
  4. 20 May 2020 (6 commits)
  5. 07 May 2020 (5 commits)
  6. 05 May 2020 (1 commit)
  7. 06 April 2020 (1 commit)
  8. 02 April 2020 (1 commit)
    • xfs: fix inode number overflow in ifree cluster helper · d9fdd0ad
      Authored by Brian Foster
      Qian Cai reports seemingly random buffer read verifier errors during
      filesystem writeback. This was isolated to a recent patch that
      factored out some inode cluster freeing code and happened to cast an
      unsigned inode number type to a signed value. If the inode number
      value overflows, we can skip marking in-core inodes associated with
      the underlying buffer stale at the time the physical inodes are
      freed. If such an inode happens to be dirty, xfsaild will eventually
      attempt to write it back over non-inode blocks. The invalidation of
      the underlying inode buffer causes writeback to read the buffer from
      disk. This fails the read verifier (preventing eventual corruption)
      if the buffer no longer looks like an inode cluster. Analysis by
      Dave Chinner.
      
      Fix up the helper to use the proper type for inode number values.
      
      Fixes: 5806165a ("xfs: factor inode lookup from xfs_ifree_cluster")
      Reported-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
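
      A before/after sketch of the type fix; the helper name and parameter
      layout are simplified from the description, with xfs_ino_t being the
      real 64-bit unsigned inode number type:

          /*
           * Before (buggy): the 64-bit inode number was narrowed into a
           * signed int parameter, which can overflow and cause in-core
           * inodes to be skipped when the physical inodes are freed:
           *
           *     static struct xfs_inode *
           *     ifree_get_one_inode_sketch(struct xfs_perag *pag, int inum);
           */

          /* After (fixed): carry the inode number in its proper type. */
          static struct xfs_inode *
          ifree_get_one_inode_sketch(struct xfs_perag *pag, xfs_ino_t inum);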
  9. 29 March 2020 (1 commit)
  10. 27 March 2020 (1 commit)
  11. 19 March 2020 (2 commits)
  12. 12 March 2020 (2 commits)
  13. 03 March 2020 (2 commits)
  14. 27 January 2020 (1 commit)
  15. 24 January 2020 (1 commit)
  16. 15 January 2020 (1 commit)
  17. 14 November 2019 (3 commits)