1. 23 Jan, 2021: 8 commits
    • xfs: fold sbcount quiesce logging into log covering · f46e5a17
      Brian Foster committed
      xfs_log_sbcount() calls xfs_sync_sb() to sync superblock counters to
      disk when lazy superblock accounting is enabled. This occurs on
      unmount, freeze, and read-only (re)mount and ensures the final
      values are calculated and persisted to disk before each form of
      quiesce completes.
      
      Now that log covering occurs in all of these contexts and uses the
      same xfs_sync_sb() mechanism to update log state, there is no need
      to log the superblock separately for any reason. Update the log
      quiesce path to sync the superblock at least once for any mount
      where lazy superblock accounting is enabled. If the log is already
      covered, it will remain in the covered state. Otherwise, the next
      sync as part of the normal covering sequence will carry the
      associated superblock update with it. Remove xfs_log_sbcount() now
      that it is no longer needed.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
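The quiesce-time sync described in this commit can be illustrated with a small Python model. It is a sketch, not the XFS code: `LogModel`, `need_covered()`, `sync_sb()` and `cover()` are hypothetical names, covering progress is reduced to a count of remaining superblock-only syncs, and the AIL push between syncs is omitted. It also assumes that a superblock-only checkpoint leaves an already covered log covered.

```python
class LogModel:
    """Toy log: `remaining` counts the covering-only superblock syncs
    still needed before the log is covered (0 means covered/IDLE)."""

    def __init__(self, remaining):
        self.remaining = remaining
        self.sb_counters_synced = False

    def need_covered(self):
        return self.remaining > 0

    def sync_sb(self):
        # A superblock-only checkpoint: it persists the lazy counters and
        # advances covering by one step. On an already covered log it
        # leaves the log covered.
        self.sb_counters_synced = True
        self.remaining = max(0, self.remaining - 1)


def cover(log, lazy_sb_count):
    """Quiesce-time covering; returns the number of superblock syncs."""
    if not log.need_covered() and not lazy_sb_count:
        return 0  # nothing to cover and no lazy counters to persist
    syncs = 0
    while True:
        # Sync at least once when lazy counters are enabled, even if the
        # log is already covered.
        log.sync_sb()
        syncs += 1
        if not log.need_covered():
            return syncs
```

With lazy counters enabled, an already covered log gets exactly one sync, while an active (two remaining) or partially covered (one remaining) log gets no sync beyond the covering sequence itself, because the counter update rides along with it.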
    • xfs: don't reset log idle state on covering checkpoints · b0eb9e11
      Brian Foster committed
      Now that log covering occurs on quiesce, we'd like to reuse the
      underlying superblock sync for final superblock updates. This
      includes things like lazy superblock counter updates, log feature
      incompat bits in the future, etc. One quirk to this approach is that
      once the log is in the IDLE (i.e. already covered) state, any
      subsequent log write resets the state back to NEED. This means that
      a final superblock sync to an already covered log requires two more
      sb syncs to return the log back to IDLE again.
      
      For example, if a lazy superblock enabled filesystem is mount cycled
      without any modifications, the unmount path syncs the superblock
      once and writes an unmount record. With the desired log quiesce
      covering behavior, we sync the superblock three times at unmount
      time: once for the lazy superblock counter update and twice more to
      cover the log. By contrast, if the log is active or only partially
      covered at unmount time, a final superblock sync would doubly serve
      as the one or two remaining syncs required to cover the log.
      
      This duplicate covering sequence is unnecessary because the
      filesystem remains consistent if a crash occurs at any point. The
      superblock will either be recovered in the event of a crash or
      written back before the log is quiesced and potentially cleaned with
      an unmount record.
      
      Update the log covering state machine to remain in the IDLE state if
      additional covering checkpoints pass through the log. This
      facilitates final superblock updates (such as lazy superblock
      counters) via a single sb sync without losing covered status. This
      provides some consistency with the active and partially covered
      cases and also avoids harmless but spurious checkpoints when
      quiescing the log.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
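The state machine change in this commit can be sketched as a toy Python model. The IDLE and NEED states come from the commit text; the intermediate DONE, NEED2 and DONE2 states model the two-step covering sequence and, like `state_after_write()`, `start_covering()` and `quiesce_syncs()`, are assumptions for illustration rather than the kernel's implementation. A write is treated as covering-only when it contains nothing but the superblock sync.

```python
from enum import Enum


class Cover(Enum):
    IDLE = "idle"    # covered, or no writes since mount
    NEED = "need"    # log needs covering
    DONE = "done"    # first covering sync issued
    NEED2 = "need2"  # log still needs covering
    DONE2 = "done2"  # second covering sync issued


def state_after_write(prev, covering_only, stay_idle):
    """Covering state once a checkpoint reaches the log.

    covering_only: the checkpoint held nothing but the superblock sync.
    stay_idle: model the new behavior (IDLE survives covering checkpoints).
    """
    if prev is Cover.IDLE and covering_only and stay_idle:
        return Cover.IDLE
    if prev is Cover.DONE and covering_only:
        return Cover.NEED2
    if prev is Cover.DONE2 and covering_only:
        return Cover.IDLE
    return Cover.NEED  # any other write means the log needs covering again


def start_covering(state):
    """If covering is needed, move to the matching 'sync issued' state."""
    if state is Cover.NEED:
        return Cover.DONE, True
    if state is Cover.NEED2:
        return Cover.DONE2, True
    return state, False


def quiesce_syncs(state, stay_idle):
    """Superblock syncs issued at quiesce: one final sync (e.g. for lazy
    counters) plus whatever the covering sequence still requires."""
    state, _ = start_covering(state)
    syncs = 0
    while True:
        state = state_after_write(state, covering_only=True,
                                  stay_idle=stay_idle)
        syncs += 1
        state, more = start_covering(state)
        if not more:
            return syncs, state
```

Starting from an already covered log, `quiesce_syncs(Cover.IDLE, stay_idle=False)` returns `(3, Cover.IDLE)`, matching the three unmount-time syncs in the example, while `stay_idle=True` returns `(1, Cover.IDLE)`. An active log (`Cover.NEED`) needs two syncs either way, so the final sync doubles as a covering sync.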
    • xfs: cover the log during log quiesce · 303591a0
      Brian Foster committed
      The log quiesce mechanism historically terminates by marking the log
      clean with an unmount record. The primary objective is to indicate
      that log recovery is no longer required after the quiesce has
      flushed all in-core changes and written back filesystem metadata.
      While this is perfectly fine, it is somewhat hacky as currently used
      in certain contexts. For example, filesystem freeze quiesces (i.e.
      cleans) the log and immediately redirties it with a dummy superblock
      transaction to ensure that log recovery runs in the event of a
      crash.
      
      While this functions correctly, cleaning the log from freeze context
      is clearly superfluous given the current redirtying behavior.
      Instead, the desired behavior can be achieved by simply covering the
      log. This effectively retires all on-disk log items from the active
      range of the log by issuing two synchronous and sequential dummy
      superblock update transactions that serve to update the on-disk log
      head and tail. The subtle difference is that the log technically
      remains dirty due to the lack of an unmount record, though recovery
      is effectively a no-op due to the content of the checkpoints being
      clean (i.e. the unmodified on-disk superblock).
      
      Log covering currently runs in the background and only triggers once
      the filesystem and log have idled. The purpose of the background
      mechanism is to prevent log recovery from replaying the most
      recently logged items long after those items may have been written
      back. In the quiesce path, the log has been deliberately idled by
      forcing the log and pushing the AIL until empty in a context where
      no further mutable filesystem operations are allowed. Therefore, we
      can cover the log as the final step in the log quiesce codepath to
      reflect that all previously active items have been successfully
      written back.
      
      This facilitates selective log covering from certain contexts (i.e.
      freeze) that only seek to quiesce, but not necessarily clean the
      log. Note that as a side effect of this change, log covering now
      occurs when cleaning the log as well. This is harmless, facilitates
      subsequent cleanups, and is mostly temporary as various operations
      switch to use explicit log covering.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
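The difference between the old freeze sequence and covering on quiesce in this commit can be laid out as ordered steps. This Python sketch only records step names; `quiesce()`, `clean()`, `old_freeze()`, `new_freeze()` and the `log_covered` flag (standing in for whether covering is still needed) are hypothetical and simplified.

```python
COVER = ["dummy sb transaction", "dummy sb transaction"]
UNMOUNT_RECORD = "write unmount record"


def quiesce(log_covered=False):
    """Ordered steps of a toy log quiesce that ends by covering the log."""
    steps = ["force log", "push AIL until empty"]
    if not log_covered:
        # Two synchronous, sequential dummy superblock transactions move the
        # on-disk head and tail past every previously logged item.
        steps += COVER
    return steps


def clean(log_covered=False):
    """Quiesce, then mark the log clean with an unmount record."""
    return quiesce(log_covered) + [UNMOUNT_RECORD]


def log_is_clean(steps):
    return bool(steps) and steps[-1] == UNMOUNT_RECORD


def old_freeze():
    # Before the change: clean the log, then immediately redirty it so that
    # recovery still runs if the system crashes while frozen.
    return ["force log", "push AIL until empty", UNMOUNT_RECORD,
            "dummy sb transaction (redirty)"]


def new_freeze():
    # After the change: cover the log; no unmount record is written.
    return quiesce()
```

Both freeze variants leave the log dirty, so recovery still runs after a crash, but only the new one avoids the clean-then-redirty round trip. As the commit notes, `clean()` now also covers the log before writing the unmount record.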
    • xfs: separate log cleaning from log quiesce · 9e54ee0f
      Brian Foster committed
      Log quiesce is currently associated with cleaning the log, which is
      accomplished by writing an unmount record as the last step of the
      quiesce sequence. The quiesce codepath is a bit convoluted in this
      regard due to how it is reused from various contexts. In preparation
      to create separate log cleaning and log covering interfaces, lift
      the write of the unmount record into a new cleaning helper and call
      that wherever xfs_log_quiesce() is currently invoked. No functional
      changes.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: lift writable fs check up into log worker task · 37444fc4
      Brian Foster committed
      The log covering helper checks whether the filesystem is writable to
      determine whether to cover the log. The helper is currently only
      called from the background log worker. In preparation to reuse the
      helper from freezing contexts, lift the check into xfs_log_worker().

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
    • xfs: sync lazy sb accounting on quiesce of read-only mounts · 50d25484
      Brian Foster committed
      xfs_log_sbcount() syncs the superblock specifically to accumulate
      the in-core percpu superblock counters and commit them to disk. This
      is required to maintain filesystem consistency across quiesce
      (freeze, read-only mount/remount) or unmount when lazy superblock
      accounting is enabled because individual transactions do not update
      the superblock directly.
      
      This mechanism works as expected for writable mounts, but
      xfs_log_sbcount() skips the update for read-only mounts. Read-only
      mounts otherwise still allow log recovery and write out an unmount
      record during log quiesce. If a read-only mount performs log
      recovery, it can modify the in-core superblock counters and write an
      unmount record when the filesystem unmounts without ever syncing the
      in-core counters. This leaves the filesystem with a clean log but in
      an inconsistent state with regard to lazy sb counters.
      
      Update xfs_log_sbcount() to use the same logic
      xfs_log_unmount_write() uses to determine when to write an unmount
      record. This ensures that lazy accounting is always synced before
      the log is cleaned. Refactor this logic into a new helper to
      distinguish between a writable filesystem and a writable log.
      Specifically, the log is writable unless the filesystem is mounted
      with the norecovery mount option, the underlying log device is
      read-only, or the filesystem is shut down. Drop the freeze state
      check because the update is already allowed during the freezing
      process and no context calls this function on an already frozen fs.
      Also, retain the shutdown check in xfs_log_unmount_write() to catch
      the case where the preceding log force might have triggered a
      shutdown.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Gao Xiang <hsiangkao@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
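The distinction this commit draws between a writable filesystem and a writable log can be expressed as two predicates. This is a hedged Python sketch: the `Mount` fields and the functions `log_writable()`, `fs_writable()` and `sync_sbcount_needed()` are hypothetical, and `fs_writable()` only approximates the old check (the freeze-state test is left out).

```python
from dataclasses import dataclass


@dataclass
class Mount:
    readonly: bool = False         # filesystem mounted read-only
    norecovery: bool = False       # "norecovery" mount option
    logdev_readonly: bool = False  # underlying log device is read-only
    shutdown: bool = False         # filesystem has been shut down
    lazy_sb_count: bool = True     # lazy superblock accounting enabled


def log_writable(mp: Mount) -> bool:
    """Writable log: unless norecovery, a read-only log device, or shutdown.
    A read-only mount alone does not stop log writes (recovery, unmount
    record)."""
    return not (mp.norecovery or mp.logdev_readonly or mp.shutdown)


def fs_writable(mp: Mount) -> bool:
    # Rough approximation of the old gate: it skipped read-only mounts.
    return not (mp.readonly or mp.shutdown)


def sync_sbcount_needed(mp: Mount) -> bool:
    # New gate: sync lazy counters whenever an unmount record could be
    # written, i.e. whenever the log itself is writable.
    return mp.lazy_sb_count and log_writable(mp)
```

A read-only mount with a writable log now syncs the lazy counters before the unmount record is written, closing the clean-but-inconsistent window described in the commit.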
    • xfs: don't drain buffer lru on freeze and read-only remount · 8321ddb2
      Brian Foster committed
      xfs_buftarg_drain() is called from xfs_log_quiesce() to ensure the
      buffer cache is reclaimed during unmount. xfs_log_quiesce() is also
      called from xfs_quiesce_attr(), however, which means that cache
      state is completely drained for filesystem freeze and read-only
      remount. While technically harmless, this is unnecessarily
      heavyweight. Both freeze and read-only mounts allow reads and thus
      allow population of the buffer cache. Therefore, the transitional
      sequence in either case really only needs to quiesce outstanding
      writes to return the filesystem to a generally read-only state.
      
      Additionally, some users have reported that attempts to freeze a
      filesystem concurrently with a read-heavy workload cause the freeze
      process to stall for a significant amount of time. This occurs
      because, as mentioned above, the read workload repopulates the
      buffer LRU while the freeze task attempts to drain it.
      
      To improve this situation, replace the drain in xfs_log_quiesce()
      with a buffer I/O quiesce and lift the drain into the unmount path.
      This removes buffer LRU reclaim from freeze and read-only [re]mount,
      but ensures the LRU is still drained before the filesystem unmounts.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
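Why a concurrent read workload can stall the drain in this commit is easiest to see with a toy loop. The Python model below assumes the drain keeps walking the LRU until its count reaches zero and that readers add buffers back between passes; `drain()`, its parameters, and `DRAINS_LRU` are hypothetical and the numbers are arbitrary.

```python
def drain(lru_count, reclaim_per_pass, refill_per_pass, max_passes=1000):
    """Toy drain loop: walk the LRU until it is empty, while a concurrent
    reader re-adds `refill_per_pass` buffers after every pass.

    Returns the number of passes needed, or None if the LRU never empties
    within `max_passes` (the freeze stall).
    """
    for passes in range(1, max_passes + 1):
        lru_count = max(0, lru_count - reclaim_per_pass)
        if lru_count == 0:
            return passes
        lru_count += refill_per_pass  # reads repopulate the cache
    return None


# After the change (simplified): only unmount drains the buffer LRU;
# freeze and read-only remount just quiesce outstanding buffer I/O.
DRAINS_LRU = {"freeze": False, "ro_remount": False, "unmount": True}
```

For example, `drain(100, 50, 30)` finishes in 4 passes, but `drain(100, 50, 60)` returns `None` because reads outpace reclaim. After the change, freeze and read-only remount no longer run such a loop at all.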
    • xfs: rename xfs_wait_buftarg() to xfs_buftarg_drain() · 10fb9ac1
      Brian Foster committed
      xfs_wait_buftarg() is vaguely named and somewhat overloaded. Its
      primary purpose is to reclaim all buffers from the provided buffer
      target LRU. In preparation to refactor xfs_wait_buftarg() into
      serialization and LRU draining components, rename the function and
      associated helpers to something more descriptive. This patch has no
      functional changes with the minor exception of renaming a
      tracepoint.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <djwong@kernel.org>
      Signed-off-by: Darrick J. Wong <djwong@kernel.org>
  2. 07 Oct, 2020: 1 commit
  3. 24 Sep, 2020: 1 commit
  4. 29 Jul, 2020: 1 commit
  5. 27 Mar, 2020: 10 commits
  6. 26 Mar, 2020: 1 commit
    • xfs: shutdown on failure to add page to log bio · 842a42d1
      Brian Foster committed
      If the bio_add_page() call fails, we proceed to write out a
      partially constructed log buffer. This corrupts the physical log
      such that log recovery is not possible. Worse, persistent
      occurrences of this error eventually lead to a BUG_ON() failure in
      bio_split() as iclogs wrap the end of the physical log, and the
      resulting crash forces log recovery on the subsequent mount.
      
      Rather than warn about writing out a corrupted log buffer, shut down
      the fs as is done for any log I/O related error. This preserves the
      consistency of the physical log such that log recovery succeeds on a
      subsequent mount. Note that this was observed on a 64k page debug
      kernel without upstream commit 59bb4798 ("mm, sl[aou]b:
      guarantee natural alignment for kmalloc(power-of-two)"), which
      demonstrated frequent iclog bio overflows due to unaligned (slab
      allocated) iclog data buffers.

      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
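The shutdown-instead-of-partial-write rule from this commit can be shown with a toy Python model. `LogShutdown`, `build_log_bio()` and `write_iclog()` are hypothetical; a page list with a fixed `capacity` stands in for a bio, and running out of room models `bio_add_page()` failing.

```python
class LogShutdown(Exception):
    """Stands in for a forced filesystem shutdown on a log I/O error."""


def build_log_bio(pages, capacity):
    """Add every page of the iclog to a bio with room for `capacity` pages.
    On failure, shut down instead of submitting a partial log buffer."""
    bio = []
    for page in pages:
        if len(bio) >= capacity:  # models bio_add_page() failing
            raise LogShutdown("failed to add page to log bio")
        bio.append(page)
    return bio


def write_iclog(physical_log, pages, capacity):
    try:
        bio = build_log_bio(pages, capacity)
    except LogShutdown:
        return "shutdown"  # nothing partial reaches the physical log
    physical_log.extend(bio)
    return "written"
```

The failed write leaves the modeled physical log exactly as it was, which is what lets recovery succeed on the next mount.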
  7. 23 Mar, 2020: 7 commits
  8. 14 Mar, 2020: 3 commits
  9. 03 Mar, 2020: 1 commit
    • xfs: fix iclog release error check race with shutdown · 6b789c33
      Brian Foster committed
      Prior to commit df732b29 ("xfs: call xlog_state_release_iclog with
      l_icloglock held"), xlog_state_release_iclog() always performed a
      locked check of the iclog error state before proceeding into the
      sync state processing code. As of this commit, part of
      xlog_state_release_iclog() was open-coded into
      xfs_log_release_iclog() and as a result the locked error state check
      was lost.
      
      The lockless check still exists, but this doesn't account for the
      possibility of a race with a shutdown being performed by another
      task causing the iclog state to change while the original task waits
      on ->l_icloglock. This has reproduced very rarely via generic/475
      and manifests as an assert failure in __xlog_state_release_iclog()
      due to an unexpected iclog state.
      
      Restore the locked error state check in xlog_state_release_iclog()
      to ensure that an iclog state update via shutdown doesn't race with
      the iclog release state processing code.
      
      Fixes: df732b29 ("xfs: call xlog_state_release_iclog with l_icloglock held")
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
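The fix in this commit restores a familiar pattern: an unlocked check is only a fast path, so the condition must be checked again once the lock is held. This Python sketch is hypothetical (`ICLog`, `shutdown()`, `release_iclog()` and the `before_lock` test hook are not XFS code) and uses a `threading.Lock` in place of `->l_icloglock`.

```python
import threading


class ICLog:
    """Toy iclog guarded by a lock standing in for ->l_icloglock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.state = "active"
        self.refcnt = 1
        self.synced = False


def shutdown(ic):
    with ic.lock:
        ic.state = "ioerror"


def release_iclog(ic, before_lock=None):
    """Drop a reference and run sync processing unless shut down.

    `before_lock` is a test hook that runs between the lockless check and
    taking the lock, i.e. the window in which another task may shut down.
    """
    if ic.state == "ioerror":      # lockless fast-path check, may be stale
        return False
    if before_lock is not None:
        before_lock()
    with ic.lock:
        if ic.state == "ioerror":  # locked re-check: state is stable now
            return False
        ic.refcnt -= 1
        if ic.refcnt == 0:
            ic.synced = True       # stands in for sync state processing
        return True
```

Passing `before_lock=lambda: shutdown(ic)` reproduces the race deterministically: the lockless check passes, the shutdown lands, and the locked re-check keeps release processing from running on an iclog in the error state.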
  10. 04 Dec, 2019: 1 commit
    • xfs: fix mount failure crash on invalid iclog memory access · 798a9cad
      Brian Foster committed
      syzbot (via KASAN) reports a use-after-free in the error path of
      xlog_alloc_log(). Specifically, the iclog freeing loop doesn't
      handle the case of a fully initialized ->l_iclog linked list.
      Instead, it assumes that the list is partially constructed and NULL
      terminated.
      
      This bug manifested because there was no possible error scenario
      after iclog list setup when the original code was added.  Subsequent
      code and associated error conditions were added some time later,
      while the original error handling code was never updated. Fix up the
      error loop to terminate either on a NULL iclog or reaching the end
      of the list.
      
      Reported-by: syzbot+c732f8644185de340492@syzkaller.appspotmail.com
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
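The corrected teardown loop from this commit can be modeled on a Python linked list. `Node`, `build_iclogs()`, `free_iclogs()` and `free_iclogs_buggy()` are hypothetical; "freeing" a node just records its name and clears its link, so visiting a node twice stands in for the use-after-free.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None


def build_iclogs(count, complete):
    """Build `count` iclogs. If `complete`, close the ring (last points back
    to first), as after full setup; otherwise leave it NULL-terminated."""
    head = prev = None
    for i in range(count):
        node = Node(f"iclog{i}")
        if prev is None:
            head = node
        else:
            prev.next = node
        prev = node
    if complete and prev is not None:
        prev.next = head
    return head


def free_iclogs_buggy(head):
    """Assumes a NULL-terminated list; revisits the head on a full ring."""
    freed, ic = [], head
    while ic is not None:
        nxt = ic.next
        freed.append(ic.name)  # "free" ic
        ic.next = None
        ic = nxt
    return freed


def free_iclogs(head):
    """Stop on None *or* when the walk wraps back to the head."""
    freed, ic = [], head
    while ic is not None:
        nxt = ic.next
        freed.append(ic.name)  # "free" ic
        ic.next = None
        ic = nxt
        if ic is head:
            break
    return freed
```

On a fully built ring of three iclogs, `free_iclogs_buggy()` returns `['iclog0', 'iclog1', 'iclog2', 'iclog0']`, touching the head a second time, while `free_iclogs()` stops when it wraps back to the head and also handles a NULL-terminated (partially built) list.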
  11. 19 Nov, 2019: 1 commit
  12. 11 Nov, 2019: 1 commit
  13. 06 Nov, 2019: 1 commit
  14. 22 Oct, 2019: 3 commits