1. 13 January 2018 (2 commits)
  2. 09 January 2018 (2 commits)
  3. 28 November 2017 (1 commit)
    • xfs: log recovery should replay deferred ops in order · 50995582
      Authored by Darrick J. Wong
      As part of testing log recovery with dm_log_writes, Amir Goldstein
      discovered an error in the deferred ops recovery that led to corruption
      of the filesystem metadata if a reflink+rmap filesystem happened to shut
      down midway through a CoW remap:
      
      "This is what happens [after failed log recovery]:
      
      "Phase 1 - find and verify superblock...
      "Phase 2 - using internal log
      "        - zero log...
      "        - scan filesystem freespace and inode maps...
      "        - found root inode chunk
      "Phase 3 - for each AG...
      "        - scan (but don't clear) agi unlinked lists...
      "        - process known inodes and perform inode discovery...
      "        - agno = 0
      "data fork in regular inode 134 claims CoW block 376
      "correcting nextents for inode 134
      "bad data fork in inode 134
      "would have cleared inode 134"
      
      Hou Tao dissected the log contents of exactly such a crash:
      
      "According to the implementation of xfs_defer_finish(), these ops should
      be completed in the following sequence:
      
      "Have been done:
      "(1) CUI: Oper (160)
      "(2) BUI: Oper (161)
      "(3) CUD: Oper (194), for CUI Oper (160)
      "(4) RUI A: Oper (197), free rmap [0x155, 2, -9]
      
      "Should be done:
      "(5) BUD: for BUI Oper (161)
      "(6) RUI B: add rmap [0x155, 2, 137]
      "(7) RUD: for RUI A
      "(8) RUD: for RUI B
      
      "Actually be done by xlog_recover_process_intents()
      "(5) BUD: for BUI Oper (161)
      "(6) RUI B: add rmap [0x155, 2, 137]
      "(7) RUD: for RUI B
      "(8) RUD: for RUI A
      
      "So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
      then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
      from the log record in post_mount.log (generated after umount) and the trace
      print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
      entry [0x155, 2, -9] are freed."
      
      When reconstructing the internal log state from the log items found on
      disk, it's required that deferred ops replay in exactly the same order
      that they would have had the filesystem not gone down.  However,
      replaying unfinished deferred ops can create /more/ deferred ops.  These
      new deferred ops are finished in the wrong order.  This causes fs
      corruption and replay crashes, so let's create a single defer_ops to
      handle the subsequent ops created during replay, then use a single
      transaction at the end of log recovery to ensure that everything is
      replayed in the order it is supposed to be.
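      As a concrete illustration of that ordering rule, here is a small
      self-contained C model (not the kernel code; every structure and name
      below is invented for the example): recovered intents are replayed in
      log order, any follow-up deferred ops they create go onto one shared
      FIFO list, and that single list is drained in order at the end.

      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>

      struct op {
              char name[32];
              struct op *next;
      };

      struct defer_list {
              struct op *head;
              struct op **tail;
      };

      static void defer_init(struct defer_list *dl)
      {
              dl->head = NULL;
              dl->tail = &dl->head;
      }

      /* append to the shared list so ops finish in the order they were queued */
      static void defer_add(struct defer_list *dl, const char *name)
      {
              struct op *op = calloc(1, sizeof(*op));

              snprintf(op->name, sizeof(op->name), "%s", name);
              *dl->tail = op;
              dl->tail = &op->next;
      }

      /* replaying one recovered intent may queue more deferred work */
      static void replay_intent(const char *name, struct defer_list *dl)
      {
              printf("replay  %s\n", name);
              if (strcmp(name, "BUI") == 0) {
                      defer_add(dl, "RUI: free old CoW rmap");
                      defer_add(dl, "RUI: add new data fork rmap");
              }
      }

      int main(void)
      {
              const char *intents[] = { "CUI", "BUI" };       /* log order */
              struct defer_list dl;
              struct op *op, *next;

              defer_init(&dl);

              /* pass 1: replay the unfinished intents exactly in log order */
              for (int i = 0; i < 2; i++)
                      replay_intent(intents[i], &dl);

              /* pass 2: one final transaction finishes the queued ops in order */
              for (op = dl.head; op; op = next) {
                      next = op->next;
                      printf("finish  %s\n", op->name);
                      free(op);
              }
              return 0;
      }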
      Reported-by: Amir Goldstein <amir73il@gmail.com>
      Analyzed-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  4. 07 November 2017 (1 commit)
  5. 02 November 2017 (1 commit)
  6. 27 October 2017 (3 commits)
    • xfs: fix log block underflow during recovery cycle verification · 9f2a4505
      Authored by Brian Foster
      It is possible for mkfs to format very small filesystems with an
      internal log that is too small for the various minimum size
      and block count requirements. If this occurs when the log happens to
      be smaller than the scan window used for cycle verification and the
      scan wraps the end of the log, the start_blk calculation in
      xlog_find_head() underflows and leads to an attempt to scan an
      invalid range of log blocks. This results in log recovery failure
      and a failed mount.
      
      Since there may be filesystems out in the wild with this kind of
      geometry, we cannot simply refuse to mount. Instead, cap the scan
      window for cycle verification to the size of the physical log. This
      ensures that the cycle verification proceeds as expected when the
      scan wraps the end of the log.
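      The arithmetic of the fix can be sketched in a few lines of plain C
      (values and variable names are made up for the example; this is not
      xlog_find_head() itself):

      #include <stdio.h>

      int main(void)
      {
              long log_bblks = 512;           /* tiny physical log, in basic blocks */
              long head_blk = 100;            /* candidate head block */
              long num_scan_bblks = 2048;     /* default cycle-verification window */

              /* the fix: never scan more than one full log's worth of blocks */
              if (num_scan_bblks > log_bblks)
                      num_scan_bblks = log_bblks;

              /* wrapped scan start: num_scan_bblks before the head, modulo log size */
              long start_blk = head_blk - num_scan_bblks;
              if (start_blk < 0)
                      start_blk += log_bblks; /* wraps to the log end, never underflows */

              printf("scan %ld blocks starting at block %ld\n",
                     num_scan_bblks, start_blk);
              return 0;
      }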
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: more robust recovery xlog buffer validation · 99c26595
      Authored by Brian Foster
      mkfs has a historical problem where it can format very small
      filesystems with a physical log that is too small. Under certain
      conditions, log recovery of an associated filesystem can end up
      passing garbage parameter values to some of the cycle and log record
      verification functions due to bugs in log recovery not dealing with
      such filesystems properly. This results in attempts to read from
      bogus/underflowed log block addresses.
      
      Since the buffer read may ultimately succeed, log recovery can
      proceed with bogus data and otherwise go off the rails and crash.
      One example of this is a negative last_blk being passed to
      xlog_find_verify_log_record() causing us to skip the loop, pass a
      NULL head pointer to xlog_header_check_mount() and crash.
      
      Improve the xlog buffer verification to address this problem. We
      already verify xlog buffer length, so update this mechanism to also
      sanity check for a valid log relative block address and otherwise
      return an error. Pass a fixed, valid log block address from
      xlog_get_bp() since the target address will be validated when the
      buffer is read. This ensures that any bogus log block address/length
      calculations lead to graceful mount failure rather than risking a
      crash or worse if recovery proceeds with bogus data.
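      A hedged sketch of the kind of sanity check described above (the
      helper name and types are invented; only the idea of rejecting an
      out-of-range log-relative address before issuing the read is taken
      from the patch description):

      #include <stdio.h>

      #define EFSCORRUPTED    117     /* EUCLEAN on Linux */

      static int xlog_read_sanity_check(long long blk_no, int bblks,
                                        long long log_bblks)
      {
              if (blk_no < 0 || blk_no >= log_bblks)
                      return -EFSCORRUPTED;   /* bogus/underflowed block address */
              if (bblks <= 0 || bblks > log_bblks)
                      return -EFSCORRUPTED;   /* bogus read length */
              return 0;
      }

      int main(void)
      {
              /* a negative last_blk now fails gracefully instead of being read */
              printf("%d\n", xlog_read_sanity_check(-5, 8, 1024));    /* -117 */
              printf("%d\n", xlog_read_sanity_check(100, 8, 1024));   /*    0 */
              return 0;
      }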
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: remove the never fully implemented UUID fork format · 42b67dc6
      Authored by Christoph Hellwig
      Remove the dead code dealing with the UUID fork format that was never
      implemented in Linux (nor, as far as I know, in IRIX).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  7. 02 September 2017 (1 commit)
  8. 23 August 2017 (5 commits)
    • xfs: add log recovery tracepoint for head/tail · e67d3d42
      Authored by Brian Foster
      Torn write detection and tail overwrite detection can shift the log
      head and tail respectively in the event of CRC mismatch or
      corruption errors. Add a high-level log recovery tracepoint to dump
      the final log head/tail and make those values easily available in
      debug/diagnostic situations.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: handle -EFSCORRUPTED during head/tail verification · a4c9b34d
      Authored by Brian Foster
      Torn write and tail overwrite detection both trigger only on
      -EFSBADCRC errors. While this is the most likely failure scenario
      for each condition, -EFSCORRUPTED is still possible in certain cases
      depending on what ends up on disk when a torn write or partial tail
      overwrite occurs. For example, an invalid log record h_len can lead
      to an -EFSCORRUPTED error when running the log recovery CRC pass.
      
      Therefore, update log head and tail verification to trigger the
      associated head/tail fixups in the event of -EFSCORRUPTED errors
      along with -EFSBADCRC. Also, -EFSCORRUPTED can currently be returned
      from xlog_do_recovery_pass() before rhead_blk is initialized if the
      first record encountered happens to be corrupted. This leads to an
      incorrect 'first_bad' return value. Initialize rhead_blk earlier in
      the function to address that problem as well.
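      A minimal sketch of the widened error check (the fixup call is a
      stand-in for the real head/tail adjustment path, and the errno values
      are their usual Linux definitions):

      #include <stdio.h>

      #define EFSBADCRC       74      /* EBADMSG on Linux */
      #define EFSCORRUPTED    117     /* EUCLEAN on Linux */

      static int handle_verify_error(int error)
      {
              if (error == -EFSBADCRC || error == -EFSCORRUPTED) {
                      printf("running head/tail fixups\n");
                      return 0;               /* recovery keeps going */
              }
              return error;                   /* anything else stays fatal */
      }

      int main(void)
      {
              handle_verify_error(-EFSCORRUPTED);     /* previously treated as fatal */
              return 0;
      }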
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: fix log recovery corruption error due to tail overwrite · 4a4f66ea
      Authored by Brian Foster
      If we consider the case where the tail (T) of the log is pinned long
      enough for the head (H) to push and block behind the tail, we can
      end up blocked in the following state without enough free space (f)
      in the log to satisfy a transaction reservation:
      
      	0	phys. log	N
      	[-------HffT---H'--T'---]
      
      The last good record in the log (before H) refers to T. The tail
      eventually pushes forward (T') leaving more free space in the log
      for writes to H. At this point, suppose space frees up in the log
      for the maximum of 8 in-core log buffers to start flushing out to
      the log. If this pushes the head from H to H', these next writes
      overwrite the previous tail T. This is safe because the items logged
      from T to T' have been written back and removed from the AIL.
      
      If the next log writes (H -> H') happen to fail and result in
      partial records in the log, the filesystem shuts down having
      overwritten T with invalid data. Log recovery correctly locates H on
      the subsequent mount, but H still refers to the now corrupted tail
      T. This results in log corruption errors and recovery failure.
      
      Since the tail overwrite results from otherwise correct runtime
      behavior, it is up to log recovery to try to deal with this
      situation. Update log recovery tail verification to run a CRC pass
      from the first record past the tail to the head. This facilitates
      error detection at T and moves the recovery tail to the first good
      record past H' (similar to truncating the head on torn write
      detection). If corruption is detected beyond the range possibly
      affected by the max number of iclogs, the log is legitimately
      corrupted and log recovery failure is expected.
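      A toy model of that decision (the real window arithmetic in the kernel
      is more involved; this only illustrates moving the tail past a bad
      stretch that plausibly came from the in-flight log buffers, versus
      failing outright):

      #include <stdbool.h>
      #include <stdio.h>

      #define MAX_ICLOGS      8       /* max in-core log buffers in flight */

      int main(void)
      {
              /* crc_ok[i] models whether record i between tail and head verifies */
              bool crc_ok[] = { true, false, false, true, true, true, true, true };
              int nrecords = 8;
              int first_bad = -1;

              for (int i = 0; i < nrecords; i++) {
                      if (!crc_ok[i]) {
                              first_bad = i;
                              break;
                      }
              }

              if (first_bad < 0) {
                      printf("tail verified, recover normally\n");
              } else if (first_bad < MAX_ICLOGS) {
                      /* plausibly overwritten by in-flight buffers: skip past it */
                      int new_tail = first_bad;
                      while (new_tail < nrecords && !crc_ok[new_tail])
                              new_tail++;
                      printf("moving tail to first good record %d\n", new_tail);
              } else {
                      printf("corruption outside the overwrite window: fail\n");
              }
              return 0;
      }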
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: always verify the log tail during recovery · 5297ac1f
      Authored by Brian Foster
      Log tail verification currently only occurs when torn writes are
      detected at the head of the log. This was introduced because a
      change in the head block due to torn writes can lead to a change in
      the tail block (each log record header references the current tail)
      and the tail block should be verified before log recovery proceeds.
      
      Tail corruption is possible outside of torn write scenarios,
      however. For example, partial log writes can be detected and cleared
      during the initial head/tail block discovery process. If the partial
      write coincides with a tail overwrite, the log tail is corrupted and
      recovery fails.
      
      To facilitate correct handling of log tail overwrites, update log
      recovery to always perform tail verification. This is necessary to
      detect potential tail overwrite conditions when torn writes may not
      have occurred. This changes normal (i.e., no torn writes) recovery
      behavior slightly to detect and return CRC related errors near the
      tail before actual recovery starts.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: fix recovery failure when log record header wraps log end · 284f1c2c
      Authored by Brian Foster
      The high-level log recovery algorithm consists of two loops that
      walk the physical log and process log records from the tail to the
      head. The first loop handles the case where the tail is beyond the
      head and processes records up to the end of the physical log. The
      subsequent loop processes records from the beginning of the physical
      log to the head.
      
      Because log records can wrap around the end of the physical log, the
      first loop mentioned above must handle this case appropriately.
      Records are processed from in-core buffers, which means that this
      algorithm must split the reads of such records into two partial
      I/Os: 1.) from the beginning of the record to the end of the log and
      2.) from the beginning of the log to the end of the record. This is
      further complicated by the fact that the log record header and log
      record data are read into independent buffers.
      
      The current handling of each buffer correctly splits the reads when
      either the header or data starts before the end of the log and wraps
      around the end. The data read does not correctly handle the case
      where the prior header read wrapped or ends on the physical log end
      boundary. blk_no is incremented to or beyond the log end after the
      header read to point to the record data, but the split data read
      logic triggers, attempts to read from an invalid log block and
      ultimately causes log recovery to fail. This can be reproduced
      fairly reliably via xfstests tests generic/047 and generic/388 with
      large iclog sizes (256k) and small (10M) logs.
      
      If the record header read has pushed beyond the end of the physical
      log, the subsequent data read is actually contiguous. Update the
      data read logic to detect the case where blk_no has wrapped, mod it
      against the log size to read from the correct address and issue one
      contiguous read for the log data buffer. The log record is processed
      as normal from the buffer(s), the loop exits after the current
      iteration and the subsequent loop picks up with the first new record
      after the start of the log.
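      The address arithmetic of the fix looks roughly like the following
      (the values are invented; the real code works in basic blocks through
      the xlog buffer helpers):

      #include <stdio.h>

      int main(void)
      {
              long log_bblks = 20480;         /* 10M log in 512-byte basic blocks */
              long blk_no = 20478;            /* header began 2 blocks before the end */
              long hblks = 2;                 /* record header length */
              long bblks = 512;               /* 256k of record data */

              blk_no += hblks;                /* advance past the header: now 20480 */

              if (blk_no >= log_bblks) {
                      /* wrapped: the data is contiguous from the front of the log */
                      long data_blk = blk_no % log_bblks;
                      printf("one read of %ld blocks at block %ld\n", bblks, data_blk);
              } else {
                      /* unwrapped (or genuinely split) case handled as before */
                      printf("split read starting at block %ld\n", blk_no);
              }
              return 0;
      }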
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  9. 25 June 2017 (1 commit)
    • xfs: free uncommitted transactions during log recovery · 39775431
      Authored by Brian Foster
      Log recovery allocates in-core transaction and member item data
      structures on-demand as it processes the on-disk log. Transactions
      are allocated on first encounter on-disk and stored in a hash table
      structure where they are easily accessible for subsequent lookups.
      Transaction items are also allocated on demand and are attached to
      the associated transactions.
      
      When a commit record is encountered in the log, the transaction is
      committed to the fs and the in-core structures are freed. If a
      filesystem crashes or shuts down before all in-core log buffers are
      flushed to the log, however, not all transactions may have commit
      records in the log. As expected, the modifications in such an
      incomplete transaction are not replayed to the fs. The in-core data
      structures for the partial transaction are never freed, however,
      resulting in a memory leak.
      
      Update xlog_do_recovery_pass() to first correctly initialize the
      hash table array so empty lists can be distinguished from populated
      lists on function exit. Update xlog_recover_free_trans() to always
      remove the transaction from the list prior to freeing the associated
      memory. Finally, walk the hash table of transaction lists as the
      last step before it goes out of scope and free any transactions that
      may remain on the lists. This prevents a memory leak of partial
      transactions in the log.
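      A self-contained sketch of that cleanup (structure and function names
      are invented for the example, not the xlog_recover_* code): commits
      unlink and free a transaction, and whatever is still hanging off a
      bucket when the pass ends gets freed instead of leaking.

      #include <stdio.h>
      #include <stdlib.h>

      #define NBUCKETS 16

      struct rtrans {
              unsigned long tid;
              struct rtrans *next;
      };

      static struct rtrans *buckets[NBUCKETS];        /* zero-initialized: empty lists */

      static void trans_add(unsigned long tid)
      {
              struct rtrans *t = calloc(1, sizeof(*t));

              t->tid = tid;
              t->next = buckets[tid % NBUCKETS];
              buckets[tid % NBUCKETS] = t;
      }

      /* commit record seen: unlink from its bucket, then free */
      static void trans_commit(unsigned long tid)
      {
              struct rtrans **pp = &buckets[tid % NBUCKETS];

              while (*pp && (*pp)->tid != tid)
                      pp = &(*pp)->next;
              if (*pp) {
                      struct rtrans *t = *pp;
                      *pp = t->next;
                      free(t);
              }
      }

      /* end of the recovery pass: free anything that never saw a commit record */
      static void trans_free_leftovers(void)
      {
              for (int i = 0; i < NBUCKETS; i++) {
                      while (buckets[i]) {
                              struct rtrans *t = buckets[i];

                              buckets[i] = t->next;
                              printf("freeing uncommitted transaction %lu\n", t->tid);
                              free(t);
                      }
              }
      }

      int main(void)
      {
              trans_add(1);
              trans_add(2);
              trans_commit(1);                /* had a commit record in the log */
              trans_free_leftovers();         /* transaction 2 would otherwise leak */
              return 0;
      }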
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  10. 20 June 2017 (1 commit)
    • xfs: remove double-underscore integer types · c8ce540d
      Authored by Darrick J. Wong
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
  11. 05 June 2017 (1 commit)
  12. 09 May 2017 (1 commit)
  13. 05 December 2016 (3 commits)
  14. 08 November 2016 (1 commit)
  15. 05 October 2016 (2 commits)
  16. 04 October 2016 (2 commits)
  17. 26 September 2016 (5 commits)
    • xfs: log recovery tracepoints to track current lsn and buffer submission · 5cd9cee9
      Authored by Brian Foster
      Log recovery has particular rules around buffer submission along with
      tricky corner cases where independent transactions can share an LSN. As
      such, it can be difficult to follow when/why buffers are submitted
      during recovery.
      
      Add a couple of tracepoints to post the current LSN of a record when a new
      record is being processed and when a buffer is being skipped due to LSN
      ordering. Also, update the recover item class to include the LSN of the
      current transaction for the item being processed.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: update metadata LSN in buffers during log recovery · 60a4a222
      Authored by Brian Foster
      Log recovery is currently broken for v5 superblocks in that it never
      updates the metadata LSN of buffers written out during recovery. The
      metadata LSN is recorded in various bits of metadata to provide recovery
      ordering criteria that prevents transient corruption states reported by
      buffer write verifiers. Without such ordering logic, buffer updates can
      be replayed out of order and lead to false positive transient corruption
      states. This is generally not a corruption vector on its own, but
      corruption detection shuts down the filesystem and ultimately prevents a
      mount if it occurs during log recovery. This requires an xfs_repair run
      that clears the log and potentially loses filesystem updates.
      
      This problem is avoided in most cases as metadata writes during normal
      filesystem operation update the metadata LSN appropriately. The problem
      with log recovery not updating metadata LSNs manifests if the system
      happens to crash shortly after log recovery itself. In this scenario, it
      is possible for log recovery to complete all metadata I/O such that the
      filesystem is consistent. If a crash occurs after that point but before
      the log tail is pushed forward by subsequent operations, however, the
      next mount performs the same log recovery over again. If a buffer is
      updated multiple times in the dirty range of the log, an earlier update
      in the log might not be valid based on the current state of the
      associated buffer after all of the updates in the log had been replayed
      (before the previous crash). If a verifier happens to detect such a
      problem, the filesystem claims corruption and immediately shuts down.
      
      This commonly manifests in practice as directory block verifier failures
      such as the following, likely due to directory verifiers being
      particularly detailed in their checks as compared to most others:
      
        ...
        Mounting V5 Filesystem
        XFS (dm-0): Starting recovery (logdev: internal)
        XFS (dm-0): Internal error XFS_WANT_CORRUPTED_RETURN at line ... of \
          file fs/xfs/libxfs/xfs_dir2_data.c.  Caller xfs_dir3_data_verify ...
        ...
      
      Update log recovery to update the metadata LSN of recovered buffers.
      Since metadata LSNs are already updated by write verifier functions via
      attached log items, attach a dummy log item to the buffer during
      validation and explicitly set the LSN of the current transaction. This
      ensures that the metadata LSN of a buffer is updated based on whether
      the recovery I/O actually completes, and if so, that subsequent recovery
      attempts identify that the buffer is already up to date with respect to
      the current transaction.
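      The ordering rule being restored can be pictured with a toy model
      (names invented for the example): a change is replayed only when the
      buffer's recorded metadata LSN is older than the transaction's LSN,
      and replaying it bumps that recorded LSN so a second recovery run over
      the same log range skips it.

      #include <stdio.h>
      #include <stdint.h>

      struct buf {
              int64_t metadata_lsn;
      };

      static void recover_change(struct buf *bp, int64_t current_lsn)
      {
              if (bp->metadata_lsn >= current_lsn) {
                      printf("skip: buffer already at LSN %lld\n",
                             (long long)bp->metadata_lsn);
                      return;
              }
              printf("replay change at LSN %lld\n", (long long)current_lsn);
              bp->metadata_lsn = current_lsn; /* the update recovery was missing */
      }

      int main(void)
      {
              struct buf bp = { .metadata_lsn = 0 };

              recover_change(&bp, 100);       /* applied */
              recover_change(&bp, 200);       /* applied */
              /* crash, then a second recovery run over the same dirty range: */
              recover_change(&bp, 100);       /* now correctly skipped */
              return 0;
      }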
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: don't warn on buffers not being recovered due to LSN · 040c52c0
      Authored by Brian Foster
      The log recovery buffer validation function is invoked in cases where a
      buffer update may be skipped due to LSN ordering. If the validation
      function happens to come across directory conversion situations (e.g., a
      dir3 block to data conversion), it may warn about seeing a buffer log
      format of one type and a buffer with a magic number of another.
      
      This warning is not valid as the buffer update is ultimately skipped.
      This is indicated by a current_lsn of NULLCOMMITLSN provided by the
      caller. As such, update xlog_recover_validate_buf_type() to only warn in
      such cases when a buffer update is expected.
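      The condition boils down to something like the following sketch (the
      helper here is a simplified stand-in for the real
      xlog_recover_validate_buf_type() logic, with NULLCOMMITLSN written out
      as its usual all-ones value):

      #include <stdio.h>
      #include <stdint.h>

      #define NULLCOMMITLSN   ((int64_t)-1)   /* "this update will be skipped" */

      static void validate_buf_type(int64_t current_lsn, int magic_mismatch)
      {
              /* only warn when the buffer update is actually going to happen */
              if (magic_mismatch && current_lsn != NULLCOMMITLSN)
                      printf("warning: unexpected buffer type\n");
      }

      int main(void)
      {
              validate_buf_type(NULLCOMMITLSN, 1);    /* skipped update: stay quiet */
              validate_buf_type(42, 1);               /* real replay: warn */
              return 0;
      }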
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: pass current lsn to log recovery buffer validation · 22db9af2
      Authored by Brian Foster
      The current LSN must be available to the buffer validation function to
      provide the ability to update the metadata LSN of the buffer. Pass the
      current_lsn value down to xlog_recover_validate_buf_type() in
      preparation.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: rework log recovery to submit buffers on LSN boundaries · 12818d24
      Authored by Brian Foster
      The fix to log recovery to update the metadata LSN in recovered buffers
      introduces the requirement that a buffer is submitted only once per
      current LSN. Log recovery currently submits buffers on transaction
      boundaries. This is not sufficient as the abstraction between log
      records and transactions allows for various scenarios where multiple
      transactions can share the same current LSN. If independent transactions
      share an LSN and both modify the same buffer, log recovery can
      incorrectly skip updates and leave the filesystem in an inconsistent
      state.
      
      In preparation for proper metadata LSN updates during log recovery,
      update log recovery to submit buffers for write on LSN change boundaries
      rather than transaction boundaries. Explicitly track the current LSN in
      a new struct xlog field to handle the various corner cases of when the
      current LSN may or may not change.
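      The submission rule can be modelled in a few lines (the struct and
      function names are invented for the example; per the description above,
      the kernel tracks the current LSN in a new struct xlog field):

      #include <stdio.h>
      #include <stdint.h>

      struct recovery_state {
              int64_t current_lsn;
              int     buffers_pending;
      };

      static void submit_buffers(struct recovery_state *rs)
      {
              if (rs->buffers_pending) {
                      printf("submitting %d buffers for LSN %lld\n",
                             rs->buffers_pending, (long long)rs->current_lsn);
                      rs->buffers_pending = 0;
              }
      }

      static void process_trans(struct recovery_state *rs, int64_t lsn, int nbufs)
      {
              if (lsn != rs->current_lsn) {
                      submit_buffers(rs);     /* flush on the LSN boundary */
                      rs->current_lsn = lsn;
              }
              rs->buffers_pending += nbufs;   /* same LSN: keep accumulating */
      }

      int main(void)
      {
              struct recovery_state rs = { .current_lsn = -1, .buffers_pending = 0 };

              process_trans(&rs, 100, 2);
              process_trans(&rs, 100, 3);     /* independent trans, same LSN: no flush */
              process_trans(&rs, 200, 1);     /* LSN changed: the 5 buffers go out */
              submit_buffers(&rs);            /* final flush at end of recovery */
              return 0;
      }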
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
  18. 03 August 2016 (6 commits)
  19. 06 April 2016 (1 commit)