1. 04 1月, 2016 4 次提交
    • B
      xfs: return start block of first bad log record during recovery · d7f37692
      Brian Foster 提交于
      Each log recovery pass walks from the tail block to the head block and
      processes records appropriately based on the associated log pass type.
      There are various failure conditions that can occur through this
      sequence, such as I/O errors, CRC errors, etc. Log torn write detection
      will perform CRC verification near the head of the log to detect torn
      writes and trim torn records from the log appropriately.
      
      As it is, xlog_do_recovery_pass() only returns an error code in the
      event of CRC failure, which isn't enough information to trim the head of
      the log. Update xlog_do_recovery_pass() to optionally return the start
      block of the associated record when an error occurs. This patch contains
      no functional changes.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      d7f37692
    • B
      xfs: refactor and open code log record crc check · b94fb2d1
      Brian Foster 提交于
      Log record CRC verification currently occurs during active log recovery,
      immediately before a log record is unpacked. Therefore, the CRC
      calculation code is buried within the data unpack function. CRC
      verification pass support only needs to go so far as check the CRC, but
      this is not easily allowed as the code is currently organized.
      
      Since we now have a new log record processing helper, pull the record
      CRC verification code out from the unpack helper and open-code it at the
      top of the new process helper. This facilitates the ability to modify
      how records are processed based on the type of the current pass. This
      patch contains no functional changes.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      b94fb2d1
    • B
      xfs: refactor log record unpack and data processing · 9d94901f
      Brian Foster 提交于
      xlog_do_recovery_pass() duplicates a couple function calls related to
      processing log records because the function must handle wrapping around
      the end of the log if the head is behind the tail. This is implemented
      as separate loops. CRC verification pass support will modify how records
      are processed in both of these loops.
      
      Rather than continue to duplicate code, factor the calls that process a
      log record into a new helper and call that helper from both loops. This
      patch contains no functional changes.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      9d94901f
    • B
      xfs: detect and handle invalid iclog size set by mkfs · a70f9fe5
      Brian Foster 提交于
      XFS log records have separate fields for the record size and the iclog
      size used to write the record. mkfs.xfs zeroes the log and writes an
      unmount record to generate a clean log for the subsequent mount. The
      userspace record logging code has a bug where the iclog size (h_size)
      field of the log record is hardcoded to 32k, even if a log stripe unit
      is specified. The log record length is correctly extended to the stripe
      unit. Since the kernel log recovery code uses the h_size field to
      determine the log buffer size, this means that the kernel can attempt to
      read/process records larger than the buffer size and overrun the buffer.
      
      This has historically not been a problem because the kernel doesn't
      actually run through log recovery in the clean unmount case. Instead,
      the kernel detects that a single unmount record exists between the head
      and tail and pushes the tail forward such that the log is viewed as
      clean (head == tail). Once CRC verification is enabled, however, all
      records at the head of the log are verified for CRC errors and thus we
      are susceptible to overrun problems if the iclog field is not correct.
      
      While the core problem must be fixed in userspace, this is historical
      behavior that must be detected in the kernel to avoid severe side
      effects such as memory corruption and crashes. Update the log buffer
      size calculation code to detect this condition, warn the user and resize
      the log buffer based on the log stripe unit. Return a corruption error
      in cases where this does not look like a clean filesystem (i.e., the log
      record header indicates more than one operation).
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      a70f9fe5
  2. 10 11月, 2015 1 次提交
    • B
      xfs: fix log recovery op header validation assert · 848ccfc8
      Brian Foster 提交于
      Commit 89cebc84 ("xfs: validate transaction header length on log
      recovery") added additional validation of the on-disk op header length
      to protect from buffer overflow during log recovery. It accounts for the
      fact that the transaction header can be split across multiple op
      headers. It added an assert for when this occurs that verifies the
      length of the second part of a split transaction header is less than a
      full transaction header. In other words, it expects that the first op
      header of a split transaction header includes at least some portion of
      the transaction header.
      
      This expectation is not always valid as a zero-length op header can
      exist for the first op header of a split transaction header (see
      xlog_recover_add_to_trans() for details). This means that the second op
      header can have a valid, full length transaction header and thus the
      full header is copied in xlog_recover_add_to_cont_trans(). Fix the
      assert in xlog_recover_add_to_cont_trans() to handle this case correctly
      and require that the op header length is less than or equal to a full
      transaction header.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      848ccfc8
  3. 12 10月, 2015 1 次提交
    • B
      xfs: validate metadata LSNs against log on v5 superblocks · a45086e2
      Brian Foster 提交于
      Since the onset of v5 superblocks, the LSN of the last modification has
      been included in a variety of on-disk data structures. This LSN is used
      to provide log recovery ordering guarantees (e.g., to ensure an older
      log recovery item is not replayed over a newer target data structure).
      
      While this works correctly from the point a filesystem is formatted and
      mounted, userspace tools have some problematic behaviors that defeat
      this mechanism. For example, xfs_repair historically zeroes out the log
      unconditionally (regardless of whether corruption is detected). If this
      occurs, the LSN of the filesystem is reset and the log is now in a
      problematic state with respect to on-disk metadata structures that might
      have a larger LSN. Until either the log catches up to the highest
      previously used metadata LSN or each affected data structure is modified
      and written out without incident (which resets the metadata LSN), log
      recovery is susceptible to filesystem corruption.
      
      This problem is ultimately addressed and repaired in the associated
      userspace tools. The kernel is still responsible to detect the problem
      and notify the user that something is wrong. Check the superblock LSN at
      mount time and fail the mount if it is invalid. From that point on,
      trigger verifier failure on any metadata I/O where an invalid LSN is
      detected. This results in a filesystem shutdown and guarantees that we
      do not log metadata changes with invalid LSNs on disk. Since this is a
      known issue with a known recovery path, present a warning to instruct
      the user how to recover.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a45086e2
  4. 19 8月, 2015 7 次提交
    • D
      xfs: log recovery needs to validate against sb_meta_uuid · fcfbe2c4
      Dave Chinner 提交于
      Now that sb_uuid can be changed by the user, we cannot use this to
      validate the metadata blocks being recovered belong to this
      filesystem. We must check against the sb_meta_uuid as that will
      remain unchanged.
      
      There is a complication in this code - the superblock itself. We can
      not check the sb_meta_uuid unconditionally, as that may not be set
      on disk. Hence we must verify the superblock sb_uuid matches between
      the log record and the in-core superblock.
      
      Found by inspection after the previous two problems were found.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      fcfbe2c4
    • B
      xfs: fix broken icreate log item cancellation · fc0d1656
      Brian Foster 提交于
      Inode cluster buffers are invalidated and cancelled when inode chunks
      are freed to notify log recovery that previous logged updates to the
      metadata buffer should be skipped. This ensures that log recovery does
      not overwrite buffers that might have already been reused.
      
      On v4 filesystems, inode chunk allocation and inode updates are logged
      via the cluster buffers and thus cancellation is easily detected via
      buffer cancellation items. v5 filesystems use the new icreate
      transaction, which uses logical logging and ordered buffers to log a
      full inode chunk allocation at once. The resulting icreate item often
      spans multiple inode cluster buffers.
      
      Log recovery checks for cancelled buffers when processing icreate log
      items, but it has a couple problems. First, it uses the full length of
      the inode chunk rather than the cluster size. Second, it uses the length
      in FSB units rather than BB units. Either of these problems prevent
      icreate recovery from identifying cancelled buffers and thus inode
      initialization proceeds unconditionally.
      
      Update xlog_recover_do_icreate_pass2() to iterate the icreate range in
      cluster sized increments and check each increment for cancellation.
      Since icreate is currently only used for the minimum atomic inode chunk
      allocation, we expect that either all or none of the buffers will be
      cancelled. Cancel the icreate if at least one buffer is cancelled to
      avoid making a bad situation worse by initializing a partial inode
      chunk, but detect such anomalies and warn the user.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      fc0d1656
    • B
      xfs: icreate log item recovery and cancellation tracepoints · 78d57e45
      Brian Foster 提交于
      Various log items have recovery tracepoints to identify whether a
      particular log item is recovered or cancelled. Add the equivalent
      tracepoints for the icreate transaction.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      78d57e45
    • B
      xfs: don't leave EFIs on AIL on mount failure · f0b2efad
      Brian Foster 提交于
      Log recovery occurs in two phases at mount time. In the first phase,
      EFIs and EFDs are processed and potentially cancelled out. EFIs without
      EFD objects are inserted into the AIL for processing and recovery in the
      second phase. xfs_mountfs() runs various other operations between the
      phases and is thus subject to failure. If failure occurs after the first
      phase but before the second, pending EFIs sit on the AIL, pin it and
      cause the mount to hang.
      
      Update the mount sequence to ensure that pending EFIs are cancelled in
      the event of failure. Add a recovery cancellation mechanism to iterate
      the AIL and cancel all EFI items when requested. Plumb cancellation
      support through the log mount finish helper and update xfs_mountfs() to
      invoke cancellation in the event of failure after recovery has started.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f0b2efad
    • B
      xfs: use EFI refcount consistently in log recovery · e32a1d1f
      Brian Foster 提交于
      The EFI is initialized with a reference count of 2. One for the EFI to
      ensure the item makes it to the AIL and one for the subsequently created
      EFD to release the EFI once the EFD is committed. Log recovery uses the
      EFI in a similar manner, but implements a hack to remove both references
      in one call once the EFD is handled.
      
      Update log recovery to use EFI reference counting in a manner consistent
      with the log. When an EFI is encountered during recovery, an EFI item is
      allocated and inserted to the AIL directly. Since the EFI reference is
      typically dropped when the EFI is unpinned and this is analogous with
      AIL insertion, drop the EFI reference at this point.
      
      When a corresponding EFD is encountered in the log, this indicates that
      the extents were freed, no processing is required and the EFI can be
      dropped. Update xlog_recover_efd_pass2() to simply drop the EFD
      reference at this point rather than open code the AIL removal and EFI
      free.
      
      Remaining EFIs (i.e., with no corresponding EFD) are processed in
      xlog_recover_finish(). An EFD transaction is allocated and the extents
      are freed, which transfers ownership of the EFI reference to the EFD
      item in the log.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e32a1d1f
    • B
      xfs: ensure EFD trans aborts on log recovery extent free failure · 6bc43af3
      Brian Foster 提交于
      Log recovery attempts to free extents with leftover EFIs in the AIL
      after initial processing. If the extent free fails (e.g., due to
      unrelated fs corruption), the transaction is cancelled, though it
      might not be dirtied at the time. If this is the case, the EFD does
      not abort and thus does not release the EFI. This can lead to hangs
      as the EFI pins the AIL.
      
      Update xlog_recover_process_efi() to log the EFD in the transaction
      before xfs_free_extent() errors are handled to ensure the
      transaction is dirty, aborts the EFD and releases the EFI on error.
      Since this is a requirement for EFD processing (and consistent with
      xfs_bmap_finish()), update the EFD logging helper to do the extent
      free and unconditionally log the EFD. This encodes the required EFD
      logging behavior into the helper and reduces the likelihood of
      errors down the road.
      
      [dchinner: re-add xfs_alloc.h to xfs_log_recover.c to fix build
       failure.]
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      6bc43af3
    • B
      xfs: disentagle EFI release from the extent count · 5e4b5386
      Brian Foster 提交于
      Release of the EFI either occurs based on the reference count or the
      extent count. The extent count used is either the count tracked in
      the EFI or EFD, depending on the particular situation. In either
      case, the count is initialized to the final value and thus always
      matches the current efi_next_extent value once the EFI is completely
      constructed.  For example, the EFI extent count is increased as the
      extents are logged in xfs_bmap_finish() and the full free list is
      always completely processed. Therefore, the count is guaranteed to
      be complete once the EFI transaction is committed. The EFD uses the
      efd_nextents counter to release the EFI. This counter is initialized
      to the count of the EFI when the EFD is created. Thus the EFD, as
      currently used, has no concept of partial EFI release based on
      extent count.
      
      Given that the EFI extent count is always released in whole, use of
      the extent count for reference counting is unnecessary. Remove this
      level of the API and release the EFI based on the core reference
      count. The efi_next_extent counter remains because it is still used
      to track the slot to log the next extent to free.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5e4b5386
  5. 29 7月, 2015 3 次提交
    • J
      xfs: Use consistent logging message prefixes · f41febd2
      Joe Perches 提交于
      The second and subsequent lines of multi-line logging messages
      are not prefixed with the same information as the first line.
      
      Separate messages with newlines into multiple calls to ensure
      consistent prefixing and allow easier grep use.
      Signed-off-by: NJoe Perches <joe@perches.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      f41febd2
    • B
      xfs: validate transaction header length on log recovery · 89cebc84
      Brian Foster 提交于
      When log recovery hits a new transaction, it copies the transaction
      header from the expected location in the log to the in-core structure
      using the length from the op record header. This length is validated to
      ensure it doesn't exceed the length of the record, but not against the
      expected size of a transaction header (and thus the size of the in-core
      structure). If the on-disk length is corrupted, the associated memcpy()
      can overflow, write to unrelated memory and lead to crashes. This has
      been reproduced via filesystem fuzzing.
      
      The code currently handles the possibility that the transaction header
      is split across two op records. Neither instance accounts for corruption
      where the op record length might be larger than the in-core transaction
      header. Update both sites to detect such corruption, warn and return an
      error from log recovery. Also add some comments and assert that if the
      record is split, the copy of the second portion is less than a full
      header. Otherwise, this suggests the copy of the second portion could
      have overwritten bits from the first and thus that something could be
      wrong.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      89cebc84
    • D
      xfs: remote attribute headers contain an invalid LSN · e3c32ee9
      Dave Chinner 提交于
      In recent testing, a system that crashed failed log recovery on
      restart with a bad symlink buffer magic number:
      
      XFS (vda): Starting recovery (logdev: internal)
      XFS (vda): Bad symlink block magic!
      XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2060
      
      On examination of the log via xfs_logprint, none of the symlink
      buffers in the log had a bad magic number, nor were any other types
      of buffer log format headers mis-identified as symlink buffers.
      Tracing was used to find the buffer the kernel was tripping over,
      and xfs_db identified it's contents as:
      
      000: 5841524d 00000000 00000346 64d82b48 8983e692 d71e4680 a5f49e2c b317576e
      020: 00000000 00602038 00000000 006034ce d0020000 00000000 4d4d4d4d 4d4d4d4d
      040: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
      060: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
      .....
      
      This is a remote attribute buffer, which are notable in that they
      are not logged but are instead written synchronously by the remote
      attribute code so that they exist on disk before the attribute
      transactions are committed to the journal.
      
      The above remote attribute block has an invalid LSN in it - cycle
      0xd002000, block 0 - which means when log recovery comes along to
      determine if the transaction that writes to the underlying block
      should be replayed, it sees a block that has a future LSN and so
      does not replay the buffer data in the transaction. Instead, it
      validates the buffer magic number and attaches the buffer verifier
      to it.  It is this buffer magic number check that is failing in the
      above assert, indicating that we skipped replay due to the LSN of
      the underlying buffer.
      
      The problem here is that the remote attribute buffers cannot have a
      valid LSN placed into them, because the transaction that contains 
      the attribute tree pointer changes and the block allocation that the
      attribute data is being written to hasn't yet been committed. Hence
      the LSN field in the attribute block is completely unwritten,
      thereby leaving the underlying contents of the block in the LSN
      field. It could have any value, and hence a future overwrite of the
      block by log recovery may or may not work correctly.
      
      Fix this by always writing an invalid LSN to the remote attribute
      block, as any buffer in log recovery that needs to write over the
      remote attribute should occur. We are protected from having old data
      written over the attribute by the fact that freeing the block before
      the remote attribute is written will result in the buffer being
      marked stale in the log and so all changes prior to the buffer stale
      transaction will be cancelled by log recovery.
      
      Hence it is safe to ignore the LSN in the case or synchronously
      written, unlogged metadata such as remote attribute blocks, and to
      ensure we do that correctly, we need to write an invalid LSN to all
      remote attribute blocks to trigger immediate recovery of metadata
      that is written over the top.
      
      As a further protection for filesystems that may already have remote
      attribute blocks with bad LSNs on disk, change the log recovery code
      to always trigger immediate recovery of metadata over remote
      attribute blocks.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e3c32ee9
  6. 22 6月, 2015 2 次提交
  7. 04 6月, 2015 2 次提交
    • C
      xfs: saner xfs_trans_commit interface · 70393313
      Christoph Hellwig 提交于
      The flags argument to xfs_trans_commit is not useful for most callers, as
      a commit of a transaction without a permanent log reservation must pass
      0 here, and all callers for a transaction with a permanent log reservation
      except for xfs_trans_roll must pass XFS_TRANS_RELEASE_LOG_RES.  So remove
      the flags argument from the public xfs_trans_commit interfaces, and
      introduce low-level __xfs_trans_commit variant just for xfs_trans_roll
      that regrants a log reservation instead of releasing it.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      70393313
    • C
      xfs: remove the flags argument to xfs_trans_cancel · 4906e215
      Christoph Hellwig 提交于
      xfs_trans_cancel takes two flags arguments: XFS_TRANS_RELEASE_LOG_RES and
      XFS_TRANS_ABORT.  Both of them are a direct product of the transaction
      state, and can be deducted:
      
       - any dirty transaction needs XFS_TRANS_ABORT to be properly canceled,
         and XFS_TRANS_ABORT is a noop for a transaction that is not dirty.
       - any transaction with a permanent log reservation needs
         XFS_TRANS_RELEASE_LOG_RES to be properly canceled, and passing
         XFS_TRANS_RELEASE_LOG_RES for a transaction without a permanent
         log reservation is invalid.
      
      So just remove the flags argument and do the right thing.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4906e215
  8. 29 5月, 2015 2 次提交
  9. 23 2月, 2015 1 次提交
  10. 04 12月, 2014 1 次提交
  11. 28 11月, 2014 3 次提交
  12. 02 10月, 2014 2 次提交
    • D
      xfs: introduce xfs_buf_submit[_wait] · 595bff75
      Dave Chinner 提交于
      There is a lot of cookie-cutter code that looks like:
      
      	if (shutdown)
      		handle buffer error
      	xfs_buf_iorequest(bp)
      	error = xfs_buf_iowait(bp)
      	if (error)
      		handle buffer error
      
      spread through XFS. There's significant complexity now in
      xfs_buf_iorequest() to specifically handle this sort of synchronous
      IO pattern, but there's all sorts of nasty surprises in different
      error handling code dependent on who owns the buffer references and
      the locks.
      
      Pull this pattern into a single helper, where we can hide all the
      synchronous IO warts and hence make the error handling for all the
      callers much saner. This removes the need for a special extra
      reference to protect IO completion processing, as we can now hold a
      single reference across dispatch and waiting, simplifying the sync
      IO smeantics and error handling.
      
      In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
      make it explicitly handle on asynchronous IO. This forces all users
      to be switched specifically to one interface or the other and
      removes any ambiguity between how the interfaces are to be used. It
      also means that xfs_buf_iowait() goes away.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      595bff75
    • D
      xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality · e8aaba9a
      Dave Chinner 提交于
      We do some work in xfs_buf_ioend, and some work in
      xfs_buf_iodone_work, but much of that functionality is the same.
      This work can all be done in a single function, leaving
      xfs_buf_iodone just a wrapper to determine if we should execute it
      by workqueue or directly. hence rename xfs_buf_iodone_work to
      xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
      need async processing.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e8aaba9a
  13. 29 9月, 2014 5 次提交
  14. 09 9月, 2014 2 次提交
    • E
      xfs: deduplicate xlog_do_recovery_pass() · 970fd3f0
      Eric Sandeen 提交于
      In xlog_do_recovery_pass(), there are 2 distinct cases:
      non-wrapped and wrapped log recovery.
      
      If we find a wrapped log, we recover around the end
      of the log, and then handle the rest of recovery
      exactly as in the non-wrapped case - using exactly the same
      (duplicated) code.
      
      Rather than having the same code in both cases, we can
      get the wrapped portion out of the way first if needed,
      and then recover the non-wrapped portion of the log.
      
      There should be no functional change here, just code
      reorganization & deduplication.
      
      The patch looks a bit bigger than it really is; the last
      hunk is whitespace changes (un-indenting).
      
      Tested with xfstests "check -g log" on a stock configuration.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      970fd3f0
    • B
      xfs: export log_recovery_delay to delay mount time log recovery · 2e227178
      Brian Foster 提交于
      XFS log recovery has been discovered to have race conditions with
      buffers when I/O errors occur. External tools are available to simulate
      I/O errors to XFS, but this alone is not sufficient for testing log
      recovery. XFS unconditionally resets the inactive region of the log
      prior to log recovery to avoid confusion over processing any partially
      written log records that might have been written before an unclean
      shutdown. Therefore, unconditional write I/O failures at mount time are
      caught by the reset sequence rather than log recovery and hinder the
      ability to test the latter.
      
      The device-mapper dm-flakey module uses an up/down timer to define a
      cycle for when to fail I/Os. Create a pre log recovery delay tunable
      that can be used to coordinate XFS log recovery with I/O errors
      simulated by dm-flakey. This facilitates coordination in userspace that
      allows the reset of stale log blocks to succeed and writes due to log
      recovery to fail. For example, define a dm-flakey instance with an
      uptime long enough to allow log reset to succeed and a log recovery
      delay long enough to allow the dm-flakey uptime to expire.
      
      The 'log_recovery_delay' sysfs tunable is exported under
      /sys/fs/xfs/debug and is only enabled for kernels compiled in XFS debug
      mode. The value is exported in units of seconds and allows for a delay
      of up to 60 seconds. Note that this is for XFS debug and test
      instrumentation purposes only and should not be used by applications. No
      delay is enabled by default.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2e227178
  15. 04 8月, 2014 2 次提交
    • D
      xfs: dquot recovery needs verifiers · ad3714b8
      Dave Chinner 提交于
      dquot recovery should add verifiers to the dquot buffers that it
      recovers changes into. Unfortunately, it doesn't attached the
      verifiers to the buffers in a consistent manner. For example,
      xlog_recover_dquot_pass2() reads dquot buffers without a verifier
      and then writes it without ever having attached a verifier to the
      buffer.
      
      Further, dquot buffer recovery may write a dquot buffer that has not
      been modified, or indeed, shoul dbe written because quotas are not
      enabled and hence changes to the buffer were not replayed. In this
      case, we again write buffers without verifiers attached because that
      doesn't happen until after the buffer changes have been replayed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ad3714b8
    • D
      xfs: ensure verifiers are attached to recovered buffers · 67dc288c
      Dave Chinner 提交于
      Crash testing of CRC enabled filesystems has resulted in a number of
      reports of bad CRCs being detected after the filesystem was mounted.
      Errors such as the following were being seen:
      
      XFS (sdb3): Mounting V5 Filesystem
      XFS (sdb3): Starting recovery (logdev: internal)
      XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1
      XFS (sdb3): Unmount and run xfs_repair
      XFS (sdb3): First 64 bytes of corrupted metadata buffer:
      ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40  XAGF...........@
      ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01  ..mS..w.........
      ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
      ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00  ................
      XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1
      
      The errors were typically being seen in AGF, AGI and their related
      btree block buffers some time after log recovery had run. Often it
      wasn't until later subsequent mounts that the problem was
      discovered. The common symptom was a buffer with the correct
      contents, but a CRC and an LSN that matched an older version of the
      contents.
      
      Some debug added to _xfs_buf_ioapply() indicated that buffers were
      being written without verifiers attached to them from log recovery,
      and Jan Kara isolated the cause to log recovery readahead an dit's
      interactions with buffers that had a more recent LSN on disk than
      the transaction being recovered. In this case, the buffer did not
      get a verifier attached, and os when the second phase of log
      recovery ran and recovered EFIs and unlinked inodes, the buffers
      were modified and written without the verifier running. Hence they
      had up to date contents, but stale LSNs and CRCs.
      
      Fix it by attaching verifiers to buffers we skip due to future LSN
      values so they don't escape into the buffer cache without the
      correct verifier attached.
      
      This patch is based on analysis and a patch from Jan Kara.
      
      cc: <stable@vger.kernel.org>
      Reported-by: NJan Kara <jack@suse.cz>
      Reported-by: NFanael Linithien <fanael4@gmail.com>
      Reported-by: NGrozdan <neutrino8@gmail.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      67dc288c
  16. 25 6月, 2014 1 次提交
    • D
      xfs: global error sign conversion · 2451337d
      Dave Chinner 提交于
      Convert all the errors the core XFs code to negative error signs
      like the rest of the kernel and remove all the sign conversion we
      do in the interface layers.
      
      Errors for conversion (and comparison) found via searches like:
      
      $ git grep " E" fs/xfs
      $ git grep "return E" fs/xfs
      $ git grep " E[A-Z].*;$" fs/xfs
      
      Negation points found via searches like:
      
      $ git grep "= -[a-z,A-Z]" fs/xfs
      $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
      $ git grep " -[a-z].*;" fs/xfs
      
      [ with some bits I missed from Brian Foster ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2451337d
  17. 22 6月, 2014 1 次提交