1. 03 Aug, 2016: 4 commits
  2. 06 Apr, 2016: 2 commits
  3. 07 Mar, 2016: 5 commits
    • xfs: reinitialise per-AG structures if geometry changes during recovery · a798011c
      By Dave Chinner
      If a crash occurs immediately after a filesystem grow operation, the
      updated superblock geometry is found only in the log. After we
      recover the log, the superblock is reread and re-initialised and so
      has the new geometry in memory. If the new geometry has more AGs
      than prior to the grow operation, then the new AGs will not have
      in-memory xfs_perag structures associated with them.
      
      This will result in an oops when the first metadata buffer from a
      new AG is looked up in the buffer cache, as the block lies within
      the new geometry but then fails to find a perag structure on lookup.
      This is easily fixed by simply re-initialising the perag structures
      after re-reading the superblock at the conclusion of the first phase
      of log recovery.
      
      This, however, does not fix the case of log recovery requiring
      access to metadata in the newly grown space. Fortunately for us,
      because the in-core superblock has not been updated, this will
      result in detection of access beyond the end of the filesystem
      and so recovery will fail at that point. If this proves to be
      a problem, then we can address it separately from the currently
      reported issue.
      Reported-by: Alex Lyakas <alex@zadarastorage.com>
      Tested-by: Alex Lyakas <alex@zadarastorage.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
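      The fix boils down to making sure per-AG state exists for every AG in the
      re-read geometry. A minimal user-space sketch of that idea, with hypothetical
      types and names rather than the kernel's per-AG initialisation API:

```c
#include <stdlib.h>

struct perag {
    unsigned int agno;
    /* ... per-AG state: counters, reservations, lookup links ... */
};

struct mount {
    unsigned int  agcount;  /* number of AGs tracked in memory */
    struct perag **pag;     /* agno -> per-AG structure */
};

/* Re-initialise per-AG structures after re-reading the superblock. */
static int reinit_perag(struct mount *mp, unsigned int new_agcount)
{
    if (new_agcount <= mp->agcount)
        return 0;                       /* geometry did not grow */

    struct perag **tmp = realloc(mp->pag, new_agcount * sizeof(*tmp));
    if (!tmp)
        return -1;
    mp->pag = tmp;

    /* Allocate per-AG state only for the newly added AGs. */
    for (unsigned int agno = mp->agcount; agno < new_agcount; agno++) {
        mp->pag[agno] = calloc(1, sizeof(struct perag));
        if (!mp->pag[agno])
            return -1;
        mp->pag[agno]->agno = agno;
    }
    mp->agcount = new_agcount;
    return 0;
}
```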
    • xfs: only run torn log write detection on dirty logs · 7f6aff3a
      By Brian Foster
      XFS uses CRC verification over a sub-range of the head of the log to
      detect and handle torn writes. This torn log write detection currently
      runs unconditionally at mount time, regardless of whether the log is
      dirty or clean. This is problematic in cases where a filesystem might
      end up being moved across different, incompatible (i.e., opposite
      byte-endianness) architectures.
      
      The problem lies in the fact that log data is not necessarily written in
      an architecture independent format. For example, certain bits of data
      are written in native endian format. Further, the size of certain log
      data structures differs (e.g., struct xlog_rec_header) depending on the
      word size of the CPU. This leads to false positive CRC verification
      errors and ultimately failed mounts when a cleanly unmounted filesystem
      is mounted on a system with an incompatible architecture from data that
      was written near the head of the log.
      
      Update the log head/tail discovery code to run torn write detection only
      when the log is not clean. This means something other than an unmount
      record resides at the head of the log and log recovery is imminent. It
      is a requirement to run log recovery on the same type of host that had
      written the content of the dirty log and therefore CRC failures are
      legitimate corruptions in that scenario.
      Reported-by: Jan Beulich <JBeulich@suse.com>
      Tested-by: Jan Beulich <JBeulich@suse.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
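      A control-flow sketch of the change, with placeholder helper names rather
      than the real kernel functions: torn write verification runs only when the
      record at the head is not a clean unmount record.

```c
/* Assumed helpers; placeholders for the real head/tail discovery code. */
int find_log_head_tail(int *head_blk, int *tail_blk);
int head_is_unmount_record(int head_blk);
int verify_torn_head(int *head_blk, int tail_blk);  /* may walk the head back */

int discover_head_tail(int *head_blk, int *tail_blk, int *clean)
{
    int error = find_log_head_tail(head_blk, tail_blk);
    if (error)
        return error;

    *clean = head_is_unmount_record(*head_blk);
    /*
     * A clean log may have been written on a host with a different
     * endianness or word size, so its CRCs cannot be trusted here.
     * A dirty log must be recovered on a compatible host, so only
     * then are CRC mismatches evidence of torn writes.
     */
    if (*clean)
        return 0;

    return verify_torn_head(head_blk, *tail_blk);
}
```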
    • xfs: refactor in-core log state update to helper · 717bc0eb
      By Brian Foster
      Once the record at the head of the log is identified and verified, the
      in-core log state is updated based on the record. This includes
      information such as the current head block and cycle, the start block of
      the last record written to the log, the tail lsn, etc.
      
      Once torn write detection is conditional, this logic will need to be
      reused. Factor the code to update the in-core log data structures into a
      new helper function. This patch does not change behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
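      A sketch of the shape of the factored-out helper; the struct fields below
      are simplified stand-ins for the in-core log and the on-disk record header,
      not the real layouts:

```c
struct xlog_state {
    int       head_blk;     /* first block after the last record */
    int       head_cycle;   /* cycle number at the head */
    int       last_blk;     /* start block of the last record written */
    long long tail_lsn;     /* tail LSN taken from the head record */
};

struct rec_header {         /* stand-in for struct xlog_rec_header */
    int       h_cycle;
    long long h_tail_lsn;
};

/* Update in-core log state from the verified record at the log head. */
static void xlog_set_state(struct xlog_state *log, int head_blk,
                           const struct rec_header *rhead, int rhead_blk)
{
    log->last_blk   = rhead_blk;
    log->head_blk   = head_blk;
    log->head_cycle = rhead->h_cycle;
    log->tail_lsn   = rhead->h_tail_lsn;
}
```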
    • xfs: refactor unmount record detection into helper · 65b99a08
      By Brian Foster
      Once the mount sequence has identified the head and tail blocks of the
      physical log, the record at the head of the log is located and examined
      for an unmount record to determine if the log is clean. This currently
      occurs after torn write verification of the head region of the log.
      
      This must ultimately be separated from torn write verification and may
      need to be called again if the log head is walked back due to a torn
      write (to determine whether the new head record is an unmount record).
      Separate this logic into a new helper function. This patch does not
      change behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
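      A sketch of the predicate being separated out, using simplified stand-ins
      for the record and op headers:

```c
struct rec_header {
    int h_num_logops;   /* number of operations in this record */
};

struct op_header {
    int oh_unmount;     /* non-zero for the unmount op (simplified) */
};

/*
 * A clean log ends with a record containing exactly one operation:
 * the unmount record. Anything else means recovery is required.
 */
static int xlog_is_unmount_rec(const struct rec_header *rhead,
                               const struct op_header *op)
{
    return rhead->h_num_logops == 1 && op->oh_unmount;
}
```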
    • xfs: separate log head record discovery from verification · 82ff6cc2
      By Brian Foster
      The code that locates the log record at the head of the log is buried in
      the log head verification function. This is fine when torn write
      verification occurs unconditionally, but this behavior is problematic
      for filesystems that might be moved across systems with different
      architectures.
      
      In preparation for separating examination of the log head for unmount
      records from torn write detection, lift the record location logic out of
      the log verification function and into the caller. This patch does not
      change behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
  4. 10 Feb, 2016: 5 commits
  5. 09 Feb, 2016: 6 commits
  6. 08 Feb, 2016: 1 commit
  7. 12 Jan, 2016: 1 commit
    • xfs: handle dquot buffer readahead in log recovery correctly · 7d6a13f0
      By Dave Chinner
      When we do dquot readahead in log recovery, we do not use a verifier
      as the underlying buffer may not have dquots in it. e.g. the
      allocation operation hasn't yet been replayed. Hence we do not want
      to fail recovery because we detect an operation to be replayed has
      not been run yet. This problem was addressed for inodes in commit
      d8914002 ("xfs: inode buffers may not be valid during recovery
      readahead") but the problem was not recognised to exist for dquots
      and their buffers as the dquot readahead did not have a verifier.
      
      The result of not using a verifier is that when the buffer is then
      next read to replay a dquot modification, the dquot buffer verifier
      will only be attached to the buffer if *readahead is not complete*.
      Hence we can read the buffer, replay the dquot changes and then add
      it to the delwri submission list without it having a verifier
      attached to it. This then generates warnings in xfs_buf_ioapply(),
      which catches and warns about this case.
      
      Fix this and make it handle the same readahead verifier error cases
      as for inode buffers by adding a new readahead verifier that has a
      write operation as well as a read operation that marks the buffer as
      not done if any corruption is detected.  Also make sure we don't run
      readahead if the dquot buffer has been marked as cancelled by
      recovery.
      
      This will result in readahead either succeeding and the buffer
      having a valid write verifier, or readahead failing and the buffer
      state requiring the subsequent read to resubmit the IO with the new
      verifier.  In either case, this will result in the buffer always
      ending up with a valid write verifier on it.
      
      Note: we also need to fix the inode buffer readahead error handling
      to mark the buffer with EIO. Brian noticed the code I copied from
      there wrong during review, so fix it at the same time. Add comments
      linking the two functions that handle readahead verifier errors
      together so we don't forget this behavioural link in future.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
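      A simplified user-space model of the readahead verifier behaviour described
      above (these are stand-in types and names, not the kernel xfs_buf API): on
      suspected corruption the buffer is marked not-done with an I/O error, so the
      next real read resubmits the I/O with the proper verifier attached.

```c
#include <errno.h>

struct buf;

struct buf_ops {
    void (*verify_read)(struct buf *bp);
    void (*verify_write)(struct buf *bp);
};

struct buf {
    int   done;     /* contents have been read and verified */
    int   error;    /* last I/O error */
    const struct buf_ops *ops;
};

int  dquot_buf_verify(struct buf *bp);        /* assumed: 0 means "looks corrupt" */
void dquot_buf_write_verify(struct buf *bp);  /* assumed: full write verifier */

static void dquot_buf_readahead_verify(struct buf *bp)
{
    if (!dquot_buf_verify(bp)) {
        /*
         * The dquots may simply not exist yet because their allocation
         * has not been replayed, so do not fail recovery. Mark the
         * buffer not-done with an error so the next read resubmits the
         * I/O with the real verifier attached.
         */
        bp->error = -EIO;
        bp->done = 0;
    }
}

/* Readahead ops: tolerant read verifier, normal write verifier. */
static const struct buf_ops dquot_buf_ra_ops = {
    .verify_read  = dquot_buf_readahead_verify,
    .verify_write = dquot_buf_write_verify,
};
```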
  8. 05 Jan, 2016: 1 commit
    • xfs: detect and trim torn writes during log recovery · 7088c413
      By Brian Foster
      Certain types of storage, such as persistent memory, do not provide
      sector atomicity for writes. This means that if a crash occurs while XFS
      is writing log records, only part of those records might make it to the
      storage. This is problematic because log recovery uses the cycle value
      packed at the top of each log block to locate the head/tail of the log.
      This can lead to CRC verification failures during log recovery and an
      unmountable fs for a filesystem that is otherwise consistent.
      
      Update log recovery to incorporate log record CRC verification as part
      of the head/tail discovery process. Once the head is located via the
      traditional algorithm, run a CRC-only pass over the records up to the
      head of the log. If CRC verification fails, assume that the records are
      torn as a matter of policy and trim the head block back to the start of
      the first bad record.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
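      A sketch of the trim policy, with hypothetical helper names: run a CRC-only
      pass over the records leading up to the located head and, on the first
      failure, walk the head back to the start of the bad record.

```c
/* Assumed: CRC-only pass; stores the start block of the first bad record. */
int crc_verify_pass(int tail_blk, int head_blk, int *first_bad_blk);

/*
 * Trim torn writes from the head of the log: any record that fails CRC
 * verification on the way to the located head is assumed torn, and the
 * head is walked back to the start of the first bad record.
 */
int trim_torn_head(int tail_blk, int *head_blk)
{
    int first_bad = 0;
    int error = crc_verify_pass(tail_blk, *head_blk, &first_bad);

    if (error == 0)
        return 0;           /* every record verified cleanly */
    if (first_bad == 0)
        return error;       /* hard error, not a torn write */

    *head_blk = first_bad;  /* policy: treat the bad record as torn */
    return 0;
}
```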
  9. 04 Jan, 2016: 6 commits
    • xfs: refactor log record start detection into a new helper · eed6b462
      By Brian Foster
      As part of the head/tail discovery process, log recovery locates the
      head block and then reverse seeks to find the start of the last active
      record in the log. This is non-trivial as the record itself could have
      wrapped around the end of the physical log. Log recovery torn write
      detection potentially needs to walk further behind the last record in
      the log, as multiple log I/Os can be in-flight at one time during a
      crash event.
      
      Therefore, refactor the reverse log record header search mechanism into
      a new helper that supports the ability to seek past an arbitrary number
      of log records (or until the tail is hit). Update the head/tail search
      mechanism to call the new helper, but otherwise there is no change in
      log recovery behavior.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
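      A much-simplified sketch of the generalized reverse search (the log is
      treated as a flat block array and wrap-around is ignored; the block reader
      and magic value are placeholders): walk backwards from a start block until
      a given number of record headers has been seen or the tail is reached.

```c
#include <string.h>

#define BLOCK_SIZE   512
#define HEADER_MAGIC "XLOG"     /* stand-in for the real record magic */

int read_block(int blk, char *buf);     /* assumed block reader */

/*
 * Seek backwards from start_blk towards tail_blk, stopping once count
 * record headers have been found or the tail is hit. Returns the number
 * of headers found and stores the block of the last one found.
 */
int rseek_rec_headers(int start_blk, int tail_blk, int count, int *found_blk)
{
    char buf[BLOCK_SIZE];
    int found = 0;

    for (int blk = start_blk; blk >= tail_blk && found < count; blk--) {
        if (read_block(blk, buf))
            break;
        if (!memcmp(buf, HEADER_MAGIC, 4)) {
            *found_blk = blk;
            found++;
        }
    }
    return found;
}
```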
    • xfs: support a crc verification only log record pass · 6528250b
      By Brian Foster
      Log recovery torn write detection uses CRC verification over a range of
      the active log to identify torn writes. Since the generic log recovery
      pass code implements a superset of the functionality required for CRC
      verification, it can be easily modified to support a CRC verification
      only pass.
      
      Create a new CRC pass type and update the log record processing helper
      to skip everything beyond CRC verification when in this mode. This pass
      will be invoked in subsequent patches to implement torn write detection.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
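      A sketch of how a pass type can short-circuit record processing; the enum
      and helper names are illustrative, not the kernel's:

```c
enum xlog_pass {
    XLOG_PASS1,     /* build cancellation tables, etc. */
    XLOG_PASS2,     /* replay items */
    XLOG_CRC_PASS,  /* verify record CRCs only */
};

struct rec_header;

/* Assumed helpers. */
int xlog_verify_crc(const struct rec_header *rhead, const char *data);
int xlog_unpack_and_replay(const struct rec_header *rhead, char *data,
                           enum xlog_pass pass);

/* Process one record according to the current pass type. */
int xlog_process_record(const struct rec_header *rhead, char *data,
                        enum xlog_pass pass)
{
    int error = xlog_verify_crc(rhead, data);

    /* The CRC-only pass stops here; other passes continue on success. */
    if (error || pass == XLOG_CRC_PASS)
        return error;

    return xlog_unpack_and_replay(rhead, data, pass);
}
```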
    • xfs: return start block of first bad log record during recovery · d7f37692
      By Brian Foster
      Each log recovery pass walks from the tail block to the head block and
      processes records appropriately based on the associated log pass type.
      There are various failure conditions that can occur through this
      sequence, such as I/O errors, CRC errors, etc. Log torn write detection
      will perform CRC verification near the head of the log to detect torn
      writes and trim torn records from the log appropriately.
      
      As it is, xlog_do_recovery_pass() only returns an error code in the
      event of CRC failure, which isn't enough information to trim the head of
      the log. Update xlog_do_recovery_pass() to optionally return the start
      block of the associated record when an error occurs. This patch contains
      no functional changes.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
    • xfs: refactor and open code log record crc check · b94fb2d1
      By Brian Foster
      Log record CRC verification currently occurs during active log recovery,
      immediately before a log record is unpacked. Therefore, the CRC
      calculation code is buried within the data unpack function. CRC
      verification pass support only needs to go so far as check the CRC, but
      this is not easily allowed as the code is currently organized.
      
      Since we now have a new log record processing helper, pull the record
      CRC verification code out from the unpack helper and open-code it at the
      top of the new process helper. This facilitates the ability to modify
      how records are processed based on the type of the current pass. This
      patch contains no functional changes.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
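      For reference, a self-contained CRC32c (Castagnoli) routine of the kind used
      to checksum log records, plus a simplified verification helper. The kernel
      uses its own optimised crc32c and a slightly different checksum layout; the
      struct and its field names below are stand-ins.

```c
#include <stdint.h>
#include <stddef.h>

/* Bitwise CRC32c (Castagnoli polynomial, reflected). Seed with 0. */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78 & -(crc & 1));
    }
    return ~crc;
}

struct rec_header {
    uint32_t h_crc;     /* stand-in for the on-disk CRC field */
    /* ... rest of the record header ... */
};

/* Verify a record: checksum the header (with h_crc zeroed) plus payload. */
static int record_crc_ok(struct rec_header *rhead, const void *data, size_t len)
{
    uint32_t want = rhead->h_crc;
    uint32_t got;

    rhead->h_crc = 0;
    got = crc32c(0, rhead, sizeof(*rhead));
    got = crc32c(got, data, len);       /* simplified chaining */
    rhead->h_crc = want;

    return got == want;
}
```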
    • xfs: refactor log record unpack and data processing · 9d94901f
      By Brian Foster
      xlog_do_recovery_pass() duplicates a couple function calls related to
      processing log records because the function must handle wrapping around
      the end of the log if the head is behind the tail. This is implemented
      as separate loops. CRC verification pass support will modify how records
      are processed in both of these loops.
      
      Rather than continue to duplicate code, factor the calls that process a
      log record into a new helper and call that helper from both loops. This
      patch contains no functional changes.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
    • xfs: detect and handle invalid iclog size set by mkfs · a70f9fe5
      By Brian Foster
      XFS log records have separate fields for the record size and the iclog
      size used to write the record. mkfs.xfs zeroes the log and writes an
      unmount record to generate a clean log for the subsequent mount. The
      userspace record logging code has a bug where the iclog size (h_size)
      field of the log record is hardcoded to 32k, even if a log stripe unit
      is specified. The log record length is correctly extended to the stripe
      unit. Since the kernel log recovery code uses the h_size field to
      determine the log buffer size, this means that the kernel can attempt to
      read/process records larger than the buffer size and overrun the buffer.
      
      This has historically not been a problem because the kernel doesn't
      actually run through log recovery in the clean unmount case. Instead,
      the kernel detects that a single unmount record exists between the head
      and tail and pushes the tail forward such that the log is viewed as
      clean (head == tail). Once CRC verification is enabled, however, all
      records at the head of the log are verified for CRC errors and thus we
      are susceptible to overrun problems if the iclog size field is not correct.
      
      While the core problem must be fixed in userspace, this is historical
      behavior that must be detected in the kernel to avoid severe side
      effects such as memory corruption and crashes. Update the log buffer
      size calculation code to detect this condition, warn the user and resize
      the log buffer based on the log stripe unit. Return a corruption error
      in cases where this does not look like a clean filesystem (i.e., the log
      record header indicates more than one operation).
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
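      A sketch of the detection described above; the exact conditions in the
      kernel differ, and the names and constants here are placeholders:

```c
#include <stdio.h>

struct rec_header {
    unsigned int h_size;        /* iclog size the record claims */
    unsigned int h_len;         /* actual record length */
    int          h_num_logops;  /* operations in the record */
};

/*
 * Old mkfs hardcodes h_size to 32k even when the record length was
 * rounded up to the log stripe unit. If the record is larger than the
 * claimed iclog size, either warn and size the log buffers from the
 * stripe unit (clean, single-op unmount record) or treat the situation
 * as corruption (anything else).
 */
static int xlog_choose_bufsize(const struct rec_header *rhead,
                               unsigned int log_sunit, unsigned int *bufsize)
{
    *bufsize = rhead->h_size;

    if (rhead->h_len > rhead->h_size) {
        if (rhead->h_num_logops > 1)
            return -1;          /* not a clean unmount record */
        fprintf(stderr, "invalid iclog size in log record, using log stripe unit\n");
        *bufsize = log_sunit;
    }
    return 0;
}
```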
  10. 10 Nov, 2015: 1 commit
    • xfs: fix log recovery op header validation assert · 848ccfc8
      By Brian Foster
      Commit 89cebc84 ("xfs: validate transaction header length on log
      recovery") added additional validation of the on-disk op header length
      to protect from buffer overflow during log recovery. It accounts for the
      fact that the transaction header can be split across multiple op
      headers. It added an assert for when this occurs that verifies the
      length of the second part of a split transaction header is less than a
      full transaction header. In other words, it expects that the first op
      header of a split transaction header includes at least some portion of
      the transaction header.
      
      This expectation is not always valid as a zero-length op header can
      exist for the first op header of a split transaction header (see
      xlog_recover_add_to_trans() for details). This means that the second op
      header can have a valid, full length transaction header and thus the
      full header is copied in xlog_recover_add_to_cont_trans(). Fix the
      assert in xlog_recover_add_to_cont_trans() to handle this case correctly
      and require that the op header length is less than or equal to a full
      transaction header.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
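      The corrected invariant as a sketch, with a stand-in transaction header
      type: the continuation may legitimately carry up to a whole transaction
      header when the first op header was zero-length.

```c
#include <assert.h>
#include <string.h>

struct trans_header {       /* stand-in for struct xfs_trans_header */
    unsigned int th_magic;
    unsigned int th_type;
    int          th_tid;
    unsigned int th_num_items;
};

/* Continue copying a transaction header split across op headers. */
static void add_to_cont_trans(struct trans_header *dst, size_t copied,
                              const void *src, size_t len)
{
    /*
     * The first op header of a split transaction header may have been
     * zero-length, so this continuation can carry a complete header:
     * the bound is "<=", not "<".
     */
    assert(len <= sizeof(struct trans_header));
    memcpy((char *)dst + copied, src, len);
}
```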
  11. 12 Oct, 2015: 1 commit
    • xfs: validate metadata LSNs against log on v5 superblocks · a45086e2
      By Brian Foster
      Since the onset of v5 superblocks, the LSN of the last modification has
      been included in a variety of on-disk data structures. This LSN is used
      to provide log recovery ordering guarantees (e.g., to ensure an older
      log recovery item is not replayed over a newer target data structure).
      
      While this works correctly from the point a filesystem is formatted and
      mounted, userspace tools have some problematic behaviors that defeat
      this mechanism. For example, xfs_repair historically zeroes out the log
      unconditionally (regardless of whether corruption is detected). If this
      occurs, the LSN of the filesystem is reset and the log is now in a
      problematic state with respect to on-disk metadata structures that might
      have a larger LSN. Until either the log catches up to the highest
      previously used metadata LSN or each affected data structure is modified
      and written out without incident (which resets the metadata LSN), log
      recovery is susceptible to filesystem corruption.
      
      This problem is ultimately addressed and repaired in the associated
      userspace tools. The kernel is still responsible to detect the problem
      and notify the user that something is wrong. Check the superblock LSN at
      mount time and fail the mount if it is invalid. From that point on,
      trigger verifier failure on any metadata I/O where an invalid LSN is
      detected. This results in a filesystem shutdown and guarantees that we
      do not log metadata changes with invalid LSNs on disk. Since this is a
      known issue with a known recovery path, present a warning to instruct
      the user how to recover.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
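      A sketch of the comparison, with the LSN split into cycle and block numbers;
      names and macros are illustrative, not the kernel's:

```c
#include <stdint.h>

typedef uint64_t lsn_t;

#define CYCLE(lsn)  ((uint32_t)((lsn) >> 32))
#define BLOCK(lsn)  ((uint32_t)((lsn) & 0xffffffff))

/*
 * A metadata LSN is valid only if it does not lie beyond the current
 * head of the log; a larger LSN means the log was rewound (e.g. zeroed
 * by an old xfs_repair) and recovery ordering can no longer be trusted.
 */
static int metadata_lsn_valid(lsn_t metadata_lsn, lsn_t current_lsn)
{
    if (CYCLE(metadata_lsn) != CYCLE(current_lsn))
        return CYCLE(metadata_lsn) < CYCLE(current_lsn);
    return BLOCK(metadata_lsn) <= BLOCK(current_lsn);
}
```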
  12. 19 Aug, 2015: 7 commits
    • xfs: log recovery needs to validate against sb_meta_uuid · fcfbe2c4
      By Dave Chinner
      Now that sb_uuid can be changed by the user, we cannot use this to
      validate the metadata blocks being recovered belong to this
      filesystem. We must check against the sb_meta_uuid as that will
      remain unchanged.
      
      There is a complication in this code - the superblock itself. We can
      not check the sb_meta_uuid unconditionally, as that may not be set
      on disk. Hence we must verify the superblock sb_uuid matches between
      the log record and the in-core superblock.
      
      Found by inspection after the previous two problems were found.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
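      A sketch of the recovery-time UUID check with simplified types; the real
      code keys off the buffer log item type rather than a boolean flag:

```c
#include <string.h>

typedef struct { unsigned char b[16]; } uuid_t;

struct sb {
    uuid_t sb_uuid;         /* user-changeable UUID */
    uuid_t sb_meta_uuid;    /* immutable metadata UUID */
};

/*
 * Metadata blocks are stamped with sb_meta_uuid, which never changes.
 * The superblock itself is the exception: sb_meta_uuid may not be set
 * on disk, so compare sb_uuid for it instead.
 */
static int recovery_uuid_ok(const struct sb *sbp, const uuid_t *blk_uuid,
                            int is_superblock)
{
    const uuid_t *want = is_superblock ? &sbp->sb_uuid : &sbp->sb_meta_uuid;

    return memcmp(want, blk_uuid, sizeof(uuid_t)) == 0;
}
```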
    • xfs: fix broken icreate log item cancellation · fc0d1656
      By Brian Foster
      Inode cluster buffers are invalidated and cancelled when inode chunks
      are freed to notify log recovery that previous logged updates to the
      metadata buffer should be skipped. This ensures that log recovery does
      not overwrite buffers that might have already been reused.
      
      On v4 filesystems, inode chunk allocation and inode updates are logged
      via the cluster buffers and thus cancellation is easily detected via
      buffer cancellation items. v5 filesystems use the new icreate
      transaction, which uses logical logging and ordered buffers to log a
      full inode chunk allocation at once. The resulting icreate item often
      spans multiple inode cluster buffers.
      
      Log recovery checks for cancelled buffers when processing icreate log
      items, but it has a couple problems. First, it uses the full length of
      the inode chunk rather than the cluster size. Second, it uses the length
      in FSB units rather than BB units. Either of these problems prevents
      icreate recovery from identifying cancelled buffers, and thus inode
      initialization proceeds unconditionally.
      
      Update xlog_recover_do_icreate_pass2() to iterate the icreate range in
      cluster sized increments and check each increment for cancellation.
      Since icreate is currently only used for the minimum atomic inode chunk
      allocation, we expect that either all or none of the buffers will be
      cancelled. Cancel the icreate if at least one buffer is cancelled to
      avoid making a bad situation worse by initializing a partial inode
      chunk, but detect such anomalies and warn the user.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
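      A sketch of the corrected iteration, with placeholder helpers and the units
      spelled out: walk the icreate range one inode cluster buffer at a time, in
      basic-block (BB) units, and check each buffer for cancellation.

```c
/* Assumed: looks up the buffer cancellation table built in pass 1. */
int buffer_is_cancelled(long long daddr, int cluster_bb);

/*
 * Count how many of the inode cluster buffers covering an icreate range
 * were cancelled. The range must be walked in cluster-sized, basic-block
 * units: using the whole chunk length, or FSB units, misses the
 * cancellation entries entirely. The caller cancels the icreate (and
 * warns) if any, rather than all, of the clusters were cancelled.
 */
static int icreate_cancelled_clusters(long long chunk_daddr, int nclusters,
                                      int cluster_bb)
{
    int cancelled = 0;

    for (int i = 0; i < nclusters; i++) {
        long long daddr = chunk_daddr + (long long)i * cluster_bb;

        if (buffer_is_cancelled(daddr, cluster_bb))
            cancelled++;
    }
    return cancelled;
}
```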
    • xfs: icreate log item recovery and cancellation tracepoints · 78d57e45
      By Brian Foster
      Various log items have recovery tracepoints to identify whether a
      particular log item is recovered or cancelled. Add the equivalent
      tracepoints for the icreate transaction.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: don't leave EFIs on AIL on mount failure · f0b2efad
      By Brian Foster
      Log recovery occurs in two phases at mount time. In the first phase,
      EFIs and EFDs are processed and potentially cancelled out. EFIs without
      EFD objects are inserted into the AIL for processing and recovery in the
      second phase. xfs_mountfs() runs various other operations between the
      phases and is thus subject to failure. If failure occurs after the first
      phase but before the second, pending EFIs sit on the AIL, pin it and
      cause the mount to hang.
      
      Update the mount sequence to ensure that pending EFIs are cancelled in
      the event of failure. Add a recovery cancellation mechanism to iterate
      the AIL and cancel all EFI items when requested. Plumb cancellation
      support through the log mount finish helper and update xfs_mountfs() to
      invoke cancellation in the event of failure after recovery has started.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
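      A sketch of the cancellation walk over a simplified AIL (a plain linked
      list here; names and the item type value are placeholders):

```c
#define ITEM_TYPE_EFI 0x1236    /* placeholder item type value */

struct log_item {
    int              li_type;
    struct log_item *li_next;
};

/* Assumed: drops the pending EFI, removing it from the AIL and freeing it. */
void efi_release(struct log_item *efi);

/*
 * On mount failure between recovery phases, walk the AIL and release
 * any EFIs left pending so they cannot pin the AIL and hang the
 * subsequent unmount.
 */
static void cancel_recovered_efis(struct log_item *ail_head)
{
    struct log_item *lip = ail_head;

    while (lip) {
        struct log_item *next = lip->li_next;   /* efi_release() unlinks lip */

        if (lip->li_type == ITEM_TYPE_EFI)
            efi_release(lip);
        lip = next;
    }
}
```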
    • xfs: use EFI refcount consistently in log recovery · e32a1d1f
      By Brian Foster
      The EFI is initialized with a reference count of 2. One for the EFI to
      ensure the item makes it to the AIL and one for the subsequently created
      EFD to release the EFI once the EFD is committed. Log recovery uses the
      EFI in a similar manner, but implements a hack to remove both references
      in one call once the EFD is handled.
      
      Update log recovery to use EFI reference counting in a manner consistent
      with the log. When an EFI is encountered during recovery, an EFI item is
      allocated and inserted to the AIL directly. Since the EFI reference is
      typically dropped when the EFI is unpinned and this is analogous to
      AIL insertion, drop the EFI reference at this point.
      
      When a corresponding EFD is encountered in the log, this indicates that
      the extents were freed, no processing is required and the EFI can be
      dropped. Update xlog_recover_efd_pass2() to simply drop the EFD
      reference at this point rather than open code the AIL removal and EFI
      free.
      
      Remaining EFIs (i.e., with no corresponding EFD) are processed in
      xlog_recover_finish(). An EFD transaction is allocated and the extents
      are freed, which transfers ownership of the EFI reference to the EFD
      item in the log.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: ensure EFD trans aborts on log recovery extent free failure · 6bc43af3
      By Brian Foster
      Log recovery attempts to free extents with leftover EFIs in the AIL
      after initial processing. If the extent free fails (e.g., due to
      unrelated fs corruption), the transaction is cancelled, though it
      might not be dirtied at the time. If this is the case, the EFD does
      not abort and thus does not release the EFI. This can lead to hangs
      as the EFI pins the AIL.
      
      Update xlog_recover_process_efi() to log the EFD in the transaction
      before xfs_free_extent() errors are handled to ensure the
      transaction is dirty, aborts the EFD and releases the EFI on error.
      Since this is a requirement for EFD processing (and consistent with
      xfs_bmap_finish()), update the EFD logging helper to do the extent
      free and unconditionally log the EFD. This encodes the required EFD
      logging behavior into the helper and reduces the likelihood of
      errors down the road.
      
      [dchinner: re-add xfs_alloc.h to xfs_log_recover.c to fix build
       failure.]
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
    • xfs: disentangle EFI release from the extent count · 5e4b5386
      By Brian Foster
      Release of the EFI either occurs based on the reference count or the
      extent count. The extent count used is either the count tracked in
      the EFI or EFD, depending on the particular situation. In either
      case, the count is initialized to the final value and thus always
      matches the current efi_next_extent value once the EFI is completely
      constructed.  For example, the EFI extent count is increased as the
      extents are logged in xfs_bmap_finish() and the full free list is
      always completely processed. Therefore, the count is guaranteed to
      be complete once the EFI transaction is committed. The EFD uses the
      efd_nextents counter to release the EFI. This counter is initialized
      to the count of the EFI when the EFD is created. Thus the EFD, as
      currently used, has no concept of partial EFI release based on
      extent count.
      
      Given that the EFI extent count is always released in whole, use of
      the extent count for reference counting is unnecessary. Remove this
      level of the API and release the EFI based on the core reference
      count. The efi_next_extent counter remains because it is still used
      to track the slot to log the next extent to free.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>