1. 12 Feb 2019: 3 commits
  2. 13 Dec 2018: 1 commit
  3. 29 Sep 2018: 1 commit
    • xfs: remove invalid log recovery first/last cycle check · ec2ed0b5
      Committed by Brian Foster
      One of the first steps of log recovery is to check for the special
      case of a zeroed log. If the first cycle in the log is zero or the
      tail portion of the log is zeroed, the head is set to the first
      instance of cycle 0. xlog_find_zeroed() includes a sanity check that
      enforces that the first cycle in the log must be 1 if the last cycle
      is 0. While this is true in most cases, the check is not totally
      valid because it doesn't consider the case where the filesystem
      crashed after a partial/out of order log buffer completion that
      wraps around the end of the physical log.
      
      For example, consider a filesystem that has completed most of the
      first cycle of the log, reaches the end of the physical log and
      splits the next single log buffer write into two in order to wrap
      around the end of the log. If these I/Os are reordered, the second
      (wrapped) I/O completes and the first happens to fail, the log is
      left in a state where the last cycle of the log is 0 and the first
      cycle is 2. This causes the xlog_find_zeroed() sanity check to fail
      and prevents the filesystem from mounting. This situation has been
      reproduced on particular systems via repeated runs of generic/475.
      
      This is an expected state that log recovery already knows how to
      deal with, however. Since the log is still partially zeroed, the
      head is detected correctly and points to a valid tail. The
      subsequent stale block detection clears blocks beyond the head up to
      the tail (within a maximum range), with the express purpose of
      clearing such out of order writes. As expected, this removes the out
      of order cycle 2 blocks at the physical start of the log.
      
      In other words, the only thing that prevents a clean mount and
      recovery of the filesystem in this scenario is the specific (last ==
      0 && first != 1) sanity check in xlog_find_zeroed(). Since the log
      head/tail are now independently validated via cycle, log record and
      CRC checks, this highly specific first cycle check is of dubious
      value. Remove it and rely on the higher level validation to
      determine whether log content is sane and recoverable.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
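      For illustration, the removed sanity check had roughly the following
      shape (a simplified standalone model with an invented helper name, not
      the actual kernel code):

      #include <stdbool.h>

      /*
       * Model of the check removed from xlog_find_zeroed(): if the last
       * cycle in the log is 0, the old code insisted that the first cycle
       * be 1.  A partial, wrapped, out-of-order log buffer write can leave
       * first_cycle == 2 while last_cycle == 0, so the check rejected a
       * log that recovery could otherwise handle via stale block clearing.
       */
      static bool old_zeroed_log_check_rejects(unsigned int first_cycle,
                                               unsigned int last_cycle)
      {
              return last_cycle == 0 && first_cycle != 1;
      }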
  4. 03 Aug 2018: 2 commits
  5. 27 Jul 2018: 5 commits
  6. 12 Jul 2018: 3 commits
  7. 09 Jun 2018: 2 commits
  8. 07 Jun 2018: 1 commit
    • xfs: convert to SPDX license tags · 0b61f8a4
      Committed by Dave Chinner
      Remove the verbose license text from XFS files and replace it
      with SPDX tags. This does not change the license of any of the code,
      merely refers to the common, up-to-date license files in LICENSES/
      
      This change was mostly scripted. fs/xfs/Makefile and
      fs/xfs/libxfs/xfs_fs.h were modified by hand, the rest were detected
      and modified by the following command:
      
      for f in `git grep -l "GNU General" fs/xfs/` ; do
      	echo $f
      	cat $f | awk -f hdr.awk > $f.new
      	mv -f $f.new $f
      done
      
      And the hdr.awk script that did the modification (including
      detecting the difference between GPL-2.0 and GPL-2.0+ licenses)
      is as follows:
      
      $ cat hdr.awk
      BEGIN {
      	hdr = 1.0
      	tag = "GPL-2.0"
      	str = ""
      }
      
      /^ \* This program is free software/ {
      	hdr = 2.0;
      	next
      }
      
      /any later version./ {
      	tag = "GPL-2.0+"
      	next
      }
      
      /^ \*\// {
      	if (hdr > 0.0) {
      		print "// SPDX-License-Identifier: " tag
      		print str
      		print $0
      		str=""
      		hdr = 0.0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \* / {
      	if (hdr > 1.0)
      		next
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      	next
      }
      
      /^ \*/ {
      	if (hdr > 0.0)
      		next
      	print $0
      	next
      }
      
      // {
      	if (hdr > 0.0) {
      		if (str != "")
      			str = str "\n"
      		str = str $0
      		next
      	}
      	print $0
      }
      
      END { }
      $
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  9. 05 Jun 2018: 1 commit
  10. 10 May 2018: 1 commit
  11. 30 Mar 2018: 1 commit
  12. 12 Mar 2018: 3 commits
  13. 29 Jan 2018: 1 commit
    • Split buffer's b_fspriv field · fb1755a6
      Committed by Carlos Maiolino
      Split the b_fspriv field into two different fields (b_log_item and
      b_li_list). This makes it possible to get rid of an old ABI workaround,
      by using the new b_log_item field to store the xfs_buf_log_item
      separately from the other log items attached to the buffer, which are
      now linked through the new b_li_list field.

      This way, there is no longer any need to reorder the log item list to
      place the buf_log_item at the beginning of the list, which simplifies
      the logic for handling buffer IO a bit.

      This also opens up the possibility of changing the buffer's log item
      list into a proper list_head.

      The b_log_item field is still defined as a void *, because it is still
      used by the log buffers to store xlog_in_core structures, and there is
      no need to add an extra field to xfs_buf just for xlog_in_core.
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Bill O'Donnell <billodo@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      [darrick: minor style changes]
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
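      A rough sketch of the split described above (the _sketch name is
      invented, the field types are approximate, and the real struct xfs_buf
      has many more members):

      struct xfs_log_item;    /* opaque here; defined elsewhere in the kernel */

      /*
       * Before the split, a single field did double duty, holding both the
       * buf log item and the other log items attached to the buffer:
       *
       *        void                    *b_fspriv;
       *
       * After the split:
       */
      struct xfs_buf_sketch {
              void                    *b_log_item;   /* xfs_buf_log_item, or xlog_in_core on log buffers */
              struct xfs_log_item     *b_li_list;    /* remaining log items attached to the buffer */
              /* ... other xfs_buf members ... */
      };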
  14. 13 Jan 2018: 2 commits
  15. 09 Jan 2018: 2 commits
  16. 28 Nov 2017: 1 commit
    • xfs: log recovery should replay deferred ops in order · 50995582
      Committed by Darrick J. Wong
      As part of testing log recovery with dm_log_writes, Amir Goldstein
      discovered an error in the deferred ops recovery that led to corruption
      of the filesystem metadata if a reflink+rmap filesystem happened to shut
      down midway through a CoW remap:
      
      "This is what happens [after failed log recovery]:
      
      "Phase 1 - find and verify superblock...
      "Phase 2 - using internal log
      "        - zero log...
      "        - scan filesystem freespace and inode maps...
      "        - found root inode chunk
      "Phase 3 - for each AG...
      "        - scan (but don't clear) agi unlinked lists...
      "        - process known inodes and perform inode discovery...
      "        - agno = 0
      "data fork in regular inode 134 claims CoW block 376
      "correcting nextents for inode 134
      "bad data fork in inode 134
      "would have cleared inode 134"
      
      Hou Tao dissected the log contents of exactly such a crash:
      
      "According to the implementation of xfs_defer_finish(), these ops should
      be completed in the following sequence:
      
      "Have been done:
      "(1) CUI: Oper (160)
      "(2) BUI: Oper (161)
      "(3) CUD: Oper (194), for CUI Oper (160)
      "(4) RUI A: Oper (197), free rmap [0x155, 2, -9]
      
      "Should be done:
      "(5) BUD: for BUI Oper (161)
      "(6) RUI B: add rmap [0x155, 2, 137]
      "(7) RUD: for RUI A
      "(8) RUD: for RUI B
      
      "Actually be done by xlog_recover_process_intents()
      "(5) BUD: for BUI Oper (161)
      "(6) RUI B: add rmap [0x155, 2, 137]
      "(7) RUD: for RUI B
      "(8) RUD: for RUI A
      
      "So the rmap entry [0x155, 2, -9] for COW should be freed firstly,
      then a new rmap entry [0x155, 2, 137] will be added. However, as we can see
      from the log record in post_mount.log (generated after umount) and the trace
      print, the new rmap entry [0x155, 2, 137] are added firstly, then the rmap
      entry [0x155, 2, -9] are freed."
      
      When reconstructing the internal log state from the log items found on
      disk, it's required that deferred ops replay in exactly the same order
      that they would have had the filesystem not gone down.  However,
      replaying unfinished deferred ops can create /more/ deferred ops.  These
      new deferred ops are finished in the wrong order.  This causes fs
      corruption and replay crashes, so let's create a single defer_ops to
      handle the subsequent ops created during replay, then use one single
      transaction at the end of log recovery to ensure that everything is
      replayed in the same order as they're supposed to be.
      Reported-by: Amir Goldstein <amir73il@gmail.com>
      Analyzed-by: Hou Tao <houtao1@huawei.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
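      A toy model of the ordering fix described above (every name below is
      invented for illustration; this is not the XFS defer_ops API): deferred
      ops created while replaying intents are queued onto one shared list and
      finished in a single pass at the end of recovery, so they complete in
      queue order rather than being finished per replayed intent.

      #include <stdio.h>

      struct defer_list { int ops[16]; int n; };

      static void queue_deferred_op(struct defer_list *dl, int op)
      {
              if (dl->n < 16)
                      dl->ops[dl->n++] = op;  /* nothing is finished yet */
      }

      static void finish_deferred_ops(struct defer_list *dl)
      {
              /* one pass at the end of recovery, preserving queue order */
              for (int i = 0; i < dl->n; i++)
                      printf("finishing deferred op %d\n", dl->ops[i]);
              dl->n = 0;
      }

      int main(void)
      {
              struct defer_list recovery_dfops = { .n = 0 };

              queue_deferred_op(&recovery_dfops, 1);  /* queued while replaying one intent */
              queue_deferred_op(&recovery_dfops, 2);  /* queued while replaying another */
              finish_deferred_ops(&recovery_dfops);   /* completes 1 then 2, in order */
              return 0;
      }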
  17. 07 Nov 2017: 1 commit
  18. 02 Nov 2017: 1 commit
  19. 27 Oct 2017: 3 commits
    • xfs: fix log block underflow during recovery cycle verification · 9f2a4505
      Committed by Brian Foster
      It is possible for mkfs to format very small filesystems with too
      small of an internal log with respect to the various minimum size
      and block count requirements. If this occurs when the log happens to
      be smaller than the scan window used for cycle verification and the
      scan wraps the end of the log, the start_blk calculation in
      xlog_find_head() underflows and leads to an attempt to scan an
      invalid range of log blocks. This results in log recovery failure
      and a failed mount.
      
      Since there may be filesystems out in the wild with this kind of
      geometry, we cannot simply refuse to mount. Instead, cap the scan
      window for cycle verification to the size of the physical log. This
      ensures that the cycle verification proceeds as expected when the
      scan wraps the end of the log.
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
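      A minimal sketch of the clamp described above (standalone code with an
      invented helper name; not the actual xlog_find_head() change):

      /*
       * Cap the cycle-verification scan window at the size of the physical
       * log so that a wrapped scan on a very small log cannot drive the
       * starting block of the scan negative.
       */
      static int clamp_scan_window(int num_scan_bblks, int log_bblks)
      {
              return num_scan_bblks > log_bblks ? log_bblks : num_scan_bblks;
      }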
    • xfs: more robust recovery xlog buffer validation · 99c26595
      Committed by Brian Foster
      mkfs has a historical problem where it can format very small
      filesystems with too small of a physical log. Under certain
      conditions, log recovery of an associated filesystem can end up
      passing garbage parameter values to some of the cycle and log record
      verification functions due to bugs in log recovery not dealing with
      such filesystems properly. This results in attempts to read from
      bogus/underflowed log block addresses.
      
      Since the buffer read may ultimately succeed, log recovery can
      proceed with bogus data and otherwise go off the rails and crash.
      One example of this is a negative last_blk being passed to
      xlog_find_verify_log_record() causing us to skip the loop, pass a
      NULL head pointer to xlog_header_check_mount() and crash.
      
      Improve the xlog buffer verification to address this problem. We
      already verify xlog buffer length, so update this mechanism to also
      sanity check for a valid log relative block address and otherwise
      return an error. Pass a fixed, valid log block address from
      xlog_get_bp() since the target address will be validated when the
      buffer is read. This ensures that any bogus log block address/length
      calculations lead to graceful mount failure rather than risking a
      crash or worse if recovery proceeds with bogus data.
      Reported-by: Zorro Lang <zlang@redhat.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
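      A simplified model of the added validation (invented helper name and
      plain C types; not the actual kernel code), rejecting reads whose
      log-relative block address or length fall outside the physical log:

      /*
       * Return 0 if a read of bblks basic blocks starting at log-relative
       * block blk_no stays inside a log of log_bblks blocks, else -1 (the
       * kernel would return an error and fail the mount gracefully here).
       */
      static int xlog_read_args_valid(long long blk_no, int bblks,
                                      long long log_bblks)
      {
              if (blk_no < 0 || blk_no >= log_bblks)
                      return -1;
              if (bblks <= 0 || blk_no + bblks > log_bblks)
                      return -1;
              return 0;
      }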
    • xfs: remove the never fully implemented UUID fork format · 42b67dc6
      Committed by Christoph Hellwig
      Remove the dead code dealing with the UUID fork format that was never
      implemented in Linux (and neither in IRIX as far as I know).
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  20. 02 Sep 2017: 1 commit
  21. 23 Aug 2017: 4 commits
    • xfs: add log recovery tracepoint for head/tail · e67d3d42
      Committed by Brian Foster
      Torn write detection and tail overwrite detection can shift the log
      head and tail respectively in the event of CRC mismatch or
      corruption errors. Add a high-level log recovery tracepoint to dump
      the final log head/tail and make those values easily attainable in
      debug/diagnostic situations.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: handle -EFSCORRUPTED during head/tail verification · a4c9b34d
      Committed by Brian Foster
      Torn write and tail overwrite detection both trigger only on
      -EFSBADCRC errors. While this is the most likely failure scenario
      for each condition, -EFSCORRUPTED is still possible in certain cases
      depending on what ends up on disk when a torn write or partial tail
      overwrite occurs. For example, an invalid log record h_len can lead
      to an -EFSCORRUPTED error when running the log recovery CRC pass.
      
      Therefore, update log head and tail verification to trigger the
      associated head/tail fixups in the event of -EFSCORRUPTED errors
      along with -EFSBADCRC. Also, -EFSCORRUPTED can currently be returned
      from xlog_do_recovery_pass() before rhead_blk is initialized if the
      first record encountered happens to be corrupted. This leads to an
      incorrect 'first_bad' return value. Initialize rhead_blk earlier in
      the function to address that problem as well.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
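      The widened error check has roughly the following shape (a standalone
      sketch with an invented helper name; the errno values are placeholders
      so the sketch compiles on its own, not the values the kernel uses):

      #define EFSBADCRC       1001    /* placeholder value */
      #define EFSCORRUPTED    1002    /* placeholder value */

      /*
       * Before this change only -EFSBADCRC triggered the head/tail fixups;
       * now -EFSCORRUPTED does as well.
       */
      static int error_triggers_fixup(int error)
      {
              return error == -EFSBADCRC || error == -EFSCORRUPTED;
      }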
    • xfs: fix log recovery corruption error due to tail overwrite · 4a4f66ea
      Committed by Brian Foster
      If we consider the case where the tail (T) of the log is pinned long
      enough for the head (H) to push and block behind the tail, we can
      end up blocked in the following state without enough free space (f)
      in the log to satisfy a transaction reservation:
      
      	0	phys. log	N
      	[-------HffT---H'--T'---]
      
      The last good record in the log (before H) refers to T. The tail
      eventually pushes forward (T') leaving more free space in the log
      for writes to H. At this point, suppose space frees up in the log
      for the maximum of 8 in-core log buffers to start flushing out to
      the log. If this pushes the head from H to H', these next writes
      overwrite the previous tail T. This is safe because the items logged
      from T to T' have been written back and removed from the AIL.
      
      If the next log writes (H -> H') happen to fail and result in
      partial records in the log, the filesystem shuts down having
      overwritten T with invalid data. Log recovery correctly locates H on
      the subsequent mount, but H still refers to the now corrupted tail
      T. This results in log corruption errors and recovery failure.
      
      Since the tail overwrite results from otherwise correct runtime
      behavior, it is up to log recovery to try and deal with this
      situation. Update log recovery tail verification to run a CRC pass
      from the first record past the tail to the head. This facilitates
      error detection at T and moves the recovery tail to the first good
      record past H' (similar to truncating the head on torn write
      detection). If corruption is detected beyond the range possibly
      affected by the max number of iclogs, the log is legitimately
      corrupted and log recovery failure is expected.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
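      The tolerance decision described above can be modelled roughly as
      follows (a simplified sketch with invented names that ignores physical
      log wrap; not the actual kernel code):

      /*
       * Corruption found between the old tail and the head is only treated
       * as a benign tail overwrite if it lies within the span that the
       * maximum number of in-core log buffers could have rewritten; anything
       * beyond that range is genuine corruption and recovery should fail.
       */
      static int overwrite_is_tolerable(long long bad_blk, long long head_blk,
                                        long long max_iclog_bblks)
      {
              return head_blk - bad_blk <= max_iclog_bblks;
      }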
    • xfs: always verify the log tail during recovery · 5297ac1f
      Committed by Brian Foster
      Log tail verification currently only occurs when torn writes are
      detected at the head of the log. This was introduced because a
      change in the head block due to torn writes can lead to a change in
      the tail block (each log record header references the current tail)
      and the tail block should be verified before log recovery proceeds.
      
      Tail corruption is possible outside of torn write scenarios,
      however. For example, partial log writes can be detected and cleared
      during the initial head/tail block discovery process. If the partial
      write coincides with a tail overwrite, the log tail is corrupted and
      recovery fails.
      
      To facilitate correct handling of log tail overwrites, update log
      recovery to always perform tail verification. This is necessary to
      detect potential tail overwrite conditions when torn writes may not
      have occurred. This changes normal (i.e., no torn writes) recovery
      behavior slightly to detect and return CRC related errors near the
      tail before actual recovery starts.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>