1. 12 10月, 2017 1 次提交
  2. 02 9月, 2017 2 次提交
    • A
      xfs: fix incorrect log_flushed on fsync · 47c7d0b1
      Amir Goldstein 提交于
      When calling into _xfs_log_force{,_lsn}() with a pointer
      to log_flushed variable, log_flushed will be set to 1 if:
      1. xlog_sync() is called to flush the active log buffer
      AND/OR
      2. xlog_wait() is called to wait on a syncing log buffers
      
      xfs_file_fsync() checks the value of log_flushed after
      _xfs_log_force_lsn() call to optimize away an explicit
      PREFLUSH request to the data block device after writing
      out all the file's pages to disk.
      
      This optimization is incorrect in the following sequence of events:
      
       Task A                    Task B
       -------------------------------------------------------
       xfs_file_fsync()
         _xfs_log_force_lsn()
           xlog_sync()
              [submit PREFLUSH]
                                 xfs_file_fsync()
                                   file_write_and_wait_range()
                                     [submit WRITE X]
                                     [endio  WRITE X]
                                   _xfs_log_force_lsn()
                                     xlog_wait()
              [endio  PREFLUSH]
      
      The write X is not guarantied to be on persistent storage
      when PREFLUSH request in completed, because write A was submitted
      after the PREFLUSH request, but xfs_file_fsync() of task A will
      be notified of log_flushed=1 and will skip explicit flush.
      
      If the system crashes after fsync of task A, write X may not be
      present on disk after reboot.
      
      This bug was discovered and demonstrated using Josef Bacik's
      dm-log-writes target, which can be used to record block io operations
      and then replay a subset of these operations onto the target device.
      The test goes something like this:
      - Use fsx to execute ops of a file and record ops on log device
      - Every now and then fsync the file, store md5 of file and mark
        the location in the log
      - Then replay log onto device for each mark, mount fs and compare
        md5 of file to stored value
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      47c7d0b1
    • D
      xfs: evict all inodes involved with log redo item · 799ea9e9
      Darrick J. Wong 提交于
      When we introduced the bmap redo log items, we set MS_ACTIVE on the
      mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
      from being truncated prematurely during log recovery.  This also had the
      effect of putting linked inodes on the lru instead of evicting them.
      
      Unfortunately, we neglected to find all those unreferenced lru inodes
      and evict them after finishing log recovery, which means that we leak
      them if anything goes wrong in the rest of xfs_mountfs, because the lru
      is only cleaned out on unmount.
      
      Therefore, evict unreferenced inodes in the lru list immediately
      after clearing MS_ACTIVE.
      
      Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Cc: viro@ZenIV.linux.org.uk
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      799ea9e9
  3. 23 8月, 2017 2 次提交
  4. 18 8月, 2017 1 次提交
    • D
      xfs: clear MS_ACTIVE after finishing log recovery · 8204f8dd
      Darrick J. Wong 提交于
      Way back when we established inode block-map redo log items, it was
      discovered that we needed to prevent the VFS from evicting inodes during
      log recovery because any given inode might be have bmap redo items to
      replay even if the inode has no link count and is ultimately deleted,
      and any eviction of an unlinked inode causes the inode to be truncated
      and freed too early.
      
      To make this possible, we set MS_ACTIVE so that inodes would not be torn
      down immediately upon release.  Unfortunately, this also results in the
      quota inodes not being released at all if a later part of the mount
      process should fail, because we never reclaim the inodes.  So, set
      MS_ACTIVE right before we do the last part of log recovery and clear it
      immediately after we finish the log recovery so that everything
      will be torn down properly if we abort the mount.
      
      Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      8204f8dd
  5. 28 6月, 2017 3 次提交
  6. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  7. 19 6月, 2017 2 次提交
    • B
      xfs: dump transaction usage details on log reservation overrun · d4ca1d55
      Brian Foster 提交于
      If a transaction log reservation overrun occurs, the ticket data
      associated with the reservation is dumped in xfs_log_commit_cil().
      This occurs long after the transaction items and details have been
      removed from the transaction and effectively lost. This limited set
      of ticket data provides very little information to support debugging
      transaction overruns based on the typical report.
      
      To improve transaction log reservation overrun reporting, create a
      helper to dump transaction details such as log items, log vector
      data, etc., as well as the underlying ticket data for the
      transaction. Move the overrun detection from xfs_log_commit_cil() to
      xlog_cil_insert_items() so it occurs prior to migration of the
      logged items to the CIL. Call the new helper such that it is able to
      dump this transaction data before it is lost.
      
      Also, warn on overrun to provide callstack context for the offending
      transaction and include a few additional messages from
      xlog_cil_insert_items() to display the reservation consumed locally
      for overhead such as log vector headers, split region headers and
      the context ticket. This provides a complete general breakdown of
      the reservation consumption of a transaction when/if it happens to
      overrun the reservation.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d4ca1d55
    • B
      xfs: separate shutdown from ticket reservation print helper · 7d2d5653
      Brian Foster 提交于
      xlog_print_tic_res() pre-dates delayed logging and the committed
      items list (CIL) and thus retains some factoring warts, such as hard
      coded function names in the output and the fact that it induces a
      shutdown.
      
      In preparation for more detailed logging of regular transaction
      overrun situations, refactor xlog_print_tic_res() to be slightly
      more generic. Reword some of the warning messages and pull the
      shutdown into the callers.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7d2d5653
  8. 26 4月, 2017 1 次提交
  9. 04 4月, 2017 1 次提交
    • B
      xfs: use dedicated log worker wq to avoid deadlock with cil wq · 696a5620
      Brian Foster 提交于
      The log covering background task used to be part of the xfssyncd
      workqueue. That workqueue was removed as of commit 5889608d ("xfs:
      syncd workqueue is no more") and the associated work item scheduled
      to the xfs-log wq. The latter is used for log buffer I/O completion.
      
      Since xfs_log_worker() can invoke a log flush, a deadlock is
      possible between the xfs-log and xfs-cil workqueues. Consider the
      following codepath from xfs_log_worker():
      
      xfs_log_worker()
        xfs_log_force()
          _xfs_log_force()
            xlog_cil_force()
              xlog_cil_force_lsn()
                xlog_cil_push_now()
                  flush_work()
      
      The above is in xfs-log wq context and blocked waiting on the
      completion of an xfs-cil work item. Concurrently, the cil push in
      progress can end up blocked here:
      
      xlog_cil_push_work()
        xlog_cil_push()
          xlog_write()
            xlog_state_get_iclog_space()
              xlog_wait(&log->l_flush_wait, ...)
      
      The above is in xfs-cil context waiting on log buffer I/O
      completion, which executes in xfs-log wq context. In this scenario
      both workqueues are deadlocked waiting on eachother.
      
      Add a new workqueue specifically for the high level log covering and
      ail pushing worker, as was the case prior to commit 5889608d.
      Diagnosed-by: NDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      696a5620
  10. 10 1月, 2017 1 次提交
  11. 09 12月, 2016 1 次提交
  12. 05 12月, 2016 1 次提交
    • D
      xfs: optimise CRC updates · cae028df
      Dave Chinner 提交于
      Nick Piggin reported that the CRC overhead in an fsync heavy
      workload was higher than expected on a Power8 machine. Part of this
      was to do with the fact that the power8 CRC implementation is not
      efficient for CRC lengths of less than 512 bytes, and so the way we
      split the CRCs over the CRC field means a lot of the CRCs are
      reduced to being less than than optimal size.
      
      To optimise this, change the CRC update mechanism to zero the CRC
      field first, and then compute the CRC in one pass over the buffer
      and write the result back into the buffer. We can do this safely
      because anything writing a CRC has exclusive access to the buffer
      the CRC is being calculated over.
      
      We leave the CRC verify code the same - it still splits the CRC
      calculation - because we do not want read-only operations modifying
      the underlying buffer. This is because read-only operations may not
      have an exclusive access to the buffer guaranteed, and so temporary
      modifications could leak out to to other processes accessing the
      buffer concurrently.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      cae028df
  13. 20 7月, 2016 1 次提交
  14. 01 6月, 2016 1 次提交
  15. 06 4月, 2016 2 次提交
  16. 07 3月, 2016 1 次提交
  17. 10 2月, 2016 4 次提交
  18. 05 1月, 2016 1 次提交
    • B
      xfs: debug mode log record crc error injection · 609adfc2
      Brian Foster 提交于
      XFS now uses CRC verification over a limited section of the log to
      detect torn writes prior to a crash. This is difficult to test directly
      due to the timing and hardware requirements to cause a short write.
      
      Add a mechanism to inject CRC errors into log records to facilitate
      testing torn write detection during log recovery. This mechanism is
      dangerous and can result in filesystem corruption. Thus, it is only
      available in DEBUG mode for testing/development purposes. Set a non-zero
      value to the following sysfs entry to enable error injection:
      
      	/sys/fs/xfs/<dev>/log/log_badcrc_factor
      
      Once enabled, XFS intentionally writes an invalid CRC to a log record at
      some random point in the future based on the provided frequency. The
      filesystem immediately shuts down once the record has been written to
      the physical log to prevent metadata writeback (e.g., AIL insertion)
      once the log write completes. This helps reasonably simulate a torn
      write to the log as the affected record must be safe to discard. The
      next mount after the intentional shutdown requires log recovery and
      should detect and recover from the torn write.
      
      Note again that this _will_ result in data loss or worse. For testing
      and development purposes only!
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      609adfc2
  19. 04 1月, 2016 1 次提交
  20. 12 10月, 2015 3 次提交
    • B
      xfs: per-filesystem stats counter implementation · ff6d6af2
      Bill O'Donnell 提交于
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ff6d6af2
    • E
      xfs: avoid null *src in memcpy call in xlog_write · 91f9f5fe
      Eric Sandeen 提交于
      The gcc undefined behavior sanitizer caught this; surely
      any sane memcpy implementation will no-op if size == 0,
      but behavior with a *src of NULL is technically undefined
      (declared nonnull), so avoid it here.
      
      We are actually in this situation frequently via
      xlog_commit_record(), because:
      
              struct xfs_log_iovec reg = {
                      .i_addr = NULL,
                      .i_len = 0,
                      .i_type = XLOG_REG_TYPE_COMMIT,
              };
      Reported-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      91f9f5fe
    • B
      xfs: validate metadata LSNs against log on v5 superblocks · a45086e2
      Brian Foster 提交于
      Since the onset of v5 superblocks, the LSN of the last modification has
      been included in a variety of on-disk data structures. This LSN is used
      to provide log recovery ordering guarantees (e.g., to ensure an older
      log recovery item is not replayed over a newer target data structure).
      
      While this works correctly from the point a filesystem is formatted and
      mounted, userspace tools have some problematic behaviors that defeat
      this mechanism. For example, xfs_repair historically zeroes out the log
      unconditionally (regardless of whether corruption is detected). If this
      occurs, the LSN of the filesystem is reset and the log is now in a
      problematic state with respect to on-disk metadata structures that might
      have a larger LSN. Until either the log catches up to the highest
      previously used metadata LSN or each affected data structure is modified
      and written out without incident (which resets the metadata LSN), log
      recovery is susceptible to filesystem corruption.
      
      This problem is ultimately addressed and repaired in the associated
      userspace tools. The kernel is still responsible to detect the problem
      and notify the user that something is wrong. Check the superblock LSN at
      mount time and fail the mount if it is invalid. From that point on,
      trigger verifier failure on any metadata I/O where an invalid LSN is
      detected. This results in a filesystem shutdown and guarantees that we
      do not log metadata changes with invalid LSNs on disk. Since this is a
      known issue with a known recovery path, present a warning to instruct
      the user how to recover.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a45086e2
  21. 19 8月, 2015 2 次提交
    • B
      xfs: checksum log record ext headers based on record size · a3f20014
      Brian Foster 提交于
      The first 4 bytes of every basic block in the physical log is stamped
      with the current lsn. To support this mechanism, the log record header
      (first block of each new log record) contains space for the original
      first byte of each log record block before it is replaced with the lsn.
      The log record header has space for 32k worth of blocks. The version 2
      log adds new extended record headers for each additional 32k worth of
      blocks beyond what is supported by the record header.
      
      The log record checksum incorporates the log record header, the extended
      headers and the record payload. xlog_cksum() checksums the extended
      headers based on log->l_iclog_heads, which specifies the number of
      extended headers in a log record based on the log buffer size mount
      option. The log buffer size is variable, however, and thus means the
      checksum can be calculated differently based on how a filesystem is
      mounted. This is problematic if a filesystem crashes and recovery occurs
      on a subsequent mount using a different log buffer size. For example,
      crash an active filesystem that is mounted with the default (32k)
      logbsize, attempt remount/recovery using '-o logbsize=64k' and the mount
      fails on or warns about log checksum failures.
      
      To avoid this problem, update xlog_cksum() to calculate the checksum
      based on the size of the log buffer according to the log record. The
      size is already included in the h_size field of the log record header
      and thus is available at log recovery time. Extended log record headers
      are also only written when the log record is large enough to require
      them. This makes checksum calculation of log records consistent with the
      extended record header mechanism as well as how on-disk records are
      checksummed with various log buffer size mount options.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a3f20014
    • B
      xfs: don't leave EFIs on AIL on mount failure · f0b2efad
      Brian Foster 提交于
      Log recovery occurs in two phases at mount time. In the first phase,
      EFIs and EFDs are processed and potentially cancelled out. EFIs without
      EFD objects are inserted into the AIL for processing and recovery in the
      second phase. xfs_mountfs() runs various other operations between the
      phases and is thus subject to failure. If failure occurs after the first
      phase but before the second, pending EFIs sit on the AIL, pin it and
      cause the mount to hang.
      
      Update the mount sequence to ensure that pending EFIs are cancelled in
      the event of failure. Add a recovery cancellation mechanism to iterate
      the AIL and cancel all EFI items when requested. Plumb cancellation
      support through the log mount finish helper and update xfs_mountfs() to
      invoke cancellation in the event of failure after recovery has started.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f0b2efad
  22. 29 7月, 2015 1 次提交
  23. 22 6月, 2015 3 次提交
  24. 04 6月, 2015 1 次提交
  25. 22 1月, 2015 1 次提交
    • D
      xfs: consolidate superblock logging functions · 61e63ecb
      Dave Chinner 提交于
      We now have several superblock loggin functions that are identical
      except for the transaction reservation and whether it shoul dbe a
      synchronous transaction or not. Consolidate these all into a single
      function, a single reserveration and a sync flag and call it
      xfs_sync_sb().
      
      Also, xfs_mod_sb() is not really a modification function - it's the
      operation of logging the superblock buffer. hence change the name of
      it to reflect this.
      
      Note that we have to change the mp->m_update_flags that are passed
      around at mount time to a boolean simply to indicate a superblock
      update is needed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      61e63ecb
  26. 24 12月, 2014 1 次提交