1. 10 1月, 2017 1 次提交
  2. 09 12月, 2016 1 次提交
  3. 05 12月, 2016 1 次提交
    • D
      xfs: optimise CRC updates · cae028df
      Dave Chinner 提交于
      Nick Piggin reported that the CRC overhead in an fsync heavy
      workload was higher than expected on a Power8 machine. Part of this
      was to do with the fact that the power8 CRC implementation is not
      efficient for CRC lengths of less than 512 bytes, and so the way we
      split the CRCs over the CRC field means a lot of the CRCs are
      reduced to being less than than optimal size.
      
      To optimise this, change the CRC update mechanism to zero the CRC
      field first, and then compute the CRC in one pass over the buffer
      and write the result back into the buffer. We can do this safely
      because anything writing a CRC has exclusive access to the buffer
      the CRC is being calculated over.
      
      We leave the CRC verify code the same - it still splits the CRC
      calculation - because we do not want read-only operations modifying
      the underlying buffer. This is because read-only operations may not
      have an exclusive access to the buffer guaranteed, and so temporary
      modifications could leak out to to other processes accessing the
      buffer concurrently.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      cae028df
  4. 20 7月, 2016 1 次提交
  5. 01 6月, 2016 1 次提交
  6. 06 4月, 2016 2 次提交
  7. 07 3月, 2016 1 次提交
  8. 10 2月, 2016 4 次提交
  9. 05 1月, 2016 1 次提交
    • B
      xfs: debug mode log record crc error injection · 609adfc2
      Brian Foster 提交于
      XFS now uses CRC verification over a limited section of the log to
      detect torn writes prior to a crash. This is difficult to test directly
      due to the timing and hardware requirements to cause a short write.
      
      Add a mechanism to inject CRC errors into log records to facilitate
      testing torn write detection during log recovery. This mechanism is
      dangerous and can result in filesystem corruption. Thus, it is only
      available in DEBUG mode for testing/development purposes. Set a non-zero
      value to the following sysfs entry to enable error injection:
      
      	/sys/fs/xfs/<dev>/log/log_badcrc_factor
      
      Once enabled, XFS intentionally writes an invalid CRC to a log record at
      some random point in the future based on the provided frequency. The
      filesystem immediately shuts down once the record has been written to
      the physical log to prevent metadata writeback (e.g., AIL insertion)
      once the log write completes. This helps reasonably simulate a torn
      write to the log as the affected record must be safe to discard. The
      next mount after the intentional shutdown requires log recovery and
      should detect and recover from the torn write.
      
      Note again that this _will_ result in data loss or worse. For testing
      and development purposes only!
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      609adfc2
  10. 04 1月, 2016 1 次提交
  11. 12 10月, 2015 3 次提交
    • B
      xfs: per-filesystem stats counter implementation · ff6d6af2
      Bill O'Donnell 提交于
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ff6d6af2
    • E
      xfs: avoid null *src in memcpy call in xlog_write · 91f9f5fe
      Eric Sandeen 提交于
      The gcc undefined behavior sanitizer caught this; surely
      any sane memcpy implementation will no-op if size == 0,
      but behavior with a *src of NULL is technically undefined
      (declared nonnull), so avoid it here.
      
      We are actually in this situation frequently via
      xlog_commit_record(), because:
      
              struct xfs_log_iovec reg = {
                      .i_addr = NULL,
                      .i_len = 0,
                      .i_type = XLOG_REG_TYPE_COMMIT,
              };
      Reported-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      91f9f5fe
    • B
      xfs: validate metadata LSNs against log on v5 superblocks · a45086e2
      Brian Foster 提交于
      Since the onset of v5 superblocks, the LSN of the last modification has
      been included in a variety of on-disk data structures. This LSN is used
      to provide log recovery ordering guarantees (e.g., to ensure an older
      log recovery item is not replayed over a newer target data structure).
      
      While this works correctly from the point a filesystem is formatted and
      mounted, userspace tools have some problematic behaviors that defeat
      this mechanism. For example, xfs_repair historically zeroes out the log
      unconditionally (regardless of whether corruption is detected). If this
      occurs, the LSN of the filesystem is reset and the log is now in a
      problematic state with respect to on-disk metadata structures that might
      have a larger LSN. Until either the log catches up to the highest
      previously used metadata LSN or each affected data structure is modified
      and written out without incident (which resets the metadata LSN), log
      recovery is susceptible to filesystem corruption.
      
      This problem is ultimately addressed and repaired in the associated
      userspace tools. The kernel is still responsible to detect the problem
      and notify the user that something is wrong. Check the superblock LSN at
      mount time and fail the mount if it is invalid. From that point on,
      trigger verifier failure on any metadata I/O where an invalid LSN is
      detected. This results in a filesystem shutdown and guarantees that we
      do not log metadata changes with invalid LSNs on disk. Since this is a
      known issue with a known recovery path, present a warning to instruct
      the user how to recover.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a45086e2
  12. 19 8月, 2015 2 次提交
    • B
      xfs: checksum log record ext headers based on record size · a3f20014
      Brian Foster 提交于
      The first 4 bytes of every basic block in the physical log is stamped
      with the current lsn. To support this mechanism, the log record header
      (first block of each new log record) contains space for the original
      first byte of each log record block before it is replaced with the lsn.
      The log record header has space for 32k worth of blocks. The version 2
      log adds new extended record headers for each additional 32k worth of
      blocks beyond what is supported by the record header.
      
      The log record checksum incorporates the log record header, the extended
      headers and the record payload. xlog_cksum() checksums the extended
      headers based on log->l_iclog_heads, which specifies the number of
      extended headers in a log record based on the log buffer size mount
      option. The log buffer size is variable, however, and thus means the
      checksum can be calculated differently based on how a filesystem is
      mounted. This is problematic if a filesystem crashes and recovery occurs
      on a subsequent mount using a different log buffer size. For example,
      crash an active filesystem that is mounted with the default (32k)
      logbsize, attempt remount/recovery using '-o logbsize=64k' and the mount
      fails on or warns about log checksum failures.
      
      To avoid this problem, update xlog_cksum() to calculate the checksum
      based on the size of the log buffer according to the log record. The
      size is already included in the h_size field of the log record header
      and thus is available at log recovery time. Extended log record headers
      are also only written when the log record is large enough to require
      them. This makes checksum calculation of log records consistent with the
      extended record header mechanism as well as how on-disk records are
      checksummed with various log buffer size mount options.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a3f20014
    • B
      xfs: don't leave EFIs on AIL on mount failure · f0b2efad
      Brian Foster 提交于
      Log recovery occurs in two phases at mount time. In the first phase,
      EFIs and EFDs are processed and potentially cancelled out. EFIs without
      EFD objects are inserted into the AIL for processing and recovery in the
      second phase. xfs_mountfs() runs various other operations between the
      phases and is thus subject to failure. If failure occurs after the first
      phase but before the second, pending EFIs sit on the AIL, pin it and
      cause the mount to hang.
      
      Update the mount sequence to ensure that pending EFIs are cancelled in
      the event of failure. Add a recovery cancellation mechanism to iterate
      the AIL and cancel all EFI items when requested. Plumb cancellation
      support through the log mount finish helper and update xfs_mountfs() to
      invoke cancellation in the event of failure after recovery has started.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      f0b2efad
  13. 29 7月, 2015 1 次提交
  14. 22 6月, 2015 3 次提交
  15. 04 6月, 2015 1 次提交
  16. 22 1月, 2015 1 次提交
    • D
      xfs: consolidate superblock logging functions · 61e63ecb
      Dave Chinner 提交于
      We now have several superblock loggin functions that are identical
      except for the transaction reservation and whether it shoul dbe a
      synchronous transaction or not. Consolidate these all into a single
      function, a single reserveration and a sync flag and call it
      xfs_sync_sb().
      
      Also, xfs_mod_sb() is not really a modification function - it's the
      operation of logging the superblock buffer. hence change the name of
      it to reflect this.
      
      Note that we have to change the mp->m_update_flags that are passed
      around at mount time to a boolean simply to indicate a superblock
      update is needed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      61e63ecb
  17. 24 12月, 2014 2 次提交
  18. 04 12月, 2014 1 次提交
    • B
      xfs: split metadata and log buffer completion to separate workqueues · b29c70f5
      Brian Foster 提交于
      XFS traditionally sends all buffer I/O completion work to a single
      workqueue. This includes metadata buffer completion and log buffer
      completion. The log buffer completion requires a high priority queue to
      prevent stalls due to log forces getting stuck behind other queued work.
      
      Rather than continue to prioritize all buffer I/O completion due to the
      needs of log completion, split log buffer completion off to
      m_log_workqueue and move the high priority flag from m_buf_workqueue to
      m_log_workqueue.
      
      Add a b_ioend_wq wq pointer to xfs_buf to allow completion workqueue
      customization on a per-buffer basis. Initialize b_ioend_wq to
      m_buf_workqueue by default in the generic buffer I/O submission path.
      Finally, override the default wq with the high priority m_log_workqueue
      in the log buffer I/O submission path.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b29c70f5
  19. 28 11月, 2014 3 次提交
  20. 02 10月, 2014 3 次提交
    • D
      xfs: introduce xfs_buf_submit[_wait] · 595bff75
      Dave Chinner 提交于
      There is a lot of cookie-cutter code that looks like:
      
      	if (shutdown)
      		handle buffer error
      	xfs_buf_iorequest(bp)
      	error = xfs_buf_iowait(bp)
      	if (error)
      		handle buffer error
      
      spread through XFS. There's significant complexity now in
      xfs_buf_iorequest() to specifically handle this sort of synchronous
      IO pattern, but there's all sorts of nasty surprises in different
      error handling code dependent on who owns the buffer references and
      the locks.
      
      Pull this pattern into a single helper, where we can hide all the
      synchronous IO warts and hence make the error handling for all the
      callers much saner. This removes the need for a special extra
      reference to protect IO completion processing, as we can now hold a
      single reference across dispatch and waiting, simplifying the sync
      IO smeantics and error handling.
      
      In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
      make it explicitly handle on asynchronous IO. This forces all users
      to be switched specifically to one interface or the other and
      removes any ambiguity between how the interfaces are to be used. It
      also means that xfs_buf_iowait() goes away.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      595bff75
    • D
      xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality · e8aaba9a
      Dave Chinner 提交于
      We do some work in xfs_buf_ioend, and some work in
      xfs_buf_iodone_work, but much of that functionality is the same.
      This work can all be done in a single function, leaving
      xfs_buf_iodone just a wrapper to determine if we should execute it
      by workqueue or directly. hence rename xfs_buf_iodone_work to
      xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
      need async processing.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e8aaba9a
    • D
      xfs: force the log before shutting down · a870fe6d
      Dave Chinner 提交于
      When we have marked the filesystem for shutdown, we want to prevent
      any further buffer IO from being submitted. However, we currently
      force the log after marking the filesystem as shut down, hence
      allowing IO to the log *after* we have marked both the filesystem
      and the log as in an error state.
      
      Clean this up by forcing the log before we mark the filesytem with
      an error. This replaces the pure CIL flush that we currently have
      which works around this same issue (i.e the CIL can't be flushed
      once the shutdown flags are set) and hence enables us to clean up
      the logic substantially.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a870fe6d
  21. 04 8月, 2014 1 次提交
    • D
      xfs: catch buffers written without verifiers attached · 400b9d88
      Dave Chinner 提交于
      We recently had a bug where buffers were slipping through log
      recovery without any verifier attached to them. This was resulting
      in on-disk CRC mismatches for valid data. Add some warning code to
      catch this occurrence so that we catch such bugs during development
      rather than not being aware they exist.
      
      Note that we cannot do this verification unconditionally as non-CRC
      filesystems don't always attach verifiers to the buffers being
      written. e.g. during log recovery we cannot identify all the
      different types of buffers correctly on non-CRC filesystems, so we
      can't attach the correct verifiers in all cases and so we don't
      attach any. Hence we don't want on non-CRC filesystems to avoid
      spamming the logs with false indications.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      400b9d88
  22. 15 7月, 2014 1 次提交
  23. 25 6月, 2014 1 次提交
    • D
      xfs: global error sign conversion · 2451337d
      Dave Chinner 提交于
      Convert all the errors the core XFs code to negative error signs
      like the rest of the kernel and remove all the sign conversion we
      do in the interface layers.
      
      Errors for conversion (and comparison) found via searches like:
      
      $ git grep " E" fs/xfs
      $ git grep "return E" fs/xfs
      $ git grep " E[A-Z].*;$" fs/xfs
      
      Negation points found via searches like:
      
      $ git grep "= -[a-z,A-Z]" fs/xfs
      $ git grep "return -[a-z,A-D,F-Z]" fs/xfs
      $ git grep " -[a-z].*;" fs/xfs
      
      [ with some bits I missed from Brian Foster ]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2451337d
  24. 22 6月, 2014 1 次提交
  25. 06 6月, 2014 1 次提交
  26. 07 5月, 2014 1 次提交
    • D
      xfs: don't sleep in xlog_cil_force_lsn on shutdown · ac983517
      Dave Chinner 提交于
      Reports of a shutdown hang when fsyncing a directory have surfaced,
      such as this:
      
      [ 3663.394472] Call Trace:
      [ 3663.397199]  [<ffffffff815f1889>] schedule+0x29/0x70
      [ 3663.402743]  [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
      [ 3663.416249]  [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
      [ 3663.429271]  [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
      [ 3663.435873]  [<ffffffff811df8c5>] do_fsync+0x65/0xa0
      [ 3663.441408]  [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
      [ 3663.447043]  [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
      
      If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
      will never wake waiters on the current push sequence number, so
      anything waiting in xlog_cil_force_lsn() for that push sequence
      number to come up will not get woken and hence stall the shutdown.
      
      Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
      the push abort handling, in the log shutdown code when waking all
      waiters, and adding a shutdown check in the sequence completion wait
      loops to ensure they abort when a wakeup due to a shutdown occurs.
      Reported-by: NBoris Ranto <branto@redhat.com>
      Reported-by: NEric Sandeen <esandeen@redhat.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ac983517