1. 12 3月, 2018 1 次提交
  2. 29 1月, 2018 2 次提交
  3. 13 1月, 2018 2 次提交
  4. 09 1月, 2018 1 次提交
    • B
      xfs: print transaction log reservation on overrun · 2c8f6265
      Brian Foster 提交于
      The transaction dump code displays the content and reservation
      consumption of a particular transaction in the event of an overrun.
      It currently displays the reservation associated with the
      transaction ticket, but not the original reservation attached to the
      transaction.
      
      The latter value reflects the original transaction reservation
      calculation before additional reservation overhead is assigned, such
      as for the CIL context header and potential split region headers.
      
      Update xlog_print_trans() to also print the original transaction
      reservation in the event of overrun. This provides a reference point
      to identify how much reservation overhead was added to a particular
      ticket by xfs_log_calc_unit_res().
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      2c8f6265
  5. 28 11月, 2017 1 次提交
    • L
      Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds 提交于
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      Requested-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1751e8a6
  6. 07 11月, 2017 1 次提交
  7. 02 11月, 2017 1 次提交
  8. 27 10月, 2017 2 次提交
    • D
      xfs: validate sb_logsunit is a multiple of the fs blocksize · 9c92ee20
      Darrick J. Wong 提交于
      Make sure the log stripe unit is sane before proceeding with mounting.
      AFAICT this means that logsunit has to be 0, 1, or a multiple of the fs
      block size.  Found this by setting the LSB of logsunit in xfs/350 and
      watching the system crash as soon as we try to write to the log.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      9c92ee20
    • B
      xfs: drain the buffer LRU on mount · f1b92bbc
      Brian Foster 提交于
      Log recovery of v4 filesystems does not use buffer verifiers because
      log recovery historically can result in transient buffer corruption
      when target buffers might be ahead of the log after a crash. v5
      filesystems work around this problem with metadata LSN ordering.
      
      While this log recovery verifier behavior is necessary on v4 supers,
      it can result in leaving buffers around in the LRU without verifiers
      attached for a significant amount of time. This leads to use of
      unverified buffers while the filesystem is in active use, long after
      recovery has completed.
      
      To address this problem, drain all buffers from the LRU as a final
      step of the log mount sequence. Note that this is done
      unconditionally to provide a consistently clean cache footprint,
      regardless of superblock version or log state. As a side effect,
      this ensures that all cache resident, unverified buffers are
      reclaimed after log recovery and therefore must be recreated with
      verifiers on subsequent use.
      Reported-by: NDarrick Wong <darrick.wong@oracle.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f1b92bbc
  9. 12 10月, 2017 1 次提交
  10. 02 9月, 2017 2 次提交
    • A
      xfs: fix incorrect log_flushed on fsync · 47c7d0b1
      Amir Goldstein 提交于
      When calling into _xfs_log_force{,_lsn}() with a pointer
      to log_flushed variable, log_flushed will be set to 1 if:
      1. xlog_sync() is called to flush the active log buffer
      AND/OR
      2. xlog_wait() is called to wait on a syncing log buffers
      
      xfs_file_fsync() checks the value of log_flushed after
      _xfs_log_force_lsn() call to optimize away an explicit
      PREFLUSH request to the data block device after writing
      out all the file's pages to disk.
      
      This optimization is incorrect in the following sequence of events:
      
       Task A                    Task B
       -------------------------------------------------------
       xfs_file_fsync()
         _xfs_log_force_lsn()
           xlog_sync()
              [submit PREFLUSH]
                                 xfs_file_fsync()
                                   file_write_and_wait_range()
                                     [submit WRITE X]
                                     [endio  WRITE X]
                                   _xfs_log_force_lsn()
                                     xlog_wait()
              [endio  PREFLUSH]
      
      The write X is not guarantied to be on persistent storage
      when PREFLUSH request in completed, because write A was submitted
      after the PREFLUSH request, but xfs_file_fsync() of task A will
      be notified of log_flushed=1 and will skip explicit flush.
      
      If the system crashes after fsync of task A, write X may not be
      present on disk after reboot.
      
      This bug was discovered and demonstrated using Josef Bacik's
      dm-log-writes target, which can be used to record block io operations
      and then replay a subset of these operations onto the target device.
      The test goes something like this:
      - Use fsx to execute ops of a file and record ops on log device
      - Every now and then fsync the file, store md5 of file and mark
        the location in the log
      - Then replay log onto device for each mark, mount fs and compare
        md5 of file to stored value
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAmir Goldstein <amir73il@gmail.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      47c7d0b1
    • D
      xfs: evict all inodes involved with log redo item · 799ea9e9
      Darrick J. Wong 提交于
      When we introduced the bmap redo log items, we set MS_ACTIVE on the
      mountpoint and XFS_IRECOVERY on the inode to prevent unlinked inodes
      from being truncated prematurely during log recovery.  This also had the
      effect of putting linked inodes on the lru instead of evicting them.
      
      Unfortunately, we neglected to find all those unreferenced lru inodes
      and evict them after finishing log recovery, which means that we leak
      them if anything goes wrong in the rest of xfs_mountfs, because the lru
      is only cleaned out on unmount.
      
      Therefore, evict unreferenced inodes in the lru list immediately
      after clearing MS_ACTIVE.
      
      Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Cc: viro@ZenIV.linux.org.uk
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      799ea9e9
  11. 23 8月, 2017 2 次提交
  12. 18 8月, 2017 1 次提交
    • D
      xfs: clear MS_ACTIVE after finishing log recovery · 8204f8dd
      Darrick J. Wong 提交于
      Way back when we established inode block-map redo log items, it was
      discovered that we needed to prevent the VFS from evicting inodes during
      log recovery because any given inode might be have bmap redo items to
      replay even if the inode has no link count and is ultimately deleted,
      and any eviction of an unlinked inode causes the inode to be truncated
      and freed too early.
      
      To make this possible, we set MS_ACTIVE so that inodes would not be torn
      down immediately upon release.  Unfortunately, this also results in the
      quota inodes not being released at all if a later part of the mount
      process should fail, because we never reclaim the inodes.  So, set
      MS_ACTIVE right before we do the last part of log recovery and clear it
      immediately after we finish the log recovery so that everything
      will be torn down properly if we abort the mount.
      
      Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      8204f8dd
  13. 28 6月, 2017 3 次提交
  14. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  15. 19 6月, 2017 2 次提交
    • B
      xfs: dump transaction usage details on log reservation overrun · d4ca1d55
      Brian Foster 提交于
      If a transaction log reservation overrun occurs, the ticket data
      associated with the reservation is dumped in xfs_log_commit_cil().
      This occurs long after the transaction items and details have been
      removed from the transaction and effectively lost. This limited set
      of ticket data provides very little information to support debugging
      transaction overruns based on the typical report.
      
      To improve transaction log reservation overrun reporting, create a
      helper to dump transaction details such as log items, log vector
      data, etc., as well as the underlying ticket data for the
      transaction. Move the overrun detection from xfs_log_commit_cil() to
      xlog_cil_insert_items() so it occurs prior to migration of the
      logged items to the CIL. Call the new helper such that it is able to
      dump this transaction data before it is lost.
      
      Also, warn on overrun to provide callstack context for the offending
      transaction and include a few additional messages from
      xlog_cil_insert_items() to display the reservation consumed locally
      for overhead such as log vector headers, split region headers and
      the context ticket. This provides a complete general breakdown of
      the reservation consumption of a transaction when/if it happens to
      overrun the reservation.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      d4ca1d55
    • B
      xfs: separate shutdown from ticket reservation print helper · 7d2d5653
      Brian Foster 提交于
      xlog_print_tic_res() pre-dates delayed logging and the committed
      items list (CIL) and thus retains some factoring warts, such as hard
      coded function names in the output and the fact that it induces a
      shutdown.
      
      In preparation for more detailed logging of regular transaction
      overrun situations, refactor xlog_print_tic_res() to be slightly
      more generic. Reword some of the warning messages and pull the
      shutdown into the callers.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7d2d5653
  16. 26 4月, 2017 1 次提交
  17. 04 4月, 2017 1 次提交
    • B
      xfs: use dedicated log worker wq to avoid deadlock with cil wq · 696a5620
      Brian Foster 提交于
      The log covering background task used to be part of the xfssyncd
      workqueue. That workqueue was removed as of commit 5889608d ("xfs:
      syncd workqueue is no more") and the associated work item scheduled
      to the xfs-log wq. The latter is used for log buffer I/O completion.
      
      Since xfs_log_worker() can invoke a log flush, a deadlock is
      possible between the xfs-log and xfs-cil workqueues. Consider the
      following codepath from xfs_log_worker():
      
      xfs_log_worker()
        xfs_log_force()
          _xfs_log_force()
            xlog_cil_force()
              xlog_cil_force_lsn()
                xlog_cil_push_now()
                  flush_work()
      
      The above is in xfs-log wq context and blocked waiting on the
      completion of an xfs-cil work item. Concurrently, the cil push in
      progress can end up blocked here:
      
      xlog_cil_push_work()
        xlog_cil_push()
          xlog_write()
            xlog_state_get_iclog_space()
              xlog_wait(&log->l_flush_wait, ...)
      
      The above is in xfs-cil context waiting on log buffer I/O
      completion, which executes in xfs-log wq context. In this scenario
      both workqueues are deadlocked waiting on eachother.
      
      Add a new workqueue specifically for the high level log covering and
      ail pushing worker, as was the case prior to commit 5889608d.
      Diagnosed-by: NDavid Jeffery <djeffery@redhat.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      696a5620
  18. 10 1月, 2017 1 次提交
  19. 09 12月, 2016 1 次提交
  20. 05 12月, 2016 1 次提交
    • D
      xfs: optimise CRC updates · cae028df
      Dave Chinner 提交于
      Nick Piggin reported that the CRC overhead in an fsync heavy
      workload was higher than expected on a Power8 machine. Part of this
      was to do with the fact that the power8 CRC implementation is not
      efficient for CRC lengths of less than 512 bytes, and so the way we
      split the CRCs over the CRC field means a lot of the CRCs are
      reduced to being less than than optimal size.
      
      To optimise this, change the CRC update mechanism to zero the CRC
      field first, and then compute the CRC in one pass over the buffer
      and write the result back into the buffer. We can do this safely
      because anything writing a CRC has exclusive access to the buffer
      the CRC is being calculated over.
      
      We leave the CRC verify code the same - it still splits the CRC
      calculation - because we do not want read-only operations modifying
      the underlying buffer. This is because read-only operations may not
      have an exclusive access to the buffer guaranteed, and so temporary
      modifications could leak out to to other processes accessing the
      buffer concurrently.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      cae028df
  21. 20 7月, 2016 1 次提交
  22. 01 6月, 2016 1 次提交
  23. 06 4月, 2016 2 次提交
  24. 07 3月, 2016 1 次提交
  25. 10 2月, 2016 4 次提交
  26. 05 1月, 2016 1 次提交
    • B
      xfs: debug mode log record crc error injection · 609adfc2
      Brian Foster 提交于
      XFS now uses CRC verification over a limited section of the log to
      detect torn writes prior to a crash. This is difficult to test directly
      due to the timing and hardware requirements to cause a short write.
      
      Add a mechanism to inject CRC errors into log records to facilitate
      testing torn write detection during log recovery. This mechanism is
      dangerous and can result in filesystem corruption. Thus, it is only
      available in DEBUG mode for testing/development purposes. Set a non-zero
      value to the following sysfs entry to enable error injection:
      
      	/sys/fs/xfs/<dev>/log/log_badcrc_factor
      
      Once enabled, XFS intentionally writes an invalid CRC to a log record at
      some random point in the future based on the provided frequency. The
      filesystem immediately shuts down once the record has been written to
      the physical log to prevent metadata writeback (e.g., AIL insertion)
      once the log write completes. This helps reasonably simulate a torn
      write to the log as the affected record must be safe to discard. The
      next mount after the intentional shutdown requires log recovery and
      should detect and recover from the torn write.
      
      Note again that this _will_ result in data loss or worse. For testing
      and development purposes only!
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      609adfc2
  27. 04 1月, 2016 1 次提交
  28. 12 10月, 2015 1 次提交
    • B
      xfs: per-filesystem stats counter implementation · ff6d6af2
      Bill O'Donnell 提交于
      This patch modifies the stats counting macros and the callers
      to those macros to properly increment, decrement, and add-to
      the xfs stats counts. The counts for global and per-fs stats
      are correctly advanced, and cleared by writing a "1" to the
      corresponding clear file.
      
      global counts: /sys/fs/xfs/stats/stats
      per-fs counts: /sys/fs/xfs/sda*/stats/stats
      
      global clear:  /sys/fs/xfs/stats/stats_clear
      per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear
      
      [dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
       stats structures and macros. ]
      Signed-off-by: NBill O'Donnell <billodo@redhat.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ff6d6af2