1. 30 11月, 2012 1 次提交
    • D
      xfs: fix direct IO nested transaction deadlock. · 437a255a
      Dave Chinner 提交于
      The direct IO path can do a nested transaction reservation when
      writing past the EOF. The first transaction is the append
      transaction for setting the filesize at IO completion, but we can
      also need a transaction for allocation of blocks. If the log is low
      on space due to reservations and small log, the append transaction
      can be granted after wating for space as the only active transaction
      in the system. This then attempts a reservation for an allocation,
      which there isn't space in the log for, and the reservation sleeps.
      The result is that there is nothing left in the system to wake up
      all the processes waiting for log space to come free.
      
      The stack trace that shows this deadlock is relatively innocuous:
      
       xlog_grant_head_wait
       xlog_grant_head_check
       xfs_log_reserve
       xfs_trans_reserve
       xfs_iomap_write_direct
       __xfs_get_blocks
       xfs_get_blocks_direct
       do_blockdev_direct_IO
       __blockdev_direct_IO
       xfs_vm_direct_IO
       generic_file_direct_write
       xfs_file_dio_aio_writ
       xfs_file_aio_write
       do_sync_write
       vfs_write
      
      This was discovered on a filesystem with a log of only 10MB, and a
      log stripe unit of 256k whih increased the base reservations by
      512k. Hence a allocation transaction requires 1.2MB of log space to
      be available instead of only 260k, and so greatly increased the
      chance that there wouldn't be enough log space available for the
      nested transaction to succeed. The key to reproducing it is this
      mkfs command:
      
      mkfs.xfs -f -d agcount=16,su=256k,sw=12 -l su=256k,size=2560b $SCRATCH_DEV
      
      The test case was a 1000 fsstress processes running with random
      freeze and unfreezes every few seconds. Thanks to Eryu Guan
      (eguan@redhat.com) for writing the test that found this on a system
      with a somewhat unique default configuration....
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAndrew Dahl <adahl@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      437a255a
  2. 20 11月, 2012 1 次提交
    • C
      xfs: add CRC checks to the log · 0e446be4
      Christoph Hellwig 提交于
      Implement CRCs for the log buffers.  We re-use a field in
      struct xlog_rec_header that was used for a weak checksum of the
      log buffer payload in debug builds before.
      
      The new checksumming uses the crc32c checksum we will use elsewhere
      in XFS, and also protects the record header and addition cycle data.
      
      Due to this there are some interesting changes in xlog_sync, as we
      need to do the cycle wrapping for the split buffer case much earlier,
      as we would touch the buffer after generating the checksum otherwise.
      
      The CRC calculation is always enabled, even for non-CRC filesystems,
      as adding this CRC does not change the log format. On non-CRC
      filesystems, only issue an alert if a CRC mismatch is found and
      allow recovery to continue - this will act as an indicator that
      log recovery problems are a result of log corruption. On CRC enabled
      filesystems, however, log recovery will fail.
      
      Note that existing debug kernels will write a simple checksum value
      to the log, so the first time this is run on a filesystem taht was
      last used on a debug kernel it will through CRC mismatch warning
      errors. These can be ignored.
      
      Initially based on a patch from Dave Chinner, then modified
      significantly by Christoph Hellwig.  Modified again by Dave Chinner
      to get to this version.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      0e446be4
  3. 16 11月, 2012 1 次提交
    • D
      xfs: make buffer read verication an IO completion function · c3f8fc73
      Dave Chinner 提交于
      Add a verifier function callback capability to the buffer read
      interfaces.  This will be used by the callers to supply a function
      that verifies the contents of the buffer when it is read from disk.
      This patch does not provide callback functions, but simply modifies
      the interfaces to allow them to be called.
      
      The reason for adding this to the read interfaces is that it is very
      difficult to tell fom the outside is a buffer was just read from
      disk or whether we just pulled it out of cache. Supplying a callbck
      allows the buffer cache to use it's internal knowledge of the buffer
      to execute it only when the buffer is read from disk.
      
      It is intended that the verifier functions will mark the buffer with
      an EFSCORRUPTED error when verification fails. This allows the
      reading context to distinguish a verification error from an IO
      error, and potentially take further actions on the buffer (e.g.
      attempt repair) based on the error reported.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NPhil White <pwhite@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c3f8fc73
  4. 18 10月, 2012 5 次提交
    • D
      xfs: only update the last_sync_lsn when a transaction completes · d35e88fa
      Dave Chinner 提交于
      The log write code stamps each iclog with the current tail LSN in
      the iclog header so that recovery knows where to find the tail of
      thelog once it has found the head. Normally this is taken from the
      first item on the AIL - the log item that corresponds to the oldest
      active item in the log.
      
      The problem is that when the AIL is empty, the tail lsn is dervied
      from the the l_last_sync_lsn, which is the LSN of the last iclog to
      be written to the log. In most cases this doesn't happen, because
      the AIL is rarely empty on an active filesystem. However, when it
      does, it opens up an interesting case when the transaction being
      committed to the iclog spans multiple iclogs.
      
      That is, the first iclog is stamped with the l_last_sync_lsn, and IO
      is issued. Then the next iclog is setup, the changes copied into the
      iclog (takes some time), and then the l_last_sync_lsn is stamped
      into the header and IO is issued. This is still the same
      transaction, so the tail lsn of both iclogs must be the same for log
      recovery to find the entire transaction to be able to replay it.
      
      The problem arises in that the iclog buffer IO completion updates
      the l_last_sync_lsn with it's own LSN. Therefore, If the first iclog
      completes it's IO before the second iclog is filled and has the tail
      lsn stamped in it, it will stamp the LSN of the first iclog into
      it's tail lsn field. If the system fails at this point, log recovery
      will not see a complete transaction, so the transaction will no be
      replayed.
      
      The fix is simple - the l_last_sync_lsn is updated when a iclog
      buffer IO completes, and this is incorrect. The l_last_sync_lsn
      shoul dbe updated when a transaction is completed by a iclog buffer
      IO. That is, only iclog buffers that have transaction commit
      callbacks attached to them should update the l_last_sync_lsn. This
      means that the last_sync_lsn will only move forward when a commit
      record it written, not in the middle of a large transaction that is
      rolling through multiple iclog buffers.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      d35e88fa
    • D
      xfs: xfs_quiesce_attr() should quiesce the log like unmount · c75921a7
      Dave Chinner 提交于
      xfs_quiesce_attr() is supposed to leave the log empty with an
      unmount record written. Right now it does not wait for the AIL to be
      emptied before writing the unmount record, not does it wait for
      metadata IO completion, either. Fix it to use the same method and
      code as xfs_log_unmount().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c75921a7
    • D
      xfs: syncd workqueue is no more · 5889608d
      Dave Chinner 提交于
      With the syncd functions moved to the log and/or removed, the syncd
      workqueue is the only remaining bit left. It is used by the log
      covering/ail pushing work, as well as by the inode reclaim work.
      
      Given how cheap workqueues are these days, give the log and inode
      reclaim work their own work queues and kill the syncd work queue.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      5889608d
    • D
      xfs: Bring some sanity to log unmounting · cf2931db
      Dave Chinner 提交于
      When unmounting the filesystem, there are lots of operations that
      need to be done in a specific order, and they are spread across
      across a couple of functions. We have to drain the AIL before we
      write the unmount record, and we have to shut down the background
      log work before we do either of them.
      
      But this is all split haphazardly across xfs_unmountfs() and
      xfs_log_unmount(). Move all the AIL flushing and log manipulations
      to xfs_log_unmount() so that the responisbilities of each function
      is clear and the operations they perform obvious.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      cf2931db
    • D
      xfs: sync work is now only periodic log work · f661f1e0
      Dave Chinner 提交于
      The only thing the periodic sync work does now is flush the AIL and
      idle the log. These are really functions of the log code, so move
      the work to xfs_log.c and rename it appropriately.
      
      The only wart that this leaves behind is the xfssyncd_centisecs
      sysctl, otherwise the xfssyncd is dead. Clean up any comments that
      related to xfssyncd to reflect it's passing.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f661f1e0
  5. 22 6月, 2012 5 次提交
  6. 30 5月, 2012 1 次提交
  7. 21 5月, 2012 1 次提交
    • D
      xfs: add trace points for log forces · 14c26c6a
      Dave Chinner 提交于
      To enable easy tracing of the location of log forces and the
      frequency of them via perf, add a pair of trace points to the log
      force functions.  This will help debug where excessive log forces
      are being issued from by simple perf commands like:
      
      # ~/perf/perf top -e xfs:xfs_log_force -G -U
      
      Which gives this sort of output:
      
      Events: 141  xfs:xfs_log_force
      -  100.00%  [kernel]  [k] xfs_log_force
         - xfs_log_force
              87.04% xfsaild
                 kthread
                 kernel_thread_helper
            - 12.87% xfs_buf_lock
                 _xfs_buf_find
                 xfs_buf_get
                 xfs_trans_get_buf
                 xfs_da_do_buf
                 xfs_da_get_buf
                 xfs_dir2_data_init
                 xfs_dir2_leaf_addname
                 xfs_dir_createname
                 xfs_create
                 xfs_vn_mknod
                 xfs_vn_create
                 vfs_create
                 do_last.isra.41
                 path_openat
                 do_filp_open
                 do_sys_open
                 sys_open
                 system_call_fastpath
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sig.com>
      14c26c6a
  8. 15 5月, 2012 7 次提交
  9. 27 3月, 2012 1 次提交
    • D
      xfs: Account log unmount transaction correctly · 3948659e
      Dave Chinner 提交于
      There have been a few reports of this warning appearing recently:
      
      XFS (dm-4): xlog_space_left: head behind tail
       tail_cycle = 129, tail_bytes = 20163072
       GH   cycle = 129, GH   bytes = 20162880
      
      The common cause appears to be lots of freeze and unfreeze cycles,
      and the output from the warnings indicates that we are leaking
      around 8 bytes of log space per freeze/unfreeze cycle.
      
      When we freeze the filesystem, we write an unmount record and that
      uses xlog_write directly - a special type of transaction,
      effectively. What it doesn't do, however, is correctly account for
      the log space it uses. The unmount record writes an 8 byte structure
      with a special magic number into the log, and the space this
      consumes is not accounted for in the log ticket tracking the
      operation. Hence we leak 8 bytes every unmount record that is
      written.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3948659e
  10. 23 2月, 2012 11 次提交
  11. 09 12月, 2011 1 次提交
  12. 07 12月, 2011 1 次提交
    • C
      xfs: fix the logspace waiting algorithm · 9f9c19ec
      Christoph Hellwig 提交于
      Apply the scheme used in log_regrant_write_log_space to wake up any other
      threads waiting for log space before the newly added one to
      log_regrant_write_log_space as well, and factor the code into readable
      helpers.  For each of the queues we have add two helpers:
      
       - one to try to wake up all waiting threads.  This helper will also be
         usable by xfs_log_move_tail once we remove the current opportunistic
         wakeups in it.
       - one to sleep on t_wait until enough log space is available, loosely
         modelled after Linux waitqueues.
       
      And use them to reimplement the guts of log_regrant_write_log_space and
      log_regrant_write_log_space.  These two function now use one and the same
      algorithm for waiting on log space instead of subtly different ones before,
      with an option to completely unify them in the near future.
      
      Also move the filesystem shutdown handling to the common caller given
      that we had to touch it anyway.
      
      Based on hard debugging and an earlier patch from
      Chandra Seetharaman <sekharan@us.ibm.com>.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Tested-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      9f9c19ec
  13. 09 11月, 2011 1 次提交
  14. 12 10月, 2011 3 次提交