1. 02 10月, 2014 9 次提交
    • D
      xfs: introduce xfs_buf_submit[_wait] · 595bff75
      Dave Chinner 提交于
      There is a lot of cookie-cutter code that looks like:
      
      	if (shutdown)
      		handle buffer error
      	xfs_buf_iorequest(bp)
      	error = xfs_buf_iowait(bp)
      	if (error)
      		handle buffer error
      
      spread through XFS. There's significant complexity now in
      xfs_buf_iorequest() to specifically handle this sort of synchronous
      IO pattern, but there's all sorts of nasty surprises in different
      error handling code dependent on who owns the buffer references and
      the locks.
      
      Pull this pattern into a single helper, where we can hide all the
      synchronous IO warts and hence make the error handling for all the
      callers much saner. This removes the need for a special extra
      reference to protect IO completion processing, as we can now hold a
      single reference across dispatch and waiting, simplifying the sync
      IO smeantics and error handling.
      
      In doing this, also rename xfs_buf_iorequest to xfs_buf_submit and
      make it explicitly handle on asynchronous IO. This forces all users
      to be switched specifically to one interface or the other and
      removes any ambiguity between how the interfaces are to be used. It
      also means that xfs_buf_iowait() goes away.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      595bff75
    • D
      xfs: kill xfs_bioerror_relse · 8b131973
      Dave Chinner 提交于
      There is only one caller now - xfs_trans_read_buf_map() - and it has
      very well defined call semantics - read, synchronous, and b_iodone
      is NULL. Hence it's pretty clear what error handling is necessary
      for this case. The bigger problem of untangling
      xfs_trans_read_buf_map error handling is left to a future patch.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      8b131973
    • D
      xfs: xfs_bioerror can die. · 27187754
      Dave Chinner 提交于
      Internal buffer write error handling is a mess due to the unnatural
      split between xfs_bioerror and xfs_bioerror_relse().
      
      xfs_bwrite() only does sync IO and determines the handler to
      call based on b_iodone, so for this caller the only difference
      between xfs_bioerror() and xfs_bioerror_release() is the XBF_DONE
      flag. We don't care what the XBF_DONE flag state is because we stale
      the buffer in both paths - the next buffer lookup will clear
      XBF_DONE because XBF_STALE is set. Hence we can use common
      error handling for xfs_bwrite().
      
      __xfs_buf_delwri_submit() is a similar - it's only ever called
      on writes - all sync or async - and again there's no reason to
      handle them any differently at all.
      
      Clean up the nasty error handling and remove xfs_bioerror().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      27187754
    • D
      xfs: kill xfs_bdstrat_cb · 8dac3921
      Dave Chinner 提交于
      Only has two callers, and is just a shutdown check and error handler
      around xfs_buf_iorequest. However, the error handling is a mess of
      read and write semantics, and both internal callers only call it for
      writes. Hence kill the wrapper, and follow up with a patch to
      sanitise the error handling.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      8dac3921
    • D
      xfs: rework xfs_buf_bio_endio error handling · 61be9c52
      Dave Chinner 提交于
      Currently the report of a bio error from completion
      immediately marks the buffer with an error. The issue is that this
      is racy w.r.t. synchronous IO - the submitter can see b_error being
      set before the IO is complete, and hence we cannot differentiate
      between submission failures and completion failures.
      
      Add an internal b_io_error field protected by the b_lock to catch IO
      completion errors, and only propagate that to the buffer during
      final IO completion handling. Hence we can tell in xfs_buf_iorequest
      if we've had a submission failure bey checking bp->b_error before
      dropping our b_io_remaining reference - that reference will prevent
      b_io_error values from being propagated to b_error in the event that
      completion races with submission.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      61be9c52
    • D
      xfs: xfs_buf_ioend and xfs_buf_iodone_work duplicate functionality · e8aaba9a
      Dave Chinner 提交于
      We do some work in xfs_buf_ioend, and some work in
      xfs_buf_iodone_work, but much of that functionality is the same.
      This work can all be done in a single function, leaving
      xfs_buf_iodone just a wrapper to determine if we should execute it
      by workqueue or directly. hence rename xfs_buf_iodone_work to
      xfs_buf_ioend(), and add a new xfs_buf_ioend_async() for places that
      need async processing.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e8aaba9a
    • D
      xfs: synchronous buffer IO needs a reference · e11bb805
      Dave Chinner 提交于
      When synchronous IO runs IO completion work, it does so without an
      IO reference or a hold reference on the buffer. The IO "hold
      reference" is owned by the submitter, and released when the
      submission is complete. The IO reference is released when both the
      submitter and the bio end_io processing is run, and so if the io
      completion work is run from IO completion context, it is run without
      an IO reference.
      
      Hence we can get the situation where the submitter can submit the
      IO, see an error on the buffer and unlock and free the buffer while
      there is still IO in progress. This leads to use-after-free and
      memory corruption.
      
      Fix this by taking a "sync IO hold" reference that is owned by the
      IO and not released until after the buffer completion calls are run
      to wake up synchronous waiters. This means that the buffer will not
      be freed in any circumstance until all IO processing is completed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      e11bb805
    • D
      xfs: Don't use xfs_buf_iowait in the delwri buffer code · cf53e99d
      Dave Chinner 提交于
      For the special case of delwri buffer submission and waiting, we
      don't need to issue IO synchronously at all. The second pass to call
      xfs_buf_iowait() can be replaced with  blocking on xfs_buf_lock() -
      the buffer will be unlocked when the async IO is complete.
      
      This formalises a sane the method of waiting for async IO - take an
      extra reference, submit the IO, call xfs_buf_lock() when you want to
      wait for IO completion. i.e.:
      
      	bp = xfs_buf_find();
      	xfs_buf_hold(bp);
      	bp->b_flags |= XBF_ASYNC;
      	xfs_buf_iosubmit(bp);
      	xfs_buf_lock(bp)
      	error = bp->b_error;
      	....
      	xfs_buf_relse(bp);
      
      While this is somewhat racy for gathering IO errors, none of the
      code that calls xfs_buf_delwri_submit() will race against other
      users of the buffers being submitted. Even if they do, we don't
      really care if the error is detected by the delwri code or the user
      we raced against. Either way, the error will be detected and
      handled.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      cf53e99d
    • D
      xfs: force the log before shutting down · a870fe6d
      Dave Chinner 提交于
      When we have marked the filesystem for shutdown, we want to prevent
      any further buffer IO from being submitted. However, we currently
      force the log after marking the filesystem as shut down, hence
      allowing IO to the log *after* we have marked both the filesystem
      and the log as in an error state.
      
      Clean this up by forcing the log before we mark the filesytem with
      an error. This replaces the pure CIL flush that we currently have
      which works around this same issue (i.e the CIL can't be flushed
      once the shutdown flags are set) and hence enables us to clean up
      the logic substantially.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      a870fe6d
  2. 04 8月, 2014 14 次提交
    • K
      xfs: fix coccinelle warnings · 6eee8972
      kbuild test robot 提交于
      Removes unneeded semicolon, introduced by commit a70a4fa5 ("xfs: fix
      a couple error sequence jumps in xfs_mountfs"):
      
      fs/xfs/xfs_mount.c:858:24-25: Unneeded semicolon
      
      Generated by: scripts/coccinelle/misc/semicolon.cocci
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      6eee8972
    • D
      xfs: flush both inodes in xfs_swap_extents · 4ef897a2
      Dave Chinner 提交于
      We need to treat both inodes identically from a page cache point of
      view when prepareing them for extent swapping. We don't do this
      right now - we assume that one of the inodes empty, because that's
      what xfs_fsr currently does. Remove this assumption from the code.
      
      While factoring out the flushing and related checks, move the
      transactions reservation to immeidately after the flushes so that we
      don't need to pick up and then drop the ilock to do the transaction
      reservation. There are no issues with aborting the transaction it if
      the checks fail before we join the inodes to the transaction and
      dirty them, so this is a safe change to make.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4ef897a2
    • D
      xfs: fix swapext ilock deadlock · 81217683
      Dave Chinner 提交于
      xfs_swap_extents() holds the ilock over a call to
      filemap_write_and_wait(), which can then try to write data and take
      the ilock. That causes a self-deadlock.
      
      Fix the deadlock and clean up the code by separating the locking
      appropriately. Add a lockflags variable to track what locks we are
      holding as we gain and drop them and cleanup the error handling to
      always use "out_unlock" with the lockflags variable.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      81217683
    • D
      xfs: kill xfs_vnode.h · b92cc59f
      Dave Chinner 提交于
      Move the IO flag definitions to xfs_inode.h and kill the header file
      as it is now empty.
      
      Removing the xfs_vnode.h file showed up an implicit header include
      path:
      	xfs_linux.h -> xfs_vnode.h -> xfs_fs.h
      
      And so every xfs header file has been inplicitly been including
      xfs_fs.h where it is needed or not. Hence the removal of xfs_vnode.h
      causes all sorts of build issues because BBTOB() and friends are no
      longer automatically included in the build. This also gets fixed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b92cc59f
    • D
      xfs: kill VN_MAPPED · dd8c38ba
      Dave Chinner 提交于
      Only one user, no longer needed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      dd8c38ba
    • D
      xfs: kill VN_CACHED · 2667c6f9
      Dave Chinner 提交于
      Only has 2 users, has outlived it's usefulness.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      2667c6f9
    • D
      xfs: kill VN_DIRTY() · eac152b4
      Dave Chinner 提交于
      Only one user of the macro and the dirty mapping check is redundant
      so just get rid of it.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      eac152b4
    • D
      xfs: dquot recovery needs verifiers · ad3714b8
      Dave Chinner 提交于
      dquot recovery should add verifiers to the dquot buffers that it
      recovers changes into. Unfortunately, it doesn't attached the
      verifiers to the buffers in a consistent manner. For example,
      xlog_recover_dquot_pass2() reads dquot buffers without a verifier
      and then writes it without ever having attached a verifier to the
      buffer.
      
      Further, dquot buffer recovery may write a dquot buffer that has not
      been modified, or indeed, shoul dbe written because quotas are not
      enabled and hence changes to the buffer were not replayed. In this
      case, we again write buffers without verifiers attached because that
      doesn't happen until after the buffer changes have been replayed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      ad3714b8
    • D
      xfs: quotacheck leaves dquot buffers without verifiers · 5fd364fe
      Dave Chinner 提交于
      When running xfs/305, I noticed that quotacheck was flushing dquot
      buffers that did not have the xfs_dquot_buf_ops verifiers attached:
      
      XFS (vdb): _xfs_buf_ioapply: no ops on block 0x1dc8/0x1dc8
      ffff880052489000: 44 51 01 04 00 00 65 b8 00 00 00 00 00 00 00 00  DQ....e.........
      ffff880052489010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      ffff880052489020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      ffff880052489030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
      CPU: 1 PID: 2376 Comm: mount Not tainted 3.16.0-rc2-dgc+ #306
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
       ffff88006fe38000 ffff88004a0ffae8 ffffffff81cf1cca 0000000000000001
       ffff88004a0ffb88 ffffffff814d50ca 000010004a0ffc70 0000000000000000
       ffff88006be56dc4 0000000000000021 0000000000001dc8 ffff88007c773d80
      Call Trace:
       [<ffffffff81cf1cca>] dump_stack+0x45/0x56
       [<ffffffff814d50ca>] _xfs_buf_ioapply+0x3ca/0x3d0
       [<ffffffff810db520>] ? wake_up_state+0x20/0x20
       [<ffffffff814d51f5>] ? xfs_bdstrat_cb+0x55/0xb0
       [<ffffffff814d513b>] xfs_buf_iorequest+0x6b/0xd0
       [<ffffffff814d51f5>] xfs_bdstrat_cb+0x55/0xb0
       [<ffffffff814d53ab>] __xfs_buf_delwri_submit+0x15b/0x220
       [<ffffffff814d6040>] ? xfs_buf_delwri_submit+0x30/0x90
       [<ffffffff814d6040>] xfs_buf_delwri_submit+0x30/0x90
       [<ffffffff8150f89d>] xfs_qm_quotacheck+0x17d/0x3c0
       [<ffffffff81510591>] xfs_qm_mount_quotas+0x151/0x1e0
       [<ffffffff814ed01c>] xfs_mountfs+0x56c/0x7d0
       [<ffffffff814f0f12>] xfs_fs_fill_super+0x2c2/0x340
       [<ffffffff811c9fe4>] mount_bdev+0x194/0x1d0
       [<ffffffff814f0c50>] ? xfs_finish_flags+0x170/0x170
       [<ffffffff814ef0f5>] xfs_fs_mount+0x15/0x20
       [<ffffffff811ca8c9>] mount_fs+0x39/0x1b0
       [<ffffffff811e4d67>] vfs_kern_mount+0x67/0x120
       [<ffffffff811e757e>] do_mount+0x23e/0xad0
       [<ffffffff8117abde>] ? __get_free_pages+0xe/0x50
       [<ffffffff811e71e6>] ? copy_mount_options+0x36/0x150
       [<ffffffff811e8103>] SyS_mount+0x83/0xc0
       [<ffffffff81cfd40b>] tracesys+0xdd/0xe2
      
      This was caused by dquot buffer readahead not attaching a verifier
      structure to the buffer when readahead was issued, resulting in the
      followup read of the buffer finding a valid buffer and so not
      attaching new verifiers to the buffer as part of the read.
      
      Also, when a verifier failure occurs, we then read the buffer
      without verifiers. Attach the verifiers manually after this read so
      that if the buffer is then written it will be verified that the
      corruption has been repaired.
      
      Further, when flushing a dquot we don't ask for a verifier when
      reading in the dquot buffer the dquot belongs to. Most of the time
      this isn't an issue because the buffer is still cached, but when it
      is not cached it will result in writing the dquot buffer without
      having the verfier attached.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      5fd364fe
    • D
      xfs: ensure verifiers are attached to recovered buffers · 67dc288c
      Dave Chinner 提交于
      Crash testing of CRC enabled filesystems has resulted in a number of
      reports of bad CRCs being detected after the filesystem was mounted.
      Errors such as the following were being seen:
      
      XFS (sdb3): Mounting V5 Filesystem
      XFS (sdb3): Starting recovery (logdev: internal)
      XFS (sdb3): Metadata CRC error detected at xfs_agf_read_verify+0x5a/0x100 [xfs], block 0x1
      XFS (sdb3): Unmount and run xfs_repair
      XFS (sdb3): First 64 bytes of corrupted metadata buffer:
      ffff880136ffd600: 58 41 47 46 00 00 00 01 00 00 00 00 00 0f aa 40  XAGF...........@
      ffff880136ffd610: 00 02 6d 53 00 02 77 f8 00 00 00 00 00 00 00 01  ..mS..w.........
      ffff880136ffd620: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 03  ................
      ffff880136ffd630: 00 00 00 04 00 08 81 d0 00 08 81 a7 00 00 00 00  ................
      XFS (sdb3): metadata I/O error: block 0x1 ("xfs_trans_read_buf_map") error 74 numblks 1
      
      The errors were typically being seen in AGF, AGI and their related
      btree block buffers some time after log recovery had run. Often it
      wasn't until later subsequent mounts that the problem was
      discovered. The common symptom was a buffer with the correct
      contents, but a CRC and an LSN that matched an older version of the
      contents.
      
      Some debug added to _xfs_buf_ioapply() indicated that buffers were
      being written without verifiers attached to them from log recovery,
      and Jan Kara isolated the cause to log recovery readahead an dit's
      interactions with buffers that had a more recent LSN on disk than
      the transaction being recovered. In this case, the buffer did not
      get a verifier attached, and os when the second phase of log
      recovery ran and recovered EFIs and unlinked inodes, the buffers
      were modified and written without the verifier running. Hence they
      had up to date contents, but stale LSNs and CRCs.
      
      Fix it by attaching verifiers to buffers we skip due to future LSN
      values so they don't escape into the buffer cache without the
      correct verifier attached.
      
      This patch is based on analysis and a patch from Jan Kara.
      
      cc: <stable@vger.kernel.org>
      Reported-by: NJan Kara <jack@suse.cz>
      Reported-by: NFanael Linithien <fanael4@gmail.com>
      Reported-by: NGrozdan <neutrino8@gmail.com>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      67dc288c
    • D
      xfs: catch buffers written without verifiers attached · 400b9d88
      Dave Chinner 提交于
      We recently had a bug where buffers were slipping through log
      recovery without any verifier attached to them. This was resulting
      in on-disk CRC mismatches for valid data. Add some warning code to
      catch this occurrence so that we catch such bugs during development
      rather than not being aware they exist.
      
      Note that we cannot do this verification unconditionally as non-CRC
      filesystems don't always attach verifiers to the buffers being
      written. e.g. during log recovery we cannot identify all the
      different types of buffers correctly on non-CRC filesystems, so we
      can't attach the correct verifiers in all cases and so we don't
      attach any. Hence we don't want on non-CRC filesystems to avoid
      spamming the logs with false indications.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      400b9d88
    • E
      xfs: avoid false quotacheck after unclean shutdown · 5ef828c4
      Eric Sandeen 提交于
      The commit
      
      83e782e1 xfs: Remove incore use of XFS_OQUOTA_ENFD and XFS_OQUOTA_CHKD
      
      added a new function xfs_sb_quota_from_disk() which swaps
      on-disk XFS_OQUOTA_* flags for in-core XFS_GQUOTA_* and XFS_PQUOTA_*
      flags after the superblock is read.
      
      However, if log recovery is required, the superblock is read again,
      and the modified in-core flags are re-read from disk, so we have
      XFS_OQUOTA_* flags in memory again.  This causes the
      XFS_QM_NEED_QUOTACHECK() test to be true, because the XFS_OQUOTA_CHKD
      is still set, and not XFS_GQUOTA_CHKD or XFS_PQUOTA_CHKD.
      
      Change xfs_sb_from_disk to call xfs_sb_quota_from disk and always
      convert the disk flags to in-memory flags.
      
      Add a lower-level function which can be called with "false" to
      not convert the flags, so that the sb verifier can verify
      exactly what was on disk, per Brian Foster's suggestion.
      Reported-by: NCyril B. <cbay@excellency.fr>
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      5ef828c4
    • B
      xfs: fix rounding error of fiemap length parameter · eedf32bf
      Brian Foster 提交于
      The offset and length parameters are converted from bytes to basic
      blocks by xfs_vn_fiemap(). The BTOBB() converter rounds the value up to
      the nearest basic block. This leads to unexpected behavior when
      unaligned offsets are provided to FIEMAP.
      
      Fix the conversions of byte values to block values to cover the provided
      offsets. Round down the start offset to the nearest basic block.
      Calculate the end offset based on the provided values, round up and
      calculate length based on the start block offset.
      Reported-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      eedf32bf
    • J
      xfs: introduce xfs_bulkstat_ag_ichunk · 1e773c49
      Jie Liu 提交于
      Introduce xfs_bulkstat_ag_ichunk() to process inodes in chunk with a
      pointer to a formatter function that will iget the inode and fill in
      the appropriate structure.
      
      Refactor xfs_bulkstat() with it.
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      1e773c49
  3. 30 7月, 2014 1 次提交
  4. 24 7月, 2014 16 次提交