1. 09 1月, 2018 1 次提交
  2. 29 11月, 2017 1 次提交
  3. 02 11月, 2017 1 次提交
  4. 28 10月, 2017 1 次提交
  5. 27 10月, 2017 1 次提交
    • B
      xfs: buffer lru reference count error injection tag · 7561d27e
      Brian Foster 提交于
      XFS uses a fixed reference count for certain types of buffers in the
      internal LRU cache. These reference counts dictate how aggressively
      certain buffers are reclaimed vs. others. While the reference counts
      implements priority across different buffer types, all buffers
      (other than uncached buffers) are typically cached for at least one
      reclaim cycle.
      
      We've had at least one bug recently that has been hidden by a
      released buffer sitting around in the LRU. Users hitting the problem
      were able to reproduce under enough memory pressure to cause
      aggressive reclaim in a particular window of time.
      
      To support future xfstests cases, add an error injection tag to
      hardcode the buffer reference count to zero. When enabled, this
      bypasses caching of associated buffers and facilitates test cases
      that depend on this behavior.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7561d27e
  6. 26 9月, 2017 1 次提交
  7. 01 9月, 2017 1 次提交
  8. 24 8月, 2017 1 次提交
    • C
      block: replace bi_bdev with a gendisk pointer and partitions index · 74d46992
      Christoph Hellwig 提交于
      This way we don't need a block_device structure to submit I/O.  The
      block_device has different life time rules from the gendisk and
      request_queue and is usually only available when the block device node
      is open.  Other callers need to explicitly create one (e.g. the lightnvm
      passthrough code, or the new nvme multipathing code).
      
      For the actual I/O path all that we need is the gendisk, which exists
      once per block device.  But given that the block layer also does
      partition remapping we additionally need a partition index, which is
      used for said remapping in generic_make_request.
      
      Note that all the block drivers generally want request_queue or
      sometimes the gendisk, so this removes a layer of indirection all
      over the stack.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      74d46992
  9. 20 6月, 2017 1 次提交
    • D
      xfs: remove double-underscore integer types · c8ce540d
      Darrick J. Wong 提交于
      This is a purely mechanical patch that removes the private
      __{u,}int{8,16,32,64}_t typedefs in favor of using the system
      {u,}int{8,16,32,64}_t typedefs.  This is the sed script used to perform
      the transformation and fix the resulting whitespace and indentation
      errors:
      
      s/typedef\t__uint8_t/typedef __uint8_t\t/g
      s/typedef\t__uint/typedef __uint/g
      s/typedef\t__int\([0-9]*\)_t/typedef int\1_t\t/g
      s/__uint8_t\t/__uint8_t\t\t/g
      s/__uint/uint/g
      s/__int\([0-9]*\)_t\t/__int\1_t\t\t/g
      s/__int/int/g
      /^typedef.*int[0-9]*_t;$/d
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      c8ce540d
  10. 19 6月, 2017 1 次提交
    • B
      xfs: push buffer of flush locked dquot to avoid quotacheck deadlock · 7912e7fe
      Brian Foster 提交于
      Reclaim during quotacheck can lead to deadlocks on the dquot flush
      lock:
      
       - Quotacheck populates a local delwri queue with the physical dquot
         buffers.
       - Quotacheck performs the xfs_qm_dqusage_adjust() bulkstat and
         dirties all of the dquots.
       - Reclaim kicks in and attempts to flush a dquot whose buffer is
         already queud on the quotacheck queue. The flush succeeds but
         queueing to the reclaim delwri queue fails as the backing buffer is
         already queued. The flush unlock is now deferred to I/O completion
         of the buffer from the quotacheck queue.
       - The dqadjust bulkstat continues and dirties the recently flushed
         dquot once again.
       - Quotacheck proceeds to the xfs_qm_flush_one() walk which requires
         the flush lock to update the backing buffers with the in-core
         recalculated values. It deadlocks on the redirtied dquot as the
         flush lock was already acquired by reclaim, but the buffer resides
         on the local delwri queue which isn't submitted until the end of
         quotacheck.
      
      This is reproduced by running quotacheck on a filesystem with a
      couple million inodes in low memory (512MB-1GB) situations. This is
      a regression as of commit 43ff2122 ("xfs: on-stack delayed write
      buffer lists"), which removed a trylock and buffer I/O submission
      from the quotacheck dquot flush sequence.
      
      Quotacheck first resets and collects the physical dquot buffers in a
      delwri queue. Then, it traverses the filesystem inodes via bulkstat,
      updates the in-core dquots, flushes the corrected dquots to the
      backing buffers and finally submits the delwri queue for I/O. Since
      the backing buffers are queued across the entire quotacheck
      operation, dquot reclaim cannot possibly complete a dquot flush
      before quotacheck completes.
      
      Therefore, quotacheck must submit the buffer for I/O in order to
      cycle the flush lock and flush the dirty in-core dquot to the
      buffer. Add a delwri queue buffer push mechanism to submit an
      individual buffer for I/O without losing the delwri queue status and
      use it from quotacheck to avoid the deadlock. This restores
      quotacheck behavior to as before the regression was introduced.
      Reported-by: NMartin Svec <martin.svec@zoner.cz>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      7912e7fe
  11. 09 6月, 2017 1 次提交
  12. 08 6月, 2017 1 次提交
  13. 31 5月, 2017 1 次提交
    • B
      xfs: use ->b_state to fix buffer I/O accounting release race · 63db7c81
      Brian Foster 提交于
      We've had user reports of unmount hangs in xfs_wait_buftarg() that
      analysis shows is due to btp->bt_io_count == -1. bt_io_count
      represents the count of in-flight asynchronous buffers and thus
      should always be >= 0. xfs_wait_buftarg() waits for this value to
      stabilize to zero in order to ensure that all untracked (with
      respect to the lru) buffers have completed I/O processing before
      unmount proceeds to tear down in-core data structures.
      
      The value of -1 implies an I/O accounting decrement race. Indeed,
      the fact that xfs_buf_ioacct_dec() is called from xfs_buf_rele()
      (where the buffer lock is no longer held) means that bp->b_flags can
      be updated from an unsafe context. While a user-level reproducer is
      currently not available, some intrusive hacks to run racing buffer
      lookups/ioacct/releases from multiple threads was used to
      successfully manufacture this problem.
      
      Existing callers do not expect to acquire the buffer lock from
      xfs_buf_rele(). Therefore, we can not safely update ->b_flags from
      this context. It turns out that we already have separate buffer
      state bits and associated serialization for dealing with buffer LRU
      state in the form of ->b_state and ->b_lock. Therefore, replace the
      _XBF_IN_FLIGHT flag with a ->b_state variant, update the I/O
      accounting wrappers appropriately and make sure they are used with
      the correct locking. This ensures that buffer in-flight state can be
      modified at buffer release time without racing with modifications
      from a buffer lock holder.
      
      Fixes: 9c7504aa ("xfs: track and serialize in-flight async buffers against unmount")
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NNikolay Borisov <nborisov@suse.com>
      Tested-by: NLibor Pechacek <lpechacek@suse.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      63db7c81
  14. 04 5月, 2017 1 次提交
  15. 26 4月, 2017 1 次提交
  16. 02 3月, 2017 1 次提交
  17. 02 2月, 2017 1 次提交
  18. 26 1月, 2017 1 次提交
    • D
      xfs: clear _XBF_PAGES from buffers when readahead page · 2aa6ba7b
      Darrick J. Wong 提交于
      If we try to allocate memory pages to back an xfs_buf that we're trying
      to read, it's possible that we'll be so short on memory that the page
      allocation fails.  For a blocking read we'll just wait, but for
      readahead we simply dump all the pages we've collected so far.
      
      Unfortunately, after dumping the pages we neglect to clear the
      _XBF_PAGES state, which means that the subsequent call to xfs_buf_free
      thinks that b_pages still points to pages we own.  It then double-frees
      the b_pages pages.
      
      This results in screaming about negative page refcounts from the memory
      manager, which xfs oughtn't be triggering.  To reproduce this case,
      mount a filesystem where the size of the inodes far outweighs the
      availalble memory (a ~500M inode filesystem on a VM with 300MB memory
      did the trick here) and run bulkstat in parallel with other memory
      eating processes to put a huge load on the system.  The "check summary"
      phase of xfs_scrub also works for this purpose.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NEric Sandeen <sandeen@redhat.com>
      2aa6ba7b
  19. 09 12月, 2016 1 次提交
  20. 07 12月, 2016 1 次提交
    • L
      xfs: use rhashtable to track buffer cache · 6031e73a
      Lucas Stach 提交于
      On filesystems with a lot of metadata and in metadata intensive workloads
      xfs_buf_find() is showing up at the top of the CPU cycles trace. Most of
      the CPU time is spent on CPU cache misses while traversing the rbtree.
      
      As the buffer cache does not need any kind of ordering, but fast lookups
      a hashtable is the natural data structure to use. The rhashtable
      infrastructure provides a self-scaling hashtable implementation and
      allows lookups to proceed while the table is going through a resize
      operation.
      
      This reduces the CPU-time spent for the lookups to 1/3 even for small
      filesystems with a relatively small number of cached buffers, with
      possibly much larger gains on higher loaded filesystems.
      
      [dchinner: reduce minimum hash size to an acceptable size for large
      	   filesystems with many AGs with no active use.]
      [dchinner: remove stale rbtree asserts.]
      [dchinner: use xfs_buf_map for compare function argument.]
      [dchinner: make functions static.]
      [dchinner: remove redundant comments.]
      Signed-off-by: NLucas Stach <dev@lynxeye.de>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      6031e73a
  21. 01 11月, 2016 1 次提交
  22. 26 8月, 2016 1 次提交
    • B
      xfs: prevent dropping ioend completions during buftarg wait · 800b2694
      Brian Foster 提交于
      xfs_wait_buftarg() waits for all pending I/O, drains the ioend
      completion workqueue and walks the LRU until all buffers in the cache
      have been released. This is traditionally an unmount operation` but the
      mechanism is also reused during filesystem freeze.
      
      xfs_wait_buftarg() invokes drain_workqueue() as part of the quiesce,
      which is intended more for a shutdown sequence in that it indicates to
      the queue that new operations are not expected once the drain has begun.
      New work jobs after this point result in a WARN_ON_ONCE() and are
      otherwise dropped.
      
      With filesystem freeze, however, read operations are allowed and can
      proceed during or after the workqueue drain. If such a read occurs
      during the drain sequence, the workqueue infrastructure complains about
      the queued ioend completion work item and drops it on the floor. As a
      result, the buffer remains on the LRU and the freeze never completes.
      
      Despite the fact that the overall buffer cache cleanup is not necessary
      during freeze, fix up this operation such that it is safe to invoke
      during non-unmount quiesce operations. Replace the drain_workqueue()
      call with flush_workqueue(), which runs a similar serialization on
      pending workqueue jobs without causing new jobs to be dropped. This is
      safe for unmount as unmount independently locks out new operations by
      the time xfs_wait_buftarg() is invoked.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      800b2694
  23. 17 8月, 2016 1 次提交
    • B
      xfs: don't assert fail on non-async buffers on ioacct decrement · 4dd3fd71
      Brian Foster 提交于
      The buffer I/O accounting mechanism tracks async buffers under I/O.  As
      an optimization, the buffer I/O count is incremented only once on the
      first async I/O for a given hold cycle of a buffer and decremented once
      the buffer is released to the LRU (or freed).
      
      xfs_buf_ioacct_dec() has an ASSERT() check for an XBF_ASYNC buffer, but
      we have one or two corner cases where a buffer can be submitted for I/O
      multiple times via different methods in a single hold cycle. If an async
      I/O occurs first, the I/O count is incremented. If a sync I/O occurs
      before the hold count drops, XBF_ASYNC is cleared by the time the I/O
      count is decremented.
      
      Remove the async assert check from xfs_buf_ioacct_dec() as this is a
      perfectly valid scenario. For the purposes of I/O accounting, we really
      only care about the buffer async state at I/O submission time.
      Discovered-and-analyzed-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      4dd3fd71
  24. 20 7月, 2016 3 次提交
    • B
      xfs: track and serialize in-flight async buffers against unmount · 9c7504aa
      Brian Foster 提交于
      Newly allocated XFS metadata buffers are added to the LRU once the hold
      count is released, which typically occurs after I/O completion. There is
      no other mechanism at current that tracks the existence or I/O state of
      a new buffer. Further, readahead I/O tends to be submitted
      asynchronously by nature, which means the I/O can remain in flight and
      actually complete long after the calling context is gone. This means
      that file descriptors or any other holds on the filesystem can be
      released, allowing the filesystem to be unmounted while I/O is still in
      flight. When I/O completion occurs, core data structures may have been
      freed, causing completion to run into invalid memory accesses and likely
      to panic.
      
      This problem is reproduced on XFS via directory readahead. A filesystem
      is mounted, a directory is opened/closed and the filesystem immediately
      unmounted. The open/close cycle triggers a directory readahead that if
      delayed long enough, runs buffer I/O completion after the unmount has
      completed.
      
      To address this problem, add a mechanism to track all in-flight,
      asynchronous buffers using per-cpu counters in the buftarg. The buffer
      is accounted on the first I/O submission after the current reference is
      acquired and unaccounted once the buffer is returned to the LRU or
      freed. Update xfs_wait_buftarg() to wait on all in-flight I/O before
      walking the LRU list. Once in-flight I/O has completed and the workqueue
      has drained, all new buffers should have been released onto the LRU.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      9c7504aa
    • B
      xfs: exclude never-released buffers from buftarg I/O accounting · c891c30a
      Brian Foster 提交于
      The upcoming buftarg I/O accounting mechanism maintains a count of
      all buffers that have undergone I/O in the current hold-release
      cycle.  Certain buffers associated with core infrastructure (e.g.,
      the xfs_mount superblock buffer, log buffers) are never released,
      however. This means that accounting I/O submission on such buffers
      elevates the buftarg count indefinitely and could lead to lockup on
      unmount.
      
      Define a new buffer flag to explicitly exclude buffers from buftarg
      I/O accounting. Set the flag on the superblock and associated log
      buffers.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      c891c30a
    • E
      xfs: remove extraneous buffer flag changes · 0b4db5df
      Eric Sandeen 提交于
      Fix up a couple places where extra flag manipulation occurs.
      
      In the first case we clear XBF_ASYNC and then immediately reset it -
      so don't bother clearing in the first place.
      
      In the 2nd case we are at a point in the function where the buffer
      must already be async, so there is no need to reset it.
      
      Add consistent spacing around the " | " while we're at it.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      0b4db5df
  25. 21 6月, 2016 1 次提交
  26. 10 6月, 2016 1 次提交
  27. 08 6月, 2016 3 次提交
  28. 01 6月, 2016 1 次提交
    • D
      xfs: reduce lock hold times in buffer writeback · 26f1fe85
      Dave Chinner 提交于
      When we have a lot of metadata to flush from the AIL, the buffer
      list can get very long. The current submission code tries to batch
      submission to optimise IO order of the metadata (i.e. ascending
      block order) to maximise block layer merging or IO to adjacent
      metadata blocks.
      
      Unfortunately, the method used can result in long lock times
      occurring as buffers locked early on in the buffer list might not be
      dispatched until the end of the IO licst processing. This is because
      sorting does not occur util after the buffer list has been processed
      and the buffers that are going to be submitted are locked. Hence
      when the buffer list is several thousand buffers long, the lock hold
      times before IO dispatch can be significant.
      
      To fix this, sort the buffer list before we start trying to lock and
      submit buffers. This means we can now submit buffers immediately
      after they are locked, allowing merging to occur immediately on the
      plug and dispatch to occur as quickly as possible. This means there
      is minimal delay between locking the buffer and IO submission
      occuring, hence reducing the worst case lock hold times seen during
      delayed write buffer IO submission signficantly.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NCarlos Maiolino <cmaiolino@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      26f1fe85
  29. 18 5月, 2016 1 次提交
    • B
      xfs: buffer ->bi_end_io function requires irq-safe lock · 9bdd9bd6
      Brian Foster 提交于
      Reports have surfaced of a lockdep splat complaining about an
      irq-safe -> irq-unsafe locking order in the xfs_buf_bio_end_io() bio
      completion handler. This only occurs when I/O errors are present
      because bp->b_lock is only acquired in this context to protect
      setting an error on the buffer. The problem is that this lock can be
      acquired with the (request_queue) q->queue_lock held. See
      scsi_end_request() or ata_qc_schedule_eh(), for example.
      
      Replace the locked test/set of b_io_error with a cmpxchg() call.
      This eliminates the need for the lock and thus the lock ordering
      problem goes away.
      Signed-off-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      9bdd9bd6
  30. 10 2月, 2016 1 次提交
  31. 19 1月, 2016 1 次提交
    • D
      xfs: log mount failures don't wait for buffers to be released · 85bec546
      Dave Chinner 提交于
      Recently I've been seeing xfs/051 fail on 1k block size filesystems.
      Trying to trace the events during the test lead to the problem going
      away, indicating that it was a race condition that lead to this
      ASSERT failure:
      
      XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 156
      .....
      [<ffffffff814e1257>] xfs_free_perag+0x87/0xb0
      [<ffffffff814e21b9>] xfs_mountfs+0x4d9/0x900
      [<ffffffff814e5dff>] xfs_fs_fill_super+0x3bf/0x4d0
      [<ffffffff811d8800>] mount_bdev+0x180/0x1b0
      [<ffffffff814e3ff5>] xfs_fs_mount+0x15/0x20
      [<ffffffff811d90a8>] mount_fs+0x38/0x170
      [<ffffffff811f4347>] vfs_kern_mount+0x67/0x120
      [<ffffffff811f7018>] do_mount+0x218/0xd60
      [<ffffffff811f7e5b>] SyS_mount+0x8b/0xd0
      
      When I finally caught it with tracing enabled, I saw that AG 2 had
      an elevated reference count and a buffer was responsible for it. I
      tracked down the specific buffer, and found that it was missing the
      final reference count release that would put it back on the LRU and
      hence be found by xfs_wait_buftarg() calls in the log mount failure
      handling.
      
      The last four traces for the buffer before the assert were (trimmed
      for relevance)
      
      kworker/0:1-5259   xfs_buf_iodone:        hold 2  lock 0 flags ASYNC
      kworker/0:1-5259   xfs_buf_ioerror:       hold 2  lock 0 error -5
      mount-7163	   xfs_buf_lock_done:     hold 2  lock 0 flags ASYNC
      mount-7163	   xfs_buf_unlock:        hold 2  lock 1 flags ASYNC
      
      This is an async write that is completing, so there's nobody waiting
      for it directly.  Hence we call xfs_buf_relse() once all the
      processing is complete. That does:
      
      static inline void xfs_buf_relse(xfs_buf_t *bp)
      {
      	xfs_buf_unlock(bp);
      	xfs_buf_rele(bp);
      }
      
      Now, it's clear that mount is waiting on the buffer lock, and that
      it has been released by xfs_buf_relse() and gained by mount. This is
      expected, because at this point the mount process is in
      xfs_buf_delwri_submit() waiting for all the IO it submitted to
      complete.
      
      The mount process, however, is waiting on the lock for the buffer
      because it is in xfs_buf_delwri_submit(). This waits for IO
      completion, but it doesn't wait for the buffer reference owned by
      the IO to go away. The mount process collects all the completions,
      fails the log recovery, and the higher level code then calls
      xfs_wait_buftarg() to free all the remaining buffers in the
      filesystem.
      
      The issue is that on unlocking the buffer, the scheduler has decided
      that the mount process has higher priority than the the kworker
      thread that is running the IO completion, and so immediately
      switched contexts to the mount process from the semaphore unlock
      code, hence preventing the kworker thread from finishing the IO
      completion and releasing the IO reference to the buffer.
      
      Hence by the time that xfs_wait_buftarg() is run, the buffer still
      has an active reference and so isn't on the LRU list that the
      function walks to free the remaining buffers. Hence we miss that
      buffer and continue onwards to tear down the mount structures,
      at which time we get find a stray reference count on the perag
      structure. On a non-debug kernel, this will be ignored and the
      structure torn down and freed. Hence when the kworker thread is then
      rescheduled and the buffer released and freed, it will access a
      freed perag structure.
      
      The problem here is that when the log mount fails, we still need to
      quiesce the log to ensure that the IO workqueues have returned to
      idle before we run xfs_wait_buftarg(). By synchronising the
      workqueues, we ensure that all IO completions are fully processed,
      not just to the point where buffers have been unlocked. This ensures
      we don't end up in the situation above.
      
      cc: <stable@vger.kernel.org> # 3.18
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      85bec546
  32. 12 1月, 2016 1 次提交
    • D
      xfs: inode recovery readahead can race with inode buffer creation · b79f4a1c
      Dave Chinner 提交于
      When we do inode readahead in log recovery, we do can do the
      readahead before we've replayed the icreate transaction that stamps
      the buffer with inode cores. The inode readahead verifier catches
      this and marks the buffer as !done to indicate that it doesn't yet
      contain valid inodes.
      
      In adding buffer error notification  (i.e. setting b_error = -EIO at
      the same time as as we clear the done flag) to such a readahead
      verifier failure, we can then get subsequent inode recovery failing
      with this error:
      
      XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32
      
      This occurs when readahead completion races with icreate item replay
      such as:
      
      	inode readahead
      		find buffer
      		lock buffer
      		submit RA io
      	....
      	icreate recovery
      	    xfs_trans_get_buffer
      		find buffer
      		lock buffer
      		<blocks on RA completion>
      	.....
      	<ra completion>
      		fails verifier
      		clear XBF_DONE
      		set bp->b_error = -EIO
      		release and unlock buffer
      	<icreate gains lock>
      	icreate initialises buffer
      	marks buffer as done
      	adds buffer to delayed write queue
      	releases buffer
      
      At this point, we have an initialised inode buffer that is up to
      date but has an -EIO state registered against it. When we finally
      get to recovering an inode in that buffer:
      
      	inode item recovery
      	    xfs_trans_read_buffer
      		find buffer
      		lock buffer
      		sees XBF_DONE is set, returns buffer
      	    sees bp->b_error is set
      		fail log recovery!
      
      Essentially, we need xfs_trans_get_buf_map() to clear the error status of
      the buffer when doing a lookup. This function returns uninitialised
      buffers, so the buffer returned can not be in an error state and
      none of the code that uses this function expects b_error to be set
      on return. Indeed, there is an ASSERT(!bp->b_error); in the
      transaction case in xfs_trans_get_buf_map() that would have caught
      this if log recovery used transactions....
      
      This patch firstly changes the inode readahead failure to set -EIO
      on the buffer, and secondly changes xfs_buf_get_map() to never
      return a buffer with an error state set so this first change doesn't
      cause unexpected log recovery failures.
      
      cc: <stable@vger.kernel.org> # 3.12 - current
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b79f4a1c
  33. 07 1月, 2016 1 次提交
  34. 04 1月, 2016 1 次提交
  35. 12 10月, 2015 2 次提交