1. 28 Jun, 2008 (2 commits)
  2. 25 May, 2008 (3 commits)
    • md: restart recovery cleanly after device failure. · dfc70645
      Committed by NeilBrown
      When we get any IO error during a recovery (rebuilding a spare), we abort
      the recovery and restart it.
      
      For RAID6 (and multi-drive RAID1) it may not be best to restart at the
      beginning: when multiple failures can be tolerated, the recovery may be
      able to continue and re-doing all that has already been done doesn't make
      sense.
      
      We already have the infrastructure to record where a recovery is up to
      and restart from there, but it is not being used properly.
      This is because:
        - We sometimes abort with MD_RECOVERY_ERR rather than just MD_RECOVERY_INTR,
          which causes the recovery not to be checkpointed.
        - We remove spares and then re-add them, which loses important state
          information.
      
      The distinction between MD_RECOVERY_ERR and MD_RECOVERY_INTR really isn't
      needed.  If there is an error, the relevant drive will be marked as
      Faulty, and that is enough to ensure correct handling of the error.  So we
      first remove MD_RECOVERY_ERR, changing some of the uses of it to
      MD_RECOVERY_INTR.
      
      Then we cause the attempt to remove a non-faulty device from an array to
      fail (unless recovery is impossible as the array is too degraded).  Then
      when remove_and_add_spares attempts to remove the devices on which
      recovery can continue, it will fail, they will remain in place, and
      recovery will continue on them as desired.
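
      A minimal sketch of the resulting error path (illustrative only, not
      the verbatim patch):

          /* On an IO error during recovery: mark the failing device Faulty
           * and interrupt the sync thread.  MD_RECOVERY_INTR (unlike the
           * removed MD_RECOVERY_ERR) lets the recovery position be
           * checkpointed, so recovery can resume rather than restart.
           */
          md_error(mddev, rdev);                        /* device becomes Faulty */
          set_bit(MD_RECOVERY_INTR, &mddev->recovery);  /* progress is kept */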
      
      Issue:  If we are halfway through rebuilding a spare and another drive
      fails, and a new spare is immediately available,  do we want to:
       1/ complete the current rebuild, then go back and rebuild the new spare or
       2/ restart the rebuild from the start and rebuild both devices in
          parallel.
      
      Both options can be argued for.  The code currently takes option 2 as
        a/ this requires the least code change
        b/ this results in a minimally-degraded array in minimal time.
      
      Cc: "Eivind Sarto" <ivan@kasenna.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • md: raid1: Fix restoration of bio between failed read and write. · 698b18c1
      Committed by NeilBrown
      When performing a "recovery" or "check" pass on a RAID1 array, we read
      from each device and possibly, if there is a difference or a read error,
      write back to some devices.
      
      We use the same 'bio' for both read and write, resetting various fields
      between the two operations.
      
      We forgot to reset bv_offset and bv_len however.  These are often left
      unchanged, but in the case where there is an IO error one or two sectors
      into a page, they are changed.
      
      This results in correctable errors not being corrected properly.  It does
      not result in any data corruption.
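
      A hedged sketch of the reset pattern (2008-era struct bio fields,
      before the bvec-iterator rework; 'sector' and 'vcnt' stand in for the
      values the real code carries around):

          /* Reusing the read bio for the write-back pass: restore every
           * field the failed read may have modified, including the
           * per-segment bv_offset/bv_len that were being forgotten.
           */
          bio->bi_sector = sector;
          bio->bi_idx    = 0;
          bio->bi_size   = vcnt * PAGE_SIZE;
          for (i = 0; i < vcnt; i++) {
                  bio->bi_io_vec[i].bv_offset = 0;
                  bio->bi_io_vec[i].bv_len    = PAGE_SIZE;
          }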
      
      Cc: "Fairbanks, David" <David.Fairbanks@stratus.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • md: fix possible oops when removing a bitmap from an active array · 84255d10
      Committed by NeilBrown
      It is possible to add a write-intent bitmap to an active array, or remove
      the bitmap that is there.
      
      When we do either, we 'quiesce' the array, which causes make_request to
      block in "wait_barrier()".
      
      However we are sampling the value of "mddev->bitmap" before the
      wait_barrier call, and using it afterwards.  This can result in using a
      bitmap structure that has been freed.
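
      The shape of the fix, sketched (simplified from raid1's make_request):

          wait_barrier(conf);
          /* Sample the bitmap pointer only after wait_barrier() returns:
           * a concurrent quiesce could have freed whatever mddev->bitmap
           * pointed to before the barrier.
           */
          bitmap = mddev->bitmap;
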
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 15 May, 2008 (1 commit)
    • Remove blkdev warning triggered by using md · e7e72bf6
      Committed by Neil Brown
      As setting and clearing queue flags now requires that we hold a spinlock
      on the queue, and as blk_queue_stack_limits is called without that lock,
      get the lock inside blk_queue_stack_limits.
      
      For blk_queue_stack_limits to be able to find the right lock, each md
      personality needs to set q->queue_lock to point to the appropriate lock.
      Those personalities which didn't previously use a spin_lock now use
      q->__queue_lock, so that lock is always initialised when the queue is
      allocated.
      
      With this in place, setting/clearing of the QUEUE_FLAG_PLUGGED bit will no
      longer cause warnings as it will be clear that the proper lock is held.
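
      Roughly, the pattern (hedged sketch, not the verbatim diff):

          /* in blk_queue_stack_limits(t, b): flag updates on the top
           * queue now happen under its queue_lock */
          unsigned long flags;

          spin_lock_irqsave(t->queue_lock, flags);
          queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
          spin_unlock_irqrestore(t->queue_lock, flags);

          /* in each md personality's setup, so the right lock is found
           * (device_lock is raid1's; others point at their own lock) */
          mddev->queue->queue_lock = &conf->device_lock;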
      
      Thanks to Dan Williams for review and fixing the silly bugs.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Alistair John Strachan <alistair@devzero.co.uk>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Jacek Luczak <difrost.kernel@gmail.com>
      Cc: Prakash Punnoor <prakash@punnoor.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 30 Apr, 2008 (1 commit)
  5. 28 Apr, 2008 (1 commit)
  6. 05 Mar, 2008 (2 commits)
    • md: fix possible raid1/raid10 deadlock on read error during resync · 1c830532
      Committed by NeilBrown
      Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for
      another possible deadlock in raid1/raid10 error handling.
      
      If a read request returns an error while a resync is happening and a resync
      request is pending, the attempt to fix the error will block until the resync
      progresses, and the resync will block until the read request completes.  Thus
      a deadlock.
      
      This patch fixes the problem.
      
      Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • md: fix deadlock in md/raid1 and md/raid10 when handling a read error · a35e63ef
      Committed by NeilBrown
      When handling a read error, we freeze the array to stop any other IO while
      attempting to over-write with correct data.
      
      This is done in the raid1d(raid10d) thread and must wait for all submitted IO
      to complete (except for requests that failed and are sitting in the retry
      queue - these are counted in ->nr_queued and will stay there during a freeze).
      
      However write requests need attention from raid1d as bitmap updates might be
      required.  This can cause a deadlock as raid1 is waiting for requests to
      finish that themselves need attention from raid1d.
      
      So we create a new function 'flush_pending_writes' to give that attention, and
      call it in freeze_array to be sure that we aren't waiting on raid1d.
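
      A simplified sketch of the new helper (plugging and other details of
      the real function omitted; conf_t is the raid1 typedef of the era):

          static void flush_pending_writes(conf_t *conf)
          {
                  struct bio *bio;

                  /* take the queued write bios off the list ... */
                  spin_lock_irq(&conf->device_lock);
                  bio = bio_list_get(&conf->pending_bio_list);
                  spin_unlock_irq(&conf->device_lock);

                  /* ... and submit them, so freeze_array never waits on
                   * writes that only raid1d would have issued */
                  while (bio) {
                          struct bio *next = bio->bi_next;
                          bio->bi_next = NULL;
                          generic_make_request(bio);
                          bio = next;
                  }
          }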
      
      Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this
      problem.
      
      Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 07 Feb, 2008 (3 commits)
  8. 09 Nov, 2007 (1 commit)
  9. 20 Oct, 2007 (1 commit)
  10. 17 Oct, 2007 (1 commit)
  11. 16 Oct, 2007 (1 commit)
  12. 10 Oct, 2007 (1 commit)
  13. 23 Aug, 2007 (2 commits)
  14. 24 Jul, 2007 (1 commit)
  15. 18 Jul, 2007 (1 commit)
    • md: change bitmap_unplug and others to void functions · 4ad13663
      Committed by NeilBrown
      bitmap_unplug only ever returns 0, so it may as well be void.  Two callers try
      to print a message if it returns non-zero, but that message is already printed
      by bitmap_file_kick.
      
      write_page returns an error which is not consistently checked.  It always
      causes BITMAP_WRITE_ERROR to be set on an error, and that can more
      conveniently be checked.
      
      When the return of write_page is checked, an error causes bitmap_file_kick to
      be called - so move that call into write_page - and protect against recursive
      calls into bitmap_file_kick.
      
      bitmap_update_sb returns an error that is never checked.
      
      So make these 'void' and be consistent about checking the bit.
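
      A sketch of the consolidated error path (simplified; submit_write is
      a hypothetical stand-in for the actual IO submission):

          static void write_page(struct bitmap *bitmap, struct page *page, int wait)
          {
                  if (submit_write(bitmap, page, wait) < 0) {
                          /* record the failure in one place ... */
                          set_bit(BITMAP_WRITE_ERROR, &bitmap->flags);
                          /* ... and kick the bitmap here; the real
                           * bitmap_file_kick guards against recursion */
                          bitmap_file_kick(bitmap);
                  }
          }

      Callers stop inspecting a return value and instead test the
      BITMAP_WRITE_ERROR bit when they need to know.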
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. 17 Jun, 2007 (1 commit)
  17. 11 May, 2007 (1 commit)
  18. 10 May, 2007 (2 commits)
  19. 27 Jan, 2007 (2 commits)
  20. 12 Jan, 2007 (1 commit)
  21. 14 Dec, 2006 (1 commit)
  22. 11 Dec, 2006 (1 commit)
    • [PATCH] md: assorted md and raid1 one-liners · 17571284
      Committed by NeilBrown
      Fix a few bugs that meant that:
        - superblocks weren't always written at exactly the right time (this
          could show up if the array was not written to - writing to the array
          causes lots of superblock updates and so hides these errors).
      
        - restarting device recovery after a clean shutdown (version-1 metadata
          only) didn't work as intended (or at all).
      
      1/ Ensure superblock is updated when a new device is added.
      2/ Remove an inappropriate test on MD_RECOVERY_SYNC in md_do_sync.
         The body of this if takes one of two branches depending on whether
         MD_RECOVERY_SYNC is set, so testing it in the clause of the if
         is wrong (see the sketch after this list).
      3/ Flag superblock for updating after a resync/recovery finishes.
      4/ If we find the need to restart a recovery in the middle (version-1
         metadata only) make sure a full recovery (not just as guided by
         bitmaps) does get done.
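
      Illustrating fix 2 (schematic only; 'other_condition' and the two
      branch helpers are placeholders, not the real md_do_sync code):

          /* before: the else-branch could never run, because the outer
           * clause already required MD_RECOVERY_SYNC to be set */
          if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && other_condition) {
                  if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
                          sync_branch();
                  else
                          recovery_branch();      /* dead code */
          }

          /* after: drop the redundant test from the clause */
          if (other_condition) {
                  if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery))
                          sync_branch();
                  else
                          recovery_branch();      /* now reachable */
          }
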
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  23. 29 Oct, 2006 (1 commit)
  24. 03 Oct, 2006 (5 commits)
  25. 02 Sep, 2006 (1 commit)
  26. 28 Aug, 2006 (1 commit)
  27. 11 Jul, 2006 (1 commit)