1. 28 Apr, 2008 1 commit
  2. 05 Mar, 2008 4 commits
    • md: the md RAID10 resync thread could cause a md RAID10 array deadlock · a07e6ab4
      K.Tanaka committed
      This message describes another md RAID10 issue, found by testing the
      2.6.24 md RAID10 code with the new SCSI fault injection framework.
      
      Abstract:
      
      When a SCSI error results in a disk being disabled during RAID10 recovery,
      the resync threads of md RAID10 could stall.

      In this case the RAID array is already broken, so it may not matter.  But
      I think a stall is undesirable: if it occurs, even shutdown or reboot will
      fail because resources are busy.
      
      The deadlock mechanism:
      
      The r10bio_s structure has a "remaining" member to keep track of BIOs yet to
      be handled when recovering.  The "remaining" counter is incremented when
      building a BIO in sync_request() and is decremented when a BIO finishes in
      end_sync_write().

      If building a BIO fails for some reason in sync_request(), "remaining"
      should be decremented if it has already been incremented.  I found a case
      where this decrement is forgotten.  This causes an md_do_sync() deadlock
      because md_do_sync() waits for md_done_sync() to be called by
      end_sync_write(), but end_sync_write() never calls md_done_sync() because
      of the "remaining" counter mismatch.
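
      The counter mismatch described above can be illustrated with a small
      user-space model.  This is only a sketch under stated assumptions: the
      Python names mirror the kernel functions mentioned in the message, but
      the class, the `build_ok` list, and the `undo_on_failure` switch are
      hypothetical and are not taken from the md code.

```python
class R10Bio:
    """Toy stand-in for r10bio_s: only the "remaining" counter is modelled."""

    def __init__(self):
        self.remaining = 0
        self.done = False  # set where the real code would call md_done_sync()


def sync_request(r10_bio, build_ok, undo_on_failure):
    """Count one BIO per entry in build_ok; return how many were submitted.

    build_ok[i] is False when building BIO i fails after the counter
    has already been incremented (the case described in the message).
    """
    submitted = 0
    for ok in build_ok:
        r10_bio.remaining += 1
        if ok:
            submitted += 1
        elif undo_on_failure:
            r10_bio.remaining -= 1  # the decrement that was forgotten
    return submitted


def end_sync_write(r10_bio):
    """Complete one BIO; signal completion only when none remain."""
    r10_bio.remaining -= 1
    if r10_bio.remaining == 0:
        r10_bio.done = True


def run(build_ok, undo_on_failure):
    """Submit the BIOs, complete each one, and report whether resync was signalled."""
    bio = R10Bio()
    for _ in range(sync_request(bio, build_ok, undo_on_failure)):
        end_sync_write(bio)
    return bio.done
```

      With one failed build and no compensating decrement, `run()` never
      reports completion, which corresponds to md_do_sync() waiting forever in
      this model.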
      
      For example, this problem can be reproduced in the following case:
      
      Personalities : [raid10]
      md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F)
            3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_]
            [>....................]  recovery =  2.2% (45376/1959808) finish=0.7min speed=45376K/sec
      
      In this case, sdf1 is recovering, and sdb1 and sde1 are disabled.
      An additional error that detaches sdd will cause a deadlock.
      
      md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F)
            3919616 blocks 64K chunks 2 near-copies [4/1] [_U__]
            [=>...................]  recovery =  5.0% (99520/1959808) finish=5.9min speed=5237K/sec
      
       2739 ?        S<     0:17 [md0_raid10]
      28608 ?        D<     0:00 [md0_resync]
      28629 pts/1    Ss     0:00 bash
      28830 pts/1    R+     0:00 ps ax
      31819 ?        D<     0:00 [kjournald]
      
      The resync thread appears to keep working, but it is actually deadlocked.
      
      Patch:
      With this patch, the "remaining" counter is decremented when needed.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • md: fix possible raid1/raid10 deadlock on read error during resync · 1c830532
      NeilBrown committed
      Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for
      another possible deadlock in raid1/raid10 error handling.
      
      If a read request returns an error while a resync is happening and a resync
      request is pending, the attempt to fix the error will block until the resync
      progresses, and the resync will block until the read request completes.  Thus
      a deadlock.
      
      This patch fixes the problem.
      
      Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • md: don't attempt read-balancing for raid10 'far' layouts · 8ed3a195
      Keld Simonsen committed
      This patch changes the disk to be read for layout "far > 1" to always be the
      disk with the lowest block address.
      
      Thus the chunks to be read will always be (for a fully functioning array) from
      the first band of stripes, and the raid will then work as a raid0 consisting
      of the first band of stripes.
      
      Some advantages:
      
      The fastest part of the disks involved, the outer sectors, will be used.
      The outer blocks of a disk may be as much as 100% faster than the inner
      blocks.
      
      Average seek time will be smaller, as seeks will always be confined to the
      first part of the disks.
      
      Mixed disks with different performance characteristics will work better, as
      they will work as raid0: the sequential read rate will be the number of
      disks involved times the IO rate of the slowest disk.
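
      As a worked example of that estimate, the arithmetic can be written out
      as below.  The function name and the sample rates are made up for
      illustration, and the result is the idealised figure from the message,
      not a measurement.

```python
def far_layout_sequential_read_rate(disk_rates_mb_s):
    """Idealised estimate: number of disks times the rate of the slowest disk."""
    return len(disk_rates_mb_s) * min(disk_rates_mb_s)


# Four hypothetical disks; the slowest one (80 MB/s) sets the pace for all.
estimate = far_layout_sequential_read_rate([80, 100, 120, 100])
```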
      
      If a disk is malfunctioning, the first working disk that has the lowest
      block address for the logical block will be used.
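
      The selection rule can be sketched as follows.  This is a simplified
      model, not the raid10 read path: `pick_read_disk`, the `copies` list of
      (disk, device address) pairs, and the `working` set are all hypothetical
      names introduced for the example.

```python
def pick_read_disk(copies, working):
    """Return the disk holding the working copy with the lowest block address.

    copies  -- list of (disk, device_address) pairs for one logical block
    working -- set of disks that are currently usable
    Returns None when no copy is readable.
    """
    candidates = [(addr, disk) for disk, addr in copies if disk in working]
    if not candidates:
        return None
    # min() compares the address first, so the lowest address wins.
    return min(candidates)[1]
```

      For a healthy array this always picks the copy in the first band of
      stripes; when that disk has failed, the next-lowest working copy is used.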
      Signed-off-by: Keld Simonsen <keld@dkuug.dk>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • md: fix deadlock in md/raid1 and md/raid10 when handling a read error · a35e63ef
      NeilBrown committed
      When handling a read error, we freeze the array to stop any other IO while
      attempting to over-write with correct data.
      
      This is done in the raid1d (raid10d) thread, which must wait for all
      submitted IO to complete (except for requests that failed and are sitting in
      the retry queue - these are counted in ->nr_queued and will stay there
      during a freeze).

      However, write requests need attention from raid1d as bitmap updates might
      be required.  This can cause a deadlock as raid1d is waiting for requests to
      finish that themselves need attention from raid1d.
      
      So we create a new function 'flush_pending_writes' to give that attention, and
      call it in freeze_array to be sure that we aren't waiting on raid1d.
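
      The idea can be shown with a toy model.  It is only a sketch: the
      function names follow the message, but the `Conf` fields, the `flush`
      switch, and the bounded wait (which stands in for a thread that would
      otherwise block forever) are assumptions made for the example.

```python
class Conf:
    """Toy array state; the field names are illustrative only."""

    def __init__(self, pending_writes):
        self.pending_writes = list(pending_writes)  # queued for the daemon thread
        self.in_flight = len(self.pending_writes)   # IO not yet completed


def flush_pending_writes(conf):
    """Submit every queued write; in this model submission completes it at once."""
    flushed = len(conf.pending_writes)
    conf.pending_writes.clear()
    conf.in_flight -= flushed
    return flushed > 0


def freeze_array(conf, flush, max_spins=3):
    """Wait, from the daemon thread, until no IO is in flight.

    Returns True once the array is quiescent, or False when the bounded
    wait gives up, which stands in for the deadlock.
    """
    for _ in range(max_spins):
        if conf.in_flight == 0:
            return True
        if flush:
            flush_pending_writes(conf)
    return conf.in_flight == 0
```

      Without the flush, the only thread able to service the queued writes is
      the one waiting for them, so the wait never ends in this model.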
      
      Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this
      problem.
      
      Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 07 Feb, 2008 3 commits
  4. 09 Nov, 2007 1 commit
  5. 16 Oct, 2007 1 commit
  6. 10 Oct, 2007 1 commit
  7. 01 Aug, 2007 2 commits
  8. 24 Jul, 2007 1 commit
  9. 18 Jul, 2007 1 commit
    • md: change bitmap_unplug and others to void functions · 4ad13663
      NeilBrown committed
      bitmap_unplug only ever returns 0, so it may as well be void.  Two callers try
      to print a message if it returns non-zero, but that message is already printed
      by bitmap_file_kick.
      
      write_page returns an error which is not consistently checked.  It always
      causes BITMAP_WRITE_ERROR to be set on an error, and that can more
      conveniently be checked.
      
      When the return of write_page is checked, an error causes bitmap_file_kick to
      be called - so move that call into write_page - and protect against recursive
      calls into bitmap_file_kick.
      
      bitmap_update_sb returns an error that is never checked.
      
      So make these 'void' and be consistent about checking the bit.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 17 Jun, 2007 1 commit
  11. 02 Mar, 2007 1 commit
    • [PATCH] md: fix raid10 recovery problem. · 64a742bc
      NeilBrown committed
      There are two errors that can lead to recovery problems with raid10
      when used in 'far' mode (not the default).
      
      Due to a '>' instead of '>=', the wrong block is located, which would result
      in garbage being written to some random location, quite possibly outside the
      range of the device, causing the newly reconstructed device to fail.
      
      The device size calculation had some rounding errors (it didn't round when it
      should) and so recovery would go a few blocks too far which would again cause
      a write to a random block address and probably a device error.
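
      The effect of a missing round-down can be illustrated with plain
      arithmetic.  This sketch is not the kernel's size calculation: both
      function names and the sample numbers are hypothetical, and it only
      assumes that recovery proceeds in whole chunks.

```python
def round_down_to_chunk(sectors, chunk_sectors):
    """Largest multiple of chunk_sectors that does not exceed sectors."""
    return sectors - sectors % chunk_sectors


def recovery_end(size_sectors, chunk_sectors):
    """End of the last chunk touched when recovery covers size_sectors in whole chunks."""
    chunks = -(-size_sectors // chunk_sectors)  # ceiling division
    return chunks * chunk_sectors


# With an unrounded size of 1000 sectors and 128-sector chunks, the last
# chunk ends at sector 1024, past the end of the usable area.
overshoot = recovery_end(1000, 128) - 1000
# Rounding the size down first keeps recovery inside the device.
safe_size = round_down_to_chunk(1000, 128)
```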
      
      The code for working with device sizes was fairly confused and spread out, so
      this has been tidied up a bit.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 12 Jan, 2007 1 commit
  13. 14 Dec, 2006 1 commit
  14. 29 Oct, 2006 1 commit
  15. 22 Oct, 2006 1 commit
  16. 03 Oct, 2006 5 commits
  17. 11 Jul, 2006 1 commit
  18. 27 Jun, 2006 4 commits
  19. 02 May, 2006 2 commits
  20. 02 Apr, 2006 1 commit
  21. 04 Feb, 2006 1 commit
  22. 15 Jan, 2006 1 commit
  23. 07 Jan, 2006 4 commits