  1. 11 March 2008, 1 commit
  2. 05 March 2008, 9 commits
    • md: the md RAID10 resync thread could cause a md RAID10 array deadlock · a07e6ab4
      Authored by K.Tanaka
      This message describes another md RAID10 issue, found by testing 2.6.24
      md RAID10 with the new scsi fault injection framework.
      
      Abstract:
      
      When a scsi error disables a disk during RAID10 recovery, the md RAID10
      resync thread can stall.
      
      In this case the raid array is already broken, so the stall itself may not
      matter much.  But a stall is still undesirable: if it occurs, even shutdown
      or reboot will fail because the resource stays busy.
      
      The deadlock mechanism:
      
      The r10bio_s structure has a "remaining" member to keep track of BIOs yet to
      be handled when recovering.  The "remaining" counter is incremented when
      building a BIO in sync_request() and is decremented when a BIO is finished
      in end_sync_write().
      
      If building a BIO fails for some reason in sync_request(), "remaining" must
      be decremented again if it has already been incremented.  I found a case
      where this decrement is forgotten.  It causes an md_do_sync() deadlock:
      md_do_sync() waits for md_done_sync(), which is called from end_sync_write(),
      but end_sync_write() never calls md_done_sync() because of the "remaining"
      counter mismatch.
      
      For example, this problem can be reproduced in the following case:
      
      Personalities : [raid10]
      md0 : active raid10 sdf1[4] sde1[5](F) sdd1[2] sdc1[1] sdb1[6](F)
            3919616 blocks 64K chunks 2 near-copies [4/2] [_UU_]
            [>....................]  recovery =  2.2% (45376/1959808) finish=0.7min speed=45376K/sec
      
      In this case, sdf1 is recovering while sdb1 and sde1 are disabled.
      An additional error that detaches sdd will cause a deadlock.
      
      md0 : active raid10 sdf1[4] sde1[5](F) sdd1[6](F) sdc1[1] sdb1[7](F)
            3919616 blocks 64K chunks 2 near-copies [4/1] [_U__]
            [=>...................]  recovery =  5.0% (99520/1959808) finish=5.9min speed=5237K/sec
      
       2739 ?        S<     0:17 [md0_raid10]
      28608 ?        D<     0:00 [md0_resync]
      28629 pts/1    Ss     0:00 bash
      28830 pts/1    R+     0:00 ps ax
      31819 ?        D<     0:00 [kjournald]
      
      The resync thread appears to keep running, but it is actually deadlocked.
      
      Patch:
      With this patch, the "remaining" counter is decremented on the failure path
      when needed (a simplified model of the counting pattern follows this entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a07e6ab4
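
      The counting pattern described above is easy to get wrong on error paths.
      The following is a minimal user-space model of it, not the raid10.c code
      itself; struct r10bio_model and build_one_bio() are hypothetical stand-ins.
      It shows why the failure-path decrement matters: without it the waiter never
      sees the counter reach zero.

      #include <stdatomic.h>
      #include <stdbool.h>
      #include <stdio.h>

      /* Hypothetical stand-in for r10bio_s; only the counter is modelled. */
      struct r10bio_model {
              atomic_int remaining;   /* BIOs still outstanding for this resync unit */
      };

      /* Pretend to build and submit one BIO; it may fail, e.g. the target disk died. */
      static bool build_one_bio(struct r10bio_model *r10, bool disk_ok)
      {
              atomic_fetch_add(&r10->remaining, 1);   /* counted before submission */
              if (!disk_ok) {
                      /* The bug: returning without this decrement means the counter
                       * never reaches zero, so the waiter (md_do_sync() in the real
                       * code) blocks forever.  The fix is this failure-path decrement. */
                      atomic_fetch_sub(&r10->remaining, 1);
                      return false;
              }
              return true;    /* the completion path decrements later */
      }

      int main(void)
      {
              struct r10bio_model r10;
              atomic_init(&r10.remaining, 0);

              build_one_bio(&r10, true);      /* healthy disk, completes below */
              build_one_bio(&r10, false);     /* disabled disk: must not leak a count */

              atomic_fetch_sub(&r10.remaining, 1);    /* completion for the healthy disk */

              /* 0 means every increment was matched; anything else stalls the waiter. */
              printf("remaining = %d\n", atomic_load(&r10.remaining));
              return 0;
      }
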
    • md: fix possible raid1/raid10 deadlock on read error during resync · 1c830532
      Authored by NeilBrown
      Thanks to K.Tanaka and the scsi fault injection framework, here is a fix for
      another possible deadlock in raid1/raid10 error handling.
      
      If a read request returns an error while a resync is happening and a resync
      request is pending, the attempt to fix the error will block until the resync
      progresses, and the resync will block until the read request completes.  Thus
      a deadlock.
      
      This patch fixes the problem.
      
      Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c830532
    • md: don't attempt read-balancing for raid10 'far' layouts · 8ed3a195
      Authored by Keld Simonsen
      This patch changes the disk to be read for layout "far > 1" to always be the
      disk with the lowest block address.
      
      Thus the chunks to be read will always be (for a fully functioning array) from
      the first band of stripes, and the raid will then work as a raid0 consisting
      of the first band of stripes.
      
      Some advantages:
      
      The fastest part of the disks involved, the outer sectors, will be used.
      The outer blocks of a disk may be as much as 100% faster than the inner
      blocks.
      
      Average seek time will be smaller, as seeks will always be confined to the
      first part of the disks.
      
      Mixed disks with different performance characteristics will work better,
      since they behave like raid0: the sequential read rate will be the number of
      disks involved times the IO rate of the slowest disk.
      
      If a disk is malfunctioning, the first working disk with the lowest block
      address for the logical block will be used (a toy sketch of this selection
      rule follows this entry).
      Signed-off-by: Keld Simonsen <keld@dkuug.dk>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8ed3a195
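
      A toy sketch of the selection rule; this is not the raid10.c read_balance()
      code, and the device names and the choose_far_copy() helper are hypothetical.
      Among the working copies of a chunk, the copy stored at the lowest device
      block address is chosen, which for a healthy 'far' array is always in the
      first band.

      #include <stdbool.h>
      #include <stdio.h>

      struct copy {
              const char *dev;
              long long dev_addr;     /* block address of this copy on its device */
              bool working;
      };

      /* Pick the working copy with the lowest device block address ("first band"). */
      static const struct copy *choose_far_copy(const struct copy *c, int n)
      {
              const struct copy *best = NULL;

              for (int i = 0; i < n; i++) {
                      if (!c[i].working)
                              continue;
                      if (!best || c[i].dev_addr < best->dev_addr)
                              best = &c[i];
              }
              return best;
      }

      int main(void)
      {
              /* two far-copies of one logical chunk: one near the start, one far out */
              struct copy copies[] = {
                      { "sdb1", 1024,    true },      /* first band: outer, faster sectors */
                      { "sdd1", 1959808, true },      /* later band: inner, slower sectors */
              };
              const struct copy *pick = choose_far_copy(copies, 2);

              printf("read from %s at block %lld\n", pick->dev, pick->dev_addr);

              /* Sequential reads over mixed disks then behave like raid0 over the
               * first band: roughly (number of disks) x (rate of the slowest disk),
               * e.g. 4 disks x 45 MB/s is about 180 MB/s. */
              return 0;
      }
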
    • md: lock access to rdev attributes properly · 27c529bb
      Authored by NeilBrown
      When we access attributes of an rdev (component device on an md array) through
      sysfs, we really need to lock the array against concurrent changes.  We
      currently do that when we change an attribute, but not when we read an
      attribute.  We need to lock when reading as well, or rdev->mddev could become
      NULL while we are accessing it.
      
      So add appropriate locking (mddev_lock) to rdev_attr_show.
      
      rdev_size_store requires some extra care as well, since it needs to unlock
      the mddev while scanning other mddevs for overlapping regions.  We currently
      assume that rdev->mddev will be unchanged after the scan, but that cannot be
      guaranteed.  So take a copy of rdev->mddev for use at the end of the function
      (a simplified model of this pattern follows this entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      27c529bb
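
      A user-space model of the locking rule, assuming nothing about md.c beyond
      what the message says: the struct and function names below are simplified
      stand-ins, and a pthread mutex stands in for mddev_lock()/mddev_unlock().
      The idea is to copy the back-pointer, take the lock, and re-check that the
      rdev was not detached in the meantime.

      #include <pthread.h>
      #include <stdio.h>

      struct mddev_model {
              pthread_mutex_t lock;           /* stands in for mddev_lock() */
              int some_attr;
      };

      struct rdev_model {
              struct mddev_model *mddev;      /* may be cleared by a concurrent detach */
      };

      /* Reading an attribute: copy the pointer, take the lock, re-check. */
      static int rdev_attr_show_model(struct rdev_model *rdev, int *out)
      {
              struct mddev_model *mddev = rdev->mddev;

              if (!mddev)
                      return -1;                      /* already detached */
              pthread_mutex_lock(&mddev->lock);
              if (rdev->mddev != mddev) {             /* detached while we waited */
                      pthread_mutex_unlock(&mddev->lock);
                      return -1;
              }
              *out = mddev->some_attr;
              pthread_mutex_unlock(&mddev->lock);
              return 0;
      }

      int main(void)
      {
              struct mddev_model md = { .lock = PTHREAD_MUTEX_INITIALIZER, .some_attr = 42 };
              struct rdev_model rdev = { .mddev = &md };
              int v;

              if (rdev_attr_show_model(&rdev, &v) == 0)
                      printf("attr = %d\n", v);
              return 0;
      }
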
    • md: make sure a reshape is started when device switches to read-write · 25156198
      Authored by NeilBrown
      A resync/reshape/recovery thread will refuse to make progress while the array
      is marked read-only.  So whenever we mark it not read-only, it is important
      to wake up the resync thread.  There is one place where we didn't do this.
      
      The problem manifests if the start_ro module parameter is set and a raid5
      array that is in the middle of a reshape (restripe) is started.  The array
      will initially be semi-read-only (meaning it acts as if it is read-only until
      the first write), so the reshape will not proceed.
      
      On the first write, the array becomes read-write, but the reshape is not
      started, and there is no event which will ever restart that thread (a small
      model of the transition follows this entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      25156198
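
      A small model of the transition, assuming nothing about md.c internals; the
      names below are illustrative only.  The point is that switching to read-write
      on the first write must be paired with waking the resync/reshape thread.

      #include <stdbool.h>
      #include <stdio.h>

      /* Illustrative stand-in for the array state; not the real mddev structure. */
      struct array_model {
              bool semi_read_only;            /* "read-only until the first write" */
              bool sync_thread_woken;         /* models waking the resync/reshape thread */
      };

      static void first_write(struct array_model *a)
      {
              if (a->semi_read_only) {
                      a->semi_read_only = false;
                      /* The missing piece this commit adds: without waking the
                       * thread here, nothing ever restarts a pending reshape. */
                      a->sync_thread_woken = true;
              }
      }

      int main(void)
      {
              struct array_model a = { .semi_read_only = true, .sync_thread_woken = false };

              first_write(&a);
              printf("read-write now, sync thread woken: %s\n",
                     a.sync_thread_woken ? "yes" : "no");
              return 0;
      }
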
    • md: clean up irregularity with raid autodetect · d0fae18f
      Authored by NeilBrown
      When a raid1 array is stopped, all components currently get added to the list
      for auto-detection.  However we should really only add components that were
      found by autodetection in the first place.  So add a flag to record that
      information, and use it.
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0fae18f
    • md: guard against possible bad array geometry in v1 metadata · a1801f85
      Authored by NeilBrown
      Make sure the data doesn't start before the end of the superblock when the
      superblock is at the start of the device (a sketch of the check follows this
      entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a1801f85
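
      A sketch of the kind of sanity check described; the field names are assumed
      for illustration and this is not the literal super_1_load() code.

      #include <stdio.h>

      struct sb_geom {
              unsigned long long sb_start;    /* sectors; 0 when the sb is at the device start */
              unsigned long long sb_sectors;  /* size of the superblock area */
              unsigned long long data_offset; /* where the array data begins */
      };

      static int geom_ok(const struct sb_geom *g)
      {
              /* data must not start before the end of a leading superblock */
              if (g->sb_start == 0 && g->data_offset < g->sb_start + g->sb_sectors)
                      return 0;
              return 1;
      }

      int main(void)
      {
              struct sb_geom bad = { 0, 8, 4 };       /* data overlaps the superblock */

              printf("geometry ok: %d\n", geom_ok(&bad));
              return 0;
      }
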
    • md: reduce CPU wastage on idle md array with a write-intent bitmap · 8311c29d
      Authored by NeilBrown
      On an md array with a write-intent bitmap, a thread wakes up every few seconds
      and scans the bitmap looking for work to do.  If the array is idle, there will
      be no work to do, but a lot of scanning is done to discover this.
      
      So cache the fact that the bitmap is completely clean, and skip scanning the
      whole bitmap while that cached state holds (a user-space model of this idea
      follows this entry).
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8311c29d
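
      A user-space model of the "cache that the bitmap is clean" idea; this is not
      the md bitmap code, and the names below are illustrative.  Marking any bit
      dirty invalidates the cached state; the periodic worker skips the scan while
      the cache says everything is clean.

      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>

      #define BITMAP_BYTES 4096

      struct wi_bitmap {
              unsigned char bits[BITMAP_BYTES];
              bool allclean;          /* cached: no dirty bits, nothing to scan */
      };

      static void bitmap_mark_dirty(struct wi_bitmap *b, unsigned int bit)
      {
              b->bits[bit / 8] |= 1u << (bit % 8);
              b->allclean = false;    /* invalidate the cache on any write intent */
      }

      /* The periodic worker: skip the full scan if we know everything is clean. */
      static void bitmap_daemon_work(struct wi_bitmap *b)
      {
              if (b->allclean) {
                      printf("idle: skipped scanning %d bytes\n", BITMAP_BYTES);
                      return;
              }
              /* ... scan and clear bits whose writes have completed ... */
              memset(b->bits, 0, sizeof(b->bits));
              b->allclean = true;     /* nothing left; the next wakeups are cheap */
              printf("scanned and cleaned the bitmap\n");
      }

      int main(void)
      {
              struct wi_bitmap b = { .allclean = true };

              bitmap_daemon_work(&b);         /* idle: no scan */
              bitmap_mark_dirty(&b, 12345);   /* a write-intent arrives */
              bitmap_daemon_work(&b);         /* one real scan */
              bitmap_daemon_work(&b);         /* idle again */
              return 0;
      }
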
    • md: fix deadlock in md/raid1 and md/raid10 when handling a read error · a35e63ef
      Authored by NeilBrown
      When handling a read error, we freeze the array to stop any other IO while
      attempting to over-write with correct data.
      
      This is done in the raid1d (raid10d) thread and must wait for all submitted IO
      to complete (except for requests that failed and are sitting in the retry
      queue - these are counted in ->nr_queued and will stay there during a freeze).
      
      However, write requests need attention from raid1d as bitmap updates might be
      required.  This can cause a deadlock, as raid1 is waiting for requests to
      finish that themselves need attention from raid1d.
      
      So we create a new function 'flush_pending_writes' to give that attention, and
      call it in freeze_array to be sure that we aren't waiting on raid1d (a
      simplified user-space sketch follows this entry).
      
      Thanks to "K.Tanaka" <k-tanaka@ce.jp.nec.com> for finding and reporting this
      problem.
      
      Cc: "K.Tanaka" <k-tanaka@ce.jp.nec.com>
      Signed-off-by: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a35e63ef
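
      A simplified user-space sketch of the flush_pending_writes() idea; it is not
      the raid1.c code, the structures are stand-ins, and the flush is shown before
      the wait rather than inside it.  The point is that a freeze must not wait on
      writes that only the raid1d thread itself would ever submit.

      #include <stdio.h>

      struct bio_model {
              int id;
              struct bio_model *next;
      };

      struct conf_model {
              struct bio_model *pending;      /* writes queued for the raid1d thread */
              int nr_pending;                 /* in-flight requests a freeze waits on */
      };

      /* Submit everything raid1d has queued so a freeze never waits on ourselves. */
      static void flush_pending_writes(struct conf_model *conf)
      {
              struct bio_model *bio = conf->pending;

              conf->pending = NULL;
              while (bio) {
                      struct bio_model *next = bio->next;

                      /* the real code submits the bio to the block layer here */
                      printf("submitting queued write %d\n", bio->id);
                      conf->nr_pending--;     /* modelled as completing immediately */
                      bio = next;
              }
      }

      static void freeze_array(struct conf_model *conf)
      {
              /* Without the flush below, this wait could include writes that only
               * the raid1d thread itself would ever submit (the deadlock above). */
              flush_pending_writes(conf);
              while (conf->nr_pending > 0)
                      ;       /* wait for remaining in-flight IO (busy-wait in this toy model) */
              printf("array frozen\n");
      }

      int main(void)
      {
              struct bio_model w2 = { 2, NULL };
              struct bio_model w1 = { 1, &w2 };
              struct conf_model conf = { .pending = &w1, .nr_pending = 2 };

              freeze_array(&conf);
              return 0;
      }
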
  3. 20 February 2008, 1 commit
  4. 15 February 2008, 4 commits
  5. 14 February 2008, 1 commit
  6. 08 February 2008, 24 commits