1. 28 Jul, 2011 6 commits
    • md/raid5. Don't write to known bad block on doubtful devices. · 73e92e51
      NeilBrown committed
      If a device has seen write errors, don't write to any known
      bad blocks on that device.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: write errors should be recorded as bad blocks if possible. · bc2607f3
      NeilBrown committed
      When a write error is detected, don't mark the device as failed
      immediately but rather record the fact for handle_stripe to deal with.
      
      Handle_stripe then attempts to record a bad block.  Only if that fails
      does the device get marked as faulty.
      Signed-off-by: NeilBrown <neilb@suse.de>
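The deferred handling described in bc2607f3 can be sketched as follows. This is a minimal illustration, not the kernel code: `struct toy_dev`, `toy_record_bad_block`, `toy_handle_write_error`, and the fixed-size log are all hypothetical names and assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a member device (not the kernel's struct). */
#define TOY_MAX_BAD_BLOCKS 4

struct toy_dev {
    bool faulty;                                  /* device has been failed */
    size_t bad_count;                             /* entries used in bad[] */
    unsigned long long bad[TOY_MAX_BAD_BLOCKS];   /* recorded bad sectors */
};

/* Try to log a bad sector; returns false when the log has no room left. */
static bool toy_record_bad_block(struct toy_dev *dev, unsigned long long sector)
{
    assert(dev != NULL);
    if (dev->bad_count == TOY_MAX_BAD_BLOCKS)
        return false;
    dev->bad[dev->bad_count++] = sector;
    return true;
}

/*
 * Deferred write-error handling: first try to record a bad block,
 * and mark the device faulty only if recording is impossible.
 */
static void toy_handle_write_error(struct toy_dev *dev, unsigned long long sector)
{
    if (!toy_record_bad_block(dev, sector))
        dev->faulty = true;
}
```

With a log of four entries, the first four write errors leave the device in the array, and only the fifth marks it faulty.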
    • md/raid5: use bad-block log to improve handling of uncorrectable read errors. · 7f0da59b
      NeilBrown committed
      If we get an uncorrectable read error - record a bad block rather than
      failing the device.
      And if these errors (which may be due to known bad blocks) cause
      recovery to be impossible, record a bad block on the recovering
      devices, or abort the recovery.
      
      As we might abort a recovery without failing a device we need to teach
      RAID5 about recovery_disabled handling.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: avoid reading from known bad blocks. · 31c176ec
      NeilBrown committed
      There are two times that we might read in raid5:
      1/ when a read request fits within a chunk on a single
         working device.
         In this case, if there is any bad block in the range of
         the read, we simply fail the cache-bypass read and
         perform the read through the stripe cache.
      
      2/ when reading into the stripe cache.  In this case we
         mark as failed any device which has a bad block in that
         strip (1 page wide).
         Note that we will both avoid reading and avoid writing.
         This is correct (as we will never read from the block, there
         is no point writing), but not optimal (as writing could 'fix'
         the error) - that will be addressed later.
      
      If we have not seen any write errors on the device yet, we treat a bad
      block like a recent read error.  This will encourage an attempt to fix
      the read error which will either generate a write error, or will
      ensure good data is stored there.  We don't yet forget the bad block
      in that case.  That comes later.
      
      Now that we honour bad blocks when reading we can allow devices with
      bad blocks into the array.
      Signed-off-by: NeilBrown <neilb@suse.de>
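Case 1/ of 31c176ec (the cache-bypass read) comes down to an interval-overlap test against the bad-block log. The sketch below assumes a flat array of ranges; `struct toy_bad_range`, `toy_range_is_bad`, and `toy_choose_read_path` are hypothetical names, and the kernel's real log format is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bad-block entry: a run of `len` sectors starting at `start`. */
struct toy_bad_range {
    unsigned long long start;
    unsigned long long len;
};

/* Returns true when [sector, sector + nr) overlaps any logged bad range. */
static bool toy_range_is_bad(const struct toy_bad_range *log, size_t n,
                             unsigned long long sector, unsigned long long nr)
{
    assert(log != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (sector < log[i].start + log[i].len && log[i].start < sector + nr)
            return true;
    }
    return false;
}

enum toy_read_path { TOY_READ_BYPASS, TOY_READ_VIA_STRIPE_CACHE };

/* A single-chunk read may bypass the stripe cache only if its range is clean. */
static enum toy_read_path toy_choose_read_path(const struct toy_bad_range *log,
                                               size_t n,
                                               unsigned long long sector,
                                               unsigned long long nr)
{
    return toy_range_is_bad(log, n, sector, nr) ? TOY_READ_VIA_STRIPE_CACHE
                                                : TOY_READ_BYPASS;
}
```

A read that merely touches the sectors next to a bad range still takes the bypass path; only a genuine overlap forces it through the stripe cache.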
    • md: make it easier to wait for bad blocks to be acknowledged. · de393cde
      NeilBrown committed
      It is only safe to choose not to write to a bad block if that bad
      block is safely recorded in metadata - i.e. if it has been
      'acknowledged'.
      
      If it hasn't we need to wait for the acknowledgement.
      
      We support that using rdev->blocked wait and
      md_wait_for_blocked_rdev by introducing a new device flag
      'BlockedBadBlocks'.
      
      This flag is only advisory.
      It is cleared whenever we acknowledge a bad block, so that a waiter
      can re-check the particular bad blocks that it is interested in.
      
      It should be set by a caller when they find they need to wait.
      This (set after test) is inherently racy, but as
      md_wait_for_blocked_rdev already has a timeout, losing the race will
      have minimal impact.
      
      When we clear "Blocked" we also clear "BlockedBadBlocks" in case it
      was set incorrectly (see above race).
      
      We also modify the way we manage 'Blocked' to fit better with the new
      handling of 'BlockedBadBlocks' and to make it consistent between
      externally managed and internally managed metadata.   This requires
      that each raidXd loop checks if the metadata needs to be written and
      triggers a write (md_check_recovery) if needed.  Otherwise a queued
      write request might cause raidXd to wait for the metadata to write,
      and only that thread can write it.
      
      Before writing metadata, we set FaultRecorded for all devices that
      are Faulty, then after writing the metadata we clear Blocked for any
      device for which the Fault was certainly Recorded.
      
      The 'faulty' device flag now appears in sysfs if the device is faulty
      *or* it has unacknowledged bad blocks.  So user-space which does not
      understand bad blocks can continue to function correctly.
      User space which does, should not assume a device is faulty until it
      sees the 'faulty' flag, and then sees the list of unacknowledged bad
      blocks is empty.
      Signed-off-by: NeilBrown <neilb@suse.de>
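The FaultRecorded handshake in de393cde can be illustrated with plain bit flags. This is only a sketch of the ordering described in the commit message: the `TOY_*` flag values and both function names are hypothetical, and real locking and metadata I/O are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits, named after the flags the commit message mentions. */
enum {
    TOY_FAULTY         = 1 << 0,
    TOY_BLOCKED        = 1 << 1,
    TOY_FAULT_RECORDED = 1 << 2,
};

/* Before writing metadata: note every device whose fault this write will cover. */
static void toy_before_metadata_write(unsigned *flags, size_t n)
{
    assert(flags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (flags[i] & TOY_FAULTY)
            flags[i] |= TOY_FAULT_RECORDED;
    }
}

/* After the write: unblock only devices whose fault was certainly recorded. */
static void toy_after_metadata_write(unsigned *flags, size_t n)
{
    assert(flags != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (flags[i] & TOY_FAULT_RECORDED)
            flags[i] &= ~(unsigned)TOY_BLOCKED;
    }
}
```

A device that fails while the metadata write is in flight has Faulty set but not FaultRecorded, so it stays Blocked until a later write covers it.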
    • md: don't allow arrays to contain devices with bad blocks. · 34b343cf
      NeilBrown committed
      As no personality understands bad block lists yet, we must
      reject any device that is known to contain bad blocks.
      As the personalities get taught, these tests can be removed.
      
      This only applies to raid1/raid5/raid10.
      For linear/raid0/multipath/faulty the whole concept of bad blocks
      doesn't mean anything so there is no point adding the checks.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reviewed-by: Namhyung Kim <namhyung@gmail.com>
  2. 27 Jul, 2011 13 commits
  3. 26 Jul, 2011 7 commits
  4. 18 Jul, 2011 2 commits
  5. 14 Jun, 2011 3 commits
  6. 09 Jun, 2011 1 commit
    • MD: raid5 do not set fullsync · d6b212f4
      Jonathan Brassow committed
      Add check to determine if a device needs full resync or if partial resync will do
      
      RAID 5 was assuming that if a device was not In_sync, it must undergo a full
      resync.  We add a check to see if 'saved_raid_disk' is the same as 'raid_disk'.
      If it is, we can safely skip the full resync and rely on the bitmap for
      partial recovery instead.  This is the legitimate purpose of 'saved_raid_disk',
      from md.h:
      int saved_raid_disk;            /* role that device used to have in the
                                       * array and could again if we did a partial
                                       * resync from the bitmap
                                       */
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
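The check added by d6b212f4 can be sketched like this. `struct toy_rdev` and `toy_needs_full_resync` are hypothetical simplifications that borrow only the field names quoted in the commit message, and the sketch assumes a bitmap is available for the partial recovery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of a member device. */
struct toy_rdev {
    int raid_disk;        /* role the device has in the array now */
    int saved_raid_disk;  /* role it used to have, as described in md.h */
    bool in_sync;         /* stand-in for the In_sync flag */
};

/*
 * A device that is not in sync needs a full resync unless it is returning
 * to the role it held before, in which case the bitmap can drive a partial
 * recovery instead.
 */
static bool toy_needs_full_resync(const struct toy_rdev *rdev)
{
    assert(rdev != NULL);
    if (rdev->in_sync)
        return false;
    return rdev->saved_raid_disk != rdev->raid_disk;
}
```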
  7. 11 May, 2011 2 commits
    • md: allow resync_start to be set while an array is active. · b098636c
      NeilBrown committed
      The sysfs attribute 'resync_start' (known internally as recovery_cp)
      records where a resync is up to.  A value of 0 means the array is
      not known to be in-sync at all.  A value of MaxSector means the array
      is believed to be fully in-sync.
      
      When the size of member devices of an array (RAID1,RAID4/5/6) is
      increased, the array can be increased to match.  This process sets
      resync_start to the old end-of-device offset so that the new part of
      the array gets resynced.
      
      However with RAID1 (and RAID6) a resync is not technically necessary
      and may be undesirable.  So it would be good if the implied resync
      after the array is resized could be avoided.
      
      So: change 'resync_start' so the value can be changed while the array
      is active, and as a precaution only allow it to be changed while
      resync/recovery is 'frozen'.  Changing it once resync has started is
      not going to be useful anyway.
      
      This allows the array to be resized without a resync by:
        write 'frozen' to 'sync_action'
        write new size to 'component_size' (this will set resync_start)
        write 'none' to 'resync_start'
        write 'idle' to 'sync_action'.
      
      Also slightly improve some tests on recovery_cp when resizing
      raid1/raid5.  Now that an arbitrary value could be set we should be
      more careful in our tests.
      Signed-off-by: NeilBrown <neilb@suse.de>
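The four-step resize sequence in b098636c could be driven from a small helper like the one below. The attribute names and values come from the commit message; everything else (`struct toy_sysfs_step`, `toy_resize_steps`, `toy_apply_steps`, and the directory passed by the caller, for example an array's `md` sysfs directory) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One sysfs write: the attribute file name and the string written to it. */
struct toy_sysfs_step {
    const char *attr;
    const char *value;
};

/* Fill `steps` with the resize-without-resync sequence; returns the step count. */
static size_t toy_resize_steps(const char *new_size, struct toy_sysfs_step steps[4])
{
    assert(new_size != NULL && strlen(new_size) > 0);
    steps[0] = (struct toy_sysfs_step){ "sync_action", "frozen" };
    steps[1] = (struct toy_sysfs_step){ "component_size", new_size };
    steps[2] = (struct toy_sysfs_step){ "resync_start", "none" };
    steps[3] = (struct toy_sysfs_step){ "sync_action", "idle" };
    return 4;
}

/* Write each value to <md_dir>/<attr>; returns 0 on success, -1 on first failure. */
static int toy_apply_steps(const char *md_dir,
                           const struct toy_sysfs_step *steps, size_t n)
{
    char path[512];

    for (size_t i = 0; i < n; i++) {
        int len = snprintf(path, sizeof(path), "%s/%s", md_dir, steps[i].attr);
        if (len < 0 || (size_t)len >= sizeof(path))
            return -1;

        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;

        int failed = (fputs(steps[i].value, f) == EOF);
        if (fclose(f) != 0)
            failed = 1;
        if (failed)
            return -1;
    }
    return 0;
}
```

Keeping the step list separate from the code that writes it makes the ordering easy to inspect: 'frozen' must come first and 'idle' last, since the commit only allows resync_start to change while resync/recovery is frozen.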
    • md: make error_handler functions more uniform and correct. · 6f8d0c77
      NeilBrown committed
      - there is no need to test_bit Faulty, as that was already done in
        md_error which is the only caller of these functions.
      - MD_CHANGE_DEVS should be set *after* faulty is set to ensure
        metadata is updated correctly.
      - spinlock should be held while updating ->degraded.
      Signed-off-by: NeilBrown <neilb@suse.de>
  8. 10 May, 2011 1 commit
  9. 22 Apr, 2011 1 commit
  10. 20 Apr, 2011 2 commits
    • md: Fix dev_sectors on takeover from raid0 to raid4/5 · 3b71bd93
      NeilBrown committed
      A raid0 array doesn't set 'dev_sectors' as each device might
      contribute a different number of sectors.
      So when converting to a RAID4 or RAID5 we need to set dev_sectors
      as they need the number.
      We have already verified that in fact all devices do contribute
      the same number of sectors, so use that number.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: remove setting of ->queue_lock · 2b7da309
      NeilBrown committed
      We previously needed to set ->queue_lock to match the raid5
      device_lock so we could safely use queue_flag_* operations (e.g. for
      plugging), which test that ->queue_lock is in fact locked.
      
      However that need has completely gone away and is unlikely to come
      back, so remove this now-pointless setting.
      Signed-off-by: NeilBrown <neilb@suse.de>
  11. 18 Apr, 2011 2 commits
    • md: incorporate new plugging into raid5. · 7c13edc8
      NeilBrown committed
      In raid5 plugging is used for 2 things:
       1/ collecting writes that require a bitmap update
       2/ collecting writes in the hope that we can create full
          stripes - or at least more-full.
      
      We now release these different sets of stripes when plug_cnt
      is zero.
      
      Also in make_request, we call mddev_check_plug to hopefully increase
      plug_cnt, and wake up the thread at the end if plugging wasn't
      achieved for some reason.
      Signed-off-by: NeilBrown <neilb@suse.de>
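The release rule from 7c13edc8 (held stripes are let go only once plug_cnt reaches zero) can be sketched with simple counters. `struct toy_conf` and `toy_release_if_unplugged` are hypothetical; the two counters stand in for the two sets of stripes the commit message lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced configuration: counts instead of real stripe lists. */
struct toy_conf {
    int plug_cnt;           /* number of active plugs */
    size_t bitmap_pending;  /* writes waiting on a bitmap update */
    size_t delayed;         /* writes held back hoping for fuller stripes */
};

/* Release both sets of held stripes, but only when nothing is plugged. */
static size_t toy_release_if_unplugged(struct toy_conf *conf)
{
    assert(conf != NULL && conf->plug_cnt >= 0);
    if (conf->plug_cnt > 0)
        return 0;

    size_t released = conf->bitmap_pending + conf->delayed;
    conf->bitmap_pending = 0;
    conf->delayed = 0;
    return released;
}
```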
    • md - remove old plugging code. · 482c0834
      NeilBrown committed
      md has some plugging infrastructure for RAID5 to use because the
      normal plugging infrastructure required a 'request_queue', and when
      called from dm, RAID5 doesn't have one of those available.
      
      This relied on the ->unplug_fn callback which doesn't exist any more.
      
      So remove all of that code, both in md and raid5.  Subsequent patches
      will restore the plugging functionality.
      Signed-off-by: NeilBrown <neilb@suse.de>