1. 02 Aug, 2011 28 commits
  2. 28 Jul, 2011 12 commits
    • md/raid10: handle further errors during fix_read_error better. · 58c54fcc
      NeilBrown committed
      If we find more read/write errors we should record a bad block before
      failing the device.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: Handle read errors during recovery better. · 5e570289
      NeilBrown committed
      Currently when we get a read error during recovery, we simply abort
      the recovery.
      
      Instead, repeat the read in page-sized blocks.
      On successful reads, write to the target.
      On read errors, record a bad block on the destination,
      and only if that fails do we abort the recovery.
      
      As we now retry reads we need to know where we read from.  This was in
      bi_sector but that can be changed during a read attempt.
      So store the correct from_addr and to_addr in the r10_bio for later
      access.
      
      
      Signed-off-by: NeilBrown <neilb@suse.de>
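The retry loop described in 5e570289 can be sketched as follows. This is a minimal user-space model, not the kernel code: `recover_range`, the three callbacks, and the `PAGE_SECTORS` value are hypothetical names and assumptions chosen for illustration; only `from_addr` and `to_addr` come from the commit message.

```python
PAGE_SECTORS = 8  # assumption: 4 KiB pages and 512-byte sectors


def recover_range(read_page, write_page, record_bad_block,
                  from_addr, to_addr, sectors):
    """Retry a failed recovery read one page-sized block at a time.

    read_page(addr, n) returns the data, or None to model a read error.
    record_bad_block(addr, n) returns False when the bad block cannot be
    recorded. Returns True if recovery may continue, False if it must abort.
    """
    done = 0
    while done < sectors:
        n = min(PAGE_SECTORS, sectors - done)
        data = read_page(from_addr + done, n)
        if data is not None:
            # Successful read: copy the block to the target device.
            write_page(to_addr + done, data)
        elif not record_bad_block(to_addr + done, n):
            # Could not even record the bad block on the destination.
            return False
        done += n
    return True
```

Keeping `from_addr` and `to_addr` as explicit arguments mirrors the point made in the message: the retry cannot rely on a sector field that a read attempt may have modified.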
    • md/raid10: simplify read error handling during recovery. · e684e41d
      NeilBrown committed
      If a read error is detected during recovery the code currently
      fails the read device.
      This isn't really necessary.  recovery_request_write will signal
      a write error to end_sync_write and it will record a write
      error on the destination device which will record a bad block
      there or kick it from the array.
      
      So just remove this call to do md_error.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: record bad blocks due to write errors during resync/recovery. · 1a0b7cd8
      NeilBrown committed
      If we get a write error during resync/recovery don't fail the device
      but instead record a bad block.  If that fails we can then fail the
      device.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: attempt to fix read errors during resync/check · f84ee364
      NeilBrown committed
      We already attempt to fix read errors found during normal IO
      and a 'repair' process.
      It is best to try to repair them at any time they are found,
      so move a test so that during sync and check a read error will
      be corrected by over-writing with good data.
      
      If both (all) devices have known bad blocks in the sync section we
      won't try to fix even though the bad blocks might not overlap.  That
      should be considered later.
      
      Also if we hit a read error during recovery we don't try to fix it.
      It would only be possible to fix if there were at least three copies
      of data, which is not very common with RAID10.  But it should still
      be considered later.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: Handle write errors by updating badblock log. · bd870a16
      NeilBrown committed
      When we get a write error (in the data area, not in metadata),
      update the badblock log rather than failing the whole device.
      
      As the write may well span many blocks, we try writing each
      block individually and only log the ones which fail.
      Signed-off-by: NeilBrown <neilb@suse.de>
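The per-block retry in bd870a16 might look roughly like this. It is an illustrative sketch only: `log_failed_blocks` and the `write_block` callback are hypothetical and do not correspond to kernel functions.

```python
def log_failed_blocks(write_block, first_block, count):
    """Rewrite a failed multi-block write one block at a time.

    write_block(block) returns True on success. The return value is the
    list of blocks that still fail and should go into the bad-block log.
    """
    bad = []
    for block in range(first_block, first_block + count):
        if not write_block(block):
            bad.append(block)
    return bad
```

Only the blocks that fail individually are logged, so one bad sector inside a large write does not mark the whole range as bad.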
    • md/raid10: clear bad-block record when write succeeds. · 749c55e9
      NeilBrown committed
      If we succeed in writing to a block that was recorded as
      being bad, we clear the bad-block record.
      
      This requires some delayed handling as the bad-block-list update has
      to happen in process-context.
      Signed-off-by: NeilBrown <neilb@suse.de>
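The delayed handling mentioned in 749c55e9 can be modelled as a two-step pattern: the completion path only queues work, and a later pass in process context updates the list. The class and method names below are hypothetical; this is a sketch of the pattern, not the md implementation.

```python
from collections import deque


class BadBlockLog:
    """Toy bad-block list whose updates are deferred to process context."""

    def __init__(self, bad):
        self.bad = set(bad)
        self.pending_clear = deque()

    def on_write_complete(self, block, ok):
        # Completion path: must not touch the list itself, so only
        # remember that this block can be cleared later.
        if ok and block in self.bad:
            self.pending_clear.append(block)

    def process_pending(self):
        # Process context: now it is safe to update the bad-block list.
        while self.pending_clear:
            self.bad.discard(self.pending_clear.popleft())
```

A failed write leaves the record in place; only a successful write to a recorded block queues it for clearing.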
    • md/raid10: avoid writing to known bad blocks on known bad drives. · d4432c23
      NeilBrown committed
      Writing to known bad blocks on drives that have seen a write error
      is asking for trouble.  So try to avoid these blocks.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10 record bad blocks as needed during recovery. · e875ecea
      NeilBrown committed
      When recovering one or more devices, if all the good devices have
      bad blocks we should record a bad block on the device being rebuilt.
      
      If this fails, we need to abort the recovery.
      
      To ensure we don't think that we aborted later than we actually did,
      we need to move the check for MD_RECOVERY_INTR earlier in md_do_sync,
      in particular before mddev->curr_resync is updated.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: avoid reading known bad blocks during resync/recovery. · 40c356ce
      NeilBrown committed
      During resync/recovery limit the size of the request to avoid
      reading into a bad block that does not start at-or-before the current
      read address.
      
      Similarly if there is a bad block at this address, don't allow the
      current request to extend beyond the end of that bad block.
      
      Now that we don't ever read from known bad blocks, it is safe to allow
      devices with those blocks into the array.
      Signed-off-by: NeilBrown <neilb@suse.de>
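The two size limits described in 40c356ce can be illustrated with a small helper. This is a sketch under stated assumptions: `clamp_request` is a hypothetical name, bad blocks are modelled as non-overlapping `(start, length)` ranges in sectors, and the real code works on kernel structures rather than Python lists.

```python
def clamp_request(sector, max_sectors, bad_ranges):
    """Limit a resync/recovery read so it never crosses a bad-block edge.

    Returns (readable, sectors): whether this device can be read at
    `sector`, and how many sectors the current request may cover.
    """
    for start, length in bad_ranges:
        end = start + length
        if start <= sector < end:
            # The read address is inside a bad block: do not read here,
            # and do not extend past the end of that bad block.
            return False, min(max_sectors, end - sector)
    # Bad blocks that begin after the read address but inside the request.
    ahead = [start for start, _ in bad_ranges
             if sector < start < sector + max_sectors]
    if ahead:
        # Stop the request just before the nearest bad block.
        return True, min(ahead) - sector
    return True, max_sectors
```

For a bad range covering sectors 100 to 107, a 32-sector request at sector 90 is trimmed to 10 sectors, and a request at sector 102 is reported as unreadable for 6 sectors.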
    • md/raid10 - avoid reading from known bad blocks - part 3 · 8dbed5ce
      NeilBrown committed
      When attempting to repair a read error, don't read from
      devices with a known bad block.
      
      As we are only reading PAGE_SIZE blocks, we don't try to
      narrow down to smaller regions in the hope that only part of this
      page is bad - it isn't worth the effort.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: avoid reading from known bad blocks - part 2 · 7399c31b
      NeilBrown committed
      When redirecting a read error to a different device, we must
      again avoid bad blocks and possibly split the request.
      
      Spin_lock typo fixed thanks to Dan Carpenter <error27@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>