1. 13 Nov, 2009 (2 commits)
    • md/raid5: Allow dirty-degraded arrays to be assembled when only parity is degraded. · c148ffdc
      Committed by NeilBrown
      Normally it is not safe to allow a raid5 that is both dirty and
      degraded to be assembled without an explicit request from the admin,
      as it can cause hidden data corruption.
      This is because 'dirty' means that the parity cannot be trusted, and
      'degraded' means that the parity needs to be used.
      
      However, if the device that is missing contains only parity, then
      there is no issue and assembly can continue.
      This particularly applies when a RAID5 is being converted to a RAID6
      and there is an unclean shutdown while the conversion is happening.
      
      So check whether the degraded space only contains parity, and in
      that case allow the assembly.
      Signed-off-by: NeilBrown <neilb@suse.de>
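
      A minimal user-space sketch of that check, assuming a simplified
      layout model (the enum, helper name and slot numbers are hypothetical,
      not the kernel's code): assembly of a dirty, degraded array is refused
      unless the missing device can only ever hold parity.

          #include <stdbool.h>
          #include <stdio.h>

          /* Hypothetical layout tags (not the kernel's ALGORITHM_* constants):
           * rotating parity vs. parity pinned to the last slot(s). */
          enum layout { ROTATING_PARITY, PARITY_LAST };

          /* Does device 'slot' hold nothing but parity under this layout?  With
           * rotating parity every slot holds data somewhere, so never; with a
           * parity-last layout (as used while converting RAID5 to RAID6) the
           * trailing fixed slots hold only parity. */
          static bool only_parity(int slot, enum layout l, int raid_disks, int fixed_slots)
          {
              return l == PARITY_LAST && slot >= raid_disks - fixed_slots;
          }

          int main(void)
          {
              bool dirty = true, degraded = true;
              int missing_slot = 4;   /* the Q disk of a 5-device mid-conversion array */

              bool safe = !dirty || !degraded ||
                          only_parity(missing_slot, PARITY_LAST, 5, 1);
              printf("assembly %s\n", safe ? "allowed" : "refused");
              return 0;
          }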
    • Don't unconditionally set in_sync on newly added device in raid5_reshape · 7ef90146
      Committed by NeilBrown
      When a reshape finds that it can add spare devices into the array,
      those devices might already be 'in_sync' if they lie beyond the old
      size of the array, or they might not if they lie within it.
      
      The first case happens when we change an N-drive RAID5 to an
      N+1-drive RAID5.
      The second happens when we convert an N-drive RAID5 to an
      N+1-drive RAID6.
      
      So set the flag more carefully.
      Also, ->recovery_offset is only meaningful when the flag is clear,
      so only set it in that case.
      
      This change needs the preceding two patches to ensure that the non-in_sync
      device doesn't get evicted from the array when it is stopped, in the
      case where v0.90 metadata is used.
      Signed-off-by: NeilBrown <neilb@suse.de>
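
      A hedged user-space model of the distinction made above (the structure
      and function names are invented for illustration): a spare landing in a
      slot beyond the old geometry can be marked in_sync immediately, while
      one inside it starts recovery at offset 0, and recovery_offset is only
      touched in that second case.

          #include <stdbool.h>
          #include <stdio.h>

          /* Illustrative model, not md's structures. */
          struct added_dev {
              int slot;                   /* raid_disk index the spare landed in */
              bool in_sync;
              long long recovery_offset;  /* only meaningful while !in_sync */
          };

          /* previous_raid_disks is the device count of the pre-reshape geometry. */
          static void place_spare(struct added_dev *dev, int previous_raid_disks)
          {
              if (dev->slot >= previous_raid_disks) {
                  /* Brand-new slot created by the grow: nothing needs rebuilding. */
                  dev->in_sync = true;
              } else {
                  /* Slot inside the old geometry: the device must be rebuilt,
                   * so leave in_sync clear and track recovery instead. */
                  dev->in_sync = false;
                  dev->recovery_offset = 0;
              }
          }

          int main(void)
          {
              struct added_dev a = { .slot = 5 };  /* beyond a 5-device old geometry */
              struct added_dev b = { .slot = 3 };  /* inside it */

              place_spare(&a, 5);
              place_spare(&b, 5);
              printf("a: in_sync=%d  b: in_sync=%d\n", a.in_sync, b.in_sync);
              return 0;
          }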
  2. 06 Nov, 2009 (1 commit)
    • md/raid5: make sure curr_resync_completed is uptodate when reshape starts · 8dee7211
      Committed by NeilBrown
      This value is visible through sysfs and is used by mdadm
      when it manages a reshape (backing up data that is about to be
      rearranged).  So it is important that it is always correct.
      Currently it does not get updated properly when a reshape
      starts, which can cause problems when assembling an array
      that is in the middle of being reshaped.
      
      This is suitable for 2.6.31.y stable kernels.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 20 Oct, 2009 (1 commit)
  4. 16 Oct, 2009 (6 commits)
    • md/async: don't pass a memory pointer as a page pointer. · 5dd33c9a
      Committed by NeilBrown
      md/raid6 passes a list of 'struct page *' to the async_tx routines,
      which then either DMA map them for offload, or take the page_address
      for CPU based calculations.
      
      For RAID6 we sometimes leave 'blanks' in the list of pages.
      For CPU-based calculations, we want to treat these as a page of zeros.
      For offloaded calculations, we simply don't pass a page to the
      hardware.
      
      Currently the 'blanks' are encoded as a pointer to
      raid6_empty_zero_page.  This is a 4096 byte memory region, not a
      'struct page'.  This is mostly handled correctly but is rather ugly.
      
      So change the code to pass and expect a NULL pointer for the blanks.
      When taking page_address of a page, we need to check for a NULL and
      in that case use raid6_empty_zero_page.
      Signed-off-by: NeilBrown <neilb@suse.de>
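
      A small user-space model of the convention described above (the buffer
      names are stand-ins, not the kernel's): a blank slot travels as a NULL
      pointer, and only the CPU path substitutes a shared zero page when it
      needs a real address; an offload path would simply skip the NULL entry.

          #include <stddef.h>
          #include <stdio.h>
          #include <string.h>

          #define PAGE_SIZE 4096

          /* Stand-in for raid6_empty_zero_page: a shared page of zeros used
           * only when a CPU-side calculation needs a buffer for a blank slot. */
          static const unsigned char empty_zero_page[PAGE_SIZE];

          /* CPU path: a NULL entry means "treat as zeros". */
          static const unsigned char *blk_address(const unsigned char *page)
          {
              return page ? page : empty_zero_page;
          }

          int main(void)
          {
              unsigned char d0[PAGE_SIZE], d1[PAGE_SIZE], xor_out[PAGE_SIZE];
              /* Slot 2 is a blank: passed as NULL, never as a fake page pointer. */
              const unsigned char *blocks[3] = { d0, d1, NULL };

              memset(d0, 0xaa, PAGE_SIZE);
              memset(d1, 0x55, PAGE_SIZE);

              for (size_t i = 0; i < PAGE_SIZE; i++) {
                  unsigned char v = 0;
                  for (int b = 0; b < 3; b++)
                      v ^= blk_address(blocks[b])[i];
                  xor_out[i] = v;
              }
              printf("xor_out[0] = 0x%02x\n", xor_out[0]); /* 0xaa ^ 0x55 ^ 0x00 = 0xff */
              return 0;
          }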
    • md: Fix handling of raid5 array which is being reshaped to fewer devices. · 5e5e3e78
      Committed by NeilBrown
      When a raid5 (or raid6) array is being reshaped to have fewer devices,
      conf->raid_disks is the new, and hence smaller, number of devices.
      However sometimes we want to use a number which is the total number of
      currently required devices - the larger of the 'old' and 'new' sizes.
      Before we implemented reducing the number of devices, this was always
      'new' i.e. ->raid_disks.
      Now we need max(raid_disks, previous_raid_disks) in those places.
      
      This particularly affects assembling an array that was shutdown while
      in the middle of a reshape to fewer devices.
      
      md.c needs a similar fix when interpreting the md metadata.
      Signed-off-by: NeilBrown <neilb@suse.de>
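
      A hedged one-function sketch of the rule above (not the kernel's code):
      while both geometries can still be live, per-stripe loops and
      allocations must cover the larger device count.

          #include <stdio.h>

          /* Sketch: while a reshape to fewer devices is in flight, stripes may
           * still reference disks from the old, larger geometry, so loops and
           * allocations must be sized by the max of the two counts. */
          static int active_disks(int raid_disks, int previous_raid_disks)
          {
              return raid_disks > previous_raid_disks ? raid_disks : previous_raid_disks;
          }

          int main(void)
          {
              /* e.g. a 6-drive raid5 being reshaped down to 4 drives: */
              printf("disks to scan per stripe: %d\n", active_disks(4, 6)); /* 6 */
              return 0;
          }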
    • md: fix problems with RAID6 calculations for DDF. · e4424fee
      Committed by NeilBrown
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid456: downlevel multicore operations to raid_run_ops · 417b8d4a
      Committed by Dan Williams
      The percpu conversion allowed a straightforward handoff of stripe
      processing to the async subsystem that initially showed some modest gains
      (+4%).  However, this model is too simplistic and leads to stripes
      bouncing between raid5d and the async thread pool for every invocation
      of handle_stripe().  As reported by Holger this can fall into a
      pathological situation severely impacting throughput (6x performance
      loss).
      
      By downleveling the parallelism to raid_run_ops the pathological
      stripe_head bouncing is eliminated.  This version still exhibits an
      average 11% throughput loss for:
      
      	mdadm --create /dev/md0 /dev/sd[b-q] -n 16 -l 6
      	echo 1024 > /sys/block/md0/md/stripe_cache_size
      	dd if=/dev/zero of=/dev/md0 bs=1024k count=2048
      
      ...but the results are at least stable and can be used as a base for
      further multicore experimentation.
      Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: initialize conf->device_lock earlier · f5efd45a
      Committed by Dan Williams
      Deallocating a raid5_conf_t structure requires taking 'device_lock'.
      Ensure it is initialized before it is used, i.e. initialize the lock
      before attempting any further initializations that might fail.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
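
      A generic user-space sketch of the init-order rule (a pthread mutex
      stands in for the kernel spinlock; all names are illustrative):
      initialize the lock before any allocation that can fail, so the error
      path that tears the structure down and takes the lock always finds it
      valid.

          #include <pthread.h>
          #include <stdlib.h>

          struct conf {
              pthread_mutex_t device_lock;  /* stand-in for conf->device_lock */
              void *big_buffer;             /* later allocation that can fail */
          };

          static void free_conf(struct conf *c)
          {
              if (!c)
                  return;
              /* The teardown path takes the lock, so it must already be valid. */
              pthread_mutex_lock(&c->device_lock);
              free(c->big_buffer);
              pthread_mutex_unlock(&c->device_lock);
              pthread_mutex_destroy(&c->device_lock);
              free(c);
          }

          static struct conf *setup_conf(void)
          {
              struct conf *c = calloc(1, sizeof(*c));
              if (!c)
                  return NULL;
              /* Initialize the lock before anything that might fail... */
              pthread_mutex_init(&c->device_lock, NULL);
              c->big_buffer = malloc(1 << 20);
              if (!c->big_buffer) {
                  free_conf(c);   /* ...so this error path is safe. */
                  return NULL;
              }
              return c;
          }

          int main(void)
          {
              struct conf *c = setup_conf();
              free_conf(c);
              return 0;
          }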
    • Revert "md: do not progress the resync process if the stripe was blocked" · 1442577b
      Committed by NeilBrown
      This reverts commit df10cfbc.
      
      This patch was based on a misunderstanding and risks introducing a busy-wait loop.
      So revert it.
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  5. 23 Sep, 2009 (3 commits)
  6. 17 Sep, 2009 (2 commits)
  7. 11 Sep, 2009 (1 commit)
  8. 09 Sep, 2009 (1 commit)
    • dmaengine: add fence support · 0403e382
      Committed by Dan Williams
      Some engines optimize operation by reading ahead in the descriptor chain
      such that descriptor2 may start execution before descriptor1 completes.
      If descriptor2 depends on the result from descriptor1 then a fence is
      required (on descriptor2) to disable this optimization.  The async_tx
      api could implicitly identify dependencies via the 'depend_tx'
      parameter, but that would constrain cases where the dependency chain
      only specifies a completion order rather than a data dependency.  So,
      provide an ASYNC_TX_FENCE to explicitly identify data dependencies.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
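
      A toy model of the fence semantics (the descriptor struct and the
      scheduler check are invented for illustration; they are not the
      dmaengine API): an unfenced descriptor may be started ahead of time,
      while a fenced one must wait for everything before it in the chain to
      complete.

          #include <stdbool.h>
          #include <stdio.h>

          struct desc {
              const char *name;
              bool fence;       /* the result of an earlier descriptor is an input */
              bool completed;
          };

          /* May the descriptor at position i start now?  Without a fence the
           * engine is free to read ahead; with a fence all earlier work must
           * have completed first. */
          static bool may_start(const struct desc *chain, int i)
          {
              if (!chain[i].fence)
                  return true;
              for (int j = 0; j < i; j++)
                  if (!chain[j].completed)
                      return false;
              return true;
          }

          int main(void)
          {
              struct desc chain[2] = {
                  { "gen_syndrome", false, false },  /* produces P/Q */
                  { "copy_result",  true,  false },  /* consumes them: fenced */
              };
              printf("desc1 may start early: %s\n", may_start(chain, 1) ? "yes" : "no");
              chain[0].completed = true;
              printf("desc1 may start now:   %s\n", may_start(chain, 1) ? "yes" : "no");
              return 0;
          }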
  9. 30 Aug, 2009 (12 commits)
  10. 13 Aug, 2009 (3 commits)
    • md/raid5: Properly remove excess drives after shrinking a raid5/6 · 1a67dde0
      Committed by NeilBrown
      We were removing the drives from the array, but not
      removing symlinks from /sys/.... and not marking the device
      as having been removed.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: make sure a reshape restarts at the correct address. · a639755c
      Committed by NeilBrown
      This "if" don't allow for the possibility that the number of devices
      doesn't change, and so sector_nr isn't set correctly in that case.
      So change '>' to '>='.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: allow new reshape modes to be restarted in the middle. · 67ac6011
      Committed by NeilBrown
      md/raid5 doesn't allow a reshape to restart if it involves writing
      over the same part of disk that it would be reading from.
      This happens at the beginning of a reshape that increases the number
      of devices, at the end of a reshape that decreases the number of
      devices, and continuously for a reshape that does not change the
      number of devices.
      
      The current code is correct for the "increase number of devices"
      case as the critical section at the start is handled by userspace
      performing a backup.
      
      It does not work for reducing the number of devices, or the
      no-change case.
      For 'reducing', we need to invert the test.  For no-change we cannot
      really be sure things will be safe, so simply require the array
      to be read-only, which is what the user-space code that carefully
      starts such arrays already does.
      Signed-off-by: NeilBrown <neilb@suse.de>
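
      A simplified model of the direction-dependent rule above (hedged; the
      kernel's real test is expressed in stripe and chunk units): a growing
      reshape must find its write position behind its read position, a
      shrinking one ahead of it, and a same-size reshape can only be resumed
      read-only.

          #include <stdbool.h>
          #include <stdio.h>

          /* delta_disks > 0: grow, < 0: shrink, == 0: layout/chunk-only reshape.
           * read_pos/write_pos are the stripe positions the restarted reshape
           * would read from and write to (a simplification of the kernel's
           * calculation). */
          static bool restart_allowed(int delta_disks, long long read_pos,
                                      long long write_pos, bool array_read_only)
          {
              if (delta_disks > 0)        /* growing: writes must trail reads */
                  return write_pos < read_pos;
              if (delta_disks < 0)        /* shrinking: the test is inverted  */
                  return write_pos > read_pos;
              return array_read_only;     /* same size: only safe without new writes */
          }

          int main(void)
          {
              printf("grow,   mid-array: %d\n", restart_allowed(+1, 100, 60, false));
              printf("shrink, mid-array: %d\n", restart_allowed(-1, 60, 100, false));
              printf("same size, rw:     %d\n", restart_allowed(0, 80, 80, false));
              printf("same size, ro:     %d\n", restart_allowed(0, 80, 80, true));
              return 0;
          }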
  11. 03 Aug, 2009 (3 commits)
    • md: Use revalidate_disk to effect changes in size of device. · 449aad3e
      Committed by NeilBrown
      As revalidate_disk calls check_disk_size_change, it will cause
      any capacity change of a gendisk to be propagated to the blockdev
      inode.  So use that instead of mucking about with locks and
      i_size_write.
      
      Also add a call to revalidate_disk in do_md_run and a few other places
      where the gendisk capacity is changed.
      Signed-off-by: NeilBrown <neilb@suse.de>
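
      A minimal kernel-style fragment of the pattern described above
      (illustrative only, not the exact md.c change): update the gendisk
      capacity, then let revalidate_disk() propagate it.

          /* sketch: md has just computed a new array size */
          set_capacity(mddev->gendisk, mddev->array_sectors);
          /* revalidate_disk() calls check_disk_size_change(), which updates
           * the backing block device inode's size; no manual i_size_write()
           * or locking of the bdev inode is needed here. */
          revalidate_disk(mddev->gendisk);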
    • md: allow raid5_quiesce to work properly when reshape is happening. · 64bd660b
      Committed by NeilBrown
      The ->quiesce method is not supposed to stop resync/recovery/reshape,
      just normal IO.
      But in raid5 we don't have a way to know which stripes are being
      used for normal IO and which for resync etc, so we need to wait for
      all stripes to be idle to be sure that all writes have completed.
      
      However reshape keeps at least some stripes busy for an extended period
      of time, so a call to raid5_quiesce can block for several seconds
      needlessly.
      So arrange for reshape etc to pause briefly while raid5_quiesce is
      trying to quiesce the array so that the active_stripes count can
      drop to zero.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: set reshape_position correctly when reshape starts. · e516402c
      Committed by NeilBrown
      As the internal reshape_progress counter is the main driver
      for reshape, the fact that reshape_position sometimes starts with the
      wrong value has minimal effect.  It is visible in sysfs and that
      is all.
      Signed-off-by: NeilBrown <neilb@suse.de>
  12. 31 Jul, 2009 (1 commit)
  13. 15 Jul, 2009 (1 commit)
  14. 01 Jul, 2009 (3 commits)