1. 23 9月, 2009 3 次提交
  2. 11 9月, 2009 1 次提交
  3. 13 8月, 2009 3 次提交
    • N
      md/raid5: Properly remove excess drives after shrinking a raid5/6 · 1a67dde0
      NeilBrown 提交于
      We were removing the drives, from the array, but not
      removing symlinks from /sys/.... and not marking the device
      as having been removed.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      1a67dde0
    • N
      md/raid5: make sure a reshape restarts at the correct address. · a639755c
      NeilBrown 提交于
      This "if" don't allow for the possibility that the number of devices
      doesn't change, and so sector_nr isn't set correctly in that case.
      So change '>' to '>='.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      a639755c
    • N
      md/raid5: allow new reshape modes to be restarted in the middle. · 67ac6011
      NeilBrown 提交于
      md/raid5 doesn't allow a reshape to restart if it involves writing
      over the same part of disk that it would be reading from.
      This happens at the beginning of a reshape that increases the number
      of devices, at the end of a reshape that decreases the number of
      devices, and continuously for a reshape that does not change the
      number of devices.
      
      The current code is correct for the "increase number of devices"
      case as the critical section at the start is handled by userspace
      performing a backup.
      
      It does not work for reducing the number of devices, or the
      no-change case.
      For 'reducing', we need to invert the test.  For no-change we cannot
      really be sure things will be safe, so simply require the array
      to be read-only, which is how the user-space code which carefully
      starts such arrays works.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      67ac6011
  4. 03 8月, 2009 3 次提交
    • N
      md: Use revalidate_disk to effect changes in size of device. · 449aad3e
      NeilBrown 提交于
      As revalidate_disk calls check_disk_size_change, it will cause
      any capacity change of a gendisk to be propagated to the blockdev
      inode.  So use that instead of mucking about with locks and
      i_size_write.
      
      Also add a call to revalidate_disk in do_md_run and a few other places
      where the gendisk capacity is changed.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      449aad3e
    • N
      md: allow raid5_quiesce to work properly when reshape is happening. · 64bd660b
      NeilBrown 提交于
      The ->quiesce method is not supposed to stop resync/recovery/reshape,
      just normal IO.
      But in raid5 we don't have a way to know which stripes are being
      used for normal IO and which for resync etc, so we need to wait for
      all stripes to be idle to be sure that all writes have completed.
      
      However reshape keeps at least some stripe busy for an extended period
      of time, so a call to raid5_quiesce can block for several seconds
      needlessly.
      So arrange for reshape etc to pause briefly while raid5_quiesce is
      trying to quiesce the array so that the active_stripes count can
      drop to zero.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      64bd660b
    • N
      md/raid5: set reshape_position correctly when reshape starts. · e516402c
      NeilBrown 提交于
      As the internal reshape_progress counter is the main driver
      for reshape, the fact that reshape_position sometimes starts with the
      wrong value has minimal effect.  It is visible in sysfs and that
      is all.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      e516402c
  5. 31 7月, 2009 1 次提交
  6. 01 7月, 2009 3 次提交
  7. 18 6月, 2009 11 次提交
  8. 16 6月, 2009 2 次提交
  9. 09 6月, 2009 3 次提交
    • N
      md/raid5: fix bug in reshape code when chunk_size decreases. · 0e6e0271
      NeilBrown 提交于
      Now that we support changing the chunksize, we calculate
      "reshape_sectors" to be the max of number of sectors in old
      and new chunk size.
      However there is one please where we still use 'chunksize'
      rather than 'reshape_sectors'.
      This causes a reshape that reduces the size of chunks to freeze.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      0e6e0271
    • N
      md/raid5 - avoid deadlocks in get_active_stripe during reshape · a8c906ca
      NeilBrown 提交于
      md has functionality to 'quiesce' and array so that all pending
      IO completed and no new IO starts.  This is used to achieve a
      stable state before making internal changes.
      
      Currently this quiescing applies equally to normal IO, resync
      IO, and reshape IO.
      However there is a problem with applying it to reshape IO.
      Reshape can have multiple 'stripe_heads' that must be active together.
      If the quiesce come between allocating the first and the last of
      such a collection, then we deadlock, as the last will not be allocated
      until the quiesce is lifted, the quiesce will not be lifted until the
      first (which has been allocated) gets used, and that first cannot be
      used until the last is allocated.
      
      It is not necessary to inhibit reshape IO when a quiesce is
      requested.  Those places in the code that require a full quiesce will
      ensure the reshape thread is not running at all.
      
      So allow reshape requests to get access to new stripe_heads without
      being blocked by a 'quiesce'.
      
      This only affects in-place reshapes (i.e. where the array does not
      grow or shrink) and these are only newly supported.  So this patch is
      not needed in earlier kernels.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      a8c906ca
    • N
      md/raid5: use conf->raid_disks in preference to mddev->raid_disk · f001a70c
      NeilBrown 提交于
      mddev->raid_disks can be changed and any time by a request from
      user-space.  It is a suggestion as to what number of raid_disks is
      desired.
      
      conf->raid_disks can only be changed by the raid5 module with suitable
      locks in place.  It is a statement as to the current number of
      raid_disks.
      
      There are two places where the latter should be used, but the former
      is used.  This can lead to a crash when reshaping an array.
      
      This patch changes to mddev-> to conf->
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f001a70c
  10. 27 5月, 2009 1 次提交
  11. 26 5月, 2009 1 次提交
  12. 23 5月, 2009 1 次提交
  13. 17 4月, 2009 1 次提交
    • N
      md: update sync_completed and reshape_position even more often. · c03f6a19
      NeilBrown 提交于
      There are circumstances when a user-space process might need to
      "oversee" a resync/reshape process.  For example when doing an
      in-place reshape of a raid5, it is prudent to take a backup of each
      section before reshaping it as this is the only way to provide
      safety against an unplanned shutdown (i.e. crash/power failure).
      
      The sync_max sysfs value can be used to stop the resync from
      advancing beyond a particular point.
      So user-space can:
        suspend IO to the first section and back it up
        set 'sync_max' to the end of the section
        wait for 'sync_completed' to reach that point
        resume IO on the first section and move on to the next section.
      
      However this process requires the kernel and user-space to run in
      lock-step which could introduce unnecessary delays.
      
      It would be better if a 'double buffered' approach could be used with
      userspace and kernel space working on different sections with the
      'next' section always ready when the 'current' section is finished.
      
      One problem with implementing this is that sync_completed is only
      guaranteed to be updated when the sync process reaches sync_max.
      (it is updated on a time basis at other times, but it is hard to rely
      on that).  This defeats some of the double buffering.
      
      With this patch, sync_completed (and reshape_position) get updated as
      the current position approaches sync_max, so there is room for
      userspace to advance sync_max early without losing updates.
      
      To be precise, sync_completed is updated when the current sync
      position reaches half way between the current value of sync_completed
      and the value of sync_max.  This will usually be a good time for user
      space to update sync_max.
      
      If sync_max does not get updated, the updates to sync_completed
      (together with associated metadata updates) will occur at an
      exponentially increasing frequency which will get unreasonably fast
      (one update every page) immediately before the process hits sync_max
      and stops.  So the update rate will be unreasonably fast only for an
      insignificant period of time.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      c03f6a19
  14. 14 4月, 2009 1 次提交
    • N
      md: improve usefulness and accuracy of sysfs file md/sync_completed. · acb180b0
      NeilBrown 提交于
      The sync_completed file reports how much of a resync (or recovery or
      reshape) has been completed.
      However due to the possibility of out-of-order completion of writes,
      it is not certain to be accurate.
      
      We have an internal value - mddev->curr_resync_completed - which is an
      accurate value (though it might not always be quite so uptodate).
      
      So:
       - make curr_resync_completed be uptodate a little more often,
         particularly when raid5 reshape updates status in the metadata
       - report curr_resync_completed in the sysfs file
       - allow poll/select to report all updates to md/sync_completed.
      
      This makes sync_completed completed usable by any external metadata
      handler that wants to record this status information in its metadata.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      acb180b0
  15. 31 3月, 2009 5 次提交
    • N
      md/raid5 revise rules for when to update metadata during reshape · c8f517c4
      NeilBrown 提交于
      We currently update the metadata :
       1/ every 3Megabytes
       2/ When the place we will write new-layout data to is recorded in
          the metadata as still containing old-layout data.
      
      Rule one exists to avoid having to re-do too much reshaping in the
      face of a crash/restart.  So it should really be time based rather
      than size based.  So change it to "every 10 seconds".
      
      Rule two turns out to be too harsh when restriping an array
      'in-place', as in that case the metadata much be updates for every
      stripe.
      For the in-place update, it can only possibly be safe from a crash if
      some user-space program data a backup of every e.g. few hundred
      stripes before allowing them to be reshaped.  In that case, the
      constant metadata update is pointless.
      So only update the metadata if the new metadata will report that the
      end of the 'old-layout' data is beyond where we are currently
      writing 'new-layout' data.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      c8f517c4
    • N
      md/raid5: minor code cleanups in make_request. · b0f9ec04
      NeilBrown 提交于
      ... and to be certain the that make_request doesn't wait forever,
      add a 'wake_up' when ->reshape_progress has been set to MaxSector
      Signed-off-by: NNeilBrown <neilb@suse.de>
      b0f9ec04
    • N
      md: remove CONFIG_MD_RAID_RESHAPE config option. · 2cffc4a0
      NeilBrown 提交于
      This was only needed when the code was experimental.  Most of it
      is well tested now, so the option is no longer useful.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      2cffc4a0
    • N
      md/raid5: be more careful about write ordering when reshaping. · ab69ae12
      NeilBrown 提交于
      When we are reshaping an array, it is very important that we read
      the data from a particular sector offset before writing new data
      at that offset.
      
      In most cases when growing or shrinking an array we read long before
      we even consider writing.  But when restriping an array without
      changing it size, there is a small possibility that we might have
      some data to available write before the read has happened at the same
      location.  This would require some stripes to be in cache already.
      
      To guard against this small possibility, we check, before writing,
      that the 'old' stripe at the same location is not in the process of
      being read.  And we ensure that we mark all 'source' stripes as such
      before allowing new 'destination' stripes to proceed.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      ab69ae12
    • N
      md/raid5: allow layout and chunksize to be changed on active array. · 88ce4930
      NeilBrown 提交于
      If an array has 3 or more devices, we allow the chunksize or layout
      to be changed and when a reshape starts, we use these as the 'new'
      values.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      88ce4930