1. 22 May 2012 (2 commits)
    • md/raid10: Fix memleak in r10buf_pool_alloc · 5fdd2cf8
      committed by majianpeng
      If the allocation of rep1_bio fails, we currently don't free the 'bio'
      of the same dev.
      
      Reported by kmemleak.
      Signed-off-by: majianpeng <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: add reshape support · 3ea7daa5
      committed by NeilBrown
      A 'near' or 'offset' layout RAID10 array can be reshaped to a different
      'near' or 'offset' layout, a different chunk size, and a different
      number of devices.
      However the number of copies cannot change.
      
      Unlike RAID5/6, we do not support having user-space backup data that
      is being relocated during a 'critical section'.  Rather, the
      data_offset of each device must change so that when writing any block
      to a new location, it will not over-write any data that is still
      'live'.
      
      This means that RAID10 reshape is not supportable on v0.90 metadata.
      
      The difference between the old data_offset and the new data_offset
      must be at least the larger of the chunk size multiplied by offset
      copies for each of the old and new layouts (for 'near' mode,
      offset_copies == 1).
      
      A larger difference of around 64M seems useful for in-place reshapes
      as more data can be moved between metadata updates.
      Very large differences (e.g. 512M) seem to slow the process down due
      to lots of long seeks (on oldish consumer-grade devices at least).
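      The minimum-difference rule above can be sketched as a small helper. This is an illustrative user-space sketch with hypothetical names, not the kernel's actual reshape code; sizes are in sectors.

```c
/* Illustrative sketch (not the kernel's code) of the rule above: the
 * change in data_offset must be at least the larger of
 * (chunk size * offset copies) under the old and the new layout.
 * For 'near' mode, offset_copies == 1. */
static unsigned long long
min_offset_diff(unsigned long long old_chunk, unsigned old_offset_copies,
                unsigned long long new_chunk, unsigned new_offset_copies)
{
        unsigned long long old_need = old_chunk * old_offset_copies;
        unsigned long long new_need = new_chunk * new_offset_copies;

        return old_need > new_need ? old_need : new_need;
}
```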
      
      Metadata needs to be updated whenever the place we are about to write
      to is considered - by the current metadata - to still contain data in
      the old layout.
      
      [unbalanced locking fix from Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NeilBrown <neilb@suse.de>
  2. 21 May 2012 (4 commits)
    • md/raid10: split out interpretation of layout to separate function. · deb200d0
      committed by NeilBrown
      We will soon be interpreting the layout (and chunk size etc.) from
      multiple places to support reshape, so split it out into a separate
      function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: Introduce 'prev' geometry to support reshape. · f8c9e74f
      committed by NeilBrown
      When RAID10 supports reshape it will need a 'previous' and a 'current'
      geometry, so introduce that here.
      Use the 'prev' geometry when before the reshape_position, and the
      current 'geo' when beyond it.  At other times, use both as
      appropriate.
      
      For now, both are identical (and reshape_position is never set).
      
      When we use the 'prev' geometry, we must use the old data_offset.
      When we use the current (and a reshape is happening) we must use
      the new_data_offset.
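      The selection rule can be sketched as below. This is a hypothetical simplification; the real code also has to handle the direction of the reshape and the window of sectors currently in flight.

```c
/* Hypothetical sketch of the geometry-selection rule described in the
 * commit: use the 'prev' geometry for sectors before reshape_position,
 * and the current geometry at or beyond it. */
enum which_geo { GEO_PREV, GEO_CURRENT };

static enum which_geo choose_geo(unsigned long long sector,
                                 unsigned long long reshape_position)
{
        return sector < reshape_position ? GEO_PREV : GEO_CURRENT;
}
```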
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: collect some geometry fields into a dedicated structure. · 5cf00fcd
      committed by NeilBrown
      We will shortly be adding reshape support for RAID10 which will
      require it having 2 concurrent geometries (before and after).
      To make that easier, collect most geometry fields into 'struct geom'
      and access them from there.  Then we will more easily be able to add
      a second set of fields.
      
      Note that 'copies' is not in this struct and so cannot be changed.
      There is little need to change this number and doing so is a lot
      more difficult as it requires reallocating more things.
      So leave it out for now.
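      The grouping might look roughly like this. The field list here is abridged and the names are modeled on raid10 but not guaranteed to match the actual commit; note that 'copies' deliberately stays outside the struct, as the commit explains.

```c
/* Stand-in for the kernel's sector type in this user-space sketch. */
typedef unsigned long long sector_t;

/* Rough sketch of the idea: gather the layout-dependent fields into
 * one struct so that a 'prev' and a 'current' copy can coexist
 * during a reshape.  'copies' is intentionally not included. */
struct geom {
        int             raid_disks;
        int             near_copies;
        int             far_copies;
        int             far_offset;
        sector_t        stride;         /* distance between 'far' copies */
        int             chunk_shift;    /* log2(chunk size in sectors) */
        sector_t        chunk_mask;     /* chunk size in sectors, minus 1 */
};
```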
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: add possibility to change data-offset for devices. · c6563a8c
      committed by NeilBrown
      When reshaping we can avoid costly intermediate backup by
      changing the 'start' address of the array on the device
      (if there is enough room).
      
      So as a first step, allow such a change to be requested
      through sysfs, and recorded in v1.x metadata.
      
      (As we didn't previously check that all 'pad' fields were zero,
       we need a new FEATURE flag for this.
       We (belatedly) check that all remaining 'pad' fields are
       zero, to avoid a repeat of this.)
      
      The new data offset must be requested separately for each device.
      This allows each to have a different change in the data offset.
      This is not likely to be used often but as data_offset can be
      set per-device, new_data_offset should be too.
      
      This patch also removes the 'acknowledged' arg to rdev_set_badblocks as
      it is never used and never will be.  At the same time we add a new
      arg ('in_new') which is currently always zero but will be used more
      soon.
      
      When a reshape finishes we will need to update the data_offset
      and rdev->sectors.  So provide an exported function to do that.
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 19 May 2012 (1 commit)
    • md/raid10: fix transcription error in calc_sectors conversion. · b0d634d5
      committed by NeilBrown
      The old code was
      		sector_div(stride, fc);
      the new code was
      		sector_div(size, conf->near_copies);
      
      'size' is right (the stride variable wasn't really needed), but
      'fc' means 'far_copies', and that is an important difference.
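      For reference, sector_div() divides its first argument in place and returns the remainder. A user-space stand-in (not the kernel macro itself) makes the semantics concrete; with it, dividing by near_copies versus far_copies plainly gives different results whenever the two counts differ.

```c
/* User-space stand-in for the kernel's sector_div(): divides the
 * 64-bit dividend in place and returns the remainder. */
static unsigned int sector_div_sketch(unsigned long long *dividend,
                                      unsigned int divisor)
{
        unsigned int rem = (unsigned int)(*dividend % divisor);

        *dividend /= divisor;
        return rem;
}
```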
      
      Signed-off-by: NeilBrown <neilb@suse.de>
  4. 17 May 2012 (1 commit)
    • md/raid10: set dev_sectors properly when resizing devices in array. · 6508fdbf
      committed by NeilBrown
      raid10 stores dev_sectors in 'conf' separately from the one in
      'mddev' because it can have a very significant effect on block
      addressing and so needs to be updated carefully.
      
      However raid10_resize isn't updating it at all!
      
      To update it correctly, we need to make sure it is a proper
      multiple of the chunksize taking various details of the layout
      in to account.
      This calculation is currently done in setup_conf.  So split it
      out from there and call it from raid10_resize as well.
      Then set conf->dev_sectors properly.
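      The core of that calculation is trimming the available size down to a whole number of chunks. A hypothetical helper (not the kernel's calc_sectors, which also accounts for layout details) illustrates the rounding:

```c
/* Hypothetical sketch: trim a device size (in sectors) down to a
 * whole number of chunks, as a dev_sectors calculation must. */
static unsigned long long round_to_chunk(unsigned long long sectors,
                                         unsigned int chunk_sectors)
{
        return sectors - (sectors % chunk_sectors);
}
```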
      Signed-off-by: NeilBrown <neilb@suse.de>
  5. 12 Apr 2012 (1 commit)
  6. 03 Apr 2012 (1 commit)
  7. 19 Mar 2012 (5 commits)
    • md/raid10 - support resizing some RAID10 arrays. · 006a09a0
      committed by NeilBrown
      'resizing' an array in this context means making use of extra
      space that has become available in component devices, not adding new
      devices.
      It also includes shrinking the array to take up less space of
      component devices.
      
      This is not supported for arrays with a 'far' layout.  However
      for 'near' and 'offset' layout arrays, adding and removing space at
      the end of the devices is easy to support, and this patch provides
      that support.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: handle merge_bvec_fn in member devices. · 050b6615
      committed by NeilBrown
      Currently we don't honour merge_bvec_fn in member devices so if there
      is one, we force all requests to be single-page at most.
      This is not ideal.
      
      So enhance the raid10 merge_bvec_fn to check that function in children
      as well.
      
      This introduces a small problem.  There is no locking around calls
      to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
      device added between these could end up getting a request which
      violates its merge_bvec_fn.
      
      Currently the best we can do is synchronize_sched().  This will work
      providing no preemption happens.  If there is preemption, we just
      have to hope that new devices are largely consistent with old devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: tidy up rdev_for_each usage. · dafb20fa
      committed by NeilBrown
      md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
      mddev.  However it uses the 'safe' version of list_for_each_entry,
      and so requires the extra variable, but doesn't include 'safe' in the
      name, which is useful documentation.
      
      Consequently some places use this safe version without needing it, and
      many use an explicit list_for_each_entry.
      
      So:
       - rename rdev_for_each to rdev_for_each_safe
       - create a new rdev_for_each which uses the plain
         list_for_each_entry,
       - use the 'safe' version only where needed, and convert all other
         list_for_each_entry calls to use rdev_for_each.
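      The distinction matters when the loop body may free the node it is visiting: the safe form caches the next pointer before the body runs. A toy user-space list (not the kernel's list.h) shows both shapes:

```c
#include <stddef.h>

/* Toy singly-linked list illustrating plain vs. '_safe' iteration.
 * These macros are a sketch, not the kernel's list_for_each_entry. */
struct node {
        int val;
        struct node *next;
};

/* Plain iteration: fine when no node is removed inside the loop. */
#define node_for_each(pos, head) \
        for ((pos) = (head); (pos); (pos) = (pos)->next)

/* Safe iteration: 'n' caches the next pointer before the body runs,
 * so the body may remove or free 'pos'. */
#define node_for_each_safe(pos, n, head) \
        for ((pos) = (head), (n) = (pos) ? (pos)->next : NULL; \
             (pos); \
             (pos) = (n), (n) = (pos) ? (pos)->next : NULL)
```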
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1,raid10: avoid deadlock during resync/recovery. · d6b42dcb
      committed by NeilBrown
      If RAID1 or RAID10 is used under LVM or some other stacking
      block device, it is possible to enter a deadlock during
      resync or recovery.
      This can happen if the upper level block device creates
      two requests to the RAID1 or RAID10.  The first request gets
      processed, blocks recovery and queues requests for the underlying
      devices in current->bio_list.  A resync request then starts
      which will wait for those requests and block new IO.
      
      But then the second request to the RAID1/10 will be attempted
      and it cannot progress until the resync request completes,
      which cannot progress until the underlying device requests complete,
      which are on a queue behind that second request.
      
      So allow that second request to proceed even though there is
      a resync request about to start.
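      The shape of the fix can be condensed into a predicate: a request should only wait for the pending resync barrier if its submitter is not already holding queued bios. The names below are hypothetical; the actual test lives in the drivers' barrier/wait logic.

```c
#include <stdbool.h>

/* Hypothetical condensation of the deadlock-avoidance rule: if the
 * caller already has bios queued for underlying devices, let it
 * proceed even though a resync barrier is about to start. */
static bool must_wait_for_barrier(bool barrier_pending,
                                  bool caller_has_queued_bios)
{
        return barrier_pending && !caller_has_queued_bios;
}
```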
      
      This is suitable for any -stable kernel.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ray Morris <support@bettercgi.com>
      Tested-by: Ray Morris <support@bettercgi.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: allow re-add to failed arrays. · dc10c643
      committed by NeilBrown
      When an array is failed (some data inaccessible) then there is no
      point attempting to add a spare as it could not possibly be recovered.
      
      However there may be value in re-adding a recently removed device.
      e.g. if there is a write-intent-bitmap and it is clear, then access
      to the data could be restored by this action.
      
      So don't reject a re-add to a failed array for RAID10 and RAID5 (the
      only array types that check for a failed array).
      Signed-off-by: NeilBrown <neilb@suse.de>
  8. 13 Mar 2012 (1 commit)
  9. 06 Mar 2012 (1 commit)
  10. 14 Feb 2012 (1 commit)
    • md/raid10: fix handling of error on last working device in array. · fae8cc5e
      committed by NeilBrown
      If we get a read error on the last working device in a RAID10 which
      contains the target block, then we don't fail the device (which is
      good) but we don't abort retries, which is wrong.
      We end up in an infinite loop retrying the read on the one device.
      
      This patch fixes the problem in two places:
      1/ in raid10_end_read_request we don't even ask for a retry if this
         was the last usable device.  This is efficient but a little racy
         and will sometimes retry when it should not.
      
      2/ in handle_read_error we are careful to exclude any device from
         retry which we tried to mark as faulty (that might have failed if
         it was the last device).  This is race-free but less efficient.
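      Both checks reduce to the same rule for illustration (hypothetical names, not the driver's code): a failed read is only worth retrying if the failing device was not the last usable one holding the block.

```c
#include <stdbool.h>

/* Hypothetical condensation of the two fixes above: never queue a
 * retry when the device that returned the read error was the last
 * usable device containing the target block, since retrying there
 * can only loop forever. */
static bool should_retry_read(bool read_failed, bool was_last_usable_device)
{
        return read_failed && !was_last_usable_device;
}
```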
      Signed-off-by: NeilBrown <neilb@suse.de>
  11. 23 Dec 2011 (11 commits)
  12. 01 Nov 2011 (1 commit)
  13. 31 Oct 2011 (1 commit)
    • md/raid10: Fix bug when activating a hot-spare. · 7fcc7c8a
      committed by NeilBrown
      This is a fairly serious bug in RAID10.
      
      When a RAID10 array is degraded and a hot-spare is activated, the
      spare does not take up the empty slot, but rather replaces the first
      working device.
      This is likely to make the array non-functional.  It would normally
      be possible to recover the data, but that would need care and is not
      guaranteed.
      
      This bug was introduced in commit
         2bb77736
      which first appeared in 3.1.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
  14. 26 Oct 2011 (1 commit)
    • md: Fix some bugs in recovery_disabled handling. · d890fa2b
      committed by NeilBrown
      In 3.0 we changed the way recovery_disabled was handled so that instead
      of testing against zero, we test an mddev-> value against a conf->
      value.
      Two problems:
        1/ one place in raid1 was missed and still sets to '1'.
        2/ We didn't explicitly set the conf-> value at array creation
           time.
           It defaulted to '0' just like the mddev value does so they
           could appear equal and thus disable recovery.
           This did not affect normal 'md' as it calls bind_rdev_to_array
           which changes the mddev value.  However the dmraid interface
           doesn't call this and so doesn't change ->recovery_disabled; so at
           array start all recovery is incorrectly disabled.
      
      So initialise the 'conf' value to one less than the mddev value, so
      they will only be the same when explicitly set that way.
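      A toy sketch of that initialisation and the comparison it feeds (struct and field names abridged and hypothetical):

```c
/* Toy sketch of the recovery_disabled handshake: recovery is treated
 * as disabled only when the conf value equals the mddev value.
 * Initialising conf to (mddev - 1) guarantees they differ until a
 * failure path deliberately copies mddev's value across. */
struct toy_mddev { int recovery_disabled; };
struct toy_conf  { int recovery_disabled; };

static void toy_setup_conf(struct toy_conf *conf,
                           const struct toy_mddev *mddev)
{
        conf->recovery_disabled = mddev->recovery_disabled - 1;
}

static int recovery_is_disabled(const struct toy_conf *conf,
                                const struct toy_mddev *mddev)
{
        return conf->recovery_disabled == mddev->recovery_disabled;
}
```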
      Reported-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
  15. 11 Oct 2011 (8 commits)