1. 04 July, 2012 (1 commit)
  2. 03 July, 2012 (4 commits)
    • md: fix up plugging (again). · b357f04a
      Committed by NeilBrown
      The value returned by "mddev_check_plug" is only valid until the
      next 'schedule' as that will unplug things.  This could happen at any
      call to mempool_alloc.
      So just calling mddev_check_plug at the start doesn't really make
      sense.
      
      So call it just before, or just after, queuing things for the thread.
      As the action that happens at unplug is to wake the thread, this makes
      lots of sense.
      If we cannot add a plug (which requires a small GFP_ATOMIC alloc) we
      wake the thread immediately.
      
      RAID5 is a bit different.  Requests are queued for the thread and the
      thread is woken by release_stripe.  So we don't need to wake the
      thread on failure.
      However the thread doesn't perform certain actions when there is any
      active plug, so it is important to install a plug before waking the
      thread.  So for RAID5 we install the plug *before* queuing the request
      and waking the thread.
      
      Without this patch it is possible for raid1 or raid10 to queue a
      request without then waking the thread, resulting in the array locking
      up.
      
      Also change raid10 to only flush_pending_write when there are no
      active plugs, just like raid1.
      
      This patch is suitable for 3.0 or later.  I plan to submit it to
      -stable, but I'd like to let it spend a few weeks in mainline
      first to be sure it is completely safe.
      Signed-off-by: NeilBrown <neilb@suse.de>
      b357f04a
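
      A small user-space sketch of the ordering this commit describes. The
      names try_install_plug(), queue_for_thread() and wake_thread() are
      hypothetical stand-ins for mddev_check_plug(), the per-personality
      queuing code and md_wakeup_thread(); this only illustrates the control
      flow, not the kernel implementation.

      #include <stdbool.h>
      #include <stdio.h>

      static bool plug_alloc_ok = true;            /* pretend GFP_ATOMIC result */

      static bool try_install_plug(void)           { return plug_alloc_ok; }
      static void queue_for_thread(const char *w)  { printf("queued: %s\n", w); }
      static void wake_thread(void)                { printf("thread woken\n"); }

      /* raid1/raid10 pattern: queue first, then check the plug just after
       * queuing; if no plug could be installed, nothing will wake the thread
       * at unplug time, so wake it immediately. */
      static void submit_raid1_style(const char *w)
      {
          queue_for_thread(w);
          if (!try_install_plug())
              wake_thread();
      }

      /* raid5 pattern: the thread skips some work while a plug is active, so
       * install the plug *before* queuing and waking; a failed plug needs no
       * extra wake because release_stripe wakes the thread anyway. */
      static void submit_raid5_style(const char *w)
      {
          try_install_plug();
          queue_for_thread(w);
          wake_thread();
      }

      int main(void)
      {
          submit_raid1_style("raid1 write");
          plug_alloc_ok = false;                   /* simulate a failed plug alloc */
          submit_raid1_style("raid1 write (no plug)");
          submit_raid5_style("raid5 stripe");
          return 0;
      }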
    • md: make 'name' arg to md_register_thread non-optional. · 0232605d
      Committed by NeilBrown
      Having the 'name' arg optional and defaulting to the current
      personality name is not necessary and leads to errors, as when
      changing the level of an array we can end up using the
      name of the old level instead of the new one.
      
      So make it non-optional and always explicitly pass the name
      of the level that the array will be.
      Reported-by: majianpeng <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      0232605d
    • md/raid10: fix failure when trying to repair a read error. · 055d3747
      Committed by NeilBrown
      commit 58c54fcc
           md/raid10: handle further errors during fix_read_error better.
      
      in 3.1 added "r10_sync_page_io" which takes an IO size in sectors.
      But we were passing the IO size in bytes!!!
      This resulted in bio_add_page failing, an empty request being sent
      down, and a consequent BUG_ON in scsi_lib.
      
      [fix missing space in error message at same time]
      
      This fix is suitable for 3.1.y and later.
      
      Cc: stable@vger.kernel.org
      Reported-by: Christian Balzer <chibi@gol.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      055d3747
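
      A tiny sketch of the unit mix-up, assuming a hypothetical helper that,
      like the r10_sync_page_io mentioned above, takes its length in 512-byte
      sectors:

      #include <stdio.h>

      #define SECTOR_SHIFT 9                       /* 512-byte sectors */

      /* hypothetical stand-in for r10_sync_page_io(): length is in sectors */
      static void sync_page_io_sectors(unsigned int sectors)
      {
          printf("issuing IO of %u sectors (%u bytes)\n",
                 sectors, sectors << SECTOR_SHIFT);
      }

      int main(void)
      {
          unsigned int len_bytes = 4096;

          /* buggy call: 4096 is treated as 4096 sectors, i.e. 2 MiB, so
           * bio_add_page-style size checks fail and an empty request results */
          /* sync_page_io_sectors(len_bytes); */

          /* fixed call: convert bytes to sectors first */
          sync_page_io_sectors(len_bytes >> SECTOR_SHIFT);
          return 0;
      }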
    • md/raid10: Don't try to recover unmatched (and unused) chunks. · fc448a18
      Committed by NeilBrown
      If a RAID10 has an odd number of chunks - as might happen when there
      are an odd number of devices - the last chunk has no pair and so is
      not mirrored.  We don't store data there, but when recovering the last
      device in an array we try to recover that last chunk from a
      non-existent location.  This results in an error, and the recovery
      aborts.
      
      When we get to that last chunk we should just stop - there is nothing
      more to do anyway.
      
      This bug has been present since the introduction of RAID10, so the
      patch is appropriate for any -stable kernel.
      
      Cc: stable@vger.kernel.org
      Reported-by: Christian Balzer <chibi@gol.com>
      Tested-by: Christian Balzer <chibi@gol.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      fc448a18
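
      A simplified model of the loop-termination fix, assuming two copies per
      chunk pair; the real recovery code is far more involved:

      #include <stdio.h>

      int main(void)
      {
          int total_chunk_slots = 11;      /* odd: the final slot has no pair */
          int copies = 2;

          for (int slot = 0; slot < total_chunk_slots; slot += copies) {
              if (slot + copies > total_chunk_slots) {
                  /* unpaired trailing chunk: no data is stored there, so
                   * there is nothing to recover - just stop, instead of
                   * reading a non-existent mirror and aborting recovery */
                  break;
              }
              printf("recover chunk %d using its mirror in slot %d\n",
                     slot, slot + 1);
          }
          return 0;
      }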
  3. 31 May, 2012 (1 commit)
    • md: raid1/raid10: fix problem with merge_bvec_fn · aba336bd
      Committed by NeilBrown
      The new merge_bvec_fn which calls the corresponding function
      in subsidiary devices requires that mddev->merge_check_needed
      be set if any child has a merge_bvec_fn.
      
      However we were only setting that when a device was hot-added,
      not when a device was present from the start.
      
      This bug was introduced in 3.4 so patch is suitable for 3.4.y
      kernels.  However there are conflicts in raid10.c so a separate
      patch will be needed for 3.4.y.
      
      Cc: stable@vger.kernel.org
      Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      aba336bd
  4. 22 May, 2012 (5 commits)
    • md/raid10: Remove extras after reshape to smaller number of devices. · 63aced61
      Committed by NeilBrown
      When a reshape which reduced the number of devices finishes
      we must remove the extra devices.
      
      So ensure that raid10_remove_disk won't try to keep them, and
      have raid10_finish_reshape clear the 'in_sync' flag.  Then
      remove_and_add_spares will be able to remove them.
      Reported-by: Hannes Reinecke <hare@suse.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      63aced61
    • md/raid10: resize bitmap when required during reshape. · bb63a701
      Committed by NeilBrown
      If a reshape changes the size of the array, then we can now
      update the bitmap to suit - so do so.
      Signed-off-by: NeilBrown <neilb@suse.de>
      bb63a701
    • md: allow array to be resized while bitmap is present. · a4a6125a
      Committed by NeilBrown
      Now that bitmaps can be resized, we can allow an array to be resized
      while the bitmap is present.
      
      This only covers resizing that involves changing the effective size
      of member devices, not resizing that changes the number of devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
      a4a6125a
    • md/raid10: Fix memleak in r10buf_pool_alloc · 5fdd2cf8
      Committed by majianpeng
      If the allocation of rep1_bio fails, we currently don't free the 'bio'
      of the same dev.
      
      Reported by kmemleak.
      Signed-off-by: majianpeng <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      5fdd2cf8
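
      A sketch of the error-path pattern the fix restores: when the second
      (replacement) buffer for a device cannot be allocated, the first buffer
      already allocated for that same device must be freed before unwinding.
      The struct and names are illustrative, not the kernel's r10buf_pool_alloc:

      #include <stdlib.h>

      struct devbuf { void *bio; void *repl_bio; };

      static int alloc_bufs(struct devbuf *b, int ndevs)
      {
          int j;

          for (j = 0; j < ndevs; j++) {
              b[j].bio = malloc(64);
              if (!b[j].bio)
                  goto out_free;
              b[j].repl_bio = malloc(64);
              if (!b[j].repl_bio) {
                  /* the fix: also free the 'bio' of this same dev,
                   * otherwise it is leaked on this error path */
                  free(b[j].bio);
                  goto out_free;
              }
          }
          return 0;

      out_free:
          while (--j >= 0) {               /* unwind earlier devices */
              free(b[j].repl_bio);
              free(b[j].bio);
          }
          return -1;
      }

      int main(void)
      {
          struct devbuf bufs[4] = { { NULL, NULL } };

          if (alloc_bufs(bufs, 4) == 0)
              for (int j = 0; j < 4; j++) {
                  free(bufs[j].repl_bio);
                  free(bufs[j].bio);
              }
          return 0;
      }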
    • md/raid10: add reshape support · 3ea7daa5
      Committed by NeilBrown
      A 'near' or 'offset' layout RAID10 array can be reshaped to a different
      'near' or 'offset' layout, a different chunk size, and a different
      number of devices.
      However the number of copies cannot change.
      
      Unlike RAID5/6, we do not support having user-space backup data that
      is being relocated during a 'critical section'.  Rather, the
      data_offset of each device must change so that when writing any block
      to a new location, it will not over-write any data that is still
      'live'.
      
      This means that RAID10 reshape is not supportable on v0.90 metadata.
      
      The difference between the old data_offset and the new_offset must be
      at least the larger of the chunksize multiplied by offset copies of
      each of the old and new layout. (for 'near' mode, offset_copies == 1).
      
      A larger difference of around 64M seems useful for in-place reshapes
      as more data can be moved between metadata updates.
      Very large differences (e.g. 512M) seem to slow the process down due
      to lots of long seeks (on oldish consumer-grade devices at least).
      
      Metadata needs to be updated whenever the place we are about to write
      to is considered - by the current metadata - to still contain data in
      the old layout.
      
      [unbalanced locking fix from Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NeilBrown <neilb@suse.de>
      3ea7daa5
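
      The offset constraint is plain arithmetic and can be sketched directly;
      a minimal illustration with made-up numbers, assuming offset_copies is 1
      for 'near' layouts as the message states:

      #include <stdio.h>

      static unsigned long long min_offset_diff(unsigned long long old_chunk,
                                                unsigned old_offset_copies,
                                                unsigned long long new_chunk,
                                                unsigned new_offset_copies)
      {
          unsigned long long a = old_chunk * old_offset_copies;
          unsigned long long b = new_chunk * new_offset_copies;

          return a > b ? a : b;            /* the larger of the two products */
      }

      int main(void)
      {
          /* e.g. reshaping from 512K chunks in a 'near' layout (offset
           * copies == 1) to 1M chunks in an 'offset' layout with 2 copies */
          printf("minimum data_offset change: %lluK\n",
                 min_offset_diff(512, 1, 1024, 2));
          return 0;
      }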
  5. 21 May, 2012 (4 commits)
    • md/raid10: split out interpretation of layout to separate function. · deb200d0
      Committed by NeilBrown
      We will soon be interpreting the layout (and chunksize etc) from
      multiple places to support reshape.  So split it out into a separate
      function.
      Signed-off-by: NeilBrown <neilb@suse.de>
      deb200d0
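
      A sketch of what such a dedicated interpretation function can look like,
      assuming the usual md RAID10 layout encoding (near copies in the low 8
      bits, far copies in the next 8, an 'offset' flag in bit 16); the struct
      and function names are illustrative, not the kernel's:

      #include <stdio.h>

      struct geom {
          int near_copies;
          int far_copies;
          int far_offset;                  /* 'offset' layout flag */
      };

      static int setup_geometry(struct geom *geo, int layout)
      {
          geo->near_copies = layout & 255;
          geo->far_copies  = (layout >> 8) & 255;
          geo->far_offset  = !!(layout & (1 << 16));

          /* minimal sanity check only; the real driver validates much more */
          if (geo->near_copies < 1 || geo->far_copies < 1)
              return -1;
          return 0;
      }

      int main(void)
      {
          struct geom g;

          if (setup_geometry(&g, 0x102) == 0)      /* a typical "n2" layout */
              printf("near=%d far=%d offset=%d\n",
                     g.near_copies, g.far_copies, g.far_offset);
          return 0;
      }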
    • md/raid10: Introduce 'prev' geometry to support reshape. · f8c9e74f
      Committed by NeilBrown
      When RAID10 supports reshape it will need a 'previous' and a 'current'
      geometry, so introduce that here.
      Use the 'prev' geometry when before the reshape_position, and the
      current 'geo' when beyond it.  At other times, use both as
      appropriate.
      
      For now, both are identical (and reshape_position is never set).
      
      When we use the 'prev' geometry, we must use the old data_offset.
      When we use the current (and a reshape is happening) we must use
      the new_data_offset.
      Signed-off-by: NeilBrown <neilb@suse.de>
      f8c9e74f
    • md/raid10: collect some geometry fields into a dedicated structure. · 5cf00fcd
      Committed by NeilBrown
      We will shortly be adding reshape support for RAID10 which will
      require it to have two concurrent geometries (before and after).
      To make that easier, collect most geometry fields into 'struct geom'
      and access them from there.  Then we will more easily be able to add
      a second set of fields.
      
      Note that 'copies' is not in this struct and so cannot be changed.
      There is little need to change this number and doing so is a lot
      more difficult as it requires reallocating more things.
      So leave it out for now.
      Signed-off-by: NeilBrown <neilb@suse.de>
      5cf00fcd
    • md: add possibility to change data-offset for devices. · c6563a8c
      Committed by NeilBrown
      When reshaping we can avoid costly intermediate backup by
      changing the 'start' address of the array on the device
      (if there is enough room).
      
      So as a first step, allow such a change to be requested
      through sysfs, and recorded in v1.x metadata.
      
      (As we didn't previously check that all 'pad' fields were zero,
       we need a new FEATURE flag for this.
       We (belatedly) check that all remaining 'pad' fields are
       zero to avoid a repeat of this.)
      
      The new data offset must be requested separately for each device.
      This allows each to have a different change in the data offset.
      This is not likely to be used often but as data_offset can be
      set per-device, new_data_offset should be too.
      
      This patch also removes the 'acknowledged' arg to rdev_set_badblocks as
      it is never used and never will be.  At the same time we add a new
      arg ('in_new') which is currently always zero but will be used more
      soon.
      
      When a reshape finishes we will need to update the data_offset
      and rdev->sectors.  So provide an exported function to do that.
      Signed-off-by: NeilBrown <neilb@suse.de>
      c6563a8c
  6. 19 May, 2012 (1 commit)
    • md/raid10: fix transcription error in calc_sectors conversion. · b0d634d5
      Committed by NeilBrown
      The old code was
              sector_div(stride, fc);
      the new code was
              sector_div(size, conf->near_copies);

      'size' is right (the stride variable wasn't really needed), but
      'fc' means 'far_copies', and that is an important difference.
      
      Signed-off-by: NeilBrown <neilb@suse.de>       
      b0d634d5
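
      Why the transcription matters can be shown with plain integers:
      sector_div divides its first argument in place, so dividing by
      near_copies instead of far_copies gives a different result whenever the
      two values differ (made-up numbers below):

      #include <stdio.h>

      int main(void)
      {
          unsigned long long size = 1000000;       /* total sectors, made up */
          int near_copies = 1, far_copies = 2;     /* e.g. an 'f2' layout */

          /* the transcription error divided by near_copies ... */
          unsigned long long wrong = size / near_copies;   /* 1000000 */
          /* ... while the old code (and the fix) divide by far_copies */
          unsigned long long right = size / far_copies;    /*  500000 */

          printf("by near_copies: %llu, by far_copies: %llu\n", wrong, right);
          return 0;
      }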
  7. 17 May, 2012 (1 commit)
    • md/raid10: set dev_sectors properly when resizing devices in array. · 6508fdbf
      Committed by NeilBrown
      raid10 stores dev_sectors in 'conf' separately from the one in
      'mddev' because it can have a very significant effect on block
      addressing and so needs to be updated carefully.
      
      However raid10_resize isn't updating it at all!
      
      To update it correctly, we need to make sure it is a proper
      multiple of the chunksize taking various details of the layout
      in to account.
      This calculation is currently done in setup_conf.   So split it
      out from there and call it from raid10_resize as well.
      Then set conf->dev_sectors properly.
      Signed-off-by: NeilBrown <neilb@suse.de>
      6508fdbf
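
      A rough sketch of the kind of sizing a shared calc_sectors-style helper
      performs: keep only whole chunks per device and account for the number
      of far copies each device must hold. An illustration under those stated
      assumptions only, not the driver's actual formula:

      #include <stdio.h>

      static unsigned long long usable_dev_sectors(unsigned long long dev_sectors,
                                                   unsigned chunk_sectors,
                                                   unsigned far_copies)
      {
          unsigned long long size = dev_sectors / far_copies; /* per far copy  */

          size -= size % chunk_sectors;                       /* whole chunks  */
          return size * far_copies;                           /* raw dev space */
      }

      int main(void)
      {
          /* just over 1 TiB, 512 KiB chunks (1024 sectors), 2 far copies */
          printf("usable: %llu sectors\n",
                 usable_dev_sectors(2147484000ULL, 1024, 2));
          return 0;
      }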
  8. 12 April, 2012 (1 commit)
  9. 03 April, 2012 (1 commit)
  10. 19 March, 2012 (5 commits)
    • md/raid10 - support resizing some RAID10 arrays. · 006a09a0
      Committed by NeilBrown
      'resizing' an array in this context means making use of extra
      space that has become available in component devices, not adding new
      devices.
      It also includes shrinking the array to take up less space on
      component devices.
      
      This is not supported for arrays with a 'far' layout.  However
      for 'near' and 'offset' layout arrays, adding and removing space at
      the end of the devices is easy to support, and this patch provides
      that support.
      Signed-off-by: NeilBrown <neilb@suse.de>
      006a09a0
    • md/raid10: handle merge_bvec_fn in member devices. · 050b6615
      Committed by NeilBrown
      Currently we don't honour merge_bvec_fn in member devices so if there
      is one, we force all requests to be single-page at most.
      This is not ideal.
      
      So enhance the raid10 merge_bvec_fn to check that function in children
      as well.
      
      This introduces a small problem.  There is no locking around calls
      to ->merge_bvec_fn and subsequent calls to ->make_request.  So a
      device added between these could end up getting a request which
      violates its merge_bvec_fn.
      
      Currently the best we can do is synchronize_sched().  This will work
      providing no preemption happens.  If there is preemption, we just
      have to hope that new devices are largely consistent with old devices.
      Signed-off-by: NeilBrown <neilb@suse.de>
      050b6615
    • md: tidy up rdev_for_each usage. · dafb20fa
      Committed by NeilBrown
      md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
      mddev.  However it uses the 'safe' version of list_for_each_entry,
      and so requires the extra variable, but doesn't include 'safe' in the
      name, which is useful documentation.
      
      Consequently some places use this safe version without needing it, and
      many use an explicit list_for_each_entry.
      
      So:
       - rename rdev_for_each to rdev_for_each_safe
       - create a new rdev_for_each which uses the plain
         list_for_each_entry,
       - use the 'safe' version only where needed, and convert all other
         list_for_each_entry calls to use rdev_for_each.
      Signed-off-by: NeilBrown <neilb@suse.de>
      dafb20fa
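
      The distinction can be illustrated with a user-space analogy: a plain
      iterator that must not be used if the current entry may be removed
      inside the loop, and a '_safe' variant that caches the next pointer
      first (hence the extra variable). This uses a simple singly linked
      list, not the kernel's list.h:

      #include <stdio.h>

      struct node { int val; struct node *next; };

      #define for_each(pos, head) \
          for ((pos) = (head); (pos); (pos) = (pos)->next)

      #define for_each_safe(pos, tmp, head) \
          for ((pos) = (head), (tmp) = (pos) ? (pos)->next : NULL; \
               (pos); \
               (pos) = (tmp), (tmp) = (pos) ? (pos)->next : NULL)

      int main(void)
      {
          struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
          struct node *head = &a, *pos, *tmp;

          for_each(pos, head)                      /* read-only walk */
              printf("%d ", pos->val);
          printf("\n");

          for_each_safe(pos, tmp, head)            /* safe if 'pos' is unlinked */
              if (pos->val == 2)
                  a.next = pos->next;              /* unlink node 'b' */

          for_each(pos, head)
              printf("%d ", pos->val);
          printf("\n");
          return 0;
      }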
    • md/raid1,raid10: avoid deadlock during resync/recovery. · d6b42dcb
      Committed by NeilBrown
      If RAID1 or RAID10 is used under LVM or some other stacking
      block device, it is possible to enter a deadlock during
      resync or recovery.
      This can happen if the upper level block device creates
      two requests to the RAID1 or RAID10.  The first request gets
      processed, blocks recovery and queues requests for the underlying
      devices in current->bio_list.  A resync request then starts
      which will wait for those requests and block new IO.
      
      But then the second request to the RAID1/10 will be attempted
      and it cannot progress until the resync request completes,
      which cannot progress until the underlying device requests complete,
      which are on a queue behind that second request.
      
      So allow that second request to proceed even though there is
      a resync request about to start.
      
      This is suitable for any -stable kernel.
      
      Cc: stable@vger.kernel.org
      Reported-by: Ray Morris <support@bettercgi.com>
      Tested-by: Ray Morris <support@bettercgi.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      d6b42dcb
    • md: allow re-add to failed arrays. · dc10c643
      Committed by NeilBrown
      When an array is failed (some data inaccessible) then there is no
      point attempting to add a spare as it could not possibly be recovered.
      
      However there may be value in re-adding a recently removed device.
      e.g. if there is a write-intent-bitmap and it is clear, then access
      to the data could be restored by this action.
      
      So don't reject a re-add to a failed array for RAID10 and RAID5 (the
      only array types that check for a failed array).
      Signed-off-by: NeilBrown <neilb@suse.de>
      dc10c643
  11. 13 March, 2012 (1 commit)
  12. 06 March, 2012 (1 commit)
  13. 14 February, 2012 (1 commit)
    • md/raid10: fix handling of error on last working device in array. · fae8cc5e
      Committed by NeilBrown
      If we get a read error on the last working device in a RAID10 which
      contains the target block, then we don't fail the device (which is
      good) but we don't abort retries, which is wrong.
      We end up in an infinite loop retrying the read on the one device.
      
      This patch fixes the problem in two places:
      1/ in raid10_end_read_request we don't even ask for a retry if this
         was the last usable device.  This is efficient but a little racy
         and will sometimes retry when it should not.
      
      2/ in handle_read_error we are careful to exclude any device from
         retry which we tried to mark as faulty (that might have failed if
         it was the last device).  This is race-free but less efficient.
      Signed-off-by: NeilBrown <neilb@suse.de>
      fae8cc5e
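
      A sketch of the termination idea in the fix: retry a failed read on
      another mirror only when one exists; if the device that failed held the
      last usable copy of the block, return the error instead of retrying the
      same device forever. Hypothetical structures, not the raid10 code:

      #include <stdbool.h>
      #include <stdio.h>

      struct mirror { bool in_sync; bool faulty; };

      static int usable_mirrors(const struct mirror *m, int n)
      {
          int count = 0;

          for (int i = 0; i < n; i++)
              if (m[i].in_sync && !m[i].faulty)
                  count++;
          return count;
      }

      static int handle_read_error(struct mirror *m, int n, int failed)
      {
          if (usable_mirrors(m, n) <= 1) {
              /* 'failed' holds the last working copy: don't loop forever
               * retrying it - report the error upward instead */
              printf("read error on last working device: giving up\n");
              return -1;
          }
          m[failed].faulty = true;         /* exclude it from further retries */
          printf("retrying read on another mirror\n");
          return 0;
      }

      int main(void)
      {
          struct mirror m[2] = { { true, false }, { false, false } };

          handle_read_error(m, 2, 0);      /* device 0 is the only usable copy */
          return 0;
      }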
  14. 23 December, 2011 (11 commits)
  15. 01 November, 2011 (1 commit)
  16. 31 October, 2011 (1 commit)
    • md/raid10: Fix bug when activating a hot-spare. · 7fcc7c8a
      Committed by NeilBrown
      This is a fairly serious bug in RAID10.
      
      When a RAID10 array is degraded and a hot-spare is activated, the
      spare does not take up the empty slot, but rather replaces the first
      working device.
      This is likely to make the array non-functional.   It would normally
      be possible to recover the data, but that would need care and is not
      guaranteed.
      
      This bug was introduced in commit
         2bb77736
      which first appeared in 3.1.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      7fcc7c8a
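
      A minimal model of the slot-selection bug: when activating a spare, the
      search must pick the first empty slot rather than simply slot 0, which
      normally still belongs to a working device. Purely illustrative:

      #include <stdio.h>

      #define SLOTS 4

      int main(void)
      {
          /* 1 = slot held by a working device, 0 = empty (failed/removed) */
          int occupied[SLOTS] = { 1, 1, 0, 1 };
          int chosen = -1;

          for (int slot = 0; slot < SLOTS; slot++)
              if (!occupied[slot]) {       /* correct: skip working devices */
                  chosen = slot;
                  break;
              }

          printf("spare activated in slot %d\n", chosen);   /* slot 2, not 0 */
          return 0;
      }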