1. 14 6月, 2013 3 次提交
    • J
      DM RAID: Fix raid_resume not reviving failed devices in all cases · a4dc163a
      Jonathan Brassow 提交于
      DM RAID:  Fix raid_resume not reviving failed devices in all cases
      
      When a device fails in a RAID array, it is marked as Faulty.  Later,
      md_check_recovery is called which (through the call chain) calls
      'hot_remove_disk' in order to have the personalities remove the device
      from use in the array.
      
      Sometimes, it is possible for the array to be suspended before the
      personalities get their chance to perform 'hot_remove_disk'.  This is
      normally not an issue.  If the array is deactivated, then the failed
      device will be noticed when the array is reinstantiated.  If the
      array is resumed and the disk is still missing, md_check_recovery will
      be called upon resume and 'hot_remove_disk' will be called at that
      time.  However, (for dm-raid) if the device has been restored,
      a resume on the array would cause it to attempt to revive the device
      by calling 'hot_add_disk'.  If 'hot_remove_disk' had not been called,
      a situation is then created where the device is thought to concurrently
      be the replacement and the device to be replaced.  Thus, the device
      is first sync'ed with the rest of the array (because it is the replacement
      device) and then marked Faulty and removed from the array (because
      it is also the device being replaced).
      
      The solution is to check and see if the device had properly been removed
      before the array was suspended.  This is done by seeing whether the
      device's 'raid_disk' field is -1 - a condition that implies that
      'md_check_recovery -> remove_and_add_spares (where raid_disk is set to -1)
      -> hot_remove_disk' has been called.  If 'raid_disk' is not -1, then
      'hot_remove_disk' must be called to complete the removal of the previously
      faulty device before it can be revived via 'hot_add_disk'.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      a4dc163a
    • J
      DM RAID: Break-up untidy function · f381e71b
      Jonathan Brassow 提交于
      DM RAID:  Break-up untidy function
      
      Clean-up excessive indentation by moving some code in raid_resume()
      into its own function.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f381e71b
    • J
      DM RAID: Add ability to restore transiently failed devices on resume · 9092c02d
      Jonathan Brassow 提交于
      DM RAID: Add ability to restore transiently failed devices on resume
      
      This patch adds code to the resume function to check over the devices
      in the RAID array.  If any are found to be marked as failed and their
      superblocks can be read, an attempt is made to reintegrate them into
      the array.  This allows the user to refresh the array with a simple
      suspend and resume of the array - rather than having to load a
      completely new table, allocate and initialize all the structures and
      throw away the old instantiation.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      9092c02d
  2. 24 4月, 2013 1 次提交
    • J
      DM RAID: Add message/status support for changing sync action · be83651f
      Jonathan Brassow 提交于
      DM RAID:  Add message/status support for changing sync action
      
      This patch adds a message interface to dm-raid to allow the user to more
      finely control the sync actions being performed by the MD driver.  This
      gives the user the ability to initiate "check" and "repair" (i.e. scrubbing).
      Two additional fields have been appended to the status output to provide more
      information about the type of sync action occurring and the results of those
      actions, specifically: <sync_action> and <mismatch_cnt>.  These new fields
      will always be populated.  This is essentially the device-mapper way of doing
      what MD controls through the 'sync_action' sysfs file and shows through the
      'mismatch_cnt' sysfs file.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      be83651f
  3. 02 3月, 2013 2 次提交
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon 提交于
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      55a62eef
    • M
      dm: fix truncated status strings · fd7c092e
      Mikulas Patocka 提交于
      Avoid returning a truncated table or status string instead of setting
      the DM_BUFFER_FULL_FLAG when the last target of a table fills the
      buffer.
      
      When processing a table or status request, the function retrieve_status
      calls ti->type->status. If ti->type->status returns non-zero,
      retrieve_status assumes that the buffer overflowed and sets
      DM_BUFFER_FULL_FLAG.
      
      However, targets don't return non-zero values from their status method
      on overflow. Most targets returns always zero.
      
      If a buffer overflow happens in a target that is not the last in the
      table, it gets noticed during the next iteration of the loop in
      retrieve_status; but if a buffer overflow happens in the last target, it
      goes unnoticed and erroneously truncated data is returned.
      
      In the current code, the targets behave in the following way:
      * dm-crypt returns -ENOMEM if there is not enough space to store the
        key, but it returns 0 on all other overflows.
      * dm-thin returns errors from the status method if a disk error happened.
        This is incorrect because retrieve_status doesn't check the error
        code, it assumes that all non-zero values mean buffer overflow.
      * all the other targets always return 0.
      
      This patch changes the ti->type->status function to return void (because
      most targets don't use the return code). Overflow is detected in
      retrieve_status: if the status method fills up the remaining space
      completely, it is assumed that buffer overflow happened.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fd7c092e
  4. 26 2月, 2013 1 次提交
    • J
      DM RAID: Add support for MD's RAID10 "far" and "offset" algorithms · fe5d2f4a
      Jonathan Brassow 提交于
      DM RAID:  Add support for MD's RAID10 "far" and "offset" algorithms
      
      Until now, dm-raid.c only supported the "near" algorthm of MD's RAID10
      implementation.  This patch adds support for the "far" and "offset"
      algorithms, but only with the improved redundancy that is brought with
      the introduction of the 'use_far_sets' bit, which shifts copied stripes
      according to smaller sets vs the entire array.  That is, the 17th bit
      of the 'layout' variable that defines the RAID10 implementation will
      always be set.   (More information on how the 'layout' variable selects
      the RAID10 algorithm can be found in the opening comments of
      drivers/md/raid10.c.)
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      fe5d2f4a
  5. 24 1月, 2013 1 次提交
    • J
      DM-RAID: Fix RAID10's check for sufficient redundancy · 55ebbb59
      Jonathan Brassow 提交于
      Before attempting to activate a RAID array, it is checked for sufficient
      redundancy.  That is, we make sure that there are not too many failed
      devices - or devices specified for rebuild - to undermine our ability to
      activate the array.  The current code performs this check twice - once to
      ensure there were not too many devices specified for rebuild by the user
      ('validate_rebuild_devices') and again after possibly experiencing a failure
      to read the superblock ('analyse_superblocks').  Neither of these checks are
      sufficient.  The first check is done properly but with insufficient
      information about the possible failure state of the devices to make a good
      determination if the array can be activated.  The second check is simply
      done wrong in the case of RAID10 because it doesn't account for the
      independence of the stripes (i.e. mirror sets).  The solution is to use the
      properly written check ('validate_rebuild_devices'), but perform the check
      after the superblocks have been read and we know which devices have failed.
      This gives us one check instead of two and performs it in a location where
      it can be done right.
      
      Only RAID10 was affected and it was affected in the following ways:
      - the code did not properly catch the condition where a user specified
        a device for rebuild that already had a failed device in the same mirror
        set.  (This condition would, however, be caught at a deeper level in MD.)
      - the code triggers a false positive and denies activation when devices in
        independent mirror sets have failed - counting the failures as though they
        were all in the same set.
      
      The most likely place this error was introduced (or this patch should have
      been included) is in commit 4ec1e369 - first introduced in v3.7-rc1.
      Consequently this fix should also go in v3.7.y, however there is a
      small conflict on the .version in raid_target, so I'll submit a
      separate patch to -stable.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      55ebbb59
  6. 22 12月, 2012 2 次提交
  7. 11 10月, 2012 4 次提交
  8. 01 8月, 2012 1 次提交
  9. 27 7月, 2012 4 次提交
  10. 22 5月, 2012 4 次提交
    • J
      DM RAID: Use md_error() in place of simply setting Faulty bit · c32fb9e7
      Jonathan Brassow 提交于
      When encountering an error while reading the superblock, call md_error.
      
      We are currently setting the 'Faulty' bit on one of the array devices when an
      error is encountered while reading the superblock of a dm-raid array.  We should
      be calling md_error(), as it handles the error more completely.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      c32fb9e7
    • J
      DM RAID: Record and handle missing devices · 81f382f9
      Jonathan Brassow 提交于
      Missing dm-raid devices should be recorded in the superblock
      
      When specifying the devices that compose a DM RAID array, it is possible to denote
      failed or missing devices with '-'s.  When this occurs, we must record this in the
      superblock.  We do this by checking if the array position's data device is missing
      and then forcing MD to record the superblock by setting 'MD_CHANGE_DEVS' in
      'raid_resume'.  If we do not cause the superblock to be rewritten by the resume
      function, it is possible for a stale superblock to be written by an out-going
      in-active table (during 'raid_dtr').
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      81f382f9
    • J
      DM RAID: Set recovery flags on resume · 47525e59
      Jonathan Brassow 提交于
      Properly initialize MD recovery flags when resuming device-mapper devices.
      
      When a device-mapper device is suspended, all I/O must stop.  This is done by
      calling 'md_stop_writes' and 'mddev_suspend'.  These calls in-turn manipulate
      the recovery flags - including setting 'MD_RECOVERY_FROZEN'.  The DM device
      may have been suspended while recovery was not yet complete, so the process
      needs to pick-up where it left off.  Since 'mddev_resume' does not unset
      'MD_RECOVERY_FROZEN' and set 'MD_RECOVERY_NEEDED', we must do it ourselves.
      'MD_RECOVERY_NEEDED' can safely be set in 'mddev_resume', but 'MD_RECOVERY_FROZEN'
      must be set outside of 'mddev_resume' due to how MD handles RAID reshaping.
      (e.g.  It is possible for a user to delay reshaping a RAID5->RAID6 by purposefully
      setting 'MD_RECOVERY_FROZEN'.  Clearing it in 'mddev_resume' would override the
      desired behavior.)
      
      Because 'mddev_resume' already unconditionally calls 'md_wakeup_thread(mddev->thread)'
      there is no need to make this call from 'raid_resume' since it calls 'mddev_resume'.
      
      Also clean up where  level_store calls mddev_resume() - it current
      duplicates some of the funcitons of that call. - NB
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      47525e59
    • N
      md: dm-raid should call helper function to clear rdev. · 545c8795
      NeilBrown 提交于
      dm-raid currently open-codes the freeing of some members of
      and rdev.  It is more maintainable to have it call common code
      from md.c which does this for all call-sites.
      
      So remove free_disk_sb to md_rdev_clear, export it, and use it in
      dm-raid.c
      Signed-off-by: NNeilBrown <neilb@suse.de>
      545c8795
  11. 24 4月, 2012 1 次提交
  12. 29 3月, 2012 1 次提交
    • J
      dm raid: handle failed devices during start up · 0447568f
      Jonathan E Brassow 提交于
      The dm-raid code currently fails to create a RAID array if any of the
      superblocks cannot be read.  This was an oversight as there is already
      code to handle this case if the values ('- -') were provided for the
      failed array position.
      
      With this patch, if a superblock cannot be read, the array position's
      fields are initialized as though '- -' was set in the table.  That is,
      the device is failed and the position should not be used, but if there
      is sufficient redundancy, the array should still be activated.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      0447568f
  13. 19 3月, 2012 1 次提交
    • N
      md: tidy up rdev_for_each usage. · dafb20fa
      NeilBrown 提交于
      md.h has an 'rdev_for_each()' macro for iterating the rdevs in an
      mddev.  However it uses the 'safe' version of list_for_each_entry,
      and so requires the extra variable, but doesn't include 'safe' in the
      name, which is useful documentation.
      
      Consequently some places use this safe version without needing it, and
      many use an explicity list_for_each entry.
      
      So:
       - rename rdev_for_each to rdev_for_each_safe
       - create a new rdev_for_each which uses the plain
         list_for_each_entry,
       - use the 'safe' version only where needed, and convert all other
         list_for_each_entry calls to use rdev_for_each.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      dafb20fa
  14. 08 3月, 2012 2 次提交
  15. 31 1月, 2012 1 次提交
    • J
      Prevent DM RAID from loading bitmap twice. · 34f8ac6d
      Jonathan Brassow 提交于
      The life cycle of a device-mapper target is:
      1) create
      2) resume
      3) suspend
      *) possibly repeat from 2
      4) destroy
      
      The dm-raid target is unconditionally calling MD's bitmap_load function upon
      every resume.  If steps 2 & 3 above are repeated, bitmap_load is called
      multiple times.  It is only written to be called once; otherwise, it allocates
      new memory for the bitmap (without freeing the old) and incrementing the number
      of pages it thinks it has without zeroing first.  This ultimately leads to
      access beyond allocated memory and lost memory.
      
      Simply avoiding the bitmap_load call upon resume is not sufficient.  If the
      target was suspended while the initial recovery was only partially complete,
      it needs to be restarted when the target is resumed.  This is why
      'md_wakeup_thread' is called before issuing the 'mddev_resume'.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      34f8ac6d
  16. 01 11月, 2011 2 次提交
  17. 11 10月, 2011 3 次提交
  18. 26 9月, 2011 1 次提交
  19. 02 8月, 2011 5 次提交