1. 11 May 2011 (16 commits)
    • md: allow resync_start to be set while an array is active. · b098636c
      Committed by NeilBrown
      The sysfs attribute 'resync_start' (known internally as recovery_cp),
      records where a resync is up to.  A value of 0 means the array is
      not known to be in-sync at all.  A value of MaxSector means the array
      is believed to be fully in-sync.
      
      When the size of the member devices of an array (RAID1, RAID4/5/6) is
      increased, the array can be grown to match.  This process sets
      resync_start to the old end-of-device offset so that the new part of
      the array gets resynced.
      
      However with RAID1 (and RAID6) a resync is not technically necessary
      and may be undesirable.  So it would be good if the implied resync
      after the array is resized could be avoided.
      
      So: change 'resync_start' so the value can be changed while the array
      is active, and as a precaution only allow it to be changed while
      resync/recovery is 'frozen'.  Changing it once resync has started is
      not going to be useful anyway.
      
      This allows the array to be resized without a resync by the following
      sequence of sysfs writes (a C sketch follows the list):
        write 'frozen' to 'sync_action'
        write new size to 'component_size' (this will set resync_start)
        write 'none' to 'resync_start'
        write 'idle' to 'sync_action'.
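      
      A minimal C sketch of that sequence (the md0 device name and the
      size value are illustrative assumptions, not part of the patch):
      
        #include <stdio.h>
        #include <stdlib.h>
        
        /* Write one value to an md sysfs attribute of /dev/md0. */
        static void md_attr_write(const char *attr, const char *val)
        {
            char path[128];
            FILE *f;
        
            snprintf(path, sizeof(path), "/sys/block/md0/md/%s", attr);
            f = fopen(path, "w");
            if (!f || fprintf(f, "%s\n", val) < 0) {
                perror(path);
                exit(EXIT_FAILURE);
            }
            fclose(f);
        }
        
        int main(void)
        {
            md_attr_write("sync_action", "frozen");     /* freeze resync/recovery */
            md_attr_write("component_size", "1048576"); /* illustrative new size; sets resync_start */
            md_attr_write("resync_start", "none");      /* declare the array fully in-sync */
            md_attr_write("sync_action", "idle");       /* unfreeze */
            return 0;
        }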
      
      Also slightly improve some tests on recovery_cp when resizing
      raid1/raid5.  Now that an arbitrary value can be set, we should be
      more careful in our tests.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: reformat some loops with less indenting. · ab9d47e9
      Committed by NeilBrown
      When a loop ends with an 'if' with a large body, it is neater
      to make the if 'continue' on the inverse condition, and then
      the body is indented less.
      
      Apply this pattern 3 times, and wrap some other long lines.
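      
      Schematically (illustrative code, not the raid10 source; needs_work
      is a stand-in predicate):
      
        #include <stdio.h>
        
        static int needs_work(int i) { return i % 2; }  /* stand-in predicate */
        
        int main(void)
        {
            int i;
        
            /* Before: the long body sits one level inside the 'if'. */
            for (i = 0; i < 8; i++) {
                if (needs_work(i)) {
                    printf("work on %d\n", i);  /* ...imagine many more lines... */
                }
            }
        
            /* After: 'continue' on the inverse condition, so the same
             * body loses one level of indentation. */
            for (i = 0; i < 8; i++) {
                if (!needs_work(i))
                    continue;
                printf("work on %d\n", i);      /* ...the same lines, less indented... */
            }
            return 0;
        }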
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: remove unused variable. · f17ed07c
      Committed by NeilBrown
      This variable 'disk' is never used - how odd.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: make more use of 'slot' in raid10d. · a8830bca
      Committed by NeilBrown
      Now that we have a 'slot' variable, make better use of it to simplify
      some code a little.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid10: some tidying up in fix_read_error · 7c4e06ff
      Committed by NeilBrown
      Currently the rdev on which a read error happened could be removed
      before we perform the fix_read_error handling.  This requires extra
      tests for NULL.
      
      So delay the rdev_dec_pending call until after the call to
      fix_read_error so that we can be sure that the rdev still exists.
      
      This allows an 'if' clause to be removed so the body gets re-indented
      back one level.
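      
      The shape of that ordering change, with stand-in types (a sketch
      only, not the raid10 code):
      
        #include <stdio.h>
        
        struct rdev { int nr_pending; };   /* stand-in for the md rdev */
        
        static void fix_read_error(struct rdev *rdev)   /* stand-in fixup */
        {
            printf("rewriting bad sectors via rdev\n");
        }
        
        static void rdev_dec_pending(struct rdev *rdev)
        {
            rdev->nr_pending--;   /* dropping the last reference may allow removal */
        }
        
        static void handle_read_error(struct rdev *rdev)
        {
            /* Old order: the reference was dropped first, so the fixup had
             * to keep re-checking whether the rdev still existed.
             * New order: fix while the reference is held, drop it last. */
            fix_read_error(rdev);
            rdev_dec_pending(rdev);
        }
        
        int main(void)
        {
            struct rdev r = { .nr_pending = 1 };
        
            handle_read_error(&r);
            return 0;
        }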
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: improve handling of pages allocated for write-behind. · af6d7b76
      Committed by NeilBrown
      The current handling and freeing of these pages is a bit fragile.
      We only keep the list of allocated pages in each bio, so we still
      need a valid bio when freeing the pages, which is clumsy.
      
      So simply store the allocated page list in the r1_bio so it can easily
      be found and freed when we are finished with the r1_bio.
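      
      In outline, with stand-in types (a sketch of the ownership change,
      not the raid1 code):
      
        #include <stdlib.h>
        
        struct page { char data[4096]; };   /* stand-in */
        
        /* An r1_bio-like container that owns its page list, so freeing
         * the pages no longer requires a still-valid bio. */
        struct r1bio_sketch {
            int npages;
            struct page **behind_pages;
        };
        
        static int alloc_behind_pages(struct r1bio_sketch *r1, int npages)
        {
            int i;
        
            r1->behind_pages = calloc(npages, sizeof(*r1->behind_pages));
            if (!r1->behind_pages)
                return -1;
            r1->npages = npages;            /* record the count up front */
            for (i = 0; i < npages; i++)
                if (!(r1->behind_pages[i] = malloc(sizeof(struct page))))
                    return -1;              /* a partial list is still freeable */
            return 0;
        }
        
        static void free_behind_pages(struct r1bio_sketch *r1)
        {
            int i;
        
            for (i = 0; i < r1->npages; i++)
                free(r1->behind_pages[i]);  /* free(NULL) is harmless */
            free(r1->behind_pages);
            r1->behind_pages = NULL;
            r1->npages = 0;
        }
        
        int main(void)
        {
            struct r1bio_sketch r1 = { 0, NULL };
        
            if (alloc_behind_pages(&r1, 16) == 0) {
                /* ... submit write-behind IO ... */
            }
            free_behind_pages(&r1);         /* no bio needed at free time */
            return 0;
        }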
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: try fix_sync_read_error before process_checks. · 7ca78d57
      Committed by NeilBrown
      If we get a read error during resync/recovery we currently retry
      with single-page reads to find out just where the error is, and
      possibly read each page from a different device.
      
      With check/repair we don't currently do that; we just fail.
      However it is possible that while all devices fail on the large 64K
      read, we might be able to satisfy each 4K from one device or another.
      
      So call fix_sync_read_error before process_checks to maximise the
      chance of finding good data and writing it out to the devices with
      read errors.
      
      For this to work, we need to set the 'uptodate' flags properly after
      fix_sync_read_error has succeeded.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: tidy up new functions: process_checks and fix_sync_read_error. · 78d7f5f7
      Committed by NeilBrown
      These changes are mostly cosmetic:
      
      1/ change mddev->raid_disks to conf->raid_disks because the latter
         is technically safer, though in current practice it doesn't matter
         in this particular context.
      2/ Rearrange two for / if loops to have an early 'continue' so the
         body of the 'if' doesn't need to be indented so much.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: split out two sub-functions from sync_request_write · a68e5870
      Committed by NeilBrown
      sync_request_write is too big and too deep.
      So split out two self-contained bits of functionality into separate
      functions.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: make error_handler functions more uniform and correct. · 6f8d0c77
      Committed by NeilBrown
      - there is no need to test_bit Faulty, as that was already done in
        md_error which is the only caller of these functions.
      - MD_CHANGE_DEVS should be set *after* faulty is set to ensure
        metadata is updated correctly.
      - spinlock should be held while updating ->degraded.
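      
      A schematic of the corrected ordering, with stand-in types and a
      pthread mutex in place of the spinlock (not the md source):
      
        #include <pthread.h>
        #include <stdio.h>
        
        enum { FAULTY = 1, MD_CHANGE_DEVS = 2 };   /* stand-in flag bits */
        
        struct conf_sketch {
            pthread_mutex_t lock;   /* stands in for the device spinlock */
            int degraded;
            int dev_flags;
            int mddev_flags;
        };
        
        static void error_handler(struct conf_sketch *conf)
        {
            /* No Faulty re-test here: the md_error()-style caller
             * already performed it. */
        
            pthread_mutex_lock(&conf->lock);
            conf->degraded++;                      /* ->degraded under the lock */
            pthread_mutex_unlock(&conf->lock);
        
            conf->dev_flags |= FAULTY;             /* set Faulty first... */
            conf->mddev_flags |= MD_CHANGE_DEVS;   /* ...then request a metadata
                                                    * update, so it records the
                                                    * faulty state */
        }
        
        int main(void)
        {
            struct conf_sketch conf = { PTHREAD_MUTEX_INITIALIZER, 0, 0, 0 };
        
            error_handler(&conf);
            printf("degraded=%d\n", conf.degraded);
            return 0;
        }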
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/multipath: discard ->working_disks in favour of ->degraded · 92f861a7
      Committed by NeilBrown
      conf->working_disks duplicates information already available
      in mddev->degraded.
      So remove working_disks.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid1: clean up read_balance. · 76073054
      Committed by NeilBrown
      read_balance has two loops which both look for a 'best' device
      based on slightly different criteria.  This is clumsy and makes it
      hard to add extra criteria.
      
      So replace it all with a single loop that combines everything.
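      
      The structural idea on stand-in data (not the raid1 code): one pass
      that tracks the best candidate under the combined criteria, instead
      of two passes with slightly different ones.
      
        #include <stdio.h>
        
        struct mirror { int in_sync; long dist; };   /* stand-in per-device state */
        
        /* One combined loop: prefer in-sync devices, then the smallest
         * seek distance, rather than two separate 'best' searches. */
        static int read_balance(const struct mirror *m, int n)
        {
            int best = -1;
            int i;
        
            for (i = 0; i < n; i++) {
                if (!m[i].in_sync)
                    continue;                        /* criterion 1 */
                if (best < 0 || m[i].dist < m[best].dist)
                    best = i;                        /* criterion 2 */
            }
            return best;
        }
        
        int main(void)
        {
            struct mirror m[] = { { 1, 40 }, { 0, 1 }, { 1, 7 } };
        
            printf("chose mirror %d\n", read_balance(m, 3));   /* prints 2 */
            return 0;
        }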
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: simplify raid10 read_balance · 56d99121
      Committed by NeilBrown
      raid10 read_balance has two different loops for looking through
      possible devices to choose the best.
      Collapse those into one loop and generally make the code more
      readable.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: fix saving of events_cleared and other state. · 8258c532
      Committed by NeilBrown
      If a bitmap is found to be 'stale' the events_cleared value
      is set to match 'events'.
      However if the array is degraded this does not get stored on disk.
      This can subsequently lead to incorrect behaviour.
      
      So change bitmap_update_sb to always update events_cleared in the
      superblock from the known events_cleared.
      For neatness also set ->state from ->flags.
      This requires updating ->state whenever we update ->flags, which makes
      sense anyway.
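      
      In outline, with stand-in types (the field names follow the
      description above, not the exact bitmap code):
      
        #include <stdint.h>
        #include <stdio.h>
        
        struct bitmap_sketch {
            uint64_t events_cleared;   /* known in-memory value */
            int flags;                 /* in-memory state bits */
        };
        
        struct bitmap_sb_sketch {
            uint64_t events_cleared;   /* on-disk copy */
            int state;                 /* on-disk copy */
        };
        
        /* Always propagate the known in-memory values into the
         * superblock, so a degraded array cannot leave a stale
         * events_cleared on disk. */
        static void bitmap_update_sb(const struct bitmap_sketch *b,
                                     struct bitmap_sb_sketch *sb)
        {
            sb->events_cleared = b->events_cleared;
            sb->state = b->flags;      /* keep ->state in step with ->flags */
            /* ...then write the superblock out... */
        }
        
        int main(void)
        {
            struct bitmap_sketch b = { 42, 1 };
            struct bitmap_sb_sketch sb = { 0, 0 };
        
            bitmap_update_sb(&b, &sb);
            printf("on-disk events_cleared=%llu\n",
                   (unsigned long long)sb.events_cleared);
            return 0;
        }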
      
      This is suitable for any active -stable release.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: reject a re-add request that cannot be honoured. · bedd86b7
      Committed by NeilBrown
      The 'add_new_disk' ioctl can be used to add a device either as a
      spare, or as an active disk that just needs to be resynced based on
      write-intent-bitmap information (re-add).
      
      Currently, if a re-add is requested but fails, we add the device as
      a spare instead.  This makes it impossible for user-space to check
      for failure.
      
      So change to require that a re-add attempt will either succeed or
      completely fail.  User-space can then decide what to do next.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: Fix race when creating a new md device. · b0140891
      Committed by NeilBrown
      There is a race when creating an md device by opening /dev/mdXX.
      
      If two processes do this at much the same time they will follow the
      call path
        __blkdev_get -> get_gendisk -> kobj_lookup
      
      The first will call
        -> md_probe -> md_alloc -> add_disk -> blk_register_region
      
      and the race happens when the second gets to kobj_lookup after
      add_disk has called blk_register_region but before it returns to
      md_alloc.
      
      In that case the second will not call md_probe (as the probe is
      already done) but will get a handle on the gendisk and return to
      __blkdev_get, which will then call md_open (via the ->open pointer).
      
      As mddev->gendisk hasn't been set yet, md_open will think something
      is wrong and return with -ERESTARTSYS.
      
      This can loop endlessly while the first thread makes no progress
      through add_disk.  Nothing is blocking it, but due to scheduler
      behaviour it doesn't get a turn.
      So this is essentially a live-lock.
      
      We fix this by simply moving the assignment to mddev->gendisk before
      the call to add_disk() so md_open doesn't get confused.
      Also move blk_queue_flush earlier because add_disk should be as late
      as possible.
      
      To make sure that md_open doesn't complete until md_alloc has done all
      that is needed, we take mddev->open_mutex during the last part of
      md_alloc.  md_open will wait for this.
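      
      The ordering, sketched with a pthread mutex standing in for
      open_mutex (stand-in names throughout, not the md source):
      
        #include <pthread.h>
        #include <stdio.h>
        
        struct mddev_sketch {
            pthread_mutex_t open_mutex;
            void *gendisk;
        };
        
        static struct mddev_sketch the_mddev = { PTHREAD_MUTEX_INITIALIZER, NULL };
        
        static void md_alloc_sketch(void *disk)
        {
            pthread_mutex_lock(&the_mddev.open_mutex);    /* openers must wait */
            the_mddev.gendisk = disk;       /* set BEFORE the disk is visible */
            /* add_disk() would go here, as late as possible */
            pthread_mutex_unlock(&the_mddev.open_mutex);  /* setup complete */
        }
        
        static int md_open_sketch(void)
        {
            int ok;
        
            pthread_mutex_lock(&the_mddev.open_mutex);    /* waits for md_alloc */
            ok = (the_mddev.gendisk != NULL);   /* no spurious ERESTARTSYS loop */
            pthread_mutex_unlock(&the_mddev.open_mutex);
            return ok ? 0 : -1;
        }
        
        int main(void)
        {
            int disk = 1;
        
            md_alloc_sketch(&disk);
            printf("open -> %d\n", md_open_sketch());
            return 0;
        }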
      
      This can cause a lock-up on boot so Cc:ing for stable.
      For 2.6.36 and earlier a different patch will be needed as the
      'blk_queue_flush' call isn't there.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
      Tested-by: Thomas Jarosch <thomas.jarosch@intra2net.com>
      Cc: stable@kernel.org
  2. 22 April 2011 (1 commit)
  3. 20 April 2011 (3 commits)
  4. 18 April 2011 (6 commits)
    • md: fix up raid1/raid10 unplugging. · c3b328ac
      Committed by NeilBrown
      We just need to make sure that an unplug event wakes up the md
      thread, which is exactly what mddev_check_plugged does.
      
      Also remove some plug-related code that is no longer needed.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: incorporate new plugging into raid5. · 7c13edc8
      Committed by NeilBrown
      In raid5, plugging is used for two things:
       1/ collecting writes that require a bitmap update
       2/ collecting writes in the hope that we can create full
          stripes - or at least more nearly full ones.
      
      We now release these different sets of stripes when plug_cnt
      is zero.
      
      Also, in make_request, we call mddev_check_plugged to hopefully
      increase plug_cnt, and wake up the thread at the end if plugging
      wasn't achieved for some reason.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: provide generic support for handling unplug callbacks. · 97658cdd
      Committed by NeilBrown
      When an md device adds a request to a queue, it can call
      mddev_check_plugged.
      If this succeeds then we know that the md thread will be woken up
      shortly, and ->plug_cnt will be non-zero until then, so some
      processing can be delayed.
      
      If it fails, then no unplug callback is expected and the make_request
      function needs to do whatever is required to make the request happen.
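      
      The resulting make_request pattern, schematically (stand-in
      function bodies around the call described above):
      
        #include <stdio.h>
        
        static int plug_cnt;   /* stands in for mddev->plug_cnt */
        
        /* Stand-in: in md this succeeds only when an unplug callback is
         * armed, guaranteeing a later wakeup of the md thread. */
        static int mddev_check_plugged_sketch(void)
        {
            plug_cnt++;        /* pretend a blk plug is active on this task */
            return 1;
        }
        
        static void make_request_sketch(void)
        {
            if (mddev_check_plugged_sketch()) {
                /* An unplug callback is pending: the md thread will be
                 * woken shortly, so this work can simply be queued. */
                printf("queued; plug_cnt=%d until the unplug fires\n", plug_cnt);
            } else {
                /* No unplug is coming: do whatever is required to make
                 * the request happen now. */
                printf("no plug; handling the request immediately\n");
            }
        }
        
        int main(void)
        {
            make_request_sketch();
            return 0;
        }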
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md - remove old plugging code. · 482c0834
      Committed by NeilBrown
      md has some plugging infrastructure for RAID5 to use because the
      normal plugging infrastructure required a 'request_queue', and when
      called from dm, RAID5 doesn't have one of those available.
      
      This relied on the ->unplug_fn callback which doesn't exist any more.
      
      So remove all of that code, both in md and raid5.  Subsequent
      patches will restore the plugging functionality.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/dm - remove remains of plug_fn callback. · af1db72d
      Committed by NeilBrown
      Now that unplugging is done differently, the unplug_fn callback is
      never called, so it can be completely discarded.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: use new plugging interface for RAID IO. · e1dfa0a2
      Committed by NeilBrown
      md/raid submits a lot of IO from the various raid threads.
      So add start/finish plug calls to those threads so that some
      plugging happens.
      Signed-off-by: NeilBrown <neilb@suse.de>
  5. 06 April 2011 (1 commit)
    • dm: improve block integrity support · a63a5cf8
      Committed by Mike Snitzer
      The current block integrity (DIF/DIX) support in DM verifies that
      all devices' integrity profiles match during DM device resume (which
      is past the point of no return).  To some degree that is unavoidable
      (stacked DM devices force this late checking).  But for most DM
      devices (which aren't stacking on other DM devices) the ideal time to
      verify all integrity profiles match is during table load.
      
      Introduce the notion of an "initialized" integrity profile: a profile
      that was blk_integrity_register()'d with a non-NULL 'blk_integrity'
      template.  Add blk_integrity_is_initialized() to allow checking if a
      profile was initialized.
      
      Update DM integrity support to (a sketch follows the list):
      - check all devices with _initialized_ integrity profiles match
        during table load; uninitialized profiles (e.g. for underlying DM
        device(s) of a stacked DM device) are ignored.
      - disallow a table load that would result in an integrity profile that
        conflicts with a DM device's existing (in-use) integrity profile
      - avoid clearing an existing integrity profile
      - validate all integrity profiles match during resume; but if they
        don't all we can do is report the mismatch (during resume we're past
        the point of no return)
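      
      The table-load check in outline (stand-in types; only the skipping
      of uninitialized profiles mirrors the patch directly):
      
        #include <stdio.h>
        #include <string.h>
        
        struct integrity_profile {
            int initialized;   /* registered with a non-NULL template */
            char name[24];     /* e.g. "T10-DIF-TYPE1-CRC" */
        };
        
        /* All devices with an initialized profile must agree; devices
         * with uninitialized profiles (e.g. underlying DM devices of a
         * stack) are skipped rather than failing the load. */
        static int table_integrity_ok(const struct integrity_profile *devs, int n)
        {
            const char *ref = NULL;
            int i;
        
            for (i = 0; i < n; i++) {
                if (!devs[i].initialized)
                    continue;                    /* ignore, per the patch */
                if (!ref)
                    ref = devs[i].name;
                else if (strcmp(ref, devs[i].name) != 0)
                    return 0;                    /* conflicting profiles */
            }
            return 1;
        }
        
        int main(void)
        {
            struct integrity_profile devs[] = {
                { 1, "T10-DIF-TYPE1-CRC" },
                { 0, "" },                       /* uninitialized: skipped */
                { 1, "T10-DIF-TYPE1-CRC" },
            };
        
            printf("table load %s\n", table_integrity_ok(devs, 3) ? "ok" : "rejected");
            return 0;
        }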
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  6. 31 March 2011 (1 commit)
  7. 29 March 2011 (1 commit)
  8. 24 March 2011 (10 commits)
  9. 22 March 2011 (1 commit)