1. 08 8月, 2010 8 次提交
  2. 07 8月, 2010 1 次提交
  3. 26 7月, 2010 16 次提交
    • N
      md/bitmap: separate out loading a bitmap from initialising the structures. · 69e51b44
      NeilBrown 提交于
      dm makes this distinction between ->ctr and ->resume, so we need to
      too.
      
      Also get the new bitmap_load to clear out the bitmap first, as this is
      most consistent with the dm suspend/resume approach
      Signed-off-by: NNeilBrown <neilb@suse.de>
      69e51b44
    • N
      md/bitmap: prepare for storing write-intent-bitmap via dm-dirty-log. · e384e585
      NeilBrown 提交于
      This allows md/raid5 to fully work as a dm target.
      
      Normally md uses a 'filemap' which contains a list of pages of bits
      each of which may be written separately.
      dm-log uses and all-or-nothing approach to writing the log, so
      when using a dm-log, ->filemap is NULL and the flags normally stored
      in filemap_attr are stored in ->logattrs instead.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      e384e585
    • N
      md/bitmap: optimise scanning of empty bitmaps. · ef425673
      NeilBrown 提交于
      A bitmap is stored as one page per 2048 bits.
      If none of the bits are set, the page is not allocated.
      
      When bitmap_get_counter finds that a page isn't allocate,
      it just reports that one bit work of space isn't flagged,
      rather than reporting that 2048 bits worth of space are
      unflagged.
      This can cause searches for flagged bits (e.g. bitmap_close_sync)
      to do more work than is really necessary.
      
      So change bitmap_get_counter (when creating) to report a number of
      blocks that more accurately reports the range of the device for which
      no counter currently exists.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      ef425673
    • N
      md/bitmap: clean up plugging calls. · b63d7c2e
      NeilBrown 提交于
      1/ use md_unplug in bitmap.c as we will soon be using bitmaps under
        arrays with no queue attached.
      
      2/ Don't bother plugging the queue when we set a bit in the bitmap.
         The reason for this was to encourage as many bits as possible to
         get set before we unplug and write stuff out.
         However every personality already plugs the queue after
         bitmap_startwrite either directly (raid1/raid10) or be setting
         STRIPE_BIT_DELAY which causes the queue to be plugged later
         (raid5).
      Signed-off-by: NNeilBrown <neilb@suse.de>
      b63d7c2e
    • N
      md/bitmap: reduce dependence on sysfs. · 5ff5afff
      NeilBrown 提交于
      For dm-raid45 we will want to use bitmaps in dm-targets which don't
      have entries in sysfs, so cope with the mddev not living in sysfs.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      5ff5afff
    • N
      md/bitmap: white space clean up and similar. · ac2f40be
      NeilBrown 提交于
      Fixes some whitespace problems
      Fixed some checkpatch.pl complaints.
      Replaced kmalloc ... memset(0), with kzalloc
      Fixed an unlikely memory leak on an error path.
      Reformatted a number of 'if/else' sets, sometimes
      replacing goto with an else clause.
      Removed some old comments and commented-out code.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      ac2f40be
    • N
      md/raid5: export raid5 unplugging interface. · 9f7c2220
      NeilBrown 提交于
      Also remove remaining accesses to ->queue and ->gendisk when ->queue
      is NULL (As it is in a DM target).
      Signed-off-by: NNeilBrown <neilb@suse.de>
      9f7c2220
    • N
      md/plug: optionally use plugger to unplug an array during resync/recovery. · 252ac522
      NeilBrown 提交于
      If an array doesn't have a 'queue' then md_do_sync cannot
      unplug it.
      In that case it will have a 'plugger', so make that available
      to the mddev, and use it to unplug the array if needed.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      252ac522
    • N
      md/raid5: add simple plugging infrastructure. · 2ac87401
      NeilBrown 提交于
      md/raid5 uses the plugging infrastructure provided by the block layer
      and 'struct request_queue'.  However when we plug raid5 under dm there
      is no request queue so we cannot use that.
      
      So create a similar infrastructure that is much lighter weight and use
      it for raid5.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      2ac87401
    • N
      md/raid5: export is_congested test · 11d8a6e3
      NeilBrown 提交于
      the dm module will need this for dm-raid45.
      
      Also only access ->queue->backing_dev_info->congested_fn
      if ->queue actually exists.  It won't in a dm target.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      11d8a6e3
    • N
      raid5: Don't set read-ahead when there is no queue · 4a5add49
      NeilBrown 提交于
      dm-raid456 does not provide a 'queue' for raid5 to use,
      so we must make raid5 stop depending on the queue.
      
      First: read_ahead
      dm handles read-ahead adjustment fully in userspace, so
      simply don't do any readahead adjustments if there is
      no queue.
      
      Also re-arrange code slightly so all the accesses to ->queue are
      together.
      
      Finally, move the blk_queue_merge_bvec function into the 'if' as
      the ->split_io setting in dm-raid456 has the same effect.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      4a5add49
    • N
      md: add support for raising dm events. · 768a418d
      NeilBrown 提交于
      dm uses scheduled work to raise events to user-space.
      So allow md device to have work_structs and schedule them on an error.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      768a418d
    • N
      md: export various start/stop interfaces · 390ee602
      NeilBrown 提交于
      export entry points for starting and stopping md arrays.
      This will be used by a module to make md/raid5 work under
      dm.
      Also stop calling md_stop_writes from md_stop, as that won't
      work well with dm - it will want to call the two separately.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      390ee602
    • N
      md: split out md_rdev_init · e8bb9a83
      NeilBrown 提交于
      This functionality will be needed separately in a subsequent patch, so
      split it into it's own exported function.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      e8bb9a83
    • N
      md: be more careful setting MD_CHANGE_CLEAN · 676e42d8
      NeilBrown 提交于
      When MD_CHANGE_CLEAN is set we might block in md_write_start.
      So we should only set it when fairly sure that something will clear
      it.
      
      There are two places where it is set so as to encourage a metadata
      update to record the progress of resync/recovery.  This should only
      be done if the internal metadata update mechanisms are in use, which
      can be tested by by inspecting '->persistent'.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      676e42d8
    • N
      md/raid5: ensure we create a unique name for kmem_cache when mddev has no gendisk · f4be6b43
      NeilBrown 提交于
      We will shortly allow md devices with no gendisk (they are attached to
      a dm-target instead).  That will cause mdname() to return 'mdX'.
      There is one place where mdname really needs to be unique: when
      creating the name for a slab cache.
      So in that case, if there is no gendisk, you the address of the mddev
      formatted in HEX to provide a unique name.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f4be6b43
  4. 21 7月, 2010 2 次提交
  5. 24 6月, 2010 12 次提交
    • N
      md/raid5: don't include 'spare' drives when reshaping to fewer devices. · 3424bf6a
      NeilBrown 提交于
      There are few situations where it would make any sense to add a spare
      when reducing the number of devices in an array, but it is
      conceivable:  A 6 drive RAID6 with two missing devices could be
      reshaped to a 5 drive RAID6, and a spare could become available
      just in time for the reshape, but not early enough to have been
      recovered first.  'freezing' recovery can make this easy to
      do without any races.
      
      However doing such a thing is a bad idea.  md will not record the
      partially-recovered state of the 'spare' and when the reshape
      finished it will think that the spare is still spare.
      Easiest way to avoid this confusion is to simply disallow it.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      3424bf6a
    • N
      md/raid5: add a missing 'continue' in a loop. · 2f115882
      NeilBrown 提交于
      As the comment says, the tail of this loop only applies to devices
      that are not fully in sync, so if In_sync was set, we should avoid
      the rest of the loop.
      
      This bug will hardly ever cause an actual problem.  The worst it
      can do is allow an array to be assembled that is dirty and degraded,
      which is not generally a good idea (without warning the sysadmin
      first).
      
      This will only happen if the array is RAID4 or a RAID5/6 in an
      intermediate state during a reshape and so has one drive that is
      all 'parity' - no data - while some other device has failed.
      
      This is certainly possible, but not at all common.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      2f115882
    • N
      md/raid5: Allow recovered part of partially recovered devices to be in-sync · 415e72d0
      NeilBrown 提交于
      During a recovery of reshape the early part of some devices might be
      in-sync while the later parts are not.
      We we know we are looking at an early part it is good to treat that
      part as in-sync for stripe calculations.
      
      This is particularly important for a reshape which suffers device
      failure.  Treating the data as in-sync can mean the difference between
      data-safety and data-loss.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      415e72d0
    • N
      md/raid5: More careful check for "has array failed". · 674806d6
      NeilBrown 提交于
      When we are reshaping an array, the device failure combinations
      that cause us to decide that the array as failed are more subtle.
      
      In particular, any 'spare' will be fully in-sync in the section
      of the array that has already been reshaped, thus failures that
      affect only that section are less critical.
      
      So encode this subtlety in a new function and call it as appropriate.
      
      The case that showed this problem was a 4 drive RAID5 to 8 drive RAID6
      conversion where the last two devices failed.
      This resulted in:
      
        good good good good incomplete good good failed failed
      
      while converting a 5-drive RAID6 to 8 drive RAID5
      The incomplete device causes the whole array to look bad,
      bad as it was actually good for the section that had been
      converted to 8-drives, all the data was actually safe.
      Reported-by: NTerry Morris <tbmorris@tbmorris.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      674806d6
    • N
      md: Don't update ->recovery_offset when reshaping an array to fewer devices. · 70fffd0b
      NeilBrown 提交于
      When an array is reshaped to have fewer devices, the reshape proceeds
      from the end of the devices to the beginning.
      
      If a device happens to be non-In_sync (which is possible but rare)
      we would normally update the ->recovery_offset as the reshape
      progresses. However that would be wrong as the recover_offset records
      that the early part of the device is in_sync, while in fact it would
      only be the later part that is in_sync, and in any case the offset
      number would be measured from the wrong end of the device.
      
      Relatedly, if after a reshape a spare is discovered to not be
      recoverred all the way to the end, not allow spare_active
      to incorporate it in the array.
      
      This becomes relevant in the following sample scenario:
      
      A 4 drive RAID5 is converted to a 6 drive RAID6 in a combined
      operation.
      The RAID5->RAID6 conversion will cause a 5 drive to be included as a
      spare, then the 5drive -> 6drive reshape will effectively rebuild that
      spare as it progresses.  The 6th drive is treated as in_sync the whole
      time as there is never any case that we might consider reading from
      it, but must not because there is no valid data.
      
      If we interrupt this reshape part-way through and reverse it to return
      to a 5-drive RAID6 (or event a 4-drive RAID5), we don't want to update
      the recovery_offset - as that would be wrong - and we don't want to
      include that spare as active in the 5-drive RAID6 when the reversed
      reshape completed and it will be mostly out-of-sync still.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      70fffd0b
    • N
      md/raid5: avoid oops when number of devices is reduced then increased. · e4e11e38
      NeilBrown 提交于
      The entries in the stripe_cache maintained by raid5 are enlarged
      when we increased the number of devices in the array, but not
      shrunk when we reduce the number of devices.
      So if entries are added after reducing the number of devices, we
      much ensure to initialise the whole entry, not just the part that
      is currently relevant.  Otherwise if we enlarge the array again,
      we will reference uninitialised values.
      
      As grow_buffers/shrink_buffer now want to use a count that is stored
      explicity in the raid_conf, they should get it from there rather than
      being passed it as a parameter.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      e4e11e38
    • M
      md: enable raid4->raid0 takeover · 049d6c1e
      Maciej Trela 提交于
      Only level 5 with layout=PARITY_N can be taken over to raid0 now.
      Lets allow level 4 either.
      Signed-off-by: NMaciej Trela <maciej.trela@intel.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      049d6c1e
    • M
      md: clear layout after ->raid0 takeover · 001048a3
      Maciej Trela 提交于
      After takeover from raid5/10 -> raid0 mddev->layout is not cleared.
      Signed-off-by: NMaciej Trela <maciej.trela@intel.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      001048a3
    • M
      md: fix raid10 takeover: use new_layout for setup_conf · f73ea873
      Maciej Trela 提交于
      Use mddev->new_layout in setup_conf.
      Also use new_chunk, and don't set ->degraded in takeover().  That
      gets set in run()
      Signed-off-by: NMaciej Trela <maciej.trela@intel.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f73ea873
    • N
      md: fix handling of array level takeover that re-arranges devices. · e93f68a1
      NeilBrown 提交于
      Most array level changes leave the list of devices largely unchanged,
      possibly causing one at the end to become redundant.
      However conversions between RAID0 and RAID10 need to renumber
      all devices (except 0).
      
      This renumbering is currently being done in the ->run method when the
      new personality takes over.  However this is too late as the common
      code in md.c might already have invalidated some of the devices if
      they had a ->raid_disk number that appeared to high.
      
      Moving it into the ->takeover method is too early as the array is
      still active at that time and wrong ->raid_disk numbers could cause
      confusion.
      
      So add a ->new_raid_disk field to mdk_rdev_s and use it to communicate
      the new raid_disk number.
      Now the common code knows exactly which devices need to be renumbered,
      and which can be invalidated, and can do it all at a convenient time
      when the array is suspend.
      It can also update some symlinks in sysfs which previously were not be
      updated correctly.
      Reported-by: NMaciej Trela <maciej.trela@intel.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      e93f68a1
    • P
      md: raid10: Fix null pointer dereference in fix_read_error() · 0544a21d
      Prasanna S. Panchamukhi 提交于
      Such NULL pointer dereference can occur when the driver was fixing the
      read errors/bad blocks and the disk was physically removed
      causing a system crash. This patch check if the
      rcu_dereference() returns valid rdev before accessing it in fix_read_error().
      
      Cc: stable@kernel.org
      Signed-off-by: NPrasanna S. Panchamukhi <prasanna.panchamukhi@riverbed.com>
      Signed-off-by: NRob Becker <rbecker@riverbed.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      0544a21d
    • N
      Restore partition detection of newly created md arrays. · f3b99be1
      NeilBrown 提交于
      Commit  b821eaa5 broke partition
      detection for md arrays.
      
      The logic was almost right.  However if revalidate_disk is called
      when the device is not yet open, bdev->bd_disk won't be set, so the
      flush_disk() Call will not set bd_invalidated.
      
      So when md_open is called we still need to ensure that
      ->bd_invalidated gets set.  This is easily done with a call to
      check_disk_size_change in the place where the offending commit removed
      check_disk_change.  At the important times, the size will have changed
      from 0 to non-zero, so check_disk_size_change will set bd_invalidated.
      Tested-by: NDuncan <1i5t5.duncan@cox.net>
      Reported-by: NDuncan <1i5t5.duncan@cox.net>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f3b99be1
  6. 28 5月, 2010 1 次提交