1. 22 Mar 2011 (1 commit)
  2. 17 Mar 2011 (1 commit)
  3. 10 Mar 2011 (2 commits)
    • block: kill off REQ_UNPLUG · 721a9602
      Jens Axboe committed
      With the plugging now being explicitly controlled by the
      submitter, callers need not pass down unplugging hints
      to the block layer. If they want to unplug, it's because they
      manually plugged on their own - in which case, they should just
      unplug at will.
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
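The explicit on-stack plugging these two commits describe can be modeled in plain C. This is an illustrative userspace sketch only; the function names mirror the kernel's blk_start_plug()/blk_finish_plug(), but the implementation is invented for demonstration:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative userspace model of on-stack plugging, not kernel code. */
struct plug { int queued; };

static struct plug *current_plug = NULL;
static int dispatched = 0;

static void blk_start_plug(struct plug *p)
{
    p->queued = 0;
    current_plug = p;              /* submissions now batch on our stack */
}

static void submit_io(void)
{
    if (current_plug)
        current_plug->queued++;    /* held back while plugged */
    else
        dispatched++;              /* goes straight to the device */
}

static void blk_finish_plug(struct plug *p)
{
    dispatched += p->queued;       /* flush the whole batch at once */
    p->queued = 0;
    current_plug = NULL;
}
```

Because the submitter owns the plug, it flushes at a moment of its own choosing, which is why callers no longer need to pass unplug hints down to the block layer.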
    • block: remove per-queue plugging · 7eaceacc
      Jens Axboe committed
      Code has been converted over to the new explicit on-stack plugging,
      and delay users have been converted to use the new API for that.
      So let's kill off the old plugging along with aops->sync_page().
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  4. 24 Feb 2011 (1 commit)
    • md: Fix - again - partition detection when array becomes active · f0b4f7e2
      NeilBrown committed
      Revert
          b821eaa5
      and
          f3b99be1
      
      When I wrote the first of these I had a wrong idea about the
      lifetime of 'struct block_device'.  It can disappear at any time that
      the block device is not open if it falls out of the inode cache.
      
      So relying on the 'size' recorded with it to detect when the device
      size has changed, and hence that we need to revalidate, is wrong.
      
      Rather, we really do need the 'changed' attribute stored directly in
      the mddev and set/tested as appropriate.
      
      Without this patch, a sequence of:
         mknod / open / close / unlink
      
      (which can cause a block_device to be created and then destroyed)
      will result in a rescan of the partition table and the consequent
      removal and addition of partitions.
      Several of these in a row can get udev racing to create and unlink and
      other code can get confused.
      
      With the patch, the rescan is only performed when needed and so there
      are no races.
      
      This is suitable for any stable kernel from 2.6.35.
      Reported-by: "Wojcik, Krzysztof" <krzysztof.wojcik@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: stable@kernel.org
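The set/test pattern for the 'changed' attribute can be sketched as follows. All names here (mddev_sim, set_array_size, test_and_clear_changed) are hypothetical stand-ins for illustration, not the kernel's actual code:

```c
#include <assert.h>

/* Record the size change in the mddev itself rather than comparing
 * against a cached size on a 'struct block_device' that may have been
 * evicted from the inode cache. */
struct mddev_sim { int changed; long long sectors; };

static void set_array_size(struct mddev_sim *m, long long sectors)
{
    if (m->sectors != sectors) {
        m->sectors = sectors;
        m->changed = 1;          /* remember that a rescan is needed */
    }
}

static int test_and_clear_changed(struct mddev_sim *m)
{
    int c = m->changed;
    m->changed = 0;
    return c;                    /* rescan only when this returns 1 */
}
```

A mknod/open/close/unlink cycle then no longer triggers a partition rescan, because no size change was ever recorded in the mddev.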
  5. 21 Feb 2011 (1 commit)
    • md: avoid spinlock problem in blk_throtl_exit · da9cf505
      NeilBrown committed
      blk_throtl_exit assumes that ->queue_lock still exists,
      so make sure that it does.
      To do this, we stop redirecting ->queue_lock to conf->device_lock
      and leave it pointing where it is initialised - __queue_lock.
      
      As the blk_plug functions check the ->queue_lock is held, we now
      take that spin_lock explicitly around the plug functions.  We don't
      need the locking, just the warning removal.
      
      This is needed for any kernel with the blk_throtl code, which means
      2.6.37 and later.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
  6. 16 Feb 2011 (2 commits)
    • md: correctly handle probe of an 'mdp' device. · 8f5f02c4
      NeilBrown committed
      'mdp' devices are md devices with preallocated device numbers
      for partitions. As such it is possible to mknod and open a partition
      before opening the whole device.
      
      This causes md_probe() to be called with the device number of a
      partition, which in turn calls mddev_find with that number.
      
      However mddev_find expects the number of a 'whole device' and
      does the wrong thing with partition numbers.
      
      So add code to mddev_find to remove the 'partition' part of
      a device number and just work with the 'whole device'.
      
      This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=28652
      
      Reported-by: hkmaly@bigfoot.com
      Signed-off-by: NeilBrown <neilb@suse.de>
      Cc: <stable@kernel.org>
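The partition-stripping step can be sketched like this, assuming md's MdpMinorShift of 6 (so each mdp whole device owns 64 consecutive minors); the helper name is illustrative:

```c
#include <assert.h>

/* md reserves a block of minors per 'mdp' device for its partitions. */
#define MDP_MINOR_SHIFT 6    /* assumed value of md's MdpMinorShift */

/* Round a partition minor down to the whole-device minor, as the fixed
 * mddev_find does before looking up the mddev. */
static int whole_device_minor(int minor)
{
    return minor & ~((1 << MDP_MINOR_SHIFT) - 1);
}
```

With this, opening a partition node before the whole device still resolves to the correct mddev.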
    • md: don't set_capacity before array is active. · cbe6ef1d
      NeilBrown committed
      If the desired size of an array is set (via sysfs) before the array is
      active (which is the normal sequence), we currently call set_capacity
      immediately.
      This means that a subsequent 'open' (as can be caused by some
      udev-triggers program) will notice the new size and try to probe for
      partitions.  However as the array isn't quite ready yet the read will
      fail.  Then when the array is ready, as the size doesn't change again,
      we don't try to re-probe.
      
      So when setting array size via sysfs, only call set_capacity if the
      array is already active.
      Signed-off-by: NeilBrown <neilb@suse.de>
  7. 14 Feb 2011 (1 commit)
  8. 08 Feb 2011 (2 commits)
    • FIX: md: process hangs at wait_barrier after 0->10 takeover · 02214dc5
      Krzysztof Wojcik committed
      The following symptoms were observed:
      1. After a raid0->raid10 takeover operation we have an array with 2
      missing disks.
      When we add a disk for rebuild, the recovery process starts as
      expected but it does not finish: it stops at about 90% and the
      md126_resync process hangs in the "D" state.
      2. Similar behavior occurs when we have a mounted raid0 array and
      execute a takeover to raid10.  Afterwards, trying to unmount the
      array causes the umount process to hang in the "D" state.
      
      In both scenarios the processes hang at the same function, wait_barrier
      in raid10.c.
      The process waits in the macro "wait_event_lock_irq" until the
      "!conf->barrier" condition becomes true.
      In the scenarios above this never happens.
      
      The reason was that at the end of level_store, after calling pers->run,
      we call mddev_resume.  With RAID10 this calls pers->quiesce(mddev, 0),
      which calls lower_barrier.
      However raise_barrier hadn't been called on that 'conf' yet,
      so conf->barrier becomes negative, which is bad.
      
      This patch sets conf->barrier=1 after the takeover operation, which
      prevents the barrier from becoming negative when lower_barrier() is
      called.
      Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
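A toy model of the barrier bookkeeping makes the fix concrete (the struct and helper names are invented for illustration, not the raid10 code itself):

```c
#include <assert.h>

/* Minimal model of the raid10 barrier counter involved in this hang. */
struct conf_sim { int barrier; };

static void lower_barrier(struct conf_sim *c) { c->barrier--; }

/* The fix: after a raid0->raid10 takeover, start with barrier == 1 so
 * that mddev_resume's quiesce(mddev, 0) -> lower_barrier lands at 0
 * instead of -1.  A negative barrier never satisfies "!conf->barrier",
 * leaving wait_barrier stuck forever. */
static void takeover_init_barrier(struct conf_sim *c) { c->barrier = 1; }
```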
    • md_make_request: don't touch the bio after calling make_request · e91ece55
      Chris Mason committed
      md_make_request was calling bio_sectors() for part_stat_add
      after it was calling the make_request function.  This is
      bad because the make_request function can free the bio and
      because the bi_size field can change underneath us.
      
      The fix here was suggested by Jens Axboe.  It saves the
      sector count before the make_request call.  I hit this
      with CONFIG_DEBUG_PAGEALLOC turned on while trying to break
      his pretty fusionio card.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
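The fix can be sketched with a stand-in bio struct (all names here are illustrative, not the kernel's):

```c
#include <assert.h>

struct bio_sim { unsigned int bi_size; };   /* bytes remaining */

static unsigned int bio_sectors(const struct bio_sim *b)
{
    return b->bi_size >> 9;       /* bytes -> 512-byte sectors */
}

/* Stand-in for the personality's make_request: it may consume or even
 * free the bio, so bi_size is unusable afterwards. */
static void make_request_sim(struct bio_sim *b)
{
    b->bi_size = 0;
}

static unsigned int submit_and_account(struct bio_sim *b)
{
    unsigned int sectors = bio_sectors(b);   /* saved BEFORE the call */
    make_request_sim(b);
    return sectors;               /* safe to feed to part_stat_add */
}
```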
  9. 02 Feb 2011 (1 commit)
  10. 31 Jan 2011 (8 commits)
    • md: don't clear curr_resync_completed at end of resync. · 7281f812
      NeilBrown committed
      There is no need to set this to zero at this point.  It will be
      set to zero by remove_and_add_spares or at the start of
      md_do_sync at the latest.
      And setting it to zero before MD_RECOVERY_RUNNING is cleared can
      make a 'zero' appear briefly in the 'sync_completed' sysfs attribute
      just as resync is finishing.
      
      So simply remove this setting to zero.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: Don't use remove_and_add_spares to remove failed devices from a read-only array · a8c42c7f
      NeilBrown committed
      remove_and_add_spares is called in two places where the needs really
      are very different.
      remove_and_add_spares should not be called on an array which is about
      to be reshaped as some extra devices might have been manually added
      and that would remove them.  However if the array is 'read-auto',
      that will currently happen, which is bad.
      
      So in the 'ro != 0' case don't call remove_and_add_spares but simply
      remove the failed devices as the comment suggests is needed.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • Add raid1->raid0 takeover support · fc3a08b8
      Krzysztof Wojcik committed
      This patch introduces the raid1 to raid0 takeover operation
      in kernel space.
      Signed-off-by: Krzysztof Wojcik <krzysztof.wojcik@intel.com>
      Signed-off-by: Neil Brown <neilb@nbeee.brown>
    • md: Remove the AllReserved flag for component devices. · f21e9ff7
      NeilBrown committed
      This flag is not needed and is used badly.
      
      Devices that are included in a native-metadata array are reserved
      exclusively for that array - and currently have AllReserved set.
      They all are bd_claimed for the rdev and so cannot be shared.
      
      Devices that are included in external-metadata arrays can be shared
      among multiple arrays - providing there is no overlap.
      These are bd_claimed for md in general - not for a particular rdev.
      
      When changing the amount of a device that is used in an array we need
      to check for overlap.  This currently includes a check on AllReserved,
      so even without overlap, sharing with an AllReserved device is not
      allowed.
      However the bd_claim usage already precludes sharing with these
      devices, so the test on AllReserved is not needed.  And in fact it is
      wrong.
      
      As this is the only use of AllReserved, simply remove all usage and
      definition of AllReserved.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: don't abort checking spares as soon as one cannot be added. · 50da0840
      NeilBrown committed
      As spares can be added manually before a reshape starts, we need to
      find them all to mark some of them as in_sync.
      
      Previously we would abort looking for spares when we found an
      unallocated spare that could not be added to the array (implying there
      was no room for new spares).  However already-added spares could be
      later in the list, so we need to keep searching.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix the test for finding spares in raid5_start_reshape. · 469518a3
      NeilBrown committed
      As spares can be added to the array before the reshape is started,
      we need to find and count them when checking there are enough.
      The array could have been degraded, so we need to check all devices,
      not just those outside the range of devices in the array before
      the reshape.
      
      So instead of checking the index, check the In_sync flag, as that
      reliably tells whether the device is a spare for this purpose.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: simplify some 'if' conditionals in raid5_start_reshape. · 87a8dec9
      NeilBrown committed
      There are two consecutive 'if' statements.
      
       if (mddev->delta_disks >= 0)
            ....
       if (mddev->delta_disks > 0)
      
      The code in the second is equally valid if delta_disks == 0, and these
      two statements are the only place that 'added_devices' is used.
      
      So make them a single if statement, make added_devices a local
      variable, and re-indent it all.
      
      No functional change.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: revert change to raid_disks on failure. · de171cb9
      NeilBrown committed
      If we try to update_raid_disks and it fails, we should put
      'delta_disks' back to zero.  This is important because some code,
      such as slot_store, assumes that delta_disks has been validated.
      Signed-off-by: NeilBrown <neilb@suse.de>
  11. 15 Jan 2011 (1 commit)
    • block: restore multiple bd_link_disk_holder() support · 49731baa
      Tejun Heo committed
      Commit e09b457b (block: simplify holder symlink handling) incorrectly
      assumed that there is only one link at maximum.  dm may use multiple
      links and expects block layer to track reference count for each link,
      which is different from and unrelated to the exclusive device holder
      identified by @holder when the device is opened.
      
      Remove the single holder assumption and automatic removal of the link
      and revive the per-link reference count tracking.  The code
      essentially behaves the same as before commit e09b457b sans the
      unnecessary kobject reference count dancing.
      
      While at it, note that this facility should not be used by anyone else
      than the current ones.  Sysfs symlinks shouldn't be abused like this
      and the whole thing doesn't belong in the block layer at all.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Milan Broz <mbroz@redhat.com>
      Cc: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Neil Brown <neilb@suse.de>
      Cc: linux-raid@vger.kernel.org
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
  12. 14 Jan 2011 (19 commits)
    • md: Fix removal of extra drives when converting RAID6 to RAID5 · bf2cb0da
      NeilBrown committed
      When a RAID6 is converted to a RAID5, the extra drive should
      be discarded.  However it isn't due to a typo in a comparison.
      
      This bug was introduced in commit e93f68a1 in 2.6.35-rc4
      and is suitable for any -stable kernel since then.
      
      As the extra drive is not removed, the 'degraded' counter is wrong and
      so the RAID5 will not respond correctly to a subsequent failure.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: range check slot number when manually adding a spare. · ba1b41b6
      NeilBrown committed
      When adding a spare to an active array, we should check the slot
      number, but allow it to be larger than raid_disks if a reshape
      is being prepared.
      
      Apply the same test when adding a device to an
      array-under-construction.  It already had most of the test in place,
      but not quite all.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: handle manually-added spares in start_reshape. · 1a940fce
      NeilBrown committed
      It is possible to manually add spares to specific slots before
      starting a reshape.
      raid5_start_reshape should recognise this possibility and include
      it in the accounting.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: fix sync_completed reporting for very large drives (>2TB) · 13ae864b
      Rémi Rérolle committed
      The values exported in the sync_completed file are unsigned long, which
      overflows with very large drives, resulting in wrong values reported.
      
      Since sync_completed uses sectors as unit, we'll start getting wrong
      values with components larger than 2TB.
      
      This patch simply replaces the use of unsigned long by unsigned long long.
      Signed-off-by: Rémi Rérolle <rrerolle@lacie.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
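The overflow is easy to demonstrate with fixed-width types standing in for a 32-bit kernel's unsigned long versus unsigned long long: a 2 TiB component is exactly 2^32 512-byte sectors, which wraps a 32-bit counter to 0.

```c
#include <assert.h>
#include <stdint.h>

/* uint32_t models 'unsigned long' on a 32-bit kernel (the bug);
 * uint64_t models 'unsigned long long' (the fix). */
static uint32_t sectors_32(uint64_t bytes) { return (uint32_t)(bytes >> 9); }
static uint64_t sectors_64(uint64_t bytes) { return bytes >> 9; }
```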
    • md: allow suspend_lo and suspend_hi to decrease as well as increase. · 23ddff37
      NeilBrown committed
      The sysfs attributes 'suspend_lo' and 'suspend_hi' describe a region
      in which reads and writes are suspended so that the underlying data can
      be manipulated without user-space noticing.
      Currently the window they describe can only move forwards along the
      device.  However this is an unnecessary restriction which will cause
      problems with planned developments.
      So relax this restriction and allow these endpoints to move
      arbitrarily.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: Don't let implementation detail of curr_resync leak out through sysfs. · 75d3da43
      NeilBrown committed
      mddev->curr_resync has artificial values of '1' and '2' which are used
      by the code which ensures only one resync is happening at a time on
      any given device.
      
      These values are internal and should never be exposed to user-space
      (except when translated appropriately as in the 'pending' status in
      /proc/mdstat).
      
      Unfortunately they are exposed, because ->curr_resync is assigned to
      ->curr_resync_completed and that value is directly visible through
      sysfs.
      
      So change the assignments to ->curr_resync_completed to get the same
      value from elsewhere, in a form that doesn't have the magic '1' or '2'
      values.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: separate meta and data devs · a6ff7e08
      Jonathan Brassow committed
      Allow the metadata to be on a separate device from the
      data.
      
      This doesn't mean the data and metadata will be on separate
      physical devices - it simply gives device-mapper and userspace
      tools more flexibility.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md-new-param-to_sync_page_io · ccebd4c4
      Jonathan Brassow committed
      Add new parameter to 'sync_page_io'.
      
      The new parameter allows us to distinguish between metadata and data
      operations.  This becomes important later when we add the ability to
      use separate devices for data and metadata.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
    • md-new-param-to-calc_dev_sboffset · 57b2caa3
      Jonathan Brassow committed
      When we allow for separate devices for data and metadata
      in a later patch, we will need to be able to calculate
      the superblock offset based on more than the bdev.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
    • md: Be more careful about clearing flags bit in ->recovery · 7ebc0be7
      NeilBrown committed
      Setting ->recovery to 0 is generally not a good idea as it could clear
      bits that shouldn't be cleared.  In particular, MD_RECOVERY_FROZEN
      should only be cleared on explicit request from user-space.
      
      So when we need to clear things, just clear the bits that need
      clearing.
      
      As there are a few different places which reap a resync process - and
      some do an incomplete job - factor out the code for doing this from
      md_check_recovery and call that function instead of open coding part
      of it.
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reported-by: Jonathan Brassow <jbrassow@redhat.com>
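The "clear only what needs clearing" idea looks like this in miniature. The bit names mirror md's MD_RECOVERY_* flags, but the values and the helper are illustrative, not the kernel's:

```c
#include <assert.h>

#define RECOVERY_RUNNING (1UL << 0)
#define RECOVERY_DONE    (1UL << 2)
#define RECOVERY_FROZEN  (1UL << 3)  /* only user-space should clear this */

/* Writing recovery = 0 would also wipe FROZEN; instead clear just the
 * bits belonging to the finished resync. */
static unsigned long reap_sync_thread(unsigned long recovery)
{
    return recovery & ~(RECOVERY_RUNNING | RECOVERY_DONE);
}
```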
    • md: md_stop_writes requires mddev_lock. · defad61a
      NeilBrown committed
      As md_stop_writes manipulates the sync_thread and calls md_update_sb,
      it needs to be called with mddev_lock held.
      
      In all internal cases it is, but the symbol is exported for dm-raid to
      call, and in that case the lock won't be held.
      So make an exported version which takes the lock, and an internal
      version which does not.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: use sysfs_notify_dirent_safe to avoid NULL pointer · 43c73ca4
      Jonathan Brassow committed
      With the module parameter 'start_dirty_degraded' set,
      raid5_spare_active() previously called sysfs_notify_dirent() with a NULL
      argument (rdev->sysfs_state) when a rebuild finished.
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
    • md: Ensure no IO request to get md device before it is properly initialised. · 0ca69886
      NeilBrown committed
      When an md device is in the process of coming on line it is possible
      for an IO request (typically a partition table probe) to get through
      before the array is fully initialised, which can cause unexpected
      behaviour (e.g. a crash).
      
      So explicitly record when the array is ready for IO and don't allow IO
      through until then.
      
      There is no possibility for a similar problem when the array is going
      off-line as there must only be one 'open' at that time, and it is busy
      off-lining the array and so cannot send IO requests.  So no memory
      barrier is needed in md_stop().
      
      This has been a bug since commit 409c57f3 in 2.6.30 which
      introduced md_make_request.  Before then, each personality would
      register its own make_request_fn when it was ready.
      This is suitable for any stable kernel from 2.6.30.y onwards.
      
      Cc: <stable@kernel.org>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Reported-by: "Hawrylewicz Czarnowski, Przemyslaw" <przemyslaw.hawrylewicz.czarnowski@intel.com>
    • 067032bc
    • md: fix regression resulting in delays in clearing bits in a bitmap · 6c987910
      NeilBrown committed
      commit 589a594b (2.6.37-rc4) fixed a problem where md_thread would
      sometimes call the ->run function at a bad time.
      
      If an error is detected during array start up after the md_thread has
      been started, the md_thread is killed.  This resulted in the ->run
      function being called once.  However the array may not be in a state
      in which it is safe to call ->run.
      
      However the fix imposed meant that ->run was not called on a timeout.
      This means that when an array goes idle, bitmap bits do not get
      cleared promptly.  While the array is busy the bits will still be
      cleared when appropriate so this is not very serious.  There is no
      risk to data.
      
      Change the test so that we only avoid calling ->run when the thread
      is being stopped.  This more explicitly addresses the problem situation.
      
      This is suitable for 2.6.37-stable and any -stable kernel to which
      589a594b was applied.
      
      Cc: stable@kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
    • dm: raid456 basic support · 9d09e663
      NeilBrown committed
      This patch is the skeleton for the DM target that will be
      the bridge from DM to MD (initially RAID456 and later RAID1).  It
      provides a way to use device-mapper interfaces to the MD RAID456
      drivers.
      
      As with all device-mapper targets, the nominal public interfaces are the
      constructor (CTR) tables and the status outputs (both STATUSTYPE_INFO
      and STATUSTYPE_TABLE).  The CTR table looks like the following:
      
      1: <s> <l> raid \
      2:	<raid_type> <#raid_params> <raid_params> \
      3:	<#raid_devs> <meta_dev1> <dev1> .. <meta_devN> <devN>
      
      Line 1 contains the standard first three arguments to any device-mapper
      target - the start, length, and target type fields.  The target type in
      this case is "raid".
      
      Line 2 contains the arguments that define the particular raid
      type/personality/level, the required arguments for that raid type, and
      any optional arguments.  Possible raid types include: raid4, raid5_la,
      raid5_ls, raid5_rs, raid6_zr, raid6_nr, and raid6_nc.  (again, raid1 is
      planned for the future.)  The list of required and optional parameters
      is the same for all the current raid types.  The required parameters are
      positional, while the optional parameters are given as key/value pairs.
      The possible parameters are as follows:
       <chunk_size>		Chunk size in sectors.
       [[no]sync]		Force/Prevent RAID initialization
       [rebuild <idx>]	Rebuild the drive indicated by the index
       [daemon_sleep <ms>]	Time between bitmap daemon work to clear bits
       [min_recovery_rate <kB/sec/disk>]	Throttle RAID initialization
       [max_recovery_rate <kB/sec/disk>]	Throttle RAID initialization
       [max_write_behind <value>]		See '-write-behind=' (man mdadm)
       [stripe_cache <sectors>]		Stripe cache size for higher RAIDs
      
      Line 3 contains the list of devices that compose the array in
      metadata/data device pairs.  If the metadata is stored separately, a '-'
      is given for the metadata device position.  If a drive has failed or is
      missing at creation time, a '-' can be given for both the metadata and
      data drives for a given position.
      
      Examples:
      # RAID4 - 4 data drives, 1 parity
      # No metadata devices specified to hold superblock/bitmap info
      # Chunk size of 1MiB
      # (Lines separated for easy reading)
      0 1960893648 raid \
      	raid4 1 2048 \
      	5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
      
      # RAID4 - 4 data drives, 1 parity (no metadata devices)
      # Chunk size of 1MiB, force RAID initialization,
      #	min recovery rate at 20 kiB/sec/disk
      0 1960893648 raid \
              raid4 4 2048 min_recovery_rate 20 sync \
              5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81
      
      Performing a 'dmsetup table' should display the CTR table used to
      construct the mapping (with possible reordering of optional
      parameters).
      
      Performing a 'dmsetup status' will yield information on the state and
      health of the array.  The output is as follows:
      1: <s> <l> raid \
      2:	<raid_type> <#devices> <1 health char for each dev> <resync_ratio>
      
      Line 1 is standard DM output.  Line 2 is best shown by example:
      	0 1960893648 raid raid4 5 AAAAA 2/490221568
      Here we can see the RAID type is raid4, there are 5 devices - all of
      which are 'A'live, and the array is 2/490221568 complete with its recovery.
      
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: per target unplug callback support · 99d03c14
      NeilBrown committed
      Add per-target unplug callback support.
      
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm: introduce target callbacks and congestion callback · 9d357b07
      NeilBrown committed
      DM currently implements congestion checking by checking on congestion
      in each component device.  For raid456 we need to also check if the
      stripe cache is congested.
      
      Add per-target congestion checker callback support.
      
      Extending the target_callbacks structure with additional callback
      functions allows for establishing multiple callbacks per-target (a
      callback is also needed for unplug).
      
      Cc: linux-raid@vger.kernel.org
      Signed-off-by: NeilBrown <neilb@suse.de>
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm mpath: delay activate_path retry on SCSI_DH_RETRY · 4e2d19e4
      Chandra Seetharaman committed
      This patch adds a user-configurable 'pg_init_delay_msecs' feature.  Use
      this feature to specify the number of milliseconds to delay before
      retrying scsi_dh_activate, when SCSI_DH_RETRY is returned.
      
      SCSI Device Handlers return SCSI_DH_IMM_RETRY if we could retry
      activation immediately and SCSI_DH_RETRY in cases where it is better to
      retry after some delay.
      
      Currently we immediately retry scsi_dh_activate irrespective of
      SCSI_DH_IMM_RETRY and SCSI_DH_RETRY.
      
      The 'pg_init_delay_msecs' feature may be provided during table create or
      load, e.g.:
          dmsetup create --table "0 20971520 multipath 3 queue_if_no_path \
      	pg_init_delay_msecs 2500 ..." mpatha
      
      The default for 'pg_init_delay_msecs' is 2000 milliseconds.
      Maximum configurable delay is 60000 milliseconds.  Specifying a
      'pg_init_delay_msecs' of 0 will cause immediate retry.
      Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Acked-by: Mike Christie <michaelc@cs.wisc.edu>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>