1. 06 11月, 2013 1 次提交
    • H
      dm mpath: requeue I/O during pg_init · b63349a7
      Hannes Reinecke 提交于
      When pg_init is running no I/O can be submitted to the underlying
      devices, as the path priority etc might change.  When using queue_io for
      this, requests will be piling up within multipath as the block I/O
      scheduler just sees a _very fast_ device.  All of this queued I/O has to
      be resubmitted from within multipathing once pg_init is done.
      
      This approach has the problem that it's virtually impossible to
      abort I/O when pg_init is running, and we're adding heavy load
      to the devices after pg_init since all of the queued I/O needs to be
      resubmitted _before_ any requests can be pulled off of the request queue
      and normal operation continues.
      
      This patch will requeue the I/O that triggers the pg_init call, and
      return 'busy' when pg_init is in progress.  With these changes the block
      I/O scheduler will stop submitting I/O during pg_init, resulting in a
      quicker path switch and less I/O pressure (and memory consumption) after
      pg_init.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      [patch header edited for clarity and typos by Mike Snitzer]
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      b63349a7
  2. 01 11月, 2013 1 次提交
    • S
      dm mpath: fix race condition between multipath_dtr and pg_init_done · 954a73d5
      Shiva Krishna Merla 提交于
      Whenever multipath_dtr() is happening we must prevent queueing any
      further path activation work.  Implement this by adding a new
      'pg_init_disabled' flag to the multipath structure that denotes future
      path activation work should be skipped if it is set.  By disabling
      pg_init and then re-enabling in flush_multipath_work() we also avoid the
      potential for pg_init to be initiated while suspending an mpath device.
      
      Without this patch a race condition exists that may result in a kernel
      panic:
      
      1) If after pg_init_done() decrements pg_init_in_progress to 0, a call
         to wait_for_pg_init_completion() assumes there are no more pending path
         management commands.
      2) If pg_init_required is set by pg_init_done(), due to retryable
         mode_select errors, then process_queued_ios() will again queue the
         path activation work.
      3) If free_multipath() completes before activate_path() work is called a
         NULL pointer dereference like the following can be seen when
         accessing members of the recently destructed multipath:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
      RIP: 0010:[<ffffffffa003db1b>]  [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath]
      [<ffffffff81090ac0>] worker_thread+0x170/0x2a0
      [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
      
      [switch to disabling pg_init in flush_multipath_work & header edits by Mike Snitzer]
      Signed-off-by: NShiva Krishna Merla <shivakrishna.merla@netapp.com>
      Reviewed-by: NKrishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com>
      Tested-by: NSpeagle Andy <Andy.Speagle@netapp.com>
      Acked-by: NJunichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      954a73d5
  3. 23 9月, 2013 1 次提交
  4. 20 9月, 2013 1 次提交
    • M
      dm mpath: disable WRITE SAME if it fails · f84cb8a4
      Mike Snitzer 提交于
      Workaround the SCSI layer's problematic WRITE SAME heuristics by
      disabling WRITE SAME in the DM multipath device's queue_limits if an
      underlying device disabled it.
      
      The WRITE SAME heuristics, with both the original commit 5db44863
      ("[SCSI] sd: Implement support for WRITE SAME") and the updated commit
      66c28f97 ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling
      WRITE SAME(10) even without successfully determining it is supported.
      After the first failed WRITE SAME the SCSI layer will disable WRITE SAME
      for the device (by setting sdkp->device->no_write_same which results in
      'max_write_same_sectors' in device's queue_limits to be set to 0).
      
      When a device is stacked ontop of such a SCSI device any changes to that
      SCSI device's queue_limits do not automatically propagate up the stack.
      As such, a DM multipath device will not have its WRITE SAME support
      disabled.  This causes the block layer to continue to issue WRITE SAME
      requests to the mpath device which causes paths to fail and (if mpath IO
      isn't configured to queue when no paths are available) it will result in
      actual IO errors to the upper layers.
      
      This fix doesn't help configurations that have additional devices
      stacked ontop of the mpath device (e.g. LVM created linear DM devices
      ontop).  A proper fix that restacks all the queue_limits from the bottom
      of the device stack up will need to be explored if SCSI will continue to
      use this model of optimistically allowing op codes and then disabling
      them after they fail for the first time.
      
      Before this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      device-mapper: multipath: Failing path 8:112.
      end_request: I/O error, dev dm-6, sector 4616
      dm-6: WRITE SAME failed. Manually zeroing.
      end_request: I/O error, dev dm-6, sector 4616
      end_request: I/O error, dev dm-6, sector 5640
      end_request: I/O error, dev dm-6, sector 6664
      end_request: I/O error, dev dm-6, sector 7688
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      end_request: I/O error, dev dm-6, sector 524296
      Aborting journal on device dm-6-8.
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      33553920
      
      After this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      0
      
      It should be noted that WRITE SAME support wasn't enabled in DM
      multipath until v3.10.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.10+
      f84cb8a4
  5. 19 9月, 2013 1 次提交
  6. 24 8月, 2013 1 次提交
  7. 11 7月, 2013 1 次提交
    • H
      dm mpath: fix ioctl deadlock when no paths · 6c182cd8
      Hannes Reinecke 提交于
      When multipath needs to retry an ioctl the reference to the
      current live table needs to be dropped. Otherwise a deadlock
      occurs when all paths are down:
      - dm_blk_ioctl takes a reference to the current table
        and spins in multipath_ioctl().
      - A new table is being loaded, but upon resume the process
        hangs in dm_table_destroy() waiting for references to
        drop to zero.
      
      With this patch the reference to the old table is dropped
      prior to retry, thereby avoiding the deadlock.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      6c182cd8
  8. 10 5月, 2013 1 次提交
  9. 02 3月, 2013 2 次提交
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon 提交于
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      55a62eef
    • M
      dm: fix truncated status strings · fd7c092e
      Mikulas Patocka 提交于
      Avoid returning a truncated table or status string instead of setting
      the DM_BUFFER_FULL_FLAG when the last target of a table fills the
      buffer.
      
      When processing a table or status request, the function retrieve_status
      calls ti->type->status. If ti->type->status returns non-zero,
      retrieve_status assumes that the buffer overflowed and sets
      DM_BUFFER_FULL_FLAG.
      
      However, targets don't return non-zero values from their status method
      on overflow. Most targets returns always zero.
      
      If a buffer overflow happens in a target that is not the last in the
      table, it gets noticed during the next iteration of the loop in
      retrieve_status; but if a buffer overflow happens in the last target, it
      goes unnoticed and erroneously truncated data is returned.
      
      In the current code, the targets behave in the following way:
      * dm-crypt returns -ENOMEM if there is not enough space to store the
        key, but it returns 0 on all other overflows.
      * dm-thin returns errors from the status method if a disk error happened.
        This is incorrect because retrieve_status doesn't check the error
        code, it assumes that all non-zero values mean buffer overflow.
      * all the other targets always return 0.
      
      This patch changes the ti->type->status function to return void (because
      most targets don't use the return code). Overflow is detected in
      retrieve_status: if the status method fills up the remaining space
      completely, it is assumed that buffer overflow happened.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fd7c092e
  10. 12 10月, 2012 1 次提交
  11. 27 9月, 2012 1 次提交
    • M
      dm mpath: only retry ioctl when no paths if queue_if_no_path set · 7ba10aa6
      Mike Snitzer 提交于
      When there are no paths and multipath receives an ioctl, it waits until
      a path becomes available.  This behaviour is incorrect if the
      "queue_if_no_path" setting was not specified, as then the ioctl should
      be rejected immediately, which this patch now does.
      
      commit 35991652 ("dm mpath: allow ioctls to trigger pg init") should
      have checked if queue_if_no_path was configured before queueing IO.
      
      Checking for the queue_if_no_path feature, like is done in map_io(),
      allows the following table load to work without blocking in the
      multipath_ioctl retry loop:
      
        echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs
      
      Without this fix the multipath_ioctl will block with the following stack
      trace:
      
        blkid           D 0000000000000002     0 23936      1 0x00000000
         ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
         ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
         ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
        Call Trace:
         [<ffffffff814ce099>] schedule+0x29/0x70
         [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
         [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
         [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
         [<ffffffff8104f840>] msleep+0x20/0x30
         [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
         [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
         [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
         [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
         [<ffffffff811970ac>] block_ioctl+0x3c/0x40
         [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
         [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
         [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
         [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.5+
      Acked-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      7ba10aa6
  12. 21 8月, 2012 1 次提交
    • T
      workqueue: deprecate flush[_delayed]_work_sync() · 43829731
      Tejun Heo 提交于
      flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
      and convert all users to flush[_delayed]_work().
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant and the regular flushes guarantee that the work item is
      not pending or running on any CPU on return, so there's no reason to
      use the sync flushes at all and they're going away.
      
      This patch doesn't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mattia Dongili <malattia@linux.it>
      Cc: Kent Yoder <key@linux.vnet.ibm.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Bryan Wu <bryan.wu@canonical.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-wireless@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: Sangbeom Kim <sbkim73@samsung.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Avi Kivity <avi@redhat.com> 
      43829731
  13. 27 7月, 2012 2 次提交
    • A
      dm thin: commit before gathering status · 1f4e0ff0
      Alasdair G Kergon 提交于
      Commit outstanding metadata before returning the status for a dm thin
      pool so that the numbers reported are as up-to-date as possible.
      
      The commit is not performed if the device is suspended or if
      the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
      through a new 'status_flags' parameter in the target's dm_status_fn.
      
      The userspace dmsetup tool will support the --noflush flag with the
      'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
      onwards.
      Tested-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      1f4e0ff0
    • M
      dm mpath: add retain_attached_hw_handler feature · a58a935d
      Mike Snitzer 提交于
      A SCSI device handler might get attached to a device during the
      initial device scan.  We do not necessarily want to override
      this when loading a multipath table, so this patch adds a new
      multipath feature argument "retain_attached_hw_handler".
      
      During SCSI device scan all loaded SCSI device handlers will be
      consulted for a match (via scsi_dh's provided .match).  If a match is
      found that device handler will be attached.  We need a way to have
      userspace multipathd's provided 'hw_handler' not override the already
      attached hardware handler.
      
      When specifying the new feature 'retain_attached_hw_handler' multipath
      will use the currently attached hardware handler instead of trying to
      attach the one specified during table load.  If no hardware handler is
      attached the specified hardware handler will still be used.
      
      Leverages scsi_dh_attach's ability to increment the scsi_dh's reference
      count if the same scsi_dh name is provided when attaching - currently
      attached scsi_dh name is determined with scsi_dh_attached_handler_name.
      
      Depends upon commit 7e8a74b1
      ("[SCSI] scsi_dh: add scsi_dh_attached_handler_name").
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NBabu Moger <babu.moger@netapp.com>
      Reviewed-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Acked-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a58a935d
  14. 03 6月, 2012 3 次提交
    • M
      dm mpath: allow ioctls to trigger pg init · 35991652
      Mikulas Patocka 提交于
      After the failure of a group of paths, any alternative paths that
      need initialising do not become available until further I/O is sent to
      the device.  Until this has happened, ioctls return -EAGAIN.
      
      With this patch, new paths are made available in response to an ioctl
      too.  The processing of the ioctl gets delayed until this has happened.
      
      Instead of returning an error, we submit a work item to kmultipathd
      (that will potentially activate the new path) and retry in ten
      milliseconds.
      
      Note that the patch doesn't retry an ioctl if the ioctl itself fails due
      to a path failure.  Such retries should be handled intelligently by the
      code that generated the ioctl in the first place, noting that some SCSI
      commands should not be retried because they are not idempotent (XOR write
      commands).  For commands that could be retried, there is a danger that
      if the device rejected the SCSI command, the path could be errorneously
      marked as failed, and the request would be retried on another path which
      might fail too.  It can be determined if the failure happens on the
      device or on the SCSI controller, but there is no guarantee that all
      SCSI drivers set these flags correctly.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      35991652
    • M
      dm mpath: delay retry of bypassed pg · f220fd4e
      Mike Christie 提交于
      If I/O needs retrying and only bypassed priority groups are available,
      set the pg_init_delay_retry flag to wait before retrying.
      
      If, for example, the reason for the bypass is that the controller is
      getting reset or there is a firmware upgrade happening, retrying right
      away would cause a flood of log messages and retries for what could be a
      few seconds or even several minutes.
      Signed-off-by: NMike Christie <michaelc@cs.wisc.edu>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f220fd4e
    • M
      dm mpath: reduce size of struct multipath · 1fbdd2b3
      Mike Snitzer 提交于
      Move multipath structure's 'lock' and 'queue_size' members to eliminate
      two 4-byte holes.  Also use a bit within a single unsigned int for each
      existing flag (saves 8-bytes).  This allows future flags to be added
      without each consuming an unsigned int.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      1fbdd2b3
  15. 12 5月, 2012 1 次提交
  16. 29 3月, 2012 2 次提交
  17. 15 1月, 2012 1 次提交
  18. 02 8月, 2011 2 次提交
  19. 27 7月, 2011 1 次提交
  20. 29 5月, 2011 1 次提交
  21. 24 3月, 2011 2 次提交
    • M
      dm mpath: allow table load with no priority groups · a490a07a
      Mike Snitzer 提交于
      This patch adjusts the multipath target to allow a table with both 0
      priority groups and 0 for the initial priority group number.
      
      If any mpath device is held open when all paths in the last priority
      group have failed, userspace multipathd will attempt to reload the
      associated DM table to reflect the fact that the device no longer has
      any priority groups.  But the reload attempt always failed because the
      multipath target did not allow 0 priority groups.
      
      All multipath target messages related to priority group (enable_group,
      disable_group, switch_group) will handle a priority group of 0 (will
      cause error).
      
      When reloading a multipath table with 0 priority groups, userspace
      multipathd must be updated to specify an initial priority group number
      of 0 (rather than 1).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Babu Moger <babu.moger@lsi.com>
      Acked-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a490a07a
    • M
      dm mpath: fail message ioctl if specified path is not valid · 19040c0b
      Mike Snitzer 提交于
      Fail the reinstate_path and fail_path message ioctl if the specified
      path is not valid.
      
      The message ioctl would succeed for the 'reinistate_path' and
      'fail_path' messages even if action was not taken because the
      specified device was not a valid path of the multipath device.
      
      Before, when /dev/vdb is not a path of mpathb:
      $ dmsetup message mpathb 0 reinstate_path /dev/vdb
      $ echo $?
      0
      
      After:
      $ dmsetup message mpathb 0 reinstate_path /dev/vdb
      device-mapper: message ioctl failed: Invalid argument
      Command failed
      $ echo $?
      1
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      19040c0b
  22. 13 2月, 2011 1 次提交
  23. 14 1月, 2011 4 次提交
  24. 12 8月, 2010 2 次提交
  25. 06 3月, 2010 5 次提交