1. 01 11月, 2015 1 次提交
    • M
      Revert "dm mpath: fix stalls when handling invalid ioctls" · 47796938
      Mauricio Faria de Oliveira 提交于
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      47796938
  2. 29 8月, 2015 2 次提交
  3. 28 5月, 2015 1 次提交
    • M
      dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path · 4c6dd53d
      Mike Snitzer 提交于
      Otherwise kmemleak reported:
      
      unreferenced object 0xffff88009b14e2b0 (size 16):
        comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
        hex dump (first 16 bytes):
          40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
        backtrace:
          [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
          [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
          [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
          [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
          [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
          [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
          [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
          [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
          [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
          [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
          [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
          [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
          [<ffffffff812c0ff2>] submit_bio+0x72/0x150
          [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
          [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
          [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.0+
      4c6dd53d
  4. 16 4月, 2015 2 次提交
    • M
      dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq · 02233342
      Mike Snitzer 提交于
      dm_mq_queue_rq() is in atomic context so care must be taken to not
      sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations
      and dm-mpath's call to blk_get_request().  In the future the bioset
      allocations will hopefully go away (by removing support for partial
      completions of bios in a cloned request).
      
      Also prepare for supporting DM blk-mq ontop of old-style request_fn
      device(s) if a new dm-mod 'use_blk_mq' parameter is set.  The kthread
      will still be used to queue work if blk-mq is used ontop of old-style
      request_fn device(s).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      02233342
    • M
      dm: add full blk-mq support to request-based DM · bfebd1cd
      Mike Snitzer 提交于
      Commit e5863d9a ("dm: allocate requests in target when stacking on
      blk-mq devices") served as the first step toward fully utilizing blk-mq
      in request-based DM -- it enabled stacking an old-style (request_fn)
      request_queue ontop of the underlying blk-mq device(s).  That first step
      didn't improve performance of DM multipath ontop of fast blk-mq devices
      (e.g. NVMe) because the top-level old-style request_queue was severely
      limited by the queue_lock.
      
      The second step offered here enables stacking a blk-mq request_queue
      ontop of the underlying blk-mq device(s).  This unlocks significant
      performance gains on fast blk-mq devices, Keith Busch tested on his NVMe
      testbed and offered this really positive news:
      
       "Just providing a performance update. All my fio tests are getting
        roughly equal performance whether accessed through the raw block
        device or the multipath device mapper (~470k IOPS). I could only push
        ~20% of the raw iops through dm before this conversion, so this latest
        tree is looking really solid from a performance standpoint."
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NKeith Busch <keith.busch@intel.com>
      bfebd1cd
  5. 01 4月, 2015 1 次提交
  6. 10 2月, 2015 3 次提交
    • J
      dm mpath: simplify failure path of dm_multipath_init() · ff658e9c
      Johannes Thumshirn 提交于
      Currently the cleanup of all error cases are open-coded.  Introduce a
      common exit path and labels.
      Signed-off-by: NJohannes Thumshirn <morbidrsa@gmail.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      ff658e9c
    • M
      dm: allocate requests in target when stacking on blk-mq devices · e5863d9a
      Mike Snitzer 提交于
      For blk-mq request-based DM the responsibility of allocating a cloned
      request is transfered from DM core to the target type.  Doing so
      enables the cloned request to be allocated from the appropriate
      blk-mq request_queue's pool (only the DM target, e.g. multipath, can
      know which block device to send a given cloned request to).
      
      Care was taken to preserve compatibility with old-style block request
      completion that requires request-based DM _not_ acquire the clone
      request's queue lock in the completion path.  As such, there are now 2
      different request-based DM target_type interfaces:
      1) the original .map_rq() interface will continue to be used for
         non-blk-mq devices -- the preallocated clone request is passed in
         from DM core.
      2) a new .clone_and_map_rq() and .release_clone_rq() will be used for
         blk-mq devices -- blk_get_request() and blk_put_request() are used
         respectively from these hooks.
      
      dm_table_set_type() was updated to detect if the request-based target is
      being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set.
      DM core disallows switching the DM table's type after it is set.  This
      means that there is no mixing of non-blk-mq and blk-mq devices within
      the same request-based DM table.
      
      [This patch was started by Keith and later heavily modified by Mike]
      Tested-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e5863d9a
    • K
      dm: submit stacked requests in irq enabled context · 2eb6e1e3
      Keith Busch 提交于
      Switch to having request-based DM enqueue all prep'ed requests into work
      processed by another thread.  This allows request-based DM to invoke
      block APIs that assume interrupt enabled context (e.g. blk_get_request)
      and is a prerequisite for adding blk-mq support to request-based DM.
      
      The new kernel thread is only initialized for request-based DM devices.
      
      multipath_map() is now always in irq enabled context so change multipath
      spinlock (m->lock) locking to always disable interrupts.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      2eb6e1e3
  7. 06 10月, 2014 1 次提交
  8. 02 8月, 2014 1 次提交
  9. 11 7月, 2014 1 次提交
    • J
      dm mpath: fix IO hang due to logic bug in multipath_busy · 7a7a3b45
      Jun'ichi Nomura 提交于
      Commit e8099177 ("dm mpath: push back requests instead of queueing")
      modified multipath_busy() to return true if !pg_ready().  pg_ready()
      checks the current state of the multipath device and may return false
      even if a new IO is needed to change the state.
      
      Bart Van Assche reported that he had multipath IO lockup when he was
      performing cable pull tests.  Analysis showed that the multipath
      device had a single path group with both paths active, but that the
      path group itself was not active.  During the multipath device state
      transitions 'queue_io' got set but nothing could clear it.  Clearing
      'queue_io' only happens in __choose_pgpath(), but it won't be called
      if multipath_busy() returns true due to pg_ready() returning false
      when 'queue_io' is set.
      
      As such the !pg_ready() check in multipath_busy() is wrong because new
      IO will not be sent to multipath target and the multipath state change
      won't happen.  That results in multipath IO lockup.
      
      The intent of multipath_busy() is to avoid unnecessary cycles of
      dequeue + request_fn + requeue if it is known that the multipath
      device will requeue.
      
      Such "busy" situations would be:
        - path group is being activated
        - there is no path and the multipath is setup to requeue if no path
      
      Fix multipath_busy() to return "busy" early only for these specific
      situations.
      Reported-by: NBart Van Assche <bvanassche@acm.org>
      Tested-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.15
      7a7a3b45
  10. 04 6月, 2014 1 次提交
    • M
      dm: disable WRITE SAME if it fails · 7eee4ae2
      Mike Snitzer 提交于
      Add DM core support for disabling WRITE SAME on first failure to both
      request-based and bio-based targets.  The need to disable WRITE SAME
      stems from SCSI enabling it by default but then disabling it when it
      fails.  When SCSI does this it returns "permanent target failure, do
      not retry" using -EREMOTEIO.  Update DM core to only disable WRITE SAME
      on failure if the returned error is -EREMOTEIO.
      
      Commit f84cb8a4 ("dm mpath: disable WRITE SAME if it fails")
      implemented multipath specific disabling of WRITE SAME if it fails.
      However, as that commit detailed, the multipath-only solution doesn't go
      far enough if bio-based DM targets are stacked ontop of the
      request-based dm-multipath target (as is commonly done using dm-linear
      to support partitions on multipath devices, via kpartx).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Tested-by: NAlex Chen <alex.chen@huawei.com>
      7eee4ae2
  11. 27 5月, 2014 1 次提交
  12. 15 5月, 2014 1 次提交
  13. 28 3月, 2014 8 次提交
  14. 26 2月, 2014 1 次提交
  15. 06 11月, 2013 1 次提交
    • H
      dm mpath: requeue I/O during pg_init · b63349a7
      Hannes Reinecke 提交于
      When pg_init is running no I/O can be submitted to the underlying
      devices, as the path priority etc might change.  When using queue_io for
      this, requests will be piling up within multipath as the block I/O
      scheduler just sees a _very fast_ device.  All of this queued I/O has to
      be resubmitted from within multipathing once pg_init is done.
      
      This approach has the problem that it's virtually impossible to
      abort I/O when pg_init is running, and we're adding heavy load
      to the devices after pg_init since all of the queued I/O needs to be
      resubmitted _before_ any requests can be pulled off of the request queue
      and normal operation continues.
      
      This patch will requeue the I/O that triggers the pg_init call, and
      return 'busy' when pg_init is in progress.  With these changes the block
      I/O scheduler will stop submitting I/O during pg_init, resulting in a
      quicker path switch and less I/O pressure (and memory consumption) after
      pg_init.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      [patch header edited for clarity and typos by Mike Snitzer]
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      b63349a7
  16. 01 11月, 2013 1 次提交
    • S
      dm mpath: fix race condition between multipath_dtr and pg_init_done · 954a73d5
      Shiva Krishna Merla 提交于
      Whenever multipath_dtr() is happening we must prevent queueing any
      further path activation work.  Implement this by adding a new
      'pg_init_disabled' flag to the multipath structure that denotes future
      path activation work should be skipped if it is set.  By disabling
      pg_init and then re-enabling in flush_multipath_work() we also avoid the
      potential for pg_init to be initiated while suspending an mpath device.
      
      Without this patch a race condition exists that may result in a kernel
      panic:
      
      1) If after pg_init_done() decrements pg_init_in_progress to 0, a call
         to wait_for_pg_init_completion() assumes there are no more pending path
         management commands.
      2) If pg_init_required is set by pg_init_done(), due to retryable
         mode_select errors, then process_queued_ios() will again queue the
         path activation work.
      3) If free_multipath() completes before activate_path() work is called a
         NULL pointer dereference like the following can be seen when
         accessing members of the recently destructed multipath:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
      RIP: 0010:[<ffffffffa003db1b>]  [<ffffffffa003db1b>] activate_path+0x1b/0x30 [dm_multipath]
      [<ffffffff81090ac0>] worker_thread+0x170/0x2a0
      [<ffffffff81096c80>] ? autoremove_wake_function+0x0/0x40
      
      [switch to disabling pg_init in flush_multipath_work & header edits by Mike Snitzer]
      Signed-off-by: NShiva Krishna Merla <shivakrishna.merla@netapp.com>
      Reviewed-by: NKrishnasamy Somasundaram <somasundaram.krishnasamy@netapp.com>
      Tested-by: NSpeagle Andy <Andy.Speagle@netapp.com>
      Acked-by: NJunichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      954a73d5
  17. 23 9月, 2013 1 次提交
  18. 20 9月, 2013 1 次提交
    • M
      dm mpath: disable WRITE SAME if it fails · f84cb8a4
      Mike Snitzer 提交于
      Workaround the SCSI layer's problematic WRITE SAME heuristics by
      disabling WRITE SAME in the DM multipath device's queue_limits if an
      underlying device disabled it.
      
      The WRITE SAME heuristics, with both the original commit 5db44863
      ("[SCSI] sd: Implement support for WRITE SAME") and the updated commit
      66c28f97 ("[SCSI] sd: Update WRITE SAME heuristics"), default to enabling
      WRITE SAME(10) even without successfully determining it is supported.
      After the first failed WRITE SAME the SCSI layer will disable WRITE SAME
      for the device (by setting sdkp->device->no_write_same which results in
      'max_write_same_sectors' in device's queue_limits to be set to 0).
      
      When a device is stacked ontop of such a SCSI device any changes to that
      SCSI device's queue_limits do not automatically propagate up the stack.
      As such, a DM multipath device will not have its WRITE SAME support
      disabled.  This causes the block layer to continue to issue WRITE SAME
      requests to the mpath device which causes paths to fail and (if mpath IO
      isn't configured to queue when no paths are available) it will result in
      actual IO errors to the upper layers.
      
      This fix doesn't help configurations that have additional devices
      stacked ontop of the mpath device (e.g. LVM created linear DM devices
      ontop).  A proper fix that restacks all the queue_limits from the bottom
      of the device stack up will need to be explored if SCSI will continue to
      use this model of optimistically allowing op codes and then disabling
      them after they fail for the first time.
      
      Before this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: failing WRITE SAME IO with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      device-mapper: multipath: Failing path 8:112.
      end_request: I/O error, dev dm-6, sector 4616
      dm-6: WRITE SAME failed. Manually zeroing.
      end_request: I/O error, dev dm-6, sector 4616
      end_request: I/O error, dev dm-6, sector 5640
      end_request: I/O error, dev dm-6, sector 6664
      end_request: I/O error, dev dm-6, sector 7688
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      end_request: I/O error, dev dm-6, sector 524296
      Aborting journal on device dm-6-8.
      end_request: I/O error, dev dm-6, sector 524288
      Buffer I/O error on device dm-6, logical block 65536
      lost page write due to I/O error on dm-6
      JBD2: Error -5 detected when updating journal superblock for dm-6-8.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      33553920
      
      After this patch:
      
      EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
      device-mapper: multipath: XXX snitm debugging: got -EREMOTEIO (-121)
      device-mapper: multipath: XXX snitm debugging: WRITE SAME I/O failed with error=-121
      end_request: critical target error, dev dm-6, sector 528
      dm-6: WRITE SAME failed. Manually zeroing.
      
      # cat /sys/block/sdh/queue/write_same_max_bytes
      0
      # cat /sys/block/dm-6/queue/write_same_max_bytes
      0
      
      It should be noted that WRITE SAME support wasn't enabled in DM
      multipath until v3.10.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: stable@vger.kernel.org # 3.10+
      f84cb8a4
  19. 19 9月, 2013 1 次提交
  20. 24 8月, 2013 1 次提交
  21. 11 7月, 2013 1 次提交
    • H
      dm mpath: fix ioctl deadlock when no paths · 6c182cd8
      Hannes Reinecke 提交于
      When multipath needs to retry an ioctl the reference to the
      current live table needs to be dropped. Otherwise a deadlock
      occurs when all paths are down:
      - dm_blk_ioctl takes a reference to the current table
        and spins in multipath_ioctl().
      - A new table is being loaded, but upon resume the process
        hangs in dm_table_destroy() waiting for references to
        drop to zero.
      
      With this patch the reference to the old table is dropped
      prior to retry, thereby avoiding the deadlock.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      6c182cd8
  22. 10 5月, 2013 1 次提交
  23. 02 3月, 2013 2 次提交
    • A
      dm: rename request variables to bios · 55a62eef
      Alasdair G Kergon 提交于
      Use 'bio' in the name of variables and functions that deal with
      bios rather than 'request' to avoid confusion with the normal
      block layer use of 'request'.
      
      No functional changes.
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      55a62eef
    • M
      dm: fix truncated status strings · fd7c092e
      Mikulas Patocka 提交于
      Avoid returning a truncated table or status string instead of setting
      the DM_BUFFER_FULL_FLAG when the last target of a table fills the
      buffer.
      
      When processing a table or status request, the function retrieve_status
      calls ti->type->status. If ti->type->status returns non-zero,
      retrieve_status assumes that the buffer overflowed and sets
      DM_BUFFER_FULL_FLAG.
      
      However, targets don't return non-zero values from their status method
      on overflow. Most targets returns always zero.
      
      If a buffer overflow happens in a target that is not the last in the
      table, it gets noticed during the next iteration of the loop in
      retrieve_status; but if a buffer overflow happens in the last target, it
      goes unnoticed and erroneously truncated data is returned.
      
      In the current code, the targets behave in the following way:
      * dm-crypt returns -ENOMEM if there is not enough space to store the
        key, but it returns 0 on all other overflows.
      * dm-thin returns errors from the status method if a disk error happened.
        This is incorrect because retrieve_status doesn't check the error
        code, it assumes that all non-zero values mean buffer overflow.
      * all the other targets always return 0.
      
      This patch changes the ti->type->status function to return void (because
      most targets don't use the return code). Overflow is detected in
      retrieve_status: if the status method fills up the remaining space
      completely, it is assumed that buffer overflow happened.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      fd7c092e
  24. 12 10月, 2012 1 次提交
  25. 27 9月, 2012 1 次提交
    • M
      dm mpath: only retry ioctl when no paths if queue_if_no_path set · 7ba10aa6
      Mike Snitzer 提交于
      When there are no paths and multipath receives an ioctl, it waits until
      a path becomes available.  This behaviour is incorrect if the
      "queue_if_no_path" setting was not specified, as then the ioctl should
      be rejected immediately, which this patch now does.
      
      commit 35991652 ("dm mpath: allow ioctls to trigger pg init") should
      have checked if queue_if_no_path was configured before queueing IO.
      
      Checking for the queue_if_no_path feature, like is done in map_io(),
      allows the following table load to work without blocking in the
      multipath_ioctl retry loop:
      
        echo "0 1024 multipath 0 0 0 0" | dmsetup create mpath_nodevs
      
      Without this fix the multipath_ioctl will block with the following stack
      trace:
      
        blkid           D 0000000000000002     0 23936      1 0x00000000
         ffff8802b89e5cd8 0000000000000082 ffff8802b89e5fd8 0000000000012440
         ffff8802b89e4010 0000000000012440 0000000000012440 0000000000012440
         ffff8802b89e5fd8 0000000000012440 ffff88030c2aab30 ffff880325794040
        Call Trace:
         [<ffffffff814ce099>] schedule+0x29/0x70
         [<ffffffff814cc312>] schedule_timeout+0x182/0x2e0
         [<ffffffff8104dee0>] ? lock_timer_base+0x70/0x70
         [<ffffffff814cc48e>] schedule_timeout_uninterruptible+0x1e/0x20
         [<ffffffff8104f840>] msleep+0x20/0x30
         [<ffffffffa0000839>] multipath_ioctl+0x109/0x170 [dm_multipath]
         [<ffffffffa06bfb9c>] dm_blk_ioctl+0xbc/0xd0 [dm_mod]
         [<ffffffff8122a408>] __blkdev_driver_ioctl+0x28/0x30
         [<ffffffff8122a79e>] blkdev_ioctl+0xce/0x730
         [<ffffffff811970ac>] block_ioctl+0x3c/0x40
         [<ffffffff8117321c>] do_vfs_ioctl+0x8c/0x340
         [<ffffffff81166293>] ? sys_newfstat+0x33/0x40
         [<ffffffff81173571>] sys_ioctl+0xa1/0xb0
         [<ffffffff814d70a9>] system_call_fastpath+0x16/0x1b
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 3.5+
      Acked-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      7ba10aa6
  26. 21 8月, 2012 1 次提交
    • T
      workqueue: deprecate flush[_delayed]_work_sync() · 43829731
      Tejun Heo 提交于
      flush[_delayed]_work_sync() are now spurious.  Mark them deprecated
      and convert all users to flush[_delayed]_work().
      
      If you're cc'd and wondering what's going on: Now all workqueues are
      non-reentrant and the regular flushes guarantee that the work item is
      not pending or running on any CPU on return, so there's no reason to
      use the sync flushes at all and they're going away.
      
      This patch doesn't make any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Mattia Dongili <malattia@linux.it>
      Cc: Kent Yoder <key@linux.vnet.ibm.com>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Bryan Wu <bryan.wu@canonical.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: linux-wireless@vger.kernel.org
      Cc: Anton Vorontsov <cbou@mail.ru>
      Cc: Sangbeom Kim <sbkim73@samsung.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Avi Kivity <avi@redhat.com> 
      43829731
  27. 27 7月, 2012 2 次提交
    • A
      dm thin: commit before gathering status · 1f4e0ff0
      Alasdair G Kergon 提交于
      Commit outstanding metadata before returning the status for a dm thin
      pool so that the numbers reported are as up-to-date as possible.
      
      The commit is not performed if the device is suspended or if
      the DM_NOFLUSH_FLAG is supplied by userspace and passed to the target
      through a new 'status_flags' parameter in the target's dm_status_fn.
      
      The userspace dmsetup tool will support the --noflush flag with the
      'dmsetup status' and 'dmsetup wait' commands from version 1.02.76
      onwards.
      Tested-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      1f4e0ff0
    • M
      dm mpath: add retain_attached_hw_handler feature · a58a935d
      Mike Snitzer 提交于
      A SCSI device handler might get attached to a device during the
      initial device scan.  We do not necessarily want to override
      this when loading a multipath table, so this patch adds a new
      multipath feature argument "retain_attached_hw_handler".
      
      During SCSI device scan all loaded SCSI device handlers will be
      consulted for a match (via scsi_dh's provided .match).  If a match is
      found that device handler will be attached.  We need a way to have
      userspace multipathd's provided 'hw_handler' not override the already
      attached hardware handler.
      
      When specifying the new feature 'retain_attached_hw_handler' multipath
      will use the currently attached hardware handler instead of trying to
      attach the one specified during table load.  If no hardware handler is
      attached the specified hardware handler will still be used.
      
      Leverages scsi_dh_attach's ability to increment the scsi_dh's reference
      count if the same scsi_dh name is provided when attaching - currently
      attached scsi_dh name is determined with scsi_dh_attached_handler_name.
      
      Depends upon commit 7e8a74b1
      ("[SCSI] scsi_dh: add scsi_dh_attached_handler_name").
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NBabu Moger <babu.moger@netapp.com>
      Reviewed-by: NChandra Seetharaman <sekharan@us.ibm.com>
      Acked-by: NHannes Reinecke <hare@suse.de>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a58a935d