1. 23 2月, 2016 10 次提交
  2. 18 11月, 2015 2 次提交
    • J
      dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path · 43e43c9e
      Junichi Nomura 提交于
      In multipath_prepare_ioctl(),
        - pgpath is a path selected from available paths
        - m->queue_io is true if we cannot send a request immediately to
          paths, either because:
            * there is no available path
            * the path group needs activation (pg_init)
                - pg_init is not started
                - pg_init is still running
        - m->queue_if_no_path is true if the device is configured to queue
          I/O if there are no available paths
      
      If !pgpath && !m->queue_if_no_path, the handler should return -EIO.
      However in the course of refactoring the condition check has broken
      and returns success in that case.  Since bdev points to the dm device
      itself, dm_blk_ioctl() calls __blk_dev_driver_ioctl() for itself and
      recurses until crash.
      
      You could reproduce the problem like this:
      
        # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
        # sg_inq /dev/mapper/mp
        <crash>
        [  172.648615] BUG: unable to handle kernel paging request at fffffffc81b10268
        [  172.662843] PGD 19dd067 PUD 0
        [  172.666269] Thread overran stack, or stack corrupted
        [  172.671808] Oops: 0000 [#1] SMP
        ...
      
      Fix the condition check with some clarifications.
      
      Fixes: e56f81e0 ("dm: refactor ioctl handling")
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      43e43c9e
    • J
      dm: fix ioctl retry termination with signal · 5bbbfdf6
      Junichi Nomura 提交于
      dm-mpath retries ioctl, when no path is readily available and the device
      is configured to queue I/O in such a case. If you want to stop the retry
      before multipathd decides to turn off queueing mode, you could send
      signal for the process to exit from the loop.
      
      However the check of fatal signal has not carried along when commit
      6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") moved the
      loop from dm-mpath to dm core. As a result, we can't terminate such
      a process in the retry loop.
      
      Easy reproducer of the situation is:
      
        # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
        # dmsetup message mp 0 'queue_if_no_path'
        # sg_inq /dev/mapper/mp
      
      then you should be able to terminate sg_inq by pressing Ctrl+C.
      
      Fixes: 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths")
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      5bbbfdf6
  3. 01 11月, 2015 3 次提交
    • C
      dm: add support for passing through persistent reservations · 71cdb697
      Christoph Hellwig 提交于
      This adds support to pass through persistent reservation requests
      similar to the existing ioctl handling, and with the same limitations,
      e.g. devices may only have a single target attached.
      
      This is mostly intended for multipathing.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      71cdb697
    • C
      dm: refactor ioctl handling · e56f81e0
      Christoph Hellwig 提交于
      This moves the call to blkdev_ioctl and the argument checking to DM core
      code, and only leaves a callout to find the block device to operate on
      in the targets.  This simplifies the code and allows us to pass through
      ioctl-like command using other methods in the next patch.
      
      Also split out a helper around calling the prepare_ioctl method that
      will be reused for persistent reservation handling.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e56f81e0
    • M
      Revert "dm mpath: fix stalls when handling invalid ioctls" · 47796938
      Mauricio Faria de Oliveira 提交于
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      47796938
  4. 29 8月, 2015 2 次提交
  5. 28 5月, 2015 1 次提交
    • M
      dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path · 4c6dd53d
      Mike Snitzer 提交于
      Otherwise kmemleak reported:
      
      unreferenced object 0xffff88009b14e2b0 (size 16):
        comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
        hex dump (first 16 bytes):
          40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
        backtrace:
          [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
          [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
          [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
          [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
          [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
          [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
          [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
          [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
          [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
          [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
          [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
          [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
          [<ffffffff812c0ff2>] submit_bio+0x72/0x150
          [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
          [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
          [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.0+
      4c6dd53d
  6. 16 4月, 2015 2 次提交
    • M
      dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq · 02233342
      Mike Snitzer 提交于
      dm_mq_queue_rq() is in atomic context so care must be taken to not
      sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations
      and dm-mpath's call to blk_get_request().  In the future the bioset
      allocations will hopefully go away (by removing support for partial
      completions of bios in a cloned request).
      
      Also prepare for supporting DM blk-mq ontop of old-style request_fn
      device(s) if a new dm-mod 'use_blk_mq' parameter is set.  The kthread
      will still be used to queue work if blk-mq is used ontop of old-style
      request_fn device(s).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      02233342
    • M
      dm: add full blk-mq support to request-based DM · bfebd1cd
      Mike Snitzer 提交于
      Commit e5863d9a ("dm: allocate requests in target when stacking on
      blk-mq devices") served as the first step toward fully utilizing blk-mq
      in request-based DM -- it enabled stacking an old-style (request_fn)
      request_queue ontop of the underlying blk-mq device(s).  That first step
      didn't improve performance of DM multipath ontop of fast blk-mq devices
      (e.g. NVMe) because the top-level old-style request_queue was severely
      limited by the queue_lock.
      
      The second step offered here enables stacking a blk-mq request_queue
      ontop of the underlying blk-mq device(s).  This unlocks significant
      performance gains on fast blk-mq devices, Keith Busch tested on his NVMe
      testbed and offered this really positive news:
      
       "Just providing a performance update. All my fio tests are getting
        roughly equal performance whether accessed through the raw block
        device or the multipath device mapper (~470k IOPS). I could only push
        ~20% of the raw iops through dm before this conversion, so this latest
        tree is looking really solid from a performance standpoint."
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NKeith Busch <keith.busch@intel.com>
      bfebd1cd
  7. 01 4月, 2015 1 次提交
  8. 10 2月, 2015 3 次提交
    • J
      dm mpath: simplify failure path of dm_multipath_init() · ff658e9c
      Johannes Thumshirn 提交于
      Currently the cleanup of all error cases are open-coded.  Introduce a
      common exit path and labels.
      Signed-off-by: NJohannes Thumshirn <morbidrsa@gmail.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      ff658e9c
    • M
      dm: allocate requests in target when stacking on blk-mq devices · e5863d9a
      Mike Snitzer 提交于
      For blk-mq request-based DM the responsibility of allocating a cloned
      request is transfered from DM core to the target type.  Doing so
      enables the cloned request to be allocated from the appropriate
      blk-mq request_queue's pool (only the DM target, e.g. multipath, can
      know which block device to send a given cloned request to).
      
      Care was taken to preserve compatibility with old-style block request
      completion that requires request-based DM _not_ acquire the clone
      request's queue lock in the completion path.  As such, there are now 2
      different request-based DM target_type interfaces:
      1) the original .map_rq() interface will continue to be used for
         non-blk-mq devices -- the preallocated clone request is passed in
         from DM core.
      2) a new .clone_and_map_rq() and .release_clone_rq() will be used for
         blk-mq devices -- blk_get_request() and blk_put_request() are used
         respectively from these hooks.
      
      dm_table_set_type() was updated to detect if the request-based target is
      being stacked on blk-mq devices, if so DM_TYPE_MQ_REQUEST_BASED is set.
      DM core disallows switching the DM table's type after it is set.  This
      means that there is no mixing of non-blk-mq and blk-mq devices within
      the same request-based DM table.
      
      [This patch was started by Keith and later heavily modified by Mike]
      Tested-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e5863d9a
    • K
      dm: submit stacked requests in irq enabled context · 2eb6e1e3
      Keith Busch 提交于
      Switch to having request-based DM enqueue all prep'ed requests into work
      processed by another thread.  This allows request-based DM to invoke
      block APIs that assume interrupt enabled context (e.g. blk_get_request)
      and is a prerequisite for adding blk-mq support to request-based DM.
      
      The new kernel thread is only initialized for request-based DM devices.
      
      multipath_map() is now always in irq enabled context so change multipath
      spinlock (m->lock) locking to always disable interrupts.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      2eb6e1e3
  9. 06 10月, 2014 1 次提交
  10. 02 8月, 2014 1 次提交
  11. 11 7月, 2014 1 次提交
    • J
      dm mpath: fix IO hang due to logic bug in multipath_busy · 7a7a3b45
      Jun'ichi Nomura 提交于
      Commit e8099177 ("dm mpath: push back requests instead of queueing")
      modified multipath_busy() to return true if !pg_ready().  pg_ready()
      checks the current state of the multipath device and may return false
      even if a new IO is needed to change the state.
      
      Bart Van Assche reported that he had multipath IO lockup when he was
      performing cable pull tests.  Analysis showed that the multipath
      device had a single path group with both paths active, but that the
      path group itself was not active.  During the multipath device state
      transitions 'queue_io' got set but nothing could clear it.  Clearing
      'queue_io' only happens in __choose_pgpath(), but it won't be called
      if multipath_busy() returns true due to pg_ready() returning false
      when 'queue_io' is set.
      
      As such the !pg_ready() check in multipath_busy() is wrong because new
      IO will not be sent to multipath target and the multipath state change
      won't happen.  That results in multipath IO lockup.
      
      The intent of multipath_busy() is to avoid unnecessary cycles of
      dequeue + request_fn + requeue if it is known that the multipath
      device will requeue.
      
      Such "busy" situations would be:
        - path group is being activated
        - there is no path and the multipath is setup to requeue if no path
      
      Fix multipath_busy() to return "busy" early only for these specific
      situations.
      Reported-by: NBart Van Assche <bvanassche@acm.org>
      Tested-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # v3.15
      7a7a3b45
  12. 04 6月, 2014 1 次提交
    • M
      dm: disable WRITE SAME if it fails · 7eee4ae2
      Mike Snitzer 提交于
      Add DM core support for disabling WRITE SAME on first failure to both
      request-based and bio-based targets.  The need to disable WRITE SAME
      stems from SCSI enabling it by default but then disabling it when it
      fails.  When SCSI does this it returns "permanent target failure, do
      not retry" using -EREMOTEIO.  Update DM core to only disable WRITE SAME
      on failure if the returned error is -EREMOTEIO.
      
      Commit f84cb8a4 ("dm mpath: disable WRITE SAME if it fails")
      implemented multipath specific disabling of WRITE SAME if it fails.
      However, as that commit detailed, the multipath-only solution doesn't go
      far enough if bio-based DM targets are stacked ontop of the
      request-based dm-multipath target (as is commonly done using dm-linear
      to support partitions on multipath devices, via kpartx).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Tested-by: NAlex Chen <alex.chen@huawei.com>
      7eee4ae2
  13. 27 5月, 2014 1 次提交
  14. 15 5月, 2014 1 次提交
  15. 28 3月, 2014 8 次提交
  16. 26 2月, 2014 1 次提交
  17. 06 11月, 2013 1 次提交
    • H
      dm mpath: requeue I/O during pg_init · b63349a7
      Hannes Reinecke 提交于
      When pg_init is running no I/O can be submitted to the underlying
      devices, as the path priority etc might change.  When using queue_io for
      this, requests will be piling up within multipath as the block I/O
      scheduler just sees a _very fast_ device.  All of this queued I/O has to
      be resubmitted from within multipathing once pg_init is done.
      
      This approach has the problem that it's virtually impossible to
      abort I/O when pg_init is running, and we're adding heavy load
      to the devices after pg_init since all of the queued I/O needs to be
      resubmitted _before_ any requests can be pulled off of the request queue
      and normal operation continues.
      
      This patch will requeue the I/O that triggers the pg_init call, and
      return 'busy' when pg_init is in progress.  With these changes the block
      I/O scheduler will stop submitting I/O during pg_init, resulting in a
      quicker path switch and less I/O pressure (and memory consumption) after
      pg_init.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      [patch header edited for clarity and typos by Mike Snitzer]
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      b63349a7