1. 21 11月, 2016 2 次提交
  2. 29 9月, 2016 1 次提交
    • H
      dm mpath: always return reservation conflict without failing over · 8ff232c1
      Hannes Reinecke 提交于
      If dm-mpath encounters an reservation conflict it should not fail the
      path (as communication with the target is not affected) but should
      rather retry on another path.  However, in doing so we might be inducing
      a ping-pong between paths, with no guarantee of any forward progress.
      And arguably a reservation conflict is an unexpected error, so we should
      be passing it upwards to allow the application to take appropriate
      steps.
      
      This change resolves a show-stopper problem seen with the pNFS SCSI
      layout because it is trivial to hit reservation conflict based failover
      loops without it.
      
      Doubts were raised about the implications of this change relative to
      products like IBM's SVC.  But there is little point withholding a fix
      for Linux because a proprietary product may or may not have some issues
      in its implementation of how it interfaces with Linux.  In the future,
      if there is glaring evidence that this change is certainly problematic
      we can revisit it.
      Signed-off-by: NHannes Reinecke <hare@suse.de>
      Acked-by: NChristoph Hellwig <hch@lst.de>
      Tested-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com> # tweaked header
      8ff232c1
  3. 15 9月, 2016 4 次提交
  4. 08 8月, 2016 1 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
  5. 03 8月, 2016 1 次提交
  6. 11 6月, 2016 4 次提交
    • M
      dm mpath: add optional "queue_mode" feature · e83068a5
      Mike Snitzer 提交于
      Allow a user to specify an optional feature 'queue_mode <mode>' where
      <mode> may be "bio", "rq" or "mq" -- which corresponds to bio-based,
      request_fn rq-based, and blk-mq rq-based respectively.
      
      If the queue_mode feature isn't specified the default for the
      "multipath" target is still "rq" but if dm_mod.use_blk_mq is set to Y
      it'll default to mode "mq".
      
      This new queue_mode feature introduces the ability for each multipath
      device to have its own queue_mode (whereas before this feature all
      multipath devices effectively had to have the same queue_mode).
      
      This commit also goes a long way to eliminate the awkward (ab)use of
      DM_TYPE_*, the associated filter_md_type() and other relatively fragile
      and difficult to maintain code.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e83068a5
    • M
      bf661be1
    • M
      dm mpath: reinstate bio-based support · 76e33fe4
      Mike Snitzer 提交于
      Add "multipath-bio" target that offers a bio-based multipath target as
      an alternative to the request-based "multipath" target -- but in a
      following commit "multipath-bio" will immediately be replaced by a new
      "queue_mode" feature for the "multipath" target which will allow
      bio-based mode to be selected.
      
      When DM multipath was originally converted from bio-based to
      request-based the motivation for the change was better dynamic load
      balancing (by leveraging block core's request-based IO schedulers, for
      merging and sorting, _before_ DM multipath would make the decision on
      where to steer the IO -- based on path load and/or availability).
      
      More background is available in this "Request-based Device-mapper
      multipath and Dynamic load balancing" paper:
      https://www.kernel.org/doc/ols/2007/ols2007v2-pages-235-244.pdf
      
      But we've now come full circle where significantly faster storage
      devices no longer need IOs to be made larger to drive optimal IO
      performance.  And even if they do there have been changes to the block
      and filesystem layers that help ensure upper layers are constructing
      larger IOs.  In addition, SCSI's differentiated IO errors will propagate
      through to bio-based IO completion hooks -- so that eliminates another
      historic justiciation for request-based DM multipath.  Lastly, the block
      layer's immutable biovec changes have made bio cloning cheaper than it
      has ever been; whereas request cloning is still relatively expensive
      (both on a CPU usage and memory footprint level).
      
      As such, bio-based DM multipath offers the promise of a more efficient
      IO path for high IOPs devices that are, or will be, emerging.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      76e33fe4
    • M
      dm: move request-based code out to dm-rq.[hc] · 4cc96131
      Mike Snitzer 提交于
      Add some seperation between bio-based and request-based DM core code.
      
      'struct mapped_device' and other DM core only structures and functions
      have been moved to dm-core.h and all relevant DM core .c files have been
      updated to include dm-core.h rather than dm.h
      
      DM targets should _never_ include dm-core.h!
      
      [block core merge conflict resolution from Stephen Rothwell]
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      4cc96131
  7. 06 5月, 2016 4 次提交
  8. 11 3月, 2016 1 次提交
    • M
      dm mpath: cleanup reinstate_path() et al based on code review · ec31f3f7
      Mike Snitzer 提交于
      fail_path() will print a "Failing path ..." message but reinstate_path()
      doesn't print a "Reinstating path ...".  Add that message to
      reinstate_path() to add symmetry and aid system debugging.
      
      Remove reinstate_path()'s check for the path_selector providing
      .reinstate_path hook.  All path selectors provide this and any future
      ones must too.
      
      activate_path() calls pg_init_done() with SCSI_DH_DEV_OFFLINED but
      pg_init_done() doesn't expicitly handle it in its swicth statement.  Add
      SCSI_DH_DEV_OFFLINED to the default case.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      ec31f3f7
  9. 23 2月, 2016 11 次提交
  10. 18 11月, 2015 2 次提交
    • J
      dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path · 43e43c9e
      Junichi Nomura 提交于
      In multipath_prepare_ioctl(),
        - pgpath is a path selected from available paths
        - m->queue_io is true if we cannot send a request immediately to
          paths, either because:
            * there is no available path
            * the path group needs activation (pg_init)
                - pg_init is not started
                - pg_init is still running
        - m->queue_if_no_path is true if the device is configured to queue
          I/O if there are no available paths
      
      If !pgpath && !m->queue_if_no_path, the handler should return -EIO.
      However in the course of refactoring the condition check has broken
      and returns success in that case.  Since bdev points to the dm device
      itself, dm_blk_ioctl() calls __blk_dev_driver_ioctl() for itself and
      recurses until crash.
      
      You could reproduce the problem like this:
      
        # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
        # sg_inq /dev/mapper/mp
        <crash>
        [  172.648615] BUG: unable to handle kernel paging request at fffffffc81b10268
        [  172.662843] PGD 19dd067 PUD 0
        [  172.666269] Thread overran stack, or stack corrupted
        [  172.671808] Oops: 0000 [#1] SMP
        ...
      
      Fix the condition check with some clarifications.
      
      Fixes: e56f81e0 ("dm: refactor ioctl handling")
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      43e43c9e
    • J
      dm: fix ioctl retry termination with signal · 5bbbfdf6
      Junichi Nomura 提交于
      dm-mpath retries ioctl, when no path is readily available and the device
      is configured to queue I/O in such a case. If you want to stop the retry
      before multipathd decides to turn off queueing mode, you could send
      signal for the process to exit from the loop.
      
      However the check of fatal signal has not carried along when commit
      6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") moved the
      loop from dm-mpath to dm core. As a result, we can't terminate such
      a process in the retry loop.
      
      Easy reproducer of the situation is:
      
        # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
        # dmsetup message mp 0 'queue_if_no_path'
        # sg_inq /dev/mapper/mp
      
      then you should be able to terminate sg_inq by pressing Ctrl+C.
      
      Fixes: 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths")
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      5bbbfdf6
  11. 01 11月, 2015 3 次提交
    • C
      dm: add support for passing through persistent reservations · 71cdb697
      Christoph Hellwig 提交于
      This adds support to pass through persistent reservation requests
      similar to the existing ioctl handling, and with the same limitations,
      e.g. devices may only have a single target attached.
      
      This is mostly intended for multipathing.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      71cdb697
    • C
      dm: refactor ioctl handling · e56f81e0
      Christoph Hellwig 提交于
      This moves the call to blkdev_ioctl and the argument checking to DM core
      code, and only leaves a callout to find the block device to operate on
      in the targets.  This simplifies the code and allows us to pass through
      ioctl-like command using other methods in the next patch.
      
      Also split out a helper around calling the prepare_ioctl method that
      will be reused for persistent reservation handling.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e56f81e0
    • M
      Revert "dm mpath: fix stalls when handling invalid ioctls" · 47796938
      Mauricio Faria de Oliveira 提交于
      This reverts commit a1989b33.
      
      That commit introduced a regression at least for the case of the SG_IO ioctl()
      running without CAP_SYS_RAWIO capability (e.g., unprivileged users) when there
      are no active paths: the ioctl() fails with the ENOTTY errno immediately rather
      than blocking due to queue_if_no_path until a path becomes active, for example.
      
      That case happens to be exercised by QEMU KVM guests with 'scsi-block' devices
      (qemu "-device scsi-block" [1], libvirt "<disk type='block' device='lun'>" [2])
      from multipath devices; which leads to SCSI/filesystem errors in such a guest.
      
      More general scenarios can hit that regression too. The following demonstration
      employs a SG_IO ioctl() with a standard SCSI INQUIRY command for this objective
      (some output & user changes omitted for brevity and comments added for clarity).
      
      Reverting that commit restores normal operation (queueing) in failing scenarios;
      tested on linux-next (next-20151022).
      
      1) Test-case is based on sg_simple0 [3] (just SG_IO; remove SG_GET_VERSION_NUM)
      
          $ cat sg_simple0.c
          ... see [3] ...
          $ sed '/SG_GET_VERSION_NUM/,/}/d' sg_simple0.c > sgio_inquiry.c
          $ gcc sgio_inquiry.c -o sgio_inquiry
      
      2) The ioctl() works fine with active paths present.
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=active
          | |- 8:0:11:0  sdz  65:144  active undef running
          | `- 9:0:9:0   sdbf 67:144  active undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  active undef running
            `- 9:0:12:0  sdbo 68:32   active undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          Some of the INQUIRY command's response:
              IBM       2145              0000
          INQUIRY duration=0 millisecs, resid=0
      
      3) The ioctl() fails with ENOTTY errno with _no_ active paths present,
         for unprivileged users (rather than blocking due to queue_if_no_path).
      
          # for path in $(multipath -l 85ag56 | grep -o 'sd[a-z]\+'); \
                do multipathd -k"fail path $path"; done
      
          # multipath -l 85ag56
          85ag56 (...) dm-19 IBM     ,2145
          size=60G features='1 queue_if_no_path' hwhandler='0' wp=rw
          |-+- policy='service-time 0' prio=0 status=enabled
          | |- 8:0:11:0  sdz  65:144  failed undef running
          | `- 9:0:9:0   sdbf 67:144  failed undef running
          `-+- policy='service-time 0' prio=0 status=enabled
            |- 8:0:12:0  sdae 65:224  failed undef running
            `- 9:0:12:0  sdbo 68:32   failed undef running
      
          $ ./sgio_inquiry /dev/mapper/85ag56
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
      4) dmesg shows that scsi_verify_blk_ioctl() failed for SG_IO (0x2285);
         it returns -ENOIOCTLCMD, later replaced with -ENOTTY in vfs_ioctl().
      
          $ dmesg
          <...>
          [] device-mapper: multipath: Failing path 65:144.
          [] device-mapper: multipath: Failing path 67:144.
          [] device-mapper: multipath: Failing path 65:224.
          [] device-mapper: multipath: Failing path 68:32.
          [] sgio_inquiry: sending ioctl 2285 to a partition!
      
      5) The ioctl() only works if the SYS_CAP_RAWIO capability is present
         (then queueing happens -- in this example, queue_if_no_path is set);
         this is due to a conditional check in scsi_verify_blk_ioctl().
      
          # capsh --drop=cap_sys_rawio -- -c './sgio_inquiry /dev/mapper/85ag56'
          sg_simple0: Inquiry SG_IO ioctl error: Inappropriate ioctl for device
      
          # ./sgio_inquiry /dev/mapper/85ag56 &
          [1] 72830
      
          # cat /proc/72830/stack
          [<c00000171c0df700>] 0xc00000171c0df700
          [<c000000000015934>] __switch_to+0x204/0x350
          [<c000000000152d4c>] msleep+0x5c/0x80
          [<c00000000077dfb0>] dm_blk_ioctl+0x70/0x170
          [<c000000000487c40>] blkdev_ioctl+0x2b0/0x9b0
          [<c0000000003128e4>] block_ioctl+0x64/0xd0
          [<c0000000002dd3b0>] do_vfs_ioctl+0x490/0x780
          [<c0000000002dd774>] SyS_ioctl+0xd4/0xf0
          [<c000000000009358>] system_call+0x38/0xd0
      
      6) This is the function call chain exercised in this analysis:
      
      SYSCALL_DEFINE3(ioctl, <...>) @ fs/ioctl.c
          -> do_vfs_ioctl()
              -> vfs_ioctl()
                  ...
                  error = filp->f_op->unlocked_ioctl(filp, cmd, arg);
                  ...
                      -> dm_blk_ioctl() @ drivers/md/dm.c
                          -> multipath_ioctl() @ drivers/md/dm-mpath.c
                              ...
                              (bdev = NULL, due to no active paths)
                              ...
                              if (!bdev || <...>) {
                                  int err = scsi_verify_blk_ioctl(NULL, cmd);
                                  if (err)
                                      r = err;
                              }
                              ...
                                  -> scsi_verify_blk_ioctl() @ block/scsi_ioctl.c
                                      ...
                                      if (bd && bd == bd->bd_contains) // not taken (bd = NULL)
                                          return 0;
                                      ...
                                      if (capable(CAP_SYS_RAWIO)) // not taken (unprivileged user)
                                          return 0;
                                      ...
                                      printk_ratelimited(KERN_WARNING
                                                 "%s: sending ioctl %x to a partition!\n" <...>);
      
                                      return -ENOIOCTLCMD;
                                  <-
                              ...
                              return r ? : <...>
                          <-
                  ...
                  if (error == -ENOIOCTLCMD)
                      error = -ENOTTY;
                   out:
                      return error;
                  ...
      
      Links:
      [1] http://git.qemu.org/?p=qemu.git;a=commit;h=336a6915bc7089fb20fea4ba99972ad9a97c5f52
      [2] https://libvirt.org/formatdomain.html#elementsDisks (see 'disk' -> 'device')
      [3] http://tldp.org/HOWTO/SCSI-Generic-HOWTO/pexample.html (Revision 1.2, 2002-05-03)
      Signed-off-by: NMauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      47796938
  12. 29 8月, 2015 2 次提交
  13. 28 5月, 2015 1 次提交
    • M
      dm mpath: fix leak of dm_mpath_io structure in blk-mq .queue_rq error path · 4c6dd53d
      Mike Snitzer 提交于
      Otherwise kmemleak reported:
      
      unreferenced object 0xffff88009b14e2b0 (size 16):
        comm "fio", pid 4274, jiffies 4294978034 (age 1253.210s)
        hex dump (first 16 bytes):
          40 12 f3 99 01 88 ff ff 00 10 00 00 00 00 00 00  @...............
        backtrace:
          [<ffffffff81600029>] kmemleak_alloc+0x49/0xb0
          [<ffffffff811679a8>] kmem_cache_alloc+0xf8/0x160
          [<ffffffff8111c950>] mempool_alloc_slab+0x10/0x20
          [<ffffffff8111cb37>] mempool_alloc+0x57/0x150
          [<ffffffffa04d2b61>] __multipath_map.isra.17+0xe1/0x220 [dm_multipath]
          [<ffffffffa04d2cb5>] multipath_clone_and_map+0x15/0x20 [dm_multipath]
          [<ffffffffa02889b5>] map_request.isra.39+0xd5/0x220 [dm_mod]
          [<ffffffffa028b0e4>] dm_mq_queue_rq+0x134/0x240 [dm_mod]
          [<ffffffff812cccb5>] __blk_mq_run_hw_queue+0x1d5/0x380
          [<ffffffff812ccaa5>] blk_mq_run_hw_queue+0xc5/0x100
          [<ffffffff812ce350>] blk_sq_make_request+0x240/0x300
          [<ffffffff812c0f30>] generic_make_request+0xc0/0x110
          [<ffffffff812c0ff2>] submit_bio+0x72/0x150
          [<ffffffff811c07cb>] do_blockdev_direct_IO+0x1f3b/0x2da0
          [<ffffffff811c166e>] __blockdev_direct_IO+0x3e/0x40
          [<ffffffff8120aa1a>] ext4_direct_IO+0x1aa/0x390
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org # 4.0+
      4c6dd53d
  14. 16 4月, 2015 2 次提交
    • M
      dm: optimize dm_mq_queue_rq to _not_ use kthread if using pure blk-mq · 02233342
      Mike Snitzer 提交于
      dm_mq_queue_rq() is in atomic context so care must be taken to not
      sleep -- as such GFP_ATOMIC is used for the md->bs bioset allocations
      and dm-mpath's call to blk_get_request().  In the future the bioset
      allocations will hopefully go away (by removing support for partial
      completions of bios in a cloned request).
      
      Also prepare for supporting DM blk-mq ontop of old-style request_fn
      device(s) if a new dm-mod 'use_blk_mq' parameter is set.  The kthread
      will still be used to queue work if blk-mq is used ontop of old-style
      request_fn device(s).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      02233342
    • M
      dm: add full blk-mq support to request-based DM · bfebd1cd
      Mike Snitzer 提交于
      Commit e5863d9a ("dm: allocate requests in target when stacking on
      blk-mq devices") served as the first step toward fully utilizing blk-mq
      in request-based DM -- it enabled stacking an old-style (request_fn)
      request_queue ontop of the underlying blk-mq device(s).  That first step
      didn't improve performance of DM multipath ontop of fast blk-mq devices
      (e.g. NVMe) because the top-level old-style request_queue was severely
      limited by the queue_lock.
      
      The second step offered here enables stacking a blk-mq request_queue
      ontop of the underlying blk-mq device(s).  This unlocks significant
      performance gains on fast blk-mq devices, Keith Busch tested on his NVMe
      testbed and offered this really positive news:
      
       "Just providing a performance update. All my fio tests are getting
        roughly equal performance whether accessed through the raw block
        device or the multipath device mapper (~470k IOPS). I could only push
        ~20% of the raw iops through dm before this conversion, so this latest
        tree is looking really solid from a performance standpoint."
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Tested-by: NKeith Busch <keith.busch@intel.com>
      bfebd1cd
  15. 01 4月, 2015 1 次提交