1. 15 9月, 2016 1 次提交
  2. 08 8月, 2016 1 次提交
    • J
      block: rename bio bi_rw to bi_opf · 1eff9d32
      Jens Axboe 提交于
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the higher portions. This means that
      old code that relies on manually setting bi_rw is most likely
      going to be broken. Instead of letting that brokeness linger,
      rename the member, to force old and out-of-tree code to break
      at compile time instead of at runtime.
      
      No intended functional changes in this commit.
      Signed-off-by: NJens Axboe <axboe@fb.com>
      1eff9d32
  3. 03 8月, 2016 1 次提交
  4. 21 7月, 2016 2 次提交
    • T
      dm: add infrastructure for DAX support · 545ed20e
      Toshi Kani 提交于
      Change mapped device to implement direct_access function,
      dm_blk_direct_access(), which calls a target direct_access function.
      'struct target_type' is extended to have target direct_access interface.
      This function limits direct accessible size to the dm_target's limit
      with max_io_len().
      
      Add dm_table_supports_dax() to iterate all targets and associated block
      devices to check for DAX support.  To add DAX support to a DM target the
      target must only implement the direct_access function.
      
      Add a new dm type, DM_TYPE_DAX_BIO_BASED, which indicates that mapped
      device supports DAX and is bio based.  This new type is used to assure
      that all target devices have DAX support and remain that way after
      QUEUE_FLAG_DAX is set in mapped device.
      
      At initial table load, QUEUE_FLAG_DAX is set to mapped device when setting
      DM_TYPE_DAX_BIO_BASED to the type.  Any subsequent table load to the
      mapped device must have the same type, or else it fails per the check in
      table_load().
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      545ed20e
    • C
      block: get rid of bio_rw and READA · 70246286
      Christoph Hellwig 提交于
      These two are confusing leftover of the old world order, combining
      values of the REQ_OP_ and REQ_ namespaces.  For callers that don't
      special case we mostly just replace bi_rw with bio_data_dir or
      op_is_write, except for the few cases where a switch over the REQ_OP_
      values makes more sense.  Any check for READA is replaced with an
      explicit check for REQ_RAHEAD.  Also remove the READA alias for
      REQ_RAHEAD.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: NMike Christie <mchristi@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      70246286
  5. 19 7月, 2016 1 次提交
  6. 11 6月, 2016 2 次提交
    • M
      dm mpath: add optional "queue_mode" feature · e83068a5
      Mike Snitzer 提交于
      Allow a user to specify an optional feature 'queue_mode <mode>' where
      <mode> may be "bio", "rq" or "mq" -- which corresponds to bio-based,
      request_fn rq-based, and blk-mq rq-based respectively.
      
      If the queue_mode feature isn't specified the default for the
      "multipath" target is still "rq" but if dm_mod.use_blk_mq is set to Y
      it'll default to mode "mq".
      
      This new queue_mode feature introduces the ability for each multipath
      device to have its own queue_mode (whereas before this feature all
      multipath devices effectively had to have the same queue_mode).
      
      This commit also goes a long way to eliminate the awkward (ab)use of
      DM_TYPE_*, the associated filter_md_type() and other relatively fragile
      and difficult to maintain code.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e83068a5
    • M
      dm: move request-based code out to dm-rq.[hc] · 4cc96131
      Mike Snitzer 提交于
      Add some seperation between bio-based and request-based DM core code.
      
      'struct mapped_device' and other DM core only structures and functions
      have been moved to dm-core.h and all relevant DM core .c files have been
      updated to include dm-core.h rather than dm.h
      
      DM targets should _never_ include dm-core.h!
      
      [block core merge conflict resolution from Stephen Rothwell]
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      4cc96131
  7. 08 6月, 2016 5 次提交
  8. 06 5月, 2016 1 次提交
  9. 11 4月, 2016 1 次提交
  10. 15 3月, 2016 1 次提交
    • B
      dm: fix rq_end_stats() NULL pointer in dm_requeue_original_request() · 98dbc9c6
      Bryn M. Reeves 提交于
      An "old" (.request_fn) DM 'struct request' stores a pointer to the
      associated 'struct dm_rq_target_io' in rq->special.
      
      dm_requeue_original_request(), previously named
      dm_requeue_unmapped_original_request(), called dm_unprep_request() to
      reset rq->special to NULL.  But rq_end_stats() would go on to hit a NULL
      pointer deference because its call to tio_from_request() returned NULL.
      
      Fix this by calling rq_end_stats() _before_ dm_unprep_request()
      Signed-off-by: NBryn M. Reeves <bmr@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Fixes: e262f347 ("dm stats: add support for request-based DM devices")
      Cc: stable@vger.kernel.org # 4.2+
      98dbc9c6
  11. 11 3月, 2016 5 次提交
  12. 23 2月, 2016 12 次提交
    • M
      dm: allow immutable request-based targets to use blk-mq pdu · 591ddcfc
      Mike Snitzer 提交于
      This will allow DM multipath to use a portion of the blk-mq pdu space
      for target data (e.g. struct dm_mpath_io).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      591ddcfc
    • M
      dm: rename target's per_bio_data_size to per_io_data_size · 30187e1d
      Mike Snitzer 提交于
      Request-based DM will also make use of per_bio_data_size.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      30187e1d
    • M
      dm: distinquish old .request_fn (dm-old) vs dm-mq request-based DM · eca7ee6d
      Mike Snitzer 提交于
      Rename various methods to have either a "dm_old" or "dm_mq" prefix.
      Improve code comments to assist with understanding the duality of code
      that handles both "dm_old" and "dm_mq" cases.
      
      It is no much easier to quickly look at the code and _know_ that a given
      method is either 1) "dm_old" only 2) "dm_mq" only 3) common to both.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      eca7ee6d
    • M
      dm: remove support for stacking dm-mq on .request_fn device(s) · c5248f79
      Mike Snitzer 提交于
      Remove all fiddley code that propped up this support for a blk-mq
      request-queue ontop of all .request_fn devices.
      
      Testing has proven this niche request-based dm-mq mode to be buggy, when
      testing fault tolerance with DM multipath, and there is no point trying
      to preserve it.
      
      Should help improve efficiency of pure dm-mq code and make code
      maintenance less delicate.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      c5248f79
    • M
      dm: fix a couple locking issues with use of block interfaces · 818c5f3b
      Mike Snitzer 提交于
      old_stop_queue() was checking blk_queue_stopped() without holding the
      q->queue_lock.
      
      dm_requeue_original_request() needed to check blk_queue_stopped(), with
      q->queue_lock held, before calling blk_mq_kick_requeue_list().  And a
      side-effect of that change is start_queue() must also call
      blk_mq_kick_requeue_list().
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      818c5f3b
    • M
      dm: allocate blk_mq_tag_set rather than embed in mapped_device · 1c357a1e
      Mike Snitzer 提交于
      The blk_mq_tag_set is only needed for dm-mq support.  There is point
      wasting space in 'struct mapped_device' for non-dm-mq devices.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> # check kzalloc return
      1c357a1e
    • M
      dm: add 'dm_mq_nr_hw_queues' and 'dm_mq_queue_depth' module params · faad87df
      Mike Snitzer 提交于
      Allow user to change these values via module params or sysfs.
      
      'dm_mq_nr_hw_queues' defaults to 1 (max 32).
      
      'dm_mq_queue_depth' defaults to 2048 (up from 64, which proved far too
      small under moderate sized workloads -- the dm-multipath device would
      continuously block waiting for tags (requests) to become available).
      The maximum is BLK_MQ_MAX_DEPTH (currently 10240).
      
      Keep in mind the total number of pre-allocated requests per
      request-based dm-mq device is 'dm_mq_nr_hw_queues' * 'dm_mq_queue_depth'
      (currently 2048).
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      faad87df
    • M
      dm: optimize dm_request_fn() · c91852ff
      Mike Snitzer 提交于
      DM multipath is the only request-based DM target -- which only supports
      tables with a single target that is immutable.  Leverage this fact in
      dm_request_fn().
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      c91852ff
    • M
      dm: optimize dm_mq_queue_rq() · 16f12266
      Mike Snitzer 提交于
      DM multipath is the only dm-mq target.  But that aside, request-based DM
      only supports tables with a single target that is immutable.  Leverage
      this fact in dm_mq_queue_rq() by using the 'immutable_target' stored in
      the mapped_device when the table was made active.  This saves the need
      to even take the read-side of the SRCU via dm_{get,put}_live_table.
      
      If the active DM table does not have an immutable target (e.g. "error"
      target was swapped in) then fallback to the slow-path where the target
      is looked up from the live table.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      16f12266
    • M
      dm: cleanup dm_any_congested() · e522c039
      Mike Snitzer 提交于
      The request-based DM support for checking queue congestion doesn't
      require access to the live DM table.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      e522c039
    • M
      dm: remove unused dm_get_rq_mapinfo() · ae6ad75e
      Mike Snitzer 提交于
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      ae6ad75e
    • M
      dm: fix excessive dm-mq context switching · 6acfe68b
      Mike Snitzer 提交于
      Request-based DM's blk-mq support (dm-mq) was reported to be 50% slower
      than if an underlying null_blk device were used directly.  One of the
      reasons for this drop in performance is that blk_insert_clone_request()
      was calling blk_mq_insert_request() with @async=true.  This forced the
      use of kblockd_schedule_delayed_work_on() to run the blk-mq hw queues
      which ushered in ping-ponging between process context (fio in this case)
      and kblockd's kworker to submit the cloned request.  The ftrace
      function_graph tracer showed:
      
        kworker-2013  =>   fio-12190
        fio-12190    =>  kworker-2013
        ...
        kworker-2013  =>   fio-12190
        fio-12190    =>  kworker-2013
        ...
      
      Fixing blk_insert_clone_request()'s blk_mq_insert_request() call to
      _not_ use kblockd to submit the cloned requests isn't enough to
      eliminate the observed context switches.
      
      In addition to this dm-mq specific blk-core fix, there are 2 DM core
      fixes to dm-mq that (when paired with the blk-core fix) completely
      eliminate the observed context switching:
      
      1)  don't blk_mq_run_hw_queues in blk-mq request completion
      
          Motivated by desire to reduce overhead of dm-mq, punting to kblockd
          just increases context switches.
      
          In my testing against a really fast null_blk device there was no benefit
          to running blk_mq_run_hw_queues() on completion (and no other blk-mq
          driver does this).  So hopefully this change doesn't induce the need for
          yet another revert like commit 621739b0 !
      
      2)  use blk_mq_complete_request() in dm_complete_request()
      
          blk_complete_request() doesn't offer the traditional q->mq_ops vs
          .request_fn branching pattern that other historic block interfaces
          do (e.g. blk_get_request).  Using blk_mq_complete_request() for
          blk-mq requests is important for performance.  It should be noted
          that, like blk_complete_request(), blk_mq_complete_request() doesn't
          natively handle partial completions -- but the request-based
          DM-multipath target does provide the required partial completion
          support by dm.c:end_clone_bio() triggering requeueing of the request
          via dm-mpath.c:multipath_end_io()'s return of DM_ENDIO_REQUEUE.
      
      dm-mq fix #2 is _much_ more important than #1 for eliminating the
      context switches.
      Before: cpu          : usr=15.10%, sys=59.39%, ctx=7905181, majf=0, minf=475
      After:  cpu          : usr=20.60%, sys=79.35%, ctx=2008, majf=0, minf=472
      
      With these changes multithreaded async read IOPs improved from ~950K
      to ~1350K for this dm-mq stacked on null_blk test-case.  The raw read
      IOPs of the underlying null_blk device for the same workload is ~1950K.
      
      Fixes: 7fb4898e ("block: add blk-mq support to blk_insert_cloned_request()")
      Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
      Cc: stable@vger.kernel.org # 4.1+
      Reported-by: NSagi Grimberg <sagig@dev.mellanox.co.il>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Acked-by: NJens Axboe <axboe@kernel.dk>
      6acfe68b
  13. 22 2月, 2016 3 次提交
    • M
      dm: fix sparse "unexpected unlock" warnings in ioctl code · 956a4025
      Mike Snitzer 提交于
      Rename dm_get_live_table_for_ioctl to dm_grab_bdev_for_ioctl and have it
      do the dm_{get,put}_live_table() rather than split those operations.
      
      The dm_grab_bdev_for_ioctl() callers only care about the block_device
      associated with a singleton DM device so there isn't any need to retain
      a reference to the live DM table.  It is sufficient to:
      1) dm_get_live_table()
      2) bdgrab() the bdev associated with the singleton table's target
      3) dm_put_live_table()
      4) bdput() the bdev
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      956a4025
    • M
      dm: do not return target from dm_get_live_table_for_ioctl() · 66482026
      Mike Snitzer 提交于
      None of the callers actually used the returned target.
      Also, just reuse bdev pointer passed to dm_blk_ioctl().
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      66482026
    • M
      dm: fix dm_rq_target_io leak on faults with .request_fn DM w/ blk-mq paths · 4328daa2
      Mike Snitzer 提交于
      Using request-based DM mpath configured with the following stacking
      (.request_fn DM mpath ontop of scsi-mq paths):
      
      echo Y > /sys/module/scsi_mod/parameters/use_blk_mq
      echo N > /sys/module/dm_mod/parameters/use_blk_mq
      
      'struct dm_rq_target_io' would leak if a request is requeued before a
      blk-mq clone is allocated (or fails to allocate).  free_rq_tio()
      wasn't being called.
      
      kmemleak reported:
      
      unreferenced object 0xffff8800b90b98c0 (size 112):
        comm "kworker/7:1H", pid 5692, jiffies 4295056109 (age 78.589s)
        hex dump (first 32 bytes):
          00 d0 5c 2c 03 88 ff ff 40 00 bf 01 00 c9 ff ff  ..\,....@.......
          e0 d9 b1 34 00 88 ff ff 00 00 00 00 00 00 00 00  ...4............
        backtrace:
          [<ffffffff81672b6e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811dbb63>] kmem_cache_alloc+0xc3/0x1e0
          [<ffffffff8117eae5>] mempool_alloc_slab+0x15/0x20
          [<ffffffff8117ec1e>] mempool_alloc+0x6e/0x170
          [<ffffffffa00029ac>] dm_old_prep_fn+0x3c/0x180 [dm_mod]
          [<ffffffff812fbd78>] blk_peek_request+0x168/0x290
          [<ffffffffa0003e62>] dm_request_fn+0xb2/0x1b0 [dm_mod]
          [<ffffffff812f66e3>] __blk_run_queue+0x33/0x40
          [<ffffffff812f9585>] blk_delay_work+0x25/0x40
          [<ffffffff81096fff>] process_one_work+0x14f/0x3d0
          [<ffffffff81097715>] worker_thread+0x125/0x4b0
          [<ffffffff8109ce88>] kthread+0xd8/0xf0
          [<ffffffff8167cb8f>] ret_from_fork+0x3f/0x70
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      crash> struct -o dm_rq_target_io
      struct dm_rq_target_io {
          ...
      }
      SIZE: 112
      
      Fixes: e5863d9a ("dm: allocate requests in target when stacking on blk-mq devices")
      Cc: stable@vger.kernel.org # 4.0+
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      4328daa2
  14. 18 11月, 2015 2 次提交
    • M
      dm: do not reuse dm_blk_ioctl block_device input as local variable · 647a20d5
      Mike Snitzer 提交于
      (Ab)using the @bdev passed to dm_blk_ioctl() opens the potential for
      targets' .prepare_ioctl to fail if they go on to check the bdev for
      !NULL.
      
      Fixes: e56f81e0 ("dm: refactor ioctl handling")
      Reported-by: NJunichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      647a20d5
    • J
      dm: fix ioctl retry termination with signal · 5bbbfdf6
      Junichi Nomura 提交于
      dm-mpath retries ioctl, when no path is readily available and the device
      is configured to queue I/O in such a case. If you want to stop the retry
      before multipathd decides to turn off queueing mode, you could send
      signal for the process to exit from the loop.
      
      However the check of fatal signal has not carried along when commit
      6c182cd8 ("dm mpath: fix ioctl deadlock when no paths") moved the
      loop from dm-mpath to dm core. As a result, we can't terminate such
      a process in the retry loop.
      
      Easy reproducer of the situation is:
      
        # dmsetup create mp --table '0 1024 multipath 0 0 0 0'
        # dmsetup message mp 0 'queue_if_no_path'
        # sg_inq /dev/mapper/mp
      
      then you should be able to terminate sg_inq by pressing Ctrl+C.
      
      Fixes: 6c182cd8 ("dm mpath: fix ioctl deadlock when no paths")
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      5bbbfdf6
  15. 08 11月, 2015 1 次提交
  16. 01 11月, 2015 1 次提交
    • M
      dm: eliminate unused "bioset" process for each bio-based DM device · dbba42d8
      Mikulas Patocka 提交于
      Commit 54efd50b ("block: make
      generic_make_request handle arbitrarily sized bios") makes it possible
      for block devices to process large bios.  In doing so that commit
      allocates a new queue->bio_split bioset for each block device, this
      bioset is used for allocating bios when the driver needs to split large
      bios.
      
      Each bioset allocates a workqueue process, thus the above commit
      increases the number of processes allocated per block device.
      
      DM doesn't need the queue->bio_split bioset, thus we can deallocate it.
      This reduces the number of allocated processes per bio-based DM device
      from 3 to 2.  Also remove the call to blk_queue_split(), it is not
      needed because DM does its own splitting.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      dbba42d8