1. 24 Aug 2020, 1 commit
  2. 05 Aug 2020, 1 commit
  3. 24 Jul 2020, 1 commit
    • dm integrity: fix integrity recalculation that is improperly skipped · 5df96f2b
      Committed by Mikulas Patocka
      Commit adc0daad ("dm: report suspended
      device during destroy") broke integrity recalculation.
      
      The problem is dm_suspended() returns true not only during suspend,
      but also during resume. So this race condition could occur:
      1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
      2. integrity_recalc (&ic->recalc_work) preempts the current thread
      3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
      4. integrity_recalc exits and no recalculating is done.
      
      To fix this race condition, add a function dm_post_suspending that is
      only true during the postsuspend phase and use it instead of
      dm_suspended().
      
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Fixes: adc0daad ("dm: report suspended device during destroy")
      Cc: stable@vger.kernel.org # v4.18+
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
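      A condensed sketch of the pattern described above (not the actual
      dm-integrity patch; the struct and field names come from the commit text
      and the body is simplified, as it would sit inside drivers/md/dm-integrity.c):

      static void integrity_recalc(struct work_struct *w)
      {
              struct dm_integrity_c *ic = container_of(w, struct dm_integrity_c,
                                                       recalc_work);

              /*
               * dm_suspended() is also true while the device is resuming, so
               * the work queued from dm_integrity_resume() could bail out here
               * and skip recalculation.  dm_post_suspending() is true only
               * between the presuspend and postsuspend hooks, so it does not
               * race with the resume path.
               */
              if (unlikely(dm_post_suspending(ic->ti)))
                      return;

              /* ... recalculate integrity tags ... */
      }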
  4. 09 Jul 2020, 3 commits
  5. 08 Jul 2020, 2 commits
    • dm: use bio_uninit instead of bio_disassociate_blkg · 382761dc
      Committed by Christoph Hellwig
      bio_uninit is the proper API to clean up a BIO that has been allocated
      on stack or inside a structure that doesn't come from the BIO allocator.
      Switch dm to use that instead of bio_disassociate_blkg, which really is
      an implementation detail.  Note that the bio_uninit calls are also moved
      to the two callers of __send_empty_flush, so that they better pair with
      the bio_init calls used to initialize them.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
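      An illustrative sketch of the on-stack bio pattern the commit refers to
      (signatures as of the v5.8-era block layer; the flush setup is simplified
      and is not DM's actual __send_empty_flush):

      #include <linux/bio.h>
      #include <linux/blkdev.h>

      static void send_empty_flush_sketch(struct block_device *bdev)
      {
              struct bio flush_bio;

              /* A bio embedded on the stack never came from the bio allocator,
               * so it is set up with bio_init() and torn down with bio_uninit()
               * rather than bio_put(). */
              bio_init(&flush_bio, NULL, 0);
              bio_set_dev(&flush_bio, bdev);
              flush_bio.bi_opf = REQ_OP_WRITE | REQ_PREFLUSH | REQ_SYNC;

              submit_bio_wait(&flush_bio);

              bio_uninit(&flush_bio);   /* replaces bio_disassociate_blkg() */
      }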
    • dm: do not use waitqueue for request-based DM · 85067747
      Committed by Ming Lei
      Given request-based DM now uses blk-mq's blk_mq_queue_inflight() to
      determine if outstanding IO has completed (and DM has no control over
      the blk-mq state machine used to track outstanding IO) it is unsafe to
      wake up the waiter (dm_wait_for_completion) before blk-mq has cleared a
      request's state bits (e.g. MQ_RQ_IN_FLIGHT or MQ_RQ_COMPLETE).  As
      such dm_wait_for_completion() could be left to wait indefinitely if no
      other requests complete.
      
      Fix this by eliminating request-based DM's use of waitqueue to wait
      for blk-mq requests to complete in dm_wait_for_completion.
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Depends-on: 3c94d83c ("blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()")
      Cc: stable@vger.kernel.org
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
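      A hedged sketch of the polling wait described above (the helper name and
      sleep interval are assumptions, not the exact upstream code):

      #include <linux/blk-mq.h>
      #include <linux/delay.h>
      #include <linux/sched/signal.h>

      static int wait_for_rq_completion_sketch(struct request_queue *q)
      {
              /* Instead of sleeping on a waitqueue that blk-mq never wakes,
               * poll until blk-mq reports no in-flight requests. */
              while (blk_mq_queue_inflight(q)) {
                      if (signal_pending(current))
                              return -EINTR;
                      msleep(5);      /* interval arbitrary for this sketch */
              }
              return 0;
      }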
  6. 02 Jul 2020, 1 commit
    • dm: remove unused variable · b53ac8b8
      Committed by Jens Axboe
      Since merging the commit identified in Fixes below, we trigger this
      compile time warning:
      
      drivers/md/dm.c: In function ‘__map_bio’:
      drivers/md/dm.c:1296:24: warning: unused variable ‘md’ [-Wunused-variable]
       1296 |  struct mapped_device *md = io->md;
             |                        ^~
      
      Remove the 'md' variable.
      
      Fixes: 5a6c35f9 ("block: remove direct_make_request")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 01 Jul 2020, 5 commits
  8. 29 Jun 2020, 1 commit
  9. 20 Jun 2020, 1 commit
  10. 27 May 2020, 1 commit
  11. 21 May 2020, 1 commit
    • dm: use DMDEBUG macros now that they use pr_debug variants · ac75b09f
      Committed by Mike Snitzer
      Now that DMDEBUG uses pr_debug and DMDEBUG_LIMIT uses
      pr_debug_ratelimited, clean up DM's two direct pr_debug callers to use
      them and get the benefit of consistent DM_FMT formatting of debugging
      messages.
      
      While doing so, dm-mpath.c:dm_report_EIO() was switched over to using
      DMDEBUG_LIMIT due to the potential for error handling floods in the IO
      completion path.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
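      For reference, the macros referred to above expand roughly as follows
      (paraphrased; see include/linux/device-mapper.h for the real definitions):

      #define DM_FMT(fmt)             DM_NAME ": " DM_MSG_PREFIX ": " fmt "\n"

      #define DMDEBUG(fmt, ...)       pr_debug(DM_FMT(fmt), ##__VA_ARGS__)
      #define DMDEBUG_LIMIT(fmt, ...) pr_debug_ratelimited(DM_FMT(fmt), ##__VA_ARGS__)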
  12. 19 May 2020, 1 commit
    • blk-mq: allow blk_mq_make_request to consume the q_usage_counter reference · ac7c5675
      Committed by Christoph Hellwig
      blk_mq_make_request currently needs to grab a q_usage_counter
      reference when allocating a request.  This is because the block layer
      grabs one before calling blk_mq_make_request, but also releases it as
      soon as blk_mq_make_request returns.  Remove the blk_queue_exit call
      after blk_mq_make_request returns, and instead let it consume the
      reference.  This works perfectly fine for the block layer caller, just
      device mapper needs an extra reference as the old problem still
      persists there.  Open code blk_queue_enter_live in device mapper,
      as there should be no other callers and this allows better documenting
      why we do a non-try get.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
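      A hedged sketch of what "open code blk_queue_enter_live" amounts to (the
      wrapper name is hypothetical; only the percpu_ref_get on q_usage_counter
      reflects the commit text):

      #include <linux/blkdev.h>
      #include <linux/percpu-refcount.h>

      static void dm_get_queue_ref_sketch(struct request_queue *q)
      {
              /* Open-coded blk_queue_enter_live(): a plain (non-try) reference
               * get is safe because the queue cannot go away while the DM
               * device that owns it is alive.  This extra reference is what
               * blk_mq_make_request() later consumes. */
              percpu_ref_get(&q->q_usage_counter);
      }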
  13. 15 May 2020, 1 commit
  14. 14 May 2020, 1 commit
    • block: Inline encryption support for blk-mq · a892c8d5
      Committed by Satya Tangirala
      We must have some way of letting a storage device driver know what
      encryption context it should use for en/decrypting a request. However,
      it's the upper layers (like the filesystem/fscrypt) that know about and
      manage encryption contexts. As such, when the upper layer submits a bio
      to the block layer, and this bio eventually reaches a device driver with
      support for inline encryption, the device driver will need to have been
      told the encryption context for that bio.
      
      We want to communicate the encryption context from the upper layer to the
      storage device along with the bio, when the bio is submitted to the block
      layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
      represent an encryption context (note that we can't use the bi_private
      field in struct bio to do this because that field does not function to pass
      information across layers in the storage stack). We also introduce various
      functions to manipulate the bio_crypt_ctx and make the bio/request merging
      logic aware of the bio_crypt_ctx.
      
      We also make changes to blk-mq to make it handle bios with encryption
      contexts. blk-mq can merge many bios into the same request. These bios need
      to have contiguous data unit numbers (the necessary changes to blk-merge
      are also made to ensure this) - as such, it suffices to keep the data unit
      number of just the first bio, since that's all a storage driver needs to
      infer the data unit number to use for each data block in each bio in a
      request. blk-mq keeps track of the encryption context to be used for all
      the bios in a request with the request's rq_crypt_ctx. When the first bio
      is added to an empty request, blk-mq will program the encryption context
      of that bio into the request_queue's keyslot manager, and store the
      returned keyslot in the request's rq_crypt_ctx. All the functions to
      operate on encryption contexts are in blk-crypto.c.
      
      Upper layers only need to call bio_crypt_set_ctx with the encryption key,
      algorithm and data_unit_num; they don't have to worry about getting a
      keyslot for each encryption context, as blk-mq/blk-crypto handles that.
      Blk-crypto also makes it possible for request-based layered devices like
      dm-rq to make use of inline encryption hardware by cloning the
      rq_crypt_ctx and programming a keyslot in the new request_queue when
      necessary.
      
      Note that any user of the block layer can submit bios with an
      encryption context, such as filesystems, device-mapper targets, etc.
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
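      A hedged sketch of how an upper layer might attach an encryption context
      before submission (the function name is hypothetical and the key/data-unit
      values are placeholders):

      #include <linux/bio.h>
      #include <linux/blk-crypto.h>

      static void submit_encrypted_bio_sketch(struct bio *bio,
                                              const struct blk_crypto_key *key,
                                              u64 first_dun)
      {
              u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { first_dun };

              /* blk-crypto/blk-mq handles keyslot programming; the submitter
               * only supplies the key and the first data unit number. */
              bio_crypt_set_ctx(bio, key, dun, GFP_NOIO);

              submit_bio(bio);
      }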
  15. 25 Apr 2020, 1 commit
  16. 03 Apr 2020, 3 commits
    • Revert "dm: always call blk_queue_split() in dm_process_bio()" · 120c9257
      Committed by Mike Snitzer
      This reverts commit effd58c9.
      
      blk_queue_split() is causing excessive IO splitting -- because
      blk_max_size_offset() depends on 'chunk_sectors' limit being set and
      if it isn't (as is the case for DM targets!) it falls back to
      splitting on a 'max_sectors' boundary regardless of offset.
      
      "Fix" this by reverting back to _not_ using blk_queue_split() in
      dm_process_bio() for normal IO (reads and writes).  Long-term fix is
      still TBD but it should focus on training blk_max_size_offset() to
      call into a DM provided hook (to call DM's max_io_len()).
      
      Test results from simple misaligned IO test on 4-way dm-striped device
      with chunksize of 128K and stripesize of 512K:
      
      xfs_io -d -c 'pread -b 2m 224s 4072s' /dev/mapper/stripe_dev
      
      before this revert:
      
      253,0   21        1     0.000000000  2206  Q   R 224 + 4072 [xfs_io]
      253,0   21        2     0.000008267  2206  X   R 224 / 480 [xfs_io]
      253,0   21        3     0.000010530  2206  X   R 224 / 256 [xfs_io]
      253,0   21        4     0.000027022  2206  X   R 480 / 736 [xfs_io]
      253,0   21        5     0.000028751  2206  X   R 480 / 512 [xfs_io]
      253,0   21        6     0.000033323  2206  X   R 736 / 992 [xfs_io]
      253,0   21        7     0.000035130  2206  X   R 736 / 768 [xfs_io]
      253,0   21        8     0.000039146  2206  X   R 992 / 1248 [xfs_io]
      253,0   21        9     0.000040734  2206  X   R 992 / 1024 [xfs_io]
      253,0   21       10     0.000044694  2206  X   R 1248 / 1504 [xfs_io]
      253,0   21       11     0.000046422  2206  X   R 1248 / 1280 [xfs_io]
      253,0   21       12     0.000050376  2206  X   R 1504 / 1760 [xfs_io]
      253,0   21       13     0.000051974  2206  X   R 1504 / 1536 [xfs_io]
      253,0   21       14     0.000055881  2206  X   R 1760 / 2016 [xfs_io]
      253,0   21       15     0.000057462  2206  X   R 1760 / 1792 [xfs_io]
      253,0   21       16     0.000060999  2206  X   R 2016 / 2272 [xfs_io]
      253,0   21       17     0.000062489  2206  X   R 2016 / 2048 [xfs_io]
      253,0   21       18     0.000066133  2206  X   R 2272 / 2528 [xfs_io]
      253,0   21       19     0.000067507  2206  X   R 2272 / 2304 [xfs_io]
      253,0   21       20     0.000071136  2206  X   R 2528 / 2784 [xfs_io]
      253,0   21       21     0.000072764  2206  X   R 2528 / 2560 [xfs_io]
      253,0   21       22     0.000076185  2206  X   R 2784 / 3040 [xfs_io]
      253,0   21       23     0.000077486  2206  X   R 2784 / 2816 [xfs_io]
      253,0   21       24     0.000080885  2206  X   R 3040 / 3296 [xfs_io]
      253,0   21       25     0.000082316  2206  X   R 3040 / 3072 [xfs_io]
      253,0   21       26     0.000085788  2206  X   R 3296 / 3552 [xfs_io]
      253,0   21       27     0.000087096  2206  X   R 3296 / 3328 [xfs_io]
      253,0   21       28     0.000093469  2206  X   R 3552 / 3808 [xfs_io]
      253,0   21       29     0.000095186  2206  X   R 3552 / 3584 [xfs_io]
      253,0   21       30     0.000099228  2206  X   R 3808 / 4064 [xfs_io]
      253,0   21       31     0.000101062  2206  X   R 3808 / 3840 [xfs_io]
      253,0   21       32     0.000104956  2206  X   R 4064 / 4096 [xfs_io]
      253,0   21       33     0.001138823     0  C   R 4096 + 200 [0]
      
      after this revert:
      
      253,0   18        1     0.000000000  4430  Q   R 224 + 3896 [xfs_io]
      253,0   18        2     0.000018359  4430  X   R 224 / 256 [xfs_io]
      253,0   18        3     0.000028898  4430  X   R 256 / 512 [xfs_io]
      253,0   18        4     0.000033535  4430  X   R 512 / 768 [xfs_io]
      253,0   18        5     0.000065684  4430  X   R 768 / 1024 [xfs_io]
      253,0   18        6     0.000091695  4430  X   R 1024 / 1280 [xfs_io]
      253,0   18        7     0.000098494  4430  X   R 1280 / 1536 [xfs_io]
      253,0   18        8     0.000114069  4430  X   R 1536 / 1792 [xfs_io]
      253,0   18        9     0.000129483  4430  X   R 1792 / 2048 [xfs_io]
      253,0   18       10     0.000136759  4430  X   R 2048 / 2304 [xfs_io]
      253,0   18       11     0.000152412  4430  X   R 2304 / 2560 [xfs_io]
      253,0   18       12     0.000160758  4430  X   R 2560 / 2816 [xfs_io]
      253,0   18       13     0.000183385  4430  X   R 2816 / 3072 [xfs_io]
      253,0   18       14     0.000190797  4430  X   R 3072 / 3328 [xfs_io]
      253,0   18       15     0.000197667  4430  X   R 3328 / 3584 [xfs_io]
      253,0   18       16     0.000218751  4430  X   R 3584 / 3840 [xfs_io]
      253,0   18       17     0.000226005  4430  X   R 3840 / 4096 [xfs_io]
      253,0   18       18     0.000250404  4430  Q   R 4120 + 176 [xfs_io]
      253,0   18       19     0.000847708     0  C   R 4096 + 24 [0]
      253,0   18       20     0.000855783     0  C   R 4120 + 176 [0]
      
      Fixes: effd58c9 ("dm: always call blk_queue_split() in dm_process_bio()")
      Cc: stable@vger.kernel.org
      Reported-by: Andreas Gruenbacher <agruenba@redhat.com>
      Tested-by: Barry Marson <bmarson@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
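      For context, the fallback the revert complains about comes from a helper
      roughly like this (paraphrased from the v5.6-era include/linux/blkdev.h):

      #include <linux/blkdev.h>

      static inline unsigned int blk_max_size_offset_sketch(struct request_queue *q,
                                                            sector_t offset)
      {
              /* With no chunk_sectors set (the DM case), every split falls
               * back to max_sectors regardless of offset, which produces the
               * excessive splitting shown in the "before" trace above. */
              if (!q->limits.chunk_sectors)
                      return q->limits.max_sectors;

              return min(q->limits.max_sectors,
                         (unsigned int)(q->limits.chunk_sectors -
                                        (offset & (q->limits.chunk_sectors - 1))));
      }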
    • dax: Move mandatory ->zero_page_range() check in alloc_dax() · 4e4ced93
      Committed by Vivek Goyal
      The zero_page_range() dax operation is mandatory for dax devices. Right
      now that check happens in the dax_zero_page_range() function. Dan thinks
      that's too late and it's better to do the check earlier, in alloc_dax().
      
      I also modified alloc_dax() to return a pointer carrying an error code
      in case of failure. Right now it returns NULL and the caller assumes the
      failure happened due to -ENOMEM. But with this ->zero_page_range()
      check, I need to return -EINVAL instead.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
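      A hedged sketch of the resulting caller pattern (the caller name is
      hypothetical; alloc_dax() arguments follow the v5.7-era signature):

      #include <linux/dax.h>
      #include <linux/err.h>

      static int attach_dax_sketch(void *private, const char *host,
                                   const struct dax_operations *ops)
      {
              struct dax_device *dax_dev;

              /* alloc_dax() now returns an ERR_PTR-encoded error, so callers
               * can distinguish -EINVAL (missing ->zero_page_range) from
               * -ENOMEM instead of assuming NULL means allocation failure. */
              dax_dev = alloc_dax(private, host, ops, 0);
              if (IS_ERR(dax_dev))
                      return PTR_ERR(dax_dev);

              /* ... use dax_dev ... */
              return 0;
      }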
    • dm,dax: Add dax zero_page_range operation · cdf6cdcd
      Committed by Vivek Goyal
      This patch adds support for dax zero_page_range operation to dm targets.
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Link: https://lore.kernel.org/r/20200228163456.1587-5-vgoyal@redhat.com
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
  17. 28 Mar 2020, 1 commit
    • block: simplify queue allocation · 3d745ea5
      Committed by Christoph Hellwig
      Current make_request based drivers use either blk_alloc_queue_node or
      blk_alloc_queue to allocate a queue, and then set up the make_request_fn
      function pointer and a few parameters using the blk_queue_make_request
      helper.  Simplify this by passing the make_request pointer to
      blk_alloc_queue, and while at it merge the _node variant into the main
      helper by always passing a node_id, and remove the superfluous gfp_mask
      parameter.  A lower-level __blk_alloc_queue is kept for the blk-mq case.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
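      A hedged sketch of the simplified allocation pattern for a bio-based
      driver (the request function and helper names are placeholders):

      #include <linux/blkdev.h>

      static blk_qc_t my_make_request(struct request_queue *q, struct bio *bio);

      static struct request_queue *alloc_bio_queue_sketch(int node)
      {
              /* The make_request function and NUMA node go straight into
               * blk_alloc_queue(); no separate blk_queue_make_request() call
               * and no gfp_mask argument anymore. */
              return blk_alloc_queue(my_make_request, node);
      }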
  18. 25 Mar 2020, 1 commit
  19. 04 Mar 2020, 1 commit
  20. 28 Feb 2020, 1 commit
    • dm: report suspended device during destroy · adc0daad
      Committed by Mikulas Patocka
      The function dm_suspended returns true if the target is suspended.
      However, when the target is being suspended during unload, it returns
      false.
      
      An example where this is a problem: the test "!dm_suspended(wc->ti)" in
      writecache_writeback is not sufficient, because dm_suspended returns
      zero while writecache_suspend is in progress.  As is, without an
      enhanced dm_suspended, simply switching from flush_workqueue to
      drain_workqueue still emits warnings:
      workqueue writecache-writeback: drain_workqueue() isn't complete after 10 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 100 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 200 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 300 tries
      workqueue writecache-writeback: drain_workqueue() isn't complete after 400 tries
      
      writecache_suspend calls flush_workqueue(wc->writeback_wq) - this function
      flushes the currently queued work. However, the work may re-queue itself
      and flush_workqueue doesn't wait for the re-queued work to finish. Because
      of this, writecache_writeback continues executing after the device was
      suspended and then runs concurrently with writecache_dtr, causing a crash
      in writecache_writeback.
      
      We must use drain_workqueue - that waits until the work and all re-queued
      works finish.
      
      As a prereq for switching to drain_workqueue, this commit fixes
      dm_suspended to return true after the presuspend hook and before the
      postsuspend hook - just like during a normal suspend. It allows
      simplifying the dm-integrity and dm-writecache targets so that they
      don't have to maintain suspended flags on their own.
      
      With this change, drain_workqueue() can be used effectively. This change
      was tested with the lvm2 testsuite and cryptsetup testsuite and there are
      no regressions.
      
      Fixes: 48debafe ("dm: add writecache target")
      Cc: stable@vger.kernel.org # 4.18+
      Reported-by: Corey Marthaler <cmarthal@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
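      A hedged sketch of the suspend-side behaviour this enables in
      dm-writecache (condensed; wc->writeback_wq comes from the commit text and
      the wrapper name is hypothetical):

      #include <linux/workqueue.h>

      static void writecache_suspend_sketch(struct workqueue_struct *writeback_wq)
      {
              /* flush_workqueue() only waits for the currently queued work and
               * can return while the writeback work keeps re-arming itself;
               * drain_workqueue() keeps flushing until nothing re-queues.  The
               * worker, in turn, can now rely on dm_suspended() being true
               * from presuspend onwards to stop re-queueing during unload. */
              drain_workqueue(writeback_wq);
      }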
  21. 28 Jan 2020, 1 commit
  22. 13 Nov 2019, 3 commits
  23. 07 Nov 2019, 1 commit
  24. 23 Aug 2019, 1 commit
    • dm: make dm_table_find_target return NULL · 123d87d5
      Committed by Mikulas Patocka
      Currently, if we pass a sector number that is too high to
      dm_table_find_target, it returns a zeroed dm_target structure and callers
      test whether the structure is zeroed with the macro dm_target_is_valid.
      
      However, returning NULL is common practice to indicate errors.
      
      This patch refactors the dm code, so that dm_table_find_target returns
      NULL and its callers test the returned value for NULL. The macro
      dm_target_is_valid is deleted. In alloc_targets, we no longer allocate an
      extra zeroed target.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
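      A hedged sketch of the caller pattern after the change (the caller name
      and error value are illustrative; dm_table_find_target is declared in the
      drivers/md internal header dm.h):

      #include "dm.h"   /* drivers/md internal header */

      static int map_sector_sketch(struct dm_table *map, sector_t sector)
      {
              struct dm_target *ti = dm_table_find_target(map, sector);

              /* Callers now test the pointer directly instead of calling the
               * removed dm_target_is_valid() on a zeroed placeholder target. */
              if (!ti)
                      return -EIO;    /* sector is beyond the end of the table */

              /* ... hand the IO to ti ... */
              return 0;
      }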
  25. 12 Jul 2019, 1 commit
  26. 06 Jul 2019, 2 commits
  27. 22 May 2019, 1 commit
    • dm: make sure to obey max_io_len_target_boundary · 51b86f9a
      Committed by Michael Lass
      Commit 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM
      target interface") incorrectly removed code from
      __send_changing_extent_only() that is required to impose a per-target IO
      boundary on IO that exceeds max_io_len_target_boundary().  Otherwise
      "special" IO (e.g. DISCARD, WRITE SAME, WRITE ZEROES) can write beyond
      where allowed.
      
      Fix this by restoring the max_io_len_target_boundary() limit in
      __send_changing_extent_only().
      
      Fixes: 61697a6a ("dm: eliminate 'split_discard_bios' flag from DM target interface")
      Cc: stable@vger.kernel.org # 5.1+
      Signed-off-by: Michael Lass <bevan@bi-co.net>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
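      A hedged sketch of the restored clamp (paraphrased from the description;
      the helper name is hypothetical and the __send_changing_extent_only
      internals are simplified, as they would sit inside drivers/md/dm.c next
      to max_io_len_target_boundary()):

      static sector_t clamp_special_io_sketch(sector_t sector,
                                              sector_t remaining,
                                              struct dm_target *ti)
      {
              /* The sector count handed to the target for DISCARD/WRITE SAME/
               * WRITE ZEROES must not cross the per-target boundary. */
              return min_t(sector_t, remaining,
                           max_io_len_target_boundary(sector, ti));
      }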
  28. 21 May 2019, 1 commit
    • dax: Arrange for dax_supported check to span multiple devices · 7bf7eac8
      Committed by Dan Williams
      Pankaj reports that starting with commit ad428cdb "dax: Check the
      end of the block-device capacity with dax_direct_access()" device-mapper
      no longer allows dax operation. This results from the stricter checks in
      __bdev_dax_supported() that validate that the start and end of a
      block-device map to the same 'pagemap' instance.
      
      Teach the dax-core and device-mapper to validate the 'pagemap' on a
      per-target basis. This is accomplished by refactoring the
      bdev_dax_supported() internals into generic_fsdax_supported() which
      takes a sector range to validate. Consequently generic_fsdax_supported()
      is suitable to be used in a device-mapper ->iterate_devices() callback.
      A new ->dax_supported() operation is added to allow composite devices to
      split and route upper-level bdev_dax_supported() requests.
      
      Fixes: ad428cdb ("dax: Check the end of the block-device...")
      Cc: <stable@vger.kernel.org>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reported-by: Pankaj Gupta <pagupta@redhat.com>
      Reviewed-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Pankaj Gupta <pagupta@redhat.com>
      Tested-by: Vaibhav Jain <vaibhav@linux.ibm.com>
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
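      A hedged sketch of a device-mapper ->iterate_devices() callback using the
      new helper (the callback name is hypothetical):

      #include <linux/device-mapper.h>
      #include <linux/dax.h>

      static int device_supports_dax_sketch(struct dm_target *ti,
                                            struct dm_dev *dev,
                                            sector_t start, sector_t len,
                                            void *data)
      {
              int blocksize = *(int *)data;

              /* Validate the 'pagemap' per underlying device by passing this
               * target's start/len range to generic_fsdax_supported(). */
              return generic_fsdax_supported(dev->dax_dev, dev->bdev, blocksize,
                                             start, len);
      }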