1. 02 10月, 2020 1 次提交
    • M
      dm: fold dm_process_bio() into dm_submit_bio() · b2abdb1b
      Mike Snitzer 提交于
      dm_process_bio() is only called by dm_submit_bio(), there is no benefit
      to keeping dm_process_bio() factored out, so fold it.
      
      While at it, cleanup dm_submit_bio()'s DMF_BLOCK_IO_FOR_SUSPEND related
      branching and expand scope of dm_get_live_table() rcu reference on map
      via common 'out' label to dm_put_live_table().
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      b2abdb1b
  2. 01 10月, 2020 1 次提交
    • M
      dm: fix missing imposition of queue_limits from dm_wq_work() thread · 0c2915b8
      Mike Snitzer 提交于
      If a DM device was suspended when bios were issued to it, those bios
      would be deferred using queue_io(). Once the DM device was resumed
      dm_process_bio() could be called by dm_wq_work() for original bio that
      still needs splitting. dm_process_bio()'s check for current->bio_list
      (meaning call chain is within ->submit_bio) as a prerequisite for
      calling blk_queue_split() for "abnormal IO" would result in
      dm_process_bio() never imposing corresponding queue_limits
      (e.g. discard_granularity, discard_max_bytes, etc).
      
      Fix this by always having dm_wq_work() resubmit deferred bios using
      submit_bio_noacct().
      
      Side-effect is blk_queue_split() is always called for "abnormal IO" from
      ->submit_bio, be it from application thread or dm_wq_work() workqueue,
      so proper bio splitting and depth-first bio submission is performed.
      For sake of clarity, remove current->bio_list check before call to
      blk_queue_split().
      
      Also, remove dm_wq_work()'s use of dm_{get,put}_live_table() -- no
      longer needed since IO will be reissued in terms of ->submit_bio.
      And rename bio variable from 'c' to 'bio'.
      
      Fixes: cf9c3786 ("dm: fix comment in dm_process_bio()")
      Reported-by: NJeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      0c2915b8
  3. 30 9月, 2020 7 次提交
  4. 25 9月, 2020 1 次提交
  5. 22 9月, 2020 2 次提交
    • M
      dm: fix comment in dm_process_bio() · cf9c3786
      Mike Snitzer 提交于
      Refer to the correct function (->submit_bio instead of ->queue_bio).
      Also, add details about why using blk_queue_split() isn't needed for
      dm_wq_work()'s call to dm_process_bio().
      
      Fixes: c62b37d9 ("block: move ->make_request_fn to struct block_device_operations")
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      cf9c3786
    • M
      dm: fix bio splitting and its bio completion order for regular IO · ee1dfad5
      Mike Snitzer 提交于
      dm_queue_split() is removed because __split_and_process_bio() _must_
      handle splitting bios to ensure proper bio submission and completion
      ordering as a bio is split.
      
      Otherwise, multiple recursive calls to ->submit_bio will cause multiple
      split bios to be allocated from the same ->bio_split mempool at the same
      time. This would result in deadlock in low memory conditions because no
      progress could be made (only one bio is available in ->bio_split
      mempool).
      
      This fix has been verified to still fix the loss of performance, due
      to excess splitting, that commit 120c9257 provided.
      
      Fixes: 120c9257 ("Revert "dm: always call blk_queue_split() in dm_process_bio()"")
      Cc: stable@vger.kernel.org # 5.0+, requires custom backport due to 5.9 changes
      Reported-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      ee1dfad5
  6. 20 9月, 2020 1 次提交
    • D
      dm/dax: Fix table reference counts · 02186d88
      Dan Williams 提交于
      A recent fix to the dm_dax_supported() flow uncovered a latent bug. When
      dm_get_live_table() fails it is still required to drop the
      srcu_read_lock(). Without this change the lvm2 test-suite triggers this
      warning:
      
          # lvm2-testsuite --only pvmove-abort-all.sh
      
          WARNING: lock held when returning to user space!
          5.9.0-rc5+ #251 Tainted: G           OE
          ------------------------------------------------
          lvm/1318 is leaving the kernel with locks still held!
          1 lock held by lvm/1318:
           #0: ffff9372abb5a340 (&md->io_barrier){....}-{0:0}, at: dm_get_live_table+0x5/0xb0 [dm_mod]
      
      ...and later on this hang signature:
      
          INFO: task lvm:1344 blocked for more than 122 seconds.
                Tainted: G           OE     5.9.0-rc5+ #251
          "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
          task:lvm             state:D stack:    0 pid: 1344 ppid:     1 flags:0x00004000
          Call Trace:
           __schedule+0x45f/0xa80
           ? finish_task_switch+0x249/0x2c0
           ? wait_for_completion+0x86/0x110
           schedule+0x5f/0xd0
           schedule_timeout+0x212/0x2a0
           ? __schedule+0x467/0xa80
           ? wait_for_completion+0x86/0x110
           wait_for_completion+0xb0/0x110
           __synchronize_srcu+0xd1/0x160
           ? __bpf_trace_rcu_utilization+0x10/0x10
           __dm_suspend+0x6d/0x210 [dm_mod]
           dm_suspend+0xf6/0x140 [dm_mod]
      
      Fixes: 7bf7eac8 ("dax: Arrange for dax_supported check to span multiple devices")
      Cc: <stable@vger.kernel.org>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reported-by: NAdrian Huang <ahuang12@lenovo.com>
      Reviewed-by: NIra Weiny <ira.weiny@intel.com>
      Tested-by: NAdrian Huang <ahuang12@lenovo.com>
      Link: https://lore.kernel.org/r/160045867590.25663.7548541079217827340.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      02186d88
  7. 02 9月, 2020 1 次提交
  8. 24 8月, 2020 1 次提交
  9. 05 8月, 2020 1 次提交
  10. 24 7月, 2020 1 次提交
    • M
      dm integrity: fix integrity recalculation that is improperly skipped · 5df96f2b
      Mikulas Patocka 提交于
      Commit adc0daad ("dm: report suspended
      device during destroy") broke integrity recalculation.
      
      The problem is dm_suspended() returns true not only during suspend,
      but also during resume. So this race condition could occur:
      1. dm_integrity_resume calls queue_work(ic->recalc_wq, &ic->recalc_work)
      2. integrity_recalc (&ic->recalc_work) preempts the current thread
      3. integrity_recalc calls if (unlikely(dm_suspended(ic->ti))) goto unlock_ret;
      4. integrity_recalc exits and no recalculating is done.
      
      To fix this race condition, add a function dm_post_suspending that is
      only true during the postsuspend phase and use it instead of
      dm_suspended().
      
      Signed-off-by: Mikulas Patocka <mpatocka redhat com>
      Fixes: adc0daad ("dm: report suspended device during destroy")
      Cc: stable vger kernel org # v4.18+
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      5df96f2b
  11. 09 7月, 2020 3 次提交
  12. 08 7月, 2020 2 次提交
    • C
      dm: use bio_uninit instead of bio_disassociate_blkg · 382761dc
      Christoph Hellwig 提交于
      bio_uninit is the proper API to clean up a BIO that has been allocated
      on stack or inside a structure that doesn't come from the BIO allocator.
      Switch dm to use that instead of bio_disassociate_blkg, which really is
      an implementation detail.  Note that the bio_uninit calls are also moved
      to the two callers of __send_empty_flush, so that they better pair with
      the bio_init calls used to initialize them.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      382761dc
    • M
      dm: do not use waitqueue for request-based DM · 85067747
      Ming Lei 提交于
      Given request-based DM now uses blk-mq's blk_mq_queue_inflight() to
      determine if outstanding IO has completed (and DM has no control over
      the blk-mq state machine used to track outstanding IO) it is unsafe to
      wakeup waiter (dm_wait_for_completion) before blk-mq has cleared a
      request's state bits (e.g. MQ_RQ_IN_FLIGHT or MQ_RQ_COMPLETE).  As
      such dm_wait_for_completion() could be left to wait indefinitely if no
      other requests complete.
      
      Fix this by eliminating request-based DM's use of waitqueue to wait
      for blk-mq requests to complete in dm_wait_for_completion.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Depends-on: 3c94d83c ("blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()")
      Cc: stable@vger.kernel.org
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      85067747
  13. 02 7月, 2020 1 次提交
    • J
      dm: remove unused variable · b53ac8b8
      Jens Axboe 提交于
      Since merging the commit identified in Fixes below, we trigger this
      compile time warning:
      
      drivers/md/dm.c: In function ‘__map_bio’:
      drivers/md/dm.c:1296:24: warning: unused variable ‘md’ [-Wunused-variable]
       1296 |  struct mapped_device *md = io->md;
             |                        ^~
      
      Remove the 'md' variable.
      
      Fixes: 5a6c35f9 ("block: remove direct_make_request")
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b53ac8b8
  14. 01 7月, 2020 5 次提交
  15. 29 6月, 2020 1 次提交
  16. 20 6月, 2020 1 次提交
  17. 27 5月, 2020 1 次提交
  18. 21 5月, 2020 1 次提交
    • M
      dm: use DMDEBUG macros now that they use pr_debug variants · ac75b09f
      Mike Snitzer 提交于
      Now that DMDEBUG uses pr_debug and DMDEBUG_LIMIT uses
      pr_debug_ratelimited cleanup DM's 2 direct pr_debug callers to use
      them to get the benefit of consistent DM_FMT formatting of debugging
      messages.
      
      While doing so, dm-mpath.c:dm_report_EIO() was switched over to using
      DMDEBUG_LIMIT due to the potential for error handling floods in the IO
      completion path.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      ac75b09f
  19. 19 5月, 2020 1 次提交
    • C
      blk-mq: allow blk_mq_make_request to consume the q_usage_counter reference · ac7c5675
      Christoph Hellwig 提交于
      blk_mq_make_request currently needs to grab an q_usage_counter
      reference when allocating a request.  This is because the block layer
      grabs one before calling blk_mq_make_request, but also releases it as
      soon as blk_mq_make_request returns.  Remove the blk_queue_exit call
      after blk_mq_make_request returns, and instead let it consume the
      reference.  This works perfectly fine for the block layer caller, just
      device mapper needs an extra reference as the old problem still
      persists there.  Open code blk_queue_enter_live in device mapper,
      as there should be no other callers and this allows better documenting
      why we do a non-try get.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ac7c5675
  20. 15 5月, 2020 1 次提交
  21. 14 5月, 2020 1 次提交
    • S
      block: Inline encryption support for blk-mq · a892c8d5
      Satya Tangirala 提交于
      We must have some way of letting a storage device driver know what
      encryption context it should use for en/decrypting a request. However,
      it's the upper layers (like the filesystem/fscrypt) that know about and
      manages encryption contexts. As such, when the upper layer submits a bio
      to the block layer, and this bio eventually reaches a device driver with
      support for inline encryption, the device driver will need to have been
      told the encryption context for that bio.
      
      We want to communicate the encryption context from the upper layer to the
      storage device along with the bio, when the bio is submitted to the block
      layer. To do this, we add a struct bio_crypt_ctx to struct bio, which can
      represent an encryption context (note that we can't use the bi_private
      field in struct bio to do this because that field does not function to pass
      information across layers in the storage stack). We also introduce various
      functions to manipulate the bio_crypt_ctx and make the bio/request merging
      logic aware of the bio_crypt_ctx.
      
      We also make changes to blk-mq to make it handle bios with encryption
      contexts. blk-mq can merge many bios into the same request. These bios need
      to have contiguous data unit numbers (the necessary changes to blk-merge
      are also made to ensure this) - as such, it suffices to keep the data unit
      number of just the first bio, since that's all a storage driver needs to
      infer the data unit number to use for each data block in each bio in a
      request. blk-mq keeps track of the encryption context to be used for all
      the bios in a request with the request's rq_crypt_ctx. When the first bio
      is added to an empty request, blk-mq will program the encryption context
      of that bio into the request_queue's keyslot manager, and store the
      returned keyslot in the request's rq_crypt_ctx. All the functions to
      operate on encryption contexts are in blk-crypto.c.
      
      Upper layers only need to call bio_crypt_set_ctx with the encryption key,
      algorithm and data_unit_num; they don't have to worry about getting a
      keyslot for each encryption context, as blk-mq/blk-crypto handles that.
      Blk-crypto also makes it possible for request-based layered devices like
      dm-rq to make use of inline encryption hardware by cloning the
      rq_crypt_ctx and programming a keyslot in the new request_queue when
      necessary.
      
      Note that any user of the block layer can submit bios with an
      encryption context, such as filesystems, device-mapper targets, etc.
      Signed-off-by: NSatya Tangirala <satyat@google.com>
      Reviewed-by: NEric Biggers <ebiggers@google.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a892c8d5
  22. 25 4月, 2020 1 次提交
  23. 03 4月, 2020 3 次提交
    • M
      Revert "dm: always call blk_queue_split() in dm_process_bio()" · 120c9257
      Mike Snitzer 提交于
      This reverts commit effd58c9.
      
      blk_queue_split() is causing excessive IO splitting -- because
      blk_max_size_offset() depends on 'chunk_sectors' limit being set and
      if it isn't (as is the case for DM targets!) it falls back to
      splitting on a 'max_sectors' boundary regardless of offset.
      
      "Fix" this by reverting back to _not_ using blk_queue_split() in
      dm_process_bio() for normal IO (reads and writes).  Long-term fix is
      still TBD but it should focus on training blk_max_size_offset() to
      call into a DM provided hook (to call DM's max_io_len()).
      
      Test results from simple misaligned IO test on 4-way dm-striped device
      with chunksize of 128K and stripesize of 512K:
      
      xfs_io -d -c 'pread -b 2m 224s 4072s' /dev/mapper/stripe_dev
      
      before this revert:
      
      253,0   21        1     0.000000000  2206  Q   R 224 + 4072 [xfs_io]
      253,0   21        2     0.000008267  2206  X   R 224 / 480 [xfs_io]
      253,0   21        3     0.000010530  2206  X   R 224 / 256 [xfs_io]
      253,0   21        4     0.000027022  2206  X   R 480 / 736 [xfs_io]
      253,0   21        5     0.000028751  2206  X   R 480 / 512 [xfs_io]
      253,0   21        6     0.000033323  2206  X   R 736 / 992 [xfs_io]
      253,0   21        7     0.000035130  2206  X   R 736 / 768 [xfs_io]
      253,0   21        8     0.000039146  2206  X   R 992 / 1248 [xfs_io]
      253,0   21        9     0.000040734  2206  X   R 992 / 1024 [xfs_io]
      253,0   21       10     0.000044694  2206  X   R 1248 / 1504 [xfs_io]
      253,0   21       11     0.000046422  2206  X   R 1248 / 1280 [xfs_io]
      253,0   21       12     0.000050376  2206  X   R 1504 / 1760 [xfs_io]
      253,0   21       13     0.000051974  2206  X   R 1504 / 1536 [xfs_io]
      253,0   21       14     0.000055881  2206  X   R 1760 / 2016 [xfs_io]
      253,0   21       15     0.000057462  2206  X   R 1760 / 1792 [xfs_io]
      253,0   21       16     0.000060999  2206  X   R 2016 / 2272 [xfs_io]
      253,0   21       17     0.000062489  2206  X   R 2016 / 2048 [xfs_io]
      253,0   21       18     0.000066133  2206  X   R 2272 / 2528 [xfs_io]
      253,0   21       19     0.000067507  2206  X   R 2272 / 2304 [xfs_io]
      253,0   21       20     0.000071136  2206  X   R 2528 / 2784 [xfs_io]
      253,0   21       21     0.000072764  2206  X   R 2528 / 2560 [xfs_io]
      253,0   21       22     0.000076185  2206  X   R 2784 / 3040 [xfs_io]
      253,0   21       23     0.000077486  2206  X   R 2784 / 2816 [xfs_io]
      253,0   21       24     0.000080885  2206  X   R 3040 / 3296 [xfs_io]
      253,0   21       25     0.000082316  2206  X   R 3040 / 3072 [xfs_io]
      253,0   21       26     0.000085788  2206  X   R 3296 / 3552 [xfs_io]
      253,0   21       27     0.000087096  2206  X   R 3296 / 3328 [xfs_io]
      253,0   21       28     0.000093469  2206  X   R 3552 / 3808 [xfs_io]
      253,0   21       29     0.000095186  2206  X   R 3552 / 3584 [xfs_io]
      253,0   21       30     0.000099228  2206  X   R 3808 / 4064 [xfs_io]
      253,0   21       31     0.000101062  2206  X   R 3808 / 3840 [xfs_io]
      253,0   21       32     0.000104956  2206  X   R 4064 / 4096 [xfs_io]
      253,0   21       33     0.001138823     0  C   R 4096 + 200 [0]
      
      after this revert:
      
      253,0   18        1     0.000000000  4430  Q   R 224 + 3896 [xfs_io]
      253,0   18        2     0.000018359  4430  X   R 224 / 256 [xfs_io]
      253,0   18        3     0.000028898  4430  X   R 256 / 512 [xfs_io]
      253,0   18        4     0.000033535  4430  X   R 512 / 768 [xfs_io]
      253,0   18        5     0.000065684  4430  X   R 768 / 1024 [xfs_io]
      253,0   18        6     0.000091695  4430  X   R 1024 / 1280 [xfs_io]
      253,0   18        7     0.000098494  4430  X   R 1280 / 1536 [xfs_io]
      253,0   18        8     0.000114069  4430  X   R 1536 / 1792 [xfs_io]
      253,0   18        9     0.000129483  4430  X   R 1792 / 2048 [xfs_io]
      253,0   18       10     0.000136759  4430  X   R 2048 / 2304 [xfs_io]
      253,0   18       11     0.000152412  4430  X   R 2304 / 2560 [xfs_io]
      253,0   18       12     0.000160758  4430  X   R 2560 / 2816 [xfs_io]
      253,0   18       13     0.000183385  4430  X   R 2816 / 3072 [xfs_io]
      253,0   18       14     0.000190797  4430  X   R 3072 / 3328 [xfs_io]
      253,0   18       15     0.000197667  4430  X   R 3328 / 3584 [xfs_io]
      253,0   18       16     0.000218751  4430  X   R 3584 / 3840 [xfs_io]
      253,0   18       17     0.000226005  4430  X   R 3840 / 4096 [xfs_io]
      253,0   18       18     0.000250404  4430  Q   R 4120 + 176 [xfs_io]
      253,0   18       19     0.000847708     0  C   R 4096 + 24 [0]
      253,0   18       20     0.000855783     0  C   R 4120 + 176 [0]
      
      Fixes: effd58c9 ("dm: always call blk_queue_split() in dm_process_bio()")
      Cc: stable@vger.kernel.org
      Reported-by: NAndreas Gruenbacher <agruenba@redhat.com>
      Tested-by: NBarry Marson <bmarson@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      120c9257
    • V
      dax: Move mandatory ->zero_page_range() check in alloc_dax() · 4e4ced93
      Vivek Goyal 提交于
      zero_page_range() dax operation is mandatory for dax devices. Right now
      that check happens in dax_zero_page_range() function. Dan thinks that's
      too late and its better to do the check earlier in alloc_dax().
      
      I also modified alloc_dax() to return pointer with error code in it in
      case of failure. Right now it returns NULL and caller assumes failure
      happened due to -ENOMEM. But with this ->zero_page_range() check, I
      need to return -EINVAL instead.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Link: https://lore.kernel.org/r/20200401161125.GB9398@redhat.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      4e4ced93
    • V
      dm,dax: Add dax zero_page_range operation · cdf6cdcd
      Vivek Goyal 提交于
      This patch adds support for dax zero_page_range operation to dm targets.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Link: https://lore.kernel.org/r/20200228163456.1587-5-vgoyal@redhat.comSigned-off-by: NDan Williams <dan.j.williams@intel.com>
      cdf6cdcd
  24. 28 3月, 2020 1 次提交
    • C
      block: simplify queue allocation · 3d745ea5
      Christoph Hellwig 提交于
      Current make_request based drivers use either blk_alloc_queue_node or
      blk_alloc_queue to allocate a queue, and then set up the make_request_fn
      function pointer and a few parameters using the blk_queue_make_request
      helper.  Simplify this by passing the make_request pointer to
      blk_alloc_queue, and while at it merge the _node variant into the main
      helper by always passing a node_id, and remove the superfluous gfp_mask
      parameter.  A lower-level __blk_alloc_queue is kept for the blk-mq case.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      3d745ea5