1. 24 March 2021, 1 commit
    • block: recalculate segment count for multi-segment discards correctly · a958937f
      David Jeffery authored
      When a stacked block device inserts a request into another block device
      using blk_insert_cloned_request, the request's nr_phys_segments field gets
      recalculated by a call to blk_recalc_rq_segments in
      blk_cloned_rq_check_limits. But blk_recalc_rq_segments does not know how
      to handle multi-segment discards. For disk types which can handle
      multi-segment discards, like nvme, this results in discard requests which
      claim a single segment when they should report several, triggering a
      warning in nvme and causing nvme to fail the discard because of the
      invalid state.
      
       WARNING: CPU: 5 PID: 191 at drivers/nvme/host/core.c:700 nvme_setup_discard+0x170/0x1e0 [nvme_core]
       ...
       nvme_setup_cmd+0x217/0x270 [nvme_core]
       nvme_loop_queue_rq+0x51/0x1b0 [nvme_loop]
       __blk_mq_try_issue_directly+0xe7/0x1b0
       blk_mq_request_issue_directly+0x41/0x70
       ? blk_account_io_start+0x40/0x50
       dm_mq_queue_rq+0x200/0x3e0
       blk_mq_dispatch_rq_list+0x10a/0x7d0
       ? __sbitmap_queue_get+0x25/0x90
       ? elv_rb_del+0x1f/0x30
       ? deadline_remove_request+0x55/0xb0
       ? dd_dispatch_request+0x181/0x210
       __blk_mq_do_dispatch_sched+0x144/0x290
       ? bio_attempt_discard_merge+0x134/0x1f0
       __blk_mq_sched_dispatch_requests+0x129/0x180
       blk_mq_sched_dispatch_requests+0x30/0x60
       __blk_mq_run_hw_queue+0x47/0xe0
       __blk_mq_delay_run_hw_queue+0x15b/0x170
       blk_mq_sched_insert_requests+0x68/0xe0
       blk_mq_flush_plug_list+0xf0/0x170
       blk_finish_plug+0x36/0x50
       xlog_cil_committed+0x19f/0x290 [xfs]
       xlog_cil_process_committed+0x57/0x80 [xfs]
       xlog_state_do_callback+0x1e0/0x2a0 [xfs]
       xlog_ioend_work+0x2f/0x80 [xfs]
       process_one_work+0x1b6/0x350
       worker_thread+0x53/0x3e0
       ? process_one_work+0x350/0x350
       kthread+0x11b/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      
      This patch fixes blk_recalc_rq_segments to be aware of devices which can
      have multi-segment discards. It calculates the correct discard segment
      count by counting the number of bios, as each discard bio is considered
      its own segment.
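
      A minimal sketch of that counting logic, assuming kernel-style helpers
      (bio_op(), for_each_bio(), queue_max_discard_segments()); the exact
      upstream diff may differ:

       static unsigned int blk_recalc_rq_segments(struct request *rq)
       {
               unsigned int nr_phys_segs = 0;

               if (!rq->bio)
                       return 0;

               switch (bio_op(rq->bio)) {
               case REQ_OP_DISCARD:
               case REQ_OP_SECURE_ERASE:
                       /*
                        * Each discard bio merged into this request is its
                        * own segment, so count bios instead of walking
                        * bvecs.
                        */
                       if (queue_max_discard_segments(rq->q) > 1) {
                               struct bio *bio = rq->bio;

                               for_each_bio(bio)
                                       nr_phys_segs++;
                               return nr_phys_segs;
                       }
                       return 1;
               default:
                       break;
               }

               /* ... the usual bvec-walking path for other ops ... */
               return nr_phys_segs;
       }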
      
      Fixes: 1e739730 ("block: optionally merge discontiguous discard bios into a single request")
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Laurence Oberman <loberman@redhat.com>
      Link: https://lore.kernel.org/r/20210211143807.GA115624@redhat
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 25 January 2021, 1 commit
  3. 08 December 2020, 1 commit
    • block: disable iopoll for split bio · cc29e1bf
      Jeffle Xu authored
      iopoll was originally intended for small, latency-sensitive IO. It
      doesn't work well for big IO, especially when the IO needs to be split
      into multiple bios. In this case, the cookie returned by
      __submit_bio_noacct_mq() is actually the cookie of the last split bio.
      The completion of *this* last split bio done by iopoll doesn't mean the
      whole original bio has completed. Callers of iopoll still need to wait
      for the completion of the other split bios.
      
      Besides, bio splitting may cause more trouble for iopoll, which isn't
      supposed to be used for big IO in the first place.
      
      iopoll for a split bio may cause a potential race if a CPU migration
      happens during bio submission. Since the returned cookie is that of the
      last split bio, polling the corresponding hardware queue doesn't help
      complete the other split bios if they were enqueued into different
      hardware queues. Since interrupts are disabled for polling queues, the
      completion of these other split bios then depends on the timeout
      mechanism, thus causing a potential hang.
      
      iopoll for a split bio may also cause a hang for sync polling. Currently
      both the blkdev and iomap-based fs (ext4/xfs, etc.) direct IO routines
      support sync polling: they submit the bio without the REQ_NOWAIT flag
      set and then start sync polling in the current process context. The
      process may hang in blk_mq_get_tag() if the submitted bio has to be
      split into multiple bios and can rapidly exhaust the queue depth. The
      process is waiting for the completion of the previously allocated
      requests, which should be reaped by the following polling, thus causing
      a deadlock.
      
      To avoid the subtle trouble described above, just disable iopoll for a
      split bio and return BLK_QC_T_NONE in this case. The side effect is that
      non-HIPRI IO also returns BLK_QC_T_NONE now. This should be acceptable
      since the returned cookie is never used for non-HIPRI IO.
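
      A hedged sketch of the approach, assuming the split path in blk-merge.c
      and the REQ_HIPRI flag used by kernels of this era (the exact code may
      differ):

       /* In the bio split path (sketch): */
       if (split) {
               /*
                * A split bio can no longer be completed by polling a
                * single hardware queue, so give up iopoll for the whole
                * original bio.
                */
               bio->bi_opf &= ~REQ_HIPRI;
       }

       /* In the submit path, only HIPRI bios return a real poll cookie: */
       if (!(bio->bi_opf & REQ_HIPRI))
               return BLK_QC_T_NONE;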
      Suggested-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 05 December 2020, 4 commits
  5. 02 December 2020, 1 commit
  6. 06 October 2020, 1 commit
  7. 02 September 2020, 4 commits
  8. 22 August 2020, 1 commit
    • block: fix get_max_io_size() · e4b469c6
      Keith Busch authored
      A previous commit aligning splits to physical block sizes inadvertently
      modified one return case such that it now returns 0-length splits when
      the number of sectors doesn't exceed the physical offset. This later
      hits a BUG in bio_split(). Restore the previous working behavior.
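
      A hedged sketch of the repaired logic, reusing the get_max_io_size()
      names implied above (the exact upstream diff may differ):

       static inline unsigned get_max_io_size(struct request_queue *q,
                                              struct bio *bio)
       {
               unsigned sectors = blk_max_size_offset(q, bio->bi_iter.bi_sector);
               unsigned max_sectors = sectors;
               unsigned pbs = queue_physical_block_size(q) >> SECTOR_SHIFT;
               unsigned lbs = queue_logical_block_size(q) >> SECTOR_SHIFT;
               unsigned start = bio->bi_iter.bi_sector & (pbs - 1);

               max_sectors += start;
               max_sectors &= ~(pbs - 1);
               /*
                * Only use the physically aligned size when it actually
                * extends past the offset; never return a 0-length split.
                */
               if (max_sectors > start)
                       return max_sectors - start;

               return sectors & ~(lbs - 1);
       }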
      
      Fixes: 9cc5169c ("block: Improve physical block alignment of split bios")
      Reported-by: Eric Deal <eric.deal@wdc.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  9. 17 August 2020, 1 commit
  10. 17 July 2020, 1 commit
  11. 01 July 2020, 2 commits
  12. 26 June 2020, 1 commit
    • blktrace: Provide event for request merging · f3bdc62f
      Jan Kara authored
      Currently blk-mq does not report any event when two requests get merged
      in the elevator. This results in a difficult-to-understand sequence of
      events like:
      
      ...
        8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
        8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
        8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
        8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
        8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
        8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]
      
      and you can only guess (after quite some head-scratching, since the above
      excerpt is intermixed with a lot of other IO) that request 215023544+56
      got merged into request 215023504+40. Provide a proper event for request
      merging, like we used to do in the legacy block layer.
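
      A hedged sketch of the idea, assuming a block_rq_merge trace event fired
      from the blk-mq merge path (names and signature are illustrative):

       static struct request *attempt_merge(struct request_queue *q,
                                            struct request *req,
                                            struct request *next)
       {
               /* ... merge checks and bio list stitching ... */

               /* Emit an event so the merge shows up in blktrace. */
               trace_block_rq_merge(q, next);

               /* ... accounting, then free 'next' ... */
       }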
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 27 May 2020, 2 commits
  14. 19 May 2020, 1 commit
  15. 14 May 2020, 1 commit
    • block: Inline encryption support for blk-mq · a892c8d5
      Satya Tangirala authored
      We must have some way of letting a storage device driver know what
      encryption context it should use for en/decrypting a request. However,
      it's the upper layers (like the filesystem/fscrypt) that know about and
      manage encryption contexts. As such, when the upper layer submits a bio
      to the block layer, and this bio eventually reaches a device driver with
      support for inline encryption, the device driver will need to have been
      told the encryption context for that bio.
      
      We want to communicate the encryption context from the upper layer to
      the storage device along with the bio, when the bio is submitted to the
      block layer. To do this, we add a struct bio_crypt_ctx to struct bio,
      which can represent an encryption context (note that we can't use the
      bi_private field in struct bio for this because that field is not meant
      to pass information across layers in the storage stack). We also
      introduce various functions to manipulate the bio_crypt_ctx and make the
      bio/request merging logic aware of it.
      
      We also make changes to blk-mq to make it handle bios with encryption
      contexts. blk-mq can merge many bios into the same request. These bios need
      to have contiguous data unit numbers (the necessary changes to blk-merge
      are also made to ensure this) - as such, it suffices to keep the data unit
      number of just the first bio, since that's all a storage driver needs to
      infer the data unit number to use for each data block in each bio in a
      request. blk-mq keeps track of the encryption context to be used for all
      the bios in a request with the request's rq_crypt_ctx. When the first bio
      is added to an empty request, blk-mq will program the encryption context
      of that bio into the request_queue's keyslot manager, and store the
      returned keyslot in the request's rq_crypt_ctx. All the functions to
      operate on encryption contexts are in blk-crypto.c.
      
      Upper layers only need to call bio_crypt_set_ctx with the encryption key,
      algorithm and data_unit_num; they don't have to worry about getting a
      keyslot for each encryption context, as blk-mq/blk-crypto handles that.
      Blk-crypto also makes it possible for request-based layered devices like
      dm-rq to make use of inline encryption hardware by cloning the
      rq_crypt_ctx and programming a keyslot in the new request_queue when
      necessary.
      
      Note that any user of the block layer can submit bios with an
      encryption context, such as filesystems, device-mapper targets, etc.
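
      A hedged usage sketch from the upper layer's point of view, assuming the
      bio_crypt_set_ctx() interface described above; blk_key and
      first_data_unit_num are placeholder names, and details may differ:

       u64 dun[BLK_CRYPTO_DUN_ARRAY_SIZE] = { first_data_unit_num };

       /* Attach the context; blk-mq/blk-crypto handle keyslot programming. */
       bio_crypt_set_ctx(bio, &blk_key, dun, GFP_NOIO);
       submit_bio(bio);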
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  16. 29 April 2020, 1 commit
  17. 23 April 2020, 4 commits
  18. 15 January 2020, 1 commit
  19. 30 December 2019, 1 commit
    • block: fix splitting segments on boundary masks · 429120f3
      Ming Lei authored
      We ran into a problem with an mpt3sas-based controller, where we would
      see random (and hard to reproduce) file corruption. The issue seemed
      specific to this controller, but wasn't specific to the file system.
      After a lot of debugging, we found out that it's caused by segments
      spanning a 4G memory boundary. This shouldn't happen, as the default
      setting for the segment boundary mask is 4G.
      
      Turns out there are two issues in get_max_segment_size():
      
      1) The default segment boundary mask is bypassed
      
      2) The segment start address isn't taken into account when checking
         segment boundary limit
      
      Fix these two issues by removing the bypass of the segment boundary
      check even if the mask is set to the default value, and taking into
      account the actual start address of the request when checking if a
      segment needs splitting.
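
      A hedged sketch of the repaired helper, following the two fixes
      described above (the exact upstream code may differ):

       static inline unsigned get_max_segment_size(const struct request_queue *q,
                                                   struct page *start_page,
                                                   unsigned long offset)
       {
               unsigned long mask = queue_segment_boundary(q);

               /*
                * Honor the boundary mask even at its default (4G) value,
                * and account for the segment's actual start address.
                */
               offset = mask & (page_to_phys(start_page) + offset);

               return min_not_zero(mask - offset + 1,
                                   (unsigned long)queue_max_segment_size(q));
       }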
      
      Cc: stable@vger.kernel.org # v5.1+
      Reviewed-by: Chris Mason <clm@fb.com>
      Tested-by: Chris Mason <clm@fb.com>
      Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Dropped const on the page pointer, ppc page_to_phys() doesn't mark the
      page as const...
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  20. 22 November 2019, 1 commit
  21. 08 November 2019, 2 commits
  22. 05 November 2019, 1 commit
  23. 05 August 2019, 5 commits
  24. 03 July 2019, 1 commit