1. 10 Aug 2021, 1 commit
  2. 29 Jun 2021, 1 commit
  3. 25 Jun 2021, 1 commit
    • blk: Fix lock inversion between ioc lock and bfqd lock · fd2ef39c
      Committed by Jan Kara
      Lockdep complains about lock inversion between ioc->lock and bfqd->lock:
      
      bfqd -> ioc:
       put_io_context+0x33/0x90 -> ioc->lock grabbed
       blk_mq_free_request+0x51/0x140
       blk_put_request+0xe/0x10
       blk_attempt_req_merge+0x1d/0x30
       elv_attempt_insert_merge+0x56/0xa0
       blk_mq_sched_try_insert_merge+0x4b/0x60
       bfq_insert_requests+0x9e/0x18c0 -> bfqd->lock grabbed
       blk_mq_sched_insert_requests+0xd6/0x2b0
       blk_mq_flush_plug_list+0x154/0x280
       blk_finish_plug+0x40/0x60
       ext4_writepages+0x696/0x1320
       do_writepages+0x1c/0x80
       __filemap_fdatawrite_range+0xd7/0x120
       sync_file_range+0xac/0xf0
      
      ioc -> bfqd:
       bfq_exit_icq+0xa3/0xe0 -> bfqd->lock grabbed
       put_io_context_active+0x78/0xb0 -> ioc->lock grabbed
       exit_io_context+0x48/0x50
       do_exit+0x7e9/0xdd0
       do_group_exit+0x54/0xc0
      
      To avoid this inversion we change blk_mq_sched_try_insert_merge() to
      not free the merged request but rather leave that up to the caller,
      similarly to blk_mq_sched_try_merge(). And in bfq_insert_requests()
      we make sure to free all the merged requests after dropping
      bfqd->lock.
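
      The shape of the fix, sketched schematically below (the types and
      helpers are illustrative stand-ins, not the real blk-mq/bfq
      symbols): requests merged away under the scheduler lock are
      collected on a local list and freed only after the lock is dropped,
      so the freeing path can safely take ioc->lock.

       /* Hedged sketch of the "defer freeing until after unlock" pattern. */
       struct request { struct request *next_free; };
       struct sched_data { spinlock_t lock; };

       /* Hypothetical helpers standing in for the real code. */
       void try_insert_merge(struct sched_data *sd, struct request *rq,
                             struct request **free_list);
       void free_request(struct request *rq);

       static void insert_request(struct sched_data *sd, struct request *rq)
       {
               struct request *free_list = NULL;

               spin_lock(&sd->lock);
               /* The merge helper no longer frees a merged request itself;
                * it links it onto free_list instead. */
               try_insert_merge(sd, rq, &free_list);
               spin_unlock(&sd->lock);

               /* Freeing may grab ioc->lock, so it happens only after
                * sd->lock is dropped, which breaks the inversion. */
               while (free_list) {
                       struct request *victim = free_list;

                       free_list = victim->next_free;
                       free_request(victim);
               }
       }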
      
      Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210623093634.27879-3-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 24 Mar 2021, 1 commit
    • block: recalculate segment count for multi-segment discards correctly · a958937f
      Committed by David Jeffery
      When a stacked block device inserts a request into another block device
      using blk_insert_cloned_request, the request's nr_phys_segments field gets
      recalculated by a call to blk_recalc_rq_segments in
      blk_cloned_rq_check_limits. But blk_recalc_rq_segments does not know how to
      handle multi-segment discards. For disk types which can handle
      multi-segment discards, like nvme, this results in discard requests
      which claim a single segment when they should report several,
      triggering a warning in nvme and causing nvme to fail the discard
      because of the invalid state.
      
       WARNING: CPU: 5 PID: 191 at drivers/nvme/host/core.c:700 nvme_setup_discard+0x170/0x1e0 [nvme_core]
       ...
       nvme_setup_cmd+0x217/0x270 [nvme_core]
       nvme_loop_queue_rq+0x51/0x1b0 [nvme_loop]
       __blk_mq_try_issue_directly+0xe7/0x1b0
       blk_mq_request_issue_directly+0x41/0x70
       ? blk_account_io_start+0x40/0x50
       dm_mq_queue_rq+0x200/0x3e0
       blk_mq_dispatch_rq_list+0x10a/0x7d0
       ? __sbitmap_queue_get+0x25/0x90
       ? elv_rb_del+0x1f/0x30
       ? deadline_remove_request+0x55/0xb0
       ? dd_dispatch_request+0x181/0x210
       __blk_mq_do_dispatch_sched+0x144/0x290
       ? bio_attempt_discard_merge+0x134/0x1f0
       __blk_mq_sched_dispatch_requests+0x129/0x180
       blk_mq_sched_dispatch_requests+0x30/0x60
       __blk_mq_run_hw_queue+0x47/0xe0
       __blk_mq_delay_run_hw_queue+0x15b/0x170
       blk_mq_sched_insert_requests+0x68/0xe0
       blk_mq_flush_plug_list+0xf0/0x170
       blk_finish_plug+0x36/0x50
       xlog_cil_committed+0x19f/0x290 [xfs]
       xlog_cil_process_committed+0x57/0x80 [xfs]
       xlog_state_do_callback+0x1e0/0x2a0 [xfs]
       xlog_ioend_work+0x2f/0x80 [xfs]
       process_one_work+0x1b6/0x350
       worker_thread+0x53/0x3e0
       ? process_one_work+0x350/0x350
       kthread+0x11b/0x140
       ? __kthread_bind_mask+0x60/0x60
       ret_from_fork+0x22/0x30
      
      This patch fixes blk_recalc_rq_segments to be aware of devices which
      can have multi-segment discards. It calculates the correct discard
      segment count by counting the number of bios, as each discard bio is
      considered its own segment.
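
      The counting rule is simple: a discard request built from N merged
      bios has N segments. A minimal sketch of that special case, with an
      illustrative bio type rather than the real kernel struct:

       struct bio { struct bio *bi_next; };

       /* A multi-bio discard is a chain of bios, and each bio counts as
        * one segment; the generic bvec walk undercounts the chain as a
        * single segment. */
       static unsigned int discard_segments(const struct bio *bio)
       {
               unsigned int nr_segs = 0;

               for (; bio; bio = bio->bi_next)
                       nr_segs++;

               return nr_segs;
       }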
      
      Fixes: 1e739730 ("block: optionally merge discontiguous discard bios into a single request")
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Laurence Oberman <loberman@redhat.com>
      Link: https://lore.kernel.org/r/20210211143807.GA115624@redhat
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 25 Jan 2021, 1 commit
  6. 08 Dec 2020, 1 commit
    • block: disable iopoll for split bio · cc29e1bf
      Committed by Jeffle Xu
      iopoll was originally intended for small, latency-sensitive IO. It
      doesn't work well for big IO, especially when the IO needs to be
      split into multiple bios. In that case, the cookie returned by
      __submit_bio_noacct_mq() is in fact the cookie of the last split
      bio, and the completion of *this* last split bio by iopoll doesn't
      mean the whole original bio has completed. Callers of iopoll still
      need to wait for the completion of the other split bios.

      Besides, bio splitting may cause more trouble for iopoll, which
      isn't supposed to be used for big IO in the first place.
      
      iopoll for split bio may cause a potential race if CPU migration
      happens during bio submission. Since the returned cookie is that of
      the last split bio, polling on the corresponding hardware queue
      doesn't help complete the other split bios if they were enqueued
      into different hardware queues. Since interrupts are disabled for
      polling queues, the completion of these other split bios then
      depends on the timeout mechanism, thus causing a potential hang.
      
      iopoll for split bio may also cause a hang for sync polling.
      Currently both the blkdev and the iomap-based filesystems (ext4/xfs,
      etc.) support sync polling in their direct IO routines. These
      routines submit bios without the REQ_NOWAIT flag set and then start
      sync polling in the current process context. The process may hang in
      blk_mq_get_tag() if the submitted bio has to be split into multiple
      bios and can rapidly exhaust the queue depth. The process is waiting
      for the completion of the previously allocated requests, which
      should be reaped by the following polling, thus causing a deadlock.

      To avoid the subtle trouble described above, just disable iopoll for
      split bio and return BLK_QC_T_NONE in this case. The side effect is
      that non-HIPRI IO also returns BLK_QC_T_NONE now. That should be
      acceptable since the returned cookie is never used for non-HIPRI IO.
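
      The decision itself fits in a few lines; a hedged sketch, where
      do_submit() and the split flag are illustrative rather than the
      actual submission code:

       #include <stdbool.h>

       #define BLK_QC_T_NONE (~0U)   /* "nothing to poll for" */

       struct bio;
       /* Hypothetical helper: submits the bio, reports whether it had to
        * be split, and returns the last fragment's cookie. */
       unsigned int do_submit(struct bio *bio, bool *split);

       static unsigned int submit_and_cookie(struct bio *bio)
       {
               bool split = false;
               unsigned int cookie = do_submit(bio, &split);

               /* Polling one hardware queue cannot complete fragments
                * that were queued elsewhere, so a split bio never hands
                * back a pollable cookie. */
               return split ? BLK_QC_T_NONE : cookie;
       }
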
      Suggested-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 05 Dec 2020, 4 commits
  8. 02 Dec 2020, 1 commit
  9. 06 Oct 2020, 1 commit
  10. 02 Sep 2020, 4 commits
  11. 22 Aug 2020, 1 commit
    • block: fix get_max_io_size() · e4b469c6
      Committed by Keith Busch
      A previous commit aligning splits to physical block sizes
      inadvertently modified one return case such that it now returns
      0-length splits when the number of sectors doesn't exceed the
      physical offset. This later hits a BUG in bio_split(). Restore the
      previous working behavior.
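
      A simplified standalone model of the computation (not the exact
      kernel function): align the IO budget down to the physical block
      size, but fall back to logical-block alignment, rather than
      returning a 0-length split, when the start offset swallows the
      whole budget.

       static unsigned int max_io_sectors(unsigned int sectors,      /* bio size */
                                          unsigned int max_sectors,  /* queue limit */
                                          unsigned int start_offset, /* into phys block */
                                          unsigned int pbs,          /* phys block, sectors */
                                          unsigned int lbs)          /* logical block, sectors */
       {
               unsigned int aligned = (max_sectors + start_offset) & ~(pbs - 1);

               if (aligned > start_offset)
                       return aligned - start_offset;

               /* The regression returned 0 here; round down to the
                * logical block size instead. */
               return sectors & ~(lbs - 1);
       }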
      
      Fixes: 9cc5169c ("block: Improve physical block alignment of split bios")
      Reported-by: Eric Deal <eric.deal@wdc.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. 17 Aug 2020, 1 commit
  13. 17 Jul 2020, 1 commit
  14. 01 Jul 2020, 2 commits
  15. 26 Jun 2020, 1 commit
    • blktrace: Provide event for request merging · f3bdc62f
      Committed by Jan Kara
      Currently blk-mq does not report any event when two requests get
      merged in the elevator. This results in a difficult-to-understand
      sequence of events like:
      
      ...
        8,0   34     1579     0.608765271  2718  I  WS 215023504 + 40 [dbench]
        8,0   34     1584     0.609184613  2719  A  WS 215023544 + 56 <- (8,4) 2160568
        8,0   34     1585     0.609184850  2719  Q  WS 215023544 + 56 [dbench]
        8,0   34     1586     0.609188524  2719  G  WS 215023544 + 56 [dbench]
        8,0    3      602     0.609684162   773  D  WS 215023504 + 96 [kworker/3:1H]
        8,0   34     1591     0.609843593     0  C  WS 215023504 + 96 [0]
      
      and you can only guess (after quite some headscratching, since the
      above excerpt is intermixed with a lot of other IO) that request
      215023544+56 got merged into request 215023504+40. Provide a proper
      event for request merging, like we used to do in the legacy block
      layer.
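
      Schematically, the new event fires at the moment the back request
      is dissolved into the front one, so the trace shows an explicit
      merge line instead of leaving a gap to puzzle over. A hedged sketch
      with illustrative names, not the exact blk-mq functions:

       #include <stdbool.h>

       struct request { unsigned int nr_sectors; };

       /* Hypothetical stand-ins for the real helpers and tracepoint. */
       bool requests_mergeable(struct request *front, struct request *back);
       void trace_request_merge(struct request *rq);
       void free_request(struct request *rq);

       static bool attempt_req_merge(struct request *front, struct request *back)
       {
               if (!requests_mergeable(front, back))
                       return false;

               trace_request_merge(back);   /* new: report the merged request */

               front->nr_sectors += back->nr_sectors;
               free_request(back);
               return true;
       }
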
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  16. 27 May 2020, 2 commits
  17. 19 May 2020, 1 commit
  18. 14 May 2020, 1 commit
    • block: Inline encryption support for blk-mq · a892c8d5
      Committed by Satya Tangirala
      We must have some way of letting a storage device driver know what
      encryption context it should use for en/decrypting a request.
      However, it's the upper layers (like the filesystem/fscrypt) that
      know about and manage encryption contexts. As such, when the upper
      layer submits a bio to the block layer, and this bio eventually
      reaches a device driver with support for inline encryption, the
      device driver will need to have been told the encryption context
      for that bio.
      
      We want to communicate the encryption context from the upper layer
      to the storage device along with the bio, when the bio is submitted
      to the block layer. To do this, we add a struct bio_crypt_ctx to
      struct bio, which can represent an encryption context (note that we
      can't use the bi_private field in struct bio to do this because
      that field is not meant to pass information across layers in the
      storage stack). We also introduce various functions to manipulate
      the bio_crypt_ctx and make the bio/request merging logic aware of
      the bio_crypt_ctx.
      
      We also make changes to blk-mq to make it handle bios with encryption
      contexts. blk-mq can merge many bios into the same request. These bios need
      to have contiguous data unit numbers (the necessary changes to blk-merge
      are also made to ensure this) - as such, it suffices to keep the data unit
      number of just the first bio, since that's all a storage driver needs to
      infer the data unit number to use for each data block in each bio in a
      request. blk-mq keeps track of the encryption context to be used for all
      the bios in a request with the request's rq_crypt_ctx. When the first bio
      is added to an empty request, blk-mq will program the encryption context
      of that bio into the request_queue's keyslot manager, and store the
      returned keyslot in the request's rq_crypt_ctx. All the functions to
      operate on encryption contexts are in blk-crypto.c.
      
      Upper layers only need to call bio_crypt_set_ctx with the encryption key,
      algorithm and data_unit_num; they don't have to worry about getting a
      keyslot for each encryption context, as blk-mq/blk-crypto handles that.
      Blk-crypto also makes it possible for request-based layered devices like
      dm-rq to make use of inline encryption hardware by cloning the
      rq_crypt_ctx and programming a keyslot in the new request_queue when
      necessary.
      
      Note that any user of the block layer can submit bios with an
      encryption context, such as filesystems, device-mapper targets, etc.
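
      As a rough illustration of the upper-layer side of this flow (the
      call shape is paraphrased from the description above, with the DUN
      simplified to a scalar, so treat the signature as an assumption
      rather than the actual API):

       /* Hedged sketch: attach key and starting data unit number to a
        * bio and submit it; blk-crypto/blk-mq program a keyslot later,
        * when the bio lands in an empty request. */
       static void submit_encrypted_bio(struct bio *bio,
                                        const struct blk_crypto_key *key,
                                        u64 dun)
       {
               bio_crypt_set_ctx(bio, key, dun, GFP_NOIO);
               submit_bio(bio);
       }
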
      Signed-off-by: Satya Tangirala <satyat@google.com>
      Reviewed-by: Eric Biggers <ebiggers@google.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  19. 29 Apr 2020, 1 commit
  20. 23 Apr 2020, 4 commits
  21. 15 Jan 2020, 1 commit
  22. 30 Dec 2019, 1 commit
    • block: fix splitting segments on boundary masks · 429120f3
      Committed by Ming Lei
      We ran into a problem with an mpt3sas based controller, where we
      would see random (and hard to reproduce) file corruption. The issue
      seemed specific to this controller, but wasn't specific to the file
      system. After a lot of debugging, we found out that it's caused by
      segments spanning a 4G memory boundary. This shouldn't happen, as
      the default setting for segment boundary masks is 4G.
      
      Turns out there are two issues in get_max_segment_size():
      
      1) The default segment boundary mask is bypassed
      
      2) The segment start address isn't taken into account when checking
         segment boundary limit
      
      Fix these two issues by removing the bypass of the segment boundary
      check even if the mask is set to the default value, and taking into
      account the actual start address of the request when checking if a
      segment needs splitting.
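
      Both issues boil down to one computation; a simplified standalone
      model of the fixed arithmetic (not the exact kernel function):

       /* Bytes that fit starting at phys_addr before crossing the
        * boundary mask (e.g. 4G - 1), capped by the queue's maximum
        * segment size. */
       static unsigned long max_segment_bytes(unsigned long phys_addr,
                                              unsigned long boundary_mask,
                                              unsigned long max_segment_size)
       {
               /* Measure from the segment's real start address; ignoring
                * this offset was the second bug. */
               unsigned long offset = phys_addr & boundary_mask;
               unsigned long room = boundary_mask - offset + 1;

               /* No bypass for the default mask; that was the first bug. */
               return room < max_segment_size ? room : max_segment_size;
       }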
      
      Cc: stable@vger.kernel.org # v5.1+
      Reviewed-by: Chris Mason <clm@fb.com>
      Tested-by: Chris Mason <clm@fb.com>
      Fixes: dcebd755 ("block: use bio_for_each_bvec() to compute multi-page bvec count")
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Dropped const on the page pointer, ppc page_to_phys() doesn't mark the
      page as const...
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  23. 22 Nov 2019, 1 commit
  24. 08 Nov 2019, 2 commits
  25. 05 Nov 2019, 1 commit
  26. 05 Aug 2019, 3 commits
    • block: Improve physical block alignment of split bios · 9cc5169c
      Committed by Bart Van Assche
      Consider the following example:
      * The logical block size is 4 KB.
      * The physical block size is 8 KB.
      * max_sectors equals (16 KB >> 9) sectors.
      * A non-aligned 4 KB and an aligned 64 KB bio are merged into a single
        non-aligned 68 KB bio.
      
      The current behavior is to split such a bio into (16 KB + 16 KB +
      16 KB + 16 KB + 4 KB). None of these five bios starts on a physical
      block boundary.
      
      This patch ensures that such a bio is split into four aligned and
      one non-aligned bio instead of being split into five non-aligned bios.
      This improves performance because most block devices can handle aligned
      requests faster than non-aligned requests.
      
      Since the physical block size is larger than or equal to the logical
      block size, this patch preserves the guarantee that the returned
      value is a multiple of the logical block size.
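
      A runnable toy that reproduces the example's split sequence (plain
      arithmetic, not kernel code): it prints 12 + 16 + 16 + 16 + 8 KB,
      i.e. one split with a non-aligned start followed by four aligned
      ones.

       #include <stdio.h>

       int main(void)
       {
               unsigned int left = 68, start = 4;          /* KB; start 4 KB misaligned */
               const unsigned int pbs = 8, max_split = 16; /* KB */

               while (left) {
                       unsigned int off = start % pbs;
                       /* Trim each split so it ends on a physical block
                        * boundary. */
                       unsigned int split = (max_split + off) / pbs * pbs - off;

                       if (split > left)
                               split = left;
                       printf("%2u KB at %2u KB%s\n", split, start,
                              off ? "  (non-aligned start)" : "");
                       start += split;
                       left -= split;
               }
               return 0;
       }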
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Simplify blk_bio_segment_split() · 708b25b3
      Committed by Bart Van Assche
      Move the max_sectors check into bvec_split_segs() such that a single
      call to that function can do all the necessary checks. This patch
      further optimizes the fast path, namely the case where a bvec fits
      in a page.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Simplify bvec_split_segs() · ff9811b3
      Committed by Bart Van Assche
      Simplify this function by removing two if-tests. Other than
      requiring that the @sectors pointer is not NULL, this patch does
      not change the behavior of bvec_split_segs().
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Signed-off-by: Bart Van Assche <bvanassche@acm.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>