1. 18 Oct, 2021: 1 commit
  2. 16 Oct, 2021: 1 commit
  3. 08 Sep, 2021: 1 commit
  4. 24 Aug, 2021: 4 commits
  5. 18 Aug, 2021: 1 commit
    • blk-mq: fix is_flush_rq · a9ed27a7
      Ming Lei committed
      is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the
      following check:
      
      	hctx->fq->flush_rq == req
      
      but the hctx passed from bt_iter()/bt_tags_iter() may be NULL because of:
      
      1) memory reordering in blk_mq_rq_ctx_init():
      
      	rq->mq_hctx = data->hctx;
      	...
      	refcount_set(&rq->ref, 1);
      
      OR
      
      2) tag reuse, where ->rqs[] isn't updated with the new request.
      
      Fix the issue by rewriting is_flush_rq() as:
      
      	return rq->end_io == flush_end_io;
      
      which turns out to be simpler to follow and immune to the data race, since
      the write to rq->end_io and refcount_set(&rq->ref, 1) are ordered.
      
      Fixes: 2e315dc0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter")
      Cc: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de>
      Cc: Yufen Yu <yuyufen@huawei.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20210818010925.607383-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
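The old and new checks can be compared in a small user-space sketch. It uses heavily simplified, hypothetical stand-ins for struct request, struct blk_mq_hw_ctx and the flush queue; only the names follow the commit text. The old form must dereference hctx, so a NULL hctx coming from the tag iterator would crash it. The new form reads only the request's own end_io pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified user-space models of the kernel types. */
struct request;
typedef void (rq_end_io_fn)(struct request *rq, int error);

struct flush_queue {
	struct request *flush_rq;	/* the per-hctx flush request */
};

struct blk_mq_hw_ctx {
	struct flush_queue *fq;
};

struct request {
	struct blk_mq_hw_ctx *mq_hctx;	/* may be seen as NULL by a racing iterator */
	rq_end_io_fn *end_io;		/* completion callback */
};

/* Stand-in for the flush machinery's completion handler. */
static void flush_end_io(struct request *rq, int error)
{
	(void)rq;
	(void)error;
}

/* Old form: needs a valid hctx and dereferences it unconditionally. */
static bool is_flush_rq_old(struct request *req, struct blk_mq_hw_ctx *hctx)
{
	return hctx->fq->flush_rq == req;
}

/* New form: compares a function pointer stored in the request itself. */
static bool is_flush_rq(struct request *rq)
{
	return rq->end_io == flush_end_io;
}
```

This single-threaded sketch does not model the other half of the kernel argument: the write to end_io is ordered with refcount_set(), so an iterator that sees a live request also sees its end_io.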
  6. 17 Aug, 2021: 1 commit
  7. 13 Aug, 2021: 1 commit
  8. 11 Aug, 2021: 1 commit
  9. 10 Aug, 2021: 1 commit
  10. 31 Jul, 2021: 1 commit
  11. 01 Jul, 2021: 1 commit
  12. 25 Jun, 2021: 1 commit
  13. 18 Jun, 2021: 3 commits
  14. 12 Jun, 2021: 4 commits
  15. 04 Jun, 2021: 1 commit
    • block: Do not pull requests from the scheduler when we cannot dispatch them · 61347154
      Jan Kara committed
      Provided the device driver does not implement dispatch budget accounting
      (which only SCSI does), the loop in __blk_mq_do_dispatch_sched() pulls
      requests from the IO scheduler as long as it is willing to give out any.
      That defeats scheduling heuristics inside the scheduler by creating a
      false impression that the device can take more IO when in fact it
      cannot.
      
      For example, with the BFQ IO scheduler on top of a virtio-blk device,
      setting the blkio cgroup weight has barely any impact on the observed
      throughput of async IO, because __blk_mq_do_dispatch_sched() always sucks
      out all the IO queued in BFQ. BFQ first submits IO from higher-weight
      cgroups, but once that is all dispatched, it gives out IO of lower-weight
      cgroups as well. Then we have to wait for all this IO to be dispatched to
      the disk (which means a lot of it actually has to complete) before the
      IO scheduler is queried again for more requests. This completely
      destroys any service differentiation.
      
      So grab a request tag for a request pulled out of the IO scheduler already
      in __blk_mq_do_dispatch_sched(), and do not pull any more requests if we
      cannot get one, because we are unlikely to be able to dispatch them. That
      way only a single request waits in the dispatch list for a tag to be
      freed.
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
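The dispatch policy described above can be sketched as a simple counting model. This is not the kernel's __blk_mq_do_dispatch_sched(). The struct, its fields and the helper are hypothetical, and budget accounting, the hctx dispatch list and requeueing are all left out. The model shows only the key decision: take a request from the scheduler, try to get a driver tag immediately, and stop pulling as soon as that fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an IO scheduler in front of a tag-limited device. */
struct sched_model {
	int queued;	/* requests still held by the IO scheduler */
	int free_tags;	/* driver tags currently available */
	int dispatched;	/* requests handed to the driver */
	int parked;	/* requests waiting on the dispatch list for a tag */
};

/* Hypothetical tag allocator: succeeds only while tags remain. */
static bool get_driver_tag(struct sched_model *m)
{
	if (m->free_tags == 0)
		return false;
	m->free_tags--;
	return true;
}

/*
 * Pull requests only while a driver tag can be obtained. If tag
 * allocation fails, the request already pulled is parked and the loop
 * stops, so everything else stays inside the scheduler.
 */
static void do_dispatch_sched(struct sched_model *m)
{
	while (m->queued > 0) {
		m->queued--;			/* pull one request from the scheduler */
		if (!get_driver_tag(m)) {
			m->parked++;		/* only this one waits for a tag */
			break;
		}
		m->dispatched++;
	}
}
```

With 10 queued requests and 3 free tags, three requests are dispatched, one is parked waiting for a tag, and six stay inside the scheduler, which can still apply its weighting when it is asked again.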
  16. 24 May, 2021: 5 commits
  17. 14 May, 2021: 2 commits
  18. 16 Apr, 2021: 1 commit
    • blk-mq: bypass IO scheduler's limit_depth for passthrough request · 8d663f34
      Lin Feng committed
      Commit 01e99aec ("blk-mq: insert passthrough request into
      hctx->dispatch directly") gives high priority to passthrough requests and
      bypasses the underlying IO scheduler. But when we allocate a tag for such a
      request, it still runs the IO scheduler's limit_depth callback, whereas what
      we really want is to give such requests the full sbitmap depth for acquiring
      an available tag.
      blktrace shows PC requests (dmraid -s -c -i) hitting bfq's limit_depth:
        8,0    2        0     0.000000000 39952 1,0  m   N bfq [bfq_limit_depth] wr_busy 0 sync 0 depth 8
        8,0    2        1     0.000008134 39952  D   R 4 [dmraid]
        8,0    2        2     0.000021538    24  C   R [0]
        8,0    2        0     0.000035442 39952 1,0  m   N bfq [bfq_limit_depth] wr_busy 0 sync 0 depth 8
        8,0    2        3     0.000038813 39952  D   R 24 [dmraid]
        8,0    2        4     0.000044356    24  C   R [0]
      
      This patch introduces a new wrapper to keep the code tidy.
      Signed-off-by: Lin Feng <linf@wangsu.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20210415033920.213963-1-linf@wangsu.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
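Here is a minimal sketch of the tag-allocation step with the bypass. It assumes, as the commit implies, that passthrough requests can be recognized from their operation code. The enum values, the struct alloc_data fields and all function names are hypothetical. The real blk-mq code and the wrapper this commit adds differ in detail.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical op codes: driver-private ops stand for passthrough requests. */
enum req_op { OP_READ, OP_WRITE, OP_DRV_IN, OP_DRV_OUT };

struct alloc_data {
	enum req_op op;
	unsigned int shallow_depth;	/* tag depth this allocation may use */
};

/* Hypothetical wrapper: is this a passthrough request? */
static bool op_is_passthrough(enum req_op op)
{
	return op == OP_DRV_IN || op == OP_DRV_OUT;
}

/* Stand-in for an IO scheduler's limit_depth callback (e.g. BFQ's). */
static void sched_limit_depth(struct alloc_data *data, unsigned int limit)
{
	data->shallow_depth = limit;
}

static void prepare_tag_alloc(struct alloc_data *data, unsigned int full_depth,
			      unsigned int sched_limit)
{
	data->shallow_depth = full_depth;
	/* Passthrough requests skip the scheduler and keep the full depth. */
	if (!op_is_passthrough(data->op))
		sched_limit_depth(data, sched_limit);
}
```

For example, with a full depth of 64 and a scheduler limit of 8, a driver-private request keeps all 64 tags available, while a normal read is limited to 8. That limit matches the "depth 8" seen in the blktrace output above.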
  19. 09 Apr, 2021: 1 commit
  20. 05 Mar, 2021: 3 commits
  21. 12 Feb, 2021: 2 commits
  22. 25 Jan, 2021: 3 commits
    • blk-mq: Improve performance of non-mq IO schedulers with multiple HW queues · b6e68ee8
      Jan Kara committed
      Currently, when a non-mq-aware IO scheduler (BFQ, mq-deadline) is used
      for a queue with multiple HW queues, performance is rather bad. The
      problem is that these IO schedulers use queue-wide locking, and their
      dispatch function does not respect the hctx it is passed and returns
      any request it finds appropriate. Thus locality of request access is
      broken, and dispatch from multiple CPUs just contends on IO scheduler
      locks. For these IO schedulers there's little point in dispatching from
      multiple CPUs. Instead, always dispatch from a single CPU only, to limit
      contention.
      
      Below is a comparison of dbench runs on an XFS filesystem where the
      storage is a RAID card with 64 HW queues and a single rotating disk
      attached to it. BFQ is used as the IO scheduler:
      
            clients           MQ                     SQ             MQ-Patched
      Amean 1      39.12 (0.00%)       43.29 * -10.67%*       36.09 *   7.74%*
      Amean 2     128.58 (0.00%)      101.30 *  21.22%*       96.14 *  25.23%*
      Amean 4     577.42 (0.00%)      494.47 *  14.37%*      508.49 *  11.94%*
      Amean 8     610.95 (0.00%)      363.86 *  40.44%*      362.12 *  40.73%*
      Amean 16    391.78 (0.00%)      261.49 *  33.25%*      282.94 *  27.78%*
      Amean 32    324.64 (0.00%)      267.71 *  17.54%*      233.00 *  28.23%*
      Amean 64    295.04 (0.00%)      253.02 *  14.24%*      242.37 *  17.85%*
      Amean 512 10281.61 (0.00%)    10211.16 *   0.69%*    10447.53 *  -1.61%*
      
      Numbers are times, so lower is better. Percentages are the improvement
      relative to MQ (positive means faster). MQ is a stock 5.10-rc6 kernel.
      SQ is the same kernel with megaraid_sas.host_tagset_enable=0 so that the
      card advertises just a single HW queue. MQ-Patched is a kernel with this
      patch applied.
      
      You can see that multiple hardware queues heavily hurt performance in
      combination with BFQ. The patch restores the performance.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
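The single-CPU dispatch idea can be modeled roughly as follows. Everything here is hypothetical: the CPU-to-hctx mapping, the per-hctx flag and the function names. This is not the actual blk-mq code. With a queue-wide scheduler, only the hctx selected for the current CPU pulls from the scheduler. The extra rule that other hctxs still run to drain their own private dispatch lists is an assumption of this sketch, not something stated in the commit text above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_HW_QUEUES 4

/* Hypothetical model of a hardware queue context. */
struct hw_queue {
	bool has_private_dispatch;	/* requests parked on this hctx's own list */
	int runs;			/* how many times this hctx was run */
};

struct queue_model {
	struct hw_queue hctx[NR_HW_QUEUES];
	bool sched_is_mq_aware;
};

/* Hypothetical CPU -> hctx mapping. */
static int cpu_to_hctx(int cpu)
{
	return cpu % NR_HW_QUEUES;
}

static void run_hw_queues(struct queue_model *q, int cpu)
{
	/* -1 means "no single dispatching hctx": run them all. */
	int sq = q->sched_is_mq_aware ? -1 : cpu_to_hctx(cpu);

	for (int i = 0; i < NR_HW_QUEUES; i++) {
		/*
		 * With a queue-wide scheduler, only the hctx chosen for this
		 * CPU pulls from the scheduler. The others run only to drain
		 * their own dispatch lists, so they do not contend on the
		 * scheduler lock.
		 */
		if (sq >= 0 && i != sq && !q->hctx[i].has_private_dispatch)
			continue;
		q->hctx[i].runs++;
	}
}
```

Running the model from CPU 5 with a non-mq-aware scheduler touches only hctx 1 (5 % 4) plus any hctx with a non-empty private list. An mq-aware scheduler still runs every hctx.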
    • Revert "blk-mq, elevator: Count requests per hctx to improve performance" · 5ac83c64
      Jan Kara committed
      This reverts commit b445547e.
      
      Since both mq-deadline and BFQ completely ignore the hctx passed to
      their dispatch function and dispatch whatever request they deem fit,
      checking whether any request for a particular hctx is queued is
      pointless: we'll very likely get a request from a different hctx
      anyway. In the following commit we'll deal with lock contention in these
      IO schedulers in the presence of multiple HW queues in a different way.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
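A toy queue-wide FIFO scheduler shows why a per-hctx request count does not help here. The FIFO is hypothetical and far simpler than mq-deadline or BFQ, and its dispatch function ignores the hctx it is given. Asking on behalf of one hctx can return a request submitted through another, so knowing that "this hctx has requests queued" says little about what dispatch will return.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_SIZE 16

/* Hypothetical request: remembers which hctx it was submitted on. */
struct req {
	int hctx;
};

/* A queue-wide FIFO shared by all hctxs, like a non-mq-aware scheduler. */
struct fifo_sched {
	struct req *items[FIFO_SIZE];
	size_t head, tail;
};

static void sched_insert(struct fifo_sched *s, struct req *rq)
{
	assert(s->tail < FIFO_SIZE);	/* toy model: no wrap-around */
	s->items[s->tail++] = rq;
}

/* The hctx argument is ignored: the oldest request is always returned. */
static struct req *sched_dispatch(struct fifo_sched *s, int hctx)
{
	(void)hctx;
	if (s->head == s->tail)
		return NULL;
	return s->items[s->head++];
}
```

Inserting a request from hctx 0 and then one from hctx 1, and dispatching on behalf of hctx 1, returns the hctx 0 request first.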
    • block: store a block_device pointer in struct bio · 309dca30
      Christoph Hellwig committed
      Replace the gendisk pointer in struct bio with a pointer to the newly
      improved struct block_device. From that, the gendisk can be trivially
      accessed with an extra indirection, but it also allows looking up all
      information related to partition remapping directly.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
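The following simplified sketch shows what the new pointer buys. The structures only borrow names similar to the kernel's (bi_bdev, bd_disk, bd_start_sect) and flatten the bio's starting sector into a single field. bio_disk() and bio_remap_partition() are hypothetical helpers, not kernel functions. The gendisk is one extra dereference away, and the partition offset needed for remapping sits directly on the block_device.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct gendisk {
	const char *disk_name;
};

struct block_device {
	struct gendisk *bd_disk;	/* the whole disk this bdev belongs to */
	sector_t bd_start_sect;		/* partition offset on that disk */
};

struct bio {
	struct block_device *bi_bdev;	/* replaces the old gendisk pointer */
	sector_t bi_sector;		/* simplified: starting sector of the IO */
};

/* Hypothetical helper: the gendisk is one extra indirection away. */
static struct gendisk *bio_disk(const struct bio *bio)
{
	return bio->bi_bdev->bd_disk;
}

/* Hypothetical helper: partition remapping reads the offset from the bdev. */
static void bio_remap_partition(struct bio *bio)
{
	bio->bi_sector += bio->bi_bdev->bd_start_sect;
}
```

For a partition starting at sector 2048, an IO to partition-relative sector 8 is remapped to disk sector 2056.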