1. 04 Jun 2021, 1 commit
• block: Do not pull requests from the scheduler when we cannot dispatch them · 61347154
  Authored by Jan Kara
Provided the device driver does not implement dispatch budget accounting
(which only SCSI does), the loop in __blk_mq_do_dispatch_sched() pulls
requests from the IO scheduler as long as it is willing to give out any.
That defeats scheduling heuristics inside the scheduler by creating a
false impression that the device can take more IO when it in fact
cannot.
      
For example, with the BFQ IO scheduler on top of a virtio-blk device,
setting the blkio cgroup weight has barely any impact on the observed
throughput of async IO, because __blk_mq_do_dispatch_sched() always
sucks out all the IO queued in BFQ. BFQ first submits IO from higher
weight cgroups, but when that is all dispatched, it gives out IO of
lower weight cgroups as well. And then we have to wait for all this IO
to be dispatched to the disk (which means a lot of it actually has to
complete) before the IO scheduler is queried again for more requests.
This completely destroys any service differentiation.
      
So grab the request tag for a request pulled out of the IO scheduler
already in __blk_mq_do_dispatch_sched(), and do not pull any more
requests if we cannot get it, because we are unlikely to be able to
dispatch them either. That way only a single request waits in the
dispatch list for some tag to free up (a sketch of this pattern follows
the sign-offs below).
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20210603104721.6309-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
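A minimal C sketch of the dispatch-loop pattern described above. All
names here (sched_dispatch_request, get_driver_tag, queue_for_dispatch)
are hypothetical stand-ins, not the real blk-mq interfaces; the sketch
only models the control flow of grabbing a tag before pulling the next
request.

    /*
     * Illustrative sketch only: all types and helpers are hypothetical
     * stand-ins for the real blk-mq interfaces.
     */
    #include <stdbool.h>
    #include <stddef.h>

    struct request;

    struct request *sched_dispatch_request(void); /* ask scheduler for one rq */
    bool get_driver_tag(struct request *rq);      /* false: device queue full */
    void queue_for_dispatch(struct request *rq);

    static void do_dispatch_sched(void)
    {
        struct request *rq;

        while ((rq = sched_dispatch_request()) != NULL) {
            /*
             * Grab the driver tag immediately. If none is available,
             * the device cannot take more IO, so stop pulling requests
             * out of the scheduler.
             */
            bool got_tag = get_driver_tag(rq);

            queue_for_dispatch(rq);
            if (!got_tag)
                break;  /* only this single rq waits for a free tag */
        }
    }

Because the tag is grabbed before the next scheduler pull, at most one
tagless request sits in the dispatch list at a time, which is exactly
the property the patch relies on to preserve BFQ's service
differentiation.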
2. 24 May 2021, 1 commit
3. 05 Mar 2021, 1 commit
4. 25 Jan 2021, 1 commit
5. 13 Dec 2020, 1 commit
6. 02 Dec 2020, 1 commit
7. 04 Sep 2020, 5 commits
8. 02 Jul 2020, 1 commit
9. 01 Jul 2020, 1 commit
10. 30 Jun 2020, 3 commits
11. 29 Jun 2020, 1 commit
12. 07 Jun 2020, 1 commit
13. 30 May 2020, 1 commit
14. 27 Feb 2020, 1 commit
15. 25 Feb 2020, 1 commit
• blk-mq: insert passthrough request into hctx->dispatch directly · 01e99aec
  Authored by Ming Lei
For some reason, a device may get into a state where it can't handle
FS requests, so STS_RESOURCE is always returned and the FS requests
are added to hctx->dispatch. However, a passthrough request may be
required at that time to fix the problem. If the passthrough request
is added to the scheduler queue, blk-mq has no chance to dispatch it,
given that requests in hctx->dispatch are prioritized. The FS IO
requests may then never complete, and an IO hang results.
      
So the passthrough request has to be added to hctx->dispatch directly
to fix the IO hang.
      
Fix this issue by inserting passthrough requests into hctx->dispatch
directly, while adding FS requests to the tail of hctx->dispatch in
blk_mq_dispatch_rq_list(). FS requests are in fact added to the tail
of hctx->dispatch by default, see blk_mq_request_bypass_insert().

This makes the behavior consistent with the original legacy IO request
path, in which passthrough requests were always added to q->queue_head
(see the sketch after this entry).
      
Cc: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
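A rough sketch of the insertion policy described above, using
hypothetical helper names (rq_is_passthrough, add_to_dispatch,
add_to_scheduler) rather than the actual blk-mq functions:

    /* Illustrative sketch: hypothetical types and helpers. */
    #include <stdbool.h>

    struct request;

    bool rq_is_passthrough(struct request *rq);             /* hypothetical */
    void add_to_dispatch(struct request *rq, bool at_head); /* hypothetical */
    void add_to_scheduler(struct request *rq);              /* hypothetical */

    static void insert_request(struct request *rq)
    {
        if (rq_is_passthrough(rq)) {
            /*
             * Bypass the scheduler queue entirely so the passthrough
             * request can still be issued while the device is failing
             * FS requests with STS_RESOURCE.
             */
            add_to_dispatch(rq, true);
        } else {
            /* FS requests go through the IO scheduler as usual. */
            add_to_scheduler(rq);
        }
    }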
16. 07 Oct 2019, 1 commit
17. 11 Jul 2019, 1 commit
• block: Disable write plugging for zoned block devices · b49773e7
  Authored by Damien Le Moal
Simultaneously writing to a sequential zone of a zoned block device
from multiple contexts requires mutual exclusion for BIO issuing to
ensure that writes happen sequentially. However, even for a well
behaved user correctly implementing such synchronization, BIO plugging
may interfere and result in BIOs from the different contexts being
reordered if plugging is done outside of the mutual exclusion section,
e.g. when the plug was started by a function higher in the call chain
than the function issuing the BIOs.
      
               Context A                     Context B
      
         | blk_start_plug()
         | ...
         | seq_write_zone()
           | mutex_lock(zone)
           | bio-0->bi_iter.bi_sector = zone->wp
           | zone->wp += bio_sectors(bio-0)
           | submit_bio(bio-0)
           | bio-1->bi_iter.bi_sector = zone->wp
           | zone->wp += bio_sectors(bio-1)
           | submit_bio(bio-1)
           | mutex_unlock(zone)
           | return
         | -----------------------> | seq_write_zone()
                             | mutex_lock(zone)
                             | bio-2->bi_iter.bi_sector = zone->wp
                             | zone->wp += bio_sectors(bio-2)
                             | submit_bio(bio-2)
                             | mutex_unlock(zone)
         | <------------------------- |
         | blk_finish_plug()
      
      In the above example, despite the mutex synchronization ensuring the
      correct BIO issuing order 0, 1, 2, context A BIOs 0 and 1 end up being
      issued after BIO 2 of context B, when the plug is released with
      blk_finish_plug().
      
While this problem can be addressed using the blk_flush_plug_list()
function (in the above example, the call must be inserted before the
zone mutex lock is released), a simple generic solution in the block
layer avoids this additional code in all zoned block device user code.
The simple generic solution implemented with this patch is to introduce
the internal helper function blk_mq_plug() to access the current
context plug on BIO submission. This helper returns the current plug
only if the target device is not a zoned block device or if the BIO to
be plugged is not a write operation. Otherwise, the caller context plug
is ignored and NULL is returned, so that writes to zoned block devices
are never plugged (see the sketch after this entry).
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
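The helper's decision logic amounts to the following sketch; the types
and the current_context_plug() accessor are simplified, hypothetical
stand-ins for the real struct request_queue, struct bio, and plug
handling:

    /* Illustrative sketch with simplified, hypothetical types. */
    #include <stdbool.h>
    #include <stddef.h>

    struct blk_plug;

    struct queue_sketch { bool zoned; };
    struct bio_sketch   { bool is_write; };

    struct blk_plug *current_context_plug(void);  /* hypothetical accessor */

    static struct blk_plug *plug_for_bio(struct queue_sketch *q,
                                         struct bio_sketch *bio)
    {
        /*
         * Writes to a zoned device must never be plugged, or BIOs
         * issued under a mutex could be reordered when the plug is
         * flushed outside the mutual exclusion section.
         */
        if (q->zoned && bio->is_write)
            return NULL;

        return current_context_plug();
    }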
18. 03 Jul 2019, 1 commit
19. 04 May 2019, 1 commit
• blk-mq: free hw queue's resource in hctx's release handler · c7e2d94b
  Authored by Ming Lei
Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d9
("blk-mq: Fix a use-after-free") fixes exactly this issue.

However, that commit introduces another issue. Before 45a9c9d9,
we were allowed to run the queue during queue cleanup as long as the
queue's kobj refcount was held. After that commit, the queue can't be
run during cleanup, otherwise an oops is easily triggered because
some fields of the hctx are freed by blk_mq_free_queue() in blk_cleanup_queue().
      
We have devised ways to address this kind of issue before, such as:
      
      	8dc765d4 ("SCSI: fix queue cleanup race before queue initialization is done")
      	c2856ae2 ("blk-mq: quiesce queue before freeing queue")
      
But they still can't cover all cases; recently James reported another
issue of this kind:
      
      	https://marc.info/?l=linux-scsi&m=155389088124782&w=2
      
This issue is quite hard to address with the previous approaches, given
that scsi_run_queue() may run requeues for other LUNs.

Fix the above issue by freeing the hctx's resources in its release
handler; this is safe because tags aren't needed for freeing such hctx
resources.

This approach follows the typical design pattern for kobject release
handlers (see the sketch after this entry).
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
Cc: linux-scsi@vger.kernel.org
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: James E. J. Bottomley <jejb@linux.vnet.ibm.com>
Reported-by: James Smart <james.smart@broadcom.com>
Fixes: 45a9c9d9 ("blk-mq: Fix a use-after-free")
Cc: stable@vger.kernel.org
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
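A hedged sketch of the release-handler pattern the commit refers to,
using the real kref API but a hypothetical struct hctx_sketch and
free_hw_resources() helper; it is not the actual blk-mq code:

    /*
     * Sketch of the kobject/kref release pattern (kernel-style C).
     * struct hctx_sketch and free_hw_resources() are hypothetical.
     */
    #include <linux/kref.h>
    #include <linux/slab.h>

    struct hctx_sketch {
        struct kref ref;
        void *hw_resources;
    };

    void free_hw_resources(void *res);  /* hypothetical */

    static void hctx_release(struct kref *kref)
    {
        struct hctx_sketch *hctx =
            container_of(kref, struct hctx_sketch, ref);

        /*
         * Runs only after the last reference is dropped, so no queue
         * run can still be touching these resources; tags are not
         * needed to free them.
         */
        free_hw_resources(hctx->hw_resources);
        kfree(hctx);
    }

    /* Callers drop their reference with: kref_put(&hctx->ref, hctx_release); */

The design choice is the standard one for refcounted kernel objects:
deferring the free to the release handler turns "is it safe to free
yet?" into plain reference counting.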
20. 05 Apr 2019, 1 commit
• block: Revert v5.0 blk_mq_request_issue_directly() changes · fd9c40f6
  Authored by Bart Van Assche
blk_mq_try_issue_directly() can return BLK_STS*_RESOURCE for requests that
have been queued. If that happens when blk_mq_try_issue_directly() is called
by the dm-mpath driver, then dm-mpath will try to resubmit a request that is
already queued, and a kernel crash follows. Since it is nontrivial to fix
blk_mq_request_issue_directly(), revert the blk_mq_request_issue_directly()
changes that went into kernel v5.0.
      
      This patch reverts the following commits:
      * d6a51a97 ("blk-mq: replace and kill blk_mq_request_issue_directly") # v5.0.
      * 5b7a6f12 ("blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests") # v5.0.
      * 7f556a44 ("blk-mq: refactor the code of issue request directly") # v5.0.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: <stable@vger.kernel.org>
Reported-by: Laurence Oberman <loberman@redhat.com>
Tested-by: Laurence Oberman <loberman@redhat.com>
Fixes: 7f556a44 ("blk-mq: refactor the code of issue request directly") # v5.0.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
21. 25 Mar 2019, 1 commit
22. 21 Mar 2019, 1 commit
23. 09 Feb 2019, 1 commit
24. 01 Feb 2019, 2 commits
25. 18 Dec 2018, 1 commit
• blk-mq: fix dispatch from sw queue · c16d6b5a
  Authored by Ming Lei
When a request is added to the rq list of a sw queue (ctx), the rq may
come from a different type of hctx, especially after multiple queue
mapping was introduced.

So when dispatching requests from a sw queue via blk_mq_flush_busy_ctxs()
or blk_mq_dequeue_from_ctx(), a request belonging to a different hctx
type can be dispatched to the current hctx when the read or poll queues
are enabled.
      
This patch fixes this issue by introducing per-queue-type lists (see
the sketch after this entry).
      
      Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
Changed by me to not use separate cacheline-aligned lists; just place
them all in the same cacheline where we previously had the one list
and lock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
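A hedged sketch of the per-type layout, assuming hypothetical names
(the real fields live in struct blk_mq_ctx): each sw queue keeps one
request list per hctx type, and a flush for a given hctx splices only
the matching list.

    /* Illustrative kernel-style sketch; names are hypothetical. */
    #include <linux/list.h>
    #include <linux/spinlock.h>

    enum hctx_type_sketch {
        SKETCH_TYPE_DEFAULT,
        SKETCH_TYPE_READ,
        SKETCH_TYPE_POLL,
        SKETCH_TYPE_NR,
    };

    struct ctx_sketch {
        spinlock_t lock;
        /* One sw-queue list per hctx type, sharing one lock/cacheline. */
        struct list_head rq_lists[SKETCH_TYPE_NR];
    };

    /* Dispatch for a given hctx type only touches the matching list. */
    static void flush_ctx_type(struct ctx_sketch *ctx,
                               enum hctx_type_sketch type,
                               struct list_head *out)
    {
        spin_lock(&ctx->lock);
        list_splice_tail_init(&ctx->rq_lists[type], out);
        spin_unlock(&ctx->lock);
    }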
26. 17 Dec 2018, 1 commit
27. 16 Dec 2018, 1 commit
28. 10 Dec 2018, 1 commit
29. 05 Dec 2018, 1 commit
30. 30 Nov 2018, 1 commit
31. 21 Nov 2018, 1 commit
32. 08 Nov 2018, 2 commits