1. 03 7月, 2019 1 次提交
  2. 21 6月, 2019 1 次提交
    • C
      block: remove the bi_phys_segments field in struct bio · 14ccb66b
      Christoph Hellwig 提交于
      We only need the number of segments in the blk-mq submission path.
      Remove the field from struct bio, and return it from a variant of
      blk_queue_split instead of that it can passed as an argument to
      those functions that need the value.
      
      This also means we stop recounting segments except for cloning
      and partial segments.
      
      To keep the number of arguments in this how path down remove
      pointless struct request_queue arguments from any of the functions
      that had it and grew a nr_segs argument.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      14ccb66b
  3. 13 6月, 2019 1 次提交
    • M
      blk-mq: remove WARN_ON(!q->elevator) from blk_mq_sched_free_requests · c326f846
      Ming Lei 提交于
      blk_mq_sched_free_requests() may be called in failure path in which
      q->elevator may not be setup yet, so remove WARN_ON(!q->elevator) from
      blk_mq_sched_free_requests for avoiding the false positive.
      
      This function is actually safe to call in case of !q->elevator because
      hctx->sched_tags is checked.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Yi Zhang <yi.zhang@redhat.com>
      Fixes: c3e22192 ("block: free sched's request pool in blk_cleanup_queue")
      Reported-by: syzbot+b9d0d56867048c7bcfde@syzkaller.appspotmail.com
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c326f846
  4. 07 6月, 2019 1 次提交
    • M
      block: free sched's request pool in blk_cleanup_queue · c3e22192
      Ming Lei 提交于
      In theory, IO scheduler belongs to request queue, and the request pool
      of sched tags belongs to the request queue too.
      
      However, the current tags allocation interfaces are re-used for both
      driver tags and sched tags, and driver tags is definitely host wide,
      and doesn't belong to any request queue, same with its request pool.
      So we need tagset instance for freeing request of sched tags.
      
      Meantime, blk_mq_free_tag_set() often follows blk_cleanup_queue() in case
      of non-BLK_MQ_F_TAG_SHARED, this way requires that request pool of sched
      tags to be freed before calling blk_mq_free_tag_set().
      
      Commit 47cdee29 ("block: move blk_exit_queue into __blk_release_queue")
      moves blk_exit_queue into __blk_release_queue for simplying the fast
      path in generic_make_request(), then causes oops during freeing requests
      of sched tags in __blk_release_queue().
      
      Fix the above issue by move freeing request pool of sched tags into
      blk_cleanup_queue(), this way is safe becasue queue has been frozen and no any
      in-queue requests at that time. Freeing sched tags has to be kept in queue's
      release handler becasue there might be un-completed dispatch activity
      which might refer to sched tags.
      
      Cc: Bart Van Assche <bvanassche@acm.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Fixes: 47cdee29 ("block: move blk_exit_queue into __blk_release_queue")
      Tested-by: NYi Zhang <yi.zhang@redhat.com>
      Reported-by: Nkernel test robot <rong.a.chen@intel.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c3e22192
  5. 04 5月, 2019 1 次提交
    • M
      blk-mq: grab .q_usage_counter when queuing request from plug code path · e87eb301
      Ming Lei 提交于
      Just like aio/io_uring, we need to grab 2 refcount for queuing one
      request, one is for submission, another is for completion.
      
      If the request isn't queued from plug code path, the refcount grabbed
      in generic_make_request() serves for submission. In theroy, this
      refcount should have been released after the sumission(async run queue)
      is done. blk_freeze_queue() works with blk_sync_queue() together
      for avoiding race between cleanup queue and IO submission, given async
      run queue activities are canceled because hctx->run_work is scheduled with
      the refcount held, so it is fine to not hold the refcount when
      running the run queue work function for dispatch IO.
      
      However, if request is staggered into plug list, and finally queued
      from plug code path, the refcount in submission side is actually missed.
      And we may start to run queue after queue is removed because the queue's
      kobject refcount isn't guaranteed to be grabbed in flushing plug list
      context, then kernel oops is triggered, see the following race:
      
      blk_mq_flush_plug_list():
              blk_mq_sched_insert_requests()
                      insert requests to sw queue or scheduler queue
                      blk_mq_run_hw_queue
      
      Because of concurrent run queue, all requests inserted above may be
      completed before calling the above blk_mq_run_hw_queue. Then queue can
      be freed during the above blk_mq_run_hw_queue().
      
      Fixes the issue by grab .q_usage_counter before calling
      blk_mq_sched_insert_requests() in blk_mq_flush_plug_list(). This way is
      safe because the queue is absolutely alive before inserting request.
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: linux-scsi@vger.kernel.org,
      Cc: Martin K . Petersen <martin.petersen@oracle.com>,
      Cc: Christoph Hellwig <hch@lst.de>,
      Cc: James E . J . Bottomley <jejb@linux.vnet.ibm.com>,
      Reviewed-by: NBart Van Assche <bvanassche@acm.org>
      Tested-by: NJames Smart <james.smart@broadcom.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      e87eb301
  6. 01 5月, 2019 1 次提交
  7. 05 4月, 2019 1 次提交
    • B
      block: Revert v5.0 blk_mq_request_issue_directly() changes · fd9c40f6
      Bart Van Assche 提交于
      blk_mq_try_issue_directly() can return BLK_STS*_RESOURCE for requests that
      have been queued. If that happens when blk_mq_try_issue_directly() is called
      by the dm-mpath driver then dm-mpath will try to resubmit a request that is
      already queued and a kernel crash follows. Since it is nontrivial to fix
      blk_mq_request_issue_directly(), revert the blk_mq_request_issue_directly()
      changes that went into kernel v5.0.
      
      This patch reverts the following commits:
      * d6a51a97 ("blk-mq: replace and kill blk_mq_request_issue_directly") # v5.0.
      * 5b7a6f12 ("blk-mq: issue directly with bypass 'false' in blk_mq_sched_insert_requests") # v5.0.
      * 7f556a44 ("blk-mq: refactor the code of issue request directly") # v5.0.
      
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: <stable@vger.kernel.org>
      Reported-by: NLaurence Oberman <loberman@redhat.com>
      Tested-by: NLaurence Oberman <loberman@redhat.com>
      Fixes: 7f556a44 ("blk-mq: refactor the code of issue request directly") # v5.0.
      Signed-off-by: NBart Van Assche <bvanassche@acm.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      fd9c40f6
  8. 01 2月, 2019 1 次提交
  9. 18 12月, 2018 2 次提交
    • M
      blk-mq: fix dispatch from sw queue · c16d6b5a
      Ming Lei 提交于
      When a request is added to rq list of sw queue(ctx), the rq may be from
      a different type of hctx, especially after multi queue mapping is
      introduced.
      
      So when dispach request from sw queue via blk_mq_flush_busy_ctxs() or
      blk_mq_dequeue_from_ctx(), one request belonging to other queue type of
      hctx can be dispatched to current hctx in case that read queue or poll
      queue is enabled.
      
      This patch fixes this issue by introducing per-queue-type list.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      
      Changed by me to not use separately cacheline aligned lists, just
      place them all in the same cacheline where we had just the one list
      and lock before.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      c16d6b5a
    • D
      block: mq-deadline: Fix write completion handling · 7211aef8
      Damien Le Moal 提交于
      For a zoned block device using mq-deadline, if a write request for a
      zone is received while another write was already dispatched for the same
      zone, dd_dispatch_request() will return NULL and the newly inserted
      write request is kept in the scheduler queue waiting for the ongoing
      zone write to complete. With this behavior, when no other request has
      been dispatched, rq_list in blk_mq_sched_dispatch_requests() is empty
      and blk_mq_sched_mark_restart_hctx() not called. This in turn leads to
      __blk_mq_free_request() call of blk_mq_sched_restart() to not run the
      queue when the already dispatched write request completes. The newly
      dispatched request stays stuck in the scheduler queue until eventually
      another request is submitted.
      
      This problem does not affect SCSI disk as the SCSI stack handles queue
      restart on request completion. However, this problem is can be triggered
      the nullblk driver with zoned mode enabled.
      
      Fix this by always requesting a queue restart in dd_dispatch_request()
      if no request was dispatched while WRITE requests are queued.
      
      Fixes: 5700f691 ("mq-deadline: Introduce zone locking support")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NDamien Le Moal <damien.lemoal@wdc.com>
      
      Add missing export of blk_mq_sched_restart()
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      7211aef8
  10. 16 12月, 2018 1 次提交
  11. 21 11月, 2018 1 次提交
  12. 20 11月, 2018 1 次提交
  13. 16 11月, 2018 1 次提交
  14. 08 11月, 2018 5 次提交
  15. 21 8月, 2018 1 次提交
    • J
      blk-mq: init hctx sched after update ctx and hctx mapping · d48ece20
      Jianchao Wang 提交于
      Currently, when update nr_hw_queues, IO scheduler's init_hctx will
      be invoked before the mapping between ctx and hctx is adapted
      correctly by blk_mq_map_swqueue. The IO scheduler init_hctx (kyber)
      may depend on this mapping and get wrong result and panic finally.
      A simply way to fix this is that switch the IO scheduler to 'none'
      before update the nr_hw_queues, and then switch it back after
      update nr_hw_queues. blk_mq_sched_init_/exit_hctx are removed due
      to nobody use them any more.
      Signed-off-by: NJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d48ece20
  16. 18 7月, 2018 1 次提交
    • M
      blk-mq: issue directly if hw queue isn't busy in case of 'none' · 6ce3dd6e
      Ming Lei 提交于
      In case of 'none' io scheduler, when hw queue isn't busy, it isn't
      necessary to enqueue request to sw queue and dequeue it from
      sw queue because request may be submitted to hw queue asap without
      extra cost, meantime there shouldn't be much request in sw queue,
      and we don't need to worry about effect on IO merge.
      
      There are still some single hw queue SCSI HBAs(HPSA, megaraid_sas, ...)
      which may connect high performance devices, so 'none' is often required
      for obtaining good performance.
      
      This patch improves IOPS and decreases CPU unilization on megaraid_sas,
      per Kashyap's test.
      
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Reported-by: NKashyap Desai <kashyap.desai@broadcom.com>
      Tested-by: NKashyap Desai <kashyap.desai@broadcom.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6ce3dd6e
  17. 09 7月, 2018 3 次提交
    • M
      blk-mq: dequeue request one by one from sw queue if hctx is busy · 6e768717
      Ming Lei 提交于
      It won't be efficient to dequeue request one by one from sw queue,
      but we have to do that when queue is busy for better merge performance.
      
      This patch takes the Exponential Weighted Moving Average(EWMA) to figure
      out if queue is busy, then only dequeue request one by one from sw queue
      when queue is busy.
      
      Fixes: b347689f ("blk-mq-sched: improve dispatching from sw queue")
      Cc: Kashyap Desai <kashyap.desai@broadcom.com>
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Hannes Reinecke <hare@suse.de>
      Reported-by: NKashyap Desai <kashyap.desai@broadcom.com>
      Tested-by: NKashyap Desai <kashyap.desai@broadcom.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      6e768717
    • M
      blk-mq: only attempt to merge bio if there is rq in sw queue · b04f50ab
      Ming Lei 提交于
      Only attempt to merge bio iff the ctx->rq_list isn't empty, because:
      
      1) for high-performance SSD, most of times dispatch may succeed, then
      there may be nothing left in ctx->rq_list, so don't try to merge over
      sw queue if it is empty, then we can save one acquiring of ctx->lock
      
      2) we can't expect good merge performance on per-cpu sw queue, and missing
      one merge on sw queue won't be a big deal since tasks can be scheduled from
      one CPU to another.
      
      Cc: Laurence Oberman <loberman@redhat.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Tested-by: NKashyap Desai <kashyap.desai@broadcom.com>
      Reported-by: NKashyap Desai <kashyap.desai@broadcom.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b04f50ab
    • M
      blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set() · 97889f9a
      Ming Lei 提交于
      We have to remove synchronize_rcu() from blk_queue_cleanup(),
      otherwise long delay can be caused during lun probe. For removing
      it, we have to avoid to iterate the set->tag_list in IO path, eg,
      blk_mq_sched_restart().
      
      This patch reverts 5b79413946d (Revert "blk-mq: don't handle
      TAG_SHARED in restart"). Given we have fixed enough IO hang issue,
      and there isn't any reason to restart all queues in one tags any more,
      see the following reasons:
      
      1) blk-mq core can deal with shared-tags case well via blk_mq_get_driver_tag(),
      which can wake up queues waiting for driver tag.
      
      2) SCSI is a bit special because it may return BLK_STS_RESOURCE if queue,
      target or host is ready, but SCSI built-in restart can cover all these well,
      see scsi_end_request(), queue will be rerun after any request initiated from
      this host/target is completed.
      
      In my test on scsi_debug(8 luns), this patch may improve IOPS by 20% ~ 30%
      when running I/O on these 8 luns concurrently.
      
      Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: linux-scsi@vger.kernel.org
      Reported-by: NAndrew Jones <drjones@redhat.com>
      Tested-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      97889f9a
  18. 03 6月, 2018 1 次提交
  19. 01 6月, 2018 2 次提交
  20. 31 5月, 2018 1 次提交
  21. 02 2月, 2018 1 次提交
  22. 18 1月, 2018 1 次提交
  23. 05 1月, 2018 1 次提交
  24. 11 11月, 2017 2 次提交
    • J
      blk-mq: only run the hardware queue if IO is pending · 79f720a7
      Jens Axboe 提交于
      Currently we are inconsistent in when we decide to run the queue. Using
      blk_mq_run_hw_queues() we check if the hctx has pending IO before
      running it, but we don't do that from the individual queue run function,
      blk_mq_run_hw_queue(). This results in a lot of extra and pointless
      queue runs, potentially, on flush requests and (much worse) on tag
      starvation situations. This is observable just looking at top output,
      with lots of kworkers active. For the !async runs, it just adds to the
      CPU overhead of blk-mq.
      
      Move the has-pending check into the run function instead of having
      callers do it.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      79f720a7
    • J
      Revert "blk-mq: don't handle TAG_SHARED in restart" · 05b79413
      Jens Axboe 提交于
      This reverts commit 358a3a6b.
      
      We have cases that aren't covered 100% in the drivers, so for now
      we have to retain the shared tag restart loops.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      05b79413
  25. 05 11月, 2017 3 次提交
    • M
      blk-mq: don't allocate driver tag upfront for flush rq · 923218f6
      Ming Lei 提交于
      The idea behind it is simple:
      
      1) for none scheduler, driver tag has to be borrowed for flush rq,
         otherwise we may run out of tag, and that causes an IO hang. And
         get/put driver tag is actually noop for none, so reordering tags
         isn't necessary at all.
      
      2) for a real I/O scheduler, we need not allocate a driver tag upfront
         for flush rq. It works just fine to follow the same approach as
         normal requests: allocate driver tag for each rq just before calling
         ->queue_rq().
      
      One driver visible change is that the driver tag isn't shared in the
      flush request sequence. That won't be a problem, since we always do that
      in legacy path.
      
      Then flush rq need not be treated specially wrt. get/put driver tag.
      This cleans up the code - for instance, reorder_tags_to_front() can be
      removed, and we needn't worry about request ordering in dispatch list
      for avoiding I/O deadlock.
      
      Also we have to put the driver tag before requeueing.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      923218f6
    • M
      blk-mq-sched: decide how to handle flush rq via RQF_FLUSH_SEQ · a6a252e6
      Ming Lei 提交于
      In case of IO scheduler we always pre-allocate one driver tag before
      calling blk_insert_flush(), and flush request will be marked as
      RQF_FLUSH_SEQ once it is in flush machinery.
      
      So if RQF_FLUSH_SEQ isn't set, we call blk_insert_flush() to handle
      the request, otherwise the flush request is dispatched to ->dispatch
      list directly.
      
      This is a preparation patch for not preallocating a driver tag for flush
      requests, and for not treating flush requests as a special case. This is
      similar to what the legacy path does.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a6a252e6
    • M
      blk-mq: don't handle failure in .get_budget · 88022d72
      Ming Lei 提交于
      It is enough to just check if we can get the budget via .get_budget().
      And we don't need to deal with device state change in .get_budget().
      
      For SCSI, one issue to be fixed is that we have to call
      scsi_mq_uninit_cmd() to free allocated ressources if SCSI device fails
      to handle the request. And it isn't enough to simply call
      blk_mq_end_request() to do that if this request is marked as
      RQF_DONTPREP.
      
      Fixes: 0df21c86(scsi: implement .get_budget and .put_budget for blk-mq)
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      88022d72
  26. 01 11月, 2017 4 次提交
    • M
      blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE · 1f460b63
      Ming Lei 提交于
      SCSI restarts its queue in scsi_end_request() automatically, so we don't
      need to handle this case in blk-mq.
      
      Especailly any request won't be dequeued in this case, we needn't to
      worry about IO hang caused by restart vs. dispatch.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      1f460b63
    • M
      blk-mq: don't handle TAG_SHARED in restart · 358a3a6b
      Ming Lei 提交于
      Now restart is used in the following cases, and TAG_SHARED is for
      SCSI only.
      
      1) .get_budget() returns BLK_STS_RESOURCE
      - if resource in target/host level isn't satisfied, this SCSI device
      will be added in shost->starved_list, and the whole queue will be rerun
      (via SCSI's built-in RESTART) in scsi_end_request() after any request
      initiated from this host/targe is completed. Forget to mention, host level
      resource can't be an issue for blk-mq at all.
      
      - the same is true if resource in the queue level isn't satisfied.
      
      - if there isn't outstanding request on this queue, then SCSI's RESTART
      can't work(blk-mq's can't work too), and the queue will be run after
      SCSI_QUEUE_DELAY, and finally all starved sdevs will be handled by SCSI's
      RESTART when this request is finished
      
      2) scsi_dispatch_cmd() returns BLK_STS_RESOURCE
      - if there isn't onprogressing request on this queue, the queue
      will be run after SCSI_QUEUE_DELAY
      
      - otherwise, SCSI's RESTART covers the rerun.
      
      3) blk_mq_get_driver_tag() failed
      - BLK_MQ_S_TAG_WAITING covers the cross-queue RESTART for driver
      allocation.
      
      In one word, SCSI's built-in RESTART is enough to cover the queue
      rerun, and we don't need to pay special attention to TAG_SHARED wrt. restart.
      
      In my test on scsi_debug(8 luns), this patch improves IOPS by 20% ~ 30% when
      running I/O on these 8 luns concurrently.
      
      Aslo Roman Pen reported the current RESTART is very expensive especialy
      when there are lots of LUNs attached in one host, such as in his
      test, RESTART causes half of IOPS be cut.
      
      Fixes: https://marc.info/?l=linux-kernel&m=150832216727524&w=2
      Fixes: 6d8c6c0f ("blk-mq: Restart a single queue if tag sets are shared")
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      358a3a6b
    • M
      blk-mq-sched: improve dispatching from sw queue · b347689f
      Ming Lei 提交于
      SCSI devices use host-wide tagset, and the shared driver tag space is
      often quite big. However, there is also a queue depth for each lun(
      .cmd_per_lun), which is often small, for example, on both lpfc and
      qla2xxx, .cmd_per_lun is just 3.
      
      So lots of requests may stay in sw queue, and we always flush all
      belonging to same hw queue and dispatch them all to driver.
      Unfortunately it is easy to cause queue busy because of the small
      .cmd_per_lun.  Once these requests are flushed out, they have to stay in
      hctx->dispatch, and no bio merge can happen on these requests, and
      sequential IO performance is harmed.
      
      This patch introduces blk_mq_dequeue_from_ctx for dequeuing a request
      from a sw queue, so that we can dispatch them in scheduler's way. We can
      then avoid dequeueing too many requests from sw queue, since we don't
      flush ->dispatch completely.
      
      This patch improves dispatching from sw queue by using the .get_budget
      and .put_budget callbacks.
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b347689f
    • M
      blk-mq: introduce .get_budget and .put_budget in blk_mq_ops · de148297
      Ming Lei 提交于
      For SCSI devices, there is often a per-request-queue depth, which needs
      to be respected before queuing one request.
      
      Currently blk-mq always dequeues the request first, then calls
      .queue_rq() to dispatch the request to lld. One obvious issue with this
      approach is that I/O merging may not be successful, because when the
      per-request-queue depth can't be respected, .queue_rq() has to return
      BLK_STS_RESOURCE, and then this request has to stay in hctx->dispatch
      list. This means it never gets a chance to be merged with other IO.
      
      This patch introduces .get_budget and .put_budget callback in blk_mq_ops,
      then we can try to get reserved budget first before dequeuing request.
      If the budget for queueing I/O can't be satisfied, we don't need to
      dequeue request at all. Hence the request can be left in the IO
      scheduler queue, for more merging opportunities.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      de148297