1. 01 11月, 2017 2 次提交
    • M
      blk-mq-sched: improve dispatching from sw queue · b347689f
      Ming Lei 提交于
      SCSI devices use host-wide tagset, and the shared driver tag space is
      often quite big. However, there is also a queue depth for each lun(
      .cmd_per_lun), which is often small, for example, on both lpfc and
      qla2xxx, .cmd_per_lun is just 3.
      
      So lots of requests may stay in sw queue, and we always flush all
      belonging to same hw queue and dispatch them all to driver.
      Unfortunately it is easy to cause queue busy because of the small
      .cmd_per_lun.  Once these requests are flushed out, they have to stay in
      hctx->dispatch, and no bio merge can happen on these requests, and
      sequential IO performance is harmed.
      
      This patch introduces blk_mq_dequeue_from_ctx for dequeuing a request
      from a sw queue, so that we can dispatch them in scheduler's way. We can
      then avoid dequeueing too many requests from sw queue, since we don't
      flush ->dispatch completely.
      
      This patch improves dispatching from sw queue by using the .get_budget
      and .put_budget callbacks.
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      b347689f
    • M
      blk-mq: introduce .get_budget and .put_budget in blk_mq_ops · de148297
      Ming Lei 提交于
      For SCSI devices, there is often a per-request-queue depth, which needs
      to be respected before queuing one request.
      
      Currently blk-mq always dequeues the request first, then calls
      .queue_rq() to dispatch the request to lld. One obvious issue with this
      approach is that I/O merging may not be successful, because when the
      per-request-queue depth can't be respected, .queue_rq() has to return
      BLK_STS_RESOURCE, and then this request has to stay in hctx->dispatch
      list. This means it never gets a chance to be merged with other IO.
      
      This patch introduces .get_budget and .put_budget callback in blk_mq_ops,
      then we can try to get reserved budget first before dequeuing request.
      If the budget for queueing I/O can't be satisfied, we don't need to
      dequeue request at all. Hence the request can be left in the IO
      scheduler queue, for more merging opportunities.
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      de148297
  2. 18 8月, 2017 1 次提交
  3. 22 6月, 2017 1 次提交
  4. 21 6月, 2017 3 次提交
  5. 20 6月, 2017 1 次提交
    • I
      sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar 提交于
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ac6424b9
  6. 19 6月, 2017 5 次提交
  7. 09 6月, 2017 2 次提交
    • C
      blk-mq: switch ->queue_rq return value to blk_status_t · fc17b653
      Christoph Hellwig 提交于
      Use the same values for use for request completion errors as the return
      value from ->queue_rq.  BLK_STS_RESOURCE is special cased to cause
      a requeue, and all the others are completed as-is.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      fc17b653
    • C
      block: introduce new block status code type · 2a842aca
      Christoph Hellwig 提交于
      Currently we use nornal Linux errno values in the block layer, and while
      we accept any error a few have overloaded magic meanings.  This patch
      instead introduces a new  blk_status_t value that holds block layer specific
      status codes and explicitly explains their meaning.  Helpers to convert from
      and to the previous special meanings are provided for now, but I suspect
      we want to get rid of them in the long run - those drivers that have a
      errno input (e.g. networking) usually get errnos that don't know about
      the special block layer overloads, and similarly returning them to userspace
      will usually return somethings that strictly speaking isn't correct
      for file system operations, but that's left as an exercise for later.
      
      For now the set of errors is a very limited set that closely corresponds
      to the previous overloaded errno values, but there is some low hanging
      fruite to improve it.
      
      blk_status_t (ab)uses the sparse __bitwise annotations to allow for sparse
      typechecking, so that we can easily catch places passing the wrong values.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      2a842aca
  8. 23 5月, 2017 1 次提交
  9. 04 5月, 2017 2 次提交
  10. 02 5月, 2017 1 次提交
  11. 28 4月, 2017 2 次提交
    • J
      blk-mq: unify hctx delay_work and run_work · 21c6e939
      Jens Axboe 提交于
      The only difference between ->run_work and ->delay_work, is that
      the latter is used to defer running a queue. This is done by
      marking the queue stopped, and scheduling ->delay_work to run
      sometime in the future. While the queue is stopped, direct runs
      or runs through ->run_work will not run the queue.
      
      If we combine the handlers, then we need to handle two things:
      
      1) If a delayed/stopped run is scheduled, then we should not run
         the queue before that has been completed.
      2) If a queue is delayed/stopped, the handler needs to restart
         the queue. Normally a run of a queue with the stopped bit set
         would be a no-op.
      
      Case 1 is handled by modifying a currently pending queue run
      to the deadline set by the caller of blk_mq_delay_queue().
      Subsequent attempts to queue a queue run will find the work
      item already pending, and direct runs will see a stopped queue
      as before.
      
      Case 2 is handled by adding a new bit, BLK_MQ_S_START_ON_RUN,
      that tells the work handler that it should clear a stopped
      queue and run the handler.
      Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      21c6e939
    • J
      blk-mq: unify hctx delayed_run_work and run_work · 9f993737
      Jens Axboe 提交于
      They serve the exact same purpose. Get rid of the non-delayed
      work variant, and just run it without delay for the normal case.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      9f993737
  12. 27 4月, 2017 1 次提交
  13. 21 4月, 2017 1 次提交
  14. 15 4月, 2017 1 次提交
  15. 08 4月, 2017 1 次提交
  16. 05 4月, 2017 1 次提交
  17. 29 3月, 2017 1 次提交
  18. 23 3月, 2017 1 次提交
  19. 02 3月, 2017 2 次提交
  20. 24 2月, 2017 1 次提交
    • O
      blk-mq: use sbq wait queues instead of restart for driver tags · da55f2cc
      Omar Sandoval 提交于
      Commit 50e1dab8 ("blk-mq-sched: fix starvation for multiple hardware
      queues and shared tags") fixed one starvation issue for shared tags.
      However, we can still get into a situation where we fail to allocate a
      tag because all tags are allocated but we don't have any pending
      requests on any hardware queue.
      
      One solution for this would be to restart all queues that share a tag
      map, but that really sucks. Ideally, we could just block and wait for a
      tag, but that isn't always possible from blk_mq_dispatch_rq_list().
      
      However, we can still use the struct sbitmap_queue wait queues with a
      custom callback instead of blocking. This has a few benefits:
      
      1. It avoids iterating over all hardware queues when completing an I/O,
         which the current restart code has to do.
      2. It benefits from the existing rolling wakeup code.
      3. It avoids punting to another thread just to have it block.
      Signed-off-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      da55f2cc
  21. 18 1月, 2017 3 次提交
  22. 12 1月, 2017 1 次提交
  23. 10 12月, 2016 1 次提交
  24. 09 11月, 2016 1 次提交
  25. 03 11月, 2016 3 次提交