1. 09 Mar, 2017: 2 commits
    • blk-mq: make lifetime consistent between q/ctx and its kobject · 7ea5fe31
      Ming Lei committed
      Currently, from the kobject point of view, both q->mq_kobj and ctx->kobj can
      be released during one cycle of blk_mq_register_dev() and
      blk_mq_unregister_dev(). In fact, a sw queue's lifetime is the
      same as its request queue's, which is covered by request_queue->kobj.
      
      So we don't need to call kobject_put() for these two kinds of
      kobject in __blk_mq_unregister_dev(); instead, we do that
      in the release handler of the request queue.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
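The lifetime rule above can be illustrated with a toy reference-counted object. This is a minimal C model, not the kernel's kobject API: `struct obj`, `obj_add()`, `obj_del()`, `queue_release()` and the other helpers are hypothetical stand-ins. Register and unregister only toggle sysfs visibility, and the single initial reference is dropped in the queue's release handler, so the objects survive any number of register/unregister cycles.

```c
#include <assert.h>

/* Toy stand-in for a kobject: NOT the kernel API. */
struct obj {
    int refcount;
    int in_sysfs;   /* currently visible under sysfs? */
    int released;   /* has the release step run? */
};

static void obj_init(struct obj *o) { o->refcount = 1; o->in_sysfs = 0; o->released = 0; }
static void obj_add(struct obj *o) { o->in_sysfs = 1; }   /* roughly like kobject_add() */
static void obj_del(struct obj *o) { o->in_sysfs = 0; }   /* roughly like kobject_del() */

static void obj_put(struct obj *o)
{
    assert(o->refcount > 0);
    if (--o->refcount == 0)
        o->released = 1;    /* memory would be freed here */
}

/* Hypothetical queue holding the two kinds of kobject from the commit. */
struct queue {
    struct obj mq_kobj;     /* models q->mq_kobj */
    struct obj ctx_kobj;    /* models one sw queue's ctx->kobj */
};

/* Initialized once, when the queue is allocated. */
static void queue_init(struct queue *q) { obj_init(&q->mq_kobj); obj_init(&q->ctx_kobj); }

/* Register/unregister only toggle visibility; no reference is dropped. */
static void register_dev(struct queue *q) { obj_add(&q->mq_kobj); obj_add(&q->ctx_kobj); }
static void unregister_dev(struct queue *q) { obj_del(&q->ctx_kobj); obj_del(&q->mq_kobj); }

/* Release handler of the request queue: the only place the final put happens. */
static void queue_release(struct queue *q) { obj_put(&q->ctx_kobj); obj_put(&q->mq_kobj); }
```

Calling `register_dev()` and `unregister_dev()` repeatedly leaves both objects alive; only `queue_release()` runs their release step.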
    • blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue() · 737f98cf
      Ming Lei committed
      Both q->mq_kobj and the sw queues' kobjects should be initialized
      only once, instead of in every add_disk context.
      
      Also, this patch removes the clearing of ctx in blk_mq_init_cpu_queues(),
      because the percpu allocator zero-fills the allocated variable.
      
      This patch fixes an issue[1] reported by Omar.
      
      [1] Kernel warning when doing unbind/bind on a scsi-mq device:
      
      [   19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong.
      [   19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
      [   19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
      [   19.350920] Workqueue: events_unbound async_run_entry_fn
      [   19.350920] Call Trace:
      [   19.350920]  dump_stack+0x63/0x83
      [   19.350920]  kobject_init+0x77/0x90
      [   19.350920]  blk_mq_register_dev+0x40/0x130
      [   19.350920]  blk_register_queue+0xb6/0x190
      [   19.350920]  device_add_disk+0x1ec/0x4b0
      [   19.350920]  sd_probe_async+0x10d/0x1c0 [sd_mod]
      [   19.350920]  async_run_entry_fn+0x48/0x150
      [   19.350920]  process_one_work+0x1d0/0x480
      [   19.350920]  worker_thread+0x48/0x4e0
      [   19.350920]  kthread+0x101/0x140
      [   19.350920]  ? process_one_work+0x480/0x480
      [   19.350920]  ? kthread_create_on_node+0x60/0x60
      [   19.350920]  ret_from_fork+0x2c/0x40
      
      Cc: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
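A small C simulation of the warning in the trace above can make the fix concrete. It assumes a toy `struct obj` whose `initialized` flag stands in for kobject_init()'s sanity check; `register_old()`, `alloc_queue_new()`, `register_new()` and `simulate()` are hypothetical names. Initializing inside every register call warns on each repeated bind, while initializing once at queue allocation never does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct obj { int initialized; };   /* toy model, not struct kobject */

static int warnings;

static void obj_init(struct obj *o)
{
    if (o->initialized) {
        warnings++;
        fprintf(stderr, "kobject: tried to init an initialized object\n");
        return;
    }
    o->initialized = 1;
}

/* Old placement: init runs on every register (every add_disk / bind). */
static void register_old(struct obj *o) { obj_init(o); }

/* New placement: init runs once at queue allocation; register only adds. */
static void alloc_queue_new(struct obj *o)
{
    memset(o, 0, sizeof(*o));      /* like a zero-filled percpu allocation */
    obj_init(o);
}
static void register_new(struct obj *o) { (void)o; /* kobject_add() would go here */ }

/* Simulate `cycles` bind/unbind cycles; return how many warnings fired. */
static int simulate(int use_new, int cycles)
{
    struct obj o;

    memset(&o, 0, sizeof(o));
    warnings = 0;
    if (use_new)
        alloc_queue_new(&o);
    for (int i = 0; i < cycles; i++) {
        if (use_new)
            register_new(&o);
        else
            register_old(&o);
    }
    return warnings;
}
```

With three bind cycles, the old placement warns twice (every bind after the first) and the new placement never warns.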
  2. 02 Mar, 2017: 1 commit
  3. 03 Feb, 2017: 1 commit
  4. 28 Jan, 2017: 2 commits
  5. 27 Jan, 2017: 2 commits
  6. 18 Jan, 2017: 4 commits
  7. 10 Dec, 2016: 1 commit
  8. 11 Nov, 2016: 1 commit
    • block: add scalable completion tracking of requests · cf43e6be
      Jens Axboe committed
      For legacy block, we simply track them in the request queue. For
      blk-mq, we track them on a per-sw queue basis, which we can then
      sum up through the hardware queues and finally into a per-device
      state.
      
      The stats are tracked in, roughly, 0.1s interval windows.
      
      Add sysfs files to display the stats.
      
      The feature is off by default, to avoid any extra overhead. In-kernel
      users of it can turn it on by setting QUEUE_FLAG_STATS in the queue
      flags. We currently don't turn it on if someone just reads any of
      the stats files; that is something we could add as well.
      Signed-off-by: Jens Axboe <axboe@fb.com>
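The per-sw-queue tracking and roll-up described above can be sketched as follows. This is an illustrative C model under stated assumptions, not the kernel's stats code: each software queue keeps one bucket (count, sum, min, max) for the current window, a bucket is reset when a sample falls into a different window, and aggregation merges only buckets from the current window. `WINDOW_NS` models the roughly 0.1 s window; all names are hypothetical, and the QUEUE_FLAG_STATS on/off gating is not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define WINDOW_NS 100000000ULL   /* ~0.1 s window (assumed granularity) */

/* One bucket of completion-latency stats (hypothetical layout). */
struct rq_stat {
    uint64_t window_start;   /* start time of the window these samples belong to */
    uint64_t nr_samples;
    uint64_t sum;            /* sum of latencies, ns */
    uint64_t min, max;
};

static void stat_reset(struct rq_stat *s, uint64_t window_start)
{
    s->window_start = window_start;
    s->nr_samples = 0;
    s->sum = 0;
    s->min = UINT64_MAX;
    s->max = 0;
}

/* Record one completion in a per-software-queue bucket. */
static void stat_add(struct rq_stat *s, uint64_t now, uint64_t latency)
{
    uint64_t win = now - now % WINDOW_NS;

    if (s->window_start != win)      /* sample belongs to a different window */
        stat_reset(s, win);
    s->nr_samples++;
    s->sum += latency;
    if (latency < s->min)
        s->min = latency;
    if (latency > s->max)
        s->max = latency;
}

/* Fold one bucket into an aggregate (ctx -> hctx, or hctx -> device). */
static void stat_merge(struct rq_stat *dst, const struct rq_stat *src)
{
    if (src->nr_samples == 0)
        return;
    dst->nr_samples += src->nr_samples;
    dst->sum += src->sum;
    if (src->min < dst->min)
        dst->min = src->min;
    if (src->max > dst->max)
        dst->max = src->max;
}

/* Sum the buckets of n software queues that belong to the current window. */
static struct rq_stat stat_sum(const struct rq_stat *ctx, int n, uint64_t now)
{
    struct rq_stat total;
    uint64_t win = now - now % WINDOW_NS;

    stat_reset(&total, win);
    for (int i = 0; i < n; i++)
        if (ctx[i].window_start == win)
            stat_merge(&total, &ctx[i]);
    return total;
}
```

The same `stat_merge()` can be applied again to roll hardware-queue totals up into a single per-device total.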
  9. 09 Nov, 2016: 1 commit
  10. 03 Nov, 2016: 1 commit
  11. 22 Sep, 2016: 1 commit
  12. 17 Sep, 2016: 2 commits
  13. 15 Sep, 2016: 2 commits
  14. 10 Feb, 2016: 1 commit
    • blk-mq: dynamic h/w context count · 868f2f0b
      Keith Busch committed
      The hardware's provided queue count may change at runtime with resource
      provisioning. This patch allows a block driver to alter the number of
      h/w queues available when its resource count changes.
      
      The main part is a new blk-mq API to request a new number of h/w queues
      for a given live tag set. The new API freezes all queues using that set,
      then adjusts the allocated count prior to remapping these to CPUs.
      
      The bulk of the rest just shifts where h/w contexts and all their
      artifacts are allocated and freed.
      
      The maximum number of h/w contexts is capped to the number of possible
      CPUs, since there is no use for more than that. As such, all pre-allocated
      memory for pointers needs to account for the max possible rather than
      the initial number of queues.
      
      A side effect of this is that blk-mq will proceed successfully as
      long as it can allocate at least one h/w context. Previously it would
      fail request queue initialization if fewer than the requested number
      were allocated.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
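The sizing rules and the freeze, resize, remap sequence described above can be sketched in C. Everything here is hypothetical: `struct tag_set`, `usable_hw_queues()` and `update_nr_hw_queues()` are illustrative names, the `frozen` flag stands in for freezing every queue that shares the set, and the round-robin CPU map is a deliberate simplification of blk-mq's real CPU-to-queue mapping.

```c
#include <assert.h>

#define MAX_CPUS 64

/* Hypothetical tag set shared by one or more request queues. */
struct tag_set {
    int nr_hw_queues;
    int frozen;             /* set while the queues using this set are frozen */
    int map[MAX_CPUS];      /* CPU -> hw queue index */
};

static int min_int(int a, int b) { return a < b ? a : b; }

/* How many hw contexts to use: capped at the number of possible CPUs and at
 * how many could actually be allocated; fail only if none could be. */
static int usable_hw_queues(int requested, int nr_possible_cpus, int nr_allocated)
{
    int n = min_int(min_int(requested, nr_possible_cpus), nr_allocated);

    return n >= 1 ? n : -1;
}

/* Naive round-robin CPU -> queue map (an assumption, not blk-mq's mapping). */
static void map_cpus(struct tag_set *set, int nr_cpus)
{
    for (int cpu = 0; cpu < nr_cpus; cpu++)
        set->map[cpu] = cpu % set->nr_hw_queues;
}

/* Change the hw queue count of a live tag set: freeze, resize, remap, unfreeze.
 * Assumes nr_cpus <= MAX_CPUS and that every requested context allocates. */
static int update_nr_hw_queues(struct tag_set *set, int nr_cpus, int requested)
{
    int n = usable_hw_queues(requested, nr_cpus, requested);

    if (n < 0)
        return -1;
    set->frozen = 1;
    set->nr_hw_queues = n;
    map_cpus(set, nr_cpus);
    set->frozen = 0;
    return n;
}
```

Requesting 8 queues on a 4-CPU machine yields 4; shrinking to 2 remaps CPU 3 onto queue 1.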
  15. 02 Dec, 2015: 1 commit
  16. 12 Nov, 2015: 1 commit
  17. 10 Oct, 2015: 1 commit
  18. 30 Sep, 2015: 1 commit
    • blk-mq: avoid inserting requests before establishing new mapping · 5778322e
      Akinobu Mita committed
      Notifier callbacks for the CPU_ONLINE action can run on a CPU other
      than the one that was just onlined. So it is possible for a
      process running on the just-onlined CPU to insert a request and run
      the hw queue before the new mapping is established by
      blk_mq_queue_reinit_notify().
      
      This can cause a problem when the CPU has just been onlined for the
      first time since the request queue was initialized. At that point,
      ctx->index_hw for the CPU, which is the index in hctx->ctxs[] for this
      ctx, is still zero, because blk_mq_queue_reinit_notify() has not yet
      been called by the notifier callbacks for the CPU_ONLINE action.
      
      For example, suppose there is a single hw queue (hctx) and two CPU queues
      (ctx0 for CPU0 and ctx1 for CPU1). CPU1 has just been onlined, and
      a request is inserted into ctx1->rq_list, setting bit 0 in the pending
      bitmap because ctx1->index_hw is still zero.
      
      Then, while running the hw queue, flush_busy_ctxs() finds bit 0 set
      in the pending bitmap and tries to retrieve requests from
      hctx->ctxs[0]->rq_list. But hctx->ctxs[0] is a pointer to ctx0, so the
      request in ctx1->rq_list is ignored.
      
      Fix it by ensuring that the new mapping is established before the
      onlined CPU starts running.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Reviewed-by: Ming Lei <tom.leiming@gmail.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Ming Lei <tom.leiming@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
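The race can be reproduced with a tiny single-threaded C model of the pending bitmap. The structures below are simplified assumptions, not blk-mq's definitions: `insert_request()` sets bit `ctx->index_hw`, the model's `flush_busy_ctxs()` drains `hctx->ctxs[bit]` for each set bit, and the hypothetical `map_ctx()` plays the role of the remapping that blk_mq_queue_reinit_notify() performs.

```c
#include <assert.h>

#define MAX_CTX 4

struct ctx { int index_hw; int nr_rqs; };   /* nr_rqs models ctx->rq_list */
struct hctx { struct ctx *ctxs[MAX_CTX]; int nr_ctx; unsigned pending; };

/* Queue a request on a sw queue and mark its slot busy. */
static void insert_request(struct hctx *h, struct ctx *c)
{
    c->nr_rqs++;
    h->pending |= 1u << c->index_hw;
}

/* Dispatch requests of every ctx whose pending bit is set; return the count. */
static int flush_busy_ctxs(struct hctx *h)
{
    int dispatched = 0;

    for (int bit = 0; bit < h->nr_ctx; bit++) {
        if (!(h->pending & (1u << bit)))
            continue;
        h->pending &= ~(1u << bit);
        dispatched += h->ctxs[bit]->nr_rqs;
        h->ctxs[bit]->nr_rqs = 0;
    }
    return dispatched;
}

/* Establish the mapping: give the ctx its own slot in hctx->ctxs[]. */
static void map_ctx(struct hctx *h, struct ctx *c)
{
    c->index_hw = h->nr_ctx;
    h->ctxs[h->nr_ctx++] = c;
}
```

If ctx1 inserts before `map_ctx()` runs for it, bit 0 points at ctx0 and ctx1's request is stranded; mapping first sets bit 1 and the request is dispatched.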
  19. 30 Jan, 2015: 1 commit
  20. 01 Jan, 2015: 1 commit
  21. 09 Dec, 2014: 1 commit
  22. 26 Sep, 2014: 2 commits
  23. 23 Sep, 2014: 1 commit
  24. 02 Jul, 2014: 1 commit
    • blk-mq: decouple blk-mq freezing from generic bypassing · 780db207
      Tejun Heo committed
      blk_mq freezing is entangled with generic bypassing which bypasses
      blkcg and io scheduler and lets IO requests fall through the block
      layer to the drivers in FIFO order.  This allows forward progress on
      IOs with the advanced features disabled so that those features can be
      configured or altered without worrying about stalling IO which may
      lead to deadlock through memory allocation.
      
      However, generic bypassing doesn't quite fit blk-mq. blk-mq currently
      doesn't make use of blkcg or ioscheds, and it maps bypassing to
      freezing, which blocks request processing and drains all the in-flight
      ones. This causes problems, as bypassing assumes that request
      processing is online. blk-mq works around this by conditionally
      allowing request processing for the problem case: during queue
      initialization.
      
      Another oddity is that, except during queue cleanup, bypassing
      started on the generic side prevents blk-mq from processing new
      requests but doesn't drain the in-flight ones. This shouldn't break
      anything, but again highlights that something isn't quite right here.
      
      The root cause is conflating blk-mq freezing and generic bypassing
      which are two different mechanisms.  The only intersecting purpose
      that they serve is during queue cleanup.  Let's properly separate
      blk-mq freezing from generic bypassing and simply use it where
      necessary.
      
      * request_queue->mq_freeze_depth is added and
        blk_mq_[un]freeze_queue() now operate on this counter instead of
        ->bypass_depth.  The replacement for QUEUE_FLAG_BYPASS isn't added
        but the counter is tested directly.  This will be further updated by
        later changes.
      
      * blk_mq_drain_queue() is dropped and "__" prefix is dropped from
        blk_mq_freeze_queue().  Queue cleanup path now calls
        blk_mq_freeze_queue() directly.
      
      * blk_queue_enter()'s fast path condition is simplified to simply
        check @q->mq_freeze_depth.  Previously, the condition was
      
      	!blk_queue_dying(q) &&
      	    (!blk_queue_bypass(q) || !blk_queue_init_done(q))
      
        mq_freeze_depth is incremented right after dying is set and
        blk_queue_init_done() exception isn't necessary as blk-mq doesn't
        start frozen, which only leaves the blk_queue_bypass() test which
        can be replaced by @q->mq_freeze_depth test.
      
      This change simplifies the code and reduces confusion in the area.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
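The counter-based freeze described above can be modeled in a few lines of C. This sketch assumes a single thread and plain integers instead of atomics and wait queues; `queue_enter()`, `freeze_queue()` and the other helpers are hypothetical simplifications of blk_queue_enter() and blk_mq_[un]freeze_queue(), not their real implementations. The fast path is just the `mq_freeze_depth` test, and nested freezes stay in effect until the depth drops back to zero.

```c
#include <assert.h>

/* Hypothetical, single-threaded model of a request queue. */
struct queue {
    int mq_freeze_depth;  /* > 0 while frozen; freezes nest */
    int in_flight;        /* callers currently inside the queue */
};

/* Fast path: enter only if nobody has frozen the queue. */
static int queue_enter(struct queue *q)
{
    if (q->mq_freeze_depth > 0)
        return -1;        /* real code would wait or fail; not modeled */
    q->in_flight++;
    return 0;
}

static void queue_exit(struct queue *q)
{
    assert(q->in_flight > 0);
    q->in_flight--;
}

/* Block new entries; a real freeze then waits for in_flight to drain. */
static void freeze_queue(struct queue *q) { q->mq_freeze_depth++; }

static void unfreeze_queue(struct queue *q)
{
    assert(q->mq_freeze_depth > 0);
    q->mq_freeze_depth--;
}

static int queue_drained(const struct queue *q) { return q->in_flight == 0; }
```

Because the freeze is a counter rather than a flag, two independent freezers can overlap safely: the queue reopens only after both have unfrozen it.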
  25. 04 Jun, 2014: 2 commits
  26. 30 May, 2014: 1 commit
    • blk-mq: make the sysfs mq/ layout reflect current mappings · 67aec14c
      Jens Axboe committed
      Currently blk-mq registers all the hardware queues in sysfs,
      regardless of whether it uses them (i.e. whether they have CPU
      mappings) or not. The unused hardware queues lack the cpuX/ directories,
      and the other sysfs entries (like active, pending, etc.) are all
      zeroes.
      
      Change this so that sysfs correctly reflects the current mappings
      of the hardware queues.
      Signed-off-by: Jens Axboe <axboe@fb.com>
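The new policy, creating mq/<N>/ only for hardware queues that actually have CPUs mapped to them, can be sketched as a pure C function that lists the directories it would create. The CPU-to-queue array and the `mq_sysfs_dirs()` helper are hypothetical; real code registers kobjects rather than building path strings.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PATH_LEN 32

/* List the mq/ sysfs directories to create, given cpu_to_hw[cpu] = index of
 * the hardware queue serving that CPU. Returns the number of entries written. */
static int mq_sysfs_dirs(const int *cpu_to_hw, int nr_cpus, int nr_hw,
                         char dirs[][PATH_LEN], int max_dirs)
{
    int n = 0;

    for (int h = 0; h < nr_hw; h++) {
        int nr_mapped = 0;

        for (int cpu = 0; cpu < nr_cpus; cpu++)
            if (cpu_to_hw[cpu] == h)
                nr_mapped++;
        if (nr_mapped == 0)
            continue;            /* unused hw queue: not exposed at all */

        if (n < max_dirs)
            snprintf(dirs[n++], PATH_LEN, "mq/%d", h);
        for (int cpu = 0; cpu < nr_cpus; cpu++)
            if (cpu_to_hw[cpu] == h && n < max_dirs)
                snprintf(dirs[n++], PATH_LEN, "mq/%d/cpu%d", h, cpu);
    }
    return n;
}
```

With four hardware queues but CPUs mapped only to queues 0 and 1, only mq/0/ and mq/1/ (plus their cpuX/ entries) appear.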
  27. 28 May, 2014: 1 commit
  28. 22 May, 2014: 1 commit
  29. 21 May, 2014: 1 commit
    • blk-mq: allow changing of queue depth through sysfs · e3a2b3f9
      Jens Axboe committed
      For request_fn based devices, the block layer exports an 'nr_requests'
      file through sysfs to allow adjusting the queue depth on the fly.
      Currently this returns -EINVAL for blk-mq, since it's not wired up.
      Wire this up for blk-mq, so that it now also allows dynamic
      adjustment of the allowed queue depth for any given block device
      managed by blk-mq.
      Signed-off-by: Jens Axboe <axboe@fb.com>
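A sketch of what wiring nr_requests up for blk-mq could look like: parse the written value and apply the new depth to every hardware queue. The structures, `nr_requests_store()`, and the rule that the new depth must be non-zero and must not exceed the depth the tag set was allocated with are assumptions for illustration, not the kernel's exact checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_HW 8

struct hw_queue { unsigned int depth; };

/* Hypothetical tag set: max_depth is the depth its tags were allocated with. */
struct tag_set {
    unsigned int max_depth;
    int nr_hw;
    struct hw_queue hw[MAX_HW];
};

/* Handle a write to nr_requests: returns 0, or -EINVAL for a bad value. */
static int nr_requests_store(struct tag_set *set, const char *buf)
{
    char *end;
    unsigned long nr = strtoul(buf, &end, 10);

    if (end == buf || (*end != '\0' && *end != '\n'))
        return -EINVAL;                 /* not a plain number */
    if (nr == 0 || nr > set->max_depth)
        return -EINVAL;                 /* assumed limits */
    for (int i = 0; i < set->nr_hw; i++)
        set->hw[i].depth = (unsigned int)nr;
    return 0;
}
```

A rejected write leaves every hardware queue at its previous depth, so a bad value never partially applies.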
  30. 20 May, 2014: 1 commit