1. 28 Oct, 2016: 1 commit
  2. 24 Sep, 2016: 1 commit
  3. 23 Sep, 2016: 3 commits
  4. 22 Sep, 2016: 3 commits
  5. 17 Sep, 2016: 3 commits
  6. 15 Sep, 2016: 7 commits
  7. 29 Aug, 2016: 2 commits
  8. 25 Aug, 2016: 2 commits
    • blk-mq: improve warning for running a queue on the wrong CPU · 0e87e58b
      Committed by Jens Axboe
      __blk_mq_run_hw_queue() currently warns if we are running the queue on a
      CPU that isn't set in its mask. However, this can legitimately happen if
      a CPU is being offlined, in which case the workqueue handling places the
      work on CPU0 instead. Improve the warning so that it only triggers if
      the batch CPU in the hardware queue is currently online. If it still
      triggers in that case, that indicates a flow problem in blk-mq, so the
      warning is worth retaining there (a sketch of the refined check follows
      below).
      Signed-off-by: Jens Axboe <axboe@fb.com>
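      A minimal C sketch of the refined check (assuming, as in this era of the
      tree, that the mask test sits at the top of the dispatch function; the
      dispatch body itself is elided):

        static void __blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx)
        {
                /*
                 * Only warn when the batch CPU is online: an offlined CPU
                 * legitimately has its work punted to CPU0 by the workqueue
                 * code, and that case should not trip the warning.
                 */
                WARN_ON(!cpumask_test_cpu(raw_smp_processor_id(), hctx->cpumask) &&
                        cpu_online(hctx->next_cpu));

                /* ... dispatch requests from the software queues ... */
        }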
    • blk-mq: don't overwrite rq->mq_ctx · e57690fe
      Committed by Jens Axboe
      We do this in a few places when the CPU is offline. That isn't allowed,
      though: on multi-queue hardware we can't simply move a request from one
      software queue to another if they map to different hardware queues,
      since the request and its tag aren't valid on another hardware queue
      (see the sketch below).
      
      This can happen if plugging races with CPU offlining. But it does
      no harm, since it can only happen in the window where we are
      currently busy freezing the queue and flushing IO, in preparation
      for redoing the software <-> hardware queue mappings.
      Signed-off-by: Jens Axboe <axboe@fb.com>
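      A sketch of the invariant this enforces (names simplified and
      illustrative; the mapping helper stands in for however ctx-to-hctx
      mapping is done in this tree):

        static void blk_mq_insert_req(struct request *rq, bool at_head)
        {
                /*
                 * Keep the software queue context the request was allocated
                 * on. Its tag belongs to the hardware queue that this ctx
                 * maps to, so substituting the current CPU's ctx could move
                 * the request to a hardware queue where the tag is invalid.
                 */
                struct blk_mq_ctx *ctx = rq->mq_ctx;
                struct blk_mq_hw_ctx *hctx = blk_mq_map_queue(rq->q, ctx->cpu);

                spin_lock(&ctx->lock);
                __blk_mq_insert_request(hctx, rq, at_head);
                spin_unlock(&ctx->lock);
        }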
  9. 08 Aug, 2016: 1 commit
    • block: rename bio bi_rw to bi_opf · 1eff9d32
      Committed by Jens Axboe
      Since commit 63a4cc24, bio->bi_rw contains flags in the lower
      portion and the op code in the upper portion. This means that
      old code that relies on manually setting bi_rw is most likely
      broken. Instead of letting that brokenness linger, rename the
      member to force old and out-of-tree code to break at compile
      time instead of at runtime (see the layout sketch below).
      
      No intended functional changes in this commit.
      Signed-off-by: Jens Axboe <axboe@fb.com>
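      A sketch of the layout described above, as it stood at the time
      (constants abbreviated from the real headers of that era):

        /* bi_opf packs the op code into the top bits, flags below it. */
        #define REQ_OP_BITS   3
        #define BIO_OP_SHIFT  (8 * sizeof(unsigned int) - REQ_OP_BITS)

        #define bio_op(bio)   ((bio)->bi_opf >> BIO_OP_SHIFT)

        /*
         * Old out-of-tree code that wrote bio->bi_rw directly, e.g.
         * "bio->bi_rw = READ;", now fails to compile: exactly the
         * breakage the rename is meant to force.
         */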
  10. 05 Aug, 2016: 1 commit
    • blk-mq: Allow timeouts to run while queue is freezing · 71f79fb3
      Committed by Gabriel Krisman Bertazi
      If a submitted request gets stuck for some reason, the block layer
      can prevent request starvation by starting the scheduled timeout work.
      If such a stuck request occurs while another thread has started a queue
      freeze, blk_mq_timeout_work will not be able to acquire the queue
      reference and will return silently, thus never issuing the timeout.
      But since the request already holds a q_usage_counter reference and is
      unable to complete, it will never release that reference, preventing
      the queue from completing the freeze started by the first thread. This
      puts the request_queue in a hung state, forever waiting for the freeze
      to complete.
      
      This was observed while running IO to an NVMe device while toggling
      CPU hotplug. Eventually, once a request got stuck requiring a timeout
      during a queue freeze, we saw the CPU hotplug notification code get
      stuck inside blk_mq_freeze_queue_wait, as shown in the trace below.
      
      [c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
      [c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
      [c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
      [c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
      [c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
      [c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
      [c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
      [c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
      [c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
      [c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
      [c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
      [c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
      [c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
      [c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
      [c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
      [c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
      [c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
      [c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
      [c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
      [c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4
      
      The fix is to allow the timeout work to execute in the window between
      dropping the initial refcount reference and the release of the last
      reference, which actually marks the freeze completion. This can be
      achieved with percpu_ref_tryget, which does not require the counter
      to be alive. This way the timeout work can do its job and terminate a
      stuck request even during a freeze, returning its reference and
      avoiding the deadlock (a sketch follows at the end of this entry).
      
      Allowing the timeout to run is only part of the fix, since for some
      devices we might get stuck again inside the device driver's timeout
      handler, should it attempt to allocate a new request in that path -
      quite a common action for Abort commands, which need to be sent after
      a timeout. In NVMe, for instance, we call blk_mq_alloc_request from
      inside the timeout handler, which will fail during a freeze, since it
      also tries to acquire a queue reference.
      
      I considered a similar change to blk_mq_alloc_request as a generic
      solution for further device driver hangs, but we can't do that, since
      it would allow new requests to disturb the freeze process. I thought
      about creating a new function in the block layer to support unfreezable
      requests for these occasions, but after working on it for a while, I
      feel this should be handled on a per-driver basis. I'm now
      experimenting with changes to the NVMe timeout path, but I'm open to
      suggestions for ways to make this generic.
      Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
      Cc: Brian King <brking@linux.vnet.ibm.com>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: linux-nvme@lists.infradead.org
      Cc: linux-block@vger.kernel.org
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@fb.com>
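      The heart of the fix, sketched (tag iteration and expired-request
      handling elided): the timeout work takes a plain percpu_ref_tryget(),
      which succeeds even while the counter is dying, whereas the submission
      path uses the _live variant that fails as soon as a freeze starts.

        static void blk_mq_timeout_work(struct work_struct *work)
        {
                struct request_queue *q =
                        container_of(work, struct request_queue, timeout_work);

                /*
                 * tryget, not tryget_live: allow the timeout to run in the
                 * window between the initial ref drop and the final release
                 * that completes the freeze.
                 */
                if (!percpu_ref_tryget(&q->q_usage_counter))
                        return;

                /* ... iterate busy tags and expire stuck requests ... */

                blk_queue_exit(q);
        }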
  11. 21 Jul, 2016: 1 commit
  12. 06 Jul, 2016: 1 commit
  13. 09 Jun, 2016: 1 commit
    • blk-mq: actually hook up defer list when running requests · 52b9c330
      Committed by Omar Sandoval
      If ->queue_rq() returns BLK_MQ_RQ_QUEUE_OK, we use continue and skip
      over the rest of the loop body. However, dptr is assigned later in the
      loop body, and the BLK_MQ_RQ_QUEUE_OK case is exactly the case that
      needs it (see the sketch below).
      
      NVMe isn't actually using BLK_MQ_F_DEFER_ISSUE yet, nor is any other
      in-tree driver, but if the code's going to be there, it might as well
      work.
      
      Fixes: 74c45052 ("blk-mq: add a 'list' parameter to ->queue_rq()")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
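      A condensed sketch of the dispatch loop after the fix: the
      BLK_MQ_RQ_QUEUE_OK arm now breaks out of the switch and falls through
      to the dptr assignment, instead of continuing past it.

        while (!list_empty(&rq_list)) {
                struct request *rq;

                rq = list_first_entry(&rq_list, struct request, queuelist);
                list_del_init(&rq->queuelist);

                bd.rq = rq;
                bd.list = dptr;
                bd.last = list_empty(&rq_list);

                ret = q->mq_ops->queue_rq(hctx, &bd);
                switch (ret) {
                case BLK_MQ_RQ_QUEUE_OK:
                        queued++;
                        break;  /* was continue, which skipped the code below */
                /* ... busy and error cases ... */
                }

                /*
                 * With BLK_MQ_F_DEFER_ISSUE, point the driver at the rest
                 * of the list once the first request has been queued.
                 */
                if (!dptr && rq_list.next != rq_list.prev)
                        dptr = &driver_list;
        }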
  14. 08 Jun, 2016: 3 commits
  15. 03 Jun, 2016: 1 commit
  16. 26 May, 2016: 1 commit
  17. 16 May, 2016: 1 commit
  18. 03 May, 2016: 1 commit
  19. 20 Mar, 2016: 1 commit
  20. 16 Mar, 2016: 1 commit
  21. 04 Mar, 2016: 1 commit
  22. 15 Feb, 2016: 1 commit
    • blk-mq: mark request queue as mq asap · 66841672
      Committed by Ming Lei
      Currently q->mq_ops is used widely to decide whether the queue is mq
      or not, so we should set this 'flag' as soon as possible so that both
      the block core and drivers get the correct mq info.
      
      For example, commit 868f2f0b ("blk-mq: dynamic h/w context count")
      moves the hctx initialization before the setting of q->mq_ops in
      blk_mq_init_allocated_queue(), causing blk_alloc_flush_queue() to
      treat the queue as non-mq and skip allocating the command size for
      the per-hctx flush rq (see the sketch below).
      
      This patch should fix the problem reported by Sasha.
      
      Cc: Keith Busch <keith.busch@intel.com>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Fixes: 868f2f0b ("blk-mq: dynamic h/w context count")
      Signed-off-by: Jens Axboe <axboe@fb.com>
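      A sketch of the reordering inside blk_mq_init_allocated_queue()
      (unrelated setup omitted; helper names as in this era of the tree):

        /*
         * Publish mq_ops first, so every q->mq_ops check from here on,
         * including blk_alloc_flush_queue() during hctx setup, sees an
         * mq queue and allocates the per-hctx flush rq command size.
         */
        q->mq_ops = set->ops;

        /* ... */

        blk_mq_realloc_hw_ctxs(set, q);  /* hctx init now runs after the flag */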
  23. 12 Feb, 2016: 1 commit
  24. 10 Feb, 2016: 1 commit
    • blk-mq: dynamic h/w context count · 868f2f0b
      Committed by Keith Busch
      The hardware's provided queue count may change at runtime with resource
      provisioning. This patch allows a block driver to alter the number of
      h/w queues available when its resource count changes.
      
      The main part is a new blk-mq API for requesting a new number of h/w
      queues for a given live tag set. The new API freezes all queues using
      that set, then adjusts the allocated count prior to remapping these to
      CPUs (sketched at the end of this entry).
      
      The bulk of the rest just shifts where h/w contexts and all their
      artifacts are allocated and freed.
      
      The maximum number of h/w contexts is capped at the number of possible
      CPUs, since there is no use for more than that. As such, all
      pre-allocated pointer memory needs to account for the maximum possible
      rather than the initial number of queues.
      
      A side effect of this is that blk-mq will now proceed successfully as
      long as it can allocate at least one h/w context. Previously it would
      fail request queue initialization if fewer than the requested number
      were allocated.
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Jon Derrick <jonathan.derrick@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
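      The new API's shape, sketched (simplified from the real
      blk_mq_update_nr_hw_queues(); locking around the tag list is omitted):

        void blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set, int nr_hw_queues)
        {
                struct request_queue *q;

                /* Cap at possible CPUs: more h/w contexts have no users. */
                if (nr_hw_queues > nr_cpu_ids)
                        nr_hw_queues = nr_cpu_ids;

                /* Quiesce every queue sharing this live tag set... */
                list_for_each_entry(q, &set->tag_list, tag_set_list)
                        blk_mq_freeze_queue(q);

                /* ...adjust the count, rebuild hctxs and CPU mappings... */
                set->nr_hw_queues = nr_hw_queues;
                list_for_each_entry(q, &set->tag_list, tag_set_list) {
                        blk_mq_realloc_hw_ctxs(set, q);
                        blk_mq_queue_reinit(q, cpu_online_mask);
                }

                /* ...then let IO flow again. */
                list_for_each_entry(q, &set->tag_list, tag_set_list)
                        blk_mq_unfreeze_queue(q);
        }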