1. 13 Sep, 2021 1 commit
    • blk-mq: avoid to iterate over stale request · 67f3b2f8
      Committed by Ming Lei
      blk-mq can't allocate a driver tag and update ->rqs[tag] atomically,
      and it doesn't clear ->rqs[tag] after the driver tag is released.
      
      So there is a chance of iterating over a stale request just after the
      tag is allocated and before ->rqs[tag] is updated.
      
      scsi_host_busy_iter() calls scsi_host_check_in_flight() to count SCSI
      in-flight requests after the SCSI host is blocked, so no new SCSI
      command can be marked as SCMD_STATE_INFLIGHT. However, the blk-mq core
      can still allocate driver tags. A request marked as SCMD_STATE_INFLIGHT
      may still be referenced from another slot of ->rqs[]; if that slot is
      allocated again before ->rqs[] is updated, the in-flight request is
      counted twice as SCMD_STATE_INFLIGHT. This causes trouble in SCSI error
      handling.
      
      Fix the issue by not iterating over stale requests.
      
      Cc: linux-scsi@vger.kernel.org
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Reported-by: luojiaxing <luojiaxing@huawei.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20210906065003.439019-1-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
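The double counting described in this commit message can be reproduced in a small user-space model. This is only a sketch, not kernel code: `model_rq`, `model_tags`, `make_racy_tags()` and `count_in_flight()` are hypothetical names, and the staleness test used here (the request's own tag must match the slot it was found in) is an assumption about one way to skip stale entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_TAGS 4

/* Hypothetical stand-in for a request: it knows which tag it owns. */
struct model_rq {
    int tag;
    bool in_flight;
};

/* Hypothetical stand-in for the tag set: a bitmap plus the ->rqs[] table. */
struct model_tags {
    bool allocated[NR_TAGS];
    struct model_rq *rqs[NR_TAGS];
};

/*
 * Build the racy state: rq owns tag 2 and is in flight, slot 1 was used by
 * rq earlier and never cleared, and slot 1 has just been allocated again
 * without rqs[1] being updated yet.
 */
static struct model_tags make_racy_tags(struct model_rq *rq)
{
    struct model_tags t = { { false }, { NULL } };

    assert(rq != NULL);
    rq->tag = 2;
    rq->in_flight = true;
    t.allocated[2] = true;
    t.rqs[2] = rq;      /* current, valid mapping */
    t.allocated[1] = true;
    t.rqs[1] = rq;      /* stale pointer left over from an earlier use */
    return t;
}

/* Count in-flight requests, optionally skipping stale slots. */
static int count_in_flight(const struct model_tags *t, bool skip_stale)
{
    int n = 0;

    for (int bit = 0; bit < NR_TAGS; bit++) {
        const struct model_rq *rq = t->rqs[bit];

        if (!t->allocated[bit] || rq == NULL)
            continue;
        if (skip_stale && rq->tag != bit)
            continue;   /* the slot does not belong to this request */
        if (rq->in_flight)
            n++;
    }
    return n;
}
```

With the racy state built by `make_racy_tags()`, the unfiltered walk reports two in-flight requests although only one exists, while the filtered walk reports one.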
  2. 24 May, 2021 4 commits
  3. 06 Apr, 2021 1 commit
  4. 26 Mar, 2021 1 commit
  5. 29 Sep, 2020 1 commit
  6. 11 Sep, 2020 1 commit
    • blk-mq: always allow reserved allocation in hctx_may_queue · 28500850
      Committed by Ming Lei
      NVMe shares tagset between fabric queue and admin queue or between
      connect_q and NS queue, so hctx_may_queue() can be called to allocate
      request for these queues.
      
      Tags can be reserved in these tagsets. Before error recovery, there are
      often lots of in-flight requests which can't be completed, and a new
      reserved request may be needed in the error recovery path. However,
      hctx_may_queue() may keep returning false because there are too many
      in-flight requests which can't be completed during error handling.
      In the end, nothing can proceed.
      
      Fix this issue by always allowing reserved tag allocation in
      hctx_may_queue(). This is reasonable because reserved tags are supposed
      to always be available.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Hannes Reinecke <hare@suse.de>
      Cc: David Milburn <dmilburn@redhat.com>
      Cc: Ewan D. Milne <emilne@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
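The effect of letting reserved allocations bypass the fairness check can be sketched as below. This is a simplified user-space model under stated assumptions: `model_may_queue()` and its parameters are hypothetical, and the fair-share formula (depth divided among active users, rounded up, with a floor of 4) is an assumption made for illustration rather than a copy of the kernel's logic.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a fair-share admission check on a shared tag set.
 *   depth    - total number of tags in the shared set
 *   users    - number of queues currently counted as active
 *   active   - tags currently held by the queue asking for one more
 *   reserved - true when the caller asks for a reserved tag
 */
static bool model_may_queue(unsigned int depth, unsigned int users,
                            unsigned int active, bool reserved)
{
    unsigned int share;

    assert(depth > 0);

    /* Reserved tags are meant to be always available: skip the check. */
    if (reserved)
        return true;

    /* Nobody to share with, so no limit applies. */
    if (users == 0)
        return true;

    /* Assumed fair share: round up, but never go below 4. */
    share = (depth + users - 1) / users;
    if (share < 4)
        share = 4;

    return active < share;
}
```

For example, with 64 tags shared by 4 users, a queue that already holds 16 stuck requests is refused a normal tag, yet a reserved request from the same queue is still admitted, which is what lets error recovery make progress.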
  7. 04 Sep, 2020 5 commits
  8. 01 Jul, 2020 1 commit
  9. 29 Jun, 2020 1 commit
  10. 15 Jun, 2020 1 commit
  11. 07 Jun, 2020 2 commits
  12. 30 May, 2020 4 commits
  13. 27 Feb, 2020 1 commit
  14. 14 Nov, 2019 1 commit
  15. 05 Aug, 2019 1 commit
    • blk-mq: introduce blk_mq_tagset_wait_completed_request() · f9934a80
      Committed by Ming Lei
      blk-mq may schedule the queue's complete function to run on a remote
      CPU via IPI, but it doesn't provide any way to synchronize with the
      request's complete fn. The current queue freeze interface can't provide
      the synchronization because aborted requests stay in the blk-mq queues
      during EH.
      
      In some drivers' EH (such as NVMe), a hardware queue's resources may be
      freed and re-allocated. If a completed request's complete fn finally
      runs after the hardware queue's resources are released, a kernel crash
      is triggered.
      
      Prepare for fixing this kind of issue by introducing
      blk_mq_tagset_wait_completed_request().
      
      Cc: Max Gurtovoy <maxg@mellanox.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  16. 03 Jul, 2019 1 commit
  17. 01 May, 2019 1 commit
  18. 01 Feb, 2019 1 commit
  19. 01 Dec, 2018 1 commit
    • sbitmap: optimize wakeup check · 5d2ee712
      Committed by Jens Axboe
      Even if we have no waiters on any of the sbitmap_queue wait states, we
      still have to loop over every entry to check. We do this for every IO,
      so the cost adds up.
      
      Shift a bit of the cost to the slow path, when we actually have waiters.
      Wrap prepare_to_wait_exclusive() and finish_wait(), so we can maintain
      an internal count of how many are currently active. Then we can simply
      check this count in sbq_wake_ptr() and not have to loop if we don't
      have any sleepers.
      
      Convert the two users of sbitmap with waiting, blk-mq-tag and iSCSI.
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
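The idea of keeping a count of active waiters so that the hot path can skip the scan can be sketched in user space as follows. All names here (`model_sbq`, `model_prepare_to_wait()`, `model_finish_wait()`, `model_wake_ptr()`) are hypothetical stand-ins, the wait queues are reduced to plain counters, and the `scans` field exists only to make the saved work observable.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_WAIT_QUEUES 8

/* Hypothetical model of a queue with several wait states. */
struct model_sbq {
    atomic_int ws_active;           /* how many waiters are currently armed */
    int waiters[NR_WAIT_QUEUES];    /* per-wait-state waiter counts */
    int scans;                      /* wait states inspected so far */
};

/* Wrapper around "start waiting": bump the global count first. */
static void model_prepare_to_wait(struct model_sbq *q, int idx)
{
    assert(idx >= 0 && idx < NR_WAIT_QUEUES);
    atomic_fetch_add(&q->ws_active, 1);
    q->waiters[idx]++;
}

/* Wrapper around "stop waiting": drop the global count last. */
static void model_finish_wait(struct model_sbq *q, int idx)
{
    assert(idx >= 0 && idx < NR_WAIT_QUEUES);
    q->waiters[idx]--;
    atomic_fetch_sub(&q->ws_active, 1);
}

/* Return the index of a wait state with sleepers, or -1 if none. */
static int model_wake_ptr(struct model_sbq *q)
{
    /* Fast path: no armed waiters anywhere, so do not scan at all. */
    if (atomic_load(&q->ws_active) == 0)
        return -1;

    for (int i = 0; i < NR_WAIT_QUEUES; i++) {
        q->scans++;
        if (q->waiters[i] > 0)
            return i;
    }
    return -1;
}
```

In this model, calling `model_wake_ptr()` on an idle queue leaves `scans` at zero, and the loop only runs once a waiter has been registered through the wrapper.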
  20. 09 Nov, 2018 2 commits
  21. 08 Nov, 2018 3 commits
  22. 26 Sep, 2018 1 commit
  23. 22 Sep, 2018 1 commit
  24. 21 Aug, 2018 1 commit
    • blk-mq: sync the update nr_hw_queues with blk_mq_queue_tag_busy_iter · f5bbbbe4
      Committed by Jianchao Wang
      For blk-mq, part_in_flight/rw invokes blk_mq_in_flight/rw to account
      for in-flight requests. It accesses queue_hw_ctx and nr_hw_queues
      without any protection. When an nr_hw_queues update and
      blk_mq_in_flight/rw run concurrently, a panic occurs.
      
      Before nr_hw_queues is updated, the queue is frozen, so we can use
      q_usage_counter to avoid the race. percpu_ref_is_zero is used here
      so that we do not miss any in-flight request. The accesses to
      nr_hw_queues and queue_hw_ctx in blk_mq_queue_tag_busy_iter are
      under an RCU critical section, so __blk_mq_update_nr_hw_queues can use
      synchronize_rcu to ensure that the zeroed q_usage_counter is globally
      visible.
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  25. 09 Aug, 2018 1 commit
    • blk-mq: count the hctx as active before allocating tag · d263ed99
      Committed by Jianchao Wang
      Currently, we count the hctx as active only after a driver tag has
      been allocated successfully. If a previously inactive hctx tries to
      get a tag for the first time, it may fail and need to wait. However,
      because the tags' ->active_queues count is stale, the other shared-tag
      users are still able to occupy all driver tags while someone is waiting
      for a tag. Consequently, even if the previously inactive hctx is woken
      up, it still may not be able to get a tag and could be starved.
      
      To fix it, count the hctx as active before trying to allocate a driver
      tag; then, while it is waiting for a tag, the other shared-tag users
      will reserve budget for it.
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
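The starvation scenario and the effect of reordering can be played through in a small single-threaded model. This is a sketch under assumptions: `model_pool`, `model_queue`, `get_tag()`, `put_tag()` and `waiter_gets_tag()` are hypothetical names, the fair share is assumed to be the depth divided by the number of active queues (rounded up), and "waiting" is modelled simply as a failed attempt that is retried later.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical shared tag pool and per-queue state. */
struct model_pool {
    unsigned int depth;
    unsigned int in_use;
    unsigned int active_queues;
};

struct model_queue {
    bool active;
    unsigned int in_use;
};

static void mark_active(struct model_pool *p, struct model_queue *q)
{
    if (!q->active) {
        q->active = true;
        p->active_queues++;
    }
}

/* Assumed fair share: depth split among active queues, rounded up. */
static unsigned int fair_share(const struct model_pool *p)
{
    unsigned int users = p->active_queues ? p->active_queues : 1;

    return (p->depth + users - 1) / users;
}

/* Try to take one tag; active_first selects the new ordering. */
static bool get_tag(struct model_pool *p, struct model_queue *q,
                    bool active_first)
{
    bool over_share;

    if (active_first)
        mark_active(p, q);      /* new: count the queue before trying */

    over_share = q->active && q->in_use >= fair_share(p);
    if (over_share || p->in_use >= p->depth)
        return false;

    p->in_use++;
    q->in_use++;
    mark_active(p, q);          /* old: counted only after success */
    return true;
}

static void put_tag(struct model_pool *p, struct model_queue *q)
{
    assert(q->in_use > 0 && p->in_use > 0);
    p->in_use--;
    q->in_use--;
}

/* Queue a drains the pool, b arrives late; does b ever get a tag? */
static bool waiter_gets_tag(bool active_first)
{
    struct model_pool pool = { .depth = 8 };
    struct model_queue a = { 0 }, b = { 0 };

    while (get_tag(&pool, &a, active_first))
        ;                               /* a takes every tag */
    get_tag(&pool, &b, active_first);   /* b fails and has to wait */
    put_tag(&pool, &a);                 /* a completes one request ... */
    get_tag(&pool, &a, active_first);   /* ... and immediately asks again */
    return get_tag(&pool, &b, active_first);
}
```

With the old ordering the late queue is never counted, so the first queue keeps its full share and takes the freed tag back; with the new ordering the first queue is already over its halved share and the freed tag goes to the waiter.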
  26. 03 Aug, 2018 1 commit
    • blk-mq: fix blk_mq_tagset_busy_iter · 2d5ba0e2
      Committed by Ming Lei
      Commit d250bf4e ("blk-mq: only iterate over inflight requests
      in blk_mq_tagset_busy_iter") replaced 'blk_mq_request_started(req)'
      with 'blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT'. This is wrong and causes
      many test systems to hang during boot.
      
      Fix the issue by using blk_mq_request_started(req) inside bt_tags_iter().
      
      Fixes: d250bf4e ("blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter")
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Matt Hart <matthew.hart@linaro.org>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Cc: John Garry <john.garry@huawei.com>
      Cc: Hannes Reinecke <hare@suse.com>,
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>,
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Reported-by: Mark Brown <broonie@kernel.org>
      Reported-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
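Why the two predicates are not interchangeable can be seen in a small model of the request states. This is a sketch, not the kernel source: the `MODEL_RQ_*` enum and the helper names are hypothetical, and it assumes three states (idle, in flight, complete) with "started" meaning "anything other than idle".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirror of the request life cycle. */
enum model_rq_state {
    MODEL_RQ_IDLE,
    MODEL_RQ_IN_FLIGHT,
    MODEL_RQ_COMPLETE,
};

/* "Started" covers every state except idle. */
static bool model_request_started(enum model_rq_state s)
{
    return s != MODEL_RQ_IDLE;
}

/* Count how many requests an iterator would visit under each predicate. */
static size_t count_visited(const enum model_rq_state *states, size_t n,
                            bool use_started_check)
{
    size_t visited = 0;

    assert(states != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool match = use_started_check
            ? model_request_started(states[i])
            : states[i] == MODEL_RQ_IN_FLIGHT;

        if (match)
            visited++;
    }
    return visited;
}
```

A request that has been marked complete but not yet finished is visited by the "started" predicate and skipped by the strict in-flight comparison, so a caller relying on the iterator to see every outstanding request would miss it.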