1. 12 Jul 2019, 1 commit
  2. 10 Jul 2019, 1 commit
    • block: Fix potential overflow in blk_report_zones() · 113ab72e
      Committed by Damien Le Moal
      For large values of the number of zones reported and/or large zone
      sizes, the sector increment calculated with
      
      blk_queue_zone_sectors(q) * n
      
      in the blk_report_zones() loop can overflow the unsigned int type used
      for the calculation, since both "n" and the blk_queue_zone_sectors()
      return value are unsigned int. E.g. for a device with 256 MB zones
      (524288 sectors), the overflow happens with 8192 or more zones reported.
      
      Changing the return type of blk_queue_zone_sectors() to sector_t fixes
      this problem and avoids the overflow for all other callers of this
      helper too. The same change is also applied to the bdev_zone_sectors()
      helper.
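      
      For illustration, a minimal userspace sketch of the arithmetic (types
      are simplified; sector_t here stands in for the kernel's 64-bit sector
      type, so this is a model of the bug, not the kernel code itself):
      
       #include <stdio.h>
       #include <stdint.h>
      
       typedef uint64_t sector_t;
      
       int main(void)
       {
               unsigned int zone_sectors = 524288; /* 256 MB zones */
               unsigned int n = 8192;              /* zones reported */
      
               /* Old behaviour: both operands are unsigned int, so the
                * product wraps at 2^32 (524288 * 8192 == 2^32 -> 0). */
               unsigned int bad = zone_sectors * n;
      
               /* Fixed behaviour: widening one operand to sector_t makes
                * the multiplication happen in 64 bits. */
               sector_t good = (sector_t)zone_sectors * n;
      
               printf("32-bit product: %u\n", bad);
               printf("64-bit product: %llu\n", (unsigned long long)good);
               return 0;
       }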
      
      Fixes: e76239a3 ("block: add a report_zones method")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      113ab72e
  3. 21 Jun 2019, 3 commits
  4. 20 Jun 2019, 2 commits
  5. 24 May 2019, 1 commit
    • blk-mq: fix hang caused by freeze/unfreeze sequence · 7996a8b5
      Committed by Bob Liu
      The following describes a hang in blk_mq_freeze_queue_wait(). The hang
      happens when one task attempts to freeze a queue while another task is
      unfreezing it.
      
      The root cause is that nothing enforces the ordering of
      percpu_ref_kill() and percpu_ref_resurrect(), so the two calls can end
      up swapped:
      
       CPU#0                         CPU#1
       ----------------              -----------------
       q1 = blk_mq_init_queue(shared_tags)
      
                                      q2 = blk_mq_init_queue(shared_tags):
                                        blk_mq_add_queue_tag_set(shared_tags):
                                          blk_mq_update_tag_set_depth(shared_tags):
                                             list_for_each_entry()
                                            blk_mq_freeze_queue(q1)
                                             > percpu_ref_kill()
                                             > blk_mq_freeze_queue_wait()
      
       blk_cleanup_queue(q1)
        blk_mq_freeze_queue(q1)
         > percpu_ref_kill()
                       ^^^^^^ freeze_depth can't guarantee the order
      
                                            blk_mq_unfreeze_queue()
                                              > percpu_ref_resurrect()
      
         > blk_mq_freeze_queue_wait()
                       ^^^^^^ Hang here!!!!
      
      This wrong sequence triggers a kernel warning:
      percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release!
      WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0
      
      But the most unpleasant effect is a hang of blk_mq_freeze_queue_wait(),
      which waits for q_usage_counter to reach zero; that never happens,
      because the percpu-ref was resurrected (instead of being killed) and
      stays in the PERCPU state forever.
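      
      The fix serializes the freeze depth update and the kill/resurrect call
      behind a per-queue mutex, so the two can no longer be reordered. A
      simplified sketch of that idea (condensed from the patch; details such
      as error handling are omitted):
      
       void blk_freeze_queue_start(struct request_queue *q)
       {
               mutex_lock(&q->mq_freeze_lock);
               if (++q->mq_freeze_depth == 1) {
                       percpu_ref_kill(&q->q_usage_counter);
                       mutex_unlock(&q->mq_freeze_lock);
                       if (queue_is_mq(q))
                               blk_mq_run_hw_queues(q, false);
               } else {
                       mutex_unlock(&q->mq_freeze_lock);
               }
       }
      
       void blk_mq_unfreeze_queue(struct request_queue *q)
       {
               mutex_lock(&q->mq_freeze_lock);
               q->mq_freeze_depth--;
               WARN_ON_ONCE(q->mq_freeze_depth < 0);
               if (!q->mq_freeze_depth) {
                       /* Last unfreeze: switch back to percpu mode and
                        * wake anyone stuck in blk_mq_freeze_queue_wait(). */
                       percpu_ref_resurrect(&q->q_usage_counter);
                       wake_up_all(&q->mq_freeze_wq);
               }
               mutex_unlock(&q->mq_freeze_lock);
       }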
      
      How to reproduce:
       - "insmod null_blk.ko shared_tags=1 nr_devices=0 queue_mode=2"
       - cpu0: run "python Script.py 0", pinned to cpu0 with taskset
       - cpu1: run "python Script.py 1", pinned to cpu1 with taskset
      
       Script.py:
       ------
       #!/usr/bin/python3
       # Toggle the nullb device's configfs power attribute in a tight
       # loop; sys.argv[1] selects which device this instance drives.
      
       import os
       import sys
      
       on = "echo 1 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
       off = "echo 0 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
      
       while True:
           os.system(on)
           os.system(off)
      ------
      
      This bug was first reported and fixed by Roman; previous discussion:
      [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
      [2] Message id: 1443563240-29306-6-git-send-email-tj@kernel.org
      [3] https://patchwork.kernel.org/patch/9268199/
      
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      7996a8b5
  6. 04 May 2019, 1 commit
    • blk-mq: always free hctx after request queue is freed · 2f8f1336
      Committed by Ming Lei
      In the normal queue cleanup path, hctxs are released after the request
      queue is freed; see blk_mq_release().
      
      However, in __blk_mq_update_nr_hw_queues() an hctx may be freed earlier
      because the number of hw queues is shrinking. This easily leads to
      use-after-free: the implicit rule is that it is safe to call almost any
      block layer API while the request queue is alive, so an API may look up
      an hctx just before blk_mq_update_nr_hw_queues() frees it, and the
      use-after-free is triggered.
      
      Fix this issue by always freeing hctxs after the request queue is
      released. If some hctxs are removed in blk_mq_update_nr_hw_queues(),
      put them on a per-queue list and try to reuse them later if the NUMA
      node matches, as sketched below.
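      
      A simplified sketch of the reuse scheme (the list and field names
      follow the approach described above but may not match the final code
      exactly):
      
       /* Reuse a shrunk-away hctx if one exists for this NUMA node,
        * otherwise allocate a fresh one on that node. */
       static struct blk_mq_hw_ctx *
       blk_mq_alloc_or_reuse_hctx(struct request_queue *q, int node)
       {
               struct blk_mq_hw_ctx *hctx;
      
               list_for_each_entry(hctx, &q->unused_hctx_list,
                                   unused_hctx_node) {
                       if (hctx->numa_node == node) {
                               list_del_init(&hctx->unused_hctx_node);
                               return hctx;
                       }
               }
      
               return kzalloc_node(sizeof(*hctx), GFP_KERNEL, node);
       }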
      
      Cc: Dongli Zhang <dongli.zhang@oracle.com>
      Cc: James Smart <james.smart@broadcom.com>
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: linux-scsi@vger.kernel.org
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: James E. J. Bottomley <jejb@linux.vnet.ibm.com>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Tested-by: James Smart <james.smart@broadcom.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      2f8f1336
  7. 01 May 2019, 1 commit
  8. 20 Apr 2019, 1 commit
  9. 05 Apr 2019, 4 commits
  10. 21 Mar 2019, 1 commit
  11. 15 Feb 2019, 2 commits
  12. 10 Feb 2019, 2 commits
    • block: queue flag cleanup · eca7abf3
      Committed by Jens Axboe
      We have QUEUE_FLAG_DEFAULT defined, but it's not used anymore since
      the legacy IO stack is gone. Kill it.
      
      Sanitize the queue flags in general: the definitions use spaces (for
      some reason), and the flag number space is pretty sparse. With the
      flags renumbered densely, we can see more clearly how many bits remain
      available; see the hypothetical example below.
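      
      A hypothetical before/after of this kind of renumbering (the flag
      names and numbers are examples only, not the set from the patch):
      
       /* Before: gaps in the bit numbers hide how many are free. */
       #define QUEUE_FLAG_STOPPED    1
       #define QUEUE_FLAG_DYING      2
       #define QUEUE_FLAG_NOMERGES   5
       #define QUEUE_FLAG_SAME_COMP  9
      
       /* After: densely renumbered, the next free bit is obvious. */
       #define QUEUE_FLAG_STOPPED    0
       #define QUEUE_FLAG_DYING      1
       #define QUEUE_FLAG_NOMERGES   2
       #define QUEUE_FLAG_SAME_COMP  3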
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      eca7abf3
    • block: kill QUEUE_FLAG_FLUSH_NQ · d11a3998
      Committed by Jens Axboe
      We have various helpers for setting/clearing this flag, and also a
      helper to check whether the queue supports queueable flushes. But
      nobody uses them anymore; kill it with fire.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d11a3998
  13. 06 Feb 2019, 2 commits
  14. 19 Dec 2018, 1 commit
  15. 17 Dec 2018, 1 commit
    • blk-mq-debugfs: support rq_qos · cc56694f
      Committed by Ming Lei
      blk-mq-debugfs has proved very helpful for debugging tough issues such
      as IO hangs.
      
      We have seen blk-wbt related IO hangs several times; there is even a
      Red Hat BZ report that is still unsolved. So this patch adds debugfs
      support for rq_qos.
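      
      A sketch of what per-rq_qos debugfs registration can look like (the
      directory layout, the debugfs_attrs hook on rq_qos_ops, and the
      debugfs_create_files() helper are presented here as assumptions for
      illustration, not as the exact in-tree code):
      
       void blk_mq_debugfs_register_rqos(struct rq_qos *rqos)
       {
               struct request_queue *q = rqos->q;
               const char *dir_name = rq_qos_id_to_name(rqos->id);
      
               /* One "rqos" directory per queue, created lazily. */
               if (!q->rqos_debugfs_dir)
                       q->rqos_debugfs_dir = debugfs_create_dir("rqos",
                                                       q->debugfs_dir);
      
               /* One subdirectory per policy (e.g. "wbt"), filled with
                * the attributes the policy chose to expose. */
               rqos->debugfs_dir = debugfs_create_dir(dir_name,
                                                      q->rqos_debugfs_dir);
               debugfs_create_files(rqos->debugfs_dir, rqos,
                                    rqos->ops->debugfs_attrs);
       }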
      
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      cc56694f
  16. 08 Dec 2018, 1 commit
  17. 05 Dec 2018, 1 commit
  18. 30 Nov 2018, 1 commit
  19. 27 Nov 2018, 1 commit
  20. 26 Nov 2018, 2 commits
  21. 21 Nov 2018, 1 commit
  22. 19 Nov 2018, 1 commit
    • block: have ->poll_fn() return number of entries polled · 85f4d4b6
      Committed by Jens Axboe
      We currently only really support sync poll, i.e. poll with one IO in
      flight. This prepares us for supporting async poll.
      
      Note that the returned value isn't necessarily 100% accurate. If poll
      races with IRQ completion, we assume that the fact that the task is
      now runnable means we found at least one entry. In reality it could be
      more than one, or even none. This is fine; the caller just needs to
      take it into account.
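      
      A sketch of how a caller might consume the new return value
      (illustrative only; io_completed() is a hypothetical predicate
      standing in for the caller's real completion check):
      
       static int poll_for_completion(struct request_queue *q,
                                      blk_qc_t cookie)
       {
               int found;
      
               while (!io_completed(cookie)) {
                       found = q->poll_fn(q, cookie);
                       if (found < 0)
                               return found;   /* polling not possible */
                       /* found may over- or under-count when racing with
                        * IRQ completion; treat it only as a hint and
                        * re-check the real completion condition above. */
               }
               return 0;
       }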
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      85f4d4b6
  23. 17 Nov 2018, 1 commit
  24. 16 Nov 2018, 7 commits