1. 09 Mar, 2022 1 commit
    • block: add a switch for precise iostat accounting · 69e55430
      Zhang Wensheng committed
      hulk inclusion
      category: bugfix
      bugzilla: 39265, https://gitee.com/openeuler/kernel/issues/I4WC06
      CVE: NA
      
      -----------------------------------------------
      
      When the inflight IOs are slow and no new IOs are issued, we expect
      iostat to reveal the IO hang. However, after
      commit 9c6dea45 ("block: delete part_round_stats and switch to less
      precise counting"), io_tick and time_in_queue are not updated until
      the IO completes, so the avgqu-sz and %util columns of iostat stay at zero.
      
      To fix it, we could fall back to the implementation before commit
      9c6dea45, but that may cause a performance regression on NVMe devices
      or bio-based devices (due to the overhead of the inflight calculation),
      so add a switch to control whether or not to use precise iostat
      accounting. It can be enabled by adding "precise_iostat=1" to the kernel
      boot cmdline. When precise accounting is enabled, io_tick and time_in_queue
      will be updated when accessing /proc/diskstats and
      /sys/block/sdX/sdXN/stat.
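
The effect on iostat can be illustrated with a minimal Python sketch. It assumes the usual /proc/diskstats layout (in-flight count, io_ticks and weighted time in queue as the last three of the classic per-device counters) and the usual way these two columns are derived from counter deltas. The sample lines and the helper names are made up for illustration; they are not taken from the patch.

```python
def parse_diskstats_line(line):
    """Return (name, in_flight, io_ticks_ms, time_in_queue_ms) for one line."""
    fields = line.split()
    # fields[0:3] are major, minor, name; the counters follow.
    return fields[2], int(fields[11]), int(fields[12]), int(fields[13])


def iostat_columns(prev_line, curr_line, interval_ms):
    """Derive %util and the average queue size from two samples."""
    _, _, prev_ticks, prev_tiq = parse_diskstats_line(prev_line)
    name, in_flight, ticks, tiq = parse_diskstats_line(curr_line)
    util_pct = 100.0 * (ticks - prev_ticks) / interval_ms
    avg_queue = (tiq - prev_tiq) / interval_ms
    return name, in_flight, util_pct, avg_queue


# Made-up samples taken one second apart: 4 IOs stay in flight, none complete.
before = "8 0 sda 100 0 800 50 200 0 1600 90 4 120 140"
# Counters untouched until IO completion (less precise counting).
stale = "8 0 sda 100 0 800 50 200 0 1600 90 4 120 140"
# Counters refreshed when the file is read (what precise accounting aims for).
precise = "8 0 sda 100 0 800 50 200 0 1600 90 4 1120 4140"

print(iostat_columns(before, stale, 1000))
print(iostat_columns(before, precise, 1000))
```

With the stale sample both derived columns are zero even though four IOs are in flight, which is the symptom described above.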
      
      Fixes: 9c6dea45 ("block: delete part_round_stats and switch to less precise counting")
      Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  2. 17 Jan, 2022 1 commit
  3. 09 Dec, 2021 1 commit
  4. 30 Aug, 2021 1 commit
  5. 16 Aug, 2021 2 commits
    • blk-mq: fix kabi broken by "blk-mq: fix hang caused by freeze/unfreeze sequence" · 2fd10d61
      Yu Kuai committed
      hulk inclusion
      category: bugfix
      bugzilla: 173119
      CVE: NA
      
      -----------------------------------------------
      
      Add struct request_queue_wrapper to avoid breaking kABI.
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • blk-mq: fix hang caused by freeze/unfreeze sequence · cbd9fe6e
      Bob Liu committed
      mainline inclusion
      from mainline-v5.2-rc2
      commit 7996a8b5
      category: bugfix
      bugzilla: 173119
      CVE: NA
      
      -----------------------------------------------
      
      The following is a description of a hang in blk_mq_freeze_queue_wait().
      The hang happens on attempt to freeze a queue while another task does
      queue unfreeze.
      
      The root cause is an incorrect sequence of percpu_ref_resurrect() and
      percpu_ref_kill() and as a result those two can be swapped:
      
       CPU#0                         CPU#1
       ----------------              -----------------
       q1 = blk_mq_init_queue(shared_tags)
      
                                      q2 = blk_mq_init_queue(shared_tags):
                                        blk_mq_add_queue_tag_set(shared_tags):
                                          blk_mq_update_tag_set_depth(shared_tags):
                                           list_for_each_entry()
                                            blk_mq_freeze_queue(q1)
                                             > percpu_ref_kill()
                                             > blk_mq_freeze_queue_wait()
      
       blk_cleanup_queue(q1)
        blk_mq_freeze_queue(q1)
         > percpu_ref_kill()
                       ^^^^^^ freeze_depth can't guarantee the order
      
                                            blk_mq_unfreeze_queue()
                                              > percpu_ref_resurrect()
      
         > blk_mq_freeze_queue_wait()
                       ^^^^^^ Hang here!!!!
      
      This wrong sequence raises a kernel warning:
      percpu_ref_kill_and_confirm called more than once on blk_queue_usage_counter_release!
      WARNING: CPU: 0 PID: 11854 at lib/percpu-refcount.c:336 percpu_ref_kill_and_confirm+0x99/0xb0
      
      But the most unpleasant effect is a hang in blk_mq_freeze_queue_wait(),
      which waits for q_usage_counter to drop to zero. That never happens,
      because the percpu-ref was reinited (instead of being killed) and stays in
      PERCPU state forever.
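
The swapped kill/resurrect order can be replayed with a small single-threaded Python toy model. This is only a sketch: ToyQueue and its methods are hypothetical stand-ins, not kernel API, and the interleaving is one order consistent with the diagram above, with each freeze and unfreeze split into its depth update and its ref operation.

```python
class ToyQueue:
    """Toy model of a queue's freeze state; none of this is kernel API."""

    def __init__(self):
        self.freeze_depth = 0
        self.ref_dead = False   # True once the usage ref has been killed
        self.warnings = []

    def freeze_inc(self):
        self.freeze_depth += 1
        return self.freeze_depth == 1   # True: caller must kill the ref

    def ref_kill(self):
        if self.ref_dead:
            self.warnings.append("kill called more than once")
        self.ref_dead = True

    def unfreeze_dec(self):
        self.freeze_depth -= 1
        return self.freeze_depth == 0   # True: caller must resurrect the ref

    def ref_resurrect(self):
        self.ref_dead = False

    def freeze_wait_can_finish(self):
        # The waiter needs the ref to stay killed so the count can drain to zero.
        return self.ref_dead


def racy_order():
    q = ToyQueue()
    if q.freeze_inc():                  # CPU#1: freeze
        q.ref_kill()
    must_resurrect = q.unfreeze_dec()   # CPU#1: unfreeze, first half
    if q.freeze_inc():                  # CPU#0: freeze slips in between
        q.ref_kill()
    if must_resurrect:                  # CPU#1: unfreeze, second half
        q.ref_resurrect()
    return q


def serialized_order():
    q = ToyQueue()
    if q.freeze_inc():                  # CPU#1: freeze
        q.ref_kill()
    if q.unfreeze_dec():                # CPU#1: unfreeze, both halves together
        q.ref_resurrect()
    if q.freeze_inc():                  # CPU#0: freeze
        q.ref_kill()
    return q


racy, ok = racy_order(), serialized_order()
print(racy.warnings, racy.freeze_wait_can_finish())
print(ok.warnings, ok.freeze_wait_can_finish())
```

In the racy order the queue ends up with a freeze depth of one but a live ref, so the waiter in the model can never finish; when each depth update and its ref operation happen as one unit, the second freeze kills the ref cleanly.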
      
      How to reproduce:
       - "insmod null_blk.ko shared_tags=1 nr_devices=0 queue_mode=2"
       - cpu0: python Script.py 0; taskset the corresponding process running on cpu0
       - cpu1: python Script.py 1; taskset the corresponding process running on cpu1
      
       Script.py:
       ------
       #!/usr/bin/python3
      
      import os
      import sys
      
      while True:
          on = "echo 1 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
          off = "echo 0 > /sys/kernel/config/nullb/%s/power" % sys.argv[1]
          os.system(on)
          os.system(off)
      ------
      
      This bug was first reported and fixed by Roman, previous discussion:
      [1] Message id: 1443287365-4244-7-git-send-email-akinobu.mita@gmail.com
      [2] Message id: 1443563240-29306-6-git-send-email-tj@kernel.org
      [3] https://patchwork.kernel.org/patch/9268199/
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Bart Van Assche <bvanassche@acm.org>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  6. 22 Feb, 2021 1 commit
    • scsi: do quiesce for enclosure driver · c3adeb60
      Yufen Yu committed
      hulk inclusion
      category: bugfix
      bugzilla: 46860
      
      --------------------------------
      
      Some drivers (such as the SCSI enclosure driver) do not call
      blk_register_queue() to initialize the request_queue, and we rely on
      the driver itself to deal with the race when cleaning up the queue.
      
      But some self-developed drivers cannot handle that race. To avoid a
      NULL pointer dereference like the following, we quiesce the queue in the kernel.
      
      [67760.308034] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
      ...
      [67760.308069] pc : blk_mq_do_dispatch_sched+0x94/0x130
      [67760.308072] lr : blk_mq_sched_dispatch_requests+0x128/0x1f0
      [67760.308072] sp : ffff0000b2bb3ca0
      [67760.308073] x29: ffff0000b2bb3ca0 x28: 0000000000000000
      [67760.308075] x27: 0000000000000000 x26: ffff00008128a000
      [67760.308076] x25: ffff8042b13e9700 x24: 0000000000000000
      [67760.308077] x23: ffff0000b2bb3cf8 x22: ffff8042b2976808
      [67760.308078] x21: ffff0000b2bb3d58 x20: ffff00008128a000
      [67760.308079] x19: ffff8042b2976800 x18: 0000000000000000
      [67760.308080] x17: 0000000000000000 x16: 0000000000000000
      [67760.308081] x15: 0000000000000000 x14: 0000000000000000
      [67760.308082] x13: 0000000000000000 x12: 0000000000000000
      [67760.308084] x11: ffff800040801004 x10: ffff80004080100c
      [67760.308085] x9 : 0000000000000060 x8 : ffff809e58f7e500
      [67760.308087] x7 : 0000000000000000 x6 : 00000000ffffffff
      [67760.308088] x5 : ffff000080ab7550 x4 : ffff80418e84ec80
      [67760.308089] x3 : ffff0000815d7f10 x2 : 94e133dc71839c00
      [67760.308090] x1 : 0000000000000000 x0 : ffff00008128a748
      [67760.308091] Call trace:
      [67760.308093]  blk_mq_do_dispatch_sched+0x94/0x130
      [67760.308095]  blk_mq_sched_dispatch_requests+0x128/0x1f0
      [67760.308096]  __blk_mq_run_hw_queue+0x98/0x138
      [67760.308097]  blk_mq_run_work_fn+0x28/0x38
      Signed-off-by: Yufen Yu <yuyufen@huawei.com>
      Reviewed-by: Jason Yan <yanaijie@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
  7. 12 Mar, 2020 1 commit
  8. 27 Dec, 2019 4 commits
  9. 12 Sep, 2018 1 commit
  10. 09 Aug, 2018 1 commit
    • block: Remove two superfluous #include directives · b1f4267c
      Bart Van Assche committed
      Commit 12f5b931 ("blk-mq: Remove generation seqeunce") removed the
      only seqcount_t and u64_stats_sync instances from <linux/blkdev.h> but
      did not remove the corresponding #include directives. Since these
      include directives are no longer needed, remove them.
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Ming Lei <ming.lei@redhat.com>
      Cc: Jianchao Wang <jianchao.w.wang@oracle.com>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  11. 27 Jul, 2018 1 commit
  12. 18 Jul, 2018 1 commit
    • block: make bdev_ops->rw_page() take a REQ_OP instead of bool · 3f289dcb
      Tejun Heo committed
      c11f0c0b ("block/mm: make bdev_ops->rw_page() take a bool for
      read/write") replaced @op with boolean @is_write, which limited the
      amount of information going into ->rw_page() and more importantly
      page_endio(), which removed the need to expose block internals to mm.
      
      Unfortunately, we want to track discards separately and @is_write
      isn't enough information.  This patch updates bdev_ops->rw_page() to
      take REQ_OP instead but leaves page_endio() to take bool @is_write.
      This allows the block part of operations to have enough information
      while not leaking it to mm.
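
The split between the two interfaces can be sketched in Python. Everything here is illustrative: ReqOp, start_io_acct, rw_page, discard and page_endio are hypothetical stand-ins named after the ideas in the message, and the enum values only follow the convention that the low bit marks an operation that writes data.

```python
from collections import Counter
from enum import IntEnum


class ReqOp(IntEnum):
    # Illustrative values; the low bit marks operations that write data.
    READ = 0
    WRITE = 1
    DISCARD = 3


def op_is_write(op):
    return bool(op & 1)


stats = Counter()     # block-side accounting, keyed by the full operation
endio_calls = []      # what the mm-facing side gets to see


def page_endio(is_write):
    # The mm-facing helper only ever receives a bool.
    endio_calls.append(is_write)


def start_io_acct(op):
    # Hypothetical accounting helper: the full op keeps discards separate.
    stats[op.name] += 1


def rw_page(op):
    start_io_acct(op)
    page_endio(op_is_write(op))


def discard():
    start_io_acct(ReqOp.DISCARD)


rw_page(ReqOp.READ)
rw_page(ReqOp.WRITE)
discard()
print(dict(stats))
print(endio_calls)
```

Keyed by a bool, the discard would have landed in the same bucket as the write; keyed by the operation, it gets its own counter while page_endio() still sees only True or False.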
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mike Christie <mchristi@redhat.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  13. 13 Jul, 2018 1 commit
  14. 09 Jul, 2018 5 commits
  15. 27 Jun, 2018 1 commit
  16. 15 Jun, 2018 1 commit
  17. 14 Jun, 2018 1 commit
  18. 31 May, 2018 1 commit
  19. 29 May, 2018 5 commits
  20. 14 May, 2018 1 commit
  21. 09 May, 2018 3 commits
  22. 19 Apr, 2018 1 commit
    • scsi: sd_zbc: Avoid that resetting a zone fails sporadically · ccce20fc
      Bart Van Assche committed
      Since SCSI scanning occurs asynchronously, since sd_revalidate_disk() is
      called from sd_probe_async() and since sd_revalidate_disk() calls
      sd_zbc_read_zones() it can happen that sd_zbc_read_zones() is called
      concurrently with blkdev_report_zones() and/or blkdev_reset_zones().  That can
      cause these functions to fail with -EIO because sd_zbc_read_zones() e.g. sets
      q->nr_zones to zero before restoring it to the actual value, even if no drive
      characteristics have changed.  Avoid that this can happen by making the
      following changes:
      
      - Protect the code that updates zone information with blk_queue_enter()
        and blk_queue_exit().
      - Modify sd_zbc_setup_seq_zones_bitmap() and sd_zbc_setup() such that
        these functions do not modify struct scsi_disk before all zone
        information has been obtained.
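
The second change follows a "gather first, then commit" pattern, sketched below in Python. Disk, revalidate and the probe callbacks are hypothetical names used only for this illustration, and the sketch does not model the blk_queue_enter()/blk_queue_exit() protection from the first change.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    nr_zones: int
    zone_blocks: int


def revalidate(disk, probe):
    """Update disk from probe(), which returns (nr_zones, zone_blocks) or raises."""
    # Gather everything into locals first; this step may fail.
    nr_zones, zone_blocks = probe()
    # Only now touch the shared state, so no intermediate zero is ever stored.
    disk.nr_zones = nr_zones
    disk.zone_blocks = zone_blocks


def failing_probe():
    raise OSError("simulated I/O error")


disk = Disk(nr_zones=64, zone_blocks=524288)
try:
    revalidate(disk, failing_probe)
except OSError:
    pass
print(disk)   # unchanged after the failed probe

revalidate(disk, lambda: (128, 524288))
print(disk)   # updated only once all values were available
```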
      
      Note: since commit 055f6e18 ("block: Make q_usage_counter also track
      legacy requests"; kernel v4.15) the request queue freezing mechanism also
      affects legacy request queues.
      
      Fixes: 89d94756 ("sd: Implement support for ZBC devices")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Damien Le Moal <damien.lemoal@wdc.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.com>
      Cc: stable@vger.kernel.org # v4.16
      Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
  23. 18 Apr, 2018 1 commit
  24. 18 Mar, 2018 1 commit
    • block: Move SECTOR_SIZE and SECTOR_SHIFT definitions into <linux/blkdev.h> · 233bde21
      Bart Van Assche committed
      It happens often while I'm preparing a patch for a block driver that
      I'm wondering: is a definition of SECTOR_SIZE and/or SECTOR_SHIFT
      available for this driver? Do I have to introduce definitions of these
      constants before I can use these constants? To avoid this confusion,
      move the existing definitions of SECTOR_SIZE and SECTOR_SHIFT into the
      <linux/blkdev.h> header file such that these become available for all
      block drivers. Make the SECTOR_SIZE definition in the uapi msdos_fs.h
      header file conditional to avoid that including that header file after
      <linux/blkdev.h> causes the compiler to complain about a SECTOR_SIZE
      redefinition.
      
      Note: the SECTOR_SIZE / SECTOR_SHIFT / SECTOR_BITS definitions have
      not been removed from uapi header files nor from NAND drivers in
      which these constants are used for another purpose than converting
      block layer offsets and sizes into a number of sectors.
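
For readers unfamiliar with these constants, the conversion they serve can be sketched in Python. The values 9 and 512 match the block layer's fixed 512-byte sector unit; the two helper functions are hypothetical names for this illustration.

```python
SECTOR_SHIFT = 9
SECTOR_SIZE = 1 << SECTOR_SHIFT   # 512 bytes


def bytes_to_sectors(nbytes):
    """Convert a byte offset or size to a count of 512-byte sectors."""
    return nbytes >> SECTOR_SHIFT


def sectors_to_bytes(nsectors):
    """Convert a sector count back to bytes."""
    return nsectors << SECTOR_SHIFT


print(SECTOR_SIZE)
print(bytes_to_sectors(4096))
print(sectors_to_bytes(8))
```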
      
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  25. 09 Mar, 2018 2 commits