  1. 09 Jun 2022: 1 commit
  2. 03 Jun 2022: 1 commit
    • block: Fix potential deadlock in blk_ia_range_sysfs_show() · 41e46b3c
      Damien Le Moal committed
      When being read, a sysfs attribute is already protected against removal
      with the kobject node active reference counter. As a result, in
      blk_ia_range_sysfs_show(), there is no need to take the queue sysfs
      lock when reading the value of a range attribute. Using the queue sysfs
      lock in this function creates a potential deadlock situation with the
      disk removal, something that lockdep signals with a splat when the
      device is removed:
      
      [  760.703551]  Possible unsafe locking scenario:
      [  760.703551]
      [  760.703554]        CPU0                    CPU1
      [  760.703556]        ----                    ----
      [  760.703558]   lock(&q->sysfs_lock);
      [  760.703565]                                lock(kn->active#385);
      [  760.703573]                                lock(&q->sysfs_lock);
      [  760.703579]   lock(kn->active#385);
      [  760.703587]
      [  760.703587]  *** DEADLOCK ***
      
      Solve this by removing the mutex_lock()/mutex_unlock() calls from
      blk_ia_range_sysfs_show().
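      Lockdep reports this because two paths take the same pair of locks in
      opposite orders. The toy C model below is only an illustration, not
      lockdep's real algorithm; the lock IDs and helper names are
      hypothetical. It records "took B while holding A" edges and reports an
      ABBA inversion, which disappears once the show path stops taking
      q->sysfs_lock.

```c
#include <assert.h>   /* for the usage checks */
#include <stdbool.h>

/* Hypothetical IDs for the two locks named in the splat. */
enum lock_id { SYSFS_LOCK, KN_ACTIVE, NR_LOCKS };

/* seen[a][b] becomes true once some path took lock b while holding lock a. */
static bool seen[NR_LOCKS][NR_LOCKS];

void reset_lock_orders(void)
{
    for (int a = 0; a < NR_LOCKS; a++)
        for (int b = 0; b < NR_LOCKS; b++)
            seen[a][b] = false;
}

/*
 * Record that `next` was taken while `held` was held. Returns false when
 * the opposite order was recorded earlier, i.e. an ABBA deadlock is possible.
 */
bool record_lock_order(enum lock_id held, enum lock_id next)
{
    if (seen[next][held])
        return false;
    seen[held][next] = true;
    return true;
}

/* CPU0 in the splat: disk removal holds q->sysfs_lock, then waits on kn->active. */
bool disk_removal_path(void)
{
    return record_lock_order(SYSFS_LOCK, KN_ACTIVE);
}

/*
 * CPU1 in the splat: sysfs holds kn->active around the ->show() call.
 * Before the fix, blk_ia_range_sysfs_show() also took q->sysfs_lock.
 */
bool attr_show_path(bool show_takes_sysfs_lock)
{
    if (!show_takes_sysfs_lock)
        return true;  /* after the fix: only kn->active is held */
    return record_lock_order(KN_ACTIVE, SYSFS_LOCK);
}
```

      With the mutex taken in the show path, running both paths reports an
      inversion; without it, no conflicting edge is ever recorded.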
      
      Fixes: a2247f19 ("block: Add independent access ranges support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
      Link: https://lore.kernel.org/r/20220603021905.1441419-1-damien.lemoal@opensource.wdc.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 02 Jun 2022: 2 commits
  4. 30 May 2022: 1 commit
  5. 29 May 2022: 1 commit
  6. 28 May 2022: 5 commits
  7. 27 May 2022: 2 commits
    • block, loop: support partitions without scanning · b9684a71
      Christoph Hellwig committed
      Historically we did distinguish between a flag that suppressed partition
      scanning, and a combination of the minors variable and another flag if
      any partitions were supported.  This was generally confusing and doesn't
      make much sense, but some corner case uses of the loop driver actually
      do want to support manually added partitions on a device that does not
      actively scan for partitions.  To make things worse, the loop driver
      also wants to dynamically toggle the scanning for partitions on a live
      gendisk, but updates to disk->flags are not atomic.
      
      Introduce a new GD_SUPPRESS_PART_SCAN bit in disk->state that disables
      just scanning for partitions, and toggle that instead of GENHD_FL_NO_PART
      in the loop driver.
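      The point of moving to a state bit is that it can be flipped with an
      atomic read-modify-write on a live disk. The userspace C11 sketch below
      mimics set_bit()/clear_bit()/test_bit() with <stdatomic.h>; the struct,
      field names and bit number are hypothetical stand-ins, not the real
      struct gendisk.

```c
#include <assert.h>   /* for the usage checks */
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit number standing in for GD_SUPPRESS_PART_SCAN. */
#define FAKE_SUPPRESS_PART_SCAN 3

struct fake_disk {
    bool part_supported;   /* may this disk carry partitions at all? */
    atomic_ulong state;    /* runtime bits, always updated atomically */
};

void fake_disk_init(struct fake_disk *disk, bool part_supported)
{
    disk->part_supported = part_supported;
    atomic_init(&disk->state, 0UL);
}

/*
 * Toggle scanning on a live disk. atomic_fetch_or()/atomic_fetch_and()
 * behave like set_bit()/clear_bit(): concurrent updates to other bits in
 * `state` are not lost, unlike a plain `flags |= X` on a shared word.
 */
void set_part_scan_suppressed(struct fake_disk *disk, bool suppress)
{
    unsigned long mask = 1UL << FAKE_SUPPRESS_PART_SCAN;

    if (suppress)
        atomic_fetch_or(&disk->state, mask);
    else
        atomic_fetch_and(&disk->state, ~mask);
}

/* Manually added partitions only depend on partition support. */
bool can_add_partition(const struct fake_disk *disk)
{
    return disk->part_supported;
}

/* Automatic scanning additionally requires the suppress bit to be clear. */
bool should_scan_partitions(struct fake_disk *disk)
{
    unsigned long mask = 1UL << FAKE_SUPPRESS_PART_SCAN;

    return disk->part_supported && !(atomic_load(&disk->state) & mask);
}
```

      Suppressing the scan leaves manual partition support untouched, which is
      the corner case the loop driver needs.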
      
      Fixes: 1ebe2e5f ("block: remove GENHD_FL_EXT_DEVT")
      Reported-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20220527055806.1972352-1-hch@lst.de
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-iolatency: Fix inflight count imbalances and IO hangs on offline · 8a177a36
      Tejun Heo committed
      iolatency needs to track the number of inflight IOs per cgroup. As this
      tracking can be expensive, it is disabled when no cgroup has iolatency
      configured for the device. To ensure that the inflight counters stay
      balanced, iolatency_set_limit() freezes the request_queue while manipulating
      the enabled counter, which ensures that no IO is in flight and thus all
      counters are zero.
      
      Unfortunately, iolatency_set_limit() isn't the only place where the enabled
      counter is manipulated. iolatency_pd_offline() can also decrement the counter and
      trigger disabling. As this disabling happens without freezing the queue, this
      can easily happen while some IOs are in flight and thus leak the counts.
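      The imbalance can be reduced to a few lines. In the hypothetical model
      below (names are illustrative, not blk-iolatency's code), both issue and
      completion only touch the counter while tracking is enabled, so turning
      tracking off with IO outstanding strands the increments of those IOs.
      Freezing the queue first would rule this out, because no IO could then
      be in flight at the switch.

```c
#include <assert.h>   /* for the usage checks */
#include <stdbool.h>

/* Hypothetical stand-in for one cgroup's iolatency state on a device. */
struct fake_iolat {
    bool enabled;   /* is inflight tracking on for the device? */
    int inflight;   /* must drop back to 0 once all IO has completed */
};

/* Issue path: count the IO only if tracking is currently enabled. */
void fake_issue(struct fake_iolat *g)
{
    if (g->enabled)
        g->inflight++;
}

/* Completion path: the same check, made at completion time. */
void fake_complete(struct fake_iolat *g)
{
    if (g->enabled)
        g->inflight--;
}

/*
 * Disable tracking without freezing the queue, as the offline path did:
 * nothing waits for outstanding IOs, so their completions go uncounted.
 */
void fake_disable_unfrozen(struct fake_iolat *g)
{
    g->enabled = false;
}
```

      Issuing three IOs, completing one, disabling, and then completing the
      other two leaves inflight stuck at 2, the same pattern as the stuck
      counters in the monitoring output further down.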
      
      This can be easily demonstrated by turning on iolatency for one empty
      cgroup while IOs are in flight in other cgroups and then removing the
      cgroup. Note that iolatency shouldn't have been enabled elsewhere in the
      system to ensure that removing the cgroup disables iolatency for the whole
      device.
      
      The following keeps flipping on and off iolatency on sda:
      
        echo +io > /sys/fs/cgroup/cgroup.subtree_control
        while true; do
            mkdir -p /sys/fs/cgroup/test
            echo '8:0 target=100000' > /sys/fs/cgroup/test/io.latency
            sleep 1
            rmdir /sys/fs/cgroup/test
            sleep 1
        done
      
      and there's concurrent fio generating direct rand reads:
      
        fio --name test --filename=/dev/sda --direct=1 --rw=randread \
            --runtime=600 --time_based --iodepth=256 --numjobs=4 --bs=4k
      
      while monitoring with the following drgn script:
      
        import time

        while True:
          for css in css_for_each_descendant_pre(prog['blkcg_root'].css.address_of_()):
              for pos in hlist_for_each(container_of(css, 'struct blkcg', 'css').blkg_list):
                  blkg = container_of(pos, 'struct blkcg_gq', 'blkcg_node')
                  pd = blkg.pd[prog['blkcg_policy_iolatency'].plid]
                  if pd.value_() == 0:
                      continue
                  iolat = container_of(pd, 'struct iolatency_grp', 'pd')
                  inflight = iolat.rq_wait.inflight.counter.value_()
                  if inflight:
                      print(f'inflight={inflight} {disk_name(blkg.q.disk).decode("utf-8")} '
                            f'{cgroup_path(css.cgroup).decode("utf-8")}')
          time.sleep(1)
      
      The monitoring output looks like the following:
      
        inflight=1 sda /user.slice
        inflight=1 sda /user.slice
        ...
        inflight=14 sda /user.slice
        inflight=13 sda /user.slice
        inflight=17 sda /user.slice
        inflight=15 sda /user.slice
        inflight=18 sda /user.slice
        inflight=17 sda /user.slice
        inflight=20 sda /user.slice
        inflight=19 sda /user.slice <- fio stopped, inflight stuck at 19
        inflight=19 sda /user.slice
        inflight=19 sda /user.slice
      
      If a cgroup with stuck inflight ends up getting throttled, the throttled IOs
      will never get issued as there's no completion event to wake them up, leading
      to an indefinite hang.
      
      This patch fixes the bug by unifying enable handling into a work item which
      is automatically kicked off from iolatency_set_min_lat_nsec() which is
      called from both iolatency_set_limit() and iolatency_pd_offline() paths.
      Punting to a work item is necessary as iolatency_pd_offline() is called
      under spinlocks while freezing a request_queue requires a sleepable context.
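      The split between the non-sleeping setter and the sleepable work item can
      be sketched as a single-threaded model. Everything here is hypothetical:
      the names are illustrative, a `work_pending` flag stands in for queueing
      a work item, and "freezing" is modelled as refusing to flip `enabled`
      while the device still has IO in flight (a real freeze would wait).

```c
#include <assert.h>   /* for the usage checks */
#include <stdbool.h>

/* Hypothetical per-device iolatency state. */
struct fake_blkiolat {
    int enable_cnt;     /* cgroups that currently have a latency target */
    bool enabled;       /* tracking state actually in effect */
    bool work_pending;  /* stands in for a queued work item */
    int inflight;       /* IOs outstanding on the device */
};

/*
 * Common entry for both the set-limit and the offline paths. It may run
 * under spinlocks, so it only adjusts the count and kicks the work item.
 */
void fake_set_min_lat(struct fake_blkiolat *b, bool had_target, bool has_target)
{
    if (had_target == has_target)
        return;
    b->enable_cnt += has_target ? 1 : -1;
    if ((b->enable_cnt > 0) != b->enabled)
        b->work_pending = true;
}

/*
 * Work item body, running in sleepable context. Returns true once it has
 * applied the new state; returns false if nothing is pending or IO is
 * still in flight (where a real implementation would block in the freeze).
 */
bool fake_enable_work(struct fake_blkiolat *b)
{
    if (!b->work_pending || b->inflight != 0)
        return false;
    b->work_pending = false;
    b->enabled = b->enable_cnt > 0;
    return true;
}
```

      Because only the work item flips `enabled`, and only with nothing in
      flight, the offline path can no longer disable tracking underneath
      outstanding IOs.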
      
      This also simplifies the code, reducing LOC (not counting comments), and avoids the
      unnecessary freezes which were happening whenever a cgroup's latency target
      is newly set or cleared.

      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: Liu Bo <bo.liu@linux.alibaba.com>
      Fixes: 8c772a9b ("blk-iolatency: fix IO hang due to negative inflight counter")
      Cc: stable@vger.kernel.org # v5.0+
      Link: https://lore.kernel.org/r/Yn9ScX6Nx2qIiQQi@slm.duckdns.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 23 May 2022: 2 commits
  9. 21 May 2022: 1 commit
  10. 19 May 2022: 5 commits
  11. 18 May 2022: 1 commit
    • blk-throttle: Set BIO_THROTTLED when bio has been throttled · 5a011f88
      Laibin Qiu committed
      1. In the current code, every bio gets the BIO_THROTTLED flag set
      after __blk_throtl_bio().
      
      2. If a bio needs to be throttled, the timer is started and the bio
      is not submitted directly; instead it is submitted from
      blk_throtl_dispatch_work_fn() when the timer expires. But in the
      current code, when a bio is throttled, BIO_THROTTLED is set on it
      only after the timer has been started. If the bio has already been
      completed by then, this can cause the use-after-free below.
      
      BUG: KASAN: use-after-free in blk_throtl_bio+0x12f0/0x2c70
      Read of size 2 at addr ffff88801b8902d4 by task fio/26380
      
       dump_stack+0x9b/0xce
       print_address_description.constprop.6+0x3e/0x60
       kasan_report.cold.9+0x22/0x3a
       blk_throtl_bio+0x12f0/0x2c70
       submit_bio_checks+0x701/0x1550
       submit_bio_noacct+0x83/0xc80
       submit_bio+0xa7/0x330
       mpage_readahead+0x380/0x500
       read_pages+0x1c1/0xbf0
       page_cache_ra_unbounded+0x471/0x6f0
       do_page_cache_ra+0xda/0x110
       ondemand_readahead+0x442/0xae0
       page_cache_async_ra+0x210/0x300
       generic_file_buffered_read+0x4d9/0x2130
       generic_file_read_iter+0x315/0x490
       blkdev_read_iter+0x113/0x1b0
       aio_read+0x2ad/0x450
       io_submit_one+0xc8e/0x1d60
       __se_sys_io_submit+0x125/0x350
       do_syscall_64+0x2d/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Allocated by task 26380:
       kasan_save_stack+0x19/0x40
       __kasan_kmalloc.constprop.2+0xc1/0xd0
       kmem_cache_alloc+0x146/0x440
       mempool_alloc+0x125/0x2f0
       bio_alloc_bioset+0x353/0x590
       mpage_alloc+0x3b/0x240
       do_mpage_readpage+0xddf/0x1ef0
       mpage_readahead+0x264/0x500
       read_pages+0x1c1/0xbf0
       page_cache_ra_unbounded+0x471/0x6f0
       do_page_cache_ra+0xda/0x110
       ondemand_readahead+0x442/0xae0
       page_cache_async_ra+0x210/0x300
       generic_file_buffered_read+0x4d9/0x2130
       generic_file_read_iter+0x315/0x490
       blkdev_read_iter+0x113/0x1b0
       aio_read+0x2ad/0x450
       io_submit_one+0xc8e/0x1d60
       __se_sys_io_submit+0x125/0x350
       do_syscall_64+0x2d/0x40
       entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      Freed by task 0:
       kasan_save_stack+0x19/0x40
       kasan_set_track+0x1c/0x30
       kasan_set_free_info+0x1b/0x30
       __kasan_slab_free+0x111/0x160
       kmem_cache_free+0x94/0x460
       mempool_free+0xd6/0x320
       bio_free+0xe0/0x130
       bio_put+0xab/0xe0
       bio_endio+0x3a6/0x5d0
       blk_update_request+0x590/0x1370
       scsi_end_request+0x7d/0x400
       scsi_io_completion+0x1aa/0xe50
       scsi_softirq_done+0x11b/0x240
       blk_mq_complete_request+0xd4/0x120
       scsi_mq_done+0xf0/0x200
       virtscsi_vq_done+0xbc/0x150
       vring_interrupt+0x179/0x390
       __handle_irq_event_percpu+0xf7/0x490
       handle_irq_event_percpu+0x7b/0x160
       handle_irq_event+0xcc/0x170
       handle_edge_irq+0x215/0xb20
       common_interrupt+0x60/0x120
       asm_common_interrupt+0x1e/0x40
      
      Fix this by moving the setting of BIO_THROTTLED under the queue_lock.
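      The ordering problem can be replayed deterministically. In the
      hypothetical model below (names are illustrative, not blk-throttle's
      code), queueing a bio hands it to the dispatcher; `race` runs the
      dispatcher in the window right after the lock is dropped, and a `freed`
      marker stands in for the memory actually being released.

```c
#include <assert.h>   /* for the usage checks */
#include <stdbool.h>

/* Hypothetical bio model; `freed` replaces real freeing so it can be checked. */
struct fake_bio {
    bool throttled;   /* models the BIO_THROTTLED flag */
    bool queued;      /* sitting in the throttle queue, timer armed */
    bool freed;       /* dispatched, completed and released */
};

/* Timer work: submit a queued bio; here it completes and is freed at once. */
void fake_dispatch_work(struct fake_bio *bio)
{
    if (bio->queued) {
        bio->queued = false;
        bio->freed = true;
    }
}

/*
 * Throttle one bio. With `fixed`, the flag is set while the (modelled)
 * queue_lock is held, before the dispatcher can reach the bio. Without it,
 * the flag is set after unlocking, as in the old code. Returns false if
 * the caller would touch a bio that had already been freed.
 */
bool fake_throttle_bio(struct fake_bio *bio, bool fixed, bool race)
{
    /* spin_lock_irq(&q->queue_lock) would be taken here */
    if (fixed)
        bio->throttled = true;
    bio->queued = true;
    /* ...and released here; from now on the dispatcher owns the bio */

    if (race)
        fake_dispatch_work(bio);

    if (!fixed) {
        if (bio->freed)
            return false;   /* the use-after-free KASAN reported */
        bio->throttled = true;
    }
    return true;
}
```

      In the fixed ordering the caller never touches the bio after the
      unlock, so the dispatcher winning the race is harmless.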

      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20220301123919.2381579-1-qiulaibin@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. 17 May 2022: 5 commits
  13. 14 May 2022: 1 commit
  14. 12 May 2022: 2 commits
  15. 11 May 2022: 1 commit
  16. 10 May 2022: 1 commit
  17. 09 May 2022: 2 commits
  18. 05 May 2022: 3 commits
  19. 03 May 2022: 3 commits