1. 13 Nov, 2021 1 commit
  2. 12 Nov, 2021 5 commits
    • blkcg: Remove extra blkcg_bio_issue_init · b781d8db
      Laibin Qiu committed
      KASAN reports a use-after-free when running block tests:
      
      ==================================================================
      [10050.967049] BUG: KASAN: use-after-free in
      submit_bio_checks+0x1539/0x1550
      
      [10050.977638] Call Trace:
      [10050.978190]  dump_stack+0x9b/0xce
      [10050.979674]  print_address_description.constprop.6+0x3e/0x60
      [10050.983510]  kasan_report.cold.9+0x22/0x3a
      [10050.986089]  submit_bio_checks+0x1539/0x1550
      [10050.989576]  submit_bio_noacct+0x83/0xc80
      [10050.993714]  submit_bio+0xa7/0x330
      [10050.994435]  mpage_readahead+0x380/0x500
      [10050.998009]  read_pages+0x1c1/0xbf0
      [10051.002057]  page_cache_ra_unbounded+0x4c2/0x6f0
      [10051.007413]  do_page_cache_ra+0xda/0x110
      [10051.008207]  force_page_cache_ra+0x23d/0x3d0
      [10051.009087]  page_cache_sync_ra+0xca/0x300
      [10051.009970]  generic_file_buffered_read+0xbea/0x2130
      [10051.012685]  generic_file_read_iter+0x315/0x490
      [10051.014472]  blkdev_read_iter+0x113/0x1b0
      [10051.015300]  aio_read+0x2ad/0x450
      [10051.023786]  io_submit_one+0xc8e/0x1d60
      [10051.029855]  __se_sys_io_submit+0x125/0x350
      [10051.033442]  do_syscall_64+0x2d/0x40
      [10051.034156]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [10051.048733] Allocated by task 18598:
      [10051.049482]  kasan_save_stack+0x19/0x40
      [10051.050263]  __kasan_kmalloc.constprop.1+0xc1/0xd0
      [10051.051230]  kmem_cache_alloc+0x146/0x440
      [10051.052060]  mempool_alloc+0x125/0x2f0
      [10051.052818]  bio_alloc_bioset+0x353/0x590
      [10051.053658]  mpage_alloc+0x3b/0x240
      [10051.054382]  do_mpage_readpage+0xddf/0x1ef0
      [10051.055250]  mpage_readahead+0x264/0x500
      [10051.056060]  read_pages+0x1c1/0xbf0
      [10051.056758]  page_cache_ra_unbounded+0x4c2/0x6f0
      [10051.057702]  do_page_cache_ra+0xda/0x110
      [10051.058511]  force_page_cache_ra+0x23d/0x3d0
      [10051.059373]  page_cache_sync_ra+0xca/0x300
      [10051.060198]  generic_file_buffered_read+0xbea/0x2130
      [10051.061195]  generic_file_read_iter+0x315/0x490
      [10051.062189]  blkdev_read_iter+0x113/0x1b0
      [10051.063015]  aio_read+0x2ad/0x450
      [10051.063686]  io_submit_one+0xc8e/0x1d60
      [10051.064467]  __se_sys_io_submit+0x125/0x350
      [10051.065318]  do_syscall_64+0x2d/0x40
      [10051.066082]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [10051.067455] Freed by task 13307:
      [10051.068136]  kasan_save_stack+0x19/0x40
      [10051.068931]  kasan_set_track+0x1c/0x30
      [10051.069726]  kasan_set_free_info+0x1b/0x30
      [10051.070621]  __kasan_slab_free+0x111/0x160
      [10051.071480]  kmem_cache_free+0x94/0x460
      [10051.072256]  mempool_free+0xd6/0x320
      [10051.072985]  bio_free+0xe0/0x130
      [10051.073630]  bio_put+0xab/0xe0
      [10051.074252]  bio_endio+0x3a6/0x5d0
      [10051.074984]  blk_update_request+0x590/0x1370
      [10051.075870]  scsi_end_request+0x7d/0x400
      [10051.076667]  scsi_io_completion+0x1aa/0xe50
      [10051.077503]  scsi_softirq_done+0x11b/0x240
      [10051.078344]  blk_mq_complete_request+0xd4/0x120
      [10051.079275]  scsi_mq_done+0xf0/0x200
      [10051.080036]  virtscsi_vq_done+0xbc/0x150
      [10051.080850]  vring_interrupt+0x179/0x390
      [10051.081650]  __handle_irq_event_percpu+0xf7/0x490
      [10051.082626]  handle_irq_event_percpu+0x7b/0x160
      [10051.083527]  handle_irq_event+0xcc/0x170
      [10051.084297]  handle_edge_irq+0x215/0xb20
      [10051.085122]  asm_call_irq_on_stack+0xf/0x20
      [10051.085986]  common_interrupt+0xae/0x120
      [10051.086830]  asm_common_interrupt+0x1e/0x40
      
      ==================================================================
      
      A bio is checked at the beginning of submit_bio_noacct(). If the bio
      needs to be throttled, the throttling code starts the timer and
      submission stops right there; the bio is submitted later from
      blk_throtl_dispatch_work_fn() when the timer expires. In the current
      code, however, a throttled bio still has its issue->value set by
      blkcg_bio_issue_init() after that point. This is redundant and may
      cause the above use-after-free.
      
      CPU0                                   CPU1
      submit_bio
      submit_bio_noacct
        submit_bio_checks
          blk_throtl_bio()
            <=mod_timer(&sq->pending_timer
                                            blk_throtl_dispatch_work_fn
                                              submit_bio_noacct() <= bio already has
                                              the throttled flag, so it is passed
                                              straight through and bio issue->value
                                              is set here
      
                                            bio_endio()
                                            bio_put()
                                            bio_free() <= free this bio
      
          blkcg_bio_issue_init(bio)
            <= bio has been freed and
            will lead to UAF
        return BLK_QC_T_NONE
      
      Fix this by removing the extra blkcg_bio_issue_init() call.
      
      Fixes: e439bedf (blkcg: consolidate bio_issue_init() to be a part of core)
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
      Link: https://lore.kernel.org/r/20211112093354.3581504-1-qiulaibin@huawei.com
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
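The control-flow change described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `toy_bio`, `toy_throtl_bio()`, `toy_issue_init()` and `toy_submit_checks()` are hypothetical stand-ins rather than the kernel functions, and the `throttled` field replaces the real throttling decision. It shows the rule the fix enforces: once a bio has been handed to the throttling path, the submitter must not write to it again, because the dispatch worker may already have completed and freed it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct bio. */
struct toy_bio {
	bool throttled;       /* models "this bio must be throttled" */
	unsigned long issue;  /* models bio issue->value */
};

/*
 * Models the throttling check: returns true when the bio was queued for
 * later dispatch. After a true return the caller no longer owns the bio.
 */
bool toy_throtl_bio(const struct toy_bio *bio)
{
	assert(bio != NULL);
	return bio->throttled;
}

/* Models the issue-time initialisation done for bios that proceed. */
void toy_issue_init(struct toy_bio *bio, unsigned long now)
{
	bio->issue = now;
}

/*
 * Models the fixed check: a throttled bio is left untouched and the
 * function returns early; only a bio that proceeds gets its issue value.
 */
bool toy_submit_checks(struct toy_bio *bio, unsigned long now)
{
	if (toy_throtl_bio(bio))
		return false;  /* handed off: do not touch bio again */

	toy_issue_init(bio, now);
	return true;
}
```

In the buggy ordering, the initialisation would also run on the early-return branch, which is exactly the access that KASAN flagged.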
    • block: Hold invalidate_lock in BLKRESETZONE ioctl · 86399ea0
      Shin'ichiro Kawasaki committed
      When a BLKRESETZONE ioctl races with a data read, the read leaves stale
      page cache behind. Commit e5113505 ("block: Discard page cache of zone
      reset target range") added page cache truncation to avoid stale page
      cache after the ioctl. However, stale page cache can still be read
      while the zone reset for the ioctl is in progress. To avoid stale page
      cache completely, hold the invalidate_lock of the block device file
      mapping.
      
      Fixes: e5113505 ("block: Discard page cache of zone reset target range")
      Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
      Cc: stable@vger.kernel.org # v5.15
      Reviewed-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20211111085238.942492-1-shinichiro.kawasaki@wdc.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: rename blk_attempt_bio_merge · b131f201
      Ming Lei committed
      It is very annoying to have two block layer functions that share the
      same name, so rename blk_attempt_bio_merge in blk-mq.c to
      blk_mq_attempt_bio_merge.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20211111085134.345235-3-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: don't grab ->q_usage_counter in blk_mq_sched_bio_merge · 10f7335e
      Ming Lei committed
      blk_mq_sched_bio_merge() is only called from blk-mq.c:blk_attempt_bio_merge(),
      which is called when the queue usage counter has already been grabbed:
      
      1) blk_mq_get_new_requests()
      
      2) blk_mq_get_request()
      - cached request in current plug owns one queue usage counter
      
      So don't grab ->q_usage_counter in blk_mq_sched_bio_merge(). More
      importantly, grabbing it in this nested way causes a hang in
      blk_mq_freeze_queue_wait().
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Link: https://lore.kernel.org/r/20211111085134.345235-2-ming.lei@redhat.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
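Why a nested reference can hang a queue freeze may be easier to see in a toy model. The sketch below is an assumption-laden simplification, not the kernel's percpu_ref implementation: `toy_queue` and all `toy_*` functions are hypothetical, and where the kernel would put the caller to sleep, the model simply returns false. The point it illustrates is that a freeze completes only when the usage count reaches zero, while a holder that tries to take a second reference after the freeze has started cannot make progress and therefore never drops its first one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue's usage counter. */
struct toy_queue {
	int usage;      /* references currently held */
	bool freezing;  /* set once a freeze has started */
};

/*
 * Non-blocking model of entering the queue. Once a freeze has started,
 * new entries are refused; in the kernel the caller would wait here.
 */
bool toy_queue_enter(struct toy_queue *q)
{
	assert(q != NULL);
	if (q->freezing)
		return false;
	q->usage++;
	return true;
}

void toy_queue_exit(struct toy_queue *q)
{
	assert(q->usage > 0);
	q->usage--;
}

void toy_freeze_start(struct toy_queue *q)
{
	q->freezing = true;
}

/* The freeze is complete only when every reference has been dropped. */
bool toy_freeze_done(const struct toy_queue *q)
{
	return q->freezing && q->usage == 0;
}

/* Nested style: the merge helper takes its own reference. */
bool toy_merge_nested(struct toy_queue *q)
{
	if (!toy_queue_enter(q))
		return false;  /* would wait forever: caller still holds a ref */
	toy_queue_exit(q);
	return true;
}

/* Fixed style: the merge helper relies on the caller's reference. */
bool toy_merge_plain(struct toy_queue *q)
{
	(void)q;
	return true;
}
```

With one reference held and a freeze in progress, `toy_merge_nested()` cannot proceed and the freeze cannot finish, whereas `toy_merge_plain()` returns, the caller drops its reference, and the freeze completes.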
    • block: fix kerneldoc for disk_register_independent_access__ranges() · 438cd742
      Jens Axboe committed
      The naming got changed as part of a revision of the patchset, but the
      kerneldoc apparently never got updated. Fix it.

      Reported-by: kernel test robot <lkp@intel.com>
      Fixes: a2247f19 ("block: Add independent access ranges support")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 10 Nov, 2021 4 commits
  4. 09 Nov, 2021 1 commit
  5. 08 Nov, 2021 1 commit
    • blk-mq: don't free tags if the tag_set is used by other device in queue initialztion · a846a8e6
      Ye Bin committed
      We got the following UAF report on v5.10:
      [ 1446.674930] ==================================================================
      [ 1446.675970] BUG: KASAN: use-after-free in blk_mq_get_driver_tag+0x9a4/0xa90
      [ 1446.676902] Read of size 8 at addr ffff8880185afd10 by task kworker/1:2/12348
      [ 1446.677851]
      [ 1446.678073] CPU: 1 PID: 12348 Comm: kworker/1:2 Not tainted 5.10.0-10177-gc9c81b1e346a #2
      [ 1446.679168] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
      [ 1446.680692] Workqueue: kthrotld blk_throtl_dispatch_work_fn
      [ 1446.681448] Call Trace:
      [ 1446.681800]  dump_stack+0x9b/0xce
      [ 1446.682916]  print_address_description.constprop.6+0x3e/0x60
      [ 1446.685999]  kasan_report.cold.9+0x22/0x3a
      [ 1446.687186]  blk_mq_get_driver_tag+0x9a4/0xa90
      [ 1446.687785]  blk_mq_dispatch_rq_list+0x21a/0x1d40
      [ 1446.692576]  __blk_mq_do_dispatch_sched+0x394/0x830
      [ 1446.695758]  __blk_mq_sched_dispatch_requests+0x398/0x4f0
      [ 1446.698279]  blk_mq_sched_dispatch_requests+0xdf/0x140
      [ 1446.698967]  __blk_mq_run_hw_queue+0xc0/0x270
      [ 1446.699561]  __blk_mq_delay_run_hw_queue+0x4cc/0x550
      [ 1446.701407]  blk_mq_run_hw_queue+0x13b/0x2b0
      [ 1446.702593]  blk_mq_sched_insert_requests+0x1de/0x390
      [ 1446.703309]  blk_mq_flush_plug_list+0x4b4/0x760
      [ 1446.705408]  blk_flush_plug_list+0x2c5/0x480
      [ 1446.708471]  blk_finish_plug+0x55/0xa0
      [ 1446.708980]  blk_throtl_dispatch_work_fn+0x23b/0x2e0
      [ 1446.711236]  process_one_work+0x6d4/0xfe0
      [ 1446.711778]  worker_thread+0x91/0xc80
      [ 1446.713400]  kthread+0x32d/0x3f0
      [ 1446.714362]  ret_from_fork+0x1f/0x30
      [ 1446.714846]
      [ 1446.715062] Allocated by task 1:
      [ 1446.715509]  kasan_save_stack+0x19/0x40
      [ 1446.716026]  __kasan_kmalloc.constprop.1+0xc1/0xd0
      [ 1446.716673]  blk_mq_init_tags+0x6d/0x330
      [ 1446.717207]  blk_mq_alloc_rq_map+0x50/0x1c0
      [ 1446.717769]  __blk_mq_alloc_map_and_request+0xe5/0x320
      [ 1446.718459]  blk_mq_alloc_tag_set+0x679/0xdc0
      [ 1446.719050]  scsi_add_host_with_dma.cold.3+0xa0/0x5db
      [ 1446.719736]  virtscsi_probe+0x7bf/0xbd0
      [ 1446.720265]  virtio_dev_probe+0x402/0x6c0
      [ 1446.720808]  really_probe+0x276/0xde0
      [ 1446.721320]  driver_probe_device+0x267/0x3d0
      [ 1446.721892]  device_driver_attach+0xfe/0x140
      [ 1446.722491]  __driver_attach+0x13a/0x2c0
      [ 1446.723037]  bus_for_each_dev+0x146/0x1c0
      [ 1446.723603]  bus_add_driver+0x3fc/0x680
      [ 1446.724145]  driver_register+0x1c0/0x400
      [ 1446.724693]  init+0xa2/0xe8
      [ 1446.725091]  do_one_initcall+0x9e/0x310
      [ 1446.725626]  kernel_init_freeable+0xc56/0xcb9
      [ 1446.726231]  kernel_init+0x11/0x198
      [ 1446.726714]  ret_from_fork+0x1f/0x30
      [ 1446.727212]
      [ 1446.727433] Freed by task 26992:
      [ 1446.727882]  kasan_save_stack+0x19/0x40
      [ 1446.728420]  kasan_set_track+0x1c/0x30
      [ 1446.728943]  kasan_set_free_info+0x1b/0x30
      [ 1446.729517]  __kasan_slab_free+0x111/0x160
      [ 1446.730084]  kfree+0xb8/0x520
      [ 1446.730507]  blk_mq_free_map_and_requests+0x10b/0x1b0
      [ 1446.731206]  blk_mq_realloc_hw_ctxs+0x8cb/0x15b0
      [ 1446.731844]  blk_mq_init_allocated_queue+0x374/0x1380
      [ 1446.732540]  blk_mq_init_queue_data+0x7f/0xd0
      [ 1446.733155]  scsi_mq_alloc_queue+0x45/0x170
      [ 1446.733730]  scsi_alloc_sdev+0x73c/0xb20
      [ 1446.734281]  scsi_probe_and_add_lun+0x9a6/0x2d90
      [ 1446.734916]  __scsi_scan_target+0x208/0xc50
      [ 1446.735500]  scsi_scan_channel.part.3+0x113/0x170
      [ 1446.736149]  scsi_scan_host_selected+0x25a/0x360
      [ 1446.736783]  store_scan+0x290/0x2d0
      [ 1446.737275]  dev_attr_store+0x55/0x80
      [ 1446.737782]  sysfs_kf_write+0x132/0x190
      [ 1446.738313]  kernfs_fop_write_iter+0x319/0x4b0
      [ 1446.738921]  new_sync_write+0x40e/0x5c0
      [ 1446.739429]  vfs_write+0x519/0x720
      [ 1446.739877]  ksys_write+0xf8/0x1f0
      [ 1446.740332]  do_syscall_64+0x2d/0x40
      [ 1446.740802]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [ 1446.741462]
      [ 1446.741670] The buggy address belongs to the object at ffff8880185afd00
      [ 1446.741670]  which belongs to the cache kmalloc-256 of size 256
      [ 1446.743276] The buggy address is located 16 bytes inside of
      [ 1446.743276]  256-byte region [ffff8880185afd00, ffff8880185afe00)
      [ 1446.744765] The buggy address belongs to the page:
      [ 1446.745416] page:ffffea0000616b00 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x185ac
      [ 1446.746694] head:ffffea0000616b00 order:2 compound_mapcount:0 compound_pincount:0
      [ 1446.747719] flags: 0x1fffff80010200(slab|head)
      [ 1446.748337] raw: 001fffff80010200 ffffea00006a3208 ffffea000061bf08 ffff88801004f240
      [ 1446.749404] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
      [ 1446.750455] page dumped because: kasan: bad access detected
      [ 1446.751227]
      [ 1446.751445] Memory state around the buggy address:
      [ 1446.752102]  ffff8880185afc00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1446.753090]  ffff8880185afc80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1446.754079] >ffff8880185afd00: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 1446.755065]                          ^
      [ 1446.755589]  ffff8880185afd80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [ 1446.756574]  ffff8880185afe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [ 1446.757566] ==================================================================
      
      The flag 'BLK_MQ_F_TAG_QUEUE_SHARED' is set if the second device on the
      same host initializes its queue successfully. However, if the second
      device fails to allocate memory in blk_mq_alloc_and_init_hctx(), called
      from blk_mq_realloc_hw_ctxs(), called from blk_mq_init_allocated_queue(),
      __blk_mq_free_map_and_rqs() is called on the error path, and if
      'BLK_MQ_TAG_HCTX_SHARED' is not set, 'tag_set->tags' is freed
      while it is still in use by the first device.

      To fix this issue, move the release of newly allocated hardware contexts
      from blk_mq_realloc_hw_ctxs() to __blk_mq_update_nr_hw_queues(), since
      there is no need to release hardware contexts in
      blk_mq_init_allocated_queue().
      
      Fixes: 868f2f0b ("blk-mq: dynamic h/w context count")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Yu Kuai <yukuai3@huawei.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Link: https://lore.kernel.org/r/20211108074019.1058843-1-yebin10@huawei.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 05 Nov, 2021 7 commits
  7. 04 Nov, 2021 1 commit
    • block: update __register_blkdev() probe documentation · 26e06f5b
      Luis Chamberlain committed
      __register_blkdev() is used to register a probe callback, and
      that callback is typically used to call add_disk(). Now that
      we are able to capture errors for add_disk(), we need to fix
      those probe calls where add_disk() fails and clean up resources.
      
      We don't extend the probe call to return the error given:
      
      1) we'd have to always special-case the case where the disk
         was already present, as otherwise concurrent requests to
         open an existing block device would fail, and this would be
         a userspace visible change
      2) the error from ilookup() on blkdev_get_no_open() is sufficient
      3) The only thing the probe call is used for is to support
         pre-devtmpfs, pre-udev semantics that want to create disks when
         their pre-created device node is accessed, and so we don't care
         for failures on probe there.
      
      Expand documentation for the probe callback to ensure users clean up
      resources if add_disk() is used and to clarify that this interface may
      be removed in the future.

      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
      Link: https://lore.kernel.org/r/20211103230437.1639990-12-mcgrof@kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
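The cleanup obligation described in this commit can be sketched in plain C. Everything here is hypothetical and for illustration only: `toy_disk`, `toy_add_disk()` and `toy_probe()` are not kernel APIs, and the `simulate_failure` parameter exists only to exercise the error path. The sketch assumes, as the message states, that the probe callback does not return an error, so it has to release whatever it allocated itself when registration fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the resources a probe callback sets up. */
struct toy_disk {
	bool allocated;   /* resources currently held by the driver */
	bool registered;  /* disk successfully registered */
};

/* Hypothetical model of a registration call that can fail. */
int toy_add_disk(struct toy_disk *disk, bool simulate_failure)
{
	assert(disk != NULL);
	if (simulate_failure)
		return -1;
	disk->registered = true;
	return 0;
}

/*
 * Models a probe callback. It returns void, so a registration failure
 * cannot be reported to the caller; the callback cleans up instead.
 */
void toy_probe(struct toy_disk *disk, bool simulate_failure)
{
	disk->allocated = true;  /* acquire resources */

	if (toy_add_disk(disk, simulate_failure) != 0)
		disk->allocated = false;  /* release them on failure */
}
```

After a failed probe in this model, nothing remains allocated or registered, which is the state the expanded documentation asks drivers to guarantee.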
  8. 03 Nov, 2021 4 commits
  9. 02 Nov, 2021 2 commits
  10. 30 Oct, 2021 1 commit
  11. 29 Oct, 2021 3 commits
  12. 27 Oct, 2021 10 commits