1. 02 Sep, 2020 16 commits
  2. 29 Jun, 2020 4 commits
  3. 15 Jun, 2020 6 commits
    • block, bfq: fix use-after-free in bfq_idle_slice_timer_body · baecb6b1
      Authored by Zhiqiang Liu
      task #28557799
      
      [ Upstream commit 2f95fa5c955d0a9987ffdc3a095e2f4e62c5f2a9 ]
      
      In bfq_idle_slice_timer(), bfqq = bfqd->in_service_queue is read
      outside the bfqd->lock critical section. The bfqq, although not
      NULL in bfq_idle_slice_timer(), may be freed before it is passed
      to bfq_idle_slice_timer_body(), which then accesses freed memory.
      
      In addition, because the bfqq may be subject to this race,
      bfq_idle_slice_timer_body() should first check whether the bfqq is
      still in service before operating on it. If the bfqq is not in
      service, it has already been expired through __bfq_bfqq_expire(),
      and its wait_request flag has already been cleared in
      __bfq_bfqd_reset_in_service(). So there is no need to clear
      wait_request again for a bfqq that is not in service.
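
The in-service check described above can be illustrated with a small user-space model. This is only a sketch: `model_sched`, `model_queue`, and `model_idle_timer_body` are hypothetical names rather than the kernel's types, and the locking is reduced to comments.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler data and a per-process queue. */
struct model_queue {
    bool wait_request;
};

struct model_sched {
    struct model_queue *in_service_queue;
};

/*
 * Model of the timer body: returns true if the queue was still in
 * service and was handled, false if it was skipped.
 */
bool model_idle_timer_body(struct model_sched *sched, struct model_queue *q)
{
    /* In the kernel, the scheduler lock would be taken here. */

    if (q != sched->in_service_queue) {
        /*
         * The queue was expired in the meantime and may already be
         * freed, so it is only compared by address, never dereferenced.
         */
        return false;
    }

    /* Still in service: safe to touch the queue under the lock. */
    q->wait_request = false;
    return true;
}
```

In this model a stale queue pointer is only compared, so the flag of a queue that is no longer in service is left untouched.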
      
      KASAN log is given as follows:
      [13058.354613] ==================================================================
      [13058.354640] BUG: KASAN: use-after-free in bfq_idle_slice_timer+0xac/0x290
      [13058.354644] Read of size 8 at addr ffffa02cf3e63f78 by task fork13/19767
      [13058.354646]
      [13058.354655] CPU: 96 PID: 19767 Comm: fork13
      [13058.354661] Call trace:
      [13058.354667]  dump_backtrace+0x0/0x310
      [13058.354672]  show_stack+0x28/0x38
      [13058.354681]  dump_stack+0xd8/0x108
      [13058.354687]  print_address_description+0x68/0x2d0
      [13058.354690]  kasan_report+0x124/0x2e0
      [13058.354697]  __asan_load8+0x88/0xb0
      [13058.354702]  bfq_idle_slice_timer+0xac/0x290
      [13058.354707]  __hrtimer_run_queues+0x298/0x8b8
      [13058.354710]  hrtimer_interrupt+0x1b8/0x678
      [13058.354716]  arch_timer_handler_phys+0x4c/0x78
      [13058.354722]  handle_percpu_devid_irq+0xf0/0x558
      [13058.354731]  generic_handle_irq+0x50/0x70
      [13058.354735]  __handle_domain_irq+0x94/0x110
      [13058.354739]  gic_handle_irq+0x8c/0x1b0
      [13058.354742]  el1_irq+0xb8/0x140
      [13058.354748]  do_wp_page+0x260/0xe28
      [13058.354752]  __handle_mm_fault+0x8ec/0x9b0
      [13058.354756]  handle_mm_fault+0x280/0x460
      [13058.354762]  do_page_fault+0x3ec/0x890
      [13058.354765]  do_mem_abort+0xc0/0x1b0
      [13058.354768]  el0_da+0x24/0x28
      [13058.354770]
      [13058.354773] Allocated by task 19731:
      [13058.354780]  kasan_kmalloc+0xe0/0x190
      [13058.354784]  kasan_slab_alloc+0x14/0x20
      [13058.354788]  kmem_cache_alloc_node+0x130/0x440
      [13058.354793]  bfq_get_queue+0x138/0x858
      [13058.354797]  bfq_get_bfqq_handle_split+0xd4/0x328
      [13058.354801]  bfq_init_rq+0x1f4/0x1180
      [13058.354806]  bfq_insert_requests+0x264/0x1c98
      [13058.354811]  blk_mq_sched_insert_requests+0x1c4/0x488
      [13058.354818]  blk_mq_flush_plug_list+0x2d4/0x6e0
      [13058.354826]  blk_flush_plug_list+0x230/0x548
      [13058.354830]  blk_finish_plug+0x60/0x80
      [13058.354838]  read_pages+0xec/0x2c0
      [13058.354842]  __do_page_cache_readahead+0x374/0x438
      [13058.354846]  ondemand_readahead+0x24c/0x6b0
      [13058.354851]  page_cache_sync_readahead+0x17c/0x2f8
      [13058.354858]  generic_file_buffered_read+0x588/0xc58
      [13058.354862]  generic_file_read_iter+0x1b4/0x278
      [13058.354965]  ext4_file_read_iter+0xa8/0x1d8 [ext4]
      [13058.354972]  __vfs_read+0x238/0x320
      [13058.354976]  vfs_read+0xbc/0x1c0
      [13058.354980]  ksys_read+0xdc/0x1b8
      [13058.354984]  __arm64_sys_read+0x50/0x60
      [13058.354990]  el0_svc_common+0xb4/0x1d8
      [13058.354994]  el0_svc_handler+0x50/0xa8
      [13058.354998]  el0_svc+0x8/0xc
      [13058.354999]
      [13058.355001] Freed by task 19731:
      [13058.355007]  __kasan_slab_free+0x120/0x228
      [13058.355010]  kasan_slab_free+0x10/0x18
      [13058.355014]  kmem_cache_free+0x288/0x3f0
      [13058.355018]  bfq_put_queue+0x134/0x208
      [13058.355022]  bfq_exit_icq_bfqq+0x164/0x348
      [13058.355026]  bfq_exit_icq+0x28/0x40
      [13058.355030]  ioc_exit_icq+0xa0/0x150
      [13058.355035]  put_io_context_active+0x250/0x438
      [13058.355038]  exit_io_context+0xd0/0x138
      [13058.355045]  do_exit+0x734/0xc58
      [13058.355050]  do_group_exit+0x78/0x220
      [13058.355054]  __wake_up_parent+0x0/0x50
      [13058.355058]  el0_svc_common+0xb4/0x1d8
      [13058.355062]  el0_svc_handler+0x50/0xa8
      [13058.355066]  el0_svc+0x8/0xc
      [13058.355067]
      [13058.355071] The buggy address belongs to the object at ffffa02cf3e63e70
                      which belongs to the cache bfq_queue of size 464
      [13058.355075] The buggy address is located 264 bytes inside of
                      464-byte region [ffffa02cf3e63e70, ffffa02cf3e64040)
      [13058.355077] The buggy address belongs to the page:
      [13058.355083] page:ffff7e80b3cf9800 count:1 mapcount:0 mapping:ffff802db5c90780 index:0xffffa02cf3e606f0 compound_mapcount: 0
      [13058.366175] flags: 0x2ffffe0000008100(slab|head)
      [13058.370781] raw: 2ffffe0000008100 ffff7e80b53b1408 ffffa02d730c1c90 ffff802db5c90780
      [13058.370787] raw: ffffa02cf3e606f0 0000000000370023 00000001ffffffff 0000000000000000
      [13058.370789] page dumped because: kasan: bad access detected
      [13058.370791]
      [13058.370792] Memory state around the buggy address:
      [13058.370797]  ffffa02cf3e63e00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fb fb
      [13058.370801]  ffffa02cf3e63e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370805] >ffffa02cf3e63f00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370808]                                                                 ^
      [13058.370811]  ffffa02cf3e63f80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      [13058.370815]  ffffa02cf3e64000: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
      [13058.370817] ==================================================================
      [13058.370820] Disabling lock debugging due to kernel taint
      
      Here, we directly pass the bfqd to bfq_idle_slice_timer_body func.
      --
      V2->V3: rewrite the comment as suggested by Paolo Valente
      V1->V2: add one comment, and add Fixes and Reported-by tag.
      
      Fixes: aee69d78 ("block, bfq: introduce the BFQ-v0 I/O scheduler as an extra scheduler")
      Acked-by: Paolo Valente <paolo.valente@linaro.org>
      Reported-by: Wang Wang <wangwang2@huawei.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Signed-off-by: Feilong Lin <linfeilong@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • block: Fix use-after-free issue accessing struct io_cq · fba123ba
      Authored by Sahitya Tummala
      task #28557799
      
      [ Upstream commit 30a2da7b7e225ef6c87a660419ea04d3cef3f6a7 ]
      
      There is a potential race between ioc_release_fn() and
      ioc_clear_queue(), shown below, which causes the kernel crash
      reported further down. It can also result in a use-after-free.
      
      context#1:				context#2:
      ioc_release_fn()			__ioc_clear_queue() gets the same icq
      ->spin_lock(&ioc->lock);		->spin_lock(&ioc->lock);
      ->ioc_destroy_icq(icq);
        ->list_del_init(&icq->q_node);
        ->call_rcu(&icq->__rcu_head,
        	icq_free_icq_rcu);
      ->spin_unlock(&ioc->lock);
      					->ioc_destroy_icq(icq);
      					  ->hlist_del_init(&icq->ioc_node);
      					  This results into below crash as this memory
      					  is now used by icq->__rcu_head in context#1.
      					  There is a chance that icq could be free'd
      					  as well.
      
      22150.386550:   <6> Unable to handle kernel write to read-only memory
      at virtual address ffffffaa8d31ca50
      ...
      Call trace:
      22150.607350:   <2>  ioc_destroy_icq+0x44/0x110
      22150.611202:   <2>  ioc_clear_queue+0xac/0x148
      22150.615056:   <2>  blk_cleanup_queue+0x11c/0x1a0
      22150.619174:   <2>  __scsi_remove_device+0xdc/0x128
      22150.623465:   <2>  scsi_forget_host+0x2c/0x78
      22150.627315:   <2>  scsi_remove_host+0x7c/0x2a0
      22150.631257:   <2>  usb_stor_disconnect+0x74/0xc8
      22150.635371:   <2>  usb_unbind_interface+0xc8/0x278
      22150.639665:   <2>  device_release_driver_internal+0x198/0x250
      22150.644897:   <2>  device_release_driver+0x24/0x30
      22150.649176:   <2>  bus_remove_device+0xec/0x140
      22150.653204:   <2>  device_del+0x270/0x460
      22150.656712:   <2>  usb_disable_device+0x120/0x390
      22150.660918:   <2>  usb_disconnect+0xf4/0x2e0
      22150.664684:   <2>  hub_event+0xd70/0x17e8
      22150.668197:   <2>  process_one_work+0x210/0x480
      22150.672222:   <2>  worker_thread+0x32c/0x4c8
      
      Fix this by setting a new ICQ_DESTROYED flag in ioc_destroy_icq() to
      mark the icq as already destroyed. Also ensure that
      __ioc_clear_queue() accesses the icq within rcu_read_lock/unlock so
      that the icq is not freed while it is still in use.
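
The effect of a destroyed flag can be sketched as an idempotent teardown routine. This is a user-space model under stated assumptions: `model_icq`, `model_destroy_icq`, and the bit value of `MODEL_ICQ_DESTROYED` are hypothetical, and the RCU protection mentioned above is not modeled.

```c
#include <stdbool.h>

#define MODEL_ICQ_DESTROYED (1u << 2) /* hypothetical bit value */

struct model_icq {
    unsigned int flags;
    int teardown_count; /* how many times the unlink step actually ran */
};

/* Returns true if this call performed the teardown, false if it was a repeat. */
bool model_destroy_icq(struct model_icq *icq)
{
    /* A second caller that got hold of the same icq backs off here. */
    if (icq->flags & MODEL_ICQ_DESTROYED)
        return false;

    /* The list unlinking and deferred free would happen here. */
    icq->teardown_count++;

    /* Mark the icq so that later callers skip the teardown. */
    icq->flags |= MODEL_ICQ_DESTROYED;
    return true;
}
```

With the flag in place, the second context in the race diagram would return early instead of unlinking nodes whose memory is already reused.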
      Signed-off-by: Sahitya Tummala <stummala@codeaurora.org>
      Co-developed-by: Pradeep P V K <ppvk@codeaurora.org>
      Signed-off-by: Pradeep P V K <ppvk@codeaurora.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • block: keep bdi->io_pages in sync with max_sectors_kb for stacked devices · dc938b41
      Authored by Konstantin Khlebnikov
      task #28557799
      
      [ Upstream commit e74d93e96d721c4297f2a900ad0191890d2fc2b0 ]
      
      The bdi->io_pages field, added in commit 9491ae4a ("mm: don't cap request
      size based on read-ahead setting"), removes unneeded splitting of read requests.
      
      Stacked drivers do not call blk_queue_max_hw_sectors(). Instead they set
      the limits of their devices with blk_set_stacking_limits() + disk_stack_limits().
      As a result, bdi->io_pages stays zero until the user sets max_sectors_kb via sysfs.
      
      This patch updates io_pages after merging limits in disk_stack_limits().
      
      Commit c6d6e9b0f6b4 ("dm: do not allow readahead to limit IO size") fixed
      the same problem for device-mapper devices; this one fixes MD RAIDs.
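
As a rough illustration of keeping io_pages in sync with the sector limit, the conversion from 512-byte sectors to pages can be written as a shift. This is a sketch, assuming the update derives the page count directly from max_sectors; `model_io_pages` and its parameters are hypothetical names.

```c
/*
 * Convert a limit expressed in 512-byte sectors into a page count.
 * A sector is 2^9 bytes and a page is 2^page_shift bytes, so one page
 * holds 2^(page_shift - 9) sectors. page_shift must be at least 9.
 */
unsigned long model_io_pages(unsigned int max_sectors, unsigned int page_shift)
{
    return (unsigned long)max_sectors >> (page_shift - 9);
}
```

For example, a 2560-sector limit corresponds to 320 pages with 4 KiB pages (shift 12) and to 20 pages with 64 KiB pages (shift 16).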
      
      Fixes: 9491ae4a ("mm: don't cap request size based on read-ahead setting")
      Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group() · 8cf52b47
      Authored by Carlo Nonato
      task #28557799
      
      [ Upstream commit 14afc59361976c0ba39e3a9589c3eaa43ebc7e1d ]
      
      The bfq_find_set_group() function takes as input a blkcg (which represents
      a cgroup) and retrieves the corresponding bfq_group, then it updates the
      bfq internal group hierarchy (see comments inside the function for why
      this is needed) and finally it returns the bfq_group.
      In the hierarchy update cycle, the pointer holding the correct bfq_group
      that has to be returned is mistakenly used to traverse the hierarchy
      bottom to top, meaning that in each iteration it gets overwritten with the
      parent of the current group. Since the update cycle stops at root's
      children (depth = 2), the overwrite becomes a problem only if the blkcg
      describes a cgroup at a hierarchy level deeper than that (depth > 2). In
      this case the root's child that happens to be also an ancestor of the
      correct bfq_group is returned. The main consequence is that processes
      contained in a cgroup at depth greater than 2 are wrongly placed in the
      group described above by BFQ.
      
      This commit fixes this problem by using a different bfq_group pointer in
      the update cycle in order to avoid the overwrite of the variable holding
      the original group reference.
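
The pointer overwrite can be reproduced in a small user-space model of a group hierarchy. This is a sketch only: `model_group` and both functions are hypothetical names, and the "linked" field merely stands in for the hierarchy update done during the walk.

```c
#include <stddef.h>

struct model_group {
    struct model_group *parent; /* NULL for the root */
    int linked;                 /* set when the update cycle visits the group */
};

/* Buggy variant: the result pointer itself is used to walk upward. */
struct model_group *model_find_set_group_buggy(struct model_group *grp,
                                               struct model_group *root)
{
    while (grp->parent != NULL && grp->parent != root) {
        grp->linked = 1;
        grp = grp->parent; /* overwrites the group that should be returned */
    }
    grp->linked = 1;
    return grp; /* ends up being the root's child on the path */
}

/* Fixed variant: a separate cursor walks upward, the result is preserved. */
struct model_group *model_find_set_group(struct model_group *grp,
                                         struct model_group *root)
{
    struct model_group *cursor = grp;

    while (cursor != NULL && cursor != root) {
        cursor->linked = 1;
        cursor = cursor->parent;
    }
    return grp;
}
```

For a chain root, a, b, c, the buggy variant called on c returns a, while the fixed variant returns c. For a group directly under the root both variants agree, which matches the observation that the problem only appears at greater depth.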
      Reported-by: Kwon Je Oh <kwonje.oh2@gmail.com>
      Signed-off-by: Carlo Nonato <carlo.nonato95@gmail.com>
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • block: fix an integer overflow in logical block size · 8b05616d
      Authored by Mikulas Patocka
      task #28557799
      
      commit ad6bf88a6c19a39fb3b0045d78ea880325dfcf15 upstream.
      
      Logical block size has type unsigned short. That means that it can be at
      most 32768. However, there are architectures that can run with 64k pages
      (for example arm64) and on these architectures, it may be possible to
      create block devices with 64k block size.
      
      For example (run this on an architecture with 64k pages):
      
      Mount will fail with this error because it tries to read the superblock using 2-sector
      access:
        device-mapper: writecache: I/O is not aligned, sector 2, size 1024, block size 65536
        EXT4-fs (dm-0): unable to read superblock
      
      This patch changes the logical block size from unsigned short to unsigned
      int to avoid the overflow.
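
The overflow itself is easy to demonstrate in user space. The sketch below uses fixed-width types to stand in for unsigned short and unsigned int (16 and 32 bits on the platforms in question); the function names are hypothetical.

```c
#include <stdint.h>

/* Store a block size in a 16-bit field, as the old type did. */
uint16_t model_store_lbs_u16(uint32_t bytes)
{
    /* Conversion to an unsigned type wraps modulo 2^16. */
    return (uint16_t)bytes;
}

/* Store a block size in a 32-bit field, as the changed type does. */
uint32_t model_store_lbs_u32(uint32_t bytes)
{
    return bytes;
}
```

A 65536-byte block size wraps to 0 in the 16-bit field, while 32768, the largest power of two that fits, is stored unchanged. The 32-bit field keeps 65536 intact.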
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
    • block: fix memleak when __blk_rq_map_user_iov() is failed · a15ce925
      Authored by Yang Yingliang
      task #28557799
      
      [ Upstream commit 3b7995a98ad76da5597b488fa84aa5a56d43b608 ]
      
      While running a fuzz test, I got the following memleak report:
      
      BUG: memory leak
      unreferenced object 0xffff88837af80000 (size 4096):
        comm "memleak", pid 3557, jiffies 4294817681 (age 112.499s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          20 00 00 00 10 01 00 00 00 00 00 00 01 00 00 00   ...............
        backtrace:
          [<000000001c894df8>] bio_alloc_bioset+0x393/0x590
          [<000000008b139a3c>] bio_copy_user_iov+0x300/0xcd0
          [<00000000a998bd8c>] blk_rq_map_user_iov+0x2f1/0x5f0
          [<000000005ceb7f05>] blk_rq_map_user+0xf2/0x160
          [<000000006454da92>] sg_common_write.isra.21+0x1094/0x1870
          [<00000000064bb208>] sg_write.part.25+0x5d9/0x950
          [<000000004fc670f6>] sg_write+0x5f/0x8c
          [<00000000b0d05c7b>] __vfs_write+0x7c/0x100
          [<000000008e177714>] vfs_write+0x1c3/0x500
          [<0000000087d23f34>] ksys_write+0xf9/0x200
          [<000000002c8dbc9d>] do_syscall_64+0x9f/0x4f0
          [<00000000678d8e9a>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      If __blk_rq_map_user_iov() fails in blk_rq_map_user_iov(), the
      bio(s) allocated before the failure leak. The refcount of each
      bio is initialized to 1 and raised to 2 by bio_get(), but
      __blk_rq_unmap_user() only drops it to 1, so the bio is never
      freed. Fix it by calling blk_rq_unmap_user().
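
The reference counting described above can be traced with a small model. This is a sketch under the assumptions stated in the commit message: `model_bio` and the helper names are hypothetical, and "freed" simply means the count reached zero.

```c
struct model_bio {
    int refcount;
};

void model_bio_init(struct model_bio *bio) { bio->refcount = 1; }
void model_bio_get(struct model_bio *bio)  { bio->refcount++; }
void model_bio_put(struct model_bio *bio)  { bio->refcount--; }

/* Stands in for the inner unmap helper: drops a single reference. */
void model_unmap_inner(struct model_bio *bio)
{
    model_bio_put(bio);
}

/* Stands in for the outer unmap helper: inner unmap plus the extra put. */
void model_unmap_outer(struct model_bio *bio)
{
    model_unmap_inner(bio);
    model_bio_put(bio);
}
```

After init and get the count is 2. The inner helper alone leaves it at 1, which is the leak, while the outer helper brings it to 0.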
      Reviewed-by: Bob Liu <bob.liu@oracle.com>
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
      Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  4. 11 Jun, 2020 1 commit
    • alinux: blk-mq: remove QUEUE_FLAG_POLL from default MQ flags · 294d5fb2
      Authored by Joseph Qi
      fix #28528017
      
      For a virtio-blk device, /sys/block/<device>/queue/io_poll reads 1
      and the user cannot disable it. virtio-blk does not actually support
      polling yet, so this confuses end users. The root cause is that mq
      initialization sets the QUEUE_FLAG_POLL bit by default.
      
      This fix takes ideas from the following upstream commits:
      6544d229bf43 ("block: enable polling by default if a poll map is initalized")
      6e0de61107f0 ("blk-mq: remove QUEUE_FLAG_POLL from default MQ flags")
      Since we do not want to pull in the HCTX_TYPE_POLL related logic, we
      just check mq_ops->poll and then set QUEUE_FLAG_POLL.
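
The idea of setting the flag only when a poll callback exists can be sketched as follows. This is a user-space model: `model_mq_ops`, `model_queue_state`, `model_init_queue`, and the flag's bit value are hypothetical and do not reproduce the kernel structures.

```c
#include <stddef.h>

#define MODEL_QUEUE_FLAG_POLL (1ul << 0) /* hypothetical bit value */

struct model_mq_ops {
    int (*poll)(void); /* NULL when the driver has no poll support */
};

struct model_queue_state {
    unsigned long flags;
    const struct model_mq_ops *ops;
};

/* Example poll callback for a driver that does support polling. */
int model_poll_noop(void)
{
    return 0;
}

/* Initialize a queue; polling is enabled only if the driver provides poll. */
void model_init_queue(struct model_queue_state *q, const struct model_mq_ops *ops)
{
    q->ops = ops;
    q->flags = 0; /* the poll flag is no longer part of the defaults */
    if (ops->poll != NULL)
        q->flags |= MODEL_QUEUE_FLAG_POLL;
}
```

A driver without a poll callback, such as the virtio-blk case described above, ends up with the flag cleared in this model.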
      Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Reviewed-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
  5. 09 Jun, 2020 2 commits
  6. 07 May, 2020 1 commit
  7. 25 Mar, 2020 1 commit
    • alinux: blk-mq: fix broken io_ticks & time_in_queue update · a9ee8ebe
      Authored by Xiaoguang Wang
      fix #25369772
      
      On blk-mq devices, we observed that even when IOPS is low, iostat
      shows very high svctm and util values, which is counter-intuitive.
      
      The root cause is that blk_account_io_start() calls part_round_stats()
      before the "rq->part = part" statement, so part_round_stats() counts
      an inflight request against the whole device but not against the
      specific partition, and then updates the whole device's io_ticks and
      time_in_queue with a stale part->stamp.
      
      To fix this issue, if a request's part is NULL, we just don't count
      it as an inflight request to the whole device.
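
The counting rule can be sketched with a simplified inflight counter. The source does not show the actual change, so this is only a model of the stated rule: `model_part`, `model_request`, and `model_count_inflight` are hypothetical names.

```c
#include <stddef.h>

struct model_part {
    int id;
};

struct model_request {
    const struct model_part *part; /* NULL until accounting has assigned it */
};

/* Count inflight requests for the whole device, skipping unassigned ones. */
unsigned int model_count_inflight(const struct model_request *rqs, size_t n)
{
    unsigned int inflight = 0;

    for (size_t i = 0; i < n; i++) {
        /* A request whose part is still NULL is not counted yet. */
        if (rqs[i].part == NULL)
            continue;
        inflight++;
    }
    return inflight;
}
```

In this model a request that has been started but not yet attributed to a partition does not inflate the device-wide count.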
      Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
  8. 18 Mar, 2020 9 commits