1. 20 6月, 2018 2 次提交
    • B
      Revert "block: Add warning for bi_next not NULL in bio_endio()" · 9c24c10a
      Bart Van Assche 提交于
      Commit 0ba99ca4 ("block: Add warning for bi_next not NULL in
      bio_endio()") breaks the dm driver. end_clone_bio() detects whether
      or not a bio is the last bio associated with a request by checking
      the .bi_next field. Commit 0ba99ca4 clears that field before
      end_clone_bio() has had a chance to inspect that field. Hence revert
      commit 0ba99ca4.
      
      This patch avoids that KASAN reports the following complaint when
      running the srp-test software (srp-test/run_tests -c -d -r 10 -t 02-mq):
      
      ==================================================================
      BUG: KASAN: use-after-free in bio_advance+0x11b/0x1d0
      Read of size 4 at addr ffff8801300e06d0 by task ksoftirqd/0/9
      
      CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 4.18.0-rc1-dbg+ #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
      Call Trace:
       dump_stack+0xa4/0xf5
       print_address_description+0x6f/0x270
       kasan_report+0x241/0x360
       __asan_load4+0x78/0x80
       bio_advance+0x11b/0x1d0
       blk_update_request+0xa7/0x5b0
       scsi_end_request+0x56/0x320 [scsi_mod]
       scsi_io_completion+0x7d6/0xb20 [scsi_mod]
       scsi_finish_command+0x1c0/0x280 [scsi_mod]
       scsi_softirq_done+0x19a/0x230 [scsi_mod]
       blk_mq_complete_request+0x160/0x240
       scsi_mq_done+0x50/0x1a0 [scsi_mod]
       srp_recv_done+0x515/0x1330 [ib_srp]
       __ib_process_cq+0xa0/0xf0 [ib_core]
       ib_poll_handler+0x38/0xa0 [ib_core]
       irq_poll_softirq+0xe8/0x1f0
       __do_softirq+0x128/0x60d
       run_ksoftirqd+0x3f/0x60
       smpboot_thread_fn+0x352/0x460
       kthread+0x1c1/0x1e0
       ret_from_fork+0x24/0x30
      
      Allocated by task 1918:
       save_stack+0x43/0xd0
       kasan_kmalloc+0xad/0xe0
       kasan_slab_alloc+0x11/0x20
       kmem_cache_alloc+0xfe/0x350
       mempool_alloc_slab+0x15/0x20
       mempool_alloc+0xfb/0x270
       bio_alloc_bioset+0x244/0x350
       submit_bh_wbc+0x9c/0x2f0
       __block_write_full_page+0x299/0x5a0
       block_write_full_page+0x16b/0x180
       blkdev_writepage+0x18/0x20
       __writepage+0x42/0x80
       write_cache_pages+0x376/0x8a0
       generic_writepages+0xbe/0x110
       blkdev_writepages+0xe/0x10
       do_writepages+0x9b/0x180
       __filemap_fdatawrite_range+0x178/0x1c0
       file_write_and_wait_range+0x59/0xc0
       blkdev_fsync+0x46/0x80
       vfs_fsync_range+0x66/0x100
       do_fsync+0x3d/0x70
       __x64_sys_fsync+0x21/0x30
       do_syscall_64+0x77/0x230
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 9:
       save_stack+0x43/0xd0
       __kasan_slab_free+0x137/0x190
       kasan_slab_free+0xe/0x10
       kmem_cache_free+0xd3/0x380
       mempool_free_slab+0x17/0x20
       mempool_free+0x63/0x160
       bio_free+0x81/0xa0
       bio_put+0x59/0x60
       end_bio_bh_io_sync+0x5d/0x70
       bio_endio+0x1a7/0x360
       blk_update_request+0xd0/0x5b0
       end_clone_bio+0xa3/0xd0 [dm_mod]
       bio_endio+0x1a7/0x360
       blk_update_request+0xd0/0x5b0
       scsi_end_request+0x56/0x320 [scsi_mod]
       scsi_io_completion+0x7d6/0xb20 [scsi_mod]
       scsi_finish_command+0x1c0/0x280 [scsi_mod]
       scsi_softirq_done+0x19a/0x230 [scsi_mod]
       blk_mq_complete_request+0x160/0x240
       scsi_mq_done+0x50/0x1a0 [scsi_mod]
       srp_recv_done+0x515/0x1330 [ib_srp]
       __ib_process_cq+0xa0/0xf0 [ib_core]
       ib_poll_handler+0x38/0xa0 [ib_core]
       irq_poll_softirq+0xe8/0x1f0
       __do_softirq+0x128/0x60d
      
      The buggy address belongs to the object at ffff8801300e0640
       which belongs to the cache bio-0 of size 200
      The buggy address is located 144 bytes inside of
       200-byte region [ffff8801300e0640, ffff8801300e0708)
      The buggy address belongs to the page:
      page:ffffea0004c03800 count:1 mapcount:0 mapping:ffff88015a563a00 index:0x0 compound_mapcount: 0
      flags: 0x8000000000008100(slab|head)
      raw: 8000000000008100 dead000000000100 dead000000000200 ffff88015a563a00
      raw: 0000000000000000 0000000000330033 00000001ffffffff 0000000000000000
      page dumped because: kasan: bad access detected
      
      Memory state around the buggy address:
       ffff8801300e0580: fb fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc
       ffff8801300e0600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
      >ffff8801300e0680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                       ^
       ffff8801300e0700: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff8801300e0780: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      ==================================================================
      
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Fixes: 0ba99ca4 ("block: Add warning for bi_next not NULL in bio_endio()")
      Acked-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9c24c10a
    • C
      block: fix timeout changes for legacy request drivers · 0cc61e64
      Christoph Hellwig 提交于
      blk_mq_complete_request can only be called for blk-mq drivers, but when
      removing the BLK_EH_HANDLED return value, two legacy request timeout
      methods incorrectly got switched to call blk_mq_complete_request.
      Call __blk_complete_request instead to reinstance the previous behavior.
      For that __blk_complete_request needs to be exported.
      
      Fixes: 1fc2b62e ("scsi_transport_fc: complete requests from ->timeout")
      Fixes: 0df0bb08 ("null_blk: complete requests from ->timeout")
      Reported-by: NJianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0cc61e64
  2. 15 6月, 2018 2 次提交
  3. 14 6月, 2018 2 次提交
  4. 11 6月, 2018 1 次提交
    • R
      blk-mq: reinit q->tag_set_list entry only after grace period · a347c7ad
      Roman Pen 提交于
      It is not allowed to reinit q->tag_set_list list entry while RCU grace
      period has not completed yet, otherwise the following soft lockup in
      blk_mq_sched_restart() happens:
      
      [ 1064.252652] watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [fio:9270]
      [ 1064.254445] task: ffff99b912e8b900 task.stack: ffffa6d54c758000
      [ 1064.254613] RIP: 0010:blk_mq_sched_restart+0x96/0x150
      [ 1064.256510] Call Trace:
      [ 1064.256664]  <IRQ>
      [ 1064.256824]  blk_mq_free_request+0xea/0x100
      [ 1064.256987]  msg_io_conf+0x59/0xd0 [ibnbd_client]
      [ 1064.257175]  complete_rdma_req+0xf2/0x230 [ibtrs_client]
      [ 1064.257340]  ? ibtrs_post_recv_empty+0x4d/0x70 [ibtrs_core]
      [ 1064.257502]  ibtrs_clt_rdma_done+0xd1/0x1e0 [ibtrs_client]
      [ 1064.257669]  ib_create_qp+0x321/0x380 [ib_core]
      [ 1064.257841]  ib_process_cq_direct+0xbd/0x120 [ib_core]
      [ 1064.258007]  irq_poll_softirq+0xb7/0xe0
      [ 1064.258165]  __do_softirq+0x106/0x2a2
      [ 1064.258328]  irq_exit+0x92/0xa0
      [ 1064.258509]  do_IRQ+0x4a/0xd0
      [ 1064.258660]  common_interrupt+0x7a/0x7a
      [ 1064.258818]  </IRQ>
      
      Meanwhile another context frees other queue but with the same set of
      shared tags:
      
      [ 1288.201183] INFO: task bash:5910 blocked for more than 180 seconds.
      [ 1288.201833] bash            D    0  5910   5820 0x00000000
      [ 1288.202016] Call Trace:
      [ 1288.202315]  schedule+0x32/0x80
      [ 1288.202462]  schedule_timeout+0x1e5/0x380
      [ 1288.203838]  wait_for_completion+0xb0/0x120
      [ 1288.204137]  __wait_rcu_gp+0x125/0x160
      [ 1288.204287]  synchronize_sched+0x6e/0x80
      [ 1288.204770]  blk_mq_free_queue+0x74/0xe0
      [ 1288.204922]  blk_cleanup_queue+0xc7/0x110
      [ 1288.205073]  ibnbd_clt_unmap_device+0x1bc/0x280 [ibnbd_client]
      [ 1288.205389]  ibnbd_clt_unmap_dev_store+0x169/0x1f0 [ibnbd_client]
      [ 1288.205548]  kernfs_fop_write+0x109/0x180
      [ 1288.206328]  vfs_write+0xb3/0x1a0
      [ 1288.206476]  SyS_write+0x52/0xc0
      [ 1288.206624]  do_syscall_64+0x68/0x1d0
      [ 1288.206774]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
      
      What happened is the following:
      
      1. There are several MQ queues with shared tags.
      2. One queue is about to be freed and now task is in
         blk_mq_del_queue_tag_set().
      3. Other CPU is in blk_mq_sched_restart() and loops over all queues in
         tag list in order to find hctx to restart.
      
      Because linked list entry was modified in blk_mq_del_queue_tag_set()
      without proper waiting for a grace period, blk_mq_sched_restart()
      never ends, spining in list_for_each_entry_rcu_rr(), thus soft lockup.
      
      Fix is simple: reinit list entry after an RCU grace period elapsed.
      
      Fixes: Fixes: 705cda97 ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list")
      Cc: stable@vger.kernel.org
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: linux-block@vger.kernel.org
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMing Lei <ming.lei@redhat.com>
      Reviewed-by: NBart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: NRoman Pen <roman.penyaev@profitbricks.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      a347c7ad
  5. 09 6月, 2018 1 次提交
  6. 08 6月, 2018 1 次提交
  7. 07 6月, 2018 1 次提交
  8. 06 6月, 2018 1 次提交
  9. 05 6月, 2018 2 次提交
  10. 03 6月, 2018 2 次提交
    • M
      blk-mq: update nr_requests when switching to 'none' scheduler · 32a50fab
      Ming Lei 提交于
      Now we setup q->nr_requests when switching to one new scheduler,
      but not do it for 'none', then q->nr_requests may not be correct
      for 'none'.
      
      This patch fixes this issue by always updating 'nr_requests' when
      switching to 'none'.
      
      Cc: Marco Patalano <mpatalan@redhat.com>
      Cc: "Ewan D. Milne" <emilne@redhat.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      32a50fab
    • J
      block: don't use blocking queue entered for recursive bio submits · cd4a4ae4
      Jens Axboe 提交于
      If we end up splitting a bio and the queue goes away between
      the initial submission and the later split submission, then we
      can block forever in blk_queue_enter() waiting for the reference
      to drop to zero. This will never happen, since we already hold
      a reference.
      
      Mark a split bio as already having entered the queue, so we can
      just use the live non-blocking queue enter variant.
      
      Thanks to Tetsuo Handa for the analysis.
      
      Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      cd4a4ae4
  11. 02 6月, 2018 1 次提交
  12. 01 6月, 2018 5 次提交
  13. 31 5月, 2018 16 次提交
  14. 30 5月, 2018 1 次提交
  15. 29 5月, 2018 2 次提交