1. 21 Mar 2019 (2 commits)
  2. 18 Mar 2019 (1 commit)
  3. 01 Mar 2019 (1 commit)
  4. 15 Feb 2019 (1 commit)
    • block: kill QUEUE_FLAG_NO_SG_MERGE · 2705c937
      Authored by Ming Lei
      Since bdced438 ("block: setup bi_phys_segments after splitting"), the
      physical segment count is mainly computed in blk_queue_split() on the fast
      path, which is also where the BIO_SEG_VALID flag is set.
      
      Now only blk_recount_segments() and blk_recalc_rq_segments() use this
      flag.
      
      blk_recount_segments() is bypassed on the fast path, since BIO_SEG_VALID
      has already been set in blk_queue_split().
      
      As for the other user of the flag, blk_recalc_rq_segments():
      
      - it runs in the partial-completion branch of blk_update_request(), which is
      an unusual case
      
      - it runs in blk_cloned_rq_check_limits(), which is still not a big problem
      if the flag is killed, since dm-rq is the only user.
      
      Multi-page bvec is enabled now, so skipping S/G merging is rather pointless
      with the current setup of the I/O path: it is not going to save a significant
      number of cycles (the fast-path bypass is sketched after this entry).
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
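      A minimal sketch of the fast-path bypass described above; maybe_recount_segments()
      is a hypothetical wrapper added here for illustration, while bio_flagged(),
      blk_recount_segments() and BIO_SEG_VALID are the kernel names the message refers to:
      
      /*
       * Hedged sketch, not the actual patch: blk_queue_split() has already
       * computed bi_phys_segments and set BIO_SEG_VALID, so the recount is
       * skipped on the fast path and only runs on the slow path.
       */
      static void maybe_recount_segments(struct request_queue *q, struct bio *bio)
      {
              if (bio_flagged(bio, BIO_SEG_VALID))
                      return;                 /* fast path: already counted */
              blk_recount_segments(q, bio);   /* slow path only */
      }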
  5. 12 Feb 2019 (1 commit)
    • blk-mq: insert rq with DONTPREP to hctx dispatch list when requeue · aef1897c
      Authored by Jianchao Wang
      When a request is requeued with RQF_DONTPREP set, it already contains
      driver-specific data, so insert it into the hctx dispatch list to avoid any
      merge. Taking SCSI as an example, here is the trace event log (no I/O
      scheduler, because RQF_STARTED would otherwise prevent merging):
      
         kworker/0:1H-339   [000] ...1  2037.209289: block_rq_insert: 8,0 R 4096 () 32768 + 8 [kworker/0:1H]
      scsi_inert_test-1987  [000] ....  2037.220465: block_bio_queue: 8,0 R 32776 + 8 [scsi_inert_test]
      scsi_inert_test-1987  [000] ...2  2037.220466: block_bio_backmerge: 8,0 R 32776 + 8 [scsi_inert_test]
         kworker/0:1H-339   [000] ....  2047.220913: block_rq_issue: 8,0 R 8192 () 32768 + 16 [kworker/0:1H]
      scsi_inert_test-1996  [000] ..s1  2047.221007: block_rq_complete: 8,0 R () 32768 + 8 [0]
      scsi_inert_test-1996  [000] .Ns1  2047.221045: block_rq_requeue: 8,0 R () 32776 + 8 [0]
         kworker/0:1H-339   [000] ...1  2047.221054: block_rq_insert: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
         kworker/0:1H-339   [000] ...1  2047.221056: block_rq_issue: 8,0 R 4096 () 32776 + 8 [kworker/0:1H]
      scsi_inert_test-1986  [000] ..s1  2047.221119: block_rq_complete: 8,0 R () 32776 + 8 [0]
      
      (32768 + 8) was requeued by scsi_queue_insert() and had RQF_DONTPREP set.
      It was then merged with (32776 + 8) and issued. Due to RQF_DONTPREP, the sdb
      only covered the (32768 + 8) part, so only that part was completed. The lucky
      thing was that scsi_io_completion() detected this and requeued the remaining
      part, so we didn't get corrupted data. However, the requeue of (32776 + 8) is
      not expected (the requeue-path change is sketched after this entry).
      Suggested-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jianchao Wang <jianchao.w.wang@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
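      A hedged sketch of the core of the fix, simplified from the 5.0-era requeue
      worker; blk_mq_request_bypass_insert() and blk_mq_sched_insert_request() are
      real helpers of that era, but the surrounding requeue loop is omitted:
      
      /*
       * A request that still carries driver-prepped data (RQF_DONTPREP) must
       * not be merged again, so park it on the hctx dispatch list instead of
       * sending it back through the scheduler/sw-queue insert path.
       */
      if (rq->rq_flags & RQF_DONTPREP)
              blk_mq_request_bypass_insert(rq, false);        /* no merging here */
      else
              blk_mq_sched_insert_request(rq, true, false, false);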
  6. 09 Feb 2019 (1 commit)
  7. 06 Feb 2019 (2 commits)
  8. 01 Feb 2019 (2 commits)
  9. 16 Jan 2019 (1 commit)
  10. 19 Dec 2018 (1 commit)
  11. 18 Dec 2018 (4 commits)
    • blk-mq: enable IO poll if .nr_queues of type poll > 0 · cd19181b
      Authored by Ming Lei
      The queue mapping of type poll only exists when
      set->map[HCTX_TYPE_POLL].nr_queues is greater than zero, so tighten the
      constraint by checking .nr_queues of the poll type before enabling IO poll
      (see the sketch after this entry).
      
      Otherwise an IO race and timeouts can be observed when running block/007.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
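      A sketch of the tightened constraint, assuming the 5.0-era queue-map layout
      (set->map[HCTX_TYPE_POLL]) and the blk_queue_flag_set() helper:
      
      /* Only advertise poll support when a poll map exists and has queues. */
      if (set->nr_maps > HCTX_TYPE_POLL &&
          set->map[HCTX_TYPE_POLL].nr_queues)
              blk_queue_flag_set(QUEUE_FLAG_POLL, q);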
    • blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight() · 3c94d83c
      Authored by Jens Axboe
      There's a single user of this function, dm, and dm just wants to check
      whether IO is in flight, not whether it has merely been allocated.
      
      This fixes a hang with srp/002 in blktests with dm, where it tries to
      suspend but first waits for in-flight IO to finish. Since the old check also
      counted merely allocated requests, that wait never completed (the renamed
      helper is sketched after this entry).
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
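      The shape of the renamed helper, close to the actual patch: iterate the busy
      tags and report true only for requests that have actually entered
      ->queue_rq() (MQ_RQ_IN_FLIGHT), not ones that are merely allocated:
      
      static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx, struct request *rq,
                                     void *priv, bool reserved)
      {
              /* Stop iterating on the first in-flight request for this queue. */
              if (rq->state == MQ_RQ_IN_FLIGHT && rq->q == hctx->queue) {
                      bool *busy = priv;
      
                      *busy = true;
                      return false;
              }
              return true;
      }
      
      bool blk_mq_queue_inflight(struct request_queue *q)
      {
              bool busy = false;
      
              blk_mq_queue_tag_busy_iter(q, blk_mq_rq_inflight, &busy);
              return busy;
      }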
    • blk-mq: skip zero-queue maps in blk_mq_map_swqueue · e5edd5f2
      Authored by Ming Lei
      Since 7e849dd9 ("nvme-pci: don't share queue maps"), the mapping table is
      not actually initialized if map->nr_queues is zero, so we can't use
      blk_mq_map_queue_type() to retrieve the hctx any more; doing so can still
      produce a broken mapping. Fix it by skipping zero-queue maps in
      blk_mq_map_swqueue(), as sketched after this entry.
      
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
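      A sketch of the fixed loop in blk_mq_map_swqueue(), assuming the 5.0-era
      per-type maps; the ctx-to-hctx linking is elided:
      
      for (j = 0; j < set->nr_maps; j++) {
              /* Maps with nr_queues == 0 were never initialized (7e849dd9). */
              if (!set->map[j].nr_queues)
                      continue;
              hctx = blk_mq_map_queue_type(q, j, i);
              /* ... link ctx into hctx as before ... */
      }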
    • blk-mq: fix dispatch from sw queue · c16d6b5a
      Authored by Ming Lei
      When a request is added to the rq list of a sw queue (ctx), it may come from
      a different type of hctx, especially now that multiple queue mappings have
      been introduced.
      
      So when dispatching requests from the sw queue via blk_mq_flush_busy_ctxs()
      or blk_mq_dequeue_from_ctx(), a request belonging to a different hctx queue
      type can be dispatched to the current hctx when a read or poll queue is
      enabled.
      
      This patch fixes the issue by introducing per-queue-type lists (see the
      sketch after this entry).
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Changed by me to not use separately cacheline-aligned lists; all of them are
      simply placed in the same cacheline where we previously had just the one
      list and lock.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
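      A sketch of the data-structure change, close to the actual patch: one
      sw-queue list per hctx type, all kept in the one cacheline that previously
      held the single list, per Jens' note above:
      
      struct blk_mq_ctx {
              struct {
                      spinlock_t              lock;
                      struct list_head        rq_lists[HCTX_MAX_TYPES];
              } ____cacheline_aligned_in_smp;
              /* ... remaining fields unchanged ... */
      };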
  12. 17 Dec 2018 (1 commit)
  13. 16 Dec 2018 (3 commits)
  14. 10 Dec 2018 (1 commit)
  15. 08 Dec 2018 (1 commit)
    • blk-mq: re-build queue map in case of kdump kernel · 59388702
      Authored by Ming Lei
      Now almost all .map_queues() implementations based on managed irq affinity
      don't update the queue mapping; they just retrieve the previously built
      mapping, so if nr_hw_queues is changed, the mapping table contains stale
      entries. Only blk_mq_map_queues() may rebuild the mapping table.
      
      One such case is that we limit .nr_hw_queues to 1 for a kdump kernel.
      However, drivers often build the queue mapping before allocating the tagset
      via pci_alloc_irq_vectors_affinity(), while set->nr_hw_queues may be set to
      1 for a kdump kernel, so a wrong queue mapping is used and a kernel
      panic [1] is observed during boot.
      
      This patch fixes the kernel panic triggered on nvme by rebuilding the
      mapping table via blk_mq_map_queues() (see the sketch after this entry).
      
      [1] kernel panic log
      [    4.438371] nvme nvme0: 16/0/0 default/read/poll queues
      [    4.443277] BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
      [    4.444681] PGD 0 P4D 0
      [    4.445367] Oops: 0000 [#1] SMP NOPTI
      [    4.446342] CPU: 3 PID: 201 Comm: kworker/u33:10 Not tainted 4.20.0-rc5-00664-g5eb02f7ee1eb-dirty #459
      [    4.447630] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-2.fc27 04/01/2014
      [    4.448689] Workqueue: nvme-wq nvme_scan_work [nvme_core]
      [    4.449368] RIP: 0010:blk_mq_map_swqueue+0xfb/0x222
      [    4.450596] Code: 04 f5 20 28 ef 81 48 89 c6 39 55 30 76 93 89 d0 48 c1 e0 04 48 03 83 f8 05 00 00 48 8b 00 42 8b 3c 28 48 8b 43 58 48 8b 04 f8 <48> 8b b8 98 00 00 00 4c 0f a3 37 72 42 f0 4c 0f ab 37 66 8b b8 f6
      [    4.453132] RSP: 0018:ffffc900023b3cd8 EFLAGS: 00010286
      [    4.454061] RAX: 0000000000000000 RBX: ffff888174448000 RCX: 0000000000000001
      [    4.456480] RDX: 0000000000000001 RSI: ffffe8feffc506c0 RDI: 0000000000000001
      [    4.458750] RBP: ffff88810722d008 R08: ffff88817647a880 R09: 0000000000000002
      [    4.464580] R10: ffffc900023b3c10 R11: 0000000000000004 R12: ffff888174448538
      [    4.467803] R13: 0000000000000004 R14: 0000000000000001 R15: 0000000000000001
      [    4.469220] FS:  0000000000000000(0000) GS:ffff88817bac0000(0000) knlGS:0000000000000000
      [    4.471554] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [    4.472464] CR2: 0000000000000098 CR3: 0000000174e4e001 CR4: 0000000000760ee0
      [    4.474264] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [    4.476007] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [    4.477061] PKRU: 55555554
      [    4.477464] Call Trace:
      [    4.478731]  blk_mq_init_allocated_queue+0x36a/0x3ad
      [    4.479595]  blk_mq_init_queue+0x32/0x4e
      [    4.480178]  nvme_validate_ns+0x98/0x623 [nvme_core]
      [    4.480963]  ? nvme_submit_sync_cmd+0x1b/0x20 [nvme_core]
      [    4.481685]  ? nvme_identify_ctrl.isra.8+0x70/0xa0 [nvme_core]
      [    4.482601]  nvme_scan_work+0x23a/0x29b [nvme_core]
      [    4.483269]  ? _raw_spin_unlock_irqrestore+0x25/0x38
      [    4.483930]  ? try_to_wake_up+0x38d/0x3b3
      [    4.484478]  ? process_one_work+0x179/0x2fc
      [    4.485118]  process_one_work+0x1d3/0x2fc
      [    4.485655]  ? rescuer_thread+0x2ae/0x2ae
      [    4.486196]  worker_thread+0x1e9/0x2be
      [    4.486841]  kthread+0x115/0x11d
      [    4.487294]  ? kthread_park+0x76/0x76
      [    4.487784]  ret_from_fork+0x3a/0x50
      [    4.488322] Modules linked in: nvme nvme_core qemu_fw_cfg virtio_scsi ip_tables
      [    4.489428] Dumping ftrace buffer:
      [    4.489939]    (ftrace buffer empty)
      [    4.490492] CR2: 0000000000000098
      [    4.491052] ---[ end trace 03cd268ad5a86ff7 ]---
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: linux-nvme@lists.infradead.org
      Cc: David Milburn <dmilburn@redhat.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
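      A hedged sketch of the fix in blk_mq_update_queue_map(): in a kdump kernel
      (where nr_hw_queues is forced to 1) the driver's prebuilt map cannot be
      trusted, so fall back to blk_mq_map_queues(); is_kdump_kernel() comes from
      <linux/crash_dump.h>:
      
      if (set->ops->map_queues && !is_kdump_kernel())
              return set->ops->map_queues(set);
      
      /* kdump (or no .map_queues): rebuild the table from scratch. */
      return blk_mq_map_queues(&set->map[HCTX_TYPE_DEFAULT]);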
  16. 07 Dec 2018 (1 commit)
    • blk-mq: punt failed direct issue to dispatch list · c616cbee
      Authored by Jens Axboe
      After the direct dispatch corruption fix, we permanently disallow direct
      dispatch of non read/write requests. This works fine off the normal IO
      path, as they will be retried like any other failed direct dispatch
      request. But for the blk_insert_cloned_request() that only DM uses to
      bypass the bottom level scheduler, we always first attempt direct
      dispatch. For some types of requests, that's now a permanent failure,
      and no amount of retrying will make that succeed. This results in a
      livelock.
      
      Instead of making special cases for what we can direct issue, and now
      having to deal with DM solving the livelock while still retaining a BUSY
      condition feedback loop, always just add a request that has been through
      ->queue_rq() to the hardware queue dispatch list. These are safe to use
      as no merging can take place there. Additionally, if requests do have
      prepped data from drivers, we aren't dependent on them not sharing space
      in the request structure to safely add them to the IO scheduler lists.
      
      This basically reverts ffe81d45 and is based on a patch from Ming, but with
      the list-insert case covered as well (the punt path is sketched after this
      entry).
      
      Fixes: ffe81d45 ("blk-mq: fix corruption with direct issue")
      Cc: stable@vger.kernel.org
      Suggested-by: Ming Lei <ming.lei@redhat.com>
      Reported-by: Bart Van Assche <bvanassche@acm.org>
      Tested-by: Ming Lei <ming.lei@redhat.com>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
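      A sketch of the punt path, simplified from the 4.20-era
      blk_mq_try_issue_directly(); the exact signature of
      __blk_mq_try_issue_directly() is assumed:
      
      ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
      if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
              /*
               * rq has been through ->queue_rq(): no merging can happen on
               * the hctx dispatch list, so it is always safe to park it there.
               */
              blk_mq_request_bypass_insert(rq, true);
      else if (ret != BLK_STS_OK)
              blk_mq_end_request(rq, ret);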
  17. 05 Dec 2018 (3 commits)
  18. 04 Dec 2018 (1 commit)
  19. 30 Nov 2018 (4 commits)
  20. 29 Nov 2018 (1 commit)
  21. 28 Nov 2018 (1 commit)
  22. 27 Nov 2018 (3 commits)
  23. 26 Nov 2018 (3 commits)