- 05 Dec 2020, 2 commits
Committed by Christoph Hellwig

The request_queue can trivially be derived from the request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Christoph Hellwig

The block_bio_merge tracepoint class can be reused for most bio-based
tracepoints. For that it just needs to lose the superfluous q and rq
parameters.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 03 Dec 2020, 1 commit
Committed by Jeffle Xu

The inflight count of partition 0 doesn't include inflight IOs to all
sub-partitions, since blk-mq currently calculates the inflight count of
a specific partition by simply comparing the partition pointer. Thus
the following case is possible:

    $ cat /sys/block/vda/inflight
    0       0
    $ cat /sys/block/vda/vda1/inflight
    0       128

A single-queue device (on an older kernel, e.g. v3.10) does not have
this issue:

    $ cat /sys/block/sda/sda3/inflight
    0       33
    $ cat /sys/block/sda/inflight
    0       33

Partition 0 should be handled specially since it represents the whole
disk. This issue has existed since commit bf0ddaba ("blk-mq: fix sysfs
inflight counter").

Besides, this patch also fixes the inflight statistics of part 0 in
/proc/diskstats. Before this patch, the inflight statistics of part 0
don't include those of the sub-partitions. (The 'inflight' field is
marked with asterisks.)

    $ cat /proc/diskstats
    259 0 nvme0n1 45974469 0 367814768 6445794 1 0 1 0 *0* 111062 6445794 0 0 0 0 0 0
    259 2 nvme0n1p1 45974058 0 367797952 6445727 0 0 0 0 *33* 111001 6445727 0 0 0 0 0 0

This has existed since commit f299b7c7 ("blk-mq: provide internal
in-flight variant").

Fixes: bf0ddaba ("blk-mq: fix sysfs inflight counter")
Fixes: f299b7c7 ("blk-mq: provide internal in-flight variant")
Signed-off-by: Jeffle Xu <jefflexu@linux.alibaba.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
[axboe: adapt for 5.11 partition change]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
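As a hedged sketch of the idea (the helper name is illustrative; the
actual patch touches blk_mq_in_flight()'s iterator callback, and with
the 5.11 partition change a partition is a struct block_device):

    /*
     * Illustrative only: how the per-partition inflight test changes.
     * Before the fix a request counted toward a partition only when
     * rq->part == part, so part0 missed IO on its sub-partitions.
     */
    static bool rq_counts_for_part(struct request *rq,
                                   struct block_device *part)
    {
        if (bdev_is_partition(part))
            return rq->part == part;  /* exact sub-partition match */
        return true;                  /* part0 represents the whole disk */
    }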
-
- 02 Dec 2020, 1 commit
Committed by Christoph Hellwig

Use struct block_device to look up partitions on a disk. This removes
all usage of struct hd_struct from the I/O path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Coly Li <colyli@suse.de> [bcache]
Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 24 Oct 2020, 1 commit
Committed by Mauro Carvalho Chehab

Fix a typo: blk_mq_run_hw_queue -> blk_mq_run_hw_queues

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 20 Oct 2020, 1 commit
Committed by Xianting Tian

We don't need to check whether the node is a memoryless NUMA node
before calling the allocator interface. SLUB (and SLAB, SLOB) relies on
the page allocator to pick a node. The page allocator deals with
memoryless nodes just fine: it has zonelists constructed for every
possible node, and it automatically falls back to the node closest to
the requested one, as long as __GFP_THISNODE is not enforced. The code
comment in SLAB's kmem_cache_alloc_node() also notes this:

    * Fallback to other node is possible if __GFP_THISNODE is not set.

The blk-mq code doesn't set __GFP_THISNODE, so we can remove the call
to local_memory_node().

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
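A minimal before/after sketch of the simplification (the allocation
site shown is illustrative):

    /*
     * The page allocator's zonelists already make a memoryless node
     * fall back to the closest node with memory, so the explicit
     * remap through local_memory_node() is unnecessary.
     */

    /* before */
    hctx = kzalloc_node(sizeof(*hctx), GFP_KERNEL, local_memory_node(node));

    /* after */
    hctx = kzalloc_node(sizeof(*hctx), GFP_KERNEL, node);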
-
- 10 Oct 2020, 1 commit
Committed by Yufen Yu

We have already introduced the helper function blk_mq_hctx_stopped() to
test BLK_MQ_S_STOPPED; use it instead of open-coding the test.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
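The helper itself is a one-line test on the hctx state bits (as defined
in block/blk-mq.h):

    static inline bool blk_mq_hctx_stopped(struct blk_mq_hw_ctx *hctx)
    {
        return test_bit(BLK_MQ_S_STOPPED, &hctx->state);
    }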
-
- 08 Oct 2020, 1 commit
Committed by Mike Snitzer

It is unnecessary to force request-based DM to call into the bio-based
dm_submit_bio (via the indirect disk->fops->submit_bio) only to have it
then call blk_mq_submit_bio(). Fix this by establishing a request-based
DM block_device_operations (dm_rq_blk_dops, which doesn't have
.submit_bio) and updating dm_setup_md_queue() to set md->disk->fops to
it for DM_TYPE_REQUEST_BASED. Remove the DM_TYPE_REQUEST_BASED
conditional in dm_submit_bio and unexport blk_mq_submit_bio.

Fixes: c62b37d9 ("block: move ->make_request_fn to struct block_device_operations")
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
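A sketch of the resulting split (the .open/.release/.ioctl methods
shown are assumed to be shared with the bio-based ops; the point is the
absence of .submit_bio):

    /*
     * Request-based DM gets its own fops without .submit_bio, so bio
     * submission falls through to blk_mq_submit_bio() in the core
     * instead of bouncing through dm_submit_bio().
     */
    static const struct block_device_operations dm_rq_blk_dops = {
        .open    = dm_blk_open,
        .release = dm_blk_close,
        .ioctl   = dm_blk_ioctl,
        /* no .submit_bio: bios go straight to blk-mq */
        .owner   = THIS_MODULE,
    };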
-
- 07 Oct 2020, 1 commit
Committed by Gabriel Krisman Bertazi

According to Documentation/block/stat.rst, inflight should not include
I/O requests that are in the queue but not yet dispatched to the
device, but blk-mq identifies as inflight any request that has a tag
allocated, which, for queues without an elevator, happens at request
allocation time, before the request is queued in the ctx (the default
case in blk_mq_submit_bio). In addition, the current behavior differs
between queues with an elevator and queues without one, since for the
former the driver tag is allocated at dispatch time. A more precise
approach is to only consider requests with state MQ_RQ_IN_FLIGHT.

This effectively reverts commit 6131837b ("blk-mq: count allocated but
not started requests in iostats inflight") to consolidate blk-mq
behavior with itself (the elevator case) and with the original
documentation, but it differs from the behavior used by the legacy
path.

This version differs from v1 by using blk_mq_rq_state to access the
state attribute. blk_mq_request_started, which was suggested, is
avoided since we don't want to include MQ_RQ_COMPLETE.

Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com>
Cc: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
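The resulting check is essentially the following (iterator-callback
signature approximate; struct mq_inflight carries the target partition
and the counters):

    static bool blk_mq_check_inflight(struct blk_mq_hw_ctx *hctx,
                                      struct request *rq, void *priv,
                                      bool reserved)
    {
        struct mq_inflight *mi = priv;

        /*
         * Count only requests that were actually dispatched;
         * MQ_RQ_COMPLETE is deliberately excluded, which is why
         * blk_mq_request_started() is not used here.
         */
        if (rq->part == mi->part &&
            blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT)
            mi->inflight[rq_data_dir(rq)]++;

        return true;
    }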
-
- 06 Oct 2020, 1 commit
Committed by Eric Biggers

blk_crypto_rq_bio_prep() assumes its gfp_mask argument always includes
__GFP_DIRECT_RECLAIM, so that the mempool_alloc() will always succeed.
However, blk_crypto_rq_bio_prep() might be called with GFP_ATOMIC via
setup_clone() in drivers/md/dm-rq.c. This case isn't currently
reachable with a bio that actually has an encryption context; however,
it's fragile to rely on this. Just make blk_crypto_rq_bio_prep() able
to fail.

Suggested-by: Satya Tangirala <satyat@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Mike Snitzer <snitzer@redhat.com>
Reviewed-by: Satya Tangirala <satyat@google.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
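A sketch of the failure-tolerant shape (the pool name and surrounding
details are abbreviated assumptions):

    /*
     * Let the mempool allocation fail and propagate the error instead
     * of assuming gfp_mask always includes __GFP_DIRECT_RECLAIM.
     */
    int __blk_crypto_rq_bio_prep(struct request *rq, struct bio *bio,
                                 gfp_t gfp_mask)
    {
        if (!rq->crypt_ctx) {
            rq->crypt_ctx = mempool_alloc(bio_crypt_ctx_pool, gfp_mask);
            if (!rq->crypt_ctx)
                return -ENOMEM;  /* possible without __GFP_DIRECT_RECLAIM */
        }
        *rq->crypt_ctx = *bio->bi_crypt_context;
        return 0;
    }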
-
- 29 Sep 2020, 1 commit
Committed by yangerkun

blk-mq should call commit_rqs once 'bd.last != true' and no more
requests will come (so that virtscsi can kick the virtqueue, for
example). We already do that in
blk_mq_dispatch_rq_list/blk_mq_try_issue_list_directly while the list
is not empty and 'queued > 0'. However, the same situation arises when
the last request in the list goes through queue_rq and returns an error
such as BLK_STS_IOERR, which does not requeue the request: the list
ends up empty, but commit_rqs still needs to be called (otherwise
requests for virtscsi will sit until they time out, unless some other
request kicks the virtqueue). We found this problem by running fsstress
while repeatedly offlining/onlining a virtscsi device.

Fixes: d666ba98 ("blk-mq: add mq_ops->commit_rqs()")
Reported-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: yangerkun <yangerkun@huawei.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
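In sketch form, one way the dispatch tail can look (control flow
abbreviated from blk_mq_dispatch_rq_list()-style code):

    /*
     * Even when ->queue_rq() failed the last request and the local
     * list ended up empty, a driver implementing ->commit_rqs() still
     * needs the final kick, since earlier requests were issued with
     * bd.last == false.
     */
    if (list_empty(list) && queued > 0 && q->mq_ops->commit_rqs)
        q->mq_ops->commit_rqs(hctx);  /* no more requests are coming */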
-
- 28 Sep 2020, 1 commit
Committed by Xianting Tian

We found that blk_mq_alloc_rq_maps() takes a long time in kernel space
when testing nvme device hot-plugging. The test and analysis are below.

Debug code:

1, blk_mq_alloc_rq_maps():

    u64 start, end;
    depth = set->queue_depth;
    start = ktime_get_ns();
    pr_err("[%d:%s switch:%ld,%ld] queue depth %d, nr_hw_queues %d\n",
            current->pid, current->comm, current->nvcsw, current->nivcsw,
            set->queue_depth, set->nr_hw_queues);
    do {
        err = __blk_mq_alloc_rq_maps(set);
        if (!err)
            break;
        set->queue_depth >>= 1;
        if (set->queue_depth < set->reserved_tags + BLK_MQ_TAG_MIN) {
            err = -ENOMEM;
            break;
        }
    } while (set->queue_depth);
    end = ktime_get_ns();
    pr_err("[%d:%s switch:%ld,%ld] all hw queues init cost time %lld ns\n",
            current->pid, current->comm, current->nvcsw, current->nivcsw,
            end - start);

2, __blk_mq_alloc_rq_maps():

    u64 start, end;
    for (i = 0; i < set->nr_hw_queues; i++) {
        start = ktime_get_ns();
        if (!__blk_mq_alloc_rq_map(set, i))
            goto out_unwind;
        end = ktime_get_ns();
        pr_err("hw queue %d init cost time %lld ns\n", i, end - start);
    }

Testing nvme hot-plugging with the above debug code, we found it costs
more than 3 ms in kernel space, without being scheduled out, when
allocating rqs for all 16 hw queues with depth 1023; each hw queue
costs about 140-250 us. The cost grows as the hw queue count and queue
depth increase. And in an extreme case, if __blk_mq_alloc_rq_maps()
returns -ENOMEM, it retries with "queue_depth >>= 1", consuming even
more time.

    [ 428.428771] nvme nvme0: pci function 10000:01:00.0
    [ 428.428798] nvme 10000:01:00.0: enabling device (0000 -> 0002)
    [ 428.428806] pcieport 10000:00:00.0: can't derive routing for PCI INT A
    [ 428.428809] nvme 10000:01:00.0: PCI INT A: no GSI
    [ 432.593374] [4688:kworker/u33:8 switch:663,2] queue depth 30, nr_hw_queues 1
    [ 432.593404] hw queue 0 init cost time 22883 ns
    [ 432.593408] [4688:kworker/u33:8 switch:663,2] all hw queues init cost time 35960 ns
    [ 432.595953] nvme nvme0: 16/0/0 default/read/poll queues
    [ 432.595958] [4688:kworker/u33:8 switch:700,2] queue depth 1023, nr_hw_queues 16
    [ 432.596203] hw queue 0 init cost time 242630 ns
    [ 432.596441] hw queue 1 init cost time 235913 ns
    [ 432.596659] hw queue 2 init cost time 216461 ns
    [ 432.596877] hw queue 3 init cost time 215851 ns
    [ 432.597107] hw queue 4 init cost time 228406 ns
    [ 432.597336] hw queue 5 init cost time 227298 ns
    [ 432.597564] hw queue 6 init cost time 224633 ns
    [ 432.597785] hw queue 7 init cost time 219954 ns
    [ 432.597937] hw queue 8 init cost time 150930 ns
    [ 432.598082] hw queue 9 init cost time 143496 ns
    [ 432.598231] hw queue 10 init cost time 147261 ns
    [ 432.598397] hw queue 11 init cost time 164522 ns
    [ 432.598542] hw queue 12 init cost time 143401 ns
    [ 432.598692] hw queue 13 init cost time 148934 ns
    [ 432.598841] hw queue 14 init cost time 147194 ns
    [ 432.598991] hw queue 15 init cost time 148942 ns
    [ 432.598993] [4688:kworker/u33:8 switch:700,2] all hw queues init cost time 3035099 ns
    [ 432.602611] nvme0n1: p1

So use this patch to trigger a schedule between each hw queue init, to
avoid other threads getting stuck. It is not in atomic context when
executing __blk_mq_alloc_rq_maps(), so it is safe to call
cond_resched().

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
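The fix itself is a single yield per iteration of that loop; in sketch
form (unwind path abbreviated):

    for (i = 0; i < set->nr_hw_queues; i++) {
        if (!__blk_mq_alloc_rq_map(set, i))
            goto out_unwind;  /* free the maps allocated so far */
        cond_resched();       /* safe: not in atomic context */
    }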
-
- 11 Sep 2020, 1 commit
Committed by Ming Lei

NVMe shares a tagset between the fabric queue and the admin queue, or
between connect_q and the NS queue, so hctx_may_queue() can be called
to allocate requests for these queues. Tags can be reserved in these
tagsets. Before error recovery, there are often lots of in-flight
requests which can't be completed, and a new reserved request may be
needed in the error recovery path. However, hctx_may_queue() can always
return false because there are too many in-flight requests which can't
be completed during error handling. In the end, nothing can proceed.

Fix this issue by always allowing reserved tag allocation in
hctx_may_queue(). This is reasonable because reserved tags are supposed
to always be available.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: David Milburn <dmilburn@redhat.com>
Cc: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
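A sketch of the allowed path, as in __blk_mq_get_tag()-style code
(exact plumbing may differ):

    /*
     * The fair-share throttle in hctx_may_queue() must never block
     * reserved-tag allocation: reserved tags are supposed to always
     * be available, e.g. for error recovery.
     */
    if (!(data->flags & BLK_MQ_REQ_RESERVED) &&
        !hctx_may_queue(data->hctx, bt))
        return BLK_MQ_NO_TAG;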
-
- 04 Sep 2020, 8 commits
Committed by Kashyap Desai

High CPU utilization in "native_queued_spin_lock_slowpath" due to lock
contention is possible with the mq-deadline and bfq IO schedulers when
nr_hw_queues is more than one. This is because the kblockd workqueue
can submit IO from all online CPUs (through blk_mq_run_hw_queues())
even though only one hctx has pending commands. The elevator callback
.has_work for the mq-deadline and bfq schedulers considers work pending
if there are any IOs on the request queue, but it does not account for
the hctx context.

Add a per-hctx 'elevator_queued' count to the hctx to avoid triggering
the elevator even though there are no requests queued.

[jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap]
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
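A sketch of the mq-deadline side (struct layout and the remaining list
checks abbreviated):

    /*
     * With a per-hctx atomic counter, .has_work can bail out for
     * hctxs that never queued anything, so blk_mq_run_hw_queues()
     * from every online CPU stops piling onto one dispatch lock.
     */
    static bool dd_has_work(struct blk_mq_hw_ctx *hctx)
    {
        struct deadline_data *dd = hctx->queue->elevator->elevator_data;

        if (!atomic_read(&hctx->elevator_queued))
            return false;  /* nothing queued via this hctx */

        return !list_empty_careful(&dd->dispatch);  /* plus fifo/sort checks */
    }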
-
Committed by John Garry

When using a shared sbitmap, the number of active request queues per
hctx should no longer be relied on when judging how to share the tag
bitmap. Instead, maintain the number of active request queues per
tag_set, and make the judgement based on that.

Originally-from: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

The per-hctx nr_active value can no longer be used to fairly assign a
share of tag depth per request queue when using a shared sbitmap, as it
does not consider that the tags are shared over all hctx's. For this
case, record nr_active_requests per request_queue, and make the
judgement based on that value.

Co-developed-with: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3, ...)
support multiple reply queues with single hostwide tags. In addition,
these drivers want to use interrupt assignment in
pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
CPU hotplug may cause in-flight IO completion to not be serviced when
an interrupt is shut down. That problem is solved in commit bf0beec0
("blk-mq: drain I/O when all CPUs in a hctx are offline").

However, to take advantage of that blk-mq feature, the HBA HW queues
need to be mapped to blk-mq hctx's; to do that, the HBA HW queues need
to be exposed to the upper layer.

In making that transition, the per-SCSI-command request tags are no
longer unique per Scsi host - they are just unique per hctx. As such,
the HBA LLDD would have to generate this tag internally, which has a
certain performance overhead.

Another problem is that blk-mq assumes the host may accept
(Scsi_host.can_queue * #hw queues) commands. In commit 6eb045e0 ("scsi:
core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host
busy counter was removed, which would stop the LLDD being sent more
than .can_queue commands; however, it should still be ensured that the
block layer does not issue more than .can_queue commands to the Scsi
host.

To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
which may be requested at init time. The new flag
BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the tagset to
indicate whether the shared sbitmap should be used.

Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and
requests is still allocated per hctx; the reason is that if tags and
requests were only allocated for a single hctx - like hctx0 - it may
break block drivers which expect a request to be associated with a
specific hctx, i.e. not always hctx0. This will introduce extra memory
usage.

This change is based on work originally from Ming Lei in [1] and from
Bart's suggestion in [2].

[0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
[1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
[2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
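As a sketch, an LLDD with hostwide tags might request the shared
sbitmap like this at tagset init time (the function and values are
hypothetical; in the SCSI case the midlayer sets up the tag_set on the
LLDD's behalf):

    static int mydrv_init_tag_set(struct Scsi_Host *shost, int nr_hw_queues)
    {
        struct blk_mq_tag_set *set = &shost->tag_set;

        set->nr_hw_queues = nr_hw_queues;      /* expose HBA reply queues */
        set->queue_depth  = shost->can_queue;  /* hostwide command limit */
        set->flags        = BLK_MQ_F_SHOULD_MERGE |
                            BLK_MQ_F_TAG_HCTX_SHARED;

        return blk_mq_alloc_tag_set(set);
    }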
-
Committed by John Garry

Introduce pointers for the blk_mq_tags regular and reserved bitmap
tags, with the goal of later being able to use a common shared tag
bitmap across all HW contexts in a set.

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by John Garry

Pass the hctx/tagset flags argument down to blk_mq_init_tags() and
blk_mq_free_tags() for selective init/free. For now, make it include
the alloc policy flag, which can be evaluated when needed (in
blk_mq_init_tags()).

Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Hannes Reinecke

The function does not set the depth, but rather transitions from shared
to non-shared queues and vice versa. So rename it to
blk_mq_update_tag_set_shared() to better reflect its purpose.

[jpg: take out some unrelated changes in blk_mq_init_bitmap_tags()]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

BLK_MQ_F_TAG_SHARED actually means that tags are shared among request
queues, all of which should belong to LUNs attached to the same HBA. So
rename it to make the point explicit.

[jpg: rebase a few times, add rnbd-clt.c change]
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 Aug 2020, 1 commit
Committed by Ming Lei

Commit c616cbee ("blk-mq: punt failed direct issue to dispatch list")
was supposed to add requests which have been through ->queue_rq() to
the hw queue dispatch list; however, it also adds requests that ran out
of budget or a driver tag to the hw queue. This basically bypasses
request merging, causes too many requests to be dispatched to the LLD,
and needlessly increases %system.

Fix this issue by adding requests which have not been through
->queue_rq() to the sw/scheduler queue instead. This is safe because no
->queue_rq has been called on such a request yet.

High %system can be observed on Azure storvsc devices, and even soft
lockups have been observed. This patch reduces %system during heavy
sequential IO and decreases the soft lockup risk.

Fixes: c616cbee ("blk-mq: punt failed direct issue to dispatch list")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 17 Aug 2020, 2 commits
Committed by Ming Lei

The SCHED_RESTART code path is relied on to re-run the queue for
dispatching requests in hctx->dispatch, and the SCHED_RESTART flag is
checked when adding requests to hctx->dispatch. Memory barriers have to
be used for ordering the following two pairs of operations:

1) adding requests to hctx->dispatch and checking SCHED_RESTART in
blk_mq_dispatch_rq_list()

2) clearing SCHED_RESTART and checking if there is a request in
hctx->dispatch in blk_mq_sched_restart()

Without the added memory barrier, either:

1) blk_mq_sched_restart() may miss requests added to hctx->dispatch
while blk_mq_dispatch_rq_list() observes SCHED_RESTART and does not run
the queue on the dispatch side, or

2) blk_mq_dispatch_rq_list() still sees SCHED_RESTART and does not run
the queue on the dispatch side, while the check for a request in
hctx->dispatch from blk_mq_sched_restart() is missed.

An IO hang in the ltp/fs_fill test was reported by the kernel test
robot: https://lkml.org/lkml/2020/7/26/77. It turns out to be caused by
the above out-of-order operations, and the IO hang can't be observed
any more after applying this patch.

Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Reported-by: kernel test robot <rong.a.chen@intel.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Jeffery <djeffery@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
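In sketch form, the ordering described above looks like this (placement
approximate to the mainline patch):

    /* dispatch side -- blk_mq_dispatch_rq_list() */
    spin_lock(&hctx->lock);
    list_splice_tail_init(list, &hctx->dispatch);  /* 1a: publish requests */
    spin_unlock(&hctx->lock);

    smp_mb();  /* pairs with the barrier on the restart side */

    if (!blk_mq_sched_needs_restart(hctx))         /* 1b: check SCHED_RESTART */
        blk_mq_run_hw_queue(hctx, true);

    /* restart side -- blk_mq_sched_restart() */
    clear_bit(BLK_MQ_S_SCHED_RESTART, &hctx->state);  /* 2a: clear the flag */

    smp_mb__after_atomic();  /* pairs with the barrier on the dispatch side */

    if (!list_empty_careful(&hctx->dispatch))      /* 2b: recheck dispatch list */
        blk_mq_run_hw_queue(hctx, true);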
-
Committed by Randy Dunlap

Fix a kernel-doc warning in block/blk-mq.c:

    ../block/blk-mq.c:1844: warning: Function parameter or member 'at_head' not described in 'blk_mq_request_bypass_insert'

Fixes: 01e99aec ("blk-mq: insert passthrough request into hctx->dispatch directly")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
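The fix adds the missing parameter line to the kernel-doc comment,
roughly:

    /**
     * blk_mq_request_bypass_insert - Insert a request at dispatch list.
     * @rq: Pointer to request to be inserted.
     * @at_head: true if the request should be inserted at the head of the list.
     * @run_queue: If we should run the hardware queue after inserting the request.
     */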
-
- 01 Aug 2020, 1 commit
Committed by Randy Dunlap

Drop the repeated word "the".

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 28 Jul 2020, 1 commit
Committed by Daniel Wagner

tag_set_list is only accessed under the tag_set_lock lock. There is no
need to use the _rcu list functions. The _rcu list functions were
introduced to allow read access to tag_set_list protected under RCU;
see 705cda97 ("blk-mq: Make it safe to use RCU to iterate over
blk_mq_tag_set.tag_list") and 05b79413 ("Revert "blk-mq: don't handle
TAG_SHARED in restart""). Those changes were reverted later, but the
cleanup commit missed a couple of places to undo the changes.

Fixes: 97889f9a ("blk-mq: remove synchronize_rcu() from blk_mq_del_queue_tag_set()")
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 10 Jul 2020, 1 commit
Committed by Baolin Wang

We've already validated 'q->elevator' before calling
->ops.completed_request() in blk_mq_sched_completed_request(), so there
is no need to validate rq->internal_tag again. Remove it.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 09 Jul 2020, 2 commits
Committed by Ming Lei

Move the .nr_active update and request assignment into
blk_mq_get_driver_tag(); both are fine to do while getting the driver
tag. Meanwhile, the blk-flush related code is simplified and the flush
request no longer needs to update the request table manually.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

The current handling of the q->mq_ops->queue_rq() result is a bit ugly:

- two branches which need to 'continue' have to check whether the local
dispatch list is empty, otherwise one bad request may be retrieved via
'rq = list_first_entry(list, struct request, queuelist);'

- the 'if (unlikely(ret != BLK_STS_OK))' branch isn't easy to follow,
since it is actually an error branch.

Streamline this handling so the code becomes more readable; a potential
kernel oops is also avoided in case the last request in the local
dispatch list fails.

Fixes: fc17b653 ("blk-mq: switch ->queue_rq return value to blk_status_t")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
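A sketch of the streamlined shape (helper names as in blk-mq of that
era; the surrounding loop is abbreviated):

    ret = q->mq_ops->queue_rq(hctx, &bd);
    switch (ret) {
    case BLK_STS_OK:
        queued++;
        break;
    case BLK_STS_RESOURCE:
    case BLK_STS_DEV_RESOURCE:
        /* put rq back on the list and stop dispatching */
        blk_mq_handle_dev_resource(rq, list);
        goto out;
    default:
        errors++;
        blk_mq_end_request(rq, ret);  /* fail this request outright */
        break;
    }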
-
- 07 Jul 2020, 1 commit
Committed by Ming Lei

dm-multipath is the only user of blk_mq_queue_inflight(). When
dm-multipath calls blk_mq_queue_inflight() to check whether it has
outstanding IO, it can get a false negative. The reason is that
blk_mq_rq_inflight() doesn't consider requests that are no longer
MQ_RQ_IN_FLIGHT but are now MQ_RQ_COMPLETE (->complete isn't called or
finished yet) as "inflight".

This causes request-based dm-multipath's dm_wait_for_completion() to
return before all outstanding dm-multipath requests have actually
completed. This breaks DM multipath's suspend functionality, because
blk-mq requests complete after DM's suspend has finished -- which
shouldn't happen.

Fix this by considering any request not in the MQ_RQ_IDLE state (so
either MQ_RQ_COMPLETE or MQ_RQ_IN_FLIGHT) as "inflight" in
blk_mq_rq_inflight().

Fixes: 3c94d83c ("blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
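The fixed predicate, in sketch form (iterator-callback signature
approximate):

    static bool blk_mq_rq_inflight(struct blk_mq_hw_ctx *hctx,
                                   struct request *rq, void *priv,
                                   bool reserved)
    {
        /*
         * Anything that has left MQ_RQ_IDLE -- still in flight or
         * already MQ_RQ_COMPLETE but not finished -- counts.
         */
        if (blk_mq_rq_state(rq) != MQ_RQ_IDLE) {
            bool *busy = priv;

            *busy = true;
            return false;  /* found one; stop iterating */
        }

        return true;
    }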
-
- 02 Jul 2020, 1 commit
Committed by Jens Axboe

This reverts the following commits:

37f4a24c
723bf178
36a3df5a

The last one is the culprit, but we have to go a bit deeper to get this
to revert cleanly. There's been a report that this breaks some MMC
setups [1], and it also causes an issue with swap [2]. Until this can
be figured out, revert the offending commits.

[1] https://lore.kernel.org/linux-block/57fb09b1-54ba-f3aa-f82c-d709b0e6b281@samsung.com/
[2] https://lore.kernel.org/linux-block/20200702043721.GA1087@lca.pw/

Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 01 Jul 2020, 5 commits
Committed by Christoph Hellwig

The make_request_fn is a little weird in that it sits directly in
struct request_queue instead of an operation vector. Replace it with a
block_device_operations method called submit_bio (which describes much
better what it does). Also remove the request_queue argument to it, as
the queue can be derived pretty trivially from the bio.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
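What a bio-based driver looks like after the conversion, as a sketch
(driver names hypothetical; note the bio-only signature):

    static blk_qc_t mydrv_submit_bio(struct bio *bio)
    {
        /* the queue, when needed, is derived from the bio itself */
        struct mydrv_dev *dev = bio->bi_disk->private_data;

        mydrv_handle_bio(dev, bio);  /* hypothetical helper */
        return BLK_QC_T_NONE;
    }

    static const struct block_device_operations mydrv_fops = {
        .owner      = THIS_MODULE,
        .submit_bio = mydrv_submit_bio,  /* replaces q->make_request_fn */
    };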
-
Committed by Christoph Hellwig

The queue can be trivially derived from the bio, so pass one less
argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

Move the .nr_active update and request assignment into
blk_mq_get_driver_tag(); both are fine to do while getting the driver
tag. Meanwhile, the blk-flush related code is simplified and the flush
request no longer needs to update the request table manually.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

It is used by blk-mq.c only, so move it to the source file.

Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

blk_mq_get_driver_tag() is only used by blk-mq.c and is supposed to
stay in blk-mq.c, so move it there, preparing for cleanup of the
get/put driver tag code. Meanwhile, hctx_may_queue() is moved to the
header file; that is fine since it is always defined inline.

No functional change.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 30 Jun 2020, 4 commits
Committed by Ming Lei

More and more drivers want batches of requests queued from the block
layer, such as mmc and tcp-based storage drivers. Current in-tree users
include virtio-scsi, virtio-blk and nvme.

For none, we already support batching dispatch. But for an io
scheduler, every time we just take one request from the scheduler and
pass the single request to blk_mq_dispatch_rq_list(). This makes
batching dispatch impossible when an io scheduler is applied. One
reason is that we don't want to hurt sequential IO performance, because
the IO merge chance is reduced if more requests are dequeued from the
scheduler queue.

Try to support batching dispatch for io schedulers, starting with the
following simple approach (see the sketch after this entry):

1) still make sure we can get budget before dequeueing a request

2) use hctx->dispatch_busy to evaluate whether the queue is busy; if it
is busy, fall back to non-batching dispatch, otherwise dequeue as many
requests as possible from the scheduler and pass them to
blk_mq_dispatch_rq_list()

Wrt. 2), we use a similar policy for none, and it turns out that SCSI
SSD performance improved much. In the future, maybe we can develop a
more intelligent algorithm for batching dispatch. Baolin has tested
this patch and found that MMC performance is improved [3].

[1] https://lore.kernel.org/linux-block/20200512075501.GF1531898@T590/#r
[2] https://lore.kernel.org/linux-block/fe6bd8b9-6ed9-b225-f80c-314746133722@grimberg.me/
[3] https://lore.kernel.org/linux-block/CADBw62o9eTQDJ9RvNgEqSpXmg6Xcq=2TxH0Hfxhp29uF2W=TXA@mail.gmail.com/

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
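The policy in sketch form (helper names approximate to that era's
blk-mq; error paths abbreviated):

    /*
     * While the hctx does not look busy, keep dequeueing from the
     * scheduler as long as budget is granted, then dispatch the whole
     * batch; a busy hctx falls back to one-at-a-time dispatch.
     */
    LIST_HEAD(rq_list);
    int count = 0;

    do {
        struct request *rq;

        if (!blk_mq_get_dispatch_budget(q))     /* 1) budget before dequeue */
            break;

        rq = e->type->ops.dispatch_request(hctx);
        if (!rq) {
            blk_mq_put_dispatch_budget(q);      /* return the unused budget */
            break;
        }

        list_add_tail(&rq->queuelist, &rq_list);
        count++;
    } while (!hctx->dispatch_busy);             /* 2) batch only while idle */

    blk_mq_dispatch_rq_list(hctx, &rq_list, count);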
-
Committed by Ming Lei

Pass the obtained budget count to blk_mq_dispatch_rq_list(), in
preparation for supporting fully batched submission. With the obtained
budget count, it is easier to put back extra budgets in case of
.queue_rq failure. Meanwhile, remove the old 'got_budget' parameter.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

When BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE is returned from
.queue_rq, the 'list' variable always holds the rq that wasn't queued
to the LLD successfully, so blk_mq_dispatch_rq_list() always returns
false from the '!list_empty(list)' branch.

No functional change.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Ming Lei

Move the code for getting the driver tag and budget into one helper, so
blk_mq_dispatch_rq_list() gets a bit simpler and easier to read.
Meanwhile, move the updating of 'no_tag' and 'no_budget_available' into
the branch for handling partial dispatch, because that is exactly the
consumer of the two local variables. Also rename the 'got_budget'
parameter to 'ask_budget'.

No functional change.

Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Baolin Wang <baolin.wang7@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Baolin Wang <baolin.wang7@gmail.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-