- 09 11月, 2021 1 次提交
-
-
由 Ming Lei 提交于
Some drivers(NVMe, SCSI) need to call quiesce and unquiesce in pair, but it is hard to switch to this style, so these drivers need one atomic flag for helping to balance quiesce and unquiesce. When quiesce is in-progress, the driver still needs to wait until the quiesce is done, so add API of blk_mq_wait_quiesce_done() for these drivers. Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211109071144.181581-2-ming.lei@redhat.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 10月, 2021 1 次提交
-
-
由 Eric Biggers 提交于
blk_keyslot_manager is misnamed because it doesn't necessarily manage keyslots. It actually does several different things: - Contains the crypto capabilities of the device. - Provides functions to control the inline encryption hardware. Originally these were just for programming/evicting keyslots; however, new functionality (hardware-wrapped keys) will require new functions here which are unrelated to keyslots. Moreover, device-mapper devices already (ab)use "keyslot_evict" to pass key eviction requests to their underlying devices even though device-mapper devices don't have any keyslots themselves (so it really should be "evict_key", not "keyslot_evict"). - Sometimes (but not always!) it manages keyslots. Originally it always did, but device-mapper devices don't have keyslots themselves, so they use a "passthrough keyslot manager" which doesn't actually manage keyslots. This hack works, but the terminology is unnatural. Also, some hardware doesn't have keyslots and thus also uses a "passthrough keyslot manager" (support for such hardware is yet to be upstreamed, but it will happen eventually). Let's stop having keyslot managers which don't actually manage keyslots. Instead, rename blk_keyslot_manager to blk_crypto_profile. This is a fairly big change, since for consistency it also has to update keyslot manager-related function names, variable names, and comments -- not just the actual struct name. However it's still a fairly straightforward change, as it doesn't change any actual functionality. Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: NMike Snitzer <snitzer@redhat.com> Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NEric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-4-ebiggers@kernel.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 20 10月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
This helper is internal to the block layer. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 19 10月, 2021 4 次提交
-
-
由 Jens Axboe 提交于
This is in the fast path of driver issue or completion, and it's a single array index operation. Move it inline to avoid a function call for it. This does mean making struct blk_mq_tags block layer public, but there's not really much in there. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Instead of calling blk_mq_end_request() on a single request, add a helper that takes the new struct io_comp_batch and completes any request stored in there. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
struct io_comp_batch contains a list head and a completion handler, which will allow completions to more effciently completed batches of IO. For now, no functional changes in this patch, we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
Just like the blk_mq_ctx counterparts, we've got a bunch of counters in here that are only for debugfs and are of questionnable value. They are: - dispatched, index of how many requests were dispatched in one go - poll_{considered,invoked,success}, which track poll sucess rates. We're confident in the iopoll implementation at this point, don't bother tracking these. As a bonus, this shrinks each hardware queue from 576 bytes to 512 bytes, dropping a whole cacheline. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 10月, 2021 9 次提交
-
-
由 Jens Axboe 提交于
Add an rq private RQF_ELV flag, which tells the block layer that this request was initialized on a queue that has an IO scheduler attached. This allows for faster checking in the fast path, rather than having to deference rq->q later on. Elevator switching does full quiesce of the queue before detaching an IO scheduler, so it's safe to cache this in the request itself. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
It's been a while since this was analyzed, move some members around to better flow with the use case. Initial state up top, and queued state after that. This improves my peak case by about 1.5%, from 7750K to 7900K IOPS. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Replace the blk_poll interface that requires the caller to keep a queue and cookie from the submissions with polling based on the bio. Polling for the bio itself leads to a few advantages: - the cookie construction can made entirely private in blk-mq.c - the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues - keeping the device and the cookie inside the bio allows to trivially support polling BIOs remapping by stacking drivers - a lot of code to propagate the cookie back up the submission path can be removed entirely. Signed-off-by: NChristoph Hellwig <hch@lst.de> Tested-by: NMark Wunderlich <mark.wunderlich@intel.com> Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Fold bio_cur_bytes into the only caller. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211012161804.991559-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Jens Axboe 提交于
The caller typically has a good (or even exact) idea of how many requests it needs to submit. We can make the request/tag allocation a lot more efficient if we just allocate N requests/tags upfront when we queue the first bio from the batch. Provide a new plug start helper that allows the caller to specify how many IOs are expected. This sets plug->nr_ios, and we can use that for smarter request allocation. The plug provides a holding spot for requests, and request allocation will check it before calling into the normal request allocation path. The blk_finish_plug() is called, check if there are unused requests and free them. This should not happen in normal operations. The exception is if we get merging, then we may be left with requests that need freeing when done. This raises the per-core performance on my setup from ~5.8M to ~6.1M IOPS. Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 John Garry 提交于
Now that shared sbitmap support really means shared tags, rename symbols to match that. Signed-off-by: NJohn Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1633429419-228500-15-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 John Garry 提交于
Currently we use separate sbitmap pairs and active_queues atomic_t for shared sbitmap support. However a full sets of static requests are used per HW queue, which is quite wasteful, considering that the total number of requests usable at any given time across all HW queues is limited by the shared sbitmap depth. As such, it is considerably more memory efficient in the case of shared sbitmap to allocate a set of static rqs per tag set or request queue, and not per HW queue. So replace the sbitmap pairs and active_queues atomic_t with a shared tags per tagset and request queue, which will hold a set of shared static rqs. Since there is now no valid HW queue index to be passed to the blk_mq_ops .init and .exit_request callbacks, pass an invalid index token. This changes the semantics of the APIs, such that the callback would need to validate the HW queue index before using it. Currently no user of shared sbitmap actually uses the HW queue index (as would be expected). Signed-off-by: NJohn Garry <john.garry@huawei.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/1633429419-228500-13-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 John Garry 提交于
It is a bit confusing that there is BLKDEV_MAX_RQ and MAX_SCHED_RQ, as the name BLKDEV_MAX_RQ would imply the max requests always, which it is not. Rename to BLKDEV_MAX_RQ to BLKDEV_DEFAULT_RQ, matching its usage - that being the default number of requests assigned when allocating a request queue. Signed-off-by: NJohn Garry <john.garry@huawei.com> Reviewed-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/1633429419-228500-3-git-send-email-john.garry@huawei.comSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
struct request is only used by blk-mq drivers, so move it and all related declarations to blk-mq.h. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJohannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20210920123328.1399408-18-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 24 8月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
Pass the lockdep name to the low-level __blk_alloc_disk helper and hardcode the name for it given that the number of minors or node_id are not very useful information. While this passes a pointless argument for non-lockdep builds that is not really an issue as disk allocation is a probe time only slow path. Signed-off-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210816131910.615153-5-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 06 8月, 2021 1 次提交
-
-
由 Bart Van Assche 提交于
elevator_get_default() uses the following algorithm to select an I/O scheduler from inside add_disk(): - In case of a single hardware queue or if sharing hardware queues across multiple request queues (BLK_MQ_F_TAG_HCTX_SHARED), use mq-deadline. - Otherwise, use 'none'. This is a good choice for most but not for all block drivers. Make it possible to override the selection of mq-deadline with a new flag, namely BLK_MQ_F_NO_SCHED_BY_DEFAULT. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Cc: Martijn Coenen <maco@android.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210805174200.3250718-2-bvanassche@acm.orgSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 01 7月, 2021 1 次提交
-
-
由 Christoph Hellwig 提交于
All driver uses are gone now. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NBart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20210624081012.256464-1-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 18 6月, 2021 1 次提交
-
-
由 Dan Carpenter 提交于
The __blk_mq_alloc_disk() function doesn't return NULLs it returns error pointers. Fixes: b461dfc4 ("blk-mq: add the blk_mq_alloc_disk APIs") Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/YMyjci35WBqrtqG+@mwandaSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 12 6月, 2021 4 次提交
-
-
由 Christoph Hellwig 提交于
All users are gone now. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-16-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Add a new API to allocate a gendisk including the request_queue for use with blk-mq based drivers. This is to avoid boilerplate code in drivers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-4-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Don't return the passed in request_queue but a normal error code, and drop the elevator_init argument in favor of just calling elevator_init_mq directly from dm-rq. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-3-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Factour out a helper to initialize a simple single hw queue tag_set from blk_mq_init_sq_queue. This will allow to phase out blk_mq_init_sq_queue in favor of a more symmetric and general API. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210602065345.355274-2-hch@lst.deSigned-off-by: NJens Axboe <axboe@kernel.dk>
-
- 22 4月, 2021 1 次提交
-
-
由 Ming Lei 提交于
Fixes the following warning when running 'make htmldocs': include/linux/blk-mq.h:395: warning: Function parameter or member 'set_rq_budget_token' not described in 'blk_mq_ops' include/linux/blk-mq.h:395: warning: Function parameter or member 'get_rq_budget_token' not described in 'blk_mq_ops' [mkp: added warning messages] Link: https://lore.kernel.org/r/20210421154526.1954174-1-ming.lei@redhat.com Fixes: d022d18c ("scsi: blk-mq: Add callbacks for storing & retrieving budget token") Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 05 3月, 2021 2 次提交
-
-
由 Ming Lei 提交于
SCSI uses a global atomic variable to track queue depth for each LUN/request queue. This doesn't scale well when there are lots of CPU cores and the disk is very fast. It has been observed that IOPS is affected a lot by tracking queue depth via sdev->device_busy in the I/O path. Return budget token from .get_budget callback. The budget token can be passed to driver so that we can replace the atomic variable with sbitmap_queue and alleviate the scaling problems that way. Link: https://lore.kernel.org/r/20210122023317.687987-9-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Tested-by: NSumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Ming Lei 提交于
Since SCSI is the only driver which requires dispatch budget move the token from struct request to struct scsi_cmnd. Link: https://lore.kernel.org/r/20210122023317.687987-8-ming.lei@redhat.com Cc: Omar Sandoval <osandov@fb.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Sumanesh Samanta <sumanesh.samanta@broadcom.com> Cc: Ewan D. Milne <emilne@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Tested-by: NSumanesh Samanta <sumanesh.samanta@broadcom.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 10 2月, 2021 1 次提交
-
-
由 Chao Leng 提交于
nvme drivers need to set the state of request to MQ_RQ_COMPLETE when directly complete request in queue_rq. So add blk_mq_set_request_complete. Signed-off-by: NChao Leng <lengchao@huawei.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 25 1月, 2021 2 次提交
-
-
由 Jan Kara 提交于
This reverts commit b445547e. Since both mq-deadline and BFQ completely ignore hctx they are passed to their dispatch function and dispatch whatever request they deem fit checking whether any request for a particular hctx is queued is just pointless since we'll very likely get a request from a different hctx anyway. In the following commit we'll deal with lock contention in these IO schedulers in presence of multiple HW queues in a different way. Signed-off-by: NJan Kara <jack@suse.cz> Reviewed-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Christoph Hellwig 提交于
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block device. From that the gendisk can be trivially accessed with an extra indirection, but it also allows to directly look up all information related to partition remapping. Signed-off-by: NChristoph Hellwig <hch@lst.de> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 10 12月, 2020 2 次提交
-
-
由 Bart Van Assche 提交于
Remove flag RQF_PREEMPT and BLK_MQ_REQ_PREEMPT since these are no longer used by any kernel code. Link: https://lore.kernel.org/r/20201209052951.16136-8-bvanassche@acm.org Cc: Can Guo <cang@codeaurora.org> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Ming Lei <ming.lei@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Martin Kepplinger <martin.kepplinger@puri.sm> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NCan Guo <cang@codeaurora.org> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
由 Bart Van Assche 提交于
Introduce the BLK_MQ_REQ_PM flag. This flag makes the request allocation functions set RQF_PM. This is the first step towards removing BLK_MQ_REQ_PREEMPT. Link: https://lore.kernel.org/r/20201209052951.16136-3-bvanassche@acm.org Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Stanley Chu <stanley.chu@mediatek.com> Cc: Ming Lei <ming.lei@redhat.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Can Guo <cang@codeaurora.org> Reviewed-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NHannes Reinecke <hare@suse.de> Reviewed-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NCan Guo <cang@codeaurora.org> Signed-off-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
-
- 08 12月, 2020 1 次提交
-
-
由 Ming Lei 提交于
flush_end_io() may be called recursively from some driver, such as nvme-loop, so lockdep may complain 'possible recursive locking'. Commit b3c6a599("block: Fix a lockdep complaint triggered by request queue flushing") tried to address this issue by assigning dynamically allocated per-flush-queue lock class. This solution adds synchronize_rcu() for each hctx's release handler, and causes horrible SCSI MQ probe delay(more than half an hour on megaraid sas). Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers, so we just need to use driver specific lock class for avoiding the lockdep warning of 'possible recursive locking'. Tested-by: NKashyap Desai <kashyap.desai@broadcom.com> Reported-by: NQian Cai <cai@redhat.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: John Garry <john.garry@huawei.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: NMing Lei <ming.lei@redhat.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 12月, 2020 1 次提交
-
-
由 Chaitanya Kulkarni 提交于
This is a preparation patch to have minimal block layer request bio append functionality in the context of the NVMeOF Passthru driver which falls in the fast path and doesn't need calls from blk_rq_append_bio(). Signed-off-by: NChaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: NLogan Gunthorpe <logang@deltatee.com> Signed-off-by: NChristoph Hellwig <hch@lst.de>
-
- 29 10月, 2020 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
As reported by kernel-doc: ./include/linux/blk-mq.h:267: warning: Function parameter or member 'active_queues_shared_sbitmap' not described in 'blk_mq_tag_set' There is now a new member for struct blk_mq_tag_set. Add a description for it, based on the commit that introduced it. Fixes: f1b49fdc ("blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap") Signed-off-by: NMauro Carvalho Chehab <mchehab+huawei@kernel.org> Reviewed-by: NJens Axboe <axboe@kernel.dk> Reviewed-by: NJohn Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/8e513153b83eefc05e358f51f2632b592c3f6772.1603791716.git.mchehab+huawei@kernel.orgSigned-off-by: NJonathan Corbet <corbet@lwn.net>
-
- 04 9月, 2020 4 次提交
-
-
由 Kashyap Desai 提交于
High CPU utilization on "native_queued_spin_lock_slowpath" due to lock contention is possible for mq-deadline and bfq IO schedulers when nr_hw_queues is more than one. It is because kblockd work queue can submit IO from all online CPUs (through blk_mq_run_hw_queues()) even though only one hctx has pending commands. The elevator callback .has_work for mq-deadline and bfq scheduler considers pending work if there are any IOs on request queue but it does not account hctx context. Add a per-hctx 'elevator_queued' count to the hctx to avoid triggering the elevator even though there are no requests queued. [jpg: Relocated atomic_dec() in dd_dispatch_request(), update commit message per Kashyap] Signed-off-by: NKashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJohn Garry <john.garry@huawei.com> Tested-by: NDouglas Gilbert <dgilbert@interlog.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 John Garry 提交于
For when using a shared sbitmap, no longer should the number of active request queues per hctx be relied on for when judging how to share the tag bitmap. Instead maintain the number of active request queues per tag_set, and make the judgement based on that. Originally-from: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used Tested-by: NDouglas Gilbert <dgilbert@interlog.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 John Garry 提交于
Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3 ..) support multiple reply queues with single hostwide tags. In addition, these drivers want to use interrupt assignment in pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0], CPU hotplug may cause in-flight IO completion to not be serviced when an interrupt is shutdown. That problem is solved in commit bf0beec0 ("blk-mq: drain I/O when all CPUs in a hctx are offline"). However, to take advantage of that blk-mq feature, the HBA HW queuess are required to be mapped to that of the blk-mq hctx's; to do that, the HBA HW queues need to be exposed to the upper layer. In making that transition, the per-SCSI command request tags are no longer unique per Scsi host - they are just unique per hctx. As such, the HBA LLDD would have to generate this tag internally, which has a certain performance overhead. However another problem is that blk-mq assumes the host may accept (Scsi_host.can_queue * #hw queue) commands. In commit 6eb045e0 ("scsi: core: avoid host-wide host_busy counter for scsi_mq"), the Scsi host busy counter was removed, which would stop the LLDD being sent more than .can_queue commands; however, it should still be ensured that the block layer does not issue more than .can_queue commands to the Scsi host. To solve this problem, introduce a shared sbitmap per blk_mq_tag_set, which may be requested at init time. New flag BLK_MQ_F_TAG_HCTX_SHARED should be set when requesting the tagset to indicate whether the shared sbitmap should be used. Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests are still allocated per hctx; the reason for this is that if tags and requests were only allocated for a single hctx - like hctx0 - it may break block drivers which expect a request be associated with a specific hctx, i.e. not always hctx0. This will introduce extra memory usage. This change is based on work originally from Ming Lei in [1] and from Bart's suggestion in [2]. [0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/ [1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/ [2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144beSigned-off-by: NJohn Garry <john.garry@huawei.com> Tested-by: Don Brace<don.brace@microsemi.com> #SCSI resv cmds patches used Tested-by: NDouglas Gilbert <dgilbert@interlog.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Ming Lei 提交于
BLK_MQ_F_TAG_SHARED actually means that tags is shared among request queues, all of which should belong to LUNs attached to same HBA. So rename it to make the point explicitly. [jpg: rebase a few times, add rnbd-clt.c change] Suggested-by: NBart Van Assche <bvanassche@acm.org> Signed-off-by: NMing Lei <ming.lei@redhat.com> Signed-off-by: NJohn Garry <john.garry@huawei.com> Tested-by: NDouglas Gilbert <dgilbert@interlog.com> Reviewed-by: NHannes Reinecke <hare@suse.de> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
- 02 9月, 2020 1 次提交
-
-
由 Baolin Wang 提交于
Move the blk_mq_bio_list_merge() into blk-merge.c and rename it as a generic name. Reviewed-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-