    blk-mq: Facilitate a shared sbitmap per tagset · 32bc15af
    Committed by John Garry
    Some SCSI HBAs (such as HPSA, megaraid, mpt3sas, hisi_sas_v3, ...) support
    multiple reply queues with a single set of host-wide tags.
    
    In addition, these drivers want to use interrupt assignment in
    pci_alloc_irq_vectors(PCI_IRQ_AFFINITY). However, as discussed in [0],
    CPU hotplug may leave completions for in-flight I/O unserviced when an
    interrupt is shut down. That problem is solved in commit bf0beec0
    ("blk-mq: drain I/O when all CPUs in a hctx are offline").
    
    However, to take advantage of that blk-mq feature, the HBA HW queues are
    required to be mapped to the blk-mq hctxs; to do that, the HBA HW queues
    need to be exposed to the upper layer.
    
    In making that transition, the per-SCSI-command request tags are no
    longer unique per SCSI host; they are only unique per hctx. As such,
    the HBA LLDD would have to generate a host-unique tag internally, which
    has a certain performance overhead.
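    Why per-hctx tags stop being host-unique can be seen in a small
    user-space model. This is a hypothetical sketch (hctx_tags,
    hctx_get_tag and unique_tag are invented names, not kernel API): two
    hctxs with independent tag spaces both hand out tag 0. unique_tag()
    packs the hw queue index into the upper 16 bits, similar in spirit to
    the kernel's blk_mq_unique_tag(). The packed value is unique, but it no
    longer falls in a dense 0..can_queue-1 range, so a driver whose hardware
    is assumed to expect such tags would still need its own host-wide
    allocator.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model, not kernel code: each hctx owns an
 * independent tag space [0, HCTX_DEPTH). */
#define NR_HCTX    2
#define HCTX_DEPTH 4

struct hctx_tags {
	bool used[HCTX_DEPTH];
};

/* Return the lowest free tag in this hctx, or -1 if it is exhausted. */
static int hctx_get_tag(struct hctx_tags *t)
{
	for (int i = 0; i < HCTX_DEPTH; i++) {
		if (!t->used[i]) {
			t->used[i] = true;
			return i;
		}
	}
	return -1;
}

/* Combine (hwq, tag) into one host-unique number by placing the hw queue
 * index in the upper 16 bits (assumed layout for this sketch). */
static unsigned int unique_tag(unsigned int hwq, unsigned int tag)
{
	assert(tag <= 0xffff);
	return (hwq << 16) | tag;
}
```

    For example, unique_tag(1, 3) yields 65539, far above a can_queue of 4.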
    
    Another problem is that blk-mq assumes the host may accept
    (Scsi_host.can_queue * #hw queues) commands. Commit 6eb045e0 ("scsi:
    core: avoid host-wide host_busy counter for scsi_mq") removed the SCSI
    host busy counter, which had prevented the LLDD from being sent more
    than .can_queue commands; however, it should still be ensured that the
    block layer does not issue more than .can_queue commands to the SCSI host.
    
    To solve this problem, introduce a shared sbitmap per blk_mq_tag_set,
    which may be requested at init time.
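    The difference between per-hctx tag pools and one pool shared by the
    whole tag set can be modelled in user-space C. This is a hypothetical,
    single-threaded sketch (model_host, model_get_tag and model_fill are
    invented names); the kernel itself uses an sbitmap (struct
    sbitmap_queue) with atomic bit operations, and this model shows only
    the accounting.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, single-threaded model; plain bool arrays stand in for
 * the kernel's atomic sbitmap. */
#define MAX_HCTX      8
#define MAX_CAN_QUEUE 64

struct tag_pool {
	bool used[MAX_CAN_QUEUE];
	unsigned int depth;
};

struct model_host {
	bool shared;                     /* models a shared sbitmap per tagset */
	unsigned int nr_hctx;
	struct tag_pool pools[MAX_HCTX]; /* only pools[0] is used when shared */
};

static void model_init(struct model_host *h, unsigned int nr_hctx,
		       unsigned int can_queue, bool shared)
{
	assert(nr_hctx <= MAX_HCTX && can_queue <= MAX_CAN_QUEUE);
	memset(h, 0, sizeof(*h));
	h->shared = shared;
	h->nr_hctx = nr_hctx;
	for (unsigned int i = 0; i < nr_hctx; i++)
		h->pools[i].depth = can_queue;
}

/* The pool an hctx allocates from: the single shared one, or its own. */
static struct tag_pool *pool_for(struct model_host *h, unsigned int hctx)
{
	return h->shared ? &h->pools[0] : &h->pools[hctx];
}

/* Allocate a tag for a request arriving on @hctx; -1 means "busy". */
static int model_get_tag(struct model_host *h, unsigned int hctx)
{
	struct tag_pool *p = pool_for(h, hctx);

	for (unsigned int i = 0; i < p->depth; i++) {
		if (!p->used[i]) {
			p->used[i] = true;
			return (int)i;
		}
	}
	return -1;
}

/* Submit on every hctx until it reports busy; return tags handed out. */
static unsigned int model_fill(struct model_host *h)
{
	unsigned int total = 0;

	for (unsigned int q = 0; q < h->nr_hctx; q++)
		while (model_get_tag(h, q) >= 0)
			total++;
	return total;
}
```

    With 4 hctxs and can_queue = 32, per-hctx pools admit 4 * 32 = 128
    commands (the can_queue * #hw queues assumption), while the shared pool
    caps the total at 32.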
    
    A new flag, BLK_MQ_F_TAG_HCTX_SHARED, should be set when requesting the
    tagset to indicate that the shared sbitmap should be used.
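    The init-time effect of the flag can be sketched as a choice between one
    tag bitmap per hctx and a single bitmap for the whole set. In this
    hypothetical model, MODEL_F_TAG_HCTX_SHARED and its bit value are
    invented stand-ins for BLK_MQ_F_TAG_HCTX_SHARED, struct model_tag_set
    only loosely mirrors struct blk_mq_tag_set, and reserved tags are
    ignored.

```c
#include <assert.h>

/* Hypothetical flag bit for this model only; the real
 * BLK_MQ_F_TAG_HCTX_SHARED is defined by the kernel. */
#define MODEL_F_TAG_HCTX_SHARED (1u << 0)

struct model_tag_set {
	unsigned int flags;
	unsigned int nr_hw_queues;
	unsigned int queue_depth;   /* host-wide limit, i.e. .can_queue */
};

/* Number of tag bitmaps the set owns after init. */
static unsigned int model_nr_bitmaps(const struct model_tag_set *set)
{
	return (set->flags & MODEL_F_TAG_HCTX_SHARED) ? 1u : set->nr_hw_queues;
}

/* Total tags that can be allocated across the whole set. */
static unsigned long model_total_tags(const struct model_tag_set *set)
{
	assert(set->nr_hw_queues > 0);
	return (unsigned long)model_nr_bitmaps(set) * set->queue_depth;
}
```

    For 16 hw queues and a depth of 4096, the model gives 65536 tags without
    the flag and 4096 with it.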
    
    Even when BLK_MQ_F_TAG_HCTX_SHARED is set, a full set of tags and requests
    is still allocated per hctx. The reason is that if tags and requests were
    allocated for only a single hctx, such as hctx0, it could break block
    drivers which expect a request to be associated with a specific hctx,
    i.e. not always hctx0. This introduces extra memory usage.
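    The extra memory can be estimated with simple arithmetic. In this
    hypothetical model (model_mem and model_shared_mem are invented names,
    and rq_size is an assumed per-request size covering the request plus
    driver data), request memory still scales with the number of hw queues,
    while the shared tag space caps simultaneously used requests at the
    queue depth.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of preallocated request memory with a shared
 * sbitmap: every hctx keeps queue_depth requests, but only queue_depth
 * tags exist in total, so at most queue_depth requests are in use. */
struct model_mem {
	size_t request_bytes;   /* memory for all preallocated requests */
	size_t max_inflight;    /* upper bound on requests in use at once */
};

static struct model_mem model_shared_mem(size_t nr_hw_queues,
					 size_t queue_depth, size_t rq_size)
{
	struct model_mem m;

	assert(nr_hw_queues > 0);
	m.request_bytes = nr_hw_queues * queue_depth * rq_size;
	m.max_inflight = queue_depth;   /* one shared tag space */
	return m;
}
```

    With the assumed values of 16 hw queues, a depth of 4096 and 512-byte
    requests, 32 MiB of requests are allocated, yet at most 4096 of the
    65536 requests (1/16) can be in use at once.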
    
    This change is based on work originally from Ming Lei in [1] and from
    Bart's suggestion in [2].
    
    [0] https://lore.kernel.org/linux-block/alpine.DEB.2.21.1904051331270.1802@nanos.tec.linutronix.de/
    [1] https://lore.kernel.org/linux-block/20190531022801.10003-1-ming.lei@redhat.com/
    [2] https://lore.kernel.org/linux-block/ff77beff-5fd9-9f05-12b6-826922bace1f@huawei.com/T/#m3db0a602f095cbcbff27e9c884d6b4ae826144be

    Signed-off-by: John Garry <john.garry@huawei.com>
    Tested-by: Don Brace <don.brace@microsemi.com> #SCSI resv cmds patches used
    Tested-by: Douglas Gilbert <dgilbert@interlog.com>
    Signed-off-by: Jens Axboe <axboe@kernel.dk>