    PCI: hv: Only reuse existing IRTE allocation for Multi-MSI · c234ba80
    Committed by Dexuan Cui
    Jeffrey added Multi-MSI support to the pci-hyperv driver in these 4 patches:
    08e61e86 ("PCI: hv: Fix multi-MSI to allow more than one MSI vector")
    455880df ("PCI: hv: Fix hv_arch_irq_unmask() for multi-MSI")
    b4b77778 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
    a2bad844 ("PCI: hv: Fix interrupt mapping for multi-MSI")
    
    It turns out that the third patch (b4b77778) causes a performance
    regression because all the interrupts now happen on 1 physical CPU (or two
    pCPUs, if one pCPU doesn't have enough vectors). When a guest has many PCI
    devices, it may suffer from soft lockups if the workload is heavy, e.g.,
    see https://lwn.net/ml/linux-kernel/20220804025104.15673-1-decui@microsoft.com/
    
    Commit b4b77778 itself is good. The real issue is that the hypercall in
    hv_irq_unmask() -> hv_arch_irq_unmask() ->
    hv_do_hypercall(HVCALL_RETARGET_INTERRUPT...) only changes the target
    virtual CPU rather than physical CPU; with b4b77778, the pCPU is
    determined only once in hv_compose_msi_msg() where only vCPU0 is specified;
    consequently the hypervisor only uses 1 target pCPU for all the interrupts.
    
    Note: before b4b77778, the pCPU was determined twice, and when the pCPU
    was determined the second time, the vCPU in the effective affinity mask was
    used (i.e., it wasn't always vCPU0), so the hypervisor chose a different
    pCPU for each interrupt.
    
    The hypercall will be fixed in the future to update the pCPU as well, but
    that will take quite a while, so let's restore the old behavior in
    hv_compose_msi_msg(), i.e., don't reuse the existing IRTE allocation for
    single-MSI and MSI-X; for multi-MSI, we choose the vCPU in a round-robin
    manner for each PCI device, so the interrupts of different devices can
    happen on different pCPUs, though the interrupts of each device happen on
    some single pCPU.
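
    The round-robin choice for multi-MSI can be pictured with the userspace
    sketch below. It is not the driver code: struct cpu_set, struct rr_cursor
    and rr_next_online_cpu() are hypothetical names made up for illustration,
    and the sketch omits the locking that a shared cursor would need in the
    kernel. Each call hands out the next online vCPU after the one returned
    previously, wrapping around at the end, so consecutive devices get
    different vCPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a mask of online vCPUs. */
struct cpu_set {
	const bool *online;	/* online[i] is true if vCPU i is usable */
	int ncpus;		/* number of entries in online[] */
};

/* Hypothetical cursor shared by all devices; last == -1 starts at vCPU 0. */
struct rr_cursor {
	int last;
};

/*
 * Return the next online vCPU after cur->last, wrapping around, and
 * remember it in the cursor. Returns -1 if no vCPU is online.
 * Assumption: the caller serializes access to the cursor.
 */
static int rr_next_online_cpu(struct rr_cursor *cur, const struct cpu_set *set)
{
	assert(cur != NULL && set != NULL && set->ncpus > 0);
	assert(cur->last >= -1 && cur->last < set->ncpus);

	for (int step = 1; step <= set->ncpus; step++) {
		int cpu = (cur->last + step) % set->ncpus;

		if (set->online[cpu]) {
			cur->last = cpu;
			return cpu;
		}
	}

	return -1;
}
```

    One call per multi-MSI device would be enough in this picture: all
    vectors of that device then target the returned vCPU, which matches the
    trade-off described above (different devices spread out, each device
    staying on a single pCPU).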
    
    The hypercall fix may not be backported to all old versions of Hyper-V, so
    we want to have this guest side change forever (or at least till we're sure
    the old affected versions of Hyper-V are no longer supported).
    
    Fixes: b4b77778 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
    Co-developed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
    Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
    Co-developed-by: Carl Vanderlip <quic_carlv@quicinc.com>
    Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
    Signed-off-by: Dexuan Cui <decui@microsoft.com>
    Reviewed-by: Michael Kelley <mikelley@microsoft.com>
    Link: https://lore.kernel.org/r/20221104222953.11356-1-decui@microsoft.com
    Signed-off-by: Wei Liu <wei.liu@kernel.org>
pci-hyperv.c 112.6 KB