    intel_iommu: better handling of dmar state switch · b7fce727
    Committed by Peter Xu
    QEMU does not handle the global DMAR switch well, especially when it
    is switched from "on" to "off".
    
    Let's first take the example of system reset.
    
    Assume that a guest has the IOMMU enabled.  When it reboots, we drop
    all the existing DMAR mappings to handle the system reset; however, we
    still keep the existing memory layout, which has the IOMMU memory
    region enabled.  So after the reboot and before the kernel loads
    again, there is no mapping at all for the host device.  That is
    problematic because any software that runs before the kernel after
    the reboot (for example, SeaBIOS) assumes the IOMMU is disabled, so
    any DMA issued by that software will fail.
    
    For example, a guest that boots from an assigned NVMe device might
    fail to find the boot device after a system reboot/reset, and if we
    capture the debug log we can observe SeaBIOS errors such as:
    
      WARNING - Timeout at nvme_wait:144!
    
    Meanwhile, we should see DMAR errors on the host for that NVMe device.
    It is this DMA fault that causes the NVMe driver timeout.
    
    The correct fix is to properly switch the device DMA address spaces
    on system reset, which sets up the correct memory regions and
    notifies the device backends.  This may not matter much for
    non-assigned devices, since QEMU's VT-d emulation assumes a default
    passthrough mapping if DMAR is not enabled in the GCMD register
    (please refer to vtd_iommu_translate).  However, it is required for
    assigned devices, since it rebuilds the correct GPA-to-HPA mapping
    that any DMA operation during guest bootstrap needs.
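    To make the reset path concrete, here is a minimal C sketch of the
    idea under a deliberately simplified model.  Every type and function
    in it (toy_vtd, toy_switch_address_space, toy_reset_old,
    toy_reset_fixed, toy_dma_ok) is hypothetical and is not QEMU's real
    API; the actual code in intel_iommu.c differs.  The sketch assumes a
    device sits either on a passthrough address space or on the IOMMU
    address space, and that a DMA through the IOMMU space can only
    succeed while some mapping exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model: NOT QEMU's real types or functions. */
enum toy_as {
    TOY_AS_PASSTHROUGH, /* DMA goes straight to guest memory */
    TOY_AS_IOMMU        /* DMA is translated through DMAR mappings */
};

struct toy_device {
    enum toy_as active_as; /* address space the device currently uses */
};

struct toy_vtd {
    bool dmar_enabled;       /* global DMAR switch (GCMD.TE) */
    size_t n_mappings;       /* DMAR mappings currently installed */
    struct toy_device *devs; /* devices behind this IOMMU */
    size_t n_devs;
};

/* Make one device's address space agree with the global DMAR state. */
void toy_switch_address_space(const struct toy_vtd *s, struct toy_device *d)
{
    assert(s != NULL && d != NULL);
    d->active_as = s->dmar_enabled ? TOY_AS_IOMMU : TOY_AS_PASSTHROUGH;
}

void toy_switch_address_space_all(struct toy_vtd *s)
{
    for (size_t i = 0; i < s->n_devs; i++) {
        toy_switch_address_space(s, &s->devs[i]);
    }
}

/* Old behaviour: drop mappings, but leave devices on the IOMMU space. */
void toy_reset_old(struct toy_vtd *s)
{
    s->n_mappings = 0;
    s->dmar_enabled = false;
}

/* Fixed behaviour: additionally re-evaluate every device's address space. */
void toy_reset_fixed(struct toy_vtd *s)
{
    toy_reset_old(s);
    toy_switch_address_space_all(s);
}

/*
 * A DMA succeeds in passthrough mode; in IOMMU mode it succeeds only if
 * some mapping exists (a crude stand-in for "this address is mapped").
 */
bool toy_dma_ok(const struct toy_vtd *s, const struct toy_device *d)
{
    return d->active_as == TOY_AS_PASSTHROUGH || s->n_mappings > 0;
}
```

    With toy_reset_old(), a device that was on the IOMMU address space
    stays there with zero mappings, so toy_dma_ok() returns false, which
    mirrors the SeaBIOS timeout above.  toy_reset_fixed() moves the device
    back to passthrough, and the same DMA succeeds.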
    
    Besides system reset, there are other places that can change the
    global DMAR status, and we should do the same thing there: for
    example, when the guest changes the state of the GCMD register, or
    updates the DMAR root pointer.  Do the same refresh in all these
    places.  For these two places we also need to explicitly invalidate
    the context entry cache and the IOTLB.
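    The GCMD and root-pointer cases can be sketched the same way.  This is
    a hypothetical, self-contained toy model, not QEMU's code: only the
    bit positions (TE at bit 31, SRTP at bit 30 of GCMD) are taken from
    the VT-d register layout, while the struct, the single-device state,
    the cache counters, and the helper names are assumptions made for
    illustration.  Each state change first drops the cached context
    entries and IOTLB entries, then refreshes the device address space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions follow the VT-d GCMD layout; everything else is a toy. */
#define TOY_GCMD_TE   (UINT32_C(1) << 31) /* translation enable */
#define TOY_GCMD_SRTP (UINT32_C(1) << 30) /* set root table pointer */

enum toy_as { TOY_AS_PASSTHROUGH, TOY_AS_IOMMU };

struct toy_vtd {
    bool dmar_enabled;         /* current global DMAR state */
    uint64_t root_ptr;         /* root table pointer in use */
    uint64_t pending_root_ptr; /* value written to RTADDR, latched on SRTP */
    size_t ctx_cache_entries;  /* cached context entries (toy counter) */
    size_t iotlb_entries;      /* cached IOTLB entries (toy counter) */
    enum toy_as dev_as;        /* a single device, for brevity */
};

/* Hypothetical helper: drop the context entry cache and the IOTLB. */
static void toy_reset_caches(struct toy_vtd *s)
{
    s->ctx_cache_entries = 0;
    s->iotlb_entries = 0;
}

/* Hypothetical helper: re-pick the device address space. */
static void toy_refresh_all(struct toy_vtd *s)
{
    s->dev_as = s->dmar_enabled ? TOY_AS_IOMMU : TOY_AS_PASSTHROUGH;
}

/*
 * Toy GCMD write.  `val` is treated as the full command value; the TE
 * bit is assumed to carry the desired translation state.
 */
void toy_write_gcmd(struct toy_vtd *s, uint32_t val)
{
    assert(s != NULL);

    if (val & TOY_GCMD_SRTP) {
        /* Root pointer update: latch it, then invalidate and refresh. */
        s->root_ptr = s->pending_root_ptr;
        toy_reset_caches(s);
        toy_refresh_all(s);
    }

    bool te = (val & TOY_GCMD_TE) != 0;
    if (te != s->dmar_enabled) {
        /* DMAR switched on or off: invalidate and refresh as well. */
        s->dmar_enabled = te;
        toy_reset_caches(s);
        toy_refresh_all(s);
    }
}
```

    Note that the "on" to "off" transition (a write with TE cleared) takes
    the same invalidate-and-refresh path as the "off" to "on" one, which
    is the case the first paragraph calls out.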
    
    Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1625173
    CC: QEMU Stable <qemu-stable@nongnu.org>
    Reported-by: Cong Li <coli@redhat.com>
    Signed-off-by: Peter Xu <peterx@redhat.com>
    --
    v2:
    - do the same for GCMD write, or root pointer update [Alex]
    - tested by me this time, by observing the
      vtd_switch_address_space tracepoint after a system reboot
    v3:
    - rewrite commit message as suggested by Alex
    Signed-off-by: Peter Xu <peterx@redhat.com>
    Reviewed-by: Eric Auger <eric.auger@redhat.com>
    Reviewed-by: Jason Wang <jasowang@redhat.com>
    Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    (cherry picked from commit 2cc9ddcc)
    Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>