    vfio/iommu_type1: Multi-IOMMU domain support · 1ef3e2bc
    Committed by Alex Williamson
    We currently have a problem that we cannot support advanced features
    of an IOMMU domain (e.g. IOMMU_CACHE), because we have no guarantee
    that those features will be supported by all of the hardware units
    involved with the domain over its lifetime.  For instance, the Intel
    VT-d architecture does not require that all DRHDs support snoop
    control.  If we create a domain based on a device behind a DRHD that
    does support snoop control and enable SNP support via the IOMMU_CACHE
    mapping option, we cannot then add a device behind a DRHD which does
    not support snoop control or we'll get reserved bit faults from the
    SNP bit in the pagetables.  To add to the complexity, we can't know
    the properties of a domain until a device is attached.
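
    As a rough sketch of the constraint (not code from this patch), a
    caller can only keep IOMMU_CACHE when the domain advertises snoop
    control, which the IOMMU API of this era exposes as
    IOMMU_CAP_CACHE_COHERENCY; the helper name vfio_supported_prot()
    here is hypothetical:

        #include <linux/iommu.h>

        /* Hypothetical helper: strip IOMMU_CACHE unless the domain is
         * cache coherent, i.e. every hardware unit behind it supports
         * snoop control. */
        static int vfio_supported_prot(struct iommu_domain *domain, int prot)
        {
                if ((prot & IOMMU_CACHE) &&
                    !iommu_domain_has_cap(domain, IOMMU_CAP_CACHE_COHERENCY))
                        prot &= ~IOMMU_CACHE;

                return prot;
        }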
    
    We could pass this problem off to userspace and require that a
    separate vfio container be used, but we don't know how to handle page
    accounting in that case.  How would we know that a page pinned in one
    container is the same page as one pinned in a different container, so
    that we could avoid billing the user twice for it?
    
    The solution is therefore to support multiple IOMMU domains per
    container.  In the majority of cases, only one domain will be required
    since hardware is typically consistent within a system.  However, this
    provides us the ability to validate compatibility of domains and
    support mixed environments where page table flags can be different
    between domains.
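
    The bookkeeping this implies looks roughly like the following (field
    names are illustrative of the patch, not a verbatim excerpt): one
    vfio_iommu per container holding a list of domains, and one
    vfio_domain per distinct set of page table properties holding the
    groups attached to it:

        #include <linux/iommu.h>
        #include <linux/list.h>
        #include <linux/mutex.h>
        #include <linux/rbtree.h>

        struct vfio_iommu {                      /* one per container */
                struct list_head domain_list;    /* all IOMMU domains in use */
                struct rb_root   dma_list;       /* user mappings, by iova */
                struct mutex     lock;
        };

        struct vfio_domain {
                struct iommu_domain *domain;
                struct list_head     next;       /* on vfio_iommu::domain_list */
                struct list_head     group_list; /* iommu groups attached here */
                int                  prot;       /* extra flags, e.g. IOMMU_CACHE */
        };

    A group joining the container is attached to an existing domain with
    compatible properties when possible; otherwise a new vfio_domain is
    added to domain_list.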
    
    To do this, our DMA tracking needs to change.  We currently try to
    coalesce user mappings into as few tracking entries as possible.  The
    problem then becomes that we lose granularity of user mappings.  We've
    never guaranteed that a user is able to unmap at a finer granularity
    than the original mapping, but we must honor the granularity of the
    original mapping.  This coalescing code is therefore removed, allowing
    only unmaps covering complete maps.  The change in accounting is
    fairly small here; a typical QEMU VM will start out with roughly a
    dozen entries, so it's arguable whether this coalescing was ever needed.
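
    A hedged sketch of the resulting unmap rule, assuming tracking
    entries live in the rb-tree above and helpers vfio_find_dma() /
    vfio_remove_dma() along the lines of the patch (the exact bounds
    checks differ between the type1 v1 and v2 ABIs):

        struct vfio_dma {
                struct rb_node node;         /* in vfio_iommu::dma_list */
                dma_addr_t     iova;         /* start of the original map */
                unsigned long  vaddr;        /* user virtual address */
                size_t         size;
                int            prot;
        };

        static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
                                              dma_addr_t iova, size_t size);
        static void vfio_remove_dma(struct vfio_iommu *iommu,
                                    struct vfio_dma *dma);

        static int vfio_unmap_range(struct vfio_iommu *iommu,
                                    dma_addr_t iova, size_t size)
        {
                struct vfio_dma *dma;

                /* Only whole original mappings may be unmapped; a range
                 * that would split a tracked entry is refused. */
                while ((dma = vfio_find_dma(iommu, iova, size))) {
                        if (dma->iova < iova ||
                            dma->iova + dma->size > iova + size)
                                return -EINVAL;
                        vfio_remove_dma(iommu, dma);
                }
                return 0;
        }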
    
    We also move IOMMU domain creation to the point where a group is
    attached to the container.  An interesting side-effect of this is that
    we now have access to the device at the time of domain creation and
    can probe the devices within the group to determine the bus_type.
    This finally makes vfio_iommu_type1 completely device/bus agnostic.
    In fact, each IOMMU domain can host devices on different buses managed
    by different physical IOMMUs, and present a single DMA mapping
    interface to the user.  When a new domain is created, mappings are
    replayed to bring the IOMMU pagetables up to the state of the current
    container.  And of course, DMA mapping and unmapping automatically
    traverse all of the configured IOMMU domains.
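
    The fan-out on the map path can be pictured as below; error unwinding
    across already-mapped domains is elided, and the function name is
    illustrative:

        /* Push one user mapping into every domain on the container,
         * OR-ing in each domain's own protection flags (e.g. IOMMU_CACHE
         * where snoop control is available). */
        static int vfio_iommu_map_all(struct vfio_iommu *iommu,
                                      dma_addr_t iova, phys_addr_t phys,
                                      size_t size, int prot)
        {
                struct vfio_domain *d;
                int ret;

                list_for_each_entry(d, &iommu->domain_list, next) {
                        ret = iommu_map(d->domain, iova, phys, size,
                                        prot | d->prot);
                        if (ret)
                                return ret; /* patch unwinds prior domains */
                }
                return 0;
        }

    Replay on domain creation is the same walk in the other direction:
    iterate the dma_list rb-tree and map each tracked entry into the new
    domain before any device behind it can DMA.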
    Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    Cc: Varun Sethi <Varun.Sethi@freescale.com>