    spapr: Support NVIDIA V100 GPU with NVLink2 · ec132efa
    Committed by Alexey Kardashevskiy
    NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory
    space and accessible as normal RAM via an NVLink bus. The VFIO-PCI driver
    implements special regions for such GPUs and emulates an NVLink bridge.
    NVLink2-enabled POWER9 CPUs also provide address translation services,
    which include an ATS shootdown (ATSD) register exported via the NVLink
    bridge device.
    
    This adds a quirk to VFIO to map the GPU memory and create an MR;
    the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses
    this to get the MR and map it to the system address space.
    Another quirk does the same for ATSD.
    
    This adds additional steps to sPAPR PHB setup:
    
    1. Search for specific GPUs and NPUs, collect findings in
    sPAPRPHBState::nvgpus, manage system address space mappings;
    
    2. Add device-specific properties such as "ibm,npu", "ibm,gpu",
    "memory-block", "link-speed" to advertise the NVLink2 function to
    the guest;
    
    3. Add "mmio-atsd" to vPHB to advertise the ATSD capability;
    
    4. Add new memory blocks (with extra "linux,memory-usable" to prevent
    the guest OS from accessing the new memory until it is onlined) and
    npuphb# nodes representing an NPU unit for every vPHB as the GPU driver
    uses it for link discovery.
    
    This allocates space for GPU RAM and ATSD like we do for MMIOs by
    adding 2 new parameters to the phb_placement() hook. Older machine types
    set these to zero.
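
A minimal sketch of how a placement hook could hand out the two extra
values. The constants, struct, and function names below are hypothetical
and chosen only for illustration; they are not taken from this patch.
It assumes one fixed-size GPU RAM window and one fixed-size ATSD window
per vPHB, indexed like the MMIO windows.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout constants, for illustration only. */
#define NV2RAM_WIN_BASE  (64ULL << 40)  /* assumed base of the GPU RAM area */
#define NV2RAM_WIN_SIZE  (2ULL << 40)   /* assumed GPU RAM window per vPHB */
#define NV2ATSD_WIN_SIZE (64ULL << 10)  /* assumed ATSD window per vPHB */
#define MAX_PHBS         32u            /* assumed vPHB limit */

/* Assumption: ATSD windows start right after the last GPU RAM window. */
#define NV2ATSD_WIN_BASE \
    (NV2RAM_WIN_BASE + (uint64_t)MAX_PHBS * NV2RAM_WIN_SIZE)

typedef struct {
    uint64_t nv2_gpa;   /* guest physical address of the GPU RAM window */
    uint64_t nv2_atsd;  /* guest physical address of the ATSD window */
} Nv2Placement;

/* Newer machine types: derive both windows from the vPHB index. */
static Nv2Placement nv2_placement_new(uint32_t index)
{
    assert(index < MAX_PHBS);
    Nv2Placement p = {
        .nv2_gpa  = NV2RAM_WIN_BASE + index * NV2RAM_WIN_SIZE,
        .nv2_atsd = NV2ATSD_WIN_BASE + index * NV2ATSD_WIN_SIZE,
    };
    return p;
}

/* Older machine types: no NVLink2 windows, so both values are zero. */
static Nv2Placement nv2_placement_old(uint32_t index)
{
    (void)index;
    Nv2Placement p = { 0, 0 };
    return p;
}
```

With this layout, a zero value doubles as the "not supported" marker that
the PHB setup code can test before mapping anything.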
    
    This puts new memory nodes in a separate NUMA node, as the GPU RAM
    needs to be configured equally distant from any other node in the system.
    Unlike the host setup which assigns numa ids from 255 downwards, this
    adds new NUMA nodes after the user configures nodes or from 1 if none
    were configured.
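
The numbering rule can be sketched as below. The allocator type and
function names are hypothetical, not identifiers from this patch; the
sketch assumes user-configured nodes are numbered 0..N-1 and that node 0
exists implicitly when none are configured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-machine counter for GPU RAM NUMA node ids. */
typedef struct {
    uint32_t next_gpu_numa_id;
} GpuNumaAllocator;

/*
 * Start after the user-configured nodes, or from 1 if the user
 * configured none.
 */
static void gpu_numa_init(GpuNumaAllocator *a, uint32_t nb_user_nodes)
{
    assert(a != NULL);
    a->next_gpu_numa_id = nb_user_nodes > 1 ? nb_user_nodes : 1;
}

/* Hand out one new node id per GPU RAM block. */
static uint32_t gpu_numa_next(GpuNumaAllocator *a)
{
    assert(a != NULL);
    return a->next_gpu_numa_id++;
}
```

Counting upwards keeps the ids dense, in contrast to the host scheme of
counting down from 255.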
    
    This adds a requirement similar to EEH: one IOMMU group per vPHB.
    The reason for this is that ATSD registers belong to a physical NPU
    so they cannot invalidate translations on GPUs attached to another NPU.
    This is guaranteed by the host platform as it does not mix NVLink bridges
    or GPUs from different NPUs in the same IOMMU group. If more than one
    IOMMU group is detected on a vPHB, this disables ATSD support for that
    vPHB and prints a warning.
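
A rough sketch of the one-group-per-vPHB check, under the assumption
that each device on the vPHB reports an integer IOMMU group id. The
function name and the array-based interface are hypothetical; the real
code walks QEMU's device structures instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Returns true if ATSD may stay enabled for the vPHB, i.e. every
 * device on it belongs to the same IOMMU group. An empty vPHB is
 * treated as allowed.
 */
static bool vphb_atsd_allowed(const int *iommu_groups, size_t n)
{
    assert(iommu_groups != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        if (iommu_groups[i] != iommu_groups[0]) {
            /* More than one group: warn and disable ATSD. */
            fprintf(stderr,
                    "warning: multiple IOMMU groups on vPHB, "
                    "ATSD disabled\n");
            return false;
        }
    }
    return true;
}
```

The check only disables ATSD rather than failing device setup, which
matches the warn-and-continue behaviour described above.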
    Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    [aw: for vfio portions]
    Acked-by: Alex Williamson <alex.williamson@redhat.com>
    Message-Id: <20190312082103.130561-1-aik@ozlabs.ru>
    Signed-off-by: David Gibson <david@gibson.dropbear.id.au>