1. 16 October 2016: 7 commits
    • spapr: Improved placement of PCI host bridges in guest memory map · 357d1e3b
      Committed by David Gibson
      Currently, the MMIO space for accessing PCI on pseries guests begins at
      1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
      chunk of address space in which it places its outbound PIO and 32-bit and
      64-bit MMIO windows.
      
      This scheme has several problems:
        - It limits guest RAM to 1 TiB (though we have a limited fix for this
          now)
        - It limits the total MMIO window to 64 GiB.  This is not always enough
          for some of the large nVidia GPGPU cards
        - Putting all the windows into a single 64 GiB area means that naturally
          aligning things within it wastes more address space.

      In addition there was a miscalculation in some of the defaults, which meant
      that the MMIO windows for each PHB actually slightly overran the 64 GiB
      region for that PHB.  We got away without nasty consequences because
      the overrun fit within an unused area at the beginning of the next PHB's
      region, but it's not pretty.
      
      This patch implements a new scheme which addresses those problems, and is
      also closer to what bare metal hardware and pHyp guests generally use.
      
      Because some guest versions (including most current distro kernels) can't
      access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
      64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB contains the
      PIO (64 KiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
      subsequent 1 TiB chunk contains a naturally aligned 64-bit MMIO window for
      one PHB.
      
      This reduces the number of allowed PHBs (without full manual configuration
      of all the windows) from 256 to 31, but this should still be plenty in
      practice.
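
      (As a rough illustration of the arithmetic described above, here is a
      minimal sketch; the constants and function names are ours, not the
      actual QEMU code.)

        #include <stdint.h>
        #include <stdio.h>

        #define TIB             (1ULL << 40)
        #define SPAPR_PCI_BASE  (32 * TIB)  /* all PCI windows live in 32..64 TiB */
        #define SPAPR_MAX_PHBS  31          /* one 1 TiB chunk per PHB; the first
                                               chunk holds the PIO and 32-bit windows */

        /* 64-bit MMIO window for PHB "index": each PHB owns the 1 TiB chunk after
         * the first one, so the window is naturally aligned to a 1 TiB boundary. */
        static uint64_t phb_mem64_base(unsigned index)
        {
            return SPAPR_PCI_BASE + (uint64_t)(index + 1) * TIB;
        }

        int main(void)
        {
            for (unsigned i = 0; i < 3; i++) {
                printf("PHB %u: 64-bit MMIO window at 0x%llx\n",
                       i, (unsigned long long)phb_mem64_base(i));
            }
            return 0;
        }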
      
      We also change some of the default window sizes for manually configured
      PHBs to saner values.
      
      Finally we adjust some tests and libqos so that they correctly use the new
      default locations.  Ideally it would parse the device tree given to the
      guest, but that's a more complex problem for another time.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr_pci: Add a 64-bit MMIO window · daa23699
      Committed by David Gibson
      On real hardware, and under pHyp, the PCI host bridges on Power machines
      typically advertise two outbound MMIO windows from the guest's physical
      memory space to PCI memory space:
        - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
        - A 64-bit window which maps onto a large region somewhere high in PCI
          address space (traditionally this used an identity mapping from guest
          physical address to PCI address, but that's not always the case)
      
      The qemu implementation in spapr-pci-host-bridge, however, only supports a
      single outbound MMIO window.  At least some Linux versions expect both
      windows, so we arranged this window to map onto the PCI memory space from
      2 GiB..~64 GiB, then advertised it as two contiguous windows: the "32-bit"
      window from 2 GiB..4 GiB and the "64-bit" window from 4 GiB..~64 GiB.
      
      This approach means, however, that the "64-bit" window is not naturally aligned.
      In turn this limits the size of the largest BAR we can map (which does have
      to be naturally aligned) to roughly half of the total window.  With some
      large nVidia GPGPU cards which have huge memory BARs, this is starting to
      be a problem.
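
      (To make the alignment limit concrete, here is a small, purely
      illustrative sketch of the largest naturally aligned BAR that fits in a
      window covering roughly 4 GiB..64 GiB of PCI space.)

        #include <stdint.h>
        #include <stdio.h>

        #define GIB (1ULL << 30)

        /* Largest power-of-two BAR that can be placed naturally aligned (start
         * address a multiple of the BAR size) inside [lo, hi). */
        static uint64_t largest_aligned_bar(uint64_t lo, uint64_t hi)
        {
            for (uint64_t size = 1ULL << 62; size != 0; size >>= 1) {
                uint64_t start = (lo + size - 1) & ~(size - 1);  /* round lo up */
                if (start >= lo && start <= hi && hi - start >= size) {
                    return size;
                }
            }
            return 0;
        }

        int main(void)
        {
            /* Legacy spapr layout: the "64-bit" part covers 4 GiB..~64 GiB of PCI
             * space, so the largest naturally aligned BAR is only 32 GiB. */
            uint64_t bar = largest_aligned_bar(4 * GIB, 64 * GIB);
            printf("largest naturally aligned BAR: %llu GiB\n",
                   (unsigned long long)(bar / GIB));
            return 0;
        }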
      
      This patch adds true support for separate 32-bit and 64-bit outbound MMIO
      windows to the spapr-pci-host-bridge implementation, each of which can
      be independently configured.  The 32-bit window always maps to 2G.. in PCI
      space, but the PCI address of the 64-bit window can be configured (it
      defaults to the same as the guest physical address).
      
      So as not to break existing configurations, a single large window can still
      be specified as long as no 64-bit window is given.  This
      will appear the same way to the guest as the old approach, although it's
      now implemented by two contiguous memory regions rather than a single one.
      
      For now, this only adds the possibility of 64-bit windows.  The default
      configuration still uses the legacy mode.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM · 2efff1c0
      Committed by David Gibson
      Currently the default PCI host bridge for the 'pseries' machine type is
      constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
      guest memory space.  This means that if > 1TiB of guest RAM is specified,
      the RAM will collide with the PCI IO windows, causing serious problems.
      
      Problems won't be obvious until guest RAM goes a bit beyond 1TiB, because
      there's a little unused space at the bottom of the area reserved for PCI,
      but essentially this means that > 1TiB of RAM has never worked with the
      pseries machine type.
      
      This patch fixes this by altering the placement of PHBs on large-RAM VMs.
      Instead of always placing the first PHB at 1TiB, it is placed at the next
      1 TiB boundary after the maximum RAM address.
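
      (A minimal sketch of that placement rule, assuming a simple round-up to
      a 1 TiB boundary; the exact handling of boundary cases in the real code
      may differ.)

        #include <stdint.h>
        #include <stdio.h>

        #define TIB (1ULL << 40)

        /* Round the top of guest RAM up to a 1 TiB boundary and place the first
         * PHB there; for guests under 1 TiB this degenerates to the old 1 TiB
         * default. */
        static uint64_t first_phb_base(uint64_t max_ram_addr)
        {
            return (max_ram_addr + TIB - 1) & ~(TIB - 1);
        }

        int main(void)
        {
            printf("512 GiB RAM -> first PHB at %llu TiB\n",
                   (unsigned long long)(first_phb_base(512ULL << 30) / TIB)); /* 1 */
            printf("1.5 TiB RAM -> first PHB at %llu TiB\n",
                   (unsigned long long)(first_phb_base(3ULL << 39) / TIB));   /* 2 */
            return 0;
        }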
      
      Technically, this changes behaviour in a migration-breaking way for
      existing machines with > 1TiB maximum memory, but since having > 1 TiB
      memory was broken anyway, this seems like a reasonable trade-off.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr_pci: Delegate placement of PCI host bridges to machine type · 6737d9ad
      Committed by David Gibson
      The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
      for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
      and PAPR guests) to have numerous independent PHBs, each controlling a
      separate PCI domain.
      
      There are two ways of configuring the spapr-pci-host-bridge device: first
      it can be done fully manually, specifying the locations and sizes of all
      the IO windows.  This gives the most control, but is very awkward with 6
      mandatory parameters.  Alternatively, just an "index" can be specified,
      which essentially selects from an array of predefined PHB locations.
      The PHB at index 0 is automatically created as the default PHB.
      
      The current set of default locations causes some problems for guests with
      large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
      GPGPU cards via VFIO).  For migration compatibility, however, we can only
      change the locations on a new machine type.
      
      This is awkward, because the placement is currently decided within the
      spapr-pci-host-bridge code, so it breaks abstraction to look inside the
      machine type version.
      
      So, this patch delegates the "default mode" PHB placement from the
      spapr-pci-host-bridge device back to the machine type via a public method
      in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
      can do.
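
      (The delegation pattern described above might look roughly like the
      following sketch; the types, method signature, and addresses are
      illustrative, not the actual sPAPRMachineClass definition.)

        #include <stdint.h>
        #include <stdio.h>

        /* The machine class exposes a placement hook. */
        typedef struct sPAPRMachineClassSketch {
            void (*phb_placement)(unsigned index, uint64_t *pio_base,
                                  uint64_t *mem_base);
        } sPAPRMachineClassSketch;

        /* Default placement supplied by the machine type (made-up addresses). */
        static void default_phb_placement(unsigned index, uint64_t *pio_base,
                                          uint64_t *mem_base)
        {
            const uint64_t tib = 1ULL << 40;
            *pio_base = 32 * tib + index * 0x10000ULL;
            *mem_base = 32 * tib + (uint64_t)(index + 1) * tib;
        }

        /* PHB realize code asks the machine class where it should live instead
         * of deciding the addresses itself. */
        static void phb_realize_sketch(const sPAPRMachineClassSketch *smc,
                                       unsigned index)
        {
            uint64_t pio, mem;
            smc->phb_placement(index, &pio, &mem);
            printf("PHB %u: PIO at 0x%llx, MMIO at 0x%llx\n", index,
                   (unsigned long long)pio, (unsigned long long)mem);
        }

        int main(void)
        {
            sPAPRMachineClassSketch smc = { .phb_placement = default_phb_placement };
            phb_realize_sketch(&smc, 0);
            return 0;
        }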
      
      For now, this just changes where the calculation is done.  It doesn't
      change the actual location of the host bridges, or any other behaviour.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • libqos: Limit spapr-pci to 32-bit MMIO for now · 8360544a
      Committed by David Gibson
      Currently the functions in pci-spapr.c (like pci-pc.c on which it's based)
      don't distinguish between 32-bit and 64-bit PCI MMIO.  At the moment, the
      qemu side implementation is a bit weird and has a single MMIO window
      straddling 32-bit and 64-bit regions, but we're likely to change that in
      future.
      
      In any case, pci-pc.c - and therefore the testcases using PCI - only handle
      32-bit MMIOs for now.  For spapr, despite whatever changes might happen with
      the MMIO windows, the 32-bit window is likely to remain at 2..4 GiB in PCI
      space.
      
      So, explicitly limit pci-spapr.c to 32-bit MMIOs for now; we can add 64-bit
      MMIO support back when and if we need it.
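
      (A hedged sketch of what that restriction amounts to: the MMIO
      allocation hole simply never extends past 4 GiB of PCI space.  The
      values are assumed, not the actual libqos constants.)

        #include <stdint.h>

        #define GIB (1ULL << 30)

        /* Restrict MMIO BAR allocation to the 32-bit window at 2..4 GiB in PCI
         * space; 64-bit allocation can be reintroduced later if needed. */
        static void spapr_mmio_hole_32bit(uint64_t *hole_start, uint64_t *hole_size)
        {
            *hole_start = 2 * GIB;
            *hole_size  = 4 * GIB - *hole_start;   /* i.e. 2 GiB */
        }

        int main(void)
        {
            uint64_t start, size;
            spapr_mmio_hole_32bit(&start, &size);
            return size == 2 * GIB ? 0 : 1;
        }
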
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • libqos: Correct error in PCI hole sizing for spapr · c7113690
      Committed by David Gibson
      In pci-spapr.c (as in pci-pc.c from which it was derived), the
      pci_hole_start/pci_hole_size and pci_iohole_start/pci_iohole_size pairs[1]
      essentially define the region of PCI (not CPU) addresses in which MMIO
      or PIO BARs respectively will be allocated.
      
      The size value is relative to the start value, but in pci-spapr.c it is
      set to the entire size of the window supported by the (emulated) hardware,
      even though the start values are *not* at the beginning of the emulated
      windows.
      
      That means that if enough PCI BARs were mapped, we'd messily overrun the
      IO windows instead of failing in iomap as we should.
      
      This patch corrects this by calculating the hole sizes from the location
      of the window in PCI space and the hole start.
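
      (A minimal sketch of the corrected sizing; the field names follow the
      description above and the start values are illustrative, not the actual
      libqos code.)

        #include <stdint.h>

        #define GIB (1ULL << 30)

        typedef struct {
            uint64_t pci_hole_start, pci_hole_size;     /* MMIO BAR allocation */
            uint64_t pci_iohole_start, pci_iohole_size; /* PIO BAR allocation  */
        } QPCIHolesSketch;

        /* The hole size is measured from hole_start to the *end* of the emulated
         * window in PCI space, not simply set to the whole window size. */
        static void set_holes(QPCIHolesSketch *h,
                              uint64_t mmio_win_pci_base, uint64_t mmio_win_size,
                              uint64_t io_win_pci_base, uint64_t io_win_size)
        {
            h->pci_hole_start   = 2 * GIB;   /* illustrative start above the window base */
            h->pci_hole_size    = mmio_win_pci_base + mmio_win_size - h->pci_hole_start;

            h->pci_iohole_start = 0x1000;    /* illustrative start */
            h->pci_iohole_size  = io_win_pci_base + io_win_size - h->pci_iohole_start;
        }

        int main(void)
        {
            QPCIHolesSketch h;
            /* Example: 32-bit MMIO window at PCI 2 GiB, 2 GiB long; 64 KiB PIO at 0. */
            set_holes(&h, 2 * GIB, 2 * GIB, 0, 64 * 1024);
            return h.pci_hole_size == 2 * GIB ? 0 : 1;
        }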
      
      [1] Those are bad names, but that's a problem for another time.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • libqos: Isolate knowledge of spapr memory map to qpci_init_spapr() · cd1b354e
      Committed by David Gibson
      The libqos code for accessing PCI on the spapr machine type uses IOBASE()
      and MMIOBASE() macros to determine the address in the CPU memory map of
      the windows to PCI address space.
      
      This is a detail of the implementation of PCI in the machine type; it's not
      specified by the PAPR standard.  Real guests would get the addresses of the
      PCI windows from the device tree.
      
      Finding the device tree in libqos would be awkward, but we can at least
      localize this knowledge of the implementation to the init function, saving
      it in the QPCIBusSPAPR structure for use by the accessors.
      
      That leaves only one place to fix if we alter the location of the PCI
      windows, as we're planning to do.
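
      (A rough sketch of that idea; the field names and addresses are
      illustrative, not the real QPCIBusSPAPR definition.)

        #include <stdint.h>
        #include <stdio.h>

        /* The window locations are recorded once at init time; everything else
         * uses the stored values rather than compile-time MMIOBASE()/IOBASE()
         * macros. */
        typedef struct QPCIBusSPAPRSketch {
            uint64_t pio_cpu_base;   /* CPU address of the PIO window  */
            uint64_t mmio_cpu_base;  /* CPU address of the MMIO window */
        } QPCIBusSPAPRSketch;

        static void qpci_spapr_init_sketch(QPCIBusSPAPRSketch *bus)
        {
            /* The machine's memory map is consulted only here; if the PCI windows
             * move, this is the single place to update (assumed addresses). */
            bus->pio_cpu_base  = 1ULL << 40;
            bus->mmio_cpu_base = (1ULL << 40) + 0x80000000ULL;
        }

        /* Accessor: translate a PCI MMIO address into a CPU address using the
         * value saved at init. */
        static uint64_t qpci_spapr_mmio_to_cpu(const QPCIBusSPAPRSketch *bus,
                                               uint64_t pci_addr)
        {
            return bus->mmio_cpu_base + (pci_addr - 0x80000000ULL);
        }

        int main(void)
        {
            QPCIBusSPAPRSketch bus;
            qpci_spapr_init_sketch(&bus);
            printf("PCI 0x80000000 -> CPU 0x%llx\n",
                   (unsigned long long)qpci_spapr_mmio_to_cpu(&bus, 0x80000000ULL));
            return 0;
        }
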
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
  2. 14 October 2016: 9 commits
  3. 13 October 2016: 17 commits
  4. 12 October 2016: 7 commits