  1. 28 October 2016, 7 commits
  2. 16 October 2016, 2 commits
    • spapr_pci: Add a 64-bit MMIO window · daa23699
      David Gibson authored
      On real hardware, and under pHyp, the PCI host bridges on Power machines
      typically advertise two outbound MMIO windows from the guest's physical
      memory space to PCI memory space:
        - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
        - A 64-bit window which maps onto a large region somewhere high in PCI
          address space (traditionally this used an identity mapping from guest
          physical address to PCI address, but that's not always the case)
      
      The qemu implementation in spapr-pci-host-bridge, however, only supports
      a single outbound MMIO window.  At least some Linux versions expect both
      windows, so we arranged this window to map onto the PCI memory space from
      2 GiB..~64 GiB, then advertised it as two contiguous windows: the
      "32-bit" window from 2G..4G and the "64-bit" window from 4G..~64G.
      
      This approach means, however, that the 64G window is not naturally aligned.
      In turn this limits the size of the largest BAR we can map (which does have
      to be naturally aligned) to roughly half of the total window.  With some
      large nVidia GPGPU cards which have huge memory BARs, this is starting to
      be a problem.
      
      This patch adds true support for separate 32-bit and 64-bit outbound MMIO
      windows to the spapr-pci-host-bridge implementation, each of which can
      be independently configured.  The 32-bit window always maps to 2G.. in PCI
      space, but the PCI address of the 64-bit window can be configured (it
      defaults to the same as the guest physical address).
      
      So as not to break possible existing configurations, a single large
      window can still be specified as long as no 64-bit window is given.
      This appears to the guest the same way as the old approach, although
      it is now implemented by two contiguous memory regions rather than a
      single one.
      
      For now, this only adds the possibility of 64-bit windows.  The default
      configuration still uses the legacy mode.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
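      A rough, standalone illustration of the alignment arithmetic above: only
      the 2 GiB..~64 GiB legacy layout comes from the message; the placement
      of the separate, naturally aligned 64-bit window is an assumption made
      for the sketch.

        /* Why a single 2G..~64G window limits the largest naturally aligned
         * BAR to roughly half the window, and what a separate, aligned
         * 64-bit window buys.  Compile with: cc -o win win.c */
        #include <stdio.h>
        #include <stdint.h>

        /* Largest naturally aligned power-of-two block fitting in [start, end). */
        static uint64_t max_aligned_bar(uint64_t start, uint64_t end)
        {
            for (uint64_t sz = 1ULL << 62; sz; sz >>= 1) {
                uint64_t base = (start + sz - 1) & ~(sz - 1);  /* align up */
                if (base + sz <= end) {
                    return sz;
                }
            }
            return 0;
        }

        int main(void)
        {
            const uint64_t GiB = 1ULL << 30;

            /* Legacy layout: one contiguous window from 2 GiB to ~64 GiB. */
            uint64_t legacy_lo = 2 * GiB, legacy_hi = 64 * GiB;
            /* Separate 64-bit window: naturally aligned, e.g. 64 GiB placed
             * at 64 GiB (assumed placement, not from the patch). */
            uint64_t w64_lo = 64 * GiB, w64_hi = 128 * GiB;

            printf("legacy 2G..64G window: largest aligned BAR = %llu GiB\n",
                   (unsigned long long)(max_aligned_bar(legacy_lo, legacy_hi) / GiB));
            printf("aligned 64-bit window: largest aligned BAR = %llu GiB\n",
                   (unsigned long long)(max_aligned_bar(w64_lo, w64_hi) / GiB));
            return 0;
        }

      With the legacy 2G..64G layout the largest aligned BAR that fits is
      32 GiB, matching the "roughly half of the total window" point above.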
    • spapr_pci: Delegate placement of PCI host bridges to machine type · 6737d9ad
      David Gibson authored
      The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
      for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
      and PAPR guests) to have numerous independent PHBs, each controlling a
      separate PCI domain.
      
      There are two ways of configuring the spapr-pci-host-bridge device: first
      it can be done fully manually, specifying the locations and sizes of all
      the IO windows.  This gives the most control, but is very awkward with 6
      mandatory parameters.  Alternatively just an "index" can be specified
      which essentially selects from an array of predefined PHB locations.
      The PHB at index 0 is automatically created as the default PHB.
      
      The current set of default locations causes some problems for guests with
      large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
      GPGPU cards via VFIO).  For migration compatibility, however, we can only
      change the locations on a new machine type.
      
      This is awkward, because the placement is currently decided within the
      spapr-pci-host-bridge code, so it breaks abstraction to look inside the
      machine type version.
      
      So, this patch delegates the "default mode" PHB placement from the
      spapr-pci-host-bridge device back to the machine type via a public method
      in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
      can do.
      
      For now, this just changes where the calculation is done.  It doesn't
      change the actual location of the host bridges, or any other behaviour.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
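      A minimal sketch of the "delegate to the machine class" pattern described
      above.  The hook name (phb_placement), its signature, and all the
      placement constants are illustrative assumptions, not the actual QEMU
      code; the point is only that the machine version, not the PHB device,
      owns the placement decision.

        #include <stdio.h>
        #include <stdint.h>

        typedef struct MachineClassSketch {
            const char *name;
            /* Hypothetical placement hook: the machine class decides where
             * the PHB with a given index gets its windows. */
            void (*phb_placement)(const struct MachineClassSketch *mc, int index,
                                  uint64_t *mmio32, uint64_t *mmio64);
        } MachineClassSketch;

        static void placement_legacy(const MachineClassSketch *mc, int index,
                                     uint64_t *mmio32, uint64_t *mmio64)
        {
            const uint64_t GiB = 1ULL << 30;
            (void)mc;
            /* Old scheme: one fixed-size slot per index (made-up numbers). */
            *mmio32 = 2 * GiB + index * 62 * GiB;
            *mmio64 = 0;                      /* no separate 64-bit window */
        }

        static void placement_newer(const MachineClassSketch *mc, int index,
                                    uint64_t *mmio32, uint64_t *mmio64)
        {
            const uint64_t GiB = 1ULL << 30, TiB = 1ULL << 40;
            (void)mc;
            /* A newer machine version could pick aligned 64-bit slots. */
            *mmio32 = 2 * GiB + index * 2 * GiB;
            *mmio64 = 32 * TiB + index * 64 * GiB;
        }

        int main(void)
        {
            MachineClassSketch old = { "pseries-old", placement_legacy };
            MachineClassSketch cur = { "pseries-new", placement_newer };
            const MachineClassSketch *machines[] = { &old, &cur };

            for (int m = 0; m < 2; m++) {
                uint64_t m32, m64;
                machines[m]->phb_placement(machines[m], 1, &m32, &m64);
                printf("%s: PHB index 1 -> mmio32 @ 0x%llx, mmio64 @ 0x%llx\n",
                       machines[m]->name, (unsigned long long)m32,
                       (unsigned long long)m64);
            }
            return 0;
        }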
  3. 06 October 2016, 1 commit
    • hw/ppc/spapr: Use POWER8 by default for the pseries-2.8 machine · 3daa4a9f
      Thomas Huth authored
      A couple of distributors are compiling their distributions
      with "-mcpu=power8" for ppc64le these days, so sooner or later the user
      runs into a crash when not explicitly specifying the "-cpu POWER8"
      option to QEMU (which currently uses POWER7 for the "pseries" machine
      by default). For this reason, the linux-user target already switched to
      POWER8 a while ago (see commit de3f1b98). Since the softmmu target has
      the same problem, we should switch to POWER8 for the newer machine
      types, too.
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  4. 15 September 2016, 1 commit
  5. 08 August 2016, 1 commit
    • spapr: Correctly set query_hotpluggable_cpus hook based on machine version · 3c0c47e3
      David Gibson authored
      Prior to c8721d35 "spapr: Error out when CPU hotplug is attempted on older
      pseries machines", attempting to use query-hotpluggable-cpus on pseries-2.6
      and earlier machine types would SEGV.
      
      That change fixed that, but due to some unexpected interactions in init
      order and a brown-paper-bag worthy failure to test, it accidentally
      disabled query-hotpluggable-cpus for all pseries machine types, including
      the current one which should allow it.
      
      In fact, query_hotpluggable_cpus needs to be non-NULL when and only when
      the dr_cpu_enabled flag in sPAPRMachineClass is set, which makes
      dr_cpu_enabled itself redundant.
      
      This patch removes dr_cpu_enabled, instead directly setting
      query_hotpluggable_cpus from the machine class_init functions, and using
      that to determine the availability of CPU hotplug when necessary.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
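      A small standalone sketch of the pattern described above: a class hook
      that is set only by machine versions supporting the feature, with
      "hook != NULL" doubling as the capability flag.  The hook name comes
      from the message; the struct, machine names, and return type are
      illustrative assumptions.

        #include <stdio.h>
        #include <stddef.h>

        typedef struct MachineClassSketch {
            const char *name;
            /* Non-NULL when and only when CPU hotplug is supported. */
            const char *(*query_hotpluggable_cpus)(void);
        } MachineClassSketch;

        static const char *query_cpus_current(void)
        {
            return "core0, core8, core16, ...";   /* placeholder result */
        }

        static int cpu_hotplug_allowed(const MachineClassSketch *mc)
        {
            /* No separate dr_cpu_enabled flag needed: the hook's presence
             * is the single source of truth. */
            return mc->query_hotpluggable_cpus != NULL;
        }

        int main(void)
        {
            MachineClassSketch old_machine = { "pseries-2.6", NULL };
            MachineClassSketch new_machine = { "pseries-current", query_cpus_current };

            printf("%s: hotplug %s\n", old_machine.name,
                   cpu_hotplug_allowed(&old_machine) ? "allowed" : "rejected");
            printf("%s: hotplug %s\n", new_machine.name,
                   cpu_hotplug_allowed(&new_machine) ? "allowed" : "rejected");
            return 0;
        }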
  6. 12 July 2016, 1 commit
  7. 05 July 2016, 1 commit
    • spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) · ae4de14c
      Alexey Kardashevskiy authored
      This adds support for the Dynamic DMA Windows (DDW) option defined by
      the SPAPR specification, which allows additional DMA window(s) to be
      created.

      The "ddw" property is enabled by default on a PHB, but for compatibility
      the pseries-2.6 machine and older disable it.
      This also creates a single DMA window for the older machines to
      maintain backward migration.
      
      This implements DDW for PHBs with emulated and VFIO devices. Host
      kernel support is required. The advertised IOMMU page sizes are 4K and
      64K; 16M pages are supported but not advertised by default. To enable
      them, the user has to specify the "pgsz" property for the PHB and
      enable huge pages for RAM.
      
      Existing Linux guests try to create one additional huge DMA window
      with 64K or 16MB pages and map the entire guest RAM into it. If this
      succeeds, the guest switches to dma_direct_ops and never calls the TCE
      hypercalls (H_PUT_TCE,...) again. This enables VFIO devices to use the
      entire RAM and not waste time on map/unmap later. This adds a
      "dma64_win_addr" property, the bus address of the 64-bit window, which
      defaults to 0x800.0000.0000.0000, as this is what modern POWER8
      hardware uses; it also allows having emulated and VFIO devices on the
      same bus.
      
      This adds 4 RTAS handlers:
      * ibm,query-pe-dma-window
      * ibm,create-pe-dma-window
      * ibm,remove-pe-dma-window
      * ibm,reset-pe-dma-window
      These are registered from the type_init() callback.
      
      These RTAS handlers are implemented in a separate file to avoid polluting
      spapr_iommu.c with PCI.
      
      This changes sPAPRPHBState::dma_liobn to an array to allow two LIOBNs
      and updates all references to dma_liobn. However, this does not add the
      64-bit LIOBN to the migration stream, since even the 32-bit LIOBN is
      rather pointless there (it is a PHB property and the management
      software can/should pass LIOBNs via the CLI), but we keep it for
      backward migration support.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
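      A back-of-the-envelope sketch of the huge-window behaviour described
      above: how many TCEs, and how much table memory, a guest needs to map
      all of its RAM for the advertised 64K and the optional 16M page sizes.
      The 0x800.0000.0000.0000 window base is the default quoted in the
      message; the RAM size and the 8-byte TCE entry size are assumptions
      for the sketch.

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            const uint64_t dma64_base = 0x8000000000000000ULL; /* default bus addr */
            const uint64_t ram        = 256ULL << 30;          /* assume 256 GiB   */
            const unsigned page_shifts[] = { 16, 24 };          /* 64K and 16M TCEs */

            for (int i = 0; i < 2; i++) {
                uint64_t psize = 1ULL << page_shifts[i];
                uint64_t tces  = ram / psize;          /* one TCE per IOMMU page   */
                uint64_t table = tces * 8;             /* assumed 8 bytes per TCE  */
                printf("window @ 0x%016llx, %lluK pages: %llu TCEs, %llu KiB table\n",
                       (unsigned long long)dma64_base,
                       (unsigned long long)(psize >> 10),
                       (unsigned long long)tces,
                       (unsigned long long)(table >> 10));
            }
            return 0;
        }

      Mapping the whole of RAM once up front is what lets the guest stop
      issuing per-mapping H_PUT_TCE calls afterwards.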
  8. 01 July 2016, 1 commit
  9. 17 June 2016, 4 commits
  10. 14 June 2016, 1 commit
    • spapr: Ensure all LMBs are represented in ibm,dynamic-memory · d0e5a8f2
      Bharata B Rao authored
      Memory hotplug can fail for some combinations of RAM and maxmem when
      DDW is enabled in the presence of devices like nec-usb-xhci. DDW depends
      on the maximum addressable memory returned by the guest, and this value
      is currently calculated incorrectly by the guest kernel routine
      memory_hotplug_max(). While there is an attempt to fix the guest kernel,
      this patch works around the problem within QEMU itself.
      
      The memory_hotplug_max() routine in the guest kernel arrives at the max
      addressable memory by multiplying the lmb-size with the lmb-count
      obtained from the ibm,dynamic-memory property. There are two
      assumptions here:

      - All LMBs are part of ibm,dynamic-memory: this is not true for PowerKVM,
        where only hot-pluggable LMBs are present in this property.
      - The memory area comprising RAM and the hotplug region is contiguous:
        this needn't always be true for PowerKVM, as there can be a gap between
        boot-time RAM and the hotplug region.
      
      To work around this guest kernel bug, ensure that ibm,dynamic-memory
      has information about all the LMBs (RMA, boot-time LMBs, future
      hotpluggable LMBs, and dummy LMBs to cover the gap between RAM and the
      hotpluggable region).

      The RMA is represented separately by the memory@0 node. Hence mark the
      RMA LMBs, and also the LMBs for the gap between RAM and the hotpluggable
      region, as reserved and as having no valid DRC, so that these LMBs are
      not considered by the guest.
      Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
      Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
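      A standalone sketch of the coverage requirement above: since the guest
      computes max addressable memory as lmb_size * lmb_count, every LMB-sized
      chunk (RMA, boot-time RAM, the gap, and the hotplug region) has to show
      up in ibm,dynamic-memory, with the non-hotpluggable ones marked
      reserved.  All region sizes below are assumptions for the illustration.

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
            const uint64_t MiB = 1ULL << 20, GiB = 1ULL << 30;
            const uint64_t lmb_size     = 256 * MiB;  /* assumed LMB size         */
            const uint64_t rma          = 1 * GiB;    /* described by memory@0     */
            const uint64_t boot_ram     = 4 * GiB;    /* includes the RMA          */
            const uint64_t hotplug_base = 32 * GiB;   /* assumed gap before this   */
            const uint64_t hotplug_size = 16 * GiB;

            uint64_t span      = hotplug_base + hotplug_size;
            uint64_t lmb_count = span / lmb_size;

            for (uint64_t i = 0; i < lmb_count; i++) {
                uint64_t addr = i * lmb_size;
                const char *kind;
                if (addr < rma) {
                    kind = "reserved, no DRC (RMA, in memory@0)";
                } else if (addr < boot_ram) {
                    kind = "boot-time RAM";
                } else if (addr < hotplug_base) {
                    kind = "reserved, no DRC (gap before hotplug region)";
                } else {
                    kind = "hotpluggable";
                }
                if (i < 5 || i % 32 == 0) {           /* keep the output short */
                    printf("LMB %3llu @ 0x%010llx: %s\n",
                           (unsigned long long)i, (unsigned long long)addr, kind);
                }
            }
            printf("guest sees lmb_count=%llu -> max addressable = %llu GiB\n",
                   (unsigned long long)lmb_count,
                   (unsigned long long)(lmb_count * lmb_size / GiB));
            return 0;
        }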
  11. 07 June 2016, 3 commits
    • spapr_iommu: Add root memory region · b4b6eb77
      Alexey Kardashevskiy authored
      We are going to have multiple DMA windows at different offsets on
      a PCI bus. For the sake of migration, we will pre-create as many TCE
      table objects as there are supported windows.
      So we need a way to map windows dynamically onto a PCI bus when
      migration of a table is completed; at that stage a TCE table object
      does not have access to a PHB to ask it to map a DMA window backed by
      the just-migrated TCE table.
      
      This adds a "root" memory region (UINT64_MAX long) to the TCE object.
      This new region is mapped on a PCI bus with overlapping enabled, as
      there will be one root MR per TCE table, each of them mapped at 0.
      The actual IOMMU memory region is a subregion of the root region; a
      TCE table enables/disables this subregion and maps it at a specific
      offset inside the root MR, which is a 1:1 mapping of the PCI address
      space.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
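      A standalone model of the layout described above: each TCE table owns a
      full-64-bit "root" region mapped at bus address 0, with the actual
      IOMMU window as a sub-window at bus_offset inside it.  This is an
      illustration of the structure only, not QEMU's MemoryRegion API; the
      window sizes are assumptions.

        #include <stdio.h>
        #include <stdint.h>
        #include <stdbool.h>

        typedef struct {
            bool     enabled;
            uint64_t bus_offset;   /* where the window sits inside the root MR */
            uint64_t size;         /* window size; 0 while disabled            */
        } TceWindowSketch;

        /* Does a PCI bus address hit this table's (sub)window? */
        static bool window_translates(const TceWindowSketch *w, uint64_t bus_addr)
        {
            return w->enabled &&
                   bus_addr >= w->bus_offset &&
                   bus_addr - w->bus_offset < w->size;
        }

        int main(void)
        {
            /* Two tables, both rooted at bus address 0, each owning a window
             * at a different offset: the roots overlap; the windows do not. */
            TceWindowSketch win32 = { true, 0,                     1ULL << 31 };
            TceWindowSketch win64 = { true, 0x8000000000000000ULL, 1ULL << 40 };

            uint64_t probes[] = { 0x1000, 0x8000000000001000ULL, 0x4000000000ULL };
            for (int i = 0; i < 3; i++) {
                printf("bus addr 0x%016llx -> 32-bit win: %d, 64-bit win: %d\n",
                       (unsigned long long)probes[i],
                       window_translates(&win32, probes[i]),
                       window_translates(&win64, probes[i]));
            }
            return 0;
        }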
    • spapr_iommu: Migrate full state · a26fdf39
      Alexey Kardashevskiy authored
      The source guest could have reallocated the default TCE table and
      migrate a bigger/smaller table. This adds reallocation in post_load()
      if the default table size differs between source and destination.
      
      This adds @bus_offset and @page_shift to the migration stream as
      a subsection, so that when DDW is added, migration to older machines
      will still be possible. As @bus_offset and @page_shift are not used
      yet, this makes no change in behavior.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
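      A simplified, standalone sketch of the post_load behaviour described
      above: if the incoming table size differs from what the destination
      pre-created, reallocate before the entries are loaded.  Field and
      function names are assumptions for the illustration.

        #include <stdio.h>
        #include <stdint.h>
        #include <stdlib.h>
        #include <string.h>

        typedef struct {
            uint32_t  nb_entries;
            uint64_t *table;
        } TceTableSketch;

        static int tce_post_load_sketch(TceTableSketch *t, uint32_t incoming_entries)
        {
            if (incoming_entries != t->nb_entries) {
                uint64_t *n = realloc(t->table, incoming_entries * sizeof(*n));
                if (!n && incoming_entries) {
                    return -1;
                }
                t->table = n;
                t->nb_entries = incoming_entries;
                /* The entries themselves would then be read from the stream. */
                memset(t->table, 0, incoming_entries * sizeof(*t->table));
            }
            return 0;
        }

        int main(void)
        {
            TceTableSketch t = { 1024, calloc(1024, sizeof(uint64_t)) };
            printf("before: %u entries\n", t.nb_entries);
            tce_post_load_sketch(&t, 4096);        /* source sent a bigger table */
            printf("after:  %u entries\n", t.nb_entries);
            free(t.table);
            return 0;
        }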
    • spapr_iommu: Introduce "enabled" state for TCE table · df7625d4
      Alexey Kardashevskiy authored
      Currently TCE tables are created once at start and their sizes never
      change. We are going to change that by introducing Dynamic DMA window
      support, where the DMA configuration may change during guest execution.
      
      This changes spapr_tce_new_table() to create an empty, zero-size IOMMU
      memory region (IOMMU MR). Only the LIOBN is assigned at creation time.
      It will still be called once when the owner object (VIO or PHB) is
      created.
      
      This introduces an "enabled" state for TCE table objects, and some
      helper functions are added:
      - spapr_tce_table_enable() receives the TCE table parameters, stores
      them in sPAPRTCETable, allocates a guest view of the TCE table
      (in user space or KVM), and sets the correct size on the IOMMU MR;
      - spapr_tce_table_disable() disposes of the table and resets the IOMMU
      MR size; it is made public as the following DDW code will be using it.
      
      This changes the PHB reset handler to do the default DMA initialization
      instead of spapr_phb_realize(). This makes no difference now, but later,
      with more than just one DMA window, we will have to remove them all
      and create the default one on a system reset.

      No visible change in behaviour is expected, except that the actual
      table will be reallocated on every reset. We might optimize this later.
      
      The other way to implement this would be to dynamically create/remove
      the TCE table QOM objects, but this would make migration impossible,
      as the migration code expects all QOM objects to exist at the receiver,
      so we have to have the TCE table objects created when migration begins.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
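      A standalone sketch of the enable/disable split described above:
      creation only assigns the LIOBN, an spapr_tce_table_enable()-style
      helper later fills in the geometry and allocates the backing table,
      and the disable path frees it and shrinks the window back to zero.
      The struct fields and parameters are assumptions; only the two helper
      names come from the message.

        #include <stdio.h>
        #include <stdint.h>
        #include <stdlib.h>

        typedef struct {
            uint32_t  liobn;        /* assigned at creation, never changes      */
            int       enabled;
            uint64_t  bus_offset;
            uint32_t  page_shift;
            uint32_t  nb_entries;   /* 0 while disabled == zero-size IOMMU MR   */
            uint64_t *table;
        } TceTableSketch;

        static void tce_table_enable_sketch(TceTableSketch *t, uint32_t page_shift,
                                            uint64_t bus_offset, uint32_t nb_entries)
        {
            t->page_shift = page_shift;
            t->bus_offset = bus_offset;
            t->nb_entries = nb_entries;
            t->table      = calloc(nb_entries, sizeof(*t->table));
            t->enabled    = 1;
        }

        static void tce_table_disable_sketch(TceTableSketch *t)
        {
            free(t->table);
            t->table      = NULL;
            t->nb_entries = 0;       /* window size back to zero */
            t->enabled    = 0;
        }

        int main(void)
        {
            TceTableSketch t = { .liobn = 0x80000000 };        /* created empty  */
            tce_table_enable_sketch(&t, 16, 0, 1 << 15);       /* default window */
            printf("enabled: %u entries, page shift %u\n", t.nb_entries, t.page_shift);
            tce_table_disable_sketch(&t);                      /* e.g. on reset  */
            printf("disabled: %u entries\n", t.nb_entries);
            return 0;
        }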
  12. 05 April 2016, 1 commit
  13. 17 February 2016, 1 commit
    • pseries: Simplify handling of the hash page table fd · 715c5407
      David Gibson authored
      When migrating the 'pseries' machine type with KVM, we use a special fd
      to access the hash page table stored within KVM.  Usually, this fd is
      opened at the beginning of migration, and kept open until the migration
      is complete.
      
      However, if there is a guest reset during the migration, the fd can become
      stale and we need to re-open it.  At the moment we use an 'htab_fd_stale'
      flag in sPAPRMachineState to signal this, which is checked in the migration
      iterators.
      
      But that's rather ugly.  It's simpler to just close and invalidate the
      fd on reset, and lazily re-open it in migration if necessary.  This patch
      implements that change.
      
      This requires a small addition to the machine state's instance_init,
      so that htab_fd is initialized to -1 (telling the migration code it
      needs to open it) instead of 0, which could be a valid fd.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
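      A standalone sketch of the lazy-open scheme described above: the fd
      starts at -1, is opened on first use by the migration path, and is
      simply closed and reset to -1 on guest reset.  A plain file stands in
      for the KVM hash-table fd, and the helper names are illustrative.

        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>

        static int htab_fd = -1;   /* -1 == "not open yet / invalidated" */

        static int get_htab_fd_sketch(void)
        {
            if (htab_fd < 0) {
                /* In QEMU this would come from KVM; use a file here. */
                htab_fd = open("/dev/null", O_RDONLY);
            }
            return htab_fd;
        }

        static void close_htab_fd_sketch(void)
        {
            if (htab_fd >= 0) {
                close(htab_fd);
            }
            htab_fd = -1;          /* migration code will lazily re-open */
        }

        int main(void)
        {
            printf("first use:   fd=%d\n", get_htab_fd_sketch());
            close_htab_fd_sketch();           /* e.g. guest reset mid-migration */
            printf("after reset: fd=%d\n", get_htab_fd_sketch());
            close_htab_fd_sketch();
            return 0;
        }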
  14. 30 January 2016, 2 commits
  15. 11 January 2016, 1 commit
  16. 23 October 2015, 2 commits
    • spapr_iommu: Provide a function to switch a TCE table to allowing VFIO · c10325d6
      David Gibson authored
      Because of the way non-VFIO guest IOMMU operations are KVM accelerated, not
      all TCE tables (guest IOMMU contexts) can support VFIO devices.  Currently,
      this is decided at creation time.
      
      To support hotplug of VFIO devices, we need to allow a TCE table which
      previously didn't allow VFIO devices to be switched so that it can.  This
      patch adds an spapr_tce_set_need_vfio() function to do this, by
      reallocating the table in userspace if necessary.
      
      Currently this doesn't allow the KVM acceleration to be re-enabled if all
      the VFIO devices are removed.  That's an optimization for another time.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
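      A standalone sketch of the switch described above: if a table is
      currently backed by the in-kernel (KVM) acceleration and a VFIO device
      is hot-plugged, move the entries to a userspace-allocated table so QEMU
      sees every update.  Only the function name comes from the message; the
      fields and flags are assumptions.

        #include <stdio.h>
        #include <stdint.h>
        #include <stdlib.h>
        #include <string.h>

        typedef struct {
            int       need_vfio;     /* table must be visible to QEMU (userspace) */
            int       in_kernel;     /* currently backed by the KVM acceleration  */
            uint32_t  nb_entries;
            uint64_t *table;
        } TceTableSketch;

        static void tce_set_need_vfio_sketch(TceTableSketch *t, int need_vfio)
        {
            t->need_vfio = need_vfio;
            if (need_vfio && t->in_kernel) {
                /* Reallocate in userspace and copy the current entries over. */
                uint64_t *ua = calloc(t->nb_entries, sizeof(*ua));
                memcpy(ua, t->table, t->nb_entries * sizeof(*ua));
                free(t->table);          /* stand-in for releasing the KVM copy */
                t->table = ua;
                t->in_kernel = 0;
            }
            /* Re-enabling the KVM acceleration when the last VFIO device goes
             * away is left out, as in the patch. */
        }

        int main(void)
        {
            TceTableSketch t = { 0, 1, 1024, calloc(1024, sizeof(uint64_t)) };
            tce_set_need_vfio_sketch(&t, 1);      /* VFIO device hot-plugged */
            printf("in_kernel=%d need_vfio=%d\n", t.in_kernel, t.need_vfio);
            free(t.table);
            return 0;
        }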
    • spapr_iommu: Rename vfio_accel parameter · 6a81dd17
      David Gibson authored
      The vfio_accel parameter used when creating a new TCE table (guest IOMMU
      context) has a confusing name.  What it really means is whether we need the
      TCE table created to be able to support VFIO devices.
      
      VFIO is relevant because, when available, we use in-kernel acceleration
      of the TCE table, but that may not work with VFIO devices: updates to
      the table are handled in the kernel, bypass qemu, and so don't hit
      qemu's infrastructure for keeping the VFIO host IOMMU state in sync
      with the guest IOMMU state.
      
      Rename the parameter to "need_vfio" throughout.  This is a cosmetic change,
      with no impact on the logic.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
  17. 23 September 2015, 8 commits
  18. 07 July 2015, 2 commits