1. 23 November 2016: 2 commits
    • target-ppc: Allow eventual removal of old migration mistakes · 146c11f1
      Committed by David Gibson
      Until very recently, the vmstate for ppc cpus included some poorly
      thought-out VMSTATE_EQUAL() components that can easily break
      migration compatibility, and did so between qemu-2.6 and later
      versions.  A hack was recently added which fixes this migration
      breakage, but it leaves the unhelpful cruft of these fields in the
      migration stream.
      
      This patch adds a new cpu property allowing these fields to be removed
      from the stream entirely.  For the pseries-2.8 machine type - which
      comes after the fix - and for all non-pseries machine types - which
      aren't mature enough to care about cross-version migration - we remove
      the fields from the stream.
      
      For pseries-2.7 and earlier, the migration hack remains in place,
      allowing backwards and forwards migration with the older machine
      types.
      
      This restricts the migration compatibility cruft to older machine
      types, and at least opens the possibility of eventually deprecating
      and removing it entirely.
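      
      For illustration, the mechanism is roughly this shape: a
      "pre-2.8-migration" style compat property gates the legacy fields via
      the VMSTATE_*_TEST macros, so newer machine types simply omit them
      from the stream.  This is a sketch; the field and property names here
      are illustrative, not the exact code:
      
          /* Sketch only: gate legacy fields behind a compat property. */
          static bool cpu_pre_2_8_migration(void *opaque, int version_id)
          {
              PowerPCCPU *cpu = opaque;
      
              /* Set to true only by pseries-2.7 and older machine types. */
              return cpu->pre_2_8_migration;
          }
      
          /* In the cpu VMStateDescription field list: */
          VMSTATE_UINT32_TEST(mig_legacy_field, PowerPCCPU,
                              cpu_pre_2_8_migration),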
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • spapr: migration support for CAS-negotiated option vectors · 62ef3760
      Committed by Michael Roth
      With the addition of the OV5_HP_EVT option vector, we now have
      certain functionality (namely, memory unplug) that checks at run
      time whether the guest negotiated the option via CAS. Because
      we don't currently migrate these negotiated values, we are unable
      to unplug memory from a guest after it's been migrated until after
      the guest is rebooted and CAS-negotiation is repeated.
      
      This patch fixes this by adding CAS-negotiated options to the
      migration stream. We do this using a subsection, since the
      negotiated value of OV5_HP_EVT is the only option currently needed
      to maintain proper functionality for a running guest.
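      
      A minimal sketch of the subsection pattern used here (names modeled
      on the description above; the actual field contents are elided):
      
          static bool spapr_ov5_cas_needed(void *opaque)
          {
              sPAPRMachineState *spapr = opaque;
      
              /* Only put the subsection on the wire when CAS negotiation
               * has produced state worth preserving. */
              return spapr->ov5_cas != NULL;
          }
      
          static const VMStateDescription vmstate_spapr_ov5_cas = {
              .name = "spapr_option_vector_ov5_cas",
              .version_id = 1,
              .minimum_version_id = 1,
              .needed = spapr_ov5_cas_needed,
              .fields = (VMStateField[]) {
                  /* ... negotiated option-vector bits ... */
                  VMSTATE_END_OF_LIST()
              },
          };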
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  2. 31 October 2016: 1 commit
  3. 28 October 2016: 20 commits
  4. 25 October 2016: 1 commit
  5. 16 October 2016: 4 commits
    • spapr: Improved placement of PCI host bridges in guest memory map · 357d1e3b
      Committed by David Gibson
      Currently, the MMIO space for accessing PCI on pseries guests begins at
      1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
      chunk of address space in which it places its outbound PIO and 32-bit and
      64-bit MMIO windows.
      
      This scheme has several problems:
        - It limits guest RAM to 1 TiB (though we have a limited fix for this
          now)
        - It limits the total MMIO window to 64 GiB.  This is not always enough
          for some of the large nVidia GPGPU cards
        - Putting all the windows into a single 64 GiB area means that naturally
          aligning things within it wastes more address space.
      In addition there was a miscalculation in some of the defaults, which meant
      that the MMIO windows for each PHB actually slightly overran the 64 GiB
      region for that PHB.  We got away without nasty consequences because
      the overrun fit within an unused area at the beginning of the next PHB's
      region, but it's not pretty.
      
      This patch implements a new scheme which addresses those problems, and is
      also closer to what bare metal hardware and pHyp guests generally use.
      
      Because some guest versions (including most current distro kernels) can't
      access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
      64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB contains the
      PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
      subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
      one PHB each.
      
      This reduces the number of allowed PHBs (without full manual configuration
      of all the windows) from 256 to 31, but this should still be plenty in
      practice.
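      
      The arithmetic of that layout is easy to check with a few lines of
      standalone C (constants here are taken from the description above,
      not lifted from the QEMU source):
      
          #include <inttypes.h>
          #include <stdint.h>
          #include <stdio.h>
      
          #define TIB (1ULL << 40)
          #define PCI_BASE (32 * TIB)   /* chunk 0: PIO + 32-bit windows */
      
          /* Each PHB's 64-bit MMIO window fills one later 1 TiB chunk. */
          static uint64_t phb_mem64_base(uint32_t index)
          {
              return PCI_BASE + (uint64_t)(index + 1) * TIB;
          }
      
          int main(void)
          {
              /* Chunks at 33 TiB..63 TiB leave room for 31 PHBs. */
              for (uint32_t i = 0; i < 31; i += 30) {
                  printf("PHB %" PRIu32 ": 0x%" PRIx64 "\n",
                         i, phb_mem64_base(i));
              }
              return 0;
          }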
      
      We also change some of the default window sizes for manually configured
      PHBs to saner values.
      
      Finally we adjust some tests and libqos so that they correctly use the new
      default locations.  Ideally it would parse the device tree given to the
      guest, but that's a more complex problem for another time.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr_pci: Add a 64-bit MMIO window · daa23699
      Committed by David Gibson
      On real hardware, and under pHyp, the PCI host bridges on Power machines
      typically advertise two outbound MMIO windows from the guest's physical
      memory space to PCI memory space:
        - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
        - A 64-bit window which maps onto a large region somewhere high in PCI
          address space (traditionally this used an identity mapping from guest
          physical address to PCI address, but that's not always the case)
      
      The qemu implementation in spapr-pci-host-bridge, however, only supports
      a single outbound MMIO window.  At least some Linux versions expect
      both windows, so we arranged this window to map onto the PCI
      memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
      windows: the "32-bit" window from 2G..4G and the "64-bit" window from
      4G..~64G.
      
      This approach means, however, that the combined window is not naturally aligned.
      In turn this limits the size of the largest BAR we can map (which does have
      to be naturally aligned) to roughly half of the total window.  With some
      large nVidia GPGPU cards which have huge memory BARs, this is starting to
      be a problem.
      
      This patch adds true support for separate 32-bit and 64-bit outbound MMIO
      windows to the spapr-pci-host-bridge implementation, each of which can
      be independently configured.  The 32-bit window always maps to 2G.. in PCI
      space, but the PCI address of the 64-bit window can be configured (it
      defaults to the same as the guest physical address).
      
      So as not to break possible existing configurations, a single large
      window can still be configured as long as no 64-bit window is given.  This
      will appear the same way to the guest as the old approach, although it's
      now implemented by two contiguous memory regions rather than a single one.
      
      For now, this only adds the possibility of 64-bit windows.  The default
      configuration still uses the legacy mode.
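      
      Sketched as QOM properties, the configuration surface described above
      would look something like the following; the property and constant
      names are assumptions based on this text, not verified against the
      tree:
      
          /* Per-PHB window properties, with the 64-bit window's PCI-side
           * address defaulting to "same as guest physical" (-1 sentinel). */
          DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState,
                             mem_win_size, SPAPR_PCI_MEM32_WIN_SIZE),
          DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState,
                             mem64_win_size, SPAPR_PCI_MEM64_WIN_SIZE),
          DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState,
                             mem64_win_pciaddr, (uint64_t)-1),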
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM · 2efff1c0
      Committed by David Gibson
      Currently the default PCI host bridge for the 'pseries' machine type is
      constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
      guest memory space.  This means that if > 1TiB of guest RAM is specified,
      the RAM will collide with the PCI IO windows, causing serious problems.
      
      Problems won't be obvious until guest RAM goes a bit beyond 1 TiB, because
      there's a little unused space at the bottom of the area reserved for PCI,
      but essentially this means that > 1TiB of RAM has never worked with the
      pseries machine type.
      
      This patch fixes this by altering the placement of PHBs on large-RAM VMs.
      Instead of always placing the first PHB at 1TiB, it is placed at the next
      1 TiB boundary after the maximum RAM address.
      
      Technically, this changes behaviour in a migration-breaking way for
      existing machines with > 1TiB maximum memory, but since having > 1 TiB
      memory was broken anyway, this seems like a reasonable trade-off.
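      
      The "next 1 TiB boundary after the maximum RAM address" placement
      reduces to a one-line round-up; a sketch, not the exact QEMU code:
      
          #define TIB (1ULL << 40)
      
          /* Up to 1 TiB of RAM keeps the old base of 1 TiB; anything
           * larger pushes the first PHB up to the next 1 TiB boundary. */
          static uint64_t default_phb_base(uint64_t max_ram_addr)
          {
              return (max_ram_addr + TIB - 1) & ~(TIB - 1);
          }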
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr_pci: Delegate placement of PCI host bridges to machine type · 6737d9ad
      Committed by David Gibson
      The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
      for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
      and PAPR guests) to have numerous independent PHBs, each controlling a
      separate PCI domain.
      
      There are two ways of configuring the spapr-pci-host-bridge device: first
      it can be done fully manually, specifying the locations and sizes of all
      the IO windows.  This gives the most control, but is very awkward with 6
      mandatory parameters.  Alternatively, just an "index" can be specified
      which essentially selects from an array of predefined PHB locations.
      The PHB at index 0 is automatically created as the default PHB.
      
      The current set of default locations causes some problems for guests with
      large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
      GPGPU cards via VFIO).  Obviously, for migration we can only change the
      locations on a new machine type, however.
      
      This is awkward, because the placement is currently decided within the
      spapr-pci-host-bridge code, so having that code look at the machine-type
      version breaks abstraction.
      
      So, this patch delegates the "default mode" PHB placement from the
      spapr-pci-host-bridge device back to the machine type via a public method
      in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
      can do.
      
      For now, this just changes where the calculation is done.  It doesn't
      change the actual location of the host bridges, or any other behaviour.
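      
      The resulting hook has roughly this shape; this is an approximation
      reconstructed from the description, not the exact prototype:
      
          /* In sPAPRMachineClass: the machine type, not the PHB device,
           * decides where an indexed PHB's windows land. */
          void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
                                uint64_t *buid, hwaddr *pio, hwaddr *mmio,
                                Error **errp);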
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
  6. 14 October 2016: 1 commit
    • spapr: fix inheritance chain for default machine options · 672de881
      Committed by Michael Roth
      Rather than machine instances having backward-compatible option
      defaults that need to be repeatedly re-enabled for every new machine
      type we introduce, we set the defaults appropriate for newer machine
      types, then add code to explicitly disable instance options as needed
      to maintain compatibility with older machine types.
      
      Currently pseries-2.5 does not inherit from pseries-2.6 in this
      fashion, which is okay at the moment since we do not have any
      instance compatibility options for pseries-2.6+.
      
      We will make use of this in future patches though, so fix it here.
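      
      The fixed chain follows the usual pattern: each older machine type's
      class options first apply the newer type's options, then layer its
      own compat tweaks on top.  A simplified sketch:
      
          static void spapr_machine_2_6_class_options(MachineClass *mc)
          {
              spapr_machine_2_7_class_options(mc);
              /* ... disable pseries-2.7+ instance options here ... */
          }
      
          static void spapr_machine_2_5_class_options(MachineClass *mc)
          {
              spapr_machine_2_6_class_options(mc);  /* the missing link */
              /* ... pseries-2.5 compat settings ... */
          }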
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      [dwg: Extended to make 2.7 inherit from 2.8 as well]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  7. 13 October 2016: 3 commits
  8. 06 October 2016: 1 commit
    • hw/ppc/spapr: Use POWER8 by default for the pseries-2.8 machine · 3daa4a9f
      Committed by Thomas Huth
      A couple of distributors are compiling their distributions
      with "-mcpu=power8" for ppc64le these days, so the user sooner
      or later runs into a crash there when not explicitly specifying
      the "-cpu POWER8" option to QEMU (which currently uses POWER7
      for the "pseries" machine by default). For this reason, the
      linux-user target already switched to POWER8 a while ago (see commit
      de3f1b98). Since the softmmu target
      of course has the same problem, we should switch to POWER8 there for
      the newer machine types, too.
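      
      The machine-type-dependent default boils down to something like the
      following hypothetical sketch (the flag name is invented here for
      illustration, not taken from the patch):
      
          /* Newer pseries machine types default to POWER8; older ones
           * keep POWER7 for compatibility. */
          if (machine->cpu_model == NULL) {
              machine->cpu_model = smc->use_power8_by_default
                                   ? "POWER8" : "POWER7";
          }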
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  9. 05 October 2016: 4 commits
  10. 28 September 2016: 1 commit
  11. 27 September 2016: 1 commit
    • cpus: pass CPUState to run_on_cpu helpers · e0eeb4a2
      Committed by Alex Bennée
      CPUState is a fairly common pointer to pass to these helpers. This means
      if you need other arguments for the async_run_on_cpu case you end up
      having to do a g_malloc to stuff additional data into the routine. For
      the current users this isn't a massive deal but for MTTCG this gets
      cumbersome when the only other parameter is often an address.
      
      This adds the typedef run_on_cpu_func for helper functions which has an
      explicit CPUState * passed as the first parameter. All the users of
      run_on_cpu and async_run_on_cpu have had their helpers updated to use
      CPUState where available.
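      
      The typedef and the updated helper prototypes look like this (as
      described above; the payload is still a bare void pointer at this
      point):
      
          typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);
      
          /* Run func(cpu, data) on cpu's thread, sync or async. */
          void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
          void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
                                void *data);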
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      [Sergey Fedorov:
       - eliminate more CPUState in user data;
       - remove unnecessary user data passing;
       - fix target-s390x/kvm.c and target-s390x/misc_helper.c]
      Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts)
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 23 September 2016: 1 commit