1. 23 November 2016: 2 commits
    • target-ppc: Allow eventual removal of old migration mistakes · 146c11f1
      Committed by David Gibson
      Until very recently, the vmstate for ppc cpus included some poorly
      thought-out VMSTATE_EQUAL() components that can easily break
      migration compatibility, and did so between qemu-2.6 and later
      versions.  A hack was recently added which fixes this migration
      breakage, but it leaves the unhelpful cruft of these fields in the
      migration stream.
      
      This patch adds a new cpu property allowing these fields to be removed
      from the stream entirely.  For the pseries-2.8 machine type - which
      comes after the fix - and for all non-pseries machine types - which
      aren't mature enough to care about cross-version migration - we remove
      the fields from the stream.
      
      For pseries-2.7 and earlier, the migration hack remains in place,
      allowing backwards and forwards migration with the older machine
      types.
      
      This restricts the migration compatibility cruft to older machine
      types, and at least opens the possibility of eventually deprecating
      and removing it entirely.
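      
      For illustration, the mechanism is roughly this shape: a
      "pre-2.8-migration" style compat property gates the legacy fields via
      the VMSTATE_*_TEST macros, so newer machine types simply omit them
      from the stream.  This is a sketch; the field and property names here
      are illustrative, not the exact code:
      
          /* Sketch only: gate legacy fields behind a compat property. */
          static bool cpu_pre_2_8_migration(void *opaque, int version_id)
          {
              PowerPCCPU *cpu = opaque;
      
              /* Set to true only by pseries-2.7 and older machine types. */
              return cpu->pre_2_8_migration;
          }
      
          /* In the cpu VMStateDescription field list: */
          VMSTATE_UINT32_TEST(mig_legacy_field, PowerPCCPU,
                              cpu_pre_2_8_migration),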
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • spapr: migration support for CAS-negotiated option vectors · 62ef3760
      Committed by Michael Roth
      With the addition of the OV5_HP_EVT option vector, we now have
      certain functionality (namely, memory unplug) that checks at run
      time whether the guest negotiated the option via CAS. Because
      we don't currently migrate these negotiated values, we are unable
      to unplug memory from a guest after it's been migrated until after
      the guest is rebooted and CAS-negotiation is repeated.
      
      This patch fixes this by adding CAS-negotiated options to the
      migration stream. We do this using a subsection, since the
      negotiated value of OV5_HP_EVT is the only option currently needed
      to maintain proper functionality for a running guest.
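      
      A minimal sketch of the subsection pattern used here (names modeled
      on the description above; the actual field contents are elided):
      
          static bool spapr_ov5_cas_needed(void *opaque)
          {
              sPAPRMachineState *spapr = opaque;
      
              /* Only put the subsection on the wire when CAS negotiation
               * has produced state worth preserving. */
              return spapr->ov5_cas != NULL;
          }
      
          static const VMStateDescription vmstate_spapr_ov5_cas = {
              .name = "spapr_option_vector_ov5_cas",
              .version_id = 1,
              .minimum_version_id = 1,
              .needed = spapr_ov5_cas_needed,
              .fields = (VMStateField[]) {
                  /* ... negotiated option-vector bits ... */
                  VMSTATE_END_OF_LIST()
              },
          };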
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  2. 31 October 2016: 1 commit
  3. 28 October 2016: 20 commits
  4. 25 October 2016: 1 commit
  5. 16 October 2016: 4 commits
    • spapr: Improved placement of PCI host bridges in guest memory map · 357d1e3b
      Committed by David Gibson
      Currently, the MMIO space for accessing PCI on pseries guests begins at
      1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
      chunk of address space in which it places its outbound PIO and 32-bit and
      64-bit MMIO windows.
      
      This scheme has several problems:
        - It limits guest RAM to 1 TiB (though we have a limited fix for this
          now)
        - It limits the total MMIO window to 64 GiB.  This is not always enough
          for some of the large nVidia GPGPU cards
        - Putting all the windows into a single 64 GiB area means that naturally
          aligning things within it wastes more address space.
      In addition there was a miscalculation in some of the defaults, which meant
      that the MMIO windows for each PHB actually slightly overran the 64 GiB
      region for that PHB.  We got away without nasty consequences because
      the overrun fit within an unused area at the beginning of the next PHB's
      region, but it's not pretty.
      
      This patch implements a new scheme which addresses those problems, and is
      also closer to what bare metal hardware and pHyp guests generally use.
      
      Because some guest versions (including most current distro kernels) can't
      access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
      64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB contains the
      PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
      subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
      one PHB each.
      
      This reduces the number of allowed PHBs (without full manual configuration
      of all the windows) from 256 to 31, but this should still be plenty in
      practice.
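      
      The arithmetic of that layout is easy to check with a few lines of
      standalone C (constants here are taken from the description above,
      not lifted from the QEMU source):
      
          #include <inttypes.h>
          #include <stdint.h>
          #include <stdio.h>
      
          #define TIB (1ULL << 40)
          #define PCI_BASE (32 * TIB)   /* chunk 0: PIO + 32-bit windows */
      
          /* Each PHB's 64-bit MMIO window fills one later 1 TiB chunk. */
          static uint64_t phb_mem64_base(uint32_t index)
          {
              return PCI_BASE + (uint64_t)(index + 1) * TIB;
          }
      
          int main(void)
          {
              /* Chunks at 33 TiB..63 TiB leave room for 31 PHBs. */
              for (uint32_t i = 0; i < 31; i += 30) {
                  printf("PHB %" PRIu32 ": 0x%" PRIx64 "\n",
                         i, phb_mem64_base(i));
              }
              return 0;
          }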
      
      We also change some of the default window sizes for manually configured
      PHBs to saner values.
      
      Finally we adjust some tests and libqos so that they correctly use the new
      default locations.  Ideally it would parse the device tree given to the
      guest, but that's a more complex problem for another time.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr_pci: Add a 64-bit MMIO window · daa23699
      Committed by David Gibson
      On real hardware, and under pHyp, the PCI host bridges on Power machines
      typically advertise two outbound MMIO windows from the guest's physical
      memory space to PCI memory space:
        - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
        - A 64-bit window which maps onto a large region somewhere high in PCI
          address space (traditionally this used an identity mapping from guest
          physical address to PCI address, but that's not always the case)
      
      The qemu implementation in spapr-pci-host-bridge, however, only supports
      a single outbound MMIO window.  At least some Linux versions expect
      both windows, so we arranged this window to map onto the PCI
      memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
      windows: the "32-bit" window from 2G..4G and the "64-bit" window from
      4G..~64G.
      
      This approach means, however, that the combined window is not naturally aligned.
      In turn this limits the size of the largest BAR we can map (which does have
      to be naturally aligned) to roughly half of the total window.  With some
      large nVidia GPGPU cards which have huge memory BARs, this is starting to
      be a problem.
      
      This patch adds true support for separate 32-bit and 64-bit outbound MMIO
      windows to the spapr-pci-host-bridge implementation, each of which can
      be independently configured.  The 32-bit window always maps to 2G.. in PCI
      space, but the PCI address of the 64-bit window can be configured (it
      defaults to the same as the guest physical address).
      
      So as not to break possible existing configurations, a single large
      window can still be configured as long as no 64-bit window is given.  This
      will appear the same way to the guest as the old approach, although it's
      now implemented by two contiguous memory regions rather than a single one.
      
      For now, this only adds the possibility of 64-bit windows.  The default
      configuration still uses the legacy mode.
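      
      Sketched as QOM properties, the configuration surface described above
      would look something like the following; the property and constant
      names are assumptions based on this text, not verified against the
      tree:
      
          /* Per-PHB window properties, with the 64-bit window's PCI-side
           * address defaulting to "same as guest physical" (-1 sentinel). */
          DEFINE_PROP_UINT64("mem_win_size", sPAPRPHBState,
                             mem_win_size, SPAPR_PCI_MEM32_WIN_SIZE),
          DEFINE_PROP_UINT64("mem64_win_size", sPAPRPHBState,
                             mem64_win_size, SPAPR_PCI_MEM64_WIN_SIZE),
          DEFINE_PROP_UINT64("mem64_win_pciaddr", sPAPRPHBState,
                             mem64_win_pciaddr, (uint64_t)-1),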
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr: Adjust placement of PCI host bridge to allow > 1TiB RAM · 2efff1c0
      Committed by David Gibson
      Currently the default PCI host bridge for the 'pseries' machine type is
      constructed with its IO windows in the 1TiB..(1TiB + 64GiB) range in
      guest memory space.  This means that if > 1TiB of guest RAM is specified,
      the RAM will collide with the PCI IO windows, causing serious problems.
      
      Problems won't be obvious until guest RAM goes a bit beyond 1 TiB, because
      there's a little unused space at the bottom of the area reserved for PCI,
      but essentially this means that > 1TiB of RAM has never worked with the
      pseries machine type.
      
      This patch fixes this by altering the placement of PHBs on large-RAM VMs.
      Instead of always placing the first PHB at 1TiB, it is placed at the next
      1 TiB boundary after the maximum RAM address.
      
      Technically, this changes behaviour in a migration-breaking way for
      existing machines with > 1TiB maximum memory, but since having > 1 TiB
      memory was broken anyway, this seems like a reasonable trade-off.
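      
      The "next 1 TiB boundary after the maximum RAM address" placement
      reduces to a one-line round-up; a sketch, not the exact QEMU code:
      
          #define TIB (1ULL << 40)
      
          /* Up to 1 TiB of RAM keeps the old base of 1 TiB; anything
           * larger pushes the first PHB up to the next 1 TiB boundary. */
          static uint64_t default_phb_base(uint64_t max_ram_addr)
          {
              return (max_ram_addr + TIB - 1) & ~(TIB - 1);
          }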
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
    • spapr_pci: Delegate placement of PCI host bridges to machine type · 6737d9ad
      Committed by David Gibson
      The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
      for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
      and PAPR guests) to have numerous independent PHBs, each controlling a
      separate PCI domain.
      
      There are two ways of configuring the spapr-pci-host-bridge device: first
      it can be done fully manually, specifying the locations and sizes of all
      the IO windows.  This gives the most control, but is very awkward with 6
      mandatory parameters.  Alternatively, just an "index" can be specified
      which essentially selects from an array of predefined PHB locations.
      The PHB at index 0 is automatically created as the default PHB.
      
      The current set of default locations causes some problems for guests with
      large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
      GPGPU cards via VFIO).  Obviously, for migration we can only change the
      locations on a new machine type, however.
      
      This is awkward, because the placement is currently decided within the
      spapr-pci-host-bridge code, so having that code look at the machine-type
      version breaks abstraction.
      
      So, this patch delegates the "default mode" PHB placement from the
      spapr-pci-host-bridge device back to the machine type via a public method
      in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
      can do.
      
      For now, this just changes where the calculation is done.  It doesn't
      change the actual location of the host bridges, or any other behaviour.
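      
      The resulting hook has roughly this shape; this is an approximation
      reconstructed from the description, not the exact prototype:
      
          /* In sPAPRMachineClass: the machine type, not the PHB device,
           * decides where an indexed PHB's windows land. */
          void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index,
                                uint64_t *buid, hwaddr *pio, hwaddr *mmio,
                                Error **errp);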
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
  6. 14 October 2016: 1 commit
    • spapr: fix inheritance chain for default machine options · 672de881
      Committed by Michael Roth
      Rather than machine instances having backward-compatible option
      defaults that need to be repeatedly re-enabled for every new machine
      type we introduce, we set the defaults appropriate for newer machine
      types, then add code to explicitly disable instance options as needed
      to maintain compatibility with older machine types.
      
      Currently pseries-2.5 does not inherit from pseries-2.6 in this
      fashion, which is okay at the moment since we do not have any
      instance compatibility options for pseries-2.6+.
      
      We will make use of this in future patches though, so fix it here.
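      
      The fixed chain follows the usual pattern: each older machine type's
      class options first apply the newer type's options, then layer its
      own compat tweaks on top.  A simplified sketch:
      
          static void spapr_machine_2_6_class_options(MachineClass *mc)
          {
              spapr_machine_2_7_class_options(mc);
              /* ... disable pseries-2.7+ instance options here ... */
          }
      
          static void spapr_machine_2_5_class_options(MachineClass *mc)
          {
              spapr_machine_2_6_class_options(mc);  /* the missing link */
              /* ... pseries-2.5 compat settings ... */
          }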
      Signed-off-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      [dwg: Extended to make 2.7 inherit from 2.8 as well]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  7. 13 October 2016: 3 commits
  8. 06 October 2016: 1 commit
    • hw/ppc/spapr: Use POWER8 by default for the pseries-2.8 machine · 3daa4a9f
      Committed by Thomas Huth
      A couple of distributors are compiling their distributions
      with "-mcpu=power8" for ppc64le these days, so the user sooner
      or later runs into a crash there when not explicitly specifying
      the "-cpu POWER8" option to QEMU (which currently uses POWER7
      for the "pseries" machine by default). For this reason, the
      linux-user target already switched to POWER8 a while ago (see commit
      de3f1b98). Since the softmmu target
      of course has the same problem, we should switch to POWER8 there for
      the newer machine types, too.
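      
      The machine-type-dependent default boils down to something like the
      following hypothetical sketch (the flag name is invented here for
      illustration, not taken from the patch):
      
          /* Newer pseries machine types default to POWER8; older ones
           * keep POWER7 for compatibility. */
          if (machine->cpu_model == NULL) {
              machine->cpu_model = smc->use_power8_by_default
                                   ? "POWER8" : "POWER7";
          }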
      Signed-off-by: Thomas Huth <thuth@redhat.com>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
  9. 05 October 2016: 4 commits
  10. 28 September 2016: 1 commit
  11. 27 September 2016: 1 commit
    • cpus: pass CPUState to run_on_cpu helpers · e0eeb4a2
      Committed by Alex Bennée
      CPUState is a fairly common pointer to pass to these helpers. This means
      if you need other arguments for the async_run_on_cpu case you end up
      having to do a g_malloc to stuff additional data into the routine. For
      the current users this isn't a massive deal but for MTTCG this gets
      cumbersome when the only other parameter is often an address.
      
      This adds the typedef run_on_cpu_func for helper functions which has an
      explicit CPUState * passed as the first parameter. All the users of
      run_on_cpu and async_run_on_cpu have had their helpers updated to use
      CPUState where available.
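      
      The typedef and the updated helper prototypes look like this (as
      described above; the payload is still a bare void pointer at this
      point):
      
          typedef void (*run_on_cpu_func)(CPUState *cpu, void *data);
      
          /* Run func(cpu, data) on cpu's thread, sync or async. */
          void run_on_cpu(CPUState *cpu, run_on_cpu_func func, void *data);
          void async_run_on_cpu(CPUState *cpu, run_on_cpu_func func,
                                void *data);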
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      [Sergey Fedorov:
       - eliminate more CPUState in user data;
       - remove unnecessary user data passing;
       - fix target-s390x/kvm.c and target-s390x/misc_helper.c]
      Signed-off-by: Sergey Fedorov <sergey.fedorov@linaro.org>
      Acked-by: David Gibson <david@gibson.dropbear.id.au> (ppc parts)
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com> (s390 parts)
      Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
      Message-Id: <1470158864-17651-3-git-send-email-alex.bennee@linaro.org>
      Reviewed-by: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 23 September 2016: 1 commit