1. 14 Mar 2017, 1 commit
  2. 01 Mar 2017, 1 commit
  3. 23 Nov 2016, 1 commit
    • spapr: Fix 2.7<->2.8 migration of PCI host bridge · 5c4537bd
      Committed by David Gibson
      daa23699 "spapr_pci: Add a 64-bit MMIO window" subtly broke migration
      from qemu-2.7 to the current version.  It split the device's MMIO
      window into two pieces for 32-bit and 64-bit MMIO.
      
      The patch included backwards compatibility code to convert the old
      property into the new format.  However, the property value was also
      transferred in the migration stream and compared with a (probably
      unwise) VMSTATE_EQUAL.  So, the "raw" value from 2.7 is compared to
      the new style converted value from (pre-)2.8 giving a mismatch and
      migration failure.
      
      Along with the actual field that caused the breakage, there are
      several other ill-advised VMSTATE_EQUAL()s.  To fix forwards
      migration, we read the values in the stream into scratch variables and
      ignore them, instead of comparing for equality.  To fix backwards
      migration, we populate those scratch variables in pre_save() with
      adjusted values to match the old behaviour.
      
      To permit the eventual possibility of removing this cruft from the
      stream, we only include these compatibility fields if a new
      'pre-2.8-migration' property is set.  We clear it on the pseries-2.8
      machine type, which obviously can't be migrated backwards, but set it
      on earlier machine type versions.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Greg Kurz <groug@kaod.org>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
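The scratch-variable trick described above can be sketched in C as follows. This is a minimal illustration only; the struct and field names are assumptions, not the actual QEMU code, which uses the VMState machinery.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch (not the actual QEMU fields) of the pre_save()
 * compatibility conversion: the pre-2.8 stream carried a single MMIO
 * window, so for backwards migration we populate a scratch field with
 * the old-style combined value instead of comparing with VMSTATE_EQUAL. */
typedef struct {
    uint64_t mem_win_size;     /* 32-bit MMIO window size */
    uint64_t mem64_win_size;   /* 64-bit MMIO window size */
    int pre_2_8_migration;     /* property set on pre-2.8 machine types */
    uint64_t mig_mem_win_size; /* scratch value written to the stream */
} SketchPhb;

static void sketch_pre_save(SketchPhb *phb)
{
    if (phb->pre_2_8_migration) {
        /* In the legacy layout the 64-bit window immediately followed
         * the 32-bit one, so the old single-window size is the sum. */
        phb->mig_mem_win_size = phb->mem_win_size + phb->mem64_win_size;
    }
}
```

On the inbound side the corresponding stream fields are read into scratch variables and simply ignored, which is what makes forward migration work.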
  4. 16 Oct 2016, 3 commits
    • spapr: Improved placement of PCI host bridges in guest memory map · 357d1e3b
      Committed by David Gibson
      Currently, the MMIO space for accessing PCI on pseries guests begins at
      1 TiB in guest address space.  Each PCI host bridge (PHB) has a 64 GiB
      chunk of address space in which it places its outbound PIO and 32-bit and
      64-bit MMIO windows.
      
      This scheme has several problems:
        - It limits guest RAM to 1 TiB (though we have a limited fix for this
          now)
        - It limits the total MMIO window to 64 GiB.  This is not always enough
          for some of the large nVidia GPGPU cards
        - Putting all the windows into a single 64 GiB area means that naturally
          aligning things within there will waste more address space.
      In addition there was a miscalculation in some of the defaults, which meant
      that the MMIO windows for each PHB actually slightly overran the 64 GiB
      region for that PHB.  We got away without nasty consequences because
      the overrun fit within an unused area at the beginning of the next PHB's
      region, but it's not pretty.
      
      This patch implements a new scheme which addresses those problems, and is
      also closer to what bare metal hardware and pHyp guests generally use.
      
      Because some guest versions (including most current distro kernels) can't
      access PCI MMIO above 64 TiB, we put all the PCI windows between 32 TiB and
      64 TiB.  This is broken into 1 TiB chunks.  The first 1 TiB contains the
      PIO (64 kiB) and 32-bit MMIO (2 GiB) windows for all of the PHBs.  Each
      subsequent TiB chunk contains a naturally aligned 64-bit MMIO window for
      one PHB each.
      
      This reduces the number of allowed PHBs (without full manual configuration
      of all the windows) from 256 to 31, but this should still be plenty in
      practice.
      
      We also change some of the default window sizes for manually configured
      PHBs to saner values.
      
      Finally we adjust some tests and libqos so that it correctly uses the new
      default locations.  Ideally it would parse the device tree given to the
      guest, but that's a more complex problem for another time.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
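The placement rule described above reduces to simple address arithmetic. Below is a hedged sketch; the constants match the numbers quoted in the message but are written out here as assumptions, not copied from the QEMU source.

```c
#include <assert.h>
#include <stdint.h>

/* A minimal sketch of the placement rule: PCI windows live between
 * 32 TiB and 64 TiB, the first 1 TiB chunk holds the PIO and 32-bit
 * MMIO windows for all PHBs, and each following 1 TiB chunk holds the
 * naturally aligned 64-bit MMIO window of one PHB. */
#define TIB          (1ULL << 40)
#define PCI_HOLE_LO  (32 * TIB)
#define PCI_HOLE_HI  (64 * TIB)
#define MAX_PHBS     31   /* 32 chunks minus the shared first chunk */

static uint64_t phb_mmio64_base(unsigned index)
{
    /* chunk 0 is shared; PHB "index" gets chunk index + 1 */
    return PCI_HOLE_LO + (uint64_t)(index + 1) * TIB;
}
```

Because each window starts on a 1 TiB boundary and is at most 1 TiB in size, it is naturally aligned by construction, which is what eliminates the overrun problem described above.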
    • spapr_pci: Add a 64-bit MMIO window · daa23699
      Committed by David Gibson
      On real hardware, and under pHyp, the PCI host bridges on Power machines
      typically advertise two outbound MMIO windows from the guest's physical
      memory space to PCI memory space:
        - A 32-bit window which maps onto 2GiB..4GiB in the PCI address space
        - A 64-bit window which maps onto a large region somewhere high in PCI
          address space (traditionally this used an identity mapping from guest
          physical address to PCI address, but that's not always the case)
      
      The qemu implementation in spapr-pci-host-bridge, however, only supports a
      single outbound MMIO window.  Since at least some Linux versions expect
      the two windows, we arranged this window to map onto the PCI
      memory space from 2 GiB..~64 GiB, then advertised it as two contiguous
      windows, the "32-bit" window from 2G..4G and the "64-bit" window from
      4G..~64G.
      
      This approach means, however, that the 64G window is not naturally aligned.
      In turn this limits the size of the largest BAR we can map (which does have
      to be naturally aligned) to roughly half of the total window.  With some
      large nVidia GPGPU cards which have huge memory BARs, this is starting to
      be a problem.
      
      This patch adds true support for separate 32-bit and 64-bit outbound MMIO
      windows to the spapr-pci-host-bridge implementation, each of which can
      be independently configured.  The 32-bit window always maps to 2G.. in PCI
      space, but the PCI address of the 64-bit window can be configured (it
      defaults to the same as the guest physical address).
      
      So as not to break possible existing configurations, a large single
      window can still be used as long as no 64-bit window is specified.  This
      will appear the same way to the guest as the old approach, although it's
      now implemented by two contiguous memory regions rather than a single one.
      
      For now, this only adds the possibility of 64-bit windows.  The default
      configuration still uses the legacy mode.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
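Why a misaligned window limits BAR size to "roughly half" can be shown with a small helper. This is an illustrative calculation, not code from the patch: it finds the largest naturally aligned power-of-two region that fits inside a window.

```c
#include <assert.h>
#include <stdint.h>

/* Largest naturally aligned power-of-two region fitting in [lo, hi).
 * A window starting at 2 GiB admits at most ~half its span as one BAR;
 * a window aligned to its own size admits the full span. */
static uint64_t largest_aligned_bar(uint64_t lo, uint64_t hi)
{
    uint64_t best = 0;
    for (uint64_t size = 1; size <= hi; size <<= 1) {
        uint64_t start = (lo + size - 1) & ~(size - 1); /* align up */
        if (start + size <= hi) {
            best = size;
        }
    }
    return best;
}
```

For the legacy 2G..64G layout the answer is 32 GiB, whereas a naturally aligned 64 GiB window can host a 64 GiB BAR; that difference is the motivation for the separate, alignable 64-bit window.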
    • spapr_pci: Delegate placement of PCI host bridges to machine type · 6737d9ad
      Committed by David Gibson
      The 'spapr-pci-host-bridge' represents the virtual PCI host bridge (PHB)
      for a PAPR guest.  Unlike on x86, it's routine on Power (both bare metal
      and PAPR guests) to have numerous independent PHBs, each controlling a
      separate PCI domain.
      
      There are two ways of configuring the spapr-pci-host-bridge device: first
      it can be done fully manually, specifying the locations and sizes of all
      the IO windows.  This gives the most control, but is very awkward with 6
      mandatory parameters.  Alternatively just an "index" can be specified
      which essentially selects from an array of predefined PHB locations.
      The PHB at index 0 is automatically created as the default PHB.
      
      The current set of default locations causes some problems for guests with
      large RAM (> 1 TiB) or PCI devices with very large BARs (e.g. big nVidia
      GPGPU cards via VFIO).  Obviously, for migration we can only change the
      locations on a new machine type, however.
      
      This is awkward, because the placement is currently decided within the
      spapr-pci-host-bridge code, so it breaks abstraction to look inside the
      machine type version.
      
      So, this patch delegates the "default mode" PHB placement from the
      spapr-pci-host-bridge device back to the machine type via a public method
      in sPAPRMachineClass.  It's still a bit ugly, but it's about the best we
      can do.
      
      For now, this just changes where the calculation is done.  It doesn't
      change the actual location of the host bridges, or any other behaviour.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
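The delegation pattern described above, a placement method on the machine class rather than in the device, can be sketched like this. All names and constants here are hypothetical stand-ins for the sPAPRMachineClass method, chosen only to show the shape of the indirection.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Sketch: the machine class exposes a placement method, and the
 * host-bridge realize code calls it instead of computing window
 * addresses itself. */
typedef struct SketchMachineClass {
    void (*phb_placement)(unsigned index, uint64_t *buid, uint64_t *mmio);
} SketchMachineClass;

static void pseries_phb_placement(unsigned index, uint64_t *buid,
                                  uint64_t *mmio)
{
    *buid = 0x800000020000000ULL + index;               /* assumed scheme */
    *mmio = 0x10000000000ULL + index * 0x1000000000ULL; /* 1 TiB + n*64 GiB */
}

static void phb_realize(const SketchMachineClass *smc, unsigned index,
                        uint64_t *buid, uint64_t *mmio)
{
    /* the device no longer decides placement; the machine type does */
    smc->phb_placement(index, buid, mmio);
}
```

A newer machine type can then install a different phb_placement without the device code knowing which machine version it runs under.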
  5. 23 Sep 2016, 1 commit
  6. 15 Sep 2016, 1 commit
  7. 12 Jul 2016, 2 commits
  8. 05 Jul 2016, 1 commit
    • spapr_pci/spapr_pci_vfio: Support Dynamic DMA Windows (DDW) · ae4de14c
      Committed by Alexey Kardashevskiy
      This adds support for the Dynamic DMA Windows (DDW) option defined by
      the SPAPR specification, which allows having additional DMA window(s).
      
      The "ddw" property is enabled by default on a PHB but for compatibility
      the pseries-2.6 machine and older disable it.
      This also creates a single DMA window for the older machines to
      maintain backward migration.
      
      This implements DDW for PHB with emulated and VFIO devices. The host
      kernel support is required. The advertised IOMMU page sizes are 4K and
      64K; 16M pages are supported but not advertised by default.  To
      enable them, the user has to specify the "pgsz" property for the PHB and
      enable huge pages for RAM.
      
      Existing Linux guests try creating one additional huge DMA window
      with 64K or 16MB pages and mapping the entire guest RAM into it. If that
      succeeds, the guest switches to dma_direct_ops and never calls TCE hypercalls
      (H_PUT_TCE,...) again. This enables VFIO devices to use the entire RAM
      and not waste time on map/unmap later. This adds a "dma64_win_addr"
      property which is a bus address for the 64bit window and by default
      set to 0x800.0000.0000.0000 as this is what the modern POWER8 hardware
      uses and this allows having emulated and VFIO devices on the same bus.
      
      This adds 4 RTAS handlers:
      * ibm,query-pe-dma-window
      * ibm,create-pe-dma-window
      * ibm,remove-pe-dma-window
      * ibm,reset-pe-dma-window
      These are registered from type_init() callback.
      
      These RTAS handlers are implemented in a separate file to avoid polluting
      spapr_iommu.c with PCI.
      
      This changes sPAPRPHBState::dma_liobn to an array to allow 2 LIOBNs
      and updates all references to dma_liobn. However, this does not add the
      64-bit LIOBN to the migration stream: even the 32-bit LIOBN is
      rather pointless there (it is a PHB property, and the management
      software can/should pass LIOBNs via the CLI), but we keep it for
      backward migration support.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
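Registering the four RTAS handlers by name can be sketched with a simple dispatch table. The table layout and handler signatures below are illustrative, not QEMU's RTAS API; only the four call names come from the message above.

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* Sketch of name-based RTAS dispatch for the DDW calls. */
typedef int (*rtas_handler)(void);

static int query_window(void)  { return 0; }
static int create_window(void) { return 0; }
static int remove_window(void) { return 0; }
static int reset_window(void)  { return 0; }

static const struct {
    const char *name;
    rtas_handler fn;
} rtas_table[] = {
    { "ibm,query-pe-dma-window",  query_window },
    { "ibm,create-pe-dma-window", create_window },
    { "ibm,remove-pe-dma-window", remove_window },
    { "ibm,reset-pe-dma-window",  reset_window },
};

static rtas_handler rtas_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(rtas_table) / sizeof(rtas_table[0]); i++) {
        if (strcmp(rtas_table[i].name, name) == 0) {
            return rtas_table[i].fn;
        }
    }
    return NULL;
}
```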
  9. 01 Jul 2016, 1 commit
  10. 07 Jun 2016, 1 commit
  11. 16 Mar 2016, 4 commits
    • spapr_pci: Remove finish_realize hook · a36304fd
      Committed by David Gibson
      Now that spapr-pci-vfio-host-bridge is reduced to just a stub, there is
      only one implementation of the finish_realize hook in sPAPRPHBClass.  So,
      we can fold that implementation into its (single) caller, and remove the
      hook.  That's the last thing left in sPAPRPHBClass, so that can go away as
      well.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • spapr_pci: (Mostly) remove spapr-pci-vfio-host-bridge · 72700d7e
      Committed by David Gibson
      Now that the regular spapr-pci-host-bridge can handle EEH, there are only
      two things that spapr-pci-vfio-host-bridge does differently:
          1. automatically sizes its DMA window to match the host IOMMU
          2. checks if the attached VFIO container is backed by the
             VFIO_SPAPR_TCE_IOMMU type on the host
      
      (1) is not particularly useful, since the default window used by the
      regular host bridge will work with the host IOMMU configuration on all
      current systems anyway.
      
      Plus, automatically changing guest visible configuration (such as the DMA
      window) based on host settings is generally a bad idea.  It's not
      definitively broken, since spapr-pci-vfio-host-bridge is only supposed to
      support VFIO devices which can't be migrated anyway, but still.
      
      (2) is not really useful, because if a guest tries to configure EEH on a
      different host IOMMU, the first call will fail and that will be that.
      
      It's possible there are scripts or tools out there which expect
      spapr-pci-vfio-host-bridge, so we don't remove it entirely.  This patch
      reduces it to just a stub for backwards compatibility.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • spapr_pci: Allow EEH on spapr-pci-host-bridge · c1fa017c
      Committed by David Gibson
      Now that the EEH code is independent of the special
      spapr-vfio-pci-host-bridge device, we can allow it on all spapr PCI
      host bridges instead.  We do this by changing spapr_phb_eeh_available()
      to be based on the vfio_eeh_as_ok() call instead of the host bridge class.
      
      Because the value of vfio_eeh_as_ok() can change with devices being
      hotplugged or unplugged, this can potentially lead to some strange edge
      cases where the guest starts using EEH, then it starts failing because
      of a change in status.
      
      However, it's not really any worse than the current situation.  Cases that
      would have worked previously will still work (i.e. VFIO devices from at
      most one VFIO IOMMU group per vPHB), it's just that it's no longer
      necessary to use spapr-vfio-pci-host-bridge with the groupid pre-specified.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
    • spapr_pci: Eliminate class callbacks · fbb4e983
      Committed by David Gibson
      The EEH operations in the spapr-vfio-pci-host-bridge no longer rely on the
      special groupid field in sPAPRPHBVFIOState.  So we can simplify, replacing
      the class-specific callbacks with direct calls based on a simple
      spapr_phb_eeh_enabled() helper.  For now we implement that in terms of
      a boolean in the class, but we'll continue to clean that up later.
      
      On its own this is a rather strange way of doing things, but it's a useful
      intermediate step to further cleanups.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
  12. 23 Oct 2015, 1 commit
  13. 07 Jul 2015, 1 commit
    • spapr: Merge sPAPREnvironment into sPAPRMachineState · 28e02042
      Committed by David Gibson
      The code for -machine pseries maintains a global sPAPREnvironment structure
      which keeps track of general state information about the guest platform.
      This predates the existence of the MachineState structure, but performs
      basically the same function.
      
      Now that we have the generic MachineState, fold sPAPREnvironment into
      sPAPRMachineState, the pseries specific subclass of MachineState.
      
      This is mostly a matter of search and replace, although a few places which
      relied on the global spapr variable are changed to find the structure via
      qdev_get_machine().
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  14. 04 Jun 2015, 3 commits
  15. 09 Mar 2015, 3 commits
    • sPAPR: Implement EEH RTAS calls · ee954280
      Committed by Gavin Shan
      QEMU does not yet cover the emulation of EEH RTAS requests from
      the guest; this patch implements them.
      
      The patch defines constants used by EEH RTAS calls and adds
      callbacks sPAPRPHBClass::{eeh_set_option, eeh_get_state, eeh_reset,
      eeh_configure}, which are going to be used as follows:
      
        * RTAS calls are received in spapr_pci.c, sanity check is done
          there.
        * RTAS handlers handle what they can. If there is something it
          cannot handle and the corresponding sPAPRPHBClass callback is
          defined, it is called.
        * Those callbacks are only implemented for VFIO now. They do ioctl()
          to the IOMMU container fd to complete the calls. Error codes from
          that ioctl() are transferred back to the guest.
      
      [aik: defined RTAS tokens for EEH RTAS calls]
      Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
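The three-step flow in the list above (sanity check in spapr_pci.c, then delegate to a class callback if one is defined) can be sketched as follows. The error codes and struct layout are assumptions for illustration; only the callback-dispatch pattern is taken from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the EEH RTAS dispatch pattern: sanity-check the request,
 * then delegate to the class callback (only VFIO implements them). */
enum { RTAS_OUT_SUCCESS = 0, RTAS_OUT_NOT_SUPPORTED = -1,
       RTAS_OUT_PARAM_ERROR = -3 };

typedef struct SketchPhbClass {
    int (*eeh_set_option)(int option);
} SketchPhbClass;

static int vfio_eeh_set_option(int option)
{
    (void)option;
    return RTAS_OUT_SUCCESS;   /* stands in for the ioctl() to the container fd */
}

static int rtas_eeh_set_option(const SketchPhbClass *spc, int option)
{
    if (option < 0 || option > 2) {   /* sanity check done in spapr_pci.c */
        return RTAS_OUT_PARAM_ERROR;
    }
    if (!spc->eeh_set_option) {       /* no VFIO backing: unsupported */
        return RTAS_OUT_NOT_SUPPORTED;
    }
    return spc->eeh_set_option(option);
}
```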
    • spapr-pci: Enable huge BARs · b194df47
      Committed by Alexey Kardashevskiy
      At the moment sPAPR only supports 512MB window for MMIO BARs. However
      modern devices might want bigger 64bit BARs.
      
      This extends the MMIO window from 512MB to 62GB (aligned to
      SPAPR_PCI_WINDOW_SPACING) and advertises it in 2 records in
      the PHB "ranges" property. The 32-bit record gets the space from
      SPAPR_PCI_MEM_WIN_BUS_OFFSET to the end of 4GB, and the 64-bit record
      gets the rest. If no space is left, the 64-bit range is not advertised.
      
      The MMIO space size defaults to the old value of 0x20000000
      for pseries machines older than 2.3.
      
      The approach changes the device tree, which is a guest visible change;
      however it won't break migration as:
      1. we do not support migration to older QEMU versions
      2. migration to newer QEMU will migrate the device tree as well, and since
      the new layout only extends the old one and does not change address mappings,
      no breakage is expected there either.
      
      SLOF change is required to utilize this extension.
      Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
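The split of one window into the two "ranges" records can be sketched with a few lines of address arithmetic. The 2 GiB offset constant is an assumption standing in for SPAPR_PCI_MEM_WIN_BUS_OFFSET; it is not copied from the QEMU source.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch: split the MMIO window [offset, offset + win_size) into a
 * 32-bit record below 4 GiB and a 64-bit record above it. */
#define MEM_WIN_BUS_OFFSET 0x80000000ULL   /* assumed: 2 GiB */
#define FOUR_GIB           0x100000000ULL

typedef struct { uint64_t base, size; } Range;

static void split_mmio_ranges(uint64_t win_size, Range *r32, Range *r64)
{
    uint64_t end = MEM_WIN_BUS_OFFSET + win_size;

    r32->base = MEM_WIN_BUS_OFFSET;
    r32->size = (end < FOUR_GIB ? end : FOUR_GIB) - MEM_WIN_BUS_OFFSET;
    r64->base = FOUR_GIB;
    /* if nothing is left above 4 GiB, the 64-bit record is omitted */
    r64->size = end > FOUR_GIB ? end - FOUR_GIB : 0;
}
```

With the old 512MB window the 64-bit record comes out empty, which matches the "not advertised" case above.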
    • pseries: Limit PCI host bridge "index" value · 3e4ac968
      Committed by David Gibson
      pseries guests can have large numbers of PCI host bridges.  To avoid the
      user having to specify a number of different configuration values for every
      one, the device supports an "index" property which is a shorthand setting
      the various window and configuration addresses from a predefined sensible
      set.
      
      There are some problems with the details at present:
  * The "index" property is signed, but negative values will create PCI
      windows below where we expect, potentially colliding with other devices
        * No limit is imposed on the "index" property and large values can
      translate to extremely large window addresses.  With PCI passthrough in
particular this can mean we exceed various mapping and physical address
limits, causing the guest host bridge to misbehave in strange ways.
      
      This patch addresses this, by making "index" unsigned, and imposing a
      limit.  Currently the limit allows indices from 0..255 which is probably
      enough host bridges for the time being.  It's fairly easy to extend if
      we discover we need more.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
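The fix above amounts to an unsigned property plus a range check before deriving window addresses. A minimal sketch, with illustrative base and spacing constants (the 0..255 limit is the one stated in the message):

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define PHB_INDEX_MAX 255u
#define PHB_BASE      0x10000000000ULL  /* assumed: 1 TiB */
#define PHB_SPACING   0x1000000000ULL   /* assumed: 64 GiB per PHB */

/* "index" is now unsigned, so negative values cannot slip in, and the
 * cap keeps derived addresses within sane physical limits. */
static bool phb_index_valid(uint32_t index)
{
    return index <= PHB_INDEX_MAX;
}

static uint64_t phb_window_base(uint32_t index)
{
    return PHB_BASE + (uint64_t)index * PHB_SPACING;
}
```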
  16. 08 Sep 2014, 1 commit
    • spapr_pci: map the MSI window in each PHB · 8c46f7ec
      Committed by Greg Kurz
      On sPAPR, virtio devices are connected to the PCI bus and use MSI-X.
      Commit cc943c36 has modified MSI-X
      so that writes are made using the bus master address space and follow
      the IOMMU path.
      
      Unfortunately, the IOMMU address space does not have an
      MSI window: the notification is silently dropped in unassigned_mem_write
      instead of reaching the guest... The most visible effect is that all
      virtio devices are non-functional on sPAPR since then. :(
      
      This patch does the following:
      1) map the MSI window into the IOMMU address space for each PHB
         - since each PHB instantiates its own IOMMU address space, we
           can safely map the window at a fixed address (SPAPR_PCI_MSI_WINDOW)
         - no real need to keep the MSI window setup in a separate function,
           the spapr_pci_msi_init() code moves to spapr_phb_realize().
      
      2) kill the global MSI window as it is not needed in the end
      Signed-off-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  17. 27 Jun 2014, 2 commits
    • spapr_pci: Use XICS interrupt allocator and do not cache interrupts in PHB · 9a321e92
      Committed by Alexey Kardashevskiy
      Currently SPAPR PHB keeps track of all allocated MSI interrupts (here
      and below, MSI stands for both MSI and MSI-X) because
      XICS used to be unable to reuse interrupts. This is a problem for
      dynamic MSI reconfiguration which happens when guest reloads a driver
      or performs PCI hotplug. Another problem is that the existing
      implementation can enable MSI on 32 devices maximum
      (SPAPR_MSIX_MAX_DEVS=32) and there is no good reason for that.
      
      This makes use of new XICS ability to reuse interrupts.
      
      This reorganizes MSI information storage in sPAPRPHBState. Instead of
      static array of 32 descriptors (one per PCI function), this patch adds
      a GHashTable where @config_addr is the key and (first_irq, num) pair is
      a value. GHashTable can dynamically grow and shrink so the initial limit
      of 32 devices is gone.
      
      This changes the migration stream, as @msi_table was a static array while
      the new @msi_devs is a dynamic hash table. This adds a temporary array which
      is used for migration; it is populated in the "spapr_pci"::pre_save() callback
      and expanded into the hash table in post_load() callback. Since
      the destination side does not know the number of MSI-enabled devices
      in advance and cannot pre-allocate the temporary array to receive
      migration state, this makes use of new VMSTATE_STRUCT_VARRAY_ALLOC macro
      which allocates the array automatically.
      
      This resets the MSI configuration space when interrupts are released by
      the ibm,change-msi RTAS call.
      
      This fixes traces to be more informative.
      
      This changes the vmstate_spapr_pci_msi name from "...lsi" to "...msi",
      which was incorrect by accident. As the internal representation changed,
      this also bumps the migration version number.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [agraf: drop g_malloc_n usage]
      Signed-off-by: Alexander Graf <agraf@suse.de>
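The flatten-on-save / rebuild-on-load pattern described above can be sketched as follows. QEMU keeps the live data in a GHashTable keyed by @config_addr; here a tiny fixed array stands in for it so the two migration steps stay visible without a GLib dependency. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t config_addr, first_irq, num; } MsiEntry;

typedef struct {
    MsiEntry live[8];   /* stand-in for the GHashTable */
    int      live_count;
    MsiEntry mig[8];    /* temporary array written to the stream */
    int      mig_count;
} SketchPhbMsi;

/* pre_save(): flatten the live structure into the migration array */
static void msi_pre_save(SketchPhbMsi *s)
{
    memcpy(s->mig, s->live, sizeof(s->live));
    s->mig_count = s->live_count;
}

/* post_load(): expand the received array back into the live structure;
 * in QEMU the array itself is allocated on demand on the destination
 * (the VMSTATE_STRUCT_VARRAY_ALLOC case mentioned above) */
static void msi_post_load(SketchPhbMsi *s)
{
    memcpy(s->live, s->mig, sizeof(s->mig));
    s->live_count = s->mig_count;
}
```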
    • spapr_pci_vfio: Add spapr-pci-vfio-host-bridge to support vfio · 9fc34ada
      Committed by Alexey Kardashevskiy
      The patch adds a spapr-pci-vfio-host-bridge device type
      which is a PCI Host Bridge with VFIO support. The new device
      inherits from the spapr-pci-host-bridge device and adds an "iommu"
      property which is an IOMMU id. This ID represents a minimal entity
      for which IOMMU isolation can be guaranteed. In the SPAPR architecture an
      IOMMU group is called a Partitionable Endpoint (PE).
      
      Current implementation supports one IOMMU id per QEMU VFIO PHB. Since
      SPAPR allows multiple PHBs at no extra cost, this does not seem to
      be a problem. This limitation may change in the future though.
      
      Example of use:
      Configure and add 3 functions of a multifunction device to QEMU
      (the NEC PCI USB card is used as an example here):
      -device spapr-pci-vfio-host-bridge,id=USB,iommu=4,index=7 \
      -device vfio-pci,host=4:0:1.0,addr=1.0,bus=USB,multifunction=true
      -device vfio-pci,host=4:0:1.1,addr=1.1,bus=USB
      -device vfio-pci,host=4:0:1.2,addr=1.2,bus=USB
      
      where:
      * index=7 is a QEMU PHB index (used as source for MMIO/MSI/IO windows
      offset);
      * iommu=4 is an IOMMU id which can be found in sysfs:
      [aik@vpl2 ~]$ cd /sys/bus/pci/devices/0004:00:00.0/
      [aik@vpl2 0004:00:00.0]$ ls -l iommu_group
      lrwxrwxrwx 1 root root 0 Jun  5 12:49 iommu_group -> ../../../kernel/iommu_groups/4
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  18. 16 Jun 2014, 4 commits
    • spapr_iommu: Get rid of window_size in sPAPRTCETable · 523e7b8a
      Committed by Alexey Kardashevskiy
      This removes window_size as it is basically a copy of nb_table
      shifted by SPAPR_TCE_PAGE_SHIFT. As new dynamic DMA windows are
      going to support windows as big as the entire RAM and this number
      will be bigger that 32 capacity, we will have to do something
      about @window_size anyway and removal seems to be the right way to go.
      
      This removes dma_window_start/dma_window_size from sPAPRPHBState as
      they are no longer used.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • spapr_pci: Allow multiple TCE tables per PHB · e28c16f6
      Committed by Alexey Kardashevskiy
      At the moment sPAPRPHBState contains a @tcet pointer to the only
      TCE table. However, the sPAPR spec allows having more than one DMA window.
      
      Since the TCE object is already a child of SPAPR PHB object, there is
      no need to keep an additional pointer to it in sPAPRPHBState so remove it.
      
      This changes the way sPAPRPHBState::reset performs reset of sPAPRTCETable
      objects.
      
      This changes the default DMA window properties calculation.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • spapr_pci: spapr_iommu: Make DMA window a subregion · cca7fad5
      Committed by Alexey Kardashevskiy
      Currently the default DMA window is represented by a single MemoryRegion.
      However there can be more than just one window so we need
      a "root" memory region to be separated from the actual DMA window(s).
      
      This introduces a "root" IOMMU memory region and adds a subregion for
      the default DMA 32bit window. Following patches will add other
      subregion(s).
      
      This initializes a default DMA window subregion size to the guest RAM
      size as this window can be switched into "bypass" mode which implements
      direct DMA mapping.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alexander Graf <agraf@suse.de>
    • spapr_pci: Introduce a finish_realize() callback · da6ccee4
      Committed by Alexey Kardashevskiy
      The spapr-pci PHB initializes IOMMU for emulated devices only.
      The upcoming VFIO support will do it differently. However both emulated
      and VFIO PHB types share most of the initialization code.
      For the type specific things a new finish_realize() callback is
      introduced.
      
      This introduces sPAPRPHBClass derived from PCIHostBridgeClass and
      adds the callback pointer.
      
      This implements finish_realize() for emulated devices.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [agraf: Fix compilation]
      Signed-off-by: Alexander Graf <agraf@suse.de>
  19. 02 Sep 2013, 1 commit
    • spapr-pci: rework MSI/MSIX · f1c2dc7c
      Committed by Alexey Kardashevskiy
      On the sPAPR platform a guest allocates MSI/MSIX vectors via RTAS
      hypercalls which return global IRQ numbers to a guest so it only
      operates with those and never touches MSIMessage.
      
      Therefore MSIMessage handling is completely hidden in QEMU.
      
      Previously every sPAPR PCI host bridge implemented its own MSI window
      to catch msi_notify()/msix_notify() calls from QEMU devices (virtio-pci
      or vfio) and route them to the guest via qemu_pulse_irq().
      MSIMessage used to be encoded as:
      	.addr - address within the PHB MSI window;
      	.data - the device index on PHB plus vector number.
      The MSI MR write function translated this MSIMessage to a global IRQ
      number and called qemu_pulse_irq().
      
      However the total number of IRQs is not really big (at the moment it is
      1024 IRQs starting from 4096), and even the 16-bit data field of MSIMessage
      seems to be enough to store an IRQ number there.
      
      This simplifies MSI handling in sPAPR PHB. Specifically, this does:
      1. remove a MSI window from a PHB;
      2. add a single memory region for all MSIs to sPAPREnvironment
      and spapr_pci_msi_init() to initialize it;
      3. encode MSIMessage as:
          * .addr - a fixed address of SPAPR_PCI_MSI_WINDOW==0x40000000000ULL;
          * .data as an IRQ number.
       4. change the IRQ allocator to align the first IRQ number in a block for MSI.
      MSI uses lower bits to specify the vector number so the first IRQ has to
      be aligned. MSIX does not need any special allocator though.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Anthony Liguori <aliguori@us.ibm.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Alexander Graf <agraf@suse.de>
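The encoding in step 3 and the alignment rule in step 4 can be sketched directly. The window address is the one quoted in the message; the struct and the allocator helper are simplified stand-ins.

```c
#include <assert.h>
#include <stdint.h>

#define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL

typedef struct { uint64_t addr; uint32_t data; } SketchMSIMessage;

/* step 3: .addr is fixed, .data carries the global IRQ number directly */
static SketchMSIMessage spapr_msi_encode(uint32_t irq)
{
    return (SketchMSIMessage){ .addr = SPAPR_PCI_MSI_WINDOW, .data = irq };
}

/* step 4: the first IRQ of an MSI block must be aligned to the
 * (power-of-two) block size, since MSI uses the low bits of .data as
 * the vector number; MSI-X needs no such alignment */
static uint32_t msi_align_first_irq(uint32_t next_free, uint32_t num)
{
    return (next_free + num - 1) & ~(num - 1);
}
```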
  20. 29 Jul 2013, 1 commit
  21. 20 Jun 2013, 2 commits
  22. 09 Apr 2013, 1 commit
    • hw: move headers to include/ · 0d09e41a
      Committed by Paolo Bonzini
      Many of these should be cleaned up with proper qdev-/QOM-ification.
      Right now there are many catch-all headers in include/hw/ARCH depending
      on cpu.h, and this makes it necessary to compile these files per-target.
      However, fixing this does not belong in these patches.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  23. 22 Mar 2013, 1 commit
    • pseries: Remove "busname" property for PCI host bridge · 89dfd6e1
      Committed by David Gibson
      Currently the "spapr-pci-host-bridge" device has a "busname" property which
      can be used to override the default assignment of qbus names for the bus
      subordinate to the PHB.  We use that for the default primary PCI bus, to
      make libvirt happy, which expects there to be a bus named simply "pci".
      The default qdev core logic would name the bus "pci.0", and the pseries
      code would otherwise name it "pci@800000020000000" which is the name it
      is given in the device tree based on its BUID.
      
      The "busname" property is rather clunky though, so this patch simplifies
      things by just using a special case hack for the default PHB, setting
      busname to "pci" when index=0.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
  24. 26 Jan 2013, 1 commit
    • pseries: Improve handling of multiple PCI host bridges · caae58cb
      Committed by David Gibson
      Multiple - even many - PCI host bridges (i.e. PCI domains) are very
      common on real PAPR compliant hardware.  For reasons related to the
      PAPR specified IOMMU interfaces, PCI device assignment with VFIO will
      generally require at least two (virtual) PHBs and possibly more
      depending on which devices are assigned.
      
      At the moment the qemu PAPR PCI code will not deal with this well,
      leaving several crucial parameters of PHBs other than the default one
      uninitialized.  This patch reworks the code to allow this.
      
      Every PHB needs a unique BUID (Bus Unit Identifier, the id used for
      the PAPR PCI related interfaces) and a unique LIOBN (Logical IO Bus
      Number, the id used for the PAPR IOMMU related interfaces).  In
      addition they need windows in CPU real address space to access PCI
      memory space, PCI IO space and MSIs.  Properties are added to the PCI
      host bridge qdevice to allow configuration of all these.
      
      To simplify configuration of multiple PHBs for common cases, a
      convenience "index" property is also added.  This can be set instead
      of the low-level properties, and will generate suitable values for the
      other parameters, different for each index value.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexander Graf <agraf@suse.de>
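The "index" convenience property described above fans one small integer out into all the per-PHB identifiers. A hypothetical sketch; every constant here is illustrative, not one of the actual QEMU defaults:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t buid;          /* Bus Unit Identifier (PAPR PCI interfaces) */
    uint32_t liobn;         /* Logical IO Bus Number (PAPR IOMMU interfaces) */
    uint64_t mem_win_addr;  /* PCI memory window in CPU real address space */
    uint64_t io_win_addr;   /* PCI IO window */
} PhbConfig;

/* one index -> a full, non-overlapping set of per-PHB parameters */
static PhbConfig phb_from_index(uint32_t index)
{
    return (PhbConfig){
        .buid         = 0x800000020000000ULL + index,
        .liobn        = 0x80000000u + index,
        .mem_win_addr = 0x10000000000ULL + (uint64_t)index * 0x1000000000ULL,
        .io_win_addr  = 0x10008000000ULL + (uint64_t)index * 0x1000000000ULL,
    };
}
```

Distinct indices yield distinct BUIDs, LIOBNs and windows, which is exactly the property the low-level per-PHB options otherwise had to guarantee by hand.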
  25. 17 Dec 2012, 1 commit