Commits · 95251725e335af2b885e2ab33dd29c86f8084663 · openeuler / qemu

31 Oct, 2016 4 commits

vfio: Add support for mmapping sub-page MMIO BARs · 95251725

Authored by Yongji Xie on Oct 31, 2016

Now the kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
to mmap sub-page BARs. This is the corresponding QEMU patch.
With those patches applied, we can pass sub-page BARs through
to the guest, which can help improve I/O performance for some devices.

In this patch, we expand the MemoryRegions of these sub-page
MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
the BARs can be passed to the KVM ioctl KVM_SET_USER_MEMORY_REGION
with a valid size. The expanded size is restored to the original when
the guest changes the base address of a sub-page BAR so that it is no
longer page aligned. We also set the priority of these BARs'
memory regions to zero in case they overlap with BARs that share
the same page with sub-page BARs in the guest.
Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
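
As a rough illustration of the size expansion described in this commit message, the following standalone C sketch rounds a sub-page BAR up to one page only while its guest base address is page aligned. The names `sketch_bar_map_size` and `SKETCH_PAGE_SIZE` and the 4 KiB page size are assumptions made for the example; this is not the code in vfio_pci_write_config().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed host page size for this sketch; real hosts may use larger pages. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper: return the size to use for the BAR's memory region.
 * A sub-page BAR is expanded to a full page only while its guest base
 * address is page aligned; otherwise its original size is kept.
 */
static uint64_t sketch_bar_map_size(uint64_t guest_base, uint64_t bar_size)
{
    assert(bar_size > 0);
    if (bar_size >= SKETCH_PAGE_SIZE) {
        return bar_size;               /* not a sub-page BAR */
    }
    if ((guest_base & (SKETCH_PAGE_SIZE - 1)) == 0) {
        return SKETCH_PAGE_SIZE;       /* aligned: expand to one page */
    }
    return bar_size;                   /* unaligned: keep the original size */
}
```

With these assumptions, a 256-byte BAR at guest address 0x10000 would be mapped with a 4096-byte region, and would fall back to 256 bytes if the guest moved it to 0x10100.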

vfio/pci: fix out-of-sync BAR information on reset · a52a4c47

Authored by Ido Yariv on Oct 31, 2016

When a PCI device is reset, pci_do_device_reset resets all BAR addresses
in the relevant PCIDevice's config buffer.

The VFIO configuration space stays untouched, so the guest OS may choose
to skip restoring the BAR addresses as they would seem intact. The PCI
device may be left non-operational.
One example of such a scenario is when the guest exits S3.

Fix this by resetting the BAR addresses in the VFIO configuration space
as well.
Signed-off-by: Ido Yariv <ido@wizery.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Handle zero-length sparse mmap ranges · 24acf72b

Authored by Alex Williamson on Oct 31, 2016

As reported in the link below, a user has a PCI device with a 4KB BAR
which contains the MSI-X table. This seems to hit a corner case in
the kernel where the region reports being mmap capable, but the sparse
mmap information reports a zero sized range. It's not entirely clear
that the kernel is incorrect in doing this, but regardless, we need
to handle it. To do this, fill our mmap array only with non-zero
sized sparse mmap entries and add an error return from the function
so we can tell the difference between nr_mmaps being zero based on
sparse mmap info vs lack of sparse mmap info.

NB, this doesn't actually change the behavior of the device, it only
removes the scary "Failed to mmap ... Performance may be slow" error
message. We cannot currently create an mmap over the MSI-X table.

Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
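
The filtering described above can be sketched in standalone C as follows. The struct `sketch_area`, the function `sketch_fill_mmaps`, and the choice of `-ENODEV` as the "no sparse mmap info" return value are all assumptions for illustration, not the actual QEMU code.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one sparse mmap area reported by the kernel. */
struct sketch_area {
    uint64_t offset;
    uint64_t size;
};

/*
 * Copy only non-zero sized areas into out[] and store the count in *nr_out.
 * Returns -ENODEV when there is no sparse mmap info at all, so the caller
 * can tell that case apart from "info exists but every area is empty"
 * (return 0 with *nr_out == 0).
 */
static int sketch_fill_mmaps(const struct sketch_area *areas, size_t nr_areas,
                             struct sketch_area *out, size_t *nr_out)
{
    size_t i, n = 0;

    *nr_out = 0;
    if (areas == NULL) {
        return -ENODEV;                /* no sparse mmap info available */
    }
    for (i = 0; i < nr_areas; i++) {
        if (areas[i].size != 0) {
            out[n++] = areas[i];       /* keep only usable ranges */
        }
    }
    *nr_out = n;
    return 0;
}
```

The separate error return is what lets a caller avoid attempting a mmap (and printing a failure message) for a region whose sparse info lists only empty ranges.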

memory: Replace skip_dump flag with "ram_device" · 21e00fa5

Authored by Alex Williamson on Oct 31, 2016

Setting skip_dump on a MemoryRegion allows us to modify one specific
code path, but the restriction we're trying to address encompasses
more than that.  If we have a RAM MemoryRegion backed by a physical
device, it not only restricts our ability to dump that region, but
also affects how we should manipulate it.  Here we recognize that
MemoryRegions do not change to sometimes allow dumps and other times
not, so we replace setting the skip_dump flag with a new initializer
so that we know exactly the type of region to which we're applying
this behavior.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>

18 Oct, 2016 19 commits

vfio: fix duplicate function call · 893bfc3c

Authored by Cao jin on Oct 17, 2016

When a vfio device is reset (on FLR or bus reset) and a bus reset is
needed (vfio_pci_hot_reset_one is called), vfio_pci_pre_reset and
vfio_pci_post_reset are called twice.
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Fix vfio_rtl8168_quirk_data_read address offset · 31e6a7b1

Authored by Thorsten Kohfeldt on Oct 17, 2016

Introductory comment for rtl8168 VFIO MSI-X quirk states:
At BAR2 offset 0x70 there is a dword data register,
         offset 0x74 is a dword address register.
vfio: vfio_bar_read(0000:05:00.0:BAR2+0x70, 4) = 0xfee00398 // read data

Thus, correct offset for data read is 0x70,
but function vfio_rtl8168_quirk_data_read() wrongfully uses offset 0x74.
Signed-off-by: Thorsten Kohfeldt <thorsten.kohfeldt@gmx.de>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Handle host oversight · 4a946268

Authored by Eric Auger on Oct 17, 2016

In case the end-user calls qemu with -vfio-pci option without passing
either sysfsdev or host property value, the device is interpreted as
0000:00:00.0. Let's create a specific error message to guide the end-user.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Remove vfio_populate_device returned value · e04cff9d

Authored by Eric Auger on Oct 17, 2016

The returned value (either -errno or -1) is not used anymore by the caller,
vfio_realize, since the error now is stored in the error object. So let's
remove it.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Remove vfio_msix_early_setup returned value · ec3bcf42

Authored by Eric Auger on Oct 17, 2016

The returned value is not used anymore by the caller, vfio_realize,
since the error now is stored in the error object. So let's remove it.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Conversion to realize · 1a22aca1

Authored by Eric Auger on Oct 17, 2016

This patch converts VFIO PCI to realize function.

Errors from the original initfn are now also propagated using QEMU
error objects. All errors are formatted with the same pattern:
"vfio: %s: the error description"
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
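
The error-object style used throughout this series can be illustrated with a small standalone C sketch. `SketchError` and the `sketch_*` functions are hypothetical, simplified stand-ins written for this example; they are not QEMU's Error API, and the error text is invented. Only the "vfio: %s: description" pattern comes from the commit message.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for an error object. */
typedef struct SketchError {
    int set;
    char msg[128];
} SketchError;

/* Record the first error using the "vfio: %s: description" pattern. */
static void sketch_error_set(SketchError *err, const char *dev,
                             const char *desc)
{
    if (err != NULL && !err->set) {
        snprintf(err->msg, sizeof(err->msg), "vfio: %s: %s", dev, desc);
        err->set = 1;
    }
}

/* A void helper: failure is signalled only through the error object. */
static void sketch_populate_device(const char *dev, int num_regions,
                                   SketchError *err)
{
    if (num_regions <= 0) {
        sketch_error_set(err, dev, "unexpected number of regions");
    }
}

/* A realize-style caller passes the error object down and returns nothing. */
static void sketch_realize(const char *dev, int num_regions, SketchError *err)
{
    sketch_populate_device(dev, num_regions, err);
    if (err->set) {
        return;                        /* stop at the first failure */
    }
    /* further setup steps would follow here */
}
```

Because the failure travels in the error object, helper return values become redundant, which is why the follow-up commits in this series remove them.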

vfio/platform: Pass an error object to vfio_base_device_init · 9bdbfbd5

Authored by Eric Auger on Oct 17, 2016

This patch propagates errors encountered during vfio_base_device_init
up to the realize function.

In case the host value is not set or badly formed we now report an
error.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/platform: fix a wrong returned value in vfio_populate_device · 0d84f47b

Authored by Eric Auger on Oct 17, 2016

In case the vfio_init_intp fails we currently do not return an
error value. This patch fixes the bug. The returned value is not
explicit but in practice the error object is the one used to
report the error to the end-user and the actual returned error
value is not used.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/platform: Pass an error object to vfio_populate_device · 5ff7419d

Authored by Eric Auger on Oct 17, 2016

Propagate the vfio_populate_device errors up to vfio_base_device_init.
The error object also is passed to vfio_init_intp. At the moment we
only report the error. Subsequent patches will propagate the error
up to the realize function.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Pass an error object to vfio_get_device · 59f7d674

Authored by Eric Auger on Oct 17, 2016

Pass an error object to prepare for migration to VFIO-PCI realize.

In vfio platform vfio_base_device_init we currently just report the
error. Subsequent patches will propagate the error up to the realize
function.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Pass an error object to vfio_get_group · 1b808d5b

Authored by Eric Auger on Oct 17, 2016

Pass an error object to prepare for migration to VFIO-PCI realize.

For the time being let's just simply report the error in
vfio platform's vfio_base_device_init(). A subsequent patch will
duly propagate the error up to vfio_platform_realize.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: Pass an Error object to vfio_connect_container · 01905f58

Authored by Eric Auger on Oct 17, 2016

The error is currently simply reported in vfio_get_group. Don't
bother too much with the prefix which will be handled at upper level,
later on.

Also return an error value in case container->error is not 0 and
the container is torn down.

On vfio_spapr_remove_window failure, we also report an error whereas
it was silent before.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Pass an error object to vfio_pci_igd_opregion_init · 7237011d

Authored by Eric Auger on Oct 17, 2016

Pass an error object to prepare for migration to VFIO-PCI realize.

In vfio_probe_igd_bar4_quirk, simply report the error.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Pass an error object to vfio_add_capabilities · 7ef165b9

Authored by Eric Auger on Oct 17, 2016

Pass an error object to prepare for migration to VFIO-PCI realize.
The error is cascaded down to vfio_add_std_cap and then vfio_msi(x)_setup,
vfio_setup_pcie_cap.

vfio_add_ext_cap never returns anything other than 0, so let's transform
it into a void function.

Also use pci_add_capability2 which takes an error object.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Pass an error object to vfio_intx_enable · 7dfb3424

Authored by Eric Auger on Oct 17, 2016

Pass an error object to prepare for migration to VFIO-PCI realize.

The error object is propagated down to vfio_intx_enable_kvm().

The three other callers, vfio_intx_enable_kvm(), vfio_msi_disable_common()
and vfio_pci_post_reset() do not propagate the error and simply call
error_reportf_err() with the ERR_PREFIX formatting.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Pass an error object to vfio_msix_early_setup · 008d0e2d

Authored by Eric Auger on Oct 17, 2016

Pass an error object to prepare for migration to VFIO-PCI realize.
The returned value will be removed later on.

We now format an error in case of reading failure for
- the MSIX flags
- the MSIX table,
- the MSIX PBA.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Pass an error object to vfio_populate_device · 2312d907

Authored by Eric Auger on Oct 17, 2016

Pass an error object to prepare for migration to VFIO-PCI realize.
The returned value will be removed later on.

The case where error recovery cannot be enabled is not converted into
an error object but directly reported through error_report, as before.
Populating an error instead would cause the future realize function to
fail, which is not wanted.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Pass an error object to vfio_populate_vga · cde4279b

Authored by Eric Auger on Oct 17, 2016

Pass an error object to prepare for the same operation in
vfio_populate_device. Eventually this contributes to the migration
to VFIO-PCI realize.

We now report an error on vfio_get_region_info failure.

vfio_probe_igd_bar4_quirk is not involved in the migration to realize
and simply calls error_reportf_err.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Use local error object in vfio_initfn · 426ec904

Authored by Eric Auger on Oct 17, 2016

To prepare for migration to realize, let's use a local error
object in vfio_initfn. Also let's use the same error prefix for all
error messages.

On top of the 1-1 conversion, we start using a common error prefix for
all error messages. We also introduce a similar warning prefix which will
be used later on.
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

27 Sep, 2016 1 commit

memory: introduce IOMMUNotifier and its caps · cdb30812

Authored by Peter Xu on Sep 23, 2016

IOMMU Notifier list is used for notifying IO address mapping changes.
Currently VFIO is the only user.

However, a future consumer such as vhost may want to listen to only
some of its notifications (e.g., cache invalidations).

This patch introduces IOMMUNotifier and IOMMUNotifierFlag bits for
finer-grained control.

IOMMUNotifier contains a bitfield for the notify consumer describing
what kind of notification it is interested in. Currently two kinds of
notifications are defined:

- IOMMU_NOTIFIER_MAP:    for newly mapped entries (additions)
- IOMMU_NOTIFIER_UNMAP:  for entries to be removed (cache invalidates)

When registering the IOMMU notifier, we need to specify one or multiple
types of messages to listen to.

When a notification is triggered, its type is checked against each
notifier's type bits, and only notifiers that registered a matching
bit are notified.

(For any IOMMU implementation, an in-place mapping change should be
 notified with an UNMAP followed by a MAP.)
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
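
The flag matching described in this commit message can be sketched in standalone C as below. The `Sketch*` types, the flag values, and the `hits` counter are assumptions for illustration; they mirror the idea of IOMMU_NOTIFIER_MAP and IOMMU_NOTIFIER_UNMAP but are not the QEMU definitions.

```c
#include <stddef.h>

/* Flag values are assumptions made for this sketch. */
typedef enum {
    SKETCH_NOTIFIER_NONE  = 0,
    SKETCH_NOTIFIER_UNMAP = 1 << 0,
    SKETCH_NOTIFIER_MAP   = 1 << 1
} SketchNotifierFlag;

typedef struct SketchNotifier {
    int flags;   /* bitmask of SketchNotifierFlag the consumer registered */
    int hits;    /* number of notifications delivered to this consumer */
} SketchNotifier;

/*
 * Deliver an event only to notifiers that registered for its type.
 * Returns how many notifiers were notified.
 */
static size_t sketch_notify(SketchNotifier *list, size_t n,
                            SketchNotifierFlag event)
{
    size_t i, delivered = 0;

    for (i = 0; i < n; i++) {
        if (list[i].flags & event) {
            list[i].hits++;
            delivered++;
        }
    }
    return delivered;
}

/* An in-place mapping change is signalled as an UNMAP followed by a MAP. */
static void sketch_notify_remap(SketchNotifier *list, size_t n)
{
    sketch_notify(list, n, SKETCH_NOTIFIER_UNMAP);
    sketch_notify(list, n, SKETCH_NOTIFIER_MAP);
}
```

A consumer that registers only the UNMAP bit would see just the first half of an in-place change, which matches the cache-invalidation use case mentioned for vhost.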

16 Sep, 2016 1 commit

vfio/pci: Fix regression in MSI routing configuration · 6d17a018

Authored by David Gibson on Sep 15, 2016

d1f6af6a "kvm-irqchip: simplify kvm_irqchip_add_msi_route" was a cleanup
of kvmchip routing configuration, that was mostly intended for x86.
However, it also contains a subtle change in behaviour which breaks EEH[1]
error recovery on certain VFIO passthrough devices on spapr guests.  So far
it's only been seen on a BCM5719 NIC on a POWER8 server, but there may be
other hardware with the same problem.  It's also possible there could be
circumstances where it causes a bug on x86 as well, though I don't know of
any obvious candidates.

Prior to d1f6af6a, both vfio_msix_vector_do_use() and
vfio_add_kvm_msi_virq() used msg == NULL as a special flag to mark this
as the "dummy" vector used to make the host hardware state sync with the
guest expected hardware state in terms of MSI configuration.

Specifically that flag caused vfio_add_kvm_msi_virq() to become a no-op,
meaning the dummy irq would always be delivered via qemu. d1f6af6a changed
vfio_add_kvm_msi_virq() so it takes a vector number instead of the msg
parameter, and determines the correct message itself.  The test for !msg
was removed, and not replaced with anything there or in the caller.

With an spapr guest which has a VFIO device, if an EEH error occurs on the
host hardware, then the device will be isolated then reset.  This is a
combination of host and guest action, mediated by some EEH related
hypercalls.  I haven't fully traced the mechanics, but somehow installing
the kvm irqchip route for the dummy irq on the BCM5719 means that after EEH
reset and recovery, at least some irqs are no longer delivered to the
guest.

In particular, the guest never gets the link up event, and so the NIC is
effectively dead.

[1] EEH (Enhanced Error Handling) is an IBM POWER server specific PCI-*
    error reporting and recovery mechanism.  The concept is somewhat
    similar to PCI-E AER, but the details are different.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1373802

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Gavin Shan <gwshan@au1.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Cc: qemu-stable@nongnu.org
Fixes: d1f6af6a ("kvm-irqchip: simplify kvm_irqchip_add_msi_route")
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

12 Aug, 2016 1 commit

trace-events: fix first line comment in trace-events · e723b871

Authored by Laurent Vivier on Aug 08, 2016

Documentation is docs/tracing.txt instead of docs/trace-events.txt.

find . -name trace-events -exec \
     sed -i "s?See docs/trace-events.txt for syntax documentation.?See docs/tracing.txt for syntax documentation.?" \
     {} \;
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Message-id: 1470669081-17860-1-git-send-email-lvivier@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

08 Aug, 2016 1 commit

vfio: Use error_report() instead of error_printf() for errors · fea1c099

Authored by Markus Armbruster on Aug 03, 2016

Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <1470224274-31522-4-git-send-email-armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>

22 Jul, 2016 2 commits

kvm-irqchip: do explicit commit when update irq · 3f1fea0f

Authored by Peter Xu on Jul 14, 2016

Previously, we committed the GSI routes on every irqchip route
update. This is inefficient when many routes are updated at the
same time. This patch removes the commit phase from
kvm_irqchip_update_msi_route(). Instead, we commit explicitly after all
routes have been updated.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
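
The batching idea in this commit can be sketched in standalone C as follows. `SketchRoutes` and the `sketch_*` functions are hypothetical stand-ins; the `commits` counter merely models how often the expensive synchronization step would run.

```c
#include <stddef.h>

#define SKETCH_MAX_ROUTES 16

/* Hypothetical routing table; "commits" counts expensive sync operations. */
typedef struct SketchRoutes {
    int entries[SKETCH_MAX_ROUTES];
    int dirty;
    int commits;
} SketchRoutes;

/* Update one route in the local table without syncing it. */
static void sketch_update_route(SketchRoutes *r, size_t idx, int value)
{
    if (idx < SKETCH_MAX_ROUTES) {
        r->entries[idx] = value;
        r->dirty = 1;
    }
}

/* Push all pending changes in a single step. */
static void sketch_commit_routes(SketchRoutes *r)
{
    if (r->dirty) {
        r->commits++;
        r->dirty = 0;
    }
}

/* Update many routes first, then commit exactly once at the end. */
static void sketch_update_all(SketchRoutes *r, const int *values, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        sketch_update_route(r, i, values[i]);
    }
    sketch_commit_routes(r);
}
```

With the commit moved out of the per-route update, updating N routes costs one synchronization instead of N, at the price that every caller must remember to commit.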

kvm-irqchip: simplify kvm_irqchip_add_msi_route · d1f6af6a

Authored by Peter Xu on Jul 14, 2016

Change the original MSIMessage parameter of kvm_irqchip_add_msi_route
into the vector number. The vector index provides more information than
the MSIMessage, since we can easily retrieve the MSIMessage from the
vector. This avoids fetching the MSIMessage every time before adding MSI
routes.

Meanwhile, the vector info will be used in the coming patches to further
enable gsi route update notifications.
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

19 Jul, 2016 1 commit

vfio/pci: Hide ARI capability · 383a7af7

Authored by Alex Williamson on Jul 18, 2016

QEMU supports ARI on downstream ports and assigned devices may support
ARI in their extended capabilities. The endpoint ARI capability
specifies the next function, such that the OS doesn't need to walk
each possible function, however this next function is relative to the
host, not the guest. This leads to device discovery issues when we
combine separate functions into virtual multi-function packages in a
guest. For example, SR-IOV VFs are not enumerated by simply probing
the function address space, therefore the ARI next-function field is
zero. When we combine multiple VFs together as a multi-function
device in the guest, the guest OS identifies ARI is enabled, relies on
this next-function field, and stops looking for additional functions
after the first is found.

Long term we should expose the ARI capability to the guest to enable
configurations with more than 8 functions per slot, but this requires
additional QEMU PCI infrastructure to manage the next-function field
for multiple, otherwise independent devices. In the short term,
hiding this capability allows equivalent functionality to what we
currently have on non-express chipsets.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>

18 Jul, 2016 1 commit

vfio/spapr: Remove stale ioctl() call · 21bb3093

Authored by David Gibson on Jul 12, 2016

This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an
earlier version of the code and has since been folded into
vfio_spapr_remove_window().

It wasn't caught because although the argument structure has been removed,
the libc function remove() means this didn't trigger a compile failure.
The ioctl() was also almost certain to fail silently and harmlessly with
the bogus argument, so this wasn't caught in testing.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
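
Why the stale call still compiled can be reproduced with a small standalone C sketch. `sketch_ioctl` is a hypothetical variadic stand-in for ioctl() that always fails; the point is only that the identifier `remove` resolves to the libc function declared in <stdio.h> once the local variable of that name is gone.

```c
#include <stdio.h>   /* declares int remove(const char *pathname) */

/* Hypothetical variadic stand-in for ioctl(); it always fails here. */
static int sketch_ioctl(int fd, unsigned long request, ...)
{
    (void)fd;
    (void)request;
    return -1;
}

/*
 * No local variable named "remove" exists in this function, yet the call
 * compiles: the name resolves to the libc function, and a variadic
 * parameter list accepts a pointer of any type.
 */
static int sketch_stale_call(void)
{
    return sketch_ioctl(-1, 0UL, &remove);
}
```

Because the third argument of a variadic call is not type checked, the compiler has no reason to complain, which is consistent with the commit message's remark that the leftover call went unnoticed.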

12 Jul, 2016 1 commit

Use #include "..." for our own headers, <...> for others · a9c94277

Authored by Markus Armbruster on Jun 22, 2016

Tracked down with an ugly, brittle and probably buggy Perl script.

Also move includes converted to <...> up so they get included before
ours where that's obviously okay.
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Tested-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>

05 Jul, 2016 4 commits

pci: Convert msi_init() to Error and fix callers to check it · 1108b2f8

Authored by Cao jin on Jun 20, 2016

msi_init() reports errors with error_report(), which is wrong
when it's used in realize().

Fix by converting it to Error.

Fix its callers to handle failure instead of ignoring it.

If a caller does not handle the failure, the user may ask for MSI
to be on but not get it, because msi_init() fails silently.

cc: Gerd Hoffmann <kraxel@redhat.com>
cc: John Snow <jsnow@redhat.com>
cc: Dmitry Fleytman <dmitry@daynix.com>
cc: Jason Wang <jasowang@redhat.com>
cc: Michael S. Tsirkin <mst@redhat.com>
cc: Hannes Reinecke <hare@suse.de>
cc: Paolo Bonzini <pbonzini@redhat.com>
cc: Alex Williamson <alex.williamson@redhat.com>
cc: Markus Armbruster <armbru@redhat.com>
cc: Marcel Apfelbaum <marcel@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>

vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) · 2e4109de

Authored by Alexey Kardashevskiy on Jul 04, 2016

The new VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
This adds the ability for VFIO common code to dynamically allocate/remove
DMA windows in the host kernel when a new VFIO container is added/removed.

This adds a helper to vfio_listener_region_add which makes the
VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds the newly created IOMMU to
the host IOMMU list; the opposite action is taken in
vfio_listener_region_del.

When creating a new window, this uses a heuristic to decide on the number
of TCE table levels.

This should cause no guest visible change in behavior.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
[dwg: Added some casts to prevent printf() warnings on certain targets
 where the kernel headers' __u64 doesn't match uint64_t or PRIx64]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

vfio: Add host side DMA window capabilities · f4ec5e26

Authored by Alexey Kardashevskiy on Jul 04, 2016

There are going to be multiple IOMMUs per container. This moves
the single host IOMMU parameter set to a list of VFIOHostDMAWindow.

This should cause no behavioral change and will be used later by
the SPAPR TCE IOMMU v2 which will also add a vfio_host_win_del() helper.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
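
One way to picture the move from a single host IOMMU parameter set to a list of windows is the standalone C sketch below. The `SketchHostWin` struct, its inclusive `min_iova`/`max_iova` fields, and the lookup function are assumptions for illustration; the commit message only states that a list of VFIOHostDMAWindow is introduced, not how it is searched.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host DMA window: an inclusive IOVA range. */
typedef struct SketchHostWin {
    uint64_t min_iova;
    uint64_t max_iova;
} SketchHostWin;

/* Return true if [iova, end] lies entirely within one window of the list. */
static bool sketch_range_fits(const SketchHostWin *wins, size_t n,
                              uint64_t iova, uint64_t end)
{
    size_t i;

    if (iova > end) {
        return false;                  /* malformed range */
    }
    for (i = 0; i < n; i++) {
        if (iova >= wins[i].min_iova && end <= wins[i].max_iova) {
            return true;
        }
    }
    return false;
}
```

With a single window the loop degenerates to the old one-range check, which fits the statement that the change should cause no behavioral difference on its own.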

vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) · 318f67ce

Authored by Alexey Kardashevskiy on Jul 04, 2016

This makes use of the new "memory registering" feature. The idea is
to give userspace the ability to notify the host kernel about pages
which are going to be used for DMA. With this information, the host
kernel can pin them all once per user process, do locked pages
accounting (once) and not spend time on doing that in real time with
possible failures which cannot be handled nicely in some cases.

This adds a prereg memory listener which listens on address_space_memory
and notifies a VFIO container about memory which needs to be
pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.

The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
not call it when v2 is detected and enabled.

This enforces guest RAM blocks to be host page size aligned; however
this is not new as KVM already requires memory slots to be host page
size aligned.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
[dwg: Fix compile error on 32-bit host]
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>

01 Jul, 2016 4 commits

memory: Add MemoryRegionIOMMUOps.notify_started/stopped callbacks · d22d8956

Authored by Alexey Kardashevskiy on Jun 30, 2016

The IOMMU driver may change behavior depending on whether a notifier
client is present.  In the case of POWER, this represents a change in
the visibility of the IOTLB, for other drivers such as intel-iommu and
future AMD-Vi emulation, notifier support is not yet enabled and this
provides the opportunity to flag that incompatibility.
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Peter Xu <peterx@redhat.com>
Tested-by: Peter Xu <peterx@redhat.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
[new log & extracted from [PATCH qemu v17 12/12] spapr_iommu, vfio, memory: Notify IOMMU about starting/stopping listening]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Hide SR-IOV capability · e37dac06

Authored by Alex Williamson on Jun 30, 2016

The kernel currently exposes the SR-IOV capability as read-only
through vfio-pci. This is sufficient to protect the host kernel, but
has the potential to confuse guests without further virtualization.
In particular, OVMF tries to size the VF BARs and comes up with absurd
results, ending with an assert. There's not much point in adding
virtualization to a read-only capability, so we simply hide it for
now. If the kernel ever enables SR-IOV virtualization, we should
easily be able to test it through VF BAR sizing or explicit flags.

Testing whether we should parse extended capabilities is also pulled
into the function to keep these assumptions in one place.
Tested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio: add pcie extended capability support · 325ae8d5

Authored by Chen Fan on Jun 30, 2016

For a vfio PCIe device, we can expose the extended capabilities on the
PCIe bus. Because each new PCIe capability is added at the tail of the
chain, we introduce a copy of the config space for parsing extended
caps, to avoid overwriting config space, and rebuild the PCIe extended
config space.
Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
Tested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>

vfio/pci: Fix VGA quirks · 4d3fc4fd

Authored by Alex Williamson on Jun 30, 2016

Commit 2d82f8a3 ("vfio/pci: Convert all MemoryRegion to dynamic
alloc and consistent functions") converted VFIOPCIDevice.vga to be
dynamically allocated, negating the need for VFIOPCIDevice.has_vga.
Unfortunately not all of the has_vga users were converted, nor was
the field removed from the structure. Correct these oversights.
Reported-by: Peter Maloney <peter.maloney@brockmann-consult.de>
Tested-by: Peter Maloney <peter.maloney@brockmann-consult.de>
Fixes: 2d82f8a3 ("vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions")
Fixes: https://bugs.launchpad.net/qemu/+bug/1591628
Cc: qemu-stable@nongnu.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
