1. 31 October 2016 (3 commits)
    • vfio: Add support for mmapping sub-page MMIO BARs · 95251725
      Authored by Yongji Xie
      Kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
      sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
      to mmap sub-page BARs; this is the corresponding QEMU patch.
      With both applied, sub-page BARs can be passed through to the
      guest, which can help improve I/O performance for some devices.
      
      In this patch, we expand the MemoryRegions of these sub-page
      MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
      the BARs can be passed to the KVM ioctl KVM_SET_USER_MEMORY_REGION
      with a valid size. The expansion is undone when the guest moves
      the base address of a sub-page BAR so that it is no longer page
      aligned. We also set the priority of these BARs' memory regions
      to zero, in case they overlap with other BARs that share the same
      guest page. A rough sketch of the idea follows below.
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
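      A hedged sketch of that expansion follows, not the exact commit: it grows a sub-page
      BAR's MemoryRegion to the host page size so KVM_SET_USER_MEMORY_REGION will accept it,
      and re-inserts it at priority 0, per the commit, so other BARs sharing the page are not
      shadowed. vfio_sub_page_bar_expand() is a hypothetical helper name.

      ```c
      #include "qemu/osdep.h"
      #include "exec/memory.h"

      /* Hypothetical helper: expand a sub-page MMIO BAR to a full host page. */
      static void vfio_sub_page_bar_expand(MemoryRegion *container_mr,
                                           MemoryRegion *bar_mr,
                                           hwaddr bar_addr, uint64_t bar_size)
      {
          uint64_t page_size = qemu_real_host_page_size;

          /* Only a page-aligned BAR smaller than a page can safely be expanded. */
          if (bar_size >= page_size || (bar_addr & (page_size - 1))) {
              return;
          }

          memory_region_transaction_begin();
          /* Grow the region so KVM can map it with a valid (page-sized) length. */
          memory_region_set_size(bar_mr, page_size);
          /* Re-add at priority 0 so an overlapping BAR in the same page is not hidden. */
          memory_region_del_subregion(container_mr, bar_mr);
          memory_region_add_subregion_overlap(container_mr, bar_addr, bar_mr, 0);
          memory_region_transaction_commit();
      }
      ```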
    • vfio: Handle zero-length sparse mmap ranges · 24acf72b
      Authored by Alex Williamson
      As reported in the link below, a user has a PCI device with a 4KB BAR
      which contains the MSI-X table.  This seems to hit a corner case in
      the kernel where the region reports being mmap capable, but the sparse
      mmap information reports a zero-sized range.  It's not entirely clear
      that the kernel is incorrect in doing this, but regardless, we need
      to handle it.  To do this, fill our mmap array only with non-zero-sized
      sparse mmap entries and add an error return to the function, so we can
      tell nr_mmaps being zero because of the sparse mmap info apart from
      nr_mmaps being zero because there was no sparse mmap info at all
      (a sketch of this filtering follows below).
      
      NB, this doesn't actually change the behavior of the device, it only
      removes the scary "Failed to mmap ... Performance may be slow" error
      message.  We cannot currently create an mmap over the MSI-X table.
      
      Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
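      The filtering could look roughly like the sketch below; the structure comes from
      <linux/vfio.h>, while the helper name and the VFIOMmap layout are illustrative
      assumptions rather than the exact QEMU code.

      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <linux/vfio.h>
      #include <glib.h>

      typedef struct VFIOMmap {
          uint64_t offset;   /* offset of the mmap-able area within the region */
          uint64_t size;
      } VFIOMmap;

      /* Keep only non-empty sparse mmap areas; return an error when none survive,
       * so the caller can tell "no usable ranges" apart from "no sparse info". */
      static int vfio_fill_sparse_mmaps(VFIOMmap **mmaps, unsigned int *nr_mmaps,
                                        struct vfio_region_info_cap_sparse_mmap *sparse)
      {
          unsigned int i, j;

          *mmaps = g_new0(VFIOMmap, sparse->nr_areas);
          for (i = 0, j = 0; i < sparse->nr_areas; i++) {
              if (sparse->areas[i].size) {
                  (*mmaps)[j].offset = sparse->areas[i].offset;
                  (*mmaps)[j].size = sparse->areas[i].size;
                  j++;
              }
          }
          if (j == 0) {
              g_free(*mmaps);
              *mmaps = NULL;
              return -EINVAL;
          }
          *nr_mmaps = j;
          return 0;
      }
      ```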
    • memory: Replace skip_dump flag with "ram_device" · 21e00fa5
      Authored by Alex Williamson
      Setting skip_dump on a MemoryRegion allows us to modify one specific
      code path, but the restriction we're trying to address encompasses
      more than that.  If we have a RAM MemoryRegion backed by a physical
      device, it not only restricts our ability to dump that region, but
      also affects how we should manipulate it.  Here we recognize that
      MemoryRegions do not change to sometimes allow dumps and other times
      not, so we replace setting the skip_dump flag with a new initializer
      so that we know exactly the type of region to which we're applying
      this behavior.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
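      As a rough illustration, a device-backed mapping would now be declared a "ram device"
      at initialization time rather than flagged afterwards; the function names follow the
      memory API this commit describes, but treat the exact signatures as assumptions.

      ```c
      #include "qemu/osdep.h"
      #include "exec/memory.h"

      /* Illustrative: declare a region that is RAM-like for the guest but actually
       * backed by device MMIO, instead of setting a skip_dump flag on a plain RAM region. */
      static void init_device_backed_region(Object *owner, MemoryRegion *mr,
                                            const char *name, uint64_t size,
                                            void *mmap_ptr)
      {
          memory_region_init_ram_device_ptr(mr, owner, name, size, mmap_ptr);

          /* Callers can now query the region's type rather than a dump-specific flag. */
          g_assert(memory_region_is_ram_device(mr));
      }
      ```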
  2. 18 October 2016 (3 commits)
  3. 27 September 2016 (1 commit)
    • memory: introduce IOMMUNotifier and its caps · cdb30812
      Authored by Peter Xu
      The IOMMU notifier list is used to notify about IO address mapping changes.
      Currently VFIO is the only user.
      
      However, it is possible that a future consumer such as vhost would like to
      listen to only part of the notifications (e.g., cache invalidations).
      
      This patch introduces IOMMUNotifier and IOMMUNotifierFlag bits for
      finer-grained control of it.
      
      IOMMUNotifier contains a bitfield for the notify consumer describing
      what kind of notification it is interested in. Currently two kinds of
      notifications are defined:
      
      - IOMMU_NOTIFIER_MAP:    for newly mapped entries (additions)
      - IOMMU_NOTIFIER_UNMAP:  for entries to be removed (cache invalidates)
      
      When registering the IOMMU notifier, we need to specify one or multiple
      types of messages to listen to.
      
      When a notification is triggered, its type is checked against each
      notifier's registered bits, and only notifiers that registered for that
      type are notified.
      
      (For any IOMMU implementation, an in-place mapping change should be
       notified with an UNMAP followed by a MAP.)
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
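      A minimal sketch of a consumer that only wants cache invalidations, assuming the
      IOMMUNotifier/IOMMU_NOTIFIER_UNMAP names from the description above (field and
      registration details may differ from the final API):

      ```c
      #include "qemu/osdep.h"
      #include "exec/memory.h"

      /* Callback: with only IOMMU_NOTIFIER_UNMAP registered, MAP events never arrive here. */
      static void invalidate_notify(IOMMUNotifier *n, IOMMUTLBEntry *entry)
      {
          fprintf(stderr, "invalidate iova 0x%" PRIx64 " mask 0x%" PRIx64 "\n",
                  entry->iova, entry->addr_mask);
      }

      static IOMMUNotifier invalidate_notifier;

      static void listen_for_invalidations(MemoryRegion *iommu_mr)
      {
          invalidate_notifier.notify = invalidate_notify;
          invalidate_notifier.notifier_flags = IOMMU_NOTIFIER_UNMAP;  /* skip MAP events */
          memory_region_register_iommu_notifier(iommu_mr, &invalidate_notifier);
      }
      ```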
  4. 12 July 2016 (1 commit)
  5. 05 July 2016 (3 commits)
    • vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) · 2e4109de
      Authored by Alexey Kardashevskiy
      The new VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
      This adds the ability for the VFIO common code to dynamically create/remove
      DMA windows in the host kernel when a VFIO container is added/removed.
      
      This adds a helper to vfio_listener_region_add which issues the
      VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds the just-created window to
      the host IOMMU window list; the opposite action is taken in
      vfio_listener_region_del.
      
      When creating a new window, a heuristic is used to decide on the number
      of TCE table levels.
      
      This should cause no guest-visible change in behavior. A sketch of the
      window-creation ioctl follows below.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Added some casts to prevent printf() warnings on certain targets
       where the kernel headers' __u64 doesn't match uint64_t or PRIx64]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
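      The window-creation ioctl might be exercised roughly as below; the structure layout
      is from <linux/vfio.h>, while the helper name, the single-level choice and the error
      handling are simplifications of the real code.

      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>

      /* Ask the host kernel to create a DMA window and report its base IOVA. */
      static int spapr_create_window(int container_fd, uint32_t page_shift,
                                     uint64_t window_size, uint64_t *start_addr)
      {
          struct vfio_iommu_spapr_tce_create create = {
              .argsz = sizeof(create),
              .page_shift = page_shift,      /* e.g. 16 for 64K TCE pages */
              .window_size = window_size,
              .levels = 1,                   /* the real code picks this heuristically */
          };

          if (ioctl(container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create)) {
              return -errno;
          }
          *start_addr = create.start_addr;   /* kernel returns the new window's base IOVA */
          return 0;
      }
      ```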
    • vfio: Add host side DMA window capabilities · f4ec5e26
      Authored by Alexey Kardashevskiy
      There are going to be multiple IOMMU windows per container. This moves
      the single set of host IOMMU parameters to a list of VFIOHostDMAWindow
      entries (see the sketch below).
      
      This should cause no behavioral change and will be used later by the
      SPAPR TCE IOMMU v2, which will also add a vfio_host_win_del() helper.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
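      A sketch of what the per-container window list could look like, using QEMU's QLIST
      macros; the structure and field names follow the commit description, but the details
      are assumptions.

      ```c
      #include "qemu/osdep.h"
      #include "qemu/queue.h"
      #include "exec/hwaddr.h"

      typedef struct VFIOHostDMAWindow {
          hwaddr min_iova;
          hwaddr max_iova;
          uint64_t iova_pgsizes;
          QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
      } VFIOHostDMAWindow;

      typedef struct VFIOContainerWindows {
          QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
      } VFIOContainerWindows;

      /* Record one host DMA window; a vfio_host_win_del() would do the reverse. */
      static void vfio_host_win_add(VFIOContainerWindows *c, hwaddr min_iova,
                                    hwaddr max_iova, uint64_t iova_pgsizes)
      {
          VFIOHostDMAWindow *hostwin = g_new0(VFIOHostDMAWindow, 1);

          hostwin->min_iova = min_iova;
          hostwin->max_iova = max_iova;
          hostwin->iova_pgsizes = iova_pgsizes;
          QLIST_INSERT_HEAD(&c->hostwin_list, hostwin, hostwin_next);
      }
      ```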
    • vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) · 318f67ce
      Authored by Alexey Kardashevskiy
      This makes use of the new "memory registering" feature. The idea is
      to give userspace the ability to notify the host kernel about pages
      which are going to be used for DMA. With this information, the host
      kernel can pin them all once per user process, do locked-page
      accounting once, and avoid doing that work at DMA-map time, where
      failures cannot always be handled gracefully.
      
      This adds a prereg memory listener which listens on address_space_memory
      and notifies the VFIO container about memory which needs to be
      pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.
      
      The feature is only enabled for SPAPR IOMMU v2 and requires host kernel
      changes. Since v2 does not need or support VFIO_IOMMU_ENABLE, that ioctl
      is not called when v2 is detected and enabled.
      
      This enforces that guest RAM blocks are host page size aligned; however,
      this is not new, as KVM already requires memory slots to be host page
      size aligned. A sketch of the preregistration ioctl follows below.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [dwg: Fix compile error on 32-bit host]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
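      Preregistration boils down to one ioctl per RAM block; a hedged sketch, with the
      structure taken from <linux/vfio.h> and the helper name assumed:

      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>

      /* Tell the host kernel this user memory will be used for DMA, so it can pin
       * the pages and do locked-memory accounting once, up front. */
      static int vfio_prereg_register(int container_fd, void *vaddr, uint64_t size)
      {
          struct vfio_iommu_spapr_register_memory reg = {
              .argsz = sizeof(reg),
              .vaddr = (uintptr_t)vaddr,
              .size = size,
          };

          return ioctl(container_fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg) ? -errno : 0;
      }
      ```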
  6. 01 July 2016 (1 commit)
  7. 22 June 2016 (1 commit)
  8. 17 June 2016 (2 commits)
  9. 27 May 2016 (4 commits)
  10. 26 May 2016 (1 commit)
  11. 19 May 2016 (1 commit)
  12. 29 March 2016 (1 commit)
  13. 16 March 2016 (2 commits)
    • vfio: Eliminate vfio_container_ioctl() · 3356128c
      Authored by David Gibson
      vfio_container_ioctl() was a bad interface that bypassed abstraction
      boundaries, had semantics that sat uneasily with its name, and was unsafe
      in many realistic circumstances.  Now that spapr-pci-vfio-host-bridge has
      been folded into spapr-pci-host-bridge, there are no more users, so remove
      it.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Start improving VFIO/EEH interface · 3153119e
      Authored by David Gibson
      At present the code handling IBM's Enhanced Error Handling (EEH) interface
      on VFIO devices operates by bypassing the usual VFIO logic with
      vfio_container_ioctl().  That's a poorly designed interface with unclear
      semantics about exactly what can be operated on.
      
      In particular, it operates on a single VFIO container internally (hence the
      name), but takes an address space and group ID, from which it deduces the
      container in a rather roundabout way.  Group IDs are something that code
      outside VFIO shouldn't even be aware of.
      
      This patch creates new interfaces for EEH operations.  Internally we
      have vfio_eeh_container_op() which takes a VFIOContainer object
      directly.  For external use we have vfio_eeh_as_ok() which determines
      if an AddressSpace is usable for EEH (at present this means it has a
      single container with exactly one group attached), and vfio_eeh_as_op()
      which will perform an operation on an AddressSpace in the unambiguous case,
      and otherwise returns an error.
      
      This interface still isn't great, but it's enough of an improvement to
      allow a number of cleanups in other places.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
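      Internally the container-level helper reduces to a single ioctl; a hedged sketch
      (struct vfio_eeh_pe_op is from <linux/vfio.h>, and taking a raw container fd is a
      simplification of the vfio_eeh_container_op() described above):

      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>

      /* Perform one EEH PE operation (enable, reset, ...) on a container fd. */
      static int eeh_container_op(int container_fd, uint32_t op)
      {
          struct vfio_eeh_pe_op pe_op = {
              .argsz = sizeof(pe_op),
              .op = op,                     /* e.g. VFIO_EEH_PE_ENABLE */
          };

          if (ioctl(container_fd, VFIO_EEH_PE_OP, &pe_op) < 0) {
              return -errno;
          }
          return 0;
      }
      ```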
  14. 11 March 2016 (2 commits)
    • vfio: Generalize region support · db0da029
      Authored by Alex Williamson
      Both the platform and PCI vfio drivers create a "slow" I/O memory region
      with one or more mmap memory regions overlaid on top when supported by the
      device.  Generalize this into a set of common helpers in the core that
      pull the region info from vfio, fill the region data, configure the slow
      mapping, and handle completing the mmap, enable/disable, and teardown.
      This can be used immediately by the PCI MSI-X code, which needs to mmap
      around the MSI-X vector table (a sketch of the mmap helper follows below).
      
      This also changes VFIORegion.mem to be dynamically allocated because
      otherwise we don't know how the caller has allocated VFIORegion and
      therefore don't know whether to unreference it to destroy the
      MemoryRegion or not.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
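      A simplified sketch of the common mmap helper: map one sparse area of a region and
      overlay a fast, direct-mapped MemoryRegion on the slow MMIO region. The structure and
      field names here are assumptions modelled on the description above.

      ```c
      #include "qemu/osdep.h"
      #include "exec/memory.h"
      #include <sys/mman.h>

      typedef struct SketchVFIOMmap {
          uint64_t offset;            /* offset of the area inside the region */
          uint64_t size;
          MemoryRegion mem;
      } SketchVFIOMmap;

      /* Map one area of a device region and overlay it on the slow MMIO MemoryRegion. */
      static int region_mmap_area(int device_fd, uint64_t region_fd_offset,
                                  MemoryRegion *slow_mr, SketchVFIOMmap *area)
      {
          void *p = mmap(NULL, area->size, PROT_READ | PROT_WRITE, MAP_SHARED,
                         device_fd, region_fd_offset + area->offset);
          if (p == MAP_FAILED) {
              return -errno;
          }

          /* The RAM-backed subregion becomes the fast path for accesses it covers. */
          memory_region_init_ram_ptr(&area->mem, NULL, "vfio-region-mmap",
                                     area->size, p);
          memory_region_add_subregion(slow_mr, area->offset, &area->mem);
          return 0;
      }
      ```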
    • vfio: Wrap VFIO_DEVICE_GET_REGION_INFO · 46900226
      Authored by Alex Williamson
      In preparation for supporting capability chains on regions, wrap
      ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for
      each caller.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
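      The wrapper is essentially an argsz retry loop; this hedged sketch takes a raw device
      fd instead of a VFIODevice, but otherwise follows the pattern the commit describes.

      ```c
      #include <errno.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>
      #include <glib.h>

      /* Fetch region info, growing the buffer when the kernel reports a larger
       * argsz (which is how capability chains will later be returned). */
      static int get_region_info(int device_fd, uint32_t index,
                                 struct vfio_region_info **info)
      {
          size_t argsz = sizeof(struct vfio_region_info);

          *info = g_malloc0(argsz);
          (*info)->index = index;
      retry:
          (*info)->argsz = argsz;
          if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
              g_free(*info);
              *info = NULL;
              return -errno;
          }
          if ((*info)->argsz > argsz) {
              argsz = (*info)->argsz;
              *info = g_realloc(*info, argsz);
              goto retry;
          }
          return 0;
      }
      ```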
  15. 29 January 2016 (1 commit)
  16. 06 October 2015 (5 commits)
    • vfio: Allow hotplug of containers onto existing guest IOMMU mappings · 508ce5eb
      Authored by David Gibson
      At present the memory listener used by vfio to keep host IOMMU mappings
      in sync with the guest memory image assumes that if a guest IOMMU
      appears, then it has no existing mappings.
      
      This may not be true if a VFIO device is hotplugged onto a guest bus
      which didn't previously include a VFIO device, and which has existing
      guest IOMMU mappings.
      
      Therefore, use the memory_region_register_iommu_notifier_replay()
      function in order to fix this case, replaying existing guest IOMMU
      mappings, bringing the host IOMMU into sync with the guest IOMMU.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Record host IOMMU's available IO page sizes · 7a140a57
      Authored by David Gibson
      Depending on the host IOMMU type we determine and record the available page
      sizes for IOMMU translation.  We'll need this for other validation in
      future patches.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
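      For the Type1 case the information comes back from VFIO_IOMMU_GET_INFO; a hedged
      sketch of recording it, where the plain 4K fallback is an assumption about the
      default:

      ```c
      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>

      /* Query the Type1 IOMMU for its supported IO page sizes; fall back to plain
       * 4K pages if the kernel does not report them. */
      static uint64_t query_iova_pgsizes(int container_fd)
      {
          struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

          if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info) == 0 &&
              (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
              return info.iova_pgsizes;   /* bitmap of supported page sizes */
          }
          return 4096;                    /* assume 4K only */
      }
      ```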
    • vfio: Check guest IOVA ranges against host IOMMU capabilities · 3898aad3
      Authored by David Gibson
      The current vfio core code assumes that the host IOMMU is capable of
      mapping any IOVA the guest wants to use to where we need.  However, real
      IOMMUs generally only support translating a certain range of IOVAs (the
      "DMA window") not a full 64-bit address space.
      
      The common x86 IOMMUs support a wide enough range that guests are very
      unlikely to go beyond it in practice, however the IOMMU used on IBM Power
      machines - in the default configuration - supports only a much more limited
      IOVA range, usually 0..2GiB.
      
      If the guest attempts to set up an IOVA range that the host IOMMU can't
      map, qemu won't report an error until it actually attempts to map a bad
      IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e., no
      guest-visible IOMMU) then this will show up very quickly.  If there is a
      guest-visible IOMMU, however, the problem might not show up until much
      later, when the guest actually attempts a DMA with an IOVA the host
      can't handle.
      
      This patch adds a test so that we will detect earlier if the guest is
      attempting to use IOVA ranges that the host IOMMU won't be able to deal
      with (a sketch of the check follows below).
      
      For now, we assume that "Type1" (x86) IOMMUs can support any IOVA; this
      is incorrect, but no worse than what we have already.  We can't do
      better for now because the Type1 kernel interface doesn't tell us what
      IOVA range the IOMMU actually supports.
      
      For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
      IOVA range and validate guest IOVA ranges against it, and this patch does
      so.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
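      Conceptually the added test is a bounds check of a section's IOVA range against the
      host window recorded for the container; a hedged, self-contained sketch with assumed
      structure names:

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      typedef uint64_t hwaddr;

      typedef struct HostWindow {
          hwaddr min_iova;   /* lowest IOVA the host IOMMU can map */
          hwaddr max_iova;   /* highest IOVA the host IOMMU can map */
      } HostWindow;

      /* Reject a guest mapping whose IOVA range falls outside the host DMA window,
       * so the error surfaces when the mapping is set up rather than at DMA time. */
      static bool iova_range_ok(const HostWindow *win, hwaddr iova, hwaddr end)
      {
          return iova >= win->min_iova && end <= win->max_iova;
      }
      ```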
    • vfio: Generalize vfio_listener_region_add failure path · ac6dc389
      Authored by David Gibson
      If a DMA mapping operation fails in vfio_listener_region_add(), it
      checks whether we have already completed initial setup of the
      container, and either reports an error so the setup code can fail
      gracefully or throws a hw_error().
      
      There are other potential failure cases in vfio_listener_region_add()
      which could benefit from the same logic, so move it into its own
      fail: block.  Later patches can use this to extend other failure cases
      to fail as gracefully as possible under the circumstances.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Remove unneeded union from VFIOContainer · ee0bf0e5
      Authored by David Gibson
      Currently the VFIOContainer iommu_data field contains a union with
      different information for different host iommu types.  However:
         * It only actually contains information for the x86-like "Type1" iommu
         * Because we have a common listener the Type1 fields are actually used
      on all IOMMU types, including the SPAPR TCE type as well
      
      In fact we now have a general structure for the listener which is unlikely
      to ever need per-iommu-type information, so this patch removes the union.
      
      In a similar way we can unify the setup of the vfio memory listener in
      vfio_connect_container() that is currently split across a switch on iommu
      type, but is effectively the same in both cases.
      
      The iommu_data.release pointer was only needed as a cleanup function
      which would handle potentially different data in the union.  With the
      union gone, it too can be removed.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  17. 24 September 2015 (1 commit)
  18. 11 September 2015 (1 commit)
  19. 07 July 2015 (1 commit)
  20. 30 April 2015 (1 commit)
  21. 10 March 2015 (1 commit)
  22. 09 March 2015 (1 commit)
  23. 03 March 2015 (2 commits)