1. 31 October 2016 (3 commits)
    • vfio: Add support for mmapping sub-page MMIO BARs · 95251725
      Authored by Yongji Xie
      Kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
      sub-page MMIO BARs if the mmio page is exclusive") allows VFIO
      to mmap sub-page BARs; this is the corresponding QEMU patch.
      With both applied, sub-page BARs can be passed through to the
      guest, which can help improve I/O performance for some devices.
      
      In this patch, we expand the MemoryRegions of these sub-page
      MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
      the BARs can be passed to the KVM ioctl KVM_SET_USER_MEMORY_REGION
      with a valid size. The expansion is undone when the guest moves
      the base address of a sub-page BAR so that it is no longer page
      aligned. We also set the priority of these BARs' memory regions
      to zero, in case they overlap with other BARs that share the same
      guest page. A rough sketch of the idea follows below.
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
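      A hedged sketch of that expansion follows, not the exact commit: it grows a sub-page
      BAR's MemoryRegion to the host page size so KVM_SET_USER_MEMORY_REGION will accept it,
      and re-inserts it at priority 0, per the commit, so other BARs sharing the page are not
      shadowed. vfio_sub_page_bar_expand() is a hypothetical helper name.

      ```c
      #include "qemu/osdep.h"
      #include "exec/memory.h"

      /* Hypothetical helper: expand a sub-page MMIO BAR to a full host page. */
      static void vfio_sub_page_bar_expand(MemoryRegion *container_mr,
                                           MemoryRegion *bar_mr,
                                           hwaddr bar_addr, uint64_t bar_size)
      {
          uint64_t page_size = qemu_real_host_page_size;

          /* Only a page-aligned BAR smaller than a page can safely be expanded. */
          if (bar_size >= page_size || (bar_addr & (page_size - 1))) {
              return;
          }

          memory_region_transaction_begin();
          /* Grow the region so KVM can map it with a valid (page-sized) length. */
          memory_region_set_size(bar_mr, page_size);
          /* Re-add at priority 0 so an overlapping BAR in the same page is not hidden. */
          memory_region_del_subregion(container_mr, bar_mr);
          memory_region_add_subregion_overlap(container_mr, bar_addr, bar_mr, 0);
          memory_region_transaction_commit();
      }
      ```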
    • vfio: Handle zero-length sparse mmap ranges · 24acf72b
      Authored by Alex Williamson
      As reported in the link below, a user has a PCI device with a 4KB BAR
      which contains the MSI-X table.  This seems to hit a corner case in
      the kernel where the region reports being mmap capable, but the sparse
      mmap information reports a zero-sized range.  It's not entirely clear
      that the kernel is incorrect in doing this, but regardless, we need
      to handle it.  To do this, fill our mmap array only with non-zero-sized
      sparse mmap entries and add an error return to the function, so we can
      tell nr_mmaps being zero because of the sparse mmap info apart from
      nr_mmaps being zero because there was no sparse mmap info at all
      (a sketch of this filtering follows below).
      
      NB, this doesn't actually change the behavior of the device, it only
      removes the scary "Failed to mmap ... Performance may be slow" error
      message.  We cannot currently create an mmap over the MSI-X table.
      
      Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
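      The filtering could look roughly like the sketch below; the structure comes from
      <linux/vfio.h>, while the helper name and the VFIOMmap layout are illustrative
      assumptions rather than the exact QEMU code.

      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <linux/vfio.h>
      #include <glib.h>

      typedef struct VFIOMmap {
          uint64_t offset;   /* offset of the mmap-able area within the region */
          uint64_t size;
      } VFIOMmap;

      /* Keep only non-empty sparse mmap areas; return an error when none survive,
       * so the caller can tell "no usable ranges" apart from "no sparse info". */
      static int vfio_fill_sparse_mmaps(VFIOMmap **mmaps, unsigned int *nr_mmaps,
                                        struct vfio_region_info_cap_sparse_mmap *sparse)
      {
          unsigned int i, j;

          *mmaps = g_new0(VFIOMmap, sparse->nr_areas);
          for (i = 0, j = 0; i < sparse->nr_areas; i++) {
              if (sparse->areas[i].size) {
                  (*mmaps)[j].offset = sparse->areas[i].offset;
                  (*mmaps)[j].size = sparse->areas[i].size;
                  j++;
              }
          }
          if (j == 0) {
              g_free(*mmaps);
              *mmaps = NULL;
              return -EINVAL;
          }
          *nr_mmaps = j;
          return 0;
      }
      ```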
    • memory: Replace skip_dump flag with "ram_device" · 21e00fa5
      Authored by Alex Williamson
      Setting skip_dump on a MemoryRegion allows us to modify one specific
      code path, but the restriction we're trying to address encompasses
      more than that.  If we have a RAM MemoryRegion backed by a physical
      device, it not only restricts our ability to dump that region, but
      also affects how we should manipulate it.  Here we recognize that
      MemoryRegions do not change to sometimes allow dumps and other times
      not, so we replace setting the skip_dump flag with a new initializer
      so that we know exactly the type of region to which we're applying
      this behavior.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
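      As a rough illustration, a device-backed mapping would now be declared a "ram device"
      at initialization time rather than flagged afterwards; the function names follow the
      memory API this commit describes, but treat the exact signatures as assumptions.

      ```c
      #include "qemu/osdep.h"
      #include "exec/memory.h"

      /* Illustrative: declare a region that is RAM-like for the guest but actually
       * backed by device MMIO, instead of setting a skip_dump flag on a plain RAM region. */
      static void init_device_backed_region(Object *owner, MemoryRegion *mr,
                                            const char *name, uint64_t size,
                                            void *mmap_ptr)
      {
          memory_region_init_ram_device_ptr(mr, owner, name, size, mmap_ptr);

          /* Callers can now query the region's type rather than a dump-specific flag. */
          g_assert(memory_region_is_ram_device(mr));
      }
      ```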
  2. 18 October 2016 (3 commits)
  3. 27 September 2016 (1 commit)
    • memory: introduce IOMMUNotifier and its caps · cdb30812
      Authored by Peter Xu
      The IOMMU notifier list is used to notify about IO address mapping changes.
      Currently VFIO is the only user.
      
      However, it is possible that a future consumer such as vhost would like to
      listen to only part of the notifications (e.g., cache invalidations).
      
      This patch introduces IOMMUNotifier and IOMMUNotifierFlag bits for
      finer-grained control of it.
      
      IOMMUNotifier contains a bitfield for the notify consumer describing
      what kind of notification it is interested in. Currently two kinds of
      notifications are defined:
      
      - IOMMU_NOTIFIER_MAP:    for newly mapped entries (additions)
      - IOMMU_NOTIFIER_UNMAP:  for entries to be removed (cache invalidates)
      
      When registering the IOMMU notifier, we need to specify one or multiple
      types of messages to listen to.
      
      When a notification is triggered, its type is checked against each
      notifier's registered bits, and only notifiers that registered for that
      type are notified.
      
      (For any IOMMU implementation, an in-place mapping change should be
       notified with an UNMAP followed by a MAP.)
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
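      A minimal sketch of a consumer that only wants cache invalidations, assuming the
      IOMMUNotifier/IOMMU_NOTIFIER_UNMAP names from the description above (field and
      registration details may differ from the final API):

      ```c
      #include "qemu/osdep.h"
      #include "exec/memory.h"

      /* Callback: with only IOMMU_NOTIFIER_UNMAP registered, MAP events never arrive here. */
      static void invalidate_notify(IOMMUNotifier *n, IOMMUTLBEntry *entry)
      {
          fprintf(stderr, "invalidate iova 0x%" PRIx64 " mask 0x%" PRIx64 "\n",
                  entry->iova, entry->addr_mask);
      }

      static IOMMUNotifier invalidate_notifier;

      static void listen_for_invalidations(MemoryRegion *iommu_mr)
      {
          invalidate_notifier.notify = invalidate_notify;
          invalidate_notifier.notifier_flags = IOMMU_NOTIFIER_UNMAP;  /* skip MAP events */
          memory_region_register_iommu_notifier(iommu_mr, &invalidate_notifier);
      }
      ```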
  4. 12 July 2016 (1 commit)
  5. 05 July 2016 (3 commits)
    • vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) · 2e4109de
      Authored by Alexey Kardashevskiy
      The new VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window management.
      This adds the ability for the VFIO common code to dynamically create/remove
      DMA windows in the host kernel when a VFIO container is added/removed.
      
      This adds a helper to vfio_listener_region_add which issues the
      VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds the just-created window to
      the host IOMMU window list; the opposite action is taken in
      vfio_listener_region_del.
      
      When creating a new window, a heuristic is used to decide on the number
      of TCE table levels.
      
      This should cause no guest-visible change in behavior. A sketch of the
      window-creation ioctl follows below.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Added some casts to prevent printf() warnings on certain targets
       where the kernel headers' __u64 doesn't match uint64_t or PRIx64]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
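      The window-creation ioctl might be exercised roughly as below; the structure layout
      is from <linux/vfio.h>, while the helper name, the single-level choice and the error
      handling are simplifications of the real code.

      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>

      /* Ask the host kernel to create a DMA window and report its base IOVA. */
      static int spapr_create_window(int container_fd, uint32_t page_shift,
                                     uint64_t window_size, uint64_t *start_addr)
      {
          struct vfio_iommu_spapr_tce_create create = {
              .argsz = sizeof(create),
              .page_shift = page_shift,      /* e.g. 16 for 64K TCE pages */
              .window_size = window_size,
              .levels = 1,                   /* the real code picks this heuristically */
          };

          if (ioctl(container_fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create)) {
              return -errno;
          }
          *start_addr = create.start_addr;   /* kernel returns the new window's base IOVA */
          return 0;
      }
      ```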
    • vfio: Add host side DMA window capabilities · f4ec5e26
      Authored by Alexey Kardashevskiy
      There are going to be multiple IOMMU windows per container. This moves
      the single set of host IOMMU parameters to a list of VFIOHostDMAWindow
      entries (see the sketch below).
      
      This should cause no behavioral change and will be used later by the
      SPAPR TCE IOMMU v2, which will also add a vfio_host_win_del() helper.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
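      A sketch of what the per-container window list could look like, using QEMU's QLIST
      macros; the structure and field names follow the commit description, but the details
      are assumptions.

      ```c
      #include "qemu/osdep.h"
      #include "qemu/queue.h"
      #include "exec/hwaddr.h"

      typedef struct VFIOHostDMAWindow {
          hwaddr min_iova;
          hwaddr max_iova;
          uint64_t iova_pgsizes;
          QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
      } VFIOHostDMAWindow;

      typedef struct VFIOContainerWindows {
          QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
      } VFIOContainerWindows;

      /* Record one host DMA window; a vfio_host_win_del() would do the reverse. */
      static void vfio_host_win_add(VFIOContainerWindows *c, hwaddr min_iova,
                                    hwaddr max_iova, uint64_t iova_pgsizes)
      {
          VFIOHostDMAWindow *hostwin = g_new0(VFIOHostDMAWindow, 1);

          hostwin->min_iova = min_iova;
          hostwin->max_iova = max_iova;
          hostwin->iova_pgsizes = iova_pgsizes;
          QLIST_INSERT_HEAD(&c->hostwin_list, hostwin, hostwin_next);
      }
      ```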
    • vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) · 318f67ce
      Authored by Alexey Kardashevskiy
      This makes use of the new "memory registering" feature. The idea is
      to give userspace the ability to notify the host kernel about pages
      which are going to be used for DMA. With this information, the host
      kernel can pin them all once per user process, do locked-page
      accounting once, and avoid doing that work at DMA-map time, where
      failures cannot always be handled gracefully.
      
      This adds a prereg memory listener which listens on address_space_memory
      and notifies the VFIO container about memory which needs to be
      pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.
      
      The feature is only enabled for SPAPR IOMMU v2 and requires host kernel
      changes. Since v2 does not need or support VFIO_IOMMU_ENABLE, that ioctl
      is not called when v2 is detected and enabled.
      
      This enforces that guest RAM blocks are host page size aligned; however,
      this is not new, as KVM already requires memory slots to be host page
      size aligned. A sketch of the preregistration ioctl follows below.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [dwg: Fix compile error on 32-bit host]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
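      Preregistration boils down to one ioctl per RAM block; a hedged sketch, with the
      structure taken from <linux/vfio.h> and the helper name assumed:

      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>

      /* Tell the host kernel this user memory will be used for DMA, so it can pin
       * the pages and do locked-memory accounting once, up front. */
      static int vfio_prereg_register(int container_fd, void *vaddr, uint64_t size)
      {
          struct vfio_iommu_spapr_register_memory reg = {
              .argsz = sizeof(reg),
              .vaddr = (uintptr_t)vaddr,
              .size = size,
          };

          return ioctl(container_fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg) ? -errno : 0;
      }
      ```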
  6. 01 July 2016 (1 commit)
  7. 22 June 2016 (1 commit)
  8. 17 June 2016 (2 commits)
  9. 27 May 2016 (4 commits)
  10. 26 May 2016 (1 commit)
  11. 19 May 2016 (1 commit)
  12. 29 March 2016 (1 commit)
  13. 16 March 2016 (2 commits)
    • vfio: Eliminate vfio_container_ioctl() · 3356128c
      Authored by David Gibson
      vfio_container_ioctl() was a bad interface that bypassed abstraction
      boundaries, had semantics that sat uneasily with its name, and was unsafe
      in many realistic circumstances.  Now that spapr-pci-vfio-host-bridge has
      been folded into spapr-pci-host-bridge, there are no more users, so remove
      it.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Start improving VFIO/EEH interface · 3153119e
      Authored by David Gibson
      At present the code handling IBM's Enhanced Error Handling (EEH) interface
      on VFIO devices operates by bypassing the usual VFIO logic with
      vfio_container_ioctl().  That's a poorly designed interface with unclear
      semantics about exactly what can be operated on.
      
      In particular, it operates on a single VFIO container internally (hence the
      name), but takes an address space and group ID, from which it deduces the
      container in a rather roundabout way.  Group IDs are something that code
      outside VFIO shouldn't even be aware of.
      
      This patch creates new interfaces for EEH operations.  Internally we
      have vfio_eeh_container_op() which takes a VFIOContainer object
      directly.  For external use we have vfio_eeh_as_ok() which determines
      if an AddressSpace is usable for EEH (at present this means it has a
      single container with exactly one group attached), and vfio_eeh_as_op()
      which will perform an operation on an AddressSpace in the unambiguous case,
      and otherwise returns an error.
      
      This interface still isn't great, but it's enough of an improvement to
      allow a number of cleanups in other places.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
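      Internally the container-level helper reduces to a single ioctl; a hedged sketch
      (struct vfio_eeh_pe_op is from <linux/vfio.h>, and taking a raw container fd is a
      simplification of the vfio_eeh_container_op() described above):

      ```c
      #include <errno.h>
      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>

      /* Perform one EEH PE operation (enable, reset, ...) on a container fd. */
      static int eeh_container_op(int container_fd, uint32_t op)
      {
          struct vfio_eeh_pe_op pe_op = {
              .argsz = sizeof(pe_op),
              .op = op,                     /* e.g. VFIO_EEH_PE_ENABLE */
          };

          if (ioctl(container_fd, VFIO_EEH_PE_OP, &pe_op) < 0) {
              return -errno;
          }
          return 0;
      }
      ```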
  14. 11 March 2016 (2 commits)
    • vfio: Generalize region support · db0da029
      Authored by Alex Williamson
      Both the platform and PCI vfio drivers create a "slow" I/O memory region
      with one or more mmap memory regions overlaid on top when supported by the
      device.  Generalize this into a set of common helpers in the core that
      pull the region info from vfio, fill the region data, configure the slow
      mapping, and handle completing the mmap, enable/disable, and teardown.
      This can be used immediately by the PCI MSI-X code, which needs to mmap
      around the MSI-X vector table (a sketch of the mmap helper follows below).
      
      This also changes VFIORegion.mem to be dynamically allocated because
      otherwise we don't know how the caller has allocated VFIORegion and
      therefore don't know whether to unreference it to destroy the
      MemoryRegion or not.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
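      A simplified sketch of the common mmap helper: map one sparse area of a region and
      overlay a fast, direct-mapped MemoryRegion on the slow MMIO region. The structure and
      field names here are assumptions modelled on the description above.

      ```c
      #include "qemu/osdep.h"
      #include "exec/memory.h"
      #include <sys/mman.h>

      typedef struct SketchVFIOMmap {
          uint64_t offset;            /* offset of the area inside the region */
          uint64_t size;
          MemoryRegion mem;
      } SketchVFIOMmap;

      /* Map one area of a device region and overlay it on the slow MMIO MemoryRegion. */
      static int region_mmap_area(int device_fd, uint64_t region_fd_offset,
                                  MemoryRegion *slow_mr, SketchVFIOMmap *area)
      {
          void *p = mmap(NULL, area->size, PROT_READ | PROT_WRITE, MAP_SHARED,
                         device_fd, region_fd_offset + area->offset);
          if (p == MAP_FAILED) {
              return -errno;
          }

          /* The RAM-backed subregion becomes the fast path for accesses it covers. */
          memory_region_init_ram_ptr(&area->mem, NULL, "vfio-region-mmap",
                                     area->size, p);
          memory_region_add_subregion(slow_mr, area->offset, &area->mem);
          return 0;
      }
      ```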
    • vfio: Wrap VFIO_DEVICE_GET_REGION_INFO · 46900226
      Authored by Alex Williamson
      In preparation for supporting capability chains on regions, wrap
      ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for
      each caller.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
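      The wrapper is essentially an argsz retry loop; this hedged sketch takes a raw device
      fd instead of a VFIODevice, but otherwise follows the pattern the commit describes.

      ```c
      #include <errno.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>
      #include <glib.h>

      /* Fetch region info, growing the buffer when the kernel reports a larger
       * argsz (which is how capability chains will later be returned). */
      static int get_region_info(int device_fd, uint32_t index,
                                 struct vfio_region_info **info)
      {
          size_t argsz = sizeof(struct vfio_region_info);

          *info = g_malloc0(argsz);
          (*info)->index = index;
      retry:
          (*info)->argsz = argsz;
          if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
              g_free(*info);
              *info = NULL;
              return -errno;
          }
          if ((*info)->argsz > argsz) {
              argsz = (*info)->argsz;
              *info = g_realloc(*info, argsz);
              goto retry;
          }
          return 0;
      }
      ```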
  15. 29 January 2016 (1 commit)
  16. 06 October 2015 (5 commits)
    • vfio: Allow hotplug of containers onto existing guest IOMMU mappings · 508ce5eb
      Authored by David Gibson
      At present the memory listener used by vfio to keep host IOMMU mappings
      in sync with the guest memory image assumes that if a guest IOMMU
      appears, then it has no existing mappings.
      
      This may not be true if a VFIO device is hotplugged onto a guest bus
      which didn't previously include a VFIO device, and which has existing
      guest IOMMU mappings.
      
      Therefore, use the memory_region_register_iommu_notifier_replay()
      function in order to fix this case, replaying existing guest IOMMU
      mappings, bringing the host IOMMU into sync with the guest IOMMU.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Record host IOMMU's available IO page sizes · 7a140a57
      Authored by David Gibson
      Depending on the host IOMMU type we determine and record the available page
      sizes for IOMMU translation.  We'll need this for other validation in
      future patches.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
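      For the Type1 case the information comes back from VFIO_IOMMU_GET_INFO; a hedged
      sketch of recording it, where the plain 4K fallback is an assumption about the
      default:

      ```c
      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <linux/vfio.h>

      /* Query the Type1 IOMMU for its supported IO page sizes; fall back to plain
       * 4K pages if the kernel does not report them. */
      static uint64_t query_iova_pgsizes(int container_fd)
      {
          struct vfio_iommu_type1_info info = { .argsz = sizeof(info) };

          if (ioctl(container_fd, VFIO_IOMMU_GET_INFO, &info) == 0 &&
              (info.flags & VFIO_IOMMU_INFO_PGSIZES)) {
              return info.iova_pgsizes;   /* bitmap of supported page sizes */
          }
          return 4096;                    /* assume 4K only */
      }
      ```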
    • vfio: Check guest IOVA ranges against host IOMMU capabilities · 3898aad3
      Authored by David Gibson
      The current vfio core code assumes that the host IOMMU is capable of
      mapping any IOVA the guest wants to use to where we need.  However, real
      IOMMUs generally only support translating a certain range of IOVAs (the
      "DMA window") not a full 64-bit address space.
      
      The common x86 IOMMUs support a wide enough range that guests are very
      unlikely to go beyond it in practice, however the IOMMU used on IBM Power
      machines - in the default configuration - supports only a much more limited
      IOVA range, usually 0..2GiB.
      
      If the guest attempts to set up an IOVA range that the host IOMMU can't
      map, qemu won't report an error until it actually attempts to map a bad
      IOVA.  If guest RAM is being mapped directly into the IOMMU (i.e., no
      guest-visible IOMMU) then this will show up very quickly.  If there is a
      guest-visible IOMMU, however, the problem might not show up until much
      later, when the guest actually attempts a DMA with an IOVA the host
      can't handle.
      
      This patch adds a test so that we will detect earlier if the guest is
      attempting to use IOVA ranges that the host IOMMU won't be able to deal
      with (a sketch of the check follows below).
      
      For now, we assume that "Type1" (x86) IOMMUs can support any IOVA; this
      is incorrect, but no worse than what we have already.  We can't do
      better for now because the Type1 kernel interface doesn't tell us what
      IOVA range the IOMMU actually supports.
      
      For the Power "sPAPR TCE" IOMMU, however, we can retrieve the supported
      IOVA range and validate guest IOVA ranges against it, and this patch does
      so.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
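      Conceptually the added test is a bounds check of a section's IOVA range against the
      host window recorded for the container; a hedged, self-contained sketch with assumed
      structure names:

      ```c
      #include <stdbool.h>
      #include <stdint.h>

      typedef uint64_t hwaddr;

      typedef struct HostWindow {
          hwaddr min_iova;   /* lowest IOVA the host IOMMU can map */
          hwaddr max_iova;   /* highest IOVA the host IOMMU can map */
      } HostWindow;

      /* Reject a guest mapping whose IOVA range falls outside the host DMA window,
       * so the error surfaces when the mapping is set up rather than at DMA time. */
      static bool iova_range_ok(const HostWindow *win, hwaddr iova, hwaddr end)
      {
          return iova >= win->min_iova && end <= win->max_iova;
      }
      ```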
    • vfio: Generalize vfio_listener_region_add failure path · ac6dc389
      Authored by David Gibson
      If a DMA mapping operation fails in vfio_listener_region_add(), it
      checks whether we have already completed initial setup of the
      container, and either reports an error so the setup code can fail
      gracefully or throws a hw_error().
      
      There are other potential failure cases in vfio_listener_region_add()
      which could benefit from the same logic, so move it into its own
      fail: block.  Later patches can use this to extend other failure cases
      to fail as gracefully as possible under the circumstances.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Thomas Huth <thuth@redhat.com>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Remove unneeded union from VFIOContainer · ee0bf0e5
      Authored by David Gibson
      Currently the VFIOContainer iommu_data field contains a union with
      different information for different host iommu types.  However:
         * It only actually contains information for the x86-like "Type1" iommu
         * Because we have a common listener the Type1 fields are actually used
      on all IOMMU types, including the SPAPR TCE type as well
      
      In fact we now have a general structure for the listener which is unlikely
      to ever need per-iommu-type information, so this patch removes the union.
      
      In a similar way we can unify the setup of the vfio memory listener in
      vfio_connect_container() that is currently split across a switch on iommu
      type, but is effectively the same in both cases.
      
      The iommu_data.release pointer was only needed as a cleanup function
      which would handle potentially different data in the union.  With the
      union gone, it too can be removed.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Laurent Vivier <lvivier@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  17. 24 September 2015 (1 commit)
  18. 11 September 2015 (1 commit)
  19. 07 July 2015 (1 commit)
  20. 30 April 2015 (1 commit)
  21. 10 March 2015 (1 commit)
  22. 09 March 2015 (1 commit)
  23. 03 March 2015 (2 commits)