1. 07 Feb 2018 (1 commit)
  2. 14 Dec 2017 (3 commits)
  3. 18 Jul 2017 (1 commit)
    • vfio-pci, ppc64/spapr: Reorder group-to-container attaching · 8c37faa4
      Authored by Alexey Kardashevskiy
      At the moment VFIO PCI device initialization works as follows:
      vfio_realize
      	vfio_get_group
      		vfio_connect_container
      			register memory listeners (1)
      			update QEMU groups lists
      		vfio_kvm_device_add_group
      
      Then (example for pseries) the machine reset hook triggers region_add()
      for all regions where listeners from (1) are listening:
      
      ppc_spapr_reset
      	spapr_phb_reset
      		spapr_tce_table_enable
      			memory_region_add_subregion
      				vfio_listener_region_add
      					vfio_spapr_create_window
      
      This scheme works fine until we need to handle VFIO PCI device hotplug
      and want to enable PPC64/sPAPR in-kernel TCE acceleration, i.e. after
      PCI hotplug we need a place to call
      ioctl(vfio_kvm_device_fd, KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE).
      Since the ioctl needs a LIOBN fd (from sPAPRTCETable) and an IOMMU group fd
      (from VFIOGroup), vfio_listener_region_add() seems to be the only place
      for this ioctl().
      
      However this only works at boot time because the machine reset happens
      strictly after all devices are finalized. When hotplug happens,
      vfio_listener_region_add() is called as soon as the memory listener is
      registered, but at that point:
      1. the new group is not on container->group_list yet;
      2. the VFIO KVM device is unaware of the new IOMMU group.
      
      This patch moves things around so that all the necessary VFIO
      infrastructure is in place for both the initial startup and hotplug cases.
      
      [aw: ie, register vfio groups with kvm prior to memory listener
      registration such that kvm-vfio pseudo device ioctls are available
      during the region_add callback]
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      8c37faa4
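
      A sketch of the resulting order, inferred from the [aw:] note above
      rather than copied from the patch, so the exact call placement is an
      assumption:

      vfio_realize
      	vfio_get_group
      		vfio_kvm_device_add_group      (group known to the KVM VFIO device)
      		vfio_connect_container
      			update QEMU groups lists
      			register memory listeners  (region_add() can now issue
      			 KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE for the new group)
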
  4. 14 Jul 2017 (1 commit)
  5. 11 Jul 2017 (1 commit)
    • vfio: Test realized when using VFIOGroup.device_list iterator · 7da624e2
      Authored by Alex Williamson
      VFIOGroup.device_list is effectively our reference tracking mechanism
      such that we can tear down a group when all of the device references
      are removed.  However, we also use this list from our machine reset
      handler for processing resets that affect multiple devices.  Generally,
      device removals are fully processed (exitfn + finalize) by the time this
      reset handler is invoked; however, if the removal is triggered via
      another reset handler (piix4_reset->acpi_pcihp_reset) then the device
      exitfn may run, but not finalize.  In this case we hit asserts when
      we start trying to access PCI helpers, since much of the device's PCI
      state has already been released.  To resolve this, add a pointer to the
      device's DeviceState in our common base device and skip non-realized
      devices as we iterate.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      7da624e2
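
      A hedged sketch of the iteration guard in QEMU-style C (identifiers
      follow hw/vfio naming but are an approximation of the patch, not a
      verbatim excerpt):

          VFIODevice *vbasedev;

          QLIST_FOREACH(vbasedev, &group->device_list, next) {
              /*
               * Skip devices whose exitfn has run but which have not been
               * finalized yet: their PCI state is already torn down, so the
               * PCI helpers used by the reset path would assert on them.
               */
              if (!vbasedev->dev->realized) {
                  continue;
              }
              /* ... include this device in the multi-device reset ... */
          }
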
  6. 26 May 2017 (1 commit)
  7. 04 May 2017 (2 commits)
    • vfio: enable 8-byte reads/writes to vfio · 38d49e8c
      Authored by Jose Ricardo Ziviani
      This patch enables 8-byte writes and reads to VFIO. The underlying
      implementation is already there, but the 'case' to handle such accesses
      is missing from both vfio_region_write and vfio_region_read, as are the
      MemoryRegionOps impl.max_access_size and impl.min_access_size settings.
      
      After this patch, an 8-byte write that was previously split like this:
      
      qemu_mutex_lock locked mutex 0x10905ad8
      vfio_region_write  (0001:03:00.0:region1+0xc0, 0x4140c, 4)
      vfio_region_write  (0001:03:00.0:region1+0xc4, 0xa0000, 4)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      
      now goes through as a single access:
      
      qemu_mutex_lock locked mutex 0x10905ad8
      vfio_region_write  (0001:03:00.0:region1+0xc0, 0xbfd0008, 8)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      38d49e8c
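
      A hedged sketch of the added 'case' (simplified; the union and error
      handling approximate hw/vfio/common.c rather than quote it):

          union {
              uint8_t  byte;
              uint16_t word;
              uint32_t dword;
              uint64_t qword;
          } buf;

          switch (size) {
          case 1: buf.byte  = data;               break;
          case 2: buf.word  = cpu_to_le16(data);  break;
          case 4: buf.dword = cpu_to_le32(data);  break;
          case 8: buf.qword = cpu_to_le64(data);  break;  /* new 8-byte case */
          default:
              hw_error("vfio: unsupported write size, %d bytes", size);
          }
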
    • vfio: Set MemoryRegionOps:max_access_size and min_access_size · 15126cba
      Authored by Jose Ricardo Ziviani
      Sets valid.max_access_size and valid.min_access_size to ensure safe
      8-byte accesses to vfio. Today, 8-byte accesses are broken into pairs
      of 4-byte calls that go unprotected:
      
      qemu_mutex_lock locked mutex 0x10905ad8
        vfio_region_write  (0001:03:00.0:region1+0xc0, 0x2020c, 4)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      qemu_mutex_lock locked mutex 0x10905ad8
        vfio_region_write  (0001:03:00.0:region1+0xc4, 0xa0000, 4)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      
      which occasionally leads to:
      
      qemu_mutex_lock locked mutex 0x10905ad8
        vfio_region_write  (0001:03:00.0:region1+0xc0, 0x2030c, 4)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      qemu_mutex_lock locked mutex 0x10905ad8
        vfio_region_write  (0001:03:00.0:region1+0xc0, 0x1000c, 4)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      qemu_mutex_lock locked mutex 0x10905ad8
        vfio_region_write  (0001:03:00.0:region1+0xc4, 0xb0000, 4)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      qemu_mutex_lock locked mutex 0x10905ad8
        vfio_region_write  (0001:03:00.0:region1+0xc4, 0xa0000, 4)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      
      causing strange errors in the guest OS. With this patch, such accesses
      are protected by the same lock guard:
      
      qemu_mutex_lock locked mutex 0x10905ad8
      vfio_region_write  (0001:03:00.0:region1+0xc0, 0x2000c, 4)
      vfio_region_write  (0001:03:00.0:region1+0xc4, 0xb0000, 4)
      qemu_mutex_unlock unlocked mutex 0x10905ad8
      
      This happens because the 8-byte write should be broken into 4-byte
      writes by memory.c:access_with_adjusted_size() in order to stay under
      the same lock. Today it is done in exec.c:address_space_write_continue(),
      which can only handle 4 bytes at a time due to a zeroed
      valid.max_access_size (see exec.c:memory_access_size()).
      Signed-off-by: Jose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      15126cba
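
      A hedged sketch of the resulting MemoryRegionOps, combining this commit
      (valid.*) with the previous one (impl.*); treat it as an illustration of
      the settings rather than the exact structure in hw/vfio:

          static const MemoryRegionOps vfio_region_ops = {
              .read = vfio_region_read,
              .write = vfio_region_write,
              .endianness = DEVICE_LITTLE_ENDIAN,
              .valid = {
                  .min_access_size = 1,
                  .max_access_size = 8,  /* let 8-byte accesses reach us whole */
              },
              .impl = {
                  .min_access_size = 1,
                  .max_access_size = 8,  /* and handle them without splitting */
              },
          };
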
  8. 21 Apr 2017 (1 commit)
    • memory: add section range info for IOMMU notifier · 698feb5e
      Authored by Peter Xu
      In this patch, IOMMUNotifier.{start|end} are introduced to store section
      information for a specific notifier. When a notification occurs, we not
      only check the notification type (MAP|UNMAP) but also whether the
      notified iova range overlaps with the range of the specific IOMMU
      notifier, and skip notifiers whose registered range is not affected.
      
      When removing a region, we need to make sure we remove the correct
      VFIOGuestIOMMU by checking the IOMMUNotifier.start address as well.
      
      This patch solves the problem of vfio-pci devices receiving duplicated
      UNMAP notifications on the x86 platform when a vIOMMU is present. The
      issue is that the x86 IOMMU has a (0, 2^64-1) IOMMU region, which is
      split by the (0xfee00000, 0xfeefffff) IRQ region. As far as I know,
      such a split IOMMU region only occurs on x86.
      
      This patch also lets vhost leverage the new interface, so that vhost
      won't get duplicated cache flushes. In that sense, it's a slight
      performance improvement.
      Suggested-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <1491562755-23867-2-git-send-email-peterx@redhat.com>
      [ehabkost: included extra vhost_iommu_region_del() change from Peter Xu]
      Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>
      698feb5e
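
      A hedged sketch of the range filter (field names follow memory.h of that
      period; the surrounding loop is abbreviated and approximate):

          /* Inside the walk over the region's registered IOMMU notifiers: */
          hwaddr entry_end = entry->iova + entry->addr_mask;

          if (notifier->start > entry_end || notifier->end < entry->iova) {
              continue;   /* iova range outside this notifier's section */
          }
          notifier->notify(notifier, entry);
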
  9. 18 Feb 2017 (3 commits)
  10. 31 Oct 2016 (3 commits)
    • vfio: Add support for mmapping sub-page MMIO BARs · 95251725
      Authored by Yongji Xie
      The kernel commit 05f0c03fbac1 ("vfio-pci: Allow to mmap
      sub-page MMIO BARs if the mmio page is exclusive") now allows VFIO
      to mmap sub-page BARs. This is the corresponding QEMU patch.
      With those patches applied, we can pass sub-page BARs through to the
      guest, which can help improve I/O performance for some devices.
      
      In this patch, we expand the MemoryRegions of these sub-page
      MMIO BARs to PAGE_SIZE in vfio_pci_write_config(), so that
      the BARs can be passed to the KVM ioctl KVM_SET_USER_MEMORY_REGION
      with a valid size. The expanded size is reverted when the base
      address of a sub-page BAR is changed and is no longer page aligned
      in the guest. We also set the priority of these BARs' memory regions
      to zero in case they overlap with BARs which share the same page
      with sub-page BARs in the guest.
      Signed-off-by: Yongji Xie <xyjxie@linux.vnet.ibm.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      95251725
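
      A hedged sketch of the expansion step (the condition and helper usage
      are assumptions drawn from this description, not the patch itself):

          /*
           * From vfio_pci_write_config(), after a BAR base update: if a
           * mmap-able BAR is smaller than the host page size and its new
           * guest address is page aligned, grow the MemoryRegion so KVM
           * can accept it via KVM_SET_USER_MEMORY_REGION; priority 0 keeps
           * it from clobbering a neighbour sharing the same page.
           */
          if (bar_size < qemu_real_host_page_size &&
              !(bar_addr & ~qemu_real_host_page_mask)) {
              memory_region_set_size(mr, qemu_real_host_page_size);
          }
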
    • vfio: Handle zero-length sparse mmap ranges · 24acf72b
      Authored by Alex Williamson
      As reported in the link below, a user has a PCI device with a 4KB BAR
      which contains the MSI-X table.  This seems to hit a corner case in
      the kernel where the region reports being mmap capable, but the sparse
      mmap information reports a zero-sized range.  It's not entirely clear
      that the kernel is incorrect in doing this, but regardless, we need
      to handle it.  To do this, fill our mmap array only with non-zero-sized
      sparse mmap entries and add an error return from the function so we can
      tell the difference between nr_mmaps being zero because of the sparse
      mmap info vs. a lack of sparse mmap info.
      
      NB, this doesn't actually change the behavior of the device; it only
      removes the scary "Failed to mmap ... Performance may be slow" error
      message.  We cannot currently create an mmap over the MSI-X table.
      
      Link: http://lists.nongnu.org/archive/html/qemu-discuss/2016-10/msg00009.html
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      24acf72b
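
      A hedged sketch of the filtering (the sparse-mmap structure comes from
      linux/vfio.h; the VFIORegion bookkeeping is an approximation):

          int i, j = 0;

          for (i = 0; i < sparse->nr_areas; i++) {
              if (sparse->areas[i].size) {  /* keep only non-zero-sized areas */
                  region->mmaps[j].offset = sparse->areas[i].offset;
                  region->mmaps[j].size   = sparse->areas[i].size;
                  j++;
              }
          }
          region->nr_mmaps = j;

          /* Distinguish "sparse info present but unusable" from "no info". */
          return j ? 0 : -EINVAL;
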
    • memory: Replace skip_dump flag with "ram_device" · 21e00fa5
      Authored by Alex Williamson
      Setting skip_dump on a MemoryRegion allows us to modify one specific
      code path, but the restriction we're trying to address encompasses
      more than that.  If we have a RAM MemoryRegion backed by a physical
      device, it not only restricts our ability to dump that region, but
      also affects how we should manipulate it.  Here we recognize that
      MemoryRegions do not change to sometimes allow dumps and other times
      not, so we replace setting the skip_dump flag with a new initializer
      so that we know exactly the type of region to which we're applying
      this behavior.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      21e00fa5
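
      A hedged sketch of the before/after at a call site (the new initializer
      named here is the one this commit introduces; the call site itself is
      illustrative):

          /* Before: generic RAM region, then flag it as not dumpable. */
          memory_region_init_ram_ptr(mr, owner, name, size, ptr);
          memory_region_set_skip_dump(mr);

          /* After: state up front that this RAM is backed by a device. */
          memory_region_init_ram_device_ptr(mr, owner, name, size, ptr);
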
  11. 18 Oct 2016 (3 commits)
  12. 27 Sep 2016 (1 commit)
    • memory: introduce IOMMUNotifier and its caps · cdb30812
      Authored by Peter Xu
      The IOMMU notifier list is used to notify about IO address mapping
      changes. Currently VFIO is the only user.
      
      However, a future consumer like vhost may want to listen to only part
      of the notifications (e.g., cache invalidations).
      
      This patch introduces IOMMUNotifier and IOMMUNotifierFlag bits for
      finer-grained control of this.
      
      IOMMUNotifier contains a bitfield in which the notify consumer describes
      what kinds of notification it is interested in. Currently two kinds of
      notifications are defined:
      
      - IOMMU_NOTIFIER_MAP:    for newly mapped entries (additions)
      - IOMMU_NOTIFIER_UNMAP:  for entries to be removed (cache invalidates)
      
      When registering the IOMMU notifier, we need to specify one or multiple
      types of messages to listen to.
      
      When a notification is triggered, its type is checked against the
      notifier's type bits, and only notifiers with matching registered bits
      are notified.
      
      (For any IOMMU implementation, an in-place mapping change should be
       notified with an UNMAP followed by a MAP.)
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <1474606948-14391-2-git-send-email-peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cdb30812
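
      A hedged sketch of registering a notifier for a subset of notifications
      (the callback name is hypothetical; field and helper names follow this
      description and may not match the tree of that period exactly):

          IOMMUNotifier n;

          /* A vhost-like consumer that only cares about invalidations. */
          n.notify = my_unmap_handler;             /* hypothetical callback */
          n.notifier_flags = IOMMU_NOTIFIER_UNMAP;

          memory_region_register_iommu_notifier(iommu_mr, &n);
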
  13. 12 Jul 2016 (1 commit)
  14. 05 Jul 2016 (3 commits)
    • vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2) · 2e4109de
      Authored by Alexey Kardashevskiy
      The new VFIO_SPAPR_TCE_v2_IOMMU type supports dynamic DMA window
      management. This adds the ability for the VFIO common code to dynamically
      allocate/remove DMA windows in the host kernel when a new VFIO container
      is added/removed.
      
      This adds a helper to vfio_listener_region_add which issues the
      VFIO_IOMMU_SPAPR_TCE_CREATE ioctl and adds the just-created window to
      the host IOMMU window list; the opposite action is taken in
      vfio_listener_region_del.
      
      When creating a new window, this uses a heuristic to decide on the
      number of TCE table levels.
      
      This should cause no guest visible change in behavior.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      [dwg: Added some casts to prevent printf() warnings on certain targets
       where the kernel headers' __u64 doesn't match uint64_t or PRIx64]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      2e4109de
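
      A hedged sketch of the window-creation ioctl (the structure comes from
      linux/vfio.h; the surrounding logic and the example page shift are
      illustrative):

          struct vfio_iommu_spapr_tce_create create = {
              .argsz = sizeof(create),
              .page_shift = 16,            /* e.g. 64K IOMMU pages */
              .window_size = window_size,
              .levels = levels,            /* picked by the heuristic */
          };

          if (ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create)) {
              return -errno;
          }
          /*
           * create.start_addr now holds the bus address of the new window;
           * remember it so vfio_listener_region_del() can remove it later.
           */
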
    • vfio: Add host side DMA window capabilities · f4ec5e26
      Authored by Alexey Kardashevskiy
      There are going to be multiple IOMMUs per container. This moves
      the single host IOMMU parameter set to a list of VFIOHostDMAWindow.
      
      This should cause no behavioral change and will be used later by
      the SPAPR TCE IOMMU v2, which will also add a vfio_host_win_del() helper.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      f4ec5e26
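
      A hedged sketch of the new bookkeeping (the field layout is an
      assumption based on the names in this message):

          typedef struct VFIOHostDMAWindow {
              hwaddr min_iova;
              hwaddr max_iova;
              uint64_t iova_pgsizes;
              QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
          } VFIOHostDMAWindow;

          /* One entry per host IOMMU window instead of a single pair. */
          vfio_host_win_add(container, min_iova, max_iova, iova_pgsizes);
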
    • vfio: spapr: Add DMA memory preregistering (SPAPR IOMMU v2) · 318f67ce
      Authored by Alexey Kardashevskiy
      This makes use of the new "memory registering" feature. The idea is
      to give userspace the ability to notify the host kernel about pages
      which are going to be used for DMA. With this information, the host
      kernel can pin them all once per user process, do locked-pages
      accounting (once), and not spend time doing that at run time, with
      possible failures which cannot be handled nicely in some cases.
      
      This adds a prereg memory listener which listens on address_space_memory
      and notifies a VFIO container about memory which needs to be
      pinned/unpinned. VFIO MMIO regions (i.e. "skip dump" regions) are skipped.
      
      The feature is only enabled for SPAPR IOMMU v2. The host kernel changes
      are required. Since v2 does not need/support VFIO_IOMMU_ENABLE, this does
      not call it when v2 is detected and enabled.
      
      This requires guest RAM blocks to be host page size aligned; however,
      this is not new, as KVM already requires memory slots to be host page
      size aligned.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      [dwg: Fix compile error on 32-bit host]
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      318f67ce
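
      A hedged sketch of what the prereg listener's region_add does (the
      ioctl and structure are from linux/vfio.h; the listener wiring around
      it is an approximation):

          /* Tell the host kernel this RAM may be used for DMA so it can
           * pin and account the pages once, up front. */
          struct vfio_iommu_spapr_register_memory reg = {
              .argsz = sizeof(reg),
              .vaddr = (uintptr_t)host_addr,
              .size  = ram_size,
          };

          if (ioctl(container->fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg)) {
              error_report("vfio: memory pre-registration failed: %s",
                           strerror(errno));
          }
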
  15. 01 Jul 2016 (1 commit)
  16. 22 Jun 2016 (1 commit)
  17. 17 Jun 2016 (2 commits)
  18. 27 May 2016 (4 commits)
  19. 26 May 2016 (1 commit)
  20. 19 May 2016 (1 commit)
  21. 29 Mar 2016 (1 commit)
  22. 16 Mar 2016 (2 commits)
    • vfio: Eliminate vfio_container_ioctl() · 3356128c
      Authored by David Gibson
      vfio_container_ioctl() was a bad interface that bypassed abstraction
      boundaries, had semantics that sat uneasily with its name, and was unsafe
      in many realistic circumstances.  Now that spapr-pci-vfio-host-bridge has
      been folded into spapr-pci-host-bridge, there are no more users, so remove
      it.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      3356128c
    • vfio: Start improving VFIO/EEH interface · 3153119e
      Authored by David Gibson
      At present the code handling IBM's Enhanced Error Handling (EEH) interface
      on VFIO devices operates by bypassing the usual VFIO logic with
      vfio_container_ioctl().  That's a poorly designed interface with unclear
      semantics about exactly what can be operated on.
      
      In particular, it operates on a single vfio container internally (hence
      the name), but takes an address space and group ID, from which it deduces
      the container in a rather roundabout way.  Group IDs are something that
      code outside vfio shouldn't even be aware of.
      
      This patch creates new interfaces for EEH operations.  Internally we
      have vfio_eeh_container_op() which takes a VFIOContainer object
      directly.  For external use we have vfio_eeh_as_ok() which determines
      if an AddressSpace is usable for EEH (at present this means it has a
      single container with exactly one group attached), and vfio_eeh_as_op()
      which will perform an operation on an AddressSpace in the unambiguous case,
      and otherwise returns an error.
      
      This interface still isn't great, but it's enough of an improvement to
      allow a number of cleanups in other places.
      Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
      Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      3153119e
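
      A hedged sketch of how a caller uses the new entry points (the function
      names come from this message; the error handling shown is illustrative):

          /*
           * vfio_eeh_as_ok() is true only when the AddressSpace maps to a
           * single container with exactly one group attached, i.e. the case
           * where an EEH operation is unambiguous.
           */
          if (!vfio_eeh_as_ok(&address_space_memory)) {
              return -1;                         /* illustrative error path */
          }
          return vfio_eeh_as_op(&address_space_memory, VFIO_EEH_PE_ENABLE);
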
  23. 11 Mar 2016 (2 commits)
    • vfio: Generalize region support · db0da029
      Authored by Alex Williamson
      Both the platform and PCI vfio drivers create a "slow" I/O memory region
      with one or more mmap memory regions overlaid when supported by the
      device. Generalize this to a set of common helpers in the core that
      pull the region info from vfio, fill the region data, configure the
      slow mapping, and add helpers for completing the mmap, enable/disable,
      and teardown.  This can be immediately used by the PCI MSI-X code,
      which needs to mmap around the MSI-X vector table.
      
      This also changes VFIORegion.mem to be dynamically allocated because
      otherwise we don't know how the caller has allocated VFIORegion and
      therefore don't know whether to unreference it to destroy the
      MemoryRegion or not.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      db0da029
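
      A hedged sketch of using the generalized helpers (function names are my
      reading of this message; exact signatures are assumptions):

          VFIORegion *region = &vdev->bars[nr].region;

          /* Pull region info from the kernel and build the "slow" region. */
          ret = vfio_region_setup(OBJECT(vdev), &vdev->vbasedev, region,
                                  nr, name);
          if (ret) {
              return ret;
          }
          /* Overlay the mmap(s) on top of the slow region where possible. */
          vfio_region_mmap(region);
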
    • vfio: Wrap VFIO_DEVICE_GET_REGION_INFO · 46900226
      Authored by Alex Williamson
      In preparation for supporting capability chains on regions, wrap
      ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for
      each caller.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      46900226
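
      A hedged sketch of the wrapper (close to what the message implies;
      treat the allocation and error handling details as an approximation):

          int vfio_get_region_info(VFIODevice *vbasedev, int index,
                                   struct vfio_region_info **info)
          {
              size_t argsz = sizeof(struct vfio_region_info);

              *info = g_malloc0(argsz);
              (*info)->index = index;
              (*info)->argsz = argsz;

              if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
                  g_free(*info);
                  *info = NULL;
                  return -errno;
              }
              return 0;
          }
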