1. 19 Jul 2016 (1 commit)
    • vfio/pci: Hide ARI capability · 383a7af7
      Committed by Alex Williamson
      QEMU supports ARI on downstream ports and assigned devices may support
      ARI in their extended capabilities.  The endpoint ARI capability
      specifies the next function, such that the OS doesn't need to walk
      each possible function; however, this next function is relative to the
      host, not the guest.  This leads to device discovery issues when we
      combine separate functions into virtual multi-function packages in a
      guest.  For example, SR-IOV VFs are not enumerated by simply probing
      the function address space, therefore the ARI next-function field is
      zero.  When we combine multiple VFs together as a multi-function
      device in the guest, the guest OS identifies ARI is enabled, relies on
      this next-function field, and stops looking for additional functions
      after the first is found.
      
      Long term we should expose the ARI capability to the guest to enable
      configurations with more than 8 functions per slot, but this requires
      additional QEMU PCI infrastructure to manage the next-function field
      for multiple, otherwise independent devices.  In the short term,
      hiding this capability allows equivalent functionality to what we
      currently have on non-express chipsets.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
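      As a rough illustration of the hiding mechanism (a minimal sketch
      using the Linux pci_regs.h macros, not the exact QEMU code;
      filter_ext_caps and host_config are made-up names, and a
      little-endian host is assumed for brevity):

        #include <linux/pci_regs.h>
        #include <stdint.h>
        #include <string.h>

        #define PCI_CONFIG_SPACE_SIZE 0x100

        /* Walk the host's extended capability chain, which starts at
         * offset 0x100, and skip ARI when deciding what the guest sees. */
        static void filter_ext_caps(const uint8_t *host_config)
        {
            uint16_t next = PCI_CONFIG_SPACE_SIZE;

            while (next) {
                uint32_t header;

                memcpy(&header, host_config + next, sizeof(header));

                if (PCI_EXT_CAP_ID(header) != PCI_EXT_CAP_ID_ARI) {
                    /* ... expose/emulate this capability to the guest ... */
                }
                next = PCI_EXT_CAP_NEXT(header);
            }
        }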
  2. 05 Jul 2016 (1 commit)
  3. 01 Jul 2016 (2 commits)
    • vfio/pci: Hide SR-IOV capability · e37dac06
      Committed by Alex Williamson
      The kernel currently exposes the SR-IOV capability as read-only
      through vfio-pci.  This is sufficient to protect the host kernel, but
      has the potential to confuse guests without further virtualization.
      In particular, OVMF tries to size the VF BARs and comes up with absurd
      results, ending with an assert.  There's not much point in adding
      virtualization to a read-only capability, so we simply hide it for
      now.  If the kernel ever enables SR-IOV virtualization, we should
      easily be able to test it through VF BAR sizing or explicit flags.
      
      Testing whether we should parse extended capabilities is also pulled
      into the function to keep these assumptions in one place.
      Tested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
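      For context, the BAR sizing probe that trips OVMF works roughly as
      follows (a sketch; cfg_read32/cfg_write32 are hypothetical
      config-space accessors):

        /* Classic PCI BAR sizing: write all ones, read back, restore,
         * then derive the size from the bits that stayed zero. */
        uint32_t orig  = cfg_read32(bar_offset);
        cfg_write32(bar_offset, 0xffffffff);
        uint32_t sized = cfg_read32(bar_offset);
        cfg_write32(bar_offset, orig);

        /* For a memory BAR, mask off the low flag bits first. */
        uint32_t size = ~(sized & ~0xfu) + 1;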
    • vfio: add pcie extended capability support · 325ae8d5
      Committed by Chen Fan
      For a vfio PCIe device, we can expose extended capabilities on the
      PCIe bus.  Because a new PCIe capability is added at the tail of the
      chain, we introduce a copy of config space for parsing the extended
      caps, so that config space is not overwritten while the PCIe extended
      config space is rebuilt.
      Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
      Tested-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
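      A minimal sketch of the copy-then-rebuild idea (assuming a
      QEMU-style PCIDevice with a config[] array; not the verbatim patch):

        /* Parse the chain from a scratch copy so that inserting caps
         * into the guest-visible config space doesn't clobber headers
         * we still need to read. */
        uint8_t *config = g_memdup(pdev->config, vdev->config_size);

        /* ... walk the extended caps in 'config', appending each one
         * we want to expose to the tail of the rebuilt chain in
         * pdev->config ... */

        g_free(config);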
  4. 17 Jun 2016 (1 commit)
  5. 27 May 2016 (5 commits)
    • vfio/pci: Add a separate option for IGD OpRegion support · 6ced0bba
      Committed by Alex Williamson
      The IGD OpRegion is enabled automatically when running in legacy mode,
      but it can sometimes be useful in universal passthrough mode as well.
      Without an OpRegion, output spigots don't work, and even though Intel
      doesn't officially support physical outputs in UPT mode, it's a
      useful feature.  Note that if an OpRegion is enabled but a monitor is
      not connected, some graphics features will be disabled in the guest
      versus a headless system without an OpRegion, where they would work.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
      Tested-by: Gerd Hoffmann <kraxel@redhat.com>
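      With the x-igd-opregion option this adds, enabling the OpRegion for
      a UPT-mode assignment would look something like:

        -device vfio-pci,host=0000:00:02.0,x-igd-opregion=on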
    • vfio/pci: Intel graphics legacy mode assignment · c4c45e94
      Committed by Alex Williamson
      Enable quirks to support SandyBridge and newer IGD devices as primary
      VM graphics.  This requires new vfio-pci device specific regions added
      in kernel v4.6 to expose the IGD OpRegion, the shadow ROM, and config
      space access to the PCI host bridge and LPC/ISA bridge.  VM firmware
      support, SeaBIOS only so far, is also required for reserving memory
      regions for IGD specific use.  In order to enable this mode, IGD must
      be assigned to the VM at PCI bus address 00:02.0, it must have a ROM,
      it must be able to enable VGA, it must have or be able to create on
      its own an LPC/ISA bridge of the proper type at PCI bus address
      00:1f.0 (sorry, not compatible with Q35 yet), and it must have the
      above noted vfio-pci kernel features and BIOS.  The intention is that
      to enable this mode, a user simply needs to assign 00:02.0 from the
      host to 00:02.0 in the VM:
      
        -device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0
      
      and everything either happens automatically or it doesn't.  In the
      case that it doesn't, we leave error reports, but assume the device
      will operate in universal passthrough mode (UPT), which doesn't
      require any of this, but has a much more narrow window of supported
      devices, supported use cases, and supported guest drivers.
      
      When using IGD in this mode, the VM firmware is required to reserve
      some VM RAM for the OpRegion (on the order of several 4k pages) and
      stolen memory for the GTT (up to 8MB for the latest GPUs).  An
      additional option, x-igd-gms allows the user to specify some amount
      of additional memory (value is number of 32MB chunks up to 512MB) that
      is pre-allocated for graphics use.  TBH, I don't know of anything that
      requires this or makes use of this memory, which is why we don't
      allocate any by default, but the specification suggests this is not
      actually a valid combination, so the option exists as a workaround.
      Please report if it's actually necessary in some environment.
      
      See code comments for further discussion about the actual operation
      of the quirks necessary to assign these devices.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
      Tested-by: Gerd Hoffmann <kraxel@redhat.com>
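      For instance, pre-allocating 128MB of additional graphics stolen
      memory (4 chunks of 32MB) might look like:

        -device vfio-pci,host=0000:00:02.0,bus=pci.0,addr=02.0,x-igd-gms=4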
    • vfio/pci: Setup BAR quirks after capabilities probing · 581406e0
      Committed by Alex Williamson
      Capability probing modifies wmask, which quirks may be interested in
      changing themselves.  Apply our BAR quirks after the capability scan
      to make this possible.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
      Tested-by: Gerd Hoffmann <kraxel@redhat.com>
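      To illustrate the ordering constraint (a sketch using QEMU's
      pci_set_long helper; the surrounding fields are illustrative):

        /* pdev->wmask holds one write-enable byte per config byte and
         * is initialized during capability probing, so a quirk that
         * wants to change writability, e.g. making a BAR read-only to
         * the guest so that writes trap, must run afterwards: */
        pci_set_long(vdev->pdev.wmask + PCI_BASE_ADDRESS_0, 0);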
    • vfio/pci: Consolidate VGA setup · 182bca45
      Committed by Alex Williamson
      Combine VGA discovery and registration.  Quirks can have dependencies
      on BARs, so quirk setup is pushed out until after we've scanned the BARs.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
      Tested-by: Gerd Hoffmann <kraxel@redhat.com>
    • vfio/pci: Fix return of vfio_populate_vga() · 4225f2b6
      Committed by Alex Williamson
      This function returns success if either we setup the VGA region or
      the host vfio doesn't return enough regions to support the VGA index.
      This latter case doesn't make any sense.  If we're asked to populate
      VGA, fail if it doesn't exist and let the caller decide if that's
      important.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Reviewed-by: Gerd Hoffmann <kraxel@redhat.com>
      Tested-by: Gerd Hoffmann <kraxel@redhat.com>
  6. 11 Mar 2016 (7 commits)
    • vfio/pci: replace fixed string limit by g_strdup_printf · 062ed5d8
      Committed by Neo Jia
      A trivial change to remove the fixed string limit by using g_strdup_printf.
      Tested-by: Neo Jia <cjia@nvidia.com>
      Signed-off-by: Neo Jia <cjia@nvidia.com>
      Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
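      The shape of the change, roughly (the path being built here is
      illustrative):

        /* Before: fixed-size buffer, long sysfs paths could truncate. */
        char path[64];
        snprintf(path, sizeof(path), "%s/resource%d", sysfsdev, nr);

        /* After: GLib allocates whatever length the path needs. */
        gchar *path = g_strdup_printf("%s/resource%d", sysfsdev, nr);
        /* ... use path ... */
        g_free(path);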
    • vfio/pci: Split out VGA setup · e593c021
      Committed by Alex Williamson
      This could be setup later by device specific code, such as IGD
      initialization.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Fixup PCI option ROMs · e2e5ee9c
      Committed by Alex Williamson
      Devices like Intel graphics are known to not only have bad checksums,
      but also the wrong device ID.  This is not so surprising given that
      the video BIOS is typically part of the system firmware image rather
      than embedded into the device and needs to support any IGD device
      installed into the system.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
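      A sketch of the checksum half of such a fixup (legacy option ROM
      images must sum to zero modulo 256 over their advertised length;
      the helper name is made up):

        #include <stddef.h>
        #include <stdint.h>

        static void fixup_rom_checksum(uint8_t *rom)
        {
            size_t size = rom[2] * 512;   /* length field, 512B units */
            uint8_t sum = 0;

            for (size_t i = 0; i < size; i++) {
                sum += rom[i];
            }
            if (sum) {
                /* Adjust the trailing byte so the total wraps to zero. */
                rom[size - 1] -= sum;
            }
        }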
    • vfio/pci: Convert all MemoryRegion to dynamic alloc and consistent functions · 2d82f8a3
      Committed by Alex Williamson
      Match common vfio code with setup, exit, and finalize functions for
      BAR, quirk, and VGA management.  VGA is also changed to dynamic
      allocation to match the other MemoryRegions.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio: Generalize region support · db0da029
      Committed by Alex Williamson
      Both platform and PCI vfio drivers create a "slow" I/O memory region
      with one or more mmap memory regions overlaid when supported by the
      device.  Generalize this to a set of common helpers in the core that
      pulls the region info from vfio, fills the region data, configures
      slow mapping, and adds helpers for completing the mmap, enable/disable,
      and teardown.  This can be immediately used by the PCI MSI-X code,
      which needs to mmap around the MSI-X vector table.
      
      This also changes VFIORegion.mem to be dynamically allocated because
      otherwise we don't know how the caller has allocated VFIORegion and
      therefore don't know whether to unreference it to destroy the
      MemoryRegion or not.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
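      The slow-region-plus-mmap-overlay pattern, sketched with QEMU's
      MemoryRegion APIs (vfio_region_ops and the region/owner variables
      are illustrative):

        /* A "slow" I/O region, backed by read/write ioctls on the vfio
         * fd, covers the whole region ... */
        memory_region_init_io(region->mem, owner, &vfio_region_ops,
                              region, "vfio-region-slow", size);

        /* ... and where the kernel permits it, an mmap'd RAM subregion
         * is layered on top for direct access. */
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                       device_fd, region_offset);
        if (p != MAP_FAILED) {
            memory_region_init_ram_ptr(region->mmap_mem, owner,
                                       "vfio-region-mmap", size, p);
            memory_region_add_subregion(region->mem, 0, region->mmap_mem);
        }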
    • vfio: Wrap VFIO_DEVICE_GET_REGION_INFO · 46900226
      Committed by Alex Williamson
      In preparation for supporting capability chains on regions, wrap
      ioctl(VFIO_DEVICE_GET_REGION_INFO) so we don't duplicate the code for
      each caller.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
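      A minimal sketch of such a wrapper (the capability-chain support
      that follows grows the buffer as needed; this shows only the basic
      call):

        #include <linux/vfio.h>
        #include <sys/ioctl.h>
        #include <errno.h>
        #include <stdint.h>
        #include <string.h>

        static int vfio_get_region_info(int device_fd, uint32_t index,
                                        struct vfio_region_info *info)
        {
            memset(info, 0, sizeof(*info));
            info->argsz = sizeof(*info);
            info->index = index;

            if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, info)) {
                return -errno;
            }
            return 0;
        }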
    • vfio: Add sysfsdev property for pci & platform · 7df9381b
      Committed by Alex Williamson
      vfio-pci currently requires a host= parameter, which comes in the
      form of a PCI address in [domain:]<bus:slot.function> notation.  We
      expect to find a matching entry in sysfs for that under
      /sys/bus/pci/devices/.  vfio-platform takes a similar approach, but
      defines the host= parameter to be a string, which can be matched
      directly under /sys/bus/platform/devices/.  On the PCI side, we have
      some interest in using vfio to expose vGPU devices.  These are not
      actual discrete PCI devices, so they don't have a compatible host PCI
      bus address or a device link where QEMU wants to look for it.  There's
      also really no requirement that vfio can only be used to expose
      physical devices; a new vfio bus and iommu driver could expose a
      completely emulated device.  To fit within the vfio framework, it
      would need a kernel struct device and associated IOMMU group, but
      those are easy constraints to manage.
      
      To support such devices, including vGPUs, which honor the
      VFIO PCI programming API but are not necessarily backed by a unique
      PCI address, add support for specifying any device in sysfs.  The
      vfio API already has support for probing the device type to ensure
      compatibility with either vfio-pci or vfio-platform.
      
      With this, a vfio-pci device could either be specified as:
      
      -device vfio-pci,host=02:00.0
      
      or
      
      -device vfio-pci,sysfsdev=/sys/devices/pci0000:00/0000:00:1c.0/0000:02:00.0
      
      or even
      
      -device vfio-pci,sysfsdev=/sys/bus/pci/devices/0000:02:00.0
      
      When vGPU support comes along, this might look something more like:
      
      -device vfio-pci,sysfsdev=/sys/devices/virtual/intel-vgpu/vgpu0@0000:00:02.0
      
      NB - This is only a made-up example path.
      
      The same change is made for vfio-platform; specifying sysfsdev takes
      precedence over the old host option.
      Tested-by: Eric Auger <eric.auger@linaro.org>
      Reviewed-by: Eric Auger <eric.auger@linaro.org>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
  7. 20 Feb 2016 (3 commits)
  8. 29 Jan 2016 (1 commit)
  9. 20 Jan 2016 (1 commit)
    • vfio/pci: Lazy PBA emulation · 95239e16
      Committed by Alex Williamson
      The PCI spec recommends devices use additional alignment for MSI-X
      data structures to allow software to map them to separate processor
      pages.  One advantage of doing this is that we can emulate those data
      structures without a significant performance impact to the operation
      of the device.  Some devices fail to implement that suggestion and
      assigned device performance suffers.
      
      One such case of this is a Mellanox MT27500 series, ConnectX-3 VF,
      where the MSI-X vector table and PBA are aligned on separate 4K
      pages.  If PBA emulation is enabled, performance suffers.  It's not
      clear how much value we get from PBA emulation, but the solution here
      is to only lazily enable the emulated PBA when a masked MSI-X vector
      fires.  We then attempt to more aggressively disable the PBA memory
      region any time a vector is unmasked.  The expectation is then that
      a typical VM will run entirely with PBA emulation disabled, and only
      when used is that emulation re-enabled.
      Reported-by: Shyam Kaushik <shyam.kaushik@gmail.com>
      Tested-by: Shyam Kaushik <shyam.kaushik@gmail.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
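      The policy, sketched with QEMU's memory_region_set_enabled API
      (the pba_region field name is illustrative):

        /* A masked vector fired: the guest may be polling the PBA, so
         * start emulating it. */
        memory_region_set_enabled(&vdev->pba_region, true);

        /* A vector was unmasked: assume the guest is done polling and
         * drop back to the fast path. */
        memory_region_set_enabled(&vdev->pba_region, false);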
  10. 11 Nov 2015 (2 commits)
  11. 19 Oct 2015 (1 commit)
    • kvm: Pass PCI device pointer to MSI routing functions · dc9f06ca
      Committed by Pavel Fedin
      In-kernel ITS emulation on ARM64 will require supplying requester IDs.
      These IDs can now be retrieved from the device pointer using new
      pci_requester_id() function.
      
      This patch adds a pci_dev pointer to the KVM GSI routing functions and
      makes callers pass it.
      
      The x86 architecture does not use requester IDs, but hw/i386/kvm/pci-assign.c
      is also made to pass the PCI device pointer instead of NULL, for
      consistency with the rest of the code.
      Signed-off-by: Pavel Fedin <p.fedin@samsung.com>
      Message-Id: <ce081423ba2394a4efc30f30708fca07656bc500.1444916432.git.p.fedin@samsung.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
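      The call-site shape after this change, roughly (vdev is
      illustrative):

        /* Callers now hand the PCIDevice to the routing helper ... */
        virq = kvm_irqchip_add_msi_route(kvm_state, msg, &vdev->pdev);

        /* ... so that, inside, architectures that need a requester ID
         * can derive one, while x86 simply ignores the pointer. */
        uint16_t rid = dev ? pci_requester_id(dev) : 0;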
  12. 24 Sep 2015 (11 commits)
  13. 11 Sep 2015 (2 commits)
  14. 23 Jul 2015 (2 commits)
    • vfio/pci: Fix bootindex · 759b484c
      Committed by Alex Williamson
      bootindex was incorrectly changed to a device Property during the
      platform code split, resulting in it no longer working.  Remove it.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: qemu-stable@nongnu.org # v2.3+
    • vfio/pci: Fix RTL8168 NIC quirks · 69970fce
      Committed by Alex Williamson
      The RTL8168 quirk correctly describes using bit 31 as a signal to
      mark a latch/completion, but the code mistakenly uses bit 28.  This
      causes the Realtek driver to spin on this register for quite a while,
      20k cycles with the Windows 7 v7.092 driver.  Then it gets frustrated,
      tries to set the bit itself, and spins for another 20k cycles.  For
      some, this still results in a working driver; for others it doesn't.  About
      the only thing the code really does in its current form is protect
      the guest from sneaking in writes to the real hardware MSI-X table.
      The fix is obviously to use bit 31 as we document that we should.
      
      The other problem doesn't seem to affect current drivers as nobody
      seems to use these window registers for writes to the MSI-X table, but
      we need to use the stored data when a write is triggered, not the
      value of the current write, which only provides the offset.
      
      Note that only the Windows drivers from Realtek seem to use these
      registers, the Microsoft drivers provided with Windows 8.1 do not
      access them, nor do Linux in-kernel drivers.
      
      Link: https://bugs.launchpad.net/qemu/+bug/1384892
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Cc: qemu-stable@nongnu.org # v2.1+
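      The gist of the fix, as a sketch (variable names are illustrative):

        /* The latch/completion flag is bit 31 of the window register;
         * the buggy code tested bit 28. */
        if (data & (1UL << 31)) {
            /* Replay the previously *stored* data to the MSI-X table;
             * the triggering write only carries the offset. */
        }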