1. 07 Feb, 2018 14 commits
    • Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20180206.0' into staging · ea62da09
      Peter Maydell committed
      VFIO updates 2018-02-06
      
       - SPAPR in-kernel TCE acceleration (Alexey Kardashevskiy)
      
       - MSI-X relocation (Alex Williamson)
      
       - Add missing platform mutex init (Eric Auger)
      
       - Redundant variable cleanup (Alexey Kardashevskiy)
      
       - Option to disable GeForce quirks (Alex Williamson)
      
      # gpg: Signature made Tue 06 Feb 2018 18:21:22 GMT
      # gpg:                using RSA key 239B9B6E3BB08B22
      # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
      # gpg:                 aka "Alex Williamson <alex@shazbot.org>"
      # gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
      # gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"
      # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22
      
      * remotes/awilliam/tags/vfio-update-20180206.0:
        vfio/pci: Add option to disable GeForce quirks
        vfio/common: Remove redundant copy of local variable
        hw/vfio/platform: Init the interrupt mutex
        vfio/pci: Allow relocating MSI-X MMIO
        qapi: Create DEFINE_PROP_OFF_AUTO_PCIBAR
        vfio/pci: Emulate BARs
        vfio/pci: Add base BAR MemoryRegion
        vfio/pci: Fixup VFIOMSIXInfo comment
        spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device
        vfio/spapr: Use iommu memory region's get_attr()
        memory/iommu: Add get_attr()
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180206a' into staging · 0833df03
      Peter Maydell committed
      Migration pull 2018-02-06
      
      This is based off Juan's last pull with a few extras, but
      also removing:
         Add migration xbzrle test
         Add migration precopy test
      
      As well as my normal test boxes, I also gave it a test
      on a 32 bit ARM box and it seems happy (a Calxeda highbank)
      and a big-endian power box.
      
      Dave
      
      # gpg: Signature made Tue 06 Feb 2018 15:33:31 GMT
      # gpg:                using RSA key 0516331EBC5BFDE7
      # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
      # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7
      
      * remotes/dgilbert/tags/pull-migration-20180206a:
        migration: incoming postcopy advise sanity checks
        migration: Don't leak IO channels
        migration: Recover block devices if failure in device state
        tests: Adjust sleeps for migration test
        tests: Create migrate-start-postcopy command
        tests: Add deprecated commands migration test
        tests: Use consistent names for migration
        tests: Consolidate accelerators declaration
        tests: Remove deprecated migration tests commands
        migration: Drop current address parameter from save_zero_page()
        migration: use s->threshold_size inside migration_update_counters
        migration/savevm.c: set MAX_VM_CMD_PACKAGED_SIZE to 1ul << 32
        migration: Route errors down through migration_channel_connect
        migration: Allow migrate_fd_connect to take an Error *
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging · bc2943d6
      Peter Maydell committed
      Python queue, 2018-02-05
      
      # gpg: Signature made Mon 05 Feb 2018 23:07:57 GMT
      # gpg:                using RSA key 2807936F984DC5A6
      # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
      # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6
      
      * remotes/ehabkost/tags/python-next-pull-request: (21 commits)
        docker: change Fedora images to run with python3
        travis: improve python version test coverage
        ui: update keycodemapdb to get py3 fixes
        input: add missing JIS keys to virtio input
        qemu.py: don't launch again before shutdown()
        qemu.py: cleanup redundant calls in launch()
        qemu.py: use poll() instead of 'returncode'
        qemu.py: always cleanup on shutdown()
        qemu.py: refactor launch()
        qemu.py: better control of created files
        qemu.py: remove unused import
        configure: allow use of python 3
        scripts: ensure signrom treats data as bytes
        qapi: force a UTF-8 locale for running Python
        qapi: ensure stable sort ordering when checking QAPI entities
        qapi: remove '-q' arg to diff when comparing QAPI output
        qapi: Adapt to moved location of 'maketrans' function in py3
        qapi: adapt to moved location of StringIO module in py3
        qapi: Use OrderedDict from standard library if available
        qapi: use items()/values() instead of iteritems()/itervalues()
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
    • vfio/pci: Add option to disable GeForce quirks · db32d0f4
      Alex Williamson committed
      These quirks are necessary for GeForce, but not for Quadro/GRID/Tesla
      assignment.  Leaving them enabled is fully functional and provides the
      most compatibility, but due to the unique NVIDIA MSI ACK behavior[1],
      it also introduces latency in re-triggering the MSI interrupt.  This
      overhead is typically negligible, but has been shown to adversely
      affect some (very) high interrupt rate applications.  This adds the
      vfio-pci device option "x-no-geforce-quirks=" which can be set to
      "on" to disable this additional overhead.
      
      A follow-on optimization for GeForce might be to make use of an
      ioeventfd to allow KVM to trigger an irqfd in the kernel vfio-pci
      driver, avoiding the bounce through userspace to handle this device
      write.
      
      [1] Background: the NVIDIA driver has been observed to issue a write
      to the MMIO mirror of PCI config space in BAR0 in order to allow the
      MSI interrupt for the device to retrigger.  Older reports indicated a
      write of 0xff to the (read-only) MSI capability ID register, while
      more recently a write of 0x0 is observed at config space offset 0x704,
      non-architected, extended config space of the device (BAR0 offset
      0x88704).  Virtualization of this range is only required for GeForce.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
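As a rough illustration of the option described in this commit, the sketch below assembles a QEMU `-device` argument for an assigned GPU. The helper name `vfio_pci_device_arg` and the host address are hypothetical and exist only for this example; the `vfio-pci` device, its `host=` property, and `x-no-geforce-quirks=` are the only parts taken from QEMU and the commit message.

```python
def vfio_pci_device_arg(host, no_geforce_quirks=False):
    """Build a '-device vfio-pci,...' argument pair (hypothetical helper)."""
    props = ["vfio-pci", "host=" + host]
    if no_geforce_quirks:
        # Per the commit message, only Quadro/GRID/Tesla assignment can do
        # without the quirks; GeForce still needs them (the default).
        props.append("x-no-geforce-quirks=on")
    return ["-device", ",".join(props)]


# Example host address is made up for illustration.
args = vfio_pci_device_arg("0000:01:00.0", no_geforce_quirks=True)
print(" ".join(args))
```

Leaving the flag at its default keeps the quirks enabled, which the commit message describes as the most compatible setting.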
    • vfio/common: Remove redundant copy of local variable · a5b04f7c
      Alexey Kardashevskiy committed
      There is already @hostwin in vfio_listener_region_add() so there is no
      point in having the other one.
      
      Fixes: 2e4109de ("vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • hw/vfio/platform: Init the interrupt mutex · 89202c6f
      Eric Auger committed
      Add the initialization of the mutex protecting the interrupt list.
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Allow relocating MSI-X MMIO · 89d5202e
      Alex Williamson committed
      Recently proposed vfio-pci kernel changes (v4.16) remove the
      restriction preventing userspace from mmap'ing PCI BARs in areas
      overlapping the MSI-X vector table.  This change is primarily intended
      to benefit host platforms which make use of system page sizes larger
      than the PCI spec recommendation for alignment of MSI-X data
      structures (i.e. not x86_64).  In the case of POWER systems, the SPAPR
      spec requires the VM to program MSI-X using hypercalls, rendering the
      MSI-X vector table unused in the VM view of the device.  However,
      ARM64 platforms also support 64KB pages and rely on QEMU emulation of
      MSI-X.  Regardless of the kernel driver allowing mmaps overlapping
      the MSI-X vector table, emulation of the MSI-X vector table also
      prevents direct mapping of device MMIO spaces overlapping this page.
      Thanks to the fact that PCI devices have a standard self discovery
      mechanism, we can try to resolve this by relocating the MSI-X data
      structures, either by creating a new PCI BAR or extending an existing
      BAR and updating the MSI-X capability for the new location.  There's
      even a very slim chance that this could benefit devices which do not
      adhere to the PCI spec alignment guidelines on x86_64 systems.
      
      This new x-msix-relocation option accepts the following choices:
      
        off: Disable MSI-X relocation, use native device config (default)
        auto: Use a known good combination for the platform/device (none yet)
        bar0..bar5: Specify the target BAR for MSI-X data structures
      
      If compatible, the target BAR will either be created or extended and
      the new portion will be used for MSI-X emulation.
      
      The first obvious user question with this option is how to determine
      whether a given platform and device might benefit from this option.
      In most cases, the answer is that it won't, especially on x86_64.
      Devices often dedicate an entire BAR to MSI-X and therefore no
      performance sensitive registers overlap the MSI-X area.  Take for
      example:
      
      # lspci -vvvs 0a:00.0
      0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
      	...
      	Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
      	Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
      	...
      	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
      		Vector table: BAR=3 offset=00000000
      		PBA: BAR=3 offset=00002000
      
      This device uses the 16K bar3 for MSI-X with the vector table at
      offset zero and the pending bits array at offset 8K, fully honoring
      the PCI spec alignment guidance.  The data sheet specifically refers
      to this as an MSI-X BAR.  This device would not see a benefit from
      MSI-X relocation regardless of the platform, regardless of the page
      size.
      
      However, here's another example:
      
      # lspci -vvvs 02:00.0
      02:00.0 Serial Attached SCSI controller: xxxxxxxx
      	...
      	Region 0: I/O ports at c000 [size=256]
      	Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
      	Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
      	...
      	Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
      		Vector table: BAR=1 offset=0000e000
      		PBA: BAR=1 offset=0000f000
      
      Here the MSI-X data structures are placed on separate 4K pages at the
      end of a 64KB BAR.  If our host page size is 4K, we're likely fine,
      but at 64KB page size, MSI-X emulation at that location prevents the
      entire BAR from being directly mapped into the VM address space.
      Overlapping performance sensitive registers then starts to be a very
      likely scenario on such a platform.  At this point, the user could
      enable tracing on vfio_region_read and vfio_region_write to determine
      more conclusively if device accesses are being trapped through QEMU.
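
The page-size effect described above can be sketched with a little arithmetic. This is a simplified model, not QEMU's actual logic: it assumes any host page touching an MSI-X structure must be trapped, and it uses the PCI-defined sizes of 16 bytes per vector table entry and one pending bit per vector (rounded up to 8 bytes). The function name `trapped_ranges` is hypothetical.

```python
def trapped_ranges(bar_size, msix_areas, page_size):
    """Return merged (start, end) byte ranges of a BAR that cannot be
    directly mapped because a host page overlaps an MSI-X structure.

    msix_areas is a list of (offset, length) tuples within the BAR.
    """
    pages = set()
    for offset, length in msix_areas:
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))

    ranges = []
    for page in sorted(pages):
        start = page * page_size
        end = min(start + page_size, bar_size)
        if ranges and ranges[-1][1] == start:
            # Adjacent trapped pages merge into one range.
            ranges[-1] = (ranges[-1][0], end)
        else:
            ranges.append((start, end))
    return ranges


# Second lspci example: 64K BAR1, 16 vectors, table at 0xe000, PBA at 0xf000.
areas = [(0xE000, 16 * 16), (0xF000, 8)]
for page_size in (4096, 65536):
    trapped = trapped_ranges(0x10000, areas, page_size)
    total = sum(end - start for start, end in trapped)
    print(page_size, [(hex(s), hex(e)) for s, e in trapped], total)
```

Under this model a 4K host traps only the last 8K of the BAR, while a 64K host traps all of it, which is the situation relocation is meant to address.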
      
      Upon finding a device and platform in need of MSI-X relocation, the
      next problem is how to choose the target PCI BAR to host the MSI-X data
      structures.  A few key rules to keep in mind for this selection
      include:
      
       * There are only 6 BAR slots, bar0..bar5
       * 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
       * PCI BARs are always a power of 2 in size, extending == doubling
       * The maximum size of a 32-bit BAR is 2GB
       * MSI-X data structures must reside in an MMIO BAR
      
      Using these rules, we can evaluate each BAR of the second example
      device above as follows:
      
       bar0: I/O port BAR, incompatible with MSI-X tables
       bar1: BAR could be extended, incurring another 64KB of MMIO
       bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
       bar3: BAR could be extended, incurring another 256KB of MMIO
       bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
       bar5: Available, empty BAR, minimum additional MMIO
      
      A secondary optimization we might wish to make in relocating MSI-X
      is to minimize the additional MMIO required for the device, therefore
      we might test the available choices in order of preference as bar5,
      bar1, and finally bar3.  The original proposal for this feature
      included an 'auto' option which would choose bar5 in this case, but
      various drivers have been found that make assumptions about the
      properties of the "first" BAR or the size of BARs such that there
      appears to be no foolproof automatic selection available, requiring
      known good combinations to be sourced from users.  This patch is
      pre-enabled for an 'auto' selection making use of a validated lookup
      table, but no entries are yet identified.
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
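The BAR selection rules listed in this commit message can be sketched as a small candidate evaluation. This is only an illustration of the rules as stated, not the selection code in QEMU: the function name `evaluate_bars`, the dictionary layout, and the `new_bar_bytes` value (the MMIO needed when an empty slot is used) are all assumptions made for the example.

```python
MAX_32BIT_BAR = 2 << 30  # 2GB, the largest 32-bit BAR per the rules above


def evaluate_bars(bars, new_bar_bytes):
    """Return (slot, additional_mmio_bytes) candidates, cheapest first.

    bars maps a slot number to {"kind": "io" | "mem", "size": int,
    "is64": bool}; slots missing from the dict are empty.
    """
    # A 64-bit BAR also consumes the following slot.
    consumed = {slot + 1 for slot, bar in bars.items() if bar["is64"]}

    candidates = []
    for slot in range(6):  # only bar0..bar5 exist
        if slot in consumed:
            continue
        bar = bars.get(slot)
        if bar is None:
            candidates.append((slot, new_bar_bytes))  # create a new BAR
        elif bar["kind"] == "io":
            continue  # MSI-X structures must live in an MMIO BAR
        elif not bar["is64"] and bar["size"] * 2 > MAX_32BIT_BAR:
            continue  # doubling would exceed the 32-bit limit
        else:
            candidates.append((slot, bar["size"]))  # extending == doubling
    return sorted(candidates, key=lambda c: c[1])


# The SAS controller from the second lspci example.
sas = {
    0: {"kind": "io", "size": 256, "is64": False},
    1: {"kind": "mem", "size": 64 * 1024, "is64": True},
    3: {"kind": "mem", "size": 256 * 1024, "is64": True},
}
# 4096 is an assumed size for a freshly created MSI-X BAR.
print(evaluate_bars(sas, new_bar_bytes=4096))
```

For this device the ordering comes out as bar5, bar1, bar3, matching the preference order given in the message. As the message notes, the cheapest candidate is not necessarily safe, since some drivers make assumptions about BAR layout.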
    • qapi: Create DEFINE_PROP_OFF_AUTO_PCIBAR · c3bbbdbf
      Alex Williamson committed
      Add an option which allows the user to specify a PCI BAR number,
      including an 'off' and 'auto' selection.
      
      Cc: Markus Armbruster <armbru@redhat.com>
      Cc: Eric Blake <eblake@redhat.com>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
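To make the accepted values concrete, here is a minimal sketch of parsing an off/auto/BAR-number choice such as the one used by x-msix-relocation. It is not the QAPI enum or the property code added by this commit; `parse_off_auto_pcibar` is a hypothetical name, and the value set (`off`, `auto`, `bar0`..`bar5`) is taken from the MSI-X relocation commit message.

```python
def parse_off_auto_pcibar(value):
    """Return 'off', 'auto', or a BAR index 0..5; raise ValueError otherwise."""
    if value in ("off", "auto"):
        return value
    if len(value) == 4 and value.startswith("bar") and value[3] in "012345":
        return int(value[3])
    raise ValueError("expected off, auto, or bar0..bar5, got %r" % (value,))


for choice in ("off", "auto", "bar5"):
    print(choice, "->", parse_off_auto_pcibar(choice))
```

Rejecting anything outside bar0..bar5 reflects the fact that a PCI function has only six BAR slots.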
    • vfio/pci: Emulate BARs · 04f336b0
      Alex Williamson committed
      The kernel provides similar emulation of PCI BAR register access to
      QEMU, so up until now we've used that for things like BAR sizing and
      storing the BAR address.  However, if we intend to resize BARs or add
      BARs that don't exist on the physical device, we need to switch to the
      pure QEMU emulation of the BAR.
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
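The "BAR sizing" mentioned in this commit refers to the standard PCI probe: software writes all ones to the BAR register and reads back which address bits stuck. The sketch below models that for a 32-bit memory BAR to show why the emulating side controls the size a guest sees. It is a toy model, not QEMU or kernel code; the class and function names are hypothetical.

```python
class EmulatedMemBar32:
    """Toy model of a 32-bit memory BAR register."""

    def __init__(self, size, flags=0x0):
        # Memory BARs are a power of two in size and at least 16 bytes.
        assert size >= 16 and size & (size - 1) == 0
        self.size = size
        self.flags = flags & 0xF  # low 4 bits are read-only type bits
        self.addr = 0

    def write(self, value):
        # Address bits below the BAR size are hardwired to zero.
        self.addr = value & ~(self.size - 1) & 0xFFFFFFFF

    def read(self):
        return self.addr | self.flags


def probe_size(bar):
    """Standard sizing probe: write all ones, read back, restore."""
    saved = bar.read()
    bar.write(0xFFFFFFFF)
    mask = bar.read() & 0xFFFFFFF0
    bar.write(saved)
    return (~mask & 0xFFFFFFFF) + 1


print(probe_size(EmulatedMemBar32(64 * 1024)))       # physical size
print(probe_size(EmulatedMemBar32(2 * 64 * 1024)))   # extended (doubled) BAR
```

Because the probe result depends only on the register's masking behaviour, whoever emulates the register decides the size, which is why resized or added BARs require the pure QEMU emulation described above.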
    • vfio/pci: Add base BAR MemoryRegion · 3a286732
      Alex Williamson committed
      Add one more layer to our stack of MemoryRegions.  This base region
      allows us to register BARs independently of the vfio region or to
      extend the size of BARs which do map to a region.  This will be
      useful when we want hypervisor defined BARs or sections of BARs,
      for purposes such as relocating MSI-X emulation.  We therefore call
      msix_init() based on this new base MemoryRegion, while the quirks,
      which only modify regions, still operate on those sub-MemoryRegions.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/pci: Fixup VFIOMSIXInfo comment · edd09278
      Alex Williamson committed
      The fields were removed in the referenced commit, but the comment
      still mentions them.
      
      Fixes: 2fb9636e ("vfio-pci: Remove unused fields from VFIOMSIXInfo")
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device · 9ded780c
      Alexey Kardashevskiy committed
      In order to enable TCE operations support in KVM, we have to inform
      the KVM about VFIO groups being attached to specific LIOBNs;
      the necessary bits are implemented already by IOMMU MR and VFIO.
      
      This defines get_attr() for the SPAPR TCE IOMMU MR which makes VFIO
      call the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl and establish
      LIOBN-to-IOMMU link.
      
      This changes spapr_tce_set_need_vfio() to avoid TCE table reallocation
      if the kernel supports the TCE acceleration.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      [aw - remove unnecessary sys/ioctl.h include]
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • vfio/spapr: Use iommu memory region's get_attr() · 07bc681a
      Alexey Kardashevskiy committed
      In order to enable TCE operations support in KVM, we have to inform
      the KVM about VFIO groups being attached to specific LIOBNs. The KVM
      already knows about VFIO groups; the only bit missing is which
      in-kernel TCE table (the one with user visible TCEs) should update
      the attached groups. There is a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE
      attribute of the VFIO KVM device which receives a groupfd/tablefd pair.
      
      This uses a new memory_region_iommu_get_attr() helper to get the IOMMU fd
      and calls KVM to establish the link.
      
      As get_attr() is not implemented yet, this should cause no behavioural
      change.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • memory/iommu: Add get_attr() · f1334de6
      Alexey Kardashevskiy committed
      This adds get_attr() to IOMMUMemoryRegionClass, like
      iommu_ops::domain_get_attr in the Linux kernel.
      
      This defines the first attribute - IOMMU_ATTR_SPAPR_TCE_FD - which
      will be used between the pSeries machine and VFIO-PCI.
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
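The three get_attr() commits in this series follow an optional-callback pattern: a region class may or may not implement the hook, and the caller only acts when an attribute is actually returned. The Python sketch below models that pattern in miniature. It is not the QEMU C API: the class names, the `(status, value)` return convention, and the use of `-EINVAL` for a missing hook are assumptions made for illustration; only the attribute name mirrors `IOMMU_ATTR_SPAPR_TCE_FD` from the commit message.

```python
import errno
from enum import Enum


class IOMMUAttr(Enum):
    SPAPR_TCE_FD = 1  # mirrors IOMMU_ATTR_SPAPR_TCE_FD


class IOMMURegion:
    """Base region: get_attr is not implemented."""
    get_attr = None


class SpaprTceRegion(IOMMURegion):
    """Region that can report the fd of its in-kernel TCE table."""

    def __init__(self, fd):
        self.fd = fd

    def get_attr(self, attr):
        if attr is IOMMUAttr.SPAPR_TCE_FD and self.fd >= 0:
            return 0, self.fd
        return -errno.EINVAL, None


def iommu_get_attr(region, attr):
    """Helper in the spirit of memory_region_iommu_get_attr()."""
    if region.get_attr is None:
        return -errno.EINVAL, None  # assumed error for a missing hook
    return region.get_attr(attr)


for region in (IOMMURegion(), SpaprTceRegion(fd=7)):
    status, table_fd = iommu_get_attr(region, IOMMUAttr.SPAPR_TCE_FD)
    if status == 0:
        print("would link VFIO group to TCE table fd", table_fd)
    else:
        print("no attribute, behaviour unchanged")
```

This ordering is why the VFIO-side commit could land with no behavioural change: until a region implements the hook, the caller always takes the second branch.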
  2. 06 Feb, 2018 26 commits