1. 08 Feb, 2018 2 commits
    • P
      Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging · 7b213bb4
      Peter Maydell committed
      * socket option parsing fix (Daniel)
      * SCSI fixes (Fam)
      * Readline double-free fix (Greg)
      * More HVF attribution fixes (Izik)
      * WHPX (Windows Hypervisor Platform Extensions) support (Justin)
      * POLLHUP handler (Klim)
      * ivshmem fixes (Ladi)
      * memfd memory backend (Marc-André)
      * improved error message (Marcelo)
      * Memory fixes (Peter Xu, Zhecheng)
      * Remove obsolete code and comments (Peter M.)
      * qdev API improvements (Philippe)
      * Add CONFIG_I2C switch (Thomas)
      
      # gpg: Signature made Wed 07 Feb 2018 15:24:08 GMT
      # gpg:                using RSA key BFFBD25F78C7AE83
      # gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
      # gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
      # Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
      #      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83
      
      * remotes/bonzini/tags/for-upstream: (47 commits)
        Add the WHPX acceleration enlightenments
        Introduce the WHPX impl
        Add the WHPX vcpu API
        Add the Windows Hypervisor Platform accelerator.
        tests/test-filter-redirector: move close()
        tests: use memfd in vhost-user-test
        vhost-user-test: make read-guest-mem setup its own qemu
        tests: keep compiling failing vhost-user tests
        Add memfd based hostmem
        memfd: add hugetlbsize argument
        memfd: add hugetlb support
        memfd: add error argument, instead of perror()
        cpus: join thread when removing a vCPU
        cpus: hvf: unregister thread with RCU
        cpus: tcg: unregister thread with RCU, fix exiting of loop on unplug
        cpus: dummy: unregister thread with RCU, exit loop on unplug
        cpus: kvm: unregister thread with RCU
        cpus: hax: register/unregister thread with RCU, exit loop on unplug
        ivshmem: Disable irqfd on device reset
        ivshmem: Improve MSI irqfd error handling
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      
      # Conflicts:
      #	cpus.c
      7b213bb4
    • P
      Merge remote-tracking branch 'remotes/armbru/tags/pull-error-2018-02-06' into staging · 17a5bbb4
      Peter Maydell committed
      Error reporting patches for 2018-02-06
      
      # gpg: Signature made Tue 06 Feb 2018 19:48:30 GMT
      # gpg:                using RSA key 3870B400EB918653
      # gpg: Good signature from "Markus Armbruster <armbru@redhat.com>"
      # gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>"
      # Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653
      
      * remotes/armbru/tags/pull-error-2018-02-06:
        tcg: Replace fprintf(stderr, "*\n" with error_report()
        hw/xen*: Replace fprintf(stderr, "*\n" with error_report()
        hw/sparc*: Replace fprintf(stderr, "*\n" with error_report()
        hw/sd: Replace fprintf(stderr, "*\n" with DPRINTF()
        hw/ppc: Replace fprintf(stderr, "*\n" with error_report()
        hw/pci*: Replace fprintf(stderr, "*\n" with error_report()
        hw/openrisc: Replace fprintf(stderr, "*\n" with error_report()
        hw/moxie: Replace fprintf(stderr, "*\n" with error_report()
        hw/mips: Replace fprintf(stderr, "*\n" with error_report()
        hw/lm32: Replace fprintf(stderr, "*\n" with error_report()
        hw/dma: Replace fprintf(stderr, "*\n" with error_report()
        hw/arm: Replace fprintf(stderr, "*\n" with error_report()
        audio: Replace AUDIO_FUNC with __func__
        error: Improve documentation of error_append_hint()
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      17a5bbb4
  2. 07 Feb, 2018 38 commits
    • P
      Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20180206.0' into staging · ea62da09
      Peter Maydell committed
      VFIO updates 2018-02-06
      
       - SPAPR in-kernel TCE acceleration (Alexey Kardashevskiy)
      
       - MSI-X relocation (Alex Williamson)
      
       - Add missing platform mutex init (Eric Auger)
      
       - Redundant variable cleanup (Alexey Kardashevskiy)
      
       - Option to disable GeForce quirks (Alex Williamson)
      
      # gpg: Signature made Tue 06 Feb 2018 18:21:22 GMT
      # gpg:                using RSA key 239B9B6E3BB08B22
      # gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
      # gpg:                 aka "Alex Williamson <alex@shazbot.org>"
      # gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
      # gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"
      # Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22
      
      * remotes/awilliam/tags/vfio-update-20180206.0:
        vfio/pci: Add option to disable GeForce quirks
        vfio/common: Remove redundant copy of local variable
        hw/vfio/platform: Init the interrupt mutex
        vfio/pci: Allow relocating MSI-X MMIO
        qapi: Create DEFINE_PROP_OFF_AUTO_PCIBAR
        vfio/pci: Emulate BARs
        vfio/pci: Add base BAR MemoryRegion
        vfio/pci: Fixup VFIOMSIXInfo comment
        spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device
        vfio/spapr: Use iommu memory region's get_attr()
        memory/iommu: Add get_attr()
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      ea62da09
    • J
      Add the WHPX acceleration enlightenments · 19306806
      Justin Terry (VM) committed
      Implements the WHPX accelerator cpu enlightenments to actually use the whpx-all
      accelerator on Windows platforms.
      Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
      Message-Id: <1516655269-1785-5-git-send-email-juterry@microsoft.com>
      [Register/unregister VCPU thread with RCU. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      19306806
    • J
      Introduce the WHPX impl · 812d49f2
      Justin Terry (VM) committed
      Implements the Windows Hypervisor Platform accelerator (WHPX) target, which
      acts as a hypervisor accelerator for QEMU on the Windows platform. This gives
      QEMU much greater speed than the emulated x86_64 paths that are taken on
      Windows today.
      
      1. Adds support for vPartition management.
      2. Adds support for vCPU management.
      3. Adds support for MMIO/PortIO.
      4. Registers the WHPX ACCEL_CLASS.
      Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
      Message-Id: <1516655269-1785-4-git-send-email-juterry@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      812d49f2
    • J
      Add the WHPX vcpu API · 29b22c79
      Justin Terry (VM) committed
      Adds support for the Windows Hypervisor Platform accelerator (WHPX) stubs and
      introduces the whpx.h sysemu API for vcpu scheduling and
      management.
      Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
      Message-Id: <1516655269-1785-3-git-send-email-juterry@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      29b22c79
    • J
      Add the Windows Hypervisor Platform accelerator. · d661d9a4
      Justin Terry (VM) committed
      Introduces the configure support for the new Windows Hypervisor Platform that
      allows for hypervisor acceleration from usermode components on the Windows
      platform.
      Signed-off-by: Justin Terry (VM) <juterry@microsoft.com>
      Message-Id: <1516655269-1785-2-git-send-email-juterry@microsoft.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d661d9a4
    • K
      tests/test-filter-redirector: move close() · 8f6d7010
      Klim Kireev committed
      Since we have a separate handler for POLLHUP, which drops data
      after the connection is closed, we need to fix this test, because
      it sends data and instantly closes the socket, creating a race condition.
      In some cases the client on the other end of the socket closes it before
      reading the data. To prevent this, I suggest closing the socket after receiving.
      Signed-off-by: Klim Kireev <klim.kireev@virtuozzo.com>
      Message-Id: <20180201134831.17709-1-klim.kireev@virtuozzo.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8f6d7010
    • M
      tests: use memfd in vhost-user-test · 7e49f5e8
      Marc-André Lureau committed
      This will exercise the memfd memory backend and should generally be
      better for testing than memory-backend-file (thanks to anonymous files
      and sealing).
      
      If memfd is available, it is preferred.
      
      However, in order to check that file & memfd backends both work
      correctly, the read-guest-mem test is checked explicitly for each.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180201132757.23063-8-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7e49f5e8
    • M
      vhost-user-test: make read-guest-mem setup its own qemu · 83265145
      Marc-André Lureau committed
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180201132757.23063-7-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      83265145
    • M
      tests: keep compiling failing vhost-user tests · 7a9ec654
      Marc-André Lureau committed
      Let's protect the failing tests under a QTEST_VHOST_USER_FIXME
      environment variable, so we keep compiling the tests and we can easily
      run them.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180201132757.23063-6-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7a9ec654
    • M
      Add memfd based hostmem · dbb9e0f4
      Marc-André Lureau committed
      Add a new memory backend, similar to hostmem-file, except that it
      doesn't need to create files. It also enforces memory sealing.
      
      This backend is mainly useful for sharing the memory with other
      processes.
      
      Note that Linux has supported transparent huge pages for shmem/memfd
      memory since 4.8. Setting up THP by writing "madvise" to
      /sys/kernel/mm/transparent_hugepage/shmem_enabled is easier than
      setting up a dedicated hugepage mount point.
      
      Since 4.14, memfd allows the hugetlb requirement to be set explicitly.
      
      Pending for merge in 4.16 is memfd sealing support for hugetlb-backed
      memory.
      
      Usage:
      -object memory-backend-memfd,id=mem1,size=1G
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180201132757.23063-5-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dbb9e0f4
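
To make "no files needed" and "memory sealing" a little more concrete, here is a minimal sketch of the underlying system calls, written with Python's standard-library wrappers rather than QEMU's C code. The helper name `create_sealed_memfd` and the particular set of seals are assumptions chosen for illustration; the commit does not spell out which seals the backend applies. It assumes Linux and Python 3.8 or later, and returns `None` elsewhere.

```python
import fcntl
import os


def create_sealed_memfd(name, size):
    """Create an anonymous memfd of `size` bytes and seal its size.

    Hypothetical helper for illustration only. Returns the file descriptor,
    or None when the platform does not offer memfd_create() or sealing.
    """
    if not hasattr(os, "memfd_create") or not hasattr(fcntl, "F_ADD_SEALS"):
        return None
    try:
        # MFD_ALLOW_SEALING is required, otherwise F_ADD_SEALS is refused.
        fd = os.memfd_create(name, os.MFD_CLOEXEC | os.MFD_ALLOW_SEALING)
    except OSError:
        return None
    os.ftruncate(fd, size)
    # Assumed seal set: forbid shrinking, growing, and adding further seals.
    seals = fcntl.F_SEAL_SHRINK | fcntl.F_SEAL_GROW | fcntl.F_SEAL_SEAL
    fcntl.fcntl(fd, fcntl.F_ADD_SEALS, seals)
    return fd
```

Once sealed, the descriptor can be passed to another process, which can then rely on the size not changing underneath it; an attempt to resize the file fails with an error.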
    • M
      memfd: add hugetlbsize argument · 2ef8c0c9
      Marc-André Lureau committed
      Learn to specify the hugetlb size as a qemu_memfd_create() argument.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180201132757.23063-4-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2ef8c0c9
    • M
      memfd: add hugetlb support · c5b2a9e0
      Marc-André Lureau committed
      Linux commit 749df87bd7bee5a79cef073f5d032ddb2b211de8 (v4.14-rc1)
      added a new flag MFD_HUGETLB to memfd_create() that specifies that the
      file to be created resides in the hugetlbfs filesystem.  This is the
      generic hugetlbfs filesystem not associated with any specific mount
      point.
      
      hugetlbfs does not support sealing operations in v4.14, therefore
      specifying MFD_ALLOW_SEALING with MFD_HUGETLB will result in EINVAL.
      
      However, I added sealing support in "[PATCH v3 0/9] memfd: add sealing
      to hugetlb-backed memory" series, queued in -mm tree for v4.16.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180201132757.23063-3-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c5b2a9e0
    • M
      memfd: add error argument, instead of perror() · 0f2956f9
      Marc-André Lureau committed
      This will allow callers to silence the error report when the call is
      allowed to fail.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180201132757.23063-2-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0f2956f9
    • P
      cpus: join thread when removing a vCPU · dbadee4f
      Paolo Bonzini committed
      If no one joins the thread, its associated memory is leaked.
      Reported-by: CheneyLin <linzc@zju.edu.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      dbadee4f
    • P
      cpus: hvf: unregister thread with RCU · 8178e637
      Paolo Bonzini committed
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8178e637
    • P
      cpus: tcg: unregister thread with RCU, fix exiting of loop on unplug · 9b0605f9
      Paolo Bonzini committed
      Keep running until cpu_can_run(cpu) becomes false, for consistency
      with other accelerators.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      9b0605f9
    • P
      d2831ab0
    • P
      cpus: kvm: unregister thread with RCU · 57615ed5
      Paolo Bonzini committed
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      57615ed5
    • P
    • L
      ivshmem: Disable irqfd on device reset · a4022791
      Ladi Prosek committed
      The effects of ivshmem_enable_irqfd() were not undone on device reset.
      
      This manifested as:
      ivshmem_add_kvm_msi_virq: Assertion `!s->msi_vectors[vector].pdev' failed.
      
      when irqfd was enabled before reset and then enabled again after reset, making
      ivshmem_enable_irqfd() run for the second time.
      
      To reproduce, run:
      
        ivshmem-server
      
      and QEMU with:
      
        -device ivshmem-doorbell,chardev=iv
        -chardev socket,path=/tmp/ivshmem_socket,id=iv
      
      then install the Windows driver, at the time of writing available at:
      
      https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/ivshmem
      
      and crash-reboot the guest by inducing a BSOD.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Message-Id: <20171211072110.9058-5-lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a4022791
    • L
      ivshmem: Improve MSI irqfd error handling · 0b88dd94
      Ladi Prosek committed
      Adds a rollback path to ivshmem_enable_irqfd() and fixes
      ivshmem_disable_irqfd() to bail if irqfd has not been enabled.
      
      To reproduce, run:
      
        ivshmem-server -n 0
      
      and QEMU with:
      
        -device ivshmem-doorbell,chardev=iv
        -chardev socket,path=/tmp/ivshmem_socket,id=iv
      
      then load, unload, and load again the Windows driver, at the time of writing
      available at:
      
      https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/ivshmem
      
      The issue is believed to have been masked by other guest drivers, notably
      Linux ones, not enabling MSI-X on the device.
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <20171211072110.9058-4-lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0b88dd94
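
The two behaviors described in this commit, a rollback path on enable and an early bail-out on disable, can be sketched with a toy model. This is not the ivshmem code: the class `IrqfdModel`, its fields, and the `failing_vector` knob are all hypothetical and exist only to show the control flow.

```python
class IrqfdModel:
    """Toy stand-in for a device with per-vector irqfd state (not QEMU code)."""

    def __init__(self, n_vectors, failing_vector=None):
        self.n_vectors = n_vectors
        self.failing_vector = failing_vector  # vector whose setup fails, if any
        self.attached = []                    # vectors with an irqfd attached
        self.enabled = False

    def _attach(self, vector):
        if vector == self.failing_vector:
            raise RuntimeError(f"cannot attach irqfd for vector {vector}")
        self.attached.append(vector)

    def _detach(self, vector):
        self.attached.remove(vector)

    def enable_irqfd(self):
        try:
            for vector in range(self.n_vectors):
                self._attach(vector)
        except RuntimeError:
            # Rollback path: undo the vectors that were already set up.
            for vector in reversed(list(self.attached)):
                self._detach(vector)
            return False
        self.enabled = True
        return True

    def disable_irqfd(self):
        if not self.enabled:
            return  # bail out: nothing was enabled, so nothing to undo
        for vector in reversed(list(self.attached)):
            self._detach(vector)
        self.enabled = False
```

With a failing vector, `enable_irqfd()` leaves no half-configured vectors behind, and a later `disable_irqfd()` is a harmless no-op.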
    • L
      ivshmem: Always remove irqfd notifiers · 089fd803
      Ladi Prosek committed
      As of commit 660c97ee ("ivshmem: use kvm irqfd for msi notifications"),
      QEMU crashes with:
      
      ivshmem: msix_set_vector_notifiers failed
      msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && dev->msix_vector_release_notifier' failed.
      
      if MSI-X is repeatedly enabled and disabled on the ivshmem device, for example
      by loading and unloading the Windows ivshmem driver. This is because
      msix_unset_vector_notifiers() doesn't call any of the release notifier callbacks
      since MSI-X is already disabled at that point (msix_enabled() returning false
      is how this transition is detected in the first place). Thus ivshmem_vector_mask()
      doesn't run and when MSI-X is subsequently enabled again ivshmem_vector_unmask()
      fails.
      
      This is fixed by keeping track of unmasked vectors and making sure that
      ivshmem_vector_mask() always runs on MSI-X disable.
      
      Fixes: 660c97ee ("ivshmem: use kvm irqfd for msi notifications")
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <20171211072110.9058-3-lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      089fd803
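
The idea of "keeping track of unmasked vectors" so that the mask callback always runs on MSI-X disable can be illustrated with a small model. The names below (`MsixVectorModel`, `routes`, `unmasked`) are hypothetical and the error raised in `vector_unmask()` merely stands in for the failure quoted above; this is a sketch of the bookkeeping, not the actual ivshmem implementation.

```python
class MsixVectorModel:
    """Toy model of per-vector unmask/mask bookkeeping (not QEMU code)."""

    def __init__(self):
        self.routes = set()    # vectors with an irqfd route installed
        self.unmasked = set()  # vectors the guest has unmasked

    def vector_unmask(self, vector):
        if vector in self.routes:
            # Stand-in for the failure seen when a stale route is left behind.
            raise RuntimeError(f"vector {vector} already has a route")
        self.routes.add(vector)
        self.unmasked.add(vector)

    def vector_mask(self, vector):
        self.routes.discard(vector)
        self.unmasked.discard(vector)

    def msix_disable(self):
        # The release callbacks are not invoked once MSI-X is already off,
        # so mask every vector that is still recorded as unmasked.
        for vector in sorted(self.unmasked):
            self.vector_mask(vector)
```

Because `msix_disable()` walks the recorded set, repeated enable/disable cycles start from a clean state each time.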
    • L
      ivshmem: Don't update non-existent MSI routes · e6a354be
      Ladi Prosek committed
      As of commit 660c97ee ("ivshmem: use kvm irqfd for msi notifications"),
      QEMU crashes with:
      
        kvm_irqchip_commit_routes: Assertion `ret == 0' failed.
      
      if the ivshmem device is configured with more vectors than what the server
      supports. This is caused by the ivshmem_vector_unmask() being called on
      vectors that have not been initialized by ivshmem_add_kvm_msi_virq().
      
      This commit fixes it by adding a simple check to the mask and unmask
      callbacks.
      
      Note that the opposite mismatch, if the server supplies more vectors than
      what the device is configured for, is already handled and leads to output
      like:
      
        Too many eventfd received, device has 1 vectors
      
      To reproduce the assert, run:
      
        ivshmem-server -n 0
      
      and QEMU with:
      
        -device ivshmem-doorbell,chardev=iv
        -chardev socket,path=/tmp/ivshmem_socket,id=iv
      
      then load the Windows driver, at the time of writing available at:
      
      https://github.com/virtio-win/kvm-guest-drivers-windows/tree/master/ivshmem
      
      The issue is believed to have been masked by other guest drivers, notably
      Linux ones, not enabling MSI-X on the device.
      
      Fixes: 660c97ee ("ivshmem: use kvm irqfd for msi notifications")
      Signed-off-by: Ladi Prosek <lprosek@redhat.com>
      Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <20171211072110.9058-2-lprosek@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e6a354be
    • K
      chardev/char-socket: add POLLHUP handler · a8aa6197
      Klim Kireev committed
      The following behavior was observed for QEMU configured by libvirt
      to use the guest agent as usual, for guests without a virtio-serial
      driver (Windows, or a guest still in the BIOS stage).
      
      In QEMU, on the first connect to a listening character device socket,
      the listen socket is removed from poll just after the accept().
      virtio_serial_guest_ready() returns 0 and the descriptor
      of the connected Unix socket is removed from poll, and it will
      not be present in poll() until the guest initializes the driver
      and changes the state of the serial to "guest connected".
      
      In libvirt, connect() to the guest agent is performed on restart and
      is run under the VM state lock. connect() is blocking and can
      wait forever.
      In this case libvirt cannot perform ANY operation on that VM.
      
      The bug can be easily reproduced this way:
      
      Terminal 1:
      qemu-system-x86_64 -m 512 -device pci-serial,chardev=serial1 -chardev socket,id=serial1,path=/tmp/console.sock,server,nowait
      (virtio-serial and isa-serial also fit)
      
      Terminal 2:
      minicom -D unix\#/tmp/console.sock
      (type something and press enter)
      C-a x (to exit)
      
      Do 3 times:
      minicom -D unix\#/tmp/console.sock
      C-a x
      
      It needs 4 connections, because the first one is accepted by QEMU, then two are queued by
      the kernel, and the 4th blocks.
      
      The problem is that QEMU doesn't add a read watcher after a successful read
      until the guest device wants to acquire the received data, so
      I propose to install a separate POLLHUP watcher regardless of
      whether the device waits for data or not.
      Signed-off-by: Klim Kireev <klim.kireev@virtuozzo.com>
      Message-Id: <20180125135129.9305-1-klim.kireev@virtuozzo.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a8aa6197
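
The point of a separate hang-up watcher is that a closed peer can be noticed even when nobody is waiting for readable data. A minimal sketch of that idea, using Python's `select.poll` on a Unix socket pair instead of QEMU's chardev code: the function name `peer_hung_up` is hypothetical, and the behavior shown (POLLHUP reported for a closed stream-socket peer even with an empty event mask) is assumed to be that of Linux.

```python
import select
import socket


def peer_hung_up(sock, timeout_ms=1000):
    """Return True if the peer of `sock` has hung up (hypothetical helper)."""
    poller = select.poll()
    # Register with an empty event mask: we do not ask for POLLIN at all.
    # poll() still reports POLLHUP/POLLERR, which are always monitored.
    poller.register(sock.fileno(), 0)
    for _fd, revents in poller.poll(timeout_ms):
        if revents & select.POLLHUP:
            return True
    return False


a, b = socket.socketpair()
hung_up_before = peer_hung_up(a, 0)  # peer still open
b.close()
hung_up_after = peer_hung_up(a)      # peer closed
a.close()
```

The watcher here never reads from the socket, which mirrors the situation in the commit: the device is not ready for data, yet the disconnect still needs to be seen.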
    • P
      memory: do explicit cleanup when remove listeners · d25836ca
      Peter Xu committed
      When unregistering memory listeners, we should call, e.g.,
      region_del() (and possibly other undo operations) on every existing
      memory region section there, otherwise we may leak resources that are
      held during the region_add(). This patch undoes that work for the
      listeners, which emulates the case where the address space changes from
      its current state to an empty state.
      
      I found this problem when debugging a refcount leak that leads to
      a lost device unplug event (please see the "Bug:" line below).  In that
      case, the leaked resource is the PCI BAR memory region refcount.
      Since memory regions do not keep their own refcount but take one on
      their owner, the refcount of the vfio-pci device (the owner of the PCI
      BAR memory regions) is leaked, and the event goes missing.
      
      We had encountered similar issues before and fixed them another
      way (ee4c1128, "vhost: Release memory references on cleanup"). This
      patch can be seen as a higher-level fix for similar problems that are
      caused by resource leaks from memory listeners. So now we can remove
      the explicit unref of memory regions, since that will be done
      when the listeners are unregistered.
      
      Bug: https://bugzilla.redhat.com/show_bug.cgi?id=1531393
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20180122060244.29368-5-peterx@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d25836ca
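
The fix described here, calling region_del() for every existing section when a listener is unregistered, can be modeled in a few lines. The classes below (`Region`, `RefTakingListener`, `AddressSpaceModel`) are hypothetical stand-ins invented for this sketch; they only mimic the pairing of region_add() and region_del(), not QEMU's memory API.

```python
class Region:
    """Toy memory region with a reference count (not QEMU code)."""

    def __init__(self, name):
        self.name = name
        self.refcount = 0


class RefTakingListener:
    """Listener that holds a reference for every region it was told about."""

    def region_add(self, region):
        region.refcount += 1

    def region_del(self, region):
        region.refcount -= 1


class AddressSpaceModel:
    def __init__(self, regions):
        self.regions = list(regions)
        self.listeners = []

    def register(self, listener):
        self.listeners.append(listener)
        for region in self.regions:
            listener.region_add(region)

    def unregister(self, listener):
        # Explicit cleanup: replay region_del() for every existing region,
        # as if the address space had become empty, before dropping the
        # listener. Without this loop the references would be leaked.
        for region in reversed(self.regions):
            listener.region_del(region)
        self.listeners.remove(listener)
```

In this model a register followed by an unregister returns every refcount to zero, which is the property the commit restores.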
    • P
      vfio: listener unregister before unset container · 36968626
      Peter Xu committed
      After the next patch, listener unregister will need the container to be
      alive.  Let's move this unregister phase before the container is unset,
      since that operation frees the backend container in the kernel;
      otherwise we'll get these errors after the next patch:
      
      qemu-system-x86_64: VFIO_UNMAP_DMA: -22
      qemu-system-x86_64: vfio_dma_unmap(0x559bf53a4590, 0x0, 0xa0000) = -22 (Invalid argument)
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20180122060244.29368-4-peterx@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      36968626
    • P
      arm: postpone device listener unregister · 0bbe4354
      Peter Xu committed
      This prepares for the follow-up patch that calls region_del() in
      memory_listener_unregister(); otherwise all device addresses attached to
      kvm_devices_head will be reset before kvm_arm_set_device_addr is called.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20180122060244.29368-3-peterx@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0bbe4354
    • P
      vhost: add traces for memory listeners · 0750b060
      Peter Xu committed
      Trace these operations on two memory listeners.  This helps to verify the
      new memory listener fix, and the traces are good to keep.
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Message-Id: <20180122060244.29368-2-peterx@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0750b060
    • M
      ucontext: annotate coroutine stack for ASAN · d83414e1
      Marc-André Lureau committed
      It helps ASAN to detect more leaks on coroutine stacks, and to get rid
      of some extra warnings.
      
      Before:
      
      tests/test-coroutine -p
      /basic/lifecycle
      /basic/lifecycle: ==20781==WARNING: ASan doesn't fully support
      makecontext/swapcontext functions and may produce false positives in
      some cases!
      ==20781==WARNING: ASan is ignoring requested __asan_handle_no_return:
      stack top: 0x7ffcb184d000; bottom 0x7ff6c4cfd000; size: 0x0005ecb50000
      (25446121472)
      False positive error reports may follow
      For details see https://github.com/google/sanitizers/issues/189
      OK
      
      After:
      
      tests/test-coroutine -p /basic/lifecycle
      /basic/lifecycle: ==21110==WARNING: ASan doesn't fully support
      makecontext/swapcontext functions and may produce false positives in
      some cases!
      OK
      
      Similar work would be needed for sigaltstack & windows fibers
      to get similar coverage. Since ucontext is preferred, I didn't bother
      checking the other coroutine implementations for now.
      
      Update travis to fix the build with ASAN annotations.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180116151152.4040-4-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d83414e1
    • M
      build-sys: add --enable-sanitizers · 247724cb
      Marc-André Lureau committed
      Typical slowdown introduced by AddressSanitizer is 2x.
      UBSan shouldn't have much impact on runtime cost.
      
      Enable them by default with --enable-debug, unless --disable-sanitizers
      is given.
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Message-Id: <20180116151152.4040-3-marcandre.lureau@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      247724cb
    • P
      Merge remote-tracking branch 'remotes/dgilbert/tags/pull-migration-20180206a' into staging · 0833df03
      Peter Maydell committed
      Migration pull 2018-02-06
      
      This is based off Juan's last pull with a few extras, but
      also removing:
         Add migration xbzrle test
         Add migration precopy test
      
      As well as my normal test boxes, I also gave it a test
      on a 32 bit ARM box and it seems happy (a Calxeda highbank)
      and a big-endian power box.
      
      Dave
      
      # gpg: Signature made Tue 06 Feb 2018 15:33:31 GMT
      # gpg:                using RSA key 0516331EBC5BFDE7
      # gpg: Good signature from "Dr. David Alan Gilbert (RH2) <dgilbert@redhat.com>"
      # Primary key fingerprint: 45F5 C71B 4A0C B7FB 977A  9FA9 0516 331E BC5B FDE7
      
      * remotes/dgilbert/tags/pull-migration-20180206a:
        migration: incoming postcopy advise sanity checks
        migration: Don't leak IO channels
        migration: Recover block devices if failure in device state
        tests: Adjust sleeps for migration test
        tests: Create migrate-start-postcopy command
        tests: Add deprecated commands migration test
        tests: Use consistent names for migration
        tests: Consolidate accelerators declaration
        tests: Remove deprecated migration tests commands
        migration: Drop current address parameter from save_zero_page()
        migration: use s->threshold_size inside migration_update_counters
        migration/savevm.c: set MAX_VM_CMD_PACKAGED_SIZE to 1ul << 32
        migration: Route errors down through migration_channel_connect
        migration: Allow migrate_fd_connect to take an Error *
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      0833df03
    • P
      Merge remote-tracking branch 'remotes/ehabkost/tags/python-next-pull-request' into staging · bc2943d6
      Peter Maydell committed
      Python queue, 2018-02-05
      
      # gpg: Signature made Mon 05 Feb 2018 23:07:57 GMT
      # gpg:                using RSA key 2807936F984DC5A6
      # gpg: Good signature from "Eduardo Habkost <ehabkost@redhat.com>"
      # Primary key fingerprint: 5A32 2FD5 ABC4 D3DB ACCF  D1AA 2807 936F 984D C5A6
      
      * remotes/ehabkost/tags/python-next-pull-request: (21 commits)
        docker: change Fedora images to run with python3
        travis: improve python version test coverage
        ui: update keycodemapdb to get py3 fixes
        input: add missing JIS keys to virtio input
        qemu.py: don't launch again before shutdown()
        qemu.py: cleanup redundant calls in launch()
        qemu.py: use poll() instead of 'returncode'
        qemu.py: always cleanup on shutdown()
        qemu.py: refactor launch()
        qemu.py: better control of created files
        qemu.py: remove unused import
        configure: allow use of python 3
        scripts: ensure signrom treats data as bytes
        qapi: force a UTF-8 locale for running Python
        qapi: ensure stable sort ordering when checking QAPI entities
        qapi: remove '-q' arg to diff when comparing QAPI output
        qapi: Adapt to moved location of 'maketrans' function in py3
        qapi: adapt to moved location of StringIO module in py3
        qapi: Use OrderedDict from standard library if available
        qapi: use items()/values() intead of iteritems()/itervalues()
        ...
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      bc2943d6
    • A
      vfio/pci: Add option to disable GeForce quirks · db32d0f4
      Alex Williamson committed
      These quirks are necessary for GeForce, but not for Quadro/GRID/Tesla
      assignment.  Leaving them enabled is fully functional and provides the
      most compatibility, but due to the unique NVIDIA MSI ACK behavior[1],
      it also introduces latency in re-triggering the MSI interrupt.  This
      overhead is typically negligible, but has been shown to adversely
      affect some (very) high interrupt rate applications.  This adds the
      vfio-pci device option "x-no-geforce-quirks=" which can be set to
      "on" to disable this additional overhead.
      
      A follow-on optimization for GeForce might be to make use of an
      ioeventfd to allow KVM to trigger an irqfd in the kernel vfio-pci
      driver, avoiding the bounce through userspace to handle this device
      write.
      
      [1] Background: the NVIDIA driver has been observed to issue a write
      to the MMIO mirror of PCI config space in BAR0 in order to allow the
      MSI interrupt for the device to retrigger.  Older reports indicated a
      write of 0xff to the (read-only) MSI capability ID register, while
      more recently a write of 0x0 is observed at config space offset 0x704,
      non-architected, extended config space of the device (BAR0 offset
      0x88704).  Virtualization of this range is only required for GeForce.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      db32d0f4
    • A
      vfio/common: Remove redundant copy of local variable · a5b04f7c
      Alexey Kardashevskiy committed
      There is already @hostwin in vfio_listener_region_add() so there is no
      point in having the other one.
      
      Fixes: 2e4109de ("vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)")
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      a5b04f7c
    • E
      hw/vfio/platform: Init the interrupt mutex · 89202c6f
      Eric Auger committed
      Add the initialization of the mutex protecting the interrupt list.
      Signed-off-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • A
      vfio/pci: Allow relocating MSI-X MMIO · 89d5202e
      Alex Williamson committed
      Recently proposed vfio-pci kernel changes (v4.16) remove the
      restriction preventing userspace from mmap'ing PCI BARs in areas
      overlapping the MSI-X vector table.  This change is primarily intended
      to benefit host platforms which make use of system page sizes larger
      than the PCI spec recommendation for alignment of MSI-X data
      structures (i.e. not x86_64).  In the case of POWER systems, the SPAPR
      spec requires the VM to program MSI-X using hypercalls, rendering the
      MSI-X vector table unused in the VM view of the device.  However,
      ARM64 platforms also support 64KB pages and rely on QEMU emulation of
      MSI-X.  Regardless of the kernel driver allowing mmaps overlapping
      the MSI-X vector table, emulation of the MSI-X vector table also
      prevents direct mapping of device MMIO spaces overlapping this page.
      Thanks to the fact that PCI devices have a standard self-discovery
      mechanism, we can try to resolve this by relocating the MSI-X data
      structures, either by creating a new PCI BAR or extending an existing
      BAR and updating the MSI-X capability for the new location.  There's
      even a very slim chance that this could benefit devices which do not
      adhere to the PCI spec alignment guidelines on x86_64 systems.
      
      This new x-msix-relocation option accepts the following choices:
      
        off: Disable MSI-X relocation, use native device config (default)
        auto: Use a known good combination for the platform/device (none yet)
        bar0..bar5: Specify the target BAR for MSI-X data structures
      
      If compatible, the target BAR will either be created or extended and
      the new portion will be used for MSI-X emulation.
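
      The accepted values listed above can be modeled with a short Python
      sketch.  This only illustrates the option grammar; the function name
      parse_msix_relocation and its return convention are hypothetical and
      are not how QEMU implements the property.

```python
import re
from typing import Union


def parse_msix_relocation(value: str) -> Union[str, int]:
    """Parse an x-msix-relocation value (hypothetical helper, not QEMU code).

    Returns "off" or "auto" unchanged, or the target BAR index (0..5)
    for "bar0".."bar5". Any other string raises ValueError.
    """
    if value in ("off", "auto"):
        return value
    match = re.fullmatch(r"bar([0-5])", value)
    if match:
        return int(match.group(1))
    raise ValueError(f"invalid x-msix-relocation value: {value!r}")


if __name__ == "__main__":
    for text in ("off", "auto", "bar0", "bar5"):
        print(text, "->", parse_msix_relocation(text))
```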
      
      The first obvious user question with this option is how to determine
      whether a given platform and device might benefit from this option.
      In most cases, the answer is that it won't, especially on x86_64.
      Devices often dedicate an entire BAR to MSI-X and therefore no
      performance sensitive registers overlap the MSI-X area.  Take for
      example:
      
      # lspci -vvvs 0a:00.0
      0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
      	...
      	Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
      	Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
      	...
      	Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
      		Vector table: BAR=3 offset=00000000
      		PBA: BAR=3 offset=00002000
      
      This device uses the 16K bar3 for MSI-X with the vector table at
      offset zero and the pending bit array at offset 8K, fully honoring
      the PCI spec alignment guidance.  The data sheet specifically refers
      to this as an MSI-X BAR.  This device would not see a benefit from
      MSI-X relocation regardless of the platform or the page size.
      
      However, here's another example:
      
      # lspci -vvvs 02:00.0
      02:00.0 Serial Attached SCSI controller: xxxxxxxx
      	...
      	Region 0: I/O ports at c000 [size=256]
      	Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
      	Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
      	...
      	Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
      		Vector table: BAR=1 offset=0000e000
      		PBA: BAR=1 offset=0000f000
      
      Here the MSI-X data structures are placed on separate 4K pages at the
      end of a 64KB BAR.  If our host page size is 4K, we're likely fine,
      but at 64KB page size, MSI-X emulation at that location prevents the
      entire BAR from being directly mapped into the VM address space.
      Overlapping performance sensitive registers then starts to be a very
      likely scenario on such a platform.  At this point, the user could
      enable tracing on vfio_region_read and vfio_region_write to determine
      more conclusively if device accesses are being trapped through QEMU.
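
      The page size arithmetic for this example can be checked with a
      small Python sketch.  It is a simplified model, not QEMU's logic:
      it assumes any host page touched by the vector table or the PBA is
      trapped, and the helper names are hypothetical.  The entry sizes
      (16 bytes per vector table entry, one pending bit per vector
      rounded up to 64-bit words) come from the PCI spec.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (offset within the BAR, length in bytes)


def msix_regions(table_offset: int, pba_offset: int, vectors: int) -> List[Region]:
    """Return the vector table and PBA as (offset, length) pairs."""
    table_len = vectors * 16                # 16 bytes per table entry
    pba_len = ((vectors + 63) // 64) * 8    # one bit per vector, in qwords
    return [(table_offset, table_len), (pba_offset, pba_len)]


def trapped_pages(page_size: int, regions: List[Region]) -> List[int]:
    """Return the indexes of host pages touched by any emulated region."""
    pages = set()
    for offset, length in regions:
        first = offset // page_size
        last = (offset + length - 1) // page_size
        pages.update(range(first, last + 1))
    return sorted(pages)


def mappable_bytes(bar_size: int, page_size: int, regions: List[Region]) -> int:
    """Bytes of the BAR left for direct mapping under this simplified model."""
    trapped = 0
    for page in trapped_pages(page_size, regions):
        start = page * page_size
        end = min(start + page_size, bar_size)
        trapped += end - start
    return bar_size - trapped


if __name__ == "__main__":
    # Second lspci example: 64K bar1, table at 0xe000, PBA at 0xf000, 16 vectors.
    sas = msix_regions(0xE000, 0xF000, 16)
    print(mappable_bytes(0x10000, 4 * 1024, sas))   # 57344: only the last two 4K pages trap
    print(mappable_bytes(0x10000, 64 * 1024, sas))  # 0: the whole BAR shares one 64K page
```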
      
      Upon finding a device and platform in need of MSI-X relocation, the
      next problem is how to choose the target PCI BAR to host the MSI-X
      data structures.  A few key rules to keep in mind for this selection
      include:
      
       * There are only 6 BAR slots, bar0..bar5
       * 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
       * PCI BARs are always a power of 2 in size, extending == doubling
       * The maximum size of a 32-bit BAR is 2GB
       * MSI-X data structures must reside in an MMIO BAR
      
      Using these rules, we can evaluate each BAR of the second example
      device above as follows:
      
       bar0: I/O port BAR, incompatible with MSI-X tables
       bar1: BAR could be extended, incurring another 64KB of MMIO
       bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
       bar3: BAR could be extended, incurring another 256KB of MMIO
       bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
       bar5: Available, empty BAR, minimum additional MMIO
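
      The rules and the per-BAR evaluation above can be expressed as a
      short Python sketch.  This is an illustration only, not the
      selection logic in QEMU: the Bar class, evaluate_bars() and
      preference_order() are hypothetical names, and the size of a newly
      created BAR is passed in as new_bar_size because the text only
      describes it as the minimum additional MMIO.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_32BIT_BAR = 2 * 1024**3  # a 32-bit BAR cannot exceed 2GB


@dataclass
class Bar:
    kind: str              # "io" or "mem"
    size: int              # current size in bytes, a power of 2
    is_64bit: bool = False


def evaluate_bars(bars: Dict[int, Bar],
                  new_bar_size: int) -> Dict[int, Tuple[str, Optional[int]]]:
    """Map each slot 0..5 to (verdict, additional MMIO bytes or None)."""
    # A 64-bit BAR also consumes the slot that follows it.
    upper_half = {slot + 1: slot for slot, bar in bars.items() if bar.is_64bit}
    result: Dict[int, Tuple[str, Optional[int]]] = {}
    for slot in range(6):
        bar = bars.get(slot)
        if slot in upper_half:
            result[slot] = (f"unavailable, used by 64-bit bar{upper_half[slot]}", None)
        elif bar is None:
            result[slot] = ("available, empty BAR", new_bar_size)
        elif bar.kind == "io":
            result[slot] = ("I/O port BAR, incompatible with MSI-X", None)
        elif not bar.is_64bit and bar.size * 2 > MAX_32BIT_BAR:
            result[slot] = ("32-bit BAR cannot be doubled past 2GB", None)
        else:
            # Extending means doubling, so the extra MMIO equals the current size.
            result[slot] = ("could be extended", bar.size)
    return result


def preference_order(evaluation: Dict[int, Tuple[str, Optional[int]]]) -> List[str]:
    """Usable slots sorted by additional MMIO, smallest first."""
    usable = [(cost, slot) for slot, (_, cost) in evaluation.items() if cost is not None]
    return [f"bar{slot}" for _, slot in sorted(usable)]


if __name__ == "__main__":
    # BAR layout of the second lspci example; 4096 is an assumed new-BAR size.
    sas = {
        0: Bar("io", 256),
        1: Bar("mem", 64 * 1024, is_64bit=True),
        3: Bar("mem", 256 * 1024, is_64bit=True),
    }
    evaluation = evaluate_bars(sas, new_bar_size=4096)
    for slot, (verdict, cost) in evaluation.items():
        print(f"bar{slot}: {verdict}", "" if cost is None else f"(+{cost} bytes)")
    print(preference_order(evaluation))
```

      Ranking the usable slots by additional MMIO is only one possible
      policy; it says nothing about whether a guest driver tolerates the
      resulting BAR layout.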
      
      A secondary optimization we might wish to make in relocating MSI-X
      is to minimize the additional MMIO required for the device, therefore
      we might test the available choices in order of preference as bar5,
      bar1, and finally bar3.  The original proposal for this feature
      included an 'auto' option which would choose bar5 in this case, but
      various drivers have been found that make assumptions about the
      properties of the "first" BAR or the size of BARs such that there
      appears to be no foolproof automatic selection available, requiring
      known good combinations to be sourced from users.  This patch is
      pre-enabled for an 'auto' selection making use of a validated lookup
      table, but no entries are yet identified.
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • A
      qapi: Create DEFINE_PROP_OFF_AUTO_PCIBAR · c3bbbdbf
      Alex Williamson committed
      Add an option which allows the user to specify a PCI BAR number,
      including an 'off' and 'auto' selection.
      
      Cc: Markus Armbruster <armbru@redhat.com>
      Cc: Eric Blake <eblake@redhat.com>
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • A
      vfio/pci: Emulate BARs · 04f336b0
      Alex Williamson committed
      The kernel provides similar emulation of PCI BAR register access to
      QEMU, so up until now we've used that for things like BAR sizing and
      storing the BAR address.  However, if we intend to resize BARs or add
      BARs that don't exist on the physical device, we need to switch to the
      pure QEMU emulation of the BAR.
      Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: Eric Auger <eric.auger@redhat.com>
      Tested-by: Eric Auger <eric.auger@redhat.com>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>