1. 04 Apr, 2014 1 commit
    • KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155) · 5678de3f
      Paolo Bonzini committed
      QE reported that they got the BUG_ON in ioapic_service to trigger.
      I cannot reproduce it, but there are two reasons why this could happen.
      
      The less likely but also easiest one is when kvm_irq_delivery_to_apic
      does not deliver to any APIC and returns -1.
      
      Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
      function is never reached.  However, you can target the similar loop in
      kvm_irq_delivery_to_apic_fast; just program a zero logical destination
      address into the IOAPIC, or an out-of-range physical destination address.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 21 Mar, 2014 4 commits
  3. 19 Mar, 2014 1 commit
    • KVM: eventfd: Fix lock order inversion. · 684a0b71
      Cornelia Huck committed
      When registering a new irqfd, we call its ->poll method to collect any
      event that might have previously been pending so that we can trigger it.
      This is done under the kvm->irqfds.lock, which means the eventfd's ctx
      lock is taken under it.
      
      However, if we get a POLLHUP in irqfd_wakeup, we will be called with the
      ctx lock held before getting the irqfds.lock to deactivate the irqfd,
      causing lockdep to complain.
      
      Calling the ->poll method does not really need the irqfds.lock, so let's
      just move it after we've given up the irqfds.lock in kvm_irqfd_assign().
      Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
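The ordering problem described in this commit can be illustrated with a small single-threaded model. This is only a sketch, not the kernel code: the `toy_*` helpers, the `LOCK_*` names, and the three scenario functions are all hypothetical, and the "locks" are plain flags that record which lock was taken while another was held, roughly the kind of bookkeeping a lock-order checker performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy model of the two locks named in the commit message (hypothetical). */
enum { LOCK_IRQFDS, LOCK_CTX, NLOCKS };

static bool held[NLOCKS];
/* order_seen[a][b] is true once lock b was taken while lock a was held. */
static bool order_seen[NLOCKS][NLOCKS];
static bool inversion_seen;

static void toy_reset(void)
{
    memset(held, 0, sizeof(held));
    memset(order_seen, 0, sizeof(order_seen));
    inversion_seen = false;
}

static void toy_lock(int lock)
{
    assert(!held[lock]); /* no recursive locking in this model */
    for (int h = 0; h < NLOCKS; h++) {
        if (!held[h])
            continue;
        if (order_seen[lock][h])
            inversion_seen = true; /* the opposite order was recorded earlier */
        order_seen[h][lock] = true;
    }
    held[lock] = true;
}

static void toy_unlock(int lock)
{
    held[lock] = false;
}

/* Registration that polls while still holding the irqfds lock. */
static void assign_poll_under_lock(void)
{
    toy_lock(LOCK_IRQFDS);
    toy_lock(LOCK_CTX);      /* stands in for ->poll taking the ctx lock */
    toy_unlock(LOCK_CTX);
    toy_unlock(LOCK_IRQFDS);
}

/* Registration that polls only after dropping the irqfds lock. */
static void assign_poll_after_unlock(void)
{
    toy_lock(LOCK_IRQFDS);
    toy_unlock(LOCK_IRQFDS);
    toy_lock(LOCK_CTX);
    toy_unlock(LOCK_CTX);
}

/* Hang-up path: entered with the ctx lock held, then takes the irqfds lock. */
static void wakeup_hangup(void)
{
    toy_lock(LOCK_CTX);
    toy_lock(LOCK_IRQFDS);
    toy_unlock(LOCK_IRQFDS);
    toy_unlock(LOCK_CTX);
}
```

In this model, running `assign_poll_under_lock()` followed by `wakeup_hangup()` sets `inversion_seen`, while `assign_poll_after_unlock()` followed by `wakeup_hangup()` does not, because the two locks are never nested in the irqfds-then-ctx order.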
  4. 13 Mar, 2014 1 commit
    • kvm: x86: ignore ioapic polarity · 100943c5
      Gabriel L. Somlo committed
      Both QEMU and KVM have already accumulated a significant number of
      optimizations based on the hard-coded assumption that ioapic polarity
      will always use the ActiveHigh convention, where the logical and
      physical states of level-triggered irq lines always match (i.e.,
      active(asserted) == high == 1, inactive == low == 0). QEMU guests
      are expected to follow directions given via ACPI and configure the
      ioapic with polarity 0 (ActiveHigh). However, even when misbehaving
      guests (e.g. OS X <= 10.9) set the ioapic polarity to 1 (ActiveLow),
      QEMU will still use the ActiveHigh signaling convention when
      interfacing with KVM.
      
      This patch modifies KVM to completely ignore ioapic polarity as set by
      the guest OS, enabling misbehaving guests to work alongside those which
      comply with the ActiveHigh polarity specified by QEMU's ACPI tables.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Gabriel L. Somlo <somlo@cmu.edu>
      [Move documentation to KVM_IRQ_LINE, add ia64. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
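As a rough illustration of the behaviour this commit message describes, the following sketch contrasts honouring a guest-programmed polarity bit with ignoring it. Both function names are hypothetical and the code is not taken from KVM; it only models the convention that level 1 means asserted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: line state if the guest-programmed polarity bit were honoured.
 * With polarity 1 (ActiveLow), a level of 1 would be read as inactive. */
static bool asserted_honouring_polarity(int level, int polarity)
{
    assert(polarity == 0 || polarity == 1);
    return (level != 0) != (polarity != 0);
}

/* Hypothetical: polarity bit ignored, so level 1 always means asserted. */
static bool asserted_ignoring_polarity(int level, int polarity)
{
    (void)polarity; /* deliberately unused */
    return level != 0;
}
```

For a guest that sets polarity to 1 while the signalling side still uses ActiveHigh, the first function reports an asserted line as inactive, whereas the second agrees with the sender.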
  5. 27 Feb, 2014 1 commit
  6. 18 Feb, 2014 1 commit
  7. 14 Feb, 2014 2 commits
  8. 04 Feb, 2014 1 commit
  9. 30 Jan, 2014 4 commits
  10. 15 Jan, 2014 2 commits
    • kvm: Provide kvm_vcpu_eligible_for_directed_yield() stub · 4a55dd72
      Scott Wood committed
      Commit 7940876e ("kvm: make local
      functions static") broke KVM PPC builds due to removing (rather than
      moving) the stub version of kvm_vcpu_eligible_for_directed_yield().
      
      This patch reintroduces it.
      Signed-off-by: Scott Wood <scottwood@freescale.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Alexander Graf <agraf@suse.de>
      [Move the #ifdef inside the function. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
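The pattern mentioned in the bracketed note, a single function whose body is selected by an #ifdef, might look roughly like the sketch below. The struct, its fields, the function name, and the `TOY_HAVE_CPU_RELAX_INTERCEPT` macro are all hypothetical stand-ins and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VCPU state used only by this sketch. */
struct toy_vcpu {
    bool in_spin_loop;
    bool dy_eligible;
};

/* One definition serves every configuration: the #ifdef sits inside the
 * function, so callers never see a missing symbol when the feature is off. */
static bool toy_eligible_for_directed_yield(const struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);
#ifdef TOY_HAVE_CPU_RELAX_INTERCEPT
    /* Feature enabled: some real eligibility check (illustrative only). */
    return !vcpu->in_spin_loop || vcpu->dy_eligible;
#else
    /* Feature disabled: the stub treats every VCPU as eligible. */
    return true;
#endif
}
```

Keeping the conditional inside the body avoids maintaining two separate definitions that can drift apart or be removed independently.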
    • kvm: vfio: silence GCC warning · e81d1ad3
      Paul Bolle committed
      Building vfio.o triggers a GCC warning (when building for 32 bits x86):
          arch/x86/kvm/../../../virt/kvm/vfio.c: In function 'kvm_vfio_set_group':
          arch/x86/kvm/../../../virt/kvm/vfio.c:104:22: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
            void __user *argp = (void __user *)arg;
                                ^
      
      Silence this warning by casting arg to unsigned long.
      
      argp's current type, "void __user *", is always cast to "int32_t
      __user *". So its type might as well be changed to "int32_t __user *".
      Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
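The cast discussed in this commit can be sketched in user-space C as follows. This is a minimal illustration, not the kernel patch: `arg_to_ptr`, `ptr_to_arg`, and `read_fd_arg` are hypothetical helpers, `uint64_t` stands in for the wide integer argument, and `uintptr_t` plays the role that `unsigned long` plays in the kernel (an integer type as wide as a pointer).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: recover an int32_t pointer from a 64-bit argument.
 * Going through uintptr_t first converts the integer to pointer width, so
 * the final cast is between same-sized types and does not trigger
 * -Wint-to-pointer-cast on 32-bit builds. */
static int32_t *arg_to_ptr(uint64_t arg)
{
    return (int32_t *)(uintptr_t)arg;
}

/* Hypothetical helper: the reverse direction, widening a pointer to 64 bits. */
static uint64_t ptr_to_arg(int32_t *p)
{
    return (uint64_t)(uintptr_t)p;
}

/* Hypothetical helper: read the int32_t that the argument points at. */
static int32_t read_fd_arg(uint64_t arg)
{
    int32_t *argp = arg_to_ptr(arg);

    assert(argp != NULL);
    return *argp;
}
```

Declaring the pointer as `int32_t *` from the start, as the commit message suggests, also removes the need for a second cast at each use.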
  11. 09 Jan, 2014 2 commits
  12. 22 Dec, 2013 10 commits
    • KVM: arm-vgic: Support CPU interface reg access · fa20f5ae
      Christoffer Dall committed
      Implement support for the CPU interface register access driven by MMIO
      address offsets from the CPU interface base address.  Useful for user
      space to support save/restore of the VGIC state.
      
      This commit adds support only for the same logic as the current VGIC
      support, and no more.  For example, the active priority registers are
      handled as RAZ/WI, just like setting priorities on the emulated
      distributor.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm-vgic: Add GICD_SPENDSGIR and GICD_CPENDSGIR handlers · 90a5355e
      Christoffer Dall committed
      Handle MMIO accesses to the two registers which should support both the
      case where the VMs want to read/write either of these registers and the
      case where user space reads/writes these registers to do save/restore of
      the VGIC state.
      
      Note that the added complexity compared to simple set/clear enable
      registers stems from the bookkeeping of source cpu ids.  It may be
      possible to change the underlying data structure to simplify the
      complexity, but since this is not in the critical path at all, this will
      do.
      
      Also note that reading this register from a live guest will not be
      accurate compared to on hardware, because some state may be living on
      the CPU LRs and the only way to give a consistent read would be to force
      stop all the VCPUs and request them to unqueue the LR state onto the
      distributor.  Until we have an actual user of live reading this
      register, we can live with the difference.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm-vgic: Support unqueueing of LRs to the dist · cbd333a4
      Christoffer Dall committed
      To properly access the VGIC state from user space it is very impractical
      to have to loop through all the LRs in all register access functions.
      Instead, support moving all pending state from LRs to the distributor,
      but leave active state LRs alone.
      
      Note that to accurately present the active and pending state to VCPUs
      reading these distributor registers from a live VM, we would have to
      stop all other VCPUs than the calling VCPU and ask each CPU to unqueue
      their LR state onto the distributor and add fields to track active state
      on the distributor side as well.  We don't have any users of such
      functionality yet and there are other inaccuracies of the GIC emulation,
      so don't provide accurate synchronized access to this state just yet.
      However, when the time comes, having this function should help.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm-vgic: Add vgic reg access from dev attr · c07a0191
      Christoffer Dall committed
      Add infrastructure to handle distributor and cpu interface register
      accesses through the KVM_{GET/SET}_DEVICE_ATTR interface by adding the
      KVM_DEV_ARM_VGIC_GRP_DIST_REGS and KVM_DEV_ARM_VGIC_GRP_CPU_REGS groups
      and defining the semantics of the attr field to be the MMIO offset as
      specified in the GICv2 specs.
      
      Missing register accesses and other changes in individual register access
      functions needed to support save/restore of the VGIC state are added in
      subsequent patches.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm-vgic: Make vgic mmio functions more generic · 1006e8cb
      Christoffer Dall committed
      Rename the vgic_ranges array to vgic_dist_ranges to be more specific and
      to prepare for handling CPU interface register access as well (for
      save/restore of VGIC state).
      
      Pass offset from distributor or interface MMIO base to
      find_matching_range function instead of the physical address of the
      access in the VM memory map.  This allows other callers unaware of the
      VM specifics, but with generic VGIC knowledge to reuse the function.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm-vgic: Set base addr through device API · ce01e4e8
      Christoffer Dall committed
      Support setting the distributor and cpu interface base addresses in the
      VM physical address space through the KVM_{SET,GET}_DEVICE_ATTR API
      in addition to the ARM specific API.
      
      This has the added benefit of being able to share more code in user
      space and do things in a uniform manner.
      
      Also deprecate the older API at the same time, but backwards
      compatibility will be maintained.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • KVM: arm-vgic: Support KVM_CREATE_DEVICE for VGIC · 7330672b
      Christoffer Dall committed
      Support creating the ARM VGIC device through the KVM_CREATE_DEVICE
      ioctl, which can later be used with KVM_{GET/SET}_DEVICE_ATTR.  That
      interface is useful both for setting addresses through a more generic
      API than the ARM-specific one and for save/restore of VGIC state.
      
      Adds KVM_CAP_DEVICE_CTRL to ARM capabilities.
      
      Note that we change the check for creating a VGIC from bailing out if
      any VCPUs were created, to bailing out if any VCPUs were ever run.  This
      is an important distinction that shouldn't break anything, but allows
      creating the VGIC after the VCPUs have been created.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • ARM: KVM: Allow creating the VGIC after VCPUs · e1ba0207
      Christoffer Dall committed
      Rework the VGIC initialization slightly to allow initialization of the
      vgic cpu-specific state even if the irqchip (the VGIC) hasn't been
      created by user space yet.  This is safe, because the vgic data
      structures are already allocated when the CPU is allocated if VGIC
      support is compiled into the kernel.  Further, the init process does not
      depend on any other information and the sacrifice is a slight
      performance degradation for creating VMs in the no-VGIC case.
      
      The reason is that the new device control API doesn't mandate creating
      the VGIC before creating the VCPU and it is unreasonable to require user
      space to create the VGIC before creating the VCPUs.
      
      At the same time move the irqchip_in_kernel check out of
      kvm_vcpu_first_run_init and into the init function to make the per-vcpu
      and global init functions symmetric and add comments on the exported
      functions making it a bit easier to understand the init flow by only
      looking at vgic.c.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • ARM/KVM: save and restore generic timer registers · 39735a3a
      Andre Przywara committed
      For migration to work we need to save (and later restore) the state of
      each core's virtual generic timer.
      Since this is per VCPU, we can use the [gs]et_one_reg ioctl and export
      the three needed registers (control, counter, compare value).
      Though they live in cp15 space, we don't use the existing list, since
      they need special accessor functions and the arch timer is optional.
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Andre Przywara <andre.przywara@linaro.org>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
    • arm/arm64: KVM: arch_timer: Initialize cntvoff at kvm_init · a1a64387
      Christoffer Dall committed
      Initialize the cntvoff at kvm_init_vm time, not before running the VCPUs
      for the first time, because that would overwrite any values potentially
      restored from user space.
      
      Cc: Andre Przywara <andre.przywara@linaro.org>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
  13. 13 Dec, 2013 2 commits
  14. 21 Nov, 2013 1 commit
    • KVM: kvm_clear_guest_page(): fix empty_zero_page usage · 8a3caa6d
      Heiko Carstens committed
      Using the address of 'empty_zero_page' as source address in order to
      clear a page is wrong. On some architectures empty_zero_page is only the
      pointer to the struct page of the empty_zero_page.  Therefore the clear
      page operation would copy the contents of a couple of struct pages instead
      of clearing a page.  For kvm only arm/arm64 are affected by this bug.
      
      To fix this use the ZERO_PAGE macro instead which will return the struct
      page address of the empty_zero_page on all architectures.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
  15. 06 Nov, 2013 1 commit
  16. 05 Nov, 2013 1 commit
    • KVM: IOMMU: hva align mapping page size · 27ef63c7
      Greg Edwards committed
      When determining the page size we could use to map with the IOMMU, the
      page size should also be aligned with the hva, not just the gfn.  The
      gfn may not reflect the real alignment within the hugetlbfs file.
      
      Most of the time, this works fine.  However, if the hugetlbfs file is
      backed by non-contiguous huge pages, a multi-huge page memslot starts at
      an unaligned offset within the hugetlbfs file, and the gfn is aligned
      with respect to the huge page size, kvm_host_page_size() will return the
      huge page size and we will use that to map with the IOMMU.
      
      When we later unpin that same memslot, the IOMMU returns the unmap size
      as the huge page size, and we happily unpin that many pfns in
      monotonically increasing order, not realizing we are spanning
      non-contiguous huge pages and partially unpin the wrong huge page.
      
      Ensure the IOMMU mapping page size is aligned with the hva corresponding
      to the gfn, which does reflect the alignment within the hugetlbfs file.
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Greg Edwards <gedwards@ddn.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
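The alignment rule from this commit message (the mapping size must be aligned with respect to both the gfn and the corresponding hva) can be sketched as below. This is an illustration under stated assumptions, not the kernel function: `pick_map_size` and `SKETCH_PAGE_SHIFT` are hypothetical names, 4 KiB base pages are assumed, and the hva is passed in directly instead of being looked up from a memslot.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB base pages */

/* Hypothetical helper: shrink a candidate mapping size until both the guest
 * physical address (gfn << SKETCH_PAGE_SHIFT) and the host virtual address
 * are aligned to it. */
static uint64_t pick_map_size(uint64_t gfn, uint64_t hva, uint64_t page_size)
{
    /* The candidate must be a power of two no smaller than a base page. */
    assert(page_size >= (1ULL << SKETCH_PAGE_SHIFT));
    assert((page_size & (page_size - 1)) == 0);

    /* Make sure the guest physical address is aligned to the size. */
    while ((gfn << SKETCH_PAGE_SHIFT) & (page_size - 1))
        page_size >>= 1;

    /* Make sure the host virtual address is aligned to the size as well. */
    while (hva & (page_size - 1))
        page_size >>= 1;

    return page_size;
}
```

With a 2 MiB candidate, a 2 MiB-aligned gfn, and an hva that is only 1 MiB-aligned, the helper falls back to 1 MiB instead of returning the full huge page size. The sketch simply halves the size; whether every intermediate size is usable by a given IOMMU is outside its scope.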
  17. 31 Oct, 2013 3 commits
  18. 30 Oct, 2013 1 commit
  19. 28 Oct, 2013 1 commit
    • KVM: Mapping IOMMU pages after updating memslot · e0230e13
      Yang Zhang committed
      In kvm_iommu_map_pages(), we need to know the page size by calling
      kvm_host_page_size(), which checks whether the target slot is valid
      before returning the right page size.
      Currently, we map the IOMMU pages when creating a new slot, but we call
      kvm_iommu_map_pages() while still preparing the new slot.
      At that time, the new slot is not yet visible to the domain (it is still
      being prepared), so we cannot get the right page size from
      kvm_host_page_size(), and this breaks the IOMMU super page logic.
      The solution is to map the IOMMU pages after we insert the new slot
      into the domain.
      Signed-off-by: Yang Zhang <yang.z.zhang@Intel.com>
      Tested-by: Patrick Lu <patrick.lu@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>