1. 05 8月, 2014 1 次提交
    • P
      KVM: Don't keep reference to irq routing table in irqfd struct · 56f89f36
      Paul Mackerras 提交于
      This makes the irqfd code keep a copy of the irq routing table entry
      for each irqfd, rather than a reference to the copy in the actual
      irq routing table maintained in kvm/virt/irqchip.c.  This will enable
      us to change the routing table structure in future, or even not have a
      routing table at all on some platforms.
      
      The synchronization that was previously achieved using srcu_dereference
      on the read side is now achieved using a seqcount_t structure.  That
      ensures that we don't get a halfway-updated copy of the structure if
      we read it while another thread is updating it.
      
      We still use srcu_read_lock/unlock around the read side so that when
      changing the routing table we can be sure that after calling
      synchronize_srcu, nothing will be using the old routing.
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      Tested-by: NEric Auger <eric.auger@linaro.org>
      Tested-by: NCornelia Huck <cornelia.huck@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      56f89f36
  2. 31 7月, 2014 2 次提交
    • M
      KVM: arm64: GICv3: mandate page-aligned GICV region · fb3ec679
      Marc Zyngier 提交于
      Just like GICv2 was fixed in 63afbe7a
      (kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform),
      mandate the GICV region to be both aligned on a page boundary and
      its size to be a multiple of page size.
      
      This prevents a guest from being able to poke at regions where we
      have no idea what is sitting there.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      fb3ec679
    • P
      KVM: x86: always exit on EOIs for interrupts listed in the IOAPIC redir table · 0f6c0a74
      Paolo Bonzini 提交于
      Currently, the EOI exit bitmap (used for APICv) does not include
      interrupts that are masked.  However, this can cause a bug that manifests
      as an interrupt storm inside the guest.  Alex Williamson reported the
      bug and is the one who really debugged this; I only wrote the patch. :)
      
      The scenario involves a multi-function PCI device with OHCI and EHCI
      USB functions and an audio function, all assigned to the guest, where
      both USB functions use legacy INTx interrupts.
      
      As soon as the guest boots, interrupts for these devices turn into an
      interrupt storm in the guest; the host does not see the interrupt storm.
      Basically the EOI path does not work, and the guest continues to see the
      interrupt over and over, even after it attempts to mask it at the APIC.
      The bug is only visible with older kernels (RHEL6.5, based on 2.6.32
      with not many changes in the area of APIC/IOAPIC handling).
      
      Alex then tried forcing bit 59 (corresponding to the USB functions' IRQ)
      on in the eoi_exit_bitmap and TMR, and things then work.  What happens
      is that VFIO asserts IRQ11, then KVM recomputes the EOI exit bitmap.
      It does not have set bit 59 because the RTE was masked, so the IOAPIC
      never sees the EOI and the interrupt continues to fire in the guest.
      
      My guess was that the guest is masking the interrupt in the redirection
      table in the interrupt routine, i.e. while the interrupt is set in a
      LAPIC's ISR, The simplest fix is to ignore the masking state, we would
      rather have an unnecessary exit rather than a missed IRQ ACK and anyway
      IOAPIC interrupts are not as performance-sensitive as for example MSIs.
      Alex tested this patch and it fixed his bug.
      
      [Thanks to Alex for his precise description of the problem
       and initial debugging effort.  A lot of the text above is
       based on emails exchanged with him.]
      Reported-by: NAlex Williamson <alex.williamson@redhat.com>
      Tested-by: NAlex Williamson <alex.williamson@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      0f6c0a74
  3. 30 7月, 2014 1 次提交
    • W
      kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform · 63afbe7a
      Will Deacon 提交于
      If the physical address of GICV isn't page-aligned, then we end up
      creating a stage-2 mapping of the page containing it, which causes us to
      map neighbouring memory locations directly into the guest.
      
      As an example, consider a platform with GICV at physical 0x2c02f000
      running a 64k-page host kernel. If qemu maps this into the guest at
      0x80010000, then guest physical addresses 0x80010000 - 0x8001efff will
      map host physical region 0x2c020000 - 0x2c02efff. Accesses to these
      physical regions may cause UNPREDICTABLE behaviour, for example, on the
      Juno platform this will cause an SError exception to EL3, which brings
      down the entire physical CPU resulting in RCU stalls / HYP panics / host
      crashing / wasted weeks of debugging.
      
      SBSA recommends that systems alias the 4k GICV across the bounding 64k
      region, in which case GICV physical could be described as 0x2c020000 in
      the above scenario.
      
      This patch fixes the problem by failing the vgic probe if the physical
      base address or the size of GICV aren't page-aligned. Note that this
      generated a warning in dmesg about freeing enabled IRQs, so I had to
      move the IRQ enabling later in the probe.
      
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Joel Schopp <joel.schopp@amd.com>
      Cc: Don Dutile <ddutile@redhat.com>
      Acked-by: NPeter Maydell <peter.maydell@linaro.org>
      Acked-by: NJoel Schopp <joel.schopp@amd.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      63afbe7a
  4. 28 7月, 2014 2 次提交
  5. 25 7月, 2014 1 次提交
  6. 11 7月, 2014 16 次提交
  7. 05 6月, 2014 1 次提交
  8. 03 6月, 2014 1 次提交
  9. 05 5月, 2014 1 次提交
    • C
      kvm/irqchip: Speed up KVM_SET_GSI_ROUTING · 719d93cd
      Christian Borntraeger 提交于
      When starting lots of dataplane devices the bootup takes very long on
      Christian's s390 with irqfd patches. With larger setups he is even
      able to trigger some timeouts in some components.  Turns out that the
      KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec)
      when having multiple CPUs.  This is caused by the  synchronize_rcu and
      the HZ=100 of s390.  By changing the code to use a private srcu we can
      speed things up.  This patch reduces the boot time till mounting root
      from 8 to 2 seconds on my s390 guest with 100 disks.
      
      Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
      are fine because they do not have lockdep checks (hlist_for_each_entry_rcu
      uses rcu_dereference_raw rather than rcu_dereference, and write-sides
      do not do rcu lockdep at all).
      
      Note that we're hardly relying on the "sleepable" part of srcu.  We just
      want SRCU's faster detection of grace periods.
      
      Testing was done by Andrew Theurer using netperf tests STREAM, MAERTS
      and RR.  The difference between results "before" and "after" the patch
      has mean -0.2% and standard deviation 0.6%.  Using a paired t-test on the
      data points says that there is a 2.5% probability that the patch is the
      cause of the performance difference (rather than a random fluctuation).
      
      (Restricting the t-test to RR, which is the most likely to be affected,
      changes the numbers to respectively -0.3% mean, 0.7% stdev, and 8%
      probability that the numbers actually say something about the patch.
      The probability increases mostly because there are fewer data points).
      
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com> # s390
      Reviewed-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      719d93cd
  10. 29 4月, 2014 1 次提交
  11. 28 4月, 2014 6 次提交
  12. 24 4月, 2014 1 次提交
  13. 22 4月, 2014 1 次提交
  14. 18 4月, 2014 2 次提交
    • M
      KVM: VMX: speed up wildcard MMIO EVENTFD · 68c3b4d1
      Michael S. Tsirkin 提交于
      With KVM, MMIO is much slower than PIO, due to the need to
      do page walk and emulation. But with EPT, it does not have to be: we
      know the address from the VMCS so if the address is unique, we can look
      up the eventfd directly, bypassing emulation.
      
      Unfortunately, this only works if userspace does not need to match on
      access length and data.  The implementation adds a separate FAST_MMIO
      bus internally. This serves two purposes:
          - minimize overhead for old userspace that does not use eventfd with lengtth = 0
          - minimize disruption in other code (since we don't know the length,
            devices on the MMIO bus only get a valid address in write, this
            way we don't need to touch all devices to teach them to handle
            an invalid length)
      
      At the moment, this optimization only has effect for EPT on x86.
      
      It will be possible to speed up MMIO for NPT and MMU using the same
      idea in the future.
      
      With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
      I was unable to detect any measureable slowdown to non-eventfd MMIO.
      
      Making MMIO faster is important for the upcoming virtio 1.0 which
      includes an MMIO signalling capability.
      
      The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
      pre-review and suggestions.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      68c3b4d1
    • M
      KVM: support any-length wildcard ioeventfd · f848a5a8
      Michael S. Tsirkin 提交于
      It is sometimes benefitial to ignore IO size, and only match on address.
      In hindsight this would have been a better default than matching length
      when KVM_IOEVENTFD_FLAG_DATAMATCH is not set, In particular, this kind
      of access can be optimized on VMX: there no need to do page lookups.
      This can currently be done with many ioeventfds but in a suboptimal way.
      
      However we can't change kernel/userspace ABI without risk of breaking
      some applications.
      Use len = 0 to mean "ignore length for matching" in a more optimal way.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      f848a5a8
  15. 08 4月, 2014 1 次提交
  16. 04 4月, 2014 2 次提交
    • P
      KVM: ioapic: try to recover if pending_eoi goes out of range · 4009b249
      Paolo Bonzini 提交于
      The RTC tracking code tracks the cardinality of rtc_status.dest_map
      into rtc_status.pending_eoi.  It has some WARN_ONs that trigger if
      pending_eoi ever becomes negative; however, these do not do anything
      to recover, and it bad things will happen soon after they trigger.
      
      When the next RTC interrupt is triggered, rtc_check_coalesced() will
      return false, but ioapic_service will find pending_eoi != 0 and
      do a BUG_ON.  To avoid this, should pending_eoi ever be nonzero,
      call kvm_rtc_eoi_tracking_restore_all to recompute a correct
      dest_map and pending_eoi.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      4009b249
    • P
      KVM: ioapic: fix assignment of ioapic->rtc_status.pending_eoi (CVE-2014-0155) · 5678de3f
      Paolo Bonzini 提交于
      QE reported that they got the BUG_ON in ioapic_service to trigger.
      I cannot reproduce it, but there are two reasons why this could happen.
      
      The less likely but also easiest one, is when kvm_irq_delivery_to_apic
      does not deliver to any APIC and returns -1.
      
      Because irqe.shorthand == 0, the kvm_for_each_vcpu loop in that
      function is never reached.  However, you can target the similar loop in
      kvm_irq_delivery_to_apic_fast; just program a zero logical destination
      address into the IOAPIC, or an out-of-range physical destination address.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5678de3f