1. 21 July 2015 (2 commits)
2. 17 June 2015 (6 commits)
• arm/arm64: KVM: vgic: Do not save GICH_HCR / ICH_HCR_EL2 · 4642019d
Committed by Marc Zyngier
The GIC Hypervisor Configuration Register is used to enable
the delivery of virtual interrupts to a guest, as well as to
define the conditions under which maintenance interrupts are
delivered to the host.
      
      This register doesn't contain any information that we need to
      read back (the EOIcount is utterly useless for us).
      
      So let's save ourselves some cycles, and not save it before
      writing zero to it.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
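A minimal C sketch of the idea for a memory-mapped GICv2, assuming the usual GICH_HCR offset; the function names and the vgic_base pointer are illustrative, not the actual patch:

    #include <linux/io.h>
    #include <linux/irqchip/arm-gic.h>

    /* vmexit: nothing in GICH_HCR needs preserving (EOIcount is unused),
     * so skip the read-back and just disable virtual interrupt delivery */
    static void vgic_v2_disable_sketch(void __iomem *vgic_base)
    {
            writel_relaxed(0, vgic_base + GICH_HCR);
    }

    /* vmentry: restore from the shadow copy already kept in memory */
    static void vgic_v2_enable_sketch(void __iomem *vgic_base, u32 shadow_hcr)
    {
            writel_relaxed(shadow_hcr, vgic_base + GICH_HCR);
    }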
• ARM: kvm: psci: fix handling of unimplemented functions · e2d99736
Committed by Lorenzo Pieralisi
      According to the PSCI specification and the SMC/HVC calling
      convention, PSCI function_ids that are not implemented must
      return NOT_SUPPORTED as return value.
      
The current KVM implementation treats an unhandled PSCI
function_id as an error: if the PSCI implementation is called
with a function_id that the resident PSCI version does not
handle (ie it is not implemented), KVM injects an undefined
instruction into the guest. This is not the behaviour a guest
expects when calling an unimplemented PSCI function_id.
      
      This patch fixes this issue by returning NOT_SUPPORTED whenever
      the kvm PSCI call is executed for a function_id that is not
      implemented by the PSCI kvm layer.
      
      Cc: <stable@vger.kernel.org> # 3.18+
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
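A hedged sketch of the resulting dispatch shape; the register accessors are illustrative, while the return codes come from include/uapi/linux/psci.h:

    /* unknown function_ids now yield NOT_SUPPORTED in r0/x0 instead of
     * an undefined instruction being injected into the guest */
    static int kvm_psci_call_sketch(struct kvm_vcpu *vcpu)
    {
            unsigned long fn = vcpu_get_reg_sketch(vcpu, 0); /* illustrative */
            unsigned long val;

            switch (fn) {
            case PSCI_0_2_FN_PSCI_VERSION:
                    val = KVM_ARM_PSCI_0_2;
                    break;
            /* ... other implemented function_ids ... */
            default:
                    val = PSCI_RET_NOT_SUPPORTED;
                    break;
            }

            vcpu_set_reg_sketch(vcpu, 0, val); /* illustrative */
            return 1;
    }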
• KVM: arm/arm64: Enable the KVM-VFIO device · 8889583c
Committed by Kim Phillips
The KVM-VFIO device is used by the QEMU VFIO device to record
the list of in-use VFIO groups, so that KVM can manipulate
them.
Signed-off-by: Kim Phillips <kim.phillips@linaro.org>
Signed-off-by: Eric Auger <eric.auger@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
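For illustration, a sketch of how a VMM might drive the device from userspace; this follows the documented KVM device API, with error handling trimmed:

    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    static int kvm_vfio_register_group(int vm_fd, int *vfio_group_fd)
    {
            struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };
            struct kvm_device_attr attr = { 0 };

            if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
                    return -1;

            /* record this in-use VFIO group with KVM */
            attr.group = KVM_DEV_VFIO_GROUP;
            attr.attr  = KVM_DEV_VFIO_GROUP_ADD;
            attr.addr  = (__u64)(unsigned long)vfio_group_fd;

            return ioctl(cd.fd, KVM_SET_DEVICE_ATTR, &attr);
    }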
• arm/arm64: KVM: Properly account for guest CPU time · 1b3d546d
Committed by Christoffer Dall
Until now we have been calling kvm_guest_exit before re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts occurring while the guest is running doesn't properly notice
that the time since the last tick was spent in the guest.
      
Inspired by the comment in the x86 code, move the kvm_guest_exit() call
below the local_irq_enable() call and change __kvm_guest_exit() to
kvm_guest_exit(), because we are now calling this function with
interrupts enabled.  We now have to explicitly disable preemption, and
must not re-enable it until after kvm_guest_exit() has been called,
since otherwise we could be preempted and everything happening before we
eventually get scheduled again would be accounted for as guest time.
      
      At the same time, move the trace_kvm_exit() call outside of the atomic
      section, since there is no reason for us to do that with interrupts
      disabled.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
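The resulting ordering, sketched as a simplified fragment of the run loop (not the actual diff; comments mark the constraints the message describes):

    preempt_disable();              /* taken before entering the guest    */
    /* ... world switch: run the guest here ... */
    local_irq_enable();             /* pending timer ticks fire now and   */
                                    /* still see PF_VCPU -> guest time    */
    kvm_guest_exit();               /* legal: interrupts are enabled      */
    /* trace_kvm_exit(...) also happens here, outside the IRQs-off region */
    preempt_enable();               /* only after guest time accounting   */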
• kvm: remove one useless check extension · ea2c6d97
Committed by Tiejun Chen
We already check KVM_CAP_IRQFD in the generic code once CONFIG_HAVE_KVM_IRQFD is enabled:
      
      kvm_vm_ioctl_check_extension_generic()
          |
          + switch (arg) {
          +   ...
          +   #ifdef CONFIG_HAVE_KVM_IRQFD
          +       case KVM_CAP_IRQFD:
          +   #endif
          +   ...
          +   return 1;
          +   ...
          + }
          |
          + kvm_vm_ioctl_check_extension()
      
So it's not necessary to check this again in the arch code. Also fix
one typo, s/emlation/emulation.
Signed-off-by: Tiejun Chen <tiejun.chen@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
• arm: KVM: force execution of HCPTR access on VM exit · 85e84ba3
Committed by Marc Zyngier
      On VM entry, we disable access to the VFP registers in order to
      perform a lazy save/restore of these registers.
      
      On VM exit, we restore access, test if we did enable them before,
      and save/restore the guest/host registers if necessary. In this
      sequence, the FPEXC register is always accessed, irrespective
      of the trapping configuration.
      
      If the guest didn't touch the VFP registers, then the HCPTR access
      has now enabled such access, but we're missing a barrier to ensure
      architectural execution of the new HCPTR configuration. If the HCPTR
      access has been delayed/reordered, the subsequent access to FPEXC
      will cause a trap, which we aren't prepared to handle at all.
      
      The same condition exists when trapping to enable VFP for the guest.
      
The fix is to introduce a barrier after enabling VFP access. In the
vmexit case, it can be relaxed to only take place if the guest hasn't
accessed its view of the VFP registers, making the access to FPEXC safe.
      
      The set_hcptr macro is modified to deal with both vmenter/vmexit and
      vmtrap operations, and now takes an optional label that is branched to
      when the guest hasn't touched the VFP registers.
Reported-by: Vikram Sethi <vikrams@codeaurora.org>
Cc: stable@kernel.org # v3.9+
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
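A sketch of the kind of sequence involved, written as C with inline assembly rather than the actual set_hcptr assembly macro (HCPTR is cp15 c1/c1/2; HCPTR_TCP() is the per-coprocessor trap bit; the function name is illustrative):

    /* stop trapping cp10/cp11 (VFP), then force the new configuration to
     * take architectural effect before FPEXC is touched */
    static inline void hcptr_allow_vfp_sketch(void)
    {
            u32 hcptr;

            asm volatile("mrc p15, 4, %0, c1, c1, 2" : "=r" (hcptr));
            hcptr &= ~(HCPTR_TCP(10) | HCPTR_TCP(11));
            asm volatile("mcr p15, 4, %0, c1, c1, 2" : : "r" (hcptr));
            isb();  /* without this, the subsequent FPEXC access may trap */
    }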
3. 10 June 2015 (1 commit)
4. 28 May 2015 (1 commit)
5. 27 May 2015 (1 commit)
6. 26 May 2015 (3 commits)
7. 09 May 2015 (1 commit)
8. 07 May 2015 (1 commit)
9. 22 April 2015 (1 commit)
• KVM: arm/arm64: check IRQ number on userland injection · fd1d0ddf
Committed by Andre Przywara
When userland injects an SPI via the KVM_IRQ_LINE ioctl we currently
only check it against a fixed limit, which historically is set
to 127. With the new dynamic IRQ allocation the effective limit may
actually be smaller (64).
So when a malicious or buggy userland now injects an SPI in that
range, we write past the end of our VGIC bitmap and bytemap memory.
I could trigger a host kernel NULL pointer dereference with current
mainline by injecting some bogus IRQ number from a hacked kvmtool:
      -----------------
      ....
      DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
      DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
      DEBUG: IRQ #114 still in the game, writing to bytemap now...
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = ffffffc07652e000
      [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
      Internal error: Oops: 96000006 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
      Hardware name: FVP Base (DT)
      task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
      PC is at kvm_vgic_inject_irq+0x234/0x310
      LR is at kvm_vgic_inject_irq+0x30c/0x310
      pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145
      .....
      
This patch fixes the issue by checking the SPI number against the
actual limit. We also remove the former legacy hard limit of
127 in the ioctl code.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Cc: <stable@vger.kernel.org> # 4.0, 3.19, 3.18
[maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
as suggested by Christopher Covington]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
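A hedged sketch of the check; the field layout follows the VGIC of that era, and the function name is illustrative:

    /* SPIs occupy [VGIC_NR_PRIVATE_IRQS, nr_irqs); reject anything else
     * before it can be used to index the bitmaps/bytemaps */
    static int vgic_validate_spi_sketch(struct kvm *kvm, unsigned int irq_num)
    {
            if (irq_num < VGIC_NR_PRIVATE_IRQS ||
                irq_num >= kvm->arch.vgic.nr_irqs)
                    return -EINVAL;
            return 0;
    }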
10. 31 March 2015 (2 commits)
• KVM: arm/arm64: enable KVM_CAP_IOEVENTFD · d44758c0
Committed by Nikolay Nikolaev
      As the infrastructure for eventfd has now been merged, report the
      ioeventfd capability as being supported.
Signed-off-by: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
[maz: grouped the case entry with the others, fixed commit log]
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
• KVM: arm/arm64: rework MMIO abort handling to use KVM MMIO bus · 950324ab
Committed by Andre Przywara
Currently we have struct kvm_exit_mmio for encapsulating MMIO abort
data to be passed on from syndrome decoding all the way down to the
VGIC register handlers. Now that we switch MMIO handling to be
routed through the KVM MMIO bus, it no longer makes sense to
use that structure from the very beginning of the path. So we keep the
data in local variables until we hand it to the kvm_io_bus framework.
Then we fill kvm_exit_mmio in the VGIC only, making it a VGIC-private
structure. Along the way we replace the data buffer in that structure
with a pointer to a single location in a local variable, so
we get rid of some copying on the way.
      With all of the virtual GIC emulation code now being registered with
      the kvm_io_bus, we can remove all of the old MMIO handling code and
      its dispatching functionality.
      
I didn't bother to rename kvm_exit_mmio (to vgic_mmio or something),
because that would touch a lot of code lines without any good reason.
      
      This is based on an original patch by Nikolay.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Nikolay Nikolaev <n.nikolaev@virtualopensystems.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
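The dispatch itself then reduces to a call into the MMIO bus; a hedged sketch, assuming the vcpu-based kvm_io_bus_write() signature of that kernel generation and an illustrative wrapper name:

    /* the abort handler keeps the decoded data in a local buffer and lets
     * the bus search find whichever VGIC region registered for 'addr' */
    static int dispatch_mmio_write_sketch(struct kvm_vcpu *vcpu, gpa_t addr,
                                          int len, const void *data)
    {
            return kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, len, data);
    }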
11. 27 March 2015 (2 commits)
12. 23 March 2015 (1 commit)
13. 20 March 2015 (1 commit)
• ARM, arm64: kvm: get rid of the bounce page · 06f75a1f
Committed by Ard Biesheuvel
      The HYP init bounce page is a runtime construct that ensures that the
      HYP init code does not cross a page boundary. However, this is something
      we can do perfectly well at build time, by aligning the code appropriately.
      
      For arm64, we just align to 4 KB, and enforce that the code size is less
      than 4 KB, regardless of the chosen page size.
      
For ARM, the whole code is less than 256 bytes, so we tweak the linker
script to align at a power-of-2 upper bound of the code size.
      
      Note that this also fixes a benign off-by-one error in the original bounce
      page code, where a bounce page would be allocated unnecessarily if the code
      was exactly 1 page in size.
      
      On ARM, it also fixes an issue with very large kernels reported by Arnd
      Bergmann, where stub sections with linker emitted veneers could erroneously
      trigger the size/alignment ASSERT() in the linker script.
Tested-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
14. 14 March 2015 (2 commits)
• arm/arm64: KVM: Fix migration race in the arch timer · 1a748478
Committed by Christoffer Dall
When a VCPU is no longer running, we currently check to see if it has a
timer scheduled in the future, and if it does, we schedule a host
hrtimer to notify us in case the timer expires while the VCPU is still
not running.  When the hrtimer fires, we mask the guest's timer and
inject the timer IRQ (still relying on the guest unmasking the timer
when it receives the IRQ).
      
This is all good and fine, but when migrating a VM (checkpoint/restore)
this introduces a race.  It is unlikely, but possible, for the following
sequence of events to happen:
      
       1. Userspace stops the VM
       2. Hrtimer for VCPU is scheduled
       3. Userspace checkpoints the VGIC state (no pending timer interrupts)
       4. The hrtimer fires, schedules work in a workqueue
       5. Workqueue function runs, masks the timer and injects timer interrupt
       6. Userspace checkpoints the timer state (timer masked)
      
At restore time, you end up with a masked timer without any pending
timer interrupts, and your guest halts, never receiving timer interrupts.
      
Fix this by only kicking the VCPU from the workqueue function, sampling
the expired state of the timer when entering the guest again, and only
then injecting the interrupt and masking the timer.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
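A hedged sketch of the entry-time check; the field names follow the arch timer structures of that era, and 'now' would be CNTVCT minus the guest's CNTVOFF:

    /* decide at guest entry whether the virtual timer has really expired,
     * instead of injecting from the hrtimer's workqueue */
    static bool timer_should_fire_sketch(struct arch_timer_cpu *timer, u64 now)
    {
            return (timer->cntv_ctl & ARCH_TIMER_CTRL_ENABLE) &&
                   !(timer->cntv_ctl & ARCH_TIMER_CTRL_IT_MASK) &&
                   now >= timer->cntv_cval;
    }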
• arm/arm64: KVM: export VCPU power state via MP_STATE ioctl · ecccf0cc
Committed by Alex Bennée
To cleanly restore an SMP VM we need to ensure that the current pause
state of each vcpu is correctly recorded. Things could get confused if
a CPU that was paused before its state was captured starts running
after the migration restore completes.
      
      We use the existing KVM_GET/SET_MP_STATE ioctl to do this. The arm/arm64
      interface is a lot simpler as the only valid states are
      KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_STOPPED.
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
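A sketch of the mapping, close to what such a patch needs (the vcpu pause field name is an assumption):

    /* map the existing pause flag onto the two valid MP states */
    int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
                                        struct kvm_mp_state *mp_state)
    {
            mp_state->mp_state = vcpu->arch.pause ? KVM_MP_STATE_STOPPED
                                                  : KVM_MP_STATE_RUNNABLE;
            return 0;
    }

    int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
                                        struct kvm_mp_state *mp_state)
    {
            switch (mp_state->mp_state) {
            case KVM_MP_STATE_RUNNABLE:
                    vcpu->arch.pause = false;
                    break;
            case KVM_MP_STATE_STOPPED:
                    vcpu->arch.pause = true;
                    break;
            default:
                    return -EINVAL;  /* only two states are valid on arm */
            }
            return 0;
    }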
15. 13 March 2015 (3 commits)
16. 12 March 2015 (4 commits)
17. 11 March 2015 (3 commits)
• KVM: arm/arm64: prefer IS_ENABLED to a static variable · 69ff5c61
Committed by Paolo Bonzini
      IS_ENABLED gives compile-time checking and keeps the code clearer.
      
      The one exception is inside kvm_vm_ioctl_check_extension, where
      the established idiom is to wrap the case labels with an #ifdef.
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
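The pattern, illustrated with the VGIC config symbol this area used at the time (the before/after split is illustrative):

    /* before: a runtime flag, set somewhere during init */
    static bool vgic_present;

    static void flush_old(struct kvm_vcpu *vcpu)
    {
            if (vgic_present)
                    kvm_vgic_flush_hwstate(vcpu);
    }

    /* after: a compile-time constant, checked by the compiler and
     * dead-code-eliminated, with no init-order dependency */
    static void flush_new(struct kvm_vcpu *vcpu)
    {
            if (IS_ENABLED(CONFIG_KVM_ARM_VGIC))
                    kvm_vgic_flush_hwstate(vcpu);
    }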
• arm64: KVM: Do not use pgd_index to index stage-2 pgd · 04b8dc85
Committed by Marc Zyngier
The kernel's pgd_index macro is designed to index a normal, page-sized
array. KVM is a bit different, as we can use concatenated
pages to have a bigger address space (for example, a 40bit IPA with
4kB pages gives us an 8kB PGD).

In the above case, the use of pgd_index will always return an index
inside the first 4kB, which makes a guest that has memory above
0x8000000000 rather unhappy, as it spins forever in a page fault,
whilst the host happily corrupts the lower pgd.
      
      The obvious fix is to get our own kvm_pgd_index that does the right
      thing(tm).
      
      Tested on X-Gene with a hacked kvmtool that put memory at a stupidly
      high address.
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
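A sketch of what a dedicated stage-2 index macro looks like, sized by the IPA space rather than by PTRS_PER_PGD (constants as used on arm64 KVM at the time):

    /* pgd_index() masks with PTRS_PER_PGD and can never reach past the
     * first page of a concatenated stage-2 PGD; size the mask by the
     * IPA space instead */
    #define PTRS_PER_S2_PGD     (1 << (KVM_PHYS_SHIFT - PGDIR_SHIFT))
    #define kvm_pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_S2_PGD - 1))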
• arm64: KVM: Fix stage-2 PGD allocation to have per-page refcounting · a987370f
Committed by Marc Zyngier
We're using __get_free_pages() to allocate the guest's stage-2
PGD. The standard behaviour of this function is to return a set of
pages where only the head page has a valid refcount.
      
      This behaviour gets us into trouble when we're trying to increment
      the refcount on a non-head page:
      
      page:ffff7c00cfb693c0 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x4000000000000000()
      page dumped because: VM_BUG_ON_PAGE((*({ __attribute__((unused)) typeof((&page->_count)->counter) __var = ( typeof((&page->_count)->counter)) 0; (volatile typeof((&page->_count)->counter) *)&((&page->_count)->counter); })) <= 0)
      BUG: failure at include/linux/mm.h:548/get_page()!
      Kernel panic - not syncing: BUG!
      CPU: 1 PID: 1695 Comm: kvm-vcpu-0 Not tainted 4.0.0-rc1+ #3825
      Hardware name: APM X-Gene Mustang board (DT)
      Call trace:
      [<ffff80000008a09c>] dump_backtrace+0x0/0x13c
      [<ffff80000008a1e8>] show_stack+0x10/0x1c
      [<ffff800000691da8>] dump_stack+0x74/0x94
      [<ffff800000690d78>] panic+0x100/0x240
      [<ffff8000000a0bc4>] stage2_get_pmd+0x17c/0x2bc
      [<ffff8000000a1dc4>] kvm_handle_guest_abort+0x4b4/0x6b0
      [<ffff8000000a420c>] handle_exit+0x58/0x180
      [<ffff80000009e7a4>] kvm_arch_vcpu_ioctl_run+0x114/0x45c
      [<ffff800000099df4>] kvm_vcpu_ioctl+0x2e0/0x754
      [<ffff8000001c0a18>] do_vfs_ioctl+0x424/0x5c8
      [<ffff8000001c0bfc>] SyS_ioctl+0x40/0x78
      CPU0: stopping
      
A possible approach for this is to split the compound page using
split_page() at allocation time, and change the teardown path to
free one page at a time.  It turns out that alloc_pages_exact() and
free_pages_exact() do exactly that.
      
      While we're at it, the PGD allocation code is reworked to reduce
      duplication.
      
      This has been tested on an X-Gene platform with a 4kB/48bit-VA host
      kernel, and kvmtool hacked to place memory in the second page of
      the hardware PGD (PUD for the host kernel). Also regression-tested
      on a Cubietruck (Cortex-A7).
      
       [ Reworked to use alloc_pages_exact() and free_pages_exact() and to
         return pointers directly instead of by reference as arguments
          - Christoffer ]
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
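A hedged sketch of the allocation change; alloc_pages_exact()/free_pages_exact() are the real kernel APIs, while the wrapper names are illustrative:

    /* alloc_pages_exact() returns non-compound pages, so every page of
     * the stage-2 PGD carries a valid refcount of its own */
    static pgd_t *stage2_alloc_pgd_sketch(void)
    {
            return alloc_pages_exact(PTRS_PER_S2_PGD * sizeof(pgd_t),
                                     GFP_KERNEL | __GFP_ZERO);
    }

    static void stage2_free_pgd_sketch(pgd_t *pgd)
    {
            free_pages_exact(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t));
    }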
18. 24 February 2015 (1 commit)
19. 30 January 2015 (3 commits)
• arm/arm64: KVM: Use kernel mapping to perform invalidation on page fault · 0d3e4d4f
Committed by Marc Zyngier
      When handling a fault in stage-2, we need to resync I$ and D$, just
      to be sure we don't leave any old cache line behind.
      
      That's very good, except that we do so using the *user* address.
      Under heavy load (swapping like crazy), we may end up in a situation
      where the page gets mapped in stage-2 while being unmapped from
      userspace by another CPU.
      
      At that point, the DC/IC instructions can generate a fault, which
      we handle with kvm->mmu_lock held. The box quickly deadlocks, user
      is unhappy.
      
      Instead, perform this invalidation through the kernel mapping,
      which is guaranteed to be present. The box is much happier, and so
      am I.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
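A hedged sketch of the idea for a single page; kmap_atomic() and kvm_flush_dcache_to_poc() are real primitives, while the wrapper is illustrative and assumes size <= PAGE_SIZE:

    /* flush through the kernel's own mapping of the page, which cannot
     * be unmapped underneath us, instead of through the user VA */
    static void coherent_guest_page_sketch(unsigned long pfn,
                                           unsigned long size)
    {
            void *va = kmap_atomic(pfn_to_page(pfn));

            kvm_flush_dcache_to_poc(va, size);
            /* I-cache invalidation for executable pages would follow here */
            kunmap_atomic(va);
    }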
• arm/arm64: KVM: Invalidate data cache on unmap · 363ef89f
Committed by Marc Zyngier
      Let's assume a guest has created an uncached mapping, and written
      to that page. Let's also assume that the host uses a cache-coherent
      IO subsystem. Let's finally assume that the host is under memory
      pressure and starts to swap things out.
      
Before this "uncached" page is evicted, we need to make sure
we invalidate potentially speculated, clean cache lines that are
sitting there, or the IO subsystem is going to swap out the
cached view, losing the data that has been written directly
into memory.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
• arm/arm64: KVM: Use set/way op trapping to track the state of the caches · 3c1e7165
Committed by Marc Zyngier
Trying to emulate the behaviour of set/way cache ops is fairly
pointless, as there are too many ways we can end up missing stuff.
Also, there are some system caches out there that simply ignore
set/way operations.
      
      So instead of trying to implement them, let's convert it to VA ops,
      and use them as a way to re-enable the trapping of VM ops. That way,
      we can detect the point when the MMU/caches are turned off, and do
      a full VM flush (which is what the guest was trying to do anyway).
      
      This allows a 32bit zImage to boot on the APM thingy, and will
      probably help bootloaders in general.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
20. 29 January 2015 (1 commit)