1. 25 11月, 2015 1 次提交
    • A
      ARM/arm64: KVM: test properly for a PTE's uncachedness · e6fab544
      Ard Biesheuvel 提交于
      The open coded tests for checking whether a PTE maps a page as
      uncached use a flawed '(pte_val(xxx) & CONST) != CONST' pattern,
      which is not guaranteed to work since the type of a mapping is
      not a set of mutually exclusive bits
      
      For HYP mappings, the type is an index into the MAIR table (i.e, the
      index itself does not contain any information whatsoever about the
      type of the mapping), and for stage-2 mappings it is a bit field where
      normal memory and device types are defined as follows:
      
          #define MT_S2_NORMAL            0xf
          #define MT_S2_DEVICE_nGnRE      0x1
      
      I.e., masking *and* comparing with the latter matches on the former,
      and we have been getting lucky merely because the S2 device mappings
      also have the PTE_UXN bit set, or we would misidentify memory mappings
      as device mappings.
      
      Since the unmap_range() code path (which contains one instance of the
      flawed test) is used both for HYP mappings and stage-2 mappings, and
      considering the difference between the two, it is non-trivial to fix
      this by rewriting the tests in place, as it would involve passing
      down the type of mapping through all the functions.
      
      However, since HYP mappings and stage-2 mappings both deal with host
      physical addresses, we can simply check whether the mapping is backed
      by memory that is managed by the host kernel, and only perform the
      D-cache maintenance if this is the case.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Tested-by: NPavel Fedin <p.fedin@samsung.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      e6fab544
  2. 23 10月, 2015 8 次提交
    • C
      arm/arm64: KVM: Improve kvm_exit tracepoint · b5905dc1
      Christoffer Dall 提交于
      The ARM architecture only saves the exit class to the HSR (ESR_EL2 for
      arm64) on synchronous exceptions, not on asynchronous exceptions like an
      IRQ.  However, we only report the exception class on kvm_exit, which is
      confusing because an IRQ looks like it exited at some PC with the same
      reason as the previous exit.  Add a lookup table for the exception index
      and prepend the kvm_exit tracepoint text with the exception type to
      clarify this situation.
      
      Also resolve the exception class (EC) to a human-friendly text version
      so the trace output becomes immediately usable for debugging this code.
      
      Cc: Wei Huang <wei@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      b5905dc1
    • E
      KVM: arm/arm64: implement kvm_arm_[halt,resume]_guest · 3b92830a
      Eric Auger 提交于
      We introduce kvm_arm_halt_guest and resume functions. They
      will be used for IRQ forward state change.
      
      Halt is synchronous and prevents the guest from being re-entered.
      We use the same mechanism put in place for PSCI former pause,
      now renamed power_off. A new flag is introduced in arch vcpu state,
      pause, only meant to be used by those functions.
      Signed-off-by: NEric Auger <eric.auger@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3b92830a
    • E
      KVM: arm/arm64: check power_off in critical section before VCPU run · 101d3da0
      Eric Auger 提交于
      In case a vcpu off PSCI call is called just after we executed the
      vcpu_sleep check, we can enter the guest although power_off
      is set. Let's check the power_off state in the critical section,
      just before entering the guest.
      Signed-off-by: NEric Auger <eric.auger@linaro.org>
      Reported-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      101d3da0
    • E
      KVM: arm/arm64: check power_off in kvm_arch_vcpu_runnable · 4f5f1dc0
      Eric Auger 提交于
      kvm_arch_vcpu_runnable now also checks whether the power_off
      flag is set.
      Signed-off-by: NEric Auger <eric.auger@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      4f5f1dc0
    • E
      KVM: arm/arm64: rename pause into power_off · 3781528e
      Eric Auger 提交于
      The kvm_vcpu_arch pause field is renamed into power_off to prepare
      for the introduction of a new pause field. Also vcpu_pause is renamed
      into vcpu_sleep since we will sleep until both power_off and pause are
      false.
      Signed-off-by: NEric Auger <eric.auger@linaro.org>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      3781528e
    • W
      arm/arm64: KVM : Enable vhost device selection under KVM config menu · 75755c6d
      Wei Huang 提交于
      vhost drivers provide guest VMs with better I/O performance and lower
      CPU utilization. This patch allows users to select vhost devices under
      KVM configuration menu on ARM. This makes vhost support on arm/arm64
      on a par with other architectures (e.g. x86, ppc).
      Signed-off-by: NWei Huang <wei@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      75755c6d
    • C
      arm/arm64: KVM: Rework the arch timer to use level-triggered semantics · 4b4b4512
      Christoffer Dall 提交于
      The arch timer currently uses edge-triggered semantics in the sense that
      the line is never sampled by the vgic and lowering the line from the
      timer to the vgic doesn't have any effect on the pending state of
      virtual interrupts in the vgic.  This means that we do not support a
      guest with the otherwise valid behavior of (1) disable interrupts (2)
      enable the timer (3) disable the timer (4) enable interrupts.  Such a
      guest would validly not expect to see any interrupts on real hardware,
      but will see interrupts on KVM.
      
      This patch fixes this shortcoming through the following series of
      changes.
      
      First, we change the flow of the timer/vgic sync/flush operations.  Now
      the timer is always flushed/synced before the vgic, because the vgic
      samples the state of the timer output.  This has the implication that we
      move the timer operations in to non-preempible sections, but that is
      fine after the previous commit getting rid of hrtimer schedules on every
      entry/exit.
      
      Second, we change the internal behavior of the timer, letting the timer
      keep track of its previous output state, and only lower/raise the line
      to the vgic when the state changes.  Note that in theory this could have
      been accomplished more simply by signalling the vgic every time the
      state *potentially* changed, but we don't want to be hitting the vgic
      more often than necessary.
      
      Third, we get rid of the use of the map->active field in the vgic and
      instead simply set the interrupt as active on the physical distributor
      whenever the input to the GIC is asserted and conversely clear the
      physical active state when the input to the GIC is deasserted.
      
      Fourth, and finally, we now initialize the timer PPIs (and all the other
      unused PPIs for now), to be level-triggered, and modify the sync code to
      sample the line state on HW sync and re-inject a new interrupt if it is
      still pending at that time.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      4b4b4512
    • C
      arm/arm64: KVM: arch_timer: Only schedule soft timer on vcpu_block · d35268da
      Christoffer Dall 提交于
      We currently schedule a soft timer every time we exit the guest if the
      timer did not expire while running the guest.  This is really not
      necessary, because the only work we do in the timer work function is to
      kick the vcpu.
      
      Kicking the vcpu does two things:
      (1) If the vpcu thread is on a waitqueue, make it runnable and remove it
      from the waitqueue.
      (2) If the vcpu is running on a different physical CPU from the one
      doing the kick, it sends a reschedule IPI.
      
      The second case cannot happen, because the soft timer is only ever
      scheduled when the vcpu is not running.  The first case is only relevant
      when the vcpu thread is on a waitqueue, which is only the case when the
      vcpu thread has called kvm_vcpu_block().
      
      Therefore, we only need to make sure a timer is scheduled for
      kvm_vcpu_block(), which we do by encapsulating all calls to
      kvm_vcpu_block() with kvm_timer_{un}schedule calls.
      
      Additionally, we only schedule a soft timer if the timer is enabled and
      unmasked, since it is useless otherwise.
      
      Note that theoretically userspace can use the SET_ONE_REG interface to
      change registers that should cause the timer to fire, even if the vcpu
      is blocked without a scheduled timer, but this case was not supported
      before this patch and we leave it for future work for now.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      d35268da
  3. 21 10月, 2015 2 次提交
  4. 17 9月, 2015 3 次提交
    • M
      arm/arm64: KVM: Remove 'config KVM_ARM_MAX_VCPUS' · ef748917
      Ming Lei 提交于
      This patch removes config option of KVM_ARM_MAX_VCPUS,
      and like other ARCHs, just choose the maximum allowed
      value from hardware, and follows the reasons:
      
      1) from distribution view, the option has to be
      defined as the max allowed value because it need to
      meet all kinds of virtulization applications and
      need to support most of SoCs;
      
      2) using a bigger value doesn't introduce extra memory
      consumption, and the help text in Kconfig isn't accurate
      because kvm_vpu structure isn't allocated until request
      of creating VCPU is sent from QEMU;
      
      3) the main effect is that the field of vcpus[] in 'struct kvm'
      becomes a bit bigger(sizeof(void *) per vcpu) and need more cache
      lines to hold the structure, but 'struct kvm' is one generic struct,
      and it has worked well on other ARCHs already in this way. Also,
      the world switch frequecy is often low, for example, it is ~2000
      when running kernel building load in VM from APM xgene KVM host,
      so the effect is very small, and the difference can't be observed
      in my test at all.
      
      Cc: Dann Frazier <dann.frazier@canonical.com>
      Signed-off-by: NMing Lei <ming.lei@canonical.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      ef748917
    • M
      arm: KVM: Disable virtual timer even if the guest is not using it · 688bc577
      Marc Zyngier 提交于
      When running a guest with the architected timer disabled (with QEMU and
      the kernel_irqchip=off option, for example), it is important to make
      sure the timer gets turned off. Otherwise, the guest may try to
      enable it anyway, leading to a screaming HW interrupt.
      
      The fix is to unconditionally turn off the virtual timer on guest
      exit.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      688bc577
    • P
      arm/arm64: KVM: vgic: Check for !irqchip_in_kernel() when mapping resources · c2f58514
      Pavel Fedin 提交于
      Until b26e5fda ("arm/arm64: KVM: introduce per-VM ops"),
      kvm_vgic_map_resources() used to include a check on irqchip_in_kernel(),
      and vgic_v2_map_resources() still has it.
      
      But now vm_ops are not initialized until we call kvm_vgic_create().
      Therefore kvm_vgic_map_resources() can being called without a VGIC,
      and we die because vm_ops.map_resources is NULL.
      
      Fixing this restores QEMU's kernel-irqchip=off option to a working state,
      allowing to use GIC emulation in userspace.
      
      Fixes: b26e5fda ("arm/arm64: KVM: introduce per-VM ops")
      Cc: stable@vger.kernel.org
      Signed-off-by: NPavel Fedin <p.fedin@samsung.com>
      [maz: reworked commit message]
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      c2f58514
  5. 16 9月, 2015 1 次提交
    • M
      arm: KVM: Fix incorrect device to IPA mapping · ca09f02f
      Marek Majtyka 提交于
      A critical bug has been found in device memory stage1 translation for
      VMs with more then 4GB of address space. Once vm_pgoff size is smaller
      then pa (which is true for LPAE case, u32 and u64 respectively) some
      more significant bits of pa may be lost as a shift operation is performed
      on u32 and later cast onto u64.
      
      Example: vm_pgoff(u32)=0x00210030, PAGE_SHIFT=12
              expected pa(u64):   0x0000002010030000
              produced pa(u64):   0x0000000010030000
      
      The fix is to change the order of operations (casting first onto phys_addr_t
      and then shifting).
      Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
      [maz: fixed changelog and patch formatting]
      Cc: stable@vger.kernel.org
      Signed-off-by: NMarek Majtyka <marek.majtyka@tieto.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      ca09f02f
  6. 05 9月, 2015 1 次提交
  7. 20 8月, 2015 1 次提交
  8. 12 8月, 2015 4 次提交
  9. 21 7月, 2015 3 次提交
  10. 17 6月, 2015 6 次提交
    • M
      arm/arm64: KVM: vgic: Do not save GICH_HCR / ICH_HCR_EL2 · 4642019d
      Marc Zyngier 提交于
      The GIC Hypervisor Configuration Register is used to enable
      the delivery of virtual interupts to a guest, as well as to
      define in which conditions maintenance interrupts are delivered
      to the host.
      
      This register doesn't contain any information that we need to
      read back (the EOIcount is utterly useless for us).
      
      So let's save ourselves some cycles, and not save it before
      writing zero to it.
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      4642019d
    • L
      ARM: kvm: psci: fix handling of unimplemented functions · e2d99736
      Lorenzo Pieralisi 提交于
      According to the PSCI specification and the SMC/HVC calling
      convention, PSCI function_ids that are not implemented must
      return NOT_SUPPORTED as return value.
      
      Current KVM implementation takes an unhandled PSCI function_id
      as an error and injects an undefined instruction into the guest
      if PSCI implementation is called with a function_id that is not
      handled by the resident PSCI version (ie it is not implemented),
      which is not the behaviour expected by a guest when calling a
      PSCI function_id that is not implemented.
      
      This patch fixes this issue by returning NOT_SUPPORTED whenever
      the kvm PSCI call is executed for a function_id that is not
      implemented by the PSCI kvm layer.
      
      Cc: <stable@vger.kernel.org> # 3.18+
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Acked-by: NSudeep Holla <sudeep.holla@arm.com>
      Signed-off-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      e2d99736
    • K
      KVM: arm/arm64: Enable the KVM-VFIO device · 8889583c
      Kim Phillips 提交于
      The KVM-VFIO device is used by the QEMU VFIO device. It is used to
      record the list of in-use VFIO groups so that KVM can manipulate
      them.
      Signed-off-by: NKim Phillips <kim.phillips@linaro.org>
      Signed-off-by: NEric Auger <eric.auger@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      8889583c
    • C
      arm/arm64: KVM: Properly account for guest CPU time · 1b3d546d
      Christoffer Dall 提交于
      Until now we have been calling kvm_guest_exit after re-enabling
      interrupts when we come back from the guest, but this has the
      unfortunate effect that CPU time accounting done in the context of timer
      interrupts occurring while the guest is running doesn't properly notice
      that the time since the last tick was spent in the guest.
      
      Inspired by the comment in the x86 code, move the kvm_guest_exit() call
      below the local_irq_enable() call and change __kvm_guest_exit() to
      kvm_guest_exit(), because we are now calling this function with
      interrupts enabled.  We have to now explicitly disable preemption and
      not enable preemption before we've called kvm_guest_exit(), since
      otherwise we could be preempted and everything happening before we
      eventually get scheduled again would be accounted for as guest time.
      
      At the same time, move the trace_kvm_exit() call outside of the atomic
      section, since there is no reason for us to do that with interrupts
      disabled.
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      1b3d546d
    • T
      kvm: remove one useless check extension · ea2c6d97
      Tiejun Chen 提交于
      We already check KVM_CAP_IRQFD in generic once enable CONFIG_HAVE_KVM_IRQFD,
      
      kvm_vm_ioctl_check_extension_generic()
          |
          + switch (arg) {
          +   ...
          +   #ifdef CONFIG_HAVE_KVM_IRQFD
          +       case KVM_CAP_IRQFD:
          +   #endif
          +   ...
          +   return 1;
          +   ...
          + }
          |
          + kvm_vm_ioctl_check_extension()
      
      So its not necessary to check this in arch again, and also fix one typo,
      s/emlation/emulation.
      Signed-off-by: NTiejun Chen <tiejun.chen@intel.com>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
      ea2c6d97
    • M
      arm: KVM: force execution of HCPTR access on VM exit · 85e84ba3
      Marc Zyngier 提交于
      On VM entry, we disable access to the VFP registers in order to
      perform a lazy save/restore of these registers.
      
      On VM exit, we restore access, test if we did enable them before,
      and save/restore the guest/host registers if necessary. In this
      sequence, the FPEXC register is always accessed, irrespective
      of the trapping configuration.
      
      If the guest didn't touch the VFP registers, then the HCPTR access
      has now enabled such access, but we're missing a barrier to ensure
      architectural execution of the new HCPTR configuration. If the HCPTR
      access has been delayed/reordered, the subsequent access to FPEXC
      will cause a trap, which we aren't prepared to handle at all.
      
      The same condition exists when trapping to enable VFP for the guest.
      
      The fix is to introduce a barrier after enabling VFP access. In the
      vmexit case, it can be relaxed to only takes place if the guest hasn't
      accessed its view of the VFP registers, making the access to FPEXC safe.
      
      The set_hcptr macro is modified to deal with both vmenter/vmexit and
      vmtrap operations, and now takes an optional label that is branched to
      when the guest hasn't touched the VFP registers.
      Reported-by: NVikram Sethi <vikrams@codeaurora.org>
      Cc: stable@kernel.org	# v3.9+
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      85e84ba3
  11. 10 6月, 2015 1 次提交
  12. 28 5月, 2015 1 次提交
  13. 27 5月, 2015 1 次提交
  14. 26 5月, 2015 3 次提交
  15. 09 5月, 2015 1 次提交
  16. 07 5月, 2015 1 次提交
  17. 22 4月, 2015 1 次提交
    • A
      KVM: arm/arm64: check IRQ number on userland injection · fd1d0ddf
      Andre Przywara 提交于
      When userland injects a SPI via the KVM_IRQ_LINE ioctl we currently
      only check it against a fixed limit, which historically is set
      to 127. With the new dynamic IRQ allocation the effective limit may
      actually be smaller (64).
      So when now a malicious or buggy userland injects a SPI in that
      range, we spill over on our VGIC bitmaps and bytemaps memory.
      I could trigger a host kernel NULL pointer dereference with current
      mainline by injecting some bogus IRQ number from a hacked kvmtool:
      -----------------
      ....
      DEBUG: kvm_vgic_inject_irq(kvm, cpu=0, irq=114, level=1)
      DEBUG: vgic_update_irq_pending(kvm, cpu=0, irq=114, level=1)
      DEBUG: IRQ #114 still in the game, writing to bytemap now...
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = ffffffc07652e000
      [00000000] *pgd=00000000f658b003, *pud=00000000f658b003, *pmd=0000000000000000
      Internal error: Oops: 96000006 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 1 PID: 1053 Comm: lkvm-msi-irqinj Not tainted 4.0.0-rc7+ #3027
      Hardware name: FVP Base (DT)
      task: ffffffc0774e9680 ti: ffffffc0765a8000 task.ti: ffffffc0765a8000
      PC is at kvm_vgic_inject_irq+0x234/0x310
      LR is at kvm_vgic_inject_irq+0x30c/0x310
      pc : [<ffffffc0000ae0a8>] lr : [<ffffffc0000ae180>] pstate: 80000145
      .....
      
      So this patch fixes this by checking the SPI number against the
      actual limit. Also we remove the former legacy hard limit of
      127 in the ioctl code.
      Signed-off-by: NAndre Przywara <andre.przywara@arm.com>
      Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org>
      CC: <stable@vger.kernel.org> # 4.0, 3.19, 3.18
      [maz: wrap KVM_ARM_IRQ_GIC_MAX with #ifndef __KERNEL__,
      as suggested by Christopher Covington]
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      fd1d0ddf
  18. 31 3月, 2015 1 次提交