1. 30 May, 2014 22 commits
    • MIPS: KVM: Override guest kernel timer frequency directly · eda3d33c
      James Hogan committed
      The KVM_HOST_FREQ Kconfig symbol was used by KVM guest kernels to
      override the timer frequency calculation to a value based on the host
      frequency. Now that the KVM timer emulation is implemented independent
      of the host timer frequency and defaults to 100MHz, adjust the workings
      of CONFIG_KVM_HOST_FREQ to match.
      
      The Kconfig symbol now specifies the guest timer frequency directly, and
      has been renamed accordingly to KVM_GUEST_TIMER_FREQ. It now defaults to
      100MHz too and the help text is updated to make it clear that a zero
      value will allow the normal timer frequency calculation to take place
      (based on the emulated RTC).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Rewrite count/compare timer emulation · e30492bb
      James Hogan committed
      Previously the emulation of the CPU timer was just enough to get a Linux
      guest running but some shortcuts were taken:
       - The guest timer interrupt was hard coded to always happen every 10 ms
         rather than being timed to when CP0_Count would match CP0_Compare.
       - The guest's CP0_Count register was based on the host's CP0_Count
         register. This isn't very portable and fails on cores without a
         CP0_Count register implemented such as Ingenic XBurst. It also meant
         that the guest's CP0_Cause.DC bit to disable the CP0_Count register
         had no effect.
       - The guest's CP0_Count register was emulated by just dividing the
         host's CP0_Count register by 4. This resulted in continuity problems
         when used as a clock source, since when the host CP0_Count overflows
         from 0x7fffffff to 0x80000000, the guest CP0_Count transitions
         discontinuously from 0x1fffffff to 0xe0000000.
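The jump described in the last bullet can be reproduced with plain integer arithmetic. The sketch below assumes the old code divided the host count as a signed 32-bit value, which is inferred from the two values quoted above rather than taken from the kernel source; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: derive a guest count by dividing the host count by 4,
 * treating the host value as signed. The unsigned-to-signed conversion is
 * implementation-defined in C; common compilers wrap to two's complement.
 */
static uint32_t guest_count_signed_div4(uint32_t host_count)
{
    int32_t as_signed = (int32_t)host_count;

    return (uint32_t)(as_signed / 4);
}
```

One host tick across the sign boundary moves the derived guest value from 0x1fffffff to 0xe0000000, which is why it was unusable as a clock source.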
      
      Therefore rewrite & fix emulation of the guest timer based on the
      monotonic kernel time (i.e. ktime_get()). Internally a 32-bit count_bias
      value is added to the frequency scaled nanosecond monotonic time to get
      the guest's CP0_Count. The frequency of the timer is initialised to
      100MHz and cannot yet be changed, but a later patch will allow the
      frequency to be configured via the KVM_{GET,SET}_ONE_REG ioctl
      interface.
      
      The timer can now be stopped via the CP0_Cause.DC bit (by the guest or
      via the KVM_SET_ONE_REG ioctl interface), at which point the current
      CP0_Count is stored and can be read directly. When it is restarted the
      bias is recalculated such that the CP0_Count value is continuous.
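A minimal sketch of the bias scheme described above, assuming a nanosecond monotonic clock is passed in by the caller. The struct, field, and function names are illustrative and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Illustrative state; names are assumptions, not the kernel's. */
struct guest_timer {
    uint32_t count_hz;   /* emulated timer frequency, e.g. 100 MHz */
    uint32_t count_bias; /* offset added to the scaled monotonic time */
};

/* Scale nanoseconds to timer ticks, truncated to 32 bits like CP0_Count. */
static uint32_t scale_ns(const struct guest_timer *t, uint64_t now_ns)
{
    uint64_t secs = now_ns / NSEC_PER_SEC;
    uint64_t rem = now_ns % NSEC_PER_SEC;

    assert(t->count_hz != 0);
    /* Split into seconds and remainder to keep the products within 64 bits. */
    return (uint32_t)(secs * t->count_hz + rem * t->count_hz / NSEC_PER_SEC);
}

/* Running timer: the count is the bias plus the scaled monotonic time. */
static uint32_t read_count(const struct guest_timer *t, uint64_t now_ns)
{
    return t->count_bias + scale_ns(t, now_ns);
}

/* Restart with a saved count: choose the bias so the count is continuous. */
static void resume_count(struct guest_timer *t, uint32_t saved, uint64_t now_ns)
{
    t->count_bias = saved - scale_ns(t, now_ns);
}
```

All arithmetic is modulo 2^32, so the count wraps naturally; at 100 MHz that happens every 2^32 / 10^8 seconds, roughly 42.9 s.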
      
      Due to the nature of hrtimer interrupts any read of the guest's
      CP0_Count register while it is running triggers a check for whether the
      hrtimer has expired, so that the guest/userland cannot observe the
      CP0_Count passing CP0_Compare without queuing a timer interrupt. This is
      also taken advantage of when stopping the timer to ensure that a pending
      timer interrupt is queued.
      
      This replaces the implementation of:
       - Guest read of CP0_Count
       - Guest write of CP0_Count
       - Guest write of CP0_Compare
       - Guest write of CP0_Cause
       - Guest read of HWR 2 (CC) with RDHWR
       - Host read of CP0_Count via KVM_GET_ONE_REG ioctl interface
       - Host write of CP0_Count via KVM_SET_ONE_REG ioctl interface
       - Host write of CP0_Compare via KVM_SET_ONE_REG ioctl interface
       - Host write of CP0_Cause via KVM_SET_ONE_REG ioctl interface
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Migrate hrtimer to follow VCPU · 3a0ba774
      James Hogan committed
      When a VCPU is scheduled in on a different CPU, refresh the hrtimer used
      for emulating count/compare so that it gets migrated to the same CPU.
      
      This should prevent a timer interrupt occurring on a different CPU to
      where the guest it relates to is running, which would cause the guest
      timer interrupt not to be delivered until after the next guest exit.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Fix timer race modifying guest CP0_Cause · c73c99b0
      James Hogan committed
      The hrtimer callback for guest timer timeouts sets the guest's
      CP0_Cause.TI bit to indicate to the guest that a timer interrupt is
      pending, however there is no mutual exclusion implemented to prevent
      this occurring while the guest's CP0_Cause register is being
      read-modify-written elsewhere.
      
      When this occurs the setting of the CP0_Cause.TI bit is undone and the
      guest misses the timer interrupt and doesn't reprogram the CP0_Compare
      register for the next timeout. Currently another timer interrupt will be
      triggered again in another 10ms anyway due to the way timers are
      emulated, but after the MIPS timer emulation is fixed this would result
      in Linux guest time standing still and the guest scheduler not being
      invoked until the guest CP0_Count has looped around again, which at
      100MHz takes just under 43 seconds.
      
      Currently this is the only asynchronous modification of guest registers,
      therefore it is fixed by adjusting the implementations of the
      kvm_set_c0_guest_cause(), kvm_clear_c0_guest_cause(), and
      kvm_change_c0_guest_cause() macros which are used for modifying the
      guest CP0_Cause register to use ll/sc to ensure atomic modification.
      This should work in both UP and SMP cases without requiring interrupts
      to be disabled.
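The kernel macros use ll/sc directly. As a portable approximation, the same read-modify-write guarantee can be sketched with C11 atomics; the function names below are hypothetical, and the TI bit position (bit 30) follows the MIPS32 Cause register layout.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define CAUSE_TI (UINT32_C(1) << 30) /* timer interrupt pending */

/* Atomically set bits, so a concurrent update elsewhere cannot undo them. */
static void cause_set(_Atomic uint32_t *cause, uint32_t bits)
{
    atomic_fetch_or(cause, bits);
}

/* Atomically clear bits. */
static void cause_clear(_Atomic uint32_t *cause, uint32_t bits)
{
    atomic_fetch_and(cause, ~bits);
}

/* Atomically replace the bits selected by mask with val, retrying on a race. */
static void cause_change(_Atomic uint32_t *cause, uint32_t mask, uint32_t val)
{
    uint32_t old = atomic_load(cause);
    uint32_t updated;

    do {
        updated = (old & ~mask) | (val & mask);
    } while (!atomic_compare_exchange_weak(cause, &old, updated));
}
```

The compare-and-exchange loop plays the role of the ll/sc retry: if the timer callback sets TI between the load and the store, the store fails and the update is recomputed from the fresh value.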
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Deliver guest interrupts after local_irq_disable() · 044f0f03
      James Hogan committed
      When about to run the guest, deliver guest interrupts after disabling
      host interrupts. This should prevent an hrtimer interrupt from being
      handled after delivering guest interrupts, and therefore not delivering
      the guest timer interrupt until after the next guest exit.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Add CP0_HWREna KVM register access · 16fd5c1d
      James Hogan committed
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      HWREna register. This is so that userland can save and restore its
      value so that RDHWR instructions don't have to be emulated by the guest.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Add CP0_UserLocal KVM register access · 7767b7d2
      James Hogan committed
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      UserLocal register. This is so that userland can save and restore its
      value.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Add CP0_Count/Compare KVM register access · f8be02da
      James Hogan committed
      Implement KVM_{GET,SET}_ONE_REG ioctl based access to the guest CP0
      Count and Compare registers. These registers are special in that writing
      to them has side effects (adjusting the time until the next timer
      interrupt) and reading of Count depends on the time. Therefore add a
      couple of callbacks so that different implementations (trap & emulate or
      VZ) can implement them differently depending on what the hardware
      provides.
      
      The trap & emulate versions mostly duplicate what happens when a T&E
      guest reads or writes these registers, so it inherits the same
      limitations which can be fixed in later patches.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Move KVM_{GET,SET}_ONE_REG definitions into kvm_host.h · 48a3c4e4
      James Hogan committed
      Move the KVM_{GET,SET}_ONE_REG MIPS register id definitions out of
      kvm_mips.c to kvm_host.h so that they can be shared between multiple
      source files. This allows register access to be indirected depending on
      the underlying implementation (trap & emulate or VZ).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Add CP0_EPC KVM register access · fb6df0cd
      James Hogan committed
      Contrary to the comment, the guest CP0_EPC register cannot be set via
      kvm_regs, since it is distinct from the guest PC. Add the EPC register
      to the KVM_{GET,SET}_ONE_REG ioctl interface.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: David Daney <david.daney@cavium.com>
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Use tlb_write_random · b5dfc6c1
      James Hogan committed
      When MIPS KVM needs to write a TLB entry for the guest it reads the
      CP0_Random register, uses it to generate the CP0_Index, and writes the
      TLB entry using the TLBWI instruction (tlb_write_indexed()).
      
      However there's an instruction for that, TLBWR (tlb_write_random()) so
      use that instead.
      
      This happens to also fix an issue with Ingenic XBurst cores where the
      same TLB entry is replaced each time preventing forward progress on
      stores due to alternating between TLB load misses for the instruction
      fetch and TLB store misses.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Use local_flush_icache_range to fix RI on XBurst · facaaec1
      James Hogan committed
      MIPS KVM uses mips32_SyncICache to synchronise the icache with the
      dcache after dynamically modifying guest instructions or writing the guest
      exception vector. However this uses rdhwr to get the SYNCI step, which
      causes a reserved instruction exception on Ingenic XBurst cores.
      
      It would seem to make more sense to use local_flush_icache_range()
      instead which does the same thing but is more portable.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: Export local_flush_icache_range for KVM · 90f91356
      James Hogan committed
      Export the local_flush_icache_range function pointer for GPL modules so
      that it can be used by KVM for syncing the icache after binary
      translation of trapping instructions.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Allocate at least 16KB for exception handlers · 7006e2df
      James Hogan committed
      Each MIPS KVM guest has its own copy of the KVM exception vector. This
      contains the TLB refill exception handler at offset 0x000, the general
      exception handler at offset 0x180, and interrupt exception handlers at
      offset 0x200 in case Cause_IV=1. A common handler is copied to offset
      0x2000 and offset 0x3000 is used for temporarily storing k1 during entry
      from guest.
      
      However the amount of memory allocated for this purpose is calculated as
      0x200 rounded up to the next page boundary, which is insufficient if 4KB
      pages are in use. This can lead to the common handler at offset 0x2000
      being overwritten and infinitely recursive exceptions on the next exit
      from the guest.
      
      Increase the minimum size from 0x200 to 0x4000 to cover the full use of
      the page.
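The size arithmetic behind this fix can be checked with a small round-up helper. This is a sketch only: `align_up` is a hypothetical stand-in for the kernel's page-alignment macro, and the page sizes are example values.

```c
#include <assert.h>
#include <stddef.h>

/* Round size up to the next multiple of page_size (a power of two). */
static size_t align_up(size_t size, size_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With 4KB pages, a minimum of 0x200 rounds up to 0x1000, which does not reach the common handler at 0x2000 or the k1 slot at 0x3000. A minimum of 0x4000 covers both regardless of page size.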
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: kvm@vger.kernel.org
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: Sanjay Lal <sanjayl@kymasys.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvm-s390-20140530' of... · 146b2cfe
      Paolo Bonzini committed
      Merge tag 'kvm-s390-20140530' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
      
      Several minor fixes and cleanups for KVM:
      1. Fix flag check for gdb support
      2. Remove unnecessary vcpu start
      3. Remove code duplication for sigp interrupts
      4. Better DAT handling for the TPROT instruction
      5. Correct addressing exception for standby memory
    • KVM: s390: Intercept the tprot instruction · 5a5e6536
      Matthew Rosato committed
      Based on original patch from Jeng-fang (Nick) Wang
      
      When standby memory is specified for a guest Linux, but no virtual memory has
      been allocated on the Qemu host backing that guest, the guest memory detection
      process encounters a memory access exception which is not thrown from the KVM
      handle_tprot() instruction-handler function. The access exception comes from
      sie64a returning EFAULT, which then passes an addressing exception to the guest.
      Unfortunately this does not do the proper PSW fixup (nullifying vs.
      suppressing) so the guest will get a fault for the wrong address.
      
      Let's just intercept the tprot instruction all the time to do the right thing
      and not go the page fault handler path for standby memory. tprot is only used
      by Linux during startup so some exits should be ok.
      Without this patch, standby memory cannot be used with KVM.
      Signed-off-by: Nick Wang <jfwang@us.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Tested-by: Matthew Rosato <mjrosato@linux.vnet.ibm.com>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390: a VCPU is already started when delivering interrupts · 3192c639
      David Hildenbrand committed
      This patch removes the start of a VCPU when delivering a RESTART interrupt.
      Interrupt delivery is called from kvm_arch_vcpu_ioctl_run. So the VCPU is
      already considered started - no need to call kvm_s390_vcpu_start. This function
      will early exit anyway.
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390: check the given debug flags, not the set ones · 2de3bfc2
      David Hildenbrand committed
      This patch fixes a minor bug when updating the guest debug settings.
      We should check the given debug flags, not the already set ones.
      Doesn't do any harm but too many (for now unused) flags could be set internally
      without error.
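The pattern being fixed can be sketched as follows. The flag values, struct, and function names are hypothetical and only illustrate validating the incoming request rather than the stored state.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag set; the real KVM flag values are not reproduced here. */
#define DBG_ENABLE      UINT32_C(0x1)
#define DBG_SINGLESTEP  UINT32_C(0x2)
#define DBG_VALID_FLAGS (DBG_ENABLE | DBG_SINGLESTEP)

struct vcpu_sketch {
    uint32_t guest_debug; /* flags accepted by an earlier call */
};

static int set_guest_debug(struct vcpu_sketch *vcpu, uint32_t requested)
{
    /* Check the flags being requested, not the ones already stored. */
    if (requested & ~DBG_VALID_FLAGS)
        return -EINVAL;

    vcpu->guest_debug = requested;
    return 0;
}
```

Checking `vcpu->guest_debug` instead would let an unknown flag through on the first call, because the stored value is still clean at that point.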
      Signed-off-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
    • KVM: s390: clean up interrupt injection in sigp code · 22ff4a33
      Jens Freimann committed
      We have all the logic to inject interrupts available in
      kvm_s390_inject_vcpu(), so let's use it instead of
      injecting irqs manually to the list in sigp code.
      
      SIGP stop is special because we have to check the
      action_flags before injecting the interrupt. As
      the action_flags are not available in kvm_s390_inject_vcpu()
      we leave the code for the stop order code untouched for now.
      Signed-off-by: Jens Freimann <jfrei@linux.vnet.ibm.com>
      Reviewed-by: David Hildenbrand <dahi@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
    • KVM: s390: Enable DAT support for TPROT handler · a0465f9a
      Thomas Huth committed
      The TPROT instruction can be used to check the accessibility of storage
      for any kind of logical addresses. So far, our handler only supported
      real addresses. This patch now also enables support for addresses that
      have to be translated via DAT first. And while we're at it, change the
      code to use the common KVM function gfn_to_hva_prot() to check for the
      validity and writability of the memory page.
      Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
    • KVM: s390: Add a generic function for translating guest addresses · 9fbc0276
      Thomas Huth committed
      This patch adds a function for translating logical guest addresses into
      physical guest addresses without touching the memory at the given location.
      Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com>
      Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
    • MIPS: KVM: remove the stale memory alias support function unalias_gfn · 356d4c20
      Deng-Cheng Zhu committed
      The memory alias support has been removed since a1f4d395 (KVM: Remove
      memory alias support). So remove unalias_gfn from the MIPS port.
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Deng-Cheng Zhu <dengcheng.zhu@imgtec.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 27 May, 2014 4 commits
    • arm: Fix compile warning for psci · d6d7a95c
      Christoffer Dall committed
      Commit e71246a2 changes psci_init from a
      function returning a void to an int, but does not change the non
      CONFIG_ARM_PSCI implementation to return a value, which causes a compile
      warning.  Just return 0.
      
      Cc: Ashwin Chaugule <ashwin.chaugule@linaro.org>
      Cc: Shawn Guo <shawn.guo@freescale.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • Merge tag 'kvm-arm-for-3.16' of... · 04092204
      Paolo Bonzini committed
      Merge tag 'kvm-arm-for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-next
      
      Changes for the 3.16 merge window.
      
      This includes KVM support for PSCI v0.2 and also includes generic Linux
      support for PSCI v0.2 (on hosts that advertise that feature via their
      DT), since the latter depends on headers introduced by the former.
      
      Finally there's a small patch from Marc that enables Cortex-A53 support.
    • KVM: x86: MOV CR/DR emulation should ignore mod · 9b88ae99
      Nadav Amit committed
      MOV CR/DR instructions ignore the mod field (in the ModR/M byte). As the SDM
      states: "The 2 bits in the mod field are ignored".  Accordingly, the second
      operand of these instructions is always a general purpose register.
      
      The current emulator implementation does not do so. If the mod bits do not
      equal 3, it expects the second operand to be in memory.
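A minimal sketch of the decoding difference, assuming a plain ModR/M byte without prefix extensions. The type and function names are illustrative and are not the emulator's.

```c
#include <assert.h>
#include <stdint.h>

enum operand_kind { OPERAND_REG, OPERAND_MEM };

struct modrm {
    uint8_t mod; /* bits 7-6 */
    uint8_t reg; /* bits 5-3 */
    uint8_t rm;  /* bits 2-0 */
};

static struct modrm decode_modrm(uint8_t byte)
{
    struct modrm m = {
        (uint8_t)(byte >> 6),
        (uint8_t)((byte >> 3) & 7),
        (uint8_t)(byte & 7),
    };

    return m;
}

/* Generic rule: rm names a register only when mod == 3. */
static enum operand_kind rm_kind_generic(struct modrm m)
{
    return m.mod == 3 ? OPERAND_REG : OPERAND_MEM;
}

/* MOV to/from CR/DR: mod is ignored, rm is always a general-purpose register. */
static enum operand_kind rm_kind_mov_crdr(struct modrm m)
{
    (void)m.mod;
    return OPERAND_REG;
}
```

For a ModR/M byte of 0x18 (mod = 0), the generic rule would try to fetch a memory operand, while the MOV CR/DR rule treats rm as register 0.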
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: lapic: sync highest ISR to hardware apic on EOI · fc57ac2c
      Paolo Bonzini committed
      When Hyper-V enlightenments are in effect, Windows prefers to issue a
      Hyper-V MSR write to issue an EOI rather than an x2apic MSR write.
      The Hyper-V MSR write is not handled by the processor, and besides
      being slower, this also causes bugs with APIC virtualization.  The
      reason is that on EOI the processor will modify the highest in-service
      interrupt (SVI) field of the VMCS, as explained in section 29.1.4 of
      the SDM; every other step in EOI virtualization is already done by
      apic_send_eoi or on VM entry, but this one is missing.
      
      We need to do the same, and be careful not to muck with the isr_count
      and highest_isr_cache fields that are unused when virtual interrupt
      delivery is enabled.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Yang Zhang <yang.z.zhang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  3. 26 May, 2014 1 commit
  4. 22 May, 2014 6 commits
    • KVM: vmx: DR7 masking on task switch emulation is wrong · 1f854112
      Nadav Amit committed
      The DR7 masking which is done on task switch emulation should be in hex format
      (clearing the local breakpoints enable bits 0,2,4 and 6).
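The effect of a hex versus decimal mask can be shown directly. The sketch assumes the pre-fix code used the decimal literal 55, which the subject line suggests but the message does not spell out; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DR7_LOCAL_ENABLES UINT64_C(0x55) /* L0..L3 = bits 0, 2, 4, 6 */

/* Intended behavior: clear only the local breakpoint enable bits. */
static uint64_t clear_local_breakpoints(uint64_t dr7)
{
    return dr7 & ~DR7_LOCAL_ENABLES;
}

/* Decimal 55 is 0x37, so this clears bits 0, 1, 2, 4 and 5 instead. */
static uint64_t clear_with_decimal_55(uint64_t dr7)
{
    return dr7 & ~UINT64_C(55);
}
```

With all eight enable bits set (0xff), the hex mask leaves the global enables 0xaa, while the decimal mask leaves 0xc8: L3 stays set and G0 and G2 are wrongly cleared.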
      Signed-off-by: Nadav Amit <namit@cs.technion.ac.il>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86: fix page fault tracing when KVM guest support enabled · 65a7f03f
      Dave Hansen committed
      I noticed on some of my systems that page fault tracing doesn't
      work:
      
      	cd /sys/kernel/debug/tracing
      	echo 1 > events/exceptions/enable
      	cat trace;
      	# nothing shows up
      
      I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
      KVM VM, enabling that option breaks page fault tracing, and
      disabling fixes it.  I tried on some old kernels and this does
      not appear to be a regression: it never worked.
      
      There are two page-fault entry functions today.  One when tracing
      is on and another when it is off.  The KVM code calls do_page_fault()
      directly instead of calling the traced version:
      
      > dotraplinkage void __kprobes
      > do_async_page_fault(struct pt_regs *regs, unsigned long
      > error_code)
      > {
      >         enum ctx_state prev_state;
      >
      >         switch (kvm_read_and_reset_pf_reason()) {
      >         default:
      >                 do_page_fault(regs, error_code);
      >                 break;
      >         case KVM_PV_REASON_PAGE_NOT_PRESENT:
      
      I'm also having problems with the page fault tracing on bare
      metal (same symptom of no trace output).  I'm unsure if it's
      related.
      
      Steven had an alternative to this which has zero overhead when
      tracing is off, where this one includes the standard noops even when
      tracing is disabled.  I'm unconvinced that the extra complexity
      of his approach:
      
      	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home
      
      is worth it, especially considering that the KVM code is already
      making page fault entry slower here.  This solution is
      dirt-simple.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: get CPL from SS.DPL · ae9fedc7
      Paolo Bonzini committed
      CS.RPL is not equal to the CPL in the few instructions between
      setting CR0.PE and reloading CS.  And CS.DPL is also not equal
      to the CPL for conforming code segments.
      
      However, SS.DPL *is* always equal to the CPL except for the weird
      case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
      value in the STAR MSR, but forces CPL=3 (Intel instead forces
      SS.DPL=SS.RPL=CPL=3).
      
      So this patch:
      
      - modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
      the above case with SYSRET is not broken further, and the way
      to fix it would be to pass the CPL to userspace and back
      
      - modifies VMX to always return the CPL from SS.DPL (except
      forcing it to 0 if we are emulating real mode via vm86 mode;
      in vm86 mode all DPLs have to be 3, but real mode does allow
      privileged instructions).  It also removes the CPL cache,
      which becomes a duplicate of the SS access rights cache.
      
      This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
      CR0.PE=1 but before CS has been reloaded.
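A sketch of the VMX-side rule described above, assuming the usual x86 access-rights layout with the DPL in bits 5-6. The function names and the boolean parameter are illustrative and are not the kernel's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The DPL occupies bits 5-6 of the segment access-rights byte. */
static unsigned int ar_dpl(uint32_t access_rights)
{
    return (access_rights >> 5) & 3;
}

/*
 * CPL taken from SS.DPL, forced to 0 when real mode is emulated through
 * vm86 mode (where every DPL is 3 but privileged instructions are allowed).
 */
static unsigned int get_cpl(uint32_t ss_access_rights, bool real_mode_via_vm86)
{
    if (real_mode_via_vm86)
        return 0;
    return ar_dpl(ss_access_rights);
}
```

Because the result depends only on the cached SS access rights, a separate CPL cache adds nothing, which matches the removal mentioned in the message.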
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: check CS.DPL against RPL during task switch · 5045b468
      Paolo Bonzini committed
      Table 7-1 of the SDM mentions a check that the code segment's
      DPL must match the selector's RPL.  This was not done by KVM,
      fix it.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: drop set_rflags callback · fb5e336b
      Paolo Bonzini committed
      Not needed anymore now that the CPL is computed directly
      during task switch.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: use new CS.RPL as CPL during task switch · 2356aaeb
      Paolo Bonzini committed
      During task switch, all of CS.DPL, CS.RPL, SS.DPL must match (in addition
      to all the other requirements) and will be the new CPL.  So far this
      worked by carefully setting the CS selector and flag before doing the
      task switch; setting CS.selector will already change the CPL.
      
      However, this will not work once we get the CPL from SS.DPL, because
      then you will have to set the full segment descriptor cache to change
      the CPL.  ctxt->ops->cpl(ctxt) will then return the old CPL during the
      task switch, and the check that SS.DPL == CPL will fail.
      
      Temporarily assume that the CPL comes from CS.RPL during task switch
      to a protected-mode task.  This is the same approach used in QEMU's
      emulation code, which (until version 2.0) manually tracks the CPL.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 17 May, 2014 1 commit
    • Merge tag 'kvm-s390-20140516' of... · afa538f0
      Paolo Bonzini committed
      Merge tag 'kvm-s390-20140516' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-next
      
      1. Correct locking for lazy storage key handling
         A test loop with multiple CPUs triggered a race in the lazy storage
         key handling as introduced by commit 934bc131
         (KVM: s390: Allow skeys to be enabled for the current process). This
         race should not happen with Linux guests, but let's fix it anyway.
         Patch touches !/kvm/ code, but is from the s390 maintainer.
      
      2. Better handling of broken guests
         If we detect a program check loop we stop the guest instead of
         wasting CPU cycles.
      
      3. Better handling on MVPG emulation
         The move page handling is improved to be architecturally correct.
      
      4. Trace point rework
         Let's rework the kvm trace points to have a common header file (for
         later perf usage) and provide a table based instruction decoder.
      
      5. Interpretive execution of SIGP external call
         Let the hardware handle most cases of SIGP external call (IPI) and
         wire up the fixup code for the corner cases.
      
      6. Initial preparations for the IBC facility
         Prepare the code to handle instruction blocking
  6. 16 May, 2014 6 commits