1. 01 Aug, 2016 2 commits
    • KVM: VMX: Add VMCS to CPU's loaded VMCSs before VMPTRLD · b80c76ec
      Jim Mattson committed
      Kexec needs to know the addresses of all VMCSs that are active on
      each CPU, so that it can flush them from the VMCS caches. It is
      safe to record superfluous addresses that are not associated with
      an active VMCS, but it is not safe to omit an address associated
      with an active VMCS.
      
      After a call to vmcs_load, the VMCS that was loaded is active on
      the CPU. The VMCS should be added to the CPU's list of active
      VMCSs before it is loaded.
      Signed-off-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
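
      The ordering rule above (record the address first, load second) can be sketched with a toy model. This is not the KVM code: struct toy_cpu, toy_is_tracked and toy_track_then_load are hypothetical names, and the "load" is just an assignment standing in for VMPTRLD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRACKED 8

/* Toy stand-in for one CPU: the addresses kexec would flush, plus the
 * address the simulated hardware currently has loaded. */
struct toy_cpu {
    uint64_t tracked[MAX_TRACKED];
    size_t   nr_tracked;
    uint64_t hw_current;            /* 0 means "no VMCS loaded" */
};

static bool toy_is_tracked(const struct toy_cpu *cpu, uint64_t pa)
{
    for (size_t i = 0; i < cpu->nr_tracked; i++)
        if (cpu->tracked[i] == pa)
            return true;
    return false;
}

/* Record the address first, then load it. A superfluous entry is harmless;
 * a missing entry for an active VMCS is not, so the list is updated before
 * the load. Returns false (and loads nothing) if the list is full. */
static bool toy_track_then_load(struct toy_cpu *cpu, uint64_t pa)
{
    if (!toy_is_tracked(cpu, pa)) {
        if (cpu->nr_tracked == MAX_TRACKED)
            return false;
        cpu->tracked[cpu->nr_tracked++] = pa;
    }
    /* At the moment of the load, the address is already on the list. */
    assert(toy_is_tracked(cpu, pa));
    cpu->hw_current = pa;           /* stands in for VMPTRLD */
    return true;
}
```

      With this order, an interruption between the two steps leaves at worst an extra address on the list, which the commit message describes as safe.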
    • kvm: x86: nVMX: maintain internal copy of current VMCS · 4f2777bc
      David Matlack committed
      KVM maintains L1's current VMCS in guest memory, at the guest physical
      page identified by the argument to VMPTRLD. This makes hairy
      time-of-check to time-of-use bugs possible, as VCPUs can be writing to
      the VMCS page in memory while KVM is emulating VMLAUNCH and
      VMRESUME.
      
      The spec documents that writing to the VMCS page while it is loaded is
      "undefined". Therefore it is reasonable to load the entire VMCS into
      an internal cache during VMPTRLD and ignore writes to the VMCS page
      -- the guest should be using VMREAD and VMWRITE to access the current
      VMCS.
      
      To adhere to the spec, KVM should flush the current VMCS during VMPTRLD,
      and the target VMCS during VMCLEAR (as given by the operand to VMCLEAR).
      Since this implementation of VMCS caching only maintains the current
      VMCS, VMCLEAR will only do a flush if the operand to VMCLEAR is the
      current VMCS pointer.
      
      KVM will also flush during VMXOFF, which is not mandated by the spec,
      but also not in conflict with the spec.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
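
      The caching scheme described above can be illustrated with a small model, assuming a four-word "VMCS" and a caller-supplied array as the guest page. All names (struct toy_nested, toy_vmptrld, toy_vmclear, TOY_NO_VMCS and so on) are hypothetical and only mirror the behaviour stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_VMCS_WORDS 4
#define TOY_NO_VMCS    UINT64_MAX   /* hypothetical "no current VMCS" marker */

struct toy_nested {
    uint64_t  current_ptr;              /* guest physical address of current VMCS */
    uint32_t  cache[TOY_VMCS_WORDS];    /* internal copy used for emulation */
    uint32_t *guest_page;               /* backing guest page of the current VMCS */
};

/* Write the cached copy back to the guest page, if there is a current VMCS. */
static void toy_flush(struct toy_nested *n)
{
    if (n->current_ptr != TOY_NO_VMCS)
        memcpy(n->guest_page, n->cache, sizeof(n->cache));
}

/* VMPTRLD: flush the old current VMCS, then snapshot the new one. */
static void toy_vmptrld(struct toy_nested *n, uint64_t gpa, uint32_t *page)
{
    toy_flush(n);
    memcpy(n->cache, page, sizeof(n->cache));
    n->guest_page = page;
    n->current_ptr = gpa;
}

/* VMCLEAR: only the current VMCS is cached, so flush only on a match. */
static void toy_vmclear(struct toy_nested *n, uint64_t gpa)
{
    if (gpa == n->current_ptr) {
        toy_flush(n);
        n->current_ptr = TOY_NO_VMCS;
    }
}

/* VMXOFF: flush as well (not mandated, but not in conflict either). */
static void toy_vmxoff(struct toy_nested *n)
{
    toy_flush(n);
    n->current_ptr = TOY_NO_VMCS;
}

/* VMREAD/VMWRITE operate on the cache, never on the guest page. */
static uint32_t toy_vmread(const struct toy_nested *n, unsigned idx)
{
    return n->cache[idx];
}

static void toy_vmwrite(struct toy_nested *n, unsigned idx, uint32_t val)
{
    n->cache[idx] = val;
}
```

      In this model a direct write to the guest page after toy_vmptrld is invisible to toy_vmread, and it is overwritten when the cache is flushed back.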
  2. 28 Jul, 2016 2 commits
    • KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE · 93d17397
      Paul Mackerras committed
      It turns out that if the guest does a H_CEDE while the CPU is in
      a transactional state, and the H_CEDE does a nap, and the nap
      loses the architected state of the CPU (which it is allowed to do),
      then we lose the checkpointed state of the virtual CPU.  In addition,
      the transactional-memory state recorded in the MSR gets reset back
      to non-transactional, and when we try to return to the guest, we take
      a TM bad thing type of program interrupt because we are trying to
      transition from non-transactional to transactional with a hrfid
      instruction, which is not permitted.
      
      The result of the program interrupt occurring at that point is that
      the host CPU will hang in an infinite loop with interrupts disabled.
      Thus this is a denial of service vulnerability in the host which can
      be triggered by any guest (and depending on the guest kernel, it can
      potentially be triggered by unprivileged userspace in the guest).
      
      This vulnerability has been assigned the ID CVE-2016-5412.
      
      To fix this, we save the TM state before napping and restore it
      on exit from the nap, when handling a H_CEDE in real mode.  The
      case where H_CEDE exits to host virtual mode is already OK (as are
      other hcalls which exit to host virtual mode) because the exit
      path saves the TM state.
      
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures · f024ee09
      Paul Mackerras committed
      This moves the transactional memory state save and restore sequences
      out of the guest entry/exit paths into separate procedures.  This is
      so that these sequences can be used in going into and out of nap
      in a subsequent patch.
      
      The only code changes here are (a) saving and restoring LR on the
      stack, since these new procedures get called with a bl instruction,
      (b) explicitly saving r1 into the PACA instead of assuming that
      HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary
      and redundant setting of MSR[TM] that should have been removed by
      commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory
      support", 2013-09-24) but wasn't.
      
      Cc: stable@vger.kernel.org # v3.15+
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  3. 19 Jul, 2016 3 commits
  4. 18 Jul, 2016 2 commits
  5. 14 Jul, 2016 20 commits
  6. 11 Jul, 2016 4 commits
    • KVM: VMX: introduce vm_{entry,exit}_control_reset_shadow · 8391ce44
      Paolo Bonzini committed
      There is no reason to read the entry/exit control fields of the
      VMCS and immediately write back the same value.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
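
      To illustrate why a dedicated "reset shadow" helper avoids the redundant write, here is a toy model with one simulated control field and its software shadow. The struct and function names are hypothetical; only the idea (resync the shadow by reading, without writing the same value back) comes from the commit message.

```c
#include <assert.h>
#include <stdint.h>

struct toy_vmx {
    uint32_t hw_entry_ctls;      /* simulated VMCS entry-control field */
    uint32_t entry_ctls_shadow;  /* software copy of that field */
    unsigned writes;             /* number of simulated VMWRITEs */
};

static uint32_t toy_vmcs_read(const struct toy_vmx *v)
{
    return v->hw_entry_ctls;
}

static void toy_vmcs_write(struct toy_vmx *v, uint32_t val)
{
    v->hw_entry_ctls = val;
    v->writes++;
}

/* Init-style helper: writes the field and updates the shadow. Calling it
 * with the value just read performs a pointless write. */
static void toy_entry_ctls_init(struct toy_vmx *v, uint32_t val)
{
    toy_vmcs_write(v, val);
    v->entry_ctls_shadow = val;
}

/* Reset-shadow helper: resynchronise the shadow from the field, no write. */
static void toy_entry_ctls_reset_shadow(struct toy_vmx *v)
{
    v->entry_ctls_shadow = toy_vmcs_read(v);
}
```

      Both helpers leave the shadow equal to the field; only the init-style one costs a write.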
    • KVM: nVMX: keep preemption timer enabled during L2 execution · 9314006d
      Paolo Bonzini committed
      Because the vmcs12 preemption timer is emulated through a separate hrtimer,
      we can keep on using the preemption timer in the vmcs02 to emulate L1's
      TSC deadline timer.
      
      However, the corresponding bit in the pin-based execution control field
      must be kept consistent between vmcs01 and vmcs02.  On vmentry we copy
      it into the vmcs02; on vmexit the preemption timer must be disabled in
      the vmcs01 if a preemption timer vmexit happened while in guest mode.
      
      The preemption timer value in the vmcs02 is set by vmx_vcpu_run, so it
      need not be considered in prepare_vmcs02.
      
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Cc: Haozhong Zhang <haozhong.zhang@intel.com>
      Tested-by: Wanpeng Li <kernellwp@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
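
      The consistency rule for the pin-based control bit can be sketched as follows. This is a simplified model, not the patch itself: struct toy_vmcs and both helpers are hypothetical, and the bit position (bit 6, "activate VMX-preemption timer") is taken from the Intel SDM rather than from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 6 of the pin-based VM-execution controls (per the Intel SDM). */
#define PIN_BASED_VMX_PREEMPTION_TIMER 0x00000040u

struct toy_vmcs {
    uint32_t pin_based_ctls;
};

/* On vmentry to L2: copy the timer bit from vmcs01 into vmcs02,
 * leaving every other vmcs02 bit untouched. */
static void toy_sync_timer_bit_on_entry(const struct toy_vmcs *vmcs01,
                                        struct toy_vmcs *vmcs02)
{
    vmcs02->pin_based_ctls &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
    vmcs02->pin_based_ctls |= vmcs01->pin_based_ctls &
                              PIN_BASED_VMX_PREEMPTION_TIMER;
}

/* On vmexit from L2: if the timer fired while in guest mode, it must
 * also be disabled in vmcs01. */
static void toy_sync_timer_bit_on_exit(struct toy_vmcs *vmcs01,
                                       bool timer_fired_in_guest_mode)
{
    if (timer_fired_in_guest_mode)
        vmcs01->pin_based_ctls &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
}
```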
    • KVM: nVMX: avoid incorrect preemption timer vmexit in nested guest · 55123e3c
      Wanpeng Li committed
      The preemption timer for nested VMX is emulated by an hrtimer, which is started on L2
      entry, stopped on L2 exit, and evaluated via the check_nested_events hook. However,
      nested_vmx_exit_handled always returns true for a preemption timer vmexit. As a result,
      the L1 preemption timer vmexit is captured and treated as an L2 preemption
      timer vmexit, causing NULL pointer dereferences or worse in the L1 guest's
      vmexit handler:
      
          BUG: unable to handle kernel NULL pointer dereference at           (null)
          IP: [<          (null)>]           (null)
          PGD 0
          Oops: 0010 [#1] SMP
          Call Trace:
           ? kvm_lapic_expired_hv_timer+0x47/0x90 [kvm]
           handle_preemption_timer+0xe/0x20 [kvm_intel]
           vmx_handle_exit+0x169/0x15a0 [kvm_intel]
           ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
           kvm_arch_vcpu_ioctl_run+0xdee/0x19d0 [kvm]
           ? kvm_arch_vcpu_ioctl_run+0xd5d/0x19d0 [kvm]
           ? vcpu_load+0x1c/0x60 [kvm]
           ? kvm_arch_vcpu_load+0x57/0x260 [kvm]
           kvm_vcpu_ioctl+0x2d3/0x7c0 [kvm]
           do_vfs_ioctl+0x96/0x6a0
           ? __fget_light+0x2a/0x90
           SyS_ioctl+0x79/0x90
           do_syscall_64+0x68/0x180
           entry_SYSCALL64_slow_path+0x25/0x25
          Code:  Bad RIP value.
          RIP  [<          (null)>]           (null)
           RSP <ffff8800b5263c48>
          CR2: 0000000000000000
          ---[ end trace 9c70c48b1a2bc66e ]---
      
      This is readily reproduced with the preemption timer enabled on L0 and
      disabled on L1.
      
      Return false since preemption timer vmexits must never be reflected to L2.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Yunhong Jiang <yunhong.jiang@intel.com>
      Cc: Jan Kiszka <jan.kiszka@siemens.com>
      Cc: Haozhong Zhang <haozhong.zhang@intel.com>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
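
      The shape of the fix can be sketched as a decision function that says whether L0 forwards a hardware vmexit to the nested hypervisor or handles it itself. This is a toy reduction, not nested_vmx_exit_handled: the function name and the l1_intercepts_hlt parameter are hypothetical, and the exit reason numbers (12 for HLT, 52 for the preemption timer) are taken from the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXIT_REASON_HLT              12
#define EXIT_REASON_PREEMPTION_TIMER 52

/* Returns true if the exit should be forwarded to the nested hypervisor,
 * false if L0 must handle it itself. */
static bool toy_nested_exit_forwarded(uint32_t exit_reason,
                                      bool l1_intercepts_hlt)
{
    switch (exit_reason) {
    case EXIT_REASON_PREEMPTION_TIMER:
        /* The hardware timer belongs to L0; the nested timer is an
         * hrtimer evaluated elsewhere, so never forward this exit. */
        return false;
    case EXIT_REASON_HLT:
        return l1_intercepts_hlt;
    default:
        return true;
    }
}
```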
    • KVM: VMX: reflect broken preemption timer in vmcs_config · 1c17c3e6
      Paolo Bonzini committed
      Simplify cpu_has_vmx_preemption_timer.  This is consistent with the
      rest of setup_vmcs_config and preparatory for the next patch.
      Tested-by: Wanpeng Li <kernellwp@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  7. 05 Jul, 2016 7 commits
    • MIPS: KVM: Emulate generic QEMU machine on r6 T&E · 84260972
      James Hogan committed
      Default the guest PRId register to represent a generic QEMU machine
      instead of a 24kc on MIPSr6. 24kc isn't supported by r6 Linux kernels.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Decode RDHWR more strictly · 8eeab81c
      James Hogan committed
      When KVM emulates the RDHWR instruction, decode the instruction more
      strictly. The rs field (bits 25:21) should be zero, as should bits 10:9.
      Bits 8:6 are the register select field in MIPSr6, so we aren't strict
      about those bits (no other operations should use that encoding space).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
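
      The stricter decode described above amounts to a few field checks on the 32-bit instruction word. The sketch below is one way to write them, not the kernel's decoder: toy_is_strict_rdhwr is a hypothetical name, and the SPECIAL3 opcode (0x1f) and RDHWR function code (0x3b) come from the MIPS instruction set manual rather than from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SPECIAL3_OPCODE 0x1fu   /* bits 31:26 */
#define TOY_RDHWR_FUNCT     0x3bu   /* bits 5:0 */

/* Accept an instruction word as RDHWR only if rs (bits 25:21) and
 * bits 10:9 are zero. Bits 8:6 (the MIPSr6 register select field)
 * are deliberately not checked. */
static bool toy_is_strict_rdhwr(uint32_t inst)
{
    uint32_t opcode    = inst >> 26;
    uint32_t rs        = (inst >> 21) & 0x1fu;
    uint32_t bits_10_9 = (inst >> 9) & 0x3u;
    uint32_t funct     = inst & 0x3fu;

    return opcode == TOY_SPECIAL3_OPCODE &&
           funct == TOY_RDHWR_FUNCT &&
           rs == 0 &&
           bits_10_9 == 0;
}
```

      For instance, the word 0x7c03e83b (rdhwr $3, $29) passes the check, while the same word with any rs bit or bit 9 set does not.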
    • MIPS: KVM: Recognise r6 CACHE encoding · 5cc4aafc
      James Hogan committed
      Recognise the new MIPSr6 CACHE instruction encoding rather than the
      pre-r6 one when an r6 kernel is being built. A SPECIAL3 opcode is used
      and the immediate field is reduced to 9 bits wide since MIPSr6.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
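
      The narrower immediate changes how the CACHE offset is extracted. The sketch below assumes the field layouts from the MIPS manuals (a 16-bit signed offset in bits 15:0 before r6, a 9-bit signed offset in bits 15:7 on r6); toy_cache_offset is a hypothetical helper, not a function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Extract the signed offset of a CACHE instruction word.
 * pre-r6: 16-bit immediate in bits 15:0.
 * r6:     9-bit immediate in bits 15:7 (SPECIAL3 encoding). */
static int32_t toy_cache_offset(uint32_t inst, bool r6)
{
    int32_t v;

    if (r6) {
        v = (int32_t)((inst >> 7) & 0x1ffu);
        if (v & 0x100)          /* sign bit of the 9-bit field */
            v -= 0x200;
    } else {
        v = (int32_t)(inst & 0xffffu);
        if (v & 0x8000)         /* sign bit of the 16-bit field */
            v -= 0x10000;
    }
    return v;
}
```

      The r6 offset range is therefore -256 to 255, compared with -32768 to 32767 for the pre-r6 encoding.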
    • MIPS: KVM: Support r6 compact branch emulation · 2e0badfa
      James Hogan committed
      Add support in KVM for emulation of instructions in the forbidden slot
      of MIPSr6 compact branches. If we hit an exception on the forbidden
      slot, then the branch must not have been taken, which makes calculation
      of the resume PC trivial.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Don't save/restore lo/hi for r6 · 70e92c7e
      James Hogan committed
      MIPSr6 doesn't have lo/hi registers, so don't bother saving or
      restoring them, and don't expose them to userland with the KVM ioctl
      interface either.
      
      In fact the lo/hi registers aren't callee saved in the MIPS ABIs anyway,
      so there is no need to preserve the host lo/hi values at all when
      transitioning to and from the guest (which happens via a function call).
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Fix pre-r6 ll/sc instructions on r6 · d85ebff0
      James Hogan committed
      The atomic KVM register access macros in kvm_host.h (for the guest Cause
      register with KVM in trap & emulate mode) use ll/sc instructions,
      however they still .set mips3, which causes pre-MIPSr6 instruction
      encodings to be emitted, even for a MIPSr6 build.
      
      Fix it to use MIPS_ISA_ARCH_LEVEL as other parts of arch/mips already
      do.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • MIPS: KVM: Fix fpu.S misassembly with r6 · d14740fe
      James Hogan committed
      __kvm_save_fpu and __kvm_restore_fpu use .set mips64r2 so that they can
      access the odd FPU registers as well as the even, however this causes
      misassembly of the return instruction on MIPSr6.
      
      Fix by replacing .set mips64r2 with .set fp=64, which doesn't change the
      architecture revision.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: linux-mips@linux-mips.org
      Cc: kvm@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>