1. 16 Apr, 2019 4 commits
    • S
      KVM: x86: Load SMRAM in a single shot when leaving SMM · ed19321f
      Sean Christopherson committed
      RSM emulation is currently broken on VMX when the interrupted guest has
      CR4.VMXE=1.  Rather than dance around the issue of HF_SMM_MASK being set
      when loading SMSTATE into architectural state, ideally RSM emulation
      itself would be reworked to clear HF_SMM_MASK prior to loading non-SMM
      architectural state.
      
      Ostensibly, the only motivation for having HF_SMM_MASK set throughout
      the loading of state from the SMRAM save state area is so that the
      memory accesses from GET_SMSTATE() are tagged with role.smm.  Load
      all of the SMRAM save state area from guest memory at the beginning of
      RSM emulation, and load state from the buffer instead of reading guest
      memory one-by-one.
      
      This paves the way for clearing HF_SMM_MASK prior to loading state,
      and also aligns RSM with the enter_smm() behavior, which fills a
      buffer and writes SMRAM save state in a single go.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ed19321f
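The "load once, then read from the buffer" idea can be sketched as follows. This is a minimal illustration, not the KVM implementation: `struct guest_mem`, `load_smstate()` and `smstate_get_u32()` are hypothetical names, and the 512-byte size is an assumption about the save state area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMSTATE_SIZE 512 /* assumed size of the SMRAM save state area */

/* Hypothetical stand-in for guest memory; not a KVM structure. */
struct guest_mem {
    const uint8_t *base;
    size_t size;
};

/*
 * Copy the whole save state area into buf with a single read.
 * Returns 0 on success, -1 if the area does not fit in guest memory.
 */
static int load_smstate(const struct guest_mem *mem, size_t smstate_off,
                        uint8_t buf[SMSTATE_SIZE])
{
    assert(mem != NULL && buf != NULL);
    if (smstate_off > mem->size || mem->size - smstate_off < SMSTATE_SIZE)
        return -1;
    memcpy(buf, mem->base + smstate_off, SMSTATE_SIZE);
    return 0;
}

/* Read a 32-bit field from the local buffer instead of guest memory. */
static uint32_t smstate_get_u32(const uint8_t buf[SMSTATE_SIZE], size_t off)
{
    uint32_t val;

    assert(off + sizeof(val) <= SMSTATE_SIZE);
    memcpy(&val, buf + off, sizeof(val));
    return val;
}
```

Once the buffer is filled, no later field read touches guest memory, so a flag that only affects how guest accesses are tagged can be cleared before the state is loaded.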
    • L
      KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU · e51bfdb6
      Liran Alon committed
      The issue was discovered when running kvm-unit-tests on KVM running as L1 on
      top of Hyper-V.
      
      When the vmx_instruction_intercept unit-test attempts to run RDPMC to test
      RDPMC-exiting, the instruction is intercepted by L1 KVM, whose
      EXIT_REASON_RDPMC handler raises #GP because the vCPU exposed by Hyper-V
      doesn't support PMU. The unit-test instead expects the exit to be
      reflected to it as EXIT_REASON_RDPMC.
      
      The vmx_instruction_intercept unit-test attempts to run RDPMC even though
      Hyper-V doesn't support PMU because L1 exposes support for RDPMC-exiting
      to L2. It is reasonable to assume that RDPMC-exiting is supported only
      if the CPU supports PMU to begin with.
      
      The above issue can easily be reproduced by modifying the
      vmx_instruction_intercept config in x86/unittests.cfg to run QEMU with
      "-cpu host,+vmx,-pmu" and running the unit-test.
      
      To handle the issue, change KVM to expose RDPMC-exiting only when the guest
      supports PMU.
      Reported-by: Saar Amar <saaramar@microsoft.com>
      Reviewed-by: Mihai Carabas <mihai.carabas@oracle.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Liran Alon <liran.alon@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e51bfdb6
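The policy described above, advertising RDPMC-exiting to the nested guest only when it has a PMU, might look roughly like the sketch below. `nested_procbased_ctls()` and its parameters are hypothetical; only the bit position (bit 11 of the primary processor-based controls) is taken from the architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPU_BASED_RDPMC_EXITING 0x00000800u

static_assert(CPU_BASED_RDPMC_EXITING == (1u << 11),
              "RDPMC exiting is bit 11 of the primary controls");

/*
 * Hypothetical helper: start from the controls the host supports and
 * hide RDPMC-exiting when the guest was configured without a PMU.
 */
static uint32_t nested_procbased_ctls(uint32_t host_supported,
                                      bool guest_has_pmu)
{
    uint32_t ctls = host_supported;

    if (!guest_has_pmu)
        ctls &= ~CPU_BASED_RDPMC_EXITING;
    return ctls;
}
```

With the bit hidden, a test that probes for RDPMC-exiting support skips the RDPMC case instead of executing an instruction that raises #GP.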
    • W
      x86/kvm: move kvm_load/put_guest_xcr0 into atomic context · 1811d979
      WANG Chao committed
      Guest xcr0 could leak into the host when an MCE happens in guest mode,
      because do_machine_check() could schedule out at a few places.
      
      For example:
      
      kvm_load_guest_xcr0
      ...
      kvm_x86_ops->run(vcpu) {
        vmx_vcpu_run
          vmx_complete_atomic_exit
            kvm_machine_check
              do_machine_check
                do_memory_failure
                  memory_failure
                    lock_page
      
      In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule
      out, host cpu has guest xcr0 loaded (0xff).
      
      In __switch_to {
           switch_fpu_finish
             copy_kernel_to_fpregs
               XRSTORS
      
      If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will
      generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in
      and tries to reinitialize fpu by restoring init fpu state. Same story as
      last #GP, except we get DOUBLE FAULT this time.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: WANG Chao <chao.wang@ucloud.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1811d979
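The #GP condition quoted above reduces to a mask operation. The sketch below assumes, for illustration only, that the saved XSTATE_BV equals the host xcr0 value from the example (0x2ff); the helper names are hypothetical, and the real instruction also takes other state into account.

```c
#include <stdint.h>

/*
 * Components marked in XSTATE_BV but not enabled in xcr0.
 * A nonzero result corresponds to the #GP condition described above.
 */
static uint64_t xrstors_faulting_bits(uint64_t xstate_bv, uint64_t xcr0)
{
    return xstate_bv & ~xcr0;
}

/* Index of the lowest set bit, or -1 if the mask is empty. */
static int lowest_bit(uint64_t mask)
{
    for (int i = 0; i < 64; i++)
        if (mask & (1ULL << i))
            return i;
    return -1;
}
```

With a saved XSTATE_BV of 0x2ff and the leaked guest xcr0 of 0xff, the faulting mask is 0x200, i.e. bit 9, which matches the commit message.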
    • P
      KVM: nVMX: allow tests to use bad virtual-APIC page address · 69090810
      Paolo Bonzini committed
      As mentioned in the comment, there are some special cases where we can simply
      clear the TPR shadow bit from the CPU-based execution controls in the vmcs02.
      Handle them so that we can remove some XFAILs from vmx.flat.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      69090810
  2. 29 Mar, 2019 2 commits
    • S
      KVM: x86: Emulate MSR_IA32_ARCH_CAPABILITIES on AMD hosts · 0cf9135b
      Sean Christopherson committed
      The CPUID flag ARCH_CAPABILITIES is unconditionally exposed to host
      userspace for all x86 hosts, i.e. KVM advertises ARCH_CAPABILITIES
      regardless of hardware support under the pretense that KVM fully
      emulates MSR_IA32_ARCH_CAPABILITIES.  Unfortunately, only VMX hosts
      handle accesses to MSR_IA32_ARCH_CAPABILITIES (despite KVM_GET_MSRS
      also reporting MSR_IA32_ARCH_CAPABILITIES for all hosts).
      
      Move the MSR_IA32_ARCH_CAPABILITIES handling to common x86 code so
      that it's emulated on AMD hosts.
      
      Fixes: 1eaafe91 ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")
      Cc: stable@vger.kernel.org
      Reported-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
      Cc: Jim Mattson <jmattson@google.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0cf9135b
    • S
      KVM: SVM: Workaround errata#1096 (insn_len maybe zero on SMAP violation) · 05d5a486
      Singh, Brijesh committed
      Errata#1096:
      
      On a nested data page fault when CR4.SMAP=1 and the guest data read
      generates a SMAP violation, the GuestInstrBytes field of the VMCB on a
      VMEXIT will incorrectly return 0h instead of the correct guest
      instruction bytes.
      
      Recommended Workaround:
      
      To determine what instruction the guest was executing the hypervisor
      will have to decode the instruction at the instruction pointer.
      
      The recommended workaround cannot be implemented for an SEV
      guest because guest memory is encrypted with the guest-specific key,
      so the instruction decoder will not be able to decode the instruction
      bytes. If we hit this erratum in an SEV guest, log a message
      and request a guest shutdown.
      Reported-by: Venkatesh Srinivas <venkateshs@google.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: "Radim Krčmář" <rkrcmar@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      05d5a486
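The decision the message describes can be sketched as a small function. This is a simplified illustration: the enum and `pick_emulation_action()` are hypothetical, and the sketch treats every zero-length instruction buffer as a hit of the erratum, whereas a real check would also look at the conditions (such as CR4.SMAP) under which the erratum can occur.

```c
#include <stdbool.h>

enum emul_action {
    EMUL_USE_VMCB_BYTES,  /* instruction bytes from the VMCB are usable */
    EMUL_DECODE_AT_RIP,   /* fetch and decode the instruction at guest RIP */
    EMUL_SHUTDOWN_GUEST   /* cannot decode encrypted memory: give up */
};

/* Hypothetical helper choosing how to obtain the guest instruction. */
static enum emul_action pick_emulation_action(unsigned int insn_len,
                                              bool is_sev_guest)
{
    if (insn_len != 0)
        return EMUL_USE_VMCB_BYTES;
    if (is_sev_guest)
        return EMUL_SHUTDOWN_GUEST;
    return EMUL_DECODE_AT_RIP;
}
```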
  3. 16 Mar, 2019 1 commit
  4. 21 Feb, 2019 12 commits
  5. 14 Feb, 2019 2 commits
    • X
      kvm: vmx: Fix entry number check for add_atomic_switch_msr() · 98ae70cc
      Xiaoyao Li committed
      Commit ca83b4a7 ("x86/KVM/VMX: Add find_msr() helper function")
      introduced the helper function find_msr(), which returns -ENOENT when
      it does not find the MSR in vmx->msr_autoload.guest/host. Correct the
      check for having no more available entries in vmx->msr_autoload.
      
      Fixes: ca83b4a7 ("x86/KVM/VMX: Add find_msr() helper function")
      Cc: stable@vger.kernel.org
      Signed-off-by: Xiaoyao Li <xiaoyao.li@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      98ae70cc
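The pitfall is that a lookup returning -ENOENT must be tested with "less than zero" before the "list is full" check applies. A minimal sketch, assuming a hypothetical list layout, list size and `add_msr()` helper (only the name find_msr() and its -ENOENT return come from the message):

```c
#include <errno.h>
#include <stdint.h>

#define NR_AUTOLOAD_MSRS 8 /* assumed capacity, for illustration */

struct msr_entry {
    uint32_t index;
    uint64_t value;
};

struct msr_list {
    unsigned int nr;
    struct msr_entry val[NR_AUTOLOAD_MSRS];
};

/* Return the slot holding msr, or -ENOENT if it is not in the list. */
static int find_msr(const struct msr_list *m, uint32_t msr)
{
    for (unsigned int i = 0; i < m->nr; i++)
        if (m->val[i].index == msr)
            return (int)i;
    return -ENOENT;
}

/* Update an existing entry, or append one if a free slot remains. */
static int add_msr(struct msr_list *m, uint32_t msr, uint64_t value)
{
    int i = find_msr(m, msr);

    if (i < 0) {
        /* Only a missing entry needs a new slot. */
        if (m->nr == NR_AUTOLOAD_MSRS)
            return -ENOSPC;
        i = (int)m->nr++;
        m->val[i].index = msr;
    }
    m->val[i].value = value;
    return 0;
}
```

Updating an MSR that is already present still succeeds when the list is full, while a new MSR is rejected.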
    • L
      KVM: x86: Recompute PID.ON when clearing PID.SN · c112b5f5
      Luwei Kang committed
      Some Posted-Interrupts from passthrough devices may be lost or
      overwritten when the vCPU is in runnable state.
      
      The SN (Suppress Notification) of PID (Posted Interrupt Descriptor) will
      be set when the vCPU is preempted (vCPU in KVM_MP_STATE_RUNNABLE state but
      not running on physical CPU). If a posted interrupt comes at this time,
      the irq remapping facility will set the bit of PIR (Posted Interrupt
      Requests) but not ON (Outstanding Notification).  Then, the interrupt
      will not be seen by KVM, which always expects PID.ON=1 if PID.PIR=1
      as documented in the Intel processor SDM but not in the VT-d specification.
      To fix this, restore the invariant after PID.SN is cleared.
      Signed-off-by: Luwei Kang <luwei.kang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c112b5f5
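Restoring the invariant "ON is set whenever PIR is non-empty" after clearing SN can be sketched as below. The structure and helper names are hypothetical, and the sketch uses plain (non-atomic) accesses for readability; a real descriptor is shared with hardware and needs atomic updates.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of a posted-interrupt descriptor. */
struct pi_desc_sketch {
    uint32_t pir[8]; /* 256 posted-interrupt request bits */
    bool on;         /* outstanding notification */
    bool sn;         /* suppress notification */
};

static bool pir_is_empty(const struct pi_desc_sketch *pid)
{
    for (size_t i = 0; i < 8; i++)
        if (pid->pir[i])
            return false;
    return true;
}

/* Clear SN, then recompute ON so that PIR != 0 implies ON == 1. */
static void clear_sn(struct pi_desc_sketch *pid)
{
    pid->sn = false;
    if (!pir_is_empty(pid))
        pid->on = true;
}
```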
  6. 12 Feb, 2019 13 commits
  7. 31 Jan, 2019 1 commit
    • J
      cpu/hotplug: Fix "SMT disabled by BIOS" detection for KVM · b284909a
      Josh Poimboeuf committed
      With the following commit:
      
        73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      
      ... the hotplug code attempted to detect when SMT was disabled by BIOS,
      in which case it reported SMT as permanently disabled.  However, that
      code broke a virt hotplug scenario, where the guest is booted with only
      primary CPU threads, and a sibling is brought online later.
      
      The problem is that there doesn't seem to be a way to reliably
      distinguish between the HW "SMT disabled by BIOS" case and the virt
      "sibling not yet brought online" case.  So the above-mentioned commit
      was a bit misguided, as it permanently disabled SMT for both cases,
      preventing future virt sibling hotplugs.
      
      Going back and reviewing the original problems which were attempted to
      be solved by that commit, when SMT was disabled in BIOS:
      
        1) /sys/devices/system/cpu/smt/control showed "on" instead of
           "notsupported"; and
      
        2) vmx_vm_init() was incorrectly showing the L1TF_MSG_SMT warning.
      
      I'd propose that we instead consider #1 above to not actually be a
      problem.  Because, at least in the virt case, it's possible that SMT
      wasn't disabled by BIOS and a sibling thread could be brought online
      later.  So it makes sense to just always default the smt control to "on"
      to allow for that possibility (assuming cpuid indicates that the CPU
      supports SMT).
      
      The real problem is #2, which has a simple fix: change vmx_vm_init() to
      query the actual current SMT state -- i.e., whether any siblings are
      currently online -- instead of looking at the SMT "control" sysfs value.
      
      So fix it by:
      
        a) reverting the original "fix" and its followup fix:
      
           73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
           bc2d8d26 ("cpu/hotplug: Fix SMT supported evaluation")
      
           and
      
        b) changing vmx_vm_init() to query the actual current SMT state --
           instead of the sysfs control value -- to determine whether the L1TF
           warning is needed.  This also requires the 'sched_smt_present'
           variable to be exported, instead of 'cpu_smt_control'.
      
      Fixes: 73d5e2b4 ("cpu/hotplug: detect SMT disabled by BIOS")
      Reported-by: Igor Mammedov <imammedo@redhat.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Joe Mario <jmario@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: kvm@vger.kernel.org
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/e3a85d585da28cc333ecbc1e78ee9216e6da9396.1548794349.git.jpoimboe@redhat.com
      b284909a
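"Query the actual current SMT state" means asking whether any two online CPUs share a core right now, rather than reading a configuration value. A rough, hypothetical model of that question (the arrays and `smt_siblings_online()` are illustrative and unrelated to the kernel's data structures):

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if at least two online CPUs belong to the same core,
 * i.e. a sibling thread is currently online.
 */
static bool smt_siblings_online(const int *core_id, const bool *online,
                                size_t ncpus)
{
    for (size_t a = 0; a < ncpus; a++) {
        if (!online[a])
            continue;
        for (size_t b = a + 1; b < ncpus; b++)
            if (online[b] && core_id[a] == core_id[b])
                return true;
    }
    return false;
}
```

A guest booted with only primary threads reports no active SMT until a sibling is hotplugged, at which point the same query starts returning true.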
  8. 26 Jan, 2019 4 commits
    • G
      KVM: x86: Mark expected switch fall-throughs · b2869f28
      Gustavo A. R. Silva committed
      In preparation for enabling -Wimplicit-fallthrough, mark switch
      cases where we are expecting to fall through.
      
      This patch fixes the following warnings:
      
      arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
      
      Warning level 3 was used: -Wimplicit-fallthrough=3
      
      This patch is part of the ongoing effort to enable -Wimplicit-fallthrough.
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b2869f28
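For readers unfamiliar with the annotation, an intentional fall-through marked with a comment looks like the following. The function is a made-up example, not code from the files listed above.

```c
/*
 * Hypothetical example: higher levels accumulate the bits of all
 * lower levels by deliberately falling through the cases.
 */
static unsigned int access_bits(int level)
{
    unsigned int bits = 0;

    switch (level) {
    case 2:
        bits |= 0x4;
        /* fall through */
    case 1:
        bits |= 0x2;
        /* fall through */
    case 0:
        bits |= 0x1;
        break;
    default:
        break;
    }
    return bits;
}
```

Without the comments, each case that does not end in a break would produce the "this statement may fall through" warning when the option is enabled.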
    • S
      KVM: VMX: Move vmx_vcpu_run()'s VM-Enter asm blob to a helper function · 5ad6ece8
      Sean Christopherson committed
      ...along with the function's STACK_FRAME_NON_STANDARD tag.  Moving the
      asm blob results in a significantly smaller amount of code that is
      marked with STACK_FRAME_NON_STANDARD, which makes it far less likely
      that gcc will split the function and trigger a spurious objtool warning.
      As a bonus, removing STACK_FRAME_NON_STANDARD from vmx_vcpu_run() allows
      the bulk of code to be properly checked by objtool.
      
      Because %rbp is not loaded via VMCS fields, vmx_vcpu_run() must manually
      save/restore the host's RBP and load the guest's RBP prior to calling
      vmx_vmenter().  Modifying %rbp triggers objtool's stack validation code,
      and so vmx_vcpu_run() is tagged with STACK_FRAME_NON_STANDARD since it's
      impossible to avoid modifying %rbp.
      
      Unfortunately, vmx_vcpu_run() is also a gigantic function that gcc will
      split into separate functions, e.g. so that pieces of the function can
      be inlined.  Splitting the function means that the compiled ELF file
      will contain one or more vmx_vcpu_run.part.* functions in addition to
      a vmx_vcpu_run function.  Depending on where the function is split,
      objtool may warn about a "call without frame pointer save/setup" in
      vmx_vcpu_run.part.* since objtool's stack validation looks for exact
      names when whitelisting functions tagged with STACK_FRAME_NON_STANDARD.
      
      Up until recently, the undesirable function splitting was effectively
      blocked because vmx_vcpu_run() was tagged with __noclone.  At the time,
      __noclone had an unintended side effect that put vmx_vcpu_run() into a
      separate optimization unit, which in turn prevented gcc from inlining
      the function (or any of its own function calls) and thus eliminated gcc's
      motivation to split the function.  Removing the __noclone attribute
      allowed gcc to optimize vmx_vcpu_run(), exposing the objtool warning.
      
      Kudos to Qian Cai for root causing that the fnsplit optimization is what
      caused objtool to complain.
      
      Fixes: 453eafbe ("KVM: VMX: Move VM-Enter + VM-Exit handling to non-inline sub-routines")
      Tested-by: Qian Cai <cai@lca.pw>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      5ad6ece8
    • Y
      kvm: vmx: fix some -Wmissing-prototypes warnings · 8997f657
      Yi Wang committed
      We get some warnings when building the kernel with W=1:
      arch/x86/kvm/vmx/vmx.c:426:5: warning: no previous prototype for ‘kvm_fill_hv_flush_list_func’ [-Wmissing-prototypes]
      arch/x86/kvm/vmx/nested.c:58:6: warning: no previous prototype for ‘init_vmcs_shadow_fields’ [-Wmissing-prototypes]
      
      Make them static to fix this.
      Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8997f657
    • S
      KVM: VMX: Use the correct field var when clearing VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL · 85ba2b16
      Sean Christopherson committed
      Fix a recently introduced bug that results in the wrong VMCS control
      field being updated when applying the workaround for an
      IA32_PERF_GLOBAL_CTRL erratum.
      
      Fixes: c73da3fc ("KVM: VMX: Properly handle dynamic VM Entry/Exit controls")
      Reported-by: Harald Arnesen <harald@skogtun.org>
      Tested-by: Harald Arnesen <harald@skogtun.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      85ba2b16
  9. 12 Jan, 2019 1 commit