1. 04 Feb, 2021 (5 commits)
    • KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled · c6462363
      Like Xu committed
      Userspace could set bits [0, 5] of the IA32_PERF_CAPABILITIES
      MSR, which describe the record format used in the LBR records.
      
      LBR will be enabled on the guest if host perf supports LBR
      (checked via x86_perf_get_lbr()) and the vcpu model is compatible
      with the host's.
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
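
      The check itself can be as small as the following sketch in the
      MSR_IA32_PERF_CAPABILITIES write handler (paraphrased, not the exact
      upstream hunk; PMU_CAP_LBR_FMT covers bits [0, 5]):

          /* Reject a userspace write whose LBR format bits differ from
           * what the host supports, as reported by host perf. */
          if ((data & PMU_CAP_LBR_FMT) !=
              (vmx_get_perf_capabilities() & PMU_CAP_LBR_FMT))
                  return 1;       /* signal the write as invalid */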
    • KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static · 252e365e
      Like Xu committed
      To make code responsibilities clear, we may reuse and invoke
      vmx_set_intercept_for_msr() in other vmx-specific files (e.g. pmu_intel.c),
      so expose it in order to pass through LBR MSRs later.
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Reviewed-by: Andi Kleen <ak@linux.intel.com>
      Message-Id: <20210201051039.255478-2-like.xu@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
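
      Once exposed, code in pmu_intel.c could disable interception for an
      LBR MSR along these lines (a sketch assuming the helper takes a vcpu,
      MSR index, access type, and intercept flag; check the declaration in
      vmx.h for the exact signature):

          /* Pass MSR_LBR_SELECT straight through to the guest: stop
           * intercepting both reads and writes of that MSR. */
          vmx_set_intercept_for_msr(vcpu, MSR_LBR_SELECT, MSR_TYPE_RW, false);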
    • KVM: VMX: Enable bus lock VM exit · fe6b6bc8
      Chenyi Qiang committed
      A virtual machine can exploit bus locks to degrade the performance of
      the system. A bus lock can be caused by a split locked access to
      writeback (WB) memory or by using locks on uncacheable (UC) memory. A
      bus lock is typically >1000 cycles slower than an atomic operation
      within a cache line. It also disrupts performance on other cores
      (which must wait for the bus lock to be released before their memory
      operations can complete).
      
      To address the threat, bus lock VM exit is introduced to notify the VMM
      when a bus lock was acquired, allowing it to enforce throttling or other
      policy based mitigations.
      
      A VMM can enable VM exits due to bus locks by setting a new "Bus Lock
      Detection" VM-execution control (bit 30 of the Secondary Processor-based
      VM-execution controls). If delivery of this VM exit was preempted by a
      higher-priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC
      access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of
      the exit reason field in the VMCS is set to 1.
      
      In the current implementation, KVM exposes this capability through
      KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap
      (i.e. off and exit) and enable it explicitly (disabled by default). If
      KVM detects bus locks in the guest, it exits to user space even when
      the current exit reason is handled by KVM internally. A new field,
      KVM_RUN_BUS_LOCK, is set in vcpu->run->flags to inform user space that
      a bus lock was detected in the guest.
      
      Document for Bus Lock VM exit is now available at the latest "Intel
      Architecture Instruction Set Extensions Programming Reference".
      
      Document Link:
      https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.html
      Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
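
      As a rough illustration, a userspace VMM could enable the capability
      along these lines (a sketch assuming the uapi mode name
      KVM_BUS_LOCK_DETECTION_EXIT; verify the names against your kernel's
      <linux/kvm.h> before relying on them):

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          static int enable_bus_lock_exit(int vm_fd)
          {
                  struct kvm_enable_cap cap = {
                          .cap = KVM_CAP_X86_BUS_LOCK_EXIT,
                          .args[0] = KVM_BUS_LOCK_DETECTION_EXIT,
                  };
                  /* The extension check returns the supported mode bitmap. */
                  int modes = ioctl(vm_fd, KVM_CHECK_EXTENSION,
                                    KVM_CAP_X86_BUS_LOCK_EXIT);

                  if (modes < 0 || !(modes & KVM_BUS_LOCK_DETECTION_EXIT))
                          return -1;      /* not supported on this host */
                  return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          }

      After KVM_RUN, user space would then check vcpu->run->flags for the
      bus lock flag before handling the exit.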
    • KVM: VMX: Convert vcpu_vmx.exit_reason to a union · 8e533240
      Sean Christopherson committed
      Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32).  The
      full VM_EXIT_REASON field is comprised of a 16-bit basic exit reason in
      bits 15:0, and single-bit modifiers in bits 31:16.
      
      Historically, KVM has only had to worry about handling the "failed
      VM-Entry" modifier, which could only be set in very specific flows and
      required dedicated handling.  I.e. manually stripping the FAILED_VMENTRY
      bit was a somewhat viable approach.  But even with only a single bit to
      worry about, KVM has had several bugs related to comparing a basic exit
      reason against the full exit reason stored in vcpu_vmx.
      
      Upcoming Intel features, e.g. SGX, will add new modifier bits that can
      be set on more or less any VM-Exit, as opposed to the significantly more
      restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off
      flows isn't scalable.  Tracking the exit reason in a union forces code to
      explicitly choose between consuming the full exit reason and the basic
      exit reason, and is a convenient way to document and access the modifiers.
      
      No functional change intended.
      
      Cc: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20201106090315.18606-2-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
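
      A simplified sketch of what such a union can look like (field layout
      abridged for illustration; the real definition in
      arch/x86/kvm/vmx/vmx.h carries more modifier bits):

          union vmx_exit_reason {
                  struct {
                          u32 basic          : 16;  /* bits 15:0 */
                          u32 reserved       : 15;  /* bits 30:16, modifiers */
                          u32 failed_vmentry : 1;   /* bit 31 */
                  };
                  u32 full;
          };

      Callers must then pick one view explicitly, e.g. compare
      exit_reason.basic against EXIT_REASON_* constants, but propagate
      exit_reason.full into vmcs12.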
  2. 15 Nov, 2020 (1 commit)
    • KVM: x86: Move vendor CR4 validity check to dedicated kvm_x86_ops hook · c2fe3cd4
      Sean Christopherson committed
      Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(),
      and invoke the new hook from kvm_valid_cr4().  This fixes an issue where
      KVM_SET_SREGS would return success while failing to actually set CR4.
      
      Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return
      in __set_sregs() is not a viable option as KVM has already stuffed a
      variety of vCPU state.
      
      Note, kvm_valid_cr4() and is_valid_cr4() have different return types and
      inverted semantics.  This will be remedied in a future patch.
      
      Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4")
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
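
      Conceptually, the split looks something like the sketch below
      (abridged, with assumed helper names rather than the exact upstream
      hunks): the vendor hook only validates, and common code rejects the
      value before any vCPU state is modified.

          /* VMX side: pure validity check, no side effects. */
          static bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
          {
                  /* CR4.VMXE may only be set when nested VMX is allowed. */
                  if ((cr4 & X86_CR4_VMXE) && !nested_vmx_allowed(vcpu))
                          return false;
                  return true;
          }

          /* Common x86 side: fail KVM_SET_SREGS and friends up front. */
          static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
          {
                  if (!kvm_x86_ops.is_valid_cr4(vcpu, cr4))
                          return -EINVAL;
                  return 0;
          }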
  3. 22 Oct, 2020 (1 commit)
  4. 28 Sep, 2020 (15 commits)
  5. 23 Sep, 2020 (1 commit)
  6. 12 Sep, 2020 (1 commit)
    • KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected · 43fea4e4
      Peter Shier committed
      When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call
      load_pdptrs to read the possibly updated PDPTEs from the guest
      physical address referenced by CR3.  It loads them into
      vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in
      vcpu->arch.regs_dirty.
      
      At the subsequent assumed reentry into L2, the mmu will call
      vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees
      VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads
      VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works
      if the L2 CRn write intercept always resumes L2.
      
      The resume path calls vmx_check_nested_events which checks for
      exceptions, MTF, and expired VMX preemption timers. If
      vmx_check_nested_events finds any of these conditions pending it will
      reflect the corresponding exit into L1. Live migration at this point
      would also cause a missed immediate reentry into L2.
      
      After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
      clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty.  When L2 next
      resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
      vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
      vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
      VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
      values stored at last L2 exit. A repro of this bug showed L2 entering
      triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.
      
      When L2 is in PAE paging mode, add a call to ept_load_pdptrs before
      leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
      vcpu->arch.walk_mmu->pdptrs[].
      
      Tested:
      kvm-unit-tests with new directed test: vmx_mtf_pdpte_test.
      Verified that test fails without the fix.
      
      Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a
      custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix
      would repro readily. Ran 14 simultaneous L2s for 140 iterations with no
      failures.
      Signed-off-by: Peter Shier <pshier@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Message-Id: <20200820230545.2411347-1-pshier@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
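
      The fix itself is small; a sketch of the hunk added on the nested-exit
      path (paraphrased, relying on the existing is_pae_paging() and
      ept_load_pdptrs() helpers):

          /* Before reflecting the exit into L1: if L2 uses PAE paging,
           * flush any dirty cached PDPTEs into vmcs02 so that a later
           * prepare_vmcs02() cannot overwrite them with stale vmcs12
           * values. */
          if (enable_ept && is_pae_paging(vcpu))
                  ept_load_pdptrs(vcpu);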
  7. 31 Jul, 2020 (3 commits)
  8. 11 Jul, 2020 (2 commits)
    • KVM: VMX: Add guest physical address check in EPT violation and misconfig · 1dbf5d68
      Mohammed Gamal committed
      Check guest physical address against its maximum, which depends on the
      guest MAXPHYADDR. If the guest's physical address exceeds the
      maximum (i.e. has reserved bits set), inject a guest page fault with
      PFERR_RSVD_MASK set.
      
      This has to be done both in the EPT violation and page fault paths, as
      there are complications in both cases with respect to the computation
      of the correct error code.
      
      For EPT violations, unfortunately the only possibility is to emulate,
      because the access type in the exit qualification might refer to an
      access to a paging structure, rather than to the access performed by
      the program.
      
      Trapping page faults instead is needed in order to correct the error code,
      but the access type can be obtained from the original error code and
      passed to gva_to_gpa.  The corrections required in the error code are
      subtle. For example, imagine that a PTE for a supervisor page has a reserved
      bit set.  On a supervisor-mode access, the EPT violation path would trigger.
      However, on a user-mode access, the processor will not notice the reserved
      bit and not include PFERR_RSVD_MASK in the error code.
      Co-developed-by: Mohammed Gamal <mgamal@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20200710154811.418214-8-mgamal@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
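
      The core of the check is a comparison against the guest's MAXPHYADDR;
      a sketch of such a helper (illustrative, built on the existing
      cpuid_maxphyaddr()):

          /* A GPA is illegal if it sets bits at or above the guest's
           * MAXPHYADDR, i.e. if it touches "reserved" physical bits. */
          static inline bool kvm_vcpu_is_illegal_gpa(struct kvm_vcpu *vcpu,
                                                     gpa_t gpa)
          {
                  return gpa >= BIT_ULL(cpuid_maxphyaddr(vcpu));
          }

      Such an address is then reported via a page fault carrying
      PFERR_RSVD_MASK, as described above.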
    • KVM: VMX: introduce vmx_need_pf_intercept · a0c13434
      Paolo Bonzini committed
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20200710154811.418214-7-mgamal@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 09 Jul, 2020 (2 commits)
  10. 23 Jun, 2020 (1 commit)
  11. 08 Jun, 2020 (1 commit)
    • KVM: VMX: Properly handle kvm_read/write_guest_virt*() result · 7a35e515
      Vitaly Kuznetsov committed
      Syzbot reports the following issue:
      
      WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618
       kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
      ...
      Call Trace:
      ...
      RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
      ...
       nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638
       handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline]
       handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728
       vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067
      
      The 'exception' we're trying to inject with kvm_inject_emulated_page_fault()
      comes from:

        nested_vmx_get_vmptr()
         kvm_read_guest_virt()
           kvm_read_guest_virt_helper()
             vcpu->arch.walk_mmu->gva_to_gpa()

      but it is only set when GVA to GPA conversion fails. In case it doesn't
      fail but kvm_vcpu_read_guest_page() still fails, X86EMUL_IO_NEEDED is
      returned and nested_vmx_get_vmptr() calls kvm_inject_emulated_page_fault()
      with a zeroed 'exception'. This happens when the argument is MMIO.
      
      Paolo also noticed that nested_vmx_get_vmptr() is not the only place in
      the KVM code where the result of kvm_read/write_guest_virt*() is
      mishandled. The VMX instructions along with INVPCID have the same issue.
      This was already noticed before, e.g. see commit 541ab2ae ("KVM: x86:
      work around leak of uninitialized stack contents"), but was never fully
      fixed.
      
      KVM could've handled the request correctly by going to userspace and
      performing I/O, but there doesn't seem to be a real need for such
      requests in the first place.
      
      Introduce vmx_handle_memory_failure() as an interim solution.
      
      Note, nested_vmx_get_vmptr() now has three possible outcomes: OK, PF,
      KVM_EXIT_INTERNAL_ERROR, and callers need to know if a userspace exit is
      needed (for KVM_EXIT_INTERNAL_ERROR) in case of failure. We don't seem
      to have a good enum describing this tristate, so just add "int *ret" to
      the nested_vmx_get_vmptr() interface to pass the information.
      
      Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200605115906.532682-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
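
      A sketch of what such an interim helper can look like (abridged and
      with an assumed body; the real implementation is what this patch
      introduces):

          /* Distinguish a genuine #PF from an MMIO-backed failure of
           * kvm_read/write_guest_virt*(). Returns 1 to stay in KVM (the
           * fault has been injected), 0 to exit to user space. */
          int vmx_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
                                        struct x86_exception *e)
          {
                  if (r == X86EMUL_PROPAGATE_FAULT) {
                          kvm_inject_emulated_page_fault(vcpu, e);
                          return 1;
                  }

                  /* X86EMUL_IO_NEEDED: the operand is MMIO; there is no
                   * good use case for this, so report an internal error
                   * instead of performing the I/O. */
                  vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
                  vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION;
                  vcpu->run->internal.ndata = 0;
                  return 0;
          }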
  12. 01 Jun, 2020 (1 commit)
  13. 14 May, 2020 (5 commits)
  14. 21 Apr, 2020 (1 commit)