1. 18 Jun, 2020 1 commit
  2. 11 Jun, 2020 1 commit
  3. 08 Jun, 2020 1 commit
    • V
      KVM: VMX: Properly handle kvm_read/write_guest_virt*() result · 7a35e515
      Committed by Vitaly Kuznetsov
      Syzbot reports the following issue:
      
      WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618
       kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
      ...
      Call Trace:
      ...
      RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
      ...
       nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638
       handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline]
       handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728
       vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067
      
      'exception' we're trying to inject with kvm_inject_emulated_page_fault()
      comes from:
      
        nested_vmx_get_vmptr()
         kvm_read_guest_virt()
           kvm_read_guest_virt_helper()
             vcpu->arch.walk_mmu->gva_to_gpa()
      
      but it is only set when GVA to GPA conversion fails. If the conversion
      succeeds but kvm_vcpu_read_guest_page() still fails, X86EMUL_IO_NEEDED is
      returned and nested_vmx_get_vmptr() calls kvm_inject_emulated_page_fault()
      with a zeroed 'exception'. This happens when the argument is MMIO.
      
      Paolo also noticed that nested_vmx_get_vmptr() is not the only place in
      KVM code where kvm_read/write_guest_virt*() return result is mishandled.
      VMX instructions along with INVPCID have the same issue. This was already
      noticed before, e.g. see commit 541ab2ae ("KVM: x86: work around
      leak of uninitialized stack contents") but was never fully fixed.
      
      KVM could have handled the request correctly by going to userspace and
      performing I/O, but there doesn't seem to be a good reason to support such
      requests in the first place.
      
      Introduce vmx_handle_memory_failure() as an interim solution.
      
      Note, nested_vmx_get_vmptr() now has three possible outcomes: OK, PF, and
      KVM_EXIT_INTERNAL_ERROR. On failure, callers need to know whether a
      userspace exit is needed (for KVM_EXIT_INTERNAL_ERROR). We don't seem
      to have a good enum describing this tristate, so just add "int *ret" to
      the nested_vmx_get_vmptr() interface to pass the information.
      
      Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com
      Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200605115906.532682-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7a35e515
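
The three outcomes described above (OK, PF, userspace exit) can be modelled with a small standalone sketch. This is not the kernel's vmx_handle_memory_failure(); the enums, the `fault_info` struct, and the function name below are hypothetical stand-ins chosen only to illustrate why a zeroed 'exception' must not be injected as a page fault.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the emulator return codes named in the commit. */
enum read_result { READ_OK, READ_PROPAGATE_FAULT, READ_IO_NEEDED };

/* Hypothetical, simplified fault record; 'valid' says whether it was filled in. */
struct fault_info {
    bool valid;
    unsigned long address;
};

/* The tristate a caller has to distinguish. */
enum outcome { OUTCOME_OK, OUTCOME_INJECT_PF, OUTCOME_EXIT_TO_USERSPACE };

static enum outcome classify_memory_access(enum read_result r,
                                           const struct fault_info *fault)
{
    switch (r) {
    case READ_OK:
        return OUTCOME_OK;
    case READ_PROPAGATE_FAULT:
        /* Only a failed address translation populates the fault record. */
        return fault->valid ? OUTCOME_INJECT_PF : OUTCOME_EXIT_TO_USERSPACE;
    case READ_IO_NEEDED:
    default:
        /* E.g. an MMIO operand: there is no fault to inject, so give up
         * and report an internal error to userspace instead. */
        return OUTCOME_EXIT_TO_USERSPACE;
    }
}
```

A caller in this model would inject a page fault only for OUTCOME_INJECT_PF and would return to userspace for OUTCOME_EXIT_TO_USERSPACE, which mirrors the role of the extra "int *ret" argument mentioned in the commit message.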
  4. 05 Jun, 2020 1 commit
  5. 01 Jun, 2020 2 commits
    • L
      KVM: x86/pmu: Support full width counting · 27461da3
      Committed by Like Xu
      Intel CPUs have a new alternative MSR range (starting from MSR_IA32_PMC0)
      for GP counters that allows writing the full counter width. Enable this
      range from a new capability bit (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]).
      
      The guest queries CPUID to get the counter width and sign-extends
      the counter values as needed. The traditional MSRs always limit writes
      to 32 bits, even though the counter is internally larger (48 or 57 bits).
      
      When the new capability is set, use the alternative range, which does not
      have this restriction. This slightly lowers the overhead of perf stat
      because it takes fewer interrupts to accumulate the counter value.
      Signed-off-by: Like Xu <like.xu@linux.intel.com>
      Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      27461da3
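
The difference between the two MSR ranges can be illustrated with a minimal model. The sketch below assumes a 48-bit counter (one of the widths mentioned in the commit message); the function names and the COUNTER_WIDTH constant are hypothetical and are not KVM's implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed counter width for this sketch; real hardware reports it via CPUID. */
#define COUNTER_WIDTH 48

static uint64_t counter_mask(void)
{
    return (1ULL << COUNTER_WIDTH) - 1;
}

/* Legacy range: only the low 32 bits of the written value are used, and
 * they are sign-extended to the counter width. */
static uint64_t legacy_write(uint64_t data)
{
    int64_t extended = (int64_t)(int32_t)(uint32_t)data;
    return (uint64_t)extended & counter_mask();
}

/* Full-width range: every bit up to the counter width is taken as written. */
static uint64_t full_width_write(uint64_t data)
{
    return data & counter_mask();
}
```

In this model, writing 0x123480000000 through the legacy path yields 0xFFFF80000000, because the upper bits are replaced by the sign extension of bit 31, while the full-width path stores the value unchanged.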
    • V
      KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info · 68fd66f1
      Committed by Vitaly Kuznetsov
      Currently, the APF mechanism relies on abusing #PF, with the token being
      passed through CR2. If we switch to using interrupts to deliver page-ready
      notifications we need a different way to pass the data. Extend the existing
      'struct kvm_vcpu_pv_apf_data' with token information for page-ready
      notifications.
      
      While at it, rename 'reason' to 'flags'. This doesn't change the semantics,
      as we only have reasons '1' and '2' and these can be treated as bit flags,
      but KVM_PV_REASON_PAGE_READY is going away with interrupt-based delivery,
      making the name 'reason' misleading.
      
      The newly introduced apf_put_user_ready() temporarily puts both flags and
      token information; this will be changed to put the token only when we switch
      to interrupt-based notifications.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20200525144125.143875-3-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      68fd66f1
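
A minimal sketch of the idea, assuming a shared structure that carries both a 'flags' word and a 'token' word. The struct layout, the flag values, and the helper name are hypothetical simplifications for illustration; they are not the kernel's 'struct kvm_vcpu_pv_apf_data' or apf_put_user_ready().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values, following the commit's remark that the two
 * existing reasons '1' and '2' can be treated as bit flags. */
#define SKETCH_FLAG_PAGE_NOT_PRESENT 1u
#define SKETCH_FLAG_PAGE_READY       2u

/* Hypothetical, simplified stand-in for the data shared with the guest. */
struct apf_data_sketch {
    uint32_t flags;  /* formerly called 'reason' */
    uint32_t token;  /* identifies which page became ready */
};

/* Interim behaviour described in the commit: publish both the flag and
 * the token for a page-ready notification. */
static void put_ready_sketch(struct apf_data_sketch *shared, uint32_t token)
{
    shared->token = token;
    shared->flags = SKETCH_FLAG_PAGE_READY;
}
```

With the token stored in the shared structure, a page-ready notification no longer depends on CR2, which is what makes interrupt-based delivery possible.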
  6. 28 May, 2020 6 commits
    • P
      KVM: nVMX: always update CR3 in VMCS · df7e0681
      Committed by Paolo Bonzini
      vmx_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02 as
      an optimization, but this is only correct before the nested vmentry.
      If userspace is modifying CR3 with KVM_SET_SREGS after the VM has
      already been put in guest mode, the value of CR3 will not be updated.
      Remove the optimization, which almost never triggers anyway.
      
      Fixes: 04f11ef4 ("KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter")
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      df7e0681
    • P
      KVM: x86: enable event window in inject_pending_event · c9d40913
      Committed by Paolo Bonzini
      In case an interrupt arrives after nested.check_events but before the
      call to kvm_cpu_has_injectable_intr, we could end up enabling the interrupt
      window even if the interrupt is actually going to be a vmexit.  This is
      useless rather than harmful, but it really complicates reasoning about
      SVM's handling of the VINTR intercept.  We'd like to never bother with
      the VINTR intercept if V_INTR_MASKING=1 && INTERCEPT_INTR=1, because in
      that case there is no interrupt window and we can just exit the nested
      guest whenever we want.
      
      This patch moves the opening of the interrupt window inside
      inject_pending_event.  This consolidates the check for pending
      interrupt/NMI/SMI in one place, and makes KVM's usage of immediate
      exits more consistent, extending it beyond just nested virtualization.
      
      There are two functional changes here.  They only affect corner cases,
      but overall they simplify inject_pending_event.
      
      - re-injection of still-pending events will also use req_immediate_exit
      instead of using interrupt-window intercepts.  This should have no impact
      on performance on Intel since it simply replaces an interrupt-window
      or NMI-window exit with a preemption-timer exit.  On AMD, which has no
      equivalent of the preemption timer, it may incur some overhead but an
      actual effect on performance should only be visible in pathological cases.
      
      - kvm_arch_interrupt_allowed and kvm_vcpu_has_events will return true
      if an interrupt, NMI or SMI is blocked by nested_run_pending.  This
      makes sense because entering the VM will allow it to make progress
      and deliver the event.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c9d40913
    • M
      KVM: VMX: replace "fall through" with "return" to indicate different case · a8cfbae5
      Committed by Miaohe Lin
      The second "/* fall through */" in rmode_exception() makes the code harder
      to read. Replace it with "return" to show that these are different cases:
      only #DB and #BP check vcpu->guest_debug, while the others do not.
      Suggested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
      Message-Id: <1582080348-20827-1-git-send-email-linmiaohe@huawei.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a8cfbae5
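
The control-flow shape this commit describes can be sketched as a standalone function. The vector numbers are the architectural x86 ones; the debug flag names and the function itself are hypothetical simplifications, not the kernel's rmode_exception().

```c
#include <assert.h>
#include <stdbool.h>

/* Architectural x86 exception vectors used in this sketch. */
enum { DE_VECTOR = 0, DB_VECTOR = 1, BP_VECTOR = 3, OF_VECTOR = 4 };

/* Hypothetical debug flags standing in for bits of vcpu->guest_debug. */
#define SKETCH_DBG_SW_BP 0x1u
#define SKETCH_DBG_HW    0x2u

static bool rmode_exception_sketch(int vec, unsigned int guest_debug)
{
    switch (vec) {
    case BP_VECTOR:
        if (guest_debug & SKETCH_DBG_SW_BP)
            return false;
        /* fall through: #BP also applies the #DB check */
    case DB_VECTOR:
        /* An explicit return ends the debug-related cases here. */
        return !(guest_debug & SKETCH_DBG_HW);
    case DE_VECTOR:
    case OF_VECTOR:
        /* These cases never look at guest_debug. */
        return true;
    }
    return false;
}
```

Returning directly from the #DB case makes it visible that the remaining vectors form a separate group that does not depend on the debug flags.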
    • S
      KVM: x86: Take an unsigned 32-bit int for has_emulated_msr()'s index · cb97c2d6
      Committed by Sean Christopherson
      Take a u32 for the index in has_emulated_msr() to match hardware, which
      treats MSR indices as unsigned 32-bit values.  Functionally, taking a
      signed int doesn't cause problems with the current code base, but could
      theoretically cause problems with 32-bit KVM, e.g. if the index were
      checked via a less-than statement, which would evaluate incorrectly for
      MSR indices with bit 31 set.
      Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Message-Id: <20200218234012.7110-3-sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cb97c2d6
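
The less-than hazard mentioned above can be reproduced in a few lines. The limit constant and function names are hypothetical; 0xC0000080 is used only as an example of an MSR index with bit 31 set. The cast to int relies on the usual two's-complement behaviour of common compilers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical upper bound used only to demonstrate the comparison. */
#define SKETCH_LIMIT 0x1000

/* With a signed index, a value with bit 31 set is negative and therefore
 * compares as smaller than any positive limit. */
static bool below_limit_signed(int index)
{
    return index < SKETCH_LIMIT;
}

/* With an unsigned 32-bit index the same bit pattern is a large value. */
static bool below_limit_unsigned(uint32_t index)
{
    return index < SKETCH_LIMIT;
}
```

Here below_limit_signed((int)0xC0000080u) evaluates to true, while below_limit_unsigned(0xC0000080u) evaluates to false, which is the behaviour that matches how hardware treats MSR indices.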
    • P
      KVM: x86: simplify is_mmio_spte · e7581cac
      Committed by Paolo Bonzini
      We can simply look at bits 52-53 to identify MMIO entries in KVM's page
      tables.  Therefore, there is no need to pass a mask to kvm_mmu_set_mmio_spte_mask.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e7581cac
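
A check on bits 52-53 could look like the sketch below. It assumes that an MMIO entry is marked by both bits being set; the macro and function names are illustrative and may not match the kernel's definitions exactly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bits 52-53 of a shadow page table entry, as named in the commit message. */
#define SKETCH_SPTE_SPECIAL_MASK (3ULL << 52)

/* Assumption for this sketch: both bits set identifies an MMIO entry. */
#define SKETCH_SPTE_MMIO_VALUE   (3ULL << 52)

static bool is_mmio_spte_sketch(uint64_t spte)
{
    return (spte & SKETCH_SPTE_SPECIAL_MASK) == SKETCH_SPTE_MMIO_VALUE;
}
```

Because the test depends only on fixed bit positions, no per-configuration mask has to be passed in, which is the simplification the commit describes.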
    • M
      KVM: VMX: enable X86_FEATURE_WAITPKG in KVM capabilities · 0abcc8f6
      Committed by Maxim Levitsky
      Even though we might not allow the guest to use WAITPKG's new
      instructions, we should tell KVM that the feature is supported by the
      host CPU.
      
      Note that vmx_waitpkg_supported checks that WAITPKG _can_ be set in the
      secondary execution controls, as specified by the VMX capability MSR, rather
      than whether we actually enable it for a guest.
      
      Cc: stable@vger.kernel.org
      Fixes: e69e72fa ("KVM: x86: Add support for user wait instructions")
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20200523161455.3940-2-mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0abcc8f6
  7. 16 May, 2020 10 commits
  8. 14 May, 2020 16 commits
  9. 13 May, 2020 1 commit
    • B
      KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c · 37486135
      Committed by Babu Moger
      Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
      resource isn't. It can be read with XSAVE and written with XRSTOR.
      So, if we don't set the guest PKRU value here (in kvm_load_guest_xsave_state),
      the guest can read the host value.
      
      In the case of kvm_load_host_xsave_state, a guest with CR4.PKE clear could
      potentially use XRSTOR to change the host PKRU value.
      
      While at it, move pkru state save/restore to common code and the
      host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.
      
      Cc: stable@vger.kernel.org
      Reported-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Babu Moger <babu.moger@amd.com>
      Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      37486135
  10. 08 May, 2020 1 commit
    • P
      KVM: VMX: pass correct DR6 for GD userspace exit · 45981ded
      Committed by Paolo Bonzini
      When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD),
      DR6 was incorrectly copied from the value in the VM.  Instead,
      DR6.BD should be set in order to catch this case.
      
      On AMD this does not need any special code because the processor triggers
      a #DB exception that is intercepted.  However, the testcase would fail
      without the previous patch because both DR6.BS and DR6.BD would be set.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      45981ded