1. 02 Aug 2021 (6 commits)
    • KVM: x86: Consolidate APIC base RESET initialization code · 4547700a
      Sean Christopherson committed
      Consolidate the APIC base RESET logic, which is currently spread out
      across both x86 and vendor code.  For an in-kernel APIC, the vendor code
      is redundant.  But for a userspace APIC, KVM relies on the vendor code
      to initialize vcpu->arch.apic_base.  Hoist the vcpu->arch.apic_base
      initialization above the !apic check so that it applies to both flavors
      of APIC emulation, and delete the vendor code.
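
      A rough sketch of the consolidated flow in kvm_lapic_reset() (helper and
      field names per upstream KVM; simplified, not the verbatim patch):

        void kvm_lapic_reset(struct kvm_vcpu *vcpu, bool init_event)
        {
                struct kvm_lapic *apic = vcpu->arch.apic;

                /* Runs for both in-kernel and userspace APICs. */
                if (!init_event) {
                        vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE |
                                               MSR_IA32_APICBASE_ENABLE;
                        if (kvm_vcpu_is_reset_bsp(vcpu))
                                vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
                }

                /* Everything below applies only to an in-kernel APIC. */
                if (!apic)
                        return;

                /* ... remaining in-kernel APIC RESET/INIT handling ... */
        }
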
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210713163324.627647-19-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Drop explicit MMU reset at RESET/INIT · 5d2d7e41
      Sean Christopherson committed
      Drop an explicit MMU reset in SVM's vCPU RESET/INIT flow now that the
      common x86 path correctly handles conditional MMU resets, e.g. if INIT
      arrives while the vCPU is in 64-bit mode.
      
      This reverts commit ebae871a ("kvm: svm: reset mmu on VCPU reset").
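
      For reference, a sketch of the conditional reset in the common x86 path
      that makes the explicit SVM reset redundant (assumed shape, per the
      related x86 change in this series):

        /* kvm_vcpu_reset(): reset the MMU iff paging was enabled before
         * INIT (CR0 is '0' prior to RESET, so this is a nop at RESET). */
        if (old_cr0 & X86_CR0_PG)
                kvm_mmu_reset_context(vcpu);
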
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210713163324.627647-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Fall back to KVM's hardcoded value for EDX at RESET/INIT · 665f4d92
      Sean Christopherson committed
      At vCPU RESET/INIT (mostly RESET), stuff EDX with KVM's hardcoded,
      default Family-Model-Stepping ID of 0x600 if CPUID.0x1 isn't defined.
      At RESET, the CPUID lookup is guaranteed to "miss" because KVM emulates
      RESET before exposing the vCPU to userspace, i.e. userspace can't
      possibly have set the vCPU's CPUID model, and thus KVM will always
      write '0'.  At INIT, using 0x600 is less bad than using '0'.
      
      While initializing EDX to '0' is _extremely_ unlikely to be noticed by
      the guest, let alone break the guest, and can be overridden by
      userspace for the RESET case, using 0x600 is preferable as it will allow
      consolidating the relevant VMX and SVM RESET/INIT logic in the future.
      And, digging through old specs suggests that neither Intel nor AMD have
      ever shipped a CPU that initialized EDX to '0' at RESET.
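
      A sketch of the resulting flow in svm_vcpu_reset() (assumed shape; it
      also reflects the exact-match change in the next entry):

        u32 dummy, eax = 1;

        /* Require an exact CPUID.0x1 match; fall back to KVM's default
         * Family-Model-Stepping of 0x600 (Family 6) if it's undefined. */
        if (!kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy, true))
                eax = 0x600;
        kvm_rdx_write(vcpu, eax);
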
      
      Regarding 0x600 as KVM's default Family, it is a sane default and in
      many ways the most appropriate.  Prior to the 386 implementations, DX
      was undefined at RESET.  With the 386, 486, 586/P5, and 686/P6/Athlon,
      both Intel and AMD set EDX to 3, 4, 5, and 6 respectively.  AMD switched
      to using '15' as its primary Family with the introduction of AMD64, but
      Intel has continued using '6' for the last few decades.
      
      So, '6' is a valid Family for both Intel and AMD CPUs, is compatible
      with both 32-bit and 64-bit CPUs (albeit not a perfect fit for 64-bit
      AMD), and of the common Families (3 - 6), is the best fit with respect to
      KVM's virtual CPU model.  E.g. prior to the P6, Intel CPUs did not have a
      STI window.  Modern operating systems, Linux included, rely on the STI
      window, e.g. for "safe halt", and KVM unconditionally assumes the virtual
      CPU has an STI window.  Thus enumerating a Family ID of 3, 4, or 5 would
      be provably wrong.
      
      Opportunistically remove a stale comment.
      
      Fixes: 66f7b72e ("KVM: x86: Make register state after reset conform to specification")
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210713163324.627647-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Require exact CPUID.0x1 match when stuffing EDX at INIT · 067a456d
      Sean Christopherson committed
      Do not allow an inexact CPUID "match" when querying the guest's CPUID.0x1
      to stuff EDX during INIT.  In the common case, where the guest CPU model
      is an AMD variant, allowing an inexact match is a nop since KVM doesn't
      emulate Intel's goofy "out-of-range" logic for AMD and Hygon.  If the
      vCPU model happens to be an Intel variant, an inexact match is possible
      if and only if the max CPUID leaf is precisely '0'. Aside from the fact
      that there's probably no CPU in existence with a single CPUID leaf, if
      the max CPUID leaf is '0', that means that CPUID.0.EAX is '0', and thus
      an inexact match for CPUID.0x1.EAX will also yield '0'.
      
      So, with lots of twisty logic, no functional change intended.
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210713163324.627647-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Zero out GDTR.base and IDTR.base on INIT · 4f117ce4
      Sean Christopherson committed
      Explicitly set GDTR.base and IDTR.base to zero when initializing the VMCB.
      Functionally this only affects INIT, as the bases are implicitly set to
      zero on RESET by virtue of the VMCB being zero allocated.
      
      Per AMD's APM, GDTR.base and IDTR.base are zeroed after RESET and INIT.
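
      A sketch of the change in init_vmcb() (field names assumed):

        save->gdtr.base = 0;
        save->gdtr.limit = 0xffff;
        save->idtr.base = 0;
        save->idtr.limit = 0xffff;
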
      
      Fixes: 04d2cc77 ("KVM: Move main vcpu loop into subarch independent code")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210713163324.627647-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Use KVM_BUG/KVM_BUG_ON to handle bugs that are fatal to the VM · 67369273
      Sean Christopherson committed
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
      Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <0e8760a26151f47dc47052b25ca8b84fffe0641e.1625186503.git.isaku.yamahata@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 28 Jul 2021 (4 commits)
  3. 26 Jul 2021 (2 commits)
  4. 15 Jul 2021 (11 commits)
    • KVM: nSVM: Restore nested control upon leaving SMM · bb00bd9c
      Vitaly Kuznetsov committed
      If the VM was migrated while in SMM, no nested state was saved/restored,
      and therefore svm_leave_smm has to load both the save and control areas
      of vmcb12. The save area is already loaded from the HSAVE area,
      so now load the control area as well from vmcb12.
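
      A sketch of the addition in svm_leave_smm() (shape assumed; the helper
      already exists upstream):

        /* The save area was restored from the HSAVE area; restore the
         * control area from vmcb12 as well. */
        nested_load_control_from_vmcb12(svm, &vmcb12->control);
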
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-6-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Fix L1 state corruption upon return from SMM · 37be407b
      Vitaly Kuznetsov committed
      VMCB split commit 4995a368 ("KVM: SVM: Use a separate vmcb for the
      nested L2 guest") broke return from SMM when we entered there from guest
      (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
      manifests itself like this:
      
        kvm_exit:             reason EXIT_RSM rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_smm_transition:   vcpu 0: leaving SMM, smbase 0x7ffb3000
        kvm_nested_vmrun:     rip: 0x000000007ffbb280 vmcb: 0x0000000008224000
          nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000
          npt: on
        kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002
          intercepts: fd44bfeb 0000217f 00000000
        kvm_entry:            vcpu 0, rip 0xffffffffffbbe119
        kvm_exit:             reason EXIT_NPF rip 0xffffffffffbbe119 info
          200000006 1ab000
        kvm_nested_vmexit:    vcpu 0 reason npf rip 0xffffffffffbbe119 info1
          0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000
          error_code 0x00000000
        kvm_page_fault:       address 1ab000 error_code 6
        kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000
          int_info 0 int_info_err 0
        kvm_entry:            vcpu 0, rip 0x7ffbb280
        kvm_exit:             reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_inj_exception:    #GP (0x0)
      
      Note: the return to L2 succeeded, but upon first exit to L1 its RIP
      points to the 'RSM' instruction even though we're no longer in SMM.
      
      The problem appears to be that VMCB01 gets irreversibly destroyed during
      SMM execution. Previously, we used to have 'hsave' VMCB where regular
      (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just
      switch to VMCB01 from VMCB02.
      
      Pre-split (working) flow looked like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() restores L1's state from 'hsave'
      - SMM -> RSM
      - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have
        pre-SMM (and pre L2 VMRUN) L1's state there
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from 'hsave'.
      
      This was always broken with regard to svm_get_nested_state()/
      svm_set_nested_state(): 'hsave' was never a part of what's being
      saved and restored, so migration happening during SMM triggered from L2
      would never restore L1's state correctly.
      
      Post-split flow (broken) looks like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() switches to VMCB01 from VMCB02
      - SMM -> RSM
      - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01
        is already lost.
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from VMCB01 but it is corrupted
       (reflects the state during 'RSM' execution).
      
      VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest
      and host state so when we switch back to VMCS02 L1's state is intact there.
      
      To resolve the issue we need to save L1's state somewhere. We could've
      created a third VMCB for SMM but that would require us to modify saved
      state format. L1's architectural HSAVE area (pointed by MSR_VM_HSAVE_PA)
      seems appropriate: L0 is free to save any (or none) of L1's state there.
      Currently, KVM does 'none'.
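
      A sketch of the resulting fix (simplified; the mapping details and the
      0x400 save-area offset follow the upstream patch, the helper's argument
      order is assumed):

        /* svm_enter_smm(): stash pre-SMM L1 state in the architectural
         * HSAVE area, whose save area starts at offset 0x400. */
        if (kvm_vcpu_map(vcpu, gpa_to_gfn(svm->nested.hsave_msr), &map_save))
                return 1;
        svm_copy_vmrun_state(&svm->vmcb01.ptr->save, map_save.hva + 0x400);
        kvm_vcpu_unmap(vcpu, &map_save, true);

        /* svm_leave_smm(): restore it on RSM. */
        svm_copy_vmrun_state(map_save.hva + 0x400, &svm->vmcb01.ptr->save);
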
      
      Note, for nested state migration to succeed, both source and destination
      hypervisors must have the fix. We, however, don't need to create a new
      flag indicating the fact that HSAVE area is now populated as migration
      during SMM triggered from L2 was always broken.
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Introduce svm_copy_vmrun_state() · 0a758290
      Vitaly Kuznetsov committed
      Separate the code setting non-VMLOAD-VMSAVE state from
      svm_set_nested_state() into its own function. This is going to be
      re-used from svm_enter_smm()/svm_leave_smm().
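
      A sketch of the extracted helper (field list abbreviated; argument
      order assumed):

        static void svm_copy_vmrun_state(struct vmcb_save_area *from_save,
                                         struct vmcb_save_area *to_save)
        {
                /* Copy the state VMRUN loads, minus VMLOAD/VMSAVE state. */
                to_save->es = from_save->es;
                to_save->cs = from_save->cs;
                to_save->ss = from_save->ss;
                to_save->ds = from_save->ds;
                to_save->gdtr = from_save->gdtr;
                to_save->idtr = from_save->idtr;
                to_save->rflags = from_save->rflags | X86_EFLAGS_FIXED;
                to_save->efer = from_save->efer;
                to_save->cr0 = from_save->cr0;
                to_save->cr3 = from_save->cr3;
                to_save->cr4 = from_save->cr4;
                to_save->rax = from_save->rax;
                to_save->rsp = from_save->rsp;
                to_save->rip = from_save->rip;
                to_save->cpl = 0;
        }
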
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-4-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Check that VM_HSAVE_PA MSR was set before VMRUN · fb79f566
      Vitaly Kuznetsov committed
      APM states that "The address written to the VM_HSAVE_PA MSR, which holds
      the address of the page used to save the host state on a VMRUN, must point
      to a hypervisor-owned page. If this check fails, the WRMSR will fail with
      a #GP(0) exception. Note that a value of 0 is not considered valid for the
      VM_HSAVE_PA MSR and a VMRUN that is attempted while the HSAVE_PA is 0 will
      fail with a #GP(0) exception."
      
      svm_set_msr() already checks that the supplied address is valid, so only
      the check for '0' is missing. Add it to nested_svm_vmrun().
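
      A sketch of the added check in nested_svm_vmrun() (assumed shape):

        if (!svm->nested.hsave_msr) {
                kvm_inject_gp(vcpu, 0);
                return 1;
        }
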
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-3-vkuznets@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA · fce7e152
      Vitaly Kuznetsov committed
      APM states that #GP is raised upon write to MSR_VM_HSAVE_PA when
      the supplied address is not page-aligned or is outside of "maximum
      supported physical address for this implementation".
      The page_address_valid() check seems suitable. Also, forcefully page-align
      the address when it's written from the VMM.
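
      A sketch of the svm_set_msr() change (assumed shape):

        case MSR_VM_HSAVE_PA:
                /* #GP on malformed guest writes, but page-align rather than
                 * reject host-provided values so migration of guests from
                 * older, non-validating kernels keeps working. */
                if (!msr->host_initiated && !page_address_valid(vcpu, data))
                        return 1;

                svm->nested.hsave_msr = data & PAGE_MASK;
                break;
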
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-2-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      [Add comment about behavior for host-provided values. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Fix sev_pin_memory() error checks in SEV migration utilities · c7a1b2b6
      Sean Christopherson committed
      Use IS_ERR() instead of checking for a NULL pointer when querying for
      sev_pin_memory() failures.  sev_pin_memory() always returns an error code
      cast to a pointer, or a valid pointer; it never returns NULL.
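
      A sketch of the corrected check (surrounding variable names assumed):

        guest_page = sev_pin_memory(kvm, params.guest_uaddr & PAGE_MASK,
                                    n_pages, &n, 0);
        if (IS_ERR(guest_page))
                return PTR_ERR(guest_page);
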
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Steve Rutherford <srutherford@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Ashish Kalra <ashish.kalra@amd.com>
      Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
      Fixes: 15fb7de1 ("KVM: SVM: Add KVM_SEV_RECEIVE_UPDATE_DATA command")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210506175826.2166383-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Return -EFAULT if copy_to_user() for SEV mig packet header fails · b4a69392
      Sean Christopherson committed
      Return -EFAULT if copy_to_user() fails; if accessing user memory faults,
      copy_to_user() returns the number of bytes remaining, not an error code.
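
      A sketch of the fix (surrounding variable names assumed):

        if (copy_to_user((void __user *)(uintptr_t)params.hdr_uaddr, hdr,
                         params.hdr_len)) {
                ret = -EFAULT;
                goto e_free;
        }
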
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Steve Rutherford <srutherford@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Ashish Kalra <ashish.kalra@amd.com>
      Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command")
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210506175826.2166383-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: add module param to control the #SMI interception · 4b639a9f
      Maxim Levitsky committed
      In theory there are no side effects to not intercepting #SMI, because
      the #SMI then becomes transparent to both the OS and KVM.

      Moreover, observation on recent Zen2 CPUs reveals that they ignore #SMI
      interception and never deliver #SMI VM exits.

      This is also useful for testing nested KVM, to verify that L1 handles
      #SMIs correctly when it doesn't intercept them.

      Finally, the default remains the same: SMIs are intercepted by default,
      so this patch has no effect unless a non-default module parameter value
      is used.
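
      A sketch of the module parameter and its use (assumed shape):

        static bool intercept_smi = true;
        module_param(intercept_smi, bool, 0444);

        /* init_vmcb() */
        if (intercept_smi)
                svm_set_intercept(svm, INTERCEPT_SMI);
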
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: remove INIT intercept handler · 896707c2
      Maxim Levitsky committed
      The kernel never sends a real INIT event to CPUs, other than at boot.

      Thus an INIT interception is an error, which should be caught by the
      check for an unknown VM exit reason.

      On top of that, the current INIT VM exit handler skips the current
      instruction, which is wrong. That behavior was added in commit 5ff3a351
      ("KVM: x86: Move trivial instruction-based exit handlers to common
      code").
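
      The fix is effectively a one-line removal from svm_exit_handlers[]
      (handler name assumed from the sibling #SMI fix below):

        -	[SVM_EXIT_INIT]	= kvm_emulate_as_nop,
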
      
      Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-3-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: #SMI interception must not skip the instruction · 991afbbe
      Maxim Levitsky committed
      Commit 5ff3a351 ("KVM: x86: Move trivial instruction-based exit
      handlers to common code") unfortunately made the mistake of treating
      nop_on_interception and nop_interception in the same way.

      The former does truly nothing, while the latter skips the instruction.

      The #SMI VM exit handler should do nothing (the SMI itself is handled
      by the host when we execute STGI).
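
      A sketch of the fix (assumed shape):

        /* #SMI exits must not advance RIP; the SMI itself is handled by
         * the host once KVM executes STGI. */
        static int smi_interception(struct kvm_vcpu *vcpu)
        {
                return 1;
        }

        [SVM_EXIT_SMI] = smi_interception,	/* was kvm_emulate_as_nop */
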
      
      Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler · 76ff371b
      Sean Christopherson committed
      Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
      non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
      hits the NPT in hardware.  Clearing the bit for non-SEV guests causes KVM
      to mishandle #NPFs whose GPAs collide with the host's C-bit.
      
      Although the APM doesn't explicitly state that the C-bit is not reserved
      for non-SEV, Tom Lendacky confirmed that the following snippet about the
      effective reduction due to the C-bit does indeed apply only to SEV guests.
      
        Note that because guest physical addresses are always translated
        through the nested page tables, the size of the guest physical address
        space is not impacted by any physical address space reduction indicated
        in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
        the guest physical address space is effectively reduced by 1 bit.
      
      And for SEV guests, the APM clearly states that the bit is dropped before
      walking the nested page tables.
      
        If the C-bit is an address bit, this bit is masked from the guest
        physical address when it is translated through the nested page tables.
        Consequently, the hypervisor does not need to be aware of which pages
        the guest has chosen to mark private.
      
      Note, the bogus C-bit clearing was removed from legacy #PF handler in
      commit 6d1b867d ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
      interception").
      
      Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address")
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210625020354.431829-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  5. 25 Jun 2021 (5 commits)
  6. 24 Jun 2021 (2 commits)
  7. 18 Jun 2021 (10 commits)