1. 02 Aug, 2021 (2 commits)
  2. 28 Jul, 2021 (2 commits)
  3. 26 Jul, 2021 (2 commits)
  4. 15 Jul, 2021 (7 commits)
    • KVM: nSVM: Restore nested control upon leaving SMM · bb00bd9c
      Vitaly Kuznetsov committed
      If the VM was migrated while in SMM, no nested state was saved/restored,
      and therefore svm_leave_smm has to load both the save and control areas
      of the vmcb12. The save area is already loaded from the HSAVE area,
      so now load the control area as well from the vmcb12.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-6-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: nSVM: Fix L1 state corruption upon return from SMM · 37be407b
      Vitaly Kuznetsov committed
      VMCB split commit 4995a368 ("KVM: SVM: Use a separate vmcb for the
      nested L2 guest") broke return from SMM when we entered there from guest
      (L2) mode. Gen2 WS2016/Hyper-V is known to do this on boot. The problem
      manifests itself like this:
      
        kvm_exit:             reason EXIT_RSM rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_smm_transition:   vcpu 0: leaving SMM, smbase 0x7ffb3000
        kvm_nested_vmrun:     rip: 0x000000007ffbb280 vmcb: 0x0000000008224000
          nrip: 0xffffffffffbbe119 int_ctl: 0x01020000 event_inj: 0x00000000
          npt: on
        kvm_nested_intercepts: cr_read: 0000 cr_write: 0010 excp: 40060002
          intercepts: fd44bfeb 0000217f 00000000
        kvm_entry:            vcpu 0, rip 0xffffffffffbbe119
        kvm_exit:             reason EXIT_NPF rip 0xffffffffffbbe119 info
          200000006 1ab000
        kvm_nested_vmexit:    vcpu 0 reason npf rip 0xffffffffffbbe119 info1
          0x0000000200000006 info2 0x00000000001ab000 intr_info 0x00000000
          error_code 0x00000000
        kvm_page_fault:       address 1ab000 error_code 6
        kvm_nested_vmexit_inject: reason EXIT_NPF info1 200000006 info2 1ab000
          int_info 0 int_info_err 0
        kvm_entry:            vcpu 0, rip 0x7ffbb280
        kvm_exit:             reason EXIT_EXCP_GP rip 0x7ffbb280 info 0 0
        kvm_emulate_insn:     0:7ffbb280: 0f aa
        kvm_inj_exception:    #GP (0x0)
      
      Note: the return to L2 succeeded, but upon the first exit to L1 its RIP
      points to the 'RSM' instruction even though we're no longer in SMM.
      
      The problem appears to be that VMCB01 gets irreversibly destroyed during
      SMM execution. Previously, we used to have 'hsave' VMCB where regular
      (pre-SMM) L1's state was saved upon nested_svm_vmexit() but now we just
      switch to VMCB01 from VMCB02.
      
      Pre-split (working) flow looked like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() restores L1's state from 'hsave'
      - SMM -> RSM
      - enter_svm_guest_mode() switches to L2 but keeps 'hsave' intact so we have
        pre-SMM (and pre L2 VMRUN) L1's state there
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from 'hsave'.
      
      This was always broken with regards to svm_get_nested_state()/
      svm_set_nested_state(): 'hsave' was never a part of what's being
      saved and restored, so a migration happening during SMM triggered from L2
      would never restore L1's state correctly.
      
      Post-split flow (broken) looks like:
      - SMM is triggered during L2's execution
      - L2's state is pushed to SMRAM
      - nested_svm_vmexit() switches to VMCB01 from VMCB02
      - SMM -> RSM
      - enter_svm_guest_mode() switches from VMCB01 to VMCB02 but pre-SMM VMCB01
        is already lost.
      - L2's state is restored from SMRAM
      - upon first exit L1's state is restored from VMCB01 but it is corrupted
       (reflects the state during 'RSM' execution).
      
      VMX doesn't have this problem because unlike VMCB, VMCS keeps both guest
      and host state so when we switch back to VMCS02 L1's state is intact there.
      
      To resolve the issue we need to save L1's state somewhere. We could've
      created a third VMCB for SMM, but that would require us to modify the
      saved state format. L1's architectural HSAVE area (pointed to by
      MSR_VM_HSAVE_PA) seems appropriate: L0 is free to save any (or none) of
      L1's state there. Currently, KVM does 'none'.
      
      Note, for nested state migration to succeed, both source and destination
      hypervisors must have the fix. We, however, don't need to create a new
      flag indicating the fact that HSAVE area is now populated as migration
      during SMM triggered from L2 was always broken.
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
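      The fixed post-split flow can be modeled as a minimal, self-contained sketch. The struct layout, field names, and the `0xdead` clobber value below are illustrative assumptions, not KVM's actual code:

```c
#include <assert.h>

/* Hypothetical model: L1's pre-SMM state is stashed in the
 * architectural HSAVE area on SMM entry from L2 and restored from
 * there on RSM, so clobbering of VMCB01 during SMM no longer
 * corrupts L1. */
struct vmcb { unsigned long rip; };

struct vcpu {
    struct vmcb vmcb01;   /* L1's state */
    struct vmcb vmcb02;   /* L2's state */
    struct vmcb hsave;    /* architectural HSAVE area */
    int in_guest_mode;    /* currently running L2? */
};

/* SMM triggered while in L2: exit to L1, then save L1's pre-SMM
 * state into HSAVE (the fix). */
static void enter_smm(struct vcpu *v)
{
    if (v->in_guest_mode) {
        v->in_guest_mode = 0;    /* nested_svm_vmexit(): VMCB02 -> VMCB01 */
        v->hsave = v->vmcb01;    /* the fix: preserve pre-SMM L1 state */
    }
    v->vmcb01.rip = 0xdead;      /* SMM handler execution clobbers VMCB01 */
}

/* RSM back into L2: restore L1's state from HSAVE before re-entering. */
static void leave_smm(struct vcpu *v)
{
    v->vmcb01 = v->hsave;        /* undo the corruption */
    v->in_guest_mode = 1;        /* enter_svm_guest_mode(): back to VMCB02 */
}
```

      Without the `hsave` stash, `leave_smm()` would leave `v->vmcb01` reflecting the state during 'RSM' execution, which is exactly the corruption the trace above shows.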
    • KVM: nSVM: Check the value written to MSR_VM_HSAVE_PA · fce7e152
      Vitaly Kuznetsov committed
      The APM states that #GP is raised upon a write to MSR_VM_HSAVE_PA when
      the supplied address is not page-aligned or is outside of the "maximum
      supported physical address for this implementation".
      The page_address_valid() check seems suitable. Also, forcefully
      page-align the address when it's written from the VMM.
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20210628104425.391276-2-vkuznets@redhat.com>
      Cc: stable@vger.kernel.org
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      [Add comment about behavior for host-provided values. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
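      The described check can be sketched as a small stand-alone predicate. 4 KiB pages and a caller-supplied MAXPHYADDR are assumptions for illustration; the real code uses page_address_valid():

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the #GP condition: the value written to MSR_VM_HSAVE_PA
 * must be page-aligned and within the maximum supported physical
 * address. 'maxphyaddr' stands in for the per-vCPU MAXPHYADDR. */
#define PAGE_OFFSET_MASK 0xfffULL   /* 4 KiB pages assumed */

static bool hsave_pa_valid(uint64_t gpa, unsigned int maxphyaddr)
{
    if (gpa & PAGE_OFFSET_MASK)   /* not page-aligned -> #GP */
        return false;
    if (gpa >> maxphyaddr)        /* above MAXPHYADDR -> #GP */
        return false;
    return true;
}
```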
    • KVM: SVM: add module param to control the #SMI interception · 4b639a9f
      Maxim Levitsky committed
      In theory there are no side effects of not intercepting #SMI,
      because then #SMI becomes transparent to the OS and to KVM.
      
      Plus, an observation on recent Zen2 CPUs reveals that these
      CPUs ignore #SMI interception and never deliver #SMI VMexits.
      
      This is also useful for testing nested KVM, to see that L1
      handles #SMIs correctly in the case where L1 doesn't intercept #SMI.
      
      Finally, the default remains the same: SMIs are intercepted
      by default, so this patch doesn't have any effect unless a
      non-default module param value is used.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
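      The effect of such a knob can be modeled as follows; the bit position and helper name are illustrative, not KVM's actual definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* A module-param-style bool decides whether the #SMI intercept bit
 * is set when the VMCB intercepts are initialized. */
#define INTERCEPT_SMI_BIT (1u << 2)   /* illustrative bit position */

static bool intercept_smi = true;     /* default: intercept, as before */

static uint32_t init_intercepts(void)
{
    uint32_t intercepts = 0;
    if (intercept_smi)
        intercepts |= INTERCEPT_SMI_BIT;
    return intercepts;
}
```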
    • KVM: SVM: remove INIT intercept handler · 896707c2
      Maxim Levitsky committed
      The kernel never sends a real INIT event to CPUs, other than on boot.
      
      Thus INIT interception is an error which should be caught
      by a check for an unknown VMexit reason.
      
      On top of that, the current INIT VM exit handler skips
      the current instruction which is wrong.
      That was added in commit 5ff3a351 ("KVM: x86: Move trivial
      instruction-based exit handlers to common code").
      
      Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-3-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: #SMI interception must not skip the instruction · 991afbbe
      Maxim Levitsky committed
      Commit 5ff3a351 ("KVM: x86: Move trivial instruction-based
      exit handlers to common code") unfortunately made the mistake of
      treating nop_on_interception and nop_interception in the same way.
      
      The former does truly nothing, while the latter skips the instruction.
      
      The SMI VM exit handler should do nothing.
      (The SMI itself is handled by the host when we do STGI.)
      
      Fixes: 5ff3a351 ("KVM: x86: Move trivial instruction-based exit handlers to common code")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210707125100.677203-2-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
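      The distinction can be illustrated with a toy model; the struct and the 2-byte instruction length are assumptions for illustration, not KVM's code:

```c
#include <assert.h>

struct vcpu { unsigned long rip; };

/* "Truly nothing": the right behavior for #SMI, which is not an
 * instruction the guest executed and so must not be skipped. */
static int smi_interception(struct vcpu *v)
{
    (void)v;      /* resume the guest with RIP unchanged */
    return 1;
}

/* Skips the intercepted instruction: correct only for intercepts
 * that emulate an instruction as a nop. */
static int nop_interception(struct vcpu *v)
{
    v->rip += 2;  /* illustrative 2-byte instruction length */
    return 1;
}
```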
    • KVM: SVM: Revert clearing of C-bit on GPA in #NPF handler · 76ff371b
      Sean Christopherson committed
      Don't clear the C-bit in the #NPF handler, as it is a legal GPA bit for
      non-SEV guests, and for SEV guests the C-bit is dropped before the GPA
      hits the NPT in hardware.  Clearing the bit for non-SEV guests causes
      KVM to mishandle #NPFs that collide with the host's C-bit.
      
      Although the APM doesn't explicitly state that the C-bit is not reserved
      for non-SEV, Tom Lendacky confirmed that the following snippet about the
      effective reduction due to the C-bit does indeed apply only to SEV guests.
      
        Note that because guest physical addresses are always translated
        through the nested page tables, the size of the guest physical address
        space is not impacted by any physical address space reduction indicated
        in CPUID 8000_001F[EBX]. If the C-bit is a physical address bit however,
        the guest physical address space is effectively reduced by 1 bit.
      
      And for SEV guests, the APM clearly states that the bit is dropped before
      walking the nested page tables.
      
        If the C-bit is an address bit, this bit is masked from the guest
        physical address when it is translated through the nested page tables.
        Consequently, the hypervisor does not need to be aware of which pages
        the guest has chosen to mark private.
      
      Note, the bogus C-bit clearing was removed from the legacy #PF handler
      in commit 6d1b867d ("KVM: SVM: Don't strip the C-bit from CR2 on #PF
      interception").
      
      Fixes: 0ede79e1 ("KVM: SVM: Clear C-bit from the page fault address")
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210625020354.431829-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
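      The before/after behavior can be captured in a toy model. Bit 47 as the C-bit position is an illustrative assumption; the real position is reported via CPUID 8000_001F[EBX]:

```c
#include <stdint.h>

#define C_BIT (1ULL << 47)   /* illustrative C-bit position */

/* Pre-fix (buggy): unconditionally mask the C-bit, corrupting legal
 * non-SEV GPAs that happen to use that address bit. */
static uint64_t npf_gpa_buggy(uint64_t fault_gpa)
{
    return fault_gpa & ~C_BIT;
}

/* Post-fix: use the hardware-reported GPA as-is. For SEV guests the
 * hardware already dropped the C-bit before the NPT walk, so there
 * is nothing left to strip. */
static uint64_t npf_gpa_fixed(uint64_t fault_gpa)
{
    return fault_gpa;
}
```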
  5. 24 Jun, 2021 (1 commit)
  6. 18 Jun, 2021 (12 commits)
  7. 25 May, 2021 (1 commit)
  8. 10 May, 2021 (1 commit)
  9. 07 May, 2021 (7 commits)
    • KVM: SVM: Move GHCB unmapping to fix RCU warning · ce7ea0cf
      Tom Lendacky committed
      When an SEV-ES guest is running, the GHCB is unmapped as part of the
      vCPU run support. However, kvm_vcpu_unmap() triggers an RCU dereference
      warning with CONFIG_PROVE_LOCKING=y because the SRCU lock is released
      before invoking the vCPU run support.
      
      Move the GHCB unmapping into the prepare_guest_switch callback, which is
      invoked while still holding the SRCU lock, eliminating the RCU dereference
      warning.
      
      Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Message-Id: <b2f9b79d15166f2c3e4375c0d9bc3268b7696455.1620332081.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging · 03ca4589
      Sean Christopherson committed
      Disallow loading KVM SVM if 5-level paging is supported.  In theory, NPT
      for L1 should simply work, but there are unknowns with respect to how
      the guest's MAXPHYADDR will be handled by hardware.
      
      Nested NPT is more problematic, as running an L1 VMM that is using
      2-level page tables requires stacking single-entry PDP and PML4 tables in
      KVM's NPT for L2, as there are no equivalent entries in L1's NPT to
      shadow.  Barring hardware magic, for 5-level paging, KVM would need to
      stack another layer to handle PML5.
      
      Opportunistically rename the lm_root pointer, which is used for the
      aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to
      call out that it's specifically for PML4.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210505204221.1934471-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
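      The guard can be sketched as follows; `host_paging_levels` and the error value are stand-ins for the kernel's pgtable_l5_enabled() check and its actual return code:

```c
/* Refuse SVM hardware setup when the host uses 5-level paging. */
static int svm_check_paging_levels(int host_paging_levels)
{
    if (host_paging_levels > 4)
        return -1;   /* stand-in error: refuse to load kvm-amd */
    return 0;        /* 4-level paging: proceed with setup */
}
```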
    • KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model · 61a05d44
      Sean Christopherson committed
      Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to
      the guest CPU model instead of the host CPU behavior.  While not strictly
      necessary to avoid guest breakage, emulating cross-vendor "architecture"
      will provide consistent behavior for the guest, e.g. WRMSR fault behavior
      won't change if the vCPU is migrated to a host with divergent behavior.
      
      Note, the "new" kvm_is_supported_user_return_msr() checks do not add new
      functionality on either SVM or VMX.  On SVM, the equivalent was
      "tsc_aux_uret_slot < 0", and on VMX the check was buried in the
      vmx_find_uret_msr() call at the find_uret_msr label.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-15-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
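      One way to picture guest-model-tied WRMSR behavior is the sketch below. The predicate name, parameters, and the exact reserved-bit rule (an Intel guest model faulting on bits 63:32) are illustrative assumptions drawn from the "consistent WRMSR fault behavior" point above, not a verbatim copy of KVM's checks:

```c
#include <stdbool.h>
#include <stdint.h>

/* Whether a guest WRMSR to MSR_TSC_AUX faults now depends on the
 * guest's CPUID model, not on what the host happens to support. */
static bool tsc_aux_wrmsr_faults(bool guest_has_rdtscp,
                                 bool guest_has_rdpid,
                                 bool guest_is_intel,
                                 uint64_t data)
{
    if (!guest_has_rdtscp && !guest_has_rdpid)
        return true;   /* MSR doesn't exist for this guest model */
    if (guest_is_intel && (data >> 32))
        return true;   /* assumed: Intel model treats bits 63:32 as reserved */
    return false;
}
```

      The point of the consolidation is that this answer stays the same after the vCPU migrates to a host with divergent hardware behavior.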
    • KVM: x86: Move uret MSR slot management to common x86 · e5fda4bb
      Sean Christopherson committed
      Now that SVM and VMX both probe MSRs before "defining" user return slots
      for them, consolidate the code for probe+define into common x86 and
      eliminate the odd behavior of having the vendor code define the slot for
      a given MSR.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-14-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Add support for RDPID without RDTSCP · 36fa06f9
      Sean Christopherson committed
      Allow userspace to enable RDPID for a guest without also enabling RDTSCP.
      Aside from checking for RDPID support in the obvious flows, VMX also needs
      to set ENABLE_RDTSCP=1 when RDPID is exposed.
      
      For the record, there is no known scenario where enabling RDPID without
      RDTSCP is desirable.  But, both AMD and Intel architectures allow for the
      condition, i.e. this is purely to make KVM more architecturally accurate.
      
      Fixes: 41cd02c6 ("kvm: x86: Expose RDPID in KVM_GET_SUPPORTED_CPUID")
      Cc: stable@vger.kernel.org
      Reported-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Probe and load MSR_TSC_AUX regardless of RDTSCP support in host · 0caa0a77
      Sean Christopherson committed
      Probe MSR_TSC_AUX whether or not RDTSCP is supported in the host, and
      if probing succeeds, load the guest's MSR_TSC_AUX into hardware prior to
      VMRUN.  Because SVM doesn't support interception of RDPID, RDPID cannot
      be disallowed in the guest (without resorting to binary translation).
      Leaving the host's MSR_TSC_AUX in hardware would leak the host's value to
      the guest if RDTSCP is not supported.
      
      Note, there is also a kernel bug that prevents leaking the host's value.
      The host kernel initializes MSR_TSC_AUX if and only if RDTSCP is
      supported, even though the vDSO usage consumes MSR_TSC_AUX via RDPID.
      I.e. if RDTSCP is not supported, there is no host value to leak.  But,
      if/when the host kernel bug is fixed, KVM would start leaking MSR_TSC_AUX
      in the case where hardware supports RDPID but RDTSCP is unavailable for
      whatever reason.
      
      Probing MSR_TSC_AUX will also allow consolidating the probe and define
      logic in common x86, and will make it simpler to condition the existence
      of MSR_TSC_AUX (from the guest's perspective) on RDTSCP *or* RDPID.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: SVM: Inject #UD on RDTSCP when it should be disabled in the guest · 3b195ac9
      Sean Christopherson committed
      Intercept RDTSCP to inject #UD if RDTSCP is disabled in the guest.
      
      Note, SVM does not support intercepting RDPID.  Unlike VMX's
      ENABLE_RDTSCP control, RDTSCP interception does not apply to RDPID.  This
      is a benign virtualization hole as the host kernel (incorrectly) sets
      MSR_TSC_AUX if RDTSCP is supported, and KVM loads the guest's MSR_TSC_AUX
      into hardware if RDTSCP is supported in the host, i.e. KVM will not leak
      the host's MSR_TSC_AUX to the guest.
      
      But, when the kernel bug is fixed, KVM will start leaking the host's
      MSR_TSC_AUX if RDPID is supported in hardware, but RDTSCP isn't available
      for whatever reason.  This leak will be remedied in a future commit.
      
      Fixes: 46896c73 ("KVM: svm: add support for RDTSCP")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-4-seanjc@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Reviewed-by: Reiji Watanabe <reijiw@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
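      The intercept decision reduces to a simple rule, sketched here with illustrative names:

```c
enum rdtscp_action { EMULATE_RDTSCP, INJECT_UD };

/* If the guest's CPUID does not advertise RDTSCP, the RDTSCP
 * intercept handler injects #UD instead of emulating the
 * instruction. RDPID cannot be handled this way because SVM has
 * no RDPID intercept. */
static enum rdtscp_action rdtscp_interception(int guest_has_rdtscp)
{
    return guest_has_rdtscp ? EMULATE_RDTSCP : INJECT_UD;
}
```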
  10. 06 May, 2021 (2 commits)
  11. 03 May, 2021 (1 commit)
    • KVM: nSVM: fix few bugs in the vmcb02 caching logic · c74ad08f
      Maxim Levitsky committed
      * Define and use an invalid GPA (all ones) for init value of last
        and current nested vmcb physical addresses.
      
      * Reset the current vmcb12 gpa to the invalid value when leaving
        the nested mode, similar to what is done on nested vmexit.
      
      * Reset the last seen vmcb12 address when disabling nested SVM,
        as it relies on vmcb02 fields which are freed at that point.
      
      Fixes: 4995a368 ("KVM: SVM: Use a separate vmcb for the nested L2 guest")
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210503125446.1353307-3-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
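      The caching rule in the first two bullets can be modeled as a small sketch; the struct and function names are illustrative, not KVM's:

```c
#include <stdint.h>

/* An all-ones GPA marks "no vmcb12 seen"; both cached addresses are
 * initialized to it, and the current one is reset when leaving
 * nested mode. */
#define INVALID_GPA (~0ULL)

struct nested_state {
    uint64_t current_vmcb12_gpa;
    uint64_t last_vmcb12_gpa;
};

static void nested_state_init(struct nested_state *n)
{
    n->current_vmcb12_gpa = INVALID_GPA;
    n->last_vmcb12_gpa = INVALID_GPA;
}

/* On leaving nested mode (as on nested vmexit), the cached current
 * vmcb12 gpa must not be trusted any more. */
static void leave_nested(struct nested_state *n)
{
    n->current_vmcb12_gpa = INVALID_GPA;
}
```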
  12. 26 Apr, 2021 (2 commits)