1. 20 April 2021, 17 commits
    • KVM: VMX: Add basic handling of VM-Exit from SGX enclave · 3c0c2ad1
      Authored by Sean Christopherson
      Add support for handling VM-Exits that originate from a guest SGX
      enclave.  In SGX, an "enclave" is a new CPL3-only execution environment,
      wherein the CPU and memory state is protected by hardware to make the
      state inaccessible to code running outside of the enclave.  When exiting
      an enclave due to an asynchronous event (from the perspective of the
      enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state
      is automatically saved and scrubbed (the CPU loads synthetic state), and
      then reloaded when re-entering the enclave.  E.g. after an instruction
      based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP
      of the enclave instruction that triggered VM-Exit, but will instead point
      to a RIP in the enclave's untrusted runtime (the guest userspace code
      that coordinates entry/exit to/from the enclave).
      
      To help a VMM recognize and handle exits from enclaves, SGX adds bits to
      existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and
      GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR.  Define the
      new architectural bits, and add a boolean to struct vcpu_vmx to cache
      VMX_EXIT_REASON_FROM_ENCLAVE.  Clear the bit in exit_reason so that
      checks against exit_reason do not need to account for SGX, e.g.
      "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work.
      
      KVM is largely a passive observer of the new bits, e.g. KVM needs to
      account for the bits when propagating information to a nested VMM, but
      otherwise doesn't need to act differently for the majority of VM-Exits
      from enclaves.
      
      The one scenario that is directly impacted is emulation, which is for
      all intents and purposes impossible[1] since KVM does not have access to
      the RIP or instruction stream that triggered the VM-Exit.  The inability
      to emulate is a non-issue for KVM, as most instructions that might
      trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit
      check).  For the few instructions that conditionally #UD, KVM either never
      sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only
      if the feature is not exposed to the guest in order to inject a #UD,
      e.g. RDRAND_EXITING.
      
      But, because it is still possible for a guest to trigger emulation,
      e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit
      from an enclave.  This is architecturally accurate for instruction
      VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable
      to killing the VM.  In practice, only broken or particularly stupid
      guests should ever encounter this behavior.
      
      Add a WARN in skip_emulated_instruction to detect any attempt to
      modify the guest's RIP during an SGX enclave VM-Exit as all such flows
      should either be unreachable or must handle exits from enclaves before
      getting to skip_emulated_instruction.
      
      [1] Impossible for all practical purposes.  Not truly impossible
          since KVM could implement some form of para-virtualization scheme.
      
      [2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at
          CPL3, so we also don't need to worry about that interaction.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3c0c2ad1
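      A minimal sketch of the mechanism described above, assuming illustrative
      macro names and an illustrative vcpu_vmx field (the bit positions follow
      the SDM's exit-reason and interruptibility-info layouts, not necessarily
      the exact kernel definitions):

          #include <linux/types.h>

          #define VMX_EXIT_REASON_FROM_ENCLAVE   (1u << 27) /* exit-reason modifier bit */
          #define GUEST_INTR_STATE_ENCLAVE_INTR  (1u << 4)  /* interruptibility-info bit */

          struct vcpu_vmx_sketch {
                  u32  exit_reason;
                  bool sgx_enclave_exit;  /* cached enclave-exit modifier */
          };

          /* Cache the modifier, then strip it so existing checks such as
           * "exit_reason == EXIT_REASON_EXCEPTION_NMI" keep working unchanged. */
          static void vmx_record_exit_reason(struct vcpu_vmx_sketch *vmx, u32 raw)
          {
                  vmx->sgx_enclave_exit = raw & VMX_EXIT_REASON_FROM_ENCLAVE;
                  vmx->exit_reason      = raw & ~VMX_EXIT_REASON_FROM_ENCLAVE;
          }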
    • KVM: x86: Add reverse-CPUID lookup support for scattered SGX features · 01de8682
      Authored by Sean Christopherson
      Define a new KVM-only feature word for advertising and querying SGX
      sub-features in CPUID.0x12.0x0.EAX.  Because SGX1 and SGX2 live in the
      kernel's scattered-features word, they need to be translated so that
      their bit numbers match those defined by hardware.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <e797c533f4c71ae89265bbb15a02aef86b67cbec.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      01de8682
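      A minimal sketch of the KVM-only word idea, assuming hypothetical
      enum and macro names (hardware architecturally places SGX1 in bit 0 and
      SGX2 in bit 1 of CPUID.0x12.0x0.EAX):

          /* Hypothetical KVM-only feature word appended after the kernel's
           * shared capability words; its bit layout matches hardware. */
          enum kvm_only_cpuid_words_sketch {
                  CPUID_12_EAX_SKETCH = 0,
          };

          #define KVM_X86_FEATURE_SKETCH(w, bit)  ((w) * 32 + (bit))
          #define KVM_X86_FEATURE_SGX1  KVM_X86_FEATURE_SKETCH(CPUID_12_EAX_SKETCH, 0)
          #define KVM_X86_FEATURE_SGX2  KVM_X86_FEATURE_SKETCH(CPUID_12_EAX_SKETCH, 1)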
    • KVM: x86: Add support for reverse CPUID lookup of scattered features · 4e66c0cb
      Authored by Sean Christopherson
      Introduce a scheme that allows KVM's CPUID magic to support features
      that are scattered in the kernel's feature words.  To advertise and/or
      query guest support for CPUID-based features, KVM requires the bit
      number of an X86_FEATURE_* to match the bit number in its associated
      CPUID entry.  For scattered features, this does not hold true.
      
      Add a framework to allow defining KVM-only words, stored in
      kvm_cpu_caps after the shared kernel caps, that can be used to gather
      the scattered feature bits by translating X86_FEATURE_* flags into their
      KVM-defined feature.
      
      Note, because reverse_cpuid_check() effectively forces kvm_cpu_caps
      lookups to be resolved at compile time, there is no runtime cost for
      translating from kernel-defined to kvm-defined features.
      
      More details here:  https://lkml.kernel.org/r/X/jxCOLG+HUO4QlZ@google.com
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <16cad8d00475f67867fb36701fc7fb7c1ec86ce1.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4e66c0cb
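      A minimal sketch of the translation step, reusing the hypothetical
      KVM_X86_FEATURE_SGX1/SGX2 values from the previous sketch; with constant
      inputs the compiler folds the comparisons away, so the translation adds
      no runtime cost:

          static __always_inline u32 feature_translate_sketch(u32 x86_feature)
          {
                  if (x86_feature == X86_FEATURE_SGX1)
                          return KVM_X86_FEATURE_SGX1;
                  if (x86_feature == X86_FEATURE_SGX2)
                          return KVM_X86_FEATURE_SGX2;

                  /* Non-scattered features already match hardware bit numbers. */
                  return x86_feature;
          }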
    • KVM: x86: Define new #PF SGX error code bit · 00e7646c
      Authored by Sean Christopherson
      Page faults that are signaled by the SGX Enclave Page Cache Map (EPCM),
      as opposed to the traditional IA32/EPT page tables, set an SGX bit in
      the error code to indicate that the #PF was induced by SGX.  KVM will
      need to emulate this behavior as part of its trap-and-execute scheme for
      virtualizing SGX Launch Control, e.g. to inject SGX-induced #PFs if
      EINIT faults in the host, and to support live migration.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <e170c5175cb9f35f53218a7512c9e3db972b97a2.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      00e7646c
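      A short sketch of the new error-code bit, assuming PFERR_* macro names
      that mirror KVM's existing page-fault error-code convention (the SDM
      defines the SGX bit as bit 15 of the #PF error code):

          #include <linux/bits.h>

          #define PFERR_SGX_BIT   15
          #define PFERR_SGX_MASK  BIT(PFERR_SGX_BIT)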
    • KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) · 54f958cd
      Authored by Sean Christopherson
      Export the gva_to_gpa() helpers for use by SGX virtualization when
      executing ENCLS[ECREATE] and ENCLS[EINIT] on behalf of the guest.
      To execute ECREATE and EINIT, KVM must obtain the GPA of the target
      Secure Enclave Control Structure (SECS) in order to get its
      corresponding HVA.
      
      Because the SECS must reside in the Enclave Page Cache (EPC), copying
      the SECS's data to a host-controlled buffer via existing exported
      helpers is not a viable option as the EPC is not readable or writable
      by the kernel.
      
      SGX virtualization will also use gva_to_gpa() to obtain HVAs for
      non-EPC pages in order to pass user pointers directly to ECREATE and
      EINIT, which avoids having to copy pages worth of data into the kernel.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Acked-by: Jarkko Sakkinen <jarkko@kernel.org>
      Signed-off-by: Kai Huang <kai.huang@intel.com>
      Message-Id: <02f37708321bcdfaa2f9d41c8478affa6e84b04d.1618196135.git.kai.huang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      54f958cd
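      A minimal usage sketch, assuming a hypothetical helper name and eliding
      fault injection; it resolves the guest virtual address of the SECS to a
      GPA and then to a host virtual address usable by ENCLS emulation:

          static unsigned long sgx_gva_to_hva_sketch(struct kvm_vcpu *vcpu, gva_t gva)
          {
                  struct x86_exception ex;
                  gpa_t gpa;

                  gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, &ex);
                  if (gpa == UNMAPPED_GVA)
                          return 0;  /* caller would inject the fault held in @ex */

                  return kvm_vcpu_gfn_to_hva(vcpu, gpa_to_gfn(gpa));
          }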
    • KVM: vmx: add mismatched size assertions in vmcs_check32() · 870c575a
      Authored by Haiwei Li
      Add compile-time assertions in vmcs_check32() to disallow accesses to
      64-bit and 64-bit high fields via vmcs_{read,write}32().  Upper level KVM
      code should never do partial accesses to VMCS fields.  KVM handles the
      split accesses automatically in vmcs_{read,write}64() when running as a
      32-bit kernel.
      Reviewed-and-tested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
      Message-Id: <20210409022456.23528-1-lihaiwei.kernel@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      870c575a
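      A sketch of the added assertions, assuming the standard VMCS field
      encoding (bits 14:13 give the field width, bit 0 selects the "high"
      half of a 64-bit field); the exact masks in the kernel may differ:

          static __always_inline void vmcs_check32_sketch(unsigned long field)
          {
                  BUILD_BUG_ON_MSG(__builtin_constant_p(field) &&
                                   ((field) & 0x6001) == 0x2000,
                                   "32-bit accessor invalid for 64-bit field");
                  BUILD_BUG_ON_MSG(__builtin_constant_p(field) &&
                                   ((field) & 0x6001) == 0x2001,
                                   "32-bit accessor invalid for 64-bit high field");
          }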
    • KVM: x86: Remove unused function declaration · d90b15ed
      Authored by Keqian Zhu
      kvm_mmu_slot_largepage_remove_write_access() is declared but not used,
      so remove it.
      Signed-off-by: Keqian Zhu <zhukeqian1@huawei.com>
      Message-Id: <20210406063504.17552-1-zhukeqian1@huawei.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d90b15ed
    • KVM: SVM: Enhance and clean up the vmcb tracking comment in pre_svm_run() · 44f1b558
      Authored by Sean Christopherson
      Explicitly document why a vmcb must be marked dirty and assigned a new
      asid when it will be run on a different cpu.  The "what" is relatively
      obvious, whereas the "why" requires reading the APM and/or KVM code.
      
      Opportunistically remove a spurious period and several unnecessary
      newlines in the comment.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-5-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      44f1b558
    • KVM: SVM: Add a comment to clarify what vcpu_svm.vmcb points at · 554cf314
      Authored by Sean Christopherson
      Add a comment above the declaration of vcpu_svm.vmcb to call out that it
      is simply a shorthand for current_vmcb->ptr.  The myriad accesses to
      svm->vmcb are quite confusing without this crucial detail.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-4-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      554cf314
    • KVM: SVM: Drop vcpu_svm.vmcb_pa · d1788191
      Authored by Sean Christopherson
      Remove vmcb_pa from vcpu_svm and simply read current_vmcb->pa directly in
      the one path where it is consumed.  Unlike svm->vmcb, use of the current
      vmcb's address is very limited, as evidenced by the fact that its use
      can be trimmed to a single dereference.
      
      Opportunistically add a comment about using vmcb01 for VMLOAD/VMSAVE, as
      at first glance using vmcb01 instead of vmcb_pa looks wrong.
      
      No functional change intended.
      
      Cc: Maxim Levitsky <mlevitsk@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-3-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d1788191
    • KVM: SVM: Don't set current_vmcb->cpu when switching vmcb · 17e5e964
      Authored by Sean Christopherson
      Do not update the new vmcb's last-run cpu when switching to a different
      vmcb.  If the vCPU is migrated between its last run and a vmcb switch,
      e.g. for nested VM-Exit, then setting the cpu without marking the vmcb
      dirty will lead to KVM running the vCPU on a different physical cpu with
      stale clean bit settings.
      
                                vcpu->cpu    current_vmcb->cpu    hardware
        pre_svm_run()           cpu0         cpu0                 cpu0,clean
        kvm_arch_vcpu_load()    cpu1         cpu0                 cpu0,clean
        svm_switch_vmcb()       cpu1         cpu1                 cpu0,clean
        pre_svm_run()           cpu1         cpu1                 kaboom
      
      Simply delete the offending code; unlike VMX, which needs to update the
      cpu at switch time due to the need to do VMPTRLD, SVM only cares about
      which cpu last ran the vCPU.
      
      Fixes: af18fa77 ("KVM: nSVM: Track the physical cpu of the vmcb vmrun through the vmcb")
      Cc: Cathy Avery <cavery@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210406171811.4043363-2-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      17e5e964
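      A sketch of the fixed switch path (struct and field names follow the
      descriptions in this and the preceding SVM entries, and are
      illustrative); the last-run cpu is intentionally left alone so that
      pre_svm_run() still sees the mismatch and marks the vmcb dirty with a
      fresh ASID:

          static void svm_switch_vmcb_sketch(struct vcpu_svm *svm,
                                             struct kvm_vmcb_info *target_vmcb)
          {
                  svm->current_vmcb = target_vmcb;
                  svm->vmcb = target_vmcb->ptr;
                  /* Do NOT touch target_vmcb->cpu here; only an actual run
                   * of this vmcb updates its last-run cpu. */
          }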
    • KVM: SVM: Make sure GHCB is mapped before updating · a3ba26ec
      Authored by Tom Lendacky
      Access to the GHCB is mainly in the VMGEXIT path and it is known that the
      GHCB will be mapped. But there are two paths where it is possible the GHCB
      might not be mapped.
      
      The sev_vcpu_deliver_sipi_vector() routine will update the GHCB to inform
      the caller of the AP Reset Hold NAE event that a SIPI has been delivered.
      However, if a SIPI is performed without a corresponding AP Reset Hold,
      then the GHCB might not be mapped (depending on the previous VMEXIT),
      which will result in a NULL pointer dereference.
      
      The svm_complete_emulated_msr() routine will update the GHCB to inform
      the caller of a RDMSR/WRMSR operation about any errors. While it is likely
      that the GHCB will be mapped in this situation, add a safeguard
      in this path to be certain a NULL pointer dereference is not encountered.
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Fixes: 647daca2 ("KVM: SVM: Add support for booting APs in an SEV-ES guest")
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: stable@vger.kernel.org
      Message-Id: <a5d3ebb600a91170fc88599d5a575452b3e31036.1617979121.git.thomas.lendacky@amd.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a3ba26ec
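      A sketch of the guard in the SIPI-delivery path, assuming the svm->ghcb
      pointer and the generated GHCB setter used by KVM's SEV-ES code of this
      era:

          static void sev_deliver_sipi_sketch(struct vcpu_svm *svm)
          {
                  /* No preceding AP Reset Hold VMGEXIT mapped the GHCB, so
                   * there is nothing to report; avoid the NULL dereference. */
                  if (!svm->ghcb)
                          return;

                  /* Tell the AP Reset Hold caller that a SIPI was delivered. */
                  ghcb_set_sw_exit_info_2(svm->ghcb, 1);
          }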
    • KVM: X86: Do not yield to self · a1fa4cbd
      Authored by Wanpeng Li
      If the yield target is the current vCPU itself, there is no need to
      yield; skipping this case also keeps a malicious guest from abusing
      the directed-yield hypercall.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1617941911-5338-3-git-send-email-wanpengli@tencent.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a1fa4cbd
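      A sketch of the check, assuming a simplified function shape around
      kvm_vcpu_yield_to():

          static void kvm_sched_yield_sketch(struct kvm_vcpu *self,
                                             struct kvm_vcpu *target)
          {
                  /* Yielding to ourselves accomplishes nothing and is
                   * abusable by the guest; ignore the request. */
                  if (self == target)
                          return;

                  kvm_vcpu_yield_to(target);
          }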
    • KVM: X86: Count attempted/successful directed yield · 4a7132ef
      Authored by Wanpeng Li
      To analyze performance issues involving lock contention and scheduling,
      it is useful to know when directed yields succeed or fail.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1617941911-5338-2-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4a7132ef
    • x86/kvm: Don't bother __pv_cpu_mask when !CONFIG_SMP · 2b519b57
      Authored by Wanpeng Li
      Enabling PV TLB shootdown when !CONFIG_SMP doesn't make sense, so move
      that code inside CONFIG_SMP.  In addition, this avoids defining and
      allocating __pv_cpu_mask when !CONFIG_SMP and gets rid of the 'alloc'
      variable in kvm_alloc_cpumask.
      Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
      Message-Id: <1617941911-5338-1-git-send-email-wanpengli@tencent.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      2b519b57
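      A sketch of the intended structure, assuming an illustrative flush-hook
      name; both the per-CPU mask and the PV shootdown path now exist only in
      SMP builds:

          #ifdef CONFIG_SMP
          static DEFINE_PER_CPU(cpumask_var_t, __pv_cpu_mask);

          static void kvm_pv_flush_tlb_others_sketch(const struct cpumask *mask)
          {
                  /* PV TLB-shootdown hypercall logic lives here; it is
                   * meaningless on a uniprocessor build. */
          }
          #endif /* CONFIG_SMP */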
    • KVM: x86/mmu: Tear down roots before kvm_mmu_zap_all_fast returns · 4c6654bd
      Authored by Ben Gardon
      To avoid saddling a vCPU thread with the work of tearing down an entire
      paging structure, take a reference on each root before they become
      obsolete, so that the thread initiating the fast invalidation can tear
      down the paging structure and (most likely) release the last reference.
      As a bonus, this teardown can happen under the MMU lock in read mode so
      as not to block the progress of vCPU threads.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210401233736.638171-14-bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      4c6654bd
    • KVM: x86/mmu: Fast invalidation for TDP MMU · b7cccd39
      Authored by Ben Gardon
      Provide a real mechanism for fast invalidation by marking roots as
      invalid so that their reference count will quickly fall to zero
      and they will be torn down.
      
      One negative side effect of this approach is that a vCPU thread will
      likely drop the last reference to a root and be saddled with the work of
      tearing down an entire paging structure. This issue will be resolved in
      a later commit.
      Signed-off-by: Ben Gardon <bgardon@google.com>
      Message-Id: <20210401233736.638171-13-bgardon@google.com>
      [Move the loop to tdp_mmu.c, otherwise compilation fails on 32-bit. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b7cccd39
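      A sketch of the invalidation idea (struct and field names are
      illustrative, not the exact TDP MMU API): mark every root invalid so
      that no new references are taken, letting the reference count fall to
      zero and the root be torn down:

          #include <linux/list.h>
          #include <linux/types.h>

          struct tdp_root_sketch {
                  struct list_head link;
                  bool invalid;   /* once set, lookups refuse new references */
          };

          static void tdp_mmu_invalidate_all_roots_sketch(struct list_head *roots)
          {
                  struct tdp_root_sketch *root;

                  list_for_each_entry(root, roots, link)
                          root->invalid = true;
          }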
  2. 19 April 2021, 12 commits
  3. 17 April 2021, 11 commits