1. 02 Aug 2021: 2 commits
2. 25 Jun 2021: 10 commits
    • KVM: x86: rename apic_access_page_done to apic_access_memslot_enabled · a01b45e9
      Maxim Levitsky authored
      This better reflects the purpose of the variable on AMD, where the
      AVIC's memory slot can be enabled and disabled dynamically.
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20210623113002.111448-4-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a01b45e9
    • kvm: x86: Allow userspace to handle emulation errors · 19238e75
      Aaron Lewis authored
      Add a fallback mechanism to the in-kernel instruction emulator that
      gives userspace the opportunity to process an instruction the emulator
      was unable to handle.  When the in-kernel instruction emulator fails to
      process an instruction, it does not know how to proceed in an
      appropriate manner, so it either injects a #UD into the guest or exits
      to userspace with exit reason KVM_EXIT_INTERNAL_ERROR.  This feature
      lets userspace get involved to see if it can figure out a better path
      forward (a userspace-side sketch follows this entry).
      Signed-off-by: Aaron Lewis <aaronlewis@google.com>
      Reviewed-by: David Edmondson <david.edmondson@oracle.com>
      Message-Id: <20210510144834.658457-2-aaronlewis@google.com>
      Reviewed-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      19238e75
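      A minimal userspace sketch of consuming this fallback, assuming the
      KVM_CAP_EXIT_ON_EMULATION_FAILURE capability and the kvm_run
      emulation_failure fields this series introduces; emulate_in_userspace()
      is a hypothetical VMM helper, not a KVM API:

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          /* Hypothetical VMM helper: try to emulate from raw insn bytes. */
          extern int emulate_in_userspace(const __u8 *insn, __u8 len);

          /* Opt in to emulation-failure exits instead of #UD injection. */
          static int enable_emulation_exits(int vm_fd)
          {
                  struct kvm_enable_cap cap = {
                          .cap = KVM_CAP_EXIT_ON_EMULATION_FAILURE,
                          .args = { 1 },
                  };
                  return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
          }

          /* Inspect a KVM_RUN exit; return 0 if userspace handled it. */
          static int handle_emulation_failure(struct kvm_run *run)
          {
                  if (run->exit_reason != KVM_EXIT_INTERNAL_ERROR ||
                      run->internal.suberror != KVM_INTERNAL_ERROR_EMULATION)
                          return -1;

                  /* Emulate in userspace if the insn bytes were provided. */
                  if (run->emulation_failure.flags &
                      KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES)
                          return emulate_in_userspace(
                                          run->emulation_failure.insn_bytes,
                                          run->emulation_failure.insn_size);
                  return -1;
          }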
    • KVM: x86/mmu: Optimize and clean up so called "last nonleaf level" logic · 7cd138db
      Sean Christopherson authored
      Drop the pre-computed last_nonleaf_level, which is arguably wrong and at
      best confusing.  Per the comment:
      
        Can have large pages at levels 2..last_nonleaf_level-1.
      
      the intent of the variable would appear to be to track what levels can
      _legally_ have large pages, but that intent doesn't align with reality.
      The computed value will be wrong for 5-level paging, or if 1gb pages are
      not supported.
      
      The flawed code is not a problem in practice, because except for 32-bit
      PSE paging, bit 7 is reserved if large pages aren't supported at the
      level.  Take advantage of this invariant and simply omit the level magic
      math for 64-bit page tables (including PAE).
      
      For 32-bit paging (non-PAE), the adjustments are needed purely because
      bit 7 is ignored if PSE=0.  Retain that logic as is, but make
      is_last_gpte() unique per PTTYPE so that the PSE check is avoided for
      PAE and EPT paging.  In the spirit of avoiding branches, bump the "last
      nonleaf level" for 32-bit PSE paging by adding the PSE bit itself.
      
      Note, bit 7 is ignored or has other meaning in CR3/EPTP, but despite
      FNAME(walk_addr_generic) briefly grabbing CR3/EPTP in "pte", they are
      not PTEs and will blow up all the other gpte helpers.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-51-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7cd138db
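      A sketch of the reworked helper described above, paraphrased from the
      commit's reasoning rather than copied from the diff (field and macro
      names follow KVM's conventions but should be treated as illustrative):

          /*
           * For 64-bit and PAE gptes, bit 7 (PT_PAGE_SIZE_MASK) is reserved
           * whenever large pages are unsupported at a level, so checking
           * bit 7 alone suffices.  Only 32-bit non-PAE paging needs fixups.
           */
          static bool is_last_gpte(struct kvm_mmu *mmu,
                                   unsigned int level, unsigned int gpte)
          {
          #if PTTYPE == 32
                  /*
                   * Bit 7 is ignored (not reserved) if CR4.PSE=0.  The RHS
                   * has bit 7 set iff level < PT32_ROOT_LEVEL + PSE, i.e.
                   * bit 7 is cleared from the gpte at any level where it is
                   * not the PAGE_SIZE bit; the "last nonleaf level" is
                   * bumped by the PSE bit itself, avoiding a branch.
                   */
                  gpte &= level - (PT32_ROOT_LEVEL + mmu->mmu_role.ext.cr4_pse);
          #endif
                  /* A 4K-level gpte always terminates the walk. */
                  gpte |= level - PG_LEVEL_4K - 1;

                  return gpte & PT_PAGE_SIZE_MASK;
          }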
    • KVM: x86: Enhance comments for MMU roles and nested transition trickiness · 616007c8
      Sean Christopherson authored
      Expand the comments for the MMU roles.  The interactions with gfn_track
      and PGD reuse in particular are hairy.
      
      Regarding PGD reuse, add comments in the nested virtualization flows to
      call out why kvm_init_mmu() is unconditionally called even when nested
      TDP is used.
      
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-50-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      616007c8
    • KVM: x86/mmu: Drop "nx" from MMU context now that there are no readers · a4c93252
      Sean Christopherson authored
      Drop kvm_mmu.nx as there are no consumers left.
      
      No functional change intended.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-39-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a4c93252
    • KVM: x86/mmu: Rename "nxe" role bit to "efer_nx" for macro shenanigans · 167f8a5c
      Sean Christopherson authored
      Rename "nxe" to "efer_nx" so that future macro magic can use the pattern
      <reg>_<bit> for all CR0, CR4, and EFER bits that included in the role.
      Using "efer_nx" also makes it clear that the role bit reflects EFER.NX,
      not the NX bit in the corresponding PTE.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-25-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      167f8a5c
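      The <reg>_<bit> naming enables accessor generation along these lines
      (an illustrative sketch of the anticipated macro magic, not the
      verbatim upstream code):

          /*
           * With every role bit named <reg>_<bit>, one macro can stamp out
           * accessors for CR0, CR4, and EFER bits alike.
           */
          #define BUILD_MMU_ROLE_ACCESSOR(base_or_ext, reg, name)         \
          static inline bool is_##reg##_##name(struct kvm_mmu *mmu)       \
          {                                                               \
                  return !!(mmu->mmu_role.base_or_ext.reg##_##name);      \
          }

          BUILD_MMU_ROLE_ACCESSOR(base, efer, nx);  /* is_efer_nx(mmu)  */
          BUILD_MMU_ROLE_ACCESSOR(ext, cr4, smep);  /* is_cr4_smep(mmu) */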
    • Revert "KVM: MMU: record maximum physical address width in kvm_mmu_extended_role" · 6c032f12
      Sean Christopherson authored
      Drop MAXPHYADDR from mmu_role now that all MMUs have their role
      invalidated after a CPUID update.  Invalidating the role forces all MMUs
      to re-evaluate the guest's MAXPHYADDR, which can be changed only through
      a CPUID update.
      
      This reverts commit de3ccd26.
      
      Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-9-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      6c032f12
    • KVM: x86: Alert userspace that KVM_SET_CPUID{,2} after KVM_RUN is broken · 63f5a190
      Sean Christopherson authored
      Warn userspace that KVM_SET_CPUID{,2} after KVM_RUN "may" cause guest
      instability.  Initialize last_vmentry_cpu to -1 and use it to detect if
      the vCPU has been run at least once when its CPUID model is changed.
      
      KVM does not correctly handle changes to paging related settings in the
      guest's vCPU model after KVM_RUN, e.g. MAXPHYADDR, GBPAGES, etc...  KVM
      could theoretically zap all shadow pages, but actually making that happen
      is a mess due to lock inversion (vcpu->mutex is held).  And even then,
      updating paging settings on the fly would only work if all vCPUs are
      stopped, updated in concert with identical settings, then restarted.
      
      To support running vCPUs with different vCPU models (that affect paging),
      KVM would need to track all relevant information in kvm_mmu_page_role.
      Note, that's the _page_ role, not the full mmu_role.  Updating mmu_role
      isn't sufficient as a vCPU can reuse a shadow page translation that was
      created by a vCPU with different settings and thus completely skip the
      reserved bit checks (that are tied to CPUID).
      
      Tracking CPUID state in kvm_mmu_page_role is _extremely_ undesirable as
      it would require doubling gfn_track from a u16 to a u32, i.e. would
      increase KVM's memory footprint by 2 bytes for every 4kb of guest memory.
      E.g. MAXPHYADDR (6 bits), GBPAGES, AMD vs. INTEL = 1 bit, and SEV C-BIT
      would all need to be tracked.
      
      In practice, there is no remotely sane use case for changing any paging
      related CPUID entries on the fly, so just sweep it under the rug (after
      yelling at userspace).
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-8-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      63f5a190
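      A sketch of the detection, assuming last_vmentry_cpu is initialized
      to -1 at vCPU creation and set on first VM-Entry (the helper name is
      illustrative):

          /* Called from kvm_vcpu_ioctl_set_cpuid{,2}(). */
          static void kvm_warn_cpuid_after_run(struct kvm_vcpu *vcpu)
          {
                  /*
                   * KVM mishandles CPUID changes that affect paging
                   * (MAXPHYADDR, GBPAGES, ...) once the vCPU has run;
                   * yell so userspace knows the guest is on its own.
                   */
                  if (vcpu->arch.last_vmentry_cpu != -1)
                          pr_warn_ratelimited("KVM: KVM_SET_CPUID{,2} after KVM_RUN may cause guest instability\n");
          }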
    • KVM: x86: Force all MMUs to reinitialize if guest CPUID is modified · 49c6f875
      Sean Christopherson authored
      Invalidate all MMUs' roles after a CPUID update to force reinitialization
      of the MMU context/helpers.  Despite the efforts of commit de3ccd26
      ("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
      there are still a handful of CPUID-based properties that affect MMU
      behavior but are not incorporated into mmu_role.  E.g. 1gb hugepage
      support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
      factor into the guest's reserved PTE bits.
      
      The obvious alternative would be to add all such properties to mmu_role,
      but doing so provides no benefit over simply forcing a reinitialization
      on every CPUID update, as setting guest CPUID is a rare operation.
      
      Note, reinitializing all MMUs after a CPUID update does not fix all of
      KVM's woes.  Specifically, kvm_mmu_page_role doesn't track the CPUID
      properties, which means that a vCPU can reuse shadow pages that should
      not exist for the new vCPU model, e.g. that map GPAs that are now illegal
      (due to MAXPHYADDR changes) or that set bits that are now reserved
      (PAGE_SIZE for 1gb pages), etc...
      
      Tracking the relevant CPUID properties in kvm_mmu_page_role would address
      the majority of problems, but fully tracking that much state in the
      shadow page role comes with an unpalatable cost as it would require a
      non-trivial increase in KVM's memory footprint.  The GBPAGES case is even
      worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
      support in the hardware page walker, i.e. it's a virtualization hole that
      can't be closed when using TDP.
      
      In other words, resetting the MMU after a CPUID update is largely a
      superficial fix.  But, it will allow reverting the tracking of MAXPHYADDR
      in the mmu_role, and that case in particular needs to mostly work because
      KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
      is supported.  For cases where KVM botches guest behavior, the damage is
      limited to that guest.  But for the shadow_root_level, a misconfigured
      MMU can cause KVM to incorrectly access memory, e.g. due to walking off
      the end of its shadow page tables.
      
      Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
      Cc: Yu Zhang <yu.c.zhang@linux.intel.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-7-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      49c6f875
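      A sketch of the reset itself, assuming each of the three per-vCPU MMUs
      carries a "valid" bit in its extended role (close to the upstream
      change, but paraphrased):

          void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
          {
                  /*
                   * Invalidate all roles; the next kvm_init_mmu() then
                   * recomputes them with the new CPUID-derived properties
                   * (reserved bits, GBPAGES, C-Bit, ...) instead of
                   * reusing a stale context.
                   */
                  vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
                  vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
                  vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
                  kvm_mmu_reset_context(vcpu);
          }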
    • Revert "KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack" · f71a53d1
      Sean Christopherson authored
      Restore CR4.LA57 to the mmu_role to fix an amusing edge case with nested
      virtualization.  When KVM (L0) is using TDP, CR4.LA57 is not reflected in
      mmu_role.base.level because that tracks the shadow root level, i.e. TDP
      level.  Normally, this is not an issue because LA57 can't be toggled
      while long mode is active, i.e. the guest has to first disable paging,
      then toggle LA57, then re-enable paging, thus ensuring an MMU
      reinitialization.
      
      But if L1 is crafty, it can load a new CR4 on VM-Exit and toggle LA57
      without having to bounce through an unpaged section.  L1 can also load a
      new CR3 on exit, i.e. it doesn't even need to play crazy paging games, a
      single entry PML5 is sufficient.  Such shenanigans are only problematic
      if L0 and L1 use TDP, otherwise L1 and L2 share an MMU that gets
      reinitialized on nested VM-Enter/VM-Exit due to mmu_role.base.guest_mode.
      
      Note, in the L2 case with nested TDP, L1 can switch between L2s with
      different LA57 settings, bypassing the paging requirement; in that case,
      KVM's nested_mmu will track LA57 in base.level.
      
      This reverts commit 8053f924.
      
      Fixes: 8053f924 ("KVM: x86/mmu: Drop kvm_mmu_extended_role.cr4_la57 hack")
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210622175739.3610207-6-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f71a53d1
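      The restored bit lives in the extended role, roughly as below (an
      abridged sketch; the surrounding fields are elided):

          union kvm_mmu_extended_role {
                  u32 word;
                  struct {
                          unsigned int valid:1;
                          /* ... other cr0/cr4 bits ... */
                          unsigned int cr4_la57:1; /* restored by this revert */
                  };
          };

          /* During role computation, so that an L1 toggling LA57 via a
           * VM-Exit CR4 load still forces a fresh mmu_role even though
           * base.level tracks the TDP level: */
          ext.cr4_la57 = !!kvm_read_cr4_bits(vcpu, X86_CR4_LA57);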
3. 24 Jun 2021: 1 commit
4. 18 Jun 2021: 22 commits
5. 27 May 2021: 1 commit
6. 07 May 2021: 4 commits
    • KVM: x86: Prevent KVM SVM from loading on kernels with 5-level paging · 03ca4589
      Sean Christopherson authored
      Disallow loading KVM SVM if 5-level paging is supported.  In theory, NPT
      for L1 should simply work, but there are unknowns with respect to how the
      guest's MAXPHYADDR will be handled by hardware.
      
      Nested NPT is more problematic, as running an L1 VMM that is using
      2-level page tables requires stacking single-entry PDP and PML4 tables in
      KVM's NPT for L2, as there are no equivalent entries in L1's NPT to
      shadow.  Barring hardware magic, for 5-level paging, KVM would need to
      stack another layer to handle PML5.
      
      Opportunistically rename the lm_root pointer, which is used for the
      aforementioned stacking when shadowing 2-level L1 NPT, to pml4_root to
      call out that it's specifically for PML4.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210505204221.1934471-1-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      03ca4589
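      The load-time check is small; a sketch assuming the kernel's
      pgtable_l5_enabled() helper (paraphrased from the commit's intent):

          /* In svm_hardware_setup(): refuse to load until KVM can stack
           * a PML5 layer for shadowed 2-level L1 NPT. */
          if (IS_ENABLED(CONFIG_X86_64) && pgtable_l5_enabled()) {
                  pr_info("KVM doesn't yet support 5-level paging on AMD SVM\n");
                  return -EOPNOTSUPP;
          }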
    • KVM: X86: Add support for the emulation of DR6_BUS_LOCK bit · e8ea85fb
      Chenyi Qiang authored
      Bus lock debug exception introduces a new bit, DR6_BUS_LOCK (bit 11 of
      DR6), to indicate that a bus lock #DB exception was generated.  The
      set/clear behavior of DR6_BUS_LOCK mirrors that of DR6_RTM: the
      processor clears DR6_BUS_LOCK when the exception is generated and sets
      the bit to 1 for all other #DB exceptions.  The software #DB handler
      should set this bit before returning to the interrupted task.
      
      In the VMM, to avoid breaking CPUs without bus lock #DB exception
      support, activate DR6_BUS_LOCK conditionally in the DR6_FIXED_1 bits
      (see the sketch after this entry).  When intercepting a #DB exception
      caused by a bus lock, bit 11 of the exit qualification is set to
      identify it.  The VMM should emulate the exception by clearing bit 11
      of the guest's DR6.
      Co-developed-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
      Signed-off-by: Chenyi Qiang <chenyi.qiang@intel.com>
      Message-Id: <20210202090433.13441-3-chenyi.qiang@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e8ea85fb
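      The conditional activation can be expressed as a per-vCPU fixed-bits
      helper, as in this sketch (guest_cpuid_has() and the DR6 defines are
      existing kernel symbols; the helper name is illustrative):

          /* DR6 bits that read as 1 unless the matching feature exists. */
          static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
          {
                  u64 fixed = DR6_FIXED_1;

                  /* RTM: bit 16 reads as 1 on parts without TSX. */
                  if (!guest_cpuid_has(vcpu, X86_FEATURE_RTM))
                          fixed |= DR6_RTM;
                  /* Bus lock detect: bit 11 reads as 1 without the feature. */
                  if (!guest_cpuid_has(vcpu, X86_FEATURE_BUS_LOCK_DETECT))
                          fixed |= DR6_BUS_LOCK;
                  return fixed;
          }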
    • KVM: x86: Tie Intel and AMD behavior for MSR_TSC_AUX to guest CPU model · 61a05d44
      Sean Christopherson authored
      Squish the Intel and AMD emulation of MSR_TSC_AUX together and tie it to
      the guest CPU model instead of the host CPU behavior.  While not strictly
      necessary to avoid guest breakage, emulating cross-vendor "architecture"
      will provide consistent behavior for the guest, e.g. WRMSR fault behavior
      won't change if the vCPU is migrated to a host with divergent behavior.
      
      Note, the "new" kvm_is_supported_user_return_msr() checks do not add new
      functionality on either SVM or VMX.  On SVM, the equivalent was
      "tsc_aux_uret_slot < 0", and on VMX the check was buried in the
      vmx_find_uret_msr() call at the find_uret_msr label.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-15-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      61a05d44
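      After the consolidation, a single guest-model check gates MSR_TSC_AUX
      accesses for both vendors; a sketch (the helper name is illustrative):

          /* Shared check for RDMSR/WRMSR of MSR_TSC_AUX. */
          static int kvm_check_tsc_aux_access(struct kvm_vcpu *vcpu,
                                              bool host_initiated)
          {
                  if (!kvm_is_supported_user_return_msr(MSR_TSC_AUX))
                          return 1;       /* MSR unusable on this host. */

                  /* Fault unless the *guest* model has RDTSCP or RDPID. */
                  if (!host_initiated &&
                      !guest_cpuid_has(vcpu, X86_FEATURE_RDTSCP) &&
                      !guest_cpuid_has(vcpu, X86_FEATURE_RDPID))
                          return 1;

                  return 0;
          }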
    • KVM: x86: Move uret MSR slot management to common x86 · e5fda4bb
      Sean Christopherson authored
      Now that SVM and VMX both probe MSRs before "defining" user return slots
      for them, consolidate the code for probe+define into common x86 and
      eliminate the odd behavior of having the vendor code define the slot for
      a given MSR.
      Signed-off-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210504171734.1434054-14-seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e5fda4bb
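      The consolidated probe-and-define looks roughly like this (close to
      the upstream helper; the probe internals are elided):

          /* Probe the MSR on the host, then claim a user-return slot for
           * it.  Returns the slot index, or -1 if the MSR is unusable. */
          int kvm_add_user_return_msr(u32 msr)
          {
                  BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);

                  if (kvm_probe_user_return_msr(msr))
                          return -1;      /* RDMSR/WRMSR faulted. */

                  kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
                  return kvm_nr_uret_msrs++;
          }

          /* Vendor usage, e.g.:
           *   tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX);
           */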