提交 · c646236344e9054cc84cd5a9f763163b9654cf7e · openeuler / Kernel

04 2月, 2021 18 次提交

KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled · c6462363

由 Like Xu 提交于 2月 01, 2021

Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES
MSR which tells about the record format stored in the LBR records.

The LBR will be enabled on the guest if host perf supports LBR
(checked via x86_perf_get_lbr()) and the vcpu model is compatible
with the host one.
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c6462363

KVM: vmx/pmu: Add PMU_CAP_LBR_FMT check when guest LBR is enabled · 9c9520ce

由 Paolo Bonzini 提交于 2月 02, 2021

Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES
MSR which tells about the record format stored in the LBR records.

The LBR will be enabled on the guest if host perf supports LBR
(checked via x86_perf_get_lbr()) and the vcpu model is compatible
with the host one.
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9c9520ce

KVM: x86/pmu: preserve IA32_PERF_CAPABILITIES across CPUID refresh · a7557539

由 Paolo Bonzini 提交于 2月 02, 2021

Once MSR_IA32_PERF_CAPABILITIES is changed via vmx_set_msr(), the
value should not be changed by cpuid(). To ensure that the new value
is kept, the default initialization path is moved to intel_pmu_init().
The effective value of the MSR will be 0 if PDCM is clear, however.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a7557539

KVM: x86/vmx: Make vmx_set_intercept_for_msr() non-static · 252e365e

由 Like Xu 提交于 2月 01, 2021

To make code responsibilities clear, we may resue and invoke the
vmx_set_intercept_for_msr() in other vmx-specific files (e.g. pmu_intel.c),
so expose it to passthrough LBR msrs later.
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Message-Id: <20210201051039.255478-2-like.xu@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

252e365e

KVM: VMX: read/write MSR_IA32_DEBUGCTLMSR from GUEST_IA32_DEBUGCTL · d855066f

由 Like Xu 提交于 1月 08, 2021

SVM already has specific handlers of MSR_IA32_DEBUGCTLMSR in the
svm_get/set_msr, so the x86 common part can be safely moved to VMX.
This allows KVM to store the bits it supports in GUEST_IA32_DEBUGCTL.

Add vmx_supported_debugctl() to refactor the throwing logic of #GP.
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Message-Id: <20210108013704.134985-2-like.xu@linux.intel.com>
[Merge parts of Chenyi Qiang's "KVM: X86: Expose bus lock debug exception
 to guest". - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d855066f

KVM: VMX: Use x2apic_mode to avoid RDMSR when querying PI state · 563c54c4

由 Sean Christopherson 提交于 1月 15, 2021

Use x2apic_mode instead of x2apic_enabled() when adjusting the
destination ID during Posted Interrupt updates.  This avoids the costly
RDMSR that is hidden behind x2apic_enabled().
Reported-by: Nluferry <luferry@163.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210115220354.434807-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

563c54c4

KVM: VMX: Enable bus lock VM exit · fe6b6bc8

由 Chenyi Qiang 提交于 11月 06, 2020

Virtual Machine can exploit bus locks to degrade the performance of
system. Bus lock can be caused by split locked access to writeback(WB)
memory or by using locks on uncacheable(UC) memory. The bus lock is
typically >1000 cycles slower than an atomic operation within a cache
line. It also disrupts performance on other cores (which must wait for
the bus lock to be released before their memory operations can
complete).

To address the threat, bus lock VM exit is introduced to notify the VMM
when a bus lock was acquired, allowing it to enforce throttling or other
policy based mitigations.

A VMM can enable VM exit due to bus locks by setting a new "Bus Lock
Detection" VM-execution control(bit 30 of Secondary Processor-based VM
execution controls). If delivery of this VM exit was preempted by a
higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC
access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of
exit reason in vmcs field is set to 1.

In current implementation, the KVM exposes this capability through
KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap
(i.e. off and exit) and enable it explicitly (disabled by default). If
bus locks in guest are detected by KVM, exit to user space even when
current exit reason is handled by KVM internally. Set a new field
KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there
is a bus lock detected in guest.

Document for Bus Lock VM exit is now available at the latest "Intel
Architecture Instruction Set Extensions Programming Reference".

Document Link:
https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe6b6bc8

KVM: X86: Reset the vcpu->run->flags at the beginning of vcpu_run · 15aad3be

由 Chenyi Qiang 提交于 11月 06, 2020

Reset the vcpu->run->flags at the beginning of kvm_arch_vcpu_ioctl_run.
It can avoid every thunk of code that needs to set the flag clear it,
which increases the odds of missing a case and ending up with a flag in
an undefined state.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-3-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

15aad3be

KVM: VMX: Convert vcpu_vmx.exit_reason to a union · 8e533240

由 Sean Christopherson 提交于 11月 06, 2020

Convert vcpu_vmx.exit_reason from a u32 to a union (of size u32). The
full VM_EXIT_REASON field is comprised of a 16-bit basic exit reason in
bits 15:0, and single-bit modifiers in bits 31:16.

Historically, KVM has only had to worry about handling the "failed
VM-Entry" modifier, which could only be set in very specific flows and
required dedicated handling. I.e. manually stripping the FAILED_VMENTRY
bit was a somewhat viable approach. But even with only a single bit to
worry about, KVM has had several bugs related to comparing a basic exit
reason against the full exit reason store in vcpu_vmx.

Upcoming Intel features, e.g. SGX, will add new modifier bits that can
be set on more or less any VM-Exit, as opposed to the significantly more
restricted FAILED_VMENTRY, i.e. correctly handling everything in one-off
flows isn't scalable. Tracking exit reason in a union forces code to
explicitly choose between consuming the full exit reason and the basic
exit, and is a convenient way to document and access the modifiers.

No functional change intended.

Cc: Xiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20201106090315.18606-2-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8e533240

KVM/SVM: add support for SEV attestation command · 2c07ded0

由 Brijesh Singh 提交于 1月 04, 2021

The SEV FW version >= 0.23 added a new command that can be used to query
the attestation report containing the SHA-256 digest of the guest memory
encrypted through the KVM_SEV_LAUNCH_UPDATE_{DATA, VMSA} commands and
sign the report with the Platform Endorsement Key (PEK).

See the SEV FW API spec section 6.8 for more details.

Note there already exist a command (KVM_SEV_LAUNCH_MEASURE) that can be
used to get the SHA-256 digest. The main difference between the
KVM_SEV_LAUNCH_MEASURE and KVM_SEV_ATTESTATION_REPORT is that the latter
can be called while the guest is running and the measurement value is
signed with PEK.

Cc: James Bottomley <jejb@linux.ibm.com>
Cc: Tom Lendacky <Thomas.Lendacky@amd.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: John Allen <john.allen@amd.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: linux-crypto@vger.kernel.org
Reviewed-by: NTom Lendacky <thomas.lendacky@amd.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Tested-by: NJames Bottomley <jejb@linux.ibm.com>
Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com>
Message-Id: <20210104151749.30248-1-brijesh.singh@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2c07ded0

KVM: x86/mmu: Remove the defunct update_pte() paging hook · c5e2184d

由 Sean Christopherson 提交于 1月 14, 2021

Remove the update_pte() shadow paging logic, which was obsoleted by
commit 4731d4c7 ("KVM: MMU: out of sync shadow core"), but never
removed.  As pointed out by Yu, KVM never write protects leaf page
tables for the purposes of shadow paging, and instead marks their
associated shadow page as unsync so that the guest can write PTEs at
will.

The update_pte() path, which predates the unsync logic, optimizes COW
scenarios by refreshing leaf SPTEs when they are written, as opposed to
zapping the SPTE, restarting the guest, and installing the new SPTE on
the subsequent fault.  Since KVM no longer write-protects leaf page
tables, update_pte() is unreachable and can be dropped.
Reported-by: NYu Zhang <yu.c.zhang@intel.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210115004051.4099250-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c5e2184d

KVM: Expose AVX_VNNI instruction to guset · 1085a6b5

由 Yang Zhong 提交于 1月 05, 2021

Expose AVX (VEX-encoded) versions of the Vector Neural Network
Instructions to guest.

The bit definition:
CPUID.(EAX=7,ECX=1):EAX[bit 4] AVX_VNNI

The following instructions are available when this feature is
present in the guest.
  1. VPDPBUS: Multiply and Add Unsigned and Signed Bytes
  2. VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation
  3. VPDPWSSD: Multiply and Add Signed Word Integers
  4. VPDPWSSDS: Multiply and Add Signed Integers with Saturation

This instruction is currently documented in the latest "extensions"
manual (ISE). It will appear in the "main" manual (SDM) in the future.
Signed-off-by: NYang Zhong <yang.zhong@intel.com>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Message-Id: <20210105004909.42000-3-yang.zhong@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1085a6b5

x86: kvm: style: Simplify bool comparison · 7ca7f3b9

由 YANG LI 提交于 1月 11, 2021

Fix the following coccicheck warning:
./arch/x86/kvm/x86.c:8012:5-48: WARNING: Comparison to bool
Signed-off-by: NYANG LI <abaci-bugfix@linux.alibaba.com>
Reported-by: NAbaci Robot <abaci@linux.alibaba.com>
Message-Id: <1610357578-66081-1-git-send-email-abaci-bugfix@linux.alibaba.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7ca7f3b9

KVM: x86: Zap the oldest MMU pages, not the newest · 8fc51726

由 Sean Christopherson 提交于 1月 13, 2021

Walk the list of MMU pages in reverse in kvm_mmu_zap_oldest_mmu_pages().
The list is FIFO, meaning new pages are inserted at the head and thus
the oldest pages are at the tail.  Using a "forward" iterator causes KVM
to zap MMU pages that were just added, which obliterates guest
performance once the max number of shadow MMU pages is reached.

Fixes: 6b82ef2c ("KVM: x86/mmu: Batch zap MMU pages when recycling oldest pages")
Reported-by: NZdenek Kaspar <zkaspar82@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210113205030.3481307-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8fc51726

KVM: x86/mmu: Use boolean returns for (S)PTE accessors · 15e6a7e5

由 Sean Christopherson 提交于 1月 22, 2021

Return a 'bool' instead of an 'int' for various PTE accessors that are
boolean in nature, e.g. is_shadow_present_pte().  Returning an int is
goofy and potentially dangerous, e.g. if a flag being checked is moved
into the upper 32 bits of a SPTE, then the compiler may silently squash
the entire check since casting to an int is guaranteed to yield a
return value of '0'.

Opportunistically refactor is_last_spte() so that it naturally returns
a bool value instead of letting it implicitly cast 0/1 to false/true.

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210123003003.3137525-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

15e6a7e5

KVM: x86: Take KVM's SRCU lock only if steal time update is needed · 15b51dc0

由 Sean Christopherson 提交于 1月 22, 2021

Enter a SRCU critical section for a memslots lookup during steal time
update if and only if a steal time update is actually needed.  Taking
the lock can be avoided if steal time is disabled by the guest, or if
KVM knows it has already flagged the vCPU as being preempted.

Reword the comment to be more precise as to exactly why memslots will
be queried.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210123000334.3123628-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

15b51dc0

KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put() · 19979fba

由 Sean Christopherson 提交于 1月 22, 2021

Remove the disabling of page faults across kvm_steal_time_set_preempted()
as KVM now accesses the steal time struct (shared with the guest) via a
cached mapping (see commit b0431382, "x86/KVM: Make sure
KVM_VCPU_FLUSH_TLB flag is not missed".) The cache lookup is flagged as
atomic, thus it would be a bug if KVM tried to resolve a new pfn, i.e.
we want the splat that would be reached via might_fault().
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210123000334.3123628-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

19979fba

KVM: x86/mmu: Fix TDP MMU zap collapsible SPTEs · 87aa9ec9

由 Ben Gardon 提交于 2月 02, 2021

There is a bug in the TDP MMU function to zap SPTEs which could be
replaced with a larger mapping which prevents the function from doing
anything. Fix this by correctly zapping the last level SPTEs.

Cc: stable@vger.kernel.org
Fixes: 14881998 ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20210202185734.1680553-11-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

87aa9ec9

03 2月, 2021 3 次提交

KVM: x86: cleanup CR3 reserved bits checks · c1c35cf7

由 Paolo Bonzini 提交于 11月 13, 2020

If not in long mode, the low bits of CR3 are reserved but not enforced to
be zero, so remove those checks.  If in long mode, however, the MBZ bits
extend down to the highest physical address bit of the guest, excluding
the encryption bit.

Make the checks consistent with the above, and match them between
nested_vmcb_checks and KVM_SET_SREGS.

Cc: stable@vger.kernel.org
Fixes: 761e4169 ("KVM: nSVM: Check that MBZ bits in CR3 and CR4 are not set on vmrun of nested guests")
Fixes: a780a3ea ("KVM: X86: Fix reserved bits check for MOV to CR3")
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c1c35cf7

KVM: SVM: Treat SVM as unsupported when running as an SEV guest · ccd85d90

由 Sean Christopherson 提交于 2月 02, 2021

Don't let KVM load when running as an SEV guest, regardless of what
CPUID says.  Memory is encrypted with a key that is not accessible to
the host (L0), thus it's impossible for L0 to emulate SVM, e.g. it'll
see garbage when reading the VMCB.

Technically, KVM could decrypt all memory that needs to be accessible to
the L0 and use shadow paging so that L0 does not need to shadow NPT, but
exposing such information to L0 largely defeats the purpose of running as
an SEV guest.  This can always be revisited if someone comes up with a
use case for running VMs inside SEV guests.

Note, VMLOAD, VMRUN, etc... will also #GP on GPAs with C-bit set, i.e. KVM
is doomed even if the SEV guest is debuggable and the hypervisor is willing
to decrypt the VMCB.  This may or may not be fixed on CPUs that have the
SVME_ADDR_CHK fix.

Cc: stable@vger.kernel.org
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210202212017.2486595-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ccd85d90

KVM: x86: Update emulator context mode if SYSENTER xfers to 64-bit mode · 943dea8a

由 Sean Christopherson 提交于 2月 02, 2021

Set the emulator context to PROT64 if SYSENTER transitions from 32-bit
userspace (compat mode) to a 64-bit kernel, otherwise the RIP update at
the end of x86_emulate_insn() will incorrectly truncate the new RIP.

Note, this bug is mostly limited to running an Intel virtual CPU model on
an AMD physical CPU, as other combinations of virtual and physical CPUs
do not trigger full emulation.  On Intel CPUs, SYSENTER in compatibility
mode is legal, and unconditionally transitions to 64-bit mode.  On AMD
CPUs, SYSENTER is illegal in compatibility mode and #UDs.  If the vCPU is
AMD, KVM injects a #UD on SYSENTER in compat mode.  If the pCPU is Intel,
SYSENTER will execute natively and not trigger #UD->VM-Exit (ignoring
guest TLB shenanigans).

Fixes: fede8076 ("KVM: x86: handle wrap around 32-bit address space")
Cc: stable@vger.kernel.org
Signed-off-by: NJonny Barker <jonny@jonnybarker.com>
[sean: wrote changelog]
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210202165546.2390296-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

943dea8a

02 2月, 2021 3 次提交

KVM: x86: Supplement __cr4_reserved_bits() with X86_FEATURE_PCID check · 4683d758

由 Vitaly Kuznetsov 提交于 2月 01, 2021

Commit 7a873e45 ("KVM: selftests: Verify supported CR4 bits can be set
before KVM_SET_CPUID2") reveals that KVM allows to set X86_CR4_PCIDE even
when PCID support is missing:

==== Test Assertion Failure ====
  x86_64/set_sregs_test.c:41: rc
  pid=6956 tid=6956 - Invalid argument
     1	0x000000000040177d: test_cr4_feature_bit at set_sregs_test.c:41
     2	0x00000000004014fc: main at set_sregs_test.c:119
     3	0x00007f2d9346d041: ?? ??:0
     4	0x000000000040164d: _start at ??:?
  KVM allowed unsupported CR4 bit (0x20000)

Add X86_FEATURE_PCID feature check to __cr4_reserved_bits() to make
kvm_is_valid_cr4() fail.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20210201142843.108190-1-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4683d758

KVM/x86: assign hva with the right value to vm_munmap the pages · b66f9bab

由 Zheng Zhan Liang 提交于 2月 01, 2021

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Wanpeng Li <wanpengli@tencent.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NZheng Zhan Liang <zhengzhanliang@huorong.cn>
Message-Id: <20210201055310.267029-1-zhengzhanliang@huorong.cn>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b66f9bab

KVM: x86: Allow guests to see MSR_IA32_TSX_CTRL even if tsx=off · 7131636e

由 Paolo Bonzini 提交于 1月 28, 2021

Userspace that does not know about KVM_GET_MSR_FEATURE_INDEX_LIST
will generally use the default value for MSR_IA32_ARCH_CAPABILITIES.
When this happens and the host has tsx=on, it is possible to end up with
virtual machines that have HLE and RTM disabled, but TSX_CTRL available.

If the fleet is then switched to tsx=off, kvm_get_arch_capabilities()
will clear the ARCH_CAP_TSX_CTRL_MSR bit and it will not be possible to
use the tsx=off hosts as migration destinations, even though the guests
do not have TSX enabled.

To allow this migration, allow guests to write to their TSX_CTRL MSR,
while keeping the host MSR unchanged for the entire life of the guests.
This ensures that TSX remains disabled and also saves MSR reads and
writes, and it's okay to do because with tsx=off we know that guests will
not have the HLE and RTM features in their CPUID. (If userspace sets
bogus CPUID data, we do not expect HLE and RTM to work in guests anyway).

Cc: stable@vger.kernel.org
Fixes: cbbaa272 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES")
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7131636e

29 1月, 2021 1 次提交

Fix unsynchronized access to sev members through svm_register_enc_region · 19a23da5

由 Peter Gonda 提交于 1月 27, 2021

Grab kvm->lock before pinning memory when registering an encrypted
region; sev_pin_memory() relies on kvm->lock being held to ensure
correctness when checking and updating the number of pinned pages.

Add a lockdep assertion to help prevent future regressions.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: x86@kernel.org
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Fixes: 1e80fdc0 ("KVM: SVM: Pin guest memory when SEV is active")
Signed-off-by: NPeter Gonda <pgonda@google.com>

V2
 - Fix up patch description
 - Correct file paths svm.c -> sev.c
 - Add unlock of kvm->lock on sev_pin_memory error

V1
 - https://lore.kernel.org/kvm/20210126185431.1824530-1-pgonda@google.com/

Message-Id: <20210127161524.2832400-1-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

19a23da5

28 1月, 2021 1 次提交

KVM: x86: fix CPUID entries returned by KVM_GET_CPUID2 ioctl · 181f4948

由 Michael Roth 提交于 1月 27, 2021

Recent commit 255cbecf modified struct kvm_vcpu_arch to make
'cpuid_entries' a pointer to an array of kvm_cpuid_entry2 entries
rather than embedding the array in the struct. KVM_SET_CPUID and
KVM_SET_CPUID2 were updated accordingly, but KVM_GET_CPUID2 was missed.

As a result, KVM_GET_CPUID2 currently returns random fields from struct
kvm_vcpu_arch to userspace rather than the expected CPUID values. Fix
this by treating 'cpuid_entries' as a pointer when copying its
contents to userspace buffer.

Fixes: 255cbecf ("KVM: x86: allocate vcpu->arch.cpuid_entries dynamically")
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NMichael Roth <michael.roth@amd.com.com>
Message-Id: <20210128024451.1816770-1-michael.roth@amd.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

181f4948

26 1月, 2021 9 次提交

KVM: x86: allow KVM_REQ_GET_NESTED_STATE_PAGES outside guest mode for VMX · 9a78e158

由 Paolo Bonzini 提交于 1月 08, 2021

VMX also uses KVM_REQ_GET_NESTED_STATE_PAGES for the Hyper-V eVMCS,
which may need to be loaded outside guest mode.  Therefore we cannot
WARN in that case.

However, that part of nested_get_vmcs12_pages is _not_ needed at
vmentry time.  Split it out of KVM_REQ_GET_NESTED_STATE_PAGES handling,
so that both vmentry and migration (and in the latter case, independent
of is_guest_mode) do the parts that are needed.

Cc: <stable@vger.kernel.org> # 5.10.x: f2c7ef3b: KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES
Cc: <stable@vger.kernel.org> # 5.10.x
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9a78e158

KVM: x86: Revert "KVM: x86: Mark GPRs dirty when written" · aed89418

由 Sean Christopherson 提交于 1月 22, 2021

Revert the dirty/available tracking of GPRs now that KVM copies the GPRs
to the GHCB on any post-VMGEXIT VMRUN, even if a GPR is not dirty.  Per
commit de3cd117 ("KVM: x86: Omit caching logic for always-available
GPRs"), tracking for GPRs noticeably impacts KVM's code footprint.

This reverts commit 1c04d8c9.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

aed89418

KVM: SVM: Unconditionally sync GPRs to GHCB on VMRUN of SEV-ES guest · 25009140

由 Sean Christopherson 提交于 1月 22, 2021

Drop the per-GPR dirty checks when synchronizing GPRs to the GHCB, the
GRPs' dirty bits are set from time zero and never cleared, i.e. will
always be seen as dirty.  The obvious alternative would be to clear
the dirty bits when appropriate, but removing the dirty checks is
desirable as it allows reverting GPR dirty+available tracking, which
adds overhead to all flavors of x86 VMs.

Note, unconditionally writing the GPRs in the GHCB is tacitly allowed
by the GHCB spec, which allows the hypervisor (or guest) to provide
unnecessary info; it's the guest's responsibility to consume only what
it needs (the hypervisor is untrusted after all).

  The guest and hypervisor can supply additional state if desired but
  must not rely on that additional state being provided.

Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Fixes: 291bd20d ("KVM: SVM: Add initial support for a VMGEXIT VMEXIT")
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210122235049.3107620-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

25009140

KVM: nVMX: Sync unsync'd vmcs02 state to vmcs12 on migration · d51e1d3f

由 Maxim Levitsky 提交于 1月 14, 2021

Even when we are outside the nested guest, some vmcs02 fields
may not be in sync vs vmcs12.  This is intentional, even across
nested VM-exit, because the sync can be delayed until the nested
hypervisor performs a VMCLEAR or a VMREAD/VMWRITE that affects those
rarely accessed fields.

However, during KVM_GET_NESTED_STATE, the vmcs12 has to be up to date to
be able to restore it.  To fix that, call copy_vmcs02_to_vmcs12_rare()
before the vmcs12 contents are copied to userspace.

Fixes: 7952d769 ("KVM: nVMX: Sync rarely accessed guest fields only when needed")
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210114205449.8715-2-mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d51e1d3f

kvm: tracing: Fix unmatched kvm_entry and kvm_exit events · d95df951

由 Lorenzo Brescia 提交于 12月 23, 2020

On VMX, if we exit and then re-enter immediately without leaving
the vmx_vcpu_run() function, the kvm_entry event is not logged.
That means we will see one (or more) kvm_exit, without its (their)
corresponding kvm_entry, as shown here:

 CPU-1979 [002] 89.871187: kvm_entry: vcpu 1
 CPU-1979 [002] 89.871218: kvm_exit:  reason MSR_WRITE
 CPU-1979 [002] 89.871259: kvm_exit:  reason MSR_WRITE

It also seems possible for a kvm_entry event to be logged, but then
we leave vmx_vcpu_run() right away (if vmx->emulation_required is
true). In this case, we will have a spurious kvm_entry event in the
trace.

Fix these situations by moving trace_kvm_entry() inside vmx_vcpu_run()
(where trace_kvm_exit() already is).

A trace obtained with this patch applied looks like this:

 CPU-14295 [000] 8388.395387: kvm_entry: vcpu 0
 CPU-14295 [000] 8388.395392: kvm_exit:  reason MSR_WRITE
 CPU-14295 [000] 8388.395393: kvm_entry: vcpu 0
 CPU-14295 [000] 8388.395503: kvm_exit:  reason EXTERNAL_INTERRUPT

Of course, not calling trace_kvm_entry() in common x86 code any
longer means that we need to adjust the SVM side of things too.
Signed-off-by: NLorenzo Brescia <lorenzo.brescia@edu.unito.it>
Signed-off-by: NDario Faggioli <dfaggioli@suse.com>
Message-Id: <160873470698.11652.13483635328769030605.stgit@Wayrath>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d95df951

KVM: x86: get smi pending status correctly · 1f7becf1

由 Jay Zhou 提交于 1月 18, 2021

The injection process of smi has two steps:

    Qemu                        KVM
Step1:
    cpu->interrupt_request &= \
        ~CPU_INTERRUPT_SMI;
    kvm_vcpu_ioctl(cpu, KVM_SMI)

                                call kvm_vcpu_ioctl_smi() and
                                kvm_make_request(KVM_REQ_SMI, vcpu);

Step2:
    kvm_vcpu_ioctl(cpu, KVM_RUN, 0)

                                call process_smi() if
                                kvm_check_request(KVM_REQ_SMI, vcpu) is
                                true, mark vcpu->arch.smi_pending = true;

The vcpu->arch.smi_pending will be set true in step2, unfortunately if
vcpu paused between step1 and step2, the kvm_run->immediate_exit will be
set and vcpu has to exit to Qemu immediately during step2 before mark
vcpu->arch.smi_pending true.
During VM migration, Qemu will get the smi pending status from KVM using
KVM_GET_VCPU_EVENTS ioctl at the downtime, then the smi pending status
will be lost.
Signed-off-by: NJay Zhou <jianjay.zhou@huawei.com>
Signed-off-by: NShengen Zhuang <zhuangshengen@huawei.com>
Message-Id: <20210118084720.1585-1-jianjay.zhou@huawei.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1f7becf1

KVM: x86/pmu: Fix HW_REF_CPU_CYCLES event pseudo-encoding in intel_arch_events[] · 98dd2f10

由 Like Xu 提交于 12月 30, 2020

The HW_REF_CPU_CYCLES event on the fixed counter 2 is pseudo-encoded as
0x0300 in the intel_perfmon_event_map[]. Correct its usage.

Fixes: 62079d8a ("KVM: PMU: add proper support for fixed counter 2")
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Message-Id: <20201230081916.63417-1-like.xu@linux.intel.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

98dd2f10

KVM: x86/pmu: Fix UBSAN shift-out-of-bounds warning in intel_pmu_refresh() · e61ab2a3

由 Like Xu 提交于 1月 18, 2021

Since we know vPMU will not work properly when (1) the guest bit_width(s)
of the [gp|fixed] counters are greater than the host ones, or (2) guest
requested architectural events exceeds the range supported by the host, so
we can setup a smaller left shift value and refresh the guest cpuid entry,
thus fixing the following UBSAN shift-out-of-bounds warning:

shift exponent 197 is too large for 64-bit type 'long long unsigned int'

Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 ubsan_epilogue+0xb/0x5a lib/ubsan.c:148
 __ubsan_handle_shift_out_of_bounds.cold+0xb1/0x181 lib/ubsan.c:395
 intel_pmu_refresh.cold+0x75/0x99 arch/x86/kvm/vmx/pmu_intel.c:348
 kvm_vcpu_after_set_cpuid+0x65a/0xf80 arch/x86/kvm/cpuid.c:177
 kvm_vcpu_ioctl_set_cpuid2+0x160/0x440 arch/x86/kvm/cpuid.c:308
 kvm_arch_vcpu_ioctl+0x11b6/0x2d70 arch/x86/kvm/x86.c:4709
 kvm_vcpu_ioctl+0x7b9/0xdb0 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3386
 vfs_ioctl fs/ioctl.c:48 [inline]
 __do_sys_ioctl fs/ioctl.c:753 [inline]
 __se_sys_ioctl fs/ioctl.c:739 [inline]
 __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Reported-by: syzbot+ae488dc136a4cc6ba32b@syzkaller.appspotmail.com
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Message-Id: <20210118025800.34620-1-like.xu@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e61ab2a3

KVM: x86: Add more protection against undefined behavior in rsvd_bits() · eb79cd00

由 Sean Christopherson 提交于 1月 13, 2021

Add compile-time asserts in rsvd_bits() to guard against KVM passing in
garbage hardcoded values, and cap the upper bound at '63' for dynamic
values to prevent generating a mask that would overflow a u64.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20210113204515.3473079-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

eb79cd00

08 1月, 2021 5 次提交

P
KVM: x86: __kvm_vcpu_halt can be static · 872f36eb
由 Paolo Bonzini 提交于 1月 08, 2021
```
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
```
872f36eb

KVM: SVM: Add support for booting APs in an SEV-ES guest · 647daca2

由 Tom Lendacky 提交于 1月 04, 2021

Typically under KVM, an AP is booted using the INIT-SIPI-SIPI sequence,
where the guest vCPU register state is updated and then the vCPU is VMRUN
to begin execution of the AP. For an SEV-ES guest, this won't work because
the guest register state is encrypted.

Following the GHCB specification, the hypervisor must not alter the guest
register state, so KVM must track an AP/vCPU boot. Should the guest want
to park the AP, it must use the AP Reset Hold exit event in place of, for
example, a HLT loop.

First AP boot (first INIT-SIPI-SIPI sequence):
Execute the AP (vCPU) as it was initialized and measured by the SEV-ES
support. It is up to the guest to transfer control of the AP to the
proper location.

Subsequent AP boot:
KVM will expect to receive an AP Reset Hold exit event indicating that
the vCPU is being parked and will require an INIT-SIPI-SIPI sequence to
awaken it. When the AP Reset Hold exit event is received, KVM will place
the vCPU into a simulated HLT mode. Upon receiving the INIT-SIPI-SIPI
sequence, KVM will make the vCPU runnable. It is again up to the guest
to then transfer control of the AP to the proper location.

To differentiate between an actual HLT and an AP Reset Hold, a new MP
state is introduced, KVM_MP_STATE_AP_RESET_HOLD, which the vCPU is
placed in upon receiving the AP Reset Hold exit event. Additionally, to
communicate the AP Reset Hold exit event up to userspace (if needed), a
new exit reason is introduced, KVM_EXIT_AP_RESET_HOLD.

A new x86 ops function is introduced, vcpu_deliver_sipi_vector, in order
to accomplish AP booting. For VMX, vcpu_deliver_sipi_vector is set to the
original SIPI delivery function, kvm_vcpu_deliver_sipi_vector(). SVM adds
a new function that, for non SEV-ES guests, invokes the original SIPI
delivery function, kvm_vcpu_deliver_sipi_vector(), but for SEV-ES guests,
implements the logic above.
Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com>
Message-Id: <e8fbebe8eb161ceaabdad7c01a5859a78b424d5e.1609791600.git.thomas.lendacky@amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

647daca2

KVM: nSVM: cancel KVM_REQ_GET_NESTED_STATE_PAGES on nested vmexit · f2c7ef3b

由 Maxim Levitsky 提交于 1月 07, 2021

It is possible to exit the nested guest mode, entered by
svm_set_nested_state prior to first vm entry to it (e.g due to pending event)
if the nested run was not pending during the migration.

In this case we must not switch to the nested msr permission bitmap.
Also add a warning to catch similar cases in the future.

Fixes: a7d5c7ce ("KVM: nSVM: delay MSR permission processing to first nested VM run")
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-2-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f2c7ef3b

KVM: nSVM: mark vmcb as dirty when forcingly leaving the guest mode · 56fe28de

由 Maxim Levitsky 提交于 1月 07, 2021

We overwrite most of vmcb fields while doing so, so we must
mark it as dirty.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-5-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

56fe28de

KVM: nSVM: correctly restore nested_run_pending on migration · 81f76ada

由 Maxim Levitsky 提交于 1月 07, 2021

The code to store it on the migration exists, but no code was restoring it.

One of the side effects of fixing this is that L1->L2 injected events
are no longer lost when migration happens with nested run pending.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20210107093854.882483-3-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

81f76ada

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功