1. 20 December 2021, 1 commit
    • KVM: x86: Always set kvm_run->if_flag · c5063551
      Authored by Marc Orr
      The kvm_run struct's if_flag is a part of the userspace/kernel API. The
      SEV-ES patches failed to set this flag because it's no longer needed by
      QEMU (according to the comment in the source code). However, other
      hypervisors may make use of this flag. Therefore, set the flag for
      guests with encrypted registers (i.e., with guest_state_protected set).
      
      Fixes: f1c6366e ("KVM: SVM: Add required changes to support intercepts under SEV-ES")
      Signed-off-by: Marc Orr <marcorr@google.com>
      Message-Id: <20211209155257.128747-1-marcorr@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
      c5063551
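
      A minimal sketch of the shape of this fix, assuming it routes if_flag
      through a new get_if_flag vendor callback; the SVM mask name below
      (SVM_GUEST_INTERRUPT_MASK) is an assumption, not verbatim upstream code:

          /* x86.c: report IF via the vendor hook instead of RFLAGS directly. */
          static void post_kvm_run_save(struct kvm_vcpu *vcpu)
          {
              struct kvm_run *kvm_run = vcpu->run;

              kvm_run->if_flag = static_call(kvm_x86_get_if_flag)(vcpu);
              /* ... */
          }

          /* svm.c: with encrypted registers RFLAGS is unreadable, so derive
           * the interruptibility state from what the hardware still exposes. */
          static bool svm_get_if_flag(struct kvm_vcpu *vcpu)
          {
              struct vcpu_svm *svm = to_svm(vcpu);

              if (vcpu->arch.guest_state_protected)
                  return !!(svm->vmcb->control.int_state & SVM_GUEST_INTERRUPT_MASK);

              return !!(kvm_get_rflags(vcpu) & X86_EFLAGS_IF);
          }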
  2. 10 December 2021, 1 commit
    • KVM: x86: Wait for IPIs to be delivered when handling Hyper-V TLB flush hypercall · 1ebfaa11
      Authored by Vitaly Kuznetsov
      Prior to commit 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use
      tlb_flush_guest()"), kvm_hv_flush_tlb() was using 'KVM_REQ_TLB_FLUSH |
      KVM_REQUEST_NO_WAKEUP' when making a request to flush TLBs on other vCPUs
      and KVM_REQ_TLB_FLUSH is/was defined as:
      
       (0 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
      
      so KVM_REQUEST_WAIT was lost. Hyper-V TLFS, however, requires that
      "This call guarantees that by the time control returns back to the
      caller, the observable effects of all flushes on the specified virtual
      processors have occurred." and without KVM_REQUEST_WAIT there's a small
      chance that the vCPU making the TLB flush will resume running before
      all IPIs get delivered to other vCPUs and a stale mapping can get read
      there.
      
      Fix the issue by adding KVM_REQUEST_WAIT flag to KVM_REQ_TLB_FLUSH_GUEST:
      kvm_hv_flush_tlb() is the sole caller which uses it for
      kvm_make_all_cpus_request()/kvm_make_vcpus_request_mask() where
      KVM_REQUEST_WAIT makes a difference.
      
      Cc: stable@kernel.org
      Fixes: 0baedd79 ("KVM: x86: make Hyper-V PV TLB flush use tlb_flush_guest()")
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211209102937.584397-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      1ebfaa11
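
      The fix itself is a one-line flag change in arch/x86/include/asm/kvm_host.h.
      A sketch (the request number is illustrative; the added KVM_REQUEST_WAIT
      bit is the point):

          #define KVM_REQ_TLB_FLUSH_GUEST \
              KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)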
  3. 08 December 2021, 10 commits
  4. 02 December 2021, 1 commit
  5. 18 November 2021, 1 commit
    • KVM: x86/mmu: include EFER.LMA in extended mmu role · b8453cdc
      Authored by Maxim Levitsky
      Incorporate EFER.LMA into kvm_mmu_extended_role, as it is used to compute
      the guest root level and is not reflected in kvm_mmu_page_role.level when
      TDP is in use.  When simply running the guest, it is impossible for EFER.LMA
      and kvm_mmu.root_level to get out of sync, as the guest cannot transition
      from PAE paging to 64-bit paging without toggling CR0.PG, i.e. without
      first bouncing through a different MMU context.  And stuffing guest state
      via KVM_SET_SREGS{,2} also ensures a full MMU context reset.
      
      However, if KVM_SET_SREGS{,2} is followed by KVM_SET_NESTED_STATE, e.g. to
      set guest state when migrating the VM while L2 is active, the vCPU state
      will reflect L2, not L1.  If L1 is using TDP for L2, then root_mmu will
      have been configured using L2's state, despite not being used for L2.  If
      L2.EFER.LMA != L1.EFER.LMA, and L2 is using PAE paging, then root_mmu will
      be configured for guest PAE paging, but will match the mmu_role for 64-bit
      paging and cause KVM to not reconfigure root_mmu on the next nested VM-Exit.
      
      Alternatively, the root_mmu's role could be invalidated after a successful
      KVM_SET_NESTED_STATE that yields vcpu->arch.mmu != vcpu->arch.root_mmu,
      i.e. that switches the active mmu to guest_mmu, but doing so is unnecessarily
      tricky, and not even needed if L1 and L2 do have the same role (e.g., they
      are both 64-bit guests and run with the same CR4).
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Message-Id: <20211115131837.195527-3-mlevitsk@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b8453cdc
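
      A sketch of the change, assuming the new bit lands in kvm_mmu_extended_role
      and is filled from the vCPU register snapshot (surrounding fields abridged;
      the ____is_efer_lma() helper name follows the mmu code of that era):

          struct kvm_mmu_extended_role {
              /* ... existing CR0/CR4 capture bits ... */
              unsigned int efer_lma:1;
          };

          static union kvm_mmu_extended_role
          kvm_calc_mmu_role_ext(struct kvm_vcpu *vcpu,
                                const struct kvm_mmu_role_regs *regs)
          {
              union kvm_mmu_extended_role ext = {0};

              /* ... */
              /* New: PAE vs. 64-bit guest roots now yield different roles. */
              ext.efer_lma = ____is_efer_lma(regs);
              return ext;
          }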
  6. 11 November 2021, 5 commits
    • KVM: x86: Drop arbitrary KVM_SOFT_MAX_VCPUS · da1bfd52
      Authored by Vitaly Kuznetsov
      KVM_CAP_NR_VCPUS is used to get the "recommended" maximum number of
      VCPUs and arm64/mips/riscv report num_online_cpus(). Powerpc reports
      either num_online_cpus() or num_present_cpus(), s390 has multiple
      constants depending on hardware features. On x86, KVM reports an
      arbitrary value of '710', which is supposed to be the maximum tested
      value, but it's possible to test all KVM_MAX_VCPUS even when fewer
      physical CPUs are available.
      
      Drop the arbitrary '710' value and return num_online_cpus() on x86 as
      well. The recommendation will match other architectures and will mean
      'no CPU overcommit'.
      
      For reference, QEMU only queries KVM_CAP_NR_VCPUS to print a warning
      when the requested vCPU number exceeds it. The static limit of '710'
      is quite weird, as smaller systems with just a few physical CPUs should
      certainly "recommend" fewer.
      Suggested-by: Eduardo Habkost <ehabkost@redhat.com>
      Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Message-Id: <20211111134733.86601-1-vkuznets@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      da1bfd52
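
      The resulting check-extension logic is trivial; a sketch of x86's
      kvm_vm_ioctl_check_extension() cases after this change:

          case KVM_CAP_NR_VCPUS:
              r = num_online_cpus();  /* "recommended" max: no CPU overcommit */
              break;
          case KVM_CAP_MAX_VCPUS:
              r = KVM_MAX_VCPUS;      /* hard limit, unchanged */
              break;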
    • KVM: x86: Make sure KVM_CPUID_FEATURES really are KVM_CPUID_FEATURES · 760849b1
      Authored by Paul Durrant
      Currently when kvm_update_cpuid_runtime() runs, it assumes that the
      KVM_CPUID_FEATURES leaf is located at 0x40000001. This is not true,
      however, if Hyper-V support is enabled. In this case the KVM leaves will
      be offset.
      
      This patch introduces a new 'kvm_cpuid_base' field in struct
      kvm_vcpu_arch to track the location of the KVM leaves, and a function
      kvm_update_kvm_cpuid_base() (called from kvm_set_cpuid()) to locate the
      leaves using the 'KVMKVMKVM\0\0\0' signature (which is now given a
      definition in kvm_para.h). Adjustment of KVM_CPUID_FEATURES will hence now
      target the correct leaf.
      
      NOTE: A new for_each_possible_hypervisor_cpuid_base() macro is introduced
            into processor.h to avoid having duplicate code for the iteration
            over possible hypervisor base leaves.
      Signed-off-by: Paul Durrant <pdurrant@amazon.com>
      Message-Id: <20211105095101.5384-3-pdurrant@amazon.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      760849b1
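
      A sketch of the scan, assuming the 0x100-stride iteration over the
      hypervisor CPUID range described in the NOTE, with KVM_SIGNATURE being
      the new kvm_para.h definition of the 'KVMKVMKVM\0\0\0' string:

          #define for_each_possible_hypervisor_cpuid_base(function) \
              for (function = 0x40000000; function < 0x40010000; function += 0x100)

          static void kvm_update_kvm_cpuid_base(struct kvm_vcpu *vcpu)
          {
              struct kvm_cpuid_entry2 *entry;
              u32 function;

              vcpu->arch.kvm_cpuid_base = 0;

              for_each_possible_hypervisor_cpuid_base(function) {
                  entry = kvm_find_cpuid_entry(vcpu, function, 0);
                  if (entry) {
                      u32 signature[3] = { entry->ebx, entry->ecx, entry->edx };

                      if (!memcmp(signature, KVM_SIGNATURE, sizeof(signature))) {
                          vcpu->arch.kvm_cpuid_base = function;
                          break;
                      }
                  }
              }
          }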
    • KVM: x86: inhibit APICv when KVM_GUESTDBG_BLOCKIRQ active · cae72dcc
      Authored by Maxim Levitsky
      KVM_GUESTDBG_BLOCKIRQ relies on interrupts being injected via KVM's
      standard inject_pending_event path, and not via APICv/AVIC.
      
      Since this is a debug feature, just inhibit APICv/AVIC while
      KVM_GUESTDBG_BLOCKIRQ is in use on at least one vCPU.
      
      Fixes: 61e5f69e ("KVM: x86: implement KVM_GUESTDBG_BLOCKIRQ")
      Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
      Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Tested-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20211108090245.166408-1-mlevitsk@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      cae72dcc
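
      A sketch of the inhibit logic, assuming a helper (the name below is
      hypothetical) that rescans all vCPUs whenever guest_debug changes; the
      inhibit reason APICV_INHIBIT_REASON_BLOCKIRQ is new in this patch, and
      locking around the update is elided:

          static void update_guestdbg_apicv_inhibit(struct kvm *kvm)
          {
              struct kvm_vcpu *vcpu;
              bool set = false;
              int i;

              kvm_for_each_vcpu(i, vcpu, kvm) {
                  if (vcpu->guest_debug & KVM_GUESTDBG_BLOCKIRQ) {
                      set = true;
                      break;
                  }
              }

              /* Inhibit APICv/AVIC so interrupts flow through
               * inject_pending_event and honor the debug block. */
              kvm_request_apicv_update(kvm, !set, APICV_INHIBIT_REASON_BLOCKIRQ);
          }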
    • KVM: x86: Fix recording of guest steal time / preempted status · 7e2175eb
      Authored by David Woodhouse
      In commit b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is
      not missed") we switched to using a gfn_to_pfn_cache for accessing the
      guest steal time structure in order to allow for an atomic xchg of the
      preempted field. This has a couple of problems.
      
      Firstly, kvm_map_gfn() doesn't work at all for IOMEM pages when the
      atomic flag is set, which it is in kvm_steal_time_set_preempted(). So a
      guest vCPU using an IOMEM page for its steal time would never have its
      preempted field set.
      
      Secondly, the gfn_to_pfn_cache is not invalidated in all cases where it
      should have been. There are two stages to the GFN->PFN conversion;
      first the GFN is converted to a userspace HVA, and then that HVA is
      looked up in the process page tables to find the underlying host PFN.
      Correct invalidation of the latter would require being hooked up to the
      MMU notifiers, but that doesn't happen, so the cache just keeps mapping
      and unmapping the *wrong* PFN after the userspace page tables change.
      
      In the !IOMEM case at least the stale page *is* pinned all the time it's
      cached, so it won't be freed and reused by anyone else while still
      receiving the steal time updates. The map/unmap dance only takes care
      of the KVM administrivia such as marking the page dirty.
      
      Until the gfn_to_pfn cache handles the remapping automatically by
      integrating with the MMU notifiers, we might as well not get a
      kernel mapping of it, and use the perfectly serviceable userspace HVA
      that we already have.  We just need to implement the atomic xchg on
      the userspace address with appropriate exception handling, which is
      fairly trivial.
      
      Cc: stable@vger.kernel.org
      Fixes: b0431382 ("x86/KVM: Make sure KVM_VCPU_FLUSH_TLB flag is not missed")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <3645b9b889dac6438394194bb5586a46b68d581f.camel@infradead.org>
      [I didn't entirely agree with David's assessment of the
       usefulness of the gfn_to_pfn cache, and integrated the outcome
       of the discussion in the above commit message. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7e2175eb
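
      A sketch of the atomic-xchg-on-user-memory idea, modeled on the patch
      rather than copied verbatim: a fault on the user address is fixed up via
      the exception table and simply leaves err set, so the slow path never
      oopses. Here st is the struct kvm_steal_time __user pointer derived from
      the cached HVA:

          u8 st_preempted = 0;
          int err = -EFAULT;

          if (!user_access_begin(st, sizeof(*st)))
              return;

          /* Single atomic xchg on the userspace mapping; "1:" may fault, in
           * which case the _ASM_EXTABLE_UA fixup jumps to "2:" without
           * clearing err. */
          asm volatile("1: xchgb %0, %2\n"
                       "xor %1, %1\n"
                       "2:\n"
                       _ASM_EXTABLE_UA(1b, 2b)
                       : "+q" (st_preempted),
                         "+&r" (err),
                         "+m" (st->preempted));

          user_access_end();
          if (err)
              return;
          /* ... st_preempted holds the old flags; handle KVM_VCPU_FLUSH_TLB ... */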
    • KVM: SEV: Add support for SEV intra host migration · b5663931
      Authored by Peter Gonda
      For SEV to work with intra host migration, contents of the SEV info struct
      such as the ASID (used to index the encryption key in the AMD SP) and
      the list of memory regions need to be transferred to the target VM.
      This change adds a command for a target VMM to get a source SEV VM's SEV
      info.
      Signed-off-by: Peter Gonda <pgonda@google.com>
      Suggested-by: Sean Christopherson <seanjc@google.com>
      Reviewed-by: Marc Orr <marcorr@google.com>
      Cc: Marc Orr <marcorr@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Brijesh Singh <brijesh.singh@amd.com>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Wanpeng Li <wanpengli@tencent.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: kvm@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Message-Id: <20211021174303.385706-3-pgonda@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      b5663931
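
      A userspace sketch of how a target VMM consumes this, assuming the
      capability merged from this series as KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM:

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          static int sev_intra_host_migrate(int dst_vm_fd, int src_vm_fd)
          {
              struct kvm_enable_cap cap = {
                  .cap = KVM_CAP_VM_MOVE_ENC_CONTEXT_FROM,
                  .args = { (__u64)src_vm_fd },
              };

              /* On success the ASID and the encrypted-region list now
               * belong to the destination VM. */
              return ioctl(dst_vm_fd, KVM_ENABLE_CAP, &cap);
          }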
  7. 25 October 2021, 3 commits
    • KVM: x86: switch pvclock_gtod_sync_lock to a raw spinlock · 8228c77d
      Authored by David Woodhouse
      On the preemption path when updating a Xen guest's runstate times, this
      lock is taken inside the scheduler rq->lock, which is a raw spinlock.
      This was shown in a lockdep warning:
      
      [   89.138354] =============================
      [   89.138356] [ BUG: Invalid wait context ]
      [   89.138358] 5.15.0-rc5+ #834 Tainted: G S        I E
      [   89.138360] -----------------------------
      [   89.138361] xen_shinfo_test/2575 is trying to lock:
      [   89.138363] ffffa34a0364efd8 (&kvm->arch.pvclock_gtod_sync_lock){....}-{3:3}, at: get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138442] other info that might help us debug this:
      [   89.138444] context-{5:5}
      [   89.138445] 4 locks held by xen_shinfo_test/2575:
      [   89.138447]  #0: ffff972bdc3b8108 (&vcpu->mutex){+.+.}-{4:4}, at: kvm_vcpu_ioctl+0x77/0x6f0 [kvm]
      [   89.138483]  #1: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_ioctl_run+0xdc/0x8b0 [kvm]
      [   89.138526]  #2: ffff97331fdbac98 (&rq->__lock){-.-.}-{2:2}, at: __schedule+0xff/0xbd0
      [   89.138534]  #3: ffffa34a03662e90 (&kvm->srcu){....}-{0:0}, at: kvm_arch_vcpu_put+0x26/0x170 [kvm]
      ...
      [   89.138695]  get_kvmclock_ns+0x1f/0x130 [kvm]
      [   89.138734]  kvm_xen_update_runstate+0x14/0x90 [kvm]
      [   89.138783]  kvm_xen_update_runstate_guest+0x15/0xd0 [kvm]
      [   89.138830]  kvm_arch_vcpu_put+0xe6/0x170 [kvm]
      [   89.138870]  kvm_sched_out+0x2f/0x40 [kvm]
      [   89.138900]  __schedule+0x5de/0xbd0
      
      Cc: stable@vger.kernel.org
      Reported-by: syzbot+b282b65c2c68492df769@syzkaller.appspotmail.com
      Fixes: 30b5c851 ("KVM: x86/xen: Add support for vCPU runstate information")
      Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
      Message-Id: <1b02a06421c17993df337493a68ba923f3bd5c0f.camel@infradead.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      8228c77d
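
      A sketch of the type flip (raw spinlocks never sleep, so they may nest
      inside the scheduler's raw rq->lock even on PREEMPT_RT); call sites
      switch to the raw_* variants accordingly:

          struct kvm_arch {
              /* ... */
              raw_spinlock_t pvclock_gtod_sync_lock;  /* was: spinlock_t */
          };

          raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags);
          /* ... update master clock state ... */
          raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags);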
    • KVM: x86: On emulation failure, convey the exit reason, etc. to userspace · e615e355
      Authored by David Edmondson
      Should instruction emulation fail, include the VM exit reason, etc. in
      the emulation_failure data passed to userspace, in order that the VMM
      can report it as a debugging aid when describing the failure.
      Suggested-by: Joao Martins <joao.m.martins@oracle.com>
      Signed-off-by: David Edmondson <david.edmondson@oracle.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210920103737.2696756-4-david.edmondson@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      e615e355
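
      A userspace sketch of consuming the richer payload; after this change
      the internal-error data includes the hardware exit reason and
      qualification, so a VMM can dump it when emulation fails:

          /* run = mmap'ed struct kvm_run for the vCPU */
          __u32 i;

          if (run->exit_reason == KVM_EXIT_INTERNAL_ERROR &&
              run->internal.suberror == KVM_INTERNAL_ERROR_EMULATION) {
              for (i = 0; i < run->internal.ndata; i++)
                  fprintf(stderr, "  data[%u] = 0x%llx\n", i,
                          (unsigned long long)run->internal.data[i]);
          }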
    • KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info · 0a62a031
      Authored by David Edmondson
      Extend the get_exit_info static call to provide the reason for the VM
      exit. Modify relevant trace points to use this rather than extracting
      the reason in the caller.
      Signed-off-by: David Edmondson <david.edmondson@oracle.com>
      Reviewed-by: Sean Christopherson <seanjc@google.com>
      Message-Id: <20210920103737.2696756-3-david.edmondson@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      0a62a031
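
      A sketch of the widened hook (parameter order as best understood from
      the series; treat as illustrative):

          struct kvm_x86_ops {
              /* ... */
              void (*get_exit_info)(struct kvm_vcpu *vcpu, u32 *reason,
                                    u64 *info1, u64 *info2,
                                    u32 *intr_info, u32 *error_code);
          };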
  8. 23 October 2021, 1 commit
  9. 22 October 2021, 6 commits
  10. 19 October 2021, 3 commits
    • KVM: x86: Expose TSC offset controls to userspace · 828ca896
      Authored by Oliver Upton
      To date, VMM-directed TSC synchronization and migration has been a bit
      messy. KVM has some baked-in heuristics around TSC writes to infer if
      the VMM is attempting to synchronize. This is problematic, as it depends
      on host userspace writing to the guest's TSC within 1 second of the last
      write.
      
      A much cleaner approach to configuring the guest's views of the TSC is to
      simply migrate the TSC offset for every vCPU. Offsets are idempotent,
      and thus not subject to change depending on when the VMM actually
      reads/writes values from/to KVM. The VMM can then read the TSC once with
      KVM_GET_CLOCK to capture a (realtime, host_tsc) pair at the instant when
      the guest is paused.
      
      Cc: David Matlack <dmatlack@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-8-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      828ca896
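
      A userspace sketch of reading a vCPU's TSC offset via the vCPU device
      attribute group this series adds (group/attr names per the series'
      documentation):

          #include <linux/kvm.h>
          #include <sys/ioctl.h>

          static int get_tsc_offset(int vcpu_fd, __s64 *offset)
          {
              struct kvm_device_attr attr = {
                  .group = KVM_VCPU_TSC_CTRL,
                  .attr  = KVM_VCPU_TSC_OFFSET,
                  .addr  = (__u64)(unsigned long)offset,
              };

              /* KVM_SET_DEVICE_ATTR with the same attr writes the offset. */
              return ioctl(vcpu_fd, KVM_GET_DEVICE_ATTR, &attr);
          }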
    • kvm: x86: protect masterclock with a seqcount · 869b4421
      Authored by Paolo Bonzini
      Protect the reference point for kvmclock with a seqcount, so that
      kvmclock updates for all vCPUs can proceed in parallel.  Xen runstate
      updates will also run in parallel and not bounce the kvmclock cacheline.
      
      Of the variables that were protected by pvclock_gtod_sync_lock,
      nr_vcpus_matched_tsc is different because it is updated outside
      pvclock_update_vm_gtod_copy and read inside it.  Therefore, we
      need to keep it protected by a spinlock.  In fact it must now
      be a raw spinlock, because pvclock_update_vm_gtod_copy, being the
      write-side of a seqcount, is non-preemptible.  Since we already
      have tsc_write_lock which is a raw spinlock, we can just use
      tsc_write_lock as the lock that protects the write-side of the
      seqcount.
      Co-developed-by: Oliver Upton <oupton@google.com>
      Message-Id: <20210916181538.968978-6-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      869b4421
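
      A sketch of the read side, assuming a seqcount field named pvclock_sc
      whose write side runs under tsc_write_lock as described above (the field
      name is an assumption); readers retry instead of serializing on a lock:

          unsigned seq;

          do {
              seq = read_seqcount_begin(&ka->pvclock_sc);
              use_master_clock = ka->use_master_clock;
              /* ... snapshot master_kernel_ns / master_cycle_now ... */
          } while (read_seqcount_retry(&ka->pvclock_sc, seq));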
    • KVM: x86: Report host tsc and realtime values in KVM_GET_CLOCK · c68dc1b5
      Authored by Oliver Upton
      Handling the migration of TSCs correctly is difficult, in part because
      Linux does not provide userspace with the ability to retrieve a (TSC,
      realtime) clock pair for a single instant in time. In lieu of a more
      convenient facility, KVM can report similar information in the kvm_clock
      structure.
      
      Provide userspace with a host TSC & realtime pair iff the realtime clock
      is based on the TSC. If userspace provides KVM_SET_CLOCK with a valid
      realtime value, advance the KVM clock by the amount of elapsed time. Do
      not step the KVM clock backwards, though, as it is a monotonic
      oscillator.
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Oliver Upton <oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Message-Id: <20210916181538.968978-5-oupton@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      c68dc1b5
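
      A userspace sketch of capturing a (realtime, host_tsc) pair at pause
      time; field and flag names follow this series' extension of struct
      kvm_clock_data, and save_clock_pair() is a hypothetical VMM helper:

          struct kvm_clock_data data;

          if (ioctl(vm_fd, KVM_GET_CLOCK, &data) == 0 &&
              (data.flags & KVM_CLOCK_REALTIME) &&
              (data.flags & KVM_CLOCK_HOST_TSC)) {
              /* data.realtime and data.host_tsc describe the same instant,
               * so the pair can be replayed via KVM_SET_CLOCK on the target. */
              save_clock_pair(data.realtime, data.host_tsc);
          }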
  11. 01 October 2021, 4 commits
  12. 30 September 2021, 3 commits
  13. 06 September 2021, 1 commit