提交 · 59bd54a84d15e9335de5b8abe7b3b9713a36b99b · openeuler / Kernel

07 4月, 2022 1 次提交

x86/tdx: Detect running as a TDX guest in early boot · 59bd54a8

由 Kuppuswamy Sathyanarayanan 提交于 4月 06, 2022

In preparation of extending cc_platform_has() API to support TDX guest,
use CPUID instruction to detect support for TDX guests in the early
boot code (via tdx_early_init()). Since copy_bootdata() is the first
user of cc_platform_has() API, detect the TDX guest status before it.

Define a synthetic feature flag (X86_FEATURE_TDX_GUEST) and set this
bit in a valid TDX guest platform.
Signed-off-by: NKuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: NAndi Kleen <ak@linux.intel.com>
Reviewed-by: NTony Luck <tony.luck@intel.com>
Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: NBorislav Petkov <bp@suse.de>
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20220405232939.73860-2-kirill.shutemov@linux.intel.com

59bd54a8

02 4月, 2022 27 次提交

KVM: x86: fix sending PV IPI · c15e0ae4

由 Li RongQing 提交于 3月 09, 2022

If apic_id is less than min, and (max - apic_id) is greater than
KVM_IPI_CLUSTER_SIZE, then the third check condition is satisfied but
the new apic_id does not fit the bitmask.  In this case __send_ipi_mask
should send the IPI.

This is mostly theoretical, but it can happen if the apic_ids on three
iterations of the loop are for example 1, KVM_IPI_CLUSTER_SIZE, 0.

Fixes: aaffcfd1 ("KVM: X86: Implement PV IPIs in linux guest")
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Message-Id: <1646814944-51801-1-git-send-email-lirongqing@baidu.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c15e0ae4

KVM: x86/mmu: do compare-and-exchange of gPTE via the user address · 2a8859f3

由 Paolo Bonzini 提交于 3月 29, 2022

FNAME(cmpxchg_gpte) is an inefficient mess.  It is at least decent if it
can go through get_user_pages_fast(), but if it cannot then it tries to
use memremap(); that is not just terribly slow, it is also wrong because
it assumes that the VM_PFNMAP VMA is contiguous.

The right way to do it would be to do the same thing as
hva_to_pfn_remapped() does since commit add6a0cd ("KVM: MMU: try to
fix up page faults before giving up", 2016-07-05), using follow_pte()
and fixup_user_fault() to determine the correct address to use for
memremap().  To do this, one could for example extract hva_to_pfn()
for use outside virt/kvm/kvm_main.c.  But really there is no reason to
do that either, because there is already a perfectly valid address to
do the cmpxchg() on, only it is a userspace address.  That means doing
user_access_begin()/user_access_end() and writing the code in assembly
to handle exceptions correctly.  Worse, the guest PTE can be 8-byte
even on i686 so there is the extra complication of using cmpxchg8b to
account for.  But at least it is an efficient mess.

(Thanks to Linus for suggesting improvement on the inline assembly).
Reported-by: NQiuhao Li <qiuhao@sysec.org>
Reported-by: NGaoning Pan <pgn@zju.edu.cn>
Reported-by: NYongkang Jia <kangel@zju.edu.cn>
Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com
Debugged-by: NTadeusz Struk <tadeusz.struk@linaro.org>
Tested-by: NMaxim Levitsky <mlevitsk@redhat.com>
Cc: stable@vger.kernel.org
Fixes: bd53cb35 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs")
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2a8859f3

KVM: x86: Remove redundant vm_entry_controls_clearbit() call · 4335edbb

由 Zhenzhong Duan 提交于 3月 11, 2022

When emulating exit from long mode, EFER_LMA is cleared with
vmx_set_efer(). This will already unset the VM_ENTRY_IA32E_MODE control
bit as requested by SDM, so there is no need to unset VM_ENTRY_IA32E_MODE
again in exit_lmode() explicitly. In case EFER isn't supported by
hardware, long mode isn't supported, so exit_lmode() cannot be reached.

Note that, thanks to the shadow controls mechanism, this change doesn't
eliminate vmread or vmwrite.
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220311102643.807507-3-zhenzhong.duan@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4335edbb

KVM: x86: cleanup enter_rmode() · b76edfe9

由 Zhenzhong Duan 提交于 3月 11, 2022

vmx_set_efer() sets uret->data but, in fact if the value of uret->data
will be used vmx_setup_uret_msrs() will have rewritten it with the value
returned by update_transition_efer().  uret->data is consumed if and only
if uret->load_into_hardware is true, and vmx_setup_uret_msrs() takes care
of (a) updating uret->data before setting uret->load_into_hardware to true
(b) setting uret->load_into_hardware to false if uret->data isn't updated.

Opportunistically use "vmx" directly instead of redoing to_vmx().
Signed-off-by: NZhenzhong Duan <zhenzhong.duan@intel.com>
Message-Id: <20220311102643.807507-2-zhenzhong.duan@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b76edfe9

KVM: x86: SVM: fix tsc scaling when the host doesn't support it · 88099313

由 Maxim Levitsky 提交于 3月 22, 2022

It was decided that when TSC scaling is not supported,
the virtual MSR_AMD64_TSC_RATIO should still have the default '1.0'
value.

However in this case kvm_max_tsc_scaling_ratio is not set,
which breaks various assumptions.

Fix this by always calculating kvm_max_tsc_scaling_ratio regardless of
host support.  For consistency, do the same for VMX.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-8-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

88099313

kvm: x86: SVM: remove unused defines · f37b735e

由 Maxim Levitsky 提交于 3月 22, 2022

Remove some unused #defines from svm.c
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f37b735e

KVM: x86: SVM: move tsc ratio definitions to svm.h · bb2aa78e

由 Maxim Levitsky 提交于 3月 22, 2022

Another piece of SVM spec which should be in the header file
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-6-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bb2aa78e

KVM: x86: SVM: fix avic spec based definitions again · 0dacc3df

由 Maxim Levitsky 提交于 3月 22, 2022

Due to wrong rebase, commit
4a204f78 ("KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255")

moved avic spec #defines back to avic.c.

Move them back, and while at it extend AVIC_DOORBELL_PHYSICAL_ID_MASK to 12
bits as well (it will be used in nested avic)
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-5-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0dacc3df

KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr · ac8d6cad

由 Hou Wenlong 提交于 3月 07, 2022

If MSR access is rejected by MSR filtering,
kvm_set_msr()/kvm_get_msr() would return KVM_MSR_RET_FILTERED,
and the return value is only handled well for rdmsr/wrmsr.
However, some instruction emulation and state transition also
use kvm_set_msr()/kvm_get_msr() to do msr access but may trigger
some unexpected results if MSR access is rejected, E.g. RDPID
emulation would inject a #UD but RDPID wouldn't cause a exit
when RDPID is supported in hardware and ENABLE_RDTSCP is set.
And it would also cause failure when load MSR at nested entry/exit.
Since msr filtering is based on MSR bitmap, it is better to only
do MSR filtering for rdmsr/wrmsr.
Signed-off-by: NHou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <2b2774154f7532c96a6f04d71c82a8bec7d9e80b.1646655860.git.houwenlong.hwl@antgroup.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ac8d6cad

KVM: x86/emulator: Emulate RDPID only if it is enabled in guest · a836839c

由 Hou Wenlong 提交于 3月 02, 2022

When RDTSCP is supported but RDPID is not supported in host,
RDPID emulation is available. However, __kvm_get_msr() would
only fail when RDTSCP/RDPID both are disabled in guest, so
the emulator wouldn't inject a #UD when RDPID is disabled but
RDTSCP is enabled in guest.

Fixes: fb6d4d34 ("KVM: x86: emulate RDPID")
Signed-off-by: NHou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a836839c

KVM: x86/pmu: Fix and isolate TSX-specific performance event logic · e644896f

由 Like Xu 提交于 3月 09, 2022

HSW_IN_TX* bits are used in generic code which are not supported on
AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence
using HSW_IN_TX* bits unconditionally in generic code is resulting in
unintentional pmu behavior on AMD. For example, if EventSelect[11:8]
is 0x2, pmc_reprogram_counter() wrongly assumes that
HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0.

Also per the SDM, both bits 32 and 33 "may only be set if the processor
supports HLE or RTM" and for "IN_TXCP (bit 33): this bit may only be set
for IA32_PERFEVTSEL2."

Opportunistically eliminate code redundancy, because if the HSW_IN_TX*
bit is set in pmc->eventsel, it is already set in attr.config.
Reported-by: NRavi Bangoria <ravi.bangoria@amd.com>
Reported-by: NJim Mattson <jmattson@google.com>
Fixes: 103af0a9 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5")
Co-developed-by: NRavi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: NRavi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20220309084257.88931-1-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e644896f

KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set · 5959ff4a

由 Maxim Levitsky 提交于 3月 02, 2022

It makes more sense to print new SPTE value than the
old value.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220302102457.588450-1-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5959ff4a

KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs · 9b026073

由 Jim Mattson 提交于 2月 26, 2022

AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
reserved bits are cleared, and some are not. Specifically, on
Zen3/Milan, bits 19 and 42 are not cleared.

When emulating such a WRMSR, KVM should not synthesize a #GP,
regardless of which bits are set. However, undocumented bits should
not be passed through to the hardware MSR. So, rather than checking
for reserved bits and synthesizing a #GP, just clear the reserved
bits.

This may seem pedantic, but since KVM currently does not support the
"Host/Guest Only" bits (41:40), it is necessary to clear these bits
rather than synthesizing #GP, because some popular guests (e.g Linux)
will set the "Host Only" bit even on CPUs that don't support
EFER.SVME, and they don't expect a #GP.

For example,

root@Ubuntu1804:~# perf stat -e r26 -a sleep 1

 Performance counter stats for 'system wide':

                 0      r26

       1.001070977 seconds time elapsed

Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379958] Call Trace:
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379963]  amd_pmu_disable_event+0x27/0x90

Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reported-by: NLotus Fenn <lotusf@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NLike Xu <likexu@tencent.com>
Reviewed-by: NDavid Dunn <daviddunn@google.com>
Message-Id: <20220226234131.2167175-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9b026073

KVM: x86: Trace all APICv inhibit changes and capture overall status · 4f4c4a3e

由 Sean Christopherson 提交于 3月 11, 2022

Trace all APICv inhibit changes instead of just those that result in
APICv being (un)inhibited, and log the current state.  Debugging why
APICv isn't working is frustrating as it's hard to see why APICv is still
inhibited, and logging only the first inhibition means unnecessary onion
peeling.

Opportunistically drop the export of the tracepoint, it is not and should
not be used by vendor code due to the need to serialize toggling via
apicv_update_lock.

Note, using the common flow means kvm_apicv_init() switched from atomic
to non-atomic bitwise operations.  The VM is unreachable at init, so
non-atomic is perfectly ok.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4f4c4a3e

KVM: x86: Add wrappers for setting/clearing APICv inhibits · 320af55a

由 Sean Christopherson 提交于 3月 11, 2022

Add set/clear wrappers for toggling APICv inhibits to make the call sites
more readable, and opportunistically rename the inner helpers to align
with the new wrappers and to make them more readable as well. Invert the
flag from "activate" to "set"; activate is painfully ambiguous as it's
not obvious if the inhibit is being activated, or if APICv is being
activated, in which case the inhibit is being deactivated.

For the functions that take @set, swap the order of the inhibit reason
and @set so that the call sites are visually similar to those that bounce
through the wrapper.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

320af55a

KVM: x86: Make APICv inhibit reasons an enum and cleanup naming · 7491b7b2

由 Sean Christopherson 提交于 3月 11, 2022

Use an enum for the APICv inhibit reasons, there is no meaning behind
their values and they most definitely are not "unsigned longs".  Rename
the various params to "reason" for consistency and clarity (inhibit may
be confused as a command, i.e. inhibit APICv, instead of the reason that
is getting toggled/checked).

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7491b7b2

KVM: X86: Handle implicit supervisor access with SMAP · 4f4aa80e

由 Lai Jiangshan 提交于 3月 11, 2022

There are two kinds of implicit supervisor access
implicit supervisor access when CPL = 3
implicit supervisor access when CPL < 3

Current permission_fault() handles only the first kind for SMAP.

But if the access is implicit when SMAP is on, data may not be read
nor write from any user-mode address regardless the current CPL.

So the second kind should be also supported.

The first kind can be detect via CPL and access mode: if it is
supervisor access and CPL = 3, it must be implicit supervisor access.

But it is not possible to detect the second kind without extra
information, so this patch adds an artificial PFERR_EXPLICIT_ACCESS
into @access. This extra information also works for the first kind, so
the logic is changed to use this information for both cases.

The value of PFERR_EXPLICIT_ACCESS is deliberately chosen to be bit 48
which is in the most significant 16 bits of u64 and less likely to be
forced to change due to future hardware uses it.

This patch removes the call to ->get_cpl() for access mode is determined
by @access. Not only does it reduce a function call, but also remove
confusions when the permission is checked for nested TDP. The nested
TDP shouldn't have SMAP checking nor even the L2's CPL have any bearing
on it. The original code works just because it is always user walk for
NPT and SMAP fault is not set for EPT in update_permission_bitmask.
Signed-off-by: NLai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-5-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4f4aa80e

KVM: X86: Rename variable smap to not_smap in permission_fault() · 8873c143

由 Lai Jiangshan 提交于 3月 11, 2022

Comments above the variable says the bit is set when SMAP is overridden
or the same meaning in update_permission_bitmask(): it is not subjected
to SMAP restriction.

Renaming it to reflect the negative implication and make the code better
readability.
Signed-off-by: NLai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-4-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8873c143

KVM: X86: Fix comments in update_permission_bitmask · 94b4a2f1

由 Lai Jiangshan 提交于 3月 11, 2022

The commit 09f037aa ("KVM: MMU: speedup update_permission_bitmask")
refactored the code of update_permission_bitmask() and change the
comments.  It added a condition into a list to match the new code,
so the number/order for conditions in the comments should be updated
too.
Signed-off-by: NLai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-3-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

94b4a2f1

KVM: X86: Change the type of access u32 to u64 · 5b22bbe7

由 Lai Jiangshan 提交于 3月 11, 2022

Change the type of access u32 to u64 for FNAME(walk_addr) and
->gva_to_gpa().

The kinds of accesses are usually combinations of UWX, and VMX/SVM's
nested paging adds a new factor of access: is it an access for a guest
page table or for a final guest physical address.

And SMAP relies a factor for supervisor access: explicit or implicit.

So @access in FNAME(walk_addr) and ->gva_to_gpa() is better to include
all these information to do the walk.

Although @access(u32) has enough bits to encode all the kinds, this
patch extends it to u64:
	o Extra bits will be in the higher 32 bits, so that we can
	  easily obtain the traditional access mode (UWX) by converting
	  it to u32.
	o Reuse the value for the access kind defined by SVM's nested
	  paging (PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK) as
	  @error_code in kvm_handle_page_fault().
Signed-off-by: NLai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5b22bbe7

KVM: Remove dirty handling from gfn_to_pfn_cache completely · cf1d88b3

由 David Woodhouse 提交于 3月 03, 2022

It isn't OK to cache the dirty status of a page in internal structures
for an indefinite period of time.

Any time a vCPU exits the run loop to userspace might be its last; the
VMM might do its final check of the dirty log, flush the last remaining
dirty pages to the destination and complete a live migration. If we
have internal 'dirty' state which doesn't get flushed until the vCPU
is finally destroyed on the source after migration is complete, then
we have lost data because that will escape the final copy.

This problem already exists with the use of kvm_vcpu_unmap() to mark
pages dirty in e.g. VMX nesting.

Note that the actual Linux MM already considers the page to be dirty
since we have a writeable mapping of it. This is just about the KVM
dirty logging.

For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
track which gfn_to_pfn_caches have been used and explicitly mark the
corresponding pages dirty before returning to userspace. But we would
have needed external tracking of that anyway, rather than walking the
full list of GPCs to find those belonging to this vCPU which are dirty.

So let's rely *solely* on that external tracking, and keep it simple
rather than laying a tempting trap for callers to fall into.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-3-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cf1d88b3

KVM: Use enum to track if cached PFN will be used in guest and/or host · d0d96121

由 Sean Christopherson 提交于 3月 03, 2022

Replace the guest_uses_pa and kernel_map booleans in the PFN cache code
with a unified enum/bitmask. Using explicit names makes it easier to
review and audit call sites.

Opportunistically add a WARN to prevent passing garbage; instantating a
cache without declaring its usage is either buggy or pointless.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-2-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d0d96121

KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode() · 4a9e7b9e

由 Peter Gonda 提交于 3月 04, 2022

Include kvm_cache_regs.h to pick up the definition of is_guest_mode(),
which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove
include from svm_onhpyerv.c which was done only because of lack of
include in svm.h.

Fixes: 883b0a91 ("KVM: SVM: Move Nested SVM Implementation to nested.c")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NPeter Gonda <pgonda@google.com>
Message-Id: <20220304161032.2270688-1-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4a9e7b9e

KVM: x86/pmu: Use different raw event masks for AMD and Intel · 95b065bf

由 Jim Mattson 提交于 3月 07, 2022

The third nybble of AMD's event select overlaps with Intel's IN_TX and
IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel
platforms that support TSX.

Declare a raw_event_mask in the kvm_pmu structure, initialize it in
the vendor-specific pmu_refresh() functions, and use that mask for
PERF_TYPE_RAW configurations in reprogram_gp_counter().

Fixes: 710c4765 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW")
Signed-off-by: NJim Mattson <jmattson@google.com>
Message-Id: <20220308012452.3468611-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

95b065bf

KVM: x86/mmu: Zap only TDP MMU leafs in zap range and mmu_notifier unmap · f47e5bbb

由 Sean Christopherson 提交于 3月 25, 2022

Re-introduce zapping only leaf SPTEs in kvm_zap_gfn_range() and
kvm_tdp_mmu_unmap_gfn_range(), this time without losing a pending TLB
flush when processing multiple roots (including nested TDP shadow roots).
Dropping the TLB flush resulted in random crashes when running Hyper-V
Server 2019 in a guest with KSM enabled in the host (or any source of
mmu_notifier invalidations, KSM is just the easiest to force).

This effectively revert commits 873dd122
and fcb93eb6, and thus restores commit
cf3e2642, plus this delta on top:

bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
        struct kvm_mmu_page *root;

        for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
-               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false);
+               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);

        return flush;
 }

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325230348.2587437-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f47e5bbb

KVM: SVM: fix panic on out-of-bounds guest IRQ · a80ced6e

由 Yi Wang 提交于 3月 09, 2022

As guest_irq is coming from KVM_IRQFD API call, it may trigger
crash in svm_update_pi_irte() due to out-of-bounds:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
 #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
 #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
 #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
 #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
 #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
 #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Vmx have been fix this in commit 3a8b0677 (KVM: VMX: Do not BUG() on
out-of-bounds guest IRQ), so we can just copy source from that to fix
this.
Co-developed-by: NYi Liu <liu.yi24@zte.com.cn>
Signed-off-by: NYi Liu <liu.yi24@zte.com.cn>
Signed-off-by: NYi Wang <wang.yi59@zte.com.cn>
Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a80ced6e

KVM: MMU: propagate alloc_workqueue failure · a1a39128

由 Paolo Bonzini 提交于 3月 25, 2022

If kvm->arch.tdp_mmu_zap_wq cannot be created, the failure has
to be propagated up to kvm_mmu_init_vm and kvm_arch_init_vm.
kvm_arch_init_vm also has to undo all the initialization, so
group all the MMU initialization code at the beginning and
handle cleaning up of kvm_page_track_init.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a1a39128

31 3月, 2022 2 次提交

Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels" · 7dd5ad2d

由 Thomas Gleixner 提交于 3月 31, 2022

Revert commit bf9ad37d. It needs to be better encapsulated and
generalized.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

7dd5ad2d

arch: syscalls: simplify uapi/kapi directory creation · bbc90bc1

由 Masahiro Yamada 提交于 2月 27, 2022

$(shell ...) expands to empty. There is no need to assign it to _dummy.
Signed-off-by: NMasahiro Yamada <masahiroy@kernel.org>
Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>

bbc90bc1

30 3月, 2022 10 次提交

x86/fpu/xstate: Consolidate size calculations · d6d6d50f

由 Thomas Gleixner 提交于 3月 28, 2022

Use the offset calculation to do the size calculation which avoids yet
another series of CPUID instructions for each invocation.

  [ Fix the FP/SSE only case which missed to take the xstate
    header into account, as
    Reported-by: kernel test robot <oliver.sang@intel.com> ]
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/87o81pgbp2.ffs@tglx

d6d6d50f

x86/fpu/xstate: Handle supervisor states in XSTATE permissions · 781c64bf

由 Thomas Gleixner 提交于 3月 24, 2022

The size calculation in __xstate_request_perm() fails to take supervisor
states into account because the permission bitmap is only relevant for user
states.

Up to 5.17 this does not matter because there are no supervisor states
supported, but the (re-)enabling of ENQCMD makes them available.

Fixes: 7c1ef591 ("x86/cpufeatures: Re-enable ENQCMD")
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.681768598@linutronix.de

781c64bf

x86/fpu/xsave: Handle compacted offsets correctly with supervisor states · 7aa5128b

由 Thomas Gleixner 提交于 3月 24, 2022

So far the cached fixed compacted offsets worked, but with (re-)enabling
of ENQCMD this does no longer work with KVM fpstate.

KVM does not have supervisor features enabled for the guest FPU, which
means that KVM has then a different XSAVE area layout than the host FPU
state. This in turn breaks the copy from/to UABI functions when invoked for
a guest state.

Remove the pre-calculated compacted offsets and calculate the offset
of each component at runtime based on the XCOMP_BV field in the XSAVE
header.

The runtime overhead is not interesting because these copy from/to UABI
functions are not used in critical fast paths. KVM uses them to save and
restore FPU state during migration. The host uses them for ptrace and for
the slow path of 32bit signal handling.

Fixes: 7c1ef591 ("x86/cpufeatures: Re-enable ENQCMD")
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.627636809@linutronix.de

7aa5128b

x86/fpu: Cache xfeature flags from CPUID · 6afbb58c

由 Thomas Gleixner 提交于 3月 24, 2022

In preparation for runtime calculation of XSAVE offsets cache the feature
flags for each XSTATE component during feature enumeration via CPUID(0xD).

EDX has two relevant bits:
    0	Supervisor component
    1	Feature storage must be 64 byte aligned

These bits are currently only evaluated during init, but the alignment bit
must be cached to make runtime calculation of XSAVE offsets efficient.

Cache the full EDX content and use it for the existing alignment and
supervisor checks.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.573656209@linutronix.de

6afbb58c

x86/fpu/xsave: Initialize offset/size cache early · 35a77d45

由 Thomas Gleixner 提交于 3月 24, 2022

Reading XSTATE feature information from CPUID over and over does not make
sense. The information has to be cached anyway, so it can be done early.

Prepare for runtime calculation of XSTATE offsets and allow
consolidation of the size calculation functions in a later step.

Rename the function while at it as it does not setup any features.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.519411939@linutronix.de

35a77d45

x86/fpu: Remove unused supervisor only offsets · d47f71f6

由 Thomas Gleixner 提交于 3月 24, 2022

No users.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.465066249@linutronix.de

d47f71f6

crypto: x86/sm3 - Fixup SLS · aa8e73ee

由 Peter Zijlstra 提交于 3月 25, 2022

This missed the big asm update due to being merged through the crypto
tree.

Fixes: f94909ce ("x86: Prepare asm files for straight-line-speculation")
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>

aa8e73ee

x86/fpu: Remove redundant XCOMP_BV initialization · a9f84fb7

由 Thomas Gleixner 提交于 3月 24, 2022

fpu_copy_uabi_to_guest_fpstate() initializes the XCOMP_BV field in the
XSAVE header. That's a leftover from the old KVM FPU buffer handling code.

Since

  d69c1382 ("x86/kvm: Convert FPU handling to a single swap buffer")

KVM uses the FPU core allocation code, which initializes the XCOMP_BV
field already.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Link: https://lore.kernel.org/r/20220324134623.408932232@linutronix.de

a9f84fb7

KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated · b1e34d32

由 Vitaly Kuznetsov 提交于 3月 25, 2022

Setting non-zero values to SYNIC/STIMER MSRs activates certain features,
this should not happen when KVM_CAP_HYPERV_SYNIC{,2} was not activated.

Note, it would've been better to forbid writing anything to SYNIC/STIMER
MSRs, including zeroes, however, at least QEMU tries clearing
HV_X64_MSR_STIMER0_CONFIG without SynIC. HV_X64_MSR_EOM MSR is somewhat
'special' as writing zero there triggers an action, this also should not
happen when SynIC wasn't activated.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-4-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b1e34d32

KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast() · 00b5f371

由 Vitaly Kuznetsov 提交于 3月 25, 2022

When kvm_irq_delivery_to_apic_fast() is called with APIC_DEST_SELF
shorthand, 'src' must not be NULL. Crash the VM with KVM_BUG_ON()
instead of crashing the host.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-3-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

00b5f371

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功