提交 · 880993138396f8f0be620c425d08f84490c35251 · openeuler / Kernel

02 4月, 2022 27 次提交

KVM: x86: SVM: fix tsc scaling when the host doesn't support it · 88099313

由 Maxim Levitsky 提交于 2年前

It was decided that when TSC scaling is not supported,
the virtual MSR_AMD64_TSC_RATIO should still have the default '1.0'
value.

However in this case kvm_max_tsc_scaling_ratio is not set,
which breaks various assumptions.

Fix this by always calculating kvm_max_tsc_scaling_ratio regardless of
host support.  For consistency, do the same for VMX.
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-8-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

88099313

kvm: x86: SVM: remove unused defines · f37b735e

由 Maxim Levitsky 提交于 2年前

Remove some unused #defines from svm.c
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f37b735e

KVM: x86: SVM: move tsc ratio definitions to svm.h · bb2aa78e

由 Maxim Levitsky 提交于 2年前

Another piece of SVM spec which should be in the header file
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-6-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bb2aa78e

KVM: x86: SVM: fix avic spec based definitions again · 0dacc3df

由 Maxim Levitsky 提交于 2年前

Due to wrong rebase, commit
4a204f78 ("KVM: SVM: Allow AVIC support on system w/ physical APIC ID > 255")

moved avic spec #defines back to avic.c.

Move them back, and while at it extend AVIC_DOORBELL_PHYSICAL_ID_MASK to 12
bits as well (it will be used in nested avic)
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220322172449.235575-5-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0dacc3df

KVM: MIPS: remove reference to trap&emulate virtualization · fe5f6914

由 Paolo Bonzini 提交于 2年前

Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220313140522.1307751-1-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fe5f6914

KVM: x86: document limitations of MSR filtering · ce2f72e2

由 Paolo Bonzini 提交于 2年前

MSR filtering requires an exit to userspace that is hard to implement and
would be very slow in the case of nested VMX vmexit and vmentry MSR
accesses.  Document the limitation.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ce2f72e2

KVM: x86: Only do MSR filtering when access MSR by rdmsr/wrmsr · ac8d6cad

由 Hou Wenlong 提交于 2年前

If MSR access is rejected by MSR filtering,
kvm_set_msr()/kvm_get_msr() would return KVM_MSR_RET_FILTERED,
and the return value is only handled well for rdmsr/wrmsr.
However, some instruction emulation and state transition also
use kvm_set_msr()/kvm_get_msr() to do msr access but may trigger
some unexpected results if MSR access is rejected, E.g. RDPID
emulation would inject a #UD but RDPID wouldn't cause a exit
when RDPID is supported in hardware and ENABLE_RDTSCP is set.
And it would also cause failure when load MSR at nested entry/exit.
Since msr filtering is based on MSR bitmap, it is better to only
do MSR filtering for rdmsr/wrmsr.
Signed-off-by: NHou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <2b2774154f7532c96a6f04d71c82a8bec7d9e80b.1646655860.git.houwenlong.hwl@antgroup.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ac8d6cad

KVM: x86/emulator: Emulate RDPID only if it is enabled in guest · a836839c

由 Hou Wenlong 提交于 2年前

When RDTSCP is supported but RDPID is not supported in host,
RDPID emulation is available. However, __kvm_get_msr() would
only fail when RDTSCP/RDPID both are disabled in guest, so
the emulator wouldn't inject a #UD when RDPID is disabled but
RDTSCP is enabled in guest.

Fixes: fb6d4d34 ("KVM: x86: emulate RDPID")
Signed-off-by: NHou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a836839c

KVM: x86/pmu: Fix and isolate TSX-specific performance event logic · e644896f

由 Like Xu 提交于 2年前

HSW_IN_TX* bits are used in generic code which are not supported on
AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence
using HSW_IN_TX* bits unconditionally in generic code is resulting in
unintentional pmu behavior on AMD. For example, if EventSelect[11:8]
is 0x2, pmc_reprogram_counter() wrongly assumes that
HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0.

Also per the SDM, both bits 32 and 33 "may only be set if the processor
supports HLE or RTM" and for "IN_TXCP (bit 33): this bit may only be set
for IA32_PERFEVTSEL2."

Opportunistically eliminate code redundancy, because if the HSW_IN_TX*
bit is set in pmc->eventsel, it is already set in attr.config.
Reported-by: NRavi Bangoria <ravi.bangoria@amd.com>
Reported-by: NJim Mattson <jmattson@google.com>
Fixes: 103af0a9 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5")
Co-developed-by: NRavi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: NRavi Bangoria <ravi.bangoria@amd.com>
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20220309084257.88931-1-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e644896f

KVM: x86: mmu: trace kvm_mmu_set_spte after the new SPTE was set · 5959ff4a

由 Maxim Levitsky 提交于 2年前

It makes more sense to print new SPTE value than the
old value.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220302102457.588450-1-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5959ff4a

KVM: x86/svm: Clear reserved bits written to PerfEvtSeln MSRs · 9b026073

由 Jim Mattson 提交于 2年前

AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
reserved bits are cleared, and some are not. Specifically, on
Zen3/Milan, bits 19 and 42 are not cleared.

When emulating such a WRMSR, KVM should not synthesize a #GP,
regardless of which bits are set. However, undocumented bits should
not be passed through to the hardware MSR. So, rather than checking
for reserved bits and synthesizing a #GP, just clear the reserved
bits.

This may seem pedantic, but since KVM currently does not support the
"Host/Guest Only" bits (41:40), it is necessary to clear these bits
rather than synthesizing #GP, because some popular guests (e.g Linux)
will set the "Host Only" bit even on CPUs that don't support
EFER.SVME, and they don't expect a #GP.

For example,

root@Ubuntu1804:~# perf stat -e r26 -a sleep 1

 Performance counter stats for 'system wide':

                 0      r26

       1.001070977 seconds time elapsed

Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379958] Call Trace:
Feb 23 03:59:58 Ubuntu1804 kernel: [  405.379963]  amd_pmu_disable_event+0x27/0x90

Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM")
Reported-by: NLotus Fenn <lotusf@google.com>
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NLike Xu <likexu@tencent.com>
Reviewed-by: NDavid Dunn <daviddunn@google.com>
Message-Id: <20220226234131.2167175-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9b026073

KVM: x86: Trace all APICv inhibit changes and capture overall status · 4f4c4a3e

由 Sean Christopherson 提交于 2年前

Trace all APICv inhibit changes instead of just those that result in
APICv being (un)inhibited, and log the current state.  Debugging why
APICv isn't working is frustrating as it's hard to see why APICv is still
inhibited, and logging only the first inhibition means unnecessary onion
peeling.

Opportunistically drop the export of the tracepoint, it is not and should
not be used by vendor code due to the need to serialize toggling via
apicv_update_lock.

Note, using the common flow means kvm_apicv_init() switched from atomic
to non-atomic bitwise operations.  The VM is unreachable at init, so
non-atomic is perfectly ok.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4f4c4a3e

KVM: x86: Add wrappers for setting/clearing APICv inhibits · 320af55a

由 Sean Christopherson 提交于 2年前

Add set/clear wrappers for toggling APICv inhibits to make the call sites
more readable, and opportunistically rename the inner helpers to align
with the new wrappers and to make them more readable as well. Invert the
flag from "activate" to "set"; activate is painfully ambiguous as it's
not obvious if the inhibit is being activated, or if APICv is being
activated, in which case the inhibit is being deactivated.

For the functions that take @set, swap the order of the inhibit reason
and @set so that the call sites are visually similar to those that bounce
through the wrapper.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

320af55a

KVM: x86: Make APICv inhibit reasons an enum and cleanup naming · 7491b7b2

由 Sean Christopherson 提交于 2年前

Use an enum for the APICv inhibit reasons, there is no meaning behind
their values and they most definitely are not "unsigned longs".  Rename
the various params to "reason" for consistency and clarity (inhibit may
be confused as a command, i.e. inhibit APICv, instead of the reason that
is getting toggled/checked).

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220311043517.17027-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7491b7b2

KVM: X86: Handle implicit supervisor access with SMAP · 4f4aa80e

由 Lai Jiangshan 提交于 2年前

There are two kinds of implicit supervisor access
implicit supervisor access when CPL = 3
implicit supervisor access when CPL < 3

Current permission_fault() handles only the first kind for SMAP.

But if the access is implicit when SMAP is on, data may not be read
nor write from any user-mode address regardless the current CPL.

So the second kind should be also supported.

The first kind can be detect via CPL and access mode: if it is
supervisor access and CPL = 3, it must be implicit supervisor access.

But it is not possible to detect the second kind without extra
information, so this patch adds an artificial PFERR_EXPLICIT_ACCESS
into @access. This extra information also works for the first kind, so
the logic is changed to use this information for both cases.

The value of PFERR_EXPLICIT_ACCESS is deliberately chosen to be bit 48
which is in the most significant 16 bits of u64 and less likely to be
forced to change due to future hardware uses it.

This patch removes the call to ->get_cpl() for access mode is determined
by @access. Not only does it reduce a function call, but also remove
confusions when the permission is checked for nested TDP. The nested
TDP shouldn't have SMAP checking nor even the L2's CPL have any bearing
on it. The original code works just because it is always user walk for
NPT and SMAP fault is not set for EPT in update_permission_bitmask.
Signed-off-by: NLai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-5-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4f4aa80e

KVM: X86: Rename variable smap to not_smap in permission_fault() · 8873c143

由 Lai Jiangshan 提交于 2年前

Comments above the variable says the bit is set when SMAP is overridden
or the same meaning in update_permission_bitmask(): it is not subjected
to SMAP restriction.

Renaming it to reflect the negative implication and make the code better
readability.
Signed-off-by: NLai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-4-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8873c143

KVM: X86: Fix comments in update_permission_bitmask · 94b4a2f1

由 Lai Jiangshan 提交于 2年前

The commit 09f037aa ("KVM: MMU: speedup update_permission_bitmask")
refactored the code of update_permission_bitmask() and change the
comments.  It added a condition into a list to match the new code,
so the number/order for conditions in the comments should be updated
too.
Signed-off-by: NLai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-3-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

94b4a2f1

KVM: X86: Change the type of access u32 to u64 · 5b22bbe7

由 Lai Jiangshan 提交于 2年前

Change the type of access u32 to u64 for FNAME(walk_addr) and
->gva_to_gpa().

The kinds of accesses are usually combinations of UWX, and VMX/SVM's
nested paging adds a new factor of access: is it an access for a guest
page table or for a final guest physical address.

And SMAP relies a factor for supervisor access: explicit or implicit.

So @access in FNAME(walk_addr) and ->gva_to_gpa() is better to include
all these information to do the walk.

Although @access(u32) has enough bits to encode all the kinds, this
patch extends it to u64:
	o Extra bits will be in the higher 32 bits, so that we can
	  easily obtain the traditional access mode (UWX) by converting
	  it to u32.
	o Reuse the value for the access kind defined by SVM's nested
	  paging (PFERR_GUEST_FINAL_MASK and PFERR_GUEST_PAGE_MASK) as
	  @error_code in kvm_handle_page_fault().
Signed-off-by: NLai Jiangshan <jiangshan.ljs@antgroup.com>
Message-Id: <20220311070346.45023-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5b22bbe7

KVM: Remove dirty handling from gfn_to_pfn_cache completely · cf1d88b3

由 David Woodhouse 提交于 2年前

It isn't OK to cache the dirty status of a page in internal structures
for an indefinite period of time.

Any time a vCPU exits the run loop to userspace might be its last; the
VMM might do its final check of the dirty log, flush the last remaining
dirty pages to the destination and complete a live migration. If we
have internal 'dirty' state which doesn't get flushed until the vCPU
is finally destroyed on the source after migration is complete, then
we have lost data because that will escape the final copy.

This problem already exists with the use of kvm_vcpu_unmap() to mark
pages dirty in e.g. VMX nesting.

Note that the actual Linux MM already considers the page to be dirty
since we have a writeable mapping of it. This is just about the KVM
dirty logging.

For the nesting-style use cases (KVM_GUEST_USES_PFN) we will need to
track which gfn_to_pfn_caches have been used and explicitly mark the
corresponding pages dirty before returning to userspace. But we would
have needed external tracking of that anyway, rather than walking the
full list of GPCs to find those belonging to this vCPU which are dirty.

So let's rely *solely* on that external tracking, and keep it simple
rather than laying a tempting trap for callers to fall into.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-3-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cf1d88b3

KVM: Use enum to track if cached PFN will be used in guest and/or host · d0d96121

由 Sean Christopherson 提交于 2年前

Replace the guest_uses_pa and kernel_map booleans in the PFN cache code
with a unified enum/bitmask. Using explicit names makes it easier to
review and audit call sites.

Opportunistically add a WARN to prevent passing garbage; instantating a
cache without declaring its usage is either buggy or pointless.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-2-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d0d96121

KVM: SVM: Fix kvm_cache_regs.h inclusions for is_guest_mode() · 4a9e7b9e

由 Peter Gonda 提交于 2年前

Include kvm_cache_regs.h to pick up the definition of is_guest_mode(),
which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove
include from svm_onhpyerv.c which was done only because of lack of
include in svm.h.

Fixes: 883b0a91 ("KVM: SVM: Move Nested SVM Implementation to nested.c")
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NPeter Gonda <pgonda@google.com>
Message-Id: <20220304161032.2270688-1-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4a9e7b9e

KVM: x86/pmu: Use different raw event masks for AMD and Intel · 95b065bf

由 Jim Mattson 提交于 2年前

The third nybble of AMD's event select overlaps with Intel's IN_TX and
IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel
platforms that support TSX.

Declare a raw_event_mask in the kvm_pmu structure, initialize it in
the vendor-specific pmu_refresh() functions, and use that mask for
PERF_TYPE_RAW configurations in reprogram_gp_counter().

Fixes: 710c4765 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW")
Signed-off-by: NJim Mattson <jmattson@google.com>
Message-Id: <20220308012452.3468611-1-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

95b065bf

KVM: Don't actually set a request when evicting vCPUs for GFN cache invd · df06dae3

由 Sean Christopherson 提交于 2年前

Don't actually set a request bit in vcpu->requests when making a request
purely to force a vCPU to exit the guest.  Logging a request but not
actually consuming it would cause the vCPU to get stuck in an infinite
loop during KVM_RUN because KVM would see the pending request and bail
from VM-Enter to service the request.

Note, it's currently impossible for KVM to set KVM_REQ_GPC_INVALIDATE as
nothing in KVM is wired up to set guest_uses_pa=true.  But, it'd be all
too easy for arch code to introduce use of kvm_gfn_to_pfn_cache_init()
without implementing handling of the request, especially since getting
test coverage of MMU notifier interaction with specific KVM features
usually requires a directed test.

Opportunistically rename gfn_to_pfn_cache_invalidate_start()'s wake_vcpus
to evict_vcpus.  The purpose of the request is to get vCPUs out of guest
mode, it's supposed to _avoid_ waking vCPUs that are blocking.

Opportunistically rename KVM_REQ_GPC_INVALIDATE to be more specific as to
what it wants to accomplish, and to genericize the name so that it can
used for similar but unrelated scenarios, should they arise in the future.
Add a comment and documentation to explain why the "no action" request
exists.

Add compile-time assertions to help detect improper usage.  Use the inner
assertless helper in the one s390 path that makes requests without a
hardcoded request.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220223165302.3205276-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

df06dae3

KVM: avoid double put_page with gfn-to-pfn cache · 79593c08

由 David Woodhouse 提交于 2年前

If the cache's user host virtual address becomes invalid, there
is still a path from kvm_gfn_to_pfn_cache_refresh() where __release_gpc()
could release the pfn but the gpc->pfn field has not been overwritten
with an error value.  If this happens, kvm_gfn_to_pfn_cache_unmap will
call put_page again on the same page.

Cc: stable@vger.kernel.org
Fixes: 982ed0de ("KVM: Reinstate gfn_to_pfn_cache with invalidation support")
Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

79593c08

KVM: x86/mmu: Zap only TDP MMU leafs in zap range and mmu_notifier unmap · f47e5bbb

由 Sean Christopherson 提交于 2年前

Re-introduce zapping only leaf SPTEs in kvm_zap_gfn_range() and
kvm_tdp_mmu_unmap_gfn_range(), this time without losing a pending TLB
flush when processing multiple roots (including nested TDP shadow roots).
Dropping the TLB flush resulted in random crashes when running Hyper-V
Server 2019 in a guest with KSM enabled in the host (or any source of
mmu_notifier invalidations, KSM is just the easiest to force).

This effectively revert commits 873dd122
and fcb93eb6, and thus restores commit
cf3e2642, plus this delta on top:

bool kvm_tdp_mmu_zap_leafs(struct kvm *kvm, int as_id, gfn_t start, gfn_t end,
        struct kvm_mmu_page *root;

        for_each_tdp_mmu_root_yield_safe(kvm, root, as_id)
-               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, false);
+               flush = tdp_mmu_zap_leafs(kvm, root, start, end, can_yield, flush);

        return flush;
 }

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Tested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325230348.2587437-1-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f47e5bbb

KVM: SVM: fix panic on out-of-bounds guest IRQ · a80ced6e

由 Yi Wang 提交于 2年前

As guest_irq is coming from KVM_IRQFD API call, it may trigger
crash in svm_update_pi_irte() due to out-of-bounds:

crash> bt
PID: 22218  TASK: ffff951a6ad74980  CPU: 73  COMMAND: "vcpu8"
 #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397
 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d
 #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d
 #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d
 #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9
 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51
 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace
    [exception RIP: svm_update_pi_irte+227]
    RIP: ffffffffc0761b53  RSP: ffffb1ba6707fd08  RFLAGS: 00010086
    RAX: ffffb1ba6707fd78  RBX: ffffb1ba66d91000  RCX: 0000000000000001
    RDX: 00003c803f63f1c0  RSI: 000000000000019a  RDI: ffffb1ba66db2ab8
    RBP: 000000000000019a   R8: 0000000000000040   R9: ffff94ca41b82200
    R10: ffffffffffffffcf  R11: 0000000000000001  R12: 0000000000000001
    R13: 0000000000000001  R14: ffffffffffffffcf  R15: 000000000000005f
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm]
 #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm]
 #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm]
    RIP: 00007f143c36488b  RSP: 00007f143a4e04b8  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00007f05780041d0  RCX: 00007f143c36488b
    RDX: 00007f05780041d0  RSI: 000000004008ae6a  RDI: 0000000000000020
    RBP: 00000000000004e8   R8: 0000000000000008   R9: 00007f05780041e0
    R10: 00007f0578004560  R11: 0000000000000246  R12: 00000000000004e0
    R13: 000000000000001a  R14: 00007f1424001c60  R15: 00007f0578003bc0
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b

Vmx have been fix this in commit 3a8b0677 (KVM: VMX: Do not BUG() on
out-of-bounds guest IRQ), so we can just copy source from that to fix
this.
Co-developed-by: NYi Liu <liu.yi24@zte.com.cn>
Signed-off-by: NYi Liu <liu.yi24@zte.com.cn>
Signed-off-by: NYi Wang <wang.yi59@zte.com.cn>
Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a80ced6e

KVM: MMU: propagate alloc_workqueue failure · a1a39128

由 Paolo Bonzini 提交于 2年前

If kvm->arch.tdp_mmu_zap_wq cannot be created, the failure has
to be propagated up to kvm_mmu_init_vm and kvm_arch_init_vm.
kvm_arch_init_vm also has to undo all the initialization, so
group all the MMU initialization code at the beginning and
handle cleaning up of kvm_page_track_init.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a1a39128

30 3月, 2022 11 次提交

KVM: x86: Forbid VMM to set SYNIC/STIMER MSRs when SynIC wasn't activated · b1e34d32

由 Vitaly Kuznetsov 提交于 2年前

Setting non-zero values to SYNIC/STIMER MSRs activates certain features,
this should not happen when KVM_CAP_HYPERV_SYNIC{,2} was not activated.

Note, it would've been better to forbid writing anything to SYNIC/STIMER
MSRs, including zeroes, however, at least QEMU tries clearing
HV_X64_MSR_STIMER0_CONFIG without SynIC. HV_X64_MSR_EOM MSR is somewhat
'special' as writing zero there triggers an action, this also should not
happen when SynIC wasn't activated.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-4-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b1e34d32

KVM: x86: Avoid theoretical NULL pointer dereference in kvm_irq_delivery_to_apic_fast() · 00b5f371

由 Vitaly Kuznetsov 提交于 2年前

When kvm_irq_delivery_to_apic_fast() is called with APIC_DEST_SELF
shorthand, 'src' must not be NULL. Crash the VM with KVM_BUG_ON()
instead of crashing the host.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-3-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

00b5f371

KVM: x86: Check lapic_in_kernel() before attempting to set a SynIC irq · 7ec37d1c

由 Vitaly Kuznetsov 提交于 2年前

When KVM_CAP_HYPERV_SYNIC{,2} is activated, KVM already checks for
irqchip_in_kernel() so normally SynIC irqs should never be set. It is,
however,  possible for a misbehaving VMM to write to SYNIC/STIMER MSRs
causing erroneous behavior.

The immediate issue being fixed is that kvm_irq_delivery_to_apic()
(kvm_irq_delivery_to_apic_fast()) crashes when called with
'irq.shorthand = APIC_DEST_SELF' and 'src == NULL'.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220325132140.25650-2-vkuznets@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7ec37d1c

Documentation: KVM: add API issues section · cde363ab

由 Paolo Bonzini 提交于 2年前

Add a section to document all the different ways in which the KVM API sucks.

I am sure there are way more, give people a place to vent so that userspace
authors are aware.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220322110712.222449-4-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cde363ab

Documentation: KVM: add virtual CPU errata documentation · 45016721

由 Paolo Bonzini 提交于 2年前

Add a file to document all the different ways in which the virtual CPU
emulation is imperfect.  Include an example to show how to document
such errata.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reviewed-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NOliver Upton <oupton@google.com>
Message-Id: <20220322110712.222449-3-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

45016721

Documentation: KVM: add separate directories for architecture-specific documentation · daec8d40

由 Paolo Bonzini 提交于 2年前

ARM already has an arm/ subdirectory, but s390 and x86 do not even though
they have a relatively large number of files specific to them.  Create
new directories in Documentation/virt/kvm for these two architectures
as well.

While at it, group the API documentation and the developer documentation
in the table of contents.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220322110712.222449-2-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

daec8d40

Documentation: kvm: include new locks · 99a17b77

由 Paolo Bonzini 提交于 2年前

kvm->mn_invalidate_lock and kvm->slots_arch_lock were not included in the
documentation, add them.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220322110720.222499-3-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

99a17b77

Documentation: kvm: fixes for locking.rst · e9611bf9

由 Paolo Bonzini 提交于 2年前

Separate the various locks clearly, and include the new names of blocked_vcpu_on_cpu_lock
and blocked_vcpu_on_cpu.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220322110720.222499-2-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e9611bf9

KVM: x86: Fix clang -Wimplicit-fallthrough in do_host_cpuid() · 07ea4ab1

由 Nathan Chancellor 提交于 2年前

Clang warns:

  arch/x86/kvm/cpuid.c:739:2: error: unannotated fall-through between switch labels [-Werror,-Wimplicit-fallthrough]
          default:
          ^
  arch/x86/kvm/cpuid.c:739:2: note: insert 'break;' to avoid fall-through
          default:
          ^
          break;
  1 error generated.

Clang is a little more pedantic than GCC, which does not warn when
falling through to a case that is just break or return. Clang's version
is more in line with the kernel's own stance in deprecated.rst, which
states that all switch/case blocks must end in either break,
fallthrough, continue, goto, or return. Add the missing break to silence
the warning.

Fixes: f144c49e ("KVM: x86: synthesize CPUID leaf 0x80000021h if useful")
Reported-by: Nkernel test robot <lkp@intel.com>
Signed-off-by: NNathan Chancellor <nathan@kernel.org>
Message-Id: <20220322152906.112164-1-nathan@kernel.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

07ea4ab1

Revert "KVM: set owner of cpu and vm file operations" · 70375c2d

由 David Matlack 提交于 2年前

This reverts commit 3d3aab1b.

Now that the KVM module's lifetime is tied to kvm.users_count, there is
no need to also tie it's lifetime to the lifetime of the VM and vCPU
file descriptors.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220303183328.1499189-3-dmatlack@google.com>
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

70375c2d

KVM: Prevent module exit until all VMs are freed · 5f6de5cb

由 David Matlack 提交于 2年前

Tie the lifetime the KVM module to the lifetime of each VM via
kvm.users_count. This way anything that grabs a reference to the VM via
kvm_get_kvm() cannot accidentally outlive the KVM module.

Prior to this commit, the lifetime of the KVM module was tied to the
lifetime of /dev/kvm file descriptors, VM file descriptors, and vCPU
file descriptors by their respective file_operations "owner" field.
This approach is insufficient because references grabbed via
kvm_get_kvm() do not prevent closing any of the aforementioned file
descriptors.

This fixes a long standing theoretical bug in KVM that at least affects
async page faults. kvm_setup_async_pf() grabs a reference via
kvm_get_kvm(), and drops it in an asynchronous work callback. Nothing
prevents the VM file descriptor from being closed and the KVM module
from being unloaded before this callback runs.

Fixes: af585b92 ("KVM: Halt vcpu if page it tries to access is swapped out")
Fixes: 3d3aab1b ("KVM: set owner of cpu and vm file operations")
Cc: stable@vger.kernel.org
Suggested-by: NBen Gardon <bgardon@google.com>
[ Based on a patch from Ben implemented for Google's kernel. ]
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220303183328.1499189-2-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5f6de5cb

21 3月, 2022 2 次提交

KVM: use kvcalloc for array allocations · c9b8fecd

由 Paolo Bonzini 提交于 2年前

Instead of using array_size, use a function that takes care of the
multiplication.  While at it, switch to kvcalloc since this allocation
should not be very large.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c9b8fecd

KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 · 6d849191

由 Oliver Upton 提交于 2年前

KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
advertise the set of quirks which may be disabled to userspace, so it is
impossible to predict the behavior of KVM. Worse yet,
KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
it fails to reject attempts to set invalid quirk bits.

The only valid workaround for the quirky quirks API is to add a new CAP.
Actually advertise the set of quirks that can be disabled to userspace
so it can predict KVM's behavior. Reject values for cap->args[0] that
contain invalid bits.

Finally, add documentation for the new capability and describe the
existing quirks.
Signed-off-by: NOliver Upton <oupton@google.com>
Message-Id: <20220301060351.442881-5-oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6d849191

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功