- 16 May 2020, 18 commits
Committed by Wanpeng Li

Add a fastpath_t typedef since the enum lines are a bit long, and replace EXIT_FASTPATH_SKIP_EMUL_INS with two new exit_fastpath_completion enum values:

- EXIT_FASTPATH_EXIT_HANDLED: KVM still goes through its full run loop, but skips invoking the exit handler.
- EXIT_FASTPATH_REENTER_GUEST: complete fastpath; the guest can be re-entered without invoking the exit handler or going back to vcpu_run.

Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-4-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
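
For reference, a minimal sketch of what the completion enum and typedef named above could look like; only the two new value names come from the commit text, while the EXIT_FASTPATH_NONE default and the exact layout are assumptions:

    /*
     * Sketch only, not the actual kvm_host.h definition.
     * EXIT_FASTPATH_NONE is assumed as the "no fastpath handling" default.
     */
    enum exit_fastpath_completion {
            EXIT_FASTPATH_NONE,             /* fall through to the regular exit handlers */
            EXIT_FASTPATH_REENTER_GUEST,    /* re-enter the guest without leaving the inner loop */
            EXIT_FASTPATH_EXIT_HANDLED,     /* skip the exit handler, but take the full run loop */
    };
    typedef enum exit_fastpath_completion fastpath_t;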
-
Committed by Wanpeng Li

Introduce a kvm_vcpu_exit_request() helper. Some conditions need to be checked before re-entering the guest immediately: if the fastpath completes but something prevents an immediate re-entry, skip invoking the exit handler and go through the full run loop instead.

Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-5-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
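
A rough sketch of the shape such a helper might take; the specific conditions checked here (pending requests, reschedule, signals) are illustrative assumptions, not a quote of the patch:

    /*
     * Sketch only: return true if the vCPU must bail out to the full run loop
     * instead of immediately re-entering the guest.
     */
    static inline bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
    {
            return vcpu->mode == EXITING_GUEST_MODE ||
                   kvm_request_pending(vcpu) ||
                   need_resched() ||
                   signal_pending(current);
    }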
-
Committed by Sean Christopherson

Use __print_flags() to display the names of VMX flags in VM-Exit traces and strip the flags when printing the basic exit reason, e.g. so that a failed VM-Entry due to invalid guest state gets recorded as "INVALID_STATE FAILED_VMENTRY" instead of "0x80000021". Opportunistically fix misaligned variables in the kvm_exit and kvm_nested_vmexit_inject tracepoints.

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200508235348.19427-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Wanpeng Li

Introduce a generic fastpath handler to handle the MSR fastpath, the VMX-preemption timer fastpath, etc.; move it after vmx_complete_interrupts() in order to catch events delivered to the guest and allow the fastpath to be aborted in later patches. While at it, move the kvm_exit tracepoint so that it is printed for fastpath vmexits as well. There is no observed performance effect on the IPI fastpath after this patch.

Tested-by: Haiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <1588055009-12677-2-git-send-email-wanpengli@tencent.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Don't propagate GUEST_SYSENTER_* from vmcs02 to vmcs12 on nested VM-Exit, as the vmcs12 fields are updated in vmx_set_msr() and writes to the corresponding MSRs are always intercepted by KVM when running L2. Dropping the propagation was intended to be done in the same commit that added the vmcs12 writes in vmx_set_msr() [1], but for reasons unknown it was only shuffled around [2][3].

[1] https://patchwork.kernel.org/patch/10933215
[2] https://patchwork.kernel.org/patch/10933215/#22682289
[3] https://lore.kernel.org/patchwork/patch/1088643

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428231025.12766-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Explicitly truncate the data written to vmcs.SYSENTER_EIP/ESP on WRMSR if the virtual CPU doesn't support 64-bit mode. The SYSENTER address fields in the VMCS are natural width, i.e. bits 63:32 are dropped if the CPU doesn't support Intel 64 architectures. This behavior is visible to the guest after a VM-Exit/VM-Entry roundtrip, e.g. if the guest sets bits 63:32 in the actual MSR.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428231025.12766-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
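
The truncation itself is just an explicit narrowing of the 64-bit MSR payload when the vCPU lacks long mode. A standalone illustration (guest_has_long_mode is a stand-in flag, not the real predicate):

    #include <stdint.h>

    /* Illustration only: drop bits 63:32 before a natural-width VMCS write. */
    static uint64_t sysenter_value_to_write(uint64_t msr_data, int guest_has_long_mode)
    {
            /* On a 32-bit-only vCPU the field is effectively 32 bits wide. */
            return guest_has_long_mode ? msr_data : (uint32_t)msr_data;
    }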
-
Committed by Uros Bizjak

Improve the handle_external_interrupt_irqoff inline assembly in several ways:

- remove unneeded %c operand modifiers and "$" prefixes
- use %rsp instead of _ASM_SP, since we are in the CONFIG_X86_64 part
- use a $-16 immediate to align %rsp
- remove unneeded use of the __ASM_SIZE macro
- define the "ss" named operand only for X86_64

The patch introduces no functional changes.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200504155706.2516956-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Peter Xu

MSRs 0x4b564d00 and 0x4b564d01 belong to KVM_FEATURE_CLOCKSOURCE2.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155913.267562-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Peter Xu

The index returned by kvm_async_pf_gfn_slot() will be removed when an async pf gfn is going to be removed. However, kvm_async_pf_gfn_slot() is not reliable in that it can return the last key it loops over even if the gfn is not found in the async gfn array. It should never happen, but it's still better to sanity check against that to make sure no unexpected gfn will be removed.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155910.267514-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Peter Xu

hva_to_pfn_remapped() calls fixup_user_fault(), which has already handled the retry gracefully. Even if "unlocked" is set to true, it only means we got a VM_FAULT_RETRY inside fixup_user_fault(); the page fault has already been retried and the pfn should be set correctly. No need to do that again.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155906.267462-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Peter Xu

Forcing ASYNC_PF_PER_VCPU to be a power of two is much easier than calling roundup_pow_of_two() from time to time. Do this by adding a BUILD_BUG_ON() inside the hash function. Another point is that async pf generally does not allow concurrency above ASYNC_PF_PER_VCPU anyway (see kvm_setup_async_pf()), so it does not make much sense for it not to be a power of two; otherwise some of the entries will definitely be wasted.

Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20200416155859.267366-1-peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
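
A hedged sketch of how such a compile-time check can sit inside the hash function; the hash body shown here is illustrative, not necessarily the one used in x86.c:

    /* Sketch only: reject a non-power-of-two ASYNC_PF_PER_VCPU at build time. */
    static inline u32 async_pf_hash_fn(gfn_t gfn)
    {
            BUILD_BUG_ON(!is_power_of_2(ASYNC_PF_PER_VCPU));

            return hash_32((u32)gfn, order_base_2(ASYNC_PF_PER_VCPU));
    }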
-
Committed by Uros Bizjak

POP [mem] defaults to the word size, and the only legal non-default size is 16 bits, e.g. a 32-bit POP will #UD in 64-bit mode and vice versa, so there is no need to use the __ASM_SIZE macro to force the operating mode.

Changes since v1:
- Fix commit message.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Message-Id: <20200427205035.1594232-1-ubizjak@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Add a helper, mmu_alloc_root(), to consolidate the allocation of a root shadow page, which has the same basic mechanics for all flavors of TDP and shadow paging. Note, __pa(sp->spt) doesn't need to be protected by mmu_lock, as sp->spt points at a kernel page.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428023714.31923-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
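
A hedged sketch of the shape such a consolidation helper could take; the argument list and the exact body are assumptions for illustration:

    /*
     * Sketch only: look up/allocate the root shadow page under mmu_lock and
     * hand back its physical address.  __pa(sp->spt) is computed outside the
     * lock, which is fine because sp->spt points at a kernel page.
     */
    static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva,
                                u8 level, bool direct)
    {
            struct kvm_mmu_page *sp;

            spin_lock(&vcpu->kvm->mmu_lock);
            sp = kvm_mmu_get_page(vcpu, gfn, gva, level, direct, ACC_ALL);
            ++sp->root_count;
            spin_unlock(&vcpu->kvm->mmu_lock);

            return __pa(sp->spt);
    }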
-
Committed by Sean Christopherson

Replace KVM's PT_PAGE_TABLE_LEVEL, PT_DIRECTORY_LEVEL and PT_PDPE_LEVEL with the kernel's PG_LEVEL_4K, PG_LEVEL_2M and PG_LEVEL_1G. KVM's enums are borderline impossible to remember and result in code that is visually difficult to audit, e.g.

    if (!enable_ept)
            ept_lpage_level = 0;
    else if (cpu_has_vmx_ept_1g_page())
            ept_lpage_level = PT_PDPE_LEVEL;
    else if (cpu_has_vmx_ept_2m_page())
            ept_lpage_level = PT_DIRECTORY_LEVEL;
    else
            ept_lpage_level = PT_PAGE_TABLE_LEVEL;

versus

    if (!enable_ept)
            ept_lpage_level = 0;
    else if (cpu_has_vmx_ept_1g_page())
            ept_lpage_level = PG_LEVEL_1G;
    else if (cpu_has_vmx_ept_2m_page())
            ept_lpage_level = PG_LEVEL_2M;
    else
            ept_lpage_level = PG_LEVEL_4K;

No functional change intended.

Suggested-by: Barret Rhoden <brho@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-4-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Rename PT_MAX_HUGEPAGE_LEVEL to KVM_MAX_HUGEPAGE_LEVEL and make it a separate define in anticipation of dropping KVM's PT_*_LEVEL enums in favor of the kernel's PG_LEVEL_* enums.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Change the PSE hugepage handling in walk_addr_generic() to fire on any page level greater than PT_PAGE_TABLE_LEVEL, a.k.a. PG_LEVEL_4K. PSE paging only has two levels, so "== 2" and "> 1" are functionally the same, i.e. this is a nop. A future patch will drop KVM's PT_*_LEVEL enums in favor of the kernel's PG_LEVEL_* enums, at which point "walker->level == PG_LEVEL_2M" is semantically incorrect (though still functionally ok).

No functional change intended.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Xiaoyao Li

vcpu->arch.guest_xstate_size lost its only user since commit df1daba7 ("KVM: x86: support XSAVES usage in the host"), so clean it up.

Signed-off-by: Xiaoyao Li <xiaoyao.li@intel.com>
Message-Id: <20200429154312.1411-1-xiaoyao.li@intel.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Use an enum for passing around the failure code for a failed VM-Enter that results in VM-Exit, to provide a level of indirection from the final resting place of the failure code, vmcs.EXIT_QUALIFICATION. The exit qualification field is an unsigned long, e.g. passing around 'u32 exit_qual' throws up red flags as it suggests KVM may be dropping bits when reporting errors to L1. This is a red herring because the only defined failure codes are 0, 2, 3, and 4, i.e. they don't come remotely close to overflowing a u32.

Setting vmcs.EXIT_QUALIFICATION on entry failure is further complicated by the MSR load list, which returns the (1-based) entry that failed, and the number of MSRs to load is a 32-bit VMCS field. At first blush, it would appear that overflowing a u32 is possible, but the number of MSRs that can be loaded is hardcapped at 4096 (limited by MSR_IA32_VMX_MISC).

In other words, there are two completely disparate types of data that eventually get stuffed into vmcs.EXIT_QUALIFICATION, neither of which is an 'unsigned long' in nature. This was presumably the reasoning for switching to 'u32' when the related code was refactored in commit ca0bde28 ("kvm: nVMX: Split VMCS checks from nested_vmx_run()").

Using an enum for the failure code addresses the technically-possible-but-will-never-happen scenario where Intel defines a failure code that doesn't fit in a 32-bit integer. The enum variables and values will either be automatically sized (gcc 5.4 behavior) or be subjected to some combination of truncation. The former case will simply work, while the latter will trigger a compile-time warning unless the compiler is being particularly unhelpful.

Separating the failure code from the failed MSR entry allows for disassociating both from vmcs.EXIT_QUALIFICATION, which avoids the conundrum where KVM has to choose between 'u32 exit_qual' and tracking values as 'unsigned long' that have no business being tracked as such. To cement the split, set vmcs12->exit_qualification directly from the entry error code or failed MSR index instead of bouncing through a local variable.

Opportunistically rename the variables in load_vmcs12_host_state() and vmx_set_nested_state() to call out that they're ignored, set exit_reason on demand on nested VM-Enter failure, and add a comment in nested_vmx_load_msr() to call out that returning 'i + 1' can't wrap.

No functional change intended.

Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Jim Mattson <jmattson@google.com>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200511220529.11402-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
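
For reference, a hedged sketch of such an enum built from the failure codes the commit names (0, 2, 3 and 4); the identifier names are illustrative assumptions:

    /* Sketch only: named VM-Enter failure codes instead of a raw 'u32 exit_qual'. */
    enum vm_entry_failure_code {
            ENTRY_FAIL_DEFAULT              = 0,
            ENTRY_FAIL_PDPTE                = 2,
            ENTRY_FAIL_NMI                  = 3,
            ENTRY_FAIL_VMCS_LINK_PTR        = 4,
    };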
-
- 14 May 2020, 22 commits
Committed by Sean Christopherson

Snapshot the TDP level now that it's invariant (SVM) or dependent only on host capabilities and guest CPUID (VMX). This avoids having to call kvm_x86_ops.get_tdp_level() when initializing a TDP MMU and/or calculating the page role, and thus avoids the associated retpoline. Drop the WARN in vmx_get_tdp_level() as updating CPUID while L2 is active is legal, if dodgy.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
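
A minimal sketch of the snapshotting idea, assuming a tdp_level field in kvm_vcpu_arch and a refresh wherever guest CPUID changes (both assumptions for illustration):

    /* Sketch only: cache the vendor-specific TDP level once, read it cheaply later. */
    static void kvm_update_tdp_level(struct kvm_vcpu *vcpu)
    {
            vcpu->arch.tdp_level = kvm_x86_ops.get_tdp_level(vcpu);
    }

    /* MMU setup can then read vcpu->arch.tdp_level instead of taking a retpolined indirect call. */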
-
Committed by Sean Christopherson

Separate the "core" TDP level handling from the nested EPT path to make it clear that kvm_x86_ops.get_tdp_level() is used if and only if nested EPT is not in use (kvm_init_shadow_ept_mmu() calculates the level from the passed-in vmcs12->eptp). Add a WARN_ON() to enforce that the kvm_x86_ops hook is not called for nested EPT.

This sets the stage for snapshotting the non-"nested EPT" TDP page level during kvm_cpuid_update() to avoid the retpoline associated with kvm_x86_ops.get_tdp_level() when resetting the MMU, a relatively frequent operation when running a nested guest.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-10-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Move CR0 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0() is called multiple times in a single VM-Exit, and more importantly eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR0, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr0_guest_bits".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
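
A hedged sketch of how a regs_avail-style availability check can gate the expensive read; VCPU_EXREG_CR0 and the helper names are assumptions for illustration:

    /*
     * Sketch only: ask the vendor code (a VMREAD on VMX) for the guest-owned
     * CR0 bits only when the cached copy is marked unavailable.
     */
    static inline unsigned long kvm_read_cr0_bits(struct kvm_vcpu *vcpu,
                                                  unsigned long mask)
    {
            unsigned long tmask = mask & KVM_POSSIBLE_CR0_GUEST_BITS;

            if ((tmask & vcpu->arch.cr0_guest_owned_bits) &&
                !kvm_register_is_available(vcpu, VCPU_EXREG_CR0))
                    kvm_x86_ops.cache_reg(vcpu, VCPU_EXREG_CR0);

            return vcpu->arch.cr0 & mask;
    }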
-
Committed by Sean Christopherson

Move CR4 caching into the standard register caching mechanism in order to take advantage of the availability checks provided by regs_avail. This avoids multiple VMREADs and retpolines (when configured) during nested VMX transitions, as kvm_read_cr4_bits() is invoked multiple times on each transition, e.g. when stuffing CR0 and CR3. As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading CR4, and squashes the confusing naming discrepancy of "cache_reg" vs. "decache_cr4_guest_bits".

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Unconditionally check the validity of the incoming CR3 during nested VM-Enter/VM-Exit to avoid invoking kvm_read_cr3() in the common case where the guest isn't using PAE paging. If vmcs.GUEST_CR3 hasn't yet been cached (common case), kvm_read_cr3() will trigger a VMREAD. The VMREAD (~30 cycles) alone is likely slower than nested_cr3_valid() (~5 cycles if vcpu->arch.maxphyaddr gets a cache hit), and the poor exchange only gets worse when retpolines are enabled, as the call to kvm_x86_ops.cache_reg() will incur a retpoline (60+ cycles).

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops hook read_l1_tsc_offset(). This avoids a retpoline (when configured) when reading L1's effective TSC, which is done at least once on every VM-Exit.

No functional change intended.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Skip the Indirect Branch Prediction Barrier that is triggered on a VMCS switch when temporarily loading vmcs02 to synchronize it to vmcs12, i.e. give copy_vmcs02_to_vmcs12_rare() the same treatment as vmx_switch_vmcs(). Make vmx_vcpu_load() static now that it's only referenced within vmx.c.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200506235850.22600-3-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Skip the Indirect Branch Prediction Barrier that is triggered on a VMCS switch when running with spectre_v2_user=on/auto if the switch is between two VMCSes in the same guest, i.e. between vmcs01 and vmcs02. The IBPB is intended to prevent one guest from attacking another, which is unnecessary in the nested case as it's the same guest from KVM's perspective.

This all but eliminates the overhead observed for nested VMX transitions when running with CONFIG_RETPOLINE=y and spectre_v2_user=on/auto, which can be significant, e.g. roughly 3x on current systems.

Reported-by: Alexander Graf <graf@amazon.com>
Cc: KarimAllah Raslan <karahmed@amazon.de>
Cc: stable@vger.kernel.org
Fixes: 15d45071 ("KVM/x86: Add IBPB support")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200501163117.4655-1-sean.j.christopherson@intel.com>
[Invert direction of bool argument. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
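
A hedged sketch of the idea: only issue the barrier when the outgoing VMCS is not known to belong to the same guest. The helper name and parameters are assumptions; the real logic lives in the VMCS load path:

    /*
     * Sketch only: an IBPB is needed to separate *different* guests sharing a
     * pCPU, not a vmcs01 <-> vmcs02 switch within one vCPU.
     */
    static void maybe_ibpb_on_vmcs_switch(struct vmcs *prev, struct vmcs *same_guest_buddy)
    {
            if (!same_guest_buddy || same_guest_buddy != prev)
                    indirect_branch_prediction_barrier();
    }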
-
Committed by Sean Christopherson

Use vmx_get_intr_info() when grabbing the cached vmcs.INTR_INFO in handle_exception_nmi() to ensure the cache isn't stale. Bypassing the caching accessor doesn't cause any known issues, as the cache is always refreshed by handle_exception_nmi_irqoff(), but the whole point of adding the proper caching mechanism was to avoid such dependencies.

Fixes: 87915858 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags")
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200427171837.22613-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

KVM is not handling the case where EIP wraps around the 32-bit address space (that is, outside long mode). This is needed both in vmx.c and in emulate.c. SVM with NRIPS is okay, but it can still print an error to dmesg due to integer overflow.

Reported-by: Nick Peterson <everdox@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
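
The wrap-around itself is easy to illustrate: outside long mode the instruction pointer must be truncated to 32 bits after skipping an instruction, otherwise it silently grows past 4 GiB. A standalone sketch, not the actual vmx.c/emulate.c change:

    #include <stdint.h>

    /* Illustration only: advance RIP/EIP, honoring 32-bit wrap-around outside long mode. */
    static uint64_t advance_rip(uint64_t rip, unsigned int insn_len, int long_mode)
    {
            rip += insn_len;
            return long_mode ? rip : (uint32_t)rip; /* e.g. 0xffffffff + 1 wraps to 0 */
    }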
-
Committed by Jason Yan

The '==' expression itself is bool, no need to convert it to bool again. This fixes the following coccicheck warning:

    virt/kvm/eventfd.c:724:38-43: WARNING: conversion to bool not needed here

Signed-off-by: Jason Yan <yanaijie@huawei.com>
Message-Id: <20200420123805.4494-1-yanaijie@huawei.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Davidlohr Bueso

The use of any sort of waitqueue (simple or regular) for waiting/waking vcpus has always been overkill and semantically wrong. Because this is per-vcpu (which is blocked) there is only ever a single waiting vcpu, thus no need for any sort of queue.

As such, make use of the rcuwait primitive, with the following considerations:

- rcuwait already provides the proper barriers that serialize concurrent waiter and waker.
- Task wakeup is done in an rcu read critical region, with a stable task pointer.
- Because there is no concurrency among waiters, we need not worry about rcuwait_wait_event() calls corrupting the wait->task. As a consequence, this saves the locking done in swait when modifying the queue. This also applies to the per-vcore wait for powerpc kvm-hv.

The x86 tscdeadline_latency test mentioned in 8577370f ("KVM: Use simple waitqueue for vcpu->wq") shows that, on average, latency is reduced by around 15-20% with this change.

Cc: Paul Mackerras <paulus@ozlabs.org>
Cc: kvmarm@lists.cs.columbia.edu
Cc: linux-mips@vger.kernel.org
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-6-dave@stgolabs.net>
[Avoid extra logic changes. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
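
A hedged sketch of the waiter/waker pairing with rcuwait, assuming the three-argument rcuwait_wait_event(wait, cond, state) form; the structure and wake-up condition are illustrative, not KVM's actual fields:

    #include <linux/rcuwait.h>
    #include <linux/sched.h>

    struct vcpu_wait_sketch {
            struct rcuwait  wait;           /* single blocked task, no queue needed */
            bool            wakeup_pending;
    };

    /* Waiter side (the vCPU thread): block until someone signals a wakeup. */
    static void vcpu_block_sketch(struct vcpu_wait_sketch *v)
    {
            rcuwait_wait_event(&v->wait, READ_ONCE(v->wakeup_pending),
                               TASK_INTERRUPTIBLE);
    }

    /* Waker side: wake the single blocked task, if any, under RCU. */
    static void vcpu_kick_sketch(struct vcpu_wait_sketch *v)
    {
            WRITE_ONCE(v->wakeup_pending, true);
            rcuwait_wake_up(&v->wait);
    }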
-
Committed by Davidlohr Bueso

This call is lockless and thus should not be trusted blindly. For example, the return value of rcuwait_wakeup() should be used to track whether a process was woken up.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-5-dave@stgolabs.net>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Davidlohr Bueso

This allows further flexibility for some callers to implement ad-hoc versions of the generic rcuwait_wait_event(). For example, kvm will need this to maintain tracing semantics. The naming is of course similar to what the waitqueue APIs offer.

Also go ahead and make use of rcu_assign_pointer() for both task writes, as it will make the __rcu sparse people happy - this will be the special nil case, thus no added serialization.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-4-dave@stgolabs.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Davidlohr Bueso

Propagating the return value of wake_up_process() back to the caller can come in handy for future users, such as for statistics or accounting purposes.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-3-dave@stgolabs.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Davidlohr Bueso

The 'trywake' name was renamed to simply 'wake'; update the comment.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Message-Id: <20200424054837.5138-2-dave@stgolabs.net>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

Add an argument to interrupt_allowed and nmi_allowed to check whether interrupt injection is blocked. Use the hook to handle the case where an interrupt arrives between check_nested_events() and the injection logic. Drop the retry of check_nested_events() that hack-a-fixed the same condition.

Blocking injection is also a bit of a hack, e.g. KVM should do exiting and non-exiting interrupt processing in a single pass, but it's a more precise hack. The old comment is also misleading, e.g. KVM_REQ_EVENT is purely an optimization; setting it on every run loop (which KVM doesn't do) should not affect functionality, only performance.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com>
[Extend to SVM, add SMI and NMI. Even though NMI and SMI cannot come asynchronously right now, making the fix generic is easy and removes a special case. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
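
A hedged sketch of what the extended vendor hooks could look like; only the fact that an argument was added comes from the commit text, while the for_injection name and the bool return type are assumptions:

    /*
     * Sketch only: the extra flag distinguishes "can this event be injected
     * right now?" from "is this event class blocked at all?".
     */
    struct kvm_x86_event_hooks_sketch {
            bool (*interrupt_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
            bool (*nmi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
            bool (*smi_allowed)(struct kvm_vcpu *vcpu, bool for_injection);
    };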
-
Committed by Sean Christopherson

Use vmx_get_rflags() instead of manually reading vmcs.GUEST_RFLAGS when querying RFLAGS.IF so that multiple checks against interrupt blocking in a single run loop only require a single VMREAD.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-14-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

Use vmx_interrupt_blocked() instead of bouncing through vmx_interrupt_allowed() when handling edge cases in vmx_handle_exit(). The nested_run_pending check in vmx_interrupt_allowed() should never evaluate true in the VM-Exit path. Hoist the WARN in handle_invalid_guest_state() up to vmx_handle_exit() to enforce the above assumption for the !enable_vnmi case, and to detect any other potential bugs with nested VM-Enter.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-12-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson

WARN if a pending exception is coincident with an injected exception before calling check_nested_events(), so that the WARN will fire even if inject_pending_event() bails early because check_nested_events() detects the conflict. Bailing early isn't problematic (quite the opposite), but suppressing the WARN is undesirable as it could mask a bug elsewhere in KVM.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-11-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

Short-circuit vmx_check_nested_events() if an unblocked IRQ/NMI/SMI is pending and needs to be injected into L2; priority between coincident events is not dependent on exiting behavior.

Fixes: b518ba9f ("KVM: nSVM: implement check_nested_events for interrupts")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini

Report interrupts as allowed when the vCPU is in L2 and L2 is being run with exit-on-interrupts enabled and EFLAGS.IF=1 (either on the host or on the guest according to VINTR). Interrupts are always unblocked from L1's perspective in this case.

While moving nested_exit_on_intr to svm.h, use INTERCEPT_INTR properly instead of assuming it's zero (which it is of course).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-