提交 · 6758034e4d6a7f0e26b748789ab1f83f3116d1b9 · openeuler / Kernel

18 6月, 2020 1 次提交

x86/process/64: Make save_fsgs_for_kvm() ready for FSGSBASE · 6758034e

由 Thomas Gleixner 提交于 5月 28, 2020

save_fsgs_for_kvm() is invoked via

  vcpu_enter_guest()
    kvm_x86_ops.prepare_guest_switch(vcpu)
      vmx_prepare_switch_to_guest()
        save_fsgs_for_kvm()

with preemption disabled, but interrupts enabled.

The upcoming FSGSBASE based GS safe needs interrupts to be disabled. This
could be done in the helper function, but that function is also called from
switch_to() which has interrupts disabled already.

Disable interrupts inside save_fsgs_for_kvm() and rename the function to
current_save_fsgs() so it can be invoked from other places.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NSasha Levin <sashal@kernel.org>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200528201402.1708239-7-sashal@kernel.org

6758034e

11 6月, 2020 1 次提交

x86/entry: Convert Machine Check to IDTENTRY_IST · 8cd501c1

由 Thomas Gleixner 提交于 2月 25, 2020

Convert #MC to IDTENTRY_MCE:
  - Implement the C entry points with DEFINE_IDTENTRY_MCE
  - Emit the ASM stub with DECLARE_IDTENTRY_MCE
  - Remove the ASM idtentry in 64bit
  - Remove the open coded ASM entry code in 32bit
  - Fixup the XEN/PV code
  - Remove the old prototypes
  - Remove the error code from *machine_check_vector() as
    it is always 0 and not used by any of the functions
    it can point to. Fixup all the functions as well.

No functional change.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NAlexandre Chartre <alexandre.chartre@oracle.com>
Acked-by: NPeter Zijlstra <peterz@infradead.org>
Acked-by: NAndy Lutomirski <luto@kernel.org>
Link: https://lkml.kernel.org/r/20200505135314.334980426@linutronix.de

8cd501c1

08 6月, 2020 1 次提交

KVM: VMX: Properly handle kvm_read/write_guest_virt*() result · 7a35e515

由 Vitaly Kuznetsov 提交于 6月 05, 2020

Syzbot reports the following issue:

WARNING: CPU: 0 PID: 6819 at arch/x86/kvm/x86.c:618
 kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
...
Call Trace:
...
RIP: 0010:kvm_inject_emulated_page_fault+0x210/0x290 arch/x86/kvm/x86.c:618
...
 nested_vmx_get_vmptr+0x1f9/0x2a0 arch/x86/kvm/vmx/nested.c:4638
 handle_vmon arch/x86/kvm/vmx/nested.c:4767 [inline]
 handle_vmon+0x168/0x3a0 arch/x86/kvm/vmx/nested.c:4728
 vmx_handle_exit+0x29c/0x1260 arch/x86/kvm/vmx/vmx.c:6067

'exception' we're trying to inject with kvm_inject_emulated_page_fault()
comes from:

  nested_vmx_get_vmptr()
   kvm_read_guest_virt()
     kvm_read_guest_virt_helper()
       vcpu->arch.walk_mmu->gva_to_gpa()

but it is only set when GVA to GPA conversion fails. In case it doesn't but
we still fail kvm_vcpu_read_guest_page(), X86EMUL_IO_NEEDED is returned and
nested_vmx_get_vmptr() calls kvm_inject_emulated_page_fault() with zeroed
'exception'. This happen when the argument is MMIO.

Paolo also noticed that nested_vmx_get_vmptr() is not the only place in
KVM code where kvm_read/write_guest_virt*() return result is mishandled.
VMX instructions along with INVPCID have the same issue. This was already
noticed before, e.g. see commit 541ab2ae ("KVM: x86: work around
leak of uninitialized stack contents") but was never fully fixed.

KVM could've handled the request correctly by going to userspace and
performing I/O but there doesn't seem to be a good need for such requests
in the first place.

Introduce vmx_handle_memory_failure() as an interim solution.

Note, nested_vmx_get_vmptr() now has three possible outcomes: OK, PF,
KVM_EXIT_INTERNAL_ERROR and callers need to know if userspace exit is
needed (for KVM_EXIT_INTERNAL_ERROR) in case of failure. We don't seem
to have a good enum describing this tristate, just add "int *ret" to
nested_vmx_get_vmptr() interface to pass the information.

Reported-by: syzbot+2a7156e11dc199bdbd8a@syzkaller.appspotmail.com
Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200605115906.532682-1-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7a35e515

05 6月, 2020 1 次提交

KVM: x86: Move MPK feature detection to common code · fa44b82e

由 Babu Moger 提交于 5月 12, 2020

Both Intel and AMD support (MPK) Memory Protection Key feature.
Move the feature detection from VMX to the common code. It should
work for both the platforms now.
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Message-Id: <158932795627.44260.15144185478040178638.stgit@naples-babu.amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fa44b82e

01 6月, 2020 2 次提交

KVM: x86/pmu: Support full width counting · 27461da3

由 Like Xu 提交于 5月 29, 2020

Intel CPUs have a new alternative MSR range (starting from MSR_IA32_PMC0)
for GP counters that allows writing the full counter width. Enable this
range from a new capability bit (IA32_PERF_CAPABILITIES.FW_WRITE[bit 13]).

The guest would query CPUID to get the counter width, and sign extends
the counter values as needed. The traditional MSRs always limit to 32bit,
even though the counter internally is larger (48 or 57 bits).

When the new capability is set, use the alternative range which do not
have these restrictions. This lowers the overhead of perf stat slightly
because it has to do less interrupts to accumulate the counter value.
Signed-off-by: NLike Xu <like.xu@linux.intel.com>
Message-Id: <20200529074347.124619-3-like.xu@linux.intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

27461da3

KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info · 68fd66f1

由 Vitaly Kuznetsov 提交于 5月 25, 2020

Currently, APF mechanism relies on the #PF abuse where the token is being
passed through CR2. If we switch to using interrupts to deliver page-ready
notifications we need a different way to pass the data. Extent the existing
'struct kvm_vcpu_pv_apf_data' with token information for page-ready
notifications.

While on it, rename 'reason' to 'flags'. This doesn't change the semantics
as we only have reasons '1' and '2' and these can be treated as bit flags
but KVM_PV_REASON_PAGE_READY is going away with interrupt based delivery
making 'reason' name misleading.

The newly introduced apf_put_user_ready() temporary puts both flags and
token information, this will be changed to put token only when we switch
to interrupt based notifications.
Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200525144125.143875-3-vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

68fd66f1

28 5月, 2020 6 次提交

KVM: nVMX: always update CR3 in VMCS · df7e0681

由 Paolo Bonzini 提交于 5月 20, 2020

vmx_load_mmu_pgd is delaying the write of GUEST_CR3 to prepare_vmcs02 as
an optimization, but this is only correct before the nested vmentry.
If userspace is modifying CR3 with KVM_SET_SREGS after the VM has
already been put in guest mode, the value of CR3 will not be updated.
Remove the optimization, which almost never triggers anyway.

Fixes: 04f11ef4 ("KVM: nVMX: Always write vmcs02.GUEST_CR3 during nested VM-Enter")
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

df7e0681

KVM: x86: enable event window in inject_pending_event · c9d40913

由 Paolo Bonzini 提交于 5月 22, 2020

In case an interrupt arrives after nested.check_events but before the
call to kvm_cpu_has_injectable_intr, we could end up enabling the interrupt
window even if the interrupt is actually going to be a vmexit.  This is
useless rather than harmful, but it really complicates reasoning about
SVM's handling of the VINTR intercept.  We'd like to never bother with
the VINTR intercept if V_INTR_MASKING=1 && INTERCEPT_INTR=1, because in
that case there is no interrupt window and we can just exit the nested
guest whenever we want.

This patch moves the opening of the interrupt window inside
inject_pending_event.  This consolidates the check for pending
interrupt/NMI/SMI in one place, and makes KVM's usage of immediate
exits more consistent, extending it beyond just nested virtualization.

There are two functional changes here.  They only affect corner cases,
but overall they simplify the inject_pending_event.

- re-injection of still-pending events will also use req_immediate_exit
instead of using interrupt-window intercepts.  This should have no impact
on performance on Intel since it simply replaces an interrupt-window
or NMI-window exit for a preemption-timer exit.  On AMD, which has no
equivalent of the preemption time, it may incur some overhead but an
actual effect on performance should only be visible in pathological cases.

- kvm_arch_interrupt_allowed and kvm_vcpu_has_events will return true
if an interrupt, NMI or SMI is blocked by nested_run_pending.  This
makes sense because entering the VM will allow it to make progress
and deliver the event.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c9d40913

KVM: VMX: replace "fall through" with "return" to indicate different case · a8cfbae5

由 Miaohe Lin 提交于 2月 19, 2020

The second "/* fall through */" in rmode_exception() makes code harder to
read. Replace it with "return" to indicate they are different cases, only
the #DB and #BP check vcpu->guest_debug, while others don't care. And this
also improves the readability.
Suggested-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com>
Message-Id: <1582080348-20827-1-git-send-email-linmiaohe@huawei.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a8cfbae5

KVM: x86: Take an unsigned 32-bit int for has_emulated_msr()'s index · cb97c2d6

由 Sean Christopherson 提交于 2月 18, 2020

Take a u32 for the index in has_emulated_msr() to match hardware, which
treats MSR indices as unsigned 32-bit values. Functionally, taking a
signed int doesn't cause problems with the current code base, but could
theoretically cause problems with 32-bit KVM, e.g. if the index were
checked via a less-than statement, which would evaluate incorrectly for
MSR indices with bit 31 set.
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200218234012.7110-3-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cb97c2d6

KVM: x86: simplify is_mmio_spte · e7581cac

由 Paolo Bonzini 提交于 5月 19, 2020

We can simply look at bits 52-53 to identify MMIO entries in KVM's page
tables. Therefore, there is no need to pass a mask to kvm_mmu_set_mmio_spte_mask.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e7581cac

KVM: VMX: enable X86_FEATURE_WAITPKG in KVM capabilities · 0abcc8f6

由 Maxim Levitsky 提交于 5月 23, 2020

Even though we might not allow the guest to use WAITPKG's new
instructions, we should tell KVM that the feature is supported by the
host CPU.

Note that vmx_waitpkg_supported checks that WAITPKG _can_ be set in
secondary execution controls as specified by VMX capability MSR, rather
that we actually enable it for a guest.

Cc: stable@vger.kernel.org
Fixes: e69e72fa ("KVM: x86: Add support for user wait instructions")
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200523161455.3940-2-mlevitsk@redhat.com>
Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0abcc8f6

16 5月, 2020 10 次提交

KVM: nVMX: Migrate the VMX-preemption timer · 93dff2fe

由 Jim Mattson 提交于 5月 08, 2020

The hrtimer used to emulate the VMX-preemption timer must be pinned to
the same logical processor as the vCPU thread to be interrupted if we
want to have any hope of adhering to the architectural specification
of the VMX-preemption timer. Even with this change, the emulated
VMX-preemption timer VM-exit occasionally arrives too late.
Signed-off-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NPeter Shier <pshier@google.com>
Reviewed-by: NOliver Upton <oupton@google.com>
Message-Id: <20200508203643.85477-4-jmattson@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

93dff2fe

KVM: nVMX: Remove unused 'ops' param from nested_vmx_hardware_setup() · 6c1c6e58

由 Sean Christopherson 提交于 5月 06, 2020

Remove a 'struct kvm_x86_ops' param that got left behind when the nested
ops were moved to their own struct.

Fixes: 33b22172 ("KVM: x86: move nested-related kvm_x86_ops to a separate struct")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200506204653.14683-1-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6c1c6e58

KVM: VMX: Handle preemption timer fastpath · 26efe2fd

由 Wanpeng Li 提交于 5月 06, 2020

This patch implements a fastpath for the preemption timer vmexit.  The vmexit
can be handled quickly so it can be performed with interrupts off and going
back directly to the guest.

Testing on SKX Server.

cyclictest in guest(w/o mwait exposed, adaptive advance lapic timer is default -1):

5540.5ns -> 4602ns       17%

kvm-unit-test/vmexit.flat:

w/o avanced timer:
tscdeadline_immed: 3028.5  -> 2494.75  17.6%
tscdeadline:       5765.7  -> 5285      8.3%

w/ adaptive advance timer default -1:
tscdeadline_immed: 3123.75 -> 2583     17.3%
tscdeadline:       4663.75 -> 4537      2.7%
Tested-by: NHaiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-8-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

26efe2fd

KVM: x86: introduce kvm_can_use_hv_timer · 199a8b84

由 Paolo Bonzini 提交于 5月 05, 2020

Replace the ad hoc test in vmx_set_hv_timer with a test in the caller,
start_hv_timer. This test is not Intel-specific and would be duplicated
when introducing the fast path for the TSC deadline MSR.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

199a8b84

KVM: VMX: Optimize posted-interrupt delivery for timer fastpath · 379a3c8e

由 Wanpeng Li 提交于 4月 28, 2020

While optimizing posted-interrupt delivery especially for the timer
fastpath scenario, I measured kvm_x86_ops.deliver_posted_interrupt()
to introduce substantial latency because the processor has to perform
all vmentry tasks, ack the posted interrupt notification vector,
read the posted-interrupt descriptor etc.

This is not only slow, it is also unnecessary when delivering an
interrupt to the current CPU (as is the case for the LAPIC timer) because
PIR->IRR and IRR->RVI synchronization is already performed on vmentry
Therefore skip kvm_vcpu_trigger_posted_interrupt in this case, and
instead do vmx_sync_pir_to_irr() on the EXIT_FASTPATH_REENTER_GUEST
fastpath as well.
Tested-by: NHaiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-6-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

379a3c8e

KVM: X86: Introduce more exit_fastpath_completion enum values · 404d5d7b

由 Wanpeng Li 提交于 4月 28, 2020

Adds a fastpath_t typedef since enum lines are a bit long, and replace
EXIT_FASTPATH_SKIP_EMUL_INS with two new exit_fastpath_completion enum values.

- EXIT_FASTPATH_EXIT_HANDLED  kvm will still go through it's full run loop,
                              but it would skip invoking the exit handler.

- EXIT_FASTPATH_REENTER_GUEST complete fastpath, guest can be re-entered
                              without invoking the exit handler or going
                              back to vcpu_run
Tested-by: NHaiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Message-Id: <1588055009-12677-4-git-send-email-wanpengli@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

404d5d7b

KVM: VMX: Introduce generic fastpath handler · dcf068da

由 Wanpeng Li 提交于 4月 28, 2020

Introduce generic fastpath handler to handle MSR fastpath, VMX-preemption
timer fastpath etc; move it after vmx_complete_interrupts() in order to
catch events delivered to the guest, and abort the fast path in later
patches.  While at it, move the kvm_exit tracepoint so that it is printed
for fastpath vmexits as well.

There is no observed performance effect for the IPI fastpath after this patch.
Tested-by: NHaiwei Li <lihaiwei@tencent.com>
Cc: Haiwei Li <lihaiwei@tencent.com>
Signed-off-by: NWanpeng Li <wanpengli@tencent.com>
Suggested-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <1588055009-12677-2-git-send-email-wanpengli@tencent.com>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dcf068da

KVM: nVMX: Truncate writes to vmcs.SYSENTER_EIP/ESP for 32-bit vCPU · 2408500d

由 Sean Christopherson 提交于 4月 28, 2020

Explicitly truncate the data written to vmcs.SYSENTER_EIP/ESP on WRMSR
if the virtual CPU doesn't support 64-bit mode. The SYSENTER address
fields in the VMCS are natural width, i.e. bits 63:32 are dropped if the
CPU doesn't support Intel 64 architectures. This behavior is visible to
the guest after a VM-Exit/VM-Exit roundtrip, e.g. if the guest sets bits
63:32 in the actual MSR.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428231025.12766-2-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2408500d

KVM: VMX: Improve handle_external_interrupt_irqoff inline assembly · 551896e0

由 Uros Bizjak 提交于 5月 04, 2020

Improve handle_external_interrupt_irqoff inline assembly in several ways:
- remove unneeded %c operand modifiers and "$" prefixes
- use %rsp instead of _ASM_SP, since we are in CONFIG_X86_64 part
- use $-16 immediate to align %rsp
- remove unneeded use of __ASM_SIZE macro
- define "ss" named operand only for X86_64

The patch introduces no functional changes.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: NUros Bizjak <ubizjak@gmail.com>
Message-Id: <20200504155706.2516956-1-ubizjak@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

551896e0

KVM: x86/mmu: Drop KVM's hugepage enums in favor of the kernel's enums · 3bae0459

由 Sean Christopherson 提交于 4月 27, 2020

Replace KVM's PT_PAGE_TABLE_LEVEL, PT_DIRECTORY_LEVEL and PT_PDPE_LEVEL
with the kernel's PG_LEVEL_4K, PG_LEVEL_2M and PG_LEVEL_1G.  KVM's
enums are borderline impossible to remember and result in code that is
visually difficult to audit, e.g.

        if (!enable_ept)
                ept_lpage_level = 0;
        else if (cpu_has_vmx_ept_1g_page())
                ept_lpage_level = PT_PDPE_LEVEL;
        else if (cpu_has_vmx_ept_2m_page())
                ept_lpage_level = PT_DIRECTORY_LEVEL;
        else
                ept_lpage_level = PT_PAGE_TABLE_LEVEL;

versus

        if (!enable_ept)
                ept_lpage_level = 0;
        else if (cpu_has_vmx_ept_1g_page())
                ept_lpage_level = PG_LEVEL_1G;
        else if (cpu_has_vmx_ept_2m_page())
                ept_lpage_level = PG_LEVEL_2M;
        else
                ept_lpage_level = PG_LEVEL_4K;

No functional change intended.
Suggested-by: NBarret Rhoden <brho@google.com>
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200428005422.4235-4-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

3bae0459

14 5月, 2020 16 次提交

KVM: x86/mmu: Capture TDP level when updating CPUID · e93fd3b3

由 Sean Christopherson 提交于 5月 01, 2020

Snapshot the TDP level now that it's invariant (SVM) or dependent only
on host capabilities and guest CPUID (VMX).  This avoids having to call
kvm_x86_ops.get_tdp_level() when initializing a TDP MMU and/or
calculating the page role, and thus avoids the associated retpoline.

Drop the WARN in vmx_get_tdp_level() as updating CPUID while L2 is
active is legal, if dodgy.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-11-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e93fd3b3

KVM: VMX: Move nested EPT out of kvm_x86_ops.get_tdp_level() hook · 0047fcad

由 Sean Christopherson 提交于 5月 01, 2020

Separate the "core" TDP level handling from the nested EPT path to make
it clear that kvm_x86_ops.get_tdp_level() is used if and only if nested
EPT is not in use (kvm_init_shadow_ept_mmu() calculates the level from
the passed in vmcs12->eptp).  Add a WARN_ON() to enforce that the
kvm_x86_ops hook is not called for nested EPT.

This sets the stage for snapshotting the non-"nested EPT" TDP page level
during kvm_cpuid_update() to avoid the retpoline associated with
kvm_x86_ops.get_tdp_level() when resetting the MMU, a relatively
frequent operation when running a nested guest.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-10-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0047fcad

KVM: VMX: Add proper cache tracking for CR0 · bd31fe49

由 Sean Christopherson 提交于 5月 01, 2020

Move CR0 caching into the standard register caching mechanism in order
to take advantage of the availability checks provided by regs_avail.
This avoids multiple VMREADs in the (uncommon) case where kvm_read_cr0()
is called multiple times in a single VM-Exit, and more importantly
eliminates a kvm_x86_ops hook, saves a retpoline on SVM when reading
CR0, and squashes the confusing naming discrepancy of "cache_reg" vs.
"decache_cr0_guest_bits".

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-8-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

bd31fe49

KVM: VMX: Add proper cache tracking for CR4 · f98c1e77

由 Sean Christopherson 提交于 5月 01, 2020

Move CR4 caching into the standard register caching mechanism in order
to take advantage of the availability checks provided by regs_avail.
This avoids multiple VMREADs and retpolines (when configured) during
nested VMX transitions as kvm_read_cr4_bits() is invoked multiple times
on each transition, e.g. when stuffing CR0 and CR3.

As an added bonus, this eliminates a kvm_x86_ops hook, saves a retpoline
on SVM when reading CR4, and squashes the confusing naming discrepancy
of "cache_reg" vs. "decache_cr4_guest_bits".

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-7-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f98c1e77

KVM: x86: Save L1 TSC offset in 'struct kvm_vcpu_arch' · 56ba77a4

由 Sean Christopherson 提交于 5月 01, 2020

Save L1's TSC offset in 'struct kvm_vcpu_arch' and drop the kvm_x86_ops
hook read_l1_tsc_offset().  This avoids a retpoline (when configured)
when reading L1's effective TSC, which is done at least once on every
VM-Exit.

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200502043234.12481-2-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

56ba77a4

KVM: nVMX: Skip IBPB when temporarily switching between vmcs01 and vmcs02 · 1af1bb05

由 Sean Christopherson 提交于 5月 06, 2020

Skip the Indirect Branch Prediction Barrier that is triggered on a VMCS
switch when temporarily loading vmcs02 to synchronize it to vmcs12, i.e.
give copy_vmcs02_to_vmcs12_rare() the same treatment as
vmx_switch_vmcs().

Make vmx_vcpu_load() static now that it's only referenced within vmx.c.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200506235850.22600-3-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1af1bb05

KVM: nVMX: Skip IBPB when switching between vmcs01 and vmcs02 · 5c911bef

由 Sean Christopherson 提交于 5月 01, 2020

Skip the Indirect Branch Prediction Barrier that is triggered on a VMCS
switch when running with spectre_v2_user=on/auto if the switch is
between two VMCSes in the same guest, i.e. between vmcs01 and vmcs02.
The IBPB is intended to prevent one guest from attacking another, which
is unnecessary in the nested case as it's the same guest from KVM's
perspective.

This all but eliminates the overhead observed for nested VMX transitions
when running with CONFIG_RETPOLINE=y and spectre_v2_user=on/auto, which
can be significant, e.g. roughly 3x on current systems.
Reported-by: NAlexander Graf <graf@amazon.com>
Cc: KarimAllah Raslan <karahmed@amazon.de>
Cc: stable@vger.kernel.org
Fixes: 15d45071 ("KVM/x86: Add IBPB support")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200501163117.4655-1-sean.j.christopherson@intel.com>
[Invert direction of bool argument. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5c911bef

KVM: VMX: Use accessor to read vmcs.INTR_INFO when handling exception · f27ad73a

由 Sean Christopherson 提交于 4月 27, 2020

Use vmx_get_intr_info() when grabbing the cached vmcs.INTR_INFO in
handle_exception_nmi() to ensure the cache isn't stale. Bypassing the
caching accessor doesn't cause any known issues as the cache is always
refreshed by handle_exception_nmi_irqoff(), but the whole point of
adding the proper caching mechanism was to avoid such dependencies.

Fixes: 87915858 ("KVM: VMX: Cache vmcs.EXIT_INTR_INFO using arch avail_reg flags")
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200427171837.22613-1-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f27ad73a

KVM: x86: handle wrap around 32-bit address space · fede8076

由 Paolo Bonzini 提交于 4月 27, 2020

KVM is not handling the case where EIP wraps around the 32-bit address
space (that is, outside long mode).  This is needed both in vmx.c
and in emulate.c.  SVM with NRIPS is okay, but it can still print
an error to dmesg due to integer overflow.
Reported-by: NNick Peterson <everdox@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fede8076

KVM: x86: Replace late check_nested_events() hack with more precise fix · c300ab9f

由 Paolo Bonzini 提交于 4月 23, 2020

Add an argument to interrupt_allowed and nmi_allowed, to checking if
interrupt injection is blocked.  Use the hook to handle the case where
an interrupt arrives between check_nested_events() and the injection
logic.  Drop the retry of check_nested_events() that hack-a-fixed the
same condition.

Blocking injection is also a bit of a hack, e.g. KVM should do exiting
and non-exiting interrupt processing in a single pass, but it's a more
precise hack.  The old comment is also misleading, e.g. KVM_REQ_EVENT is
purely an optimization, setting it on every run loop (which KVM doesn't
do) should not affect functionality, only performance.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-13-sean.j.christopherson@intel.com>
[Extend to SVM, add SMI and NMI.  Even though NMI and SMI cannot come
 asynchronously right now, making the fix generic is easy and removes a
 special case. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c300ab9f

KVM: VMX: Use vmx_get_rflags() to query RFLAGS in vmx_interrupt_blocked() · 7ab0abdb

由 Sean Christopherson 提交于 4月 22, 2020

Use vmx_get_rflags() instead of manually reading vmcs.GUEST_RFLAGS when
querying RFLAGS.IF so that multiple checks against interrupt blocking in
a single run loop only require a single VMREAD.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-14-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7ab0abdb

KVM: VMX: Use vmx_interrupt_blocked() directly from vmx_handle_exit() · db438592

由 Sean Christopherson 提交于 4月 22, 2020

Use vmx_interrupt_blocked() instead of bouncing through
vmx_interrupt_allowed() when handling edge cases in vmx_handle_exit().
The nested_run_pending check in vmx_interrupt_allowed() should never
evaluate true in the VM-Exit path.

Hoist the WARN in handle_invalid_guest_state() up to vmx_handle_exit()
to enforce the above assumption for the !enable_vnmi case, and to detect
any other potential bugs with nested VM-Enter.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-12-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db438592

KVM: VMX: Split out architectural interrupt/NMI blocking checks · 1b660b6b

由 Sean Christopherson 提交于 4月 22, 2020

Move the architectural (non-KVM specific) interrupt/NMI blocking checks
to a separate helper so that they can be used in a future patch by
vmx_check_nested_events().

No functional change intended.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-8-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1b660b6b

KVM: nVMX: Report NMIs as allowed when in L2 and Exit-on-NMI is set · 429ab576

由 Sean Christopherson 提交于 4月 22, 2020

Report NMIs as allowed when the vCPU is in L2 and L2 is being run with
Exit-on-NMI enabled, as NMIs are always unblocked from L1's perspective
in this case.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-7-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

429ab576

KVM: x86: replace is_smm checks with kvm_x86_ops.smi_allowed · a9fa7cb6

由 Paolo Bonzini 提交于 4月 23, 2020

Do not hardcode is_smm so that all the architectural conditions for
blocking SMIs are listed in a single place. Well, in two places because
this introduces some code duplication between Intel and AMD.

This ensures that nested SVM obeys GIF in kvm_vcpu_has_events.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a9fa7cb6

KVM: x86: Make return for {interrupt_nmi,smi}_allowed() a bool instead of int · 88c604b6

由 Sean Christopherson 提交于 4月 22, 2020

Return an actual bool for kvm_x86_ops' {interrupt_nmi}_allowed() hook to
better reflect the return semantics, and to avoid creating an even
bigger mess when the related VMX code is refactored in upcoming patches.
Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20200423022550.15113-5-sean.j.christopherson@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

88c604b6

13 5月, 2020 1 次提交

KVM: x86: Fix pkru save/restore when guest CR4.PKE=0, move it to x86.c · 37486135

由 Babu Moger 提交于 5月 12, 2020

Though rdpkru and wrpkru are contingent upon CR4.PKE, the PKRU
resource isn't. It can be read with XSAVE and written with XRSTOR.
So, if we don't set the guest PKRU value here(kvm_load_guest_xsave_state),
the guest can read the host value.

In case of kvm_load_host_xsave_state, guest with CR4.PKE clear could
potentially use XRSTOR to change the host PKRU value.

While at it, move pkru state save/restore to common code and the
host_pkru field to kvm_vcpu_arch.  This will let SVM support protection keys.

Cc: stable@vger.kernel.org
Reported-by: NJim Mattson <jmattson@google.com>
Signed-off-by: NBabu Moger <babu.moger@amd.com>
Message-Id: <158932794619.44260.14508381096663848853.stgit@naples-babu.amd.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

37486135

08 5月, 2020 1 次提交

KVM: VMX: pass correct DR6 for GD userspace exit · 45981ded

由 Paolo Bonzini 提交于 5月 06, 2020

When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD),
DR6 was incorrectly copied from the value in the VM. Instead,
DR6.BD should be set in order to catch this case.

On AMD this does not need any special code because the processor triggers
a #DB exception that is intercepted. However, the testcase would fail
without the previous patch because both DR6.BS and DR6.BD would be set.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

45981ded

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功