- 15 March 2021: 14 commits
-
-
Committed by Sean Christopherson
WARN if KVM is about to dereference a NULL pae_root or lm_root when loading an MMU, and convert the BUG() on a bad shadow_root_level into a WARN (now that errors are handled cleanly). With nested NPT, botching the level and sending KVM down the wrong path is all too easy, and the on-demand allocation of pae_root and lm_root means bugs crash the host. Obviously, KVM could unconditionally allocate the roots, but that's arguably a worse failure mode as it would potentially corrupt the guest instead of crashing it.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-18-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
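A minimal sketch of the hardening this describes, assuming the checks sit in the shadow-root allocation path (illustrative only; the exact upstream placement and error code may differ):

```c
/*
 * Fail the MMU load with a WARN instead of letting a NULL
 * dereference crash the host.  Field names follow the commit message.
 */
if (WARN_ON_ONCE(!vcpu->arch.mmu->pae_root || !vcpu->arch.mmu->lm_root))
        return -EIO;
```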
-
Committed by Sean Christopherson
For clarity, explicitly skip syncing roots if the MMU load failed instead of relying on the !VALID_PAGE check in kvm_mmu_sync_roots().

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-17-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Unexport the MMU load and unload helpers now that they are no longer used (incorrectly) in vendor code. Opportunistically move the kvm_mmu_sync_roots() declaration into mmu.h; it should not be exposed to vendor code. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-16-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Defer unloading the MMU after an INVPCID until the instruction emulation has completed, i.e. until after RIP has been updated. On VMX, this is a benign bug as VMX doesn't touch the MMU when skipping an emulated instruction. However, on SVM, if nrip is disabled, the emulator is used to skip an instruction, which would lead to fireworks if the emulator were invoked without a valid MMU.

Fixes: eb4b248e ("kvm: vmx: Support INVPCID in shadow paging mode")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
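A sketch of the reordering, under the assumption that the handler can lean on KVM's deferred-request machinery (KVM_REQ_MMU_RELOAD existed in kernels of this era); the actual patch may differ:

```c
case INVPCID_TYPE_ALL_INCL_GLOBAL:
        /*
         * Queue the MMU teardown instead of doing it inline, so that
         * kvm_skip_emulated_instruction() -- which may bounce through
         * the emulator on SVM when nrip is disabled -- still runs
         * with a valid MMU.  The buggy code unloaded the MMU first.
         */
        kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
        return kvm_skip_emulated_instruction(vcpu);
```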
-
Committed by Sean Christopherson
Defer reloading the MMU after a successful EPTP switch. The VMFUNC instruction itself is executed in the previous EPTP context, so any side effects, e.g. updating RIP, should occur in the old context. Practically speaking, this bug is benign as VMX doesn't touch the MMU when skipping an emulated instruction, nor does queuing a single-step #DB. No other post-switch side effects exist.

Fixes: 41ab9372 ("KVM: nVMX: Emulate EPTP switching for the L1 hypervisor")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Set the C-bit in SPTEs that are set outside of the normal MMU flows, specifically the PDPTRs and the handful of special-cased "LM root" entries, all of which are shadow paging only. Note, the direct-mapped-root PDPTR handling is needed for the scenario where paging is disabled in the guest, in which case KVM uses a direct mapped MMU even though TDP is disabled.

Fixes: d0ec49d4 ("kvm/x86/svm: Support Secure Memory Encryption within KVM")
Cc: stable@vger.kernel.org
Cc: Brijesh Singh <brijesh.singh@amd.com>
Cc: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Exempt NULL PAE roots from the check to detect leaks, since kvm_mmu_free_roots() doesn't set them back to INVALID_PAGE. Stop hiding the WARNs to detect PAE root leaks behind MMU_WARN_ON; the hidden WARNs obviously didn't do their job given the hilarious number of bugs that could lead to PAE roots being leaked, not to mention the above false positive. Opportunistically delete a warning on root_hpa being valid; there's nothing special about 4/5-level shadow pages that warrants a WARN.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Check the validity of the PDPTRs before allocating any of the PAE roots, otherwise a bad PDPTR will cause KVM to leak any previously allocated roots.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Hold the mmu_lock for write for the entire duration of allocating and initializing an MMU's roots. This ensures there are MMU pages available and thus prevents root allocations from failing. That in turn fixes a bug where KVM would fail to free valid PAE roots if one of the later roots failed to allocate.

Add a comment to make_mmu_pages_available() to call out that the limit is a soft limit, e.g. KVM will temporarily exceed the threshold if a page fault allocates multiple shadow pages and there was only one page "available".

Note, KVM _still_ leaks the PAE roots if the guest PDPTR checks fail. This will be addressed in a future commit.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-7-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
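The locking pattern this describes, as a rough sketch (function body abbreviated; names follow the commit message, not the exact diff):

```c
write_lock(&vcpu->kvm->mmu_lock);

/*
 * Check the soft limit once, up front.  KVM may temporarily exceed
 * the threshold, e.g. if a page fault allocates multiple shadow
 * pages, but no root allocation below can now fail for lack of
 * MMU pages.
 */
r = make_mmu_pages_available(vcpu);
if (r < 0)
        goto out_unlock;

/* ... allocate *all* roots while the lock is held ... */

out_unlock:
        write_unlock(&vcpu->kvm->mmu_lock);
```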
-
Committed by Sean Christopherson
Move the on-demand allocation of the pae_root and lm_root pages, used by nested NPT for 32-bit L1s, into a separate helper. This will allow a future patch to hold mmu_lock while allocating the non-special roots so that make_mmu_pages_available() can be checked once at the start of root allocation, and thus avoid having to deal with failure in the middle of root allocation.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-6-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Allocate lm_root before the PAE roots so that the PAE roots aren't leaked if the memory allocation for the lm_root happens to fail. Note, KVM can still leak PAE roots if mmu_check_root() fails on a guest's PDPTR, or if mmu_alloc_root() fails due to MMU pages not being available. Those issues will be fixed in future commits.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-5-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Grab 'mmu' and do s/vcpu->arch.mmu/mmu to shorten line lengths and yield smaller diffs when moving code around in future cleanup, without forcing the new code to use the same ugly pattern. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-4-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Allocate the so-called pae_root page on-demand, along with the lm_root page, when shadowing 32-bit NPT with 64-bit NPT, i.e. when running a 32-bit L1. KVM currently only allocates the page when NPT is disabled, or when L0 is 32-bit (using PAE paging).

Note, there is an existing memory leak involving the MMU roots, as KVM fails to free the PAE roots on failure. This will be addressed in a future commit.

Fixes: ee6268ba ("KVM: x86: Skip pae_root shadow allocation if tdp enabled")
Fixes: b6b80c78 ("KVM: x86/mmu: Allocate PAE root array when using SVM's 32-bit NPT")
Cc: stable@vger.kernel.org
Reviewed-by: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305011101.3597423-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Dongli Zhang
The new per-vCPU stat 'nested_run' is introduced to track whether an L1 VM is running or being used to run an L2 VM. One use of 'nested_run' is to help the host administrator easily determine whether any L1 VM is being used to run an L2 VM. If an issue may be related to nested virtualization, the administrator can easily narrow down and confirm whether that is the case via 'nested_run', e.g. whether a fix like commit 88dddc11 ("KVM: nVMX: do not use dangling shadow VMCS after guest reset") is required.

Cc: Joe Jin <joe.jin@oracle.com>
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Message-Id: <20210305225747.7682-1-dongli.zhang@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
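A sketch of how such a counter is bumped; the placement is an assumption, not spelled out in the commit text:

```c
/* Assumed placement: the nested VMRUN/VMLAUNCH emulation path,
 * counting every launch of L2 on behalf of this vCPU. */
++vcpu->stat.nested_run;
```

vCPU stats of this kind surface through KVM's debugfs, so a host administrator can check for a nonzero aggregate (e.g. a nested_run entry under /sys/kernel/debug/kvm/, path assumed) to confirm nested virtualization is in use.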
-
- 13 March 2021: 2 commits
-
-
Committed by Wanpeng Li
Advancing the timer expiration should only be necessary on guest-initiated writes. When we cancel the timer and clear .pending during state restore, clear expired_tscdeadline as well.

Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614818118-965-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
If mmu_lock is held for write, don't bother setting !PRESENT SPTEs to REMOVED_SPTE when recursively zapping SPTEs as part of shadow page removal. The concurrent write protections provided by REMOVED_SPTE are not needed, there are no backing page side effects to record, and MMIO SPTEs can be left as is since they are protected by the memslot generation, not by ensuring that the MMIO SPTE is unreachable (which is racy with respect to lockless walks regardless of zapping behavior).

Skipping !PRESENT drastically reduces the number of updates needed to tear down sparsely populated MMUs, e.g. when tearing down a 6GB VM that didn't touch much memory, 6929/7168 (~96.6%) of SPTEs were '0' and could be skipped. Avoiding the write itself is likely close to a wash, but avoiding __handle_changed_spte() is a clear-cut win as that involves saving and restoring all non-volatile GPRs (it's a subtly big function), as well as several conditional branches before bailing out.

Cc: Ben Gardon <bgardon@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210310003029.1250571-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
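A sketch of the zap-time filter, assuming the recursive child walk has roughly this shape (iterator and constant names are approximations of the TDP MMU code, not the exact diff):

```c
for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
        u64 old_spte = READ_ONCE(pt[i]);

        /*
         * mmu_lock is held for write: no need to mark the entry
         * REMOVED_SPTE or invoke __handle_changed_spte() for
         * !PRESENT SPTEs.  MMIO SPTEs stay as-is, protected by the
         * memslot generation.
         */
        if (!is_shadow_present_pte(old_spte))
                continue;

        /* ... zap the present SPTE ... */
}
```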
-
- 10 March 2021: 1 commit
-
-
Committed by Sean Christopherson
Initialize x86_pmu.guest_get_msrs to return 0/NULL to handle the "nop" case. Patching in perf_guest_get_msrs_nop() during setup does not work if there is no PMU, as setup bails before updating the static calls, leaving x86_pmu.guest_get_msrs NULL and thus a complete nop. Ultimately, this causes a VMX abort on VM-Exit due to KVM putting random garbage from the stack into the MSR load list.

Add a comment in KVM to note that nr_msrs is valid if and only if the return value is non-NULL.

Fixes: abd562df ("x86/perf: Use static_call for x86_pmu.guest_get_msrs")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reported-by: syzbot+cce9ef2dd25246f815ee@syzkaller.appspotmail.com
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210309171019.1125243-1-seanjc@google.com
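The nop default and the KVM-side contract, sketched from the commit description (nr_msrs is valid if and only if the return value is non-NULL):

```c
/* Default used when no PMU is present and setup bails early. */
static struct perf_guest_switch_msr *perf_guest_get_msrs_nop(int *nr)
{
        *nr = 0;
        return NULL;
}

/* KVM side: bail before touching the MSR load list. */
struct perf_guest_switch_msr *msrs;
int nr_msrs;

msrs = perf_guest_get_msrs(&nr_msrs);
if (!msrs)      /* nr_msrs is valid iff the return is non-NULL */
        return;
```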
-
- 06 March 2021: 1 commit
-
-
Committed by Muhammad Usama Anjum
Remove sparse warnings: "warning: Using plain integer as NULL pointer".

Signed-off-by: Muhammad Usama Anjum <musamaanjum@gmail.com>
Message-Id: <20210305180816.GA488770@LEGION>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 05 March 2021: 2 commits
-
-
Committed by Sean Christopherson
Directly connect the 'npt' param to the 'npt_enabled' variable so that runtime adjustments to npt_enabled are reflected in sysfs. Move the !PAE restriction to a runtime check to ensure NPT is forced off if the host is using 2-level paging, and add a comment explicitly stating why NPT requires a 64-bit kernel or a kernel with PAE enabled. Opportunistically switch the param to octal permissions.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305021637.3768573-1-seanjc@google.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
When posting a deadline timer interrupt, open code the checks guarding __kvm_wait_lapic_expire() in order to skip the lapic_timer_int_injected() check in kvm_wait_lapic_expire(). The injection check will always fail since the interrupt has not yet been injected. Moving the call after injection would also be wrong, as that wouldn't actually delay delivery of the IRQ if it is indeed sent via posted interrupt.

Fixes: 010fd37f ("KVM: LAPIC: Reduce world switch latency caused by timer_advance_ns")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210305021808.3769732-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 03 March 2021: 4 commits
-
-
Committed by Babu Moger
This problem was reported on an SVM guest while executing kexec. Kexec fails to load the new kernel when the PCID feature is enabled.

When kexec starts loading the new kernel, it starts the process by resetting the vCPUs and then bringing each vCPU online one by one. The vCPU reset is supposed to reset all the register states before the vCPUs are brought online. However, the CR4 register is not reset during this process. If this register was already set up during the last boot, all the flags can remain intact. The X86_CR4_PCIDE bit can only be enabled in long mode, so it must be enabled much later in SMP initialization. Having the X86_CR4_PCIDE bit set during SMP boot can cause boot failures.

Fix the issue by resetting the CR4 register in init_vmcb().

Signed-off-by: Babu Moger <babu.moger@amd.com>
Message-Id: <161471109108.30811.6392805173629704166.stgit@bmoger-ubuntu>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by David Woodhouse
This is how Xen guests do steal time accounting. The hypervisor records the amount of time spent in each of the running/runnable/blocked/offline states.

In the Xen accounting, a vCPU is still in state RUNSTATE_running while in Xen for a hypercall or I/O trap, etc. Only if Xen explicitly schedules does the state become RUNSTATE_blocked. In KVM this means that even when the vCPU exits the kvm_run loop, the state remains RUNSTATE_running.

The VMM can explicitly set the vCPU to RUNSTATE_blocked by using the KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_CURRENT attribute, and can also use KVM_XEN_VCPU_ATTR_TYPE_RUNSTATE_ADJUST to retrospectively add a given amount of time to the blocked state and subtract it from the running state.

The state_entry_time corresponds to get_kvmclock_ns() at the time the vCPU entered the current state, and the total times of all four states should always add up to state_entry_time.

Co-developed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20210301125309.874953-2-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
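For reference, the Xen ABI structure being emulated (from Xen's public vcpu.h; comments restate the invariants from the commit message):

```c
struct vcpu_runstate_info {
        /* The vCPU's current state (RUNSTATE_running/runnable/blocked/offline). */
        int      state;
        /* When the current state was entered (system time, nanoseconds);
         * in KVM's emulation this corresponds to get_kvmclock_ns(). */
        uint64_t state_entry_time;
        /* Time spent in each RUNSTATE_* (nanoseconds).  The four entries
         * should always sum to state_entry_time. */
        uint64_t time[4];
};
```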
-
Committed by David Woodhouse
When clearing the per-vCPU shared regions, set the return value to zero to indicate success. This was causing spurious errors to be returned to userspace on soft reset. Also add a paranoid BUILD_BUG_ON() for compat structure compatibility.

Fixes: 0c165b3c ("KVM: x86/xen: Allow reset of Xen attributes")
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20210301125309.874953-1-dwmw2@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
The Xen hypercall interface adds to the attack surface of the hypervisor and will be used quite rarely. Allow compiling it out.

Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 26 February 2021: 3 commits
-
-
Committed by Paolo Bonzini
A missing flush would cause the static branch to trigger incorrectly.

Cc: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Check that PML is actually enabled before setting the mask to force a SPTE to be write-protected. The bits used for the !AD_ENABLED case are in the upper half of the SPTE. With 64-bit paging and EPT, these bits are ignored, but with 32-bit PAE paging they are reserved. Setting them for L2 SPTEs without checking PML breaks NPT on 32-bit KVM.

Fixes: 1f4e5fc8 ("KVM: x86: fix nested guest live migration with PML")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210225204749.1512652-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Wanpeng Li
Reported by syzkaller:

    KASAN: null-ptr-deref in range [0x0000000000000140-0x0000000000000147]
    CPU: 1 PID: 8370 Comm: syz-executor859 Not tainted 5.11.0-syzkaller #0
    RIP: 0010:synic_get arch/x86/kvm/hyperv.c:165 [inline]
    RIP: 0010:kvm_hv_set_sint_gsi arch/x86/kvm/hyperv.c:475 [inline]
    RIP: 0010:kvm_hv_irq_routing_update+0x230/0x460 arch/x86/kvm/hyperv.c:498
    Call Trace:
     kvm_set_irq_routing+0x69b/0x940 arch/x86/kvm/../../../virt/kvm/irqchip.c:223
     kvm_vm_ioctl+0x12d0/0x2800 arch/x86/kvm/../../../virt/kvm/kvm_main.c:3959
     vfs_ioctl fs/ioctl.c:48 [inline]
     __do_sys_ioctl fs/ioctl.c:753 [inline]
     __se_sys_ioctl fs/ioctl.c:739 [inline]
     __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:739
     do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
     entry_SYSCALL_64_after_hwframe+0x44/0xae

Hyper-V context is lazily allocated until Hyper-V specific MSRs are accessed or SynIC is enabled. However, the syzkaller testcase sets the irq routing table directly w/o enabling SynIC. This results in a null-ptr-deref when accessing the SynIC Hyper-V context. This patch fixes it.

syzkaller source: https://syzkaller.appspot.com/x/repro.c?x=163342ccd00000

Reported-by: syzbot+6987f3b2dbd9eda95f12@syzkaller.appspotmail.com
Fixes: 8f014550 ("KVM: x86: hyper-v: Make Hyper-V emulation enablement conditional")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1614326399-5762-1-git-send-email-wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
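A sketch of the guard this implies in synic_get(), whose shape is inferred from the stack trace and the description of lazy allocation (not the exact diff):

```c
static struct kvm_vcpu_hv_synic *synic_get(struct kvm *kvm, u32 vpidx)
{
        struct kvm_vcpu *vcpu = get_vcpu_by_vpidx(kvm, vpidx);

        /* The Hyper-V context is allocated lazily and may not exist yet. */
        if (!vcpu || !to_hv_vcpu(vcpu))
                return NULL;

        return to_hv_synic(vcpu);
}
```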
-
- 25 February 2021: 1 commit
-
-
Committed by Sean Christopherson
Fix the interpretation of nested_svm_vmexit()'s return value when synthesizing a nested VM-Exit after intercepting an SVM instruction while L2 was running. The helper returns '0' on success, whereas a return value of '0' in the exit handler path means "exit to userspace". The incorrect return value causes KVM to exit to userspace without filling the run state, e.g. QEMU logs "KVM: unknown exit, hardware reason 0".

Fixes: 14c2bf81 ("KVM: SVM: Fix #GP handling for doubly-nested virtualization")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210224005627.657028-1-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 24 February 2021: 1 commit
-
-
Committed by Like Xu
If lbr_desc->event is successfully created, intel_pmu_create_guest_lbr_event() will return 0, otherwise it will return -ENOENT, and then jump to LBR MSRs dummy handling.

Fixes: 1b5ac322 ("KVM: vmx/pmu: Pass-through LBR msrs when the guest LBR event is ACTIVE")
Signed-off-by: Like Xu <like.xu@linux.intel.com>
Message-Id: <20210223013958.1280444-1-like.xu@linux.intel.com>
[Add "< 0" and PTR_ERR to make the code clearer. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
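A sketch of the corrected error propagation, following the commit message and Paolo's note about using "< 0" and PTR_ERR (surrounding code elided; 'dummy' is a hypothetical label):

```c
/* In intel_pmu_create_guest_lbr_event(): */
event = perf_event_create_kernel_counter(&attr, -1, current, NULL, NULL);
if (IS_ERR(event))
        return PTR_ERR(event);  /* e.g. -ENOENT, not 0 */
lbr_desc->event = event;
return 0;

/* In the caller, fall back to dummy LBR MSR handling only on error: */
if (intel_pmu_create_guest_lbr_event(vcpu) < 0)
        goto dummy;
```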
-
- 23 February 2021: 3 commits
-
-
Committed by David Stevens
Track the range being invalidated by mmu_notifier and skip page fault retries if the fault address is not affected by the in-progress invalidation. Handle concurrent invalidations by finding the minimal range which includes all ranges being invalidated. Although the combined range may include unrelated addresses and cannot be shrunk as individual invalidation operations complete, it is unlikely the marginal gains of proper range tracking are worth the additional complexity.

The primary benefit of this change is the reduction in the likelihood of extreme latency when handling a page fault due to another thread having been preempted while modifying host virtual addresses.

Signed-off-by: David Stevens <stevensd@chromium.org>
Message-Id: <20210222024522.1751719-3-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
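The range combining reduces to a min/max union plus an overlap test; a sketch with assumed field names:

```c
/* In invalidate_range_start, under the lock protecting the counters: */
kvm->mmu_notifier_count++;
if (likely(kvm->mmu_notifier_count == 1)) {
        kvm->mmu_notifier_range_start = start;
        kvm->mmu_notifier_range_end = end;
} else {
        /*
         * Widen to the union of all in-flight invalidations.  The
         * combined range may cover unrelated addresses and cannot
         * shrink as individual invalidations complete.
         */
        kvm->mmu_notifier_range_start =
                min(kvm->mmu_notifier_range_start, start);
        kvm->mmu_notifier_range_end =
                max(kvm->mmu_notifier_range_end, end);
}

/* In the page-fault path: retry only if the faulting hva is affected. */
if (unlikely(kvm->mmu_notifier_count) &&
    hva >= kvm->mmu_notifier_range_start &&
    hva < kvm->mmu_notifier_range_end)
        return true;    /* retry the fault */
```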
-
Committed by Sean Christopherson
Don't retry a page fault due to an mmu_notifier invalidation when handling a page fault for a GPA that did not resolve to a memslot, i.e. an MMIO page fault. Invalidations from the mmu_notifier signal a change in a host virtual address (HVA) mapping; without a memslot, there is no HVA and thus no possibility that the invalidation is relevant to the page fault being handled.

Note, the MMIO vs. memslot generation checks handle the case where a pending memslot will create a memslot overlapping the faulting GPA. The mmu_notifier checks are orthogonal to memslot updates.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210222024522.1751719-2-stevensd@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Paolo Bonzini
Right now, enter_svm_guest_mode is calling nested_prepare_vmcb_save and nested_prepare_vmcb_control. This results in is_guest_mode being false until the end of nested_prepare_vmcb_control.

This is a problem because nested_prepare_vmcb_save can in turn cause changes to the intercepts, and these have to be applied to the "host VMCB" (stored in svm->nested.hsave) and then merged with the VMCB12 intercepts into svm->vmcb. In particular, without this change we forget to set the CR0 read and CR0 write intercepts when running a real mode L2 guest with NPT disabled. The guest is therefore able to see the CR0.PG bit that KVM sets to enable "paged real mode". This patch fixes the svm.flat mode_switch test case with npt=0. There are no other problematic calls in nested_prepare_vmcb_save.

Moving is_guest_mode to the end has been done since commit 06fc7772 ("KVM: SVM: Activate nested state only when guest state is complete", 2010-04-25) [1]. However, back then KVM didn't grab a different VMCB when updating the intercepts: it had already copied/merged L1's stuff to L0's VMCB, and then updated L0's VMCB regardless of is_nested(). Later, recalc_intercepts was introduced in commit 384c6368 ("KVM: SVM: Add function to recalculate intercept masks", 2011-01-12). This introduced the bug, because recalc_intercepts now throws away the intercept manipulations that svm_set_cr0 had done in the meanwhile to svm->vmcb.

[1] https://lore.kernel.org/kvm/1266493115-28386-1-git-send-email-joerg.roedel@amd.com/

Reviewed-by: Sean Christopherson <seanjc@google.com>
Tested-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
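A sketch of the reordering, with the function shape assumed from the names in the message: entering guest mode first means recalc_intercepts() merges, rather than discards, the intercept changes that svm_set_cr0() makes from within nested_prepare_vmcb_save().

```c
static void enter_svm_guest_mode(struct vcpu_svm *svm, u64 vmcb12_gpa,
                                 struct vmcb *vmcb12)
{
        svm->nested.vmcb12_gpa = vmcb12_gpa;

        /* Moved up: is_guest_mode() must be true before the save-area
         * load so intercept updates land on the right VMCB. */
        enter_guest_mode(&svm->vcpu);

        nested_prepare_vmcb_save(svm, vmcb12);
        nested_prepare_vmcb_control(svm);
        /* ... */
}
```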
-
- 19 February 2021: 8 commits
-
-
Committed by Sean Christopherson
Remove several exports from the MMU that are no longer necessary. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-15-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole caller to use kvm_mmu_slot_remove_write_access(). Remove the now-unused slot_handle_large_level() and slot_handle_all_level() helpers. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-14-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Stop setting dirty bits for MMU pages when dirty logging is disabled for a memslot, as PML is now completely disabled when there are no memslots with dirty logging enabled.

This means that spurious PML entries will be created for memslots with dirty logging disabled if at least one other memslot has dirty logging enabled. However, spurious PML entries are already possible since dirty bits are set only when dirty logging is turned off, i.e. memslots that are never dirty logged will have dirty bits cleared. In the end, it's faster overall to eat a few spurious PML entries in the window where dirty logging is being disabled across all memslots.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-13-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Makarand Sonare
Currently, if enable_pml=1, PML remains enabled for the entire lifetime of the VM irrespective of whether dirty logging is enabled or disabled. When dirty logging is disabled, all the pages of the VM are manually marked dirty, so that PML is effectively non-operational. Setting the dirty bits is an expensive operation which can cause severe MMU lock contention in a performance-sensitive path when dirty logging is disabled after a failed or canceled live migration. Manually setting dirty bits also fails to prevent PML activity if some code path clears dirty bits, which can incur unnecessary VM-Exits.

In order to avoid this extra overhead, dynamically enable/disable PML when dirty logging gets turned on/off for the first/last memslot.

Signed-off-by: Makarand Sonare <makarandsonare@google.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-12-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
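A sketch of the first/last-memslot toggle; the counter field and the two hooks are hypothetical stand-ins for whatever the patch actually uses:

```c
/* Called when a memslot's KVM_MEM_LOG_DIRTY_PAGES flag is toggled. */
if (log_dirty_pages) {
        /* First memslot to enable dirty logging: turn PML on. */
        if (kvm->nr_memslots_dirty_logging++ == 0)
                kvm_mmu_cpu_dirty_log_enable(kvm);      /* hypothetical hook */
} else {
        /* Last memslot to disable dirty logging: turn PML off. */
        if (--kvm->nr_memslots_dirty_logging == 0)
                kvm_mmu_cpu_dirty_log_disable(kvm);     /* hypothetical hook */
}
```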
-
Committed by Sean Christopherson
Add a sanity check in kvm_mmu_slot_apply_flags to assert that the LOG_DIRTY_PAGES flag is indeed being toggled, and explicitly rely on that holding true when zapping collapsible SPTEs. Manipulating the CPU dirty log (PML) and write-protection also relies on this assertion, but that's not obvious in the current code.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-11-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Drop the facade of KVM's PML logic being vendor specific and move the bits that aren't truly VMX specific into common x86 code. The MMU logic for dealing with PML is tightly coupled to the feature and to VMX's implementation; bouncing through kvm_x86_ops obfuscates the code without providing any meaningful separation of concerns or encapsulation. No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-10-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Store the vendor-specific dirty log size in a variable; there's no need to wrap it in a function since the value is constant after hardware_setup() runs.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-9-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Sean Christopherson
Expand the comment about the need to use write-protection for nested EPT when PML is enabled to clarify that the tagging is a nop when PML is _not_ enabled. Without the clarification, omitting the PML check looks wrong at first^Wfifth glance.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20210213005015.1651772-8-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-