提交 · 982ed0de4753ed6e71dbd40f82a5a066baf133ed · openeuler / Kernel

07 1月, 2022 19 次提交

KVM: Reinstate gfn_to_pfn_cache with invalidation support · 982ed0de

由 David Woodhouse 提交于 12月 10, 2021

This can be used in two modes. There is an atomic mode where the cached
mapping is accessed while holding the rwlock, and a mode where the
physical address is used by a vCPU in guest mode.

For the latter case, an invalidation will wake the vCPU with the new
KVM_REQ_GPC_INVALIDATE, and the architecture will need to refresh any
caches it still needs to access before entering guest mode again.

Only one vCPU can be targeted by the wake requests; it's simple enough
to make it wake all vCPUs or even a mask but I don't see a use case for
that additional complexity right now.

Invalidation happens from the invalidate_range_start MMU notifier, which
needs to be able to sleep in order to wake the vCPU and wait for it.

This means that revalidation potentially needs to "wait" for the MMU
operation to complete and the invalidate_range_end notifier to be
invoked. Like the vCPU when it takes a page fault in that period, we
just spin — fixing that in a future patch by implementing an actual
*wait* may be another part of shaving this particularly hirsute yak.

As noted in the comments in the function itself, the only case where
the invalidate_range_start notifier is expected to be called *without*
being able to sleep is when the OOM reaper is killing the process. In
that case, we expect the vCPU threads already to have exited, and thus
there will be nothing to wake, and no reason to wait. So we clear the
KVM_REQUEST_WAIT bit and send the request anyway, then complain loudly
if there actually *was* anything to wake up.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-3-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

982ed0de

KVM: Warn if mark_page_dirty() is called without an active vCPU · 2efd61a6

由 David Woodhouse 提交于 12月 10, 2021

The various kvm_write_guest() and mark_page_dirty() functions must only
ever be called in the context of an active vCPU, because if dirty ring
tracking is enabled it may simply oops when kvm_get_running_vcpu()
returns NULL for the vcpu and then kvm_dirty_ring_get() dereferences it.

This oops was reported by "butt3rflyh4ck" <butterflyhuangxx@gmail.com> in
https://lore.kernel.org/kvm/CAFcO6XOmoS7EacN_n6v4Txk7xL7iqRa2gABg3F7E3Naf5uG94g@mail.gmail.com/

That actual bug will be fixed under separate cover but this warning
should help to prevent new ones from being added.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211210163625.2886-2-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2efd61a6

x86/kvm: Silence per-cpu pr_info noise about KVM clocks and steal time · f3f26dae

由 David Woodhouse 提交于 12月 09, 2021

I made the actual CPU bringup go nice and fast... and then Linux spends
half a minute printing stupid nonsense about clocks and steal time for
each of 256 vCPUs. Don't do that. Nobody cares.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20211209150938.3518-12-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f3f26dae

KVM: x86: Update vPMCs when retiring branch instructions · 018d70ff

由 Eric Hankland 提交于 11月 30, 2021

When KVM retires a guest branch instruction through emulation,
increment any vPMCs that are configured to monitor "branch
instructions retired," and update the sample period of those counters
so that they will overflow at the right time.
Signed-off-by: NEric Hankland <ehankland@google.com>
[jmattson:
  - Split the code to increment "branch instructions retired" into a
    separate commit.
  - Moved/consolidated the calls to kvm_pmu_trigger_event() in the
    emulation of VMLAUNCH/VMRESUME to accommodate the evolution of
    that code.
]
Fixes: f5132b01 ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: NJim Mattson <jmattson@google.com>
Message-Id: <20211130074221.93635-7-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

018d70ff

KVM: x86: Update vPMCs when retiring instructions · 9cd803d4

由 Eric Hankland 提交于 11月 30, 2021

When KVM retires a guest instruction through emulation, increment any
vPMCs that are configured to monitor "instructions retired," and
update the sample period of those counters so that they will overflow
at the right time.
Signed-off-by: NEric Hankland <ehankland@google.com>
[jmattson:
  - Split the code to increment "branch instructions retired" into a
    separate commit.
  - Added 'static' to kvm_pmu_incr_counter() definition.
  - Modified kvm_pmu_incr_counter() to check pmc->perf_event->state ==
    PERF_EVENT_STATE_ACTIVE.
]
Fixes: f5132b01 ("KVM: Expose a version 2 architectural PMU to a guests")
Signed-off-by: NJim Mattson <jmattson@google.com>
[likexu:
  - Drop checks for pmc->perf_event or event state or event type
  - Increase a counter once its umask bits and the first 8 select bits are matched
  - Rewrite kvm_pmu_incr_counter() with a less invasive approach to the host perf;
  - Rename kvm_pmu_record_event to kvm_pmu_trigger_event;
  - Add counter enable and CPL check for kvm_pmu_trigger_event();
]
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-6-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

9cd803d4

KVM: x86/pmu: Add pmc->intr to refactor kvm_perf_overflow{_intr}() · 40ccb96d

由 Like Xu 提交于 11月 30, 2021

Depending on whether intr should be triggered or not, KVM registers
two different event overflow callbacks in the perf_event context.

The code skeleton of these two functions is very similar, so
the pmc->intr can be stored into pmc from pmc_reprogram_counter()
which provides smaller instructions footprint against the
u-architecture branch predictor.

The __kvm_perf_overflow() can be called in non-nmi contexts
and a flag is needed to distinguish the caller context and thus
avoid a check on kvm_is_in_guest(), otherwise we might get
warnings from suspicious RCU or check_preemption_disabled().
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-5-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

40ccb96d

KVM: x86/pmu: Reuse pmc_perf_hw_id() and drop find_fixed_event() · 6ed1298e

由 Like Xu 提交于 11月 30, 2021

Since we set the same semantic event value for the fixed counter in
pmc->eventsel, returning the perf_hw_id for the fixed counter via
find_fixed_event() can be painlessly replaced by pmc_perf_hw_id()
with the help of pmc_is_fixed() check.
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-4-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6ed1298e

KVM: x86/pmu: Refactoring find_arch_event() to pmc_perf_hw_id() · 7c174f30

由 Like Xu 提交于 11月 30, 2021

The find_arch_event() returns a "unsigned int" value,
which is used by the pmc_reprogram_counter() to
program a PERF_TYPE_HARDWARE type perf_event.

The returned value is actually the kernel defined generic
perf_hw_id, let's rename it to pmc_perf_hw_id() with simpler
incoming parameters for better self-explanation.
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-3-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

7c174f30

KVM: x86/pmu: Setup pmc->eventsel for fixed PMCs · 76187563

由 Like Xu 提交于 11月 30, 2021

The current pmc->eventsel for fixed counter is underutilised. The
pmc->eventsel can be setup for all known available fixed counters
since we have mapping between fixed pmc index and
the intel_arch_events array.

Either gp or fixed counter, it will simplify the later checks for
consistency between eventsel and perf_hw_id.
Signed-off-by: NLike Xu <likexu@tencent.com>
Message-Id: <20211130074221.93635-2-likexu@tencent.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

76187563

KVM: x86: avoid out of bounds indices for fixed performance counters · 006a0f06

由 Paolo Bonzini 提交于 12月 09, 2021

Because IceLake has 4 fixed performance counters but KVM only
supports 3, it is possible for reprogram_fixed_counters to pass
to reprogram_fixed_counter an index that is out of bounds for the
fixed_pmc_events array.

Ultimately intel_find_fixed_event, which is the only place that uses
fixed_pmc_events, handles this correctly because it checks against the
size of fixed_pmc_events anyway.  Every other place operates on the
fixed_counters[] array which is sized according to INTEL_PMC_MAX_FIXED.
However, it is cleaner if the unsupported performance counters are culled
early on in reprogram_fixed_counters.
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

006a0f06

KVM: VMX: Mark VCPU_EXREG_CR3 dirty when !CR0_PG -> CR0_PG if EPT + !URG · 5b61178c

由 Lai Jiangshan 提交于 12月 16, 2021

When !CR0_PG -> CR0_PG, vcpu->arch.cr3 becomes active, but GUEST_CR3 is
still vmx->ept_identity_map_addr if EPT + !URG. So VCPU_EXREG_CR3 is
considered to be dirty and GUEST_CR3 needs to be updated in this case.
Reported-by: NMaxim Levitsky <mlevitsk@redhat.com>
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-4-jiangshanlai@gmail.com>
Fixes: c62c7bd4 ("KVM: VMX: Update vmcs.GUEST_CR3 only when the guest CR3 is dirty")
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

5b61178c

KVM: x86/mmu: Reconstruct shadow page root if the guest PDPTEs is changed · 6b123c3a

由 Lai Jiangshan 提交于 12月 16, 2021

For shadow paging, the page table needs to be reconstructed before the
coming VMENTER if the guest PDPTEs is changed.

But not all paths that call load_pdptrs() will cause the page tables to be
reconstructed. Normally, kvm_mmu_reset_context() and kvm_mmu_free_roots()
are used to launch later reconstruction.

The commit d81135a5("KVM: x86: do not reset mmu if CR0.CD and
CR0.NW are changed") skips kvm_mmu_reset_context() after load_pdptrs()
when changing CR0.CD and CR0.NW.

The commit 21823fbd("KVM: x86: Invalidate all PGDs for the current
PCID on MOV CR3 w/ flush") skips kvm_mmu_free_roots() after
load_pdptrs() when rewriting the CR3 with the same value.

The commit a91a7c70("KVM: X86: Don't reset mmu context when
toggling X86_CR4_PGE") skips kvm_mmu_reset_context() after
load_pdptrs() when changing CR4.PGE.

Guests like linux would keep the PDPTEs unchanged for every instance of
pagetable, so this missing reconstruction has no problem for linux
guests.

Fixes: d81135a5("KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed")
Fixes: 21823fbd("KVM: x86: Invalidate all PGDs for the current PCID on MOV CR3 w/ flush")
Fixes: a91a7c70("KVM: X86: Don't reset mmu context when toggling X86_CR4_PGE")
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-3-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6b123c3a

KVM: VMX: Save HOST_CR3 in vmx_set_host_fs_gs() · a9f2705e

由 Lai Jiangshan 提交于 12月 16, 2021

The host CR3 in the vcpu thread can only be changed when scheduling,
so commit 15ad9762 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
changed vmx.c to only save it in vmx_prepare_switch_to_guest().

However, it also has to be synced in vmx_sync_vmcs_host_state() when switching VMCS.
vmx_set_host_fs_gs() is called in both places, so rename it to
vmx_set_vmcs_host_state() and make it update HOST_CR3.

Fixes: 15ad9762 ("KVM: VMX: Save HOST_CR3 in vmx_prepare_switch_to_guest()")
Signed-off-by: NLai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20211216021938.11752-2-jiangshanlai@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a9f2705e

Revert "KVM: X86: Update mmu->pdptrs only when it is changed" · 46cbc040

由 Paolo Bonzini 提交于 12月 10, 2021

This reverts commit 24cd19a2.
Sean Christopherson reports:

"Commit 24cd19a2 ('KVM: X86: Update mmu->pdptrs only when it is
changed') breaks nested VMs with EPT in L0 and PAE shadow paging in L2.
Reproducing is trivial, just disable EPT in L1 and run a VM.  I haven't
investigating how it breaks things."
Reviewed-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

46cbc040

selftests: KVM: sev_migrate_tests: Add mirror command tests · a6fec539

由 Peter Gonda 提交于 12月 08, 2021

Add tests to confirm mirror vms can only run correct subset of commands.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: NPeter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-4-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a6fec539

selftests: KVM: sev_migrate_tests: Fix sev_ioctl() · 427d046a

由 Peter Gonda 提交于 12月 08, 2021

TEST_ASSERT in SEV ioctl was allowing errors because it checked return
value was good OR the FW error code was OK. This TEST_ASSERT should
require both (aka. AND) values are OK. Removes the LAUNCH_START from the
mirror VM because this call correctly fails because mirror VMs cannot
call this command. Currently issues with the PSP driver functions mean
the firmware error is not always reset to SEV_RET_SUCCESS when a call is
successful. Mainly sev_platform_init() doesn't correctly set the fw
error if the platform has already been initialized.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: NPeter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-3-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

427d046a

selftests: KVM: sev_migrate_tests: Fix test_sev_mirror() · 4c66b567

由 Peter Gonda 提交于 12月 08, 2021

Mirrors should not be able to call LAUNCH_START. Remove the call on the
mirror to correct the test before fixing sev_ioctl() to correctly assert
on this failed ioctl.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Marc Orr <marcorr@google.com>
Signed-off-by: NPeter Gonda <pgonda@google.com>
Message-Id: <20211208191642.3792819-2-pgonda@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4c66b567

Merge tag 'kvm-riscv-5.17-1' of https://github.com/kvm-riscv/linux into HEAD · 1b0c9d00

由 Paolo Bonzini 提交于 1月 07, 2022

KVM/riscv changes for 5.17, take #1

- Use common KVM implementation of MMU memory caches
- SBI v0.2 support for Guest
- Initial KVM selftests support
- Fix to avoid spurious virtual interrupts after clearing hideleg CSR
- Update email address for Anup and Atish

1b0c9d00

Merge tag 'kvmarm-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD · 7fd55a02

由 Paolo Bonzini 提交于 1月 07, 2022

KVM/arm64 updates for Linux 5.16

- Simplification of the 'vcpu first run' by integrating it into
  KVM's 'pid change' flow

- Refactoring of the FP and SVE state tracking, also leading to
  a simpler state and less shared data between EL1 and EL2 in
  the nVHE case

- Tidy up the header file usage for the nvhe hyp object

- New HYP unsharing mechanism, finally allowing pages to be
  unmapped from the Stage-1 EL2 page-tables

- Various pKVM cleanups around refcounting and sharing

- A couple of vgic fixes for bugs that would trigger once
  the vcpu xarray rework is merged, but not sooner

- Add minimal support for ARMv8.7's PMU extension

- Rework kvm_pgtable initialisation ahead of the NV work

- New selftest for IRQ injection

- Teach selftests about the lack of default IPA space and
  page sizes

- Expand sysreg selftest to deal with Pointer Authentication

- The usual bunch of cleanups and doc update

7fd55a02

06 1月, 2022 14 次提交

MAINTAINERS: Update Anup's email address · 497685f2

由 Anup Patel 提交于 1月 03, 2022

I am no longer work at Western Digital so update my email address to
personal one and add entries to .mailmap as well.
Signed-off-by: NAnup Patel <anup@brainfault.org>
Acked-by: NAtish Patra <atishp@rivosinc.com>

497685f2

KVM: RISC-V: Avoid spurious virtual interrupts after clearing hideleg CSR · 33e5b574

由 Vincent Chen 提交于 12月 27, 2021

When the last VM is terminated, the host kernel will invoke function
hardware_disable_nolock() on each CPU to disable the related virtualization
functions. Here, RISC-V currently only clears hideleg CSR and hedeleg CSR.
This behavior will cause the host kernel to receive spurious interrupts if
hvip CSR has pending interrupts and the corresponding enable bits in vsie
CSR are asserted. To avoid it, hvip CSR and vsie CSR must be cleared
before clearing hideleg CSR.

Fixes: 99cdc6c1 ("RISC-V: Add initial skeletal KVM support")
Signed-off-by: NVincent Chen <vincent.chen@sifive.com>
Reviewed-by: NAnup Patel <anup.patel@wdc.com>
Signed-off-by: NAnup Patel <anup.patel@wdc.com>

33e5b574

KVM: selftests: Add initial support for RISC-V 64-bit · 3e06cdf1

由 Anup Patel 提交于 10月 05, 2021

We add initial support for RISC-V 64-bit in KVM selftests using
which we can cross-compile and run arch independent tests such as:
demand_paging_test
dirty_log_test
kvm_create_max_vcpus,
kvm_page_table_test
set_memory_region_test
kvm_binary_stats_test

All VM guest modes defined in kvm_util.h require at least 48-bit
guest virtual address so to use KVM RISC-V selftests hardware
need to support at least Sv48 MMU for guest (i.e. VS-mode).
Signed-off-by: NAnup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: NAtish Patra <atishp@rivosinc.com>

3e06cdf1

KVM: selftests: Add EXTRA_CFLAGS in top-level Makefile · 788490e7

由 Anup Patel 提交于 11月 26, 2021

We add EXTRA_CFLAGS to the common CFLAGS of top-level Makefile which will
allow users to pass additional compile-time flags such as "-static".
Signed-off-by: NAnup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: NAtish Patra <atishp@rivosinc.com>
Reviewed-and-tested-by: NSean Christopherson <seanjc@google.com>

788490e7

RISC-V: KVM: Add VM capability to allow userspace get GPA bits · a457fd56

由 Anup Patel 提交于 11月 26, 2021

The number of GPA bits supported for a RISC-V Guest/VM is based on the
MMU mode used by the G-stage translation. The KVM RISC-V will detect and
use the best possible MMU mode for the G-stage in kvm_arch_init().

We add a generic VM capability KVM_CAP_VM_GPA_BITS which can be used by
the KVM userspace to get the number of GPA (guest physical address) bits
supported for a Guest/VM.
Signed-off-by: NAnup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: NAtish Patra <atishp@rivosinc.com>

a457fd56

RISC-V: KVM: Forward SBI experimental and vendor extensions · ef8949a9

由 Anup Patel 提交于 11月 26, 2021

The SBI experimental extension space is for temporary (or experimental)
stuff whereas SBI vendor extension space is for hardware vendor specific
stuff. Both these SBI extension spaces won't be standardized by the SBI
specification so let's blindly forward such SBI calls to the userspace.
Signed-off-by: NAnup Patel <anup.patel@wdc.com>
Reviewed-and-tested-by: NAtish Patra <atishp@rivosinc.com>

ef8949a9

RISC-V: KVM: make kvm_riscv_vcpu_fp_clean() static · 637ad655

由 Jisheng Zhang 提交于 11月 29, 2021

There are no users outside vcpu_fp.c so make kvm_riscv_vcpu_fp_clean()
static.
Signed-off-by: NJisheng Zhang <jszhang@kernel.org>
Signed-off-by: NAnup Patel <anup.patel@wdc.com>

637ad655

MAINTAINERS: Update Atish's email address · 4abed558

由 Atish Patra 提交于 12月 02, 2021

I am no longer employed by western digital. Update my email address to
personal one and add entries to .mailmap as well.
Signed-off-by: NAtish Patra <atishp@atishpatra.org>
Signed-off-by: NAnup Patel <anup.patel@wdc.com>

4abed558

RISC-V: KVM: Add SBI HSM extension in KVM · 3e1d8656