提交 · 71ab5a6fea49875290e6ca035ff1d3e3ef3813f7 · openeuler / Kernel

11 6月, 2022 15 次提交

KVM: selftests: Make vm_ioctl() a wrapper to pretty print ioctl name · 71ab5a6f

由 Sean Christopherson 提交于 2月 15, 2022

Make vm_ioctl() a macro wrapper and print the _name_ of the ioctl on
failure instead of the number.

Deliberately do not use __stringify(), as that will expand the ioctl all
the way down to its numerical sequence.  Again the intent is to print the
name of the macro.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

71ab5a6f

KVM: selftests: Add vcpu_get() to retrieve and assert on vCPU existence · 47a7c924

由 Sean Christopherson 提交于 2月 15, 2022

Add vcpu_get() to wrap vcpu_find() and deduplicate a pile of code that
asserts the requested vCPU exists.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

47a7c924

KVM: selftests: Remove vcpu_get_fd() · 21c6ee2b

由 Sean Christopherson 提交于 2月 15, 2022

Drop vcpu_get_fd(), it no longer has any users, and really should not
exist as the framework has failed if tests need to manually operate on
a vCPU fd.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

21c6ee2b

KVM: selftests: Use vcpu_access_device_attr() in arm64 code · caf12f3b

由 Sean Christopherson 提交于 2月 15, 2022

Use vcpu_access_device_attr() in arm's arch_timer test instead of
manually retrieving the vCPU's fd.  This will allow dropping vcpu_get_fd()
in a future patch.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

caf12f3b

KVM: selftests: Add __vcpu_run() helper · 38d4a385

由 Sean Christopherson 提交于 2月 15, 2022

Add __vcpu_run() so that tests that want to avoid asserts on KVM_RUN
failures don't need to open code the ioctl() call.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

38d4a385

KVM: sefltests: Use vcpu_ioctl() and __vcpu_ioctl() helpers · ffb7c77f

由 Sean Christopherson 提交于 2月 15, 2022

Use the recently introduced vCPU-specific ioctl() helpers instead of
open coding calls to ioctl() just to pretty print the ioctl name.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ffb7c77f

KVM: selftests: Split vcpu_set_nested_state() into two helpers · 1d438b3b

由 Sean Christopherson 提交于 2月 15, 2022

Split vcpu_nested_state_set() into a wrapper that asserts, and an inner
helper that does not. Passing a bool is all kinds of awful as it's
unintuitive for readers and requires returning an 'int' from a function
that for most users can never return anything other than "success".
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1d438b3b

KVM: selftests: Drop @mode from common vm_create() helper · 2ab2c307

由 Sean Christopherson 提交于 2月 14, 2022

Drop @mode from vm_create() and have it use VM_MODE_DEFAULT.  Add and use
an inner helper, __vm_create(), to service the handful of tests that want
something other than VM_MODE_DEFAULT.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2ab2c307

KVM: selftests: Make vcpu_ioctl() a wrapper to pretty print ioctl name · 02e04c15

由 Sean Christopherson 提交于 2月 15, 2022

Make vcpu_ioctl() a macro wrapper and pretty the _name_ of the ioctl on
failure instead of the number.  Add inner macros to allow handling cases
where the name of the ioctl needs to be resolved higher up the stack, and
to allow using the formatting for non-ioctl syscalls without being
technically wrong.

Deliberately do not use __stringify(), as that will expand the ioctl all
the way down to its numerical sequence, again the intent is to print the
name of the macro.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

02e04c15

KVM: selftests: Add another underscore to inner ioctl() helpers · 2b38a739

由 Sean Christopherson 提交于 2月 15, 2022

Add a second underscore to inner ioctl() helpers to better align with
commonly accepted kernel coding style, and to allow using a single
underscore variant in the future for macro shenanigans.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2b38a739

KVM: selftests: Always open VM file descriptors with O_RDWR · ccc82ba6

由 Sean Christopherson 提交于 2月 14, 2022

Drop the @perm param from vm_create() and always open VM file descriptors
with O_RDWR.  There's no legitimate use case for other permissions, and
if a selftest wants to do oddball negative testing it can open code the
necessary bits instead of forcing a bunch of tests to provide useless
information.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ccc82ba6

KVM: selftests: Drop stale declarations from kvm_util_base.h · d379749f

由 Sean Christopherson 提交于 5月 02, 2022

Drop declarations for allocate_kvm_dirty_log() and vm_create_device(),
which no longer have implementations.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d379749f

KVM: selftests: Fix typo in vgic_init test · ff624e57

由 Sean Christopherson 提交于 2月 16, 2022

When iterating over vCPUs, invoke access_v3_redist_reg() on the "current"
vCPU instead of vCPU0, which is presumably what was intended by iterating
over all vCPUs.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ff624e57

KVM: selftests: Fix buggy-but-benign check in test_v3_new_redist_regions() · 1ca378f6

由 Sean Christopherson 提交于 2月 23, 2022

Update 'ret' with the return value of _kvm_device_access() prior to
asserting that ret is non-zero. In the current code base, the flaw is
benign as 'ret' is guaranteed to be -EBUSY from the previous run_vcpu(),
which also means that errno==EBUSY prior to _kvm_device_access(), thus
the "errno == EFAULT" part of the assert means that a false negative is
impossible (unless the kernel is being truly mean and spuriously setting
errno=EFAULT while returning success).
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1ca378f6

KVM: Fix references to non-existent KVM_CAP_TRIPLE_FAULT_EVENT · 8deb03e7

由 Sean Christopherson 提交于 6月 01, 2022

The x86-only KVM_CAP_TRIPLE_FAULT_EVENT was (appropriately) renamed to
KVM_CAP_X86_TRIPLE_FAULT_EVENT when the patches were applied, but the
docs and selftests got left behind.  Fix them.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

8deb03e7

10 6月, 2022 8 次提交

KVM: x86: Bug the VM on an out-of-bounds data read · d38ea957

由 Sean Christopherson 提交于 5月 26, 2022

Bug the VM and terminate emulation if an out-of-bounds read into the
emulator's data cache occurs.  Knowingly contuining on all but guarantees
that KVM will overwrite random kernel data, which is far, far worse than
killing the VM.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220526210817.3428868-9-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d38ea957

KVM: x86: Bug the VM if the emulator generates a bogus exception vector · 49a1431d

由 Sean Christopherson 提交于 5月 26, 2022

Bug the VM if KVM's emulator attempts to inject a bogus exception vector.
The guest is likely doomed even if KVM continues on, and propagating a
bad vector to the rest of KVM runs the risk of breaking other assumptions
in KVM and thus triggering a more egregious bug.

All existing users of emulate_exception() have hardcoded vector numbers
(__load_segment_descriptor() uses a few different vectors, but they're
all hardcoded), and future users are likely to follow suit, i.e. the
change to emulate_exception() is a glorified nop.

As for the ctxt->exception.vector check in x86_emulate_insn(), the few
known times the WARN has been triggered in the past is when the field was
not set when synthesizing a fault, i.e. for all intents and purposes the
check protects against consumption of uninitialized data.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220526210817.3428868-8-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

49a1431d

KVM: x86: Bug the VM if the emulator accesses a non-existent GPR · 1cca2f8c

由 Sean Christopherson 提交于 5月 26, 2022

Bug the VM, i.e. kill it, if the emulator accesses a non-existent GPR,
i.e. generates an out-of-bounds GPR index.  Continuing on all but
gaurantees some form of data corruption in the guest, e.g. even if KVM
were to redirect to a dummy register, KVM would be incorrectly read zeros
and drop writes.

Note, bugging the VM doesn't completely prevent data corruption, e.g. the
current round of emulation will complete before the vCPU bails out to
userspace.  But, the very act of killing the guest can also cause data
corruption, e.g. due to lack of file writeback before termination, so
taking on additional complexity to cleanly bail out of the emulator isn't
justified, the goal is purely to stem the bleeding and alert userspace
that something has gone horribly wrong, i.e. to avoid _silent_ data
corruption.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20220526210817.3428868-7-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1cca2f8c

KVM: x86: Reduce the number of emulator GPRs to '8' for 32-bit KVM · b443183a

由 Sean Christopherson 提交于 5月 26, 2022

Reduce the number of GPRs emulated by 32-bit KVM from 16 to 8.  KVM does
not support emulating 64-bit mode on 32-bit host kernels, and so should
never generate accesses to R8-15.

Opportunistically use NR_EMULATOR_GPRS in rsm_load_state_{32,64}() now
that it is precise and accurate for both flavors.

Wrap the definition with full #ifdef ugliness; sadly, IS_ENABLED()
doesn't guarantee a compile-time constant as far as BUILD_BUG_ON() is
concerned.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Message-Id: <20220526210817.3428868-6-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b443183a

KVM: x86: Use 16-bit fields to track dirty/valid emulator GPRs · 0cbc60d4

由 Sean Christopherson 提交于 5月 26, 2022

Use a u16 instead of a u32 to track the dirty/valid status of GPRs in the
emulator.  Unlike struct kvm_vcpu_arch, x86_emulate_ctxt tracks only the
"true" GPRs, i.e. doesn't include RIP in its array, and so only needs to
track 16 registers.

Note, maxing out at 16 GPRs is a fundamental property of x86-64 and will
not change barring a massive architecture update.  Legacy x86 ModRM and
SIB encodings use 3 bits for GPRs, i.e. support 8 registers.  x86-64 uses
a single bit in the REX prefix for each possible reference type to double
the number of supported GPRs to 16 registers (4 bits).
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220526210817.3428868-5-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

0cbc60d4

KVM: x86: Omit VCPU_REGS_RIP from emulator's _regs array · a5ba67b4

由 Sean Christopherson 提交于 5月 26, 2022

Omit RIP from the emulator's _regs array, which is used only for GPRs,
i.e. registers that can be referenced via ModRM and/or SIB bytes.  The
emulator uses the dedicated _eip field for RIP, and manually reads from
_eip to handle RIP-relative addressing.

To avoid an even bigger, slightly more dangerous change, hardcode the
number of GPRs to 16 for the time being even though 32-bit KVM's emulator
technically should only have 8 GPRs.  Add a TODO to address that in a
future commit.

See also the comments above the read_gpr() and write_gpr() declarations,
and obviously the handling in writeback_registers().

No functional change intended.
Signed-off-by: NSean Christopherson <seanjc@google.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Message-Id: <20220526210817.3428868-4-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

a5ba67b4

KVM: x86: Harden _regs accesses to guard against buggy input · dfe21e6b

由 Sean Christopherson 提交于 5月 26, 2022

WARN and truncate the incoming GPR number/index when reading/writing GPRs
in the emulator to guard against KVM bugs, e.g. to avoid out-of-bounds
accesses to ctxt->_regs[] if KVM generates a bogus index.  Truncate the
index instead of returning e.g. zero, as reg_write() returns a pointer
to the register, i.e. returning zero would result in a NULL pointer
dereference.  KVM could also force the index to any arbitrary GPR, but
that's no better or worse, just different.

Open code the restriction to 16 registers; RIP is handled via _eip and
should never be accessed through reg_read() or reg_write().  See the
comments above the declarations of reg_read() and reg_write(), and the
behavior of writeback_registers().  The horrific open coded mess will be
cleaned up in a future commit.

There are no such bugs known to exist in the emulator, but determining
that KVM is bug-free is not at all simple and requires a deep dive into
the emulator.  The code is so convoluted that GCC-12 with the recently
enable -Warray-bounds spits out a false-positive due to a GCC bug:

  arch/x86/kvm/emulate.c:254:27: warning: array subscript 32 is above array
                                 bounds of 'long unsigned int[17]' [-Warray-bounds]
    254 |         return ctxt->_regs[nr];
        |                ~~~~~~~~~~~^~~~
  In file included from arch/x86/kvm/emulate.c:23:
  arch/x86/kvm/kvm_emulate.h: In function 'reg_rmw':
  arch/x86/kvm/kvm_emulate.h:366:23: note: while referencing '_regs'
    366 |         unsigned long _regs[NR_VCPU_REGS];
        |                       ^~~~~

Link: https://lore.kernel.org/all/YofQlBrlx18J7h9Y@google.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216026
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105679Reported-and-tested-by: NRobert Dinse <nanook@eskimo.com>
Reported-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220526210817.3428868-3-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dfe21e6b

KVM: x86: Grab regs_dirty in local 'unsigned long' · 61d9c412

由 Sean Christopherson 提交于 5月 26, 2022

Capture ctxt->regs_dirty in a local 'unsigned long' instead of casting it
to an 'unsigned long *' for use in for_each_set_bit(). The bitops helpers
really do read the entire 'unsigned long', even though the walking of the
read value is capped at the specified size. I.e. 64-bit KVM is reading
memory beyond ctxt->regs_dirty, which is a u32 and thus 4 bytes, whereas
an unsigned long is 8 bytes. Functionally it's not an issue because
regs_dirty is in the middle of x86_emulate_ctxt, i.e. KVM is just reading
its own memory, but relying on that coincidence is gross and unsafe.
Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NSean Christopherson <seanjc@google.com>
Message-Id: <20220526210817.3428868-2-seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

61d9c412

09 6月, 2022 17 次提交

Merge branch 'kvm-5.20-early' · e15f5e6f

由 Paolo Bonzini 提交于 6月 09, 2022

s390:

* add an interface to provide a hypervisor dump for secure guests

* improve selftests to show tests

x86:

* Intel IPI virtualization

* Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS

* PEBS virtualization

* Simplify PMU emulation by just using PERF_TYPE_RAW events

* More accurate event reinjection on SVM (avoid retrying instructions)

* Allow getting/setting the state of the speaker port data bit

* Rewrite gfn-pfn cache refresh

* Refuse starting the module if VM-Entry/VM-Exit controls are inconsistent

* "Notify" VM exit

e15f5e6f

KVM: selftests: Restrict test region to 48-bit physical addresses when using nested · e0f3f46e

由 David Matlack 提交于 5月 20, 2022

The selftests nested code only supports 4-level paging at the moment.
This means it cannot map nested guest physical addresses with more than
48 bits. Allow perf_test_util nested mode to work on hosts with more
than 48 physical addresses by restricting the guest test region to
48-bits.

While here, opportunistically fix an off-by-one error when dealing with
vm_get_max_gfn(). perf_test_util.c was treating this as the maximum
number of GFNs, rather than the maximum allowed GFN. This didn't result
in any correctness issues, but it did end up shifting the test region
down slightly when using huge pages.
Suggested-by: NSean Christopherson <seanjc@google.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-12-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e0f3f46e

KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2 · 71d48966

由 David Matlack 提交于 5月 20, 2022

Add an option to dirty_log_perf_test that configures the vCPUs to run in
L2 instead of L1. This makes it possible to benchmark the dirty logging
performance of nested virtualization, which is particularly interesting
because KVM must shadow L1's EPT/NPT tables.

For now this support only works on x86_64 CPUs with VMX. Otherwise
passing -n results in the test being skipped.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-11-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

71d48966

KVM: selftests: Clean up LIBKVM files in Makefile · cf97d5e9

由 David Matlack 提交于 5月 20, 2022

Break up the long lines for LIBKVM and alphabetize each architecture.
This makes reading the Makefile easier, and will make reading diffs to
LIBKVM easier.

No functional change intended.
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-10-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cf97d5e9

KVM: selftests: Link selftests directly with lib object files · cdc979da

由 David Matlack 提交于 5月 20, 2022

The linker does obey strong/weak symbols when linking static libraries,
it simply resolves an undefined symbol to the first-encountered symbol.
This means that defining __weak arch-generic functions and then defining
arch-specific strong functions to override them in libkvm will not
always work.

More specifically, if we have:

lib/generic.c:

  void __weak foo(void)
  {
          pr_info("weak\n");
  }

  void bar(void)
  {
          foo();
  }

lib/x86_64/arch.c:

  void foo(void)
  {
          pr_info("strong\n");
  }

And a selftest that calls bar(), it will print "weak". Now if you make
generic.o explicitly depend on arch.o (e.g. add function to arch.c that
is called directly from generic.c) it will print "strong". In other
words, it seems that the linker is free to throw out arch.o when linking
because generic.o does not explicitly depend on it, which causes the
linker to lose the strong symbol.

One solution is to link libkvm.a with --whole-archive so that the linker
doesn't throw away object files it thinks are unnecessary. However that
is a bit difficult to plumb since we are using the common selftests
makefile rules. An easier solution is to drop libkvm.a just link
selftests with all the .o files that were originally in libkvm.a.
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-9-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

cdc979da

KVM: selftests: Drop unnecessary rule for STATIC_LIBS · acf57736

由 David Matlack 提交于 5月 20, 2022

Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend
on $(STATIC_LIBS), so there is no reason to have an extra "all" rule.
Suggested-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-8-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

acf57736

KVM: selftests: Add a helper to check EPT/VPID capabilities · c363d959

由 David Matlack 提交于 5月 20, 2022

Create a small helper function to check if a given EPT/VPID capability
is supported. This will be re-used in a follow-up commit to check for 1G
page support.

No functional change intended.
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-7-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c363d959

KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.h · b6c086d0

由 David Matlack 提交于 5月 20, 2022

This is a VMX-related macro so move it to vmx.h. While here, open code
the mask like the rest of the VMX bitmask macros.

No functional change intended.
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-6-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b6c086d0

KVM: selftests: Refactor nested_map() to specify target level · ce690e9c

由 David Matlack 提交于 5月 20, 2022

Refactor nested_map() to specify that it explicityl wants 4K mappings
(the existing behavior) and push the implementation down into
__nested_map(), which can be used in subsequent commits to create huge
page mappings.

No function change intended.
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-5-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ce690e9c

KVM: selftests: Drop stale function parameter comment for nested_map() · b8ca01ea

由 David Matlack 提交于 5月 20, 2022

nested_map() does not take a parameter named eptp_memslot. Drop the
comment referring to it.
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-4-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

b8ca01ea

KVM: selftests: Add option to create 2M and 1G EPT mappings · c5a0ccec

由 David Matlack 提交于 5月 20, 2022

The current EPT mapping code in the selftests only supports mapping 4K
pages. This commit extends that support with an option to map at 2M or
1G. This will be used in a future commit to create large page mappings
to test eager page splitting.

No functional change intended.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-3-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c5a0ccec

KVM: selftests: Replace x86_page_size with PG_LEVEL_XX · 4ee602e7

由 David Matlack 提交于 5月 20, 2022

x86_page_size is an enum used to communicate the desired page size with
which to map a range of memory. Under the hood they just encode the
desired level at which to map the page. This ends up being clunky in a
few ways:

 - The name suggests it encodes the size of the page rather than the
   level.
 - In other places in x86_64/processor.c we just use a raw int to encode
   the level.

Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass
around raw ints when referring to the level. This makes the code easier
to understand since these macros are very common in KVM MMU code.
Signed-off-by: NDavid Matlack <dmatlack@google.com>
Message-Id: <20220520233249.3776001-2-dmatlack@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

4ee602e7

KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSE · e3cdaab5

由 Paolo Bonzini 提交于 5月 31, 2022

Commit 74fd41ed ("KVM: x86: nSVM: support PAUSE filtering when L0
doesn't intercept PAUSE") introduced passthrough support for nested pause
filtering, (when the host doesn't intercept PAUSE) (either disabled with
kvm module param, or disabled with '-overcommit cpu-pm=on')

Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards,
the feature was exposed as supported by KVM cpuid unconditionally, thus
if L1 could try to use it even when the L0 KVM can't really support it.

In this case the fallback caused KVM to intercept each PAUSE instruction;
in some cases, such intercept can slow down the nested guest so much
that it can fail to boot. Instead, before the problematic commit KVM
was already setting both thresholds to 0 in vmcb02, but after the first
userspace VM exit shrink_ple_window was called and would reset the
pause_filter_count to the default value.

To fix this, change the fallback strategy - ignore the guest threshold
values, but use/update the host threshold values unless the guest
specifically requests disabling PAUSE filtering (either simple or
advanced).

Also fix a minor bug: on nested VM exit, when PAUSE filter counter
were copied back to vmcb01, a dirty bit was not set.

Thanks a lot to Suravee Suthikulpanit for debugging this!

Fixes: 74fd41ed ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE")
Reported-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Tested-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Co-developed-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220518072709.730031-1-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

e3cdaab5

KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/put · ba8ec273

由 Maxim Levitsky 提交于 6月 06, 2022

Now that these functions are always called with preemption disabled,
remove the preempt_disable()/preempt_enable() pair inside them.

No functional change intended.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-8-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ba8ec273

KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blocking · 18869f26

由 Maxim Levitsky 提交于 6月 06, 2022

On SVM, if preemption happens right after the call to finish_rcuwait
but before call to kvm_arch_vcpu_unblocking on SVM/AVIC, it itself
will re-enable AVIC, and then we will try to re-enable it again
in kvm_arch_vcpu_unblocking which will lead to a warning
in __avic_vcpu_load.

The same problem can happen if the vCPU is preempted right after the call
to kvm_arch_vcpu_blocking but before the call to prepare_to_rcuwait
and in this case, we will end up with AVIC enabled during sleep -
Ooops.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-7-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

18869f26

KVM: x86: disable preemption while updating apicv inhibition · 66c768d3

由 Maxim Levitsky 提交于 6月 06, 2022

Currently nothing prevents preemption in kvm_vcpu_update_apicv.

On SVM, If the preemption happens after we update the
vcpu->arch.apicv_active, the preemption itself will
'update' the inhibition since the AVIC will be first disabled
on vCPU unload and then enabled, when the current task
is loaded again.

Then we will try to update it again, which will lead to a warning
in __avic_vcpu_load, that the AVIC is already enabled.

Fix this by disabling preemption in this code.
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-6-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

66c768d3

KVM: x86: SVM: fix avic_kick_target_vcpus_fast · 603ccef4

由 Maxim Levitsky 提交于 6月 06, 2022

There are two issues in avic_kick_target_vcpus_fast

1. It is legal to issue an IPI request with APIC_DEST_NOSHORT
   and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic),
   which must be treated as a broadcast destination.

   Fix this by explicitly checking for it.
   Also donâ€™t use â€˜indexâ€™ in this case as it gives no new information.

2. It is legal to issue a logical IPI request to more than one target.
   Index field only provides index in physical id table of first
   such target and therefore can't be used before we are sure
   that only a single target was addressed.

   Instead, parse the ICRL/ICRH, double check that a unicast interrupt
   was requested, and use that info to figure out the physical id
   of the target vCPU.
   At that point there is no need to use the index field as well.

In addition to fixing the above	issues,	also skip the call to
kvm_apic_match_dest.

It is possible to do this now, because now as long as AVIC is not
inhibited, it is guaranteed that none of the vCPUs changed their
apic id from its default value.

This fixes boot of windows guest with AVIC enabled because it uses
IPI with 0xFF destination and no destination shorthand.

Fixes: 7223fd2d ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible")
Cc: stable@vger.kernel.org
Signed-off-by: NMaxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20220606180829.102503-5-mlevitsk@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

603ccef4

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功