提交 · d8ba8ba4c801b794f47852a6f1821ea48f83b5d1 · openeuler / Kernel

30 11月, 2022 1 次提交

KVM: x86/xen: Allow XEN_RUNSTATE_UPDATE flag behaviour to be configured · d8ba8ba4

由 David Woodhouse 提交于 11月 27, 2022

Closer inspection of the Xen code shows that we aren't supposed to be
using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be
explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall.
If we randomly set the top bit of ->state_entry_time for a guest that
hasn't asked for it and doesn't expect it, that could make the runtimes
fail to add up and confuse the guest. Without the flag it's perfectly
safe for a vCPU to read its own vcpu_runstate_info; just not for one
vCPU to read *another's*.

I briefly pondered adding a word for the whole set of VMASST_TYPE_*
flags but the only one we care about for HVM guests is this, so it
seemed a bit pointless.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20221127122210.248427-3-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d8ba8ba4

23 11月, 2022 2 次提交

KVM: s390: pv: add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE · 8c516b25

由 Claudio Imbrenda 提交于 11月 11, 2022

Add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE to signal that the
KVM_PV_ASYNC_DISABLE and KVM_PV_ASYNC_DISABLE_PREPARE commands for the
KVM_S390_PV_COMMAND ioctl are available.
Signed-off-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: NNico Boehr <nrb@linux.ibm.com>
Reviewed-by: NSteffen Eiden <seiden@linux.ibm.com>
Reviewed-by: NJanosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20221111170632.77622-4-imbrenda@linux.ibm.com
Message-Id: <20221111170632.77622-4-imbrenda@linux.ibm.com>
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>

8c516b25

KVM: s390: pv: asynchronous destroy for reboot · fb491d55

由 Claudio Imbrenda 提交于 11月 11, 2022

Until now, destroying a protected guest was an entirely synchronous
operation that could potentially take a very long time, depending on
the size of the guest, due to the time needed to clean up the address
space from protected pages.

This patch implements an asynchronous destroy mechanism, that allows a
protected guest to reboot significantly faster than previously.

This is achieved by clearing the pages of the old guest in background.
In case of reboot, the new guest will be able to run in the same
address space almost immediately.

The old protected guest is then only destroyed when all of its memory
has been destroyed or otherwise made non protected.

Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl:

KVM_PV_ASYNC_CLEANUP_PREPARE: set aside the current protected VM for
later asynchronous teardown. The current KVM VM will then continue
immediately as non-protected. If a protected VM had already been
set aside for asynchronous teardown, but without starting the teardown
process, this call will fail. There can be at most one VM set aside at
any time. Once it is set aside, the protected VM only exists in the
context of the Ultravisor, it is not associated with the KVM VM
anymore. Its protected CPUs have already been destroyed, but not its
memory. This command can be issued again immediately after starting
KVM_PV_ASYNC_CLEANUP_PERFORM, without having to wait for completion.

KVM_PV_ASYNC_CLEANUP_PERFORM: tears down the protected VM previously
set aside using KVM_PV_ASYNC_CLEANUP_PREPARE. Ideally the
KVM_PV_ASYNC_CLEANUP_PERFORM PV command should be issued by userspace
from a separate thread. If a fatal signal is received (or if the
process terminates naturally), the command will terminate immediately
without completing. All protected VMs whose teardown was interrupted
will be put in the need_cleanup list. The rest of the normal KVM
teardown process will take care of properly cleaning up all remaining
protected VMs, including the ones on the need_cleanup list.
Signed-off-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: NNico Boehr <nrb@linux.ibm.com>
Reviewed-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NSteffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20221111170632.77622-2-imbrenda@linux.ibm.com
Message-Id: <20221111170632.77622-2-imbrenda@linux.ibm.com>
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>

fb491d55

10 11月, 2022 1 次提交

KVM: x86: Add a VALID_MASK for the MSR exit reason flags · db205f7e

由 Aaron Lewis 提交于 9月 21, 2022

Add the mask KVM_MSR_EXIT_REASON_VALID_MASK for the MSR exit reason
flags.  This simplifies checks that validate these flags, and makes it
easier to introduce new flags in the future.

No functional change intended.
Signed-off-by: NAaron Lewis <aaronlewis@google.com>
Message-Id: <20220921151525.904162-3-aaronlewis@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

db205f7e

29 9月, 2022 1 次提交

KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option · 17601bfe

由 Marc Zyngier 提交于 9月 26, 2022

In order to differenciate between architectures that require no extra
synchronisation when accessing the dirty ring and those who do,
add a new capability (KVM_CAP_DIRTY_LOG_RING_ACQ_REL) that identify
the latter sort. TSO architectures can obviously advertise both, while
relaxed architectures must only advertise the ACQ_REL version.

This requires some configuration symbol rejigging, with HAVE_KVM_DIRTY_RING
being only indirectly selected by two top-level config symbols:
- HAVE_KVM_DIRTY_RING_TSO for strongly ordered architectures (x86)
- HAVE_KVM_DIRTY_RING_ACQ_REL for weakly ordered architectures (arm64)
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Reviewed-by: NGavin Shan <gshan@redhat.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Link: https://lore.kernel.org/r/20220926145120.27974-3-maz@kernel.org

17601bfe

29 7月, 2022 1 次提交

RISC-V: KVM: Add extensible CSR emulation framework · 8a061562

由 Anup Patel 提交于 7月 29, 2022

We add an extensible CSR emulation framework which is based upon the
existing system instruction emulation. This will be useful to upcoming
AIA, PMU, Nested and other virtualization features.

The CSR emulation framework also has provision to emulate CSR in user
space but this will be used only in very specific cases such as AIA
IMSIC CSR emulation in user space or vendor specific CSR emulation
in user space.

By default, all CSRs not handled by KVM RISC-V will be redirected back
to Guest VCPU as illegal instruction trap.
Signed-off-by: NAnup Patel <apatel@ventanamicro.com>
Signed-off-by: NAnup Patel <anup@brainfault.org>

8a061562

20 7月, 2022 1 次提交

KVM: s390: resetting the Topology-Change-Report · f5ecfee9

由 Pierre Morel 提交于 7月 14, 2022

During a subsystem reset the Topology-Change-Report is cleared.

Let's give userland the possibility to clear the MTCR in the case
of a subsystem reset.

To migrate the MTCR, we give userland the possibility to
query the MTCR state.

We indicate KVM support for the CPU topology facility with a new
KVM capability: KVM_CAP_S390_CPU_TOPOLOGY.
Signed-off-by: NPierre Morel <pmorel@linux.ibm.com>
Reviewed-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: NJanosch Frank <frankja@linux.ibm.com>
Message-Id: <20220714194334.127812-1-pmorel@linux.ibm.com>
Link: https://lore.kernel.org/all/20220714194334.127812-1-pmorel@linux.ibm.com/
[frankja@linux.ibm.com: Simple conflict resolution in Documentation/virt/kvm/api.rst]
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>

f5ecfee9

19 7月, 2022 1 次提交

KVM: stats: Fix value for KVM_STATS_UNIT_MAX for boolean stats · 450a5639

由 Oliver Upton 提交于 7月 19, 2022

commit 1b870fa5 ("kvm: stats: tell userspace which values are
boolean") added a new stat unit (boolean) but failed to raise
KVM_STATS_UNIT_MAX.

Fix by pointing UNIT_MAX at the new max value of UNIT_BOOLEAN.

Fixes: 1b870fa5 ("kvm: stats: tell userspace which values are boolean")
Reported-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Message-Id: <20220719125229.2934273-1-oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

450a5639

14 7月, 2022 1 次提交

kvm: stats: tell userspace which values are boolean · 1b870fa5

由 Paolo Bonzini 提交于 7月 14, 2022

Some of the statistics values exported by KVM are always only 0 or 1.
It can be useful to export this fact to userspace so that it can track
them specially (for example by polling the value every now and then to
compute a % of time spent in a specific state).

Therefore, add "boolean value" as a new "unit".  While it is not exactly
a unit, it walks and quacks like one.  In particular, using the type
would be wrong because boolean values could be instantaneous or peak
values (e.g. "is the rmap allocated?") or even two-bucket histograms
(e.g. "number of posted vs. non-posted interrupt injections").
Suggested-by: NAmneesh Singh <natto@weirdnatto.in>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

1b870fa5

11 7月, 2022 1 次提交

KVM: s390: add KVM_S390_ZPCI_OP to manage guest zPCI devices · db1c875e

由 Matthew Rosato 提交于 6月 06, 2022

The KVM_S390_ZPCI_OP ioctl provides a mechanism for managing
hardware-assisted virtualization features for s390x zPCI passthrough.
Add the first 2 operations, which can be used to enable/disable
the specified device for Adapter Event Notification interpretation.
Signed-off-by: NMatthew Rosato <mjrosato@linux.ibm.com>
Acked-by: NPierre Morel <pmorel@linux.ibm.com>
Reviewed-by: NThomas Huth <thuth@redhat.com>
Link: https://lore.kernel.org/r/20220606203325.110625-21-mjrosato@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

db1c875e

29 6月, 2022 1 次提交

treewide: uapi: Replace zero-length arrays with flexible-array members · 94dfc73e

由 Gustavo A. R. Silva 提交于 4月 06, 2022

There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].

This code was transformed with the help of Coccinelle:
(linux-5.19-rc2$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)

@@
identifier S, member, array;
type T1, T2;
@@

struct S {
  ...
  T1 member;
  T2 array[
- 0
  ];
};

-fstrict-flex-arrays=3 is coming and we need to land these changes
to prevent issues like these in the short future:

../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0,
but the source string has length 2 (including NUL byte) [-Wfortify-source]
		strcpy(de3->name, ".");
		^

Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If
this breaks anything, we can use a union with a new member name.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/78Build-tested-by: Nkernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/lkml/62b675ec.wKX6AOZ6cbE71vtF%25lkp@intel.com/
Acked-by: Dan Williams <dan.j.williams@intel.com> # For ndctl.h
Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>

94dfc73e

24 6月, 2022 1 次提交

KVM: x86/MMU: Allow NX huge pages to be disabled on a per-vm basis · 084cc29f

由 Ben Gardon 提交于 6月 13, 2022

In some cases, the NX hugepage mitigation for iTLB multihit is not
needed for all guests on a host. Allow disabling the mitigation on a
per-VM basis to avoid the performance hit of NX hugepages on trusted
workloads.

In order to disable NX hugepages on a VM, ensure that the userspace
actor has permission to reboot the system. Since disabling NX hugepages
would allow a guest to crash the system, it is similar to reboot
permissions.

Ideally, KVM would require userspace to prove it has access to KVM's
nx_huge_pages module param, e.g. so that userspace can opt out without
needing full reboot permissions.  But getting access to the module param
file info is difficult because it is buried in layers of sysfs and module
glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases.
Suggested-by: NJim Mattson <jmattson@google.com>
Reviewed-by: NDavid Matlack <dmatlack@google.com>
Reviewed-by: NPeter Xu <peterx@redhat.com>
Signed-off-by: NBen Gardon <bgardon@google.com>
Message-Id: <20220613212523.3436117-9-bgardon@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

084cc29f

08 6月, 2022 2 次提交

KVM: VMX: Enable Notify VM exit · 2f4073e0

由 Tao Xu 提交于 5月 24, 2022

There are cases that malicious virtual machines can cause CPU stuck (due
to event windows don't open up), e.g., infinite loop in microcode when
nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and
IRQ) can be delivered. It leads the CPU to be unavailable to host or
other VMs.

VMM can enable notify VM exit that a VM exit generated if no event
window occurs in VM non-root mode for a specified amount of time (notify
window).

Feature enabling:
- The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to
  enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust
  the expected notify window.
- Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space
  can query and enable this feature in per-VM scope. The argument is a
  64bit value: bits 63:32 are used for notify window, and bits 31:0 are
  for flags. Current supported flags:
  - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify
    window provided.
  - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen.
- It's safe to even set notify window to zero since an internal hardware
  threshold is added to vmcs.notify_window.

VM exit handling:
- Introduce a vcpu state notify_window_exits to records the count of
  notify VM exits and expose it through the debugfs.
- Notify VM exit can happen incident to delivery of a vector event.
  Allow it in KVM.
- Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID
  bit is set.

Nested handling
- Nested notify VM exits are not supported yet. Keep the same notify
  window control in vmcs02 as vmcs01, so that L1 can't escape the
  restriction of notify VM exits through launching L2 VM.

Notify VM exit is defined in latest Intel Architecture Instruction Set
Extensions Programming Reference, chapter 9.2.
Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com>
Signed-off-by: NTao Xu <tao3.xu@intel.com>
Co-developed-by: NChenyi Qiang <chenyi.qiang@intel.com>
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2f4073e0

KVM: x86: Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault · ed235117

由 Chenyi Qiang 提交于 5月 24, 2022

For the triple fault sythesized by KVM, e.g. the RSM path or
nested_vmx_abort(), if KVM exits to userspace before the request is
serviced, userspace could migrate the VM and lose the triple fault.

Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a
new event KVM_VCPUEVENT_VALID_FAULT_FAULT so that userspace can save and
restore the triple fault event. This extension is guarded by a new KVM
capability KVM_CAP_TRIPLE_FAULT_EVENT.

Note that in the set_vcpu_events path, userspace is able to set/clear
the triple fault request through triple_fault.pending field.
Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com>
Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ed235117

01 6月, 2022 5 次提交

KVM: s390: Add KVM_CAP_S390_PROTECTED_DUMP · e9bf3acb

由 Janosch Frank 提交于 5月 17, 2022

The capability indicates dump support for protected VMs.
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-9-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-9-frankja@linux.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

e9bf3acb

KVM: s390: Add CPU dump functionality · 8aba0958

由 Janosch Frank 提交于 5月 17, 2022

The previous patch introduced the per-VM dump functions now let's
focus on dumping the VCPU state via the newly introduced
KVM_S390_PV_CPU_COMMAND ioctl which mirrors the VM UV ioctl and can be
extended with new commands later.
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-8-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-8-frankja@linux.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

8aba0958

KVM: s390: Add configuration dump functionality · 0460eb35

由 Janosch Frank 提交于 5月 17, 2022

Sometimes dumping inside of a VM fails, is unavailable or doesn't
yield the required data. For these occasions we dump the VM from the
outside, writing memory and cpu data to a file.

Up to now PV guests only supported dumping from the inside of the
guest through dumpers like KDUMP. A PV guest can be dumped from the
hypervisor but the data will be stale and / or encrypted.

To get the actual state of the PV VM we need the help of the
Ultravisor who safeguards the VM state. New UV calls have been added
to initialize the dump, dump storage state data, dump cpu data and
complete the dump process. We expose these calls in this patch via a
new UV ioctl command.

The sensitive parts of the dump data are encrypted, the dump key is
derived from the Customer Communication Key (CCK). This ensures that
only the owner of the VM who has the CCK can decrypt the dump data.

The memory is dumped / read via a normal export call and a re-import
after the dump initialization is not needed (no re-encryption with a
dump key).
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-7-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-7-frankja@linux.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

0460eb35

KVM: s390: pv: Add query dump information · fe9a93e0

由 Janosch Frank 提交于 5月 17, 2022

The dump API requires userspace to provide buffers into which we will
store data. The dump information added in this patch tells userspace
how big those buffers need to be.
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: NSteffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-6-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-6-frankja@linux.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

fe9a93e0

KVM: s390: pv: Add query interface · 35d02493

由 Janosch Frank 提交于 5月 17, 2022

Some of the query information is already available via sysfs but
having a IOCTL makes the information easier to retrieve.
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: NSteffen Eiden <seiden@linux.ibm.com>
Link: https://lore.kernel.org/r/20220517163629.3443-4-frankja@linux.ibm.com
Message-Id: <20220517163629.3443-4-frankja@linux.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

35d02493

04 5月, 2022 2 次提交

KVM: arm64: Implement PSCI SYSTEM_SUSPEND · bfbab445

由 Oliver Upton 提交于 5月 04, 2022

ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows
software to request that a system be placed in the deepest possible
low-power state. Effectively, software can use this to suspend itself to
RAM.

Unfortunately, there really is no good way to implement a system-wide
PSCI call in KVM. Any precondition checks done in the kernel will need
to be repeated by userspace since there is no good way to protect a
critical section that spans an exit to userspace. SYSTEM_RESET and
SYSTEM_OFF are equally plagued by this issue, although no users have
seemingly cared for the relatively long time these calls have been
supported.

The solution is to just make the whole implementation userspace's
problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that
indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND.
Additionally, add a CAP to get buy-in from userspace for this new exit
type.

Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in.
If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide
explicit documentation of userspace's responsibilites for the exit and
point to the PSCI specification to describe the actual PSCI call.
Reviewed-by: NReiji Watanabe <reijiw@google.com>
Signed-off-by: NOliver Upton <oupton@google.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com

bfbab445

KVM: arm64: Add support for userspace to suspend a vCPU · 7b33a09d

由 Oliver Upton 提交于 5月 04, 2022

Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU
is in a suspended state. In the suspended state the vCPU will block
until a wakeup event (pending interrupt) is recognized.

Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to
userspace that KVM has recognized one such wakeup event. It is the
responsibility of userspace to then make the vCPU runnable, or leave it
suspended until the next wakeup event.
Signed-off-by: NOliver Upton <oupton@google.com>
Signed-off-by: NMarc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com

7b33a09d

30 4月, 2022 1 次提交

KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT · d495f942

由 Paolo Bonzini 提交于 4月 22, 2022

When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
member that at the time was unused.  Unfortunately this extensibility
mechanism has several issues:

- x86 is not writing the member, so it would not be possible to use it
  on x86 except for new events

- the member is not aligned to 64 bits, so the definition of the
  uAPI struct is incorrect for 32- on 64-bit userspace.  This is a
  problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
  usage of flags was only introduced in 5.18.

Since padding has to be introduced, place a new field in there
that tells if the flags field is valid.  To allow further extensibility,
in fact, change flags to an array of 16 values, and store how many
of the values are valid.  The availability of the new ndata field
is tied to a system capability; all architectures are changed to
fill in the field.

To avoid breaking compilation of userspace that was using the flags
field, provide a userspace-only union to overlap flags with data[0].
The new field is placed at the same offset for both 32- and 64-bit
userspace.

Cc: Will Deacon <will@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Peter Gonda <pgonda@google.com>
Cc: Sean Christopherson <seanjc@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Reported-by: Nkernel test robot <lkp@intel.com>
Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

d495f942

14 4月, 2022 1 次提交

KVM, SEV: Add KVM_EXIT_SHUTDOWN metadata for SEV-ES · c24a950e

由 Peter Gonda 提交于 4月 07, 2022

If an SEV-ES guest requests termination, exit to userspace with
KVM_EXIT_SYSTEM_EVENT and a dedicated SEV_TERM type instead of -EINVAL
so that userspace can take appropriate action.

See AMD's GHCB spec section '4.1.13 Termination Request' for more details.
Suggested-by: NSean Christopherson <seanjc@google.com>
Suggested-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: kvm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NPeter Gonda <pgonda@google.com>
Reported-by: Nkernel test robot <lkp@intel.com>
Message-Id: <20220407210233.782250-1-pgonda@google.com>
[Add documentatino. - Paolo]
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

c24a950e

02 4月, 2022 8 次提交

KVM: x86: Accept KVM_[GS]ET_TSC_KHZ as a VM ioctl. · ffbb61d0

由 David Woodhouse 提交于 2月 25, 2022

This sets the default TSC frequency for subsequently created vCPUs.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Message-Id: <20220225145304.36166-2-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ffbb61d0

KVM: x86/xen: Advertise and document KVM_XEN_HVM_CONFIG_EVTCHN_SEND · 661a20fa

由 David Woodhouse 提交于 3月 03, 2022

At the end of the patch series adding this batch of event channel
acceleration features, finally add the feature bit which advertises
them and document it all.

For SCHEDOP_poll we need to wake a polling vCPU when a given port
is triggered, even when it's masked — and we want to implement that
in the kernel, for efficiency. So we want the kernel to know that it
has sole ownership of event channel delivery. Thus, we allow
userspace to make the 'promise' by setting the corresponding feature
bit in its KVM_XEN_HVM_CONFIG call. As we implement SCHEDOP_poll
bypass later, we will do so only if that promise has been made by
userspace.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-16-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

661a20fa

KVM: x86/xen: Support per-vCPU event channel upcall via local APIC · fde0451b

由 David Woodhouse 提交于 3月 03, 2022

Windows uses a per-vCPU vector, and it's delivered via the local APIC
basically like an MSI (with associated EOI) unlike the traditional
guest-wide vector which is just magically asserted by Xen (and in the
KVM case by kvm_xen_has_interrupt() / kvm_cpu_get_extint()).

Now that the kernel is able to raise event channel events for itself,
being able to do so for Windows guests is also going to be useful.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-15-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

fde0451b

KVM: x86/xen: Kernel acceleration for XENVER_version · 28d1629f

由 David Woodhouse 提交于 3月 03, 2022

Turns out this is a fast path for PV guests because they use it to
trigger the event channel upcall. So letting it bounce all the way up
to userspace is not great.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-14-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

28d1629f

KVM: x86/xen: handle PV timers oneshot mode · 53639526

由 Joao Martins 提交于 3月 03, 2022

If the guest has offloaded the timer virq, handle the following
hypercalls for programming the timer:

    VCPUOP_set_singleshot_timer
    VCPUOP_stop_singleshot_timer
    set_timer_op(timestamp_ns)

The event channel corresponding to the timer virq is then used to inject
events once timer deadlines are met. For now we back the PV timer with
hrtimer.

[ dwmw2: Add save/restore, 32-bit compat mode, immediate delivery,
         don't check timer in kvm_vcpu_has_event() ]
Signed-off-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-13-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

53639526

KVM: x86/xen: Add KVM_XEN_VCPU_ATTR_TYPE_VCPU_ID · 942c2490

由 David Woodhouse 提交于 3月 03, 2022

In order to intercept hypercalls such as VCPUOP_set_singleshot_timer, we
need to be aware of the Xen CPU numbering.

This looks a lot like the Hyper-V handling of vpidx, for obvious reasons.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-12-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

942c2490

KVM: x86/xen: intercept EVTCHNOP_send from guests · 2fd6df2f

由 Joao Martins 提交于 3月 03, 2022

Userspace registers a sending @port to either deliver to an @eventfd
or directly back to a local event channel port.

After binding events the guest or host may wish to bind those
events to a particular vcpu. This is usually done for unbound
and and interdomain events. Update requests are handled via the
KVM_XEN_EVTCHN_UPDATE flag.

Unregistered ports are handled by the emulator.
Co-developed-by: NAnkur Arora <ankur.a.arora@oracle.com>
Co-developed-By: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NJoao Martins <joao.m.martins@oracle.com>
Signed-off-by: NAnkur Arora <ankur.a.arora@oracle.com>
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-10-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

2fd6df2f

KVM: x86/xen: Support direct injection of event channel events · 35025735

由 David Woodhouse 提交于 3月 03, 2022

This adds a KVM_XEN_HVM_EVTCHN_SEND ioctl which allows direct injection
of events given an explicit { vcpu, port, priority } in precisely the
same form that those fields are given in the IRQ routing table.

Userspace is currently able to inject 2-level events purely by setting
the bits in the shared_info and vcpu_info, but FIFO event channels are
harder to deal with; we will need the kernel to take sole ownership of
delivery when we support those.

A patch advertising this feature with a new bit in the KVM_CAP_XEN_HVM
ioctl will be added in a subsequent patch.
Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220303154127.202856-9-dwmw2@infradead.org>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

35025735

21 3月, 2022 1 次提交

KVM: x86: Introduce KVM_CAP_DISABLE_QUIRKS2 · 6d849191

由 Oliver Upton 提交于 3月 01, 2022

KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not
advertise the set of quirks which may be disabled to userspace, so it is
impossible to predict the behavior of KVM. Worse yet,
KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning
it fails to reject attempts to set invalid quirk bits.

The only valid workaround for the quirky quirks API is to add a new CAP.
Actually advertise the set of quirks that can be disabled to userspace
so it can predict KVM's behavior. Reject values for cap->args[0] that
contain invalid bits.

Finally, add documentation for the new capability and describe the
existing quirks.
Signed-off-by: NOliver Upton <oupton@google.com>
Message-Id: <20220301060351.442881-5-oupton@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

6d849191

25 2月, 2022 1 次提交

KVM: x86: Provide per VM capability for disabling PMU virtualization · ba7bb663

由 David Dunn 提交于 2月 23, 2022

Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of
settings/features to allow userspace to configure PMU virtualization on
a per-VM basis.  For now, support a single flag, KVM_PMU_CAP_DISABLE,
to allow disabling PMU virtualization for a VM even when KVM is configured
with enable_pmu=true a module level.

To keep KVM simple, disallow changing VM's PMU configuration after vCPUs
have been created.
Signed-off-by: NDavid Dunn <daviddunn@google.com>
Message-Id: <20220223225743.2703915-2-daviddunn@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

ba7bb663

22 2月, 2022 1 次提交

KVM: PPC: reserve capability 210 for KVM_CAP_PPC_AIL_MODE_3 · 93b71801

由 Nicholas Piggin 提交于 2月 22, 2022

Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL
resource mode to 3 with the H_SET_MODE hypercall. This capability
differs between processor types and KVM types (PR, HV, Nested HV), and
affects guest-visible behaviour.

QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and
use the KVM CAP if available to determine KVM support[2].
Reviewed-by: NFabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

93b71801

14 2月, 2022 4 次提交

KVM: s390: Update api documentation for memop ioctl · 5e35d0eb

由 Janis Schoetterl-Glausch 提交于 2月 11, 2022

Document all currently existing operations, flags and explain under
which circumstances they are available. Document the recently
introduced absolute operations and the storage key protection flag,
as well as the existing SIDA operations.
Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: NJanosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220211182215.2730017-10-scgl@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

5e35d0eb

KVM: s390: Add capability for storage key extension of MEM_OP IOCTL · d004079e

由 Janis Schoetterl-Glausch 提交于 2月 11, 2022

Availability of the KVM_CAP_S390_MEM_OP_EXTENSION capability signals that:
* The vcpu MEM_OP IOCTL supports storage key checking.
* The vm MEM_OP IOCTL exists.
Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: NJanosch Frank <frankja@linux.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20220211182215.2730017-9-scgl@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

d004079e

KVM: s390: Add vm IOCTL for key checked guest absolute memory access · ef11c946

由 Janis Schoetterl-Glausch 提交于 2月 11, 2022

Channel I/O honors storage keys and is performed on absolute memory.
For I/O emulation user space therefore needs to be able to do key
checked accesses.
The vm IOCTL supports read/write accesses, as well as checking
if an access would succeed.
Unlike relying on KVM_S390_GET_SKEYS for key checking would,
the vm IOCTL performs the check in lockstep with the read or write,
by, ultimately, mapping the access to move instructions that
support key protection checking with a supplied key.
Fetch and storage protection override are not applicable to absolute
accesses and so are not applied as they are when using the vcpu memop.
Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
Link: https://lore.kernel.org/r/20220211182215.2730017-7-scgl@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

ef11c946

KVM: s390: Add optional storage key checking to MEMOP IOCTL · e9e9feeb

由 Janis Schoetterl-Glausch 提交于 2月 11, 2022

User space needs a mechanism to perform key checked accesses when
emulating instructions.

The key can be passed as an additional argument.
Having an additional argument is flexible, as user space can
pass the guest PSW's key, in order to make an access the same way the
CPU would, or pass another key if necessary.
Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com>
Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
Reviewed-by: NJanosch Frank <frankja@linux.ibm.com>
Link: https://lore.kernel.org/r/20220211182215.2730017-6-scgl@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>

e9e9feeb

31 1月, 2022 1 次提交

kvm: Move KVM_GET_XSAVE2 IOCTL definition at the end of kvm.h · f6c6804c

由 Janosch Frank 提交于 1月 28, 2022

This way we can more easily find the next free IOCTL number when
adding new IOCTLs.

Fixes: be50b206 ("kvm: x86: Add support for getting/setting expanded xstate buffer")
Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
Message-Id: <20220128154025.102666-1-frankja@linux.ibm.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

f6c6804c

28 1月, 2022 1 次提交

KVM: x86: add system attribute to retrieve full set of supported xsave states · dd6e6312

由 Paolo Bonzini 提交于 1月 26, 2022

Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded
VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that
have not been enabled. Probing those, for example so that they can be
passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl.
The latter is in fact worse, even though that is what the rest of the
API uses, because it would require supported_xcr0 to be moved from the
KVM module to the kernel just for this use. In addition, the value
would be nonsensical (or an error would have to be returned) until
the KVM module is loaded in.

Therefore, to limit the growth of system ioctls, add a /dev/kvm
variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86
with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP).
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>

dd6e6312

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功