- 30 11月, 2022 1 次提交
-
-
由 David Woodhouse 提交于
Closer inspection of the Xen code shows that we aren't supposed to be using the XEN_RUNSTATE_UPDATE flag unconditionally. It should be explicitly enabled by guests through the HYPERVISOR_vm_assist hypercall. If we randomly set the top bit of ->state_entry_time for a guest that hasn't asked for it and doesn't expect it, that could make the runtimes fail to add up and confuse the guest. Without the flag it's perfectly safe for a vCPU to read its own vcpu_runstate_info; just not for one vCPU to read *another's*. I briefly pondered adding a word for the whole set of VMASST_TYPE_* flags but the only one we care about for HVM guests is this, so it seemed a bit pointless. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <20221127122210.248427-3-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 11月, 2022 2 次提交
-
-
由 Claudio Imbrenda 提交于
Add KVM_CAP_S390_PROTECTED_ASYNC_DISABLE to signal that the KVM_PV_ASYNC_DISABLE and KVM_PV_ASYNC_DISABLE_PREPARE commands for the KVM_S390_PV_COMMAND ioctl are available. Signed-off-by: NClaudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: NNico Boehr <nrb@linux.ibm.com> Reviewed-by: NSteffen Eiden <seiden@linux.ibm.com> Reviewed-by: NJanosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20221111170632.77622-4-imbrenda@linux.ibm.com Message-Id: <20221111170632.77622-4-imbrenda@linux.ibm.com> Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
-
由 Claudio Imbrenda 提交于
Until now, destroying a protected guest was an entirely synchronous operation that could potentially take a very long time, depending on the size of the guest, due to the time needed to clean up the address space from protected pages. This patch implements an asynchronous destroy mechanism, that allows a protected guest to reboot significantly faster than previously. This is achieved by clearing the pages of the old guest in background. In case of reboot, the new guest will be able to run in the same address space almost immediately. The old protected guest is then only destroyed when all of its memory has been destroyed or otherwise made non protected. Two new PV commands are added for the KVM_S390_PV_COMMAND ioctl: KVM_PV_ASYNC_CLEANUP_PREPARE: set aside the current protected VM for later asynchronous teardown. The current KVM VM will then continue immediately as non-protected. If a protected VM had already been set aside for asynchronous teardown, but without starting the teardown process, this call will fail. There can be at most one VM set aside at any time. Once it is set aside, the protected VM only exists in the context of the Ultravisor, it is not associated with the KVM VM anymore. Its protected CPUs have already been destroyed, but not its memory. This command can be issued again immediately after starting KVM_PV_ASYNC_CLEANUP_PERFORM, without having to wait for completion. KVM_PV_ASYNC_CLEANUP_PERFORM: tears down the protected VM previously set aside using KVM_PV_ASYNC_CLEANUP_PREPARE. Ideally the KVM_PV_ASYNC_CLEANUP_PERFORM PV command should be issued by userspace from a separate thread. If a fatal signal is received (or if the process terminates naturally), the command will terminate immediately without completing. All protected VMs whose teardown was interrupted will be put in the need_cleanup list. The rest of the normal KVM teardown process will take care of properly cleaning up all remaining protected VMs, including the ones on the need_cleanup list. Signed-off-by: NClaudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: NNico Boehr <nrb@linux.ibm.com> Reviewed-by: NJanosch Frank <frankja@linux.ibm.com> Reviewed-by: NSteffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20221111170632.77622-2-imbrenda@linux.ibm.com Message-Id: <20221111170632.77622-2-imbrenda@linux.ibm.com> Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
-
- 10 11月, 2022 1 次提交
-
-
由 Aaron Lewis 提交于
Add the mask KVM_MSR_EXIT_REASON_VALID_MASK for the MSR exit reason flags. This simplifies checks that validate these flags, and makes it easier to introduce new flags in the future. No functional change intended. Signed-off-by: NAaron Lewis <aaronlewis@google.com> Message-Id: <20220921151525.904162-3-aaronlewis@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 9月, 2022 1 次提交
-
-
由 Marc Zyngier 提交于
In order to differenciate between architectures that require no extra synchronisation when accessing the dirty ring and those who do, add a new capability (KVM_CAP_DIRTY_LOG_RING_ACQ_REL) that identify the latter sort. TSO architectures can obviously advertise both, while relaxed architectures must only advertise the ACQ_REL version. This requires some configuration symbol rejigging, with HAVE_KVM_DIRTY_RING being only indirectly selected by two top-level config symbols: - HAVE_KVM_DIRTY_RING_TSO for strongly ordered architectures (x86) - HAVE_KVM_DIRTY_RING_ACQ_REL for weakly ordered architectures (arm64) Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Reviewed-by: NGavin Shan <gshan@redhat.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Link: https://lore.kernel.org/r/20220926145120.27974-3-maz@kernel.org
-
- 29 7月, 2022 1 次提交
-
-
由 Anup Patel 提交于
We add an extensible CSR emulation framework which is based upon the existing system instruction emulation. This will be useful to upcoming AIA, PMU, Nested and other virtualization features. The CSR emulation framework also has provision to emulate CSR in user space but this will be used only in very specific cases such as AIA IMSIC CSR emulation in user space or vendor specific CSR emulation in user space. By default, all CSRs not handled by KVM RISC-V will be redirected back to Guest VCPU as illegal instruction trap. Signed-off-by: NAnup Patel <apatel@ventanamicro.com> Signed-off-by: NAnup Patel <anup@brainfault.org>
-
- 20 7月, 2022 1 次提交
-
-
由 Pierre Morel 提交于
During a subsystem reset the Topology-Change-Report is cleared. Let's give userland the possibility to clear the MTCR in the case of a subsystem reset. To migrate the MTCR, we give userland the possibility to query the MTCR state. We indicate KVM support for the CPU topology facility with a new KVM capability: KVM_CAP_S390_CPU_TOPOLOGY. Signed-off-by: NPierre Morel <pmorel@linux.ibm.com> Reviewed-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: NJanosch Frank <frankja@linux.ibm.com> Message-Id: <20220714194334.127812-1-pmorel@linux.ibm.com> Link: https://lore.kernel.org/all/20220714194334.127812-1-pmorel@linux.ibm.com/ [frankja@linux.ibm.com: Simple conflict resolution in Documentation/virt/kvm/api.rst] Signed-off-by: NJanosch Frank <frankja@linux.ibm.com>
-
- 19 7月, 2022 1 次提交
-
-
由 Oliver Upton 提交于
commit 1b870fa5 ("kvm: stats: tell userspace which values are boolean") added a new stat unit (boolean) but failed to raise KVM_STATS_UNIT_MAX. Fix by pointing UNIT_MAX at the new max value of UNIT_BOOLEAN. Fixes: 1b870fa5 ("kvm: stats: tell userspace which values are boolean") Reported-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com> Signed-off-by: NOliver Upton <oupton@google.com> Message-Id: <20220719125229.2934273-1-oupton@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 7月, 2022 1 次提交
-
-
由 Paolo Bonzini 提交于
Some of the statistics values exported by KVM are always only 0 or 1. It can be useful to export this fact to userspace so that it can track them specially (for example by polling the value every now and then to compute a % of time spent in a specific state). Therefore, add "boolean value" as a new "unit". While it is not exactly a unit, it walks and quacks like one. In particular, using the type would be wrong because boolean values could be instantaneous or peak values (e.g. "is the rmap allocated?") or even two-bucket histograms (e.g. "number of posted vs. non-posted interrupt injections"). Suggested-by: NAmneesh Singh <natto@weirdnatto.in> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 7月, 2022 1 次提交
-
-
由 Matthew Rosato 提交于
The KVM_S390_ZPCI_OP ioctl provides a mechanism for managing hardware-assisted virtualization features for s390x zPCI passthrough. Add the first 2 operations, which can be used to enable/disable the specified device for Adapter Event Notification interpretation. Signed-off-by: NMatthew Rosato <mjrosato@linux.ibm.com> Acked-by: NPierre Morel <pmorel@linux.ibm.com> Reviewed-by: NThomas Huth <thuth@redhat.com> Link: https://lore.kernel.org/r/20220606203325.110625-21-mjrosato@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
- 29 6月, 2022 1 次提交
-
-
由 Gustavo A. R. Silva 提交于
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This code was transformed with the help of Coccinelle: (linux-5.19-rc2$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch) @@ identifier S, member, array; type T1, T2; @@ struct S { ... T1 member; T2 array[ - 0 ]; }; -fstrict-flex-arrays=3 is coming and we need to land these changes to prevent issues like these in the short future: ../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source] strcpy(de3->name, "."); ^ Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If this breaks anything, we can use a union with a new member name. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/78Build-tested-by: Nkernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/62b675ec.wKX6AOZ6cbE71vtF%25lkp@intel.com/ Acked-by: Dan Williams <dan.j.williams@intel.com> # For ndctl.h Signed-off-by: NGustavo A. R. Silva <gustavoars@kernel.org>
-
- 24 6月, 2022 1 次提交
-
-
由 Ben Gardon 提交于
In some cases, the NX hugepage mitigation for iTLB multihit is not needed for all guests on a host. Allow disabling the mitigation on a per-VM basis to avoid the performance hit of NX hugepages on trusted workloads. In order to disable NX hugepages on a VM, ensure that the userspace actor has permission to reboot the system. Since disabling NX hugepages would allow a guest to crash the system, it is similar to reboot permissions. Ideally, KVM would require userspace to prove it has access to KVM's nx_huge_pages module param, e.g. so that userspace can opt out without needing full reboot permissions. But getting access to the module param file info is difficult because it is buried in layers of sysfs and module glue. Requiring CAP_SYS_BOOT is sufficient for all known use cases. Suggested-by: NJim Mattson <jmattson@google.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NBen Gardon <bgardon@google.com> Message-Id: <20220613212523.3436117-9-bgardon@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 6月, 2022 2 次提交
-
-
由 Tao Xu 提交于
There are cases that malicious virtual machines can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. VMM can enable notify VM exit that a VM exit generated if no event window occurs in VM non-root mode for a specified amount of time (notify window). Feature enabling: - The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust the expected notify window. - Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space can query and enable this feature in per-VM scope. The argument is a 64bit value: bits 63:32 are used for notify window, and bits 31:0 are for flags. Current supported flags: - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify window provided. - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen. - It's safe to even set notify window to zero since an internal hardware threshold is added to vmcs.notify_window. VM exit handling: - Introduce a vcpu state notify_window_exits to records the count of notify VM exits and expose it through the debugfs. - Notify VM exit can happen incident to delivery of a vector event. Allow it in KVM. - Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID bit is set. Nested handling - Nested notify VM exits are not supported yet. Keep the same notify window control in vmcs02 as vmcs01, so that L1 can't escape the restriction of notify VM exits through launching L2 VM. Notify VM exit is defined in latest Intel Architecture Instruction Set Extensions Programming Reference, chapter 9.2. Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NTao Xu <tao3.xu@intel.com> Co-developed-by: NChenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chenyi Qiang 提交于
For the triple fault sythesized by KVM, e.g. the RSM path or nested_vmx_abort(), if KVM exits to userspace before the request is serviced, userspace could migrate the VM and lose the triple fault. Extend KVM_{G,S}ET_VCPU_EVENTS to support pending triple fault with a new event KVM_VCPUEVENT_VALID_FAULT_FAULT so that userspace can save and restore the triple fault event. This extension is guarded by a new KVM capability KVM_CAP_TRIPLE_FAULT_EVENT. Note that in the set_vcpu_events path, userspace is able to set/clear the triple fault request through triple_fault.pending field. Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220524135624.22988-2-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 6月, 2022 5 次提交
-
-
由 Janosch Frank 提交于
The capability indicates dump support for protected VMs. Signed-off-by: NJanosch Frank <frankja@linux.ibm.com> Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220517163629.3443-9-frankja@linux.ibm.com Message-Id: <20220517163629.3443-9-frankja@linux.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
由 Janosch Frank 提交于
The previous patch introduced the per-VM dump functions now let's focus on dumping the VCPU state via the newly introduced KVM_S390_PV_CPU_COMMAND ioctl which mirrors the VM UV ioctl and can be extended with new commands later. Signed-off-by: NJanosch Frank <frankja@linux.ibm.com> Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220517163629.3443-8-frankja@linux.ibm.com Message-Id: <20220517163629.3443-8-frankja@linux.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
由 Janosch Frank 提交于
Sometimes dumping inside of a VM fails, is unavailable or doesn't yield the required data. For these occasions we dump the VM from the outside, writing memory and cpu data to a file. Up to now PV guests only supported dumping from the inside of the guest through dumpers like KDUMP. A PV guest can be dumped from the hypervisor but the data will be stale and / or encrypted. To get the actual state of the PV VM we need the help of the Ultravisor who safeguards the VM state. New UV calls have been added to initialize the dump, dump storage state data, dump cpu data and complete the dump process. We expose these calls in this patch via a new UV ioctl command. The sensitive parts of the dump data are encrypted, the dump key is derived from the Customer Communication Key (CCK). This ensures that only the owner of the VM who has the CCK can decrypt the dump data. The memory is dumped / read via a normal export call and a re-import after the dump initialization is not needed (no re-encryption with a dump key). Signed-off-by: NJanosch Frank <frankja@linux.ibm.com> Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com> Link: https://lore.kernel.org/r/20220517163629.3443-7-frankja@linux.ibm.com Message-Id: <20220517163629.3443-7-frankja@linux.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
由 Janosch Frank 提交于
The dump API requires userspace to provide buffers into which we will store data. The dump information added in this patch tells userspace how big those buffers need to be. Signed-off-by: NJanosch Frank <frankja@linux.ibm.com> Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: NSteffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20220517163629.3443-6-frankja@linux.ibm.com Message-Id: <20220517163629.3443-6-frankja@linux.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
由 Janosch Frank 提交于
Some of the query information is already available via sysfs but having a IOCTL makes the information easier to retrieve. Signed-off-by: NJanosch Frank <frankja@linux.ibm.com> Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: NSteffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20220517163629.3443-4-frankja@linux.ibm.com Message-Id: <20220517163629.3443-4-frankja@linux.ibm.com> Signed-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
- 04 5月, 2022 2 次提交
-
-
由 Oliver Upton 提交于
ARM DEN0022D.b 5.19 "SYSTEM_SUSPEND" describes a PSCI call that allows software to request that a system be placed in the deepest possible low-power state. Effectively, software can use this to suspend itself to RAM. Unfortunately, there really is no good way to implement a system-wide PSCI call in KVM. Any precondition checks done in the kernel will need to be repeated by userspace since there is no good way to protect a critical section that spans an exit to userspace. SYSTEM_RESET and SYSTEM_OFF are equally plagued by this issue, although no users have seemingly cared for the relatively long time these calls have been supported. The solution is to just make the whole implementation userspace's problem. Introduce a new system event, KVM_SYSTEM_EVENT_SUSPEND, that indicates to userspace a calling vCPU has invoked PSCI SYSTEM_SUSPEND. Additionally, add a CAP to get buy-in from userspace for this new exit type. Only advertise the SYSTEM_SUSPEND PSCI call if userspace has opted in. If a vCPU calls SYSTEM_SUSPEND, punt straight to userspace. Provide explicit documentation of userspace's responsibilites for the exit and point to the PSCI specification to describe the actual PSCI call. Reviewed-by: NReiji Watanabe <reijiw@google.com> Signed-off-by: NOliver Upton <oupton@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-8-oupton@google.com
-
由 Oliver Upton 提交于
Introduce a new MP state, KVM_MP_STATE_SUSPENDED, which indicates a vCPU is in a suspended state. In the suspended state the vCPU will block until a wakeup event (pending interrupt) is recognized. Add a new system event type, KVM_SYSTEM_EVENT_WAKEUP, to indicate to userspace that KVM has recognized one such wakeup event. It is the responsibility of userspace to then make the vCPU runnable, or leave it suspended until the next wakeup event. Signed-off-by: NOliver Upton <oupton@google.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20220504032446.4133305-7-oupton@google.com
-
- 30 4月, 2022 1 次提交
-
-
由 Paolo Bonzini 提交于
When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags member that at the time was unused. Unfortunately this extensibility mechanism has several issues: - x86 is not writing the member, so it would not be possible to use it on x86 except for new events - the member is not aligned to 64 bits, so the definition of the uAPI struct is incorrect for 32- on 64-bit userspace. This is a problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately usage of flags was only introduced in 5.18. Since padding has to be introduced, place a new field in there that tells if the flags field is valid. To allow further extensibility, in fact, change flags to an array of 16 values, and store how many of the values are valid. The availability of the new ndata field is tied to a system capability; all architectures are changed to fill in the field. To avoid breaking compilation of userspace that was using the flags field, provide a userspace-only union to overlap flags with data[0]. The new field is placed at the same offset for both 32- and 64-bit userspace. Cc: Will Deacon <will@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Peter Gonda <pgonda@google.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reported-by: Nkernel test robot <lkp@intel.com> Message-Id: <20220422103013.34832-1-pbonzini@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 4月, 2022 1 次提交
-
-
由 Peter Gonda 提交于
If an SEV-ES guest requests termination, exit to userspace with KVM_EXIT_SYSTEM_EVENT and a dedicated SEV_TERM type instead of -EINVAL so that userspace can take appropriate action. See AMD's GHCB spec section '4.1.13 Termination Request' for more details. Suggested-by: NSean Christopherson <seanjc@google.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: NPeter Gonda <pgonda@google.com> Reported-by: Nkernel test robot <lkp@intel.com> Message-Id: <20220407210233.782250-1-pgonda@google.com> [Add documentatino. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 02 4月, 2022 8 次提交
-
-
由 David Woodhouse 提交于
This sets the default TSC frequency for subsequently created vCPUs. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Message-Id: <20220225145304.36166-2-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
At the end of the patch series adding this batch of event channel acceleration features, finally add the feature bit which advertises them and document it all. For SCHEDOP_poll we need to wake a polling vCPU when a given port is triggered, even when it's masked — and we want to implement that in the kernel, for efficiency. So we want the kernel to know that it has sole ownership of event channel delivery. Thus, we allow userspace to make the 'promise' by setting the corresponding feature bit in its KVM_XEN_HVM_CONFIG call. As we implement SCHEDOP_poll bypass later, we will do so only if that promise has been made by userspace. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-16-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
Windows uses a per-vCPU vector, and it's delivered via the local APIC basically like an MSI (with associated EOI) unlike the traditional guest-wide vector which is just magically asserted by Xen (and in the KVM case by kvm_xen_has_interrupt() / kvm_cpu_get_extint()). Now that the kernel is able to raise event channel events for itself, being able to do so for Windows guests is also going to be useful. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-15-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
Turns out this is a fast path for PV guests because they use it to trigger the event channel upcall. So letting it bounce all the way up to userspace is not great. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-14-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Joao Martins 提交于
If the guest has offloaded the timer virq, handle the following hypercalls for programming the timer: VCPUOP_set_singleshot_timer VCPUOP_stop_singleshot_timer set_timer_op(timestamp_ns) The event channel corresponding to the timer virq is then used to inject events once timer deadlines are met. For now we back the PV timer with hrtimer. [ dwmw2: Add save/restore, 32-bit compat mode, immediate delivery, don't check timer in kvm_vcpu_has_event() ] Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-13-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
In order to intercept hypercalls such as VCPUOP_set_singleshot_timer, we need to be aware of the Xen CPU numbering. This looks a lot like the Hyper-V handling of vpidx, for obvious reasons. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-12-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Joao Martins 提交于
Userspace registers a sending @port to either deliver to an @eventfd or directly back to a local event channel port. After binding events the guest or host may wish to bind those events to a particular vcpu. This is usually done for unbound and and interdomain events. Update requests are handled via the KVM_XEN_EVTCHN_UPDATE flag. Unregistered ports are handled by the emulator. Co-developed-by: NAnkur Arora <ankur.a.arora@oracle.com> Co-developed-By: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NJoao Martins <joao.m.martins@oracle.com> Signed-off-by: NAnkur Arora <ankur.a.arora@oracle.com> Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-10-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Woodhouse 提交于
This adds a KVM_XEN_HVM_EVTCHN_SEND ioctl which allows direct injection of events given an explicit { vcpu, port, priority } in precisely the same form that those fields are given in the IRQ routing table. Userspace is currently able to inject 2-level events purely by setting the bits in the shared_info and vcpu_info, but FIFO event channels are harder to deal with; we will need the kernel to take sole ownership of delivery when we support those. A patch advertising this feature with a new bit in the KVM_CAP_XEN_HVM ioctl will be added in a subsequent patch. Signed-off-by: NDavid Woodhouse <dwmw@amazon.co.uk> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Message-Id: <20220303154127.202856-9-dwmw2@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 3月, 2022 1 次提交
-
-
由 Oliver Upton 提交于
KVM_CAP_DISABLE_QUIRKS is irrevocably broken. The capability does not advertise the set of quirks which may be disabled to userspace, so it is impossible to predict the behavior of KVM. Worse yet, KVM_CAP_DISABLE_QUIRKS will tolerate any value for cap->args[0], meaning it fails to reject attempts to set invalid quirk bits. The only valid workaround for the quirky quirks API is to add a new CAP. Actually advertise the set of quirks that can be disabled to userspace so it can predict KVM's behavior. Reject values for cap->args[0] that contain invalid bits. Finally, add documentation for the new capability and describe the existing quirks. Signed-off-by: NOliver Upton <oupton@google.com> Message-Id: <20220301060351.442881-5-oupton@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 2月, 2022 1 次提交
-
-
由 David Dunn 提交于
Add a new capability, KVM_CAP_PMU_CAPABILITY, that takes a bitmask of settings/features to allow userspace to configure PMU virtualization on a per-VM basis. For now, support a single flag, KVM_PMU_CAP_DISABLE, to allow disabling PMU virtualization for a VM even when KVM is configured with enable_pmu=true a module level. To keep KVM simple, disallow changing VM's PMU configuration after vCPUs have been created. Signed-off-by: NDavid Dunn <daviddunn@google.com> Message-Id: <20220223225743.2703915-2-daviddunn@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 22 2月, 2022 1 次提交
-
-
由 Nicholas Piggin 提交于
Add KVM_CAP_PPC_AIL_MODE_3 to advertise the capability to set the AIL resource mode to 3 with the H_SET_MODE hypercall. This capability differs between processor types and KVM types (PR, HV, Nested HV), and affects guest-visible behaviour. QEMU will implement a cap-ail-mode-3 to control this behaviour[1], and use the KVM CAP if available to determine KVM support[2]. Reviewed-by: NFabiano Rosas <farosas@linux.ibm.com> Signed-off-by: NNicholas Piggin <npiggin@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 2月, 2022 4 次提交
-
-
由 Janis Schoetterl-Glausch 提交于
Document all currently existing operations, flags and explain under which circumstances they are available. Document the recently introduced absolute operations and the storage key protection flag, as well as the existing SIDA operations. Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: NJanosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-10-scgl@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
由 Janis Schoetterl-Glausch 提交于
Availability of the KVM_CAP_S390_MEM_OP_EXTENSION capability signals that: * The vcpu MEM_OP IOCTL supports storage key checking. * The vm MEM_OP IOCTL exists. Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: NJanosch Frank <frankja@linux.ibm.com> Reviewed-by: NChristian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-9-scgl@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
由 Janis Schoetterl-Glausch 提交于
Channel I/O honors storage keys and is performed on absolute memory. For I/O emulation user space therefore needs to be able to do key checked accesses. The vm IOCTL supports read/write accesses, as well as checking if an access would succeed. Unlike relying on KVM_S390_GET_SKEYS for key checking would, the vm IOCTL performs the check in lockstep with the read or write, by, ultimately, mapping the access to move instructions that support key protection checking with a supplied key. Fetch and storage protection override are not applicable to absolute accesses and so are not applied as they are when using the vcpu memop. Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: NChristian Borntraeger <borntraeger@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-7-scgl@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
由 Janis Schoetterl-Glausch 提交于
User space needs a mechanism to perform key checked accesses when emulating instructions. The key can be passed as an additional argument. Having an additional argument is flexible, as user space can pass the guest PSW's key, in order to make an access the same way the CPU would, or pass another key if necessary. Signed-off-by: NJanis Schoetterl-Glausch <scgl@linux.ibm.com> Reviewed-by: NClaudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: NChristian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: NJanosch Frank <frankja@linux.ibm.com> Link: https://lore.kernel.org/r/20220211182215.2730017-6-scgl@linux.ibm.comSigned-off-by: NChristian Borntraeger <borntraeger@linux.ibm.com>
-
- 31 1月, 2022 1 次提交
-
-
由 Janosch Frank 提交于
This way we can more easily find the next free IOCTL number when adding new IOCTLs. Fixes: be50b206 ("kvm: x86: Add support for getting/setting expanded xstate buffer") Signed-off-by: NJanosch Frank <frankja@linux.ibm.com> Message-Id: <20220128154025.102666-1-frankja@linux.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 1月, 2022 1 次提交
-
-
由 Paolo Bonzini 提交于
Because KVM_GET_SUPPORTED_CPUID is meant to be passed (by simple-minded VMMs) to KVM_SET_CPUID2, it cannot include any dynamic xsave states that have not been enabled. Probing those, for example so that they can be passed to ARCH_REQ_XCOMP_GUEST_PERM, requires a new ioctl or arch_prctl. The latter is in fact worse, even though that is what the rest of the API uses, because it would require supported_xcr0 to be moved from the KVM module to the kernel just for this use. In addition, the value would be nonsensical (or an error would have to be returned) until the KVM module is loaded in. Therefore, to limit the growth of system ioctls, add a /dev/kvm variant of KVM_{GET,HAS}_DEVICE_ATTR, and implement it in x86 with just one group (0) and attribute (KVM_X86_XCOMP_GUEST_SUPP). Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-