- 21 10月, 2019 1 次提交
-
-
由 Fabiano Rosas 提交于
When calling the KVM_SET_GUEST_DEBUG ioctl, userspace might request the next instruction to be single stepped via the KVM_GUESTDBG_SINGLESTEP control bit of the kvm_guest_debug structure. This patch adds the KVM_CAP_PPC_GUEST_DEBUG_SSTEP capability in order to inform userspace about the state of single stepping support. We currently don't have support for guest single stepping implemented in Book3S HV so the capability is only present for Book3S PR and BookE. Signed-off-by: NFabiano Rosas <farosas@linux.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 24 9月, 2019 1 次提交
-
-
由 Tianyu Lan 提交于
Hyper-V direct tlb flush function should be enabled for guest that only uses Hyper-V hypercall. User space hypervisor(e.g, Qemu) can disable KVM identification in CPUID and just exposes Hyper-V identification to make sure the precondition. Add new KVM capability KVM_CAP_ HYPERV_DIRECT_TLBFLUSH for user space to enable Hyper-V direct tlb function and this function is default to be disabled in KVM. Signed-off-by: NTianyu Lan <Tianyu.Lan@microsoft.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 9月, 2019 1 次提交
-
-
由 Xiaoyao Li 提交于
Userspace can use ioctl KVM_SET_MSRS to update a set of MSRs of guest. This ioctl set specified MSRs one by one. If it fails to set an MSR, e.g., due to setting reserved bits, the MSR is not supported/emulated by KVM, etc..., it stops processing the MSR list and returns the number of MSRs have been set successfully. Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 9月, 2019 1 次提交
-
-
由 Marc Zyngier 提交于
While parts of the VGIC support a large number of vcpus (we bravely allow up to 512), other parts are more limited. One of these limits is visible in the KVM_IRQ_LINE ioctl, which only allows 256 vcpus to be signalled when using the CPU or PPI types. Unfortunately, we've cornered ourselves badly by allocating all the bits in the irq field. Since the irq_type subfield (8 bit wide) is currently only taking the values 0, 1 and 2 (and we have been careful not to allow anything else), let's reduce this field to only 4 bits, and allocate the remaining 4 bits to a vcpu2_index, which acts as a multiplier: vcpu_id = 256 * vcpu2_index + vcpu_index With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2) allowing this to be discovered, it becomes possible to inject PPIs to up to 4096 vcpus. But please just don't. Whilst we're there, add a clarification about the use of KVM_IRQ_LINE on arm, which is not completely conditionned by KVM_CAP_IRQCHIP. Reported-by: NZenghui Yu <yuzenghui@huawei.com> Reviewed-by: NEric Auger <eric.auger@redhat.com> Reviewed-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
- 29 8月, 2019 1 次提交
-
-
由 Cornelia Huck 提交于
Explicitly specify the valid ranges for size and ar, and reword buf requirements a bit. Signed-off-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NThomas Huth <thuth@redhat.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Link: https://lkml.kernel.org/r/20190829124746.28665-1-cohuck@redhat.comSigned-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
- 24 7月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Renaming docs seems to be en vogue at the moment, so fix on of the grossly misnamed directories. We usually never use "virtual" as a shortcut for virtualization in the kernel, but always virt, as seen in the virt/ top-level directory. Fix up the documentation to match that. Fixes: ed16648e ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:") Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 7月, 2019 1 次提交
-
-
由 Eric Hankland 提交于
Updates KVM_CAP_PMU_EVENT_FILTER so it can also whitelist or blacklist fixed counters. Signed-off-by: NEric Hankland <ehankland@google.com> [No need to check padding fields for zero. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 11 7月, 2019 1 次提交
-
-
由 Eric Hankland 提交于
Some events can provide a guest with information about other guests or the host (e.g. L3 cache stats); providing the capability to restrict access to a "safe" set of events would limit the potential for the PMU to be used in any side channel attacks. This change introduces a new VM ioctl that sets an event filter. If the guest attempts to program a counter for any blacklisted or non-whitelisted event, the kernel counter won't be created, so any RDPMC/RDMSR will show 0 instances of that event. Signed-off-by: NEric Hankland <ehankland@google.com> [Lots of changes. All remaining bugs are probably mine. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 19 6月, 2019 1 次提交
-
-
由 Liran Alon 提交于
Improve the KVM_{GET,SET}_NESTED_STATE structs by detailing the format of VMX nested state data in a struct. In order to avoid changing the ioctl values of KVM_{GET,SET}_NESTED_STATE, there is a need to preserve sizeof(struct kvm_nested_state). This is done by defining the data struct as "data.vmx[0]". It was the most elegant way I found to preserve struct size while still keeping struct readable and easy to maintain. It does have a misfortunate side-effect that now it has to be accessed as "data.vmx[0]" rather than just "data.vmx". Because we are already modifying these structs, I also modified the following: * Define the "format" field values as macros. * Rename vmcs_pa to vmcs12_pa for better readability. Signed-off-by: NLiran Alon <liran.alon@oracle.com> [Remove SVM stubs, add KVM_STATE_NESTED_VMX_VMCS12_SIZE. - Paolo] Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 6月, 2019 1 次提交
-
-
由 Dennis Restle 提交于
The documentation mentions a non-existing capability KVM_CAP_USER_MEM.s The right name is KVM_CAP_USER_MEMORY. Signed-off-by: NDennis Restle <derestle@htwg-konstanz.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 6月, 2019 1 次提交
-
-
由 Mauro Carvalho Chehab 提交于
The documentation is in a format that is very close to ReST format. The conversion is actually: - add blank lines in order to identify paragraphs; - fixing tables markups; - adding some lists markups; - marking literal blocks; - adjust some title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: NJonathan Corbet <corbet@lwn.net>
-
- 05 6月, 2019 2 次提交
-
-
由 Wanpeng Li 提交于
Allow guest reads CORE cstate when exposing host CPU power management capabilities to the guest. PKG cstate is restricted to avoid a guest to get the whole package information in multi-tenant scenario. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Commit b31c114b (KVM: X86: Provide a capability to disable PAUSE intercepts) forgot to add the KVM_X86_DISABLE_EXITS_PAUSE into api doc. This patch adds it. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Liran Alon <liran.alon@oracle.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 08 5月, 2019 1 次提交
-
-
由 Peter Xu 提交于
The previous KVM_CAP_MANUAL_DIRTY_LOG_PROTECT has some problem which blocks the correct usage from userspace. Obsolete the old one and introduce a new capability bit for it. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 01 5月, 2019 3 次提交
-
-
由 Paolo Bonzini 提交于
If a memory slot's size is not a multiple of 64 pages (256K), then the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages either requires the requested page range to go beyond memslot->npages, or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect requires log->num_pages to be both in range and aligned. To allow this case, allow log->num_pages not to be a multiple of 64 if it ends exactly on the last page of the slot. Reported-by: NPeter Xu <peterx@redhat.com> Fixes: 98938aa8 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02) Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
This reverts commit 919f6cd8. The patch was applied twice. The first commit is eca6be56. Reported-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
If a memory slot's size is not a multiple of 64 pages (256K), then the KVM_CLEAR_DIRTY_LOG API is unusable: clearing the final 64 pages either requires the requested page range to go beyond memslot->npages, or requires log->num_pages to be unaligned, and kvm_clear_dirty_log_protect requires log->num_pages to be both in range and aligned. To allow this case, allow log->num_pages not to be a multiple of 64 if it ends exactly on the last page of the slot. Reported-by: NPeter Xu <peterx@redhat.com> Fixes: 98938aa8 ("KVM: validate userspace input in kvm_clear_dirty_log_protect()", 2019-01-02) Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 30 4月, 2019 2 次提交
-
-
由 Cédric Le Goater 提交于
The state of the thread interrupt management registers needs to be collected for migration. These registers are cached under the 'xive_saved_state.w01' field of the VCPU when the VPCU context is pulled from the HW thread. An OPAL call retrieves the backup of the IPB register in the underlying XIVE NVT structure and merges it in the KVM state. Signed-off-by: NCédric Le Goater <clg@kaod.org> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
由 Cédric Le Goater 提交于
The user interface exposes a new capability KVM_CAP_PPC_IRQ_XIVE to let QEMU connect the vCPU presenters to the XIVE KVM device if required. The capability is not advertised for now as the full support for the XIVE native exploitation mode is not yet available. When this is case, the capability will be advertised on PowerNV Hypervisors only. Nested guests (pseries KVM Hypervisor) are not supported. Internally, the interface to the new KVM device is protected with a new interrupt mode: KVMPPC_IRQ_XIVE. Signed-off-by: NCédric Le Goater <clg@kaod.org> Reviewed-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 29 4月, 2019 1 次提交
-
-
由 Andrew Jones 提交于
KVM_GET_DIRTY_LOG is implemented by all architectures, not just x86, and KVM_CAP_MANUAL_DIRTY_LOG_PROTECT is additionally implemented by arm, arm64, and mips. Signed-off-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 4月, 2019 2 次提交
-
-
由 Amit Daniel Kachhap 提交于
This patch advertises the capability of two cpu feature called address pointer authentication and generic pointer authentication. These capabilities depend upon system support for pointer authentication and VHE mode. The current arm64 KVM partially implements pointer authentication and support of address/generic authentication are tied together. However, separate ABI requirements for both of them is added so that any future isolated implementation will not require any ABI changes. Signed-off-by: NAmit Daniel Kachhap <amit.kachhap@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Amit Daniel Kachhap 提交于
Now that the building blocks of pointer authentication are present, lets add userspace flags KVM_ARM_VCPU_PTRAUTH_ADDRESS and KVM_ARM_VCPU_PTRAUTH_GENERIC. These flags will enable pointer authentication for the KVM guest on a per-vcpu basis through the ioctl KVM_ARM_VCPU_INIT. This features will allow the KVM guest to allow the handling of pointer authentication instructions or to treat them as undefined if not set. Necessary documentations are added to reflect the changes done. Reviewed-by: NDave Martin <Dave.Martin@arm.com> Signed-off-by: NAmit Daniel Kachhap <amit.kachhap@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 19 4月, 2019 4 次提交
-
-
由 Dave Martin 提交于
The existing documentation for which SVE register slice IDs are considered out-of-range, and what happens when userspace tries to access them, is cryptic. This patch rewords the text with the aim of making it a bit easier to understand. No functional change. Suggested-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dave Martin 提交于
The current error code documentation for KVM_GET_ONE_REG and KVM_SET_ONE_REG could be read as implying that all architectures implement these error codes, or that KVM guarantees which error code is returned in a particular situation. Because this is not really the case, this patch waters down the documentation explicitly to remove such guarantees. EPERM is marked as arm64-specific, since for now arm64 really is the only architecture that yields this error code for the finalization-required case. Keeping this as a distinct error code is useful however for debugging due to the statefulness of the API in this instance. No functional change. Suggested-by: NAndrew Jones <drjones@redhat.com> Fixes: 395f562f ("KVM: Document errors for KVM_GET_ONE_REG and KVM_SET_ONE_REG") Fixes: 50036ad0 ("KVM: arm64/sve: Document KVM API extensions for SVE") Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dave Martin 提交于
Userspace is only supposed to use KVM_ARM_VCPU_FINALIZE when there is some vcpu feature that can actually be finalized. This means that documenting KVM_ARM_VCPU_FINALIZE as available or not depending on the capabilities present is not helpful. This patch amends the documentation to describe availability in terms of which capability is required for each finalizable feature instead. In any case, userspace sees the same error (EINVAL) regardless of whether the given feature is not present or KVM_ARM_VCPU_FINALIZE is not implemented at all. No functional change. Suggested-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dave Martin 提交于
A complicated DIV_ROUND_UP() expression is currently written out explicitly in multiple places in order to specify the size of the bitmap exchanged with userspace to represent the value of the KVM_REG_ARM64_SVE_VLS pseudo-register. Userspace currently has no direct way to work this out either: for documentation purposes, the size is just quoted as 8 u64s. To make this more intuitive, this patch replaces these with a single define, which is also exported to userspace as KVM_ARM64_SVE_VLS_WORDS. Since the number of words in a bitmap is just the index of the last word used + 1, this patch expresses the bound that way instead. This should make it clearer what is being expressed. For userspace convenience, the minimum and maximum possible vector lengths relevant to the KVM ABI are exposed to UAPI as KVM_ARM64_SVE_VQ_MIN, KVM_ARM64_SVE_VQ_MAX. Since the only direct use for these at present is manipulation of KVM_REG_ARM64_SVE_VLS, no corresponding _VL_ macros are defined. They could be added later if a need arises. Since use of DIV_ROUND_UP() was the only reason for including <linux/kernel.h> in guest.c, this patch also removes that #include. Suggested-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 16 4月, 2019 1 次提交
-
-
由 Paolo Bonzini 提交于
All architectures except MIPS were defining it in the same way, and memory slots are handled entirely by common code so there is no point in keeping the definition per-architecture. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 3月, 2019 7 次提交
-
-
由 Dave Martin 提交于
This patch adds sections to the KVM API documentation describing the extensions for supporting the Scalable Vector Extension (SVE) in guests. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dave Martin 提交于
KVM_GET_ONE_REG and KVM_SET_ONE_REG return some error codes that are not documented (but hopefully not surprising either). To give an indication of what these may mean, this patch adds brief documentation. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dave Martin 提交于
Since the the sizes of individual members of the core arm64 registers vary, the list of register encodings that make sense is not a simple linear sequence. To clarify which encodings to use, this patch adds a brief list to the documentation. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NJulien Grall <julien.grall@arm.com> Reviewed-by: NPeter Maydell <peter.maydell@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Paolo Bonzini 提交于
The documentation does not mention how to delete a slot, add the information. Reported-by: NNathaniel McCallum <npmccallum@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
The series to add memcg accounting to KVM allocations[1] states: There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. While it is correct to account KVM kernel allocations to the cgroup of the process that created the VM, it's technically incorrect to state that the KVM kernel memory allocations are tied to the life of the VM process. This is because the VM itself, i.e. struct kvm, is not tied to the life of the process which created it, rather it is tied to the life of its associated file descriptor. In other words, kvm_destroy_vm() is not invoked until fput() decrements its associated file's refcount to zero. A simple example is to fork() in Qemu and have the child sleep indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file descriptor *and* the rogue child is killed. The allocations are guaranteed to be *accounted* to the process which created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the ioctl() with -EIO if kvm->mm != current->mm. I.e. the child can keep the VM "alive" but can't do anything useful with its reference. Note that because 'struct kvm' also holds a reference to the mm_struct of its owner, the above behavior also applies to userspace allocations. Given that mucking with a VM's file descriptor can lead to subtle and undesirable behavior, e.g. memcg charges persisting after a VM is shut down, explicitly document a VM's lifecycle and its impact on the VM's resources. Alternatively, KVM could aggressively free resources when the creating process exits, e.g. via mmu_notifier->release(). However, mmu_notifier isn't guaranteed to be available, and freeing resources when the creator exits is likely to be error prone and fragile as KVM would need to ensure that it only freed resources that are truly out of reach. In practice, the existing behavior shouldn't be problematic as a properly configured system will prevent a child process from being moved out of the appropriate cgroup hierarchy, i.e. prevent hiding the process from the OOM killer, and will prevent an unprivileged user from being able to to hold a reference to struct kvm via another method, e.g. debugfs. [1]https://patchwork.kernel.org/patch/10806707/Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
KVM's API requires thats ioctls must be issued from the same process that created the VM. In other words, userspace can play games with a VM's file descriptors, e.g. fork(), SCM_RIGHTS, etc..., but only the creator can do anything useful. Explicitly reject device ioctls that are issued by a process other than the VM's creator, and update KVM's API documentation to extend its requirements to device ioctls. Fixes: 852b6d57 ("kvm: add device control API") Cc: <stable@vger.kernel.org> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Per Paolo[1], instantiating multiple VMs in a single process is legal; but this conflicts with KVM's API documentation, which states: The only supported use is one virtual machine per process, and one vcpu per thread. However, an earlier section in the documentation states: Only run VM ioctls from the same process (address space) that was used to create the VM. and: Only run vcpu ioctls from the same thread that was used to create the vcpu. This suggests that the conflicting documentation is simply an incorrect ordering of of words, i.e. what's really meant is that a virtual machine can't be shared across multiple processes and a vCPU can't be shared across multiple threads. Tweak the blurb on issuing ioctls to use a more assertive tone, and rewrite the "supported use" sentence to reference said blurb instead of poorly restating it in different terms. Opportunistically add missing punctuation. [1] https://lkml.kernel.org/r/f23265d4-528e-3bd4-011f-4d7b8f3281db@redhat.com Fixes: 9c1b96e3 ("KVM: Document basic API") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> [Improve notes on asynchronous ioctl] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 16 3月, 2019 1 次提交
-
-
由 Sean Christopherson 提交于
The series to add memcg accounting to KVM allocations[1] states: There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. While it is correct to account KVM kernel allocations to the cgroup of the process that created the VM, it's technically incorrect to state that the KVM kernel memory allocations are tied to the life of the VM process. This is because the VM itself, i.e. struct kvm, is not tied to the life of the process which created it, rather it is tied to the life of its associated file descriptor. In other words, kvm_destroy_vm() is not invoked until fput() decrements its associated file's refcount to zero. A simple example is to fork() in Qemu and have the child sleep indefinitely; kvm_destroy_vm() isn't called until Qemu closes its file descriptor *and* the rogue child is killed. The allocations are guaranteed to be *accounted* to the process which created the VM, but only because KVM's per-{VM,vCPU} ioctls reject the ioctl() with -EIO if kvm->mm != current->mm. I.e. the child can keep the VM "alive" but can't do anything useful with its reference. Note that because 'struct kvm' also holds a reference to the mm_struct of its owner, the above behavior also applies to userspace allocations. Given that mucking with a VM's file descriptor can lead to subtle and undesirable behavior, e.g. memcg charges persisting after a VM is shut down, explicitly document a VM's lifecycle and its impact on the VM's resources. Alternatively, KVM could aggressively free resources when the creating process exits, e.g. via mmu_notifier->release(). However, mmu_notifier isn't guaranteed to be available, and freeing resources when the creator exits is likely to be error prone and fragile as KVM would need to ensure that it only freed resources that are truly out of reach. In practice, the existing behavior shouldn't be problematic as a properly configured system will prevent a child process from being moved out of the appropriate cgroup hierarchy, i.e. prevent hiding the process from the OOM killer, and will prevent an unprivileged user from being able to to hold a reference to struct kvm via another method, e.g. debugfs. [1]https://patchwork.kernel.org/patch/10806707/Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 12月, 2018 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
With every new Hyper-V Enlightenment we implement we're forced to add a KVM_CAP_HYPERV_* capability. While this approach works it is fairly inconvenient: the majority of the enlightenments we do have corresponding CPUID feature bit(s) and userspace has to know this anyways to be able to expose the feature to the guest. Add KVM_GET_SUPPORTED_HV_CPUID ioctl (backed by KVM_CAP_HYPERV_CPUID, "one cap to rule them all!") returning all Hyper-V CPUID feature leaves. Using the existing KVM_GET_SUPPORTED_CPUID doesn't seem to be possible: Hyper-V CPUID feature leaves intersect with KVM's (e.g. 0x40000000, 0x40000001) and we would probably confuse userspace in case we decide to return these twice. KVM_CAP_HYPERV_CPUID's number is interim: we're intended to drop KVM_CAP_HYPERV_STIMER_DIRECT and use its number instead. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 14 12月, 2018 2 次提交
-
-
由 Paolo Bonzini 提交于
There are two problems with KVM_GET_DIRTY_LOG. First, and less important, it can take kvm->mmu_lock for an extended period of time. Second, its user can actually see many false positives in some cases. The latter is due to a benign race like this: 1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects them. 2. The guest modifies the pages, causing them to be marked ditry. 3. Userspace actually copies the pages. 4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though they were not written to since (3). This is especially a problem for large guests, where the time between (1) and (3) can be substantial. This patch introduces a new capability which, when enabled, makes KVM_GET_DIRTY_LOG not write-protect the pages it returns. Instead, userspace has to explicitly clear the dirty log bits just before using the content of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a 64-page granularity rather than requiring to sync a full memslot; this way, the mmu_lock is taken for small amounts of time, and only a small amount of time will pass between write protection of pages and the sending of their content. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
The first such capability to be handled in virt/kvm/ will be manual dirty page reprotection. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 10月, 2018 2 次提交
-
-
由 Jim Mattson 提交于
This is a per-VM capability which can be enabled by userspace so that the faulting linear address will be included with the information about a pending #PF in L2, and the "new DR6 bits" will be included with the information about a pending #DB in L2. With this capability enabled, the L1 hypervisor can now intercept #PF before CR2 is modified. Under VMX, the L1 hypervisor can now intercept #DB before DR6 and DR7 are modified. When userspace has enabled KVM_CAP_EXCEPTION_PAYLOAD, it should generally provide an appropriate payload when injecting a #PF or #DB exception via KVM_SET_VCPU_EVENTS. However, to support restoring old checkpoints, this payload is not required. Note that bit 16 of the "new DR6 bits" is set to indicate that a debug exception (#DB) or a breakpoint exception (#BP) occurred inside an RTM region while advanced debugging of RTM transactional regions was enabled. This is the reverse of DR6.RTM, which is cleared in this scenario. This capability also enables exception.pending in struct kvm_vcpu_events, which allows userspace to distinguish between pending and injected exceptions. Reported-by: NJim Mattson <jmattson@google.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
The per-VM capability KVM_CAP_EXCEPTION_PAYLOAD (to be introduced in a later commit) adds the following fields to struct kvm_vcpu_events: exception_has_payload, exception_payload, and exception.pending. With this capability set, all of the details of vcpu->arch.exception, including the payload for a pending exception, are reported to userspace in response to KVM_GET_VCPU_EVENTS. With this capability clear, the original ABI is preserved, and the exception.injected field is set for either pending or injected exceptions. When userspace calls KVM_SET_VCPU_EVENTS with KVM_CAP_EXCEPTION_PAYLOAD clear, exception.injected is no longer translated to exception.pending. KVM_SET_VCPU_EVENTS can now only establish a pending exception when KVM_CAP_EXCEPTION_PAYLOAD is set. Reported-by: NJim Mattson <jmattson@google.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-