- 24 1月, 2020 3 次提交
-
-
由 Sean Christopherson 提交于
Add kvm_vcpu_destroy() and wire up all architectures to call the common function instead of their arch specific implementation. The common destruction function will be used by future patches to move allocation and initialization of vCPUs to common KVM code, i.e. to free resources that are allocated by arch agnostic code. No functional change intended. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Add a pre-allocation arch hook to handle checks that are currently done by arch specific code prior to allocating the vCPU object. This paves the way for moving the allocation to common KVM code. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Sean Christopherson 提交于
Remove the superfluous kvm_arch_vcpu_free() as it is no longer called from commmon KVM code. Note, kvm_arch_vcpu_destroy() *is* called from common code, i.e. choosing which function to whack is not completely arbitrary. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 12月, 2019 1 次提交
-
-
由 Miaohe Lin 提交于
As arg dummy is not really needed, there's no need to pass NULL when calling cpu_init_hyp_mode(). So clean it up. Fixes: 67f69197 ("arm64: kvm: allows kvm cpu hotplug") Reviewed-by: NSteven Price <steven.price@arm.com> Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/1574320559-5662-1-git-send-email-linmiaohe@huawei.com
-
- 08 11月, 2019 1 次提交
-
-
由 Marc Zyngier 提交于
Just like we do for WFE trapping, it can be useful to turn off WFI trapping when the physical CPU is not oversubscribed (that is, the vcpu is the only runnable process on this CPU) *and* that we're using direct injection of interrupts. The conditions are reevaluated on each vcpu_load(), ensuring that we don't switch to this mode on a busy system. On a GICv4 system, this has the effect of reducing the generation of doorbell interrupts to zero when the right conditions are met, which is a huge improvement over the current situation (where the doorbells are screaming if the CPU ever hits a blocking WFI). Signed-off-by: NMarc Zyngier <maz@kernel.org> Reviewed-by: NZenghui Yu <yuzenghui@huawei.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Link: https://lore.kernel.org/r/20191107160412.30301-3-maz@kernel.org
-
- 29 10月, 2019 1 次提交
-
-
由 Marc Zyngier 提交于
When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal. To address this, let's move things around a bit: - Residency entry (flush) moves to vcpu_load - Residency exit (sync) moves to vcpu_put - On blocking (entry to WFI), we "put" - On unblocking (exit from WFI), we "load" Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state. Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled. Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.org
-
- 22 10月, 2019 3 次提交
-
-
由 Steven Price 提交于
Implement the service call for configuring a shared structure between a VCPU and the hypervisor in which the hypervisor can write the time stolen from the VCPU's execution time by other tasks on the host. User space allocates memory which is placed at an IPA also chosen by user space. The hypervisor then updates the shared structure using kvm_put_guest() to ensure single copy atomicity of the 64-bit value reporting the stolen time in nanoseconds. Whenever stolen time is enabled by the guest, the stolen time counter is reset. The stolen time itself is retrieved from the sched_info structure maintained by the Linux scheduler code. We enable SCHEDSTATS when selecting KVM Kconfig to ensure this value is meaningful. Signed-off-by: NSteven Price <steven.price@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Christoffer Dall 提交于
In some scenarios, such as buggy guest or incorrect configuration of the VMM and firmware description data, userspace will detect a memory access to a portion of the IPA, which is not mapped to any MMIO region. For this purpose, the appropriate action is to inject an external abort to the guest. The kernel already has functionality to inject an external abort, but we need to wire up a signal from user space that lets user space tell the kernel to do this. It turns out, we already have the set event functionality which we can perfectly reuse for this. Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Christoffer Dall 提交于
For a long time, if a guest accessed memory outside of a memslot using any of the load/store instructions in the architecture which doesn't supply decoding information in the ESR_EL2 (the ISV bit is not set), the kernel would print the following message and terminate the VM as a result of returning -ENOSYS to userspace: load/store instruction decoding not implemented The reason behind this message is that KVM assumes that all accesses outside a memslot is an MMIO access which should be handled by userspace, and we originally expected to eventually implement some sort of decoding of load/store instructions where the ISV bit was not set. However, it turns out that many of the instructions which don't provide decoding information on abort are not safe to use for MMIO accesses, and the remaining few that would potentially make sense to use on MMIO accesses, such as those with register writeback, are not used in practice. It also turns out that fetching an instruction from guest memory can be a pretty horrible affair, involving stopping all CPUs on SMP systems, handling multiple corner cases of address translation in software, and more. It doesn't appear likely that we'll ever implement this in the kernel. What is much more common is that a user has misconfigured his/her guest and is actually not accessing an MMIO region, but just hitting some random hole in the IPA space. In this scenario, the error message above is almost misleading and has led to a great deal of confusion over the years. It is, nevertheless, ABI to userspace, and we therefore need to introduce a new capability that userspace explicitly enables to change behavior. This patch introduces KVM_CAP_ARM_NISV_TO_USER (NISV meaning Non-ISV) which does exactly that, and introduces a new exit reason to report the event to userspace. User space can then emulate an exception to the guest, restart the guest, suspend the guest, or take any other appropriate action as per the policy of the running system. Reported-by: NHeinrich Schuchardt <xypron.glpk@gmx.de> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Reviewed-by: NAlexander Graf <graf@amazon.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
- 09 9月, 2019 1 次提交
-
-
由 Marc Zyngier 提交于
While parts of the VGIC support a large number of vcpus (we bravely allow up to 512), other parts are more limited. One of these limits is visible in the KVM_IRQ_LINE ioctl, which only allows 256 vcpus to be signalled when using the CPU or PPI types. Unfortunately, we've cornered ourselves badly by allocating all the bits in the irq field. Since the irq_type subfield (8 bit wide) is currently only taking the values 0, 1 and 2 (and we have been careful not to allow anything else), let's reduce this field to only 4 bits, and allocate the remaining 4 bits to a vcpu2_index, which acts as a multiplier: vcpu_id = 256 * vcpu2_index + vcpu_index With that, and a new capability (KVM_CAP_ARM_IRQ_LINE_LAYOUT_2) allowing this to be discovered, it becomes possible to inject PPIs to up to 4096 vcpus. But please just don't. Whilst we're there, add a clarification about the use of KVM_IRQ_LINE on arm, which is not completely conditionned by KVM_CAP_IRQCHIP. Reported-by: NZenghui Yu <yuzenghui@huawei.com> Reviewed-by: NEric Auger <eric.auger@redhat.com> Reviewed-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
- 05 8月, 2019 2 次提交
-
-
由 Marc Zyngier 提交于
Since commit commit 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2 equivalent) loaded as long as we can, only syncing it back when we're scheduled out. There is a small snag with that though: kvm_vgic_vcpu_pending_irq(), which is indirectly called from kvm_vcpu_check_block(), needs to evaluate the guest's view of ICC_PMR_EL1. At the point were we call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever changes to PMR is not visible in memory until we do a vcpu_put(). Things go really south if the guest does the following: mov x0, #0 // or any small value masking interrupts msr ICC_PMR_EL1, x0 [vcpu preempted, then rescheduled, VMCR sampled] mov x0, #ff // allow all interrupts msr ICC_PMR_EL1, x0 wfi // traps to EL2, so samping of VMCR [interrupt arrives just after WFI] Here, the hypervisor's view of PMR is zero, while the guest has enabled its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no interrupts are pending (despite an interrupt being received) and we'll block for no reason. If the guest doesn't have a periodic interrupt firing once it has blocked, it will stay there forever. To avoid this unfortuante situation, let's resync VMCR from kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block() will observe the latest value of PMR. This has been found by booting an arm64 Linux guest with the pseudo NMI feature, and thus using interrupt priorities to mask interrupts instead of the usual PSTATE masking. Cc: stable@vger.kernel.org # 4.12 Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put") Signed-off-by: NMarc Zyngier <maz@kernel.org>
-
由 Paolo Bonzini 提交于
There is no need for this function as all arches have to implement kvm_arch_create_vcpu_debugfs() no matter what. A #define symbol let us actually simplify the code. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 24 7月, 2019 1 次提交
-
-
由 Christoph Hellwig 提交于
Renaming docs seems to be en vogue at the moment, so fix on of the grossly misnamed directories. We usually never use "virtual" as a shortcut for virtualization in the kernel, but always virt, as seen in the virt/ top-level directory. Fix up the documentation to match that. Fixes: ed16648e ("Move kvm, uml, and lguest subdirectories under a common "virtual" directory, I.E:") Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 23 7月, 2019 1 次提交
-
-
由 Zenghui Yu 提交于
We use "pmc->idx" and the "chained" bitmap to determine if the pmc is chained, in kvm_pmu_pmc_is_chained(). But idx might be uninitialized (and random) when we doing this decision, through a KVM_ARM_VCPU_INIT ioctl -> kvm_pmu_vcpu_reset(). And the test_bit() against this random idx will potentially hit a KASAN BUG [1]. In general, idx is the static property of a PMU counter that is not expected to be modified across resets, as suggested by Julien. It looks more reasonable if we can setup the PMU counter idx for a vcpu in its creation time. Introduce a new function - kvm_pmu_vcpu_init() for this basic setup. Oh, and the KASAN BUG will get fixed this way. [1] https://www.spinics.net/lists/kvm-arm/msg36700.html Fixes: 80f393a2 ("KVM: arm/arm64: Support chained PMU counters") Suggested-by: NAndrew Murray <andrew.murray@arm.com> Suggested-by: NJulien Thierry <julien.thierry@arm.com> Acked-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 08 7月, 2019 1 次提交
-
-
由 Marc Zyngier 提交于
As part of setting up the host context, we populate its MPIDR by using cpu_logical_map(). It turns out that contrary to arm64, cpu_logical_map() on 32bit ARM doesn't return the *full* MPIDR, but a truncated version. This leaves the host MPIDR slightly corrupted after the first run of a VM, since we won't correctly restore the MPIDR on exit. Oops. Since we cannot trust cpu_logical_map(), let's adopt a different strategy. We move the initialization of the host CPU context as part of the per-CPU initialization (which, in retrospect, makes a lot of sense), and directly read the MPIDR from the HW. This is guaranteed to work on both arm and arm64. Reported-by: NAndre Przywara <Andre.Przywara@arm.com> Tested-by: NAndre Przywara <Andre.Przywara@arm.com> Fixes: 32f13955 ("arm/arm64: KVM: Statically configure the host's view of MPIDR") Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 05 6月, 2019 2 次提交
-
-
由 Thomas Gleixner 提交于
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is distributed in the hope that it will be useful but without any warranty without even the implied warranty of merchantability or fitness for a particular purpose see the gnu general public license for more details you should have received a copy of the gnu general public license along with this program if not write to the free software foundation 51 franklin street fifth floor boston ma 02110 1301 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 67 file(s). Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NAllison Randal <allison@lohutok.net> Reviewed-by: NRichard Fontana <rfontana@redhat.com> Reviewed-by: NAlexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190529141333.953658117@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Sean Christopherson 提交于
Add a wrapper to invoke kvm_arch_check_processor_compat() so that the boilerplate ugliness of checking virtualization support on all CPUs is hidden from the arch specific code. x86's implementation in particular is quite heinous, as it unnecessarily propagates the out-param pattern into kvm_x86_ops. While the x86 specific issue could be resolved solely by changing kvm_x86_ops, make the change for all architectures as returning a value directly is prettier and technically more robust, e.g. s390 doesn't set the out param, which could lead to subtle breakage in the (highly unlikely) scenario where the out-param was not pre-initialized by the caller. Opportunistically annotate svm_check_processor_compat() with __init. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 28 5月, 2019 1 次提交
-
-
由 Thomas Huth 提交于
KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all architectures. However, on s390x, the amount of usable CPUs is determined during runtime - it is depending on the features of the machine the code is running on. Since we are using the vcpu_id as an index into the SCA structures that are defined by the hardware (see e.g. the sca_add_vcpu() function), it is not only the amount of CPUs that is limited by the hard- ware, but also the range of IDs that we can use. Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too. So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common code into the architecture specific code, and on s390x we have to return the same value here as for KVM_CAP_MAX_VCPUS. This problem has been discovered with the kvm_create_max_vcpus selftest. With this change applied, the selftest now passes on s390x, too. Reviewed-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NThomas Huth <thuth@redhat.com> Message-Id: <20190523164309.13345-9-thuth@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
-
- 25 4月, 2019 1 次提交
-
-
由 Andrew Jones 提交于
A failed KVM_ARM_VCPU_INIT should not set the vcpu target, as the vcpu target is used by kvm_vcpu_initialized() to determine if other vcpu ioctls may proceed. We need to set the target before calling kvm_reset_vcpu(), but if that call fails, we should then unset it and clear the feature bitmap while we're at it. Signed-off-by: NAndrew Jones <drjones@redhat.com> [maz: Simplified patch, completed commit message] Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 24 4月, 2019 3 次提交
-
-
由 Andrew Murray 提交于
With VHE different exception levels are used between the host (EL2) and guest (EL1) with a shared exception level for userpace (EL0). We can take advantage of this and use the PMU's exception level filtering to avoid enabling/disabling counters in the world-switch code. Instead we just modify the counter type to include or exclude EL0 at vcpu_{load,put} time. We also ensure that trapped PMU system register writes do not re-enable EL0 when reconfiguring the backing perf events. This approach completely avoids blackout windows seen with !VHE. Suggested-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NAndrew Murray <andrew.murray@arm.com> Acked-by: NWill Deacon <will.deacon@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Andrew Murray 提交于
The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is a typedef to kvm_cpu_context and is used to store host cpu context. The kvm_cpu_context structure is also used elsewhere to hold vcpu context. In order to use the percpu to hold additional future host information we encapsulate kvm_cpu_context in a new structure and rename the typedef and percpu to match. Signed-off-by: NAndrew Murray <andrew.murray@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Mark Rutland 提交于
When pointer authentication is supported, a guest may wish to use it. This patch adds the necessary KVM infrastructure for this to work, with a semi-lazy context switch of the pointer auth state. Pointer authentication feature is only enabled when VHE is built in the kernel and present in the CPU implementation so only VHE code paths are modified. When we schedule a vcpu, we disable guest usage of pointer authentication instructions and accesses to the keys. While these are disabled, we avoid context-switching the keys. When we trap the guest trying to use pointer authentication functionality, we change to eagerly context-switching the keys, and enable the feature. The next time the vcpu is scheduled out/in, we start again. However the host key save is optimized and implemented inside ptrauth instruction/register access trap. Pointer authentication consists of address authentication and generic authentication, and CPUs in a system might have varied support for either. Where support for either feature is not uniform, it is hidden from guests via ID register emulation, as a result of the cpufeature framework in the host. Unfortunately, address authentication and generic authentication cannot be trapped separately, as the architecture provides a single EL2 trap covering both. If we wish to expose one without the other, we cannot prevent a (badly-written) guest from intermittently using a feature which is not uniformly supported (when scheduled on a physical CPU which supports the relevant feature). Hence, this patch expects both type of authentication to be present in a cpu. This switch of key is done from guest enter/exit assembly as preparation for the upcoming in-kernel pointer authentication support. Hence, these key switching routines are not implemented in C code as they may cause pointer authentication key signing error in some situations. Signed-off-by: NMark Rutland <mark.rutland@arm.com> [Only VHE, key switch in full assembly, vcpu_has_ptrauth checks , save host key in ptrauth exception trap] Signed-off-by: NAmit Daniel Kachhap <amit.kachhap@arm.com> Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: kvmarm@lists.cs.columbia.edu [maz: various fixups] Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 19 4月, 2019 1 次提交
-
-
由 Dave Martin 提交于
The introduction of kvm_arm_init_arch_resources() looks like premature factoring, since nothing else uses this hook yet and it is not clear what will use it in the future. For now, let's not pretend that this is a general thing: This patch simply renames the function to kvm_arm_init_sve(), retaining the arm stub version under the new name. Suggested-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 16 4月, 2019 1 次提交
-
-
由 Paolo Bonzini 提交于
All architectures except MIPS were defining it in the same way, and memory slots are handled entirely by common code so there is no point in keeping the definition per-architecture. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 3月, 2019 2 次提交
-
-
由 Dave Martin 提交于
Some aspects of vcpu configuration may be too complex to be completed inside KVM_ARM_VCPU_INIT. Thus, there may be a requirement for userspace to do some additional configuration before various other ioctls will work in a consistent way. In particular this will be the case for SVE, where userspace will need to negotiate the set of vector lengths to be made available to the guest before the vcpu becomes fully usable. In order to provide an explicit way for userspace to confirm that it has finished setting up a particular vcpu feature, this patch adds a new ioctl KVM_ARM_VCPU_FINALIZE. When userspace has opted into a feature that requires finalization, typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a matching call to KVM_ARM_VCPU_FINALIZE is now required before KVM_RUN or KVM_GET_REG_LIST is allowed. Individual features may impose additional restrictions where appropriate. No existing vcpu features are affected by this, so current userspace implementations will continue to work exactly as before, with no need to issue KVM_ARM_VCPU_FINALIZE. As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a placeholder: no finalizable features exist yet, so ioctl is not required and will always yield EINVAL. Subsequent patches will add the finalization logic to make use of this ioctl for SVE. No functional change for existing userspace. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Tested-by: Nzhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dave Martin 提交于
This patch adds a kvm_arm_init_arch_resources() hook to perform subarch-specific initialisation when starting up KVM. This will be used in a subsequent patch for global SVE-related setup on arm64. No functional change. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Tested-by: Nzhang.lei <zhang.lei@jp.fujitsu.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 20 2月, 2019 5 次提交
-
-
由 Colin Ian King 提交于
There is a spelling mistake in a kvm_err error message. Fix it. Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Instead of calling into kvm_timer_[un]schedule from the main kvm blocking path, test if the VCPU is on the wait queue from the load/put path and perform the background timer setup/cancel in this path. This has the distinct advantage that we no longer race between load/put and schedule/unschedule and programming and canceling of the bg_timer always happens when the timer state is not loaded. Note that we must now remove the checks in kvm_timer_blocking that do not schedule a background timer if one of the timers can fire, because we no longer have a guarantee that kvm_vcpu_check_block() will be called before kvm_timer_blocking. Reported-by: NAndre Przywara <andre.przywara@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
In preparation for nested virtualization where we are going to have more than a single VMID per VM, let's factor out the VMID data into a separate VMID data structure and change the VMID allocator to operate on this new structure instead of using a struct kvm. This also means that udate_vttbr now becomes update_vmid, and that the vttbr itself is generated on the fly based on the stage 2 page table base address and the vmid. We cache the physical address of the pgd when allocating the pgd to avoid doing the calculation on every entry to the guest and to avoid calling into potentially non-hyp-mapped code from hyp/EL2. If we wanted to merge the VMID allocator with the arm64 ASID allocator at some point in the future, it should actually become easier to do that after this patch. Note that to avoid mapping the kvm_vmid_bits variable into hyp, we simply forego the masking of the vmid value in kvm_get_vttbr and rely on update_vmid to always assign a valid vmid value (within the supported range). Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> [maz: minor cleanups] Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
We currently eagerly save/restore MPIDR. It turns out to be slightly pointless: - On the host, this value is known as soon as we're scheduled on a physical CPU - In the guest, this value cannot change, as it is set by KVM (and this is a read-only register) The result of the above is that we can perfectly avoid the eager saving of MPIDR_EL1, and only keep the restore. We just have to setup the host contexts appropriately at boot time. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
由 Marc Zyngier 提交于
Until now, we haven't differentiated between HYP calls that have a return value and those who don't. As we're about to change this, introduce kvm_call_hyp_ret(), and change all call sites that actually make use of a return value. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
- 07 2月, 2019 1 次提交
-
-
由 Marc Zyngier 提交于
The current kvm_psci_vcpu_on implementation will directly try to manipulate the state of the VCPU to reset it. However, since this is not done on the thread that runs the VCPU, we can end up in a strangely corrupted state when the source and target VCPUs are running at the same time. Fix this by factoring out all reset logic from the PSCI implementation and forwarding the required information along with a request to the target VCPU. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
- 18 12月, 2018 2 次提交
-
-
由 Christoffer Dall 提交于
We recently addressed a VMID generation race by introducing a read/write lock around accesses and updates to the vmid generation values. However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but does so without taking the read lock. As far as I can tell, this can lead to the same kind of race: VM 0, VCPU 0 VM 0, VCPU 1 ------------ ------------ update_vttbr (vmid 254) update_vttbr (vmid 1) // roll over read_lock(kvm_vmid_lock); force_vm_exit() local_irq_disable need_new_vmid_gen == false //because vmid gen matches enter_guest (vmid 254) kvm_arch.vttbr = <PGD>:<VMID 1> read_unlock(kvm_vmid_lock); enter_guest (vmid 1) Which results in running two VCPUs in the same VM with different VMIDs and (even worse) other VCPUs from other VMs could now allocate clashing VMID 254 from the new generation as long as VCPU 0 is not exiting. Attempt to solve this by making sure vttbr is updated before another CPU can observe the updated VMID generation. Cc: stable@vger.kernel.org Fixes: f0cf47d9 "KVM: arm/arm64: Close VMID generation race" Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Mark Rutland 提交于
When we emulate a guest instruction, we don't advance the hardware singlestep state machine, and thus the guest will receive a software step exception after a next instruction which is not emulated by the host. We bodge around this in an ad-hoc fashion. Sometimes we explicitly check whether userspace requested a single step, and fake a debug exception from within the kernel. Other times, we advance the HW singlestep state rely on the HW to generate the exception for us. Thus, the observed step behaviour differs for host and guest. Let's make this simpler and consistent by always advancing the HW singlestep state machine when we skip an instruction. Thus we can rely on the hardware to generate the singlestep exception for us, and never need to explicitly check for an active-pending step, nor do we need to fake a debug exception from the guest. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: NAlex Bennée <alex.bennee@linaro.org> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 14 12月, 2018 2 次提交
-
-
由 Paolo Bonzini 提交于
There are two problems with KVM_GET_DIRTY_LOG. First, and less important, it can take kvm->mmu_lock for an extended period of time. Second, its user can actually see many false positives in some cases. The latter is due to a benign race like this: 1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects them. 2. The guest modifies the pages, causing them to be marked ditry. 3. Userspace actually copies the pages. 4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though they were not written to since (3). This is especially a problem for large guests, where the time between (1) and (3) can be substantial. This patch introduces a new capability which, when enabled, makes KVM_GET_DIRTY_LOG not write-protect the pages it returns. Instead, userspace has to explicitly clear the dirty log bits just before using the content of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a 64-page granularity rather than requiring to sync a full memslot; this way, the mmu_lock is taken for small amounts of time, and only a small amount of time will pass between write protection of pages and the sending of their content. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Paolo Bonzini 提交于
When manual dirty log reprotect will be enabled, kvm_get_dirty_log_protect's pointer argument will always be false on exit, because no TLB flush is needed until the manual re-protection operation. Rename it from "is_dirty" to "flush", which more accurately tells the caller what they have to do with it. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 10 12月, 2018 1 次提交
-
-
由 Marc Zyngier 提交于
An SVE system is so far the only case where we mandate VHE. As we're starting to grow this requirements, let's slightly rework the way we deal with that situation, allowing for easy extension of this check. Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Reviewed-by: NJames Morse <james.morse@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
- 18 10月, 2018 3 次提交
-
-
由 Dongjiu Geng 提交于
The commit 539aee0e ("KVM: arm64: Share the parts of get/set events useful to 32bit") shares the get/set events helper for arm64 and arm32, but forgot to share the cap extension code. User space will check whether KVM supports vcpu events by checking the KVM_CAP_VCPU_EVENTS extension Acked-by: NJames Morse <james.morse@arm.com> Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dongjiu Geng 提交于
Rename kvm_arch_dev_ioctl_check_extension() to kvm_arch_vm_ioctl_check_extension(), because it does not have any relationship with device. Renaming this function can make code readable. Cc: James Morse <james.morse@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Mark Rutland 提交于
At boot time, KVM stashes the host MDCR_EL2 value, but only does this when the kernel is not running in hyp mode (i.e. is non-VHE). In these cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can lead to CONSTRAINED UNPREDICTABLE behaviour. Since we use this value to derive the MDCR_EL2 value when switching to/from a guest, after a guest have been run, the performance counters do not behave as expected. This has been observed to result in accesses via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant counters, resulting in events not being counted. In these cases, only the fixed-purpose cycle counter appears to work as expected. Fix this by always stashing the host MDCR_EL2 value, regardless of VHE. Cc: Christopher Dall <christoffer.dall@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP") Tested-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-