- 19 April 2019, 1 commit
Submitted by Dave Martin

The introduction of kvm_arm_init_arch_resources() looks like premature factoring, since nothing else uses this hook yet and it is not clear what will use it in the future. For now, let's not pretend that this is a general thing: this patch simply renames the function to kvm_arm_init_sve(), retaining the arm stub version under the new name.

Suggested-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- 29 March 2019, 2 commits
Submitted by Dave Martin

Some aspects of vcpu configuration may be too complex to be completed inside KVM_ARM_VCPU_INIT. Thus, there may be a requirement for userspace to do some additional configuration before various other ioctls will work in a consistent way. In particular this will be the case for SVE, where userspace will need to negotiate the set of vector lengths to be made available to the guest before the vcpu becomes fully usable.

In order to provide an explicit way for userspace to confirm that it has finished setting up a particular vcpu feature, this patch adds a new ioctl KVM_ARM_VCPU_FINALIZE.

When userspace has opted into a feature that requires finalization, typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a matching call to KVM_ARM_VCPU_FINALIZE is now required before KVM_RUN or KVM_GET_REG_LIST is allowed. Individual features may impose additional restrictions where appropriate.

No existing vcpu features are affected by this, so current userspace implementations will continue to work exactly as before, with no need to issue KVM_ARM_VCPU_FINALIZE.

As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a placeholder: no finalizable features exist yet, so the ioctl is not required and will always yield EINVAL. Subsequent patches will add the finalization logic to make use of this ioctl for SVE.

No functional change for existing userspace.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
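For illustration, the intended userspace ordering looks roughly like the sketch below. The feature constant is an assumption: no finalizable feature exists yet at this point in the series (SVE adds one later), so KVM_ARM_VCPU_SVE is shown purely as an example.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch of the expected call ordering; error handling is minimal.
 * KVM_ARM_VCPU_SVE is an example feature that only exists once the
 * later SVE patches land. */
static int setup_vcpu(int vcpu_fd, struct kvm_vcpu_init *init)
{
        int feature = KVM_ARM_VCPU_SVE;   /* hypothetical at this point */

        /* 1. Opt into the feature via the usual init feature flags. */
        if (ioctl(vcpu_fd, KVM_ARM_VCPU_INIT, init) < 0)
                return -1;

        /* 2. Perform any feature-specific configuration here
         *    (for SVE: negotiate the vector-length set). */

        /* 3. Tell KVM the feature is fully configured. */
        if (ioctl(vcpu_fd, KVM_ARM_VCPU_FINALIZE, &feature) < 0)
                return -1;

        /* 4. Only now are KVM_RUN / KVM_GET_REG_LIST permitted. */
        return 0;
}
```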
Submitted by Dave Martin

This patch adds a kvm_arm_init_arch_resources() hook to perform subarch-specific initialisation when starting up KVM. This will be used in a subsequent patch for global SVE-related setup on arm64.

No functional change.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Tested-by: zhang.lei <zhang.lei@jp.fujitsu.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- 20 February 2019, 5 commits
Submitted by Colin Ian King

There is a spelling mistake in a kvm_err error message. Fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Christoffer Dall

Instead of calling into kvm_timer_[un]schedule from the main kvm blocking path, test if the VCPU is on the wait queue from the load/put path and perform the background timer setup/cancel in this path. This has the distinct advantage that we no longer race between load/put and schedule/unschedule, and programming and canceling of the bg_timer always happens when the timer state is not loaded.

Note that we must now remove the checks in kvm_timer_blocking that do not schedule a background timer if one of the timers can fire, because we no longer have a guarantee that kvm_vcpu_check_block() will be called before kvm_timer_blocking.

Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Christoffer Dall

In preparation for nested virtualization, where we are going to have more than a single VMID per VM, let's factor out the VMID data into a separate VMID data structure and change the VMID allocator to operate on this new structure instead of using a struct kvm.

This also means that update_vttbr now becomes update_vmid, and that the vttbr itself is generated on the fly based on the stage 2 page table base address and the vmid.

We cache the physical address of the pgd when allocating the pgd to avoid doing the calculation on every entry to the guest and to avoid calling into potentially non-hyp-mapped code from hyp/EL2.

If we wanted to merge the VMID allocator with the arm64 ASID allocator at some point in the future, it should actually become easier to do that after this patch.

Note that to avoid mapping the kvm_vmid_bits variable into hyp, we simply forego the masking of the vmid value in kvm_get_vttbr and rely on update_vmid to always assign a valid vmid value (within the supported range).

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
[maz: minor cleanups]
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
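The factored-out structure is small; roughly as below. This is a sketch of the shape described above, not necessarily the exact upstream definition.

```c
#include <linux/types.h>

/* Per-VM (and, with nesting, potentially per-nesting-level) VMID state,
 * roughly as factored out of struct kvm_arch by this patch. */
struct kvm_vmid {
        u64 vmid_gen;   /* generation in which this VMID was allocated */
        u32 vmid;       /* the hardware VMID value itself */
};
```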
Submitted by Marc Zyngier

We currently eagerly save/restore MPIDR. It turns out to be slightly pointless:
- On the host, this value is known as soon as we're scheduled on a physical CPU.
- In the guest, this value cannot change, as it is set by KVM (and this is a read-only register).

The result of the above is that we can perfectly avoid the eager saving of MPIDR_EL1 and only keep the restore. We just have to set up the host contexts appropriately at boot time.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Submitted by Marc Zyngier

Until now, we haven't differentiated between HYP calls that have a return value and those that don't. As we're about to change this, introduce kvm_call_hyp_ret(), and change all call sites that actually make use of a return value.

Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
- 07 February 2019, 1 commit
Submitted by Marc Zyngier

The current kvm_psci_vcpu_on implementation will directly try to manipulate the state of the VCPU to reset it. However, since this is not done on the thread that runs the VCPU, we can end up in a strangely corrupted state when the source and target VCPUs are running at the same time.

Fix this by factoring out all reset logic from the PSCI implementation and forwarding the required information along with a request to the target VCPU.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
- 18 December 2018, 2 commits
Submitted by Christoffer Dall

We recently addressed a VMID generation race by introducing a read/write lock around accesses and updates to the vmid generation values.

However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but does so without taking the read lock. As far as I can tell, this can lead to the same kind of race:

  VM 0, VCPU 0                                VM 0, VCPU 1
  ------------                                ------------
  update_vttbr (vmid 254)
                                              update_vttbr (vmid 1) // roll over
                                              read_lock(kvm_vmid_lock);
                                              force_vm_exit()
  local_irq_disable
  need_new_vmid_gen == false // because vmid gen matches
  enter_guest (vmid 254)
                                              kvm_arch.vttbr = <PGD>:<VMID 1>
                                              read_unlock(kvm_vmid_lock);
                                              enter_guest (vmid 1)

Which results in running two VCPUs in the same VM with different VMIDs and (even worse) other VCPUs from other VMs could now allocate clashing VMID 254 from the new generation as long as VCPU 0 is not exiting.

Attempt to solve this by making sure vttbr is updated before another CPU can observe the updated VMID generation.

Cc: stable@vger.kernel.org
Fixes: f0cf47d9 "KVM: arm/arm64: Close VMID generation race"
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Mark Rutland

When we emulate a guest instruction, we don't advance the hardware singlestep state machine, and thus the guest will receive a software step exception after the next instruction which is not emulated by the host.

We bodge around this in an ad-hoc fashion. Sometimes we explicitly check whether userspace requested a single step, and fake a debug exception from within the kernel. Other times, we advance the HW singlestep state and rely on the HW to generate the exception for us. Thus, the observed step behaviour differs for host and guest.

Let's make this simpler and consistent by always advancing the HW singlestep state machine when we skip an instruction. Thus we can rely on the hardware to generate the singlestep exception for us, and never need to explicitly check for an active-pending step, nor do we need to fake a debug exception from the guest.

Cc: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- 14 December 2018, 2 commits
Submitted by Paolo Bonzini

There are two problems with KVM_GET_DIRTY_LOG. First, and less important, it can take kvm->mmu_lock for an extended period of time. Second, its user can actually see many false positives in some cases. The latter is due to a benign race like this:

  1. KVM_GET_DIRTY_LOG returns a set of dirty pages and write protects them.
  2. The guest modifies the pages, causing them to be marked dirty.
  3. Userspace actually copies the pages.
  4. KVM_GET_DIRTY_LOG returns those pages as dirty again, even though they were not written to since (3).

This is especially a problem for large guests, where the time between (1) and (3) can be substantial. This patch introduces a new capability which, when enabled, makes KVM_GET_DIRTY_LOG not write-protect the pages it returns. Instead, userspace has to explicitly clear the dirty log bits just before using the content of the page. The new KVM_CLEAR_DIRTY_LOG ioctl can also operate on a 64-page granularity rather than requiring to sync a full memslot; this way, the mmu_lock is taken for small amounts of time, and only a small amount of time will pass between write protection of pages and the sending of their content.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
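A rough userspace sketch of the new flow follows. The capability name, the enable-argument usage, and the struct field names reflect this editor's understanding of the series and should be checked against the final uapi headers before being relied on.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch: enable manual dirty-log reprotection, fetch the log (which no
 * longer write-protects), copy the pages, then clear only what was used. */
static int sync_dirty_pages(int vm_fd, __u32 slot, void *bitmap, __u64 npages)
{
        struct kvm_enable_cap cap = {
                .cap  = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT, /* name per this series */
                .args = { 1 },  /* enable; exact args semantics are an assumption */
        };
        struct kvm_dirty_log log = { .slot = slot, .dirty_bitmap = bitmap };
        struct kvm_clear_dirty_log clear = {
                .slot = slot,
                .first_page = 0,
                .num_pages = npages,   /* may also be issued in 64-page chunks */
                .dirty_bitmap = bitmap,
        };

        if (ioctl(vm_fd, KVM_ENABLE_CAP, &cap) < 0)
                return -1;
        if (ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) < 0)
                return -1;

        /* ... copy out the dirty pages indicated in 'bitmap' here ... */

        return ioctl(vm_fd, KVM_CLEAR_DIRTY_LOG, &clear);
}
```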
Submitted by Paolo Bonzini

When manual dirty log reprotection is enabled, kvm_get_dirty_log_protect's pointer argument will always be false on exit, because no TLB flush is needed until the manual re-protection operation. Rename it from "is_dirty" to "flush", which more accurately tells the caller what they have to do with it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
- 10 December 2018, 1 commit
Submitted by Marc Zyngier

An SVE system is so far the only case where we mandate VHE. As we're starting to grow this set of requirements, let's slightly rework the way we deal with that situation, allowing for easy extension of this check.

Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
- 18 October 2018, 3 commits
Submitted by Dongjiu Geng

Commit 539aee0e ("KVM: arm64: Share the parts of get/set events useful to 32bit") shares the get/set events helper for arm64 and arm32, but forgot to share the cap extension code. User space will check whether KVM supports vcpu events by checking the KVM_CAP_VCPU_EVENTS extension.

Acked-by: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
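A minimal userspace probe for this capability would look roughly like this; the query uses the standard KVM_CHECK_EXTENSION mechanism rather than anything introduced by this patch, and the VM-fd form is shown because arm routes the check through the VM ioctl path.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Returns 1 if KVM_GET/SET_VCPU_EVENTS are available for this VM. */
static int have_vcpu_events(int vm_fd)
{
        int ret = ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_VCPU_EVENTS);

        return ret > 0;
}
```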
Submitted by Dongjiu Geng

Rename kvm_arch_dev_ioctl_check_extension() to kvm_arch_vm_ioctl_check_extension(), because it does not have any relationship with the device. Renaming this function makes the code more readable.

Cc: James Morse <james.morse@arm.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Mark Rutland

At boot time, KVM stashes the host MDCR_EL2 value, but only does this when the kernel is not running in hyp mode (i.e. is non-VHE). In these cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can lead to CONSTRAINED UNPREDICTABLE behaviour.

Since we use this value to derive the MDCR_EL2 value when switching to/from a guest, after a guest has been run the performance counters do not behave as expected. This has been observed to result in accesses via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant counters, resulting in events not being counted. In these cases, only the fixed-purpose cycle counter appears to work as expected.

Fix this by always stashing the host MDCR_EL2 value, regardless of VHE.

Cc: Christopher Dall <christoffer.dall@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: stable@vger.kernel.org
Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP")
Tested-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- 03 October 2018, 3 commits
Submitted by Marc Zyngier

__cpu_init_stage2 doesn't do anything anymore on arm64, and is totally nonsensical if running VHE (as VHE is 64bit only).

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Marc Zyngier

VM tends to be a very overloaded term in KVM, so let's keep it to describe the virtual machine. For the virtual memory setup, let's use the "stage2" suffix.

Reviewed-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Suzuki K Poulose

So far we have restricted the IPA size of the VM to the default value (40bits). Now that we can manage the IPA size per VM and support dynamic stage2 page tables, we can allow VMs to have larger IPA. This patch introduces the maximum IPA size supported on the host, which is decided by the following factors:

  1) Maximum PARange supported by the CPUs - this can be inferred from the system wide safe value.
  2) Maximum PA size supported by the host kernel (48 vs 52).
  3) Number of levels in the host page table (as we base our stage2 tables on the host table helpers).

Since the stage2 page table code is dependent on the stage1 page table, we always ensure that:

  Number of Levels at Stage1 >= Number of Levels at Stage2

So we limit the IPA to make sure that the above condition is satisfied. This will affect the following combinations of VA_BITS and IPA for different page sizes:

  Host configuration | Unsupported IPA ranges
  39bit VA, 4K       | [44, 48]
  36bit VA, 16K      | [41, 48]
  42bit VA, 64K      | [47, 52]

Supporting the above combinations needs independent stage2 page table manipulation code, which would require substantial changes. We could pursue that solution independently and switch the page table code once we have it ready.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
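For context, the larger IPA eventually becomes selectable from userspace in later patches of this series (not this one) through the type argument of KVM_CREATE_VM. The sketch below assumes the KVM_CAP_ARM_VM_IPA_SIZE / KVM_VM_TYPE_ARM_IPA_SIZE interface the series introduces.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch: create a VM whose guest physical address space is 'ipa_bits'
 * wide, falling back to the default 40-bit IPA if unsupported. */
static int create_vm_with_ipa(int kvm_fd, int ipa_bits)
{
        int max = ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_ARM_VM_IPA_SIZE);

        if (max > 0 && ipa_bits <= max)
                return ioctl(kvm_fd, KVM_CREATE_VM,
                             KVM_VM_TYPE_ARM_IPA_SIZE(ipa_bits));

        return ioctl(kvm_fd, KVM_CREATE_VM, 0);  /* default 40-bit IPA */
}
```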
- 01 October 2018, 2 commits
Submitted by Suzuki K Poulose

Right now the stage2 page table for a VM is hard coded, assuming an IPA of 40bits. As we are about to add support for per-VM IPA, prepare the stage2 page table helpers to accept the kvm instance to make the right decision for the VM. No functional changes.

Adds stage2_pgd_size(kvm) to replace S2_PGD_SIZE. Also moves some of the definitions in arm32 to align with the arm64 ones, and drops the _AC() specifier constants wherever possible.

Cc: Christoffer Dall <cdall@kernel.org>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Suzuki K Poulose

Allow the arch backends to perform VM specific initialisation. This will later be used to handle IPA size configuration and per-VM VTCR configuration on arm64.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <cdall@kernel.org>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
- 18 September 2018, 1 commit
Submitted by Vladimir Murzin

We rely on the cpufeature framework to detect and enable CNP, so for KVM we need to patch hyp to set the CNP bit just before TTBR0_EL2 gets written. For the guest we encode the CNP bit while building the vttbr, so we don't need to bother with that in the world switch.

Reviewed-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- 21 July 2018, 3 commits
Submitted by James Morse

arm64's new use of KVM's get_events/set_events API calls isn't just for RAS; it allows an SError that has been made pending by KVM as part of its device emulation to be migrated. Wire this up for 32bit too.

We only need to read/write the HCR_VA bit, and check that no esr has been provided, as we don't yet support VDFSR.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by James Morse

The get/set events helpers do some work to check that reserved and padding fields are zero. This is useful on 32bit too.

Move this code into virt/kvm/arm/arm.c, and give the arch code some underscores. This is temporarily hidden behind __KVM_HAVE_VCPU_EVENTS until 32bit is wired up.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Dongjiu Geng <gengdongjiu@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Dongjiu Geng

For migrating VMs, user space may need to know the exception state. For example, if on machine A KVM makes an SError pending, then when migrating to machine B, KVM also needs to pend an SError.

This new IOCTL exports user-invisible states related to SError. Together with appropriate user space changes, user space can get/set the SError exception state to do migrate/snapshot/suspend.

Signed-off-by: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: James Morse <james.morse@arm.com>
[expanded documentation wording]
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
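A userspace sketch of saving and restoring the SError state across migration might look as follows. The serror_* field names match the arm64 struct kvm_vcpu_events as this editor recalls it; verify against the uapi header before relying on them.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Sketch: capture pending-SError state on the source vcpu and replay
 * it on the destination vcpu after migration. */
static int migrate_serror_state(int src_vcpu_fd, int dst_vcpu_fd)
{
        struct kvm_vcpu_events ev;

        if (ioctl(src_vcpu_fd, KVM_GET_VCPU_EVENTS, &ev) < 0)
                return -1;

        /* Only the SError-related fields matter here, e.g.
         * ev.exception.serror_pending, ev.exception.serror_has_esr,
         * ev.exception.serror_esr (field names assumed). */
        return ioctl(dst_vcpu_fd, KVM_SET_VCPU_EVENTS, &ev);
}
```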
- 09 July 2018, 1 commit
Submitted by Marc Zyngier

Trapping blocking WFE is extremely beneficial in situations where the system is oversubscribed, as it allows another thread to run while being blocked. In a non-oversubscribed environment, this is the complete opposite, and trapping WFE is just unnecessary overhead.

Let's only enable WFE trapping if the CPU has more than a single task to run (that is, more than just the vcpu thread).

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
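Conceptually the check boils down to something like the sketch below, evaluated when the vcpu thread is loaded onto a physical CPU. The helper name is hypothetical; single_task_running(), HCR_TWE, and vcpu->arch.hcr_el2 are the real kernel pieces the idea builds on, but this is not the literal patch.

```c
#include <linux/kvm_host.h>
#include <linux/sched/stat.h>
#include <asm/kvm_arm.h>

/* Sketch of the idea (arm64 side); helper name is hypothetical. */
static void update_wfe_trap(struct kvm_vcpu *vcpu)
{
        if (single_task_running())
                /* Nothing else wants this CPU: let the guest WFE natively. */
                vcpu->arch.hcr_el2 &= ~HCR_TWE;
        else
                /* Oversubscribed: trap WFE so we can schedule someone else. */
                vcpu->arch.hcr_el2 |= HCR_TWE;
}
```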
- 20 June 2018, 1 commit
Submitted by Peter Zijlstra

Since swait basically implemented exclusive waits only, make sure the API reflects that.

  $ git grep -l -e "\<swake_up\>" -e "\<swait_event[^ (]*" -e "\<prepare_to_swait\>" |
    while read file; do sed -i -e 's/\<swake_up\>/&_one/g' -e 's/\<swait_event[^ (]*/&_exclusive/g' -e 's/\<prepare_to_swait\>/&_exclusive/g' $file; done

With a few manual touch-ups.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: bigeasy@linutronix.de
Cc: oleg@redhat.com
Cc: paulmck@linux.vnet.ibm.com
Cc: pbonzini@redhat.com
Link: https://lkml.kernel.org/r/20180612083909.261946548@infradead.org
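The call-site effect of the rename can be illustrated with a small sketch; the queue and wait-entry variables are placeholders rather than anything from a specific subsystem.

```c
#include <linux/sched.h>
#include <linux/swait.h>

/* Waiter side: the exclusive nature of the wait is now explicit. */
static void waiter_example(struct swait_queue_head *wq,
                           struct swait_queue *wait)
{
        /* Old: prepare_to_swait(wq, wait, TASK_INTERRUPTIBLE); */
        prepare_to_swait_exclusive(wq, wait, TASK_INTERRUPTIBLE);
        /* ... check condition, schedule() ... */
        finish_swait(wq, wait);
}

/* Waker side: only one waiter is woken, and the name now says so. */
static void waker_example(struct swait_queue_head *wq)
{
        /* Old: swake_up(wq); */
        swake_up_one(wq);
}
```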
- 02 June 2018, 2 commits
Submitted by Marc Orr

The kvm struct has been bloating. For example, it's tens of kilobytes for x86, which turns out to be a large amount of memory to allocate contiguously via kzalloc. Thus, this patch does the following:

  1. Uses architecture-specific routines to allocate the kvm struct via vzalloc for x86.
  2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc when has_vhe() is true.

Other architectures continue to default to kalloc, as they have a dependency on kalloc or have a small-enough struct kvm.

Signed-off-by: Marc Orr <marcorr@google.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
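On the arm side, the override enabled by __KVM_HAVE_ARCH_VM_ALLOC ends up looking roughly like the sketch below, paired with a matching free routine. Treat it as an approximation of the patch rather than a verbatim excerpt.

```c
#include <linux/kvm_host.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>

/* Roughly what the arm/arm64 override looks like: stick with kzalloc on
 * non-VHE systems (where the struct must be hyp-mappable and contiguous),
 * use vzalloc otherwise to avoid large contiguous allocations. */
struct kvm *kvm_arch_alloc_vm(void)
{
        if (!has_vhe())
                return kzalloc(sizeof(struct kvm), GFP_KERNEL);

        return vzalloc(sizeof(struct kvm));
}

void kvm_arch_free_vm(struct kvm *kvm)
{
        if (!has_vhe())
                kfree(kvm);
        else
                vfree(kvm);
}
```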
Submitted by Souptick Joarder

Use the new return type vm_fault_t for fault handlers. For now, this is just documenting that the function returns a VM_FAULT value rather than an errno. Once all instances are converted, vm_fault_t will become a distinct type.

See commit 1c8f4220 ("mm: change return type to vm_fault_t").

Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
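The conversion is mechanical; KVM's vcpu mmap fault handler is a convenient illustration, since its body is the trivial one that returns SIGBUS for any fault.

```c
#include <linux/kvm_host.h>
#include <linux/mm.h>

/* Before this change the handler was declared as returning int.
 * The new signature documents that a VM_FAULT_* code is returned. */
vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
{
        return VM_FAULT_SIGBUS;
}
```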
- 01 June 2018, 1 commit
Submitted by Marc Zyngier

In order to offer ARCH_WORKAROUND_2 support to guests, we need a bit of infrastructure. Let's add a flag indicating whether or not the guest uses SSBD mitigation. Depending on the state of this flag, allow KVM to disable ARCH_WORKAROUND_2 before entering the guest, and enable it when exiting it.

Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
- 25 May 2018, 4 commits
Submitted by Eric Auger

kvm_vgic_vcpu_early_init gets called after kvm_vgic_vcpu_init, which is confusing. The call path is as follows:

  kvm_vm_ioctl_create_vcpu
    |_ kvm_arch_vcpu_create
       |_ kvm_vcpu_init
          |_ kvm_arch_vcpu_init
             |_ kvm_vgic_vcpu_init
    |_ kvm_arch_vcpu_postcreate
       |_ kvm_vgic_vcpu_early_init

Static initialization currently done in kvm_vgic_vcpu_early_init() can be moved to kvm_vgic_vcpu_init(). So let's move the code and remove kvm_vgic_vcpu_early_init(). kvm_arch_vcpu_postcreate() then does nothing.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Dave Martin

Now that the host SVE context can be saved on demand from Hyp, there is no longer any need to save this state in advance before entering the guest. This patch removes the relevant call to kvm_fpsimd_flush_cpu_state().

Since the problem that function was intended to solve no longer exists, the function and its dependencies are also deleted.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Christoffer Dall <christoffer.dall@arm.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Dave Martin

This patch adds SVE context saving to the hyp FPSIMD context switch path. This means that it is no longer necessary to save the host SVE state in advance of entering the guest, when SVE is in use.

In order to avoid adding pointless complexity to the code, VHE is assumed if SVE is in use. VHE is an architectural prerequisite for SVE, so there is no good reason to turn CONFIG_ARM64_VHE off in kernels that support both SVE and KVM.

Historically, software models exist that can expose the architecturally invalid configuration of SVE without VHE, so if this situation is detected at kvm_init() time then KVM will be disabled.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Dave Martin

This patch refactors KVM to align the host and guest FPSIMD save/restore logic with each other for arm64. This reduces the number of redundant save/restore operations that must occur, and reduces the common-case IRQ blackout time during guest exit storms by saving the host state lazily and optimising away the need to restore the host state before returning to the run loop.

Four hooks are defined in order to enable this (see the placement sketch after this list):

  * kvm_arch_vcpu_run_map_fp(): Called on PID change to map necessary bits of current to Hyp.
  * kvm_arch_vcpu_load_fp(): Set up FP/SIMD for entering the KVM run loop (parse as "vcpu_load fp").
  * kvm_arch_vcpu_ctxsync_fp(): Get FP/SIMD into a safe state for re-enabling interrupts after a guest exit back to the run loop. For arm64 specifically, this involves updating the host kernel's FPSIMD context tracking metadata so that kernel-mode NEON use will cause the vcpu's FPSIMD state to be saved back correctly into the vcpu struct. This must be done before re-enabling interrupts because kernel-mode NEON may be used by softirqs.
  * kvm_arch_vcpu_put_fp(): Save guest FP/SIMD state back to memory and dissociate from the CPU ("vcpu_put fp").

Also, the arm64 FPSIMD context switch code is updated to enable it to save back FPSIMD state for a vcpu, not just current. A few helpers drive this:

  * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp): mark this CPU as having context fp (which may belong to a vcpu) currently loaded in its registers. This is the non-task equivalent of the static function fpsimd_bind_to_cpu() in fpsimd.c.
  * task_fpsimd_save(): exported to allow KVM to save the guest's FPSIMD state back to memory on exit from the run loop.
  * fpsimd_flush_state(): invalidate any context's FPSIMD state that is currently loaded. Used to disassociate the vcpu from the CPU regs on run loop exit.

These changes allow the run loop to enable interrupts (and thus softirqs that may use kernel-mode NEON) without having to save the guest's FPSIMD state eagerly.

Some new vcpu_arch fields are added to make all this work. Because host FPSIMD state can now be saved back directly into current's thread_struct as appropriate, host_cpu_context is no longer used for preserving the FPSIMD state. However, it is still needed for preserving other things such as the host's system registers. To avoid ABI churn, the redundant storage space in host_cpu_context is not removed for now.

arch/arm is not addressed by this patch and continues to use its current save/restore logic. It could provide implementations of the helpers later if desired.

Signed-off-by: Dave Martin <Dave.Martin@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@arm.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
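As an orientation aid, the intended call sites of the four hooks can be sketched as below. This is a simplified reconstruction from the description above, not the literal patch; the surrounding function names are the usual KVM/arm entry points of that era.

```c
#include <linux/kvm_host.h>

/* Simplified placement of the FP/SIMD hooks described above. */

void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
        /* ... existing load work ... */
        kvm_arch_vcpu_load_fp(vcpu);        /* "vcpu_load fp" */
}

void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
        kvm_arch_vcpu_put_fp(vcpu);         /* save guest FP/SIMD, dissociate CPU */
        /* ... existing put work ... */
}

int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
        int ret = 0;

        /* On PID change, map the needed bits of current into Hyp. */
        kvm_arch_vcpu_run_map_fp(vcpu);

        while (ret == 0) {
                local_irq_disable();
                /* ... enter guest, exit guest, set ret on exit reason ... */

                /* Make FP/SIMD safe before IRQs (and softirq NEON) return. */
                kvm_arch_vcpu_ctxsync_fp(vcpu);
                local_irq_enable();
        }
        return ret;
}
```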
- 17 April 2018, 1 commit
Submitted by Marc Zyngier

Before entering the guest, we check whether our VMID is still part of the current generation. In order to avoid taking a lock, we start with checking that the generation is still current, and only if not current do we take the lock, recheck, and update the generation and VMID.

This leaves open a small race: a vcpu can bump up the global generation number as well as the VM's, but has not updated the VMID itself yet. At that point another vcpu from the same VM comes in, checks the generation (and finds it not needing anything), and jumps into the guest. At this point, we end up with two vcpus belonging to the same VM running with two different VMIDs. Eventually, the VMID used by the second vcpu will get reassigned, and things will really go wrong...

A simple solution would be to drop this initial check and always take the lock, but this is likely to cause performance issues. A middle ground is to convert the spinlock to a rwlock, and only take the read lock on the fast path. If the check fails at that point, drop it and acquire the write lock, rechecking the condition. This ensures that the above scenario doesn't occur.

Cc: stable@vger.kernel.org
Reported-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Shannon Zhao <zhaoshenglong@huawei.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
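The resulting update path follows the classic double-checked locking shape. A sketch, using the need_new_vmid_gen() predicate named elsewhere in this log and eliding the rollover details, is:

```c
/* Sketch of the rwlock fast/slow path described above; details elided. */
static void update_vttbr(struct kvm *kvm)
{
        read_lock(&kvm_vmid_lock);
        if (!need_new_vmid_gen(kvm)) {
                /* Fast path: generation still current, nothing to do. */
                read_unlock(&kvm_vmid_lock);
                return;
        }
        read_unlock(&kvm_vmid_lock);

        write_lock(&kvm_vmid_lock);
        if (need_new_vmid_gen(kvm)) {
                /* Recheck under the write lock before bumping the
                 * generation and assigning a fresh VMID. */
                /* ... roll over generation, allocate VMID, update vttbr ... */
        }
        write_unlock(&kvm_vmid_lock);
}
```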
- 19 March 2018, 4 commits
Submitted by Christoffer Dall

Just like we can program the GICv2 hypervisor control interface directly from the core vgic code, we can do the same for the GICv3 hypervisor control interface on VHE systems.

We do this by simply calling the save/restore functions when we have VHE, and we can then get rid of the save/restore function calls from the VHE world switch function.

One caveat is that we now write GICv3 system register state before the potential early exit path in the run loop, and because we sync back state in the early exit path, we have to ensure that we read a consistent GIC state from the sync path, even though we have never actually run the guest with the newly written GIC state. We solve this by inserting an ISB in the early exit path.

Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Christoffer Dall

So far this is mostly (see below) a copy of the legacy non-VHE switch function, but we will start reworking these functions in separate directions to work on VHE and non-VHE in the most optimal way in later patches.

The only difference after this patch between the VHE and non-VHE run functions is that we omit the branch-predictor variant-2 hardening for QC Falkor CPUs, because this workaround is specific to a series of non-VHE ARMv8.0 CPUs.

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Christoffer Dall

As we are about to move a bunch of save/restore logic for VHE kernels to the load and put functions, we need some infrastructure to do this.

Reviewed-by: Andrew Jones <drjones@redhat.com>
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
Submitted by Christoffer Dall

We currently have a separate read-modify-write of HCR_EL2 on entry to the guest for the sole purpose of setting the VF and VI bits, if set. Since this is rarely the case (only when using a userspace IRQ chip and interrupts are in flight), let's get rid of this operation and instead modify the bits in vcpu->arch.hcr[_el2] directly when needed.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Andrew Jones <drjones@redhat.com>
Reviewed-by: Julien Thierry <julien.thierry@arm.com>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>