- 25 5月, 2018 2 次提交
-
-
由 Dave Martin 提交于
This patch refactors KVM to align the host and guest FPSIMD save/restore logic with each other for arm64. This reduces the number of redundant save/restore operations that must occur, and reduces the common-case IRQ blackout time during guest exit storms by saving the host state lazily and optimising away the need to restore the host state before returning to the run loop. Four hooks are defined in order to enable this: * kvm_arch_vcpu_run_map_fp(): Called on PID change to map necessary bits of current to Hyp. * kvm_arch_vcpu_load_fp(): Set up FP/SIMD for entering the KVM run loop (parse as "vcpu_load fp"). * kvm_arch_vcpu_ctxsync_fp(): Get FP/SIMD into a safe state for re-enabling interrupts after a guest exit back to the run loop. For arm64 specifically, this involves updating the host kernel's FPSIMD context tracking metadata so that kernel-mode NEON use will cause the vcpu's FPSIMD state to be saved back correctly into the vcpu struct. This must be done before re-enabling interrupts because kernel-mode NEON may be used by softirqs. * kvm_arch_vcpu_put_fp(): Save guest FP/SIMD state back to memory and dissociate from the CPU ("vcpu_put fp"). Also, the arm64 FPSIMD context switch code is updated to enable it to save back FPSIMD state for a vcpu, not just current. A few helpers drive this: * fpsimd_bind_state_to_cpu(struct user_fpsimd_state *fp): mark this CPU as having context fp (which may belong to a vcpu) currently loaded in its registers. This is the non-task equivalent of the static function fpsimd_bind_to_cpu() in fpsimd.c. * task_fpsimd_save(): exported to allow KVM to save the guest's FPSIMD state back to memory on exit from the run loop. * fpsimd_flush_state(): invalidate any context's FPSIMD state that is currently loaded. Used to disassociate the vcpu from the CPU regs on run loop exit. These changes allow the run loop to enable interrupts (and thus softirqs that may use kernel-mode NEON) without having to save the guest's FPSIMD state eagerly. Some new vcpu_arch fields are added to make all this work. Because host FPSIMD state can now be saved back directly into current's thread_struct as appropriate, host_cpu_context is no longer used for preserving the FPSIMD state. However, it is still needed for preserving other things such as the host's system registers. To avoid ABI churn, the redundant storage space in host_cpu_context is not removed for now. arch/arm is not addressed by this patch and continues to use its current save/restore logic. It could provide implementations of the helpers later if desired. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Reviewed-by: NAlex Bennée <alex.bennee@linaro.org> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dave Martin 提交于
In struct vcpu_arch, the debug_flags field is used to store debug-related flags about the vcpu state. Since we are about to add some more flags related to FPSIMD and SVE, it makes sense to add them to the existing flags field rather than adding new fields. Since there is only one debug_flags flag defined so far, there is plenty of free space for expansion. In preparation for adding more flags, this patch renames the debug_flags field to simply "flags", and updates comments appropriately. The flag definitions are also moved to <asm/kvm_host.h>, since their presence in <asm/kvm_asm.h> was for purely historical reasons: these definitions are not used from asm any more, and not very likely to be as more Hyp asm is migrated to C. KVM_ARM64_DEBUG_DIRTY_SHIFT has not been used since commit 1ea66d27 ("arm64: KVM: Move away from the assembly version of the world switch"), so this patch gets rid of that too. No functional change. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAlex Bennée <alex.bennee@linaro.org> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> [maz: fixed minor conflict] Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 20 4月, 2018 1 次提交
-
-
由 Marc Zyngier 提交于
Although we've implemented PSCI 0.1, 0.2 and 1.0, we expose either 0.1 or 1.0 to a guest, defaulting to the latest version of the PSCI implementation that is compatible with the requested version. This is no different from doing a firmware upgrade on KVM. But in order to give a chance to hypothetical badly implemented guests that would have a fit by discovering something other than PSCI 0.2, let's provide a new API that allows userspace to pick one particular version of the API. This is implemented as a new class of "firmware" registers, where we expose the PSCI version. This allows the PSCI version to be save/restored as part of a guest migration, and also set to any supported version if the guest requires it. Cc: stable@vger.kernel.org #4.16 Reviewed-by: NChristoffer Dall <cdall@kernel.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 19 3月, 2018 7 次提交
-
-
由 Christoffer Dall 提交于
We are about to defer saving and restoring some groups of system registers to vcpu_put and vcpu_load on supported systems. This means that we need some infrastructure to access system registes which supports either accessing the memory backing of the register or directly accessing the system registers, depending on the state of the system when we access the register. We do this by defining read/write accessor functions, which can handle both "immediate" and "deferrable" system registers. Immediate registers are always saved/restored in the world-switch path, but deferrable registers are only saved/restored in vcpu_put/vcpu_load when supported and sysregs_loaded_on_cpu will be set in that case. Note that we don't use the deferred mechanism yet in this patch, but only introduce infrastructure. This is to improve convenience of review in the subsequent patches where it is clear which registers become deferred. Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Currently we access the system registers array via the vcpu_sys_reg() macro. However, we are about to change the behavior to some times modify the register file directly, so let's change this to two primitives: * Accessor macros vcpu_write_sys_reg() and vcpu_read_sys_reg() * Direct array access macro __vcpu_sys_reg() The accessor macros should be used in places where the code needs to access the currently loaded VCPU's state as observed by the guest. For example, when trapping on cache related registers, a write to a system register should go directly to the VCPU version of the register. The direct array access macro can be used in places where the VCPU is known to never be running (for example userspace access) or for registers which are never context switched (for example all the PMU system registers). This rewrites all users of vcpu_sys_regs to one of the macros described above. No functional change. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <cdall@cs.columbia.edu> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
We currently handle 32-bit accesses to trapped VM system registers using the 32-bit index into the coproc array on the vcpu structure, which is a union of the coproc array and the sysreg array. Since all the 32-bit coproc indices are created to correspond to the architectural mapping between 64-bit system registers and 32-bit coprocessor registers, and because the AArch64 system registers are the double in size of the AArch32 coprocessor registers, we can always find the system register entry that we must update by dividing the 32-bit coproc index by 2. This is going to make our lives much easier when we have to start accessing system registers that use deferred save/restore and might have to be read directly from the physical CPU. Reviewed-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
So far this is mostly (see below) a copy of the legacy non-VHE switch function, but we will start reworking these functions in separate directions to work on VHE and non-VHE in the most optimal way in later patches. The only difference after this patch between the VHE and non-VHE run functions is that we omit the branch-predictor variant-2 hardening for QC Falkor CPUs, because this workaround is specific to a series of non-VHE ARMv8.0 CPUs. Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
As we are about to move a bunch of save/restore logic for VHE kernels to the load and put functions, we need some infrastructure to do this. Reviewed-by: NAndrew Jones <drjones@redhat.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
We currently have a separate read-modify-write of the HCR_EL2 on entry to the guest for the sole purpose of setting the VF and VI bits, if set. Since this is most rarely the case (only when using userspace IRQ chip and interrupts are in flight), let's get rid of this operation and instead modify the bits in the vcpu->arch.hcr[_el2] directly when needed. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
We already have the percpu area for the host cpu state, which points to the VCPU, so there's no need to store the VCPU pointer on the stack on every context switch. We can be a little more clever and just use tpidr_el2 for the percpu offset and load the VCPU pointer from the host context. This has the benefit of being able to retrieve the host context even when our stack is corrupted, and it has a potential performance benefit because we trade a store plus a load for an mrs and a load on a round trip to the guest. This does require us to calculate the percpu offset without including the offset from the kernel mapping of the percpu array to the linear mapping of the array (which is what we store in tpidr_el1), because a PC-relative generated address in EL2 is already giving us the hyp alias of the linear mapping of a kernel address. We do this in __cpu_init_hyp_mode() by using kvm_ksym_ref(). The code that accesses ESR_EL2 was previously using an alternative to use the _EL1 accessor on VHE systems, but this was actually unnecessary as the _EL1 accessor aliases the ESR_EL2 register on VHE, and the _EL2 accessor does the same thing on both systems. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 07 2月, 2018 1 次提交
-
-
由 Marc Zyngier 提交于
A new feature of SMCCC 1.1 is that it offers firmware-based CPU workarounds. In particular, SMCCC_ARCH_WORKAROUND_1 provides BP hardening for CVE-2017-5715. If the host has some mitigation for this issue, report that we deal with it using SMCCC_ARCH_WORKAROUND_1, as we apply the host workaround on every guest exit. Tested-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 16 1月, 2018 5 次提交
-
-
由 James Morse 提交于
We expect to have firmware-first handling of RAS SErrors, with errors notified via an APEI method. For systems without firmware-first, add some minimal handling to KVM. There are two ways KVM can take an SError due to a guest, either may be a RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO, or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit. The current SError from EL2 code unmasks SError and tries to fence any pending SError into a single instruction window. It then leaves SError unmasked. With the v8.2 RAS Extensions we may take an SError for a 'corrected' error, but KVM is only able to handle SError from EL2 if they occur during this single instruction window... The RAS Extensions give us a new instruction to synchronise and consume SErrors. The RAS Extensions document (ARM DDI0587), '2.4.1 ESB and Unrecoverable errors' describes ESB as synchronising SError interrupts generated by 'instructions, translation table walks, hardware updates to the translation tables, and instruction fetches on the same PE'. This makes ESB equivalent to KVMs existing 'dsb, mrs-daifclr, isb' sequence. Use the alternatives to synchronise and consume any SError using ESB instead of unmasking and taking the SError. Set ARM_EXIT_WITH_SERROR_BIT in the exit_code so that we can restart the vcpu if it turns out this SError has no impact on the vcpu. Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 James Morse 提交于
We expect to have firmware-first handling of RAS SErrors, with errors notified via an APEI method. For systems without firmware-first, add some minimal handling to KVM. There are two ways KVM can take an SError due to a guest, either may be a RAS error: we exit the guest due to an SError routed to EL2 by HCR_EL2.AMO, or we take an SError from EL2 when we unmask PSTATE.A from __guest_exit. For SError that interrupt a guest and are routed to EL2 the existing behaviour is to inject an impdef SError into the guest. Add code to handle RAS SError based on the ESR. For uncontained and uncategorized errors arm64_is_fatal_ras_serror() will panic(), these errors compromise the host too. All other error types are contained: For the fatal errors the vCPU can't make progress, so we inject a virtual SError. We ignore contained errors where we can make progress as if we're lucky, we may not hit them again. If only some of the CPUs support RAS the guest will see the cpufeature sanitised version of the id registers, but we may still take RAS SError on this CPU. Move the SError handling out of handle_exit() into a new handler that runs before we can be preempted. This allows us to use this_cpu_has_cap(), via arm64_is_ras_serror(). Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 James Morse 提交于
If we deliver a virtual SError to the guest, the guest may defer it with an ESB instruction. The guest reads the deferred value via DISR_EL1, but the guests view of DISR_EL1 is re-mapped to VDISR_EL2 when HCR_EL2.AMO is set. Add the KVM code to save/restore VDISR_EL2, and make it accessible to userspace as DISR_EL1. Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 James Morse 提交于
Prior to v8.2's RAS Extensions, the HCR_EL2.VSE 'virtual SError' feature generated an SError with an implementation defined ESR_EL1.ISS, because we had no mechanism to specify the ESR value. On Juno this generates an all-zero ESR, the most significant bit 'ISV' is clear indicating the remainder of the ISS field is invalid. With the RAS Extensions we have a mechanism to specify this value, and the most significant bit has a new meaning: 'IDS - Implementation Defined Syndrome'. An all-zero SError ESR now means: 'RAS error: Uncategorized' instead of 'no valid ISS'. Add KVM support for the VSESR_EL2 register to specify an ESR value when HCR_EL2.VSE generates a virtual SError. Change kvm_inject_vabt() to specify an implementation-defined value. We only need to restore the VSESR_EL2 value when HCR_EL2.VSE is set, KVM save/restores this bit during __{,de}activate_traps() and hardware clears the bit once the guest has consumed the virtual-SError. Future patches may add an API (or KVM CAP) to pend a virtual SError with a specified ESR. Cc: Dongjiu Geng <gengdongjiu@huawei.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
由 James Morse 提交于
Non-VHE systems take an exception to EL2 in order to world-switch into the guest. When returning from the guest KVM implicitly restores the DAIF flags when it returns to the kernel at EL1. With VHE none of this exception-level jumping happens, so KVMs world-switch code is exposed to the host kernel's DAIF values, and KVM spills the guest-exit DAIF values back into the host kernel. On entry to a guest we have Debug and SError exceptions unmasked, KVM has switched VBAR but isn't prepared to handle these. On guest exit Debug exceptions are left disabled once we return to the host and will stay this way until we enter user space. Add a helper to mask/unmask DAIF around VHE guests. The unmask can only happen after the hosts VBAR value has been synchronised by the isb in __vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as setting KVMs VBAR value, but is kept here for symmetry. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 13 1月, 2018 1 次提交
-
-
由 James Morse 提交于
Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and on VHE they can be the same register. KVM calls hyp_panic() when anything unexpected happens. This may occur while a guest owns the EL1 registers. KVM stashes the vcpu pointer in tpidr_el2, which it uses to find the host context in order to restore the host EL1 registers before parachuting into the host's panic(). The host context is a struct kvm_cpu_context allocated in the per-cpu area, and mapped to hyp. Given the per-cpu offset for this CPU, this is easy to find. Change hyp_panic() to take a pointer to the struct kvm_cpu_context. Wrap these calls with an asm function that retrieves the struct kvm_cpu_context from the host's per-cpu area. Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during kvm init. (Later patches will make this unnecessary for VHE hosts) We print out the vcpu pointer as part of the panic message. Add a back reference to the 'running vcpu' in the host cpu context to preserve this. Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 02 1月, 2018 1 次提交
-
-
由 Christoffer Dall 提交于
We currently check if the VM has a userspace irqchip in several places along the critical path, and if so, we do some work which is only required for having an irqchip in userspace. This is unfortunate, as we could avoid doing any work entirely, if we didn't have to support irqchip in userspace. Realizing the userspace irqchip on ARM is mostly a developer or hobby feature, and is unlikely to be used in servers or other scenarios where performance is a priority, we can use a refcounted static key to only check the irqchip configuration when we have at least one VM that uses an irqchip in userspace. Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 29 11月, 2017 1 次提交
-
-
由 Alex Bennée 提交于
After emulating instructions we may want return to user-space to handle single-step debugging. Introduce a helper function, which, if single-step is enabled, sets the run structure for return and returns true. Signed-off-by: NAlex Bennée <alex.bennee@linaro.org> Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
- 03 11月, 2017 1 次提交
-
-
由 Dave Martin 提交于
Until KVM has full SVE support, guests must not be allowed to execute SVE instructions. This patch enables the necessary traps, and also ensures that the traps are disabled again on exit from the guest so that the host can still use SVE if it wants to. On guest exit, high bits of the SVE Zn registers may have been clobbered as a side-effect the execution of FPSIMD instructions in the guest. The existing KVM host FPSIMD restore code is not sufficient to restore these bits, so this patch explicitly marks the CPU as not containing cached vector state for any task, thus forcing a reload on the next return to userspace. This is an interim measure, in advance of adding full SVE awareness to KVM. This marking of cached vector state in the CPU as invalid is done using __this_cpu_write(fpsimd_last_state, NULL) in fpsimd.c. Due to the repeated use of this rather obscure operation, it makes sense to factor it out as a separate helper with a clearer name. This patch factors it out as fpsimd_flush_cpu_state(), and ports all callers to use it. As a side effect of this refactoring, a this_cpu_write() in fpsimd_cpu_pm_notifier() is changed to __this_cpu_write(). This should be fine, since cpu_pm_enter() is supposed to be called only with interrupts disabled. Signed-off-by: NDave Martin <Dave.Martin@arm.com> Reviewed-by: NAlex Bennée <alex.bennee@linaro.org> Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
- 01 9月, 2017 1 次提交
-
-
由 Jérôme Glisse 提交于
Calls to mmu_notifier_invalidate_page() were replaced by calls to mmu_notifier_invalidate_range() and are now bracketed by calls to mmu_notifier_invalidate_range_start()/end() Remove now useless invalidate_page callback. Changed since v1 (Linus Torvalds) - remove now useless kvm_arch_mmu_notifier_invalidate_page() Signed-off-by: NJérôme Glisse <jglisse@redhat.com> Tested-by: NMike Galbraith <efault@gmx.de> Tested-by: NAdam Borowski <kilobyte@angband.pl> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: kvm@vger.kernel.org Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 04 6月, 2017 3 次提交
-
-
由 Andrew Jones 提交于
Don't use request-less VCPU kicks when injecting IRQs, as a VCPU kick meant to trigger the interrupt injection could be sent while the VCPU is outside guest mode, which means no IPI is sent, and after it has called kvm_vgic_flush_hwstate(), meaning it won't see the updated GIC state until its next exit some time later for some other reason. The receiving VCPU only needs to check this request in VCPU RUN to handle it. By checking it, if it's pending, a memory barrier will be issued that ensures all state is visible. See "Ensuring Requests Are Seen" of Documentation/virtual/kvm/vcpu-requests.rst Signed-off-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
由 Andrew Jones 提交于
A request called EXIT is too generic. All requests are meant to cause exits, but different requests have different flags. Let's not make it difficult to decide if the EXIT request is correct for some case by just always providing unique requests for each case. This patch changes EXIT to SLEEP, because that's what the request is asking the VCPU to do. Signed-off-by: NAndrew Jones <drjones@redhat.com> Acked-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
由 Andrew Jones 提交于
Marc Zyngier suggested that we define the arch specific VCPU request base, rather than requiring each arch to remember to start from 8. That suggestion, along with Radim Krcmar's recent VCPU request flag addition, snowballed into defining something of an arch VCPU request defining API. No functional change. (Looks like x86 is running out of arch VCPU request bits. Maybe someday we'll need to extend to 64.) Signed-off-by: NAndrew Jones <drjones@redhat.com> Acked-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
- 23 5月, 2017 1 次提交
-
-
由 Christoffer Dall 提交于
We don't need to stop a specific VCPU when changing the active state, because private IRQs can only be modified by a running VCPU for the VCPU itself and it is therefore already stopped. However, it is also possible for two VCPUs to be modifying the active state of SPIs at the same time, which can cause the thread being stuck in the loop that checks other VCPU threads for a potentially very long time, or to modify the active state of a running VCPU. Fix this by serializing all accesses to setting and clearing the active state of interrupts using the KVM mutex. Reported-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <cdall@linaro.org> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 18 5月, 2017 1 次提交
-
-
由 Mark Rutland 提交于
Currently, cpus_set_cap() calls static_branch_enable_cpuslocked(), which must take the jump_label mutex. We call cpus_set_cap() in the secondary bringup path, from the idle thread where interrupts are disabled. Taking a mutex in this path "is a NONO" regardless of whether it's contended, and something we must avoid. We didn't spot this until recently, as ___might_sleep() won't warn for this case until all CPUs have been brought up. This patch avoids taking the mutex in the secondary bringup path. The poking of static keys is deferred until enable_cpu_capabilities(), which runs in a suitable context on the boot CPU. To account for the static keys being set later, cpus_have_const_cap() is updated to use another static key to check whether the const cap keys have been initialised, falling back to the caps bitmap until this is the case. This means that users of cpus_have_const_cap() gain should only gain a single additional NOP in the fast path once the const caps are initialised, but should always see the current cap value. The hyp code should never dereference the caps array, since the caps are initialized before we run the module initcall to initialise hyp. A check is added to the hyp init code to document this requirement. This change will sidestep a number of issues when the upcoming hotplug locking rework is merged. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Reviewed-by: NMarc Zyniger <marc.zyngier@arm.com> Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com> Acked-by: NWill Deacon <will.deacon@arm.com> Cc: Christoffer Dall <christoffer.dall@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Sewior <bigeasy@linutronix.de> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
-
- 27 4月, 2017 2 次提交
-
-
由 Paolo Bonzini 提交于
kvm_make_all_requests() provides a synchronization that waits until all kicked VCPUs have acknowledged the kick. This is important for KVM_REQ_MMU_RELOAD as it prevents freeing while lockless paging is underway. This patch adds the synchronization property into all requests that are currently being used with kvm_make_all_requests() in order to preserve the current behavior and only introduce a new framework. Removing it from requests where it is not necessary is left for future patches. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Radim Krčmář 提交于
Some operations must ensure that the guest is not running with stale data, but if the guest is halted, then the update can wait until another event happens. kvm_make_all_requests() currently doesn't wake up, so we can mark all requests used with it. First 8 bits were arbitrarily reserved for request numbers. Most uses of requests have the request type as a constant, so a compiler will optimize the '&'. An alternative would be to have an inline function that would return whether the request needs a wake-up or not, but I like this one better even though it might produce worse assembly. Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Reviewed-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NCornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 09 4月, 2017 2 次提交
-
-
由 Marc Zyngier 提交于
__cpu_reset_hyp_mode doesn't need to be passed any argument now, as the hyp-stub implementations are self-contained, and is now reduced to just calling __hyp_reset_vectors(). Let's drop the wrapper and use the stub hypercall directly. Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
由 Marc Zyngier 提交于
We are now able to use the hyp stub to reset HYP mode. Time to kiss __kvm_hyp_reset goodbye, and use __hyp_reset_vectors. Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Reviewed-by: NJames Morse <james.morse@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <cdall@linaro.org>
-
- 07 4月, 2017 1 次提交
-
-
由 Paolo Bonzini 提交于
Its value has never changed; we might as well make it part of the ABI instead of using the return value of KVM_CHECK_EXTENSION(KVM_CAP_COALESCED_MMIO). Because PPC does not always make MMIO available, the code has to be made dependent on CONFIG_KVM_MMIO rather than KVM_COALESCED_MMIO_PAGE_OFFSET. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 09 3月, 2017 2 次提交
-
-
由 Linu Cherian 提交于
Having only 32 memslots is a real constraint for the maximum number of PCI devices that can be assigned to a single guest. Assuming each PCI device/virtual function having two memory BAR regions, we could assign only 15 devices/virtual functions to a guest. Hence increase KVM_USER_MEM_SLOTS to 512 as done in other archs like powerpc. Reviewed-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NLinu Cherian <linu.cherian@cavium.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Linu Cherian 提交于
arm/arm64 architecture doesnt use private memslots, hence removing KVM_PRIVATE_MEM_SLOTS macro definition. Reviewed-by: NChristoffer Dall <cdall@linaro.org> Signed-off-by: NLinu Cherian <linu.cherian@cavium.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 08 2月, 2017 1 次提交
-
-
由 Jintack Lim 提交于
Make cntvoff per each timer context. This is helpful to abstract kvm timer functions to work with timer context without considering timer types (e.g. physical timer or virtual timer). This also would pave the way for ever doing adjustments of the cntvoff on a per-CPU basis if that should ever make sense. Signed-off-by: NJintack Lim <jintack@cs.columbia.edu> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 03 2月, 2017 1 次提交
-
-
由 Will Deacon 提交于
The SPE buffer is virtually addressed, using the page tables of the CPU MMU. Unusually, this means that the EL0/1 page table may be live whilst we're executing at EL2 on non-VHE configurations. When VHE is in use, we can use the same property to profile the guest behind its back. This patch adds the relevant disabling and flushing code to KVM so that the host can make use of SPE without corrupting guest memory, and any attempts by a guest to use SPE will result in a trap. Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Cc: Alex Bennée <alex.bennee@linaro.org> Cc: Christoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NWill Deacon <will.deacon@arm.com>
-
- 05 11月, 2016 1 次提交
-
-
由 Marc Zyngier 提交于
Architecturally, TLBs are private to the (physical) CPU they're associated with. But when multiple vcpus from the same VM are being multiplexed on the same CPU, the TLBs are not private to the vcpus (and are actually shared across the VMID). Let's consider the following scenario: - vcpu-0 maps PA to VA - vcpu-1 maps PA' to VA If run on the same physical CPU, vcpu-1 can hit TLB entries generated by vcpu-0 accesses, and access the wrong physical page. The solution to this is to keep a per-VM map of which vcpu ran last on each given physical CPU, and invalidate local TLBs when switching to a different vcpu from the same VM. Reviewed-by: NChristoffer Dall <christoffer.dall@linaro.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 08 9月, 2016 1 次提交
-
-
由 Suraj Jitindar Singh 提交于
vms and vcpus have statistics associated with them which can be viewed within the debugfs. Currently it is assumed within the vcpu_stat_get() and vm_stat_get() functions that all of these statistics are represented as u32s, however the next patch adds some u64 vcpu statistics. Change all vcpu statistics to u64 and modify vcpu_stat_get() accordingly. Since vcpu statistics are per vcpu, they will only be updated by a single vcpu at a time so this shouldn't present a problem on 32-bit machines which can't atomically increment 64-bit numbers. However vm statistics could potentially be updated by multiple vcpus from that vm at a time. To avoid the overhead of atomics make all vm statistics ulong such that they are 64-bit on 64-bit systems where they can be atomically incremented and are 32-bit on 32-bit systems which may not be able to atomically increment 64-bit numbers. Modify vm_stat_get() to expect ulongs. Signed-off-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com> Reviewed-by: NDavid Matlack <dmatlack@google.com> Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NPaul Mackerras <paulus@ozlabs.org>
-
- 19 7月, 2016 1 次提交
-
-
由 Andre Przywara 提交于
KVM capabilities can be a per-VM property, though ARM/ARM64 currently does not pass on the VM pointer to the architecture specific capability handlers. Add a "struct kvm*" parameter to those function to later allow proper per-VM capability reporting. Signed-off-by: NAndre Przywara <andre.przywara@arm.com> Reviewed-by: NEric Auger <eric.auger@linaro.org> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@linaro.org> Tested-by: NEric Auger <eric.auger@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 04 7月, 2016 2 次提交
-
-
由 Marc Zyngier 提交于
So far, KVM was getting in the way of kexec on 32bit (and the arm64 kexec hackers couldn't be bothered to fix it on 32bit...). With simpler page tables, tearing KVM down becomes very easy, so let's just do it. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-
由 Marc Zyngier 提交于
Since we now only have one set of page tables, the concept of boot_pgd is useless and can be removed. We still keep it as an element of the "extended idmap" thing. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@linaro.org>
-