- 24 3月, 2019 1 次提交
-
-
由 Julien Thierry 提交于
[ Upstream commit fc3bc475231e12e9c0142f60100cf84d077c79e1 ] vgic_dist->lpi_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: NJulien Thierry <julien.thierry@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 13 2月, 2019 3 次提交
-
-
由 Jann Horn 提交于
commit cfa39381173d5f969daf43582c95ad679189cbc9 upstream. kvm_ioctl_create_device() does the following: 1. creates a device that holds a reference to the VM object (with a borrowed reference, the VM's refcount has not been bumped yet) 2. initializes the device 3. transfers the reference to the device to the caller's file descriptor table 4. calls kvm_get_kvm() to turn the borrowed reference to the VM into a real reference The ownership transfer in step 3 must not happen before the reference to the VM becomes a proper, non-borrowed reference, which only happens in step 4. After step 3, an attacker can close the file descriptor and drop the borrowed reference, which can cause the refcount of the kvm object to drop to zero. This means that we need to grab a reference for the device before anon_inode_getfd(), otherwise the VM can disappear from under us. Fixes: 852b6d57 ("kvm: add device control API") Cc: stable@kernel.org Signed-off-by: NJann Horn <jannh@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jim Mattson 提交于
[ Upstream commit 7a86dab8cf2f0fdf508f3555dddfc236623bff60 ] Since the offset is added directly to the hva from the gfn_to_hva_cache, a negative offset could result in an out of bounds write. The existing BUG_ON only checks for addresses beyond the end of the gfn_to_hva_cache, not for addresses before the start of the gfn_to_hva_cache. Note that all current call sites have non-negative offsets. Fixes: 4ec6e863 ("kvm: Introduce kvm_write_guest_offset_cached()") Reported-by: NCfir Cohen <cfir@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NCfir Cohen <cfir@google.com> Reviewed-by: NPeter Shier <pshier@google.com> Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Mark Rutland 提交于
[ Upstream commit 0d640732dbebed0f10f18526de21652931f0b2f2 ] When we emulate an MMIO instruction, we advance the CPU state within decode_hsr(), before emulating the instruction effects. Having this logic in decode_hsr() is opaque, and advancing the state before emulation is problematic. It gets in the way of applying consistent single-step logic, and it prevents us from being able to fail an MMIO instruction with a synchronous exception. Clean this up by only advancing the CPU state *after* the effects of the instruction are emulated. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: NAlex Bennée <alex.bennee@linaro.org> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 17 1月, 2019 1 次提交
-
-
由 Christoffer Dall 提交于
commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream. We recently addressed a VMID generation race by introducing a read/write lock around accesses and updates to the vmid generation values. However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but does so without taking the read lock. As far as I can tell, this can lead to the same kind of race: VM 0, VCPU 0 VM 0, VCPU 1 ------------ ------------ update_vttbr (vmid 254) update_vttbr (vmid 1) // roll over read_lock(kvm_vmid_lock); force_vm_exit() local_irq_disable need_new_vmid_gen == false //because vmid gen matches enter_guest (vmid 254) kvm_arch.vttbr = <PGD>:<VMID 1> read_unlock(kvm_vmid_lock); enter_guest (vmid 1) Which results in running two VCPUs in the same VM with different VMIDs and (even worse) other VCPUs from other VMs could now allocate clashing VMID 254 from the new generation as long as VCPU 0 is not exiting. Attempt to solve this by making sure vttbr is updated before another CPU can observe the updated VMID generation. Cc: stable@vger.kernel.org Fixes: f0cf47d9 "KVM: arm/arm64: Close VMID generation race" Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 10 1月, 2019 5 次提交
-
-
由 Gustavo A. R. Silva 提交于
commit c23b2e6fc4ca346018618266bcabd335c0a8a49e upstream. When using the nospec API, it should be taken into account that: "...if the CPU speculates past the bounds check then * array_index_nospec() will clamp the index within the range of [0, * size)." The above is part of the header for macro array_index_nospec() in linux/nospec.h Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic: /* SGIs and PPIs */ if (intid <= VGIC_MAX_PRIVATE) return &vcpu->arch.vgic_cpu.private_irqs[intid]; /* SPIs */ if (intid <= VGIC_MAX_SPI) return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS]; are valid values for intid. Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1 and VGIC_MAX_SPI + 1 as arguments for its parameter size. Fixes: 41b87599 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()") Cc: stable@vger.kernel.org Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> [dropped the SPI part which was fixed separately] Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Christoffer Dall 提交于
commit 60c3ab30d8c2ff3a52606df03f05af2aae07dc6b upstream. When restoring the active state from userspace, we don't know which CPU was the source for the active state, and this is not architecturally exposed in any of the register state. Set the active_source to 0 in this case. In the future, we can expand on this and exposse the information as additional information to userspace for GICv2 if anyone cares. Cc: stable@vger.kernel.org Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Marc Zyngier 提交于
commit bea2ef803ade3359026d5d357348842bca9edcf1 upstream. SPIs should be checked against the VMs specific configuration, and not the architectural maximum. Cc: stable@vger.kernel.org Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Julien Thierry 提交于
commit 2e2f6c3c0b08eed3fcf7de3c7684c940451bdeb1 upstream. To change the active state of an MMIO, halt is requested for all vcpus of the affected guest before modifying the IRQ state. This is done by calling cond_resched_lock() in vgic_mmio_change_active(). However interrupts are disabled at this point and we cannot reschedule a vcpu. We actually don't need any of this, as kvm_arm_halt_guest ensures that all the other vcpus are out of the guest. Let's just drop that useless code. Signed-off-by: NJulien Thierry <julien.thierry@arm.com> Suggested-by: NChristoffer Dall <christoffer.dall@arm.com> Cc: stable@vger.kernel.org Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Marc Zyngier 提交于
commit 107352a24900fb458152b92a4e72fbdc83fd5510 upstream. We currently only halt the guest when a vCPU messes with the active state of an SPI. This is perfectly fine for GICv2, but isn't enough for GICv3, where all vCPUs can access the state of any other vCPU. Let's broaden the condition to include any GICv3 interrupt that has an active state (i.e. all but LPIs). Cc: stable@vger.kernel.org Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 14 11月, 2018 2 次提交
-
-
由 Mark Rutland 提交于
commit da5a3ce66b8bb51b0ea8a89f42aac153903f90fb upstream. At boot time, KVM stashes the host MDCR_EL2 value, but only does this when the kernel is not running in hyp mode (i.e. is non-VHE). In these cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can lead to CONSTRAINED UNPREDICTABLE behaviour. Since we use this value to derive the MDCR_EL2 value when switching to/from a guest, after a guest have been run, the performance counters do not behave as expected. This has been observed to result in accesses via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant counters, resulting in events not being counted. In these cases, only the fixed-purpose cycle counter appears to work as expected. Fix this by always stashing the host MDCR_EL2 value, regardless of VHE. Cc: Christopher Dall <christoffer.dall@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP") Tested-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Punit Agrawal 提交于
commit fd2ef358 upstream. PageTransCompoundMap() returns true for hugetlbfs and THP hugepages. This behaviour incorrectly leads to stage 2 faults for unsupported hugepage sizes (e.g., 64K hugepage with 4K pages) to be treated as THP faults. Tighten the check to filter out hugetlbfs pages. This also leads to consistently mapping all unsupported hugepage sizes as PTE level entries at stage 2. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 07 9月, 2018 2 次提交
-
-
由 Marc Zyngier 提交于
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to deal with. Drop the now obsolete code. Fixes: fb1522e0 ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
由 Marc Zyngier 提交于
When triggering a CoW, we unmap the RO page via an MMU notifier (invalidate_range_start), and then populate the new PTE using another one (change_pte). In the meantime, we'll have copied the old page into the new one. The problem is that the data for the new page is sitting in the cache, and should the guest have an uncached mapping to that page (or its MMU off), following accesses will bypass the cache. In a way, this is similar to what happens on a translation fault: We need to clean the page to the PoC before mapping it. So let's just do that. This fixes a KVM unit test regression observed on a HiSilicon platform, and subsequently reproduced on Seattle. Fixes: a9c0e12e ("KVM: arm/arm64: Only clean the dcache on translation fault") Cc: stable@vger.kernel.org # v4.16+ Reported-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
- 23 8月, 2018 1 次提交
-
-
由 Michal Hocko 提交于
There are several blockable mmu notifiers which might sleep in mmu_notifier_invalidate_range_start and that is a problem for the oom_reaper because it needs to guarantee a forward progress so it cannot depend on any sleepable locks. Currently we simply back off and mark an oom victim with blockable mmu notifiers as done after a short sleep. That can result in selecting a new oom victim prematurely because the previous one still hasn't torn its memory down yet. We can do much better though. Even if mmu notifiers use sleepable locks there is no reason to automatically assume those locks are held. Moreover majority of notifiers only care about a portion of the address space and there is absolutely zero reason to fail when we are unmapping an unrelated range. Many notifiers do really block and wait for HW which is harder to handle and we have to bail out though. This patch handles the low hanging fruit. __mmu_notifier_invalidate_range_start gets a blockable flag and callbacks are not allowed to sleep if the flag is set to false. This is achieved by using trylock instead of the sleepable lock for most callbacks and continue as long as we do not block down the call chain. I think we can improve that even further because there is a common pattern to do a range lookup first and then do something about that. The first part can be done without a sleeping lock in most cases AFAICS. The oom_reaper end then simply retries if there is at least one notifier which couldn't make any progress in !blockable mode. A retry loop is already implemented to wait for the mmap_sem and this is basically the same thing. The simplest way for driver developers to test this code path is to wrap userspace code which uses these notifiers into a memcg and set the hard limit to hit the oom. This can be done e.g. after the test faults in all the mmu notifier managed memory and set the hard limit to something really small. Then we are looking for a proper process tear down. [akpm@linux-foundation.org: coding style fixes] [akpm@linux-foundation.org: minor code simplification] Link: http://lkml.kernel.org/r/20180716115058.5559-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Acked-by: Christian König <christian.koenig@amd.com> # AMD notifiers Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx and umem_odp Reported-by: NDavid Rientjes <rientjes@google.com> Cc: "David (ChunMing) Zhou" <David1.Zhou@amd.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Dimitri Sivanich <sivanich@sgi.com> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Juergen Gross <jgross@suse.com> Cc: "Jérôme Glisse" <jglisse@redhat.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 8月, 2018 2 次提交
-
-
由 Punit Agrawal 提交于
When there is contention on faulting in a particular page table entry at stage 2, the break-before-make requirement of the architecture can lead to additional refaulting due to TLB invalidation. Avoid this by skipping a page table update if the new value of the PTE matches the previous value. Cc: stable@vger.kernel.org Fixes: d5d8184d ("KVM: ARM: Memory virtualization setup") Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Punit Agrawal 提交于
Contention on updating a PMD entry by a large number of vcpus can lead to duplicate work when handling stage 2 page faults. As the page table update follows the break-before-make requirement of the architecture, it can lead to repeated refaults due to clearing the entry and flushing the tlbs. This problem is more likely when - * there are large number of vcpus * the mapping is large block mapping such as when using PMD hugepages (512MB) with 64k pages. Fix this by skipping the page table update if there is no change in the entry being updated. Cc: stable@vger.kernel.org Fixes: ad361f09 ("KVM: ARM: Support hugetlbfs backed huge pages") Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 12 8月, 2018 3 次提交
-
-
由 Jia He 提交于
kvm_vgic_sync_hwstate is only called with IRQ being disabled. There is thus no need to call spin_lock_irqsave/restore in vgic_fold_lr_state and vgic_prune_ap_list. This patch replace them with the non irq-safe version. Signed-off-by: NJia He <jia.he@hxt-semitech.com> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> [maz: commit message tidy-up] Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Jia He 提交于
DEBUG_SPINLOCK_BUG_ON can be used with both vgic-v2 and vgic-v3, so let's move it to vgic.h Signed-off-by: NJia He <jia.he@hxt-semitech.com> [maz: commit message tidy-up] Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
Although vgic-v3 now supports Group0 interrupts, it still doesn't deal with Group0 SGIs. As usually with the GIC, nothing is simple: - ICC_SGI1R can signal SGIs of both groups, since GICD_CTLR.DS==1 with KVM (as per 8.1.10, Non-secure EL1 access) - ICC_SGI0R can only generate Group0 SGIs - ICC_ASGI1R sees its scope refocussed to generate only Group0 SGIs (as per the note at the bottom of Table 8-14) We only support Group1 SGIs so far, so no material change. Reviewed-by: NEric Auger <eric.auger@redhat.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 06 8月, 2018 3 次提交
-
-
由 Paolo Bonzini 提交于
We are currently cutting hva_to_pfn_fast short if we do not want an immediate exit, which is represented by !async && !atomic. However, this is unnecessary, and __get_user_pages_fast is *much* faster because the regular get_user_pages takes pmd_lock/pte_lock. In fact, when many CPUs take a nested vmexit at the same time the contention on those locks is visible, and this patch removes about 25% (compared to 4.18) from vmexit.flat on a 16 vCPU nested guest. Suggested-by: NAndrea Arcangeli <aarcange@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tianyu Lan 提交于
This patch is to provide a way for platforms to register hv tlb remote flush callback and this helps to optimize operation of tlb flush among vcpus for nested virtualization case. Signed-off-by: NLan Tianyu <Tianyu.Lan@microsoft.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Use the fast CR3 switch mechanism to locklessly change the MMU root page when switching between L1 and L2. The switch from L2 to L1 should always go through the fast path, while the switch from L1 to L2 should go through the fast path if L1's CR3/EPTP for L2 hasn't changed since the last time. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 31 7月, 2018 2 次提交
-
-
由 Christoffer Dall 提交于
When the VCPU is blocked (for example from WFI) we don't inject the physical timer interrupt if it should fire while the CPU is blocked, but instead we just wake up the VCPU and expect kvm_timer_vcpu_load to take care of injecting the interrupt. Unfortunately, kvm_timer_vcpu_load() doesn't actually do that, it only has support to schedule a soft timer if the emulated phys timer is expected to fire in the future. Follow the same pattern as kvm_timer_update_state() and update the irq state after potentially scheduling a soft timer. Reported-by: NAndre Przywara <andre.przywara@arm.com> Cc: Stable <stable@vger.kernel.org> # 4.15+ Fixes: bbdd52cf ("KVM: arm/arm64: Avoid phys timer emulation in vcpu entry/exit") Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
kvm_timer_update_state() is called when changing the phys timer configuration registers, either via vcpu reset, as a result of a trap from the guest, or when userspace programs the registers. phys_timer_emulate() is in turn called by kvm_timer_update_state() to either cancel an existing software timer, or program a new software timer, to emulate the behavior of a real phys timer, based on the change in configuration registers. Unfortunately, the interaction between these two functions left a small race; if the conceptual emulated phys timer should actually fire, but the soft timer hasn't executed its callback yet, we cancel the timer in phys_timer_emulate without injecting an irq. This only happens if the check in kvm_timer_update_state is called before the timer should fire, which is relatively unlikely, but possible. The solution is to update the state of the phys timer after calling phys_timer_emulate, which will pick up the pending timer state and update the interrupt value. Note that this leaves the opportunity of raising the interrupt twice, once in the just-programmed soft timer, and once in kvm_timer_update_state. Since this always happens synchronously with the VCPU execution, there is no harm in this, and the guest ever only sees a single timer interrupt. Cc: Stable <stable@vger.kernel.org> # 4.15+ Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 24 7月, 2018 1 次提交
-
-
由 Mark Rutland 提交于
It's possible for userspace to control n. Sanitize n when using it as an array index, to inhibit the potential spectre-v1 write gadget. Note that while it appears that n must be bound to the interval [0,3] due to the way it is extracted from addr, we cannot guarantee that compiler transformations (and/or future refactoring) will ensure this is the case, and given this is a slow path it's better to always perform the masking. Found by smatch. Signed-off-by: NMark Rutland <mark.rutland@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: kvmarm@lists.cs.columbia.edu Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 21 7月, 2018 14 次提交
-
-
由 Eric W. Biederman 提交于
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 James Morse 提交于
arm64's new use of KVMs get_events/set_events API calls isn't just or RAS, it allows an SError that has been made pending by KVM as part of its device emulation to be migrated. Wire this up for 32bit too. We only need to read/write the HCR_VA bit, and check that no esr has been provided, as we don't yet support VDFSR. Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NDongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 James Morse 提交于
The get/set events helpers to do some work to check reserved and padding fields are zero. This is useful on 32bit too. Move this code into virt/kvm/arm/arm.c, and give the arch code some underscores. This is temporarily hidden behind __KVM_HAVE_VCPU_EVENTS until 32bit is wired up. Signed-off-by: NJames Morse <james.morse@arm.com> Reviewed-by: NDongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Dongjiu Geng 提交于
For the migrating VMs, user space may need to know the exception state. For example, in the machine A, KVM make an SError pending, when migrate to B, KVM also needs to pend an SError. This new IOCTL exports user-invisible states related to SError. Together with appropriate user space changes, user space can get/set the SError exception state to do migrate/snapshot/suspend. Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com> Reviewed-by: NJames Morse <james.morse@arm.com> [expanded documentation wording] Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Simply letting IGROUPR be writable from userspace would break migration from old kernels to newer kernels, because old kernels incorrectly report interrupt groups as group 1. This would not be a big problem if userspace wrote GICD_IIDR as read from the kernel, because we could detect the incompatibility and return an error to userspace. Unfortunately, this is not the case with current userspace implementations and simply letting IGROUPR be writable from userspace for an emulated GICv2 silently breaks migration and causes the destination VM to no longer run after migration. We now encourage userspace to write the read and expected value of GICD_IIDR as the first part of a GIC register restore, and if we observe a write to GICD_IIDR we know that userspace has been updated and has had a chance to cope with older kernels (VGICv2 IIDR.Revision == 0) incorrectly reporting interrupts as group 1, and therefore we now allow groups to be user writable. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Implement the required MMIO accessors for GICv2 and GICv3 for the IGROUPR distributor and redistributor registers. This can allow guests to change behavior compared to running on previous versions of KVM, but only to align with the architecture and hardware implementations. This also allows userspace to configure the interrupts groups for GICv3. We don't allow userspace to write the groups on GICv2 just yet, because that would result in GICv2 guests not receiving interrupts after migrating from an older kernel that exposes GICv2 interrupts as group 1. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
If userspace attempts to write a GICD_IIDR that does not match the kernel version, return an error to userspace. The intention is to allow implementation changes inside KVM while avoiding silently breaking migration resulting in guests not running without any clear indication of what went wrong. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Currently we do not allow any vgic mmio write operations to fail, which makes sense from mmio traps from the guest. However, we should be able to report failures to userspace, if userspace writes incompatible values to read-only registers. Rework the internal interface to allow errors to be returned on the write side for userspace writes. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Now when we have a group configuration on the struct IRQ, use this state when populating the LR and signaling interrupts as either group 0 or group 1 to the VM. Depending on the model of the emulated GIC, and the guest's configuration of the VMCR, interrupts may be signaled as IRQs or FIQs to the VM. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
In preparation for proper group 0 and group 1 support in the vgic, we add a field in the struct irq to store the group of all interrupts. We initialize the group to group 0 when emulating GICv2 and to group 1 when emulating GICv3, just like we treat them today. LPIs are always group 1. We also continue to ignore writes from the guest, preserving existing functionality, for now. Finally, we also add this field to the vgic debug logic to show the group for all interrupts. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
We currently don't support grouping in the emulated VGIC, which is a known defect on KVM (not hurting any currently used guests as far as we're aware). This is currently handled by treating all interrupts as group 0 interrupts for an emulated GICv2 and always signaling interrupts as group 0 to the virtual CPU interface. However, when reading which group interrupts belongs to in the guest from the emulated VGIC, the VGIC currently reports group 1 instead of group 0, which is misleading. Fix this temporarily before introducing full group support by changing the hander to _raz instead of _rao. Fixes: fb848db3 "KVM: arm/arm64: vgic-new: Add GICv2 MMIO handling framework" Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
As we are about to tweak implementation aspects of the VGIC emulation, while still preserving some level of backwards compatibility support, add a field to keep track of the implementation revision field which is reported to the VM and to userspace. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Christoffer Dall 提交于
Instead of hardcoding the shifts and masks in the GICD_IIDR register emulation, let's add the definition of these fields to the GIC header files and use them. This will make things more obvious when we're going to bump the revision in the IIDR when we'll make guest-visible changes to the implementation. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Marc Zyngier 提交于
The vgic debugfs file only knows about SGI/PPI/SPI interrupts, and completely ignores LPIs. Let's fix that. Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-