- 18 3月, 2020 10 次提交
-
-
由 Marc Zyngier 提交于
commit 8e01d9a396e6db153d94a6004e6473d9ff251a6a upstream. When the VHE code was reworked, a lot of the vgic stuff was moved around, but the GICv4 residency code did stay untouched, meaning that we come in and out of residency on each flush/sync, which is obviously suboptimal. To address this, let's move things around a bit: - Residency entry (flush) moves to vcpu_load - Residency exit (sync) moves to vcpu_put - On blocking (entry to WFI), we "put" - On unblocking (exit from WFI), we "load" Because these can nest (load/block/put/load/unblock/put, for example), we now have per-VPE tracking of the residency state. Additionally, vgic_v4_put gains a "need doorbell" parameter, which only gets set to true when blocking because of a WFI. This allows a finer control of the doorbell, which now also gets disabled as soon as it gets signaled. Signed-off-by: NMarc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20191027144234.8395-2-maz@kernel.orgSigned-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Punit Agrawal 提交于
commit b8e0ba7c8bea994011aff3b4c35256b180fab874 upstream. KVM only supports PMD hugepages at stage 2. Now that the various page handling routines are updated, extend the stage 2 fault handling to map in PUD hugepages. Addition of PUD hugepage support enables additional page sizes (e.g., 1G with 4K granule) which can be useful on cores that support mapping larger block sizes in the TLB entries. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replace BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: NSuzuki Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Punit Agrawal 提交于
commit 35a63966194dd994f44150f07398c62f8dca011e upstream. In preparation for creating larger hugepages at Stage 2, add support to the age handling notifiers for PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing code. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) for arm32 PUD helpers ] Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Punit Agrawal 提交于
commit eb3f0624ea082def887acc79e97934e27d0188b7 upstream. In preparation for creating larger hugepages at Stage 2, extend the access fault handling at Stage 2 to support PUD hugepages when encountered. Provide trivial helpers for arm32 to allow sharing of code. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in PUD helpers ] Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Punit Agrawal 提交于
commit 86d1c55ea605025f78d026e7fc3a2bb4c3fc2d6a upstream. In preparation for creating PUD hugepages at stage 2, add support for detecting execute permissions on PUD page table entries. Faults due to lack of execute permissions on page table entries is used to perform i-cache invalidation on first execute. Provide trivial implementations of arm32 helpers to allow sharing of code. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON(1) in arm32 PUD helpers ] Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Punit Agrawal 提交于
commit 4ea5af53114091e23a8fc279f25637e6c4e892c6 upstream. In preparation for creating PUD hugepages at stage 2, add support for write protecting PUD hugepages when they are encountered. Write protecting guest tables is used to track dirty pages when migrating VMs. Also, provide trivial implementations of required kvm_s2pud_* helpers to allow sharing of code with arm32. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> [ Replaced BUG() => WARN_ON() in arm32 pud helpers ] Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Punit Agrawal 提交于
commit f8df73388ee25b5e5f1d26249202e7126ca8139d upstream. Introduce helpers to abstract architectural handling of the conversion of pfn to page table entries and marking a PMD page table entry as a block entry. The helpers are introduced in preparation for supporting PUD hugepages at stage 2 - which are supported on arm64 but do not exist on arm. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Punit Agrawal 提交于
commit 6396b852e46e562f4742ed0a9042b537eb26b8aa upstream. Stage 2 fault handler marks a page as executable if it is handling an execution fault or if it was a permission fault in which case the executable bit needs to be preserved. The logic to decide if the page should be marked executable is duplicated for PMD and PTE entries. To avoid creating another copy when support for PUD hugepages is introduced refactor the code to share the checks needed to mark a page table entry as executable. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Punit Agrawal 提交于
commit 3f58bf63455588a260b6abc8761c48bc8977b919 upstream. The code for operations such as marking the pfn as dirty, and dcache/icache maintenance during stage 2 fault handling is duplicated between normal pages and PMD hugepages. Instead of creating another copy of the operations when we introduce PUD hugepages, let's share them across the different pagesizes. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Acked-by: NZou Cao <zoucao@linux.alibaba.com>
-
由 Mark Rutland 提交于
commit bd7d95cafb499e24903b7d21f9eeb2c5208160c2 upstream. When we emulate a guest instruction, we don't advance the hardware singlestep state machine, and thus the guest will receive a software step exception after a next instruction which is not emulated by the host. We bodge around this in an ad-hoc fashion. Sometimes we explicitly check whether userspace requested a single step, and fake a debug exception from within the kernel. Other times, we advance the HW singlestep state rely on the HW to generate the exception for us. Thus, the observed step behaviour differs for host and guest. Let's make this simpler and consistent by always advancing the HW singlestep state machine when we skip an instruction. Thus we can rely on the hardware to generate the singlestep exception for us, and never need to explicitly check for an active-pending step, nor do we need to fake a debug exception from the guest. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: NAlex Bennée <alex.bennee@linaro.org> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NShannon Zhao <shannon.zhao@linux.alibaba.com> Reviewed-by: NZou Cao <zoucao@linux.alibaba.com>
-
- 27 12月, 2019 2 次提交
-
-
由 Dongjiu Geng 提交于
commit 58bf437ff64eac8aca606e42d7e4623e40b61fa1 upstream The commit 539aee0e ("KVM: arm64: Share the parts of get/set events useful to 32bit") shares the get/set events helper for arm64 and arm32, but forgot to share the cap extension code. User space will check whether KVM supports vcpu events by checking the KVM_CAP_VCPU_EVENTS extension Acked-by: NJames Morse <james.morse@arm.com> Reviewed-by : Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NZou Cao <zoucao@linux.alibaba.com> Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>
-
由 Dongjiu Geng 提交于
commit 375bdd3b5d4f7cf146f0df1488b4671b141dd799 upstream Rename kvm_arch_dev_ioctl_check_extension() to kvm_arch_vm_ioctl_check_extension(), because it does not have any relationship with device. Renaming this function can make code readable. Cc: James Morse <james.morse@arm.com> Reviewed-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NDongjiu Geng <gengdongjiu@huawei.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NZou Cao <zoucao@linux.alibaba.com> Reviewed-by: NBaoyou Xie <xie.baoyou@linux.alibaba.com>
-
- 13 12月, 2019 1 次提交
-
-
由 Zenghui Yu 提交于
commit ca185b260951d3b55108c0b95e188682d8a507b7 upstream. It's possible that two LPIs locate in the same "byte_offset" but target two different vcpus, where their pending status are indicated by two different pending tables. In such a scenario, using last_byte_offset optimization will lead KVM relying on the wrong pending table entry. Let us use last_ptr instead, which can be treated as a byte index into a pending table and also, can be vcpu specific. Fixes: 28077125 ("KVM: arm64: vgic-v3: KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES") Cc: stable@vger.kernel.org Signed-off-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Acked-by: NEric Auger <eric.auger@redhat.com> Link: https://lore.kernel.org/r/20191029071919.177-4-yuzenghui@huawei.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 24 11月, 2019 1 次提交
-
-
由 Suzuki K Poulose 提交于
[ Upstream commit d2db7773ba864df6b4e19643dfc54838550d8049 ] So far we have only supported 3 level page table with fixed IPA of 40bits, where PUD is folded. With 4 level page tables, we need to check if the PUD entry is valid or not. Fix stage2_flush_memslot() to do this check, before walking down the table. Acked-by: NChristoffer Dall <cdall@kernel.org> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Reviewed-by: NEric Auger <eric.auger@redhat.com> Signed-off-by: NSuzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 10 9月, 2019 2 次提交
-
-
由 Andre Przywara 提交于
[ Upstream commit 2e16f3e926ed48373c98edea85c6ad0ef69425d1 ] At the moment we initialise the target *mask* of a virtual IRQ to the VCPU it belongs to, even though this mask is only defined for GICv2 and quickly runs out of bits for many GICv3 guests. This behaviour triggers an UBSAN complaint for more than 32 VCPUs: ------ [ 5659.462377] UBSAN: Undefined behaviour in virt/kvm/arm/vgic/vgic-init.c:223:21 [ 5659.471689] shift exponent 32 is too large for 32-bit type 'unsigned int' ------ Also for GICv3 guests the reporting of TARGET in the "vgic-state" debugfs dump is wrong, due to this very same problem. Because there is no requirement to create the VGIC device before the VCPUs (and QEMU actually does it the other way round), we can't safely initialise mpidr or targets in kvm_vgic_vcpu_init(). But since we touch every private IRQ for each VCPU anyway later (in vgic_init()), we can just move the initialisation of those fields into there, where we definitely know the VGIC type. On the way make sure we really have either a VGICv2 or a VGICv3 device, since the existing code is just checking for "VGICv3 or not", silently ignoring the uninitialised case. Signed-off-by: NAndre Przywara <andre.przywara@arm.com> Reported-by: NDave Martin <dave.martin@arm.com> Tested-by: NJulien Grall <julien.grall@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Andrew Jones 提交于
[ Upstream commit 2113c5f62b7423e4a72b890bd479704aa85c81ba ] If after an MMIO exit to userspace a VCPU is immediately run with an immediate_exit request, such as when a signal is delivered or an MMIO emulation completion is needed, then the VCPU completes the MMIO emulation and immediately returns to userspace. As the exit_reason does not get changed from KVM_EXIT_MMIO in these cases we have to be careful not to complete the MMIO emulation again, when the VCPU is eventually run again, because the emulation does an instruction skip (and doing too many skips would be a waste of guest code :-) We need to use additional VCPU state to track if the emulation is complete. As luck would have it, we already have 'mmio_needed', which even appears to be used in this way by other architectures already. Fixes: 0d640732dbeb ("arm64: KVM: Skip MMIO insn after emulation") Acked-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 06 9月, 2019 2 次提交
-
-
由 Marc Zyngier 提交于
[ Upstream commit 82e40f558de566fdee214bec68096bbd5e64a6a4 ] A guest is not allowed to inject a SGI (or clear its pending state) by writing to GICD_ISPENDR0 (resp. GICD_ICPENDR0), as these bits are defined as WI (as per ARM IHI 0048B 4.3.7 and 4.3.8). Make sure we correctly emulate the architecture. Fixes: 96b29800 ("KVM: arm/arm64: vgic-new: Add PENDING registers handlers") Cc: stable@vger.kernel.org # 4.7+ Reported-by: NAndre Przywara <andre.przywara@arm.com> Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Heyi Guo 提交于
[ Upstream commit d4a8061a7c5f7c27a2dc002ee4cb89b3e6637e44 ] If the ap_list is longer than 256 entries, merge_final() in list_sort() will call the comparison callback with the same element twice, causing a deadlock in vgic_irq_cmp(). Fix it by returning early when irqa == irqb. Cc: stable@vger.kernel.org # 4.7+ Fixes: 8e444745 ("KVM: arm/arm64: vgic-new: Add IRQ sorting") Signed-off-by: NZenghui Yu <yuzenghui@huawei.com> Signed-off-by: NHeyi Guo <guoheyi@huawei.com> [maz: massaged commit log and patch, added Fixes and Cc-stable] Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 25 8月, 2019 1 次提交
-
-
由 Marc Zyngier 提交于
commit 5eeaf10eec394b28fad2c58f1f5c3a5da0e87d1c upstream. Since commit commit 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put"), we leave ICH_VMCR_EL2 (or its GICv2 equivalent) loaded as long as we can, only syncing it back when we're scheduled out. There is a small snag with that though: kvm_vgic_vcpu_pending_irq(), which is indirectly called from kvm_vcpu_check_block(), needs to evaluate the guest's view of ICC_PMR_EL1. At the point were we call kvm_vcpu_check_block(), the vcpu is still loaded, and whatever changes to PMR is not visible in memory until we do a vcpu_put(). Things go really south if the guest does the following: mov x0, #0 // or any small value masking interrupts msr ICC_PMR_EL1, x0 [vcpu preempted, then rescheduled, VMCR sampled] mov x0, #ff // allow all interrupts msr ICC_PMR_EL1, x0 wfi // traps to EL2, so samping of VMCR [interrupt arrives just after WFI] Here, the hypervisor's view of PMR is zero, while the guest has enabled its interrupts. kvm_vgic_vcpu_pending_irq() will then say that no interrupts are pending (despite an interrupt being received) and we'll block for no reason. If the guest doesn't have a periodic interrupt firing once it has blocked, it will stay there forever. To avoid this unfortuante situation, let's resync VMCR from kvm_arch_vcpu_blocking(), ensuring that a following kvm_vcpu_check_block() will observe the latest value of PMR. This has been found by booting an arm64 Linux guest with the pseudo NMI feature, and thus using interrupt priorities to mask interrupts instead of the usual PSTATE masking. Cc: stable@vger.kernel.org # 4.12 Fixes: 328e5664 ("KVM: arm/arm64: vgic: Defer touching GICH_VMCR to vcpu_load/put") Signed-off-by: NMarc Zyngier <maz@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 14 7月, 2019 1 次提交
-
-
由 Dave Martin 提交于
[ Upstream commit 4729ec8c1e1145234aeeebad5d96d77f4ccbb00a ] kvm_device->destroy() seems to be supposed to free its kvm_device struct, but vgic_its_destroy() is not currently doing this, resulting in a memory leak, resulting in kmemleak reports such as the following: unreferenced object 0xffff800aeddfe280 (size 128): comm "qemu-system-aar", pid 13799, jiffies 4299827317 (age 1569.844s) [...] backtrace: [<00000000a08b80e2>] kmem_cache_alloc+0x178/0x208 [<00000000dcad2bd3>] kvm_vm_ioctl+0x350/0xbc0 Fix it. Cc: Andre Przywara <andre.przywara@arm.com> Fixes: 1085fdc6 ("KVM: arm64: vgic-its: Introduce new KVM ITS device") Signed-off-by: NDave Martin <Dave.Martin@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 19 6月, 2019 1 次提交
-
-
由 James Morse 提交于
[ Upstream commit 623e1528d4090bd1abaf93ec46f047dee9a6fb32 ] KVM has helpers to handle the condition codes of trapped aarch32 instructions. These are marked __hyp_text and used from HYP, but they aren't built by the 'hyp' Makefile, which has all the runes to avoid ASAN and KCOV instrumentation. Move this code to a new hyp/aarch32.c to avoid a hyp-panic when starting an aarch32 guest on a host built with the ASAN/KCOV debug options. Fixes: 021234ef ("KVM: arm64: Make kvm_condition_valid32() accessible from EL2") Fixes: 8cebe750 ("arm64: KVM: Make kvm_skip_instr32 available to HYP") Signed-off-by: NJames Morse <james.morse@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 09 6月, 2019 1 次提交
-
-
由 Thomas Huth 提交于
commit a86cb413f4bf273a9d341a3ab2c2ca44e12eb317 upstream. KVM_CAP_MAX_VCPU_ID is currently always reporting KVM_MAX_VCPU_ID on all architectures. However, on s390x, the amount of usable CPUs is determined during runtime - it is depending on the features of the machine the code is running on. Since we are using the vcpu_id as an index into the SCA structures that are defined by the hardware (see e.g. the sca_add_vcpu() function), it is not only the amount of CPUs that is limited by the hard- ware, but also the range of IDs that we can use. Thus KVM_CAP_MAX_VCPU_ID must be determined during runtime on s390x, too. So the handling of KVM_CAP_MAX_VCPU_ID has to be moved from the common code into the architecture specific code, and on s390x we have to return the same value here as for KVM_CAP_MAX_VCPUS. This problem has been discovered with the kvm_create_max_vcpus selftest. With this change applied, the selftest now passes on s390x, too. Reviewed-by: NAndrew Jones <drjones@redhat.com> Reviewed-by: NCornelia Huck <cohuck@redhat.com> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NThomas Huth <thuth@redhat.com> Message-Id: <20190523164309.13345-9-thuth@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 26 5月, 2019 1 次提交
-
-
由 Andrew Jones 提交于
[ Upstream commit 811328fc3222f7b55846de0cd0404339e2e1e6d7 ] A failed KVM_ARM_VCPU_INIT should not set the vcpu target, as the vcpu target is used by kvm_vcpu_initialized() to determine if other vcpu ioctls may proceed. We need to set the target before calling kvm_reset_vcpu(), but if that call fails, we should then unset it and clear the feature bitmap while we're at it. Signed-off-by: NAndrew Jones <drjones@redhat.com> [maz: Simplified patch, completed commit message] Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 04 5月, 2019 2 次提交
-
-
由 Marc Zyngier 提交于
[ Upstream commit 7494cec6cb3ba7385a6a223b81906384f15aae34 ] Calling kvm_is_visible_gfn() implies that we're parsing the memslots, and doing this without the srcu lock is frown upon: [12704.164532] ============================= [12704.164544] WARNING: suspicious RCU usage [12704.164560] 5.1.0-rc1-00008-g600025238f51-dirty #16 Tainted: G W [12704.164573] ----------------------------- [12704.164589] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! [12704.164602] other info that might help us debug this: [12704.164616] rcu_scheduler_active = 2, debug_locks = 1 [12704.164631] 6 locks held by qemu-system-aar/13968: [12704.164644] #0: 000000007ebdae4f (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0 [12704.164691] #1: 000000007d751022 (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0 [12704.164726] #2: 00000000219d2706 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164761] #3: 00000000a760aecd (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164794] #4: 000000000ef8e31d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164827] #5: 000000007a872093 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [12704.164861] stack backtrace: [12704.164878] CPU: 2 PID: 13968 Comm: qemu-system-aar Tainted: G W 5.1.0-rc1-00008-g600025238f51-dirty #16 [12704.164887] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019 [12704.164896] Call trace: [12704.164910] dump_backtrace+0x0/0x138 [12704.164920] show_stack+0x24/0x30 [12704.164934] dump_stack+0xbc/0x104 [12704.164946] lockdep_rcu_suspicious+0xcc/0x110 [12704.164958] gfn_to_memslot+0x174/0x190 [12704.164969] kvm_is_visible_gfn+0x28/0x70 [12704.164980] vgic_its_check_id.isra.0+0xec/0x1e8 [12704.164991] vgic_its_save_tables_v0+0x1ac/0x330 [12704.165001] vgic_its_set_attr+0x298/0x3a0 [12704.165012] kvm_device_ioctl_attr+0x9c/0xd8 [12704.165022] kvm_device_ioctl+0x8c/0xf8 [12704.165035] do_vfs_ioctl+0xc8/0x960 [12704.165045] ksys_ioctl+0x8c/0xa0 [12704.165055] __arm64_sys_ioctl+0x28/0x38 [12704.165067] el0_svc_common+0xd8/0x138 [12704.165078] el0_svc_handler+0x38/0x78 [12704.165089] el0_svc+0x8/0xc Make sure the lock is taken when doing this. Fixes: bf308242 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock") Reviewed-by: NEric Auger <eric.auger@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
-
由 Marc Zyngier 提交于
[ Upstream commit a6ecfb11bf37743c1ac49b266595582b107b61d4 ] When halting a guest, QEMU flushes the virtual ITS caches, which amounts to writing to the various tables that the guest has allocated. When doing this, we fail to take the srcu lock, and the kernel shouts loudly if running a lockdep kernel: [ 69.680416] ============================= [ 69.680819] WARNING: suspicious RCU usage [ 69.681526] 5.1.0-rc1-00008-g600025238f51-dirty #18 Not tainted [ 69.682096] ----------------------------- [ 69.682501] ./include/linux/kvm_host.h:605 suspicious rcu_dereference_check() usage! [ 69.683225] [ 69.683225] other info that might help us debug this: [ 69.683225] [ 69.683975] [ 69.683975] rcu_scheduler_active = 2, debug_locks = 1 [ 69.684598] 6 locks held by qemu-system-aar/4097: [ 69.685059] #0: 0000000034196013 (&kvm->lock){+.+.}, at: vgic_its_set_attr+0x244/0x3a0 [ 69.686087] #1: 00000000f2ed935e (&its->its_lock){+.+.}, at: vgic_its_set_attr+0x250/0x3a0 [ 69.686919] #2: 000000005e71ea54 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.687698] #3: 00000000c17e548d (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.688475] #4: 00000000ba386017 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.689978] #5: 00000000c2c3c335 (&vcpu->mutex){+.+.}, at: lock_all_vcpus+0x64/0xd0 [ 69.690729] [ 69.690729] stack backtrace: [ 69.691151] CPU: 2 PID: 4097 Comm: qemu-system-aar Not tainted 5.1.0-rc1-00008-g600025238f51-dirty #18 [ 69.691984] Hardware name: rockchip evb_rk3399/evb_rk3399, BIOS 2019.04-rc3-00124-g2feec69fb1 03/15/2019 [ 69.692831] Call trace: [ 69.694072] lockdep_rcu_suspicious+0xcc/0x110 [ 69.694490] gfn_to_memslot+0x174/0x190 [ 69.694853] kvm_write_guest+0x50/0xb0 [ 69.695209] vgic_its_save_tables_v0+0x248/0x330 [ 69.695639] vgic_its_set_attr+0x298/0x3a0 [ 69.696024] kvm_device_ioctl_attr+0x9c/0xd8 [ 69.696424] kvm_device_ioctl+0x8c/0xf8 [ 69.696788] do_vfs_ioctl+0xc8/0x960 [ 69.697128] ksys_ioctl+0x8c/0xa0 [ 69.697445] __arm64_sys_ioctl+0x28/0x38 [ 69.697817] el0_svc_common+0xd8/0x138 [ 69.698173] el0_svc_handler+0x38/0x78 [ 69.698528] el0_svc+0x8/0xc The fix is to obviously take the srcu lock, just like we do on the read side of things since bf308242. One wonders why this wasn't fixed at the same time, but hey... Fixes: bf308242 ("KVM: arm/arm64: VGIC/ITS: protect kvm_read_guest() calls with SRCU lock") Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin (Microsoft) <sashal@kernel.org>
-
- 24 3月, 2019 4 次提交
-
-
由 Sean Christopherson 提交于
commit 152482580a1b0accb60676063a1ac57b2d12daf6 upstream. kvm_arch_memslots_updated() is at this point in time an x86-specific hook for handling MMIO generation wraparound. x86 stashes 19 bits of the memslots generation number in its MMIO sptes in order to avoid full page fault walks for repeat faults on emulated MMIO addresses. Because only 19 bits are used, wrapping the MMIO generation number is possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that the generation has changed so that it can invalidate all MMIO sptes in case the effective MMIO generation has wrapped so as to avoid using a stale spte, e.g. a (very) old spte that was created with generation==0. Given that the purpose of kvm_arch_memslots_updated() is to prevent consuming stale entries, it needs to be called before the new generation is propagated to memslots. Invalidating the MMIO sptes after updating memslots means that there is a window where a vCPU could dereference the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO spte that was created with (pre-wrap) generation==0. Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()") Cc: <stable@vger.kernel.org> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Christoffer Dall 提交于
[ Upstream commit ab2d5eb03dbb7b37a1c6356686fb48626ab0c93e ] We currently initialize the group of private IRQs during kvm_vgic_vcpu_init, and the value of the group depends on the GIC model we are emulating. However, CPUs created before creating (and initializing) the VGIC might end up with the wrong group if the VGIC is created as GICv3 later. Since we have no enforced ordering of creating the VGIC and creating VCPUs, we can end up with part the VCPUs being properly intialized and the remaining incorrectly initialized. That also means that we have no single place to do the per-cpu data structure initialization which depends on knowing the emulated GIC model (which is only the group field). This patch removes the incorrect comment from kvm_vgic_vcpu_init and initializes the group of all previously created VCPUs's private interrupts in vgic_init in addition to the existing initialization in kvm_vgic_vcpu_init. Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Marc Zyngier 提交于
[ Upstream commit 358b28f09f0ab074d781df72b8a671edb1547789 ] The current kvm_psci_vcpu_on implementation will directly try to manipulate the state of the VCPU to reset it. However, since this is not done on the thread that runs the VCPU, we can end up in a strangely corrupted state when the source and target VCPUs are running at the same time. Fix this by factoring out all reset logic from the PSCI implementation and forwarding the required information along with a request to the target VCPU. Reviewed-by: NAndrew Jones <drjones@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Julien Thierry 提交于
[ Upstream commit fc3bc475231e12e9c0142f60100cf84d077c79e1 ] vgic_dist->lpi_list_lock must always be taken with interrupts disabled as it is used in interrupt context. For configurations such as PREEMPT_RT_FULL, this means that it should be a raw_spinlock since RT spinlocks are interruptible. Signed-off-by: NJulien Thierry <julien.thierry@arm.com> Acked-by: NChristoffer Dall <christoffer.dall@arm.com> Acked-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 13 2月, 2019 1 次提交
-
-
由 Mark Rutland 提交于
[ Upstream commit 0d640732dbebed0f10f18526de21652931f0b2f2 ] When we emulate an MMIO instruction, we advance the CPU state within decode_hsr(), before emulating the instruction effects. Having this logic in decode_hsr() is opaque, and advancing the state before emulation is problematic. It gets in the way of applying consistent single-step logic, and it prevents us from being able to fail an MMIO instruction with a synchronous exception. Clean this up by only advancing the CPU state *after* the effects of the instruction are emulated. Cc: Peter Maydell <peter.maydell@linaro.org> Reviewed-by: NAlex Bennée <alex.bennee@linaro.org> Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 17 1月, 2019 1 次提交
-
-
由 Christoffer Dall 提交于
commit fb544d1ca65a89f7a3895f7531221ceeed74ada7 upstream. We recently addressed a VMID generation race by introducing a read/write lock around accesses and updates to the vmid generation values. However, kvm_arch_vcpu_ioctl_run() also calls need_new_vmid_gen() but does so without taking the read lock. As far as I can tell, this can lead to the same kind of race: VM 0, VCPU 0 VM 0, VCPU 1 ------------ ------------ update_vttbr (vmid 254) update_vttbr (vmid 1) // roll over read_lock(kvm_vmid_lock); force_vm_exit() local_irq_disable need_new_vmid_gen == false //because vmid gen matches enter_guest (vmid 254) kvm_arch.vttbr = <PGD>:<VMID 1> read_unlock(kvm_vmid_lock); enter_guest (vmid 1) Which results in running two VCPUs in the same VM with different VMIDs and (even worse) other VCPUs from other VMs could now allocate clashing VMID 254 from the new generation as long as VCPU 0 is not exiting. Attempt to solve this by making sure vttbr is updated before another CPU can observe the updated VMID generation. Cc: stable@vger.kernel.org Fixes: f0cf47d9 "KVM: arm/arm64: Close VMID generation race" Reviewed-by: NJulien Thierry <julien.thierry@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 10 1月, 2019 5 次提交
-
-
由 Gustavo A. R. Silva 提交于
commit c23b2e6fc4ca346018618266bcabd335c0a8a49e upstream. When using the nospec API, it should be taken into account that: "...if the CPU speculates past the bounds check then * array_index_nospec() will clamp the index within the range of [0, * size)." The above is part of the header for macro array_index_nospec() in linux/nospec.h Now, in this particular case, if intid evaluates to exactly VGIC_MAX_SPI or to exaclty VGIC_MAX_PRIVATE, the array_index_nospec() macro ends up returning VGIC_MAX_SPI - 1 or VGIC_MAX_PRIVATE - 1 respectively, instead of VGIC_MAX_SPI or VGIC_MAX_PRIVATE, which, based on the original logic: /* SGIs and PPIs */ if (intid <= VGIC_MAX_PRIVATE) return &vcpu->arch.vgic_cpu.private_irqs[intid]; /* SPIs */ if (intid <= VGIC_MAX_SPI) return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS]; are valid values for intid. Fix this by calling array_index_nospec() macro with VGIC_MAX_PRIVATE + 1 and VGIC_MAX_SPI + 1 as arguments for its parameter size. Fixes: 41b87599 ("KVM: arm/arm64: vgic: fix possible spectre-v1 in vgic_get_irq()") Cc: stable@vger.kernel.org Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> [dropped the SPI part which was fixed separately] Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Christoffer Dall 提交于
commit 60c3ab30d8c2ff3a52606df03f05af2aae07dc6b upstream. When restoring the active state from userspace, we don't know which CPU was the source for the active state, and this is not architecturally exposed in any of the register state. Set the active_source to 0 in this case. In the future, we can expand on this and exposse the information as additional information to userspace for GICv2 if anyone cares. Cc: stable@vger.kernel.org Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Marc Zyngier 提交于
commit bea2ef803ade3359026d5d357348842bca9edcf1 upstream. SPIs should be checked against the VMs specific configuration, and not the architectural maximum. Cc: stable@vger.kernel.org Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Julien Thierry 提交于
commit 2e2f6c3c0b08eed3fcf7de3c7684c940451bdeb1 upstream. To change the active state of an MMIO, halt is requested for all vcpus of the affected guest before modifying the IRQ state. This is done by calling cond_resched_lock() in vgic_mmio_change_active(). However interrupts are disabled at this point and we cannot reschedule a vcpu. We actually don't need any of this, as kvm_arm_halt_guest ensures that all the other vcpus are out of the guest. Let's just drop that useless code. Signed-off-by: NJulien Thierry <julien.thierry@arm.com> Suggested-by: NChristoffer Dall <christoffer.dall@arm.com> Cc: stable@vger.kernel.org Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Marc Zyngier 提交于
commit 107352a24900fb458152b92a4e72fbdc83fd5510 upstream. We currently only halt the guest when a vCPU messes with the active state of an SPI. This is perfectly fine for GICv2, but isn't enough for GICv3, where all vCPUs can access the state of any other vCPU. Let's broaden the condition to include any GICv3 interrupt that has an active state (i.e. all but LPIs). Cc: stable@vger.kernel.org Reviewed-by: NChristoffer Dall <christoffer.dall@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 14 11月, 2018 2 次提交
-
-
由 Mark Rutland 提交于
commit da5a3ce66b8bb51b0ea8a89f42aac153903f90fb upstream. At boot time, KVM stashes the host MDCR_EL2 value, but only does this when the kernel is not running in hyp mode (i.e. is non-VHE). In these cases, the stashed value of MDCR_EL2.HPMN happens to be zero, which can lead to CONSTRAINED UNPREDICTABLE behaviour. Since we use this value to derive the MDCR_EL2 value when switching to/from a guest, after a guest have been run, the performance counters do not behave as expected. This has been observed to result in accesses via PMXEVTYPER_EL0 and PMXEVCNTR_EL0 not affecting the relevant counters, resulting in events not being counted. In these cases, only the fixed-purpose cycle counter appears to work as expected. Fix this by always stashing the host MDCR_EL2 value, regardless of VHE. Cc: Christopher Dall <christoffer.dall@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: stable@vger.kernel.org Fixes: 1e947bad ("arm64: KVM: Skip HYP setup when already running in HYP") Tested-by: NRobin Murphy <robin.murphy@arm.com> Signed-off-by: NMark Rutland <mark.rutland@arm.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Punit Agrawal 提交于
commit fd2ef358 upstream. PageTransCompoundMap() returns true for hugetlbfs and THP hugepages. This behaviour incorrectly leads to stage 2 faults for unsupported hugepage sizes (e.g., 64K hugepage with 4K pages) to be treated as THP faults. Tighten the check to filter out hugetlbfs pages. This also leads to consistently mapping all unsupported hugepage sizes as PTE level entries at stage 2. Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com> Reviewed-by: NSuzuki Poulose <suzuki.poulose@arm.com> Cc: Christoffer Dall <christoffer.dall@arm.com> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: stable@vger.kernel.org # v4.13+ Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 07 9月, 2018 2 次提交
-
-
由 Marc Zyngier 提交于
kvm_unmap_hva is long gone, and we only have kvm_unmap_hva_range to deal with. Drop the now obsolete code. Fixes: fb1522e0 ("KVM: update to new mmu_notifier semantic v2") Cc: James Hogan <jhogan@kernel.org> Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-
由 Marc Zyngier 提交于
When triggering a CoW, we unmap the RO page via an MMU notifier (invalidate_range_start), and then populate the new PTE using another one (change_pte). In the meantime, we'll have copied the old page into the new one. The problem is that the data for the new page is sitting in the cache, and should the guest have an uncached mapping to that page (or its MMU off), following accesses will bypass the cache. In a way, this is similar to what happens on a translation fault: We need to clean the page to the PoC before mapping it. So let's just do that. This fixes a KVM unit test regression observed on a HiSilicon platform, and subsequently reproduced on Seattle. Fixes: a9c0e12e ("KVM: arm/arm64: Only clean the dcache on translation fault") Cc: stable@vger.kernel.org # v4.16+ Reported-by: NMike Galbraith <efault@gmx.de> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
-