- 09 6月, 2020 1 次提交
-
-
由 Miaohe Lin 提交于
to #27338374 commit d80b64ff297e40c2b6f7d7abc1b3eba70d22a068 upstream. When kmalloc memory for sd->sev_vmcbs failed, we forget to free the page held by sd->save_area. Also get rid of the var r as '-ENOMEM' is actually the only possible outcome here. Reviewed-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Reference: CVE-2020-12768 Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
- 17 1月, 2020 6 次提交
-
-
由 Singh, Brijesh 提交于
commit 05d5a48635259e621ea26d01e8316c6feeb34190 upstream. Errata#1096: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then log the message and request a guest shutdown. Reported-by: NVenkatesh Srinivas <venkateshs@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWANG Siyuan <Siyuan.Wang@amd.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Suthikulpanit, Suravee 提交于
commit e44e3eacccfd2294a1ce279f68452b1635d7fa82 upstream. Only clear the valid bit when invalidate logical APIC id entry. The current logic clear the valid bit, but also set the rest of the bits (including reserved bits) to 1. Fixes: 98d90582be2e ('svm: Fix AVIC DFR and LDR handling') Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWANG Siyuan <Siyuan.Wang@amd.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Suthikulpanit, Suravee 提交于
commit c57cd3c89ecf2812976f53e494580a396f93efd2 upstream. The function svm_refresh_apicv_exec_ctrl() always returning prematurely as kvm_vcpu_apicv_active() always return false when calling from the function arch/x86/kvm/x86.c:kvm_vcpu_deactivate_apicv(). This is because the apicv_active is set to false just before calling refresh_apicv_exec_ctrl(). Also, we need to mark VMCB_AVIC bit as dirty instead of VMCB_INTR. So, fix svm_refresh_apicv_exec_ctrl() to properly deactivate AVIC. Fixes: 67034bb9 ('KVM: SVM: Add irqchip_split() checks before enabling AVIC') Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWANG Siyuan <Siyuan.Wang@amd.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Suthikulpanit, Suravee 提交于
commit 98d90582be2e08246a70af31e09950ecb8876252 upstream. Current SVM AVIC driver makes two incorrect assumptions: 1. APIC LDR register cannot be zero 2. APIC DFR for all vCPUs must be the same LDR=0 means the local APIC does not support logical destination mode. Therefore, the driver should mark any previously assigned logical APIC ID table entry as invalid, and return success. Also, DFR is specific to a particular local APIC, and can be different among all vCPUs (as observed on Windows 10). These incorrect assumptions cause Windows 10 and FreeBSD VMs to fail to boot with AVIC enabled. So, instead of flush the whole logical APIC ID table, handle DFR and LDR for each vCPU independently. Fixes: 18f40c53 ('svm: Add VMEXIT handlers for AVIC') Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: NJulian Stecklina <jsteckli@amazon.de> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWANG Siyuan <Siyuan.Wang@amd.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Suravee Suthikulpanit 提交于
commit 37ef0c4414c9743ba7f1af4392f0a27a99649f2a upstream. Print warning message when IPI target ID is invalid due to one of the following reasons: * In logical mode: cluster > max_cluster (64) * In physical mode: target > max_physical (512) * Address is not present in the physical or logical ID tables Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NWANG Siyuan <Siyuan.Wang@amd.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Tambe, William 提交于
commit e081354d6aa7b67c6d0ef51ff8c428b6c261a6fe upstream. Currently, the nested guest's PAUSE intercept intentions are not being honored. Instead, since the L0 hypervisor's pause_filter_count and pause_filter_thresh values are still in place, these values are used instead of those programmed in the VMCB by the L1 hypervisor. To honor the desired PAUSE intercept support of the L1 hypervisor, the L0 hypervisor must use the PAUSE filtering fields of the L1 hypervisor. This requires saving and restoring of both the L0 and L1 hypervisor's PAUSE filtering fields. Signed-off-by: NWilliam Tambe <william.tambe@amd.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NWANG Siyuan <Siyuan.Wang@amd.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
- 13 11月, 2019 1 次提交
-
-
由 Paolo Bonzini 提交于
commit 9167ab79936206118cc60e47dcb926c3489f3bd5 upstream. VMX already does so if the host has SMEP, in order to support the combination of CR0.WP=1 and CR4.SMEP=1. However, it is perfectly safe to always do so, and in fact VMX also ends up running with EFER.NXE=1 on old processors that lack the "load EFER" controls, because it may help avoiding a slow MSR write. SVM does not have similar code, but it should since recent AMD processors do support SMEP. So this patch makes the code for the two vendors simpler and more similar, while fixing an issue with CR0.WP=1 and CR4.SMEP=1 on AMD. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Joerg Roedel <jroedel@suse.de> Cc: stable@vger.kernel.org Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 16 9月, 2019 1 次提交
-
-
由 WANG Chao 提交于
[ Upstream commit 1811d979c71621aafc7b879477202d286f7e863b ] guest xcr0 could leak into host when MCE happens in guest mode. Because do_machine_check() could schedule out at a few places. For example: kvm_load_guest_xcr0 ... kvm_x86_ops->run(vcpu) { vmx_vcpu_run vmx_complete_atomic_exit kvm_machine_check do_machine_check do_memory_failure memory_failure lock_page In this case, host_xcr0 is 0x2ff, guest vcpu xcr0 is 0xff. After schedule out, host cpu has guest xcr0 loaded (0xff). In __switch_to { switch_fpu_finish copy_kernel_to_fpregs XRSTORS If any bit i in XSTATE_BV[i] == 1 and xcr0[i] == 0, XRSTORS will generate #GP (In this case, bit 9). Then ex_handler_fprestore kicks in and tries to reinitialize fpu by restoring init fpu state. Same story as last #GP, except we get DOUBLE FAULT this time. Cc: stable@vger.kernel.org Signed-off-by: NWANG Chao <chao.wang@ucloud.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 16 8月, 2019 1 次提交
-
-
由 Wanpeng Li 提交于
commit 17e433b54393a6269acbcb792da97791fe1592d8 upstream. After commit d73eb57b80b (KVM: Boost vCPUs that are delivering interrupts), a five years old bug is exposed. Running ebizzy benchmark in three 80 vCPUs VMs on one 80 pCPUs Skylake server, a lot of rcu_sched stall warning splatting in the VMs after stress testing: INFO: rcu_sched detected stalls on CPUs/tasks: { 4 41 57 62 77} (detected by 15, t=60004 jiffies, g=899, c=898, q=15073) Call Trace: flush_tlb_mm_range+0x68/0x140 tlb_flush_mmu.part.75+0x37/0xe0 tlb_finish_mmu+0x55/0x60 zap_page_range+0x142/0x190 SyS_madvise+0x3cd/0x9c0 system_call_fastpath+0x1c/0x21 swait_active() sustains to be true before finish_swait() is called in kvm_vcpu_block(), voluntarily preempted vCPUs are taken into account by kvm_vcpu_on_spin() loop greatly increases the probability condition kvm_arch_vcpu_runnable(vcpu) is checked and can be true, when APICv is enabled the yield-candidate vCPU's VMCS RVI field leaks(by vmx_sync_pir_to_irr()) into spinning-on-a-taken-lock vCPU's current VMCS. This patch fixes it by checking conservatively a subset of events. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Marc Zyngier <Marc.Zyngier@arm.com> Cc: stable@vger.kernel.org Fixes: 98f4a146 (KVM: add kvm_arch_vcpu_runnable() test to kvm_vcpu_on_spin() loop) Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 31 5月, 2019 1 次提交
-
-
由 Suthikulpanit, Suravee 提交于
commit c9bcd3e3335d0a29d89fabd2c385e1b989e6f1b0 upstream. Current logic does not allow VCPU to be loaded onto CPU with APIC ID 255. This should be allowed since the host physical APIC ID field in the AVIC Physical APIC table entry is an 8-bit value, and APIC ID 255 is valid in system with x2APIC enabled. Instead, do not allow VCPU load if the host APIC ID cannot be represented by an 8-bit value. Also, use the more appropriate AVIC_PHYSICAL_ID_ENTRY_HOST_PHYSICAL_ID_MASK instead of AVIC_MAX_PHYSICAL_ID_COUNT. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 08 5月, 2019 1 次提交
-
-
由 David Rientjes 提交于
[ Upstream commit b86bc2858b389255cd44555ce4b1e427b2b770c0 ] This ensures that the address and length provided to DBG_DECRYPT and DBG_ENCRYPT do not cause an overflow. At the same time, pass the actual number of pages pinned in memory to sev_unpin_memory() as a cleanup. Reported-by: NCfir Cohen <cfir@google.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 27 4月, 2019 2 次提交
-
-
由 Suthikulpanit, Suravee 提交于
commit 4a58038b9e420276157785afa0a0bbb4b9bc2265 upstream. This reverts commit bb218fbcfaaa3b115d4cd7a43c0ca164f3a96e57. As Oren Twaig pointed out the old discussion: https://patchwork.kernel.org/patch/8292231/ that the change coud potentially cause an extra IPI to be sent to the destination vcpu because the AVIC hardware already set the IRR bit before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running). Since writting to ICR and ICR2 will also set the IRR. If something triggers the destination vcpu to get scheduled before the emulation finishes, then this could result in an additional IPI. Also, the issue mentioned in the commit bb218fbcfaaa was misdiagnosed. Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: NOren Twaig <oren@scalemp.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Vitaly Kuznetsov 提交于
commit 99c221796a810055974b54c02e8f53297e48d146 upstream. I noticed that apic test from kvm-unit-tests always hangs on my EPYC 7401P, the hanging test nmi-after-sti is trying to deliver 30000 NMIs and tracing shows that we're sometimes able to deliver a few but never all. When we're trying to inject an NMI we may fail to do so immediately for various reasons, however, we still need to inject it so enable_nmi_window() arms nmi_singlestep mode. #DB occurs as expected, but we're not checking for pending NMIs before entering the guest and unless there's a different event to process, the NMI will never get delivered. Make KVM_REQ_EVENT request on the vCPU from db_interception() to make sure pending NMIs are checked and possibly injected. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 17 4月, 2019 1 次提交
-
-
由 David Rientjes 提交于
commit ede885ecb2cdf8a8dd5367702e3d964ec846a2d5 upstream. get_num_contig_pages() could potentially overflow int so make its type consistent with its usage. Reported-by: NCfir Cohen <cfir@google.com> Cc: stable@vger.kernel.org Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 3月, 2019 2 次提交
-
-
由 Vitaly Kuznetsov 提交于
[ Upstream commit 619ad846fc3452adaf71ca246c5aa711e2055398 ] kvm-unit-tests' eventinj "NMI failing on IDT" test results in NMI being delivered to the host (L1) when it's running nested. The problem seems to be: svm_complete_interrupts() raises 'nmi_injected' flag but later we decide to reflect EXIT_NPF to L1. The flag remains pending and we do NMI injection upon entry so it got delivered to L1 instead of L2. It seems that VMX code solves the same issue in prepare_vmcs12(), this was introduced with code refactoring in commit 5f3d5799 ("KVM: nVMX: Rework event injection and recovery"). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
由 Suravee Suthikulpanit 提交于
[ Upstream commit bb218fbcfaaa3b115d4cd7a43c0ca164f3a96e57 ] In case of incomplete IPI with invalid interrupt type, the current SVM driver does not properly emulate the IPI, and fails to boot FreeBSD guests with multiple vcpus when enabling AVIC. Fix this by update APIC ICR high/low registers, which also emulate sending the IPI. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 20 2月, 2019 1 次提交
-
-
由 David Rientjes 提交于
[ Upstream commit 3f14a89d1132dcae3c8ce6721c6ef51f6e6d9b5f ] By code inspection, it was found that multiple calls to KVM_SEV_INIT could deplete asid bits and overwrite kvm_sev_info's regions_list. Multiple calls to KVM_SVM_INIT is not likely to occur with QEMU, but this should likely be fixed anyway. This code is serialized by kvm->lock. Fixes: 1654efcb ("KVM: SVM: Add KVM_SEV_INIT command") Reported-by: NCfir Cohen <cfir@google.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 13 2月, 2019 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
[ Upstream commit e87555e550cef4941579cd879759a7c0dee24e68 ] AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and svm code in kvm knows nothing about it, however, this MSR is among emulated_msrs and thus returned with KVM_GET_MSR_INDEX_LIST. The consequent KVM_GET_MSRS, of course, fails. Report the MSR as unsupported to not confuse userspace. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org>
-
- 08 12月, 2018 1 次提交
-
-
由 Wei Wang 提交于
commit 30510387a5e45bfcf8190e03ec7aa15b295828e2 upstream. There is a race condition when accessing kvm->arch.apic_access_page_done. Due to it, x86_set_memory_region will fail when creating the second vcpu for a svm guest. Add a mutex_lock to serialize the accesses to apic_access_page_done. This lock is also used by vmx for the same purpose. Signed-off-by: NWei Wang <wawei@amazon.de> Signed-off-by: NAmadeusz Juskowiak <ajusk@amazon.de> Signed-off-by: NJulian Stecklina <jsteckli@amazon.de> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: NJoerg Roedel <jroedel@suse.de> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 06 12月, 2018 2 次提交
-
-
由 Leonid Shatz 提交于
commit 326e7425 upstream. Since commit e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest"), vcpu->arch.tsc_offset meaning was changed to always reflect the tsc_offset value set on active VMCS. Regardless if vCPU is currently running L1 or L2. However, above mentioned commit failed to also change kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly. This is because vmx_write_tsc_offset() could set the tsc_offset value in active VMCS to given offset parameter *plus vmcs12->tsc_offset*. However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset to given offset parameter. Without taking into account the possible addition of vmcs12->tsc_offset. (Same is true for SVM case). Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return actually set tsc_offset in active VMCS and modify kvm_vcpu_write_tsc_offset() to set returned value in vcpu->arch.tsc_offset. In addition, rename write_tsc_offset() callback to write_l1_tsc_offset() to make it clear that it is meant to set L1 TSC offset. Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest") Reviewed-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com> Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NLeonid Shatz <leonid.shatz@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Jim Mattson 提交于
commit fd65d3142f734bc4376053c8d75670041903134d upstream. Previously, we only called indirect_branch_prediction_barrier on the logical CPU that freed a vmcb. This function should be called on all logical CPUs that last loaded the vmcb in question. Fixes: 15d45071 ("KVM/x86: Add IBPB support") Reported-by: NNeel Natu <neelnatu@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- 10 10月, 2018 1 次提交
-
-
由 Paolo Bonzini 提交于
SEV requires access to the AMD cryptographic device APIs, and this does not work when KVM is builtin and the crypto driver is a module. Actually the Kconfig conditions for CONFIG_KVM_AMD_SEV try to disable SEV in that case, but it does not work because the actual crypto calls are not culled, only sev_hardware_setup() is. This patch adds two CONFIG_KVM_AMD_SEV checks that gate all the remaining SEV code; it fixes this particular configuration, and drops 5 KiB of code when CONFIG_KVM_AMD_SEV=n. Reported-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 9月, 2018 2 次提交
-
-
由 Sean Christopherson 提交于
A VMX preemption timer value of '0' is guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. Use the preemption timer (if it's supported) to trigger immediate VMExit in place of the current method of sending a self-IPI. This ensures that pending VMExit injection to L1 occurs prior to executing any instructions in the guest (regardless of nesting level). When deferring VMExit injection, KVM generates an immediate VMExit from the (possibly nested) guest by sending itself an IPI. Because hardware interrupts are blocked prior to VMEnter and are unblocked (in hardware) after VMEnter, this results in taking a VMExit(INTR) before any guest instruction is executed. But, as this approach relies on the IPI being received before VMEnter executes, it only works as intended when KVM is running as L0. Because there are no architectural guarantees regarding when IPIs are delivered, when running nested the INTR may "arrive" long after L2 is running e.g. L0 KVM doesn't force an immediate switch to L1 to deliver an INTR. For the most part, this unintended delay is not an issue since the events being injected to L1 also do not have architectural guarantees regarding their timing. The notable exception is the VMX preemption timer[1], which is architecturally guaranteed to cause a VMExit prior to executing any instructions in the guest if the timer value is '0' at VMEnter. Specifically, the delay in injecting the VMExit causes the preemption timer KVM unit test to fail when run in a nested guest. Note: this approach is viable even on CPUs with a broken preemption timer, as broken in this context only means the timer counts at the wrong rate. There are no known errata affecting timer value of '0'. [1] I/O SMIs also have guarantees on when they arrive, but I have no idea if/how those are emulated in KVM. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> [Use a hook for SVM instead of leaving the default in x86.c - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andy Shevchenko 提交于
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 30 8月, 2018 3 次提交
-
-
由 Sean Christopherson 提交于
Lack of the kvm_ prefix gives the impression that it's a VMX or SVM specific function, and there's no conflict that prevents adding the kvm_ prefix. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Re-execution after an emulation decode failure is only intended to handle a case where two or vCPUs race to write a shadowed page, i.e. we should never re-execute an instruction as part of RSM emulation. Add a new helper, kvm_emulate_instruction_from_buffer(), to support emulating from a pre-defined buffer. This eliminates the last direct call to x86_emulate_instruction() outside of kvm_mmu_page_fault(), which means x86_emulate_instruction() can be unexported in a future patch. Fixes: 7607b717 ("KVM: SVM: install RSM intercept") Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Colin Ian King 提交于
Variable dst_vaddr_end is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable 'dst_vaddr_end' set but not used [-Wunused-but-set-variable] Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 22 8月, 2018 1 次提交
-
-
由 Thomas Gleixner 提交于
Mikhail reported the following lockdep splat: WARNING: possible irq lock inversion dependency detected CPU 0/KVM/10284 just changed the state of lock: 000000000d538a88 (&st->lock){+...}, at: speculative_store_bypass_update+0x10b/0x170 but this lock was taken by another, HARDIRQ-safe lock in the past: (&(&sighand->siglock)->rlock){-.-.} and interrupts could create inverse lock ordering between them. Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&st->lock); local_irq_disable(); lock(&(&sighand->siglock)->rlock); lock(&st->lock); <Interrupt> lock(&(&sighand->siglock)->rlock); *** DEADLOCK *** The code path which connects those locks is: speculative_store_bypass_update() ssb_prctl_set() do_seccomp() do_syscall_64() In svm_vcpu_run() speculative_store_bypass_update() is called with interupts enabled via x86_virt_spec_ctrl_set_guest/host(). This is actually a false positive, because GIF=0 so interrupts are disabled even if IF=1; however, we can easily move the invocations of x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to cure it, and it's a good idea to keep the GIF=0/IF=1 area as small and self-contained as possible. Fixes: 1f50ddb4 ("x86/speculation: Handle HT correctly on AMD") Reported-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: kvm@vger.kernel.org Cc: x86@kernel.org Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 8月, 2018 2 次提交
-
-
由 Junaid Shahid 提交于
This needs a minor bug fix. The updated patch is as follows. Thanks, Junaid ------------------------------------------------------------------------------ kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB entries for the specific guest virtual address, instead of flushing all TLB entries associated with the VM. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Remove the implicit flush from the set_cr3 handlers, so that the callers are able to decide whether to flush the TLB or not. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 6月, 2018 1 次提交
-
-
由 Kees Cook 提交于
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 06 6月, 2018 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
The AMD document outlining the SSBD handling 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf mentions that if CPUID 8000_0008.EBX[24] is set we should be using the SPEC_CTRL MSR (0x48) over the VIRT SPEC_CTRL MSR (0xC001_011f) for speculative store bypass disable. This in effect means we should clear the X86_FEATURE_VIRT_SSBD flag so that we would prefer the SPEC_CTRL MSR. See the document titled: 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199889Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: andrew.cooper3@citrix.com Cc: Joerg Roedel <joro@8bytes.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20180601145921.9500-3-konrad.wilk@oracle.com
-
- 02 6月, 2018 1 次提交
-
-
由 Marc Orr 提交于
The kvm struct has been bloating. For example, it's tens of kilo-bytes for x86, which turns out to be a large amount of memory to allocate contiguously via kzalloc. Thus, this patch does the following: 1. Uses architecture-specific routines to allocate the kvm struct via vzalloc for x86. 2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc when has_vhe() is true. Other architectures continue to default to kalloc, as they have a dependency on kalloc or have a small-enough struct kvm. Signed-off-by: NMarc Orr <marcorr@google.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 25 5月, 2018 1 次提交
-
-
由 Dan Carpenter 提交于
This is a fix from reviewing the code, but it looks like it might be able to lead to an Oops. It affects 32bit systems. The KVM_MEMORY_ENCRYPT_REG_REGION ioctl uses a u64 for range->addr and range->size but the high 32 bits would be truncated away on a 32 bit system. This is harmless but it's also harmless to prevent it. Then in sev_pin_memory() the "uaddr + ulen" calculation can wrap around. The wrap around can happen on 32 bit or 64 bit systems, but I was only able to figure out a problem for 32 bit systems. We would pick a number which results in "npages" being zero. The sev_pin_memory() would then return ZERO_SIZE_PTR without allocating anything. I made it illegal to call sev_pin_memory() with "ulen" set to zero. Hopefully, that doesn't cause any problems. I also changed the type of "first" and "last" to long, just for cosmetic reasons. Otherwise on a 64 bit system you're saving "uaddr >> 12" in an int and it truncates the high 20 bits away. The math works in the current code so far as I can see but it's just weird. Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com> [Brijesh noted that the code is only reachable on X86_64.] Reviewed-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 17 5月, 2018 4 次提交
-
-
由 Tom Lendacky 提交于
Expose the new virtualized architectural mechanism, VIRT_SSBD, for using speculative store bypass disable (SSBD) under SVM. This will allow guests to use SSBD on hardware that uses non-architectural mechanisms for enabling SSBD. [ tglx: Folded the migration fixup from Paolo Bonzini ] Signed-off-by: NTom Lendacky <thomas.lendacky@amd.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
AMD is proposing a VIRT_SPEC_CTRL MSR to handle the Speculative Store Bypass Disable via MSR_AMD64_LS_CFG so that guests do not have to care about the bit position of the SSBD bit and thus facilitate migration. Also, the sibling coordination on Family 17H CPUs can only be done on the host. Extend x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() with an extra argument for the VIRT_SPEC_CTRL MSR. Hand in 0 from VMX and in SVM add a new virt_spec_ctrl member to the CPU data structure which is going to be used in later patches for the actual implementation. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
由 Borislav Petkov 提交于
Intel and AMD have different CPUID bits hence for those use synthetic bits which get set on the respective vendor's in init_speculation_control(). So that debacles like what the commit message of c65732e4 ("x86/cpu: Restore CPUID_8000_0008_EBX reload") talks about don't happen anymore. Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Tested-by: NJörg Otte <jrg.otte@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Link: https://lkml.kernel.org/r/20180504161815.GG9257@pd.tnic
-
由 Thomas Gleixner 提交于
svm_vcpu_run() invokes x86_spec_ctrl_restore_host() after VMEXIT, but before the host GS is restored. x86_spec_ctrl_restore_host() uses 'current' to determine the host SSBD state of the thread. 'current' is GS based, but host GS is not yet restored and the access causes a triple fault. Move the call after the host GS restore. Fixes: 885f82bf x86/process: Allow runtime control of Speculative Store Bypass Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 5月, 2018 1 次提交
-
-
由 Jim Mattson 提交于
Previously, we toggled between SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE and SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, depending on whether or not the EXTD bit was set in MSR_IA32_APICBASE. However, if the local APIC is disabled, we should not set either of these APIC virtualization control bits. Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-