- 16 4月, 2019 1 次提交
-
-
由 Suthikulpanit, Suravee 提交于
This reverts commit bb218fbc. As Oren Twaig pointed out the old discussion: https://patchwork.kernel.org/patch/8292231/ that the change coud potentially cause an extra IPI to be sent to the destination vcpu because the AVIC hardware already set the IRR bit before the incomplete IPI #VMEXIT with id=1 (target vcpu is not running). Since writting to ICR and ICR2 will also set the IRR. If something triggers the destination vcpu to get scheduled before the emulation finishes, then this could result in an additional IPI. Also, the issue mentioned in the commit bb218fbc was misdiagnosed. Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: NOren Twaig <oren@scalemp.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 4月, 2019 2 次提交
-
-
由 David Rientjes 提交于
This ensures that the address and length provided to DBG_DECRYPT and DBG_ENCRYPT do not cause an overflow. At the same time, pass the actual number of pages pinned in memory to sev_unpin_memory() as a cleanup. Reported-by: NCfir Cohen <cfir@google.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Rientjes 提交于
get_num_contig_pages() could potentially overflow int so make its type consistent with its usage. Reported-by: NCfir Cohen <cfir@google.com> Cc: stable@vger.kernel.org Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 29 3月, 2019 1 次提交
-
-
由 Singh, Brijesh 提交于
Errata#1096: On a nested data page fault when CR.SMAP=1 and the guest data read generates a SMAP violation, GuestInstrBytes field of the VMCB on a VMEXIT will incorrectly return 0h instead the correct guest instruction bytes . Recommend Workaround: To determine what instruction the guest was executing the hypervisor will have to decode the instruction at the instruction pointer. The recommended workaround can not be implemented for the SEV guest because guest memory is encrypted with the guest specific key, and instruction decoder will not be able to decode the instruction bytes. If we hit this errata in the SEV guest then log the message and request a guest shutdown. Reported-by: NVenkatesh Srinivas <venkateshs@google.com> Cc: Jim Mattson <jmattson@google.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Joerg Roedel <joro@8bytes.org> Cc: "Radim Krčmář" <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NBrijesh Singh <brijesh.singh@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 21 2月, 2019 3 次提交
-
-
由 Ben Gardon 提交于
There are many KVM kernel memory allocations which are tied to the life of the VM process and should be charged to the VM process's cgroup. If the allocations aren't tied to the process, the OOM killer will not know that killing the process will free the associated kernel memory. Add __GFP_ACCOUNT flags to many of the allocations which are not yet being charged to the VM process's cgroup. Tested: Ran all kvm-unit-tests on a 64 bit Haswell machine, the patch introduced no new failures. Ran a kernel memory accounting test which creates a VM to touch memory and then checks that the kernel memory allocated for the process is within certain bounds. With this patch we account for much more of the vmalloc and slab memory allocated for the VM. Signed-off-by: NBen Gardon <bgardon@google.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suthikulpanit, Suravee 提交于
The function svm_refresh_apicv_exec_ctrl() always returning prematurely as kvm_vcpu_apicv_active() always return false when calling from the function arch/x86/kvm/x86.c:kvm_vcpu_deactivate_apicv(). This is because the apicv_active is set to false just before calling refresh_apicv_exec_ctrl(). Also, we need to mark VMCB_AVIC bit as dirty instead of VMCB_INTR. So, fix svm_refresh_apicv_exec_ctrl() to properly deactivate AVIC. Fixes: 67034bb9 ('KVM: SVM: Add irqchip_split() checks before enabling AVIC') Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suthikulpanit, Suravee 提交于
Current SVM AVIC driver makes two incorrect assumptions: 1. APIC LDR register cannot be zero 2. APIC DFR for all vCPUs must be the same LDR=0 means the local APIC does not support logical destination mode. Therefore, the driver should mark any previously assigned logical APIC ID table entry as invalid, and return success. Also, DFR is specific to a particular local APIC, and can be different among all vCPUs (as observed on Windows 10). These incorrect assumptions cause Windows 10 and FreeBSD VMs to fail to boot with AVIC enabled. So, instead of flush the whole logical APIC ID table, handle DFR and LDR for each vCPU independently. Fixes: 18f40c53 ('svm: Add VMEXIT handlers for AVIC') Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: NJulian Stecklina <jsteckli@amazon.de> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 26 1月, 2019 4 次提交
-
-
由 Gustavo A. R. Silva 提交于
In preparation to enabling -Wimplicit-fallthrough, mark switch cases where we are expecting to fall through. This patch fixes the following warnings: arch/x86/kvm/lapic.c:1037:27: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/lapic.c:1876:3: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/hyperv.c:1637:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/svm.c:4396:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/mmu.c:4372:36: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/x86.c:3835:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/x86.c:7938:23: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/vmx/vmx.c:2015:6: warning: this statement may fall through [-Wimplicit-fallthrough=] arch/x86/kvm/vmx/vmx.c:1773:6: warning: this statement may fall through [-Wimplicit-fallthrough=] Warning level 3 was used: -Wimplicit-fallthrough=3 This patch is part of the ongoing efforts to enabling -Wimplicit-fallthrough. Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
kvm-unit-tests' eventinj "NMI failing on IDT" test results in NMI being delivered to the host (L1) when it's running nested. The problem seems to be: svm_complete_interrupts() raises 'nmi_injected' flag but later we decide to reflect EXIT_NPF to L1. The flag remains pending and we do NMI injection upon entry so it got delivered to L1 instead of L2. It seems that VMX code solves the same issue in prepare_vmcs12(), this was introduced with code refactoring in commit 5f3d5799 ("KVM: nVMX: Rework event injection and recovery"). Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
In case of incomplete IPI with invalid interrupt type, the current SVM driver does not properly emulate the IPI, and fails to boot FreeBSD guests with multiple vcpus when enabling AVIC. Fix this by update APIC ICR high/low registers, which also emulate sending the IPI. Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Suravee Suthikulpanit 提交于
Print warning message when IPI target ID is invalid due to one of the following reasons: * In logical mode: cluster > max_cluster (64) * In physical mode: target > max_physical (512) * Address is not present in the physical or logical ID tables Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 12 1月, 2019 1 次提交
-
-
由 David Rientjes 提交于
By code inspection, it was found that multiple calls to KVM_SEV_INIT could deplete asid bits and overwrite kvm_sev_info's regions_list. Multiple calls to KVM_SVM_INIT is not likely to occur with QEMU, but this should likely be fixed anyway. This code is serialized by kvm->lock. Fixes: 1654efcb ("KVM: SVM: Add KVM_SEV_INIT command") Reported-by: NCfir Cohen <cfir@google.com> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 21 12月, 2018 4 次提交
-
-
由 Uros Bizjak 提交于
Recently the minimum required version of binutils was changed to 2.20, which supports all SVM instruction mnemonics. The patch removes all .byte #defines and uses real instruction mnemonics instead. Signed-off-by: NUros Bizjak <ubizjak@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Chao Peng 提交于
Expose Intel Processor Trace to guest only when the PT works in Host-Guest mode. Signed-off-by: NChao Peng <chao.p.peng@linux.intel.com> Signed-off-by: NLuwei Kang <luwei.kang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Tambe, William 提交于
Currently, the nested guest's PAUSE intercept intentions are not being honored. Instead, since the L0 hypervisor's pause_filter_count and pause_filter_thresh values are still in place, these values are used instead of those programmed in the VMCB by the L1 hypervisor. To honor the desired PAUSE intercept support of the L1 hypervisor, the L0 hypervisor must use the PAUSE filtering fields of the L1 hypervisor. This requires saving and restoring of both the L0 and L1 hypervisor's PAUSE filtering fields. Signed-off-by: NWilliam Tambe <william.tambe@amd.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Vitaly Kuznetsov 提交于
AMD doesn't seem to implement MSR_IA32_MCG_EXT_CTL and svm code in kvm knows nothing about it, however, this MSR is among emulated_msrs and thus returned with KVM_GET_MSR_INDEX_LIST. The consequent KVM_GET_MSRS, of course, fails. Report the MSR as unsupported to not confuse userspace. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 20 12月, 2018 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
Recent optimizations in MMU code broke nested SVM with NPT in L1 completely: when we do nested_svm_{,un}init_mmu_context() we want to switch from TDP MMU to shadow MMU, both init_kvm_tdp_mmu() and kvm_init_shadow_mmu() check if re-configuration is needed by looking at cache source data. The data, however, doesn't change - it's only the type of the MMU which changes. We end up not re-initializing guest MMU as shadow and everything goes off the rails. The issue could have been fixed by putting MMU type into extended MMU role but this is not really needed. We can just split root and guest MMUs the exact same way we did for nVMX, their types never change in the lifetime of a vCPU. There is still room for improvement: currently, we reset all MMU roots when switching from L1 to L2 and back and this is not needed. Fixes: 7dcd5755 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed") Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 15 12月, 2018 3 次提交
-
-
由 Marc Orr 提交于
Previously, the guest_fpu field was embedded in the kvm_vcpu_arch struct. Unfortunately, the field is quite large, (e.g., 4352 bytes on my current setup). This bloats the kvm_vcpu_arch struct for x86 into an order 3 memory allocation, which can become a problem on overcommitted machines. Thus, this patch moves the fpu state outside of the kvm_vcpu_arch struct. With this patch applied, the kvm_vcpu_arch struct is reduced to 15168 bytes for vmx on my setup when building the kernel with kvmconfig. Suggested-by: NDave Hansen <dave.hansen@intel.com> Signed-off-by: NMarc Orr <marcorr@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
The upcoming KVM_GET_SUPPORTED_HV_CPUID ioctl will need to return Enlightened VMCS version in HYPERV_CPUID_NESTED_FEATURES.EAX when it was enabled. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Peng Hao 提交于
structure svm_init_data is never used. So remove it. Signed-off-by: NPeng Hao <peng.hao2@zte.com.cn> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 27 11月, 2018 4 次提交
-
-
由 Paolo Bonzini 提交于
For some reason, kvm_x86_ops->write_l1_tsc_offset() skipped trace of change to active TSC offset in case vCPU is in guest-mode. This patch changes write_l1_tsc_offset() behavior to trace any change to active TSC offset to aid debugging. The VMX code is changed to look more similar to SVM, which is in my opinion nicer. Based on a patch by Liran Alon. Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
Previously, we only called indirect_branch_prediction_barrier on the logical CPU that freed a vmcb. This function should be called on all logical CPUs that last loaded the vmcb in question. Fixes: 15d45071 ("KVM/x86: Add IBPB support") Reported-by: NNeel Natu <neelnatu@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Leonid Shatz 提交于
Since commit e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest"), vcpu->arch.tsc_offset meaning was changed to always reflect the tsc_offset value set on active VMCS. Regardless if vCPU is currently running L1 or L2. However, above mentioned commit failed to also change kvm_vcpu_write_tsc_offset() to set vcpu->arch.tsc_offset correctly. This is because vmx_write_tsc_offset() could set the tsc_offset value in active VMCS to given offset parameter *plus vmcs12->tsc_offset*. However, kvm_vcpu_write_tsc_offset() just sets vcpu->arch.tsc_offset to given offset parameter. Without taking into account the possible addition of vmcs12->tsc_offset. (Same is true for SVM case). Fix this issue by changing kvm_x86_ops->write_tsc_offset() to return actually set tsc_offset in active VMCS and modify kvm_vcpu_write_tsc_offset() to set returned value in vcpu->arch.tsc_offset. In addition, rename write_tsc_offset() callback to write_l1_tsc_offset() to make it clear that it is meant to set L1 TSC offset. Fixes: e79f245d ("X86/KVM: Properly update 'tsc_offset' to represent the running guest") Reviewed-by: NLiran Alon <liran.alon@oracle.com> Reviewed-by: NMihai Carabas <mihai.carabas@oracle.com> Reviewed-by: NKrish Sadhukhan <krish.sadhukhan@oracle.com> Signed-off-by: NLeonid Shatz <leonid.shatz@oracle.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wei Wang 提交于
There is a race condition when accessing kvm->arch.apic_access_page_done. Due to it, x86_set_memory_region will fail when creating the second vcpu for a svm guest. Add a mutex_lock to serialize the accesses to apic_access_page_done. This lock is also used by vmx for the same purpose. Signed-off-by: NWei Wang <wawei@amazon.de> Signed-off-by: NAmadeusz Juskowiak <ajusk@amazon.de> Signed-off-by: NJulian Stecklina <jsteckli@amazon.de> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Reviewed-by: NJoerg Roedel <jroedel@suse.de> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 18 10月, 2018 2 次提交
-
-
由 Uros Bizjak 提交于
x86_64 zero-extends 32bit xor operation to a full 64bit register. Also add a comment and remove unnecessary instruction suffix in vmx.c Signed-off-by: NUros Bizjak <ubizjak@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Jim Mattson 提交于
When exception payloads are enabled by userspace (which is not yet possible) and a #PF is raised in L2, defer the setting of CR2 until the #PF is delivered. This allows the L1 hypervisor to intercept the fault before CR2 is modified. For backwards compatibility, when exception payloads are not enabled by userspace, kvm_multiple_exception modifies CR2 when the #PF exception is raised. Reported-by: NJim Mattson <jmattson@google.com> Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJim Mattson <jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 17 10月, 2018 2 次提交
-
-
由 Vitaly Kuznetsov 提交于
Enlightened VMCS is opt-in. The current version does not contain all fields supported by nested VMX so we must not advertise the corresponding VMX features if enlightened VMCS is enabled. Userspace is given the enlightened VMCS version supported by KVM as part of enabling KVM_CAP_HYPERV_ENLIGHTENED_VMCS. The version is to be advertised to the nested hypervisor, currently done via a cpuid leaf for Hyper-V. Suggested-by: NLadi Prosek <lprosek@redhat.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NLiran Alon <liran.alon@oracle.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Vitaly Kuznetsov 提交于
As a preparation to full MMU split between L1 and L2 make vcpu->arch.mmu a pointer to the currently used mmu. For now, this is always vcpu->arch.root_mmu. No functional change. Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NSean Christopherson <sean.j.christopherson@intel.com>
-
- 10 10月, 2018 1 次提交
-
-
由 Paolo Bonzini 提交于
SEV requires access to the AMD cryptographic device APIs, and this does not work when KVM is builtin and the crypto driver is a module. Actually the Kconfig conditions for CONFIG_KVM_AMD_SEV try to disable SEV in that case, but it does not work because the actual crypto calls are not culled, only sev_hardware_setup() is. This patch adds two CONFIG_KVM_AMD_SEV checks that gate all the remaining SEV code; it fixes this particular configuration, and drops 5 KiB of code when CONFIG_KVM_AMD_SEV=n. Reported-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 20 9月, 2018 2 次提交
-
-
由 Sean Christopherson 提交于
A VMX preemption timer value of '0' is guaranteed to cause a VMExit prior to the CPU executing any instructions in the guest. Use the preemption timer (if it's supported) to trigger immediate VMExit in place of the current method of sending a self-IPI. This ensures that pending VMExit injection to L1 occurs prior to executing any instructions in the guest (regardless of nesting level). When deferring VMExit injection, KVM generates an immediate VMExit from the (possibly nested) guest by sending itself an IPI. Because hardware interrupts are blocked prior to VMEnter and are unblocked (in hardware) after VMEnter, this results in taking a VMExit(INTR) before any guest instruction is executed. But, as this approach relies on the IPI being received before VMEnter executes, it only works as intended when KVM is running as L0. Because there are no architectural guarantees regarding when IPIs are delivered, when running nested the INTR may "arrive" long after L2 is running e.g. L0 KVM doesn't force an immediate switch to L1 to deliver an INTR. For the most part, this unintended delay is not an issue since the events being injected to L1 also do not have architectural guarantees regarding their timing. The notable exception is the VMX preemption timer[1], which is architecturally guaranteed to cause a VMExit prior to executing any instructions in the guest if the timer value is '0' at VMEnter. Specifically, the delay in injecting the VMExit causes the preemption timer KVM unit test to fail when run in a nested guest. Note: this approach is viable even on CPUs with a broken preemption timer, as broken in this context only means the timer counts at the wrong rate. There are no known errata affecting timer value of '0'. [1] I/O SMIs also have guarantees on when they arrive, but I have no idea if/how those are emulated in KVM. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> [Use a hook for SVM instead of leaving the default in x86.c - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Andy Shevchenko 提交于
Switch to bitmap_zalloc() to show clearly what we are allocating. Besides that it returns pointer of bitmap type instead of opaque void *. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 30 8月, 2018 3 次提交
-
-
由 Sean Christopherson 提交于
Lack of the kvm_ prefix gives the impression that it's a VMX or SVM specific function, and there's no conflict that prevents adding the kvm_ prefix. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Sean Christopherson 提交于
Re-execution after an emulation decode failure is only intended to handle a case where two or vCPUs race to write a shadowed page, i.e. we should never re-execute an instruction as part of RSM emulation. Add a new helper, kvm_emulate_instruction_from_buffer(), to support emulating from a pre-defined buffer. This eliminates the last direct call to x86_emulate_instruction() outside of kvm_mmu_page_fault(), which means x86_emulate_instruction() can be unexported in a future patch. Fixes: 7607b717 ("KVM: SVM: install RSM intercept") Cc: Brijesh Singh <brijesh.singh@amd.com> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Cc: stable@vger.kernel.org Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Colin Ian King 提交于
Variable dst_vaddr_end is being assigned but is never used hence it is redundant and can be removed. Cleans up clang warning: variable 'dst_vaddr_end' set but not used [-Wunused-but-set-variable] Signed-off-by: NColin Ian King <colin.king@canonical.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
- 22 8月, 2018 1 次提交
-
-
由 Thomas Gleixner 提交于
Mikhail reported the following lockdep splat: WARNING: possible irq lock inversion dependency detected CPU 0/KVM/10284 just changed the state of lock: 000000000d538a88 (&st->lock){+...}, at: speculative_store_bypass_update+0x10b/0x170 but this lock was taken by another, HARDIRQ-safe lock in the past: (&(&sighand->siglock)->rlock){-.-.} and interrupts could create inverse lock ordering between them. Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&st->lock); local_irq_disable(); lock(&(&sighand->siglock)->rlock); lock(&st->lock); <Interrupt> lock(&(&sighand->siglock)->rlock); *** DEADLOCK *** The code path which connects those locks is: speculative_store_bypass_update() ssb_prctl_set() do_seccomp() do_syscall_64() In svm_vcpu_run() speculative_store_bypass_update() is called with interupts enabled via x86_virt_spec_ctrl_set_guest/host(). This is actually a false positive, because GIF=0 so interrupts are disabled even if IF=1; however, we can easily move the invocations of x86_virt_spec_ctrl_set_guest/host() into the interrupt disabled region to cure it, and it's a good idea to keep the GIF=0/IF=1 area as small and self-contained as possible. Fixes: 1f50ddb4 ("x86/speculation: Handle HT correctly on AMD") Reported-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NMikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: kvm@vger.kernel.org Cc: x86@kernel.org Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 06 8月, 2018 2 次提交
-
-
由 Junaid Shahid 提交于
This needs a minor bug fix. The updated patch is as follows. Thanks, Junaid ------------------------------------------------------------------------------ kvm_mmu_invlpg() and kvm_mmu_invpcid_gva() only need to flush the TLB entries for the specific guest virtual address, instead of flushing all TLB entries associated with the VM. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Junaid Shahid 提交于
Remove the implicit flush from the set_cr3 handlers, so that the callers are able to decide whether to flush the TLB or not. Signed-off-by: NJunaid Shahid <junaids@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 13 6月, 2018 1 次提交
-
-
由 Kees Cook 提交于
The kmalloc() function has a 2-factor argument form, kmalloc_array(). This patch replaces cases of: kmalloc(a * b, gfp) with: kmalloc_array(a * b, gfp) as well as handling cases of: kmalloc(a * b * c, gfp) with: kmalloc(array3_size(a, b, c), gfp) as it's slightly less ugly than: kmalloc_array(array_size(a, b), c, gfp) This does, however, attempt to ignore constant size factors like: kmalloc(4 * 1024, gfp) though any constants defined via macros get caught up in the conversion. Any factors with a sizeof() of "unsigned char", "char", and "u8" were dropped, since they're redundant. The tools/ directory was manually excluded, since it has its own implementation of kmalloc(). The Coccinelle script used for this was: // Fix redundant parens around sizeof(). @@ type TYPE; expression THING, E; @@ ( kmalloc( - (sizeof(TYPE)) * E + sizeof(TYPE) * E , ...) | kmalloc( - (sizeof(THING)) * E + sizeof(THING) * E , ...) ) // Drop single-byte sizes and redundant parens. @@ expression COUNT; typedef u8; typedef __u8; @@ ( kmalloc( - sizeof(u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(__u8) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(unsigned char) * (COUNT) + COUNT , ...) | kmalloc( - sizeof(u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(__u8) * COUNT + COUNT , ...) | kmalloc( - sizeof(char) * COUNT + COUNT , ...) | kmalloc( - sizeof(unsigned char) * COUNT + COUNT , ...) ) // 2-factor product with sizeof(type/expression) and identifier or constant. @@ type TYPE; expression THING; identifier COUNT_ID; constant COUNT_CONST; @@ ( - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_ID) + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_ID + COUNT_ID, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (COUNT_CONST) + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * COUNT_CONST + COUNT_CONST, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_ID) + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_ID + COUNT_ID, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (COUNT_CONST) + COUNT_CONST, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * COUNT_CONST + COUNT_CONST, sizeof(THING) , ...) ) // 2-factor product, only identifiers. @@ identifier SIZE, COUNT; @@ - kmalloc + kmalloc_array ( - SIZE * COUNT + COUNT, SIZE , ...) // 3-factor product with 1 sizeof(type) or sizeof(expression), with // redundant parens removed. @@ expression THING; identifier STRIDE, COUNT; type TYPE; @@ ( kmalloc( - sizeof(TYPE) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(TYPE) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(TYPE)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * (COUNT) * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * (STRIDE) + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) | kmalloc( - sizeof(THING) * COUNT * STRIDE + array3_size(COUNT, STRIDE, sizeof(THING)) , ...) ) // 3-factor product with 2 sizeof(variable), with redundant parens removed. @@ expression THING1, THING2; identifier COUNT; type TYPE1, TYPE2; @@ ( kmalloc( - sizeof(TYPE1) * sizeof(TYPE2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(THING1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(THING1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * COUNT + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) | kmalloc( - sizeof(TYPE1) * sizeof(THING2) * (COUNT) + array3_size(COUNT, sizeof(TYPE1), sizeof(THING2)) , ...) ) // 3-factor product, only identifiers, with redundant parens removed. @@ identifier STRIDE, SIZE, COUNT; @@ ( kmalloc( - (COUNT) * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * STRIDE * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - (COUNT) * (STRIDE) * (SIZE) + array3_size(COUNT, STRIDE, SIZE) , ...) | kmalloc( - COUNT * STRIDE * SIZE + array3_size(COUNT, STRIDE, SIZE) , ...) ) // Any remaining multi-factor products, first at least 3-factor products, // when they're not all constants... @@ expression E1, E2, E3; constant C1, C2, C3; @@ ( kmalloc(C1 * C2 * C3, ...) | kmalloc( - (E1) * E2 * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * E3 + array3_size(E1, E2, E3) , ...) | kmalloc( - (E1) * (E2) * (E3) + array3_size(E1, E2, E3) , ...) | kmalloc( - E1 * E2 * E3 + array3_size(E1, E2, E3) , ...) ) // And then all remaining 2 factors products when they're not all constants, // keeping sizeof() as the second factor argument. @@ expression THING, E1, E2; type TYPE; constant C1, C2, C3; @@ ( kmalloc(sizeof(THING) * C2, ...) | kmalloc(sizeof(TYPE) * C2, ...) | kmalloc(C1 * C2 * C3, ...) | kmalloc(C1 * C2, ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * (E2) + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(TYPE) * E2 + E2, sizeof(TYPE) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * (E2) + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - sizeof(THING) * E2 + E2, sizeof(THING) , ...) | - kmalloc + kmalloc_array ( - (E1) * E2 + E1, E2 , ...) | - kmalloc + kmalloc_array ( - (E1) * (E2) + E1, E2 , ...) | - kmalloc + kmalloc_array ( - E1 * E2 + E1, E2 , ...) ) Signed-off-by: NKees Cook <keescook@chromium.org>
-
- 06 6月, 2018 1 次提交
-
-
由 Konrad Rzeszutek Wilk 提交于
The AMD document outlining the SSBD handling 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf mentions that if CPUID 8000_0008.EBX[24] is set we should be using the SPEC_CTRL MSR (0x48) over the VIRT SPEC_CTRL MSR (0xC001_011f) for speculative store bypass disable. This in effect means we should clear the X86_FEATURE_VIRT_SSBD flag so that we would prefer the SPEC_CTRL MSR. See the document titled: 124441_AMD64_SpeculativeStoreBypassDisable_Whitepaper_final.pdf A copy of this document is available at https://bugzilla.kernel.org/show_bug.cgi?id=199889Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Tom Lendacky <thomas.lendacky@amd.com> Cc: Janakarajan Natarajan <Janakarajan.Natarajan@amd.com> Cc: kvm@vger.kernel.org Cc: KarimAllah Ahmed <karahmed@amazon.de> Cc: andrew.cooper3@citrix.com Cc: Joerg Roedel <joro@8bytes.org> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20180601145921.9500-3-konrad.wilk@oracle.com
-
- 02 6月, 2018 1 次提交
-
-
由 Marc Orr 提交于
The kvm struct has been bloating. For example, it's tens of kilo-bytes for x86, which turns out to be a large amount of memory to allocate contiguously via kzalloc. Thus, this patch does the following: 1. Uses architecture-specific routines to allocate the kvm struct via vzalloc for x86. 2. Switches arm to __KVM_HAVE_ARCH_VM_ALLOC so that it can use vzalloc when has_vhe() is true. Other architectures continue to default to kalloc, as they have a dependency on kalloc or have a small-enough struct kvm. Signed-off-by: NMarc Orr <marcorr@google.com> Reviewed-by: NMarc Zyngier <marc.zyngier@arm.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-