- 06 1月, 2023 1 次提交
-
-
由 Jim Mattson 提交于
mainline inclusion from mainline-v6.2-rc1 commit 2e7eab81 category: bugfix bugzilla: 188212 CVE: CVE-2022-2196 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=2e7eab81425ad6c875f2ed47c0ce01e78afc38a5 -------------------------------- According to Intel's document on Indirect Branch Restricted Speculation, "Enabling IBRS does not prevent software from controlling the predicted targets of indirect branches of unrelated software executed later at the same predictor mode (for example, between two different user applications, or two different virtual machines). Such isolation can be ensured through use of the Indirect Branch Predictor Barrier (IBPB) command." This applies to both basic and enhanced IBRS. Since L1 and L2 VMs share hardware predictor modes (guest-user and guest-kernel), hardware IBRS is not sufficient to virtualize IBRS. (The way that basic IBRS is implemented on pre-eIBRS parts, hardware IBRS is actually sufficient in practice, even though it isn't sufficient architecturally.) For virtual CPUs that support IBRS, add an indirect branch prediction barrier on emulated VM-exit, to ensure that the predicted targets of indirect branches executed in L1 cannot be controlled by software that was executed in L2. Since we typically don't intercept guest writes to IA32_SPEC_CTRL, perform the IBPB at emulated VM-exit regardless of the current IA32_SPEC_CTRL.IBRS value, even though the IBPB could technically be deferred until L1 sets IA32_SPEC_CTRL.IBRS, if IA32_SPEC_CTRL.IBRS is clear at emulated VM-exit. This is CVE-2022-2196. Fixes: 5c911bef ("KVM: nVMX: Skip IBPB when switching between vmcs01 and vmcs02") Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20221019213620.1953281-3-jmattson@google.comSigned-off-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NChenXiaoSong <chenxiaosong2@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> (cherry picked from commit 04856b0e)
-
- 24 11月, 2022 3 次提交
-
-
由 Kevin Tian 提交于
mainline inclusion from mainline-v5.17-rc1 commit b5274b1b category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ CVE: NA Intel-SIG: commit b5274b1b kvm: x86: Disable interception for IA32_XFD on demand. -------------------------------- Always intercepting IA32_XFD causes non-negligible overhead when this register is updated frequently in the guest. Disable r/w emulation after intercepting the first WRMSR(IA32_XFD) with a non-zero value. Disable WRMSR emulation implies that IA32_XFD becomes out-of-sync with the software states in fpstate and the per-cpu xfd cache. This leads to two additional changes accordingly: - Call fpu_sync_guest_vmexit_xfd_state() after vm-exit to bring software states back in-sync with the MSR, before handle_exit_irqoff() is called. - Always trap #NM once write interception is disabled for IA32_XFD. The #NM exception is rare if the guest doesn't use dynamic features. Otherwise, there is at most one exception per guest task given a dynamic feature. p.s. We have confirmed that SDM is being revised to say that when setting IA32_XFD[18] the AMX register state is not guaranteed to be preserved. This clarification avoids adding mess for a creative guest which sets IA32_XFD[18]=1 before saving active AMX state to its own storage. Signed-off-by: NKevin Tian <kevin.tian@intel.com> Signed-off-by: NJing Liu <jing2.liu@intel.com> Signed-off-by: NYang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-22-yang.zhong@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NLin Wang <lin.x.wang@intel.com>
-
由 Jing Liu 提交于
mainline inclusion from mainline-v5.17-rc1 commit 61f20813 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ CVE: NA Intel-SIG: commit 61f20813 kvm: x86: Disable RDMSR interception of IA32_XFD_ERR. -------------------------------- This saves one unnecessary VM-exit in guest #NM handler, given that the MSR is already restored with the guest value before the guest is resumed. Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJing Liu <jing2.liu@intel.com> Signed-off-by: NYang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-15-yang.zhong@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NLin Wang <lin.x.wang@intel.com>
-
由 Jing Liu 提交于
mainline inclusion from mainline-v5.17-rc1 commit ec5be88a category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RQLJ CVE: NA Intel-SIG: commit ec5be88a kvm: x86: Intercept #NM for saving IA32_XFD_ERR. -------------------------------- Guest IA32_XFD_ERR is generally modified in two places: - Set by CPU when #NM is triggered; - Cleared by guest in its #NM handler; Intercept #NM for the first case when a nonzero value is written to IA32_XFD. Nonzero indicates that the guest is willing to do dynamic fpstate expansion for certain xfeatures, thus KVM needs to manage and virtualize guest XFD_ERR properly. The vcpu exception bitmap is updated in XFD write emulation according to guest_fpu::xfd. Save the current XFD_ERR value to the guest_fpu container in the #NM VM-exit handler. This must be done with interrupt disabled, otherwise the unsaved MSR value may be clobbered by host activity. The saving operation is conducted conditionally only when guest_fpu:xfd includes a non-zero value. Doing so also avoids misread on a platform which doesn't support XFD but #NM is triggered due to L1 interception. Queueing #NM to the guest is postponed to handle_exception_nmi(). This goes through the nested_vmx check so a virtual vmexit is queued instead when #NM is triggered in L2 but L1 wants to intercept it. Restore the host value (always ZERO outside of the host #NM handler) before enabling interrupt. Restore the guest value from the guest_fpu container right before entering the guest (with interrupt disabled). Suggested-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NJing Liu <jing2.liu@intel.com> Signed-off-by: NKevin Tian <kevin.tian@intel.com> Signed-off-by: NYang Zhong <yang.zhong@intel.com> Message-Id: <20220105123532.12586-13-yang.zhong@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NLin Wang <lin.x.wang@intel.com>
-
- 18 11月, 2022 4 次提交
-
-
由 Thomas Gleixner 提交于
mainline inclusion from mainline-v5.16-rc1 commit b56d2795 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I590ZC CVE: NA Intel-SIG: commit b56d2795 x86/fpu: Replace the includes of fpu/internal.h. -------------------------------- Now that the file is empty, fixup all references with the proper includes and delete the former kitchen sink. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211015011540.001197214@linutronix.deSigned-off-by: NLin Wang <lin.x.wang@intel.com> Signed-off-by: NAichun Shi <aichun.shi@intel.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.137 commit c72a9b1d0dadfd85d19eaea81f61f7b286c57a31 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=c72a9b1d0dadfd85d19eaea81f61f7b286c57a31 -------------------------------- [ Upstream commit c2fe3cd4 ] Split out VMX's checks on CR4.VMXE to a dedicated hook, .is_valid_cr4(), and invoke the new hook from kvm_valid_cr4(). This fixes an issue where KVM_SET_SREGS would return success while failing to actually set CR4. Fixing the issue by explicitly checking kvm_x86_ops.set_cr4()'s return in __set_sregs() is not a viable option as KVM has already stuffed a variety of vCPU state. Note, kvm_valid_cr4() and is_valid_cr4() have different return types and inverted semantics. This will be remedied in a future patch. Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4") Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-5-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.137 commit da7f731f2ed5b4a082567967ce74be274aab2daf category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=da7f731f2ed5b4a082567967ce74be274aab2daf -------------------------------- [ Upstream commit a447e38a ] Drop vmx_set_cr4()'s explicit check on the 'nested' module param now that common x86 handles the check by incorporating VMXE into the CR4 reserved bits, via kvm_cpu_caps. X86_FEATURE_VMX is set in kvm_cpu_caps (by vmx_set_cpu_caps()), if and only if 'nested' is true. No functional change intended. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-3-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.137 commit 8b8b376903b32d3d854f39eeebe018169c920cb6 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8b8b376903b32d3d854f39eeebe018169c920cb6 -------------------------------- [ Upstream commit d3a9e414 ] Drop vmx_set_cr4()'s somewhat hidden guest_cpuid_has() check on VMXE now that common x86 handles the check by incorporating VMXE into the CR4 reserved bits, i.e. in cr4_guest_rsvd_bits. This fixes a bug where KVM incorrectly rejects KVM_SET_SREGS with CR4.VMXE=1 if it's executed before KVM_SET_CPUID{,2}. Fixes: 5e1746d6 ("KVM: nVMX: Allow setting the VMXE bit in CR4") Reported-by: NStas Sergeev <stsp@users.sourceforge.net> Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Message-Id: <20201007014417.29276-2-sean.j.christopherson@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 03 11月, 2022 5 次提交
-
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.13-rc2 commit 76ea438b category: feature feature: KVM Bus Lock Debug Exception bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7 CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=76ea438b Intel-SIG: commit 76ea438b ("KVM: X86: Expose bus lock debug exception to guest") ------------------------------------- KVM: X86: Expose bus lock debug exception to guest Bus lock debug exception is an ability to notify the kernel by an #DB trap after the instruction acquires a bus lock and is executed when CPL>0. This allows the kernel to enforce user application throttling or mitigations. Existence of bus lock debug exception is enumerated via CPUID.(EAX=7,ECX=0).ECX[24]. Software can enable these exceptions by setting bit 2 of the MSR_IA32_DEBUGCTL. Expose the CPUID to guest and emulate the MSR handling when guest enables it. Support for this feature was originally developed by Xiaoyao Li and Chenyi Qiang, but code has since changed enough that this patch has nothing in common with theirs, except for this commit message. Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-4-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAichun Shi <aichun.shi@intel.com>
-
由 Chenyi Qiang 提交于
mainline inclusion from mainline-v5.12-rc1 commit 9a3ecd5e category: feature feature: KVM Bus Lock Debug Exception bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RHW7 CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=9a3ecd5e Intel-SIG: commit 9a3ecd5e ("KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW") ------------------------------------- KVM: X86: Rename DR6_INIT to DR6_ACTIVE_LOW DR6_INIT contains the 1-reserved bits as well as the bit that is cleared to 0 when the condition (e.g. RTM) happens. The value can be used to initialize dr6 and also be the XOR mask between the #DB exit qualification (or payload) and DR6. Concerning that DR6_INIT is used as initial value only once, rename it to DR6_ACTIVE_LOW and apply it in other places, which would make the incoming changes for bus lock debug exception more simple. Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20210202090433.13441-2-chenyi.qiang@intel.com> [Define DR6_FIXED_1 from DR6_ACTIVE_LOW and DR6_VOLATILE. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAichun Shi <aichun.shi@intel.com>
-
由 Tao Xu 提交于
mainline inclusion from mainline-v6.0-rc1 commit 2f4073e0 category: feature feature: Notify VM exit bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5PAJ5 CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=2f4073e0 Intel-SIG: commit 2f4073e0 ("KVM: VMX: Enable Notify VM exit") ------------------------------------- KVM: VMX: Enable Notify VM exit There are cases that malicious virtual machines can cause CPU stuck (due to event windows don't open up), e.g., infinite loop in microcode when nested #AC (CVE-2015-5307). No event window means no event (NMI, SMI and IRQ) can be delivered. It leads the CPU to be unavailable to host or other VMs. VMM can enable notify VM exit that a VM exit generated if no event window occurs in VM non-root mode for a specified amount of time (notify window). Feature enabling: - The new vmcs field SECONDARY_EXEC_NOTIFY_VM_EXITING is introduced to enable this feature. VMM can set NOTIFY_WINDOW vmcs field to adjust the expected notify window. - Add a new KVM capability KVM_CAP_X86_NOTIFY_VMEXIT so that user space can query and enable this feature in per-VM scope. The argument is a 64bit value: bits 63:32 are used for notify window, and bits 31:0 are for flags. Current supported flags: - KVM_X86_NOTIFY_VMEXIT_ENABLED: enable the feature with the notify window provided. - KVM_X86_NOTIFY_VMEXIT_USER: exit to userspace once the exits happen. - It's safe to even set notify window to zero since an internal hardware threshold is added to vmcs.notify_window. VM exit handling: - Introduce a vcpu state notify_window_exits to records the count of notify VM exits and expose it through the debugfs. - Notify VM exit can happen incident to delivery of a vector event. Allow it in KVM. - Exit to userspace unconditionally for handling when VM_CONTEXT_INVALID bit is set. Nested handling - Nested notify VM exits are not supported yet. Keep the same notify window control in vmcs02 as vmcs01, so that L1 can't escape the restriction of notify VM exits through launching L2 VM. Notify VM exit is defined in latest Intel Architecture Instruction Set Extensions Programming Reference, chapter 9.2. Co-developed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NTao Xu <tao3.xu@intel.com> Co-developed-by: NChenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20220524135624.22988-5-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAichun Shi <aichun.shi@intel.com>
-
由 Hao Xiang 提交于
mainline inclusion from mainline-v5.15-rc7 commit d61863c6 category: feature feature: KVM Bus Lock VM Exit bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=d61863c6 Intel-SIG: commit d61863c6 ("KVM: VMX: Remove redundant handling of bus lock vmexit") ------------------------------------- KVM: VMX: Remove redundant handling of bus lock vmexit Hardware may or may not set exit_reason.bus_lock_detected on BUS_LOCK VM-Exits. Dealing with KVM_RUN_X86_BUS_LOCK in handle_bus_lock_vmexit could be redundant when exit_reason.basic is EXIT_REASON_BUS_LOCK. We can remove redundant handling of bus lock vmexit. Unconditionally Set exit_reason.bus_lock_detected in handle_bus_lock_vmexit(), and deal with KVM_RUN_X86_BUS_LOCK only in vmx_handle_exit(). Signed-off-by: NHao Xiang <hao.xiang@linux.alibaba.com> Message-Id: <1634299161-30101-1-git-send-email-hao.xiang@linux.alibaba.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAichun Shi <aichun.shi@intel.com>
-
由 Chenyi Qiang 提交于
mainline inclusion from mainline-v5.12-rc1 commit fe6b6bc8 category: feature feature: KVM Bus Lock VM Exit bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5RJCB CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=fe6b6bc8 Intel-SIG: commit fe6b6bc8 ("KVM: VMX: Enable bus lock VM exit") ------------------------------------- KVM: VMX: Enable bus lock VM exit Virtual Machine can exploit bus locks to degrade the performance of system. Bus lock can be caused by split locked access to writeback(WB) memory or by using locks on uncacheable(UC) memory. The bus lock is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores (which must wait for the bus lock to be released before their memory operations can complete). To address the threat, bus lock VM exit is introduced to notify the VMM when a bus lock was acquired, allowing it to enforce throttling or other policy based mitigations. A VMM can enable VM exit due to bus locks by setting a new "Bus Lock Detection" VM-execution control(bit 30 of Secondary Processor-based VM execution controls). If delivery of this VM exit was preempted by a higher priority VM exit (e.g. EPT misconfiguration, EPT violation, APIC access VM exit, APIC write VM exit, exception bitmap exiting), bit 26 of exit reason in vmcs field is set to 1. In current implementation, the KVM exposes this capability through KVM_CAP_X86_BUS_LOCK_EXIT. The user can get the supported mode bitmap (i.e. off and exit) and enable it explicitly (disabled by default). If bus locks in guest are detected by KVM, exit to user space even when current exit reason is handled by KVM internally. Set a new field KVM_RUN_BUS_LOCK in vcpu->run->flags to inform the user space that there is a bus lock detected in guest. Document for Bus Lock VM exit is now available at the latest "Intel Architecture Instruction Set Extensions Programming Reference". Document Link: https://software.intel.com/content/www/us/en/develop/download/intel-architecture-instruction-set-extensions-programming-reference.htmlCo-developed-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NXiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: NChenyi Qiang <chenyi.qiang@intel.com> Message-Id: <20201106090315.18606-4-chenyi.qiang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NAichun Shi <aichun.shi@intel.com>
-
- 27 10月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.124 commit d6be031a2f5e27f27f3648bac98d2a35874eaddc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6E7 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d6be031a2f5e27f27f3648bac98d2a35874eaddc -------------------------------- commit eba04b20 upstream. Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are clearly associated with a single task/VM. Note, there are a several SEV allocations that aren't accounted, but those can (hopefully) be fixed by using the local stack for memory. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210331023025.2485960-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [sudip: adjust context] Signed-off-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 08 10月, 2022 4 次提交
-
-
由 Chao Gao 提交于
mainline inclusion from mainline-v6.0-rc1 commit d588bb9b category: feature feature: IPI Virtualization bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d588bb9be1da6aa750aa64875fe57369db983d8b Intel-SIG: commit d588bb9b ("KVM: VMX: enable IPI virtualization") ------------------------------------- KVM: VMX: enable IPI virtualization With IPI virtualization enabled, the processor emulates writes to APIC registers that would send IPIs. The processor sets the bit corresponding to the vector in target vCPU's PIR and may send a notification (IPI) specified by NDST and NV fields in target vCPU's Posted-Interrupt Descriptor (PID). It is similar to what IOMMU engine does when dealing with posted interrupt from devices. A PID-pointer table is used by the processor to locate the PID of a vCPU with the vCPU's APIC ID. The table size depends on maximum APIC ID assigned for current VM session from userspace. Allocating memory for PID-pointer table is deferred to vCPU creation, because irqchip mode and VM-scope maximum APIC ID is settled at that point. KVM can skip PID-pointer table allocation if !irqchip_in_kernel(). Like VT-d PI, if a vCPU goes to blocked state, VMM needs to switch its notification vector to wakeup vector. This can ensure that when an IPI for blocked vCPUs arrives, VMM can get control and wake up blocked vCPUs. And if a VCPU is preempted, its posted interrupt notification is suppressed. Note that IPI virtualization can only virualize physical-addressing, flat mode, unicast IPIs. Sending other IPIs would still cause a trap-like APIC-write VM-exit and need to be handled by VMM. Signed-off-by: NChao Gao <chao.gao@intel.com> Signed-off-by: NZeng Guang <guang.zeng@intel.com> Message-Id: <20220419154510.11938-1-guang.zeng@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJason Zeng <jason.zeng@intel.com>
-
由 Zeng Guang 提交于
mainline inclusion from mainline-v6.0-rc1 commit f08a06c9 category: feature feature: IPI Virtualization bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f08a06c9a35706349f74b7a18deefe3f89f73e8e Intel-SIG: commit f08a06c9 ("KVM: VMX: Clean up vmx_refresh_apicv_exec_ctrl()") ------------------------------------- KVM: VMX: Clean up vmx_refresh_apicv_exec_ctrl() Remove the condition check cpu_has_secondary_exec_ctrls(). Calling vmx_refresh_apicv_exec_ctrl() premises secondary controls activated and VMCS fields related to APICv valid as well. If it's invoked in wrong circumstance at the worst case, VMX operation will report VMfailValid error without further harmful impact and just functions as if all the secondary controls were 0. Suggested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NZeng Guang <guang.zeng@intel.com> Message-Id: <20220419153604.11786-1-guang.zeng@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJason Zeng <jason.zeng@intel.com>
-
由 Robert Hoo 提交于
mainline inclusion from mainline-v6.0-rc1 commit 0b85baa5 category: feature feature: IPI Virtualization bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0b85baa5f46de1c6ad6e4b987905df041f2f80f0 Intel-SIG: commit 0b85baa5 ("KVM: VMX: Report tertiary_exec_control field in dump_vmcs()") ------------------------------------- KVM: VMX: Report tertiary_exec_control field in dump_vmcs() Add tertiary_exec_control field report in dump_vmcs(). Meanwhile, reorganize the dump output of VMCS category as follows. Before change: *** Control State *** PinBased=0x000000ff CPUBased=0xb5a26dfa SecondaryExec=0x061037eb EntryControls=0000d1ff ExitControls=002befff After change: *** Control State *** CPUBased=0xb5a26dfa SecondaryExec=0x061037eb TertiaryExec=0x0000000000000010 PinBased=0x000000ff EntryControls=0000d1ff ExitControls=002befff Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Signed-off-by: NRobert Hoo <robert.hu@linux.intel.com> Signed-off-by: NZeng Guang <guang.zeng@intel.com> Message-Id: <20220419153441.11687-1-guang.zeng@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJason Zeng <jason.zeng@intel.com>
-
由 Robert Hoo 提交于
mainline inclusion from mainline-v6.0-rc1 commit 1ad4e543 category: feature feature: IPI Virtualization bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5ODSC CVE: N/A Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1ad4e5438c67a01620ed67cea959de89f4430515 Intel-SIG: commit 1ad4e543 ("KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config") ------------------------------------- KVM: VMX: Detect Tertiary VM-Execution control when setup VMCS config Check VMX features on tertiary execution control in VMCS config setup. Sub-features in tertiary execution control to be enabled are adjusted according to hardware capabilities although no sub-feature is enabled in this patch. EVMCSv1 doesn't support tertiary VM-execution control, so disable it when EVMCSv1 is in use. And define the auxiliary functions for Tertiary control field here, using the new BUILD_CONTROLS_SHADOW(). Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Signed-off-by: NRobert Hoo <robert.hu@linux.intel.com> Signed-off-by: NZeng Guang <guang.zeng@intel.com> Message-Id: <20220419153400.11642-1-guang.zeng@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NJason Zeng <jason.zeng@intel.com>
-
- 20 9月, 2022 6 次提交
-
-
由 Josh Poimboeuf 提交于
stable inclusion from stable-v5.10.133 commit 47ae76fb27398e867980d63789058ff7c4f12a35 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=47ae76fb27398e867980d63789058ff7c4f12a35 -------------------------------- commit bea7e31a upstream. For legacy IBRS to work, the IBRS bit needs to be always re-written after vmexit, even if it's already on. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Josh Poimboeuf 提交于
stable inclusion from stable-v5.10.133 commit 5269be9111e2b66572e78647f2e8948f7fc96466 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5269be9111e2b66572e78647f2e8948f7fc96466 -------------------------------- commit fc02735b upstream. On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/kvm/vmx/vmx.h Signed-off-by: NLin Yujun <linyujun809@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Josh Poimboeuf 提交于
stable inclusion from stable-v5.10.133 commit 84061fff2ad98a7809f00e88a54f584f84830388 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=84061fff2ad98a7809f00e88a54f584f84830388 -------------------------------- commit bb066506 upstream. Convert __vmx_vcpu_run()'s 'launched' argument to 'flags', in preparation for doing SPEC_CTRL handling immediately after vmexit, which will need another flag. This is much easier than adding a fourth argument, because this code supports both 32-bit and 64-bit, and the fourth argument on 32-bit would have to be pushed on the stack. Note that __vmx_vcpu_run_flags() is called outside of the noinstr critical section because it will soon start calling potentially traceable functions. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/kvm/vmx/vmx.h Signed-off-by: NLin Yujun <linyujun809@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from stable-v5.10.133 commit 7070bbb66c5303117e4c7651711ea7daae4c64b5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7070bbb66c5303117e4c7651711ea7daae4c64b5 -------------------------------- commit 742ab6df upstream. The recent mmio_stale_data fixes broke the noinstr constraints: vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x15b: call to wrmsrl.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x1bf: call to kvm_arch_has_assigned_device() leaves .noinstr.text section make it all happy again. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Uros Bizjak 提交于
stable inclusion from stable-v5.10.133 commit dd87aa5f610be44f195cf5a99b7bc153faf30a3d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dd87aa5f610be44f195cf5a99b7bc153faf30a3d -------------------------------- commit 150f17bf upstream. Replace inline assembly in nested_vmx_check_vmentry_hw with a call to __vmx_vcpu_run. The function is not performance critical, so (double) GPR save/restore in __vmx_vcpu_run can be tolerated, as far as performance effects are concerned. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Reviewed-and-tested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NUros Bizjak <ubizjak@gmail.com> [sean: dropped versioning info from changelog] Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20201231002702.22237077-5-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/kvm/vmx/vmx.h Signed-off-by: NLin Yujun <linyujun809@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.19-rc2 commit 6cd88243 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PJ7H CVE: CVE-2022-39189 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6cd88243c7e03845a450795e134b488fc2afb736 ---------------------------------------- If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> conflict: arch/x86/kvm/x86.c Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 08 7月, 2022 4 次提交
-
-
由 Sean Christopherson 提交于
mainline inclusion from mainline-5.13 commit 72add915 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZEK CVE: NA Intel-SIG: commit 72add915 KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC. Backport for SGX virtualization support -------------------------------- Enable SGX virtualization now that KVM has the VM-Exit handlers needed to trap-and-execute ENCLS to ensure correctness and/or enforce the CPU model exposed to the guest. Add a KVM module param, "sgx", to allow an admin to disable SGX virtualization independent of the kernel. When supported in hardware and the kernel, advertise SGX1, SGX2 and SGX LC to userspace via CPUID and wire up the ENCLS_EXITING bitmap based on the guest's SGX capabilities, i.e. to allow ENCLS to be executed in an SGX-enabled guest. With the exception of the provision key, all SGX attribute bits may be exposed to the guest. Guest access to the provision key, which is controlled via securityfs, will be added in a future patch. Note, KVM does not yet support exposing ENCLS_C leafs or ENCLV leafs. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <a99e9c23310c79f2f4175c1af4c4cbcef913c3e5.1618196135.git.kai.huang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NFan Du <fan.du@intel.com> Signed-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
由 Sean Christopherson 提交于
mainline inclusion from mainline-5.13 commit 8f102445 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZEK CVE: NA Intel-SIG: commit 8f102445 KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs. Backport for SGX virtualization support -------------------------------- Emulate the four Launch Enclave public key hash MSRs (LE hash MSRs) that exist on CPUs that support SGX Launch Control (LC). SGX LC modifies the behavior of ENCLS[EINIT] to use the LE hash MSRs when verifying the key used to sign an enclave. On CPUs without LC support, the LE hash is hardwired into the CPU to an Intel controlled key (the Intel key is also the reset value of the LE hash MSRs). Track the guest's desired hash so that a future patch can stuff the hash into the hardware MSRs when executing EINIT on behalf of the guest, when those MSRs are writable in host. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: NKai Huang <kai.huang@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <c58ef601ddf88f3a113add837969533099b1364a.1618196135.git.kai.huang@intel.com> [Add a comment regarding the MSRs being available until SGX is locked. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NFan Du <fan.du@intel.com> Signed-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
由 Sean Christopherson 提交于
mainline inclusion from mainline-5.13 commit 9798adbc category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZEK CVE: NA Intel-SIG: commit 9798adbc KVM: VMX: Frame in ENCLS handler for SGX virtualization. Backport for SGX virtualization support -------------------------------- Introduce sgx.c and sgx.h, along with the framework for handling ENCLS VM-Exits. Add a bool, enable_sgx, that will eventually be wired up to a module param to control whether or not SGX virtualization is enabled at runtime. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <1c782269608b2f5e1034be450f375a8432fb705d.1618196135.git.kai.huang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NFan Du <fan.du@intel.com> Signed-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
由 Sean Christopherson 提交于
mainline inclusion from mainline-5.13 commit 3c0c2ad1 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZEK CVE: NA Intel-SIG: commit 3c0c2ad1 KVM: VMX: Add basic handling of VM-Exit from SGX enclave Backport for SGX virtualization support -------------------------------- Add support for handling VM-Exits that originate from a guest SGX enclave. In SGX, an "enclave" is a new CPL3-only execution environment, wherein the CPU and memory state is protected by hardware to make the state inaccesible to code running outside of the enclave. When exiting an enclave due to an asynchronous event (from the perspective of the enclave), e.g. exceptions, interrupts, and VM-Exits, the enclave's state is automatically saved and scrubbed (the CPU loads synthetic state), and then reloaded when re-entering the enclave. E.g. after an instruction based VM-Exit from an enclave, vmcs.GUEST_RIP will not contain the RIP of the enclave instruction that trigered VM-Exit, but will instead point to a RIP in the enclave's untrusted runtime (the guest userspace code that coordinates entry/exit to/from the enclave). To help a VMM recognize and handle exits from enclaves, SGX adds bits to existing VMCS fields, VM_EXIT_REASON.VMX_EXIT_REASON_FROM_ENCLAVE and GUEST_INTERRUPTIBILITY_INFO.GUEST_INTR_STATE_ENCLAVE_INTR. Define the new architectural bits, and add a boolean to struct vcpu_vmx to cache VMX_EXIT_REASON_FROM_ENCLAVE. Clear the bit in exit_reason so that checks against exit_reason do not need to account for SGX, e.g. "if (exit_reason == EXIT_REASON_EXCEPTION_NMI)" continues to work. KVM is a largely a passive observer of the new bits, e.g. KVM needs to account for the bits when propagating information to a nested VMM, but otherwise doesn't need to act differently for the majority of VM-Exits from enclaves. The one scenario that is directly impacted is emulation, which is for all intents and purposes impossible[1] since KVM does not have access to the RIP or instruction stream that triggered the VM-Exit. The inability to emulate is a non-issue for KVM, as most instructions that might trigger VM-Exit unconditionally #UD in an enclave (before the VM-Exit check. For the few instruction that conditionally #UD, KVM either never sets the exiting control, e.g. PAUSE_EXITING[2], or sets it if and only if the feature is not exposed to the guest in order to inject a #UD, e.g. RDRAND_EXITING. But, because it is still possible for a guest to trigger emulation, e.g. MMIO, inject a #UD if KVM ever attempts emulation after a VM-Exit from an enclave. This is architecturally accurate for instruction VM-Exits, and for MMIO it's the least bad choice, e.g. it's preferable to killing the VM. In practice, only broken or particularly stupid guests should ever encounter this behavior. Add a WARN in skip_emulated_instruction to detect any attempt to modify the guest's RIP during an SGX enclave VM-Exit as all such flows should either be unreachable or must handle exits from enclaves before getting to skip_emulated_instruction. [1] Impossible for all practical purposes. Not truly impossible since KVM could implement some form of para-virtualization scheme. [2] PAUSE_LOOP_EXITING only affects CPL0 and enclaves exist only at CPL3, so we also don't need to worry about that interaction. Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NKai Huang <kai.huang@intel.com> Message-Id: <315f54a8507d09c292463ef29104e1d4c62e9090.1618196135.git.kai.huang@intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NFan Du <fan.du@intel.com> Signed-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
- 06 7月, 2022 2 次提交
-
-
由 Pawan Gupta 提交于
stable inclusion from stable-v5.10.123 commit bde15fdcce44956278b4f50680b7363ca126ffb9 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=bde15fdcce44956278b4f50680b7363ca126ffb9 -------------------------------- commit 027bbb88 upstream The enumeration of MD_CLEAR in CPUID(EAX=7,ECX=0).EDX{bit 10} is not an accurate indicator on all CPUs of whether the VERW instruction will overwrite fill buffers. FB_CLEAR enumeration in IA32_ARCH_CAPABILITIES{bit 17} covers the case of CPUs that are not vulnerable to MDS/TAA, indicating that microcode does overwrite fill buffers. Guests running in VMM environments may not be aware of all the capabilities/vulnerabilities of the host CPU. Specifically, a guest may apply MDS/TAA mitigations when a virtual CPU is enumerated as vulnerable to MDS/TAA even when the physical CPU is not. On CPUs that enumerate FB_CLEAR_CTRL the VMM may set FB_CLEAR_DIS to skip overwriting of fill buffers by the VERW instruction. This is done by setting FB_CLEAR_DIS during VMENTER and resetting on VMEXIT. For guests that enumerate FB_CLEAR (explicitly asking for fill buffer clear capability) the VMM will not use FB_CLEAR_DIS. Irrespective of guest state, host overwrites CPU buffers before VMENTER to protect itself from an MMIO capable guest, as part of mitigation for MMIO Stale Data vulnerabilities. Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: arch/x86/kvm/vmx/vmx.h Signed-off-by: NYipeng Zou <zouyipeng@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: NLiao Chang <liaochang1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pawan Gupta 提交于
stable inclusion from stable-v5.10.123 commit 26f6f231f6a5a79ccc274967939b22602dec76e8 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=26f6f231f6a5a79ccc274967939b22602dec76e8 -------------------------------- commit 8cb861e9 upstream Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst. These vulnerabilities are broadly categorized as: Device Register Partial Write (DRPW): Some endpoint MMIO registers incorrectly handle writes that are smaller than the register size. Instead of aborting the write or only copying the correct subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than specified by the write transaction may be written to the register. On some processors, this may expose stale data from the fill buffers of the core that created the write transaction. Shared Buffers Data Sampling (SBDS): After propagators may have moved data around the uncore and copied stale data into client core fill buffers, processors affected by MFBDS can leak data from the fill buffer. Shared Buffers Data Read (SBDR): It is similar to Shared Buffer Data Sampling (SBDS) except that the data is directly read into the architectural software-visible state. An attacker can use these vulnerabilities to extract data from CPU fill buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill buffers using the VERW instruction before returning to a user or a guest. On CPUs not affected by MDS and TAA, user application cannot sample data from CPU fill buffers using MDS or TAA. A guest with MMIO access can still use DRPW or SBDR to extract data architecturally. Mitigate it with VERW instruction to clear fill buffers before VMENTER for MMIO capable guests. Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control the mitigation. Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYipeng Zou <zouyipeng@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 19 5月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.101 commit 3aa5c8657292e05e6dfa8fe2316951001dab7e3a bugzilla: https://gitee.com/openeuler/kernel/issues/I5669Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3aa5c8657292e05e6dfa8fe2316951001dab7e3a -------------------------------- [ Upstream commit b9bed78e ] Set vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS, a.k.a. the pending single-step breakpoint flag, when re-injecting a #DB with RFLAGS.TF=1, and STI or MOVSS blocking is active. Setting the flag is necessary to make VM-Entry consistency checks happy, as VMX has an invariant that if RFLAGS.TF is set and STI/MOVSS blocking is true, then the previous instruction must have been STI or MOV/POP, and therefore a single-step #DB must be pending since the RFLAGS.TF cannot have been set by the previous instruction, i.e. the one instruction delay after setting RFLAGS.TF must have already expired. Normally, the CPU sets vmcs.GUEST_PENDING_DBG_EXCEPTIONS.BS appropriately when recording guest state as part of a VM-Exit, but #DB VM-Exits intentionally do not treat the #DB as "guest state" as interception of the #DB effectively makes the #DB host-owned, thus KVM needs to manually set PENDING_DBG.BS when forwarding/re-injecting the #DB to the guest. Note, although this bug can be triggered by guest userspace, doing so requires IOPL=3, and guest userspace running with IOPL=3 has full access to all I/O ports (from the guest's perspective) and can crash/reboot the guest any number of ways. IOPL=3 is required because STI blocking kicks in if and only if RFLAGS.IF is toggled 0=>1, and if CPL>IOPL, STI either takes a #GP or modifies RFLAGS.VIF, not RFLAGS.IF. MOVSS blocking can be initiated by userspace, but can be coincident with a #DB if and only if DR7.GD=1 (General Detect enabled) and a MOV DR is executed in the MOVSS shadow. MOV DR #GPs at CPL>0, thus MOVSS blocking is problematic only for CPL0 (and only if the guest is crazy enough to access a DR in a MOVSS shadow). All other sources of #DBs are either suppressed by MOVSS blocking (single-step, code fetch, data, and I/O), are mutually exclusive with MOVSS blocking (T-bit task switch), or are already handled by KVM (ICEBP, a.k.a. INT1). This bug was originally found by running tests[1] created for XSA-308[2]. Note that Xen's userspace test emits ICEBP in the MOVSS shadow, which is presumably why the Xen bug was deemed to be an exploitable DOS from guest userspace. KVM already handles ICEBP by skipping the ICEBP instruction and thus clears MOVSS blocking as a side effect of its "emulation". [1] http://xenbits.xenproject.org/docs/xtf/xsa-308_2main_8c_source.html [2] https://xenbits.xen.org/xsa/advisory-308.htmlReported-by: NDavid Woodhouse <dwmw2@infradead.org> Reported-by: NAlexander Graf <graf@amazon.de> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220120000624.655815-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYu Liao <liaoyu15@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 28 4月, 2022 1 次提交
-
-
由 liangtian 提交于
virt inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I53PTV?from=project-issue CVE: NA ----------------------------------------------------- Since the reset function is in kvm_intel module instead of kvm module, the attribute weak function in kvm_main.c could not be found, which would cause st_max in X86 never be refreshed. The solution is to define the reset function in x86.c under the kvm module. Signed-off-by: Nliangtian <liangtian13@huawei.com> Reviewed-by: NKeqian Zhu <zhukeqian1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 19 4月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.93 commit 413b427f5fff5d658c2605ca889d6b13b88efd0c bugzilla: 186204 https://gitee.com/openeuler/kernel/issues/I5311N Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=413b427f5fff5d658c2605ca889d6b13b88efd0c -------------------------------- commit f4b027c5 upstream. Override the Processor Trace (PT) interrupt handler for guest mode if and only if PT is configured for host+guest mode, i.e. is being used independently by both host and guest. If PT is configured for system mode, the host fully controls PT and must handle all events. Fixes: 8479e04e ("KVM: x86: Inject PMI for KVM guest") Reported-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Reported-by: NArtem Kashkanov <artem.kashkanov@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Acked-by: NPaolo Bonzini <pbonzini@redhat.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211111020738.2512932-4-seanjc@google.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 26 1月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.89 commit 28626e76baf50e6b37d8a92564844d873aa6b51f bugzilla: 186140 https://gitee.com/openeuler/kernel/issues/I4S8HA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=28626e76baf50e6b37d8a92564844d873aa6b51f -------------------------------- commit fdba608f upstream. Drop a check that guards triggering a posted interrupt on the currently running vCPU, and more importantly guards waking the target vCPU if triggering a posted interrupt fails because the vCPU isn't IN_GUEST_MODE. If a vIRQ is delivered from asynchronous context, the target vCPU can be the currently running vCPU and can also be blocking, in which case skipping kvm_vcpu_wake_up() is effectively dropping what is supposed to be a wake event for the vCPU. The "do nothing" logic when "vcpu == running_vcpu" mostly works only because the majority of calls to ->deliver_posted_interrupt(), especially when using posted interrupts, come from synchronous KVM context. But if a device is exposed to the guest using vfio-pci passthrough, the VFIO IRQ and vCPU are bound to the same pCPU, and the IRQ is _not_ configured to use posted interrupts, wake events from the device will be delivered to KVM from IRQ context, e.g. vfio_msihandler() | |-> eventfd_signal() | |-> ... | |-> irqfd_wakeup() | |->kvm_arch_set_irq_inatomic() | |-> kvm_irq_delivery_to_apic_fast() | |-> kvm_apic_set_irq() This also aligns the non-nested and nested usage of triggering posted interrupts, and will allow for additional cleanups. Fixes: 379a3c8e ("KVM: VMX: Optimize posted-interrupt delivery for timer fastpath") Cc: stable@vger.kernel.org Reported-by: NLongpeng (Mike) <longpeng2@huawei.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Reviewed-by: NMaxim Levitsky <mlevitsk@redhat.com> Message-Id: <20211208015236.1616697-18-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 14 1月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.84 commit 7722e88505226d64d7b2158b470e6945ef759832 bugzilla: 186030 https://gitee.com/openeuler/kernel/issues/I4QV2F Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7722e88505226d64d7b2158b470e6945ef759832 -------------------------------- commit 2b4a5a5d upstream. Flush the current VPID when handling KVM_REQ_TLB_FLUSH_GUEST instead of always flushing vpid01. Any TLB flush that is triggered when L2 is active is scoped to L2's VPID (if it has one), e.g. if L2 toggles CR4.PGE and L1 doesn't intercept PGE writes, then KVM's emulation of the TLB flush needs to be applied to L2's VPID. Reported-by: NLai Jiangshan <jiangshanlai+lkml@gmail.com> Fixes: 07ffaf34 ("KVM: nVMX: Sync all PGDs on nested transition with shadow paging") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20211125014944.536398-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NChen Jun <chenjun102@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 07 1月, 2022 5 次提交
-
-
由 Like Xu 提交于
mainline inclusion from mainline-v5.12-rc1 commit e6209a3b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e6209a3bef793e8fe29c873a7612023916eaa611 ------------------- The current vPMU only supports Architecture Version 2. According to Intel SDM "17.4.7 Freezing LBR and Performance Counters on PMI", if IA32_DEBUGCTL.Freeze_LBR_On_PMI = 1, the LBR is frozen on the virtual PMI and the KVM would emulate to clear the LBR bit (bit 0) in IA32_DEBUGCTL. Also, guest needs to re-enable IA32_DEBUGCTL.LBR to resume recording branches. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-9-like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NWangJian <wangjian161@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Like Xu 提交于
mainline inclusion from mainline-v5.12-rc1 commit 1b5ac322 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=1b5ac3226a1aa071135fe0ee5d1055d9e88b717c ------------------- In addition to DEBUGCTLMSR_LBR, any KVM trap caused by LBR msrs access will result in a creation of guest LBR event per-vcpu. If the guest LBR event is scheduled on with the corresponding vcpu context, KVM will pass-through all LBR records msrs to the guest. The LBR callstack mechanism implemented in the host could help save/restore the guest LBR records during the event context switches, which reduces a lot of overhead if we save/restore tens of LBR msrs (e.g. 32 LBR records entries) in the much more frequent VMX transitions. To avoid reclaiming LBR resources from any higher priority event on host, KVM would always check the exist of guest LBR event and its state before vm-entry as late as possible. A negative result would cancel the pass-through state, and it also prevents real registers accesses and potential data leakage. If host reclaims the LBR between two checks, the interception state and LBR records can be safely preserved due to native save/restore support from guest LBR event. The KVM emits a pr_warn() when the LBR hardware is unavailable to the guest LBR event. The administer is supposed to reminder users that the guest result may be inaccurate if someone is using LBR to record hypervisor on the host side. Suggested-by: NAndi Kleen <ak@linux.intel.com> Co-developed-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NLike Xu <like.xu@linux.intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-7-like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NWangJian <wangjian161@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Like Xu 提交于
mainline inclusion from mainline-v5.12-rc1 commit 8e12911b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e12911b243e485f5e4c7c5fbc79cdf185728700 ------------------- When vcpu sets DEBUGCTLMSR_LBR in the MSR_IA32_DEBUGCTLMSR, the KVM handler would create a guest LBR event which enables the callstack mode and none of hardware counter is assigned. The host perf would schedule and enable this event as usual but in an exclusive way. The guest LBR event will be released when the vPMU is reset but soon, the lazy release mechanism would be applied to this event like a vPMC. Suggested-by: NAndi Kleen <ak@linux.intel.com> Co-developed-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NWei Wang <wei.w.wang@intel.com> Signed-off-by: NLike Xu <like.xu@linux.intel.com> Reviewed-by: NAndi Kleen <ak@linux.intel.com> Message-Id: <20210201051039.255478-6-like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NWangJian <wangjian161@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Like Xu 提交于
mainline inclusion from mainline-v5.12-rc1 commit c6462363 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c646236344e9054cc84cd5a9f763163b9654cf7e ------------------- Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES MSR which tells about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NWangJian <wangjian161@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.12-rc1 commit 9c9520ce category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4NP0K CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=9c9520ce883386dc3794c7d60204487ff1db09cb ------------------- Usespace could set the bits [0, 5] of the IA32_PERF_CAPABILITIES MSR which tells about the record format stored in the LBR records. The LBR will be enabled on the guest if host perf supports LBR (checked via x86_perf_get_lbr()) and the vcpu model is compatible with the host one. Signed-off-by: NLike Xu <like.xu@linux.intel.com> Message-Id: <20210201051039.255478-4-like.xu@linux.intel.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Reviewed-by: NWangJian <wangjian161@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-