- 18 11月, 2022 2 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.137 commit 860e3343958a03ba8ad78750bdbe5dd1fe35ce02 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=860e3343958a03ba8ad78750bdbe5dd1fe35ce02 -------------------------------- commit 764643a6 upstream. If a nested run isn't pending, snapshot vmcs01.GUEST_IA32_DEBUGCTL irrespective of whether or not VM_ENTRY_LOAD_DEBUG_CONTROLS is set in vmcs12. When restoring nested state, e.g. after migration, without a nested run pending, prepare_vmcs02() will propagate nested.vmcs01_debugctl to vmcs02, i.e. will load garbage/zeros into vmcs02.GUEST_IA32_DEBUGCTL. If userspace restores nested state before MSRs, then loading garbage is a non-issue as loading DEBUGCTL will also update vmcs02. But if usersepace restores MSRs first, then KVM is responsible for propagating L2's value, which is actually thrown into vmcs01, into vmcs02. Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state is all kinds of bizarre and ideally would not be supported. Sadly, some VMMs do exactly that and rely on KVM to make things work. Note, there's still a lurking SMM bug, as propagating vmcs01's DEBUGCTL to vmcs02 across RSM may corrupt L2's DEBUGCTL. But KVM's entire VMX+SMM emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the "default treatment of SMIs", i.e. when not using an SMI Transfer Monitor. Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com Fixes: 8fcc4b59 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220614215831.3762138-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.137 commit ab4805c2638cc1f30697e7e8405c4355a13999ff category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I60PLB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ab4805c2638cc1f30697e7e8405c4355a13999ff -------------------------------- commit fa578398 upstream. If a nested run isn't pending, snapshot vmcs01.GUEST_BNDCFGS irrespective of whether or not VM_ENTRY_LOAD_BNDCFGS is set in vmcs12. When restoring nested state, e.g. after migration, without a nested run pending, prepare_vmcs02() will propagate nested.vmcs01_guest_bndcfgs to vmcs02, i.e. will load garbage/zeros into vmcs02.GUEST_BNDCFGS. If userspace restores nested state before MSRs, then loading garbage is a non-issue as loading BNDCFGS will also update vmcs02. But if usersepace restores MSRs first, then KVM is responsible for propagating L2's value, which is actually thrown into vmcs01, into vmcs02. Restoring L2 MSRs into vmcs01, i.e. loading all MSRs before nested state is all kinds of bizarre and ideally would not be supported. Sadly, some VMMs do exactly that and rely on KVM to make things work. Note, there's still a lurking SMM bug, as propagating vmcs01.GUEST_BNDFGS to vmcs02 across RSM may corrupt L2's BNDCFGS. But KVM's entire VMX+SMM emulation is flawed as SMI+RSM should not toouch _any_ VMCS when use the "default treatment of SMIs", i.e. when not using an SMI Transfer Monitor. Link: https://lore.kernel.org/all/Yobt1XwOfb5M6Dfa@google.com Fixes: 62cf9bd8 ("KVM: nVMX: Fix emulation of VM_ENTRY_LOAD_BNDCFGS") Cc: stable@vger.kernel.org Cc: Lei Wang <lei4.wang@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220614215831.3762138-2-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 02 11月, 2022 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
stable inclusion from stable-v5.10.132 commit eb58fd350a851b5cda9f4c9a2cefb15c7ccf33f3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5YS3T Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=eb58fd350a851b5cda9f4c9a2cefb15c7ccf33f3 -------------------------------- [ Upstream commit 8a414f94 ] 'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally not needed for APIC_DM_REMRD, they're still referenced by __apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize the structure to avoid consuming random stack memory. Fixes: a183b638 ("KVM: x86: make apic_accept_irq tracepoint more generic") Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220708125147.593975-1-vkuznets@redhat.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 26 10月, 2022 2 次提交
-
-
由 Ashish Kalra 提交于
stable inclusion from stable-v5.10.124 commit 401bef1f95de92c3a8c6eece46e02fa88d7285ee category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6E7 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=401bef1f95de92c3a8c6eece46e02fa88d7285ee -------------------------------- commit d22d2474 upstream. For some sev ioctl interfaces, the length parameter that is passed maybe less than or equal to SEV_FW_BLOB_MAX_SIZE, but larger than the data that PSP firmware returns. In this case, kmalloc will allocate memory that is the size of the input rather than the size of the data. Since PSP firmware doesn't fully overwrite the allocated buffer, these sev ioctl interface may return uninitialized kernel slab memory. Reported-by: NAndy Nguyen <theflow@google.com> Suggested-by: NDavid Rientjes <rientjes@google.com> Suggested-by: NPeter Gonda <pgonda@google.com> Cc: kvm@vger.kernel.org Cc: stable@vger.kernel.org Cc: linux-kernel@vger.kernel.org Fixes: eaf78265 ("KVM: SVM: Move SEV code to separate file") Fixes: 2c07ded0 ("KVM: SVM: add support for SEV attestation command") Fixes: 4cfdd47d ("KVM: SVM: Add KVM_SEV SEND_START command") Fixes: d3d1af85 ("KVM: SVM: Add KVM_SEND_UPDATE_DATA command") Fixes: eba04b20 ("KVM: x86: Account a variety of miscellaneous allocations") Signed-off-by: NAshish Kalra <ashish.kalra@amd.com> Reviewed-by: NPeter Gonda <pgonda@google.com> Message-Id: <20220516154310.3685678-1-Ashish.Kalra@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [sudip: adjust context] Signed-off-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.124 commit d6be031a2f5e27f27f3648bac98d2a35874eaddc category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6E7 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=d6be031a2f5e27f27f3648bac98d2a35874eaddc -------------------------------- commit eba04b20 upstream. Switch to GFP_KERNEL_ACCOUNT for a handful of allocations that are clearly associated with a single task/VM. Note, there are a several SEV allocations that aren't accounted, but those can (hopefully) be fixed by using the local stack for memory. Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20210331023025.2485960-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [sudip: adjust context] Signed-off-by: NSudip Mukherjee <sudipm.mukherjee@gmail.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 27 9月, 2022 4 次提交
-
-
由 Like Xu 提交于
mainline inclusion from mainline-v5.18 commit 75189d1d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5RD6Y CVE: NA ------------- NMI-watchdog is one of the favorite features of kernel developers, but it does not work in AMD guest even with vPMU enabled and worse, the system misrepresents this capability via /proc. This is a PMC emulation error. KVM does not pass the latest valid value to perf_event in time when guest NMI-watchdog is running, thus the perf_event corresponding to the watchdog counter will enter the old state at some point after the first guest NMI injection, forcing the hardware register PMC0 to be constantly written to 0x800000000001. Meanwhile, the running counter should accurately reflect its new value based on the latest coordinated pmc->counter (from vPMC's point of view) rather than the value written directly by the guest. Fixes: 168d918f ("KVM: x86: Adjust counter sample period after a wrmsr") Reported-by: NDongli Cao <caodongli@kingsoft.com> Signed-off-by: NLike Xu <likexu@tencent.com> Reviewed-by: NYanan Wang <wangyanan55@huawei.com> Tested-by: NYanan Wang <wangyanan55@huawei.com> Reviewed-by: NJim Mattson <jmattson@google.com> Message-Id: <20220409015226.38619-1-likexu@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: Nyezengruan <yezengruan@huawei.com> Reviewed-by: NKeqian Zhu <zhukeqian1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Like Xu 提交于
mainline inclusion from mainline-v5.14 commit e79f49c3 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I5RD6Y CVE: NA ------------- Based on our observations, after any vm-exit associated with vPMU, there are at least two or more perf interfaces to be called for guest counter emulation, such as perf_event_{pause, read_value, period}(), and each one will {lock, unlock} the same perf_event_ctx. The frequency of calls becomes more severe when guest use counters in a multiplexed manner. Holding a lock once and completing the KVM request operations in the perf context would introduce a set of impractical new interfaces. So we can further optimize the vPMU implementation by avoiding repeated calls to these interfaces in the KVM context for at least one pattern: After we call perf_event_pause() once, the event will be disabled and its internal count will be reset to 0. So there is no need to pause it again or read its value. Once the event is paused, event period will not be updated until the next time it's resumed or reprogrammed. And there is also no need to call perf_event_period twice for a non-running counter, considering the perf_event for a running counter is never paused. Based on this implementation, for the following common usage of sampling 4 events using perf on a 4u8g guest: echo 0 > /proc/sys/kernel/watchdog echo 25 > /proc/sys/kernel/perf_cpu_time_max_percent echo 10000 > /proc/sys/kernel/perf_event_max_sample_rate echo 0 > /proc/sys/kernel/perf_cpu_time_max_percent for i in `seq 1 1 10` do taskset -c 0 perf record \ -e cpu-cycles -e instructions -e branch-instructions -e cache-misses \ /root/br_instr a done the average latency of the guest NMI handler is reduced from 37646.7 ns to 32929.3 ns (~1.14x speed up) on the Intel ICX server. Also, in addition to collecting more samples, no loss of sampling accuracy was observed compared to before the optimization. Signed-off-by: NLike Xu <likexu@tencent.com> Message-Id: <20210728120705.6855-1-likexu@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Acked-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: Nyezengruan <yezengruan@huawei.com> Reviewed-by: NKeqian Zhu <zhukeqian1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.121 commit e690350d3d9f248eb5fa7edb5371878473af1278 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=e690350d3d9f248eb5fa7edb5371878473af1278 -------------------------------- [ Upstream commit 9bd1f0ef ] Clear the IDT vectoring field in vmcs12 on next VM-Exit due to a double or triple fault. Per the SDM, a VM-Exit isn't considered to occur during event delivery if the exit is due to an intercepted double fault or a triple fault. Opportunistically move the default clearing (no event "pending") into the helper so that it's more obvious that KVM does indeed handle this case. Note, the double fault case is worded rather wierdly in the SDM: The original event results in a double-fault exception that causes the VM exit directly. Temporarily ignoring injected events, double faults can _only_ occur if an exception occurs while attempting to deliver a different exception, i.e. there's _always_ an original event. And for injected double fault, while there's no original event, injected events are never subject to interception. Presumably the SDM is calling out that a the vectoring info will be valid if a different exit occurs after a double fault, e.g. if a #PF occurs and is intercepted while vectoring #DF, then the vectoring info will show the double fault. In other words, the clause can simply be read as: The VM exit is caused by a double-fault exception. Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1") Cc: Chenyi Qiang <chenyi.qiang@intel.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220407002315.78092-4-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.121 commit fd7dca68a69befd3de17d59eb604a9908a13657b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6CQ Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fd7dca68a69befd3de17d59eb604a9908a13657b -------------------------------- [ Upstream commit c3634d25 ] Don't modify vmcs12 exit fields except EXIT_REASON and EXIT_QUALIFICATION when performing a nested VM-Exit due to failed VM-Entry. Per the SDM, only the two aformentioned fields are filled and "All other VM-exit information fields are unmodified". Fixes: 4704d0be ("KVM: nVMX: Exiting from L2 to L1") Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220407002315.78092-3-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 20 9月, 2022 1 次提交
-
-
由 Daniel Sneddon 提交于
stable inclusion from stable-v5.10.136 commit 509c2c9fe75ea7493eebbb6bb2f711f37530ae19 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5N1SO CVE: CVE-2022-26373 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=509c2c9fe75ea7493eebbb6bb2f711f37530ae19 -------------------------------- commit 2b129932 upstream. tl;dr: The Enhanced IBRS mitigation for Spectre v2 does not work as documented for RET instructions after VM exits. Mitigate it with a new one-entry RSB stuffing mechanism and a new LFENCE. == Background == Indirect Branch Restricted Speculation (IBRS) was designed to help mitigate Branch Target Injection and Speculative Store Bypass, i.e. Spectre, attacks. IBRS prevents software run in less privileged modes from affecting branch prediction in more privileged modes. IBRS requires the MSR to be written on every privilege level change. To overcome some of the performance issues of IBRS, Enhanced IBRS was introduced. eIBRS is an "always on" IBRS, in other words, just turn it on once instead of writing the MSR on every privilege level change. When eIBRS is enabled, more privileged modes should be protected from less privileged modes, including protecting VMMs from guests. == Problem == Here's a simplification of how guests are run on Linux' KVM: void run_kvm_guest(void) { // Prepare to run guest VMRESUME(); // Clean up after guest runs } The execution flow for that would look something like this to the processor: 1. Host-side: call run_kvm_guest() 2. Host-side: VMRESUME 3. Guest runs, does "CALL guest_function" 4. VM exit, host runs again 5. Host might make some "cleanup" function calls 6. Host-side: RET from run_kvm_guest() Now, when back on the host, there are a couple of possible scenarios of post-guest activity the host needs to do before executing host code: * on pre-eIBRS hardware (legacy IBRS, or nothing at all), the RSB is not touched and Linux has to do a 32-entry stuffing. * on eIBRS hardware, VM exit with IBRS enabled, or restoring the host IBRS=1 shortly after VM exit, has a documented side effect of flushing the RSB except in this PBRSB situation where the software needs to stuff the last RSB entry "by hand". IOW, with eIBRS supported, host RET instructions should no longer be influenced by guest behavior after the host retires a single CALL instruction. However, if the RET instructions are "unbalanced" with CALLs after a VM exit as is the RET in #6, it might speculatively use the address for the instruction after the CALL in #3 as an RSB prediction. This is a problem since the (untrusted) guest controls this address. Balanced CALL/RET instruction pairs such as in step #5 are not affected. == Solution == The PBRSB issue affects a wide variety of Intel processors which support eIBRS. But not all of them need mitigation. Today, X86_FEATURE_RSB_VMEXIT triggers an RSB filling sequence that mitigates PBRSB. Systems setting RSB_VMEXIT need no further mitigation - i.e., eIBRS systems which enable legacy IBRS explicitly. However, such systems (X86_FEATURE_IBRS_ENHANCED) do not set RSB_VMEXIT and most of them need a new mitigation. Therefore, introduce a new feature flag X86_FEATURE_RSB_VMEXIT_LITE which triggers a lighter-weight PBRSB mitigation versus RSB_VMEXIT. The lighter-weight mitigation performs a CALL instruction which is immediately followed by a speculative execution barrier (INT3). This steers speculative execution to the barrier -- just like a retpoline -- which ensures that speculation can never reach an unbalanced RET. Then, ensure this CALL is retired before continuing execution with an LFENCE. In other words, the window of exposure is opened at VM exit where RET behavior is troublesome. While the window is open, force RSB predictions sampling for RET targets to a dead end at the INT3. Close the window with the LFENCE. There is a subset of eIBRS systems which are not vulnerable to PBRSB. Add these systems to the cpu_vuln_whitelist[] as NO_EIBRS_PBRSB. Future systems that aren't vulnerable will set ARCH_CAP_PBRSB_NO. [ bp: Massage, incorporate review comments from Andy Cooper. ] Signed-off-by: NDaniel Sneddon <daniel.sneddon@linux.intel.com> Co-developed-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/include/asm/cpufeatures.h Signed-off-by: NChen Jiahao <chenjiahao16@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 16 9月, 2022 16 次提交
-
-
由 Linus Torvalds 提交于
stable inclusion from stable-v5.10.133 commit 39065d54347fe1395371ad5397df353ef77c8989 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=39065d54347fe1395371ad5397df353ef77c8989 -------------------------------- commit 291073a5 upstream. The recent change to make objtool aware of more symbol relocation types (commit 24ff6525: "objtool: Teach get_alt_entry() about more relocation types") also added another check, and resulted in this objtool warning when building kvm on x86: arch/x86/kvm/emulate.o: warning: objtool: __ex_table+0x4: don't know how to handle reloc symbol type: kvm_fastop_exception The reason seems to be that kvm_fastop_exception() is marked as a global symbol, which causes the relocation to ke kept around for objtool. And at the same time, the kvm_fastop_exception definition (which is done as an inline asm statement) doesn't actually set the type of the global, which then makes objtool unhappy. The minimal fix is to just not mark kvm_fastop_exception as being a global symbol. It's only used in that one compilation unit anyway, so it was always pointless. That's how all the other local exception table labels are done. I'm not entirely happy about the kinds of games that the kvm code plays with doing its own exception handling, and the fact that it confused objtool is most definitely a symptom of the code being a bit too subtle and ad-hoc. But at least this trivial one-liner makes objtool no longer upset about what is going on. Fixes: 24ff6525 ("objtool: Teach get_alt_entry() about more relocation types") Link: https://lore.kernel.org/lkml/CAHk-=wiZwq-0LknKhXN4M+T8jbxn_2i9mcKpO+OaBSSq_Eh7tg@mail.gmail.com/ Cc: Borislav Petkov <bp@suse.de> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: Wanpeng Li <wanpengli@tencent.com> Cc: Jim Mattson <jmattson@google.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
stable inclusion from stable-v5.10.133 commit 8e31dfd6306e7c6bb3758a459c2a6067be807886 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8e31dfd6306e7c6bb3758a459c2a6067be807886 -------------------------------- commit 84e7051c upstream. The return thunk call makes the fastop functions larger, just like IBT does. Consider a 16-byte FASTOP_SIZE when CONFIG_RETHUNK is enabled. Otherwise, functions will be incorrectly aligned and when computing their position for differently sized operators, they will executed in the middle or end of a function, which may as well be an int3, leading to a crash like: [ 36.091116] int3: 0000 [#1] SMP NOPTI [ 36.091119] CPU: 3 PID: 1371 Comm: qemu-system-x86 Not tainted 5.15.0-41-generic #44 [ 36.091120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 [ 36.091121] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm] [ 36.091185] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc [ 36.091186] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202 [ 36.091188] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000 [ 36.091188] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200 [ 36.091189] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002 [ 36.091190] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70 [ 36.091190] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000 [ 36.091191] FS: 00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000 [ 36.091192] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 36.091192] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0 [ 36.091195] PKRU: 55555554 [ 36.091195] Call Trace: [ 36.091197] <TASK> [ 36.091198] ? fastop+0x5a/0xa0 [kvm] [ 36.091222] x86_emulate_insn+0x7b8/0xe90 [kvm] [ 36.091244] x86_emulate_instruction+0x2f4/0x630 [kvm] [ 36.091263] ? kvm_arch_vcpu_load+0x7c/0x230 [kvm] [ 36.091283] ? vmx_prepare_switch_to_host+0xf7/0x190 [kvm_intel] [ 36.091290] complete_emulated_mmio+0x297/0x320 [kvm] [ 36.091310] kvm_arch_vcpu_ioctl_run+0x32f/0x550 [kvm] [ 36.091330] kvm_vcpu_ioctl+0x29e/0x6d0 [kvm] [ 36.091344] ? kvm_vcpu_ioctl+0x120/0x6d0 [kvm] [ 36.091357] ? __fget_files+0x86/0xc0 [ 36.091362] ? __fget_files+0x86/0xc0 [ 36.091363] __x64_sys_ioctl+0x92/0xd0 [ 36.091366] do_syscall_64+0x59/0xc0 [ 36.091369] ? syscall_exit_to_user_mode+0x27/0x50 [ 36.091370] ? do_syscall_64+0x69/0xc0 [ 36.091371] ? syscall_exit_to_user_mode+0x27/0x50 [ 36.091372] ? __x64_sys_writev+0x1c/0x30 [ 36.091374] ? do_syscall_64+0x69/0xc0 [ 36.091374] ? exit_to_user_mode_prepare+0x37/0xb0 [ 36.091378] ? syscall_exit_to_user_mode+0x27/0x50 [ 36.091379] ? do_syscall_64+0x69/0xc0 [ 36.091379] ? do_syscall_64+0x69/0xc0 [ 36.091380] ? do_syscall_64+0x69/0xc0 [ 36.091381] ? do_syscall_64+0x69/0xc0 [ 36.091381] entry_SYSCALL_64_after_hwframe+0x61/0xcb [ 36.091384] RIP: 0033:0x7efdfe6d1aff [ 36.091390] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 36.091391] RSP: 002b:00007efdfce8c460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 36.091393] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007efdfe6d1aff [ 36.091393] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c [ 36.091394] RBP: 0000558f1609e220 R08: 0000558f13fb8190 R09: 00000000ffffffff [ 36.091394] R10: 0000558f16b5e950 R11: 0000000000000246 R12: 0000000000000000 [ 36.091394] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [ 36.091396] </TASK> [ 36.091397] Modules linked in: isofs nls_iso8859_1 kvm_intel joydev kvm input_leds serio_raw sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler drm msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net net_failover crypto_simd ahci xhci_pci cryptd psmouse virtio_blk libahci xhci_pci_renesas failover [ 36.123271] ---[ end trace db3c0ab5a48fabcc ]--- [ 36.123272] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm] [ 36.123319] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc [ 36.123320] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202 [ 36.123321] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000 [ 36.123321] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200 [ 36.123322] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002 [ 36.123322] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70 [ 36.123323] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000 [ 36.123323] FS: 00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000 [ 36.123324] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 36.123325] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0 [ 36.123327] PKRU: 55555554 [ 36.123328] Kernel panic - not syncing: Fatal exception in interrupt [ 36.123410] Kernel Offset: 0x1400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 36.135305] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: aa3d4803 ("x86: Use return-thunk in asm code") Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Co-developed-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: NLinux Kernel Functional Testing <lkft@linaro.org> Message-Id: <20220713171241.184026-1-cascardo@canonical.com> Tested-by: NJack Wang <jinpu.wang@ionos.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> [Backport] KVM: emulate: do not adjust size of fastop and setcc subroutines stable inclusion from stable-v5.10.133 commit 2ef1b06ceacfc6f96f6792126e107e75c0d7bd5d category: bugfix bugzilla: 187378 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2ef1b06ceacfc6f96f6792126e107e75c0d7bd5d -------------------------------- commit 79629181 upstream. Instead of doing complicated calculations to find the size of the subroutines (which are even more complicated because they need to be stringified into an asm statement), just hardcode to 16. It is less dense for a few combinations of IBT/SLS/retbleed, but it has the advantage of being really simple. Cc: stable@vger.kernel.org # 5.15.x: 84e7051c: x86/kvm: fix FASTOP_SIZE when return thunks are enabled Cc: stable@vger.kernel.org Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from stable-v5.10.133 commit b24fdd0f1c3328cf8ee0c518b93a7187f8cee097 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b24fdd0f1c3328cf8ee0c518b93a7187f8cee097 -------------------------------- commit f43b9876 upstream. Do fine-grained Kconfig for all the various retbleed parts. NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> [cascardo: there is no CONFIG_OBJTOOL] [cascardo: objtool calling and option parsing has changed] Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> [bwh: Backported to 5.10: - In scripts/Makefile.build, add the objtool option with an ifdef block, same as for other options - Adjust filename, context] Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/include/asm/disabled-features.h Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Josh Poimboeuf 提交于
stable inclusion from stable-v5.10.133 commit 4d7f72b6e1bc630bec7e4cd51814bc2b092bf153 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=4d7f72b6e1bc630bec7e4cd51814bc2b092bf153 -------------------------------- commit 9756bba2 upstream. Prevent RSB underflow/poisoning attacks with RSB. While at it, add a bunch of comments to attempt to document the current state of tribal knowledge about RSB attacks and what exactly is being mitigated. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Josh Poimboeuf 提交于
stable inclusion from stable-v5.10.133 commit 47ae76fb27398e867980d63789058ff7c4f12a35 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=47ae76fb27398e867980d63789058ff7c4f12a35 -------------------------------- commit bea7e31a upstream. For legacy IBRS to work, the IBRS bit needs to be always re-written after vmexit, even if it's already on. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Josh Poimboeuf 提交于
stable inclusion from stable-v5.10.133 commit 5269be9111e2b66572e78647f2e8948f7fc96466 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=5269be9111e2b66572e78647f2e8948f7fc96466 -------------------------------- commit fc02735b upstream. On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/kvm/vmx/vmx.h Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Josh Poimboeuf 提交于
stable inclusion from stable-v5.10.133 commit 84061fff2ad98a7809f00e88a54f584f84830388 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=84061fff2ad98a7809f00e88a54f584f84830388 -------------------------------- commit bb066506 upstream. Convert __vmx_vcpu_run()'s 'launched' argument to 'flags', in preparation for doing SPEC_CTRL handling immediately after vmexit, which will need another flag. This is much easier than adding a fourth argument, because this code supports both 32-bit and 64-bit, and the fourth argument on 32-bit would have to be pushed on the stack. Note that __vmx_vcpu_run_flags() is called outside of the noinstr critical section because it will soon start calling potentially traceable functions. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/kvm/vmx/vmx.h Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Josh Poimboeuf 提交于
stable inclusion from stable-v5.10.133 commit 07401c2311f6fddd3c49a392eafc2c28a899f768 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=07401c2311f6fddd3c49a392eafc2c28a899f768 -------------------------------- commit 8bd200d2 upstream. Move the vmx_vm{enter,exit}() functionality into __vmx_vcpu_run(). This will make it easier to do the spec_ctrl handling before the first RET. Signed-off-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> [cascardo: remove ENDBR] Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from stable-v5.10.133 commit df748593c55389892902aecb8691080ad5e8cff5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=df748593c55389892902aecb8691080ad5e8cff5 -------------------------------- commit a149180f upstream. Note: needs to be in a section distinct from Retpolines such that the Retpoline RET substitution cannot possibly use immediate jumps. ORC unwinding for zen_untrain_ret() and __x86_return_thunk() is a little tricky but works due to the fact that zen_untrain_ret() doesn't have any stack ops and as such will emit a single ORC entry at the start (+0x3f). Meanwhile, unwinding an IP, including the __x86_return_thunk() one (+0x40) will search for the largest ORC entry smaller or equal to the IP, these will find the one ORC entry (+0x3f) and all works. [ Alexandre: SVM part. ] [ bp: Build fix, massages. ] Suggested-by: NAndrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> [cascardo: conflicts at arch/x86/entry/entry_64_compat.S] [cascardo: there is no ANNOTATE_NOENDBR] [cascardo: objtool commit 34c861e8 missing] [cascardo: conflict fixup] Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> [bwh: Backported to 5.10: SEV-ES is not supported, so drop the change in arch/x86/kvm/svm/vmenter.S] Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/include/asm/cpufeatures.h Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from stable-v5.10.133 commit ee4996f07d868ee6cc7e76151dfab9a2344cdeb0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ee4996f07d868ee6cc7e76151dfab9a2344cdeb0 -------------------------------- commit af2e140f upstream. Prepare the SETcc fastop stuff for when RET can be larger still. The tricky bit here is that the expressions should not only be constant C expressions, but also absolute GAS expressions. This means no ?: and 'true' is ~0. Also ensure em_setcc() has the same alignment as the actual FOP_SETCC() ops, this ensures there cannot be an alignment hole between em_setcc() and the first op. Additionally, add a .skip directive to the FOP_SETCC() macro to fill any remaining space with INT3 traps; however the primary purpose of this directive is to generate AS warnings when the remaining space goes negative. Which is a very good indication the alignment magic went side-ways. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NJosh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: NBorislav Petkov <bp@suse.de> [cascardo: ignore ENDBR when computing SETCC_LENGTH] [cascardo: conflict fixup] Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from stable-v5.10.133 commit 7070bbb66c5303117e4c7651711ea7daae4c64b5 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=7070bbb66c5303117e4c7651711ea7daae4c64b5 -------------------------------- commit 742ab6df upstream. The recent mmio_stale_data fixes broke the noinstr constraints: vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x15b: call to wrmsrl.constprop.0() leaves .noinstr.text section vmlinux.o: warning: objtool: vmx_vcpu_enter_exit+0x1bf: call to kvm_arch_has_assigned_device() leaves .noinstr.text section make it all happy again. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Borislav Petkov 提交于
stable inclusion from stable-v5.10.133 commit bef21f88b47e399a76276ef1620fb816a0cc4e83 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=bef21f88b47e399a76276ef1620fb816a0cc4e83 -------------------------------- commit fe83f5ea upstream. The commit in Fixes started adding INT3 after RETs as a mitigation against straight-line speculation. The fastop SETcc implementation in kvm's insn emulator uses macro magic to generate all possible SETcc functions and to jump to them when emulating the respective instruction. However, it hardcodes the size and alignment of those functions to 4: a three-byte SETcc insn and a single-byte RET. BUT, with SLS, there's an INT3 that gets slapped after the RET, which brings the whole scheme out of alignment: 15: 0f 90 c0 seto %al 18: c3 ret 19: cc int3 1a: 0f 1f 00 nopl (%rax) 1d: 0f 91 c0 setno %al 20: c3 ret 21: cc int3 22: 0f 1f 00 nopl (%rax) 25: 0f 92 c0 setb %al 28: c3 ret 29: cc int3 and this explodes like this: int3: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 2435 Comm: qemu-system-x86 Not tainted 5.17.0-rc8-sls #1 Hardware name: Dell Inc. Precision WorkStation T3400 /0TP412, BIOS A14 04/30/2012 RIP: 0010:setc+0x5/0x8 [kvm] Code: 00 00 0f 1f 00 0f b6 05 43 24 06 00 c3 cc 0f 1f 80 00 00 00 00 0f 90 c0 c3 cc 0f \ 1f 00 0f 91 c0 c3 cc 0f 1f 00 0f 92 c0 c3 cc <0f> 1f 00 0f 93 c0 c3 cc 0f 1f 00 \ 0f 94 c0 c3 cc 0f 1f 00 0f 95 c0 Call Trace: <TASK> ? x86_emulate_insn [kvm] ? x86_emulate_instruction [kvm] ? vmx_handle_exit [kvm_intel] ? kvm_arch_vcpu_ioctl_run [kvm] ? kvm_vcpu_ioctl [kvm] ? __x64_sys_ioctl ? do_syscall_64 ? entry_SYSCALL_64_after_hwframe </TASK> Raise the alignment value when SLS is enabled and use a macro for that instead of hard-coding naked numbers. Fixes: e463a09a ("x86: Add straight-line-speculation mitigation") Reported-by: NJamie Heilman <jamie@audible.transient.net> Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Tested-by: NJamie Heilman <jamie@audible.transient.net> Link: https://lore.kernel.org/r/YjGzJwjrvxg5YZ0Z@audible.transient.net [Add a comment and a bit of safety checking, since this is going to be changed again for IBT support. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from stable-v5.10.133 commit 277f4ddc36c578691678b8ae59b60d76ad15fa1b category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=277f4ddc36c578691678b8ae59b60d76ad15fa1b -------------------------------- commit b17c2baa upstream. Replace all ret/retq instructions with ASM_RET in preparation of making it more than a single instruction. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.964635458@infradead.orgSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> [bwh: Backported to 5.10: adjust context] Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from stable-v5.10.133 commit 3c91e2257622c169bf74d587e48b96923285ed74 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=3c91e2257622c169bf74d587e48b96923285ed74 -------------------------------- commit f94909ce upstream. Replace all ret/retq instructions with RET in preparation of making RET a macro. Since AS is case insensitive it's a big no-op without RET defined. find arch/x86/ -name \*.S | while read file do sed -i 's/\<ret[q]*\>/RET/' $file done Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211204134907.905503893@infradead.org [bwh: Backported to 5.10: ran the above command] Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/lib/copy_user_64.S Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Uros Bizjak 提交于
stable inclusion from stable-v5.10.133 commit dd87aa5f610be44f195cf5a99b7bc153faf30a3d category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=dd87aa5f610be44f195cf5a99b7bc153faf30a3d -------------------------------- commit 150f17bf upstream. Replace inline assembly in nested_vmx_check_vmentry_hw with a call to __vmx_vcpu_run. The function is not performance critical, so (double) GPR save/restore in __vmx_vcpu_run can be tolerated, as far as performance effects are concerned. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Reviewed-and-tested-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NUros Bizjak <ubizjak@gmail.com> [sean: dropped versioning info from changelog] Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20201231002702.22237077-5-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> conflict: arch/x86/kvm/vmx/vmx.h Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Uros Bizjak 提交于
stable inclusion from stable-v5.10.133 commit 0ca2ba6e4d139da809061a25626174f812303b7a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5PTAS CVE: CVE-2022-29900,CVE-2022-23816,CVE-2022-29901 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0ca2ba6e4d139da809061a25626174f812303b7a -------------------------------- commit 6c44221b upstream. Saves one byte in __vmx_vcpu_run for the same functionality. Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: NUros Bizjak <ubizjak@gmail.com> Message-Id: <20201029140457.126965-1-ubizjak@gmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NBen Hutchings <ben@decadent.org.uk> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NLin Yujun <linyujun809@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 14 9月, 2022 1 次提交
-
-
由 Paolo Bonzini 提交于
mainline inclusion from mainline-v5.19-rc2 commit 6cd88243 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5PJ7H CVE: CVE-2022-39189 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=6cd88243c7e03845a450795e134b488fc2afb736 ---------------------------------------- If a vCPU is outside guest mode and is scheduled out, it might be in the process of making a memory access. A problem occurs if another vCPU uses the PV TLB flush feature during the period when the vCPU is scheduled out, and a virtual address has already been translated but has not yet been accessed, because this is equivalent to using a stale TLB entry. To avoid this, only report a vCPU as preempted if sure that the guest is at an instruction boundary. A rescheduling request will be delivered to the host physical CPU as an external interrupt, so for simplicity consider any vmexit *not* instruction boundary except for external interrupts. It would in principle be okay to report the vCPU as preempted also if it is sleeping in kvm_vcpu_block(): a TLB flush IPI will incur the vmentry/vmexit overhead unnecessarily, and optimistic spinning is also unlikely to succeed. However, leave it for later because right now kvm_vcpu_check_block() is doing memory accesses. Even though the TLB flush issue only applies to virtual memory address, it's very much preferrable to be conservative. Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> conflict: arch/x86/kvm/x86.c Signed-off-by: NGuo Mengqi <guomengqi3@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: Nyezengruan <yezengruan@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 30 8月, 2022 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
stable inclusion from stable-v5.10.119 commit 74c6e5d584354c6126f1231667f9d8e85d7f536f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6BB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=74c6e5d584354c6126f1231667f9d8e85d7f536f -------------------------------- commit 2f15d027 upstream. Async PF 'page ready' event may happen when LAPIC is (temporary) disabled. In particular, Sebastien reports that when Linux kernel is directly booted by Cloud Hypervisor, LAPIC is 'software disabled' when APF mechanism is initialized. On initialization KVM tries to inject 'wakeup all' event and puts the corresponding token to the slot. It is, however, failing to inject an interrupt (kvm_apic_set_irq() -> __apic_accept_irq() -> !apic_enabled()) so the guest never gets notified and the whole APF mechanism gets stuck. The same issue is likely to happen if the guest temporary disables LAPIC and a previously unavailable page becomes available. Do two things to resolve the issue: - Avoid dequeuing 'page ready' events from APF queue when LAPIC is disabled. - Trigger an attempt to deliver pending 'page ready' events when LAPIC becomes enabled (SPIV or MSR_IA32_APICBASE). Reported-by: NSebastien Boeuf <sebastien.boeuf@intel.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210422092948.568327-1-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Guoqing: backport to 5.10-stable ] Signed-off-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 17 8月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.118 commit b49bc8d615eea1043cb238318af04f8876e846b3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L686 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b49bc8d615eea1043cb238318af04f8876e846b3 -------------------------------- commit b28cb0cd upstream. When zapping obsolete pages, update the running count of zapped pages regardless of whether or not the list has become unstable due to zapping a shadow page with its own child shadow pages. If the VM is backed by mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping the batch count and thus without yielding. In the worst case scenario, this can cause a soft lokcup. watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [dirty_log_perf_:13020] RIP: 0010:workingset_activation+0x19/0x130 mark_page_accessed+0x266/0x2e0 kvm_set_pfn_accessed+0x31/0x40 mmu_spte_clear_track_bits+0x136/0x1c0 drop_spte+0x1a/0xc0 mmu_page_zap_pte+0xef/0x120 __kvm_mmu_prepare_zap_page+0x205/0x5e0 kvm_mmu_zap_all_fast+0xd7/0x190 kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10 kvm_page_track_flush_slot+0x5c/0x80 kvm_arch_flush_shadow_memslot+0xe/0x10 kvm_set_memslot+0x1a8/0x5d0 __kvm_set_memory_region+0x337/0x590 kvm_vm_ioctl+0xb08/0x1040 Fixes: fbb158cb ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""") Reported-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NBen Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220511145122.3133334-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 02 8月, 2022 5 次提交
-
-
由 Wanpeng Li 提交于
stable inclusion from stable-v5.10.115 commit 43dbc3edada66eaa9d406ea9647359e795288ea0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=43dbc3edada66eaa9d406ea9647359e795288ea0 -------------------------------- [ Upstream commit 1714a4eb ] As commit 0c5f81da ("KVM: LAPIC: Inject timer interrupt via posted interrupt") mentioned that the host admin should well tune the guest setup, so that vCPUs are placed on isolated pCPUs, and with several pCPUs surplus for *busy* housekeeping. In this setup, it is preferrable to disable mwait/hlt/pause vmexits to keep the vCPUs in non-root mode. However, if only some guests isolated and others not, they would not have any benefit from posted timer interrupts, and at the same time lose VMX preemption timer fast paths because kvm_can_post_timer_interrupt() returns true and therefore forces kvm_can_use_hv_timer() to false. By guaranteeing that posted-interrupt timer is only used if MWAIT or HLT are done without vmexit, KVM can make a better choice and use the VMX preemption timer and the corresponding fast paths. Reported-by: NAili Yao <yaoaili@kingsoft.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1643112538-36743-1-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Paolo Bonzini 提交于
stable inclusion from stable-v5.10.115 commit 9c8474fa3477395e754687b3edb17ca0dfe46d6a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9c8474fa3477395e754687b3edb17ca0dfe46d6a -------------------------------- [ Upstream commit 9191b8f0 ] WARN and bail if KVM attempts to free a root that isn't backed by a shadow page. KVM allocates a bare page for "special" roots, e.g. when using PAE paging or shadowing 2/3/4-level page tables with 4/5-level, and so root_hpa will be valid but won't be backed by a shadow page. It's all too easy to blindly call mmu_free_root_page() on root_hpa, be nice and WARN instead of crashing KVM and possibly the kernel. Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Paolo Bonzini 提交于
stable inclusion from stable-v5.10.115 commit a474ee5ececc19c764c718c76550c5650e54a3be category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a474ee5ececc19c764c718c76550c5650e54a3be -------------------------------- [ Upstream commit d22a81b3 ] Emulating writes to SELF_IPI with a write to ICR has an unwanted side effect: the value of ICR in vAPIC page gets changed. The lists SELF_IPI as write-only, with no associated MMIO offset, so any write should have no visible side effect in the vAPIC page. Reported-by: NChao Gao <chao.gao@intel.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Sandipan Das 提交于
stable inclusion from stable-v5.10.115 commit 599fc32e7421d7a591ee78bb4f3b79c54d51065c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=599fc32e7421d7a591ee78bb4f3b79c54d51065c -------------------------------- [ Upstream commit 5a1bde46 ] On some x86 processors, CPUID leaf 0xA provides information on Architectural Performance Monitoring features. It advertises a PMU version which Qemu uses to determine the availability of additional MSRs to manage the PMCs. Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for the same, the kernel constructs return values based on the x86_pmu_capability irrespective of the vendor. This leaf and the additional MSRs are not supported on AMD and Hygon processors. If AMD PerfMonV2 is detected, the PMU version is set to 2 and guest startup breaks because of an attempt to access a non-existent MSR. Return zeros to avoid this. Fixes: a6c06ed1 ("KVM: Expose the architectural performance monitoring CPUID leaf") Reported-by: NVasant Hegde <vasant.hegde@amd.com> Signed-off-by: NSandipan Das <sandipan.das@amd.com> Message-Id: <3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Kyle Huey 提交于
stable inclusion from stable-v5.10.115 commit fbb7c61e76017202dedd2020f4d51264f500de68 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fbb7c61e76017202dedd2020f4d51264f500de68 -------------------------------- commit 5eb84932 upstream. Zen renumbered some of the performance counters that correspond to the well known events in perf_hw_id. This code in KVM was never updated for that, so guest that attempt to use counters on Zen that correspond to the pre-Zen perf_hw_id values will silently receive the wrong values. This has been observed in the wild with rr[0] when running in Zen 3 guests. rr uses the retired conditional branch counter 00d1 which is incorrectly recognized by KVM as PERF_COUNT_HW_STALLED_CYCLES_BACKEND. [0] https://rr-project.org/Signed-off-by: NKyle Huey <me@kylehuey.com> Message-Id: <20220503050136.86298-1-khuey@kylehuey.com> Cc: stable@vger.kernel.org [Check guest family, not host. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 19 7月, 2022 1 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.112 commit 342454231ee5f2c2782f5510cab2e7a968486fef category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5HL0X Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=342454231ee5f2c2782f5510cab2e7a968486fef -------------------------------- commit 1d0e8480 upstream. Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as -1 is technically undefined behavior when its value is read out by param_get_bool(), as boolean values are supposed to be '0' or '1'. Alternatively, KVM could define a custom getter for the param, but the auto value doesn't depend on the vendor module in any way, and printing "auto" would be unnecessarily unfriendly to the user. In addition to fixing the undefined behavior, resolving the auto value also fixes the scenario where the auto value resolves to N and no vendor module is loaded. Previously, -1 would result in Y being printed even though KVM would ultimately disable the mitigation. Rename the existing MMU module init/exit helpers to clarify that they're invoked with respect to the vendor module, and add comments to document why KVM has two separate "module init" flows. ========================================================================= UBSAN: invalid-load in kernel/params.c:320:33 load of value 255 is not a valid value for type '_Bool' CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 ubsan_epilogue+0x5/0x40 __ubsan_handle_load_invalid_value.cold+0x43/0x48 param_get_bool.cold+0xf/0x14 param_attr_show+0x55/0x80 module_attr_show+0x1c/0x30 sysfs_kf_seq_show+0x93/0xc0 seq_read_iter+0x11c/0x450 new_sync_read+0x11b/0x1a0 vfs_read+0xf0/0x190 ksys_read+0x5f/0xe0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ========================================================================= Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Reported-by: NBruno Goncalves <bgoncalv@redhat.com> Reported-by: NJan Stancek <jstancek@redhat.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220331221359.3912754-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 14 7月, 2022 2 次提交
-
-
由 Hou Wenlong 提交于
stable inclusion from stable-v5.10.111 commit a24479c5e9f44c2a799ca2178b1ba9fe1290706e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a24479c5e9f44c2a799ca2178b1ba9fe1290706e -------------------------------- [ Upstream commit a836839c ] When RDTSCP is supported but RDPID is not supported in host, RDPID emulation is available. However, __kvm_get_msr() would only fail when RDTSCP/RDPID both are disabled in guest, so the emulator wouldn't inject a #UD when RDPID is disabled but RDTSCP is enabled in guest. Fixes: fb6d4d34 ("KVM: x86: emulate RDPID") Signed-off-by: NHou Wenlong <houwenlong.hwl@antgroup.com> Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Jim Mattson 提交于
stable inclusion from stable-v5.10.111 commit 66b0fa6b22187caf4d789bdac8dcca49940628b0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=66b0fa6b22187caf4d789bdac8dcca49940628b0 -------------------------------- [ Upstream commit 9b026073 ] AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some reserved bits are cleared, and some are not. Specifically, on Zen3/Milan, bits 19 and 42 are not cleared. When emulating such a WRMSR, KVM should not synthesize a #GP, regardless of which bits are set. However, undocumented bits should not be passed through to the hardware MSR. So, rather than checking for reserved bits and synthesizing a #GP, just clear the reserved bits. This may seem pedantic, but since KVM currently does not support the "Host/Guest Only" bits (41:40), it is necessary to clear these bits rather than synthesizing #GP, because some popular guests (e.g Linux) will set the "Host Only" bit even on CPUs that don't support EFER.SVME, and they don't expect a #GP. For example, root@Ubuntu1804:~# perf stat -e r26 -a sleep 1 Performance counter stats for 'system wide': 0 r26 1.001070977 seconds time elapsed Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90 Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: NLotus Fenn <lotusf@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NLike Xu <likexu@tencent.com> Reviewed-by: NDavid Dunn <daviddunn@google.com> Message-Id: <20220226234131.2167175-1-jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 05 7月, 2022 3 次提交
-
-
由 Pawan Gupta 提交于
stable inclusion from stable-v5.10.123 commit bde15fdcce44956278b4f50680b7363ca126ffb9 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=bde15fdcce44956278b4f50680b7363ca126ffb9 -------------------------------- commit 027bbb88 upstream The enumeration of MD_CLEAR in CPUID(EAX=7,ECX=0).EDX{bit 10} is not an accurate indicator on all CPUs of whether the VERW instruction will overwrite fill buffers. FB_CLEAR enumeration in IA32_ARCH_CAPABILITIES{bit 17} covers the case of CPUs that are not vulnerable to MDS/TAA, indicating that microcode does overwrite fill buffers. Guests running in VMM environments may not be aware of all the capabilities/vulnerabilities of the host CPU. Specifically, a guest may apply MDS/TAA mitigations when a virtual CPU is enumerated as vulnerable to MDS/TAA even when the physical CPU is not. On CPUs that enumerate FB_CLEAR_CTRL the VMM may set FB_CLEAR_DIS to skip overwriting of fill buffers by the VERW instruction. This is done by setting FB_CLEAR_DIS during VMENTER and resetting on VMEXIT. For guests that enumerate FB_CLEAR (explicitly asking for fill buffer clear capability) the VMM will not use FB_CLEAR_DIS. Irrespective of guest state, host overwrites CPU buffers before VMENTER to protect itself from an MMIO capable guest, as part of mitigation for MMIO Stale Data vulnerabilities. Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Conflicts: arch/x86/kvm/vmx/vmx.h Signed-off-by: NYipeng Zou <zouyipeng@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Pawan Gupta 提交于
stable inclusion from stable-v5.10.123 commit 26f6f231f6a5a79ccc274967939b22602dec76e8 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5D5RS CVE: CVE-2022-21123,CVE-2022-21125,CVE-2022-21166 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=linux-5.10.y&id=26f6f231f6a5a79ccc274967939b22602dec76e8 -------------------------------- commit 8cb861e9 upstream Processor MMIO Stale Data is a class of vulnerabilities that may expose data after an MMIO operation. For details please refer to Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst. These vulnerabilities are broadly categorized as: Device Register Partial Write (DRPW): Some endpoint MMIO registers incorrectly handle writes that are smaller than the register size. Instead of aborting the write or only copying the correct subset of bytes (for example, 2 bytes for a 2-byte write), more bytes than specified by the write transaction may be written to the register. On some processors, this may expose stale data from the fill buffers of the core that created the write transaction. Shared Buffers Data Sampling (SBDS): After propagators may have moved data around the uncore and copied stale data into client core fill buffers, processors affected by MFBDS can leak data from the fill buffer. Shared Buffers Data Read (SBDR): It is similar to Shared Buffer Data Sampling (SBDS) except that the data is directly read into the architectural software-visible state. An attacker can use these vulnerabilities to extract data from CPU fill buffers using MDS and TAA methods. Mitigate it by clearing the CPU fill buffers using the VERW instruction before returning to a user or a guest. On CPUs not affected by MDS and TAA, user application cannot sample data from CPU fill buffers using MDS or TAA. A guest with MMIO access can still use DRPW or SBDR to extract data architecturally. Mitigate it with VERW instruction to clear fill buffers before VMENTER for MMIO capable guests. Add a kernel parameter mmio_stale_data={off|full|full,nosmt} to control the mitigation. Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYipeng Zou <zouyipeng@huawei.com> Reviewed-by: NZhang Jianhua <chris.zjh@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yi Wang 提交于
stable inclusion from stable-v5.10.110 commit 0fb470eb48892e131d10aa3be6915239e65758f3 bugzilla: https://gitee.com/openeuler/kernel/issues/I574AL Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=0fb470eb48892e131d10aa3be6915239e65758f3 -------------------------------- commit a80ced6e upstream. As guest_irq is coming from KVM_IRQFD API call, it may trigger crash in svm_update_pi_irte() due to out-of-bounds: crash> bt PID: 22218 TASK: ffff951a6ad74980 CPU: 73 COMMAND: "vcpu8" #0 [ffffb1ba6707fa40] machine_kexec at ffffffff8565b397 #1 [ffffb1ba6707fa90] __crash_kexec at ffffffff85788a6d #2 [ffffb1ba6707fb58] crash_kexec at ffffffff8578995d #3 [ffffb1ba6707fb70] oops_end at ffffffff85623c0d #4 [ffffb1ba6707fb90] no_context at ffffffff856692c9 #5 [ffffb1ba6707fbf8] exc_page_fault at ffffffff85f95b51 #6 [ffffb1ba6707fc50] asm_exc_page_fault at ffffffff86000ace [exception RIP: svm_update_pi_irte+227] RIP: ffffffffc0761b53 RSP: ffffb1ba6707fd08 RFLAGS: 00010086 RAX: ffffb1ba6707fd78 RBX: ffffb1ba66d91000 RCX: 0000000000000001 RDX: 00003c803f63f1c0 RSI: 000000000000019a RDI: ffffb1ba66db2ab8 RBP: 000000000000019a R8: 0000000000000040 R9: ffff94ca41b82200 R10: ffffffffffffffcf R11: 0000000000000001 R12: 0000000000000001 R13: 0000000000000001 R14: ffffffffffffffcf R15: 000000000000005f ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 #7 [ffffb1ba6707fdb8] kvm_irq_routing_update at ffffffffc09f19a1 [kvm] #8 [ffffb1ba6707fde0] kvm_set_irq_routing at ffffffffc09f2133 [kvm] #9 [ffffb1ba6707fe18] kvm_vm_ioctl at ffffffffc09ef544 [kvm] RIP: 00007f143c36488b RSP: 00007f143a4e04b8 RFLAGS: 00000246 RAX: ffffffffffffffda RBX: 00007f05780041d0 RCX: 00007f143c36488b RDX: 00007f05780041d0 RSI: 000000004008ae6a RDI: 0000000000000020 RBP: 00000000000004e8 R8: 0000000000000008 R9: 00007f05780041e0 R10: 00007f0578004560 R11: 0000000000000246 R12: 00000000000004e0 R13: 000000000000001a R14: 00007f1424001c60 R15: 00007f0578003bc0 ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b Vmx have been fix this in commit 3a8b0677 (KVM: VMX: Do not BUG() on out-of-bounds guest IRQ), so we can just copy source from that to fix this. Co-developed-by: NYi Liu <liu.yi24@zte.com.cn> Signed-off-by: NYi Liu <liu.yi24@zte.com.cn> Signed-off-by: NYi Wang <wang.yi59@zte.com.cn> Message-Id: <20220309113025.44469-1-wang.yi59@zte.com.cn> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYu Liao <liaoyu15@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-