- 01 9月, 2022 1 次提交
-
-
由 Vitaly Kuznetsov 提交于
stable inclusion from stable-v5.10.119 commit 74c6e5d584354c6126f1231667f9d8e85d7f536f category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L6BB Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=74c6e5d584354c6126f1231667f9d8e85d7f536f -------------------------------- commit 2f15d027 upstream. Async PF 'page ready' event may happen when LAPIC is (temporary) disabled. In particular, Sebastien reports that when Linux kernel is directly booted by Cloud Hypervisor, LAPIC is 'software disabled' when APF mechanism is initialized. On initialization KVM tries to inject 'wakeup all' event and puts the corresponding token to the slot. It is, however, failing to inject an interrupt (kvm_apic_set_irq() -> __apic_accept_irq() -> !apic_enabled()) so the guest never gets notified and the whole APF mechanism gets stuck. The same issue is likely to happen if the guest temporary disables LAPIC and a previously unavailable page becomes available. Do two things to resolve the issue: - Avoid dequeuing 'page ready' events from APF queue when LAPIC is disabled. - Trigger an attempt to deliver pending 'page ready' events when LAPIC becomes enabled (SPIV or MSR_IA32_APICBASE). Reported-by: NSebastien Boeuf <sebastien.boeuf@intel.com> Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com> Message-Id: <20210422092948.568327-1-vkuznets@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> [Guoqing: backport to 5.10-stable ] Signed-off-by: NGuoqing Jiang <guoqing.jiang@linux.dev> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 17 8月, 2022 3 次提交
-
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.118 commit b49bc8d615eea1043cb238318af04f8876e846b3 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L686 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=b49bc8d615eea1043cb238318af04f8876e846b3 -------------------------------- commit b28cb0cd upstream. When zapping obsolete pages, update the running count of zapped pages regardless of whether or not the list has become unstable due to zapping a shadow page with its own child shadow pages. If the VM is backed by mostly 4kb pages, KVM can zap an absurd number of SPTEs without bumping the batch count and thus without yielding. In the worst case scenario, this can cause a soft lokcup. watchdog: BUG: soft lockup - CPU#12 stuck for 22s! [dirty_log_perf_:13020] RIP: 0010:workingset_activation+0x19/0x130 mark_page_accessed+0x266/0x2e0 kvm_set_pfn_accessed+0x31/0x40 mmu_spte_clear_track_bits+0x136/0x1c0 drop_spte+0x1a/0xc0 mmu_page_zap_pte+0xef/0x120 __kvm_mmu_prepare_zap_page+0x205/0x5e0 kvm_mmu_zap_all_fast+0xd7/0x190 kvm_mmu_invalidate_zap_pages_in_memslot+0xe/0x10 kvm_page_track_flush_slot+0x5c/0x80 kvm_arch_flush_shadow_memslot+0xe/0x10 kvm_set_memslot+0x1a8/0x5d0 __kvm_set_memory_region+0x337/0x590 kvm_vm_ioctl+0xb08/0x1040 Fixes: fbb158cb ("KVM: x86/mmu: Revert "Revert "KVM: MMU: zap pages in batch""") Reported-by: NDavid Matlack <dmatlack@google.com> Reviewed-by: NBen Gardon <bgardon@google.com> Cc: stable@vger.kernel.org Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220511145122.3133334-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Peter Zijlstra 提交于
stable inclusion from stable-v5.10.118 commit a59450656bcda7fbee9f892d5a65715ad846ce29 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L686 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a59450656bcda7fbee9f892d5a65715ad846ce29 -------------------------------- [ Upstream commit 4327d168 ] The chacha_Nblock_xor_avx512vl() functions all have their own, identical, .LdoneN label, however in one particular spot {2,4} jump to the 8 version instead of their own. Resulting in: arch/x86/crypto/chacha-x86_64.o: warning: objtool: chacha_2block_xor_avx512vl() falls through to next function chacha_8block_xor_avx512vl() arch/x86/crypto/chacha-x86_64.o: warning: objtool: chacha_4block_xor_avx512vl() falls through to next function chacha_8block_xor_avx512vl() Make each function consistently use its own done label. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NMartin Willi <martin@strongswan.org> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 David Gow 提交于
stable inclusion from stable-v5.10.118 commit ea6a86886caa1ec659dd08b38ca879bacde47123 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5L686 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ea6a86886caa1ec659dd08b38ca879bacde47123 -------------------------------- [ Upstream commit f4f03f29 ] The syscall_handler_t type for x86_64 was defined as 'long (*)(void)', but always cast to 'long (*)(long, long, long, long, long, long)' before use. This now triggers a warning (see below). Define syscall_handler_t as the latter instead, and remove the cast. This simplifies the code, and fixes the warning. Warning: In file included from ../arch/um/include/asm/processor-generic.h:13 from ../arch/x86/um/asm/processor.h:41 from ../include/linux/rcupdate.h:30 from ../include/linux/rculist.h:11 from ../include/linux/pid.h:5 from ../include/linux/sched.h:14 from ../include/linux/ptrace.h:6 from ../arch/um/kernel/skas/syscall.c:7: ../arch/um/kernel/skas/syscall.c: In function ‘handle_syscall’: ../arch/x86/um/shared/sysdep/syscalls_64.h:18:11: warning: cast between incompatible function types from ‘long int (*)(void)’ to ‘long int (*)(long int, long int, long int, long int, long int, long int)’ [ -Wcast-function-type] 18 | (((long (*)(long, long, long, long, long, long)) \ | ^ ../arch/x86/um/asm/ptrace.h:36:62: note: in definition of macro ‘PT_REGS_SET_SYSCALL_RETURN’ 36 | #define PT_REGS_SET_SYSCALL_RETURN(r, res) (PT_REGS_AX(r) = (res)) | ^~~ ../arch/um/kernel/skas/syscall.c:46:33: note: in expansion of macro ‘EXECUTE_SYSCALL’ 46 | EXECUTE_SYSCALL(syscall, regs)); | ^~~~~~~~~~~~~~~ Signed-off-by: NDavid Gow <davidgow@google.com> Signed-off-by: NRichard Weinberger <richard@nod.at> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 10 8月, 2022 1 次提交
-
-
由 Juergen Gross 提交于
stable inclusion from stable-v5.10.132 commit 136d7987fcfdeca73ee3c6a29e48f99fdd0f4d87 category: bugfix bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5JTYM CVE: CVE-2022-36123 Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=136d7987fcfdeca73ee3c6a29e48f99fdd0f4d87 -------------------------------- [ Upstream commit 38fa5479 ] The .brk section has the same properties as .bss: it is an alloc-only section and should be cleared before being used. Not doing so is especially a problem for Xen PV guests, as the hypervisor will validate page tables (check for writable page tables and hypervisor private bits) before accepting them to be used. Make sure .brk is initially zero by letting clear_bss() clear the brk area, too. Signed-off-by: NJuergen Gross <jgross@suse.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220630071441.28576-3-jgross@suse.comSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NGONG, Ruiqi <gongruiqi1@huawei.com> Reviewed-by: NXiu Jianfeng <xiujianfeng@huawei.com> Reviewed-by: NWang Weiyang <wangweiyang2@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 04 8月, 2022 9 次提交
-
-
由 Wanpeng Li 提交于
stable inclusion from stable-v5.10.115 commit 43dbc3edada66eaa9d406ea9647359e795288ea0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=43dbc3edada66eaa9d406ea9647359e795288ea0 -------------------------------- [ Upstream commit 1714a4eb ] As commit 0c5f81da ("KVM: LAPIC: Inject timer interrupt via posted interrupt") mentioned that the host admin should well tune the guest setup, so that vCPUs are placed on isolated pCPUs, and with several pCPUs surplus for *busy* housekeeping. In this setup, it is preferrable to disable mwait/hlt/pause vmexits to keep the vCPUs in non-root mode. However, if only some guests isolated and others not, they would not have any benefit from posted timer interrupts, and at the same time lose VMX preemption timer fast paths because kvm_can_post_timer_interrupt() returns true and therefore forces kvm_can_use_hv_timer() to false. By guaranteeing that posted-interrupt timer is only used if MWAIT or HLT are done without vmexit, KVM can make a better choice and use the VMX preemption timer and the corresponding fast paths. Reported-by: NAili Yao <yaoaili@kingsoft.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Sean Christopherson <seanjc@google.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1643112538-36743-1-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Paolo Bonzini 提交于
stable inclusion from stable-v5.10.115 commit 9c8474fa3477395e754687b3edb17ca0dfe46d6a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9c8474fa3477395e754687b3edb17ca0dfe46d6a -------------------------------- [ Upstream commit 9191b8f0 ] WARN and bail if KVM attempts to free a root that isn't backed by a shadow page. KVM allocates a bare page for "special" roots, e.g. when using PAE paging or shadowing 2/3/4-level page tables with 4/5-level, and so root_hpa will be valid but won't be backed by a shadow page. It's all too easy to blindly call mmu_free_root_page() on root_hpa, be nice and WARN instead of crashing KVM and possibly the kernel. Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Paolo Bonzini 提交于
stable inclusion from stable-v5.10.115 commit a474ee5ececc19c764c718c76550c5650e54a3be category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a474ee5ececc19c764c718c76550c5650e54a3be -------------------------------- [ Upstream commit d22a81b3 ] Emulating writes to SELF_IPI with a write to ICR has an unwanted side effect: the value of ICR in vAPIC page gets changed. The lists SELF_IPI as write-only, with no associated MMIO offset, so any write should have no visible side effect in the vAPIC page. Reported-by: NChao Gao <chao.gao@intel.com> Reviewed-by: NSean Christopherson <seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Wanpeng Li 提交于
stable inclusion from stable-v5.10.115 commit 64e3e16dbc26c88bc182028b53c27df7af40239a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=64e3e16dbc26c88bc182028b53c27df7af40239a -------------------------------- [ Upstream commit 0361bdfd ] MSR_KVM_POLL_CONTROL is cleared on reset, thus reverting guests to host-side polling after suspend/resume. Non-bootstrap CPUs are restored correctly by the haltpoll driver because they are hot-unplugged during suspend and hot-plugged during resume; however, the BSP is not hotpluggable and remains in host-sde polling mode after the guest resume. The makes the guest pay for the cost of vmexits every time the guest enters idle. Fix it by recording BSP's haltpoll state and resuming it during guest resume. Cc: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: NWanpeng Li <wanpengli@tencent.com> Message-Id: <1650267752-46796-1-git-send-email-wanpengli@tencent.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Sandipan Das 提交于
stable inclusion from stable-v5.10.115 commit 599fc32e7421d7a591ee78bb4f3b79c54d51065c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=599fc32e7421d7a591ee78bb4f3b79c54d51065c -------------------------------- [ Upstream commit 5a1bde46 ] On some x86 processors, CPUID leaf 0xA provides information on Architectural Performance Monitoring features. It advertises a PMU version which Qemu uses to determine the availability of additional MSRs to manage the PMCs. Upon receiving a KVM_GET_SUPPORTED_CPUID ioctl request for the same, the kernel constructs return values based on the x86_pmu_capability irrespective of the vendor. This leaf and the additional MSRs are not supported on AMD and Hygon processors. If AMD PerfMonV2 is detected, the PMU version is set to 2 and guest startup breaks because of an attempt to access a non-existent MSR. Return zeros to avoid this. Fixes: a6c06ed1 ("KVM: Expose the architectural performance monitoring CPUID leaf") Reported-by: NVasant Hegde <vasant.hegde@amd.com> Signed-off-by: NSandipan Das <sandipan.das@amd.com> Message-Id: <3fef83d9c2b2f7516e8ff50d60851f29a4bcb716.1651058600.git.sandipan.das@amd.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Kyle Huey 提交于
stable inclusion from stable-v5.10.115 commit fbb7c61e76017202dedd2020f4d51264f500de68 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IZ9C Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fbb7c61e76017202dedd2020f4d51264f500de68 -------------------------------- commit 5eb84932 upstream. Zen renumbered some of the performance counters that correspond to the well known events in perf_hw_id. This code in KVM was never updated for that, so guest that attempt to use counters on Zen that correspond to the pre-Zen perf_hw_id values will silently receive the wrong values. This has been observed in the wild with rr[0] when running in Zen 3 guests. rr uses the retired conditional branch counter 00d1 which is incorrectly recognized by KVM as PERF_COUNT_HW_STALLED_CYCLES_BACKEND. [0] https://rr-project.org/Signed-off-by: NKyle Huey <me@kylehuey.com> Message-Id: <20220503050136.86298-1-khuey@kylehuey.com> Cc: stable@vger.kernel.org [Check guest family, not host. - Paolo] Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Borislav Petkov 提交于
stable inclusion from stable-v5.10.114 commit 2ab14625b879eec22854f1dbe61d51570b427513 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IY1V Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=2ab14625b879eec22854f1dbe61d51570b427513 -------------------------------- commit f9e14dbb upstream. When resuming from system sleep state, restore_processor_state() restores the boot CPU MSRs. These MSRs could be emulated by microcode. If microcode is not loaded yet, writing to emulated MSRs leads to unchecked MSR access error: ... PM: Calling lapic_suspend+0x0/0x210 unchecked MSR access error: WRMSR to 0x10f (tried to write 0x0...0) at rIP: ... (native_write_msr) Call Trace: <TASK> ? restore_processor_state x86_acpi_suspend_lowlevel acpi_suspend_enter suspend_devices_and_enter pm_suspend.cold state_store kobj_attr_store sysfs_kf_write kernfs_fop_write_iter new_sync_write vfs_write ksys_write __x64_sys_write do_syscall_64 entry_SYSCALL_64_after_hwframe RIP: 0033:0x7fda13c260a7 To ensure microcode emulated MSRs are available for restoration, load the microcode on the boot CPU before restoring these MSRs. [ Pawan: write commit message and productize it. ] Fixes: e2a1256b ("x86/speculation: Restore speculation related MSRs during S3 resume") Reported-by: NKyle D. Pelton <kyle.d.pelton@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: NKyle D. Pelton <kyle.d.pelton@intel.com> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=215841 Link: https://lore.kernel.org/r/4350dfbf785cd482d3fafa72b2b49c83102df3ce.1650386317.git.pawan.kumar.gupta@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Mikulas Patocka 提交于
stable inclusion from stable-v5.10.114 commit 80fc45377f410426c8d922db4838760b107970aa category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IY1V Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=80fc45377f410426c8d922db4838760b107970aa -------------------------------- [ Upstream commit a6823e4e ] The first "if" condition in __memcpy_flushcache is supposed to align the "dest" variable to 8 bytes and copy data up to this alignment. However, this condition may misbehave if "size" is greater than 4GiB. The statement min_t(unsigned, size, ALIGN(dest, 8) - dest); casts both arguments to unsigned int and selects the smaller one. However, the cast truncates high bits in "size" and it results in misbehavior. For example: suppose that size == 0x100000001, dest == 0x200000002 min_t(unsigned, size, ALIGN(dest, 8) - dest) == min_t(0x1, 0xe) == 0x1; ... dest += 0x1; so we copy just one byte "and" dest remains unaligned. This patch fixes the bug by replacing unsigned with size_t. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
由 Thomas Gleixner 提交于
stable inclusion from stable-v5.10.114 commit ad604cbd1d54ba9b2944da72f97cd4669c8fcc1a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5IY1V Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=ad604cbd1d54ba9b2944da72f97cd4669c8fcc1a -------------------------------- commit 7e0815b3 upstream. When a XEN_HVM guest uses the XEN PIRQ/Eventchannel mechanism, then PCI/MSI[-X] masking is solely controlled by the hypervisor, but contrary to XEN_PV guests this does not disable PCI/MSI[-X] masking in the PCI/MSI layer. This can lead to a situation where the PCI/MSI layer masks an MSI[-X] interrupt and the hypervisor grants the write despite the fact that it already requested the interrupt. As a consequence interrupt delivery on the affected device is not happening ever. Set pci_msi_ignore_mask to prevent that like it's done for XEN_PV guests already. Fixes: 809f9267 ("xen: map MSIs into pirqs") Reported-by: NJeremi Piotrowski <jpiotrowski@linux.microsoft.com> Reported-by: NDusty Mabe <dustymabe@redhat.com> Reported-by: NSalvatore Bonaccorso <carnil@debian.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Tested-by: NNoah Meyerhans <noahm@debian.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87tuaduxj5.ffs@tglxSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 29 7月, 2022 3 次提交
-
-
由 Fenghua Yu 提交于
mainline inclusion from mainline-v5.12-rc4 commit ef4ae6e4 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5G10C CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=ef4ae6e4 Intel-SIG: commit ef4ae6e4 x86/bus_lock: Set rate limit for bus lock_ -------------------------------- A bus lock can be thousands of cycles slower than atomic operation within one cache line. It also disrupts performance on other cores. Malicious users can generate multiple bus locks to degrade the whole system performance. The current mitigation is to kill the offending process, but for certain scenarios it's desired to identify and throttle the offending application. Add a system wide rate limit for bus locks. When the system detects bus locks at a rate higher than N/sec (where N can be set by the kernel boot argument in the range [1..1000]) any task triggering a bus lock will be forced to sleep for at least 20ms until the overall system rate of bus locks drops below the threshold. Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210419214958.4035512-3-fenghua.yu@intel.com (cherry picked from commit ef4ae6e4) Signed-off-by: NEthan Zhao <haifeng.zhao@linux.intel.com>
-
由 Fenghua Yu 提交于
mainline inclusion from mainline-v5.12-rc4 commit ebb1064e category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5G10C CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=ebb1064e Intel-SIG: commit ebb1064e x86/traps: Handle #DB for bus lock -------------------------------- Bus locks degrade performance for the whole system, not just for the CPU that requested the bus lock. Two CPU features "#AC for split lock" and "#DB for bus lock" provide hooks so that the operating system may choose one of several mitigation strategies. bus lock feature to cover additional situations with new options to mitigate. split_lock_detect= #AC for split lock #DB for bus lock off Do nothing Do nothing warn Kernel OOPs Warn once per task and Warn once per task and and continues to run. disable future checking When both features are supported, warn in #AC fatal Kernel OOPs Send SIGBUS to user. Send SIGBUS to user When both features are supported, fatal in #AC ratelimit:N Do nothing Limit bus lock rate to N per second in the current non-root user. Default option is "warn". Hardware only generates #DB for bus lock detect when CPL>0 to avoid nested #DB from multiple bus locks while the first #DB is being handled. So no need to handle #DB for bus lock detected in the kernel. while #AC for split lock is enabled by split lock detection bit 29 in TEST_CTRL MSR. Both breakpoint and bus lock in the same instruction can trigger one #DB. The bus lock is handled before the breakpoint in the #DB handler. Delivery of #DB for bus lock in userspace clears DR6[11], which is set by the #DB handler right after reading DR6. Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com (cherry picked from commit ebb1064e) Signed-off-by: NEthan Zhao <haifeng.zhao@linux.intel.com>
-
由 Fenghua Yu 提交于
mainline inclusion from mainline-v5.12-rc4 commit f21d4d3b category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5G10C CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/ commit/?id=f21d4d3b Intel-SIG: commit f21d4d3b x86/cpufeatures: Enumerate #DB for bus lock detection -------------------------------- A bus lock is acquired through either a split locked access to writeback (WB) memory or any locked access to non-WB memory. This is typically >1000 cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores. Some CPUs have the ability to notify the kernel by a #DB trap after a user instruction acquires a bus lock and is executed. This allows the kernel to enforce user application throttling or mitigation. Both breakpoint and bus lock can trigger the #DB trap in the same instruction and the ordering of handling them is the kernel #DB handler's choice. The CPU feature flag to be shown in /proc/cpuinfo will be "bus_lock_detect". Signed-off-by: NFenghua Yu <fenghua.yu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NTony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210322135325.682257-2-fenghua.yu@intel.com (cherry picked from commit f21d4d3b) Signed-off-by: NEthan Zhao <haifeng.zhao@linux.intel.com>
-
- 28 7月, 2022 2 次提交
-
-
由 Muchun Song 提交于
mainline inclusion from mainline-v5.19-rc1 commit 47010c04 category: feature bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=47010c040dec8af6347ec6259104fc13f7e7e30a -------------------------------- The word of "free" is not expressive enough to express the feature of optimizing vmemmap pages associated with each HugeTLB, rename this keywork to "optimize". In this patch , cheanup configs to make code more expressive. Link: https://lkml.kernel.org/r/20220404074652.68024-4-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Conflicts: arch/arm64/configs/openeuler_defconfig arch/x86/configs/openeuler_defconfig Documentation/admin-guide/kernel-parameters.txt include/linux/hugetlb.h mm/Makefile Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Muchun Song 提交于
mainline inclusion from mainline-v5.19-rc1 commit 2e4ec02b category: feature bugzilla: 187198, https://gitee.com/openeuler/kernel/issues/I5GVFO CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=2e4ec02bbcc05b8905d65c763ebde6bc85508e90 -------------------------------- The feature of minimizing overhead of struct page associated with each HugeTLB page is implemented on x86_64, however, the infrastructure of this feature is already there, we could easily enable it for other architectures. Introduce ARCH_WANT_HUGETLB_PAGE_FREE_VMEMMAP for other architectures to be easily enabled. Just select this config if they want to enable this feature. Link: https://lkml.kernel.org/r/20220331065640.5777-1-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NBarry Song <baohua@kernel.org> Tested-by: NBarry Song <baohua@kernel.org> Reviewed-by: NAnshuman Khandual <anshuman.khandual@arm.com> Cc: Bodeddula Balasubramaniam <bodeddub@amazon.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Rientjes <rientjes@google.com> Cc: Fam Zheng <fam.zheng@bytedance.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Will Deacon <will@kernel.org> Cc: Xiongchun Duan <duanxiongchun@bytedance.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLiu Shixin <liushixin2@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 26 7月, 2022 1 次提交
-
-
由 Mikulas Patocka 提交于
stable inclusion from stable-v5.10.113 commit 76101c8e0c31938e59bbcb4beed47c8d8b89d39a category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5ISAH Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=76101c8e0c31938e59bbcb4beed47c8d8b89d39a -------------------------------- [ Upstream commit 932aba1e ] struct stat (defined in arch/x86/include/uapi/asm/stat.h) has 32-bit st_dev and st_rdev; struct compat_stat (defined in arch/x86/include/asm/compat.h) has 16-bit st_dev and st_rdev followed by a 16-bit padding. This patch fixes struct compat_stat to match struct stat. [ Historical note: the old x86 'struct stat' did have that 16-bit field that the compat layer had kept around, but it was changes back in 2003 by "struct stat - support larger dev_t": https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=e95b2065677fe32512a597a79db94b77b90c968d and back in those days, the x86_64 port was still new, and separate from the i386 code, and had already picked up the old version with a 16-bit st_dev field ] Note that we can't change compat_dev_t because it is used by compat_loop_info. Also, if the st_dev and st_rdev values are 32-bit, we don't have to use old_valid_dev to test if the value fits into them. This fixes -EOVERFLOW on filesystems that are on NVMe because NVMe uses the major number 259. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Cc: Andreas Schwab <schwab@linux-m68k.org> Cc: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 21 7月, 2022 4 次提交
-
-
由 Youquan Song 提交于
mainline inclusion from mainline-v5.17-rc1 commit 33761363 category: bugfix bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5HAC1 CVE: NA Intel-SIG: commit 33761363 x86/mce: Reduce number of machine checks taken during recovery. Backport for MCA recovery enhancing & bug fix. -------------------------------- When any of the copy functions in arch/x86/lib/copy_user_64.S take a fault, the fixup code copies the remaining byte count from %ecx to %edx and unconditionally jumps to .Lcopy_user_handle_tail to continue the copy in case any more bytes can be copied. If the fault was #PF this may copy more bytes (because the page fault handler might have fixed the fault). But when the fault is a machine check the original copy code will have copied all the way to the poisoned cache line. So .Lcopy_user_handle_tail will just take another machine check for no good reason. Every code path to .Lcopy_user_handle_tail comes from an exception fixup path, so add a check there to check the trap type (in %eax) and simply return the count of remaining bytes if the trap was a machine check. Doing this reduces the number of machine checks taken during synthetic tests from four to three. As well as reducing the number of machine checks, this also allows Skylake generation Xeons to recover some cases that currently fail. The is because REP; MOVSB is only recoverable when source and destination are well aligned and the byte count is large. That useless call to .Lcopy_user_handle_tail may violate one or more of these conditions and generate a fatal machine check. [ Tony: Add more details to commit message. ] [ bp: Fixup comment. Also, another tip patchset which is adding straight-line speculation mitigation changes the "ret" instruction to an all-caps macro "RET". But, since gas is case-insensitive, use "RET" in the newly added asm block already in order to simplify tip branch merging on its way upstream. ] Signed-off-by: NYouquan Song <youquan.song@intel.com> Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/YcTW5dh8yTGucDd+@agluck-desk2.amr.corp.intel.comSigned-off-by: NYouquan Song <youquan.song@intel.com>
-
由 Tony Luck 提交于
mainline inclusion from mainline-v5.16-rc1 commit 69065847 category: bugfix bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5HAC1 CVE: NA Intel-SIG: commit 69065847 x86/mce: Drop copyin special case for #MC. Backport for MCA recovery enhancing & bug fix. -------------------------------- Fixes to the iterator code to handle faults that are not on page boundaries mean that the special case for machine check during copy from user is no longer needed. For a full list of those fixes, see the output of: git log --oneline v5.14 ^v5.13 -- lib/iov_iter.c Intel-SIG: commit 69065847 x86/mce: Drop copyin special case for #MC. backport for MCA recovery enhance and bug fix. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210818002942.1607544-4-tony.luck@intel.comSigned-off-by: NYouquan Song <youquan.song@intel.com>
-
由 Tony Luck 提交于
mainline inclusion from mainline-v5.16-rc1 commit a6e3cf70 category: bugfix bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5HAC1 CVE: NA Intel-SIG: commit a6e3cf70 x86/mce: Change to not send SIGBUS error during copy from user. Backport for MCA recovery enhance and bug fix. -------------------------------- Sending a SIGBUS for a copy from user is not the correct semantic. System calls should return -EFAULT (or a short count for write(2)). Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210818002942.1607544-3-tony.luck@intel.comSigned-off-by: NYouquan Song <youquan.song@intel.com>
-
由 Naoya Horiguchi 提交于
mainline inclusion from mainline-v5.14-rc1 commit a3f5d80e category: bugfix bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5HAC1 CVE: NA Intel-SIG: commit a3f5d80e mm,hwpoison: send SIGBUS with error virutal address. Backport for MCA recovery enhancing & bug fix. -------------------------------- Now an action required MCE in already hwpoisoned address surely sends a SIGBUS to current process, but the SIGBUS doesn't convey error virtual address. That's not optimal for hwpoison-aware applications. To fix the issue, make memory_failure() call kill_accessing_process(), that does pagetable walk to find the error virtual address. It could find multiple virtual addresses for the same error page, and it seems hard to tell which virtual address is correct one. But that's rare and sending incorrect virtual address could be better than no address. So let's report the first found virtual address for now. [naoya.horiguchi@nec.com: fix walk_page_range() return] Link: https://lkml.kernel.org/r/20210603051055.GA244241@hori.linux.bs1.fc.nec.co.jp Link: https://lkml.kernel.org/r/20210521030156.2612074-4-nao.horiguchi@gmail.comSigned-off-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Aili Yao <yaoaili@kingsoft.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Jue Wang <juew@google.com> Cc: Borislav Petkov <bp@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NYouquan Song <youquan.song@intel.com>
-
- 19 7月, 2022 2 次提交
-
-
由 LeoLiuoc 提交于
zhaoxin inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I52DS7 CVE: NA -------------------------------------------- Set CONFIG_SENSORS_ZHAOXIN_CPUTEMP to 'm' by default in openeuler_defconfig. Signed-off-by: NLeoLiuoc <LeoLiu-oc@zhaoxin.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NXiongfeng Wang <wangxiongfeng2@huawei.com>
-
由 Sean Christopherson 提交于
stable inclusion from stable-v5.10.112 commit 342454231ee5f2c2782f5510cab2e7a968486fef category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5HL0X Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=342454231ee5f2c2782f5510cab2e7a968486fef -------------------------------- commit 1d0e8480 upstream. Resolve nx_huge_pages to true/false when kvm.ko is loaded, leaving it as -1 is technically undefined behavior when its value is read out by param_get_bool(), as boolean values are supposed to be '0' or '1'. Alternatively, KVM could define a custom getter for the param, but the auto value doesn't depend on the vendor module in any way, and printing "auto" would be unnecessarily unfriendly to the user. In addition to fixing the undefined behavior, resolving the auto value also fixes the scenario where the auto value resolves to N and no vendor module is loaded. Previously, -1 would result in Y being printed even though KVM would ultimately disable the mitigation. Rename the existing MMU module init/exit helpers to clarify that they're invoked with respect to the vendor module, and add comments to document why KVM has two separate "module init" flows. ========================================================================= UBSAN: invalid-load in kernel/params.c:320:33 load of value 255 is not a valid value for type '_Bool' CPU: 6 PID: 892 Comm: tail Not tainted 5.17.0-rc3+ #799 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: <TASK> dump_stack_lvl+0x34/0x44 ubsan_epilogue+0x5/0x40 __ubsan_handle_load_invalid_value.cold+0x43/0x48 param_get_bool.cold+0xf/0x14 param_attr_show+0x55/0x80 module_attr_show+0x1c/0x30 sysfs_kf_seq_show+0x93/0xc0 seq_read_iter+0x11c/0x450 new_sync_read+0x11b/0x1a0 vfs_read+0xf0/0x190 ksys_read+0x5f/0xe0 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae </TASK> ========================================================================= Fixes: b8e8c830 ("kvm: mmu: ITLB_MULTIHIT mitigation") Cc: stable@vger.kernel.org Reported-by: NBruno Goncalves <bgoncalv@redhat.com> Reported-by: NJan Stancek <jstancek@redhat.com> Signed-off-by: NSean Christopherson <seanjc@google.com> Message-Id: <20220331221359.3912754-1-seanjc@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Acked-by: NXie XiuQi <xiexiuqi@huawei.com>
-
- 18 7月, 2022 6 次提交
-
-
由 Pawan Gupta 提交于
stable inclusion from stable-v5.10.111 commit fc4bdaed4d4ea4209e65115bd3948a1e4ac51cbb category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=fc4bdaed4d4ea4209e65115bd3948a1e4ac51cbb -------------------------------- commit e2a1256b upstream. After resuming from suspend-to-RAM, the MSRs that control CPU's speculative execution behavior are not being restored on the boot CPU. These MSRs are used to mitigate speculative execution vulnerabilities. Not restoring them correctly may leave the CPU vulnerable. Secondary CPU's MSRs are correctly being restored at S3 resume by identify_secondary_cpu(). During S3 resume, restore these MSRs for boot CPU when restoring its processor state. Fixes: 77243971 ("x86/bugs/intel: Set proper CPU features and setup RDS") Reported-by: NNeelima Krishnan <neelima.krishnan@intel.com> Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Tested-by: NNeelima Krishnan <neelima.krishnan@intel.com> Acked-by: NBorislav Petkov <bp@suse.de> Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Pawan Gupta 提交于
stable inclusion from stable-v5.10.111 commit 8c9e26c890ba130ad98137770d221dac7853540c category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=8c9e26c890ba130ad98137770d221dac7853540c -------------------------------- commit 73924ec4 upstream. The mechanism to save/restore MSRs during S3 suspend/resume checks for the MSR validity during suspend, and only restores the MSR if its a valid MSR. This is not optimal, as an invalid MSR will unnecessarily throw an exception for every suspend cycle. The more invalid MSRs, higher the impact will be. Check and save the MSR validity at setup. This ensures that only valid MSRs that are guaranteed to not throw an exception will be attempted during suspend. Fixes: 7a9c2dd0 ("x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume") Suggested-by: NDave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NPawan Gupta <pawan.kumar.gupta@linux.intel.com> Reviewed-by: NDave Hansen <dave.hansen@linux.intel.com> Acked-by: NBorislav Petkov <bp@suse.de> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Nathan Chancellor 提交于
stable inclusion from stable-v5.10.111 commit 9cb90f9ad5975ddfc90d0906b40e9c71c2eca44e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=9cb90f9ad5975ddfc90d0906b40e9c71c2eca44e -------------------------------- [ Upstream commit aaeed6ec ] There are two outstanding issues with CONFIG_X86_X32_ABI and llvm-objcopy, with similar root causes: 1. llvm-objcopy does not properly convert .note.gnu.property when going from x86_64 to x86_x32, resulting in a corrupted section when linking: https://github.com/ClangBuiltLinux/linux/issues/1141 2. llvm-objcopy produces corrupted compressed debug sections when going from x86_64 to x86_x32, also resulting in an error when linking: https://github.com/ClangBuiltLinux/linux/issues/514 After commit 41c5ef31ad71 ("x86/ibt: Base IBT bits"), the .note.gnu.property section is always generated when CONFIG_X86_KERNEL_IBT is enabled, which causes the first issue to become visible with an allmodconfig build: ld.lld: error: arch/x86/entry/vdso/vclock_gettime-x32.o:(.note.gnu.property+0x1c): program property is too short To avoid this error, do not allow CONFIG_X86_X32_ABI to be selected when using llvm-objcopy. If the two issues ever get fixed in llvm-objcopy, this can be turned into a feature check. Signed-off-by: NNathan Chancellor <nathan@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220314194842.3452-3-nathan@kernel.orgSigned-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Dongli Zhang 提交于
stable inclusion from stable-v5.10.111 commit a2a0e04f6478e8c1038db64717f3fafd55de1420 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a2a0e04f6478e8c1038db64717f3fafd55de1420 -------------------------------- [ Upstream commit eed05744 ] The sched_clock() can be used very early since commit 857baa87 ("sched/clock: Enable sched clock early"). In addition, with commit 38669ba2 ("x86/xen/time: Output xen sched_clock time from 0"), kdump kernel in Xen HVM guest may panic at very early stage when accessing &__this_cpu_read(xen_vcpu)->time as in below: setup_arch() -> init_hypervisor_platform() -> x86_init.hyper.init_platform = xen_hvm_guest_init() -> xen_hvm_init_time_ops() -> xen_clocksource_read() -> src = &__this_cpu_read(xen_vcpu)->time; This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info' embedded inside 'shared_info' during early stage until xen_vcpu_setup() is used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address. However, when Xen HVM guest panic on vcpu >= 32, since xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic. This patch calls xen_hvm_init_time_ops() again later in xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is registered when the boot vcpu is >= 32. This issue can be reproduced on purpose via below command at the guest side when kdump/kexec is enabled: "taskset -c 33 echo c > /proc/sysrq-trigger" The bugfix for PVM is not implemented due to the lack of testing environment. [boris: xen_hvm_init_time_ops() returns on errors instead of jumping to end] Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: NDongli Zhang <dongli.zhang@oracle.com> Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220302164032.14569-3-dongli.zhang@oracle.comSigned-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Hou Wenlong 提交于
stable inclusion from stable-v5.10.111 commit a24479c5e9f44c2a799ca2178b1ba9fe1290706e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=a24479c5e9f44c2a799ca2178b1ba9fe1290706e -------------------------------- [ Upstream commit a836839c ] When RDTSCP is supported but RDPID is not supported in host, RDPID emulation is available. However, __kvm_get_msr() would only fail when RDTSCP/RDPID both are disabled in guest, so the emulator wouldn't inject a #UD when RDPID is disabled but RDTSCP is enabled in guest. Fixes: fb6d4d34 ("KVM: x86: emulate RDPID") Signed-off-by: NHou Wenlong <houwenlong.hwl@antgroup.com> Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
由 Jim Mattson 提交于
stable inclusion from stable-v5.10.111 commit 66b0fa6b22187caf4d789bdac8dcca49940628b0 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5GL1Z Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=66b0fa6b22187caf4d789bdac8dcca49940628b0 -------------------------------- [ Upstream commit 9b026073 ] AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some reserved bits are cleared, and some are not. Specifically, on Zen3/Milan, bits 19 and 42 are not cleared. When emulating such a WRMSR, KVM should not synthesize a #GP, regardless of which bits are set. However, undocumented bits should not be passed through to the hardware MSR. So, rather than checking for reserved bits and synthesizing a #GP, just clear the reserved bits. This may seem pedantic, but since KVM currently does not support the "Host/Guest Only" bits (41:40), it is necessary to clear these bits rather than synthesizing #GP, because some popular guests (e.g Linux) will set the "Host Only" bit even on CPUs that don't support EFER.SVME, and they don't expect a #GP. For example, root@Ubuntu1804:~# perf stat -e r26 -a sleep 1 Performance counter stats for 'system wide': 0 r26 1.001070977 seconds time elapsed Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30) Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace: Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90 Fixes: ca724305 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: NLotus Fenn <lotusf@google.com> Signed-off-by: NJim Mattson <jmattson@google.com> Reviewed-by: NLike Xu <likexu@tencent.com> Reviewed-by: NDavid Dunn <daviddunn@google.com> Message-Id: <20220226234131.2167175-1-jmattson@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com> Reviewed-by: NWei Li <liwei391@huawei.com>
-
- 17 7月, 2022 5 次提交
-
-
由 Tony Luck 提交于
mainline inclusion from mainline-5.17 commit 03b122da category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZFM CVE: NA Intel-SIG: commit 03b122da x86/sgx: Hook arch_memory_failure() into mainline code. Backport for SGX MCA recovery co-existence support -------------------------------- Add a call inside memory_failure() to call the arch specific code to check if the address is an SGX EPC page and handle it. Note the SGX EPC pages do not have a "struct page" entry, so the hook goes in at the same point as the device mapping hook. Pull the call to acquire the mutex earlier so the SGX errors are also protected. Make set_mce_nospec() skip SGX pages when trying to adjust the 1:1 map. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NJarkko Sakkinen <jarkko@kernel.org> Reviewed-by: NNaoya Horiguchi <naoya.horiguchi@nec.com> Tested-by: NReinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-6-tony.luck@intel.comSigned-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
由 Tony Luck 提交于
mainline inclusion from mainline-5.17 commit a495cbdf category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZFM CVE: NA Intel-SIG: commit a495cbdf x86/sgx: Add SGX infrastructure to recover from poison. Backport for SGX MCA recovery co-existence support -------------------------------- Provide a recovery function sgx_memory_failure(). If the poison was consumed synchronously then send a SIGBUS. Note that the virtual address of the access is not included with the SIGBUS as is the case for poison outside of SGX enclaves. This doesn't matter as addresses of code/data inside an enclave is of little to no use to code executing outside the (now dead) enclave. Poison found in a free page results in the page being moved from the free list to the per-node poison page list. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NJarkko Sakkinen <jarkko@kernel.org> Tested-by: NReinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-5-tony.luck@intel.comSigned-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
由 Tony Luck 提交于
mainline inclusion from mainline-5.17 commit 992801ae category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZFM CVE: NA Intel-SIG: commit 992801ae x86/sgx: Initial poison handling for dirty and free pages. Backport for SGX MCA recovery co-existence support -------------------------------- A memory controller patrol scrubber can report poison in a page that isn't currently being used. Add "poison" field in the sgx_epc_page that can be set for an sgx_epc_page. Check for it: 1) When sanitizing dirty pages 2) When freeing epc pages Poison is a new field separated from flags to avoid having to make all updates to flags atomic, or integrate poison state changes into some other locking scheme to protect flags (Currently just sgx_reclaimer_lock which protects the SGX_EPC_PAGE_RECLAIMER_TRACKED bit in page->flags). In both cases place the poisoned page on a per-node list of poisoned epc pages to make sure it will not be reallocated. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NJarkko Sakkinen <jarkko@kernel.org> Tested-by: NReinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-4-tony.luck@intel.comSigned-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
由 Tony Luck 提交于
mainline inclusion from mainline-5.17 commit 40e0e784 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZFM CVE: NA Intel-SIG: commit 40e0e784 x86/sgx: Add infrastructure to identify SGX EPC pages. Backport for SGX MCA recovery co-existence support -------------------------------- X86 machine check architecture reports a physical address when there is a memory error. Handling that error requires a method to determine whether the physical address reported is in any of the areas reserved for EPC pages by BIOS. SGX EPC pages do not have Linux "struct page" associated with them. Keep track of the mapping from ranges of EPC pages to the sections that contain them using an xarray. N.B. adds CONFIG_XARRAY_MULTI to the SGX dependecies. So "select" that in arch/x86/Kconfig for X86/SGX. Create a function arch_is_platform_page() that simply reports whether an address is an EPC page for use elsewhere in the kernel. The ACPI error injection code needs this function and is typically built as a module, so export it. Note that arch_is_platform_page() will be slower than other similar "what type is this page" functions that can simply check bits in the "struct page". If there is some future performance critical user of this function it may need to be implemented in a more efficient way. Note also that the current implementation of xarray allocates a few hundred kilobytes for this usage on a system with 4GB of SGX EPC memory configured. This isn't ideal, but worth it for the code simplicity. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NJarkko Sakkinen <jarkko@kernel.org> Tested-by: NReinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-3-tony.luck@intel.comSigned-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
由 Tony Luck 提交于
mainline inclusion from mainline-5.17 commit d6d261bd category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5EZFM CVE: NA Intel-SIG: commit d6d261bd x86/sgx: Add new sgx_epc_page flag bit to mark free pages. Backport for SGX MCA recovery co-existence support -------------------------------- SGX EPC pages go through the following life cycle: DIRTY ---> FREE ---> IN-USE --\ ^ | \-----------------/ Recovery action for poison for a DIRTY or FREE page is simple. Just make sure never to allocate the page. IN-USE pages need some extra handling. Add a new flag bit SGX_EPC_PAGE_IS_FREE that is set when a page is added to a free list and cleared when the page is allocated. Notes: 1) These transitions are made while holding the node->lock so that future code that checks the flags while holding the node->lock can be sure that if the SGX_EPC_PAGE_IS_FREE bit is set, then the page is on the free list. 2) Initially while the pages are on the dirty list the SGX_EPC_PAGE_IS_FREE bit is cleared. Signed-off-by: NTony Luck <tony.luck@intel.com> Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com> Reviewed-by: NJarkko Sakkinen <jarkko@kernel.org> Tested-by: NReinette Chatre <reinette.chatre@intel.com> Link: https://lkml.kernel.org/r/20211026220050.697075-2-tony.luck@intel.comSigned-off-by: NZhiquan Li <zhiquan1.li@intel.com>
-
- 13 7月, 2022 2 次提交
-
-
由 Kyung Min Park 提交于
mainline inclusion from mainline-5.11 commit b85a0425 category: feature feature: SPR New instructions bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I596EH CVE: N/A Intel-SIG: commit b85a0425 x86: Enumerate AVX Vector Neural Network instructions Backport for SPR core AVX VNNI support. ---------------------------- Add AVX version of the Vector Neural Network (VNNI) Instructions. A processor supports AVX VNNI instructions if CPUID.0x07.0x1:EAX[4] is present. The following instructions are available when this feature is present. 1. VPDPBUS: Multiply and Add Unsigned and Signed Bytes 2. VPDPBUSDS: Multiply and Add Unsigned and Signed Bytes with Saturation 3. VPDPWSSD: Multiply and Add Signed Word Integers 4. VPDPWSSDS: Multiply and Add Signed Integers with Saturation The only in-kernel usage of this is kvm passthrough. The CPU feature flag is shown as "avx_vnni" in /proc/cpuinfo. This instruction is currently documented in the latest "extensions" manual (ISE). It will appear in the "main" manual (SDM) in the future. Signed-off-by: NKyung Min Park <kyung.min.park@intel.com> Signed-off-by: NYang Zhong <yang.zhong@intel.com> Reviewed-by: NTony Luck <tony.luck@intel.com> Message-Id: <20210105004909.42000-2-yang.zhong@intel.com> Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NLuming Yu <luming.yu@intel.com>
-
由 Kyung Min Park 提交于
mainline inclusion from mainline-5.11 commit e1b35da5 category: feature feature: SPR New instructions bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I596EH CVE: N/A Intel-SIG: commit e1b35da5 x86: Enumerate AVX512 FP16 CPUID feature flag Backport for SPR core 5G ISA support. ------------------------------------- Enumerate AVX512 Half-precision floating point (FP16) CPUID feature flag. Compared with using FP32, using FP16 cut the number of bits required for storage in half, reducing the exponent from 8 bits to 5, and the mantissa from 23 bits to 10. Using FP16 also enables developers to train and run inference on deep learning models fast when all precision or magnitude (FP32) is not needed. A processor supports AVX512 FP16 if CPUID.(EAX=7,ECX=0):EDX[bit 23] is present. The AVX512 FP16 requires AVX512BW feature be implemented since the instructions for manipulating 32bit masks are associated with AVX512BW. The only in-kernel usage of this is kvm passthrough. The CPU feature flag is shown as "avx512_fp16" in /proc/cpuinfo. Signed-off-by: Kyung Min Park kyung.min.park@intel.com Acked-by: Dave Hansen dave.hansen@intel.com Reviewed-by: Tony Luck tony.luck@intel.com Message-Id: 20201208033441.28207-2-kyung.min.park@intel.com Acked-by: Borislav Petkov bp@suse.de Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Luming Yu luming.yu@intel.com
-
- 08 7月, 2022 1 次提交
-
-
由 Ricardo Neri 提交于
mainline inclusion from mainline-5.18 commit 7b8f40b3 category: feature bugzilla: https://gitee.com/openeuler/intel-kernel/issues/I5DSOL Intel_SIG: commit 7b8f40b3 x86/cpu: Add definitions for the Intel Hardware Feedback Interface. Backport for Intel HFI (Hardware Feedback Interface) support ------------------------------------- Add the CPUID feature bit and the model-specific registers needed to identify and configure the Intel Hardware Feedback Interface. Acked-by: NBorislav Petkov <bp@suse.de> Signed-off-by: NRicardo Neri <ricardo.neri-calderon@linux.intel.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Nyingbao jia <yingbao.jia@intel.com> Signed-off-by: NJun Tian <jun.j.tian@intel.com>
-