- 24 1月, 2017 5 次提交
-
-
由 Borislav Petkov 提交于
Change dependency to mce.c as we're using mce_inject_log() now to stick an MCE into the MCA subsystem. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-6-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
Add the TSC value to the MCE record only when the MCE being logged is precise, i.e., it is logged as an exception or an MCE-related interrupt. So it doesn't look particularly easy to do without touching/changing a bunch of places. That's why I'm trying tricks first. For example, the mce-apei.c case I'm addressing by setting ->tsc only for errors of panic severity. The idea there is, that, panic errors will have raised an #MC and not polled. And then instead of propagating a flag to mce_setup(), it seems easier/less code to set ->tsc depending on the call sites, i.e., are we polling or are we preparing an MCE record in an exception handler/thresholding interrupt. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-5-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Yazen Ghannam 提交于
Currently, we append the MCA_IPID[InstanceId] to the bank name to create the sysfs filename. The InstanceId field uniquely identifies a bank instance but it doesn't look very nice for most banks. Replace the InstanceId with a simpler, ascending (0, 1, ..) value. Only use this in the sysfs name when there is more than 1 instance. Otherwise, just use the bank's name as the sysfs name. Signed-off-by: NYazen Ghannam <Yazen.Ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/1484322741-41884-3-git-send-email-Yazen.Ghannam@amd.com Link: http://lkml.kernel.org/r/20170123183514.13356-4-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
We log a fake bank 128 MCE to note that we're handling a CPU thermal event. However, this confuses people into thinking that their hardware generates MCEs. Hijacking MCA for logging thermal events is a gross misuse anyway and it shouldn't have been done in the first place. And besides we have other means for dealing with thermal events which are much more suitable. So let's kill the MCE logging part. Signed-off-by: NBorislav Petkov <bp@suse.de> Acked-by: NAshok Raj <ashok.raj@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170105213846.GA12024@gmail.com Link: http://lkml.kernel.org/r/20170123183514.13356-3-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Borislav Petkov 提交于
... and get rid of the annoying: arch/x86/kernel/cpu/mcheck/mce-inject.c:97:13: warning: ‘mce_irq_ipi’ defined but not used [-Wunused-function] when doing randconfig builds. Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Yazen Ghannam <Yazen.Ghannam@amd.com> Cc: linux-edac <linux-edac@vger.kernel.org> Link: http://lkml.kernel.org/r/20170123183514.13356-2-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 18 1月, 2017 1 次提交
-
-
由 Ruslan Ruslichenko 提交于
commit d32932d0 removed the irq_retrigger callback from the IO-APIC chip and did not add it to the new IO-APIC-IR irq chip. Unfortunately the software resend fallback is not enabled on X86, so edge interrupts which are received during the lazy disabled state of the interrupt line are not retriggered and therefor lost. Restore the callbacks. [ tglx: Massaged changelog ] Fixes: d32932d0 ("x86/irq: Convert IOAPIC to use hierarchical irqdomain interfaces") Signed-off-by: NRuslan Ruslichenko <rruslich@cisco.com> Cc: xe-linux-external@cisco.com Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1484662432-13580-1-git-send-email-rruslich@cisco.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 17 1月, 2017 2 次提交
-
-
由 Dmitry Vyukov 提交于
emulator_fix_hypercall() replaces hypercall with vmcall instruction, but it does not handle GP exception properly when writes the new instruction. It can return X86EMUL_PROPAGATE_FAULT without setting exception information. This leads to incorrect emulation and triggers WARN_ON(ctxt->exception.vector > 0x1f) in x86_emulate_insn() as discovered by syzkaller fuzzer: WARNING: CPU: 2 PID: 18646 at arch/x86/kvm/emulate.c:5558 Call Trace: warn_slowpath_null+0x2c/0x40 kernel/panic.c:582 x86_emulate_insn+0x16a5/0x4090 arch/x86/kvm/emulate.c:5572 x86_emulate_instruction+0x403/0x1cc0 arch/x86/kvm/x86.c:5618 emulate_instruction arch/x86/include/asm/kvm_host.h:1127 [inline] handle_exception+0x594/0xfd0 arch/x86/kvm/vmx.c:5762 vmx_handle_exit+0x2b7/0x38b0 arch/x86/kvm/vmx.c:8625 vcpu_enter_guest arch/x86/kvm/x86.c:6888 [inline] vcpu_run arch/x86/kvm/x86.c:6947 [inline] Set exception information when write in emulator_fix_hypercall() fails. Signed-off-by: NDmitry Vyukov <dvyukov@google.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Cc: kvm@vger.kernel.org Cc: syzkaller@googlegroups.com Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 Zhou Chengming 提交于
The CPU hotplug function intel_pmu_cpu_starting() sets cpu_hw_events.excl_thread_id unconditionally to 1 when the shared exclusive counters data structure is already availabe for the sibling thread. This works during the boot process because the first sibling gets threadid 0 assigned and the second sibling which shares the data structure gets 1. But when the first thread of the core is offlined and onlined again it shares the data structure with the second thread and gets exclusive thread id 1 assigned as well. Prevent this by checking the threadid of the already online thread. [ tglx: Rewrote changelog ] Signed-off-by: NZhou Chengming <zhouchengming1@huawei.com> Cc: NuoHan Qiao <qiaonuohan@huawei.com> Cc: ak@linux.intel.com Cc: peterz@infradead.org Cc: kan.liang@intel.com Cc: dave.hansen@linux.intel.com Cc: eranian@google.com Cc: qiaonuohan@huawei.com Cc: davidcc@google.com Cc: guohanjun@huawei.com Link: http://lkml.kernel.org/r/1484536871-3131-1-git-send-email-zhouchengming1@huawei.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de> --- --- arch/x86/events/intel/core.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
-
- 14 1月, 2017 5 次提交
-
-
由 Peter Jones 提交于
Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW (2.28), include memory map entries with phys_addr=0x0 and num_pages=0. These machines fail to boot after the following commit, commit 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()") Fix this by removing such bogus entries from the memory map. Furthermore, currently the log output for this case (with efi=debug) looks like: [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0xffffffffffffffff] (0MB) This is clearly wrong, and also not as informative as it could be. This patch changes it so that if we find obviously invalid memory map entries, we print an error and skip those entries. It also detects the display of the address range calculation overflow, so the new output is: [ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries: [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[0x0000000000000000-0x0000000000000000] (invalid) It also detects memory map sizes that would overflow the physical address, for example phys_addr=0xfffffffffffff000 and num_pages=0x0200000000000001, and prints: [ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries: [ 0.000000] efi: mem45: [Reserved | | | | | | | | | | | | ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid) It then removes these entries from the memory map. Signed-off-by: NPeter Jones <pjones@redhat.com> Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT] Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk> [Matt: Include bugzilla info in commit log] Cc: <stable@vger.kernel.org> # v4.9+ Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
As Peter suggested [1] rejecting non sampling PEBS events, because they dont make any sense and could cause bugs in the NMI handler [2]. [1] http://lkml.kernel.org/r/20170103094059.GC3093@worktop [2] http://lkml.kernel.org/r/1482931866-6018-3-git-send-email-jolsa@kernel.orgSigned-off-by: NJiri Olsa <jolsa@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20170103142454.GA26251@kravaSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jiri Olsa 提交于
It's possible to set up PEBS events to get only errors and not any data, like on SNB-X (model 45) and IVB-EP (model 62) via 2 perf commands running simultaneously: taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10 This leads to a soft lock up, because the error path of the intel_pmu_drain_pebs_nhm() does not account event->hw.interrupt for error PEBS interrupts, so in case you're getting ONLY errors you don't have a way to stop the event when it's over the max_samples_per_tick limit: NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816] ... RIP: 0010:[<ffffffff81159232>] [<ffffffff81159232>] smp_call_function_single+0xe2/0x140 ... Call Trace: ? trace_hardirqs_on_caller+0xf5/0x1b0 ? perf_cgroup_attach+0x70/0x70 perf_install_in_context+0x199/0x1b0 ? ctx_resched+0x90/0x90 SYSC_perf_event_open+0x641/0xf90 SyS_perf_event_open+0x9/0x10 do_syscall_64+0x6c/0x1f0 entry_SYSCALL64_slow_path+0x25/0x25 Add perf_event_account_interrupt() which does the interrupt and frequency checks and call it from intel_pmu_drain_pebs_nhm()'s error path. We keep the pending_kill and pending_wakeup logic only in the __perf_event_overflow() path, because they make sense only if there's any data to deliver. Signed-off-by: NJiri Olsa <jolsa@kernel.org> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vince@deater.net> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Tobias Klauser 提交于
info->si_addr is of type void __user *, so it should be compared against something from the same address space. This fixes the following sparse error: arch/x86/mm/mpx.c:296:27: error: incompatible types in comparison expression (different address spaces) Signed-off-by: NTobias Klauser <tklauser@distanz.ch> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Len Brown 提交于
The Intel Denverton microserver uses a 25 MHz TSC crystal, so we can derive its exact [*] TSC frequency using CPUID and some arithmetic, eg.: TSC: 1800 MHz (25000000 Hz * 216 / 3 / 1000000) [*] 'exact' is only as good as the crystal, which should be +/- 20ppm Signed-off-by: NLen Brown <len.brown@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/306899f94804aece6d8fa8b4223ede3b48dbb59c.1484287748.git.len.brown@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 1月, 2017 8 次提交
-
-
由 Paolo Bonzini 提交于
This is CVE-2017-2583. On Intel this causes a failed vmentry because SS's type is neither 3 nor 7 (even though the manual says this check is only done for usable SS, and the dmesg splat says that SS is unusable!). On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb. The fix fabricates a data segment descriptor when SS is set to a null selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb. Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3; this in turn ensures CPL < 3 because RPL must be equal to CPL. Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing the bug and deciphering the manuals. Reported-by: NXiaohan Zhang <zhangxiaohan1@huawei.com> Fixes: 79d5b4c3 Cc: stable@nongnu.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Wanpeng Li 提交于
Reported by syzkaller: BUG: unable to handle kernel NULL pointer dereference at 00000000000001b0 IP: _raw_spin_lock+0xc/0x30 PGD 3e28eb067 PUD 3f0ac6067 PMD 0 Oops: 0002 [#1] SMP CPU: 0 PID: 2431 Comm: test Tainted: G OE 4.10.0-rc1+ #3 Call Trace: ? kvm_ioapic_scan_entry+0x3e/0x110 [kvm] kvm_arch_vcpu_ioctl_run+0x10a8/0x15f0 [kvm] ? pick_next_task_fair+0xe1/0x4e0 ? kvm_arch_vcpu_load+0xea/0x260 [kvm] kvm_vcpu_ioctl+0x33a/0x600 [kvm] ? hrtimer_try_to_cancel+0x29/0x130 ? do_nanosleep+0x97/0xf0 do_vfs_ioctl+0xa1/0x5d0 ? __hrtimer_init+0x90/0x90 ? do_nanosleep+0x5b/0xf0 SyS_ioctl+0x79/0x90 do_syscall_64+0x6e/0x180 entry_SYSCALL64_slow_path+0x25/0x25 RIP: _raw_spin_lock+0xc/0x30 RSP: ffffa43688973cc0 The syzkaller folks reported a NULL pointer dereference due to ENABLE_CAP succeeding even without an irqchip. The Hyper-V synthetic interrupt controller is activated, resulting in a wrong request to rescan the ioapic and a NULL pointer dereference. #include <sys/ioctl.h> #include <sys/mman.h> #include <sys/types.h> #include <linux/kvm.h> #include <pthread.h> #include <stddef.h> #include <stdint.h> #include <stdlib.h> #include <string.h> #include <unistd.h> #ifndef KVM_CAP_HYPERV_SYNIC #define KVM_CAP_HYPERV_SYNIC 123 #endif void* thr(void* arg) { struct kvm_enable_cap cap; cap.flags = 0; cap.cap = KVM_CAP_HYPERV_SYNIC; ioctl((long)arg, KVM_ENABLE_CAP, &cap); return 0; } int main() { void *host_mem = mmap(0, 0x1000, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); int kvmfd = open("/dev/kvm", 0); int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0); struct kvm_userspace_memory_region memreg; memreg.slot = 0; memreg.flags = 0; memreg.guest_phys_addr = 0; memreg.memory_size = 0x1000; memreg.userspace_addr = (unsigned long)host_mem; host_mem[0] = 0xf4; ioctl(vmfd, KVM_SET_USER_MEMORY_REGION, &memreg); int cpufd = ioctl(vmfd, KVM_CREATE_VCPU, 0); struct kvm_sregs sregs; ioctl(cpufd, KVM_GET_SREGS, &sregs); sregs.cr0 = 0; sregs.cr4 = 0; sregs.efer = 0; sregs.cs.selector = 0; sregs.cs.base = 0; ioctl(cpufd, KVM_SET_SREGS, &sregs); struct kvm_regs regs = { .rflags = 2 }; ioctl(cpufd, KVM_SET_REGS, ®s); ioctl(vmfd, KVM_CREATE_IRQCHIP, 0); pthread_t th; pthread_create(&th, 0, thr, (void*)(long)cpufd); usleep(rand() % 10000); ioctl(cpufd, KVM_RUN, 0); pthread_join(th, 0); return 0; } This patch fixes it by failing ENABLE_CAP if without an irqchip. Reported-by: NDmitry Vyukov <dvyukov@google.com> Fixes: 5c919412 (kvm/x86: Hyper-V synthetic interrupt controller) Cc: stable@vger.kernel.org # 4.5+ Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Steve Rutherford 提交于
Introduces segemented_write_std. Switches from emulated reads/writes to standard read/writes in fxsave, fxrstor, sgdt, and sidt. This fixes CVE-2017-2584, a longstanding kernel memory leak. Since commit 283c95d0 ("KVM: x86: emulate FXSAVE and FXRSTOR", 2016-11-09), which is luckily not yet in any final release, this would also be an exploitable kernel memory *write*! Reported-by: NDmitry Vyukov <dvyukov@google.com> Cc: stable@vger.kernel.org Fixes: 96051572 Fixes: 283c95d0Suggested-by: NPaolo Bonzini <pbonzini@redhat.com> Signed-off-by: NSteve Rutherford <srutherford@google.com> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 David Matlack 提交于
KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled). These are implemented with delayed_work structs which can still be pending when the KVM module is unloaded. We've seen this cause kernel panics when the kvm_intel module is quickly reloaded. Use the new static_key_deferred_flush() API to flush pending updates on module unload. Signed-off-by: NDavid Matlack <dmatlack@google.com> Cc: stable@vger.kernel.org Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
由 Josh Poimboeuf 提交于
When unwinding a task, the end of the stack is always at the same offset right below the saved pt_regs, regardless of which syscall was used to enter the kernel. That convention allows the unwinder to verify that a stack is sane. However, newly forked tasks don't always follow that convention, as reported by the following unwinder warning seen by Dave Jones: WARNING: kernel stack frame pointer at ffffc90001443f30 in kworker/u8:8:30468 has bad value (null) The warning was due to the following call chain: (ftrace handler) call_usermodehelper_exec_async+0x5/0x140 ret_from_fork+0x22/0x30 The problem is that ret_from_fork() doesn't create a stack frame before calling other functions. Fix that by carefully using the frame pointer macros. In addition to conforming to the end of stack convention, this also makes related stack traces more sensible by making it clear to the user that ret_from_fork() was involved. Reported-by: NDave Jones <davej@codemonkey.org.uk> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/8854cdaab980e9700a81e9ebf0d4238e4bbb68ef.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
In the following commit: 0100301b ("sched/x86: Rewrite the switch_to() code") ... the layout of the 'inactive_task_frame' struct was designed to have a frame pointer header embedded in it, so that the unwinder could use the 'bp' and 'ret_addr' fields to report __schedule() on the stack (or ret_from_fork() for newly forked tasks which haven't actually run yet). Finish the job by changing get_frame_pointer() to return a pointer to inactive_task_frame's 'bp' field rather than 'bp' itself. This allows the unwinder to start one frame higher on the stack, so that it properly reports __schedule(). Reported-by: NMiroslav Benes <mbenes@suse.cz> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/598e9f7505ed0aba86e8b9590aa528c6c7ae8dcd.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
There are a handful of callers to save_stack_trace_tsk() and show_stack() which try to unwind the stack of a task other than current. In such cases, it's remotely possible that the task is running on one CPU while the unwinder is reading its stack from another CPU, causing the unwinder to see stack corruption. These cases seem to be mostly harmless. The unwinder has checks which prevent it from following bad pointers beyond the bounds of the stack. So it's not really a bug as long as the caller understands that unwinding another task will not always succeed. In such cases, it's possible that the unwinder may read a KASAN-poisoned region of the stack. Account for that by using READ_ONCE_NOCHECK() when reading the stack of another task. Use READ_ONCE() when reading the stack of the current task, since KASAN warnings can still be useful for finding bugs in that case. Reported-by: NDmitry Vyukov <dvyukov@google.com> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Dave Jones <davej@codemonkey.org.uk> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/4c575eb288ba9f73d498dfe0acde2f58674598f1.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Josh Poimboeuf 提交于
There are a handful of callers to save_stack_trace_tsk() and show_stack() which try to unwind the stack of a task other than current. In such cases, it's remotely possible that the task is running on one CPU while the unwinder is reading its stack from another CPU, causing the unwinder to see stack corruption. These cases seem to be mostly harmless. The unwinder has checks which prevent it from following bad pointers beyond the bounds of the stack. So it's not really a bug as long as the caller understands that unwinding another task will not always succeed. Since stack "corruption" on another task's stack isn't necessarily a bug, silence the warnings when unwinding tasks other than current. Reported-by: NDave Jones <davej@codemonkey.org.uk> Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/00d8c50eea3446c1524a2a755397a3966629354c.1483978430.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 11 1月, 2017 3 次提交
-
-
由 Colin King 提交于
When x86_pmu.num_counters is 32 the shift of the integer constant 1 is exceeding 32bit and therefor undefined behaviour. Fix this by shifting 1ULL instead of 1. Reported-by: CoverityScan CID#1192105 ("Bad bit shift operation") Signed-off-by: NColin Ian King <colin.king@canonical.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kan Liang <kan.liang@intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Link: http://lkml.kernel.org/r/20170111114310.17928-1-colin.king@canonical.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Bjorn Helgaas 提交于
Martin reported that the Supermicro X8DTH-i/6/iF/6F advertises incorrect host bridge windows via _CRS: pci_root PNP0A08:00: host bridge window [io 0xf000-0xffff] pci_root PNP0A08:01: host bridge window [io 0xf000-0xffff] Both bridges advertise the 0xf000-0xffff window, which cannot be correct. Work around this by ignoring _CRS on this system. The downside is that we may not assign resources correctly to hot-added PCI devices (if they are possible on this system). Link: https://bugzilla.kernel.org/show_bug.cgi?id=42606Reported-by: NMartin Burnicki <martin.burnicki@meinberg.de> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> CC: stable@vger.kernel.org
-
由 Prarit Bhargava 提交于
hswep_uncore_cpu_init() uses a hardcoded physical package id 0 for the boot cpu. This works as long as the boot CPU is actually on the physical package 0, which is normaly the case after power on / reboot. But it fails with a NULL pointer dereference when a kdump kernel is started on a secondary socket which has a different physical package id because the locigal package translation for physical package 0 does not exist. Use the logical package id of the boot cpu instead of hard coded 0. [ tglx: Rewrote changelog once more ] Fixes: cf6d445f ("perf/x86/uncore: Track packages, not per CPU data") Signed-off-by: NPrarit Bhargava <prarit@redhat.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Harish Chegondi <harish.chegondi@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1483628965-2890-1-git-send-email-prarit@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 10 1月, 2017 5 次提交
-
-
由 Junichi Nomura 提交于
In generic_load_microcode(), curr_mc_size is the size of the last allocated buffer and since we have this performance "optimization" there to vmalloc a new buffer only when the current one is bigger, curr_mc_size ends up becoming the size of the biggest buffer we've seen so far. However, we end up saving the microcode patch which matches our CPU and its size is not curr_mc_size but the respective mc_size during the iteration while we're staring at it. So save that mc_size into a separate variable and use it to store the previously found microcode buffer. Without this fix, we could get oops like this: BUG: unable to handle kernel paging request at ffffc9000e30f000 IP: __memcpy+0x12/0x20 ... Call Trace: ? kmemdup+0x43/0x60 __alloc_microcode_buf+0x44/0x70 save_microcode_patch+0xd4/0x150 generic_load_microcode+0x1b8/0x260 request_microcode_user+0x15/0x20 microcode_write+0x91/0x100 __vfs_write+0x34/0x120 vfs_write+0xc1/0x130 SyS_write+0x56/0xc0 do_syscall_64+0x6c/0x160 entry_SYSCALL64_slow_path+0x25/0x25 Fixes: 06b8534c ("x86/microcode: Rework microcode loading") Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/4f33cbfd-44f2-9bed-3b66-7446cd14256f@ce.jp.nec.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Junichi Nomura 提交于
We allocate struct ucode_patch here. @size is the size of microcode data and used for kmemdup() later in this function. Fixes: 06b8534c ("x86/microcode: Rework microcode loading") Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/7a730dc9-ac17-35c4-fe76-dfc94e5ecd95@ce.jp.nec.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
Since on Intel we're required to do CPUID(1) first, before reading the microcode revision MSR, let's add a special helper which does the required steps so that we don't forget to do them next time, when we want to read the microcode revision. Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170109114147.5082-4-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
Intel supplies the microcode revision value in MSR 0x8b (IA32_BIOS_SIGN_ID) after CPUID(1) has been executed. Execute it each time before reading that MSR. It used to do sync_core() which did do CPUID but c198b121 ("x86/asm: Rewrite sync_core() to use IRET-to-self") changed the sync_core() implementation so we better make the microcode loading case explicit, as the SDM documents it. Reported-and-tested-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170109114147.5082-3-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Borislav Petkov 提交于
... similarly to the cpuid_<reg>() variants. Signed-off-by: NBorislav Petkov <bp@suse.de> Link: http://lkml.kernel.org/r/20170109114147.5082-2-bp@alien8.deSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 09 1月, 2017 2 次提交
-
-
由 Nicholas Mc Guire 提交于
Add the missing declarations of basic string functions to string.h to allow a clean build. Fixes: 5be86566 ("String-handling functions for the new x86 setup code.") Signed-off-by: NNicholas Mc Guire <hofrat@osadl.org> Link: http://lkml.kernel.org/r/1483781911-21399-1-git-send-email-hofrat@osadl.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Daniel Borkmann 提交于
If after too many passes still no image could be emitted, then swap back to the original program as we do in all other cases and don't use the one with blinding. Fixes: 959a7579 ("bpf, x86: add support for constant blinding") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 1月, 2017 1 次提交
-
-
由 Nicolai Stange 提交于
With the following commit: 4bc9f92e ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data") ... efi_bgrt_init() calls into the memblock allocator through efi_mem_reserve() => efi_arch_mem_reserve() *after* mm_init() has been called. Indeed, KASAN reports a bad read access later on in efi_free_boot_services(): BUG: KASAN: use-after-free in efi_free_boot_services+0xae/0x24c at addr ffff88022de12740 Read of size 4 by task swapper/0/0 page:ffffea0008b78480 count:0 mapcount:-127 mapping: (null) index:0x1 flags: 0x5fff8000000000() [...] Call Trace: dump_stack+0x68/0x9f kasan_report_error+0x4c8/0x500 kasan_report+0x58/0x60 __asan_load4+0x61/0x80 efi_free_boot_services+0xae/0x24c start_kernel+0x527/0x562 x86_64_start_reservations+0x24/0x26 x86_64_start_kernel+0x157/0x17a start_cpu+0x5/0x14 The instruction at the given address is the first read from the memmap's memory, i.e. the read of md->type in efi_free_boot_services(). Note that the writes earlier in efi_arch_mem_reserve() don't splat because they're done through early_memremap()ed addresses. So, after memblock is gone, allocations should be done through the "normal" page allocator. Introduce a helper, efi_memmap_alloc() for this. Use it from efi_arch_mem_reserve(), efi_free_boot_services() and, for the sake of consistency, from efi_fake_memmap() as well. Note that for the latter, the memmap allocations cease to be page aligned. This isn't needed though. Tested-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NNicolai Stange <nicstange@gmail.com> Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> # v4.9 Cc: Dave Young <dyoung@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Mika Penttilä <mika.penttila@nextfour.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Fixes: 4bc9f92e ("x86/efi-bgrt: Use efi_mem_reserve() to avoid copying image data") Link: http://lkml.kernel.org/r/20170105125130.2815-1-nicstange@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 1月, 2017 1 次提交
-
-
由 Borislav Petkov 提交于
The following commit: 8196dab4 ("x86/cpu: Get rid of compute_unit_id") ... broke the initial strategy for Bulldozer-based cores' topology, where we consider each thread of a compute unit a standalone core and not a HT or SMT thread. Revert to the firmware-supplied core_id numbering and do not make them thread siblings as we don't consider them for such even if they technically are, more or less. Reported-and-tested-by: NBrice Goglin <Brice.Goglin@inria.fr> Tested-by: NYazen Ghannam <yazen.ghannam@amd.com> Signed-off-by: NBorislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # v4.6+ Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Gerst <brgerst@gmail.com> Cc: Denys Vlasenko <dvlasenk@redhat.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Fixes: 8196dab4 ("x86/cpu: Get rid of compute_unit_id") Link: http://lkml.kernel.org/r/20170105092638.5247-1-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 05 1月, 2017 5 次提交
-
-
由 Jan Dakinevich 提交于
Declaration of VMX_VPID_EXTENT_SUPPORTED_MASK occures twice in the code. Probably, it was happened after unsuccessful merge. Signed-off-by: NJan Dakinevich <jan.dakinevich@gmail.com> Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
-
由 David Carrillo-Cisneros 提交于
The conversion of Intel PMU drivers into modules did not include reference counting. The machine will crash when attempting to access deleted code if an event from a module PMU is started and the module removed before the event is destroyed. i.e. this crashes the machine: $ insmod intel-rapl-perf.ko $ perf stat -e power/energy-cores/ -C 0 & $ rmmod intel-rapl-perf.ko Set THIS_MODULE to pmu->module in Intel module PMUs so that generic code can handle reference counting and deny rmmod while an event still exists. Signed-off-by: NDavid Carrillo-Cisneros <davidcc@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kan Liang <kan.liang@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul Turner <pjt@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1482455860-116269-1-git-send-email-davidcc@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Shevchenko 提交于
The current implementation supports only Intel Merrifield platforms. Don't mess with the rest of the Intel MID family by not registering device with wrong properties. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170102092450.87229-1-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Andy Shevchenko 提交于
The proper spelling of Anniedale SoC with 'e' in the middle. Fix typo in the comment line in intel-family.h header. Signed-off-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20170102092229.87036-1-andriy.shevchenko@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Lukasz Odzioba 提交于
A negative number can be specified in the cmdline which will be used as setup_clear_cpu_cap() argument. With that we can clear/set some bit in memory predceeding boot_cpu_data/cpu_caps_cleared which may cause kernel to misbehave. This patch adds lower bound check to setup_disablecpuid(). Boris Petkov reproduced a crash: [ 1.234575] BUG: unable to handle kernel paging request at ffffffff858bd540 [ 1.236535] IP: memcpy_erms+0x6/0x10 Signed-off-by: NLukasz Odzioba <lukasz.odzioba@intel.com> Acked-by: NBorislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: andi.kleen@intel.com Cc: bp@alien8.de Cc: dave.hansen@linux.intel.com Cc: luto@kernel.org Cc: slaoub@gmail.com Fixes: ac72e788 ("x86: add generic clearcpuid=... option") Link: http://lkml.kernel.org/r/1482933340-11857-1-git-send-email-lukasz.odzioba@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 30 12月, 2016 2 次提交
-
-
由 Herbert Xu 提交于
If aesni is built-in but pcbc is built as a module, then aesni will fail completely because when it tries to register the pcbc variant of aes the pcbc template is not available. This patch fixes this by modifying the pcbc presence test so that if aesni is built-in then pcbc must also be built-in for it to be used by aesni. Fixes: 85671860 ("crypto: aesni - Convert to skcipher") Reported-by: NStephan Müller <smueller@chronox.de> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Linus Torvalds 提交于
In commit 62906027 ("mm: add PageWaiters indicating tasks are waiting for a page bit") Nick Piggin made our page locking no longer unconditionally touch the hashed page waitqueue, which not only helps performance in general, but is particularly helpful on NUMA machines where the hashed wait queues can bounce around a lot. However, the "clear lock bit atomically and then test the waiters bit" sequence turns out to be much more expensive than it needs to be, because you get a nasty stall when trying to access the same word that just got updated atomically. On architectures where locking is done with LL/SC, this would be trivial to fix with a new primitive that clears one bit and tests another atomically, but that ends up not working on x86, where the only atomic operations that return the result end up being cmpxchg and xadd. The atomic bit operations return the old value of the same bit we changed, not the value of an unrelated bit. On x86, we could put the lock bit in the high bit of the byte, and use "xadd" with that bit (where the overflow ends up not touching other bits), and look at the other bits of the result. However, an even simpler model is to just use a regular atomic "and" to clear the lock bit, and then the sign bit in eflags will indicate the resulting state of the unrelated bit #7. So by moving the PageWaiters bit up to bit #7, we can atomically clear the lock bit and test the waiters bit on x86 too. And architectures with LL/SC (which is all the usual RISC suspects), the particular bit doesn't matter, so they are fine with this approach too. This avoids the extra access to the same atomic word, and thus avoids the costly stall at page unlock time. The only downside is that the interface ends up being a bit odd and specialized: clear a bit in a byte, and test the sign bit. Nick doesn't love the resulting name of the new primitive, but I'd rather make the name be descriptive and very clear about the limitation imposed by trying to work across all relevant architectures than make it be some generic thing that doesn't make the odd semantics explicit. So this introduces the new architecture primitive clear_bit_unlock_is_negative_byte(); and adds the trivial implementation for x86. We have a generic non-optimized fallback (that just does a "clear_bit()"+"test_bit(7)" combination) which can be overridden by any architecture that can do better. According to Nick, Power has the same hickup x86 has, for example, but some other architectures may not even care. All these optimizations mean that my page locking stress-test (which is just executing a lot of small short-lived shell scripts: "make test" in the git source tree) no longer makes our page locking look horribly bad. Before all these optimizations, just the unlock_page() costs were just over 3% of all CPU overhead on "make test". After this, it's down to 0.66%, so just a quarter of the cost it used to be. (The difference on NUMA is bigger, but there this micro-optimization is likely less noticeable, since the big issue on NUMA was not the accesses to 'struct page', but the waitqueue accesses that were already removed by Nick's earlier commit). Acked-by: NNick Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-